Topic: FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda)

Reply #225
377 MB in 7 seconds is easily within reach of HDD speed. A good drive like an F3 should easily manage over 100 MB/s, so obviously it's not limited by hard drive speed.
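
As a quick sanity check on the figures above, the average throughput can be worked out like this. It is a back-of-the-envelope sketch that assumes the 377 MB is moved once, in exactly 7 seconds:

```python
# Rough throughput estimate from the figures quoted in the post.
size_mb = 377    # amount of data in MB (from the post)
elapsed_s = 7    # encoding time in seconds (from the post)

throughput_mb_s = size_mb / elapsed_s
print(f"{throughput_mb_s:.1f} MB/s")
```

That is well under the 100 MB/s mentioned for a good drive, although reading and writing at the same time would add to the load.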

Reply #226
Yes, most likely. At these speeds the whole I/O system has to do some work, so HDD speed is only one factor. For example, reading and writing the file at the same time on the same HDD is something completely different than just writing. As mentioned earlier, on my HDD, an older 500GB WD, I get hiccups that worsen the whole encoding speed result.
Using compression -0 makes things much faster, up to 800x, so my SSD isn't the limit.

One other thing I noticed: FLACCL seems to stress my GPU more than FlaCuda. EVGA Precision shows a GPU load of ~75% for CUDA and up to 90% for FLACCL. It also creates higher temperatures. Or is it that it doesn't use 100% because it is already faster than the data can be delivered?

Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Reply #227
One of the possible bottlenecks is the PCI Express transfer speed.

For example, I've read many reports on AMD forums that AMD GPU drivers fail to provide decent DMA transfer speeds on certain (i.e. non-AMD) chipsets.

NVIDIA drivers usually don't have this problem, and recent motherboards with PCI Express 2.0 x16 can do up to 6Gbit per second (which should be enough for up to ~1600x encoding speed). Older PCI Express 1.0 can be capped at ~800x encoding speed, and if you have some kind of Crossfire/SLI configuration, or some other adapter (SATA 6Gbit, USB 3.0, or an SSD) using a second PCI Express slot, speeds can drop drastically.
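
To get a feel for the amount of data involved, here is a rough sketch of the raw PCM input rate at a given encoding speed multiple. It assumes CD-format audio (44.1 kHz, stereo, 16-bit) and counts only the input samples; how much data the encoder actually moves over the bus per sample is not stated in this thread, so treat it as a lower bound rather than a model of the real traffic:

```python
# Raw PCM data rate for CD-format audio at a given encoding speed multiple.
# Assumption: 44.1 kHz, 2 channels, 16-bit samples; only input samples are counted.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BYTES_PER_SAMPLE = 2

def pcm_rate_mb_s(speed_multiple: float) -> float:
    """Return the raw PCM rate in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_second = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE  # 176,400 B/s at 1x
    return speed_multiple * bytes_per_second / 1_000_000

for speed in (800, 1600):
    print(f"{speed}x -> {pcm_rate_mb_s(speed):.0f} MB/s of raw PCM")
```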
CUETools 2.1.6

Reply #228
Interesting. Having numbers that close between SSD and HDD makes me wonder. Do you have Win7 and some kind of ReadyBoost kicking in?


I'm using Windows 7 Professional 64-bit, but all HDD features are disabled. The Intel SSD Toolbox did the trick, which is one way of disabling them all; the other way is manually via the registry, but I bet you know all this. Even the pagefile is disabled, which makes me wonder if that was the case.


The pagefile shouldn't be disabled on an SSD.

read:

http://blogs.msdn.com/b/e7/archive/2009/05...drives-and.aspx

(search for pagefile)


Reply #230
FLACCL 0.2:[attachment=6172:flaccl02.rar]

Thanks! Works perfectly on my GTX 470. Any chance that this encoder will support formats other than 16 bits per sample?
Code:
Unhandled Exception: System.Exception: Bits per sample must be 16.
   at CUETools.Codecs.FLACCL.FLACCLWriter..ctor(String path, Stream IO, AudioPCMConfig pcm)
   at CUETools.FLACCL.cmd.Program.Main(String[] args)

Reply #231
I'm back after replacing my video card with a Radeon HD5670. I've tried your OpenCL encoder too, and... I can't tell you exact speeds (they vary like crazy...), but it looks at least twice as fast as the stock CPU encoder on my Core 2 Duo (Conroe) at 3.1GHz and gets a slightly smaller output.
At least it isn't worse than my 8600GT + CUDA encoder used to be, which I unfortunately can't (don't want to go through the hassle to) re-test.
I'll search back in the topic, as I haven't copied my last test results to my new drives yet (this is why it would have been better to store them on my pendrive, as I do with many things).
Anyway, this time the result shouldn't be limited by I/O; it's been tested on my new SSD.

Edit: it seems it's faster. Thank goodness, replacing an HDD with an SSD and a video card with a faster one actually led to some improvement. Back then I wrote that it's faster than CPU encoding on 2 threads (and I mentioned speeds like ~70x at 3.33GHz). Now it's twice as fast.

Is this encoder compatible with an onboard HD3200, or does it need something newer...?

Reply #232
Just got GTX 580 and had to test encoding speeds. FlacCL 0.2 seems to lose to old GTX 285 with compression ratios >8 but wins with the others. Once again --cpu-threads 2 setting was fastest for me but this card got a bit slower with --slow-gpu setting.
[attachment=6202:flaccl.png]

Reply #233
reporting that i got further with the 10.11 driver, but still fails:



maybe it'll get even further with 10.12...

Reply #234
Just got GTX 580 and had to test encoding speeds. FlacCL 0.2 seems to lose to old GTX 285 with compression ratios >8 but wins with the others. Once again --cpu-threads 2 setting was fastest for me but this card got a bit slower with --slow-gpu setting.
[attachment=6202:flaccl.png]


Funny to see 1400x speed, holy sh...

Btw, I found some files FLACCL fails to encode. In case anyone else finds some too: Gregory has fixed it and will hopefully release another version soon.

Reply #235
Thanks! Works perfectly on my GTX 470. Any chance that this encoder will support formats other than 16 bits per sample?

Maybe. If i figure out how to do 64-bit arithmetic effectively on GPU.
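
For readers wondering why wider samples would call for 64-bit arithmetic, here is a general illustration, not taken from the FLACCL source. A linear predictor sums products of samples and quantized coefficients, and the sketch gives a conservative upper bound on the accumulator width. The coefficient precision (12 bits) and predictor order (8) are example values chosen for illustration only:

```python
import math

def accumulator_bits(sample_bits: int, coeff_bits: int, order: int) -> int:
    """Conservative upper bound on the signed bits needed to sum
    `order` products of a sample and a quantized predictor coefficient."""
    return sample_bits + coeff_bits + math.ceil(math.log2(order))

# Illustrative values only: 12-bit coefficients, predictor order 8.
for sample_bits in (16, 24):
    bits = accumulator_bits(sample_bits, coeff_bits=12, order=8)
    verdict = "fits in 32 bits" if bits <= 32 else "needs more than 32 bits"
    print(f"{sample_bits}-bit samples: up to {bits} bits ({verdict})")
```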

Is this encoder compatible with an onboard HD3200, or does it need something newer...?

Unfortunately HD3xxx and HD4xxx do not seem to support OpenCL properly.

Just got GTX 580 and had to test encoding speeds.

Thanks a lot! Your graphs are very helpful as always.

reporting that i got further with the 10.11 driver, but still fails:

I'm afraid i give up on HD4XXX... It was bad enough it doesn't support atomics, but it doesn't seem to support barrier synchronization properly either, and i can't think of a way to do without them.

Funny to see 1400x speed, holy sh...

Let's hope it can do even better

Reply #236
[attachment=6219:flaccl03.rar]
Supported devices:

1) NVIDIA Geforce 4XX (Fermi) and older GF200 series GPUs
Requires fresh drivers (e.g. http://www.nvidia.com/object/win7-winvista...hql-driver.html)
2) ATI Radeon HD 5XXX
Requires fresh drivers (e.g. http://sites.amd.com/us/game/downloads/Pag...on_win7-64.aspx)
Be sure to download "AMD Catalyst™ Accelerated Parallel Processing (APP) Technology Edition", not "Catalyst Software Suite (64 bit) English Only". This contains both display drivers and OpenCL.
Option to select the OpenCL platform if you have both NVIDIA and AMD installed on a single computer: --opencl-platform "ATI Stream"

3) Multicore CPU
Requires "AMD Catalyst™ Accelerated Parallel Processing (APP) Technology Edition" or "Intel OpenCL SDK" (alpha version, 32-bit systems only) http://software.intel.com/en-us/articles/intel-opencl-sdk/
Option to use CPU encoding: --opencl-type cpu --opencl-platform "Intel OpenCL" (or "ATI Stream")

New in this version: the experimental option --fast-gpu forces the encoder to do even more work on the GPU. This can slow things down, but it can be a bit faster if you are limited by PCIe transfer speeds in lower compression modes. It can also be effective if you don't want to give the encoder additional CPU threads with --cpu-threads, or if you use --verify, which you should if you use --fast-gpu, because it's experimental and might corrupt your data.

This version processes 32 frames at a time (the previous one did 16) to better utilize high-end GPUs, but this can make slower GPUs a bit unresponsive during encoding, in which case you can use the option --task-size 16.
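
The options named in this post could be combined along the following lines. This is only a sketch: the executable name `CUETools.FLACCL.cmd.exe` is an assumption based on the namespace in the stack trace earlier in the thread, `input.wav` is a placeholder, the helper `build_flaccl_command` is hypothetical, and the argument order the tool actually expects is not confirmed here.

```python
import shlex

def build_flaccl_command(input_path, level=8, opencl_platform=None, opencl_type=None,
                         fast_gpu=False, verify=False, cpu_threads=None, task_size=None,
                         exe="CUETools.FLACCL.cmd.exe"):  # exe name is an assumption
    """Assemble an argument list from the options named in the post.

    `exe` and `input_path` are placeholders, not confirmed names.
    """
    args = [exe, f"-{level}"]
    if opencl_type is not None:
        args += ["--opencl-type", opencl_type]
    if opencl_platform is not None:
        args += ["--opencl-platform", opencl_platform]
    if fast_gpu:
        args.append("--fast-gpu")
    if verify:
        args.append("--verify")  # recommended in the post whenever --fast-gpu is used
    if cpu_threads is not None:
        args += ["--cpu-threads", str(cpu_threads)]
    if task_size is not None:
        args += ["--task-size", str(task_size)]
    args.append(input_path)
    return args

# Example: AMD platform, experimental fast path with verification, smaller task size.
cmd = build_flaccl_command("input.wav", opencl_platform="ATI Stream",
                           fast_gpu=True, verify=True, task_size=16)
print(shlex.join(cmd))
```

Building the command as a list keeps the quoted platform name ("ATI Stream") intact as a single argument if it is later handed to something like `subprocess.run`.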

Reply #237
I did encode a bunch of files. The ones that caused an error before encode without a problem now!

-8 with verify and 2 threads on 965P/GTX260/Q9550 doesn't run much faster with "--fast-gpu" enabled. Seems my GTX260 is rather on the slow side by now. Strangely, this switch uses about the same amount of video memory as without it. I expected a change here.

Thanks for the new version, good to know it works on Fermi now.


Reply #238
i'll keep on testing this with each new catalyst release, maybe it's just a driver problem. or have you been told hd4xxx is unable to do this by hw?

Reply #239
This version can work without atomics, which are not supported by HD4xxx hardware, and the rest of the problem might in theory be driver related, but I don't think AMD would fix it even if it could. They are more interested in selling new cards than extending the life of old ones.

Reply #240
This version can work without atomics, which are not supported by HD4xxx hardware, and the rest of the problem might in theory be driver related, but I don't think AMD would fix it even if it could. They are more interested in selling new cards than extending the life of old ones.


ah, i see the point there

Reply #241
Seems to be somewhat faster than the previous version. On my system '--fast-gpu' switch is faster up to compression mode -7, -8 and higher are faster without it. Option '--fast-gpu' combined with any '--cpu-threads' setting slows things down regardless of compression mode. This time '--cpu-threads 4' was faster than the usual '--cpu-threads 2'. Also for some reason compression speed results varied from run to run a bit more than normally so I ran each test 3-6 times and picked the fastest results for the stats.
Highest compression modes start to be so close to each other in performance that the graph gets unclear there. CL finally beats old CUDA results in speed for me but filesize is a bit larger.
[attachment=6220:flaccl.png] [attachment=6224:flaccl2.png]

Edit: added benchmark results for Radeon HD 5870 with the new FLACCL v0.3 encoder.

Reply #242
CL finally beats old CUDA results in speed for me but filesize is a bit larger.


I wonder if it is possible to squeeze out some more kBs, or at least get the same compression as FlaCuda? At these speeds I'd trade some speed for better compression.

Reply #243
Ugh... it's almost scary. I've seen speeds above 200x during conversion when it was mid-files... decoding them from TAK itself wasn't much faster than 400x, it became a bottleneck when encoding FLAC

And it's a stock clocked radeon 5670 with passive cooling...

Reply #244
There is no official logo yet (and I don't have a Fermi-series GPU either), so these are simple ones:
FLACCL
FlaCuda

Reply #245
Just curious. Did anyone get the Cuda version somehow running on a Fermi GPU?

Reply #246
Just a tiny quirk in FLACCL#0.4, which came with CueTools 2.11: when using it with a groupsize of 256 I get "Error: size reported incorrectly" and it crashes.

Reply #247
It is not getting better with the bitrates.
My test sample: Opeth - For Absent Friends. The sample is very "basic", just two guitars.

FLAC 1.2.1 -8    472 kbps
libFlake#0.1 -11    457 kbps
FlaCuda#.91 -11    460 kbps
FLACCL#0.3 -11    461 kbps
FLACCL#0.4 -11    462 kbps
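
To put these bitrates in proportion, the savings relative to the FLAC 1.2.1 -8 result can be computed as follows. This is a small sketch using only the numbers listed above, for this one sample:

```python
# Bitrates in kbps, as listed in the post (single test sample).
results_kbps = {
    "FLAC 1.2.1 -8": 472,
    "libFlake#0.1 -11": 457,
    "FlaCuda#.91 -11": 460,
    "FLACCL#0.3 -11": 461,
    "FLACCL#0.4 -11": 462,
}

baseline = results_kbps["FLAC 1.2.1 -8"]
# Percentage saved relative to the baseline encoder.
savings_pct = {name: 100 * (baseline - kbps) / baseline
               for name, kbps in results_kbps.items()}

for name, pct in savings_pct.items():
    print(f"{name}: {results_kbps[name]} kbps, {pct:.2f}% below baseline")
```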

Reply #248
It is not getting better with the bitrates.
My test sample: Opeth - For Absent Friends. The sample is very "basic", just two guitars.

FLAC 1.2.1 -8    472 kbps
libFlake#0.1 -11    457 kbps
FlaCuda#.91 -11    460 kbps
FLACCL#0.3 -11    461 kbps
FLACCL#0.4 -11    462 kbps


What do you want to show with that? Encode a few hundred files and report again.

Reply #249
Here i did a quick and dirty comparison, 414 files of mixed music, -8

FlaCuda#0.91
9.25GB (9 934 589 919 bytes), 739kbps
FlacCL#0.3 Groupsize 256
9.25GB (9 936 741 894 bytes), 739kbps
FlacCL#0.4
9.25GB (9 936 310 351 bytes), 739kbps
Flake#0.1
9.25GB (9 941 925 617 bytes), 739kbps

So a ~2MB difference on 9GB of music is not really a degradation for the OpenCL port.
Seeing the GPU encoder speeds against Flake#0.1 is still a funny eye opener.

Edit: I never liked the idea that Flake and similar encoders can produce non-standard files, so -11 is a setting I'll never touch. Better to use a different codec then.
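
For reference, the exact differences behind that "~2MB" figure can be computed from the byte counts listed above. A small sketch, using the smallest result as the reference point:

```python
# Total output sizes in bytes for the 414-file test at -8, as listed in the post.
sizes = {
    "FlaCuda#0.91": 9_934_589_919,
    "FlacCL#0.3 Groupsize 256": 9_936_741_894,
    "FlacCL#0.4": 9_936_310_351,
    "Flake#0.1": 9_941_925_617,
}

smallest = min(sizes.values())
# Extra bytes each encoder produced compared with the smallest result.
extra_bytes = {name: size - smallest for name, size in sizes.items()}

for name, extra in extra_bytes.items():
    print(f"{name}: +{extra:,} bytes ({100 * extra / smallest:.3f}%)")
```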