FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda)

Reply #150
Quote
according to nVidia's driver page, you are not using the latest drivers .. try these:

http://www.nvidia.com/object/win7_winvista...96.21_whql.html

Not sure they will work. GeForce 9300 isn't listed under supported products. I've struggled before to get newer releases to work on this. Note that while it's called GeForce 9300, it's a mainboard chipset based on the nForce 730i chip, not to be confused with the GeForce 9300 GS, which is an entirely different chip.

Quote
as for your graphics card, it should be able to handle CUDA, if it has at least 256MB of local memory

I have not yet verified with 100% accuracy that I have activated 256MB on it, but according to Windows, it seems that I have. I'll be back on this one.

Quote
but to be honest, I wouldn't expect any miracles, since it seems to be equipped with a mere 16 cores

Well, it IS sold as CUDA-capable, so it should work no matter how well it performs, and I'll be glad if I can just get FlaCuda up and running. It compresses better than native FLAC, so if it can compress my lossless music even further, I'm happy.

Reply #151
Having a mere 16 cores is not that bad. My 8600GT has only 32, and in FLAC encoding (I installed new drivers a few weeks ago and tried the current FlaCuda) it's faster than 2 stock flac encoders running in parallel on my 3.33GHz Conroe Core 2 Duo. So if these 16 cores have the same clock rate (which I'm not sure about at all...), it can still be faster than a single-threaded software encoder on virtually any non-overclocked CPU.
It's almost scary how well these low-end GPUs stand up against much higher-class CPUs of the same age.

Reply #152
Problem found: It's due to Microsoft's RDP-lameness. When using remote desktop, the graphics adapter is disabled and replaced by the one used for RDP.

So thanks MS, I can't use CUDA programs using RDP!

Reply #153
Quote
according to nVidia's driver page, you are not using the latest drivers .. try these:
http://www.nvidia.com/object/win7_winvista...96.21_whql.html
Not sure they will work. GeForce 9300 isn't listed under supported products. I've struggled before to get newer releases to work on this. Note that while it's called GeForce 9300, it's a mainboard chipset based on the nForce 730i chip, not to be confused with the GeForce 9300 GS, which is an entirely different chip.


look again ...
Quote
GeForce 9 series:
9500 GS, 9600 GT, 9200, 9800 GX2, 9500 GT, 9600 GS, 9300, 9800 GT, 9400 GT, 9300 GS, 9400, 9600 GSO, 9300 GE, 9800 GTX/GTX+



Quote
as for your graphics card, it should be able to handle CUDA, if it has at least 256MB of local memory
I have not yet verified with 100% accuracy that I have activated 256MB on it, but according to Windows, it seems that I have. I'll be back on this one


you could try and run GPU-z for getting those details, as well as information about which APIs are supported by your card

http://www.techpowerup.com/gpuz/


Quote
but to be honest, I wouldn't expect any miracles, since it seems to be equipped with a mere 16 cores
Well, it IS sold as CUDA-capable, so it should work no matter how well it performs, and I'll be glad if I can just get FlaCuda up and running. It compresses better than native FLAC, so if it can compress my lossless music even further, I'm happy.


fair enough ...


Quote
Problem found: It's due to Microsoft's RDP-lameness. When using remote desktop, the graphics adapter is disabled and replaced by the one used for RDP.
So thanks MS, I can't use CUDA programs using RDP!


now that's a major bummer ... how about using e.g. TightVNC for your remote activities?

http://www.tightvnc.com/

Cheers,
Maggi

Reply #154
I'm getting "Error    : Exception of type 'GASS.CUDA.CUDAException' was thrown."

Code:
CUETools.FlaCuda.exe -11 Priceless.wav
FlaCuda#.91, Copyright © 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to
the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename  : Priceless.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:04:07.6270000
Error    : Exception of type 'GASS.CUDA.CUDAException' was thrown.

I ran the tool from deviceQuery.rar and got this:
Code:
CUDA Device Query (Driver API) statically linked version
There is 1 device supporting CUDA

Device 0: "GeForce GTX 480"
  CUDA Driver Version:                          3.0
  CUDA Capability Major revision number:        2
  CUDA Capability Minor revision number:        0
  Total amount of global memory:                1576468480 bytes
  Number of multiprocessors:                    15
  Number of cores:                              120
  Total amount of constant memory:              65536 bytes
  Total amount of shared memory per block:      49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                    32
  Maximum number of threads per block:          1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:    65535 x 65535 x 1
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                            512 bytes
  Clock rate:                                    0.81 GHz
  Concurrent copy and execution:                Yes
  Run time limit on kernels:                    No
  Integrated:                                    No
  Support host page-locked memory mapping:      Yes
  Compute mode:                                  Default (multiple host threads can use this device simultaneously)

Test PASSED

OS: Windows 7 x64
GPU: GeForce GTX 480
Graphics Driver:  197.55 (8.17.11.9755)



Reply #155
Wow. Congrats on getting a GTX 480. Sorry, Fermi cards are not supported yet.
I think I'll have to wait for the release of the GTX 460, because the GTX 480/470 are a bit over my budget.

Reply #156
Hi. I ran some tests on a Bel Canto CD, comparing FLAC 1.2.1 with your newest version, 0.91.
HW: Intel Core 2 Quad Q9550, 8GB RAM, Nvidia GTX 260 (driver 197.45), Win7 x64, Intel X-25 SSD

BelCanto.wav File Info : 44100kHz; 2 channel; 16 bit; 00:47:44.2000000

Results:

FLAC 1.2.1
Mode -3 : Belcanto.wav: wrote 293901039 bytes, ratio=0,582, 15,2 sec
Mode -6 : Belcanto.wav: wrote 284872007 bytes, ratio=0,564, 20,4 sec
Mode -8 : Belcanto.wav: wrote 283904326 bytes, ratio=0,562, 72,4 sec

FlaCuda 0.91

Mode -3 : 495,34x; 284585708 bytes in 00:00:05.7823308 seconds;
Mode -6 : 504,41x; 283252159 bytes in 00:00:05.6783248 seconds;
Mode -8 : 418,60x; 283217473 bytes in 00:00:06.8423914 seconds;

CPU Options:
c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 2 ..\BelCanto.wav
Results  : 433,81x; 283217473 bytes in 00:00:06.6023776 seconds;

c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 3 ..\BelCanto.wav
Results  : 392,28x; 283217473 bytes in 00:00:07.3014177 seconds;

c:\CDRIPS\cuda>CUETools.FlaCuda.exe -8 --cpu-threads 4 ..\BelCanto.wav
Results  : 406,71x; 283217473 bytes in 00:00:07.0424028 seconds;
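
The "x" figures in the results above are just the track length divided by the encode time. A minimal sketch of that arithmetic, assuming the hh:mm:ss.fffffff timestamps shown in the output; the helper names parse_hms and realtime_factor are made up for illustration and are not part of FlaCuda:

```python
def parse_hms(text):
    """Convert an 'hh:mm:ss.fffffff' timestamp into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def realtime_factor(audio_duration, encode_time):
    """How many times faster than playback speed the encode ran."""
    return parse_hms(audio_duration) / parse_hms(encode_time)


if __name__ == "__main__":
    # Track length and the -3 encode time as posted above.
    factor = realtime_factor("00:47:44.2000000", "00:00:05.7823308")
    print(f"{factor:.2f}x")
```

With the posted numbers this should land on the reported 495,34x to within rounding.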


Every other time I get:

Error    : Exception of type 'GASS.CUDA.CUDAException' was thrown.
Unhandled Exception: ErrorLaunchTimeout

Description:
  Stopped working

Problem signature:
  Problem Event Name:   CLR20r3
  Problem Signature 01:   cuetools.flacuda.exe
  Problem Signature 02:   1.0.0.0
  Problem Signature 03:   4b49fea7
  Problem Signature 04:   CUDA.NET
  Problem Signature 05:   2.3.7.0
  Problem Signature 06:   4ae56b31
  Problem Signature 07:   345
  Problem Signature 08:   22
  Problem Signature 09:   GASS.CUDA.CUDAException
  OS Version:   6.1.7600.2.0.0.256.1
  Locale ID:   1044

Besides the crash, I must say: IMPRESSIVE

Reply #157
Wow, FLAC -8 works at ~50x on my laptop, FlaCuda -11 does ~150x, very impressive.

Is FlaCuda with the "--verify" switch considered to be safe for archive use? I understand that software can never be guaranteed to be error free and I don't ask for it, I just wonder if you consider your code (with the verify option) robust enough to be an alternative to the official FLAC.

As far as I understand, you use the parallel processors on the GPU to find the best "next step" (I don't know what it's called in FLAC terminology) and then execute it on the CPU. Is this approach limited to FLAC, or can similar computations for other audio/video formats use it?

Reply #158
Quote
Unhandled Exception: ErrorLaunchTimeout

Exactly how often does it happen? Is there any pattern to this?
Does it look like the screenshot in this article: http://www.microsoft.com/whdc/device/displ...dm_timeout.mspx ?
Anybody else having those problems?

Quote
Is FlaCuda with the "--verify" switch considered to be safe for archive use?

Yes. --verify guarantees that the produced file can be decoded, at least with the CUETools.Flake decoder, and that its audio content is identical to the source.
In theory, it cannot give a 100% guarantee that the produced file can be decoded with the reference FLAC decoder, because --verify uses a different decoder, but so far nobody has reported any such problems.
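
For anyone who wants an extra cross-check against the reference decoder, one option is to decode the encoded file back to WAV with the reference tool (flac -d) and compare the audio with the source. A minimal sketch of the comparison step, assuming both files are plain PCM WAV; the function name pcm_identical and the file names in the usage note are hypothetical, and this is not what --verify does internally:

```python
import wave


def pcm_identical(path_a, path_b, chunk_frames=65536):
    """Return True if two PCM WAV files hold exactly the same audio."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        # Compare channel count, sample width and sample rate first.
        if a.getparams()[:3] != b.getparams()[:3]:
            return False
        if a.getnframes() != b.getnframes():
            return False
        # Then compare the sample data chunk by chunk.
        while True:
            frames_a = a.readframes(chunk_frames)
            frames_b = b.readframes(chunk_frames)
            if frames_a != frames_b:
                return False
            if not frames_a:
                return True
```

Usage would be along the lines of pcm_identical("source.wav", "decoded.wav"), where decoded.wav is whatever the reference decoder wrote.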

Quote
As far as I understand, you use the parallel processors on the GPU to find the best "next step" (I don't know what it's called in FLAC terminology) and then execute it on the CPU.

More or less. Recent versions can do almost everything on the GPU, and the latest version does this by default (it can be disabled with the --slow-gpu option). The CPU only does some sanity checks, formats the resulting data as a FLAC bitstream and writes it to a file.

Quote
Is this approach limited to FLAC or can similar computations of other audio/video formats use it?

Effective parallel processing is possible only if the format is suitable for it. For example, ALAC uses adaptive compression, which makes it very inconvenient for parallel processing. Maybe FLAC isn't the only codec that can benefit from GPU encoding, but for most codecs the task will be much harder and the speed won't be that impressive. Most of the GPU code in FlaCuda is very specific to FLAC.
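
To illustrate why FLAC lends itself to this: each block can be analysed without looking at its neighbours, so the per-block work can be handed out to many workers at once. The toy sketch below picks the best FLAC-style fixed predictor order (0 to 4) for each block independently. It is only an illustration, not FlaCuda's GPU code: the function names are made up, the cost measure (sum of residual magnitudes, ignoring warm-up samples) is a simplification, and the thread pool merely stands in for the GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def fixed_residual(block, order):
    """Residual of a fixed polynomial predictor: the order-th finite difference."""
    res = list(block)
    for _ in range(order):
        res = [b - a for a, b in zip(res, res[1:])]
    return res


def best_fixed_order(block, max_order=4):
    """Pick the order whose residual has the smallest sum of magnitudes."""
    costs = [sum(abs(r) for r in fixed_residual(block, k))
             for k in range(max_order + 1)]
    return costs.index(min(costs))  # ties go to the lowest order


def analyse(samples, block_size=4096, workers=4):
    """Split the signal into blocks and analyse every block independently."""
    blocks = [samples[i:i + block_size]
              for i in range(0, len(samples), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(best_fixed_order, blocks))
```

No block needs results from any other block, which is the property a format needs for this kind of work splitting.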

As for video, there are several GPU encoders for the H.264 video codec (the one x264 encodes); most if not all of them are proprietary.

Reply #159
I'm getting a crash when I try to convert individual WAV files using FlaCuda 0.91. The error codes are nearly the same as those mentioned in an earlier post. Interestingly, FlaCuda does not crash when converting a WavPack image file with embedded cue to a FLAC image with embedded cue.

Windows error report below.

Problem signature:
  Problem Event Name:   CLR20r3
  Problem Signature 01:   cuetools.flacuda.exe
  Problem Signature 02:   1.0.0.0
  Problem Signature 03:   4b49fea7
  Problem Signature 04:   mscorlib
  Problem Signature 05:   2.0.0.0
  Problem Signature 06:   4a27471d
  Problem Signature 07:   349e
  Problem Signature 08:   1c5
  Problem Signature 09:   System.IO.IOException
  OS Version:   6.1.7600.2.0.0.256.48
  Locale ID:   1033


This was using foobar2000. I also grabbed its error message:


Conversion failed: The encoder has terminated prematurely with code -532459699 (0xE0434F4D); please re-check parameters


Commandline parameters are set to: -8 - -o %d

Reply #160
Hell, FlaCuda 0.91 is ultra fast. I'm using an Nvidia 8800 GT with foobar2000 v1.0.3 and the "-8 - -o %d --verify" parameters. The only problem is that my HDD limits the encoding speed. A 46-minute WAV file took less than 10 seconds. Detailed results coming up soon.
Thanks for this encoder. I hope FlaCuda's output will be compatible with all other software players, devices and decoders.

Reply #161
nVidia 8800 GT with Intel Q6600 Quad core.

Foobar 1.03 transcoding FLAC to FLAC (Pink Floyd Final Cut [13 tracks])

Code:
*** FLAC 1.2.1 @ 4 threads ***

level 8
Total encoding time: 0:22.277, 124.90x realtime

*** FlaCuda 0.91 ***

-8 - -o %d --verify
Total encoding time: 1:01.184, 45.47x realtime

-8 --cpu-threads 2 - -o %d --verify
Total encoding time: 0:50.529, 55.06x realtime

-8 --cpu-threads 3 - -o %d --verify
Total encoding time: 0:42.807, 65.00x realtime

-8 --cpu-threads 3 - -o %d
Total encoding time: 0:42.011, 66.23x realtime

-8 --cpu-threads 4 - -o %d --verify
Total encoding time: 0:42.027, 66.20x realtime

-8 --cpu-threads 4 - -o %d
Total encoding time: 0:41.356, 67.28x realtime

-8 --slow-gpu --cpu-threads 4 - -o %d --verify
Total encoding time: 0:37.939, 73.34x realtime


CPU usage with FlaCuda never peaks above 25% per core. It seems that in a practical scenario it can't compete with a quad-core CPU.

Reply #162
I'm amused by FlaCuda's speed... I can't think of much use for 800x realtime FLAC encoding, but I thought I'd throw out something that I'm too lazy to implement and that FlaCuda's speed would make almost reasonable:

_Optimal_ block size selection. FLAC lets you change the frame size on the fly. Truly optimal selection across all supported sizes would be a bit insane, but globally optimal selection on a subset of sizes is not too terrible.

Let's consider all powers of two from 64 to 32768; there are ten sizes. At every 64-sample offset through the file, encode all ten sizes and store the resulting encoded sizes. Making the hand-wavy assumption that the computation per sample is constant, this will be 1023x slower than normal (each 64-sample step encodes 64+128+...+32768 = 64*1023 samples).

Take the sizes and construct a directed graph with a vertex at every 64th sample and ten edges leaving each vertex, connecting it to the vertices 64, 128, 256, etc. samples away. Assign the coding cost of the block of each size to the corresponding edge. Now run Dijkstra's shortest path algorithm from the first to the last vertex, or from the last to the first. The result will be the globally optimal frame size selection given the available block sizes.

Either re-encode or, if you spent a lot of RAM saving the results of the first pass, reassemble the final stream.

Limiting yourself to the powers of two in the FLAC subset over the range 64-4096 would mean processing 127x the number of samples; 32-4096 would be 255x. The CUDA implementation might be able to maintain almost decent speeds while doing this extra work. ;) This isn't limited to power-of-two sizes, but you probably want to arrange it so that your smallest size is a common factor of all the sizes you use.
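
The graph search described above can be sketched in a few lines. This is a toy under stated assumptions, not an encoder: cost(start, size) is a hypothetical callback returning the encoded size of one block, every block size must be a multiple of the smallest one, and the file length must be a multiple of the smallest size. Because every edge points forward in time, a single forward pass over the vertices is used here in place of Dijkstra; on such a graph it finds the same shortest path.

```python
def work_multiplier(sizes):
    """Samples encoded per input sample when every size is tried at every step."""
    return sum(sizes) // min(sizes)


def optimal_block_sizes(total_samples, sizes, cost):
    """Return (total_cost, [block sizes]) minimising the summed cost.

    cost(start, size) is a caller-supplied function (hypothetical here)
    giving the encoded size of the block starting at `start`.
    """
    step = min(sizes)
    n = total_samples // step           # vertex v sits at sample v * step
    inf = float("inf")
    best = [inf] * (n + 1)              # cheapest cost to reach each vertex
    chosen = [None] * (n + 1)           # block size of the edge used to get there
    best[0] = 0
    for v in range(n):
        if best[v] == inf:
            continue
        for size in sizes:
            w = v + size // step
            if w > n:
                continue                # block would run past the end
            candidate = best[v] + cost(v * step, size)
            if candidate < best[w]:
                best[w] = candidate
                chosen[w] = size
    # Walk back from the last vertex to recover the chosen block sizes.
    path, v = [], n
    while v > 0:
        path.append(chosen[v])
        v -= chosen[v] // step
    path.reverse()
    return best[n], path
```

work_multiplier reproduces the 1023x, 127x and 255x figures from the post for the corresponding size sets.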

Reply #163
Quote
Is there anybody here who knows the math behind the Cholesky decomposition used in ffmpeg as an alternative method of searching for LPC coefficients?
This method is too slow for the CPU, but I thought I'd give it a shot on the GPU.
The problem is, the GPU doesn't do double precision very well.



Gregory,

maybe you can find background info here : http://www.cise.ufl.edu/research/sparse/ch...OLMOD/Cholesky/

Reply #164
Man, that's some fast encoding!  Nice work!

Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?


Reply #165
Quote
Man, that's some fast encoding!  Nice work!
Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?


You already can, with metaflac. I gave an example here: flacuda.exe & metaflac.exe in EAC

Reply #166
Has anyone tested running several instances of this in parallel? Being able to run single-threaded encodes of several files at once is pretty amazing, and I wondered whether one instance uses up the whole GPU or whether several could run in parallel. I don't run into a hard drive bottleneck as easily as most, as I use a RAID 0 configuration of high-end desktop hard drives.

Reply #167
My 8600GT was fully used by one instance of the CUDA encoder; more threads gave no advantage. On top of that, it can become I/O limited very quickly (I tried it with the source on a different HDD than the target).

Reply #168
Quote
Man, that's some fast encoding!  Nice work!
Is there any chance that tag writing will be added to the binary, so that it can be used with EAC?
You already can, with metaflac. I gave an example here: flacuda.exe & metaflac.exe in EAC


Yep, you did.  Clever! Thanks.

Reply #169
Is there any further development on FlaCuda? Is the current version still 0.91?

Another test:

FlaCuda 0.91 + foobar2000 1.0.3
Music: Edenbridge - Solitaire / symphonic metal album in a single WAV file, length 57:25
Hardware: Intel Dual Core E8400 / Nvidia 8800 GT (with newer 92b core)

FLAC 1.2.1 level 8 (2 threads)
Total encoding time: 1:13.711, 46.74x realtime

FlaCuda -8 - -o %d --verify
Total encoding time: 0:31.216, 110.37x realtime

FlaCuda -8 --cpu-threads 2 - -o %d --verify
Total encoding time: 0:24.492, 140.67x realtime

FlaCuda is a good choice for a dual-core system. As Bad Monkey wrote above, a quad core may be faster than the GPU.

Reply #170
I have an 8800GT too but I was unable to get much above 70x, per my post above, with FlaCuda. Your results would beat my Q6600's benchmark of 125x. Am I missing something?

Reply #171
Quote
I have an 8800GT too but I was unable to get much above 70x, per my post above, with FlaCuda. Your results would beat my Q6600's benchmark of 125x. Am I missing something?


Hm. As I wrote, I have the newer version of the 8800 GT that came out in February 2008, with 512MB RAM instead of 378MB. This GPU core (G92) was the fastest out there, followed a year later by the G200b core, which is nearly the same.
Maybe you have an older model or older drivers? I used the same settings as you. Maybe your hard drive is too slow?

My GPU card spec:
512 MByte GDDR3
65 nm
Stream processors: 112
Memory bus width: 256-bit
Core clock:    600 MHz
Shader clock: 1500 MHz
Memory clock:   900 MHz

 

Reply #172
Yeah I have 512 MB but core clock is only 450 MHz / VRAM 700 MHz. Okay.

Am going to upgrade to a GTX 460 sometime soon. So that'll be interesting. Haha.

Reply #173
There must be some other limit, because this 70x matches my results with an 8600GT and an early version of FlaCuda. Any 8800GT should be much faster than that.
HDD speed, perhaps? Those are mechanical and thus seriously limited when they have to serve several read/write streams at once (they have to move their heads back and forth). Whenever I tested any encoder I used a different HDD for the destination and never used more than 2 threads (more wouldn't even benefit my Core 2 Duo to begin with).
I'm planning on getting an SSD in a few months (for the system and some temp area), so I'll test 2-thread encoding again.

Edit: I forgot that I'm also planning on replacing my video card with a Radeon. Well, so much for CUDA...

Reply #174
If there is another limit, clearly it would have to be something not shared with the CPU encoder (which turns in faster results at 125x), and that is obviously not the case for an HDD restriction. In any case, the FLAC result above is only 260 MB.