
FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda)

Reply #25
Wow, nice work Gregory!

Just wondering... how did you get around the limitation mentioned by Garf earlier on this thread about GPUs only doing floating point and therefore not being suitable for lossless encoding?

Reply #26
Current GPUs do integer computations quite alright.
CUETools 2.1.6

Reply #27
Hmm, does the encoder do pipe encoding (i.e. for proper foobar2000 use)?

Reply #28
Some questions regarding FlaCuda.
Back when Flake was new, I had problems playing back files encoded at higher-than-standard FLAC compression on my Slimdevice.
Does FlaCuda use the same options as FLAC at the corresponding compression level? At least it looks like I can play back FlaCuda -8 files on my Slimdevice. How come it compresses better, then?
Shouldn't it be named "FlakeCuda" in the end?
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Reply #29
How hard would it be to convert this CUDA version into a more versatile OpenCL implementation? OpenCL is said to be largely based on CUDA but not vendor-specific, which suggests it should be easy to adapt.

That way it wouldn't be limited to NVIDIA GPUs. In fact, it would even remove the limit of just using a GPU, as OpenCL can combine all available GPUs and processor cores in the system as if they were one unit.
Every night with my star friends / We eat caviar and drink champagne
Sniffing in the VIP area / We talk about Frank Sinatra
Do you know Frank Sinatra? / He's dead

Reply #30
> Hmm, does the encoder do pipe encoding (i.e. for proper foobar2000 use)?

It does now (version 02), but I would suggest being careful with your precious files while this is still an alpha version.

> Some questions regarding FlaCuda.
> Back when Flake was new, I had problems playing back files encoded at higher-than-standard FLAC compression on my Slimdevice.
> Does FlaCuda use the same options as FLAC at the corresponding compression level? At least it looks like I can play back FlaCuda -8 files on my Slimdevice. How come it compresses better, then?
> Shouldn't it be named "FlakeCuda" in the end?

It doesn't use the same options at the corresponding compression levels, but it does stick to the so-called FLAC subset (supported by hardware devices) for compression levels 0-8. Compression levels 9-11 are non-subset and might not play on some devices. Flake follows the same convention.
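For readers wondering what "subset" means in practice, here is a minimal Python sketch of three of the streamable-subset limits from the FLAC format specification: block size, LPC order and Rice partition order. It is a partial check only. The real subset definition has further rules (for example on sample rates and bit depths that must be encodable in the frame header), and the function name `is_subset` is a hypothetical helper, not part of FlaCuda or any FLAC tool.

```python
def is_subset(sample_rate: int, block_size: int, lpc_order: int,
              rice_partition_order: int) -> bool:
    """Partial check of FLAC streamable-subset limits (illustrative only)."""
    low_rate = sample_rate <= 48000
    # Block size: at most 16384, tightened to 4608 for rates up to 48 kHz.
    max_block = 4608 if low_rate else 16384
    # LPC order: at most 32, tightened to 12 for rates up to 48 kHz.
    max_order = 12 if low_rate else 32
    # Rice partition order: at most 8 in the subset.
    return (block_size <= max_block
            and lpc_order <= max_order
            and rice_partition_order <= 8)


# CD audio with 4096-sample blocks and LPC order 12 stays within the subset...
print(is_subset(44100, 4096, 12, 8))
# ...but raising the LPC order to 32 does not.
print(is_subset(44100, 4096, 32, 8))
```

Higher maximum LPC orders are one plausible way an encoder leaves the subset at its top levels, though the thread does not say which limits levels 9-11 actually exceed.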

Better compression is achieved mainly by a brute-force search for the optimal compression parameters (stereo modes, LPC orders, and window functions). FLAC does this only at level 8, and even then it tries only one window function, and not the best one.
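The brute-force idea can be illustrated with a toy Python sketch. To keep it short, it swaps FlaCuda's LPC orders and window functions for FLAC's much simpler fixed polynomial predictors (orders 0-4), and it uses the sum of absolute residuals as a stand-in for the real Rice-coded size. All function names are hypothetical and this is not FlaCuda's code; it only shows the structure of trying every stereo mode and every predictor order, then keeping the cheapest combination.

```python
from itertools import product

# FLAC's fixed polynomial predictors, orders 0-4 (coefficients from the FLAC format).
FIXED_COEFFS = {0: [], 1: [1], 2: [2, -1], 3: [3, -3, 1], 4: [4, -6, 4, -1]}


def cost(x, order):
    """Size proxy: sum of |warm-up samples| plus sum of |residuals|."""
    coeffs = FIXED_COEFFS[order]
    total = sum(abs(v) for v in x[:order])  # warm-up samples are stored verbatim
    for n in range(order, len(x)):
        prediction = sum(c * x[n - 1 - i] for i, c in enumerate(coeffs))
        total += abs(x[n] - prediction)
    return total


def stereo_pair(left, right, mode):
    """Return the two channels FLAC would code for a given stereo mode."""
    side = [l - r for l, r in zip(left, right)]
    if mode == "independent":
        return left, right
    if mode == "left_side":
        return left, side
    if mode == "right_side":
        return side, right
    mid = [(l + r) >> 1 for l, r in zip(left, right)]
    return mid, side  # "mid_side"


def brute_force(left, right):
    """Try every stereo mode and every predictor order per channel; keep the cheapest."""
    best = None
    for mode in ("independent", "left_side", "right_side", "mid_side"):
        a, b = stereo_pair(left, right, mode)
        for order_a, order_b in product(FIXED_COEFFS, repeat=2):
            total = cost(a, order_a) + cost(b, order_b)
            if best is None or total < best[0]:
                best = (total, mode, order_a, order_b)
    return best


left = [3 * i for i in range(32)]          # a linear ramp
right = [3 * i + 1000 for i in range(32)]  # the same ramp with a DC offset
print(brute_force(left, right))  # (1003, 'left_side', 2, 1)
```

Every candidate is evaluated independently of the others, which presumably makes this kind of exhaustive search a good fit for a GPU, whereas a CPU encoder like Flake gains more from guessing good parameters up front.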

As grateful as I am to Justin for his wonderful Flake encoder, FlaCuda, unlike my C# Flake port, is not a derivative work. Flake's algorithm was written for the CPU, not the GPU, and those are two very different realms. Flake does a great job of smartly guessing the best compression parameters, while FlaCuda just does a brute-force search on the GPU. FlaCuda does, however, contain a C# Flake library, which it uses for FLAC decompression if the source file is FLAC or if --verify mode is enabled.

> How hard would it be to convert this CUDA version into a more versatile OpenCL implementation?
>
> That way it wouldn't be limited to NVIDIA GPUs. In fact, it would even remove the limit of just using a GPU, as OpenCL can combine all available GPUs and processor cores in the system as if they were one unit.

I'm not yet experienced enough in this matter, but I assume this versatility will come at the price of speed. I will try to verify this later.

Reply #31
Thanks for explaining it. Really nice work you have done, thanks for that. Now I know what I can use CUDA for; it should really be mentioned on NVIDIA's CUDA pages.

Reply #32
>> How hard would it be to convert this CUDA version into a more versatile OpenCL implementation?
>>
>> That way it wouldn't be limited to NVIDIA GPUs. In fact, it would even remove the limit of just using a GPU, as OpenCL can combine all available GPUs and processor cores in the system as if they were one unit.
>
> I'm not yet experienced enough in this matter, but I assume this versatility will come at the price of speed. I will try to verify this later.

That's possible, although a performance hit might be offset by the fact that OpenCL combines the CPU and all available GPUs. The biggest difference I seem to find after some research is that CUDA implements a couple of things that OpenCL doesn't have yet. However, if you don't use any of these additional features in your specific implementation, that wouldn't matter.

I must say that I am only speculating; I don't know much about this matter either. I was just wondering...

Reply #33
>> Hmm, does the encoder do pipe encoding (i.e. for proper foobar2000 use)?
>
> It does now (version 02), but I would suggest being careful with your precious files while this is still an alpha version.

Wow, thanks! Pipe encoding seems to be working, with no differences in the decoded data of the resulting file. Speed for -8 is ~70x on my 8800GT vs ~40x on a single core of my Q6600. However, the resulting file appears to lack any length and bitrate information, so seeking is impossible.

Also, foobar2000 obviously isn't ready to handle GPU encoding properly. With the converter set to run three simultaneous encoding processes on my quad core, FlaCuda actually slows down to around 35x overall, whereas standard FLAC naturally scales well to ~110x.

So yeah, not quite ready to replace FLAC, then. It looks promising for inherently single-threaded tasks like rip+encode, though (once/if it gets tagging arguments implemented, that is).

Reply #34
My results on an old Core 2 Duo E6300 and a small Nvidia GeForce 9400 GT. I took two discs: a solo piano disc that compresses very well (<400 kbps) and a baroque orchestral work that doesn't (750 kbps).


PIANO MUSIC

Code: [Select]
WAV        594.191 KB
FLAC -5    163.122 KB    49313 milliseconds    x69.94
FLAC -8    159.276 KB   116641 milliseconds    x29.57
CUDA -0    158.750 KB    60188 milliseconds    x57.30
CUDA -4    158.024 KB    88531 milliseconds    x38.96
CUDA -8    156.881 KB   176656 milliseconds    x19.52
CUDA -11   156.799 KB   527922 milliseconds     x6.53



VIVALDI

Code: [Select]
WAV        754.037 KB
FLAC -5    393.834 KB    68047 milliseconds    x64.32
FLAC -8    393.279 KB   160109 milliseconds    x27.33
CUDA -0    394.796 KB    78688 milliseconds    x55.62
CUDA -4    394.034 KB   111469 milliseconds    x39.26
CUDA -8    393.191 KB   223328 milliseconds    x19.59
CUDA -11   392.079 KB   675656 milliseconds     x6.47
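The x-factors in these tables are encoding speed relative to realtime. Assuming 16-bit stereo CD audio at 44.1 kHz (the post doesn't say so explicitly), and reading the KB column as 1024-byte units with '.' as a thousands separator, they can be reproduced in a few lines of Python. `realtime_factor` is a hypothetical helper, not part of any tool.

```python
# Uncompressed CD audio: 44.1 kHz * 2 channels * 2 bytes per sample.
CD_BYTES_PER_SECOND = 44100 * 2 * 2  # 176400


def realtime_factor(wav_kib: float, encode_ms: float) -> float:
    """Seconds of audio encoded per second of wall-clock time."""
    audio_seconds = wav_kib * 1024 / CD_BYTES_PER_SECOND
    return audio_seconds / (encode_ms / 1000)


print(round(realtime_factor(594191, 49313), 2))  # piano, FLAC -5
print(round(realtime_factor(754037, 68047), 2))  # Vivaldi, FLAC -5
```

Both results land within about 0.01 of the x69.94 and x64.32 figures above; the small gap is probably the WAV header or rounding, which suggests the size unit reading is right.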


On this cheap GPU, FlaCuda 0.2 performs rather well. It isn't as fast as the CPU, but it comes close at -0 and sometimes compresses better than flac.exe -8! Note, however, that the CPU has two cores and only one was used for this benchmark.
If I'm not mistaken, a similar 9400 GPU is used in the ION platform. That means cheap, low-power nettops or netbooks with the ION chipset could be perfectly usable for batch FLAC encoding. To be confirmed...
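One way to make "faster and smaller" comparisons precise is a Pareto check: a setting is dominated if another one is at least as small and at least as fast, and strictly better on one of the two. A small Python sketch using the piano figures from the first table (function names are illustrative only):

```python
# Piano results from the table: (label, size in KB, realtime speed factor).
RESULTS = [
    ("FLAC -5", 163122, 69.94),
    ("FLAC -8", 159276, 29.57),
    ("CUDA -0", 158750, 57.30),
    ("CUDA -4", 158024, 38.96),
    ("CUDA -8", 156881, 19.52),
    ("CUDA -11", 156799, 6.53),
]


def dominated(entry, others):
    """True if another setting is at least as small and as fast, and strictly better in one."""
    _, size, speed = entry
    return any(
        s <= size and v >= speed and (s < size or v > speed)
        for _, s, v in others
    )


def pareto_front(results):
    """Labels of the settings that no other setting dominates."""
    return [r[0] for r in results if not dominated(r, results)]


print(pareto_front(RESULTS))
# ['FLAC -5', 'CUDA -0', 'CUDA -4', 'CUDA -8', 'CUDA -11']
```

On this disc only FLAC -8 drops out, dominated by both CUDA -0 and CUDA -4. With a single disc and a single GPU this is an illustration rather than a general conclusion.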

DECODING SPEED (QUICK TEST):

Code: [Select]
FLAC -8:   x409
CUDA -8:   x392
CUDA -11:  x285


As you can see, there's a drastic drop in decoding speed with FlaCuda -11 (tested with the latest foobar2000). On my Sansa Clip (2 GB), playback seems to be fine (I only tried one file, though).


More tests are needed, but it looks like a very interesting encoder that should work nicely on an ION chipset.

Reply #35
> Current GPUs do integer computations quite alright.

It used to be the case on the Nvidia side that you could only do 24-bit integer arithmetic at full speed, which might be enough for FLAC. I don't know about ATI. 32-bit (i.e. normal) arithmetic was only possible with a huge performance penalty.

Newer versions of CUDA or the cards might have changed this, or FLAC might be simple enough that it isn't an issue.
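Whether narrow integer arithmetic is enough for FLAC depends mostly on the LPC prediction step, which multiplies samples by quantized coefficients and sums the products. The sketch below gives a conservative worst-case bit width for one such sum. The coefficient precisions and orders are example values, not FlaCuda's or libFLAC's actual settings, and `lpc_sum_bits` is a hypothetical helper.

```python
import math


def lpc_sum_bits(sample_bits: int, coeff_bits: int, order: int) -> int:
    """Conservative worst-case signed bit width of one LPC prediction sum."""
    # A signed a-bit by b-bit product needs at most a + b bits, and adding
    # `order` such products can grow the result by ceil(log2(order)) bits.
    return sample_bits + coeff_bits + math.ceil(math.log2(order))


def fits_in(bits: int, register_bits: int) -> bool:
    return bits <= register_bits


# 16-bit audio, 12-bit coefficients, order 8: the sum fits a 32-bit register.
print(lpc_sum_bits(16, 12, 8), fits_in(lpc_sum_bits(16, 12, 8), 32))
# 24-bit audio, 15-bit coefficients, order 32: it needs wider accumulation.
print(lpc_sum_bits(24, 15, 32), fits_in(lpc_sum_bits(24, 15, 32), 32))
```

For 16-bit audio both multiplier inputs fit comfortably within 24 bits, which is consistent with the guess above; the accumulator is where wider arithmetic may become necessary, especially for high-resolution sources.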

PS. Are these posts comparing against multithreaded FLAC implementations on the host? (I don't know whether any exist.)

Reply #36
Just finished my tests:

image.wav:
flac 1.2.1 -8 : 52 sec
flac-cuda  -8 : 32 sec

image.wav split into 10 songs (1.wav, 2.wav, etc.):
flac 1.2.1 -8 : 52 sec
flac-cuda  -8 : 32 sec

flac 1.2.1-icl : 30 sec

flac 1.2.1-icl uses both cores of my processor.
Intel Core 2 Duo E8500; Nvidia 8800 GT

I found flac 1.2.1-icl some time ago somewhere on Hydrogenaudio.

Reply #37
> Just finished my tests:
>
> image.wav:
> flac 1.2.1 -8 : 52 sec
> flac-cuda  -8 : 32 sec
>
> image.wav split into 10 songs (1.wav, 2.wav, etc.):
> flac 1.2.1 -8 : 52 sec
> flac-cuda  -8 : 32 sec
>
> flac 1.2.1-icl : 30 sec
>
> flac 1.2.1-icl uses both cores of my processor.
> Intel Core 2 Duo E8500; Nvidia 8800 GT
>
> I found flac 1.2.1-icl some time ago somewhere on Hydrogenaudio.

AFAIK there isn't a good multi-core version, and I can't believe a different compile can speed things up by 75%. Please upload this version somewhere or link to its source.

Reply #38
> AFAIK there isn't a good multi-core version, and I can't believe a different compile can speed things up by 75%. Please upload this version somewhere or link to its source.

I think the different FLAC encoders I have came from RareWares.

Reply #39
>> AFAIK there isn't a good multi-core version, and I can't believe a different compile can speed things up by 75%. Please upload this version somewhere or link to its source.
>
> I think the different FLAC encoders I have came from RareWares.

It can't be that compile, and please don't waste my time with versions you link to because you "think" it may be the one.

Reply #40
I made a more thorough comparison with the new version. I combined tracks from 18 different genres into one WAV, hopefully giving a better representation of real-world performance. The chart compares each compression mode: the horizontal axis is compression ratio and the vertical axis is encoding speed relative to realtime. With this test set, the CUDA version was more efficient starting from compression mode 6, but at that point it was only faster than FLAC's modes 7 and 8.
[attachment=5395:flac_vs_flacuda.png]

Reply #41
It sure isn't that compile, as they (at least for me) run at the same speed for -8.
Error 404; signature server not available.

Reply #42
I've done a quick test of how a 2+ year old full-fledged mainstream CPU (to be more precise: one core of it) stands up against a pretty cheap, slightly better than low-end GPU of its own era, both overclocked. The Core 2 Duo (E6420, Conroe core) runs at 3328 MHz with DDR2-832 CL4; the 8600GT runs at 580/1296/837 MHz, which is all it can do with passive cooling (probably at a decreased core voltage).

CPU: 49.8x (3328/416MHz)
GPU: 69.4x (580/1296/837MHz)
GPU: 66.4x (540/1188/702MHz)

lv6:
GPU: 54.3x (580/1296/837MHz)

I've tested both -5 and -6 because, for my test material, the file size with FLAC 1.2.1 -8 fell right between FlaCuda -5 and -6.
Decoding speed (performed by fb2k):
1.1.2 -8: 615x
CUDA -5: 618x
CUDA -6: 572x
(FlaCuda -11 encoded much more slowly, ~12x, and also decoded more slowly, ~300x.)

Considering how insanely powerful (and extremely power-hungry) GPUs are these days, a GPU FLAC encoder seems like a good idea.

I found just one glitch: the decoded audio data seems identical, but the FlaCuda files are not seekable in my fb2k 0.9.6.9. The parameters were -6 - -o %d
(OK, I see I'm not alone with this problem.)

[P.S. I also made a comparison with TAK -p2m, which I use regularly: 77.7x encoding on one CPU core, 3.5% smaller (968 vs 1002 kbps), and it decodes at 384x, definitely slower than FLAC, except for the extreme FlaCuda files.]

Reply #43
Thank you for the detailed test results. Looking at them, I decided to focus on optimizing performance at lower compression levels. Version 03 should be noticeably faster at levels 0..7. I also fixed the problem with files being unseekable when using pipe encoding from fb2k.

Reply #44
> Thank you for the detailed test results. Looking at them, I decided to focus on optimizing performance at lower compression levels. Version 03 should be noticeably faster at levels 0..7. I also fixed the problem with files being unseekable when using pipe encoding from fb2k.

The good #1: the resulting FLAC is seekable!
The good #2: -6 is definitely faster, 60.4x vs 54.3x.
The bad: the files are slightly larger; now I need -7 to get a smaller result than FLAC 1.1.2 -8 (CPU -8: 37810k; CUDA -6: 37857k; CUDA -7: 37791k).
The ugly: FlaCuda -7 is slower than CPU FLAC -8. On my 'nose-heavy' system, that is.

Hm, I should probably try different tracks (my ad-hoc test sample is a ZUN theme from the Changeability of Strange Dream album; strictly speaking it's not a Touhou soundtrack, but it's similar to the game background music).
Is it the seek table that makes -6 files larger?

Update: with a Rammstein track, GPU -6 came out smaller than CPU -8. I need more samples to test.
(Sorry, I was a bit hasty to post about it. Human error.)

Reply #45
> Is it the seek table that makes -6 files larger?

Nope. Old-style -6 can be invoked with the parameters "-5 -l 12". That's a lower-case L there, not the digit 1.

Reply #46
Seems to me like other modes got a speed boost too:
[attachment=5397:flac_vs_flacuda.png]

Reply #47
Phew. I think I've finally squeezed everything I could out of it, at least for now.

Version 04 should be faster than anything.

Reply #48
Impressive.
[attachment=5399:flac_vs_flacuda.png]

Reply #49
Thank you.