
Topic: FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda) (Read 326525 times)

  • gib
FLACCL: CUDA-enabled FLAC encoder by Gregory S. Chudov (prev. FlaCuda)
With my recent purchase of a 9000 series nVidia graphics card, I started wondering: has anyone investigated whether nVidia's CUDA could be useful for lossless compression? I'm not even remotely close to being a programmer, so I haven't a clue how the code works, but it seems like CUDA is valuable for coding/decoding. I know nVidia is already holding a contest to speed up LAME (which ends in about 2 weeks), so perhaps it could be used to speed up lossless compressors? The fastest modes of several codecs are already blazing fast, approaching the limits of hard drives, but perhaps the high-compression modes could be sped up through CUDA. Maybe, if the speed-up is enough, developers could even implement more ways to gain compression while still maintaining good encoding rates. It would be pretty cool if compression levels like La's best could be done at 50x or something.

Anyway, my curiosity is large, so just thought I'd ask.  :)

  • Martel
Reply #1
I apologize for being completely incorrect.
  • Last Edit: 13 July, 2008, 05:53:01 AM by Martel
IE4 Rockbox Clip+ AAC@192; HD 668B/HD 518 Xonar DX FB2k FLAC;

  • Garf
  • Developer (Donating)
Reply #2
Quote:
If I'm not mistaken, lossless coding usually employs dictionary methods (like LZW/LZMA) which generate a lot of random access and branching operations.


Not at all!

Most lossless audio compressors use large predictive LPC filters. This would be an operation well suited to a GPU, if it weren't for a small detail: because of the need to be LOSSLESS, the operations are often integer, not floating point. It would be possible to do it in floating point too, but then the operations, rounding and precision need to be PRECISELY defined. Exactly what GPUs don't have.

Despite all the hype, there aren't that many things GPUs are actually good at.
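
The point above about exactly defined integer operations can be illustrated with a minimal Python sketch of a fixed-point linear predictor. This is not FLAC's actual code: the function names, the coefficient values and the shift are hypothetical and chosen only for illustration. The idea is that the encoder and the decoder compute the same integer prediction bit for bit, so the residual restores the samples exactly.

```python
def predict(history, qcoeffs, shift):
    """Integer prediction from past samples; qcoeffs[j] weights history[-1 - j]."""
    acc = sum(c * history[-1 - j] for j, c in enumerate(qcoeffs))
    return acc >> shift  # arithmetic shift: same result on every platform


def lpc_residual(samples, qcoeffs, shift):
    """Encoder side: keep warm-up samples verbatim, then store prediction errors."""
    order = len(qcoeffs)
    residual = list(samples[:order])
    for i in range(order, len(samples)):
        residual.append(samples[i] - predict(samples[:i], qcoeffs, shift))
    return residual


def lpc_restore(residual, qcoeffs, shift):
    """Decoder side: repeat the identical integer prediction and add the error."""
    order = len(qcoeffs)
    samples = list(residual[:order])
    for i in range(order, len(residual)):
        samples.append(residual[i] + predict(samples, qcoeffs, shift))
    return samples


# Hypothetical 2nd-order predictor: coefficients 7/4 and -3/4 in fixed point.
pcm = [0, 3, 7, 12, 15, 13, 8, -2, -11, -17]
err = lpc_residual(pcm, [7, -3], 2)
assert lpc_restore(err, [7, -3], 2) == pcm
```

If the prediction were computed in floating point with rounding that differs between encoder and decoder hardware, the final assertion could fail, which is the concern raised above.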

  • gib
Reply #3
Ah, I see now.  Thanks very much for the response, Garf.

Reply #4
Here is good news.

An alpha version of a FLAC encoder for GPU.

I only tested it on a GTS 250, so I'm eager to hear from people with other cards.

Like all my applications, this requires the .NET Framework.

And this time of course a CUDA-enabled graphics card.

Source code as usual on SourceForge.

UPD1: A bit more optimized version re-tuned to not so paranoid compression levels.
UPD2: added pipe encoding for use with fb2k (encoder parameters: -5 - -o %d)
UPD3: seeking problem with pipe encoding in fb2k fixed, lower compression levels speed up.
UPD4: general speed improvement
UPD5: wasted_bits/lossyWav support
UPD6: final optimizations
UPD7: rice partitioning on GPU (--gpu-only), multi-core CPU utilization support (--cpu-threads #)
UPD8: default compression level changed to -7, rice partitioning on GPU on by default, memory/IO optimizations
UPD9: bugfix release; UPD91 - fb2k pipe input fix

  • Last Edit: 10 January, 2010, 11:30:21 AM by Gregory S. Chudov
CUETools 2.1.4

Reply #5
Sounds awesome. Care to elaborate on the performance for those of us without a CUDA-capable card?

Reply #6
Less impressive than I hoped, but this is only an initial version, and GPUs grow faster each day.
On my GTS 250 it's approximately as fast as my C# encoder (which is fast, by the way).
FlaCuda -4 achieves the same compression ratio as reference flac -8 (version 1.2.1 on a Core 2 Duo @ 3 GHz) at approximately two to three times the speed.
FlaCuda -8 is as slow as flac -8, but gives an extra 0.5% of compression ratio.
It would be nice if someone could thoroughly compare them on different hardware and post the results here.
  • Last Edit: 10 September, 2009, 01:17:20 AM by Gregory S. Chudov
CUETools 2.1.4

  • Grunpfnul
Reply #7
No love for ati? *sniff*

Reply #8
There is love, but there's no implementation ^^
But i guess someone else can do it, now that we have a proof-of-concept
CUETools 2.1.4

  • Case
  • Developer (Donating)
Reply #9
I ran some tests with my Core i7 940 (stock speed) and GeForce GTX 285. Original wav file was 237368588 bytes in size. Not too impressive results:
FLAC -5 : Elapsed Time :  00:00:08.268 (181929373 bytes)
FLAC -8 : Elapsed Time :  00:00:30.560 (181788832 bytes)
FlaCuda -4 : Elapsed Time :  00:00:09.204 (181892106 bytes)
FlaCuda -5 : Elapsed Time :  00:00:10.904 (181763725 bytes)
FlaCuda -8 : Elapsed Time :  00:00:12.370 (181676614 bytes)
FlaCuda -11: Elapsed Time :  00:00:23.883 (181734405 bytes)

Reply #10
Thank you!
CUETools 2.1.4

  • Ron Jones
Reply #11
I'm anxious to see how this would perform on the next generation of NVIDIA hardware (GT300), which is supposedly significantly faster in general computational performance than the previous architecture (G200).

Very exciting -- thank you!

  • thundat00th
Reply #12
Quote:
No love for ati? *sniff*

  as much as i love ati i wish they had as much support for things as nvidia does (they need to get to work on that hardware havok physics)

maybe the "evergreen" release here in a bit will improve things (i hope)

as far as this goes, i would be interested in lossy gpu encoding, and that might work a bit better regarding the inaccurate floating point calculations

ati stream support  pwease?
  • Last Edit: 10 September, 2009, 04:21:49 PM by thundat00th
My $.02, may not be in the right currency

  • hlloyge
Reply #13
Here are my test results:

Klaus Schulze - Dreams Deluxe Edition, size 797 MB
Core2Duo 8200, Geforce 9600GT with passive cooling

Encoding with FLAC 1.2.1 in command line, -6, version from Sourceforge, 38 seconds

And this...

PS D:\temp_2> .\CUETools.FlaCuda.exe -6 '.\Klaus Schulze - Dreams Deluxe Edition.wav'
CUETools.FlaCuda, Copyright © 2009 Gregory S. Chudov.
This is free software under the GNU GPLv3+ license; There is NO WARRANTY, to
the extent permitted by law. <http://www.gnu.org/licenses/> for details.
Filename  : .\Klaus Schulze - Dreams Deluxe Edition.wav
File Info : 44100kHz; 2 channel; 16 bit; 01:19:00.8800000
Results  : 61,11x; 499280528 bytes in 00:01:17.5764372 seconds;

Windows 7 32 bit.

Well... not that impressive 

(edit) wrote 10 seconds too much for flac encode...
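
For readers wondering what the "61,11x" in the Results line means: it appears to be the audio duration divided by the encoding time, i.e. the speed relative to real-time playback. That reading is an inference from the numbers in the output above, not something the tool's output states. A small Python sketch (the helper names are made up for illustration) reproduces the figure:

```python
def parse_hms(text):
    """Convert an 'HH:MM:SS.fffffff' string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def realtime_factor(audio_duration, encode_time):
    """How many seconds of audio are encoded per second of wall-clock time."""
    return parse_hms(audio_duration) / parse_hms(encode_time)


# Duration and elapsed time copied from the FlaCuda output above.
factor = realtime_factor("01:19:00.8800000", "00:01:17.5764372")
print(f"{factor:.2f}x")  # prints 61.11x
```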
  • Last Edit: 10 September, 2009, 05:43:03 PM by hlloyge

Reply #14
What was the file size for flac -6? We should compare the speed at the same compression ratio, i.e. output file size, not at the same compression level, because -6 for flac is much lower compression than -6 for FlaCuda. Please try to compare FlaCuda -5 vs flac -8, and compare both execution times and file sizes.

Here's a graph i made of Case's results:


This shows a 3x speedup of flac -8 compression.
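
The like-for-like comparison described above can be worked through with the timings from Reply #9. This is only a sketch in Python: the dictionary layout and the function name are made up for illustration, and the numbers are copied from that post. It picks the fastest FlaCuda setting whose output is no larger than the reference encoder's and reports the speed ratio.

```python
# (elapsed seconds, output bytes) per setting, copied from Reply #9.
REFERENCE = {
    "flac -5": (8.268, 181929373),
    "flac -8": (30.560, 181788832),
}
FLACUDA = {
    "FlaCuda -4": (9.204, 181892106),
    "FlaCuda -5": (10.904, 181763725),
    "FlaCuda -8": (12.370, 181676614),
    "FlaCuda -11": (23.883, 181734405),
}


def fastest_match(reference, candidates):
    """Fastest candidate that compresses at least as well as the reference."""
    ref_time, ref_size = reference
    matching = {
        name: seconds
        for name, (seconds, size) in candidates.items()
        if size <= ref_size
    }
    best = min(matching, key=matching.get)
    return best, ref_time / matching[best]


best, speedup = fastest_match(REFERENCE["flac -8"], FLACUDA)
print(f"{best} matches flac -8 at {speedup:.1f}x the speed")
```

With these figures the match is FlaCuda -5 at about 2.8x, since FlaCuda -4 produces a slightly larger file than flac -8.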
  • Last Edit: 10 September, 2009, 06:40:42 PM by Gregory S. Chudov
CUETools 2.1.4

  • Wombat
Reply #15
Not too shabby. Tried it on a C2D@3600+GTX260

Dream Theater, Awake

Original        793.976.444 Bytes
Flac 1.2.1 -8   568.604.561 Bytes ~94 sec. encoding time
FlaCuda -8      567.956.198 Bytes ~53 sec.

I don't have a recent Flake version at hand so I don't know how much comes from CUDA alone.

Edit:
FlaCuda -6      568.280.716 Bytes ~48 sec.
  • Last Edit: 10 September, 2009, 07:50:03 PM by Wombat
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

  • GHammer
Reply #16
This is on a 9500 GT

FlaCuda
Filename  : Clocks.wav
File Info : 44100kHz; 2 channel; 16 bit; 00:05:07.4670000
Results  : 43.10x; 35657424 bytes in 00:00:07.1331000 seconds;

Flac 1.2.1
Clocks.wav: wrote 35796074 bytes, ratio=0.660
2.91 seconds

Both were just run as <executable> Clocks.wav

Reply #17
Quote:
Flac 1.2.1
Clocks.wav: wrote 35796074 bytes, ratio=0.660
2.91 seconds

That file is a bit too small for comparison. And it's better to compare against flac -8. The default flac compression level is very fast; I don't think it can be beaten by FlaCuda, at least not yet. FlaCuda is focusing on higher compression.
CUETools 2.1.4

  • Lucho
Reply #18
GPU audio encoding will be useful once OpenCL gets adopted by both ATI and Nvidia; for now it is just a "proof of concept".

  • hlloyge
Reply #19
Here I am again, this time, more detailed:

Flac 1.2.1 vs Cuda 01

File: album.wav  643566044

Windows 7, C2Q9400 @ 2.66 GHz, Geforce 9500 GS

flac -8: wrote 405957413 bytes, ratio=0,631 in 99 seconds
cuda -8: 34,98x; 405731414 bytes in 00:01:44.2910429 seconds;

Is there a multicore flac encoder? That would be a nice thing to test...

Reply #20
Quote:
Is there a multicore flac encoder? That would be a nice thing to test...

http://softlab-pro-web.technion.ac.il/Proj.../downloads.html

I haven't tested this personally or done anything about trying to adapt the code for inclusion in Flake.

  • gib
Reply #21
Hey, wow.  This topic of mine was bumped, and with proof of concept software to boot.  Thank you, Gregory!

Here are my results to add to the data (I used flac 1.2.1 -8 and Flacuda01 -8 as suggested):

CPU:  Athlon X2 @ 2.35 GHz
GPU:  9600 GSO @ 600 MHz

File 1:  656647868 bytes
Flac:    466183490 in 148 seconds
cuda:  465898530 in 65 seconds

File 2:  654389948 bytes
Flac:    362792762 in 145 seconds
cuda:  360670158 in 63 seconds

More than 2x faster and better compression too.  That's pretty impressive.

  • PatchWorKs
Reply #22
Well, I believe that even a small gain is always welcome.

I'm not a developer, so I dunno if possible, but: what about a liboil-like library but for GPGPU encodings, so *any* codec could benefit from GPU computations ?

  • hlloyge
Reply #23
Again: C2D8200, Geforce 9600GT

album.wav to flac -8

original: 578046380
flac: 344489508 in 80 seconds
cuda: 344226134 bytes in 00:00:52.8150209 seconds

Nice.

Reply #24
Quote:
I'm not a developer, so I dunno if possible, but: what about a liboil-like library but for GPGPU encodings, so *any* codec could benefit from GPU computations ?

Not sure. The code i wrote is quite codec specific. The catch is in a relatively slow connection between CPU and GPU. I had to implement practically the whole FLAC algorithm on the device, so that i won't have to transfer intermediate values between host and GPU, only the final result.

FLAC turned out to be very convenient for GPU. Probably the most convenient. One look at e.g. ALAC algorithm was enough to understand it can never get the same benefit.

Quote:
original: 578046380
flac: 344489508 in 80 seconds
cuda: 344226134 bytes in 00:00:52.8150209 seconds

Nice.

Thank you. And how about FlaCuda -5? It should provide enough compression to beat flac -8.
  • Last Edit: 12 September, 2009, 06:34:42 AM by Gregory S. Chudov
CUETools 2.1.4