The filterbank (including the DFT) is the slowest part of the decode process, using roughly half the total time. However, for decoding at least, the filterbank only needs 10-20 MHz of CPU time (perhaps less with SIMD), so I'm not sure CUDA makes sense. If you're interested, I've been working little by little on improving filterbank performance for various embedded devices, where performance can make quite a difference due to battery limitations.
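For a rough sense of where a figure in that range can come from, here is a back-of-envelope sketch in Python. It assumes the textbook synthesis filterbank structure (32 subband samples in, a 64-point matrixing step, a 512-tap window) computed the naive way, and it counts multiply-accumulates rather than real cycles, so treat the result as an order-of-magnitude estimate and not a measurement of any particular decoder. The function names are made up for illustration.

```python
import math

SUBBANDS = 32      # subband samples consumed per synthesis call
MATRIX_OUT = 64    # values produced by the matrixing step
WINDOW_TAPS = 512  # length of the synthesis window


def matrixing(subband_samples):
    """Direct, unoptimised matrixing:
    V[i] = sum over k of cos((16 + i) * (2k + 1) * pi / 64) * S[k]."""
    if len(subband_samples) != SUBBANDS:
        raise ValueError("expected 32 subband samples")
    return [
        sum(
            math.cos((16 + i) * (2 * k + 1) * math.pi / 64) * s
            for k, s in enumerate(subband_samples)
        )
        for i in range(MATRIX_OUT)
    ]


def macs_per_second(sample_rate=44100, channels=2):
    """Multiply-accumulates per second for the naive filterbank.

    Assumption: one MAC per matrix coefficient plus one per window tap,
    with no fast-DCT shortcut and no overhead for memory traffic.
    """
    macs_per_call = MATRIX_OUT * SUBBANDS + WINDOW_TAPS  # 2048 + 512
    calls_per_second = sample_rate * channels / SUBBANDS
    return macs_per_call * calls_per_second


print(macs_per_second())  # 7056000.0
```

About 7 million MACs per second for 44.1 kHz stereo is the same order of magnitude as the figure quoted above once load/store overhead is added. Replacing the direct matrixing with a fast DCT cuts the multiply count substantially, which is the usual first optimisation.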
Decoding [...] is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the decompressed output they produce from a given MP3 file will be the same (within a specified degree of rounding tolerance) as the output specified mathematically in the ISO/IEC standard document.
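To make "within a specified degree of rounding tolerance" concrete, here is a minimal sketch of how such a comparison could be done. The function name is hypothetical, and the default thresholds (RMS error below 2^-15/sqrt(12), peak error at most 2^-14 of full scale) are the values I recall for the full-accuracy class in the conformance part of the standard; treat them as assumptions and check the document itself before relying on them.

```python
import math


def is_within_tolerance(decoded, reference,
                        rms_limit=2 ** -15 / math.sqrt(12),
                        peak_limit=2 ** -14):
    """Compare decoder output against reference output.

    Both inputs are equal-length sequences of samples scaled so that
    full scale is 1.0. The default limits are assumptions (see above).
    """
    if len(decoded) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [d - r for d, r in zip(decoded, reference)]
    rms = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    peak = max(abs(x) for x in diffs)
    return rms < rms_limit and peak <= peak_limit


reference = [math.sin(2 * math.pi * n / 100) * 0.5 for n in range(1000)]
close = [x + 1e-6 for x in reference]  # tiny rounding-sized error
far = [x + 1e-3 for x in reference]    # error well above the limits
print(is_within_tolerance(close, reference))  # True
print(is_within_tolerance(far, reference))    # False
```

The point is that two compliant decoders need not be bit-identical; they only have to stay inside the error bounds relative to the reference output.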
If I understand right, you mean that there isn't much to gain from speeding up the filterbank, since it runs at a low rate anyway?
In your work on improving filterbank performance, are you modifying the design itself (changing the prototype filter, structure, etc.), or are you making the filterbank code smarter?
I am curious to know whether there is any point in experimenting with a different filterbank for MP3 decoding. For example, could a different prototype filter give better sound quality?
Perhaps GPU-based transcoding could be interesting. Doing 1000s of files in a batch means large potential for threading/vectorization.
Quote from: knutinh on 04 February, 2012, 05:59:25 PM
Perhaps GPU-based transcoding could be interesting. Doing 1000s of files in a batch means large potential for threading/vectorization.

You're going to run into I/O slowdown far before you can utilize that degree of parallelism.
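The I/O argument can be put as a one-line model: a batch pipeline runs no faster than its slowest stage. The sketch below is only that model, with a hypothetical function name and made-up example numbers (disk speed, file size, decode rate); plug in measured values for a real system.

```python
def batch_rate(read_mb_per_s, file_size_mb, compute_files_per_s):
    """Return (files per second, limiting stage) for a read-then-process batch.

    Assumption: reading and processing overlap perfectly, so throughput
    is simply the minimum of the two stage rates.
    """
    io_files_per_s = read_mb_per_s / file_size_mb
    if io_files_per_s <= compute_files_per_s:
        return io_files_per_s, "io"
    return compute_files_per_s, "compute"


# Hypothetical numbers: 100 MB/s storage, 5 MB files,
# and a processing stage that could handle 200 files/s.
print(batch_rate(100.0, 5.0, 200.0))  # (20.0, 'io')
```

With numbers like these, making the compute stage faster changes nothing, because the storage can only deliver 20 files per second.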