I'd like to know what you think about this project, so contact me if you have any comments or feedback.
Out of curiosity, would the multithreaded one be better?
Otherwise, running two encodes in parallel will almost certainly be faster due to less overhead from threading, synchronization, etc.
Quote from: Mike Giacomelli on 02 August, 2009, 12:21:22 PM
> Otherwise, running two encodes in parallel will almost certainly be faster due to less overhead from threading, synchronization, etc.

Are you sure?
Doesn't foobar2000 already do this if you have more than one core in your computer? I was getting similar numbers on a quad-core machine when I had to do some encoding for my dad. My Core Duo goes about 43x if it's not running warm.
Running two encodes in parallel results in essentially perfect parallelization. The only overhead comes from disk contention, which is still a problem for the multithreaded single-process case anyway.
Running one process will encounter additional overhead from thread synchronization, lack of granularity in parallelism, inter-thread communication, etc. To make up for this, there would have to be additional work saved by running in one process, and I don't see what that would be for MP3.
Quote
> Running one process will encounter additional overhead from thread synchronization, lack of granularity in parallelism, inter-thread communication, etc. To make up for this, there would have to be additional work saved by running in one process, and I don't see what that would be for MP3.

As you mentioned above, concurrent disk access is a problem when you execute multiple encoder processes in parallel. If you run them as tasks in one process, you can use a file I/O scheduler that takes care of it, as I did.
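For anyone unfamiliar with the idea, here is a minimal sketch of such a scheduler (hypothetical code, not fpMP3Enc's actual implementation): encoder tasks run on their own threads, but every disk read goes through one shared lock, so the drive only ever serves one large sequential read at a time while the encoding itself stays parallel.

Code:
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

std::mutex io_mutex;  // serializes all disk access across tasks

// Read one large chunk while holding the I/O lock, then release it so
// another task's read can proceed.
std::size_t scheduled_read(std::FILE* f, char* buf, std::size_t len) {
    std::lock_guard<std::mutex> lock(io_mutex);
    return std::fread(buf, 1, len, f);
}

void encode_task(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return;
    std::vector<char> buf(8 * 1024 * 1024);  // big chunks keep each read sequential
    std::size_t n;
    while ((n = scheduled_read(f, buf.data(), buf.size())) > 0) {
        // encode_chunk(buf.data(), n);  // CPU-bound work happens outside the lock
    }
    std::fclose(f);
}

int main() {
    const char* files[] = {"a.wav", "b.wav", "c.wav", "d.wav"};
    std::vector<std::thread> workers;
    for (const char* p : files) workers.emplace_back(encode_task, p);
    for (auto& t : workers) t.join();
}

The key point is that the lock is held only for the read, not for the CPU-bound encoding, so contention stays low as long as the chunks are large.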
Why is there a speedup from scheduling sequential reads yourself vs. letting the OS's file cache handle the scheduling?
So you're getting a huge speed up by essentially just buffering the entire file into memory before processing?
In the more general case, an optimal scheduler would read just enough from one file before it needs to switch to reading from another file. This would lead to pretty big buffers, but not necessarily whole-file buffering.
No idea yet if this encoder is implementing something like that, but if it is, that sort of scheduler would be extremely valuable for other open-source applications.
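To make the "just enough" idea concrete, here is a back-of-the-envelope sizing rule (my own assumed numbers, nothing measured from this encoder):

Code:
#include <cstdio>

// Bytes one encoder consumes per second of wall time; CD audio at 80x
// realtime is 80 * 176400 B/s, i.e. roughly 14 MB/s.
const double consume_rate = 80.0 * 176400.0;

// With n_tasks files sharing the disk round-robin, a task waits about
// (n_tasks - 1) read slots before its next turn, so its buffer must
// cover that long a gap.
double readahead_bytes(int n_tasks, double slot_seconds) {
    return consume_rate * slot_seconds * (n_tasks - 1);
}

int main() {
    // e.g. 4 tasks and 0.25 s per read slot -> ~10.6 MB per task:
    std::printf("%.1f MB\n", readahead_bytes(4, 0.25) / 1e6);
}

So the buffers end up in the tens of megabytes per task, which is "pretty big" but still far from buffering a whole album side into memory.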
In "fpMP3Enc" I've not determined the optimal memory strategy yet because I'm still working on parallelizing the encoder. It will depend on the final performance.
Will it be optimised for fpMP3Enc or will it be more adaptable to different encoders or even decoders?
There are codec settings that decode very slowly, Monkey's Audio's Extra High for example. The encoding tasks might run idle if the I/O scheduler reads too much data from such files. Maybe when the scheduler adjusts the buffer size dynamically it should look in both directions, and even give the decoding tasks higher priority if decoding takes longer than encoding?
I know fpMP3Enc currently just supports WAV, but that might change in the future, or not be the case in other projects that might want to use your library.
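Roughly what I have in mind, as a purely hypothetical sketch (names and numbers made up): per task, compare the measured decode and encode throughput, and only favor read-ahead when the decoder can keep up.

Code:
#include <cstdio>

struct TaskRates {
    double decode_bytes_per_sec;  // measured PCM output rate of the decoder
    double encode_bytes_per_sec;  // measured PCM consumption rate of the encoder
};

enum class Priority { FavorReadAhead, FavorDecoding };

// If the decoder can't keep the encoder fed, reading further ahead just
// piles up compressed data it can't decode in time; give the decoding
// task the CPU instead.
Priority schedule(const TaskRates& t) {
    return t.decode_bytes_per_sec < t.encode_bytes_per_sec
               ? Priority::FavorDecoding
               : Priority::FavorReadAhead;
}

int main() {
    TaskRates ape_extra_high{2.0e6, 14.0e6};  // made-up example rates
    std::printf("%s\n", schedule(ape_extra_high) == Priority::FavorDecoding
                            ? "boost decoder"
                            : "read ahead");
}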
The benchmarks (Vista x64, Intel Q9450 @ 2.66 GHz, 8 GiB RAM):

LAME 3.98.2 (x32): 32.3x
fpMP3Enc (x64; single file encoding): 60.3x
fpMP3Enc (x64; multi file encoding): 109.7x

This means that fpMP3Enc is about 87% or 1.87x faster than LAME in single file encoding, while the speedup is 3.4x in multi file encoding.
80x encoding speed equates to about a 14 MB/s read from an uncompressed file (CD audio is ~176 kB/s, so 80 × 176 kB/s ≈ 14 MB/s); if the source is lossless, roughly half that. A modern HDD should be able to do 3x that without breaking a sweat, so I am not sure where the speed differentials come from when buffering to memory. Windows can have radically different read speeds depending on the transfer buffer size selected; off the top of my head, on XP 64 KB was the optimum size.
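If anyone wants to reproduce the buffer-size effect, a quick standalone test like this does the job (my own sketch, nothing to do with fpMP3Enc; note that repeat runs will mostly hit the OS file cache, so use a fresh file each time):

Code:
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Time one full sequential read of `path` using `buf_size` transfer buffers.
double read_mb_per_sec(const char* path, std::size_t buf_size) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0.0;
    std::vector<char> buf(buf_size);
    std::size_t total = 0, n;
    auto start = std::chrono::steady_clock::now();
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) total += n;
    std::chrono::duration<double> secs = std::chrono::steady_clock::now() - start;
    std::fclose(f);
    return total / 1e6 / secs.count();
}

int main() {
    // 4 KB up to 1 MB transfer buffers; point the path at a real file.
    for (std::size_t kb : {4, 16, 64, 256, 1024})
        std::printf("%4zu KB: %.1f MB/s\n", kb, read_mb_per_sec("test.wav", kb * 1024));
}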
How many threads did you use for the presented fpMP3Enc results? I suspect the multi file encoding was done with 4 cores. If so, LAME encoding 4 or more files with one file per core would result in about 4 × 32.3 = 129.2 times encoding speed.
Also, should a well-designed native 64-bit encoder be nearly twice as fast as its 32-bit counterpart, or do the Core 2 Duo chips handle this well through some kind of emulation?