One idea I'm looking into for the next version is to cache file accesses more intelligently. A better solution, though, would be to make the decoding parts of the library multi-processor capable, so that consecutive blocks of a single file are transparently processed in parallel & cached for when they're needed (I've mentioned this before). David, have you ever looked into doing that?
There's no reason why it wouldn't work, although for decoding I wonder whether the disk I/O would ever be able to keep up (or justify it).
Also, for encoding (especially the -x modes) doing the same thing would certainly be a gain.
It would probably be a better use of time than going to hand-coded assembler for the single threads (which is another thing I think about), but unfortunately I don't have time at this point to look into either. I am happy to kibitz, however...
kibitz? : ) I can provide all the threading and CPU code - in fact, if you could split the decoding context into independent block-specific ones (i.e. ones that have all the decoding state needed to decode an entire block in isolation), I can do the rest. With multi-core CPUs so widespread now, this not only gets you a massive bang-for-buck, but better yet, all apps benefit transparently.
I think the functionality you're looking for already exists in the current library, although it's a little convoluted to get to. You just need to start up a WavPack decoder context for each thread you want to have, and set the OPEN_STREAMING flag on each one. The main code would read the WavPack file and parse it into blocks, then pass the blocks (using the WavpackStreamReader callbacks) to the decoding contexts to convert a single block each. The OPEN_STREAMING flag makes the contexts not care that the data they suck in comes from discontiguous blocks (although your parser must make sure they get complete blocks). The WavPack DirectShow filter uses this same method (not to use multiple cores, but to implement the parsing/seeking separately from the actual decoding).