lossywav for lossy codecs?
Reply #38 – 2010-03-23 02:33:36
{edit: started this post, then came back to it. Now I see moozooh also mentions WavPack's approach in the post above. I'll leave my text unmodified, however}

I'm going off on a bit of a wild tangent that I haven't looked into fully, and I expect a number of these ideas to be shot down, but maybe there's something viable here, though I suspect the whole thing would be a heck of a lot of work. I wonder if the WavPack hybrid approach would allow bitrate-peeling (or, preferably, VBR quality layering). David Bryant's book chapter on the design of WavPack is a good read if you're interested in how he coded the residual so that the lossy part uses a curtailed code, while the least significant part goes straight into the correction file. The predictor has to be based on the lossy version's decoded signal, so that if the correction file isn't used the stream still decodes properly and doesn't drift. If you had multiple degrees of lossiness, you'd base the predictor on the most lossy layer, then provide correction files for each level of scaling towards lossless that you wish to retain.

As an example, you might envisage something like:

1) Fully aggressive psymodel VBR aimed at transparency, guessing perhaps somewhere in the range 200-300 kbps, with strong noise shaping to exploit tonal masking, noise masking and temporal masking close to the psymodel's predicted limits.
2) A less aggressive psymodel approach, or something like lossyWAV portable, perhaps 350-400 kbps (and portable would be readily transcodable to FLAC).
3) lossyWAV standard, perhaps 450-550 kbps, probably a suitable near-lossless archive for transcoding to conventional lossy.
4) Lossless.

You might also add:

0) Non-transparent VBR, aimed at decent non-critical listening. Pure guess, maybe 150-200 kbps.

Whichever level is lowest is the waveform on which the predictor must act, in case the higher levels of residual correction are not provided (or fail to verify).
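To make the drift-free property concrete, here's a toy two-layer sketch in the spirit of WavPack's lossy stream + correction file split. Everything here is illustrative assumption, not WavPack's actual algorithm: the trivial first-order predictor, the fixed bit-shift quantiser, and all names are mine. The point is only that the predictor state tracks the *lossy* reconstruction on both sides, so decoding without the correction data can never drift:

```python
# Toy two-layer hybrid: coarse residual -> lossy stream,
# dropped low bits -> correction stream. Illustrative only.

def encode(samples, shift=4):
    """Split each prediction residual into a coarse (lossy) part and
    a fine correction part. The predictor sees only the lossy
    reconstruction, so it stays in sync with a correction-less decoder."""
    lossy_stream, correction_stream = [], []
    prev_lossy = 0  # predictor state: last *lossy* decoded sample
    for x in samples:
        residual = x - prev_lossy            # first-order prediction
        coarse = residual >> shift           # curtailed code for lossy stream
        fine = residual - (coarse << shift)  # low bits for correction file
        lossy_stream.append(coarse)
        correction_stream.append(fine)
        prev_lossy = prev_lossy + (coarse << shift)  # track the lossy decode
    return lossy_stream, correction_stream

def decode(lossy_stream, correction_stream=None, shift=4):
    """Without the correction stream the output is the lossy waveform;
    with it, reconstruction is exactly lossless."""
    out, prev_lossy = [], 0
    for i, coarse in enumerate(lossy_stream):
        lossy_sample = prev_lossy + (coarse << shift)
        if correction_stream is not None:
            out.append(lossy_sample + correction_stream[i])
        else:
            out.append(lossy_sample)
        prev_lossy = lossy_sample  # predictor always follows the lossy path
    return out
```

With more than two layers you'd apply the same split repeatedly, always keeping the predictor anchored to the lowest (most lossy) reconstruction, as above.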
The file format could be a simple form of interleaving in time chunks (similar to audio and video interleaving in AVI or MP4 containers, for example), in which you'd interleave as many layers as you wished to keep for each purpose, or strip out all but the ones you need.

The WavPack residual approach might allow finer temporal variation of the allowed noise than the 512-sample codec block size that's typical of lossyWAV. It wouldn't necessarily exploit the wasted-bits feature in WV. In conjunction with a psymodel, I'd imagine this could be exploited aggressively for temporal pre-masking and post-masking curves that allow more broadband noise near transient events, mainly for some milliseconds after them, but also for a smaller time before them (which must be controlled to minimise pre-echo).

In conjunction with frequency masking it's not as simple as it might sound. I don't know if it's possible to noise shape aggressively at one layer and then progressively add better residual coding to lower the noise and flatten the noise spectrum, or whether instead one would have to retain the spectral shaping and simply lower the noise across the spectrum by improving the residual accuracy.

The other approach to this sort of thing might be a lossless, reversible transformation into subbands that remained compatible with predictor/residual coding. But I suspect the types of transform that work for Musepack and MP2 aren't reversible in integer mathematics, so they wouldn't allow scaling to lossless.

Is this all rubbish?
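The chunked interleaving idea can be sketched very simply: tag each payload with its layer number, write all layers for each time chunk in turn, and "stripping" a layer is then just skipping its tagged payloads. The tag layout below is invented for illustration; real containers like AVI or MP4 are far more involved:

```python
# Hedged sketch: interleave quality layers per time chunk so layers
# can be stripped without re-encoding. Header layout is invented.
import struct

def mux(chunks_per_layer):
    """chunks_per_layer: one list of byte chunks per layer, all the same
    length. Layer 0 is the most lossy base; higher layers are correction
    data. Each payload is prefixed with (layer, size)."""
    blob = bytearray()
    for t in range(len(chunks_per_layer[0])):
        for layer, chunks in enumerate(chunks_per_layer):
            payload = chunks[t]
            blob += struct.pack("<BI", layer, len(payload)) + payload
    return bytes(blob)

def demux(blob, keep_layers):
    """Recover only the requested layers; payloads for other layers are
    skipped, which is all that 'stripping' a layer amounts to."""
    out = {layer: [] for layer in keep_layers}
    pos = 0
    while pos < len(blob):
        layer, size = struct.unpack_from("<BI", blob, pos)
        pos += 5  # 1-byte layer tag + 4-byte little-endian size
        if layer in out:
            out[layer].append(blob[pos:pos + size])
        pos += size
    return out
```

A stripper would just run demux with the layers it wants and re-mux the result; nothing has to be decoded or re-encoded along the way.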