What is the current status of wavpack 4?
Reply #25 – 2003-05-30 18:19:04

While making a post that mentions WavPack lossy in the Musepack forum, I think I got a better feeling for how WavPack works. (Please correct me if I'm wrong, bryant.)

1. Generate a single prediction based on numerous preceding samples. This guesses the value of the current sample. (A later post gave a simple example with linear prediction.)
2. Work out the difference between the actual sample and the prediction and store this "error", allowing perfect reconstruction.
3. Pack the file as well as possible (e.g. Huffman coding, as in Zip) to make use of redundancies.

In lossy hybrid mode, after the prediction is made, you can choose to knock the least significant few bits off the error. Of course, error could build up, since each sample is predicted from previous ones and those are now slightly incorrect, so it must be a little more complicated than that; presumably the predictor works from the reconstructed (slightly wrong) samples rather than the originals, ensuring that the net error stays close to zero over a reasonable time period. Reducing the error term to fewer bits creates more redundancy and makes the packing more efficient, so reducing the bitrate.
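The three steps plus the bit-truncation idea can be sketched roughly like this. This is only a toy illustration, not WavPack's actual code: it uses a trivial "previous sample" predictor, and `drop_bits` is my own name for the truncation parameter. The key point is that the encoder tracks the *decoded* value, so the quantization error cannot accumulate:

```python
def encode(samples, drop_bits=0):
    step = 1 << drop_bits
    prev = 0                      # predictor state, mirrored by the decoder
    residuals = []
    for s in samples:
        pred = prev               # trivial "previous sample" predictor
        err = s - pred
        q = round(err / step)     # knock the least significant bits off the error
        residuals.append(q)
        prev = pred + q * step    # track the decoded value, not the input,
                                  # so error cannot build up over time
    return residuals

def decode(residuals, drop_bits=0):
    step = 1 << drop_bits
    prev = 0
    out = []
    for q in residuals:
        prev = prev + q * step
        out.append(prev)
    return out
```

With `drop_bits=0` the round-trip is lossless; with `drop_bits=4` each decoded sample is off by at most half a quantization step (8 here), no matter how long the file is.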
The removed bits are compressed separately in the correction file, enabling perfect reconstruction once again.

Now, I can see that there's only one prediction for each sample, and the simple mathematical error (a subtraction) is all that remains to be stored. So I can imagine it's quite possible to look at the error terms for a series of samples and round them in such a way that the resulting noise follows the shape of the ATH curve pretty well, but has an average amplitude of zero over a certain timescale, causing no DC shift.

From the udial.ape thread (test your soundcard for clipping), where someone damaged his tweeters, one ought to be careful about putting too much ultrasonic noise content into files (though that was full-scale noise against a quiet audible tone, which led him to turn up the volume knob!). Soft ATH noise shaping might have lower ultrasonic content, for example. Then again, the decoder could be required to apply a lowpass filter to protect the user if we're pushing to very low bitrates with lots of error in the ultrasonic range (but not so much as to cause regular clipping). The filter would be turned off, of course, if the correction file is being used to restore lossless playback, but in lower-bitrate lossy modes a flag in the lossy stream could indicate the attenuation required for frequencies above, say, 19-20 kHz. This would safeguard against this potential tweeter risk without breaking the predictor.

It's also plausible to shape the correction-file noise in different ways, e.g. to follow some calculated frequency-dependent masking threshold based on simple psychoacoustics, but this would require some frequency analysis. Maybe the analysis could be greatly simplified compared with full-blown lossy coders (e.g.
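The zero-average rounding idea can be illustrated with simple first-order error feedback. This is just a sketch of the general technique, not WavPack's actual scheme, and `shaped_round` is my own made-up function: each rounding decision compensates for the rounding error accumulated so far, so the net (DC) error stays bounded no matter how many samples pass:

```python
def shaped_round(errors, drop_bits):
    # First-order error feedback: carry the accumulated rounding error
    # into the next decision so the net (DC) error stays near zero.
    step = 1 << drop_bits
    carry = 0.0
    out = []
    for e in errors:
        target = e + carry
        q = round(target / step) * step   # quantize to a multiple of 2**drop_bits
        carry = target - q                # |carry| never exceeds step/2
        out.append(q)
    return out
```

Since the leftover `carry` is bounded by half a step, the sum of the rounded values never drifts more than half a step from the sum of the inputs, which is exactly the "average amplitude of zero over a certain timescale" property.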
using RMS amplitude of sub-bands instead of transforms), and the masking threshold could be ultra-conservative, but this is a lot more work than a consistent noise shape.

It doesn't seem (on limited evidence) that splitting the signal into reconstructable bands before running the predictor is viable. It might be possible to use such a method to shape the noise, however.

It does seem plausible to make a rough measurement of loudness (the RMS value of the signal is very easy to compute) and modulate the allowed noise that way, with no consideration of the frequency dependence of masking, just the loudness. That might make a reasonably easy "standard" lossy mode which remains audibly transparent for the vast majority of samples at non-painful volumes.

Clipping might be a plausible concern if strong noise shaping has high enough amplitude at high frequencies.

Just some further thoughts on the subject, which hopefully narrow my contribution to this thread down to the ideas that are reasonably viable to implement (rather than some of my sub-band-with-separate-predictor ideas, which don't look too promising).
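The loudness-modulated noise budget could look something like this rough sketch. The function name and the block size, `floor`, and `ratio` parameters are all made up just to show the idea of scaling the permitted noise with the RMS level of each block:

```python
import math

def noise_budget(samples, block=512, floor=1.0, ratio=0.01):
    # Per-block noise allowance proportional to the block's RMS level,
    # with a floor so quiet passages still get a (tiny) nonzero budget.
    budgets = []
    for i in range(0, len(samples), block):
        chunk = samples[i:i + block]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        budgets.append(max(floor, ratio * rms))
    return budgets
```

The encoder could then pick how many low bits to drop in each block from its budget (roughly the base-2 log of the budget), which gives the loudness-only adaptation described above with no frequency analysis at all.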