Hydrogenaudio Forums

Lossless Audio Compression => WavPack => Topic started by: superdumprob on 2003-05-21 13:00:25

Title: What is the current status of wavpack 4?
Post by: superdumprob on 2003-05-21 13:00:25
What is happening on the wavpack front these days? Is there much progress towards a new version? I've read a little about wavpack 4 around here on HA but that was a few weeks ago now.
Title: What is the current status of wavpack 4?
Post by: bryant on 2003-05-22 06:42:16
The reason that I have not been discussing WavPack 4.0 is that I don't want to build up a lot of expectations that I might not be able to meet. Since the core of WavPack development is done by only one person, there is always the risk that the progress could be slowed significantly (or even stopped) by any number of circumstances. And for me one of the biggest advantages of writing free software is that I can do exactly what I want, when I want. This is not to say that I have no interest in what users would like to see, but the last thing I want is for this to become a "job". 

I can imagine someone e-mailing me with "hey, I've been waiting three months to encode my CDs because you said you'd have a new version!" and have them get the response "this is David's wife and we have a new baby and David isn't allowed to play with the computer anymore!". See my point? 

However, this has come up a few times and I do feel a little funny just not saying anything, so I will say that I am working on WavPack 4.0 and that it will be a completely new format from the ground up. The current WavPack format was designed over 5 years ago and although I have tried to improve it along the way, the fact that it lacks any sort of block structure makes it impossible to add certain desirable features like fast seeking, streaming and error tolerance. I considered hacking a block structure onto the existing architecture, but eventually decided that to make WavPack the compression format that I really wanted, I would have to make a clean break with the past and start fresh. (Of course, there will always be decoding support for all previous versions and it will remain open source).

As for timing, I would like to have something for testing in 2-3 months and some sort of release in 4-6 months. But again, this assumes that I don't lose interest or get hit by a bus or (worst of all) get a "job". 

Thanks for the interest and I hope that this (sort of) answers your question. 
Title: What is the current status of wavpack 4?
Post by: den on 2003-05-22 06:45:02
Cool, thanks David. I can rest easy that my current encodes will always be playable anyway.

Den.
Title: What is the current status of wavpack 4?
Post by: Dologan on 2003-05-22 06:48:18
Thanks for all your effort developing this excellent software. I think we're all looking forward to the next version of Wavpack; but of course, your life comes first!

Am I right to suppose WavPack lossy will continue to be supported? Will it also be redesigned?


~Dologan
Title: What is the current status of wavpack 4?
Post by: superdumprob on 2003-05-22 12:31:52
Thank you for the information David.  I understand what you mean. Wavpack is more of a hobby and an interest, which is perfectly understandable. I shan't expect a new version anytime soon, or ever; I'll just consider it a nice surprise if it does come.  Nevertheless, good work!

Have fun!
Title: What is the current status of wavpack 4?
Post by: yourtallness on 2003-05-22 15:10:56
It would be cool if wavpack lossy could use some psychoacoustics, instead
of dropping bits in order to reach the desired bitrate. Of course, I don't
know if this is feasible, but imagine if the lossy file was of mpc quality!
Also, better seeking is vital, 'cos the current seeking mode simply fast-forwards
to the selected point, which is kinda frustrating...
Title: What is the current status of wavpack 4?
Post by: den on 2003-05-22 15:48:44
Superdumprob hit the nail on the head. If Wavpack 3.97 is as good as it gets, well that's great, and if or when something else comes along, it will be a real bonus.

@yourtallness
I understand your thinking, but I'm not sure if I entirely agree. I actually think the lack of psychoacoustic modelling in wavpack lossy is a feature, in that it reduces artifacts introduced from the encoder second guessing what you won't hear. A few less bits would be nice, but if you start getting a model that is too involved, David is just making another MPC/MP3/Vorbis. Wavpack lossy is currently brilliant as a transcoding source based on my tests, and I'd hate to lose this from it getting too clever with its bit/frequency allocation. Some basic smart/safe switching joint stereo would be nice, but I'm not sure if introducing much else might start to introduce problems.

I suppose as long as there is an option to turn modelling off if required, it wouldn't matter.

Den.
Title: What is the current status of wavpack 4?
Post by: GeSomeone on 2003-05-22 15:51:29
Quote
but imagine if the lossy file was of mpc quality!

But bigger?  You would use Musepack wouldn't you?
--
Ge Someone
Title: What is the current status of wavpack 4?
Post by: yourtallness on 2003-05-22 16:05:43
As I see it:

If Wavpack lossy at 200 kbps were transparent, one could keep the lossy
file on their HDD, and back up the restoration file on CD-R so as to spare
themselves from re-ripping. That way you would have a great sounding
lossy file for casual listening (without needing to go as high as 320~384 kbps)
and you would always be able to recreate the original wav when needed.
Title: What is the current status of wavpack 4?
Post by: AstralStorm on 2003-05-22 16:41:43
Why not use MPC for playback and lossless (preferably FLAC/Wavpack) for archiving then?
MusePack isn't lower quality than Wavpack lossy...
(oh, it is, but it's nearly always inaudible)
Title: What is the current status of wavpack 4?
Post by: bryant on 2003-05-23 07:19:21
Thanks for the support, guys! I don't mind talking about what I'm up to as long as everyone understands that until something's out it's just vaporware.

As for the lossy/hybrid mode, yes, it's definitely going to be in there. In fact, that mode is really more interesting to me than pure lossless (although I assume that more people use WavPack for the lossless mode).

The new format is going to be more in line with existing formats in that the decoding specification is going to be fixed, but there will be considerable flexibility in the encoding process. This way, it will be possible for other people to try different ideas for improving the encoder without breaking the decoder. The lossy mode will probably go down to 196 kbps (instead of the current 265) and I'm sure there will be some improvement in quality for a given bitrate, but I don't know how much yet.

I am looking at applying psychoacoustic models at some point (although this will be optional). Like den mentions, smart LR/MS switching would be a first step. Also, I have been playing around with using psychoacoustic information to vary the bitrate (i.e. adding VBR) or using it to adapt the noise shaping to fit the signal. I will not allow any digital filtering or other frequency domain processing in the signal path, however. Again, this will all be built into the decode spec and the encoders can use the features or not at their whim.
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-23 14:06:32
Wavpack lossy and the ability to have recovery files to restore to lossless is a fascinating subject, though I confess that I haven't used it yet because it's not necessary for my relatively limited collection of rips to date. When I get a PC with an 80 gig drive, and start ripping everything in sight, then it could be tempting.

Using only very basic psychoacoustics to estimate the maximum noise or quantization error over the full bandwidth that would remain inaudible might be a good approach to VBR. I understand you already use some quantization error versus average power. (If it's allowable prediction error, not quantization error, that you use, please read 'prediction error' wherever I've written 'quantization error' or 'distortion' below.)

From what I read (Den?), Wavpack lossy at 320 kbps can be ABXed every time, but only at very loud - almost painful? - volumes, when the background hiss becomes audible. Possibly, some of what I suggest below could save bits in other ways (while retaining the option of recovering the lossless PCM), so allowing you to be less aggressive in allowing noise, at least within the frequencies where it's most audible.

Now, it's getting to the boundaries of my knowledge, but I believe sub-band codecs like Musepack (and good EQs like Foobar's) use methods of splitting the signal into frequency bands that are lossless and can be completely reversible in a bit-identical way (but then they start throwing away bits by quantizing each band as coarsely as the psymodel allows). I believe the band splitting takes the form of complementary pairs of FIR-type convolution filters, whose outputs can be summed to identically reconstruct the input. (e.g. FB2k's EQ with all bands set to zero is bit-identical to disabling the Equalizer DSP).
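To make that "complementary pair" idea concrete, here's a rough Python/numpy sketch (not code from any real codec; the lowpass is just a crude windowed-sinc placeholder). The low band is the rounded output of a linear-phase lowpass FIR and the high band is simply the input minus that, so adding the two bands back together is bit-identical by construction:

Code:
import numpy as np

def split_bands(x, taps=31):
    """Split integer PCM into two bands that sum back to the input exactly."""
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(n / 2) * np.hamming(taps)          # crude ~half-band lowpass
    h /= h.sum()                                   # unity gain at DC
    delay = (taps - 1) // 2                        # linear-phase group delay
    low = np.rint(np.convolve(x, h)[delay:delay + len(x)]).astype(np.int64)
    x_int = np.asarray(x, dtype=np.int64)
    high = x_int - low          # the complementary band, plus the rounding error
    return low, high

x = np.random.randint(-32768, 32767, 44100)        # 1 s of fake 16-bit audio
low, high = split_bands(x)
assert np.array_equal(low + high, x)               # recombination is bit-exact

Of course the rounding means the "high" band carries a little extra quantization junk, but the point is only that the split/recombine round trip is perfectly reversible, like the EQ example above.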

Perhaps a new Wavpack Lossy, while not taking such an extreme route as Musepack and avoiding knife-edge psychoacoustic decisions based on FFTs and tonality estimation etc., could use lossless sub-band splitting to make fuller use of noise thresholds to allow, for example, much greater distortion or noise in the 17-22 kHz band (because the ATH is much higher there) and an intermediate amount of distortion from, say, 15-17 kHz. (The shape of the ATH curve is similar to the Strong ATH Noise Shaping dither of foobar2000; see the spectrum below.) It might even be possible to use different lossless stereo modes (MS/LR) in each sub-band. If this is safe enough for dither that sounds like it's 15 dB quieter than unshaped dither even during a fade-out, I think it has to be safe as a minimum level of noise even in quiet parts of Wavpack lossy. As the music level increases, presumably the amount of noise in the high frequencies can increase further, and it really ought to be very safe to discard frequencies above 20 kHz from the lossy mode entirely (and probably those as low as 19 kHz too), especially as the sub-band splitting would involve a gradual roll-off from one band to another, this frequency transition width being inversely proportional to the length of the FIR filter.

(http://members.lycos.co.uk/bhafool1/rarities/silence_strong_ATH_noise_shaping_dither.png)
Frequency spectrum of silence plus Foobar2000's Strong ATH Noise Shaping dither

If you're splitting into sub-bands and aren't trying to use tonality (peakiness of the spectrum, which differentiates sinusoidal waves from noiselike signals), you shouldn't need to do an FFT, but can simply measure the variations in RMS level of each band to estimate the allowable levels of distortion in the worst case. IIRC, the worst case is for noiselike rather than tonal signals, because I believe noiselike signals don't mask as well as tonal ones (this paper (http://ccrma-www.stanford.edu/~bosse/proj/node18.html) might give some clues). Given the power in the band, you could probably use a simple estimate of allowable distortion that never exceeds the worst case (being that of noise as the masker) and set the quantization accordingly. You could use any fast-to-calculate estimator that never exceeds the value given by the more accurate predictor based on FFTs even when tonality is low (t = 0.3 for noise in the link above).
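As a very hand-wavy sketch of what I mean (illustrative Python only; the 30 dB "worst-case" offset and the ATH numbers are placeholders I've made up, not values from any real model):

Code:
import numpy as np

# Placeholder numbers only - neither the ATH floor nor the offset comes from a real model.
ATH_DBFS = {"low": -90.0, "mid": -95.0, "high-mid": -80.0, "top": -45.0}
WORST_CASE_OFFSET_DB = 30.0      # assumed minimum masker-to-noise distance

def allowed_noise_dbfs(band_samples, band_name, full_scale=32768.0):
    """Conservative per-band noise ceiling: the band's RMS level minus a fixed
    worst-case offset, but never below the (placeholder) ATH floor."""
    x = np.asarray(band_samples, dtype=np.float64)
    rms = np.sqrt(np.mean(x * x)) + 1e-12          # avoid log(0) on silence
    level_db = 20.0 * np.log10(rms / full_scale)   # band level in dB re full scale
    return max(level_db - WORST_CASE_OFFSET_DB, ATH_DBFS[band_name])

The real offset would presumably come from curves like the ones in that paper rather than a single constant, but you get the idea: no FFT, just an RMS per band and a lookup.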

Just a thought (possibly flawed if I'm missing something) about approaches to reaching still lower bitrates than current Wavpack lossy without risking the typical artifacts of psychoacoustic lossy compression schemes or reducing the encoding speed to their levels.

I'd imagine, if you play on the safe side, and don't assume that tonality and similar principles apply, you can probably shave off quite a few bits without adding too much computation. And certainly, given the listening tests on HA for "how high can you hear" using lowpassed real music, it should be possible to build in a completely transparent lowpass to be used for only the most aggressive settings of a new Wavpack lossy.

Of course, being able to recover the lossless audio also requires the lossless sub-band splitting to work bit-identically. I can't think where I read that - probably Frank Klemm, Andre Buschmann, or the TooLAME site - and I guess you also want something that doesn't damage the efficiency of the lossless coding of the residual (or difference) data used for reconstructing the lossless PCM. You might find it's best to compress each sub-band's residual separately, or to sum them and compress the whole bandwidth residual losslessly. I couldn't guess, as I don't know how the predictors work.

Just throwing out thoughts really, in case any of them might suit your objectives. Feel free to disregard any and all of them - no offence will be taken! (After all, I did include some frequency-domain stuff, but all done without recourse to FFTs and only if totally bit-identical reconstruction is possible)
Title: What is the current status of wavpack 4?
Post by: superdumprob on 2003-05-23 22:19:16
Interesting stuff. Thanks again bryant and thanks to DickD for the lengthy post also.
Title: What is the current status of wavpack 4?
Post by: den on 2003-05-24 08:15:41
@DickD
It's when I read posts like these that I realise just how little I know about audio compression. Having said that, some of your points seemed quite feasible to this layman.

I can ABX Wavpack lossy every time at 320 kbits, but I am only picking up a slight increase in noise most of the time, and that beats having significant artifacts in the music like those from other lossy codecs in my opinion. If it were possible to employ some of your ideas and remove the slight noise at extreme volumes, Wavpack would be simply amazing, and/or it would be possible to get the bitrate down while keeping the current low, very acceptable amount of added noise.

Thanks for the post.

Den.
Title: What is the current status of wavpack 4?
Post by: bryant on 2003-05-25 05:58:38
DickD:
Yes, thanks for the in-depth post! I will digest it for a few days and post again. 

There is one thing that I want to bring up now though because it relates to something den mentioned in another thread and you touch on here. There are currently only two options for the noise shaping in the WavPack lossy/hybrid mode: none and first-order. Without noise shaping the added quantization noise spectrum is absolutely flat and with noise shaping it is (as expected) 3 dB higher level (overall) but drops off at 6 dB/octave at lower frequencies:

(image missing from the archive: comparison of the flat and first-order-shaped noise spectra)
I have found that on some samples the noise-shaped version actually sounds worse than the flat one (den has reported the same thing) and am not sure why.

My current guess is that usually the only portions of the noise that aren't masked by the music are the very highest frequencies (because there's very little energy up there). Noise shaping in this case essentially moves noise from an area where it's completely masked by the music into the range where there's nothing to hide it. This seems to turn conventional noise shaping theory (where the idea is to move the noise away from the music and leave greater S/N in the midrange) on its head. What do you think?

And this also makes me wonder if there are noise shaping algorithms around that shift the noise downward in frequency and whether or not they would work better here. Hmm...
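(For the curious, the kind of first-order shaping I'm talking about is just error feedback, and flipping the sign of the feedback term flips the direction the noise gets pushed. A toy Python sketch of that idea, not the actual WavPack code:)

Code:
def quantize_shaped(samples, step, coeff=1.0):
    """First-order error-feedback quantizer.
    coeff=+1 -> noise transfer (1 - z^-1): noise pushed up in frequency,
                falling ~6 dB/octave toward the low end (the flavor of the
                current first-order option).
    coeff=-1 -> (1 + z^-1): noise pushed down in frequency instead.
    coeff=0  -> plain quantization, flat noise."""
    out, e = [], 0.0
    for s in samples:
        u = s - coeff * e             # feed back the previous quantization error
        q = step * round(u / step)    # uniform quantizer
        e = q - u                     # error made on this sample
        out.append(q)
    return out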

den:
Welcome back from holiday, den! To answer your question from the other thread, the first track I found that kills joint stereo was actually pointed out to me by Dologan. It's called hard.wav and I think it was created by Ivan just to challenge joint-stereo encoders (I can't find it on the Internet but I can upload it if you'd like).

After that I put in the command-line option to allow disabling JS, but left the default on. Later, after I found some hard-panned tracks that had audible noise in JS mode I decided to change the default.

The reason that Furious works better in JS mode is that those hi-freq spikes are in the middle; if they were on one side then LR mode would be better. So, I'm still not convinced that the default should change; better to just get on with WavPack 4 so I can switch on the fly...

Those are the only killer tracks I've found, but I suspect that the reason is that I don't listen to a lot of music with artificial electronic sounds (the only reason I had The Fast and the Furious ST is that it was the first U.S. release that was copy protected).
Title: What is the current status of wavpack 4?
Post by: eltoder on 2003-05-25 15:12:35
You can try to add more noise in frequencies with more power (so that the noise shape follows the music shape), but this will require some sort of frequency analysis (or completely changing the way wavpack works).

On the other hand (maybe I'm asking something stupid), have you tried allowing bigger errors for bigger (by absolute value) samples?
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-27 18:09:41
bryant:
Quote
Noise shaping in this case essentially moves noise from an area where it's completely masked by the music into the range where there's nothing to hide it. This seems to turn conventional noise shaping theory (where the idea is to move the noise away from the music and leave greater S/N in the midrange) on its head. What do you think?


Well, I'm not sure what "conventional noise shaping theory" is. I've sort of only come into it recently for audio. What to me is conventional is the type of thing implemented by Garf for Foobar2000, where it follows the ATH and concentrates most of the noise at the inaudible frequencies (I mean 18+ kHz, not 10-12 kHz). I think it was Garf [Edit: no, it was KikeG] who noted that for a realistic quality comparison of dither types, it should be very quiet - barely audible - partly because the Fletcher Munson curves (see top graph) (http://replaygain.hydrogenaudio.org/equal_loudness.html) indicate that the audibility of the HF would be higher, relative to the other noise, if the volume were un-naturally loud.

I think you could put much more noise in the very highest 3-5 kHz of the band (assuming 44.1 kHz sampling) and reduce the noise in the more audible band (to about 15 kHz) with a transition between these regions. If you note carefully just how many dB are involved (10 dB is 10 times the power), there's MUCH more noise power from 18.5 kHz to 22 kHz than there is from, say, 10-15 kHz. Something like 40 dB more, which is of the order of 10,000 (ten thousand) times the power (100 times the amplitude is probably more relevant from the point of view of how much data you can remove - where I guess 100 ~= 128 = 2^7).

I believe that Garf's noise shaping includes slightly increased noise at low frequencies (see the Fletcher Munson curve on a log frequency scale and compare it to the image I posted in my previous post), but this doesn't give as much bang-for-buck as the high-frequency part because it's 30-40 dB/Hz lower over a smaller range of frequencies (in Hz).

For an idea of just how powerful the noiseshaping is, try foobar2000 with 8-bit playback and strong ATH noise shaping dither. Then try disabling the dither and going for 12-bit padded to 16-bit (you need to "show all bitdepth modes") or alternatively compare it to no-noiseshaping dither at 12-bit, for example.

If I were setting up an old/budget machine with an 8-bit soundcard (my mother had one until only a short while ago, which I'd set up using MAD in Winamp), I'd use FB2K and strong ATH for sure.

I noticed that the ABC/Hidden Reference (http://ff123.net/abchr/abchr.html) blind rating/testing program included the well-known castanets sample for the About dialog as a WAV file. I noticed it was a remarkably small WAV and wondered if it was compressed. It was mono, 44.1 kHz, 8-bit, with strong noise-shaping dither by the look of the spectrogram (below), and sounded far better than I expected an 8-bit file to sound. (It was actually 353 kbps in mono and 1.3 seconds duration, so it's hardly tightly compressed, but if you must have WAV, it's far better to halve the bit-depth than halve the sampling rate in this case):
(http://members.lycos.co.uk/bhafool1/rarities/c_wav_8bit_spectrogram.jpg)
Spectrogram of Castanets: WAV PCM, 44.1 kHz, 8-bit mono, strong noise shaping

You can see how strongly the noise shaping uses the frequencies from 16 kHz and up, while it adds little or no noise to those below 15 kHz (and probably less noise than standard dither would).

Incidentally, Frank Klemm's page on dither (http://www.personal.uni-jena.de/~pfk/mpp/dither.html) indicates that with a 32 kHz sampling rate, noise shaping can gain about 5 dB of audible SNR increase (at cost of 8 dB of measured SNR reduction over the full bandwidth). This jumps to about 15 dB of audible SNR increase at 44.1 kHz (at 29 dB measured SNR cost), indicating just how important the frequencies above 16 kHz are to making major perceptual gains by noise shaping.

So, I think you could gain considerably more bang-for-buck than the sloping line you show in your post above by adopting a strong noise-shaped curve and accepting much more noise at the frequencies greater than 18 kHz (maybe >16 kHz) while reducing the noise in the audible band to below the level of the flat spectrum noise, shown in cyan. Your diagram indicates there's more noise power from about 7 kHz up in Wavpack's current mildly noise-shaped spectrum. I'd guess the 7-15 kHz region being up to 6 dB louder is why it sounds worse to you and den (I haven't tried it myself). Perhaps if the 7-15 kHz region were quieter or only as loud, and the ramp went very steep from about 16 kHz up to 18 kHz, it would be perceived as quieter. I'm sure Garf has some good computational techniques for creating this sort of shape, and a lot could be learned from the FB2K source code of recent versions.

den:
Yes, I think that if some of my ideas worked, you'd have the choice of either lower bitrate with the same noise, or same bitrate with no audible noise (how extreme are these extreme volumes anyway - are we talking painful?), and in fact, with some of the more complicated minimum-masking ideas (on a band-by-band basis and achievable without FFTs), the bitrate might become variable but chosen to just keep the noise inaudible, resulting in a lower average bitrate but sometimes a higher instantaneous bitrate. (Actually, I think wavpack lossy is already slightly VBR, because IIRC, it adjusts the noise floor according to the average power)

eltoder:
The masking idea will actually do what you suggest. It will cause the loudest subband to naturally mask itself more than adjacent bands. Unlike codecs aiming for the maximum compression, I propose treating it like noise (low tonality), so the masking effect is never falsely assumed to be greater than it is because of incorrect tonality assumptions or measurements (i.e. assume worst case, which I believe is noise). This would also achieve the "bigger errors for bigger-valued samples" effect you mention, but it goes further by allowing even more noise in frequency bands where there's a lot of energy and a fair bit more in adjacent bands to those with high energy, but much less error in bands which are well separated from the loudest signals. I'd envisage it would use masking far less aggressively than traditional codecs, though. In addition I envisage strong noise shaping (as above) to at all times allow extra large errors in the >16 kHz band where they're least noticeable.

DickD
Title: What is the current status of wavpack 4?
Post by: den on 2003-05-28 02:20:38
DickD:
Depends on the sample.

Two tracks I posted about before are quite different from each other in this regard.

Mandela Day - Simple Minds, shows subtle background hiss when encoded at 320 kbits, which can be picked up at moderate to high levels of volume, ie if I was sitting and taking in the music for the sole sake of enjoying it. Not painful volumes, but high enough so that you can only hear the music. Much more obvious at the beginning of the track where there is some space in the recording. Once the song gets going, it is very hard to pick. The original recording on CD is very clean, so any added noise becomes obvious. I also have problems with Vorbis with this track (added background noise). With Wavpack it is not distracting, but it is there.

Blue Monday - New Order, I can't pick it unless I crank it up to where it gets painful. This is an older recording than above, originally recorded on analogue equipment, using older synthesizers/drum machines/samplers and tape loops, so there is a fair amount of background hiss in the original recording anyway. I suspect this masks the added noise.

With most stuff, I can pick it up at 320 kbits in the gaps within the music, if the volume is near painful. I usually have to be intentionally listening for it to notice it, but if I choose to, I can hear it nearly every time (such as when purposely conducting ABX testing.)

Den.
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-28 10:32:47
Thanks for the loudness info, Den. I guess the noise threshold needs to be adaptive to be transparent at normal (non-painful) listening levels, though noise shaping will help reduce its audibility even in the quiet sections. (I've had the same experience listening to very quiet tracks or quiet intros on cassette with Dolby B turned on - i.e. normally the noise is unnoticeable, but it's obvious on quiet bits).

If the noise is always inaudible, adaptive noise thresholding is OK (as with masking in good lossy codecs), but if it is audible, you could get noise pumping or noise modulation effects if it is allowed to vary fast enough. This wouldn't be a problem if the sub-band masking idea were used, and used conservatively enough to remain transparent.

My anecdotal experience tells me that lossless codecs get better compression on quiet music (e.g. music with low ReplayGain volume measurements) than on louder music in the main. My guess is that Wavpack lossy with subband masking (or a full-spectrum adaptive noise threshold) would tend to be less variable among music of different loudness (though the residual file, to restore the lossless PCM would still be smaller for the quieter music).
Title: What is the current status of wavpack 4?
Post by: eltoder on 2003-05-28 16:38:05
@DickD:
My suggestion, being very simple, allows the easy implementation and fast computation that wavpack is famous for. Any kind of masking, while having greater potential, requires frequency analysis (and sub-band decomposition is also a frequency analysis). This involves some common problems (e.g. it operates on blocks of data, so it needs some kind of "transients handling" when the data changes too fast within the block).

And it allows noise shaping on top of it, of course.

-Eugene
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-28 19:57:21
Good point, Eugene.

However, one thing about convolution-based (finite impulse response = FIR) sub-band splitting and recombining (the sort I think is used in many EQ plugins) is that the audio stream is not divided into blocks, transients are perfectly reconstructed and there are no boundary problems.

However, the moment you start making decisions based on a chunk of data (e.g. a frequency analysis on a block of 1152 samples, say), then you do have boundary problems, as you point out, and need to detect transients if making decisions about the noise threshold for the whole block, because the ear can hear increased noise as an unmasked pre-echo if it's more than about 5 ms before the loud transient (the exact numbers may be wrong, but that's my recollection).

So, it may be quite possible to ensure that the length of the convolution FIR filter and the decision window are less than 5 ms, so that this isn't a problem. This is equivalent to 220 samples. For computation speed, the shorter the FIR filter the better, and I'd envisage that a filter of about 110 or 57 samples (~200 Hz or ~400 Hz roll-off between bands by a rough estimate that might be a factor of two out) would be adequate to estimate a worst-case allowable noise by assuming masking by noise, not tonal signals. It's certainly faster than doing Fourier or Cosine transforms as even the very quick Musepack encoder does (just for analysis).
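(Rough numbers, using the common rule of thumb that a windowed FIR's transition width is on the order of the sample rate divided by the filter length - the exact constant depends on the window, so treat these as order-of-magnitude only:)

Code:
FS = 44100                            # sample rate, Hz
PRE_ECHO_MS = 5.0                     # roughly where pre-echo starts to matter
print(int(FS * PRE_ECHO_MS / 1000), "samples in 5 ms")     # ~220
for taps in (220, 110, 57):
    print(taps, "taps:", round(FS / taps), "Hz transition,",
          round(1000.0 * taps / FS, 1), "ms long")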

Once you calculate the masking thresholds in different bands for a whole stream of successive time periods (partly overlapping) for which you've calculated the RMS amplitude, you could choose to use the info in a variety of ways, depending on the complexity of implementing it.

Here are a few thoughts:

1. Over all the spectral bands, one of them will have the lowest masking threshold. You could use this as a simple flat quantization noise floor that's safe over all frequencies. (I use quantization to mean allowable error relative to the lossless file)

1a. You could scale a computationally fast noise-shaping form of quantization error so it's below the higher of the masking threshold and the ATH over all frequencies.

2. You could have individual predictors for each sub-band and allow the prediction error to reach the precise masking threshold calculated in that band. (This is more bitrate efficient, but it might be harder to implement, especially if there's any risk of having errors from the overlapping frequency region of two adjacent sub-bands combining to exceed the calculated masking - e.g. band 5 and band 6 both contain 50% of the 2500 Hz frequency content, say, and both allow an error of the same polarity, which when added during band recombination results in a 6 dB greater error, so might exceed the masking threshold). A simple approach is to subtract a 6 dB guard band from the masking threshold for extra conservatism. After all we want safety first, not a really low bitrate.

3. With either of the above approaches you have to consider that the time resolution of your masking analysis partly overlaps with the next block. I guess the safest solution is to use 50% overlapping blocks and if the masking in the next block or the preceding block was lower for any band, you set your allowable quantization noise relative to the lower mask, so that if you switch the allowable quantization error suddenly the change in noise won't be audible.

I.e. I'm not sure it's necessary to divide the audio into blocks (e.g. the equivalent of lapping transform windows, which overlap in time in a manner that permits lossless reconstruction) if you're cautious about switching the quantization error noise down a block early and up a block late so the transition to and from louder noise is never audible.

I guess the timing uncertainty between blocks in the analysis part, is rather like the frequency uncertainty between analysis sub-bands in point 2.

4. With a FIR filter, you do generate a longer file and have to make assumptions about what precedes the first sample and follows the last. However, if you assume zeroes, which is normal, there's no problem because any "pop" from finishing on a non-zero sample value will just cause the masking threshold to go up and use a few more bits to make it accurate enough up to the end of the file. You can then crop back in the decoder to always discard the first n samples and the last n samples.

5. You have the choice of:
..a. encoding each sub-band losslessly with a separate predictor (and allowing prediction errors within the masking threshold to remain in each sub-band). The decoder then decodes each sub-band and adds all the corresponding sample values from each sub-band together afterwards to generate the full bandwidth signal. Also, I'm not sure if you could then have the gradual transition from lossy to lossless while still achieving the same lossless compression ratios.
..b. encoding the whole bandwidth as one stream with one predictor (like any lossless encoder), then removing terms in some way I don't quite understand, until the prediction error in each band is just below the masking threshold (which presumably requires you to measure the prediction error distortion at least loosely as a function of frequency and compare it to the masking at that approximate frequency).

I don't really know enough about how a lossless codec works in detail to be more precise and choose between these or use the right terminology.

I certainly don't know the detail of how Wavpack lossy works right now. For example, does it change the allowable error with time, or is it simply constant?

It seems likely that 5a is promising as the sub-bands naturally restrict the range of possible predictors to those with the appropriate frequencies, making each of them more compressible, I'd presume, from an information theory sort of standpoint.

Hmm, I think my knowledge is starting to strain.

Perhaps it is possible to do some simple tests, for example splitting the spectrum into just a few bands and estimating a very crude but very conservative masking threshold from the RMS amplitude of each band.

If we had say four bands created by complementary FIR filters, we could create 4 WAV files and test that they'd recombine losslessly by adding the sample values together.

We could then try compressing each with the current Wavpack lossless to see how the total size of those four compares to the size of the one file losslessly packed.

We could then start trying the current Wavpack lossy with different amounts of allowable error for each subband, and seeing how big the residual file is for each, and again check that recombination by adding makes a file that sounds good (we might manually estimate some adequate masking levels for a test sample or be able to guess it from data provided by a psychoacoustic encoder tool such as lamex)

If one of the bands was simply 19 kHz+ we could losslessly compress it and keep that as one of the residual files that isn't kept with the lossy set of files but is only used for restoring the lossless and see what that does to the overall size of the lossy bundle.

Apart from creating sub-bands this might require only minimal coding effort and it would demonstrate whether the predictors used in Wavpack would work anywhere near as efficiently if the audio is first split into sub-bands.
Title: What is the current status of wavpack 4?
Post by: ProtectYaNeck36 on 2003-05-28 20:05:53
If you place the noise in the 3-5 kHz region, don't you risk pushing the music into audible distortion, since I have read that this region of the audio spectrum is the most susceptible to this?
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-29 09:42:46
Quote
If you place the noise in the 3-5 kHz region, don't you risk pushing the music into audible distortion, since I have read that this region of the audio spectrum is the most susceptible to this?

If you're asking about my post, I'm only talking about allowing noise up to the masking threshold at any freq (including 3-5 kHz), which by definition is inaudible because it's masked by a louder sound.

If it's unmasked by other sounds, then the noise floor at the Absolute Threshold of Hearing (ATH) as used in Strong ATH Noise Shaping dither, is actually exceptionally low at 3-5 kHz (and a little beyond), which is the kind of region where the ear is most sensitive (e.g. babies crying are somewhere in the sensitive region). But of course, for computer audio, you have uncalibrated loudness and variable volume controls, so ATH will be somewhat dynamic as people adjust their volume controls.

Actually, an ATH relative to ReplayGain loudness might be plausible given that people adjust the volume to control the perceived loudness, though sometimes they'll turn it up during a quiet passage to hear the detail, so perhaps a perceived loudness (perhaps by a similar method to RG) of just the surrounding 5 seconds or so (rather than the whole track) might be a better bet for dynamic noise floor adjustment.
Title: What is the current status of wavpack 4?
Post by: eltoder on 2003-05-29 13:25:00
Quote
However, one thing about convolution-based (finite impulse response = FIR) sub-band splitting and recombining (the sort I think is used in many EQ plugins) is that the audio stream is not divided into blocks, transients are perfectly reconstructed and there are no boundary problems.

How do you imagine that? If you say that your signal has some frequency, this means that you have observed the signal for at least a few periods. This is surely "a block". You can't measure the frequency "at a point"; you'll always have some finite frequency/time resolution.

Moreover, when you use masking, you want to use the calculated threshold in each band for at least a few samples, right? So you definitely need blocks. (at least 16 bands * 16 samples = 256 samples, but 32x32=1024 seems to be more reasonable)

Codec needs blocks anyway, to have good seeking (and usually decoding) speed, and reduce side-information needed for forward adaptation.

About 5a: I actually tried this about a month ago. Maybe it will work well, but it definitely requires predictors different from the ones used for the full band (for the higher bands, very different; for the low-middle ones, I don't know what).
But, if I understood bryant's post right, he does not want to implement anything like that.

-Eugene
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-29 14:01:57
Eugene: What I meant about time resolution was that for simply splitting and recombining, the reconstruction is sample-perfect, so the process of splitting the bands is perfect with no time resolution problem in the output file.

Sure, when it comes to analysis, you do lose time resolution and you have to use blocks (although you can use a sort of rolling average if you want, which isn't quite the same as a block), which is why I'm suggesting a short FIR filter for band splitting and probably only a few bands to avoid the pre-echo audible noise increase problem and remain conservative in our masking assumptions (by keeping block length short, so that the distinction between transients and noise isn't a problem). Also, by only making the most conservative assumptions, and always choosing the lower of the possible masking thresholds, we should avoid introducing errors that are too large.

I think 16 bands might be un-necessarily high for obtaining a moderately improved Wavpack lossy type of codec with pretty rough masking calculations that are only assured to be OK in the worst case, but aren't aiming for optimally high amounts of distortion/prediction error to squeeze out every last bit of bitrate.

For example, maybe the bands could be roughly chosen to go with the shape of the ATH curve (this assumes 44.1 kHz sampling rate), for example:
0-1250 Hz
1250-6000 Hz
6000-12000 Hz
12000-15000 Hz
15000-18500 Hz
18500-22050 Hz

So you might get away with about 6 bands. The last band could be discarded for the lowest bitrate lossy, as it's mostly inaudible in real music, but the prediction error could be pretty high and remain inaudible, so it might not take too many bits anyway.

Thanks for the info about your trying 5a previously and the predictors becoming less efficient. This may be the sort of stumbling block that makes this approach simply not worthwhile to pursue because it's a big effort to change the encoder like that, so I'm glad of your information.

Options 1 and 1a might then be viable for a smaller improvement in avoiding the audible noise that Den reported at normal playing volume in a quiet intro.

I'm still not sure how even the current noise-shaping (sloped noise) is coded into a coder that doesn't look too far ahead. Perhaps it's looking at the amount of allowable error in the differential of the waveform (i.e. its slope) rather than in the amplitude; that would explain how the noise isn't spectrally flat yet is calculated on a near-instantaneous basis (with no blocks and very little look-ahead), so that the error can be kept within the allowable limits without knowing anything about its frequency distribution.

If it's something as simple as the error in the first differential that's needed for implementing shaped noise, then something pretty clever would be needed to make the allowed error follow the ATH curve or the Fletcher-Munson curves more closely.
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-05-30 18:19:04
While making a post that mentions Wavpack lossy (http://www.hydrogenaudio.org/forums/index.php?s=&act=ST&f=11&t=9836&hl=&#entry99364) in the Musepack forum, I think I got a better feeling for how Wavpack works.

(Please correct me if I'm wrong, bryant)

1. Generate a single predictor based on numerous preceding samples. This guesses the sample value for the current sample. (A later post gave a simple example with a linear prediction)

2. Work out the difference between actual sample and predictor and store this "error", allowing perfect reconstruction.

3. Pack the files as well as possible (e.g. Huffman coding, like Zip) to make use of redundancies.

With lossy hybrid mode, after the prediction is made, you can choose to knock the least significant few bits off the error. Of course, it's possible that error can build up, since each sample is predicted from previous ones and those are now slightly incorrect, so it must be a little more complicated than that, presumably ensuring that the net error is close to zero over a reasonable time period. Reducing the error term to fewer bits creates more redundancy and makes the packing more efficient, so reducing the bitrate. The removed bits are compressed separately in the correction file, to enable perfect reconstruction once again.
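A toy Python sketch of that flow, just to check my own understanding (this is my mental model, not WavPack's actual algorithm; the predictor here is the simplest two-sample linear extrapolation and the quantization step is fixed for clarity):

Code:
def encode_hybrid(samples, step=8):
    """Toy hybrid encoder: predict, quantize the residual coarsely for the
    lossy stream, and keep the part thrown away as the correction stream."""
    lossy, correction, recon = [], [], [0, 0]      # recon seeds the predictor
    for s in samples:
        pred = 2 * recon[-1] - recon[-2]           # simple linear extrapolation
        err = s - pred
        coarse = step * round(err / step)          # what the lossy stream keeps
        lossy.append(coarse)
        correction.append(s - (pred + coarse))     # restores losslessness later
        recon.append(pred + coarse)                # predict from *lossy* history
    return lossy, correction

def decode(lossy, correction=None):
    """Lossy decode; add the correction stream to get the original back exactly."""
    recon, out = [0, 0], []
    for i, coarse in enumerate(lossy):
        pred = 2 * recon[-1] - recon[-2]
        approx = pred + coarse                     # the lossy sample value
        recon.append(approx)                       # same lossy path as the encoder
        out.append(approx + (correction[i] if correction is not None else 0))
    return out

Basing the prediction on the lossy history rather than the exact samples is, I suspect, how the error build-up I mentioned is avoided: encoder and decoder then stay in step by construction.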

Now, I can see that there's only one predictor for each sample, and the simple mathematical error (subtraction) is all that remains to be stored.

So, I can imagine how it's quite possible to look at the error terms for a series of samples and round them in such a way that the correction file contains noise that follows the shape of the ATH curve pretty well but has an average amplitude of zero over a certain timescale, causing no DC shift.

From the udial.ape thread (test your soundcard for clipping) (http://www.hydrogenaudio.org/forums/index.php?s=&act=ST&f=1&t=9772&hl=), where someone damaged his tweeters, one ought to be careful about putting too much ultrasonic noise content in files (but that was full-scale ultrasonic content against a quiet audible tone, causing him to turn the volume up!).

Soft ATH noise shaping might have lower ultrasonic content, for example.

Then again, the decoder could be required to apply a lowpass filter to protect the user if we're pushing to very low bitrates with lots of error in the ultrasonic range (but not so much as to cause regular clipping). The filter would be turned off, of course, if the correction file is being used to restore lossless playback, but in lower bitrate lossy modes a flag in the lossy stream could indicate the attenuation required for frequencies above, say, 19-20 kHz. This would safeguard against this potential tweeter risk without breaking the predictor.

It is also plausible to shape the correction file noise in different ways, e.g. to follow some calculated frequency dependent masking threshold based on simple psychoacoustics, but this would require some frequency analysis. Maybe the analysis could be greatly simplified from full blown lossy coders (e.g. using RMS amplitude of sub-bands instead of using transforms) and the masking threshold could be ultraconservative, but this is a lot more work than a consistent noise shape.

It doesn't seem (on limited evidence) that splitting the signal into reconstructable bands before running the predictor is viable. It might be possible to use such a method to shape the noise, however.

It does seem plausible to make a rough measurement of the loudness (e.g. RMS value of the signal is very easy) and modulate the allowed noise that way, with no consideration of the frequency-dependence of masking, simply the loudness. That might make a reasonable easy "standard" lossy mode, which remains audibly transparent for the vast majority of samples at non-painful volume.

Clipping might be a plausible concern if strong noise shaping has a high enough amplitude in the high frequencies.

Just some further thoughts on the subject, which might clarify my contribution to this thread down to stuff that's reasonably viable to implement (rather than some of my sub-band-with-separate-predictor ideas, which don't look too promising).
Title: What is the current status of wavpack 4?
Post by: sony666 on 2003-05-31 22:06:40
Cheers bryant B)
I know an open source project can become quite a pressure once it starts becoming popular - you should read the flames we receive when a new eMule version screws something up and ppl lose their partial Anime XXX downloads.

Best wishes to your family, stay away from public transportation and I hope you find a job (one that allows some free time for wavpack though:))
Title: What is the current status of wavpack 4?
Post by: bryant on 2003-06-03 04:08:48
sony666:
Yeah, I have thought about what kind of e-mail I might receive if a WavPack bug trashed all of somebody's original music. In fact, I could even imagine a rude knock at the door! Somehow I don't think that "Dude, didn't you read the disclaimer!?" would make them very happy. 

DickD:
Your description of WavPack's lossless mode is pretty much on the mark. The predictor first makes a prediction based on some number of previous samples. In the case of the "high quality" mode this is a polynomial applied to the last 16 samples; however, because the polynomial terms adapt to the changing audio, in a sense it's really looking at hundreds of previous samples. The difference between the prediction and the actual sample is called the error (or residual) and this is stored using Rice Coding (which is a kind of Huffman code for Gaussian numbers).

In the hybrid mode the user's kbps number is converted to a number of bits per sample (for example 320 kbps = 3.63 bits/sample) and we only store the residual with as much resolution as we can given that average number of bits. So, if the error is running with an average magnitude of 100 and we are allowed 3.63 bits per sample, then we can store the errors with an accuracy of about +/-20. Note that if a big error comes along we use more bits to store that sample while samples close to zero require fewer bits, but every sample is stored with the same accuracy and we achieve the average bitrate. If a transient comes along and the average residual value goes up suddenly, we will store the first few with a lot of extra bits to maintain the accuracy, but then the exponentially lagging average will start going up and we will start storing with less and less accuracy until we hit the target bitrate again. When the average is falling (after the transient) we will be storing fewer bits because the average will be high (it always lags) and this will balance the extra bits we stored at the beginning. It's actually pretty interesting how it can maintain the average bitrate to within about 1% over the long term even though it's completely open-loop (no feedback).

So, the mode is essentially CBR (with a little "slop") and the only factors affecting the noise level are the target bitrate and the accuracy of the predictor. The reason that the "high" mode works so much better than the default mode on the Furious sample is simply because the predictor works better on the high frequency signals. In fact, the predictor is the only difference between the modes.

Which brings me to a clarification on one of your points. The noise really shouldn't be any more audible in quiet parts than in louder parts because the noise is always scaled to the signal (lower the level 6 dB and the added noise drops 6 dB) and at very low levels the coder will actually go lossless if it has enough bits to do so. What I think is that the noise is more audible when there's less going on in the music (more "air" around the instruments) and the noise is less audible when there's stuff going on all up and down the spectrum. In this way it really works the opposite from conventional codecs which have the worst time with complex music but shine with simple stuff because they can pour all bits into the "active" subbands. Perhaps den can comment on this as well.

Without filtering, the quantization noise added is perfectly flat in frequency and no dithering is required because the quantization size is always small compared to the residual size. I think it would be easy to implement the ATH noise shaping curves and it would be interesting to see if they lower the audibility of the noise. I don't think that ultrasonic noise would be a problem because it would always be much lower in amplitude than the signal (unless the predictor failed, I guess). I also agree with you that it would be possible to use simple subband level checking to both determine the optimum noise shaping algorithm and to implement a VBR mode to achieve a lower average bitrate for the same quality. The advantage of all this stuff is that it can be done solely on the encode side and therefore does not burden the decode side with aggressive CPU usage and can be implemented after the spec is complete.

I have also thought some about actually using subband coding directly like you describe. One issue here is that to be efficient in lossless mode you cannot increase the number of samples that need to be encoded (this is not an issue in EQ where you sum up everything before you're done). The type of filters that I am familiar with that could work this way are the symmetrical type that split the band exactly in half: frequencies below a quarter of the sampling rate go into the lower part and frequencies between a quarter and half the sampling rate go into the upper part. Then you throw out every other sample in both bands (because half are redundant) and you have the original number of samples. You can do this as many times as you like and generate 1-octave-wide bands all the way down to 20 Hz (or even 1 Hz), although I think that probably 4 bands would be the most that would be useful.
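The simplest toy of that "split in half, drop every other sample, recurse" structure is the integer Haar (S-transform) step - much leakier between the bands than the long symmetric filters I actually mean, but it shows the bookkeeping and it is exactly invertible in integer arithmetic. A quick Python sketch:

Code:
def haar_split(x):
    """One level of an integer Haar (S-transform) split on a list of ints.
    'low' is a rounded average (roughly the bottom half of the spectrum),
    'high' is the difference (roughly the top half). Sample count is kept."""
    if len(x) % 2:
        x = x + [x[-1]]        # pad to an even length (a real codec would record this)
    evens, odds = x[0::2], x[1::2]
    low  = [(a + b) >> 1 for a, b in zip(evens, odds)]   # floor((a + b) / 2)
    high = [a - b        for a, b in zip(evens, odds)]
    return low, high

def haar_merge(low, high):
    """Exact inverse of haar_split."""
    out = []
    for s, d in zip(low, high):
        b = s - (d >> 1)       # undoes the floor average exactly
        a = d + b
        out += [a, b]
    return out

Applying haar_split again to the low half gives the next octave down, and merging in reverse order undoes the whole thing exactly.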

This would make it easier to move the quantization noise around where it couldn't be heard. For example, if you had little or no signal over 11 kHz you would have very little noise up there, and I think this is impossible to achieve without breaking into subbands. (I considered pre-emphasis in this case, but am not sure if it would work).

On the other hand, I have the concern that even though the filters can sum to recreate the signal losslessly, what happens when you encode a signal at the crossover point and have different quantization levels on either side of the divide? I am afraid that all of the problems of subband coding will come out and I'll lose the "characterless" nature of the noise that makes it so easy to live with.

At some point I would like to experiment with subband coding for my own curiosity, but I definitely don't want to start down the path of creating an inferior MPC! And my real interest is lossy encoding of high-resolution audio (like 24/96) and this is to some extent directly opposed to psychoacoustic modeling. After all, according to the current models with which I am familiar, the first step would be to downsample to 44.1!

Anyway, thanks for the input and I hope this clears things up a little and gives you some more ideas...
Title: What is the current status of wavpack 4?
Post by: den on 2003-06-03 04:50:25
@sony666
Quote
I know an open source project can become quite a pressure once it starts becoming popular - you should read the flames we receive when a new eMule version screws something up and ppl lose their partial Anime XXX downloads.

Best wishes to your family, stay away from public transportation and I hope you find a job (one that allows some free time for wavpack though:))


Certainly Wavpack is getting a bit more attention, but hopefully David doesn't have anything too much to worry about. His cat's appearance is getting quite well known though, so it may have to hide from public view, should any problems arise... 

@bryant
Quote
The noise really shouldn't be any more audible in quiet parts than in louder parts because the noise is always scaled to the signal (lower the level 6 dB and the added noise drops 6 dB) and at very low levels the coder will actually go lossless if it has enough bits to do so. What I think is that the noise is more audible when there's less going on in the music (more "air" around the instruments) and the noise is less audible when there's stuff going on all up and down the spectrum. In this way it really works the opposite from conventional codecs which have the worst time with complex music but shine with simple stuff because they can pour all bits into the "active" subbands. Perhaps den can comment on this as well.


I agree with the above. It's only in quiet sections with solo instruments that the noise becomes noticeable, particularly in the gaps between the individual notes being played. Once you get a few more instruments and/or some vocals on board, the noise gets covered and I can't usually hear it.

The hiss reminds me of that heard from cassette recordings with the Dolby NR switched off, just not as obvious. If you recall back to those days, the hiss would stand out at the beginning of a track, and perhaps appear in quiet solo sections mid track, but generally disappear the rest of the time, particularly with "busy" genres of music. Wavpack is similar, except that I find the hiss much less annoying than the cassette example with my music, even at the lowest available bit rate setting.

If I hadn't used other formats previously, and didn't ABX Wavpack lossy against the original, I wouldn't have even picked it up, as it is virtually identical to background circuit hiss you get from many audio devices just by turning up the volume without any signal playing.

Den.
Title: What is the current status of wavpack 4?
Post by: bryant on 2003-06-03 05:22:38
Quote
Certainly Wavpack is getting a bit more attention, but hopefully David doesn't have anything too much to worry about. His cat's appearance is getting quite well known though, so it may have to hide from public view, should any problems arise... 

Yeah, especially since she runs the QA department! 
Title: What is the current status of wavpack 4?
Post by: eltoder on 2003-06-03 07:10:41
David,

From what I understood, the noise is not exactly scaled to the signal, but to the average residual (and how many samples do you use for calculating the average?). This should give exactly the effect Den hears (more noise in quiet parts right after some louder parts). I'm not an expert, but I'd suggest scaling the error to the actual (or a few-sample average) sample magnitude and using non-uniform quantization (use more bits for smaller values).
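Something like this mu-law sketch, for example (just an illustration of the non-uniform idea, with arbitrary numbers):

Code:
import math

MU = 255.0      # arbitrary here; the classic telephone companding value

def mulaw_quantize(err, step=16.0, max_abs=32768.0):
    """Compress the residual with a mu-law curve, quantize uniformly in the
    compressed domain, then expand back, so small residuals get finer
    effective steps than large ones."""
    x = max(-1.0, min(1.0, err / max_abs))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)        # compress
    yq = round(y * max_abs / step) * step / max_abs                        # uniform step
    xq = math.copysign(math.expm1(abs(yq) * math.log1p(MU)) / MU, yq)      # expand
    return xq * max_abs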

-Eugene
Title: What is the current status of wavpack 4?
Post by: bryant on 2003-06-03 18:37:41
Eugene:
There are a couple reasons that I use the residual level rather than the signal level in determining the quantization level. The first is that I wanted to create something that was close to CBR, and this can only be done using the residual. Second, I found that the residual level is more closely tied to the masking properties of the signal (at least in most cases). For example, low frequencies and regular tones (as opposed to noise) are much worse at masking quantization noise, and these generate small residuals because they're more predictable. On the other hand, broadband noisy signals are very good at masking quantization noise, and these generate large residuals. However, you did give me an idea that by simply looking for cases where the original signal level was high compared to the residual, I might be able to detect those cases (like Furious) where the predictor is not working well.

Once I decided to use the residual for the quantization level, I needed to put in the averaging because there are times when the spectral characteristics of the signal suddenly change and the predictor takes some time to adjust. During these times the residual level spikes, and this was actually one of the problems with the old WavPack lossy mode. The time constant of the averaging is less than 6 ms though, so I don't think it could be contributing to more audible noise after transients (it falls faster than most transients and so only lags by a tiny amount).

As for the non-uniform quantization, the old WavPack lossy mode also had that (because it seemed more intuitive to me at the time). What I discovered was that because the perceived noise level is based on the RMS value of the errors, a few big values really drive up the average. Even though it may not be intuitive, uniform quantization is the most efficient way of storing these values. In fact, this is why ADPCM is at such a disadvantage here. By restricting the coding of each sample to a fixed number of bits, you get a much higher RMS error level than with a variable bit scheme, and you also get distortion. If you listen to the error generated by ADPCM you can clearly hear the music playing (and this, of course, means distortion). Listen to the difference in WavPack lossy and it's pure noise.
Title: What is the current status of wavpack 4?
Post by: DickD on 2003-06-03 19:35:22
Thanks for the detailed response, David.

I'd read about Rice Coding before, probably on the Monkey's Audio site. Having pretty much Gaussian distributions of residuals is reassuring when you're trying to treat them in a noiselike manner.

A 16-sample duration for the polynomial predictor is useful info, as is the info that the polynomial terms of the predictor adapt. Presumably the terms are stored somewhere every so often and it's a case of some sort of best-fit algorithm (e.g. least-squares) over a reasonable timescale.

For lossless, I can see that the exact previous 16 samples are known. For lossy, I presume that despite the inaccuracy of the previous 16 samples, the inaccuracies cancel out pretty well in aggregate so that the predictor of the next sample is pretty close to what it would have been had the previous 16 samples been stored losslessly, so the decoder is still pretty accurate. (Or perhaps, and more likely, you base the predictor on the previous 16 lossy samples in the first place so the reconstruction is sure to be as accurate as possible). I also presume that an exact sample value may be stored on a periodic basis to enable seeking or so that data corruption doesn't cause loss of all audio after the error. (E.g. FLAC usually does this in 4608-sample or 1152-sample blocks)
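
For what it's worth, the "predict from the previous lossy samples" idea would look something like this generic DPCM-style loop (just a sketch; predict() and the fixed step are stand-ins, not WavPack's actual predictor):

Code:
# Generic closed-loop sketch (not WavPack's actual code).  Both sides
# predict from the *reconstructed* samples, so the decoder never drifts.
def encode(samples, step, predict, order=16):
    history, codes = [0] * order, []
    for s in samples:
        p = predict(history)                       # prediction from lossy history
        code = round((s - p) / step)               # quantized residual to send
        codes.append(code)
        history = history[1:] + [p + code * step]  # exactly what the decoder sees
    return codes

def decode(codes, step, predict, order=16):
    history, out = [0] * order, []
    for code in codes:
        value = predict(history) + code * step
        out.append(value)
        history = history[1:] + [value]            # identical history on both sides
    return out

# e.g. a crude second-order polynomial predictor as a stand-in:
predict = lambda h: 2 * h[-1] - h[-2]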

Regarding this bit:
Quote
In the hybrid mode the user's kbps number is converted to a number of bits per sample (for example 320 kbps = 3.63 bits/sample) and we only store the residual with as much resolution as we can given that average number of bits. So, if the error is running with an average magnitude of 100 and we are allowed 3.63 bits per sample, then we can store the errors with an accuracy of about +/-20. Note that if a big error comes along we use more bits to store that sample while samples close to zero require fewer bits, but every sample is stored with the same accuracy and we achieve the average bitrate. If a transient comes along and the average residual value goes up suddenly, we will store the first few with a lot of extra bits to maintain the accuracy, but then the exponentially lagging average will start going up and we will start storing with less and less accuracy until we hit the target bitrate again. When the average is falling (after the transient) we will be storing fewer bits because the average will be high (it always lags) and this will balance the extra bits we stored at the beginning. It's actually pretty interesting how it can maintain the average bitrate to within about 1% over the long term even though it's completely open-loop (no feedback).
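
(Quick sanity check of the 3.63 figure, assuming 44.1 kHz stereo: 320000 bits/s divided by 2 × 44100 samples/s is about 3.63 bits per sample.)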


Right, so you measure the average magnitude of the prediction error (either by its average absolute magnitude or by its RMS value) with a rolling average over a longish period (e.g. tens to hundreds of samples), then calculate how much error you'd typically allow for the next sample given the typical efficiency of Rice coding.

I see, so it's a feed-forward control mechanism based on recent history. Actually, isn't that the same as negative feedback?
When the transient hits, you've still got the low residuals of the preceding non-transient sound forcing you to retain a small error in that residual and employ a long Rice code. If the predictor doesn't get better, the average residual (RMS or absolute) increases, so you naturally begin allowing greater error shortly afterwards, and this greater error naturally starts requiring fewer bits once the predictor adapts to the new sound (i.e. after at least 16 samples).

In a sense this takes advantage of temporal masking (http://ccrma-www.stanford.edu/~bosse/proj/node21.html#SECTION00043000000000000000) - specifically post-masking, albeit accidentally and with no calculated masking threshold. The loud transient (burst of loud sound) causes the ear to become less sensitive to noise/distortion for a period, and it happens that the codec also becomes noisier shortly after the transient because the transient didn't meet the predictions of the predictor.

The beauty is in the lack of pre-echo distortion. The error before the transient doesn't increase. In contrast, frame-based lossy codecs that use FFT or DCT for analysis spread the error/distortion across the analysis frame if a transient occurs within the frame and they decide it will mask the whole frame. In some cases, where the time resolution isn't enough, this spreads the error back to before the onset of pre-masking (the short interval in which the ear doesn't notice increased distortion just before a transient). The effect is of sudden hiss occurring before the main sound of the transient - and where the transient is noise-like too, it sounds like an echo occurring before the main sound - pre-echo.

With knowledge of temporal post-masking, e.g. from the link in the last-but-one paragraph, it may be possible to make use of this effect more smartly (e.g. in VBR mode) by deliberately allowing greater error after transients that decays away following some sort of known post-masking decay threshold. It does seem that the length of the masker burst affects the post-masking effect, with a 200 ms burst providing longer-lived masking than a 5 ms burst, so for conservatism the 5 ms burst's post-masking profile may be the one to aim for, even though more bits could be saved if one could work out the difference.
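
Very roughly, such a scheme might look like this (purely back-of-envelope; the 50 ms span, the 4x boost and the exponential shape are placeholders, not measured post-masking data):

Code:
import math

SAMPLE_RATE = 44100
POST_MASK_SPAN = int(0.050 * SAMPLE_RATE)   # assumed ~50 ms of useful post-masking

def allowed_error(base_error, samples_since_transient):
    # Base allowance comes from the usual residual average; after a detected
    # transient we temporarily permit more error, decaying back to the base.
    if samples_since_transient >= POST_MASK_SPAN:
        return base_error
    boost = 4.0 * math.exp(-5.0 * samples_since_transient / POST_MASK_SPAN)
    return base_error * (1.0 + boost)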

In fact, a very sudden cessation of a loud sound would also cause a transient rise in predictor residual, so it would also require more bits. This coder wouldn't distinguish it from a transient onset of sound, so it's important that the allowed error reduces within a sufficiently short time. That time currently depends on the decay of the trailing average of the previous so-many residuals (either RMS or absolute).

Imagine a loud sound with typical prediction residuals of 500 (and allowed error of about 100) that suddenly stops, giving way to a quieter sound (perhaps 20 dB down).
16 samples after the cessation, the predictor ought to be getting close to predicting the quieter sound most of the time. Let's say the typical residuals after these 16 samples come to about 50. Eventually the allowed error should come down to about 10. The allowed error will gradually come down from 100 towards 10 as the moving average residual comes down from 500 to 50. The allowed error will actually remain higher than the typical residual of 50 until roughly half of the pre-cessation sounds have been lost from the moving average window (assuming it's a rectangular, unweighted average, not one with a time decay). Actually, the extra-large residual at the sudden cessation transient will push the average up higher still for longer.
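
A quick toy run-through of that scenario (the window length and the residual-to-error ratio of 5 are invented, just to see the shape of the decay):

Code:
WINDOW = 256                          # assumed rectangular window length
window = [500.0] * WINDOW             # residuals from the loud passage
noisy_samples = 0
for _ in range(4 * WINDOW):
    window = window[1:] + [50.0]      # quiet-passage residuals arrive
    allowed = (sum(window) / WINDOW) / 5
    if allowed > 50:                  # allowed error still above the quiet residual
        noisy_samples += 1
print(noisy_samples)                  # ~142, i.e. a bit over half the window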

So, the audibility of noise after cessation of loud sounds will depend on the speed of decay, which depends on the length of the moving average window for the residual. If the decay curve happens to cross the post-masking threshold, the noise will audibly increase.

I wonder if this is what's happening in Den's example of Mandela Day by Simple Minds. Perhaps a shorter averaging time, or better still some sort of weighting curve on the average that decays the further back in time you go, would help reduce the audibility of the noise.

I guess with the right tools it's possible to analyze the residuals in that sample, the curve of the average residual and the estimated error being allowed after cessation transients, assuming there are some. (I'm trying to think if I can remember the character of the track from an old cassette I haven't heard in years)

Ah, I just read David's post, which says the decay time is about 6 ms, so this does seem small enough not to be a problem (unless of course the quiet sound is hard to predict). (Or even crazy ultrasound, like the tweeter-frying udial.ape test sample.)

DickD
Title: What is the current status of wavpack 4?
Post by: eltoder on 2003-06-04 06:33:33
Thanks for the response, David.

The difference signal from the output of most lossy codecs sounds like music, so I'm not quite sure that it's a disadvantage.

I just think that if you assign more bits to lower residual values you can always store low residuals better, even if the signal becomes more predictable "suddenly" - faster than your average adapts. Maybe this will help. But if the adaptation time is close to the post-masking time, it's not very likely.

We could be more constructive if we took a short sample that Den can ABX and actually looked at what's wrong with it.

-Eugene