
Topic: What is the current status of wavpack 4? (Read 9389 times)

What is the current status of wavpack 4?
What is happening on the wavpack front these days? Is there much progress towards a new version? I've read a little about wavpack 4 around here on HA but that was a few weeks ago now.
superdumprob
____________________________________________

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein

  • bryant
  • Developer (Donating)
What is the current status of wavpack 4?
Reply #1
The reason that I have not been discussing WavPack 4.0 is that I don't want to build up a lot of expectations that I might not be able to meet. Since the core of WavPack development is done by only one person, there is always the risk that the progress could be slowed significantly (or even stopped) by any number of circumstances. And for me one of the biggest advantages of writing free software is that I can do exactly what I want, when I want. This is not to say that I have no interest in what users would like to see, but the last thing I want is for this to become a "job". 

I can imagine someone e-mailing me with "hey, I've been waiting three months to encode my CDs because you said you'd have a new version!" and have them get the response "this is David's wife and we have a new baby and David isn't allowed to play with the computer anymore!". See my point? 

However, this has come up a few times and I do feel a little funny just not saying anything, so I will say that I am working on WavPack 4.0 and that it will be a completely new format from the ground up. The current WavPack format was designed over 5 years ago and although I have tried to improve it along the way, the fact that it lacks any sort of block structure makes it impossible to add certain desirable features like fast seeking, streaming and error tolerance. I considered hacking a block structure onto the existing architecture, but eventually decided that to make WavPack the compression format that I really wanted, I would have to make a clean break with the past and start fresh. (Of course, there will always be decoding support for all previous versions and it will remain open source.)

As for timing, I would like to have something for testing in 2-3 months and some sort of release in 4-6 months. But again, this assumes that I don't lose interest or get hit by a bus or (worst of all) get a "job". 

Thanks for the interest and I hope that this (sort of) answers your question. 

  • den
What is the current status of wavpack 4?
Reply #2
Cool, thanks David. I can rest easy that my current encodes will always be playable anyway.

Den.

  • Dologan
  • Members (Donating)
What is the current status of wavpack 4?
Reply #3
Thanks for all your effort developing this excellent software. I think we're all looking forward to the next version of Wavpack; but of course, your life comes first!

Am I right to suppose WavPack lossy will continue to be supported? Will it also be redesigned?


~Dologan
  • Last Edit: 22 May, 2003, 01:49:44 AM by dologan

What is the current status of wavpack 4?
Reply #4
Thank you for the information David.  I understand what you mean. Wavpack is more of a hobby and an interest which is perfectly understandable. I shan't expect a new version anytime soon or ever, I'll just consider it a nice surprise if it does come.  Nevertheless, good work!

Have fun!
superdumprob
____________________________________________

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein

What is the current status of wavpack 4?
Reply #5
It would be cool if wavpack lossy could use some psychoacoustics, instead of dropping bits in order to reach the desired bitrate. Of course, I don't know if this is feasible, but imagine if the lossy file was of mpc quality! Also, better seeking is vital, 'cos the current seeking mode simply fast-forwards to the selected point, which is kinda frustrating...
Wanna buy a monkey?

  • den
What is the current status of wavpack 4?
Reply #6
Superdumprob hit the nail on the head. If Wavpack 3.97 is as good as it gets, well that's great, and if or when something else comes along, it will be a real bonus.

@yourtallness
I understand your thinking, but I'm not sure if I entirely agree. I actually think the lack of psychoacoustic modelling in wavpack lossy is a feature, in that it reduces artifacts introduced from the encoder second guessing what you won't hear. A few less bits would be nice, but if you start getting a model that is too involved, David is just making another MPC/MP3/Vorbis. Wavpack lossy is currently brilliant as a transcoding source based on my tests, and I'd hate to lose this from it getting too clever with its bit/frequency allocation. Some basic smart/safe switching joint stereo would be nice, but I'm not sure if introducing much else might start to introduce problems.

I suppose as long as there is an option to turn modelling off if required, it wouldn't matter.

Den.

  • GeSomeone
What is the current status of wavpack 4?
Reply #7
Quote
but imagine if the lossy file was of mpc quality!

But bigger?  You would use Musepack wouldn't you?
--
Ge Someone
In theory, there is no difference between theory and practice. In practice there is.

What is the current status of wavpack 4?
Reply #8
As I see it:

If Wavpack lossy at 200 kbps were transparent, one could keep the lossy file on their HDD, and back up the restoration file on CD-R so as to spare themselves from re-ripping. That way you would have a great-sounding lossy file for casual listening (without needing to go as high as 320~384 kbps) and you would always be able to recreate the original wav when needed.
Wanna buy a monkey?

What is the current status of wavpack 4?
Reply #9
Why not use MPC for playback and lossless (preferably FLAC/Wavpack) for archiving then?
MusePack isn't lower quality than Wavpack lossy...
(oh, it is, but it's nearly always inaudible)
ruxvilti'a

  • bryant
  • Developer (Donating)
What is the current status of wavpack 4?
Reply #10
Thanks for the support, guys! I don't mind talking about what I'm up to as long as everyone understands that until something's out it's just vaporware.

As for the lossy/hybrid mode, yes, it's definitely going to be in there. In fact, that mode is really more interesting to me than pure lossless (although I assume that more people use WavPack for the lossless mode).

The new format is going to be more in line with existing formats in that the decoding specification is going to be fixed, but there will be considerable flexibility in the encoding process. This way, it will be possible for other people to try different ideas for improving the encoder without breaking the decoder. The lossy mode will probably go down to 196 kbps (instead of the current 265) and I'm sure there will be some improvement in quality for a given bitrate, but I don't know how much yet.

I am looking at applying psychoacoustic models at some point (although this will be optional). Like den mentions, smart LR/MS switching would be a first step. Also, I have been playing around with using psychoacoustic information to vary the bitrate (i.e. adding VBR) or using it to adapt the noise shaping to fit the signal. I will not allow any digital filtering or other frequency domain processing in the signal path, however. Again, this will all be built into the decode spec and the encoders can use the features or not at their whim.

  • DickD
What is the current status of wavpack 4?
Reply #11
Wavpack lossy and the ability to have recovery files to restore to lossless is a fascinating subject, though I confess that I haven't used it yet because it's not necessary for my relatively limited collection of rips to date. When I get a PC with an 80 gig drive, and start ripping everything in sight, then it could be tempting.

Using only very basic psychoacoustics to estimate the maximum noise or quantization error over the full bandwidth that would remain inaudible might be a good approach to VBR. I understand you already adjust the quantization error according to the average power. (If it's allowable prediction error, not quantization error, that you use, please read 'prediction error' wherever I've written 'quantization error' or 'distortion' below.)

From what I read (Den?), Wavpack lossy at 320 kbps can be ABXed every time, but only very loud - almost painfully? - when the background hiss becomes audible. Possibly, some of what I suggest below could save bits in other ways (while retaining the option of recovering the lossless PCM), allowing you to be less aggressive in allowing noise, at least within the frequencies where it's most audible.

Now, it's getting to the boundaries of my knowledge, but I believe sub-band codecs like Musepack (and good EQ's like Foobar's) use methods of splitting the signal into frequency bands that are lossless and can be completely reversible in a bit-identical way (but then they start throwing away bits in quantizing each band as coarsely as the psymodel allows). I believe the band splitting takes the form of complementary pairs of FIR-type convolution filters, whose outputs can be summed to identically reconstruct the input. (E.g. FB2k's EQ with all bands set to zero is bit-identical to disabling the Equalizer DSP.)
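This complementary band-splitting idea can be sketched in a few lines of Python/NumPy (an illustrative sketch, not anything from WavPack or foobar2000; the filter design and tap count are arbitrary): if the highpass filter is defined as a unit impulse minus the lowpass, the two band outputs sum back to a delayed copy of the input.

```python
import numpy as np

def sinc_lowpass(fc, taps=57):
    """Windowed-sinc lowpass FIR; fc is the cutoff as a fraction of
    the sampling rate (0..0.5). An odd tap count gives integer delay."""
    n = np.arange(taps) - (taps - 1) / 2
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(taps)
    return h / h.sum()                    # unity gain at DC

def split_bands(x, h_lp):
    """Split x into a complementary low/high pair. Because the
    highpass is (unit impulse - lowpass), low + high reconstructs
    a delayed copy of x."""
    h_hp = -h_lp.copy()
    h_hp[len(h_lp) // 2] += 1.0           # delta minus lowpass
    return np.convolve(x, h_lp), np.convolve(x, h_hp)

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = sinc_lowpass(0.25)
low, high = split_bands(x, h)
delay = (len(h) - 1) // 2                 # group delay of the FIR
recon = (low + high)[delay:delay + len(x)]
```

Note that the sum here is only exact to floating-point precision; the truly bit-identical splitting the post asks for would need an exact integer filterbank (e.g. a lifting scheme).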

Perhaps a new Wavpack Lossy, while not taking such an extreme route as Musepack and avoiding knife-edge psychoacoustic decisions based on FFTs and tonality estimation etc., could use lossless sub-band splitting to make fuller use of noise thresholds to allow, for example, much greater distortion or noise in the 17-22 kHz band because the ATH is much higher, and an intermediate amount of distortion from, say, 15-17 kHz. (The shape of the ATH curve is similar to foobar2000's Strong ATH Noise Shaping dither; see spectrum below.) It might even be possible to use different lossless stereo modes (MS/LR) in each sub-band. If this is safe enough for dither that sounds like it's 15 dB quieter than unshaped dither even during a fade-out, I think it has to be safe as a minimum level of noise even in quiet parts of Wavpack lossy. As the music level increases, presumably the amount of noise in the high frequencies can increase further, and it really ought to be very safe to completely discard frequencies above 20 kHz from the lossy mode entirely (and probably those as low as 19 kHz too, especially as the sub-band splitting would involve a gradual roll-off from one band to another, the transition width being inversely proportional to the length of the FIR filter).


[Image: frequency spectrum of silence plus foobar2000's Strong ATH Noise Shaping dither]

If you're splitting into sub-bands and aren't trying to use tonality (peakiness of the spectrum, which differentiates sinusoidal waves from noiselike signals), you shouldn't need to do an FFT, but can simply measure the variations in RMS level of each band to estimate the allowable levels of distortion in the worst case. IIRC, the worst case is for noiselike rather than tonal signals, because noiselike signals don't mask as well as tonal ones (this paper might give some clues). Given the power in the band, you could probably use a simple estimate of allowable distortion that never exceeds the worst case (being that of noise as the masker) and set the quantization accordingly. You could use any fast-to-calculate estimator that never exceeds the value given by the more accurate predictor based on FFTs, even when tonality is low (t = 0.3 for noise in the link above).
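A crude version of this RMS-based worst-case estimate might look like the following Python sketch (the 20 dB margin is an arbitrary placeholder for illustration, not a measured noise-masking figure):

```python
import numpy as np

def band_rms(band, block=512):
    """RMS level of one sub-band signal over successive blocks."""
    n = len(band) // block
    chunks = band[:n * block].reshape(n, block)
    return np.sqrt((chunks ** 2).mean(axis=1))

def allowed_noise_floor(band, block=512, margin_db=20.0):
    """Worst-case allowable noise per block: a fixed margin below the
    band's RMS, never above it. margin_db is a placeholder value."""
    return band_rms(band, block) * 10.0 ** (-margin_db / 20.0)

t = np.arange(2048)
sine = np.sin(2 * np.pi * 0.01 * t)        # steady tone, RMS ~= 0.707
floor = allowed_noise_floor(sine, block=512, margin_db=20.0)
```

A real implementation would replace the fixed margin with a per-band curve derived from noise-masking-noise data, but the per-block RMS is the only signal analysis needed.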

Just a thought (possibly flawed if I'm missing something) of approaches to reaching still lower bitrates than current Wavpack lossy without risking the typical artifacts of psychoacoustic lossy compression schemes or reducing the encoding speed to their levels.

I'd imagine, if you play on the safe side, and don't assume that tonality and similar principles apply, you can probably shave off quite a few bits without adding too much computation. And certainly, given the listening tests on HA for "how high can you hear" using lowpassed real music, it should be possible to build in a completely transparent lowpass to be used for only the most aggressive settings of a new Wavpack lossy.

Of course, being able to recover the lossless audio also requires the lossless sub-band splitting to work bit-identically. I can't think where I read that - probably Frank Klemm, Andre Buschmann, or the TooLAME site - and I guess you also want something that doesn't damage the efficiency of the lossless coding of the residual (or difference) data used for reconstructing the lossless PCM. You might find it's best to compress each sub-band's residual separately, or to sum them and compress the whole-bandwidth residual losslessly. I couldn't guess, as I don't know how the predictors work.

Just throwing out thoughts really, in case any of them might suit your objectives. Feel free to disregard any and all of them - no offence will be taken! (After all, I did include some frequency-domain stuff, but all done without recourse to FFTs and only if totally bit-identical reconstruction is possible)
  • Last Edit: 23 May, 2003, 09:56:03 AM by DickD

What is the current status of wavpack 4?
Reply #12
Interesting stuff. Thanks again bryant and thanks to DickD for the lengthy post also.
superdumprob
____________________________________________

"If we knew what it was we were doing, it would not be called research, would it?" - Albert Einstein

  • den
What is the current status of wavpack 4?
Reply #13
@DickD
It's when I read posts like these that I realise just how little I know about audio compression. Having said that, some of your points seemed quite feasible to this layman.

I can ABX Wavpack lossy every time at 320 kbits, but I am only picking up a slight increase in noise most of the time, and that beats having significant artifacts in the music like those from other lossy codecs, in my opinion. If it were possible to employ some of your ideas and remove the slight noise at extreme volumes, Wavpack would be simply amazing; alternatively, the bitrate could be brought down while keeping the current low, very acceptable amount of added noise.

Thanks for the post.

Den.

  • bryant
  • Developer (Donating)
What is the current status of wavpack 4?
Reply #14
DickD:
Yes, thanks for the in-depth post! I will digest it for a few days and post again. 

There is one thing that I want to bring up now though because it relates to something den mentioned in another thread and you touch on here. There are currently only two options for the noise shaping in the WavPack lossy/hybrid mode: none and first-order. Without noise shaping the added quantization noise spectrum is absolutely flat and with noise shaping it is (as expected) 3 dB higher level (overall) but drops off at 6 dB/octave at lower frequencies:

[Image: spectra of the added quantization noise with and without first-order noise shaping]

The odd thing is that den and I have both found the noise shaping can actually sound worse, and I am not sure why.

My current guess is that usually the only portions of the noise that aren't masked by the music are the very highest frequencies (because there's very little energy up there). Noise shaping in this case essentially moves noise from an area where it's completely masked by the music into the range where there's nothing to hide it. This seems to turn conventional noise shaping theory (where the idea is to move the noise away from the music and leave greater S/N in the midrange) on its head. What do you think?
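For readers following along, plain first-order noise shaping of the kind described here can be sketched as error-feedback quantization (a generic textbook sketch, not WavPack's actual implementation): feeding the previous quantization error into the next sample gives a noise transfer function of (1 - z^-1), i.e. noise rising toward high frequencies and dropping 6 dB/octave toward low ones, with about 3 dB more total power.

```python
import numpy as np

def quantize(x, step, shaped=True):
    """Uniform quantizer with optional first-order error feedback."""
    y = np.empty_like(x)
    e = 0.0
    for i, s in enumerate(x):
        v = s + (e if shaped else 0.0)    # add back previous error
        y[i] = step * np.round(v / step)
        e = v - y[i]                      # error fed into next sample
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(8192) * 4.0
flat_noise = quantize(x, 1.0, shaped=False) - x
shaped_noise = quantize(x, 1.0, shaped=True) - x

def half_band_powers(noise):
    """Noise power in the lower and upper halves of the spectrum."""
    s = np.abs(np.fft.rfft(noise)) ** 2
    return s[: len(s) // 2].sum(), s[len(s) // 2:].sum()
```

Running `half_band_powers` on the two noise signals shows the flat quantizer's noise split roughly evenly between the halves, while the shaped noise is concentrated in the upper half, illustrating the spectral tilt described above.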

And this also makes me wonder if there are noise shaping algorithms around that shift the noise downward in frequency and whether or not they would work better here. Hmm...

den:
Welcome back from holiday, den! To answer your question from the other thread, the first track I found that kills joint stereo was actually pointed out to me by Dologan. It's called hard.wav and I think it was created by Ivan just to challenge joint-stereo encoders (I can't find it on the Internet but I can upload it if you'd like).

After that I put in the command-line option to allow disabling JS, but left the default on. Later, after I found some hard-panned tracks that had audible noise in JS mode I decided to change the default.

The reason that Furious works better in JS mode is because those hi-freq spikes are in the middle; if they were on one side then LR mode would be better. So, I'm still not convinced that the default should change; better to just get on with WavPack 4 so I can switch on the fly...

Those are the only killer tracks I've found, but I suspect that the reason is that I don't listen to a lot of music with artificial electronic sounds (the only reason I had The Fast and the Furious ST is because it was the first U.S. release that was copy protected).

  • eltoder
What is the current status of wavpack 4?
Reply #15
You can try to add more noise in frequencies with more power (so that the noise shape follows the music's shape), but this will require some sort of frequency analysis (or completely changing the way wavpack works).

On the other hand (maybe I'm asking something stupid), have you tried allowing bigger errors for bigger (by absolute value) samples?
The  greatest  programming  project of all took six days;  on the seventh  day  the  programmer  rested.  We've been trying to debug the !@#$%&* thing ever since. Moral: design before you implement.

  • DickD
What is the current status of wavpack 4?
Reply #16
bryant:
Quote
Noise shaping in this case essentially moves noise from an area where it's completely masked by the music into the range where there's nothing to hide it. This seems to turn conventional noise shaping theory (where the idea is to move the noise away from the music and leave greater S/N in the midrange) on its head. What do you think?


Well, I'm not sure what "conventional noise shaping theory" is. I've sort of only come into it recently for audio. What to me is conventional is the type of thing implemented by Garf for Foobar2000, where it follows the ATH and concentrates most of the noise at the inaudible frequencies (I mean 18+ kHz, not 10-12 kHz). I think it was Garf [Edit: no, it was KikeG] who noted that for a realistic quality comparison of dither types, it should be very quiet - barely audible - partly because the Fletcher-Munson curves (see top graph) indicate that the audibility of the HF would be higher, relative to the other noise, if the volume were unnaturally loud.

I think you could put much more noise in the very highest 3-5 kHz (assuming 44.1 kHz sampling) and reduce the noise in the more audible band (to about 15 kHz) with a transition between these regions. If you note carefully just how many dB are involved (10 dB is 10 times the power), there's MUCH more noise power from 18.5 kHz to 22 kHz than there is from, say, 10-15 kHz. Something like 40 dB more, which is of the order of 10,000 (ten thousand) times the power (100 times the amplitude is probably more relevant from the point of view of how much data you can remove - where I guess 100 ~= 128 = 2^7).
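The decibel arithmetic here is easy to verify (trivial helpers, included just to check the numbers in the post):

```python
def db_to_power_ratio(db):
    """10 dB = 10x power, so 40 dB = 10,000x power."""
    return 10.0 ** (db / 10.0)

def db_to_amplitude_ratio(db):
    """20 dB = 10x amplitude, so 40 dB = 100x amplitude."""
    return 10.0 ** (db / 20.0)
```

A 100x amplitude ratio is indeed close to 2^7 = 128, i.e. roughly 7 bits of headroom, as estimated above.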

I believe that Garf's noise shaping includes slightly increased noise at low frequencies (see the Fletcher-Munson curve on a log frequency scale and compare to the image I posted in my previous post), but this doesn't give as much bang-for-buck as the high-frequency bit because it's 30-40 dB/Hz lower over a smaller range of frequencies (in Hz).

For an idea of just how powerful the noiseshaping is, try foobar2000 with 8-bit playback and strong ATH noise shaping dither. Then try disabling the dither and going for 12-bit padded to 16-bit (you need to "show all bitdepth modes") or alternatively compare it to no-noiseshaping dither at 12-bit, for example.

If I were setting up an old/budget machine with an 8-bit soundcard (my mother had one until only a short while ago, which I'd set up using MAD in Winamp), I'd use FB2K and strong ATH for sure.

I noticed that the ABC/Hidden Reference blind rating/testing program included the well-known castanets sample for the About dialog as a WAV file. I noticed it was a remarkably small WAV and wondered if it was compressed. It was mono, 44.1 kHz, 8-bit, with strong noise-shaping dither by the look of the spectrogram (below), and sounded far better than I expected an 8-bit file to sound. (It was actually 353 kbps in mono and 1.3 seconds duration, so it's hardly tightly compressed, but if you must have WAV, it's far better to halve the bit-depth than halve the sampling rate in this case):

Spectrogram of Castanets: WAV PCM, 44.1 kHz, 8-bit mono, strong noise shaping

You can see how strongly the noise shaping uses the frequencies from 16 kHz and up, while it adds little or no noise to those below 15 kHz (and probably less noise than standard dither would).

Incidentally, Frank Klemm's page on dither indicates that with a 32 kHz sampling rate, noise shaping can gain about 5 dB of audible SNR increase (at cost of 8 dB of measured SNR reduction over the full bandwidth). This jumps to about 15 dB of audible SNR increase at 44.1 kHz (at 29 dB measured SNR cost), indicating just how important the frequencies above 16 kHz are to making major perceptual gains by noise shaping.

So, I think you could gain considerably more bang-for-buck than the sloping line you show in your post above by adopting a strong noise-shaped curve and accepting much more noise at the frequencies greater than 18 kHz (maybe >16 kHz) while reducing the noise in the audible band to below the level of the flat-spectrum noise, shown in cyan. Your diagram indicates there's more noise power from about 7 kHz up in Wavpack's current mildly noise-shaped spectrum. I'd guess the 7-15 kHz region being up to 6 dB louder is why it sounds worse to you and den (I haven't tried it myself). Perhaps if the 7-15 kHz region were quieter or as loud, and the ramp went very steep from about 16 kHz up to 18 kHz, it would be perceived as quieter. I'm sure Garf has some good computational techniques for creating this sort of shape, and a lot could be learned from the FB2K source code of recent versions.

den:
Yes, I think that if some of my ideas worked, you'd have the choice of either lower bitrate with the same noise, or same bitrate with no audible noise (how extreme are these extreme volumes anyway - are we talking painful?), and in fact, with some of the more complicated minimum-masking ideas (on a band-by-band basis and achievable without FFTs), the bitrate might become variable but chosen to just keep the noise inaudible, resulting in a lower average bitrate but sometimes a higher instantaneous bitrate. (Actually, I think wavpack lossy is already slightly VBR, because IIRC, it adjusts the noise floor according to the average power)

eltoder:
The masking idea will actually do what you suggest. It will cause the loudest subband to naturally mask itself more than adjacent bands. Unlike codecs aiming for the maximum compression, I propose treating every masker like noise (low tonality), so the masking effect is never falsely assumed to be greater than it is because of incorrect tonality assumptions or measurements (i.e. assume the worst case, which I believe is noise). This would also achieve the "bigger errors for bigger-valued samples" effect you mention, but it goes further by allowing even more noise in frequency bands where there's a lot of energy and a fair bit more in bands adjacent to those with high energy, but much less error in bands which are well separated from the loudest signals. I'd envisage it would use masking far less aggressively than traditional codecs, though. In addition, I envisage strong noise shaping (as above) allowing extra-large errors at all times in the >16 kHz band, where they're least noticeable.

DickD
  • Last Edit: 03 June, 2003, 02:21:58 PM by DickD

  • den
What is the current status of wavpack 4?
Reply #17
DickD:
Depends on the sample.

Two tracks I posted about before are quite different from each other in this regard.

Mandela Day - Simple Minds shows subtle background hiss when encoded at 320 kbits, which can be picked up at moderate to high volumes, i.e. if I was sitting and taking in the music for the sole sake of enjoying it. Not painful volumes, but high enough so that you can only hear the music. It is much more obvious at the beginning of the track, where there is some space in the recording. Once the song gets going, it is very hard to pick. The original recording on CD is very clean, so any added noise becomes obvious. I also have problems with Vorbis on this track (added background noise). With Wavpack it is not distracting, but it is there.

Blue Monday - New Order, I can't pick it unless I crank it up to where it gets painful. This is an older recording than above, originally recorded on analogue equipment, using older synthesizers/drum machines/samplers and tape loops, so there is a fair amount of background hiss in the original recording anyway. I suspect this masks the added noise.

With most stuff, I can pick it up at 320 kbits in the gaps within the music, if the volume is near painful. I usually have to be intentionally listening for it to notice it, but if I choose to, I can hear it nearly every time (such as when purposely conducting ABX testing.)

Den.

  • DickD
What is the current status of wavpack 4?
Reply #18
Thanks for the loudness info, Den. I guess the noise threshold needs to be adaptive to be transparent at normal (non-painful) listening levels, though noise shaping will help reduce its audibility even in the quiet sections. (I've had the same experience listening to very quiet tracks or quiet intros on cassette with Dolby B turned on - i.e. normally the noise is unnoticeable, but it's obvious on quiet bits.)

If the noise is always inaudible, adaptive noise thresholding is OK (as with masking in good lossy codecs), but if it is audible, you could get noise pumping or noise modulation effects if it is allowed to vary too quickly. This wouldn't be a problem if the sub-band masking idea were used, and used conservatively enough to remain transparent.

My anecdotal experience tells me that lossless codecs get better compression on quiet music (e.g. music with low ReplayGain volume measurements) than on louder music in the main. My guess is that Wavpack lossy with subband masking (or a full-spectrum adaptive noise threshold) would tend to be less variable among music of different loudness (though the residual file, to restore the lossless PCM would still be smaller for the quieter music).

  • eltoder
What is the current status of wavpack 4?
Reply #19
@DickD:
My suggestion, being very simple, allows the easy implementation and fast computation that wavpack is famous for. Any kind of masking, while having greater potential, requires frequency analysis (and sub-band decomposition is also a frequency analysis). This involves some common problems (e.g. it operates on blocks of data, so it needs some kind of "transients handling" when the data changes too fast within a block).

And it allows noise shaping on top of it, of course.

-Eugene
The  greatest  programming  project of all took six days;  on the seventh  day  the  programmer  rested.  We've been trying to debug the !@#$%&* thing ever since. Moral: design before you implement.

  • DickD
What is the current status of wavpack 4?
Reply #20
Good point, Eugene.

However, one thing about convolution-based (finite impulse response = FIR) sub-band splitting and recombining (the sort I think is used in many EQ plugins) is that the audio stream is not divided into blocks, transients are perfectly reconstructed and there are no boundary problems.

However, the moment you start making decisions based on a chunk of data (e.g. a frequency analysis on a block of 1152 samples, say), then you do have boundary problems, as you point out, and you need to detect transients if making decisions about the noise threshold for a whole block, because the ear can hear increased noise as an unmasked pre-echo if it occurs more than about 5 ms before a loud transient (the exact numbers may be wrong, but that's my recollection).

So, it may be quite possible to ensure that the length of the convolution FIR filter and the decision window are less than 5 ms, so that this isn't a problem. This is equivalent to 220 samples. For computation speed, the shorter the FIR filter the better, and I'd envisage that a filter of about 110 or 57 samples (~200 Hz or ~400 Hz roll-off between bands by a rough estimate that might be a factor of two out) would be adequate to estimate a worst-case allowable noise by assuming masking by noise, not tonal signals. It's certainly faster than doing Fourier or Cosine transforms as even the very quick Musepack encoder does (just for analysis).
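The back-of-envelope numbers in the last two paragraphs can be reproduced with a couple of throwaway helpers (the fs/taps transition-width rule of thumb is as rough as stated above, easily a factor of two off):

```python
FS = 44100                                 # assumed CD sampling rate

def ms_to_samples(ms, fs=FS):
    """5 ms of pre-echo tolerance at 44.1 kHz is about 220 samples."""
    return int(ms * fs / 1000)

def fir_transition_hz(taps, fs=FS):
    """Very rough transition bandwidth of a windowed FIR: ~fs/taps."""
    return fs / taps
```

For example, `ms_to_samples(5)` gives 220 samples, and `fir_transition_hz(110)` gives roughly 400 Hz of roll-off between bands; halving the tap count doubles the transition width but speeds up the convolution.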

Once you calculate the masking thresholds in different bands for a whole stream of successive time periods (partly overlapping) for which you've calculated the RMS amplitude, you could choose to use the info in a variety of ways, depending on the complexity of implementing it.

Here are a few thoughts:

1. Over all the spectral bands, one of them will have the lowest masking threshold. You could use this as a simple flat quantization noise floor that's safe over all frequencies. (I use quantization to mean allowable error relative to the lossless file)

1a. You could scale a computationally fast noise-shaping form of quantization error so it's below the higher of the masking threshold and the ATH over all frequencies.

2. You could have individual predictors for each sub-band and allow the prediction error to reach the precise masking threshold calculated in that band. (This is more bitrate efficient, but it might be harder to implement, especially if there's any risk of having errors from the overlapping frequency region of two adjacent sub-bands combining to exceed the calculated masking - e.g. band 5 and band 6 both contain 50% of the 2500 Hz frequency content, say, and both allow an error of the same polarity, which when added during band recombination results in a 6 dB greater error, so might exceed the masking threshold.) A simple approach is to apply a 6 dB guard band below the masking threshold for extra conservatism. After all, we want safety first, not a really low bitrate.

3. With either of the above approaches you have to consider that the time resolution of your masking analysis partly overlaps with the next block. I guess the safest solution is to use 50% overlapping blocks and if the masking in the next block or the preceding block was lower for any band, you set your allowable quantization noise relative to the lower mask, so that if you switch the allowable quantization error suddenly the change in noise won't be audible.

I.e. I'm not sure it's necessary to divide the audio into blocks (e.g. the equivalent of lapping transform windows, which overlap in time in a manner that permits lossless reconstruction) if you're cautious about switching the quantization error noise down a block early and up a block late so the transition to and from louder noise is never audible.

I guess the timing uncertainty between blocks in the analysis part, is rather like the frequency uncertainty between analysis sub-bands in point 2.

4. With a FIR filter, you do generate a longer file and have to make assumptions about what precedes the first sample and follows the last. However, if you assume zeroes, which is normal, there's no problem, because any "pop" from finishing on a non-zero sample value will just cause the masking threshold to go up and use a few more bits to make it accurate enough up to the end of the file. You can then crop in the decoder, always discarding the first n samples and the last n samples.

5. You have the choice of:
..a. encoding each sub-band with a separate predictor (and allowing prediction errors within the masking threshold to remain in each sub-band). The decoder then decodes each sub-band and adds the corresponding sample values from all sub-bands together afterwards to generate the full-bandwidth signal. Also, I'm not sure if you could then have the gradual transition from lossy to lossless while still achieving the same lossless compression ratios.
..b. encoding the whole bandwidth as one stream with one predictor (like any lossless encoder) and then removing terms in some way I don't quite understand, until the prediction error in each band is just below the masking threshold (which presumably requires you to measure the prediction-error distortion at least loosely as a function of frequency and compare it to the masking at that approximate frequency).

I don't really know enough about how a lossless codec works in detail to be more precise and choose between these or use the right terminology.

I certainly don't know in detail how WavPack lossy works right now. For example, does it change the allowable error over time, or is it simply constant?

It seems likely that 5a is promising, as each sub-band naturally restricts its predictor to the appropriate frequency range, making each of them more compressible, I'd presume, from an information-theoretic standpoint.

Hmm, I think my knowledge is starting to strain.

Perhaps it is possible to do some simple tests, for example splitting the spectrum into just a few bands and estimating a very crude but very conservative masking threshold from the RMS amplitude of each band.

If we had, say, four bands created by complementary FIR filters, we could create four WAV files and verify that they recombine losslessly by adding the sample values together.

We could then try compressing each with the current WavPack lossless to see how the total size of those four files compares to the size of the one file losslessly packed.

We could then start trying the current WavPack lossy with different amounts of allowable error for each sub-band, see how big the residual file is for each, and again check that recombination by adding makes a file that sounds good. (We might manually estimate some adequate masking levels for a test sample, or be able to guess them from data provided by a psychoacoustic encoder tool such as lamex.)

If one of the bands were simply 19 kHz+, we could losslessly compress it and keep it as a residual file that isn't part of the lossy set but is only used for restoring the lossless original, and see what that does to the overall size of the lossy bundle.

Apart from creating the sub-bands, this might require only minimal coding effort, and it would demonstrate whether the predictors used in WavPack would work anywhere near as efficiently if the audio is first split into sub-bands.
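The key property the experiment relies on (that complementary FIR bands recombine losslessly by simple addition) can be demonstrated with a toy pair: a lowpass with taps h, and a "highpass" whose taps are a centred unit impulse minus h. The filters and signal here are made-up illustrations, not a proposed design:

```python
# Sketch of the split/recombine check: a complementary FIR pair whose
# band outputs sum back to the original signal exactly, sample for sample.

def convolve(x, h):
    return [sum(x[i - j] * h[j] for j in range(len(h)) if 0 <= i - j < len(x))
            for i in range(len(x) + len(h) - 1)]

lowpass = [0.25, 0.5, 0.25]     # hypothetical crude 3-tap lowpass
delta = [0.0, 1.0, 0.0]         # unit impulse at the filter centre
highpass = [d - l for d, l in zip(delta, lowpass)]  # complement of lowpass

signal = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
low = convolve(signal, lowpass)
high = convolve(signal, highpass)
recombined = [a + b for a, b in zip(low, high)]

# The recombined stream is the original, delayed by one sample and with
# one extra trailing sample; both edges crop away in the decoder.
assert recombined[1:1 + len(signal)] == signal
```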

What is the current status of wavpack 4?
Reply #21
If you place the noise in the 3-5 kHz region, don't you risk making the music audibly distorted, since I have read that this region of the audio spectrum is the most sensitive?

  • DickD
  • [*][*][*][*]
What is the current status of wavpack 4?
Reply #22
Quote
If you place the noise in the 3-5 kHz region, don't you risk making the music audibly distorted, since I have read that this region of the audio spectrum is the most sensitive?

If you're asking about my post, I'm only talking about allowing noise up to the masking threshold at any freq (including 3-5 kHz), which by definition is inaudible because it's masked by a louder sound.

If it's unmasked by other sounds, then the noise floor at the Absolute Threshold of Hearing (ATH) as used in Strong ATH Noise Shaping dither, is actually exceptionally low at 3-5 kHz (and a little beyond), which is the kind of region where the ear is most sensitive (e.g. babies crying are somewhere in the sensitive region). But of course, for computer audio, you have uncalibrated loudness and variable volume controls, so ATH will be somewhat dynamic as people adjust their volume controls.

Actually, an ATH relative to ReplayGain loudness might be plausible, given that people adjust the volume to control perceived loudness. Sometimes, though, they'll turn it up during a quiet passage to hear the detail, so a perceived loudness of the surrounding 5 seconds or so (measured perhaps by a method similar to RG), rather than of the whole track, might be a better bet for dynamic noise-floor adjustment.

  • eltoder
  • [*][*][*]
What is the current status of wavpack 4?
Reply #23
Quote
However, one thing about convolution-based (finite impulse response = FIR) sub-band splitting and recombining (the sort I think is used in many EQ plugins) is that the audio stream is not divided into blocks, transients are perfectly reconstructed and there are no boundary problems.

How do you imagine that? If you say that your signal has some frequency, this means you have observed the signal for at least a few periods. That is surely "a block". You can't get the frequency at a single point; you'll always have some frequency/time resolution trade-off.

Moreover, when you use masking, you want to apply the calculated threshold in each band for at least a few samples, right? So you definitely need blocks (at least 16 bands * 16 samples = 256 samples, but 32x32 = 1024 seems more reasonable).

A codec needs blocks anyway, to get good seeking (and usually decoding) speed and to reduce the side information needed for forward adaptation.

About 5a: I actually tried this about a month ago. Maybe it will work well, but it definitely requires predictors different from the ones used for the full band (very different for the higher bands; for the low-to-middle bands, I don't know what).
But, if I understood bryant's post right, he does not want to implement anything like that.

-Eugene
The  greatest  programming  project of all took six days;  on the seventh  day  the  programmer  rested.  We've been trying to debug the !@#$%&* thing ever since. Moral: design before you implement.

  • DickD
  • [*][*][*][*]
What is the current status of wavpack 4?
Reply #24
Eugene: What I meant about time resolution was that for simply splitting and recombining, the reconstruction is sample-perfect, so the band-splitting process itself introduces no time-resolution problem in the output file.

Sure, when it comes to analysis, you do lose time resolution and you have to use blocks (although you can use a sort of rolling average if you want, which isn't quite the same as a block). That's why I'm suggesting a short FIR filter for band splitting and probably only a few bands: it avoids the audible pre-echo noise-increase problem and keeps our masking assumptions conservative (with a short block length, the distinction between transients and noise isn't a problem). Also, by making only the most conservative assumptions, and always choosing the lower of the possible masking thresholds, we should avoid introducing errors that are too large.

I think 16 bands might be unnecessarily many for a moderately improved WavPack-lossy type of codec with pretty rough masking calculations that are only assured to be OK in the worst case, rather than aiming for optimally high amounts of distortion/prediction error to squeeze out every last bit of bitrate.

For example, the bands could be roughly chosen to follow the shape of the ATH curve (assuming a 44.1 kHz sampling rate):
0-1250 Hz
1250-6000 Hz
6000-12000 Hz
12000-15000 Hz
15000-18500 Hz
18500-22050 Hz

So you might get away with about 6 bands. The last band could be discarded entirely for the lowest-bitrate lossy mode, as it's mostly inaudible in real music; even if kept, the prediction error there could be pretty high and remain inaudible, so it might not take too many bits anyway.
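To see why those band edges roughly track the ear's sensitivity, one can evaluate Terhardt's well-known ATH approximation at points within each band. This is purely illustrative; the frequencies chosen here are my own rough band centres, and nothing below is part of WavPack:

```python
import math

# Terhardt's approximation of the absolute threshold of hearing
# (dB SPL, f in Hz) - used here only to show the shape the bands follow.
def ath_db(f):
    k = f / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Rough centres of the proposed bands:
for f in (500, 3500, 9000, 13500, 17000, 20000):
    print(f, round(ath_db(f), 1))

# The threshold dips to its minimum near 3-4 kHz (the ear's most
# sensitive region) and climbs steeply above ~15 kHz, which is why the
# top band is a candidate for discarding at the lowest bitrates.
```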

Thanks for the info about your earlier attempt at 5a and the predictors becoming less efficient. This may be the sort of stumbling block that makes this approach simply not worthwhile to pursue, because changing the encoder like that is a big effort, so I'm glad of your information.

Options 1 and 1b might then be viable for a smaller improvement in avoiding the audible noise that Den reported at normal playing volume in a quiet intro.

I'm still not sure how even the current noise shaping (sloped noise) fits into a coder that doesn't look far ahead, unless it's constraining the allowable error in the differential of the waveform (i.e. its slope) rather than in the amplitude. That would explain how the noise isn't spectrally flat yet is calculated on a near-instantaneous basis (with no blocks and very little look-ahead), so the error can be kept within the allowable limits without knowing anything about its frequency distribution.

If it's something as simple as constraining the error in the first differential that implements the shaped noise, then something pretty clever would be needed to make the allowed error follow the ATH or Fletcher-Munson curves more closely.
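The kind of block-free, look-ahead-free noise shaping speculated about above can be sketched as a first-order error-feedback quantizer: feeding the previous sample's quantization error back before quantizing gives the error spectrum a first-difference (1 - z^-1) shape, roughly +6 dB/octave. This is a generic textbook sketch, not WavPack's actual implementation:

```python
# First-order error-feedback (noise-shaping) quantizer: each sample's
# quantization error is added to the next sample before quantizing,
# so the coded error is spectrally tilted upward with no blocks and
# no look-ahead.

def shaped_quantize(samples, step):
    out = []
    err = 0.0
    for x in samples:
        target = x + err                 # add back the previous error
        q = step * round(target / step)  # quantize to nearest step
        err = target - q                 # error to feed into the next sample
        out.append(q)
    return out

samples = [0.13, 0.34, 0.51, 0.72, 0.48, 0.22]
coded = shaped_quantize(samples, step=0.25)
# Every output is a multiple of the step, and the error never
# accumulates: the total of (input - output) stays within one step.
assert all(abs(q / 0.25 - round(q / 0.25)) < 1e-9 for q in coded)
assert abs(sum(samples) - sum(coded)) <= 0.25
```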
  • Last Edit: 29 May, 2003, 09:04:45 AM by DickD