LossyWav clone with ANS (adaptive noise shaping)

Although I'm not much of a LossyWav user myself, I was curious to see whether adaptive noise shaping could improve LossyWav. I haven't reached a final conclusion yet, but it sounds promising. The clone is not based on Nick's code. It's a completely new C++ version (still lacking features the original LossyWav has, so don't expect miracles), but it includes "ANS" coupled with a simple psychoacoustic model.

So far I have only done a few tests. I didn't hear any differences with the current version running at about 300 kbps. The difference signals at least show that the noise shaper works as intended.

The attachment is a Win32 binary of a commandline tool called "LossyWavANS". My apologies for the choice of name.

Happy new year!
SG

edit: The most recent version is 0.3.1.

Reply #1
Great to see folks who create instead of drinking and behaving badly on New Year's Eve ;D

I tried it with the -5 switch and it created clipping quite badly (in the higher frequencies, it seems). But I am impressed by how low I think the added noise is.
I have now tried the standard setting and it clips too.

Reply #2
Don't drink and code!

Reply #3
Of course, the version I uploaded here was buggy. So, if you're interested in testing it, you should probably use the new version attached to this post.

I tried it with the -5 switch and it created clipping quite badly (in the higher frequencies, it seems).

And by "quite badly" you mean it sounded terrible?

But I am impressed by how low I think the added noise is.

So it doesn't sound terrible? 

I haven't had any issues with clipping so far, but then again, the peaks in my test files are nowhere near full scale.

Reply #4
To me, one of LossyWav's advantages compared to other codecs is artifact predictability. The worst you can get seems to be a somewhat higher noise floor in rare situations. Does this behavior change with ANS? If LossyWav+ANS delivers a better average over most samples but worse exceptions, a full-blown psychoacoustic codec such as AAC or MP3 might be the better trade-off.

Reply #5
To me, one of LossyWav's advantages compared to other codecs is artifact predictability. The worst you can get seems to be a somewhat higher noise floor in rare situations. Does this behavior change with ANS?

The answer depends on the exact definition of "noise floor". The program also just adds quantization noise. The difference is that the added noise may have a non-uniform power spectral density. But it is meant to be at least 6 to 12 dB below the signal (in every time/frequency region!), depending on the settings. This means that hissing artefacts should be very unlikely by design. If anything, it might sound a bit "dirtier". The overall power spectral density is preserved, however. In fact, preserving the power spectral density is one of the important design goals: it keeps the "predictability" of the signal w.r.t. encoders like FLAC, WavPack, etc. as high as possible.
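
To make the "at least 6 to 12 dB below the signal in every time/frequency region" rule concrete, here is a minimal Python sketch. The band levels, the helper names `noise_ceilings_db` and `bands_violating`, and the 6 dB default are illustrative assumptions; none of this is taken from the LossyWavANS source.

```python
def noise_ceilings_db(signal_db, snr_db=6.0):
    # Hypothetical helper: per-band noise ceiling = signal level minus the required SNR.
    return [s - snr_db for s in signal_db]


def bands_violating(signal_db, noise_db, snr_db=6.0):
    # Indices of bands where the added noise is louder than the ceiling allows.
    ceilings = noise_ceilings_db(signal_db, snr_db)
    return [i for i, (n, c) in enumerate(zip(noise_db, ceilings)) if n > c]


# Made-up band levels in dB for one time slice.
signal_db = [-20.0, -35.0, -60.0]
noise_db = [-30.0, -38.0, -70.0]

print(noise_ceilings_db(signal_db))          # [-26.0, -41.0, -66.0]
print(bands_violating(signal_db, noise_db))  # [1]
```

In this toy case only the middle band breaks the rule: its noise sits 3 dB below the signal instead of the required 6 dB.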

If LossyWav+ANS delivers a better average over most samples but worse exceptions, a full-blown psychoacoustic codec such as AAC or MP3 might be the better trade-off.

Sure. On the other hand, we have more headroom and can tolerate model errors. If you intend to go down to bitrates around 200 kbps you'd be using the wrong codec. ;-)

Reply #6
Sounds promising, I'll give it a try!

Reply #7
I updated the code again. Attachment coming soon.

It'll run slower than the previous versions because four filters per channel, instead of only one, are now computed per block. This avoids the hard transitions at block boundaries that might otherwise lead to glitches. This version also supports 24-bit Wave files as long as they use a simple 44-byte header. The psychoacoustic model has also changed a little.

If you're interested in how LossyWavANS determines the tolerable noise levels, have a look at this post.
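
For readers unsure whether their files qualify: the "simple 44-byte header" is the canonical RIFF/WAVE layout with a 16-byte PCM `fmt ` chunk followed directly by the `data` chunk. The Python sketch below checks for exactly that layout. It only illustrates the file format; how LossyWavANS itself reads the header is not shown in this thread, and `parse_simple_wav_header` is a hypothetical name.

```python
import struct

# Canonical layout: RIFF, size, WAVE, "fmt ", 16, format tag, channels,
# sample rate, byte rate, block align, bits per sample, "data", data size.
HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")  # little-endian, 44 bytes


def parse_simple_wav_header(buf):
    # Reject anything that is not the plain 44-byte PCM header.
    if len(buf) < HEADER.size:
        raise ValueError("file too short for a 44-byte header")
    (riff, _riff_size, wave, fmt, fmt_size, audio_format, channels,
     sample_rate, _byte_rate, _block_align, bits, data, data_size) = HEADER.unpack_from(buf)
    if (riff, wave, fmt, data) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte WAV header")
    if fmt_size != 16 or audio_format != 1:
        raise ValueError("only plain PCM is covered by this sketch")
    return {"channels": channels, "sample_rate": sample_rate,
            "bits": bits, "data_bytes": data_size}


# Build a header for 600 bytes of 24-bit stereo audio at 44.1 kHz and parse it back.
demo = HEADER.pack(b"RIFF", 36 + 600, b"WAVE", b"fmt ", 16, 1, 2,
                   44100, 44100 * 6, 6, 24, b"data", 600)
print(parse_simple_wav_header(demo))
```

Files with extra chunks (for example `LIST` metadata before `data`) or a WAVE_FORMAT_EXTENSIBLE `fmt ` chunk would be rejected by this check.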

Cheers!
SG

Reply #8
So, here it is. If you managed to get the 0.3.0 version you should download the 0.3.1 version. Some nasty bug snuck in there.

For listening tests I used 8 songs (artists: Backini, Blue States, Ratatat, RJD2, Zero 7) and -3 dB as the SNR offset (slightly lower quality than default). Using flac -5 -b 512, the lossless FLAC files sum up to 199 MB and the lossy versions sum up to 79 MB. That's an average bitrate of about 330 kbps. But my ears are probably not that good. I didn't hear a difference, but you might.
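
As a quick sanity check on those numbers, the size ratio and the quoted lossy bitrate together imply the average lossless bitrate of the same material. The Python snippet below does that arithmetic; it uses only the three figures from this post, and the variable names are illustrative.

```python
lossless_mb = 199   # total size of the lossless FLAC files
lossy_mb = 79       # total size after LossyWavANS + FLAC
lossy_kbps = 330    # average lossy bitrate quoted in the post

# Same audio duration in both cases, so bitrate scales with file size.
size_ratio = lossy_mb / lossless_mb
implied_lossless_kbps = lossy_kbps / size_ratio

print(round(size_ratio, 2))          # 0.4
print(round(implied_lossless_kbps))  # 831
```

So the lossy files are about 40% of the lossless size, and the lossless originals would average roughly 830 kbps.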

Reply #9
Well done SebG. I knew you wouldn't be able to resist!

For big changes from one block to another, rather than crippling the values in one block, I tried creating an intermediate filter for the transition (so there was no single big jump), and also cross fading the filter coefficients themselves (not a good approach for IIR filters, but can work tolerably well if the filters are similar) - but IIRC in the end it wasn't necessary. I'll have to check.

However, like you, within the block I constrained the frequency response of the filter to ensure the result was stable, and short-ish in the time domain, but picked arbitrary values which I was uncertain how to tune.

It seems to me that a good result could be achieved by taking a known good psy model, and forcing upon it the constraints needed to yield a stable filter design. But I can see that developing your own from scratch is far more fun and interesting.

There is the "advantage" with lossyWAV that there are only two constraints:
1. creating a stable filter
2. not doing something that increases the subsequent "lossless" bitrate
This means you can play with psychoacoustic parameters to your heart's content without worrying about the underlying structure of the codec. No fixed filterbank. No fixed block size (in theory at least). Though arguably constraint number 2 is a crippling unknown when trying to tune for efficiency.

Cheers,
David.

Reply #10
As for changing/interpolating filter parameters, the choice of the filter structure and parameterization makes a big difference. That's why I went for a lattice structure. I actually don't interpolate filter parameters for now, but I compute 4 different sets of filter parameters within a block based on two frequency response target curves (one target curve per block boundary). For each subblock I simply blend the two target curves together according to the subblock index and derive the filter parameters from the result (that is, I interpolate in the frequency domain rather than interpolating filter parameters). I could go further by interpolating filter parameters between these four "sub filters", but I don't expect to see significant improvements in this area.

It seems to me that a good result could be achieved by taking a known good psy model, and forcing upon it the constraints needed to yield a stable filter design.
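
The blending step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not give the blending weights, so the sketch assumes linear weights taken at subblock centres, and `blend_target_curves` is a hypothetical name. Deriving lattice filter parameters from each blended curve is not shown.

```python
def blend_target_curves(start_db, end_db, num_subblocks=4):
    # start_db / end_db: target curves (dB per frequency bin) at the two block boundaries.
    # Assumption: subblock k uses weight w = (k + 0.5) / num_subblocks for the end curve.
    curves = []
    for k in range(num_subblocks):
        w = (k + 0.5) / num_subblocks
        curves.append([(1.0 - w) * a + w * b for a, b in zip(start_db, end_db)])
    return curves


# Two made-up target curves with two frequency bins each.
start_db = [0.0, -12.0]
end_db = [-8.0, -4.0]

for k, curve in enumerate(blend_target_curves(start_db, end_db)):
    print(k, curve)
# First subblock:  [-1.0, -11.0]
# Last subblock:   [-7.0, -5.0]
```

Each of the four curves would then be turned into its own filter, so the response moves in four small steps across the block instead of one jump at the boundary.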

I totally agree.

But I can see that developing your own from scratch is far more fun and interesting.

Well, it was not my intention to develop a completely new psy model. I used fixed SNRs mainly for testing purposes. Garf was kind enough to send me code of a psychoacoustic model implementation. But I have yet to dig through it.

There is the "advantage" with lossyWAV that there are only two constraints:
1. creating a stable filter
2. not doing something that increases the subsequent "lossless" bitrate
This means you can play with psychoacoustic parameters to your heart's content without worrying about the underlying structure of the codec. No fixed filterbank. No fixed block size (in theory at least). Though arguably constraint number 2 is a crippling unknown when trying to tune for efficiency.

I wouldn't say "crippling unknown"
1 <=> the frequency response target curves (for the noise shaping filters) should have a reasonably limited slope (on a dB scale).
2 <=> the SNRs should be positive, at least 6 dB.
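
A rough Python sketch of how both conditions could be imposed on a noise target curve follows. It is an assumption-laden illustration, not the LossyWavANS method: the 6 dB-per-bin slope limit, the choice to lower peaks rather than raise valleys, and the name `constrain_noise_target` are all made up for the example.

```python
def constrain_noise_target(signal_db, target_db, min_snr_db=6.0, max_step_db=6.0):
    # Condition 2: keep the noise at least min_snr_db below the signal in every bin.
    c = [min(t, s - min_snr_db) for s, t in zip(signal_db, target_db)]
    # Condition 1: limit the slope between neighbouring bins to max_step_db.
    # Forward pass: no bin may exceed its left neighbour by more than max_step_db.
    for i in range(1, len(c)):
        c[i] = min(c[i], c[i - 1] + max_step_db)
    # Backward pass: no bin may exceed its right neighbour by more than max_step_db.
    for i in range(len(c) - 2, -1, -1):
        c[i] = min(c[i], c[i + 1] + max_step_db)
    return c


signal_db = [0.0, 0.0, 0.0, 0.0]
target_db = [-3.0, -30.0, -10.0, -10.0]
print(constrain_noise_target(signal_db, target_db))
# [-24.0, -30.0, -24.0, -18.0]
```

Because both passes only ever lower values, the result never exceeds the SNR-capped curve, at the cost of spending more bits than the unconstrained target would.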

Edit: I'm inclined to claim that the default setting is "mostly transparent". In that case the SNR offset parameter represents the "transparency safety headroom" you are willing to spend bits on. An additional dB amounts to about 10 kbps (at 44.1 kHz) and lowers the noise on average by about 1 dB (duh!) ;-).

Cheers!
SG

Reply #11
Sebastian, if I may ask: is your noise shaper related to Verhelst's and de Koning's approach (section 3), or do you compute an IIR (LPC-style) filter?

Chris
If I don't reply to your reply, it means I agree with you.

Reply #12
Sebastian, if I may ask: is your noise shaper related to Verhelst's and de Koning's approach (section 3), or do you compute an IIR (LPC-style) filter?


See

G. Schuller, B. Yu, D. Huang, and B. Edler: "Perceptual Audio Coding using Adaptive Pre- and Post-Filters and Lossless Compression", IEEE Transactions on Speech and Audio Processing, September 2002, pp. 379–390

I basically turned their "post-filter" into a noise shaper.

Reply #13
I have problems seeing the utility of doing a less-than-ideal lossy end-to-end function just to be able to use an established lossless intermediate format (if this is the concept).

Can it be compared to image coding standards like gif and png, where the format itself is lossless, but various pre-processing algos reduce the bit-depth and use dithering to make the total file-size smaller?

-k

Reply #14
I have problems seeing the utility of doing a less-than-ideal lossy end-to-end function just to be able to use an established lossless intermediate format (if this is the concept).

Can it be compared to image coding standards like gif and png, where the format itself is lossless, but various pre-processing algos reduce the bit-depth and use dithering to make the total file-size smaller?

That's a very good example! Yes, it pretty much works like that. You will also notice that if you turn dithering (e.g. Floyd-Steinberg) on, the pictures will look better, but compress worse. That's precisely what I'm deliberately avoiding by using signal-to-noise ratios above 6 dB.

Currently, I see it as an "experiment". I actually wouldn't recommend using it for any serious stuff right now, only for testing.

But lossy codecs have different target bitrates. For every design there's a sweet spot where the quality-per-bit ratio peaks. Some codecs work well at low bitrates and don't scale well above that. And some codecs work well at high bitrates and don't scale well below that. In my opinion, an approach like LossyWavANS+FLAC represents a unique trade-off between compression and quality. It sort of fills the gap between lossy transform coders (typically 250 kbps or less) and lossless coders (typically 800 kbps or more, possibly much more with 24-bit PCM signals). If you want to save bits but keep lots of "headroom" w.r.t. transparency, an approach like this might be the solution. If you aim for transparency without any "headroom", an approach like this would definitely not be the right choice.

Reply #15
Today I gave lossyWavANS a try. I wanted to test all the samples I tested yesterday using lossyWAV 1.2.2.z.
The average bitrate of lossyWavANS on my standard test set of various pop tracks is 345 kbps (using lossyWavANS with default accuracy and FLAC -8).
I started with furious. I could ABX it at seconds 0 to 0.8: 5/5 -> 7/8 -> 8/10. The deviation is very subtle and hardly audible.
I continued with eig. Unfortunately, the deviation from the original is very audible, in an annoying way.
I didn't continue because I think the issue with eig should be fixed before going into a complete ABX tour.
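
For anyone who wants to put numbers on the furious scores, the usual one-sided binomial p-value (the chance of doing at least this well by pure guessing) can be computed as below. Reading 5/5, 7/8 and 8/10 as running totals of one session is an assumption, and `abx_p_value` is just an illustrative helper name.

```python
from math import comb


def abx_p_value(correct, trials):
    # Probability of getting at least `correct` out of `trials` right by coin-flip guessing.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for correct, trials in [(5, 5), (7, 8), (8, 10)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
# 5/5: p = 0.0312
# 7/8: p = 0.0352
# 8/10: p = 0.0547
```

The final 8/10 lands just above the common 0.05 threshold, which fits the description of the difference as very subtle.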

You can download these samples from here:
eig_essence.flac and Furious.flac.
lame3995o -Q1.7 --lowpass 17

Reply #16
I can't test it myself right now, but I appreciate your efforts so far, halb27. I'll probably check your samples later in the evening.