
Encoder clipping prevention...

Hi guys, I normally lurk here, but I've been a follower of audio compression techniques for a long time now and am much in awe of the work of Frank, Ivan & Monty...

For the last couple of months I've been playing around with a new (from scratch) coder, and now that I have the psychoacoustics working reasonably well over a large range of settings, I'm at a loss over (particularly) the low-bitrate region of ~64Kbps... I'm getting such a huge number of clipped samples on the transform back to the 16-bit sample domain.. I realise this is why we need things like ReplayGain etc. etc... but I want to provide as close to non-clipped coding as possible..

I've compared many other codecs, in particular AAC & Vorbis @ 64Kbps, and AAC seems to perform amazingly well with regard to clipped samples.. Currently my codec sits between AAC & Vorbis in total clipped samples.. I've also compared against MP3 & MP2 (current Lame build and TwoLame) at higher rates; by this I mean comparing my 64Kbps stream against MP3 & MP2 at ~128Kbps + ~192Kbps (and ~256Kbps for MP2), and they both perform worse than my codec right now..

But, and this is the big but, AAC sounds quieter overall.. An ABX test proved this, not through the quality, but I was able to identify it merely through the gain of the decoded signal.. Perhaps a flaw, but I'm questioning how AAC (Nero, that is) is in some circumstances 3dB quieter, yet still clips so much..

I'm aiming to support streaming, so 2-pass methods are immediately out of the window.. Which is why I want to achieve minimum clipping without resorting to 2-pass techniques..

I've tried various techniques akin to Vorbis noise normalisation, which seem to bear little fruit, though they do help a bit... Better transient detection helped enormously, along with the respective boost in SMR for the temporal region of interest.. I'm still sadly short (by a long way!!) of achieving a one-pass compression that isn't horrendously clipped at such rates (~64Kbps)...

I can see why, but I'm lost as to how I should fix this at the various quality settings? Should I just 'assume' that at Q 0.0 the overall error is X dB and then compensate the source signal for this? Or is there some magical solution to balancing the energy (taking into account the Gibbs effect) for a given band of the spectrum?

Can anyone recommend any reading on material relating to this ?

I'd just like to add that it's an MDCT-based coder with the MPC psychoacoustic model (very heavily modified), no entropy coding as of yet, and a very simple mid/side strategy based purely upon perceptual energy... yet still able to achieve ABC/HR results surpassing Aoyumi's (sp?) Vorbis b5 @ Q0.0 at the moment..

If it wasn't for the current trend of rail-to-rail music I wouldn't worry about this too much, but the poor performance of literally all existing coders (to 16-bit output) under such circumstances seems like a failure..

In the listening tests I perform, stereo imaging is important to me, yet when I listen to AAC (Nero, sorry Ivan) the imaging is destroyed even as low as 96Kbps; at 64Kbps (I'm talking LC here!) it's disastrous... I'm curious about resources on the general public's perception of the destruction of stereo imaging? Are there any? Many years ago when I first started with audio coding, I remember showing someone how their 128Kbps MP3 lost all of the positioning of the instruments; from that point on he was aware of it and began to listen to the localised position of individual instruments in a mix, which he didn't before.. Yeah, yeah, that was a long time ago, and now 128Kbps MP3 is damn good stuff (via Lame), bar the slightly dubious ATH levels and audible cutting off of noise... Either way, not important..

How important is stereo imaging to achieving high scores in an ABC/HR test?
My gut feeling is that a destroyed stereo image is less important than a few soft/smeared hi-hat splashes?
Should I strive for quality over imaging?

Just thought I'd ask the (amazingly competent!!) brains of HA..

Encoder clipping prevention...

Reply #1
I'm not an expert in codec design by any means, but your project sounds interesting and quite promising, especially if you've managed to bring the relative simplicity of tuning the psymodel from MPC (from what I've read of Frank and Andre's websites) into a VBR-oriented transform codec that presumably has few of the inefficiencies and inflexibilities enforced by the MP3 format.

Anyway, to my point: your post mentions horrendous numbers of clipped samples many times, but on no occasion do you say whether they're audible. In my limited experience and hearing ability, you can actually get away with a surprising amount of clipping (often a few consecutive samples at 44100 Sa/s) without it being audible.
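As a crude first filter for audibility, one could count only runs of consecutive full-scale samples rather than isolated ones. A minimal sketch (my own illustrative helper, not anything from this thread; the run-length threshold is an assumption):

```python
def audible_clip_runs(samples, limit=32767, max_run=3):
    """Return (start, length) for each run of >= max_run clipped samples.

    Isolated clipped samples at 44100 Sa/s are often inaudible, so only
    longer runs are flagged; max_run is an illustrative threshold.
    """
    runs, start = [], None
    for i, s in enumerate(samples):
        if abs(s) >= limit:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= max_run:
                runs.append((start, i - start))
            start = None
    if start is not None and len(samples) - start >= max_run:
        runs.append((start, len(samples) - start))
    return runs
```

Candidate runs could then be ABXed individually, rather than treating every clipped sample as a problem.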

I believe all Dibrom's tuning on Lame --alt-preset standard (now -V2) was done with no volume reduction and certainly no ReplayGain and was done with clipped decoded peaks, albeit with audio less distorted and compressed on the original CD than today's CDs. It was considered transparent in all but very few problem samples, and I don't believe clipping was considered to be to blame for lack of transparency.

However, many of today's peak-distorted CDs might sound no worse (or even no different) with a modicum of clipping. Some are clipped already on the CD in ways that have introduced only modest levels of harmonic distortion. If those distortion harmonics are small enough to be already masked, they may be removed during encoding and "restore" the peak to an unclipped appearance. However, on decoding without ReplayGain they may well be clipped once again, causing a very similar range of distortion harmonics (usually odd-order harmonics) to the original, which would be masked to about the same degree, and hence cannot be perceived. If ReplayGain is used and the decoded peaks remain unclipped, the lack of such clipping harmonics should be inaudible according to the originally calculated masking thresholds that led to their removal.

In other cases, some of the clipping distortion harmonics present on the original CD would be unmasked and audible, so the encoder should retain those that are audible. This is quite likely to produce a decoded peak that isn't clipped or is only slightly clipped (to a degree that will be inaudible - though this is the point which needs ABXing).

If none of this clipping turns out to be audible with or without ReplayGain, your principal concern over comparative performance might only be whether any of the more dynamic and well-mastered recordings (which happen to be those that require little ReplayGain adjustment) would exhibit peaks so unnaturally extreme that they force RG with Clipping Prevention to lower the volume to much less than the Target Volume of 89 dB in situations when other codecs (and/or the original CD) manage to achieve the Target Volume without clipping.
Dynamic – the artist formerly known as DickD

Encoder clipping prevention...

Reply #2
Should I just 'assume' that at Q 0.0 the overall error is X dB and then compensate the source signal for this?


Better than nothing.

Quote
Or is there some magical solution to balancing the energy (taking into account the Gibbs effect) for a given band of the spectrum?


I was in the process of doing something similar but did not get as far as you, so this idea might be useless or stupid.
How about something like gain factors for each band, which represent the magnitude of the original frequency values in each band?
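A sketch of that idea (the function names and the energy-matching criterion are my assumptions, not a worked-out design): store one factor per band that restores the original band energy, and apply it at decode:

```python
import math

def band_gains(original, quantised, band_edges):
    """One gain per band that restores the original band energy."""
    gains = []
    for lo, hi in band_edges:
        e_orig = sum(x * x for x in original[lo:hi])
        e_quant = sum(x * x for x in quantised[lo:hi])
        gains.append(math.sqrt(e_orig / e_quant) if e_quant > 0 else 1.0)
    return gains

def apply_gains(quantised, band_edges, gains):
    """Rescale each band of the decoded spectrum by its stored gain."""
    out = list(quantised)
    for (lo, hi), g in zip(band_edges, gains):
        for i in range(lo, hi):
            out[i] *= g
    return out
```

The gains themselves would of course have to be quantised and transmitted as side info.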

Encoder clipping prevention...

Reply #3
You could detect blocks with clipped data when decoding, then adjust the scale factors accordingly and decode again until clipping no longer appears. It will increase CPU usage, and the volume won't remain constant when clipping occurs.
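Roughly like this, with `decode` standing in for the real decoder and a hypothetical per-block attenuation applied to the scale factors (step size and iteration cap are illustrative):

```python
def decode_without_clipping(block, decode, limit=32767, step_db=0.5, max_iters=12):
    """Re-decode with progressively lower scale factors until nothing clips.

    decode(block, atten_db) is a placeholder for a decoder with an extra
    attenuation knob applied to the block's scale factors.
    """
    atten_db = 0.0
    for _ in range(max_iters):
        pcm = decode(block, atten_db)
        if max(abs(s) for s in pcm) <= limit:
            break
        atten_db += step_db  # lower the level and try again
    return pcm, atten_db
```

As noted, the cost is extra decode passes, plus a volume that dips only on clipped blocks.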

Just a simple and naive idea.

Encoder clipping prevention...

Reply #4
Sorry for the delay.. Been a little busy with other things and just not found the time to reply..

I decided to do some ABX'ing of the clipped output of my coder and, okay, in 7/10 test pieces I can't tell the difference.. Though I think I've managed to turn my ears into useless cabbages with all the listening I've done on this coder recently.. At the low-quality compression levels, where the clipping is really a problem, I'm automatically identifying the compressed output merely through my recognition of the coder's characteristics and am not able to be very objective about this..

I've decided for now to have one last attempt to resolve clipping issues without having to resort to outright modification of the output levels..

It's just an idea, no idea if it's been tried before, but I'll basically (attempt to) solve the clipping issue like this: perform the inverse transform, convert back to the frequency domain, and examine the spectrum for evidence of clipping. If clipping seems prevalent, identify the band that suffers worst and overcode it by a smidge, transform back to time again, and repeat this whole clipping analysis until there's little or no clipping, or at least none identifiable in the frequency domain.. And hope this isn't going to be too much of a pain..
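That loop could be sketched as follows, where `quantise`, `inverse_mdct`, `mdct` and the `finer_band` knob are all placeholders for the codec's own machinery:

```python
def refine_until_unclipped(coeffs, quantise, inverse_mdct, mdct,
                           band_edges, limit=1.0, max_passes=8):
    """Requantise, overcoding the worst band, until the time signal fits."""
    q = quantise(coeffs)
    for _ in range(max_passes):
        pcm = inverse_mdct(q)
        if all(abs(s) <= limit for s in pcm):
            break
        # re-examine the spectrum of the decoded block and find the band
        # carrying the most excess energy relative to the source
        err = [a - b for a, b in zip(mdct(pcm), coeffs)]
        worst = max(range(len(band_edges)),
                    key=lambda b: sum(e * e
                                      for e in err[band_edges[b][0]:band_edges[b][1]]))
        q = quantise(coeffs, finer_band=worst)  # overcode that band a smidge
    return q
```

The `max_passes` cap is one simple way to stop the process spiralling out of control.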

I've got a feeling I'm wasting my time, and that the process will spiral out of control though..
At the moment the clipped output is no worse than any other coder's, and I think I'm just fussing about something that isn't really important.. It's just seeing the spectral noise of the clipping that's making me nervous.. Perhaps I can extract the clipped samples in the time domain, transform just those samples, and use the spectrum of the clipping to guide the refinement? At least that'll provide a guide as to where all the excess energy is located and allow me to identify energy levels over the source, taking into account the energy loss of the quantisation process..

One quick test I performed for another solution was to examine the energy loss on a per-band basis; if the energy differential was significant, it used the refinement stages (similar to Ogg noise normalisation I guess, but very different in practice) to improve the quantiser's output.. This works remarkably well, but the increase in bitrate is unacceptable, and the data may as well have been coded at a higher quality in the first place..
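The per-band energy check might look like this as a sketch (the 1.5 dB threshold is an illustrative value, not one from the post):

```python
import math

def bands_needing_refinement(original, quantised, band_edges, threshold_db=1.5):
    """Flag bands whose quantised energy fell significantly below the source."""
    flagged = []
    for b, (lo, hi) in enumerate(band_edges):
        e_orig = sum(x * x for x in original[lo:hi])
        e_quant = sum(x * x for x in quantised[lo:hi])
        if e_orig > 0 and e_quant > 0:
            loss_db = 10 * math.log10(e_orig / e_quant)
            if loss_db > threshold_db:
                flagged.append(b)
        elif e_orig > 0 and e_quant == 0:
            flagged.append(b)  # band was zeroed entirely
    return flagged
```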

On the matter of using gain factors to restore levels: I did try this originally, I had all the framework in place essentially to just drop it in, and the results were unpleasant from the very first fully debugged listen, so I never revisited it.. Now, with the refinement stages in place, which attempt to balance the energy in a guided way, the results would probably be a lot better.. Hmmm, I think I'll give this a play.. The results of the refinement stages are not pleasant to look at, but they do sound very good imho; they achieve a similar effect to Ogg's noise normalisation, but in a very different way...

A few more improvements have been made for lower bitrates, but I think I'm about as far as I can get with a traditional mid/side model, and will now investigate intensity stereo coding to push the rates down further.. But the goal was to achieve high-quality stereo imaging at ~64Kbps, which is just about there; this is mid/side with no trimming at ~64Kbps.. There are a few more Kbps to be had, I think, since all the side info for the channels is coded in isolation, and I think there's some benefit to coupling all of that in many cases.. Channel-independent allocation & scale factors eat up ~18Kbps on 128Kbps streams..

There's also some more tuning to be done on the psychoacoustics for each quality setting, since at the moment it's freely coding whatever the psychoacoustic model deems fit over the full spectrum. There's no specific cutoff, which I like, but I think for sanity I should limit each setting to some specific bandwidth.. But again, I'm not sure.. The transient detection code is doing a really marvellous job on the high end, and I find the results more pleasing with an open-ended cutoff, but I guess that's just me enjoying seeing the code do 'the right thing', and it's not going to lead to any higher scores in tests; if anything, those bits could probably be put to better use across the spectrum if point scoring is all that matters..

I also finally got around to implementing Huffman coding for this whole contraption, yet surprisingly I'm finding very little advantage over the brute-force (and slow!) bit-coding methods I've implemented.. Even generating optimal Huffman tables on a per-file basis gives me almost no advantage, which puzzles me a lot.. In the higher realms of ~128Kbps, I'm seeing maybe a 1-2Kbps benefit from the Huffman coding..

I implemented a simple bit coder for reference, and the Huffman coding gives me a ~20% gain over that, so the implementation seems correct; it's just that the optimal bit coder, with all of its additional coding tools, is on par with the optimal Huffman coding and costs nothing in terms of decode power (which was one of the biggest points of this coder in the first place), but is very slow at encode time.. Currently the optimal bit coder is only 3x realtime, whereas the basic guts run at 30x..

Anyway, enough rambling for this morning..
Happy Easter, people

Encoder clipping prevention...

Reply #5
Did you try something with arithmetic coding (or a range coder) for the entropy coding? This should give even better compression at low bitrates.

Encoder clipping prevention...

Reply #6
I haven't yet.. I find the arithmetic coder stuff a bit mind-boggling, and there are also the patent issues with this method, which I believe still apply..

I've tried both Rice coding and Golomb coding as additional backend coders.. Both gave good results, especially as the spectrum becomes sparser, but the brute-force bit coder still exceeded their performance..

There's more investigation to be done still, but at the moment the actual coder mechanics are the priority; the backend coder is essentially independent of everything else, and it's very easy to plug in new coding forms as and when, maybe even hopping between different coders on a frame-by-frame basis, whichever gives the lowest bit count..

Encoder clipping prevention...

Reply #7
You got me curious with this talk of arithmetic coders, so I thought I'd measure the actual entropy of some test cases, and the average rates too:

Golomb Coder:   80416.214 bps
Huffman Coder:   75326.896 bps
Custom Bit Coder:   71868.565 bps
Theoretical Entropy:   73515.022 bps

This is over the 13 piece set that was used in some test (iirc) a while ago on HA..
   I - 2.01. 41_30sec.flac
   I - 2.02. DaFunk.flac
   I - 2.03. EnolaGay.flac
   I - 2.04. experiencia.flac
   I - 2.05. fourbros.flac
   I - 2.06. getiton.flac
   I - 2.07. Leahy.flac
   I - 2.08. LifeShatters.flac
   I - 2.10. Paris_Combo.flac
   I - 2.11. TomsDiner.flac
   I - 2.12. trust.flac
   I - 2.13. Waiting.flac

So my Huffman coding was fairly bang on then..
But for obvious reasons I still prefer my custom bit coder, which I think I'll be sticking with..
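For reference, a "theoretical entropy" figure like the one above can be obtained as the order-0 Shannon entropy of the symbol stream, scaled by the symbol rate (whether this matches the measurement method actually used here is my assumption):

```python
import math
from collections import Counter

def entropy_bps(symbols, symbols_per_second):
    """Order-0 Shannon entropy of a symbol stream, in bits per second."""
    counts = Counter(symbols)
    total = len(symbols)
    bits_per_symbol = -sum((n / total) * math.log2(n / total)
                           for n in counts.values())
    return bits_per_symbol * symbols_per_second
```

Note that an order-0 figure is a floor only for memoryless coders; a coder exploiting context can go below it, which would be consistent with the custom bit coder landing under the "Theoretical Entropy" row above.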

Encoder clipping prevention...

Reply #8
Range coding is patent-free as far as I know. Doesn't the entropy depend on the model?
You can extend the Golomb/Rice coder easily if you code the most significant part with a static order-0 model and code the least significant part raw.
I expected the Huffman coder to be better at higher compression; how does the bit coder actually work?
What programming language did you choose? If you like, I can provide some basic C++ classes showing entropy coding with a range coder.
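For what it's worth, the split being suggested can be sketched like this: a Rice code with parameter k emits a unary quotient (the most significant part, which the suggestion would feed to a static order-0 entropy coder instead) followed by k raw least-significant bits:

```python
def rice_encode(value, k):
    """Unary quotient, '0' terminator, then k raw LSBs (as a bit string)."""
    q, r = value >> k, value & ((1 << k) - 1)
    return '1' * q + '0' + (format(r, f'0{k}b') if k else '')

def rice_decode(bits, k):
    """Inverse of rice_encode for a single value at the start of `bits`."""
    q = 0
    while bits[q] == '1':
        q += 1
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r
```

The quotient distribution is roughly geometric, which is exactly where a small order-0 model tends to beat plain unary.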


Encoder clipping prevention...

Reply #9
I decided to do some ABX'ing of the clipped output of my coder and, okay, in 7/10 test pieces I can't tell the difference.. Though I think I've managed to turn my ears into useless cabbages with all the listening I've done on this coder recently.. At the low-quality compression levels, where the clipping is really a problem, I'm automatically identifying the compressed output merely through my recognition of the coder's characteristics and am not able to be very objective about this..


I know you've probably got bigger fish to fry, but I'd suggest you can test clipping at low quality levels by pre-scaling the input PCM by -6 dB before you encode it (e.g. use wavgain or SSRC with --att 6) to provide enough headroom to eliminate clipping.

After encoding and decoding back to lossy PCM, scale one version up by 6 dB to make it clip then scale it back down (-6 dB).

Then try to ABX the clipped lossy decode against the unclipped lossy decode. Both will suffer identical compression artifacts (both being derived from the same lossy source) but only one will suffer clipping, and it's only the clipping that should enable you to identify a difference.
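In plain 16-bit sample terms, the procedure amounts to something like this sketch (in practice the scaling would be done with the tools suggested above, and the codec step happens externally):

```python
def scale_db(samples, db, limit=32767):
    """Scale 16-bit samples by `db` decibels, clipping at the 16-bit rails."""
    g = 10 ** (db / 20)
    return [max(-limit - 1, min(limit, round(s * g))) for s in samples]

# The procedure, with encode/decode as an external placeholder step:
#   headroom = scale_db(original, -6.0)              # pre-scale before encoding
#   lossy    = codec_under_test(headroom)            # encode + decode
#   clipped  = scale_db(scale_db(lossy, 6.0), -6.0)  # clips at the rails, then back down
# ABX `clipped` against `lossy`: identical artifacts, only one has clipping.
```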
Dynamic – the artist formerly known as DickD