Pathological example of a intersample peak, 11dB, discussion.

Pathological example of an intersample peak that was artificially created:

~0dB peak, ~-20dBFS RMS (squarewave), +10.87dB intersample peak, 44.1kHz, 32-bit float.

http://www.hydrogenaudio.org/forums/index....showtopic=98752

Please keep any discussion of the test sample in this thread, rather than where it's simply "stored".


The problem:
When oversampled, the true peak is revealed to be almost +11dB.
A DAC would need 11dB of headroom (or, rounding up, ~12dB, which equals 2 bits) to handle this wav correctly.
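As a quick check of the "~12dB equals 2 bits" figure: each bit of a fixed-point word doubles the representable amplitude, so one bit is worth 20·log10(2) ≈ 6.02dB. A minimal Python sketch (the helper name is only illustrative):

```python
import math

# Each extra bit doubles the representable amplitude: 20*log10(2) ≈ 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def headroom_bits(headroom_db: float) -> float:
    """How many bits of a fixed-point word a given amount of headroom uses up."""
    return headroom_db / DB_PER_BIT


for db in (10.87, 12.0):
    print(f"{db:5.2f} dB of headroom = {headroom_bits(db):.2f} bits")
```

So the +10.87dB peak needs a little over 1.8 bits, and 12dB of headroom uses just under 2 full bits.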

The solution?:
A "quick fix" for a 24-bit (or float) audio chain would be to reduce the volume by 12dB somewhere.
The volume loss can later be compensated for by simply increasing the analog volume (the user turning the knob a little higher).

*** The rest is somewhat opinionated. ***


Thoughts:
As such, the "bottom 3 bits" of audio could be considered expendable: 2 bits to handle pathological intersample peaks, plus 1 bit for quantization/noise floor/dither.
A "24-bit" DAC would have no issues; 21 bits is a lot to work with. Likewise, a "20-bit" DAC would still have 17 bits to work with.
Ideally the 11 (or 12) dB volume reduction would be done by the DAC just before the reconstruction stage.

Issues?:
A DAC with 12dB of headroom would sound quieter than most other DACs, so one would need to turn up the playback volume.
The noise floor of the amplifier and other parts of the equipment/audio chain is also an issue.
But even "cheap" gear has a noise floor of around -80dBFS to -100dBFS.
Also consider that a normal living room can easily have a noise floor of +50dB SPL, so losing the quietest 12dB or so of audio is not an issue.

Taking CD audio as an example, a 12dB digital attenuation would cause the content in the -96dBFS to -84dBFS range to be lost if the result is kept at 16 bits.
The loss can be avoided by simply passing the 16-bit audio on as 24-bit or 32-bit float instead.
Under Windows Vista, Windows 7 and Windows 8, all audio is converted to 32-bit float internally, so this is a non-issue there.
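To illustrate why the word length after the volume change matters, here is a minimal Python sketch. It models a -12dB digital volume change as a multiply followed by plain rounding to the output word length (no dither, an assumption made for simplicity) and counts how many distinct levels the smallest 16-bit codes still map to. The helper name `attenuate` is hypothetical.

```python
def attenuate(codes, gain_db, out_bits=None):
    """Scale 16-bit integer codes by gain_db.

    With out_bits set, round to that word length (plain rounding, no dither);
    with out_bits=None, keep full floating-point precision.
    """
    gain = 10 ** (gain_db / 20)
    scaled = [c / 32768 * gain for c in codes]  # normalized: full scale = 1.0
    if out_bits is None:
        return scaled
    q = 2 ** (out_bits - 1)
    return [round(x * q) / q for x in scaled]


# The smallest 16-bit codes, i.e. content that only uses the bottom 2 bits.
codes = list(range(-3, 4))

for label, out_bits in (("16-bit", 16), ("24-bit", 24), ("float", None)):
    levels = len(set(attenuate(codes, -12, out_bits)))
    print(f"{label:>6}: {len(codes)} input levels -> {levels} output levels")
```

Rounded back to 16 bits, the seven quietest levels collapse to three; kept at 24 bits or in float, all seven survive.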

How to avoid intersample peaks on gear without the needed headroom?:
On Windows you can simply make sure that you never raise the volume (in Windows) above -12dB (~45% volume),
and use the analog volume knob (if there is one on your system or gear) instead.

11dB really?
Yep! Then again this is a pathological example.
"Normally" the intersample peak is within 1dB of the digital peak, and in some rare cases up to 2 to 3dB higher.
If you make/master music, then the final mix/pressing master/encoding/exporting should have 2 or 3dB headroom.
So as long as no peaks go above 3dB you should be pretty darn safe from causing any clicks or distortion for the end user.

The example here is a pure spike, and humans tend not to like to listen to pops, clicks, static, test tones, or similar.
So encountering anything like this "in the wild" is very rare.

Is it really that bad?:
Please remember that intersample peaks do not damage equipment; at least I've never heard or read about that happening, and the CD has been around for decades.
So if this were a practical issue, we'd have heard about it a long time ago as equipment got fried, etc. And we'd have had a solution years ago as well.

The only thing it does is damage the audio quality, that is if you actually can hear/notice it at all. You are more likely to hear crackling/distortion from overly compressed music.
And ironically it is that type of overly compressed music that has the most intersample peaks that go above 0dBFS.
Solution? Stop compressing the hell out of music. Use 20dB or more headroom and intersample peaks will most likely never be an issue.


Reply #1
Pathological example of an intersample peak that was artificially created:

~0dB peak, ~-20dBFS RMS (squarewave), +10.87dB intersample peak, 44.1kHz, 32-bit float.


Discussion thread for this sample: http://www.hydrogenaudio.org/forums/index....showtopic=98753

The red line is 0dBFS and the white line is -1dBFS. Even without oversampling, one can see that the intersample peak is at ~+8.5dBFS, and it settles at +10.87dBFS after 2x (or more) oversampling.
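The thread's sample was hand-edited, but a much milder intersample peak is easy to reproduce with an ordinary band-limited signal. The Python sketch below (parameters chosen for illustration, not taken from the test file) samples a sine at a quarter of the sample rate with a 45° phase offset, so every sample lands at ±0.707 of the sine's amplitude; after normalizing the samples to 0dBFS, the waveform between them peaks at about +3dBFS.

```python
import math

fs = 44100.0
f = fs / 4           # quarter of the sample rate: 4 samples per cycle
phase = math.pi / 4  # 45 degrees: every sample lands at ±sin(45°)


def signal(t_samples: float) -> float:
    """The continuous sine, with time measured in sample periods."""
    return math.sin(2 * math.pi * f * t_samples / fs + phase)


samples = [signal(n) for n in range(64)]
scale = 1 / max(abs(s) for s in samples)  # normalize so samples peak at 0 dBFS

# The sine is below Nyquist, so an ideal reconstruction of the (infinitely long)
# sampled sine is the sine itself; evaluate it on a 64x finer time grid.
fine = [scale * signal(k / 64) for k in range(64 * 64)]

sample_peak_db = 20 * math.log10(max(abs(scale * s) for s in samples))
true_peak_db = 20 * math.log10(max(abs(v) for v in fine))
print(f"sample peak {sample_peak_db:+.2f} dBFS, true peak {true_peak_db:+.2f} dBFS")
```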


Does this ever happen in the real world or does the equipment have to be broken to do this?


Reply #2
http://forums.digitalspy.co.uk/showthread....25#post56472825
Quote from: Martin Watkins
From 1999 to 2003 the BBC adopted the very sensible policy on DSat of leaving about 12 dB of headroom above PPM 6 so that if peaks did slip through they were well handled.



Reply #3
At first glance it seems that this test signal isn't properly band-limited and therefore isn't a valid signal in the strict sense. Although signals like this can be created in the digital domain, they won't appear in the output of an ADC with proper anti-aliasing filtering.


Reply #5
While we're quoting past debates, I remember one where I said that if someone releases audio with such huge inter-sample overs, then those audio fans who value the intentions of the original engineer should leave them as they are - the original engineer, in all likelihood, monitored a sound with horrible inter-sample clipping from a standard DAC which clips inter-sample overs.

The EBU loudness group is recommending to stay below -1 dBTP - i.e. measure the maximum inter-sample over, and ensure it is below -1dBFS.
See section 3.4 of https://tech.ebu.ch/docs/tech/tech3344.pdf
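Checking against that limit means estimating the peak between samples, not just reading the sample values. EBU R 128's true-peak measurement is based on ITU-R BS.1770, which describes a 4x oversampling meter. The Python sketch below follows the same idea but uses an assumed Hann-windowed sinc interpolator (not the reference filter from BS.1770); the function names are hypothetical.

```python
import math


def windowed_sinc(x: float, half_width: int) -> float:
    """Hann-windowed sinc interpolation kernel, zero for |x| >= half_width."""
    if abs(x) >= half_width:
        return 0.0
    if x == 0:
        return 1.0
    sinc = math.sin(math.pi * x) / (math.pi * x)
    hann = 0.5 * (1 + math.cos(math.pi * x / half_width))
    return sinc * hann


def true_peak_dbtp(samples, oversample=4, half_width=32):
    """Estimate the true peak: the on-sample peak, plus (oversample - 1)
    interpolated points after each sample using the kernel above."""
    n_len = len(samples)
    peak = max(abs(s) for s in samples)  # on-sample peak
    for n in range(n_len):
        for k in range(1, oversample):
            t = n + k / oversample
            lo = max(0, math.ceil(t - half_width))
            hi = min(n_len, math.floor(t + half_width) + 1)
            value = sum(samples[m] * windowed_sinc(t - m, half_width)
                        for m in range(lo, hi))
            peak = max(peak, abs(value))
    return 20 * math.log10(peak) if peak > 0 else float("-inf")


def meets_limit(samples, limit_dbtp=-1.0):
    """True if the estimated true peak stays below the limit (-1 dBTP here)."""
    return true_peak_dbtp(samples) < limit_dbtp
```

Raising `oversample` moves the estimate closer to the peak of the chosen interpolator, but as later posts in this thread point out, the answer still depends on which interpolation filter you pick.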

They're doing some other great work which I keep meaning to report on.

Cheers,
David.

Reply #6
When designing Benchmark's new DAC2 HGC D/A converter, we chose to add 3.5 dB of digital headroom to accommodate inter-sample overs.  We are working with a 32-bit fixed-point conversion system, and a 32-bit fixed-point gain control.  The conversion subsystem has a 133 dB SNR, so we can afford to throw away 3.5 dB of SNR to eliminate the clipping of inter-sample overs.

A survey of our in-house music library showed inter-sample overs reaching peak levels of +1.5 to +2 dBFS worst-case.  However, please note that our entire library is ripped in lossless formats.  I suspect that inter-sample overs could be higher in amplitude, and more frequent, when the audio is reconstructed with an MP3 decoder.  Does anyone have test results for MP3 audio sources?

I believe 3.5 dB of headroom above 0 dBFS is sufficient to handle all continuous waveforms, including square waves.  Can anyone provide examples or calculations to prove otherwise?

3.5 dB of headroom should also be more than sufficient to handle music (but my tests are limited to lossless rips at standard sample rates between 44.1 and 192 kHz).

The example cited in this thread is a high-amplitude, high-frequency transient - something we are unlikely to see in a typical recording.  It should not be necessary to provide the full 11 dB of headroom required for this pathological example.
John Siau
Vice President
Benchmark Media Systems, Inc.
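The headroom trade-off is easy to put into numbers. This Python sketch is only an illustration under assumptions (the post doesn't say exactly where in the chain the 3.5 dB is applied): it pre-attenuates a reconstructed peak by the headroom, quantizes it to saturating 32-bit fixed point, and reports whether it would have hit the rails. The function names are hypothetical.

```python
def to_q31(x: float) -> int:
    """Quantize a normalized value (full scale = 1.0) to saturating 32-bit fixed point."""
    return max(-(2 ** 31), min(2 ** 31 - 1, round(x * 2 ** 31)))


def clips(peak_dbfs: float, headroom_db: float) -> bool:
    """True if a reconstructed peak of peak_dbfs would hit the 32-bit rails
    after first attenuating the signal by headroom_db."""
    x = 10 ** ((peak_dbfs - headroom_db) / 20)
    return to_q31(x) != round(x * 2 ** 31)


SNR_DB = 133.0  # conversion-subsystem SNR quoted in the post
for peak in (1.5, 2.0, 3.0):
    print(f"+{peak} dBFS peak: clips without headroom={clips(peak, 0.0)}, "
          f"clips with 3.5 dB headroom={clips(peak, 3.5)}")
print(f"SNR left after 3.5 dB of headroom: {SNR_DB - 3.5} dB")
```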

Reply #7
Quote
I suspect that inter-sample overs could be higher in amplitude, and more frequent, when the audio is reconstructed with an MP3 decoder.  Does anyone have test results for MP3 audio sources?


This is true. For Rockbox we needed an extra dB or two IIRC for lossy audio to account for rounding errors (since we wrap around rather than clip!).

That said, if your system can clamp rather than wrap around, I doubt lossy is a big concern. I've never seen evidence that clipping the quantization error added by mp3 is audible.
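The difference between wrapping and clamping is easy to see on 16-bit integers. A minimal Python sketch with hypothetical helper names (it is not Rockbox's code): a decoder output slightly above full scale becomes a large value of the opposite sign when it wraps, which is heard as a loud click, whereas clamping just flattens the top of the peak.

```python
def wrap_int16(x: int) -> int:
    """Two's-complement wrap-around, as an unchecked 16-bit store behaves."""
    return ((x + 32768) % 65536) - 32768


def clamp_int16(x: int) -> int:
    """Saturate to the 16-bit range instead of wrapping."""
    return max(-32768, min(32767, x))


overshoot = 32767 + 200  # e.g. a decoded value slightly above full scale
print("wrapped:", wrap_int16(overshoot))   # jumps to a large negative value
print("clamped:", clamp_int16(overshoot))  # stays at positive full scale
```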

Reply #8
I've seen +4.8dB intersample peak in a song, but it is in mp3 format.

Legal file by an indie artist, feel free to download the whole song

http://zonble.net/MIDI/orz.mp3

EDIT: To be precise, the +4.8dB peak is what the file shows after decoding the mp3 to 32-bit float wav, so the case may not be directly relevant to this discussion.

Reply #9
I don't see how on-sample values above 0dB FS in an mp3 file are relevant to the DAC - they can't get to the DAC (unless the mp3 decoder is built in to the DAC, or you re-define "FS").

I guess some (most?) clipressed/trashed music, encoded to mp3, decoded and re-clipped in/after the decoder at 0dB FS, and then fed to a DAC may have higher inter-sample overs than the same music in original lossless form - but I wonder if the worst case inter-sample overs all come from mp3 encoded tracks?

Beware of mp3s of unknown provenance - some people increase the gain of their mp3s after encoding, meaning you have something that could never have been encoded from LPCM at that level.

Cheers,
David.


Reply #11
Quote
A survey of our in-house music library showed inter-sample overs reaching peak levels of +1.5 to +2 dBFS worst-case.  However, please note that our entire library is ripped in lossless formats.  I suspect that inter-sample overs could be higher in amplitude, and more frequent, when the audio is reconstructed with an MP3 decoder.  Does anyone have test results for MP3 audio sources?

I believe 3.5 dB of headroom above 0 dBFS is sufficient to handle all continuous waveforms, including square waves.  Can anyone provide examples or calculations to prove otherwise?

3.5 dB of headroom should also be more than sufficient to handle music (but my tests are limited to lossless rips at standard sample rates between 44.1 and 192 kHz).


I'm seeing similar figures mentioned elsewhere too: normally you would never see more than +3. Usually the ISP is the same as the sample peak or +1 to +2dB higher, very rarely +3, and above +3 almost never, so 3.5 is a good margin. Anything above that is either a test/synthetic signal like my test sample, or a single (or similar) pop that an end user rarely hears (usually due to data corruption during transmission). Vinyl, though (being not just analog but mechanical), could cause high intersample peaks by mistake (vinyl music is usually 40Hz-16kHz).

I'm curious about raw-number tests as well, but I can't find an R128 scanner with true-peak measurement and log generation for later processing.
At least I have not found such a tool; I'd be happy to scan and provide the results, as I'm sure others would be.

I tried upsampling with Sox (if that's the correct way; or is the rate effect the better option?). Even if it's correct, the stats option seems to clip the peak at full scale, and there seems to be no way to change that. Using the vol option to reduce by 12dB before upsampling does make things better, but -5.20dB (+6.80dB once the 12dB is added back) is still nowhere close to the actual 10.87dB.
It's a shame, as the stats that Sox outputs are otherwise pretty nice.

Reply #12
$ sox InterSamplePeak.wav -n gain -11.9 stats

Pk lev dB      -12.00


$ sox InterSamplePeak.wav -n gain -11.9 rate 441k stats

Pk lev dB      -5.24


$ sox InterSamplePeak.wav -n gain -11.9 rate -vb 99.7 441k stats

Pk lev dB      -0.08
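Why do the two rate settings report such different peaks (-5.24 vs -0.08)? One plausible explanation, offered as an assumption rather than something tested here, is the interpolation filter's passband width: if much of the signal's energy sits close to Nyquist, a narrower passband removes more of it. SoX's rate effect exposes this as the -b (bandwidth) option used in the last command. The Python sketch below uses a hypothetical Hann-windowed sinc with an adjustable cutoff (not SoX's actual filter) on a synthetic ±1 sequence with its energy concentrated near Nyquist, and compares the reconstructed value halfway between the two centre samples.

```python
import math


def sinc(u: float) -> float:
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def lowpass_kernel(x: float, cutoff: float, half_width: int) -> float:
    """Hann-windowed sinc low-pass; cutoff is a fraction of Nyquist (1.0 = full band)."""
    if abs(x) >= half_width:
        return 0.0
    hann = 0.5 * (1 + math.cos(math.pi * x / half_width))
    return cutoff * sinc(cutoff * x) * hann


# Hypothetical ±1 test sequence (0 dBFS sample peak): its signs follow
# sinc(0.5 - n), so with a full-band filter every tap adds up at t = 0.5.
samples = {n: (1.0 if sinc(0.5 - n) > 0 else -1.0) for n in range(-63, 65)}


def value_at(t: float, cutoff: float, half_width: int = 64) -> float:
    """Reconstructed value at time t (in sample periods) for a given passband."""
    return sum(x * lowpass_kernel(t - n, cutoff, half_width)
               for n, x in samples.items())


for cutoff in (0.95, 0.997):
    v = value_at(0.5, cutoff)
    print(f"passband {cutoff:.1%} of Nyquist: {20 * math.log10(abs(v)):+.2f} dBFS at t = 0.5")
```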

Reply #13
Quote
it is still nowhere close to the actual 10.87dB.

What is "actual" here? It depends on what interpolation formula you use. I could say that your figure of 10.87dB is nowhere close to actual 15.59dB (obtained by using unwindowed sinc interpolation). This is as actual as it gets.

http://imgur.com/uQctz
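For reference, "unwindowed sinc interpolation" means evaluating the Whittaker-Shannon sum directly, with every sample contributing to every output point. A minimal and deliberately slow Python sketch (function names are illustrative); it treats the signal as zero outside the given samples and only checks the peak on a grid between the first and last sample, so it approximates the true maximum rather than guaranteeing it.

```python
import math


def sinc(u: float) -> float:
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def sinc_interpolate(samples, t: float) -> float:
    """Whittaker-Shannon reconstruction at time t (in sample periods),
    treating the signal as zero outside the given samples."""
    return sum(x * sinc(t - n) for n, x in enumerate(samples))


def reconstructed_peak_db(samples, factor: int = 16) -> float:
    """Peak of the reconstruction, checked at `factor` points per sample period."""
    points = (k / factor for k in range((len(samples) - 1) * factor + 1))
    peak = max(abs(sinc_interpolate(samples, t)) for t in points)
    return 20 * math.log10(peak)


# Two full-scale samples: the reconstruction halfway between them is 4/pi.
print(f"{reconstructed_peak_db([1.0, 1.0]):+.2f} dBFS")
```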

Reply #14
Quote
it is still nowhere close to the actual 10.87dB.

What is "actual" here? It depends on what interpolation formula you use. I could say that your figure of 10.87dB is nowhere close to actual 15.59dB (obtained by using unwindowed sinc interpolation). This is as actual as it gets.


Nice! Do you have a test wav? After all, that is the purpose of my test wav: to see how/if the intersample peak is detected by software, and if so, what it's measured at. The +10.87dB is from Adobe Audition 1.5, with the 999 quality setting and post-processing off, tested with 2, 4, 8, 16, 32 and 64x resampling. (Note! Audition 1.5 crashed when trying to resample to 128x, but the other checks were very consistent, i.e. the same, +/- 0.01dB.)
If you meant a peak scanner showed +15.59dB, then I'm very curious indeed as to what tool that is. If it's a sample you are able to generate, then that is awesome, as the highest I could make was 10.87dB, though that was hand-edited rather than mathematically generated. Color me curious.

 

Reply #15
Quote
I just generated another synthetic example waveform, but yes, we can hardly see this in real life; it's for discussion only.

Analog loopback of the above waveform with my soundcard:


Wow! Although the intersample peaks do not look that bad, I'm more surprised by what is going on with the waveform itself; it seems to be increasing in volume. What is actually going on there with your gear?

Reply #16
Quote
Nice! Do you have a test wav? After all, that is the purpose of my test wav: to see how/if the intersample peak is detected by software, and if so, what it's measured at. The +10.87dB is from Adobe Audition 1.5, with the 999 quality setting and post-processing off, tested with 2, 4, 8, 16, 32 and 64x resampling. (Note! Audition 1.5 crashed when trying to resample to 128x, but the other checks were very consistent, i.e. the same, +/- 0.01dB.)
If you meant a peak scanner showed +15.59dB, then I'm very curious indeed as to what tool that is. If it's a sample you are able to generate, then that is awesome, as the highest I could make was 10.87dB, though that was hand-edited rather than mathematically generated. Color me curious.

Example wav is here: http://filesmelt.com/dl/upsample1.wav (88200 Hz, 32-bit float). It should be your wav upsampled using windowed sinc with a very large window - it almost doesn't affect the peak value. Check it yourself; I can't trust myself. I just had a funny bug when constructing the filtering kernel for 2x upsampling - I stuffed zeros in every second value. Though it doesn't change the peak value here.
As I said already, peak value entirely depends on the interpolation filter used in the upsampler. Though no reasonable filter I can think of (not many) should give more than 15.59dB here. If you make your sample longer ( I don't mean stuffing it with zeros at beginning or end), you can get even bigger peak with a suitable upsampler.
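The last point can be made concrete. With an unwindowed sinc, a ±1 sequence (sample peak exactly 0dBFS) whose signs match sinc(0.5 - n) makes every term add up at t = 0.5, so the reconstructed value there is (4/π)(1 + 1/3 + 1/5 + ...), a sum that keeps growing, slowly (logarithmically), as the sequence gets longer. A minimal Python sketch; this hypothetical sequence is chosen for illustration and is not the thread's test file.

```python
import math


def sinc(u: float) -> float:
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def midpoint_value(half_len: int) -> float:
    """Reconstructed value at t = 0.5 for 2*half_len samples (n = -half_len+1 .. half_len),
    each ±1 with the same sign as sinc(0.5 - n)."""
    total = 0.0
    for n in range(-half_len + 1, half_len + 1):
        x = 1.0 if sinc(0.5 - n) > 0 else -1.0  # sample value, |x| = 1 (0 dBFS)
        total += x * sinc(0.5 - n)               # every term is positive
    return total


def closed_form(half_len: int) -> float:
    """(4/pi) * (1 + 1/3 + ... + 1/(2*half_len - 1))."""
    return 4 / math.pi * sum(1 / (2 * k + 1) for k in range(half_len))


for half_len in (1, 4, 16, 64, 256, 1024):
    v = midpoint_value(half_len)
    print(f"{2 * half_len:5d} samples: {20 * math.log10(v):+6.2f} dBFS at t = 0.5")
```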

Reply #17
Quote
I just generated another synthetic example waveform, but yes, we can hardly see this in real life; it's for discussion only.

Analog loopback of the above waveform with my soundcard:


Wow! Although the intersample peaks do not look that bad, I'm more surprised by what is going on with the waveform itself; it seems to be increasing in volume. What is actually going on there with your gear?


My soundcard works fine. It gives excellent results in RMAA.
You will get a similar result if you upsample the original file with good resamplers like sox, Adobe Audition, foobar's PPHS and so on; try it yourself.

Reply #18
Quote
Example wav [...] It should be your wav upsampled using windowed sinc with a very large window - it almost doesn't affect the peak value.


That is not the same waveform; whatever upsampling method you used, it significantly altered the waveform. Audition retained the original waveform shape regardless of what sample rate it resampled to.


Reply #19
Quote
Example wav [...] It should be your wav upsampled using windowed sinc with a very large window - it almost doesn't affect the peak value.


That is not the same waveform; whatever upsampling method you used, it significantly altered the waveform. Audition retained the original waveform shape regardless of what sample rate it resampled to.

Before we can have a meaningful discussion, we need to agree about terms. Specifically:
- what is "true value" between samples. It can only be obtained by interpolation. Many different interpolation filters are possible. You seem to think that what's built into Audition is Final Truth™. I disagree - it's only a practical compromise. In my opinion, if we are to use the term "true value", we should define it as the value given by the Whittaker-Sannon interpolation formula, as it offers perfect reconstruction for signals satisfying the conditions of the sampling theorem.
- what is "waveform shape" and when it becomes sufficiently different. Your own post says that before upsampling the amplitude is 8.5dB, and 10.87dB after. I think that's sufficiently different.
Please also do a null test.
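For reference, the Whittaker-Shannon interpolation formula in its usual form, with T = 1/f_s the sampling period. It reconstructs x(t) exactly when x has no energy at or above f_s/2; with that definition, the true peak is the maximum of |x(t)| over all t:

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
\qquad
\text{true peak} = \max_{t} \lvert x(t) \rvert
```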

Reply #20
Quote
Example wav [...] It should be your wav upsampled using windowed sinc with a very large window - it almost doesn't affect the peak value.
That is not the same waveform; whatever upsampling method you used, it significantly altered the waveform. Audition retained the original waveform shape regardless of what sample rate it resampled to.
Before we can have a meaningful discussion, we need to agree about terms. Specifically:
- what is "true value" between samples. It can only be obtained by interpolation. Many different interpolation filters are possible. You seem to think that what's built into Audition is Final Truth™.

Stop acting like a nincompoop, I never said final truth or anything like that, please do not try to imply I said something that I did not actually say. As for talking about Audition please see further below.

Quote
if we are to use the term "true value", we should define it as the value given by the Whittaker-Sannon interpolation formula, as it offers perfect reconstruction for signals satisfying the conditions of the sampling theorem.

Again, I never said true value. I said true peak, a term I did not make up but which the industry has been using for quite some years now; do not point the finger at me on that one.
I also assume you mean Whittaker-Shannon? I am not familiar with that, nor do I have a practical way to test it. And be careful about claiming "perfect"; people get spanked for less on HA.

Quote
- what is "waveform shape" and when it becomes sufficiently different. Your own post says that before upsampling the amplitude is 8.5dB, and 10.87dB after. I think that's sufficiently different.


Yes! Because peak detection of intersample peaks is only possible after upsampling. The 8.5 was from looking at the waveform rendering. After upsampling, it matches visually (on the dB scale) with the peak analysis.

As for the waveform shape and my referring to Audition, it's simple. I created the test wav by hand in Audition. I also tested upsampling (999 quality, and no pre/post filter). If any pre/post filtering is done, the waveform is altered.

To see what I mean look at this:
Original 44.1KHz waveform http://imageshack.us/photo/my-images/163/originaly.png/
Upsampled/resampled to 88.2kHz with Audition 1.5, 999 quality, no pre/post filter. http://imageshack.us/photo/my-images/571/upsampled.png/
And this is yours http://imageshack.us/photo/my-images/96/upsampledb.png/

To my eyes it is clear which upsample simply interpolated, and which actually altered the shape of the wave.

Quote
Please also do a null test.


I just did, but Audition is being an ass: it seems that regardless of what settings I'm using, it insists on reducing the intersample peak to around 0dB. (Normally I'd welcome this, but it ruins a null test.)
Regardless, the result is a waveform shape almost identical to the first two (with the variation being how extreme the intersample peak actually is).
Note! There is no windowing setting here (or if there is, it's fixed); not sure if that is significant.

I'm not sure what you are getting so upset about here. But we're getting slightly off topic now. This thread is not about Audition.
This thread is about extreme intersample peaks, an example of such for test cases, detecting them, and the headroom needed to handle them (if any is needed at all).

I'd also like to see an EBU R128 compliant test of the original wav, as I'm curious how (and by how much) it detects the peak with its 4x upsampling true-peak detection, but sadly Sox does not have this and I'm still trying to find a tool that does.

And don't get me wrong, I welcome other test wavs or examples of extremes, especially if they are even more extreme than my wav.
It's just that you claimed that your wav was an upsample of mine, which may be true, but the interpolation is way off on yours.
Looking at the spectral view, I see that my original (and its upsample) has its energy spread evenly through all frequency bands, with some minor clustering near 44.1 (both original and upsample), as does yours, but this is where things differ.
The length of the original wave is ~0:00.220, and the length of my upsampled wave is the same. The length of your upsampled wave is ~0:03.823, a duration increase of over 3.5 seconds.
You may have upsampled the original, but the result is not just an upsample but a modification of the wave. If the wave is different, then obviously the intersample peak(s) will be as well.

Now if you'll excuse me, I'm off to battle with Sox (@bandpass, thanks for the Sox settings tip BTW!);
it seems to be morphing the wave a little during upsampling as well (or perhaps it's resampling when I really should have it upsample instead?).



Reply #23
bandpass, 2bdecided, saratoga and John Siau will hopefully find this interesting.


4th post here http://www.hydrogenaudio.org/forums/index....st&p=820447
has a .CSV with the raw data from a scan.

Here are the highlights:
Quote
5824 tracks.
Peak -1.18 dBFS (Min -28.03, Max 0.00)
ISP -0.45 dBFS (Min -28.02, Max 9.54)
ISP/Peak Delta -1.63 dB (Min -56.05, Max 9.54)
RMS -16.89 dBFS (Min -46.18, Max -6.34)

27.59% ISP <-1 dBFS
72.41% ISP >-1 dBFS

44.33% ISP -1 to 0 dBFS
19.95% ISP 0 to 1 dBFS
3.71% ISP 1 to 2 dBFS
0.79% ISP 2 to 3 dBFS
0.21% ISP 3 to 4 dBFS
0.34% ISP 4 to 5 dBFS
0.31% ISP 5 to 6 dBFS
1.56% ISP 6 to 7 dBFS
1.18% ISP 7 to 8 dBFS
0.00% ISP 8 to 9 dBFS
0.02% ISP >9 dBFS


The tracks are of all genres, from "typical" to really weird: WAV, FLAC, MP3, AAC/MP4, Ogg, 44.1kHz & 48kHz, 16 and 24/32-bit. Soundtracks, pop, techno, anime, classic, game, computer music, my own composed music, standup, stereo and mono tracks, multi-channel tracks, spanning multiple decades. It took 4+ hours using 6 cores, each at 100% (it would have taken 24 hours if only one core had been used).

I am really surprised that only around 30% of the tracks are actually below the -1.0 dBFS "true peak" limit that EBU R128 wants.
I am even more surprised to see around 20% in the 0 to +1 dBFS range, as that is (in my opinion) a rather high percentage of ISP "overs" to pass to a DAC or an audio chain.
The 3.71% in the +1 to +2 range and the 0.79% in the +2 to +3 range are very worrying, but a DAC like the one mentioned by John Siau should handle this fine.

Where it really gets creepy is the >3dBFS ISPs: 3.62% in total in the +3 to >+9 dBFS range, with the highest ISP at 9.54.
Those are probably really messed-up tracks. Stupidly enough, I did not also log the file paths (I just created the IDs), so I can't easily find out which tracks are this bad.
I may rescan everything in the future (maybe with better tools; an R128 scan-and-log tool would be ideal for this stuff), and if the tracks were commercial tracks I'll post which ones (and if it turns out to be my music I'll provide samples for testing).

Note! I used an intermediary tool that called sox; the tool itself was called by foobar2000 as if it were an encoder/converter, and 32-bit wav audio data was passed to sox. So it's unlikely that the very high ISPs detected are corrupted file data, or if they are, foobar2000 treated the data as part of the audio, in which case it matches real-world situations (playing corrupted data as audio).

Hopefully you guys find the .csv interesting, 5824 tracks is a rather large sample of data and thus hopefully useful.
Maybe others could scan and make similar .csv files, so we can start gathering large amounts of data for number crunching (a foobar plugin or a standalone scan tool would make this easier for folks, *hint hint*!).
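If others do produce similar logs, the binning in the summary above is easy to reproduce. The .csv's column layout isn't described in this post, so this Python sketch assumes the per-track ISP values have already been read into a list; the bin edges (lower edge inclusive) and the example values are hypothetical.

```python
import math
from collections import Counter


def bucket(isp_dbfs: float) -> str:
    """Assign a per-track ISP value to 1 dB bins like the summary above
    (assumption: each bin includes its lower edge)."""
    if isp_dbfs < -1:
        return "< -1"
    if isp_dbfs >= 9:
        return "> 9"
    lo = math.floor(isp_dbfs)
    return f"{lo} to {lo + 1}"


def summarize(isp_values):
    """Percentage of tracks per bin."""
    counts = Counter(bucket(v) for v in isp_values)
    return {label: 100.0 * c / len(isp_values) for label, c in counts.items()}


# Hypothetical values, standing in for the ISP column of a scan log.
example = [-3.0, -0.5, 0.2, 0.7, 9.54]
for label, pct in sorted(summarize(example).items()):
    print(f"{pct:6.2f}% ISP {label} dBFS")
```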

Reply #24
Quote
To see what I mean, look at this:
Original 44.1kHz waveform http://imageshack.us/photo/my-images/163/originaly.png/
Upsampled/resampled to 88.2kHz with Audition 1.5, 999 quality, no pre/post filter. http://imageshack.us/photo/my-images/571/upsampled.png/
And this is yours http://imageshack.us/photo/my-images/96/upsampledb.png/

To my eyes it is clear which upsample simply interpolated, and which actually altered the shape of the wave.

No, you can't say that from this view. I assume your version of Audition works the same as my old Cool Edit Pro - when zoomed out, the solid envelope is formed from the maximum excursion of the actual sample values, no interpolation. The whole point of this topic is that actual sample values can be very different from nearby reconstructed values (and sample value peaks can be very different from intersample peaks). It's wrong to claim that a resampling algorithm is better because the between-original-sample reconstructed (resampled) sample values are closer to the original sample values.

"Banned" is right to say that sinc reconstruction is theoretically perfect in this sense, and with a short audio sample it's not that difficult to get as close to theoretical perfection as you wish - bounded only by the rounding error in the mathematical operations you use, and (if you're seeking the absolute true peak) the number of discrete inter-sample time points you wish to calculate. With any time-bound audio signal this "perfect" reconstruction is possible, though with longer signals a) it's a pain, and b) it's pointless (not least because the sinc function falls off to irrelevantly small values pretty quickly compared with, say, the length of a typical music track).

Whether "Banned" is performing the sinc reconstructions correctly or not, I don't know - I haven't tried it.

Cheers,
David.