Topic: Observing the loss.

Observing the loss.

Reply #25
Quote
That's exactly what Tigre is saying. Diff1 is the noise you add; Diff2 is the noise you drop. You can't tell the difference.

Uhm, yes, you're right of course. I just can't read today. So we all agree that the sign doesn't matter then...

Observing the loss.

Reply #26
In the past, Microsoft used the wave-subtraction method to prove WMA 7's superiority over other formats, claiming that because the difference file was quieter than that of other codecs, WMA was clearly of higher quality. We know from experience that WMA 7, at least, sounds worse than FhG MP3 at comparable bitrates on the majority of samples.
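
For concreteness, the wave-subtraction measurement amounts to something like the following minimal Python sketch. The crudely requantized signal here is only a stand-in for a real codec's output:

Code:
import numpy as np

fs = 44100
t = np.arange(fs) / fs
original = 0.5 * np.sin(2 * np.pi * 1000 * t)   # 1 kHz tone at -6 dBFS
decoded = np.round(original * 2**7) / 2**7      # crude stand-in for a lossy decode

diff = original - decoded                       # the "difference file"
rms_db = 20 * np.log10(np.sqrt(np.mean(diff**2)))
print("difference RMS: %.1f dBFS" % rms_db)     # quieter does not mean better-sounding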

In psychoacoustic audio compression, the noise is added in such a way that it is masked by the actual signal. Since the signal is tonal, it makes sense that the added noise is tonal to a certain degree as well. The added noise is shaped so that it can't be heard.

EAQUAL uses a complicated model of the ear to analyse sound perception. However, it is just a model and has its own flaws. At quality levels near transparency, the insufficiencies of the EAQUAL model come into play and its error is too big. A yardstick is fine for measuring the length of your arm, but you won't be able to determine the thickness of a single sheet of paper with it. You could probably use any existing psymodel to build a before/after quality analysis tool; the one EAQUAL uses is computationally intensive but not necessarily better.

Observing the loss.

Reply #27
Quote
Unfortunately, this measurement is almost useless for audio quality assessment. It is useless because the measured value does not correlate with the perceived sound quality of the audio codec. In fact, the noise measurement gives no indication of the perceived noise level.


Absolutely. This limitation of loss observation seems clear, and it has been pointed out earlier in the thread.

But consider the set of all lossy audio codecs (not only psychoacoustic codecs but also ones like WavPack lossy, including codecs that exist only in theory and are as yet unknown or unimplemented). I'd contend the one that minimizes the noise energy would still do a very good job (like the Waves L2 type 2 dither algorithm). The amount of noise is therefore an objective criterion for the amount of loss if we disregard the human ear model. Please refer to my analogy to dither algorithm noise earlier in the thread; I think it's more or less the same issue.

I know HA is mainly a psychoacoustic codec discussion board, and people get irked when theoretical approximations of quality beyond transparency levels are discussed. That's why the domain between lossless quality and transparent quality is not explored in the threads here. But observation of the loss file (volume, frequency-domain analysis and other possible methods) seems to be very useful information at that point. It could give us an idea of how lossy codecs compare at the same bitrate beyond transparency limits (which becomes important in transcoding, DSP applications, archiving, ...).

Quote
If the inaudible noise could somehow be removed from the measurement, then the resulting quantity would match human perception more accurately, since it would reflect what is audible.


That's precisely what the noise-shaping algorithms are designed for at the moment. They shift the noise into the inaudible spectrum at the expense of increasing the noise volume slightly. So couldn't adjusting to the human ear model be left to the noise-shaping phase? Thus a lossy codec (again, one which is not necessarily psychoacoustic) aimed at minimal noise volume, supplemented with noise shaping, should do a good job. Am I missing something?
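
A minimal sketch of that trade-off, assuming a textbook first-order error-feedback quantizer (noise transfer function 1 - z^-1) rather than any particular product's algorithm: total noise power rises by roughly 3 dB, but the power in the most audible low band drops sharply.

Code:
import numpy as np

rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(1 << 16)   # test signal
step = 1 / 2**8                          # coarse quantizer step

def quantize(v):
    return np.round(v / step) * step

noise_plain = quantize(x) - x            # plain quantization, no shaping

# first-order error feedback: add the previous sample's quantization error back in
y = np.empty_like(x)
err = 0.0
for i, s in enumerate(x):
    v = s + err
    y[i] = quantize(v)
    err = v - y[i]
noise_shaped = y - x

def band_power_db(noise, lo=None, hi=None):
    spec = np.abs(np.fft.rfft(noise))**2
    return 10 * np.log10(spec[lo:hi].sum() / len(noise))

lo_bins = len(x) // 8                    # bins below fs/8, where the ear is most sensitive
print("total noise:    plain %.1f dB, shaped %.1f dB"
      % (band_power_db(noise_plain), band_power_db(noise_shaped)))
print("low-band noise: plain %.1f dB, shaped %.1f dB"
      % (band_power_db(noise_plain, 0, lo_bins), band_power_db(noise_shaped, 0, lo_bins)))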

Quote
Personally, I would be less worried by a difference file with audible music.


Umm, but theoretically, the more the loss is correlated to the original (i.e. the more it sounds like it), the more information you'd be losing from your original. Or at least that's how it seems to me, again disregarding the issues around the human ear model and trying to find an objective (not human-based) criterion for lossy encoding quality.
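
To make that intuition measurable, one could look at the correlation coefficient between the original and the difference file. A hypothetical helper (diff_correlation is my own name, not an existing tool):

Code:
import numpy as np

def diff_correlation(original, decoded):
    # near 0: the loss behaves like signal-independent noise
    # near +/-1: the loss is displaced signal, i.e. it "sounds like" the original
    diff = original - decoded
    return float(np.corrcoef(original, diff)[0, 1])
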
The object of mankind lies in its highest individuals.
One must have chaos in oneself to be able to give birth to a dancing star.

Observing the loss.

Reply #28
Quote
I know HA is mainly a psychoacoustic codec discussion board, and people get irked when theoretical approximations of quality beyond transparency levels are discussed. That's why the domain between lossless quality and transparent quality is not explored in the threads here. But observation of the loss file (volume, frequency-domain analysis and other possible methods) seems to be very useful information at that point. It could give us an idea of how lossy codecs compare at the same bitrate beyond transparency limits.

Hmmm... I'm not sure I see the importance of analyzing lossy codecs "beyond transparency limits". Can you give me an example of a situation where this may be useful?
Over thinking, over analyzing separates the body from the mind.

Observing the loss.

Reply #29
@ PoisonDan:

Funnily enough, I just gave examples in my latest edit! There are people who pick the WavPack route because of transcoding...
The object of mankind lies in its highest individuals.
One must have chaos in oneself to be able to give birth to a dancing star.

Observing the loss.

Reply #30
Quote
I'd contend the one that minimizes the noise energy would still do a very good job (like the Waves L2 type 2 dither algorithm). The amount of noise is therefore an objective criterion for the amount of loss if we disregard the human ear model. Please refer to my analogy to dither algorithm noise earlier in the thread; I think it's more or less the same issue.


I understand (but I could be wrong, see below) that L2 type 2 dither is a type of strong ATH noise-shaping dither for converting high-bit-depth studio material to 16-bit 44.1 kHz CD audio.

You might be surprised to learn that the amount of noise power it exhibits (just like Foobar2000's strong ATH noise shaping dither) is considerably (many times) higher than for standard dither with no noise shaping.

However (and you seem to know this already from later in your post!?), it puts most of the noise at ultrasonic frequencies, or in other areas where the ear is less sensitive and the absolute threshold of hearing (ATH) is relatively high, so that it can reduce the noise at the most sensitive frequencies, where the ATH is lower. For that reason, despite the considerably higher power, it sounds about 15-18 dB quieter than standard dither to the human ear.

This is another sign that it's what you hear, rather than simple measurements, that is important. (Algorithms that take account of psychoacoustics, such as ReplayGain's equal-loudness curve, can do a decent job of roughly measuring perceived loudness, however.)
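
As a rough illustration of such a weighting (ReplayGain's actual loudness model is different; standard IEC 61672 A-weighting is used here only as a simple stand-in for an equal-loudness curve):

Code:
import numpy as np

def a_weight_db(f):
    # IEC 61672 A-weighting magnitude in dB (f in Hz)
    f = np.maximum(f, 1e-6)
    ra = (12194.0**2 * f**4) / (
        (f**2 + 20.6**2)
        * np.sqrt((f**2 + 107.7**2) * (f**2 + 737.9**2))
        * (f**2 + 12194.0**2)
    )
    return 20 * np.log10(ra) + 2.0

def weighted_noise_power_db(noise, fs=44100):
    spec = np.abs(np.fft.rfft(noise))**2
    freqs = np.fft.rfftfreq(len(noise), 1 / fs)
    gain = 10 ** (a_weight_db(freqs) / 10)   # apply the weighting as a power gain per bin
    return 10 * np.log10((spec * gain).sum() / len(noise))

Run on a flat dither noise floor and on an ATH-shaped one, the weighted figures should come out in the opposite order to the raw powers - which is the whole point.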

Incidentally, referring to the top of the thread, I'd expect MP2's or Musepack's difference noise to be a little more noise-like than Vorbis's or MP3's, simply because MP2 and Musepack are sub-band codecs (quantizing in the time domain over sub-bands), whereas Vorbis and MP3 are transform codecs, quantizing each frequency component separately and leaving residuals that can be as tonal as the original frequency. (I haven't listened for myself to verify this.)

On the subject of noise shaping, lossy encoders like Musepack already incorporate some form of it. Bryant, the WavPack developer, seems quite keen to implement strong ATH-type noise shaping for lossy mode, according to discussions we had in the forums. This may reduce the perceived noise, or keep it at similar levels at lower bitrates.

Observing the loss.

Reply #31
Quote
You might be surprised to learn that the amount of noise power it exhibits (just like Foobar2000's strong ATH noise shaping dither) is considerably (many times) higher than for standard dither with no noise shaping.


Are you sure? I think you are talking about Waves' "Type 1"; "Type 2" focuses on low power.
The object of mankind lies in its highest individuals.
One must have chaos in oneself to be able to give birth to a dancing star.

Observing the loss.

Reply #32
Quote
Quote
You might be surprised to learn that the amount of noise power it exhibits (just like Foobar2000's strong ATH noise shaping dither) is considerably (many times) higher than for standard dither with no noise shaping.


Are you sure? I think you are talking about Waves' "Type 1"; "Type 2" focuses on low power.

atici: No, I'm not sure; I only said "I understand that it's noise-shaping dither". Noise-shaping dither necessarily contains more power so that it still dithers correctly while reducing the audible noise compared to unshaped minimal dither. It was an inference.

I can't see any other way it would work, unless it uses insufficient dither and instead permits some truncation distortion in a trade-off for lower dither noise. I don't have the Waves DSPs, so I don't know the modes exactly, but I thought it was more like L2 Ultra dither, which I've read about.

Observing the loss.

Reply #33
Quote
I thought it was more like L2 Ultra dither, which I've read about.


The default mode of L2 Ultra is "Type 1". "Type 2" is the one aimed at low power. You can choose either of them.

Quote
However (and you seem to know this already from later in your post!?), it puts most of the noise at ultrasonic frequencies, or in other areas where the ear is less sensitive and the absolute threshold of hearing (ATH) is relatively high, so that it can reduce the noise at the most sensitive frequencies, where the ATH is lower.


The noise-shaping phase is separate from the dither and based on different algorithms, AFAIK (that's at least how it's explained in the booklet). I don't know the specifics of Waves' plugins, so I don't know whether your inference holds in this case.

Anyway, that's not the main discussion. It was just an example.
The object of mankind lies in its highest individuals.
One must have chaos in oneself to be able to give birth to a dancing star.

Observing the loss.

Reply #34
Quote
The default mode of L2 Ultra is "Type 1". "Type 2" is the one aimed at low power. You can choose either of them.


Thanks for clarifying.

Quote
The noise-shaping phase is separate from the dither and based on different algorithms, AFAIK (that's at least how it's explained in the booklet). I don't know the specifics of Waves' plugins, so I don't know whether your inference holds in this case.


It is possible to noise-shape without providing enough dither to completely prevent truncation distortion (you just reduce it by so many dB). Perhaps that's what it's doing?

It might be more like "soft ATH noise shaping (less noisy)" in Foobar2000, which is purportedly less noisy than the recommended strong ATH noise-shaping dither, but will therefore either sound louder or won't dither (prevent distortion) as effectively.

You can measure the power of the noise by dithering digital silence (or a 24-bit WAV containing a 1 kHz sine wave at -130 dBFS, for example) and measuring the RMS power in a WAV editor.
You can zoom in on the waveform view (vertical zoom will show noise with peak amplitudes up to about +/-30 samples with FB2K's strong ATH noise-shaping dither).
You can look at the frequency spectrum of the dithered silence to see how high it is in the near-ultrasonic (18 kHz+) area.
You can use a ReplayGain tool like FB2K or WaveGain to gauge the perceived loudness. A rough script along these lines follows.
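
The same procedure can be scripted. Plain TPDF dither is assumed here as a stand-in; Foobar2000's shaped dithers would show a much larger share of their power above 18 kHz:

Code:
import numpy as np

fs, n = 44100, 1 << 18
rng = np.random.default_rng(0)
silence = np.zeros(n)

# +/-1 LSB TPDF dither at 16 bits, then rounding to the 16-bit grid
lsb = 1 / 2**15
dither = (rng.random(n) - rng.random(n)) * lsb
out = np.round((silence + dither) / lsb) * lsb

print("RMS: %.1f dBFS" % (20 * np.log10(np.sqrt(np.mean(out**2)))))

spec = np.abs(np.fft.rfft(out))**2
freqs = np.fft.rfftfreq(n, 1 / fs)
frac = spec[freqs >= 18000].sum() / spec.sum()
print("noise power above 18 kHz: %.0f%%" % (100 * frac))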

Quote
Anyway, that's not the main discussion :D It was just an example.


Yes, I'm at the edge of my knowledge, as I don't have Waves.

Now, back to the discussion: you seem to be moving away from psychoacoustic encoders, which rely on masking, toward the simple lossy modes of lossless encoders like WavPack. It seems that WavPack's existing lossy mode isn't quite transparent even at 320 kbps, and it may well let you down on music where the predictor does a poor job while psychoacoustic masking from the original signal is weak in at least some parts of the spectrum; that leaves more noise at the target bitrate and makes it more likely to be audible.

Yes, we might expect strong ATH noise shaping to improve the noise by up to 15-18 dB at the same lossy-mode bitrate, which could probably allow a somewhat lower bitrate to become transparent most of the time. Even so, I don't think you'd find WavPack having a reasonably secure mode at much less than 256 to 320 kbps (a guesstimate).

It's not possible to guarantee transparency (in a VBR sense) without the encoder knowing some psychoacoustics, so that it can always ensure that the noise is masked. Then you can shape the noise to the lowest possible masking threshold at that instant.

Of course, the measured noise would be greater in this case than for the un-noise-shaped mode - unless you measure perceived noise with a ReplayGain-type equal-loudness-curve weighting.

However, try a sample like the ones den provided (Blue Monday was one, I think) where the existing WavPack lossy at 320 kbps doesn't do well, or some that guruboolez tried. There, the noise measurement you could make would be reasonably low, yet the noise is audible, making it non-transparent.

I'd be almost certain that if you measured the noise introduced by Musepack --quality 5 --xlevel, it would be higher in power (with or without equal-loudness-curve adjustment), yet Musepack --quality 5, at far below 320 kbps, would sound transparent while WavPack lossy would not.

So the measurement really can't tell you much with any certainty.

Quote
Thus a lossy codec (again, one which is not necessarily psychoacoustic) aimed at minimal noise volume, supplemented with noise shaping, should do a good job. Am I missing something?


I don't think 256-320 kbps and still failing on some music would be a particularly great job, not when Musepack at quality 5 and 160-190 kbps is transparent practically all the time and can use high bitrates when it has to.

Perhaps it depends how you define the job that you're trying to do with the encoder as to whether you consider it a good job.

Musepack already shapes the noise to follow the masking threshold it has calculated from the original music. At --quality 5 it sits fairly close to the threshold. If you go for higher quality, it gradually adds a margin of safety to both the masking threshold and the adaptive ATH, "to be on the safe side". I think --quality 6 is about 2 dB of margin; it's in the FAQs/stickies and shown in verbose mode.

However, you can't do a meaningful difference measurement afterwards without knowing the masking threshold and the associated psychoacoustics. And if you knew the psychoacoustics better than Musepack does, you could build a psymodel that would reduce those very artifacts where your model is more accurate.

The only artifacts that get through Musepack --quality 5 --xlevel come from the very, very rare failings in its psychoacoustic model. If the failing is bigger than 2 dB in magnitude, then --quality 6 still won't make the artifact inaudible, just about 2 dB quieter - only just perceptibly quieter - despite having thrown all those extra megabytes at your album (their benefit is spread over all frequencies and all of the time, not just at the troublesome instant).

Observing the loss.

Reply #35
Quote
Quote
How could the sum of those frequencies be higher than the original if none of the frequencies has its phase shifted? And exactly that happens during clipping.

It is enough for the sum to be higher at a given instant to introduce clipping. The spectral view only shows you the average level over the analyzed window.
Remove harmonics from a square wave without changing the phase and it clips.

Also, it has to do with the quantization equation. In AAC and MP3, some spectral values, due to the magic number 0.4054, can be rounded up instead of down. As a result, these decoded spectral values can be bigger in magnitude than the original.

Try removing the 0.4054 constant during quantization: the decoded waveform would have lower energy content than the original, and some of the clipping would be eliminated.
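
For what it's worth, a sketch of that quantization step in its simplified power-law form with the 0.4054 constant (real encoders apply it per scalefactor band with varying step sizes):

Code:
import numpy as np

def quant(xr, step, magic=0.4054):
    # MP3/AAC-style nonuniform quantization: |xr|^(3/4), then truncation with a rounding bias
    return np.floor((np.abs(xr) / step) ** 0.75 + magic).astype(int)

def dequant(ix, step):
    return ix.astype(float) ** (4 / 3) * step

step = 1.0
xr = np.linspace(0.1, 10.0, 50)
with_magic = dequant(quant(xr, step), step)
without = dequant(quant(xr, step, magic=0.0), step)

print("values that grew with 0.4054:", int(np.sum(with_magic > xr)))  # some round up past the original
print("values that grew without it:", int(np.sum(without > xr)))      # pure truncation only shrinks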


Observing the loss.

Reply #36
The first problem with the diff is the potential phase-shift problem (likely to happen when intensity stereo is used, for example). Perhaps this could be solved by using the power of the signal instead of the signal itself.

Now, let's say we have solved the phase problem and we have a diff signal. What conclusions can we draw from it?
If we take the diff into consideration (how should this be done? framing and summing powers? a phase-insensitive variant is sketched below), what we have is an indication of the compression ratio.
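
One phase-insensitive way to do the framing, as suggested above: compare per-frame spectral magnitudes instead of raw samples, so that a pure phase shift contributes nothing to the diff. A sketch:

Code:
import numpy as np

def spectral_diff_db(original, decoded, frame=1024):
    # frame both signals and compare magnitude spectra (phase is discarded)
    n = min(len(original), len(decoded)) // frame * frame
    a = np.abs(np.fft.rfft(original[:n].reshape(-1, frame), axis=1))
    b = np.abs(np.fft.rfft(decoded[:n].reshape(-1, frame), axis=1))
    power = np.mean((a - b) ** 2, axis=1)   # per-frame magnitude-error power
    return 10 * np.log10(np.maximum(power, 1e-12))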

Let's say we have two diffs, A and B, resulting from two encoders using the same tools (for example, two MP3 encoders). We can assume that, overall, the lossless compression ratio of both encoders is in the same range, so globally we can ignore this factor when comparing A and B. All we can say is that an overall higher diff means the compression ratio was higher, nothing else. (That is at least some information.)

Two encoders targeting the same compression ratio should have an overall similar diff power. How, then, to evaluate the quality of an encoder?
We know that the quality of the encoder is related not to the amount of noise introduced (as overall it should be similar) but to the shaping of the noise. The noise should be shaped to accommodate our ear. So to analyse this, we need something that analyses the initial signal in order to determine how the noise should be shaped.

That means a psymodel, or an ear model. But the one used for analysis would have to be perfect in order to be perfectly reliable. If that were possible, it would mean such a model is feasible, and analysing quality would then be pointless, as the model would probably also be included in psychoacoustic encoders.

So I think that we cannot reliably analyse a diff to determine anything other than the compression ratio used.