HydrogenAudio

Hydrogenaudio Forum => Scientific Discussion => Topic started by: atici on 2003-07-12 20:35:53

Title: Observing the loss.
Post by: atici on 2003-07-12 20:35:53
OK, after another discussion prodded me into this, I decided to give it a try again.

That is, I calculated the pure loss with the MP3 and MPC encoders in CoolEdit (Mix Paste, both channels inverted & Overlap) and listened to the pure loss in each case at different quality settings. I observed that with the MP3 standard preset I can still figure out the melody, because I can still hear some instruments (probably because of the lowpass filter). With MPC I hear a swooshing sound intensifying in some parts of the sample, especially where the original sample's volume is high. The average volume of the loss decreases as I increase the quality setting.

Could you tell me why this is not a good way of objectively evaluating how successful a lossy codec is? I think it's nice because the difference is not masked by the rest of the sample (which is usually higher in volume and dominates). But I can also imagine that this way one cannot detect stereo-separation artifacts, and that even when the pure loss, listened to as a sample on its own, sounds tolerable, the actual encoding might still have noticeable, non-transparent differences. But isn't this a reasonable method to supplement results about which encoder is more successful? Can't we conclude anything, objectively or subjectively, by observing the pure loss? It sounds to me like the discarded information is a more tolerable loss with q4 MPC than with LAME standard MP3 3.93.1.
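
For what it's worth, here is a minimal sketch of the same null test outside CoolEdit, assuming two time-aligned, equal-sample-rate 16-bit PCM WAVs; the filenames are placeholders, not from the thread:

```python
import numpy as np
from scipy.io import wavfile

rate_o, original = wavfile.read("original.wav")   # hypothetical filenames
rate_d, decoded = wavfile.read("decoded.wav")
assert rate_o == rate_d, "sample rates must match"

n = min(len(original), len(decoded))              # trim any trailing padding
diff = original[:n].astype(np.int32) - decoded[:n].astype(np.int32)

# Save the audible "pure loss" (clipped back into 16-bit range).
wavfile.write("loss.wav", rate_o, np.clip(diff, -32768, 32767).astype(np.int16))

# One crude objective summary: RMS level of the loss in dB re full scale.
rms = np.sqrt(np.mean(diff.astype(np.float64) ** 2))
print(f"loss RMS: {20 * np.log10(rms / 32768 + 1e-12):.1f} dBFS")
```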
Title: Observing the loss.
Post by: upNorth on 2003-07-12 20:59:25
Quote
I think it's nice because the difference is not masked by the rest of the sample (which is usually higher in volume).

My understanding of the problem:
Masking is the essence of psychoacoustics. What you are doing, as I see it, is removing the foundation the lossy codec is built upon. It is essential to have the higher-volume sounds present to do the masking; they are meant to hide the introduced noise.

Btw: I guess others will give better and more elaborate explanations. I just want to test my understanding a little, to see if it gets picked to pieces...
Title: Observing the loss.
Post by: Xerophase on 2003-07-12 21:20:12
Quote
OK, after another discussion prodded me into this, I decided to give it a try again.

That is, I calculated the pure loss with the MP3 and MPC encoders in CoolEdit (Mix Paste, both channels inverted & Overlap) and listened to the pure loss in each case at different quality settings.
...
Could you tell me why this is not a good way of objectively evaluating how successful a lossy codec is?

I don't have a lot to add immediately (although I plan to research this, as it really interests me), but I wanted to applaud your thinking, atici.  Hopefully some of the members will come in with some constructive brainstorming.  It would be helpful if even some part of this process could eventually be used to analyze a codec, because theoretically the loss is something you can't obfuscate through hardware, human perception, etc.

My only concern is that, given the way lossy codecs work, we couldn't always rely on this data, because we wouldn't be taking into account the "ear tricking" factor and how well the codec does that despite what it's discarding, along with the other acoustical properties you mentioned.
Title: Observing the loss.
Post by: atici on 2003-07-12 21:25:43
Once there was a program called EAQUAL. I guess it's still around as warez; I don't know whether it's being developed anymore. It did provide an objective, algorithmic measure of quality loss. Although it was a neat idea, it wasn't extremely useful most of the time. For instance, for some samples it judged the Ogg-encoded wave file to be more accurate than the original 

I don't know why this discussion reminded me of that. I guess just because, like this thread, it suggested another way of evaluating how successful a lossy codec is.
Title: Observing the loss.
Post by: 2Bdecided on 2003-07-12 22:21:56
Haven't we been here very recently, in a joint HA and Creative forum thread?

The method tells you almost nothing about how good the encoded version sounds.

Someone find the thread - I can't bear to type it all again, and I have to go and book a holiday!

D.
Title: Observing the loss.
Post by: tigre on 2003-07-12 22:56:04
Quote
Can't we conclude anything objectively or subjectively by observing the pure loss?


1. It's not as easy as it seems, methinks.

Example (extreme case):
"Original" = sine wave, frequency = <x> Hz, amplitude = <y>
"Lossy copy" = sine wave, frequency = <x> Hz, amplitude = <y>, 180° phase shift
Wave subtraction result = sine wave, frequency = <x> Hz, amplitude = 2*<y>

"Original" and "Lossy copy" will sound identical, while the result of the subtraction will be even louder than the original. 


2. Using this method, what would you conclude?
- "Loss" consists of pure noise, no representation of the original singal noticable, low volume = good lossy compression ?
- The louder the "Loss" (while Original vs. Copy not ABXable) the better (->Psychoacoustic model successfully adds loads of noise/cuts sounds without being noticable)
- ... ?


Maybe something similar could work to help detect some kinds of artifacts:

O = original, L = lossy copy, D = difference, E = "exaggerated lossy copy"

D = L - O (wave substraction)

E(x) = O + D*x (D*x: amplify D)

For x > 1 (e.g. 1.5 or 2) some sorts of "loss" or problems caused by encoding could be more noticeable, so trying to ABX E(x) vs. O could be used to find artifacts more easily, or as a kind of ABX training (lowering x step by step -> 1).
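
A minimal sketch of this E(x) idea, assuming the original and the decoded lossy copy are already available as aligned float arrays scaled to ±1 (the names `original` and `lossy` are placeholders):

```python
import numpy as np

def exaggerate(original: np.ndarray, lossy: np.ndarray, x: float) -> np.ndarray:
    """Return E(x) = O + x*D, where D = L - O is the coding error."""
    d = lossy - original
    e = original + x * d          # x = 1 reproduces the lossy copy exactly
    return np.clip(e, -1.0, 1.0)  # guard against clipping for large x

# Training idea from the post: ABX exaggerate(o, l, 2.0) against the original,
# then lower x step by step toward 1 as the artifact becomes familiar.
```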
Title: Observing the loss.
Post by: atici on 2003-07-12 23:27:49
@tigre:

For point 1), we may assume the lossy encoder does not shift the phase of the original. And I don't think it does.

For point 2), I think we may conclude that if the volume is low, the lossy encoder does a better job (as shown by the fact that increasing the quality setting has that effect). Also, can't we somehow conclude that if the "loss" consists of random noise in which the original signal is not noticeable, the lossy encoder is doing a better job?

I think the effect of a lossy encoder and the addition of dither noise are in that sense somewhat similar. Just as there are dither algorithms that introduce higher-energy but less audible noise, there may be lossy algorithms that introduce a higher-volume "loss" but have a more transparent effect. But in general, the lower the noise, the better.
Title: Observing the loss.
Post by: Jebus on 2003-07-12 23:30:52
Look,

The more bits you throw away, the more information will be in the diff file. So the diff file of an MPC averaging 160 kbps will have more information in it than that of a LAME file at, say, 190 kbps. This is NOT to say that the MPC is lower quality; it actually means that MPC is simply smarter at throwing away stuff that gets masked anyhow.

If you remove the masking effects, you are in essence ignoring the goal of psychoacoustic compression in the first place: to remove information that would otherwise be masked anyhow.
Title: Observing the loss.
Post by: atici on 2003-07-12 23:36:38
Jebus, I guess you missed my point, because I agree with what you say in the first paragraph. Don't think of it in terms of bits thrown away or differences in bitrate, but as the noise introduced when each lossy encoder is applied. Just consider the WAV files only (the original and decode(encode(original))), because that's the effect of the lossy encoder.

I think that when I listen to the loss file and it sounds like the original, I'd predict that a good amount of information about the original has been lost.
Title: Observing the loss.
Post by: lucpes on 2003-07-12 23:50:04
Blah... take a wave file. Apply WaveGain or normalize to reduce it by 12 dB. Invert and mix with the original. I'd have to say that this was a very lossy process  - the difference sounds just like the original... (-12 dB is a factor of about 0.25, so the difference is the original times (1 - 0.25), i.e. the original at roughly -2.5 dB).

Anyway, the best lossy codec is the one that removes the biggest amount of 'information' but with good results: non-ABXable, no audible differences between the encoded file and the original.
Title: Observing the loss.
Post by: guruboolez on 2003-07-12 23:52:37
Quote
(...) I observed that with the MP3 standard preset I can still figure out the melody, because I can still hear some instruments (probably because of the lowpass filter). With MPC I hear a swooshing sound intensifying in some parts of the sample (...)  It sounds to me like the discarded information is a more tolerable loss with q4 MPC than with LAME standard MP3 3.93.1.

Just a question: have you decoded your MP3 with LAME, in order to remove the additional samples? If not, your test is biased: the FhG decoding engine will keep the 'gap'.


I had some fun with this method a while ago. I find it interesting to measure the real loss of the encoding process. Nevertheless, you can't evaluate the quality of two encodings this way. The strongest difference (= noise) isn't necessarily the most ABXable file. I tried to oppose MPC and Vorbis this way; MPC seemed to be the more degraded one, but after a careful blind listening test, I only heard a difference (hiss) for the Vorbis file.

Note that this method is interesting for detecting artifacts similar to the erhu effect.
Title: Observing the loss.
Post by: atici on 2003-07-12 23:58:46
Quote
Just a question: have you decoded your MP3 with LAME, in order to remove the additional samples? If not, your test is biased: the FhG decoding engine will keep the 'gap'.


Yes, I did. I used the same version of LAME to decode, and it reported that the encoding gap was taken into consideration.

Quote
The strongest difference (= noise) isn't necessarily the most ABXable file.


I agree with that. But there is still a link between the quality lost and the "loss" file - just like the dithering noise introduced in the dithering process, as I was trying to show with my analogy. Some dither algorithms, at the expense of being more audible, aim for the least amount of noise volume (like Waves L2 type 2). In general the lower the noise the better, but of course the difference might be audible, and that's not what we want from either lossy encoders or dithering algorithms.

I was just trying to suggest a supplement to the comparison methods used, not to offer a panacea.
Title: Observing the loss.
Post by: ErikS on 2003-07-13 02:08:24
Quote
I think we may conclude that if the volume [of the difference file] is low, the lossy encoder does a better job (as shown by the fact that increasing the quality setting has that effect). Also, can't we somehow conclude that if the "loss" consists of random noise in which the original signal is not noticeable, the lossy encoder is doing a better job?

I would say these conclusions are invalid.

1. You have to show the implication both ways. Showing that A implies B doesn't mean that B implies A until you show that too somehow. And I'd say it's very difficult to do that, because you would have to start with the difference file, then find a pair of one original and one encoded file that matches this difference, and then check the quality setting after that.

2. How do you draw the second conclusion? Same way as the first?
Title: Observing the loss.
Post by: Pio2001 on 2003-07-13 03:13:12
Personally, I would be less worried by a difference file with audible music. That way I can imagine that the difference is always masked by the music, since it's loud when the music is loud, quiet when the music is quiet, high-pitched when the music is high-pitched, etc. A noisy difference file can be more worrisome, as music can't always mask noise.

In fact I'm worried by none, because I trust the masking effect.

Quote
Someone find the thread - I can't bear to type it all again, and I have to go and book a holiday!

There's another one in the FAQ
Title: Observing the loss.
Post by: tangent on 2003-07-13 07:34:14
Look around for the "masking effect", I'm tired of explaining it again.
It's possible to test the difference data to compare codecs. All you really have to do is to build a frequency graph of the difference and compare it to the masking curve built from the original data. Audible noise can be anything in the difference data which goes above the masking curve. I have no idea how EAQUAL or Earguy's objective comparer works, but may be similar.
Title: Observing the loss.
Post by: Pio2001 on 2003-07-13 11:21:52
But... masking effects occur when two frequencies are played simultaneously. Building a frequency graph of a whole file gives no temporal information. If the original has -5 dB @ 1000 Hz and the noise -70 dB @ 999 Hz, how do you know if they occur at the same time, and are thus masked, or if the noise occurs during complete silence, while the reference tone is one minute away?
And again, not even speaking of temporal masking, ATH etc., wouldn't this result in using a very bad "codec" as the reference for perfect quality?
Title: Observing the loss.
Post by: tangent on 2003-07-13 13:24:01
Obviously you do the frequency analysis over time, block by block.
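
For illustration, a block-by-block sketch along these lines. The masking curve here is only a crude stand-in (the original's own block spectrum minus a fixed 20 dB signal-to-mask offset, floored at an arbitrary ATH constant); a real analyser would need a proper psychoacoustic model, and the original and difference signals are assumed to be equal-length float arrays:

```python
import numpy as np

def block_spectra_db(x: np.ndarray, n: int = 1024, hop: int = 512) -> np.ndarray:
    win = np.hanning(n)
    frames = np.stack([x[i:i + n] * win for i in range(0, len(x) - n, hop)])
    return 20 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) + 1e-12)

def audible_blocks(original: np.ndarray, diff: np.ndarray,
                   smr_db: float = 20.0, ath_db: float = -90.0) -> np.ndarray:
    sig = block_spectra_db(original)
    noise = block_spectra_db(diff)
    mask = np.maximum(sig - smr_db, ath_db)   # crude per-bin masking proxy
    return np.any(noise > mask, axis=1)       # True = block may have audible noise
```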
Title: Observing the loss.
Post by: 2Bdecided on 2003-07-14 10:08:44
It's not the same question, but it's close enough...
http://www.hydrogenaudio.org/forums/index.php?showtopic=10522&st=0


This is from my PhD thesis:

Figure 2.6: Input Output difference analysis

In a digital system, providing any delay due to the device is known and corrected for, the input signal can be subtracted exactly from the output signal, as shown in Figure 2.6. The residue consists of any noise and distortion added by the device. This technique may be used to determine the noise that is added by an audio codec in the presence of an input signal. If a test signal is applied, standard noise measuring techniques (e.g. [ITU-R BS.468-4, 1986] weighting followed by RMS averaging) may be used to calculate a single noise measurement. Alternatively, a Signal to Noise like Ratio may be computed, where the noise level is measured in the presence of the signal, rather than with the signal absent. This noise measurement may be used in equation (2-1), in place of VN. The measurement is objective and repeatable.

Unfortunately, this measurement is almost useless for audio quality assessment. It is useless because the measured value does not correlate with the perceived sound quality of the audio codec. In fact, the noise measurement gives no indication of the perceived noise level.

The problem is that the noise measurement is quantifying inaudible noise. An audio codec is designed to add noise. The intention is to add noise within spectral and temporal regions of the signal where it cannot be perceived by a human listener. Subtracting the input signal from the output of the codec will expose this noise, and the noise measurement will quantify it. If the inaudible noise could somehow be removed from the measurement, then the resulting quantity would match human perception more accurately, since it would reflect what is audible. This task is complex, and many other approaches have been suggested which avoid this task. Some of these approaches, and the reasons why they are inappropriate, are discussed below.

A measurement of coding noise will include both audible and inaudible noise. Many analyses assume that all codecs will add equal amounts of inaudible noise. If this is true, then the codec that adds the most noise will sound worst, since it must add the most audible noise. However, a good codec may add a lot of noise, but all the noise may be masked. This codec will cause no audible degradation of the signal. Conversely, a poor codec may add only a little noise, but if the noise is above the masking threshold, then the codec will sound poor to a human listener. Hence, this approach is flawed, because the basic assumption is incorrect.

Many codec analyses published on the World Wide Web include plots of the long-term spectrum of the signal and coding noise. This approach assumes that where the coding noise lies below the signal spectrum, it will be inaudible, and where the noise is above the signal spectrum, it will be audible. Unfortunately, these assumptions are false. Noise above the signal spectrum may be masked, because masking extends upwards in the frequency domain. Noise below the signal spectrum may be audible, because the spectrum must be calculated over a finite time (ranges from 1024 samples to three minutes have been encountered). Hence, the signal that apparently masks the codec noise may not occur at the same time as the noise itself. This is especially true for sharp attacks, where many encoders generate audible pre-echo before the attack. This pre-echo is below the spectral level of the attack, so appears "masked" using this mistaken analysis method.

The problem with all these techniques is that they side-step the basic problem: it is necessary to determine which noise components are audible, and which are inaudible, before the audible effect of the codec upon the signal may be quantified.
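
As a sketch, the measurement the excerpt describes might look like the following, with the BS.468 weighting omitted (plain unweighted RMS instead), float signals assumed, and the output at least as long as the input:

```python
import numpy as np

def residue_snr_db(inp: np.ndarray, out: np.ndarray) -> float:
    """Delay-correct, subtract, and return a Signal-to-Noise-like Ratio."""
    # Estimate the device delay by cross-correlation (fine for short test signals).
    lag = int(np.argmax(np.correlate(out, inp, mode="full"))) - (len(inp) - 1)
    aligned = np.roll(out, -lag)           # crude alignment; wraps at the ends
    noise = aligned[: len(inp)] - inp      # the residue: added noise and distortion
    return 10 * np.log10(np.sum(inp ** 2) / (np.sum(noise ** 2) + 1e-20))
```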


http://www.mp3-tech.org/programmer/docs/Robinson_thesis.zip

Cheers,
David.
Title: Observing the loss.
Post by: DonP on 2003-07-14 12:38:05
Here's another issue to chew on..

Even allowing/assuming/accepting that the difference file doesn't tell you the quality, what do you think of using it as a crib in ABXing? That is, using the difference file to identify artifacts which you then go and try to find in an ABX between the original and encoded files, knowing exactly where to look?

Is it that all is fair as long as in the end you can identify the encoded file in a blind test, or is it cheating a valid model if you could never pick the encoded file without "looking under the covers" at the diff file?

Should this be a poll?
Title: Observing the loss.
Post by: Vietwoojagig on 2003-07-14 12:46:08
Quote
@tigre:

For point 1), we may assume the lossy encoder does not shift the phase of the original. And I don't think it does.

I would say it must sometimes happen. How else could you explain clipping?

Let's say you have a given frequency spectrum at a given moment. The lossy encoder removes some of those frequencies. How could the sum of the remaining frequencies be higher than the original if none of them has its phase shifted? And exactly that is what happens during clipping.
Title: Observing the loss.
Post by: 2Bdecided on 2003-07-14 12:54:53
Quote
Here's another issue to chew on..

Even allowing/assuming/accepting that the difference file doesn't tell you the quality, what do you think of using it as a crib in ABXing? That is, using the difference file to identify artifacts which you then go and try to find in an ABX between the original and encoded files, knowing exactly where to look?

Is it that all is fair as long as in the end you can identify the encoded file in a blind test, or is it cheating a valid model if you could never pick the encoded file without "looking under the covers" at the diff file?

That's not cheating - it's not making something audible that was inaudible. Rather, it's making something noticeable that you hadn't previously noticed.

If you compared the original to the coded version 100 times, the chances are that you could hear that difference eventually, if you had the patience. So listening to the diff signal first just makes it a much quicker process. Maybe.


You are very likely to imagine that you hear the diff signal within the coded signal, once you've learnt it. But if it's pure imagination, ABX will take care of that!

Cheers,
David.
Title: Observing the loss.
Post by: Pio2001 on 2003-07-14 14:13:35
Quote
How could the sum of the remaining frequencies be higher than the original if none of them has its phase shifted? And exactly that is what happens during clipping.

It is enough that the sum is higher at a given instant to introduce clipping. The spectral view only shows you the average level over the analysis window.
Remove harmonics from a square wave without changing the phase and it clips.
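
A quick numeric check of the square-wave claim (the roughly 9% Gibbs overshoot that appears when harmonics are removed with all phases left untouched):

```python
import numpy as np

t = np.arange(44100) / 44100.0
f0 = 100.0
square = np.sign(np.sin(2 * np.pi * f0 * t))       # full-scale square, peak = 1.0

# Rebuild it from only the first five odd harmonics (no phase changes).
partial = np.zeros_like(t)
for k in range(1, 10, 2):
    partial += (4 / np.pi) * np.sin(2 * np.pi * k * f0 * t) / k

print(np.max(np.abs(square)), np.max(np.abs(partial)))  # 1.0 vs ~1.1: it would clip
```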
Title: Observing the loss.
Post by: tigre on 2003-07-14 14:13:52
Quote
You are very likely to imagine that you hear the diff signal within the coded signal, once you've learnt it. But if it's pure imagination, ABX will take care of that!

Additionally, you never know if the diff signal consists of dropped information or of things added to the original. (Diff1 = Original - Copy and Diff2 = Copy - Original sound the same.  ) Take a sine wave as an example: in the lossy compression step its amplitude is quantized, so the compressed copy will be a little louder or a little quieter than the original, but otherwise the same sine wave. In both cases the same tone (at lower volume) will be audible in the diff signal ...
Title: Observing the loss.
Post by: ErikS on 2003-07-14 14:33:09
This is interesting... Diff1 = -Diff2 by your own definition, so the question comes down to whether you can hear the difference between a waveform and its inverted copy. My uneducated guess would be that you can't. Other opinions?
Title: Observing the loss.
Post by: Vietwoojagig on 2003-07-14 15:11:18
Quote
This is interesting... Diff1 = -Diff2 by your own definition, so the question comes down to whether you can hear the difference between a waveform and its inverted copy. My uneducated guess would be that you can't. Other opinions?

That's exactly what tigre is saying. Diff1 is the noise you add, Diff2 is the noise you drop. You can't tell the difference.
Title: Observing the loss.
Post by: ErikS on 2003-07-14 15:19:17
Quote
That's exactly what tigre is saying. Diff1 is the noise you add, Diff2 is the noise you drop. You can't tell the difference.

Uhm, yes, you're right of course. I just can't read today. So we all agree that the sign doesn't matter, then...
Title: Observing the loss.
Post by: Gecko on 2003-07-14 15:30:39
In the past, Microsoft used the wave-subtraction method to prove WMA 7's superiority over other formats, claiming that because its difference file was quieter than that of other codecs, it was clear that WMA was of higher quality. We know from experience that WMA 7, at least, sounds worse than FhG MP3 at comparable bitrates on the majority of samples.

In psychoacoustic audio compression, the noise is added in such a way that it is masked by the actual signal. Since the signal is tonal, it would make sense for the added noise to be tonal to a certain degree as well. The added noise is shaped so that it can't be heard.

EAQUAL uses a complicated model of the ear to analyse sound perception. However, it is just a model and has its own flaws. At quality levels near transparency, the insufficiencies of the EAQUAL model come into play and the error is too big. While a yardstick is fine for measuring the length of your arm, you won't be able to determine the thickness of a single sheet of paper with it. You could probably use any existing psymodel to build a before/after quality-analysis tool; the one EAQUAL uses is computationally intensive but not necessarily better.
Title: Observing the loss.
Post by: atici on 2003-07-14 16:27:51
Quote
Unfortunately, this measurement is almost useless for audio quality assessment. It is useless because the measured value does not correlate with the perceived sound quality of the audio codec. In fact, the noise measurement gives no indication of the perceived noise level.


Absolutely. This seems clear from observing the loss, and it has been pointed out earlier in the thread.

But consider the set of all lossy audio codecs (not only psychoacoustic codecs, but also ones like WavPack lossy, including those that exist only in theory and are yet unknown or unimplemented). I'd contend that the one that minimizes the noise energy would still do a very good job (like the Waves L2 type 2 dither algorithm). The amount of noise is therefore an objective criterion for the amount of loss if we disregard the human ear model. Please refer to my earlier analogy to dither-algorithm noise; I think it's more or less the same issue.

I know HA is mainly a psychoacoustic-codec discussion board, and people get irked when theoretical approximations of quality beyond transparency levels are discussed. That's why the domain between lossless quality and transparent quality is not explored in the threads here. But observation of the loss file (volume, frequency-domain analysis and other possible methods) seems to be very useful information at that point. It could help give us an idea of how lossy codecs compare at the same bitrate beyond the transparency limit (which becomes important in transcoding, DSP applications, archiving, ...).

Quote
If the inaudible noise could somehow be removed from the measurement, then the resulting quantity would match human perception more accurately, since it would reflect what is audible.


That's precisely what noise-shaping algorithms are designed for at the moment: they shift the noise into inaudible parts of the spectrum at the expense of increasing the noise volume slightly. So couldn't adjusting to the human ear model be left to the noise-shaping phase? Thus a lossy codec (again, not necessarily a psychoacoustic one) aiming for minimal noise volume, supplemented with noise shaping, should do a good job. Am I missing something?

Quote
Personally, I would be less worried by a difference file with audible music.


Umm, but theoretically, the more the loss is correlated with the original (i.e. sounds like it), the more information you'd be losing from your original. Or at least that's how it seems to me - again disregarding the human ear model and trying to find an objective (not human-based) criterion for lossy encoding quality.
Title: Observing the loss.
Post by: PoisonDan on 2003-07-14 16:39:05
Quote
I know HA is mainly a psychoacoustic-codec discussion board, and people get irked when theoretical approximations of quality beyond transparency levels are discussed. That's why the domain between lossless quality and transparent quality is not explored in the threads here. But observation of the loss file (volume, frequency-domain analysis and other possible methods) seems to be very useful information at that point. It could help give us an idea of how lossy codecs compare at the same bitrate beyond the transparency limit.

Hmmm... I'm not sure I see the importance of analyzing lossy codecs "beyond transparency limits". Can you give me an example of a situation where this may be useful?
Title: Observing the loss.
Post by: atici on 2003-07-14 16:42:44
@ PoisonDan:

Funnily enough, I just gave examples in my latest edit!  There are people who pick the WavPack route because of transcoding...
Title: Observing the loss.
Post by: DickD on 2003-07-14 18:15:52
Quote
I'd contend that the one that minimizes the noise energy would still do a very good job (like the Waves L2 type 2 dither algorithm). The amount of noise is therefore an objective criterion for the amount of loss if we disregard the human ear model. Please refer to my earlier analogy to dither-algorithm noise; I think it's more or less the same issue.


I understand (but I could be wrong, see below) that L2 type 2 dither is a type of strong ATH noise-shaping dither for converting high-bit-depth studio material to 16-bit 44.1 kHz CD audio.

You might be surprised to learn that the amount of noise power it exhibits (just like Foobar2000's strong ATH noise shaping dither) is considerably (many times) higher than for standard dither with no noise shaping.

However (and you seem to know this already from later in your post!?), it puts most of the noise at ultrasonic frequencies or in other areas where the ear is less sensitive and the absolute threshold of hearing (ATH) is relatively high, so that it can reduce the noise at the most sensitive frequencies where the ATH is lower. For that reason, despite the considerably higher power, it sounds about 15-18 dB quieter than standard dither to the human ear.

This is another sign that it's what you hear, rather than simple measurements, that is important. (Algorithms that take account of psychoacoustics, such as ReplayGain's equal-loudness curve, can do a decent job of roughly measuring perceived loudness, however.)

Incidentally, referring to the top of the thread, I'd expect MP2's or Musepack's difference noise to be a little more noise-like than Vorbis' or MP3's, simply because MP2 and Musepack are sub-band codecs (quantizing in the time domain over sub-bands) whereas Vorbis and MP3 are transform codecs, quantizing each frequency component separately and so leaving residuals that can be as tonal as the original frequency. (I haven't listened for myself to prove this.)

On the subject of noise shaping, lossy encoders like Musepack already incorporate some form of it. Bryant, the WavPack developer, seems quite keen to implement strong ATH-type noise shaping for lossy mode, according to discussions we had in the forums. This may reduce the perceived noise artifacts, or reduce them to similar levels at lower bitrates.
Title: Observing the loss.
Post by: atici on 2003-07-14 18:18:27
Quote
You might be surprised to learn that the amount of noise power it exhibits (just like Foobar2000's strong ATH noise shaping dither) is considerably (many times) higher than for standard dither with no noise shaping.


Are you sure? I think you are talking about Waves' "Type 1". "Type 2" focuses on low power.
Title: Observing the loss.
Post by: DickD on 2003-07-14 18:25:55
Quote
Quote
You might be surprised to learn that the amount of noise power it exhibits (just like Foobar2000's strong ATH noise shaping dither) is considerably (many times) higher than for standard dither with no noise shaping.


Are you sure? I think you are talking about the "Type 1" of Waves. "Type 2" focuses on low power.

atici: No, I'm not sure; I only said "I understand that it's noise-shaping dither". Noise-shaping dither necessarily contains more power so that it still dithers correctly while reducing the audible noise compared to unshaped minimal dither. It was inference.

I can't see any other way it would work, unless it uses insufficient dither and instead permits some truncation distortion in a trade-off for lower dither noise. I don't have the Waves DSPs to know the modes exactly, but I thought it was more like L2 Ultra dither, which I've read about.
Title: Observing the loss.
Post by: atici on 2003-07-14 18:44:02
Quote
I thought it was more like L2 Ultra dither, which I've read about.


The default mode of L2 Ultra is "Type 1"; "Type 2" is the one aimed at low power. You can choose either of them.

Quote
However (and you seem to know this already from later in your post!?), it puts most of the noise at ultrasonic frequencies or in other areas where the ear is less sensitive and the absolute threshold of hearing (ATH) is relatively high, so that it can reduce the noise at the most sensitive frequencies where the ATH is lower.


The noise-shaping phase is separate from the dither and based on different algorithms AFAIK (that's at least how it is explained in its booklet). I don't know the specifics of Waves' plugins, so I don't know whether your inference works in this case.

Anyway, that's not the main discussion :D It was just an example.
Title: Observing the loss.
Post by: DickD on 2003-07-15 09:37:55
Quote
The default mode of L2 Ultra is "Type 1"; "Type 2" is the one aimed at low power. You can choose either of them.


Thanks for clarifying.

Quote
The noise-shaping phase is separate from the dither and based on different algorithms AFAIK (that's at least how it is explained in its booklet). I don't know the specifics of Waves' plugins, so I don't know whether your inference works in this case.


It is possible to noise-shape without providing enough dither to completely prevent truncation distortion (you just reduce the dither by so many dB). Perhaps that's what it's doing?

It might be more like the "soft ATH noise shaping (less noisy)" option in Foobar2000, which is purportedly less noisy than the recommended strong ATH noise-shaping dither, but which will therefore either sound louder or won't dither (prevent distortion) as adequately.

You can measure the power in the noise by dithering digital silence (or a 24-bit WAV containing a 1 kHz sine wave at -130 dBFS, for example) and measuring the RMS power in a WAV editor - see the sketch below.
You can zoom in on the waveform view (vertical zoom will show noise with peak amplitudes of up to about +/-30 samples with FB2K's strong ATH noise-shaping dither).
You can look at the frequency spectrum of the dithered silence to see how high it is in the near-ultrasonic (18 kHz+) area.
You can use a ReplayGain tool like FB2K or WaveGain to gauge the perceived loudness.
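
Following the first suggestion above, a toy sketch that dithers digital silence two ways and compares total RMS power. The first-order error-feedback shaper here is only a crude stand-in for real ATH-shaped dither, which uses a much higher-order filter tuned to the hearing threshold, so the power increase it shows is small compared with a strong ATH shaper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 44100                                    # 1 second at 44.1 kHz
lsb = 1.0 / 32768                            # one 16-bit step on a ±1 scale
tpdf = (rng.random(n) - rng.random(n)) * lsb # TPDF dither, ±1 LSB peak

# Plain TPDF dither, no shaping.
plain = np.round(tpdf / lsb) * lsb

# First-order error feedback: push the quantization error toward high frequencies.
shaped = np.empty(n)
err = 0.0
for i in range(n):
    target = tpdf[i] + err                   # the input is silence, so just dither
    shaped[i] = np.round(target / lsb) * lsb
    err = target - shaped[i]

for name, x in (("plain", plain), ("shaped", shaped)):
    rms_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-20)
    print(name, f"{rms_db:.1f} dBFS")
# The shaped output measures somewhat more total power, even though the audible
# (low/mid-frequency) share is what a proper ATH shaper would be minimizing.
```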

Quote
Anyway, that's not the main discussion :D It was just an example.


Yes, I'm at the edge of my knowledge, as I don't have Waves.

Now, back to the discussion: you seem to be moving away from psychoacoustic encoders, which rely on masking, toward the lossy modes of lossless encoders like WavPack. It seems that WavPack's existing lossy mode isn't quite transparent even at 320 kbps, and it may well let you down on music where the predictor does a poor job yet psychoacoustic masking from the original signal is weak in at least some parts of the spectrum, which leads to more noise being left at the target bitrate and a higher likelihood of it being audible.

Yes, we might expect strong ATH noise shaping to improve the perceived noise by up to 15-18 dB at the same lossy-mode bitrate, which could probably allow a somewhat lower bitrate to become transparent most of the time. Even so, I don't think you'd find WavPack having a reasonably secure mode at much less than 256 to 320 kbps (guesstimate).

It's not possible to guarantee transparency (in a VBR sense) without the encoder knowing some psychoacoustics, so that it can always ensure that the noise is masked. Then you can shape the noise to the lowest possible masking threshold at that instant.

Of course, the measured noise would be greater in this case than for the un-noise-shaped mode - unless you measure perceived noise with a ReplayGain-type equal-loudness-curve weighting.

However, try a sample like one of those den provided (Blue Monday was one, I think), where the existing WavPack lossy at 320 kbps doesn't do well, or some of those guruboolez tried. Here the noise measurement you could make would be reasonably low, yet the noise is audible, making it non-transparent.

I'd be almost certain that if you measured the noise introduced by Musepack --quality 5 --xlevel, it would be higher in power (with or without equal-loudness-curve adjustment), yet Musepack --quality 5, at far below 320 kbps, would sound transparent while WavPack lossy would not.

So the measurement really can't tell you much with any certainty.

Quote
Thus a lossy codec (again, not necessarily a psychoacoustic one) aiming for minimal noise volume, supplemented with noise shaping, should do a good job. Am I missing something?


I don't think 256-320 kbps and still failing on some music would be a particularly great job - not when Musepack at quality 5 and 160-190 kbps is transparent practically all the time and can use higher bitrates when it has to.

Perhaps it depends how you define the job that you're trying to do with the encoder as to whether you consider it a good job.

Musepack already shapes the noise to follow the masking threshold it has calculated from the original music. At --quality 5 it sits fairly close to that threshold. If you go for higher quality, it gradually adds a margin of safety to both the masking threshold and the adaptive ATH, "to be on the safe side". I think --quality 6 is about 2 dB of margin - it's in the FAQs/stickies and in the verbose output.

However, you can't do a meaningful difference measurement afterwards without knowing the masking threshold and the associated psychoacoustics - and if you knew the psychoacoustics better than Musepack does, you could build a psymodel that would remove those very artifacts where your model is more accurate.

The only artifacts that get through Musepack --quality 5 --xlevel come from those very, very rare failings in its psychoacoustic model. If a failing is bigger than 2 dB in magnitude, then --quality 6 still won't make the artifact inaudible, just about 2 dB quieter - only just perceptibly quieter - despite having thrown all those extra megabytes at your album (their benefit is spread over all frequencies and the whole running time, not just the troublesome instant).
Title: Observing the loss.
Post by: wkwai on 2003-07-18 07:51:23
Quote
Quote
How could the sum of the remaining frequencies be higher than the original if none of them has its phase shifted? And exactly that is what happens during clipping.

It is enough that the sum is higher at a given instant to introduce clipping. The spectral view only shows you the average level over the analysis window.
Remove harmonics from a square wave without changing the phase and it clips.

It also has to do with the quantization equation. In AAC and MP3, some spectral lines, due to the MAGIC NUMBER 0.4054, can be rounded up instead of down! As a result, these decoded spectral lines can be bigger in magnitude than the original.

Try removing the 0.4054 constant during quantization: the decoded waveform will have a lower energy content than the original, and some of the clipping will be eliminated.
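
To see the effect, a sketch of the power-law quantizer in a simplified, unit-step form (a real encoder scales by the quantizer step / global gain first, so the values here are purely illustrative):

```python
def quantize(x: float, bias: float = 0.4054) -> int:
    return int(abs(x) ** 0.75 + bias)        # unit step size for illustration

def dequantize(ix: int) -> float:
    return ix ** (4.0 / 3.0)                 # inverse power law

for x in (1.2, 2.0, 3.0, 5.0):
    with_bias, without = dequantize(quantize(x)), dequantize(quantize(x, 0.0))
    print(f"x={x}: reconstructed {with_bias:.3f} (bias) vs {without:.3f} (no bias)")
# x = 2.0 is the interesting case: 2.0**0.75 = 1.682, and the 0.4054 bias pushes
# it up to ix = 2, which reconstructs to 2**(4/3) = 2.520 > 2.0. Without the
# bias it truncates to ix = 1 and reconstructs to 1.0 < 2.0.
```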
Title: Observing the loss.
Post by: Gabriel on 2003-07-18 09:47:10
The first problem with the diff is the potential phase-shift problem (likely to happen when using intensity stereo, for example). Perhaps this could be solved by using the power of the signal instead of the signal itself.

Now, let's say we have solved the phase problem and we have a diff signal. What conclusions can we draw from the diff?
If we take the diff info into consideration (how should this be done? framing and summing powers?), what we have is an indication of the compression ratio.

Let's say we have two diffs, A and B, resulting from two encoders using the same tools (for example, two MP3 encoders). We can assume that, overall, the lossless compression ratio of both encoders should be in the same range, so globally we can ignore this factor when comparing A and B. What we can say is that an overall higher diff means the compression ratio was higher, nothing else. (This is at least some information.)

Two encoders targeting the same compression ratio should have an overall similar diff power. So how do we evaluate the quality of an encoder?
We know that the quality of the encoder is not related to the amount of noise introduced (as overall it should be similar), but to the shaping of the noise. The noise should be shaped to accommodate our ear. So to analyse this, we need something that analyses the initial signal in order to determine how the noise should be shaped.

That means a psymodel, or an ear model. But the one used for analysis would have to be perfect in order to be perfectly reliable. If that were possible, it would mean that such a model was feasible, and then analysing quality would be useless, as this model would probably also be included in psychoacoustic encoders.

So I think that we cannot reliably analyse a diff to determine anything other than the compression ratio used.
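
A sketch of the "framing and summing powers" suggestion from the first paragraph: comparing per-frame energy envelopes instead of raw samples, which is insensitive to the pure phase shifts that break plain wave subtraction (`original` and `lossy` are assumed aligned float arrays):

```python
import numpy as np

def frame_power_db(x: np.ndarray, n: int = 1024) -> np.ndarray:
    frames = x[: len(x) // n * n].reshape(-1, n)        # non-overlapping frames
    return 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)

def power_diff_db(original: np.ndarray, lossy: np.ndarray) -> np.ndarray:
    m = min(len(original), len(lossy))
    return frame_power_db(lossy[:m]) - frame_power_db(original[:m])

# A 180°-shifted copy yields an all-zero power difference here, while plain
# subtraction (as in tigre's example above) would report a signal twice as loud.
```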