level matching when comparing lossy to lossless

I probably should have had this settled in my mind long ago, but -- how necessary is it to level match when you are comparing a lossless track to its MP3 counterpart?

By way of investigation, I compared foobar2000's reported ReplayGain and track peak values for the same track encoded (LAME 3.98) as V2 (195 kbps VBR) and at 320 kbps CBR:

lossless:
Track gain -1.08 dB
Track peak 0.938171

195 kbps VBR:
Track gain +0.11 dB
Track peak 0.954459

320 kbps CBR:
Track gain +0.10 dB
Track peak 0.934477

The track peak differences are negligible, but whence comes the >1 dB difference in ReplayGain between lossless and lossy? Does this mean that one should level match according to those ReplayGain values when doing an ABX?

Reply #1
The difference between the lossless original and the LAME MP3 should be much less than this.

Reply #2
If they're encoded from the same source I see no reason to level match, *if* the encoder is not intentionally changing the volume. I understand LAME does this at least in some ABR modes to lower the amount of clipping.

1dB of ReplayGain difference between lossy and lossless is surprising. The ReplayGain calculation isn't perfect of course, but that's more than I'd have expected. Are you sure the same ReplayGain *algorithm* was used in both cases? (original ReplayGain vs R128?)

Reply #3
Level matching within +/- 0.1 dB is the general rule. No level match and I will personally guarantee you will get proper identification if you are paying attention, even if you are comparing two copies of the same file but with the level shift.
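To put a 0.1 dB tolerance in perspective, here is a minimal Python sketch (not from the thread) that converts a level difference in dB to a linear amplitude ratio. The function names are made up for illustration; the ReplayGain values are the ones reported in the first post.

```python
import math

def db_to_ratio(db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)

def ratio_to_db(ratio: float) -> float:
    """Convert a linear amplitude ratio to a level difference in dB."""
    return 20.0 * math.log10(ratio)

# A 0.1 dB mismatch is roughly a 1.2 % change in sample amplitude.
tolerance_ratio = db_to_ratio(0.1)

# Gap between the ReplayGain values from the first post
# (lossless -1.08 dB, V2 +0.11 dB).
rg_gap_db = 0.11 - (-1.08)
rg_gap_ratio = db_to_ratio(rg_gap_db)

print(f"0.10 dB -> x{tolerance_ratio:.4f}")
print(f"{rg_gap_db:.2f} dB -> x{rg_gap_ratio:.4f}")
```

If the reported gap were a real level difference, it would be more than ten times the 0.1 dB tolerance.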

I guess the question of interest is the choice of criteria for matching. Back in the days before phase-disruptive things like perceptual coders, even peak level was good enough. Today, if nothing else, use average RMS or the closest thing to it that your software supports.

Reply #4
Quote
If they're encoded from the same source I see no reason to level match, *if* the encoder is not intentionally changing the volume. I understand LAME does this at least in some ABR modes to lower the amount of clipping.

1dB of ReplayGain difference between lossy and lossless is surprising. The ReplayGain calculation isn't perfect of course, but that's more than I'd have expected. Are you sure the same ReplayGain *algorithm* was used in both cases? (original ReplayGain vs R128?)

Absolutely sure -- this was all done today, right before I posted, using the ReplayGain scanner in foobar2000 v1.2.2. The lame.exe is the 3.98r build that comes bundled with dBpoweramp (lame --license shows it as 3.98.4).

Perhaps others could try and see what they get? The track, btw, was 'Black Cow', the first track on Steely Dan's Aja -- I couldn't tell you exactly which CD version this is; I bought it many years ago.

Reply #5
Quote
Level matching within +/- 0.1 dB is the general rule. No level match and I will personally guarantee you will get proper identification if you are paying attention, even if you are comparing two copies of the same file but with the level shift.

I wouldn't guarantee anything. It may be the general rule, but I would definitely make an exception for lossy. To suggest that level-shifting the same source and comparing is going to be less noticeable (usage of the word "even") than comparing a lossless vs. a high-bitrate lossy version is more than a little strange. Provided the encoder isn't scaling I would not apply a level adjustment.

Quote
I guess the question of interest is the choice of criteria for matching. Back in the days before phase disruptive things like perceptual coders even peak level was good enough. Today, if nothing else use average RMS or the closest thing to it that your software supports.

I find there is too great a tendency to treat RG figures as the absolute and perfect metric for loudness matching.  It definitely is not.  That said I don't think level matching using RMS is a good idea either.  The ideal thing to do is determine if the encoder scaled and by how much and use this to calibrate the two signals.
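One possible way to check whether the encoder scaled, sketched in Python under some assumptions: both signals are already decoded to lists of floating-point samples and are exactly time-aligned (decoder delay and padding removed), and any scaling is a single constant gain. The function name estimate_scale and the 0.98 factor are hypothetical and only illustrate the idea on a synthetic tone.

```python
import math

def estimate_scale(reference, decoded):
    """Least-squares gain g that minimises sum((decoded - g * reference) ** 2)."""
    num = sum(r * d for r, d in zip(reference, decoded))
    den = sum(r * r for r in reference)
    if den == 0.0:
        raise ValueError("reference signal is silent")
    return num / den

# Synthetic stand-in for real audio: a 440 Hz tone at 44.1 kHz,
# and a "decoded" copy scaled by a hypothetical factor of 0.98.
RATE = 44100
reference = [math.sin(2.0 * math.pi * 440.0 * n / RATE) for n in range(4410)]
decoded = [0.98 * x for x in reference]

scale = estimate_scale(reference, decoded)
scale_db = 20.0 * math.log10(scale)
print(f"estimated scale {scale:.4f} ({scale_db:+.3f} dB)")
```

With real lossy output the estimate also absorbs a little coding error, so a result within a few hundredths of a dB of zero most likely means no deliberate scaling.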

Reply #6
Here's what I get when I run the three tracks through Audition 1.0's statistics generator (NB the MP3s were decoded by Audition 1.0's codec, which is Fraunhofer IIS mp3/mp3PRO).

(Sorry about the formatting -- I have never been able to recall how to get tables to format correctly in BB code).

The take-home here is that the Average RMS Power is the same for all three. This is a straightforward average level measurement (using an RMS window of 50 ms). ReplayGain, as I understand it, applies some psychoacoustic magic to come up with its recommended gain. I would expect the three files to differ *spectrally*, so wouldn't the algorithmic psychoacoustic prediction differ too?


Code:
Black Cow .wav        Left          Right
Min Sample Value:     -30296        -30143
Max Sample Value:     29222         30742
Peak Amplitude:       -0.68 dB      -0.55 dB
Possibly Clipped:     0             0
DC Offset:            -0.003        -0.007
Minimum RMS Power:    -91.32 dB     -85.37 dB
Maximum RMS Power:    -9.34 dB      -10.29 dB
Average RMS Power:    -22.68 dB     -22.37 dB
Total RMS Power:      -21.18 dB     -20.9 dB
Actual Bit Depth:     16 Bits       16 Bits

195VBR                Left          Right
Min Sample Value:     -30151.38     -31275.73
Max Sample Value:     29373.31      30374.21
Peak Amplitude:       -0.72 dB      -0.4 dB
Possibly Clipped:     0             0
DC Offset:            -0.003        -0.007
Minimum RMS Power:    -91.67 dB     -85.63 dB
Maximum RMS Power:    -9.35 dB      -10.3 dB
Average RMS Power:    -22.68 dB     -22.37 dB
Total RMS Power:      -21.19 dB     -20.91 dB
Actual Bit Depth:     32 Bits       32 Bits

320CBR                Left          Right
Min Sample Value:     -30620.92     -30537.32
Max Sample Value:     29536.84      30406.43
Peak Amplitude:       -0.59 dB      -0.61 dB
Possibly Clipped:     0             0
DC Offset:            -0.003        -0.007
Minimum RMS Power:    -91.82 dB     -85.46 dB
Maximum RMS Power:    -9.34 dB      -10.29 dB
Average RMS Power:    -22.68 dB     -22.37 dB
Total RMS Power:      -21.18 dB     -20.9 dB
Actual Bit Depth:     32 Bits       32 Bits
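As a rough cross-check of statistics like these, the Python sketch below (not Audition's actual algorithm) converts a 16-bit sample value to a peak level in dB and computes the RMS level of consecutive 50 ms windows. It assumes 0 dB corresponds to a sample magnitude of 32768 for peaks. Audition's RMS reference level and the way it averages windows are not stated in the thread, so the RMS part is only illustrative; all names are hypothetical.

```python
import math

FULL_SCALE = 32768.0  # assumed 0 dB reference for 16-bit PCM peaks

def peak_db(sample_value: float) -> float:
    """Peak level in dB relative to full scale for one sample value."""
    return 20.0 * math.log10(abs(sample_value) / FULL_SCALE)

def windowed_rms_db(samples, window):
    """RMS level in dB for each consecutive non-overlapping window.

    Samples are expected to be normalised to +/-1.0; silent windows are skipped.
    """
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        mean_square = sum(x * x for x in block) / window
        if mean_square > 0.0:
            levels.append(10.0 * math.log10(mean_square))
    return levels

# Largest-magnitude sample values from the .wav statistics.
left_peak = peak_db(-30296)
right_peak = peak_db(30742)
print(f"peaks: {left_peak:.2f} dB, {right_peak:.2f} dB")

# 50 ms windows at 44.1 kHz are 2205 samples; a full-scale sine reads about -3 dB here.
RATE = 44100
tone = [math.sin(2.0 * math.pi * 440.0 * n / RATE) for n in range(4410)]
levels = windowed_rms_db(tone, 2205)
print([round(v, 2) for v in levels])
```

The two peak values reproduce the -0.68 dB and -0.55 dB shown for the .wav file, and 30742 / 32768 is also the 0.938171 track peak foobar2000 reported for the lossless file.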

Reply #7
Thanks for the data.

I'd believe those average RMS figures long before I'd believe RG figures, and will have to tip my hat to Arny.

Have a look at the RG algorithm and consider how the way it divides the audio and performs a ranking to arrive at a number differs from how LAME divides the audio prior to compressing it.

RG does not apply any psychoacoustic magic, other than using a less-than-state-of-the-art equal-loudness curve, which hardly constitutes psychoacoustic magic.

Regarding Audition's decoder, besides failing to use Lame's gapless information, I would only expect differences in the LSB between it and the implementation used by foobar2000.

Reply #8
Quote
I find there is too great a tendency to treat RG figures as the absolute and perfect metric for loudness matching.  It definitely is not.  That said I don't think level matching using RMS is a good idea either.  The ideal thing to do is determine if the encoder scaled and by how much and use this to calibrate the two signals.



I seem to recall JJ somewhere making much the same cautionary critique of current level matching schemes, ReplayGain included.

But what about lossless vs lossless -- say, comparing two remasters (for either audible difference, or for preference)? No encoder scaling involved there. Should one even bother trying to level match when there's different EQ? To the extent I've done such comparisons, I've always used ReplayGain values to 'match' them.

Reply #9
Do you have foo_hdcd or use foo_dsp_effect for de-emphasis?

Reply #10
Quote
Do you have foo_hdcd or use foo_dsp_effect for de-emphasis?

There are no active DSPs, and there was no processing of the MP3s.

Reply #11
foobar2000 always uses postprocessing plugins such as foo_hdcd on playback and during RG calculation. But these plugins work only for lossless files, not MP3. That's why I asked about the presence of foo_hdcd, foo_dsp_effect, ...

Reply #12
Quote
But what about lossless vs lossless -- say, comparing two remasters (for either audible difference, or for preference)? No encoder scaling involved there. Should one even bother trying to level match when there's different EQ? To the extent I've done such comparisons, I've always used ReplayGain values to 'match' them.

You do the best you can.  Personally, I still use RG because I'm an old dog and settled in on a routine.  I haven't bothered with R128 and likely never will.  My guess is that it won't do that much better of a job.

Reply #13
Quote
(Sorry about the formatting -- I have never been able to recall how to get tables to format correctly in BB code).

Unfortunately, the only way to line up columns is to edit the table in a text editor using a monospaced font and do it the very old-fashioned way.

Quote
RG does not apply any psychoacoustic magic, other than using a less-than-state-of-the-art equal-loudness curve which hardly constitutes psychoacoustic magic.

FWIW, it’s at least better now that foobar2000 uses ReplayGain with EBU R128 rather than its original curve. Edit: And you just posted about R128 at exactly the same time, heh.

Reply #14
Without going into a prolonged discussion about it...

1) Is "RG w/ EBU R128" different from R128Gain?

2) If yes to 1), is RG w/ EBU R128 just RG with a different equal-loudness curve?

Just a yes or no will suffice.

Reply #15
foobar2000 uses libebur128 since v1.1.7, so a different project to r128gain but I assume much the same in its end results. I’ll have to defer to someone else on the technical details.

Reply #16
Quote
foobar2000 always uses postprocessing plugins such as foo_hdcd on playback and during RG calculation. But these plugins work only for lossless files, not MP3. That's why I asked about the presence of foo_hdcd, foo_dsp_effect, ...

foo_hdcd is not installed in this instance of foobar2000. None of the playback DSPs in Prefs --> DSP Manager --> Available DSPs are active, and no processing was selected for the MP3 encoding. Nor am I aware of what postprocessing the ReplayGain scanner would be using. AFAICT there are NO postprocessing plugins in play here, unless 'foo_dsp_std' is active in the background.

Here are the installed components:
Core (2013-01-18 15:31:20 UTC)
    foobar2000 core 1.2.2
foo_abx.dll (2008-05-24 15:23:50 UTC)
    ABX Comparator 1.3.3
foo_albumlist.dll (2013-01-18 15:30:04 UTC)
    Album List 4.5
foo_cdda.dll (2013-01-18 15:29:30 UTC)
    CD Audio Decoder 3.0
foo_converter.dll (2013-01-18 15:29:34 UTC)
    Converter 1.5
foo_dop.dll (2011-06-12 21:17:16 UTC)
    iPod manager 0.6.9.6
foo_dsp_eq.dll (2013-01-18 15:30:04 UTC)
    Equalizer 1.0
foo_dsp_std.dll (2013-01-18 15:29:58 UTC)
    Standard DSP Array 1.2
foo_fileops.dll (2013-01-18 15:28:32 UTC)
    File Operations 2.2
foo_freedb2.dll (2013-01-18 15:28:12 UTC)
    Online Tagger 0.7
foo_input_alac.dll (2009-09-18 00:39:00 UTC)
    ALAC Decoder 1.0.3
foo_input_amr.dll (2009-03-05 20:20:06 UTC)
    AMR input 1.1.1
foo_input_dts.dll (2010-01-11 16:28:30 UTC)
    DTS decoder 0.2.8
foo_input_std.dll (2013-01-18 15:29:10 UTC)
    Standard Input Array 1.0
foo_playlist_revive.dll (2010-12-28 05:00:38 UTC)
    Playlist Revive 0.2
foo_rgscan.dll (2013-01-18 15:29:44 UTC)
    ReplayGain Scanner 2.1.2
foo_spdif.dll (2007-08-24 13:33:28 UTC)
    SPDIF support 1.3
foo_ui_columns.dll (2008-12-27 19:15:16 UTC)
    Columns UI 0.3.6.4
foo_ui_std.dll (2013-01-18 15:29:12 UTC)
    Default User Interface 0.9.5
foo_uie_albumlist.dll (2008-11-08 19:11:29 UTC)
    Album list panel 0.3.3
foo_uie_graphical_browser.dll (2008-04-19 19:37:53 UTC)
    Graphical Browser rev015
foo_unpack.dll (2013-01-18 15:29:52 UTC)
    ZIP/GZIP/RAR Reader 1.6

Reply #17
Quote
Without going into a prolonged discussion about it...

1) Is "RG w/ EBU R128" different from R128Gain?

2) If yes to 1), is RG w/ EBU R128 just RG with a different equal-loudness curve?

Just a yes or no will suffice.

I can't answer just with a yes/no, but I'll be quick.

Both RG and R128 are two things: a standard for conveying a signal's estimated loudness, and an algorithm to calculate it.

So, RG w/R128 means using the RG "tag" standard, but analyzing with the R128 algorithm to estimate the loudness.

This is not "just RG with a different equal-loudness curve", even though that can be a simplified way to explain it.

Reply #18
Quote
This is not "just RG with a different equal-loudness curve", even though it can be a simplified way to explain it.

If R128 also differs in the block size used for RMS calculation, in statistical processing, different reference signal used for calibration (ignoring a possible  change in level if only trivial), or any other detail in the algorithm besides using a different equal-loudness curve, then the answer to my second question would be no.  While it wasn't readily apparent, I'm only interested in the algorithm, not in how the tagging is performed.

So is that a yes or a no?

Reply #19
For critical work I still prefer to adjust for equal loudness by ear. Feed one channel of version A and B to both ears and adjust the level for minimal lateralization while swapping L/R in the monitor chain (headphones or speakers).
If A and B are very similar (and time aligned), "nulling" can work too, by reversing the polarity of one of them and adjusting the level to obtain a minimum difference signal.
Although this method can be pretty fast, it's not suitable for large numbers of files.
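The nulling idea can also be tried in software. The Python sketch below is only an illustration under stated assumptions: A and B are time-aligned lists of float samples, the trim gain is swept in 0.01 dB steps over +/-3 dB, and the test signals are synthetic (a tone, plus a copy lowered by 1.19 dB with a small unrelated component standing in for coding error). The function names are hypothetical.

```python
import math

def residual_db(a, b, trim):
    """Level of the difference a - trim * b relative to a, in dB."""
    diff_energy = sum((x - trim * y) ** 2 for x, y in zip(a, b))
    ref_energy = sum(x * x for x in a)
    if diff_energy == 0.0:
        return float("-inf")
    return 10.0 * math.log10(diff_energy / ref_energy)

def deepest_null(a, b, lo_db=-3.0, hi_db=3.0, step_db=0.01):
    """Sweep the trim applied to b; return (trim_db, residual_db) of the deepest null."""
    steps = int(round((hi_db - lo_db) / step_db))
    best_trim_db, best_residual = lo_db, float("inf")
    for i in range(steps + 1):
        trim_db = lo_db + i * step_db
        res = residual_db(a, b, 10.0 ** (trim_db / 20.0))
        if res < best_residual:
            best_trim_db, best_residual = trim_db, res
    return best_trim_db, best_residual

RATE = 44100
N = 2205  # 50 ms of audio keeps the sweep quick
a = [math.sin(2.0 * math.pi * 440.0 * n / RATE) for n in range(N)]
k = 10.0 ** (-1.19 / 20.0)  # version B is 1.19 dB quieter
b = [k * x + 0.001 * math.sin(2.0 * math.pi * 3000.0 * n / RATE)
     for n, x in enumerate(a)]

trim_db, null_db = deepest_null(a, b)
print(f"deepest null at {trim_db:+.2f} dB trim, residual {null_db:.1f} dB")
```

Subtracting is the same as inverting polarity and summing. How deep the null goes depends on how different the two versions really are, so the residual level is informative in its own right.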

Many years ago I had a special analog monitoring pre-amp built with 4 separate inputs and individual variable input gain, polarity switch and L/R output routing. With a bit of experience levels can be matched in less than a minute. I'm sure this can be done with most modern DAWs with a mixing desk.

Reply #20
If I understand it correctly, EBU R128 / ITU BS.1770 is calculated as follows.

Without gating, it's quite simple. Essentially you just apply a filter to each audio channel, calculate the mean square of all the samples, add the per-channel results together, convert to dB, and Bob's your uncle. There's a bit of scaling, and you don't call the result dB, but basically that's it.


However, without gating it doesn't work very well, so gating is included - actually it's included twice. Let me try to explain it properly (if I make a mistake, please jump in and correct me!). Take a deep breath...

Before you start, remember the loudness calculation is 10log10(value) minus 0.691. The result is in LUFS*, which is just a dB scale. Calling it LUFS* makes it clear that all the other stages have also been done. Remember this - you'll need to apply it several times below.

1) To the samples from each audio channel, apply a "K" filter which consists of
1a) a shelf filter at ~ 1.5kHz
1b) a high pass filter at ~40Hz
2) split the samples in each channel into 400ms long 75% overlapping blocks
3) calculate the mean square value of the samples in each block in each channel
4) add the results for each channel together; you now have one value for each block
5) save these values!
6) apply the loudness calculation to each block, and throw away the blocks which give a result less than -70LUFS. Keep a record of which blocks remain.
7) for each block that remains, recall the value from step 5, and calculate the mean value of all these blocks
8) to that mean value, apply the loudness calculation. Bingo! A gated loudness value! you've finished! No, not really...
9) Take that gated loudness value, subtract ten, call the result X, go back to the blocks that were left after step 6, and throw any away that have a value less than X.
10) For the blocks that remain, recall the value from step 5, and calculate the mean value of all these blocks.
11) to that mean value, apply the loudness calculation. This is the loudness, in LUFS, of your audio signal. Congratulations!

12) The target loudness is -23LUFS. To make your audio match this, scale the audio samples by the appropriate amount. e.g. if step 11 gave -17LUFS, you need to reduce the level by 6dB, i.e. multiply each audio sample by about 0.5 (halving the amplitude).

That's it. You've really finished now.
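The two gating passes (steps 6 to 11) can be sketched in Python as follows. This is only an illustration of the description above, not a reference implementation: it starts from the values saved in step 5, so the K filter and the 400 ms block splitting are assumed to have been done already, and the function names and example block values are made up.

```python
import math

def loudness(value: float) -> float:
    """The loudness calculation from the post: 10*log10(value) - 0.691."""
    return 10.0 * math.log10(value) - 0.691

def gated_loudness(block_values):
    """block_values: one channel-summed, K-weighted mean square per block (step 5)."""
    # Step 6: absolute gate, drop blocks quieter than -70 LUFS (and digital silence).
    kept = [v for v in block_values if v > 0.0 and loudness(v) >= -70.0]
    if not kept:
        return float("-inf")
    # Steps 7-9: relative gate, 10 below the loudness of the blocks kept so far.
    threshold = loudness(sum(kept) / len(kept)) - 10.0
    kept = [v for v in kept if loudness(v) >= threshold]
    # Steps 10-11: loudness of the mean of the surviving blocks.
    return loudness(sum(kept) / len(kept))

def value_for(lufs: float) -> float:
    """Inverse of loudness(), used only to build the example."""
    return 10.0 ** ((lufs + 0.691) / 10.0)

# Hypothetical programme: 10 blocks at -20 LUFS, 10 at -40 LUFS, 5 of silence.
blocks = [value_for(-20.0)] * 10 + [value_for(-40.0)] * 10 + [0.0] * 5

gated = gated_loudness(blocks)
ungated = loudness(sum(blocks) / len(blocks))
gain_db = -23.0 - gated  # step 12: adjustment needed to reach -23 LUFS
print(f"ungated {ungated:.2f}, gated {gated:.2f}, gain {gain_db:+.2f} dB")
```

In this example the silent and -40 LUFS blocks are gated out, so the result is set by the loud blocks alone, whereas the ungated figure is pulled down by the quiet passages.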


Obviously there are implementation tweaks and tricks, but I think it's clever that people can create meters which display the EBU R128 gated LUFS loudness in real time on an audio signal that's been running for many hours (think about it - you need to create a new sum of those many hours of audio every time you add another 400ms block!)



ReplayGain, re-worded to enable a simple contrast with the above, is as follows.

Before you start, remember the loudness calculation is 10log10(value) minus a non specified value (it gets cancelled out during the match with pink noise in steps 7+8).

1) To the samples from each audio channel, apply a filter based on inverting an equal loudness curve, which consists of
1a) a shelf at about 2kHz, plus high frequency roll off above 4kHz and again above 10kHz
1b) a 2nd order high pass filter at ~100Hz
2) split the samples in each channel into 50ms long non-overlapping blocks
3) calculate the mean square value of the samples in each block in each channel
4) add the results for each channel together; you now have one value for each block
5) apply the loudness calculation to each block.
6) calculate the 95th percentile. Save this value
7) repeat all the above for a -20dB RMS pink noise signal.
8a) step 7 minus step 6 plus 83dB gives you the loudness, in dB SPL, of your audio signal in an SMPTE RP200 calibrated system.
8b) step 7 minus step 6 gives you the ReplayGain adjustment needed to make the audio match the reference.
9) Add 6dB to the result of step 8b - everyone else did
10) scale the audio samples by the appropriate amount. e.g. if step 9 gave -6dB, you need to multiply each audio sample by about 0.5 (halving the amplitude).
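For contrast with the gated-mean approach, the statistical part of ReplayGain (steps 5 to 10) can be sketched in Python. This is a simplified illustration, not the reference code: the equal-loudness filter and the 50 ms block splitting are assumed to have been done already, the pink-noise reference is replaced by made-up block values, and the percentile is picked with one simple indexing convention among several possible ones.

```python
import math

def percentile_95_db(block_values):
    """Steps 5-6: convert each block to dB, sort, take the 95th percentile."""
    levels = sorted(10.0 * math.log10(v) for v in block_values if v > 0.0)
    if not levels:
        raise ValueError("no non-silent blocks")
    index = int(0.95 * (len(levels) - 1))  # assumed percentile convention
    return levels[index]

def replaygain_adjustment_db(track_values, reference_values):
    """Step 8b: reference percentile minus track percentile."""
    return percentile_95_db(reference_values) - percentile_95_db(track_values)

# Hypothetical block values (filtered, channel-summed mean squares).
reference = [0.01] * 100             # stand-in for the pink noise reference
track = [0.04] * 95 + [0.16] * 5     # mostly 6 dB hotter, a few louder blocks

adjustment_db = replaygain_adjustment_db(track, reference)
with_offset_db = adjustment_db + 6.0          # step 9
scale = 10.0 ** (with_offset_db / 20.0)       # step 10
print(f"step 8b {adjustment_db:+.2f} dB, step 9 {with_offset_db:+.2f} dB, scale x{scale:.4f}")
```

Because the result depends on a high percentile of short blocks rather than a gated mean, small changes to the loudest blocks can move it even when the average RMS stays the same.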

Hope this helps.

Cheers,
David.

* - The EBU call it LUFS, and pronounce it as a single syllable; the ITU call it LKFS, and pronounce it by spelling out the four letters.

P.S. references:
EBU R128:
http://tech.ebu.ch/loudness
http://tech.ebu.ch/docs/r/r128.pdf
ITU BS.1770:
http://www.itu.int/rec/R-REC-BS.1770-3-201208-I/en
ReplayGain:
http://replaygain.hydrogenaudio.org/propos...ulating_rg.html

Reply #21
I do not use automatic (RG/R128) loudness matching when I know the loudness change and/or when I expect the other differences to be so small that the smallest error in loudness matching will be the biggest potentially audible difference and/or when there are signal differences that I think/know will trip up RG/R128.

I use whatever loudness matching works best when I do not know the loudness change and/or when I expect the other differences to be large. I like the sound of Kees' method.

It is even worse when differences are clearly audible, and I'm trying to level match to establish preference. It's even worth A/B testing +/-2dB either side of "matched" (however that was determined) to flag up potential problems.

Cheers,
David.

 

Reply #22
Quote
Before you start, remember the loudness calculation is 10log10(value) minus 0.691. The result is in LUFS*, which is just a dB scale. Calling it LUFS* makes it clear that all the other stages have also been done. Remember this - you'll need to apply it several times below.

dB is 10*log10(value) if the value represents power. If it is voltage (the usual) then use 20*log10(value), because dB represents relative power, which is the square of relative voltage.

Reply #23
Quote
Before you start, remember the loudness calculation is 10log10(value) minus 0.691. The result is in LUFS*, which is just a dB scale. Calling it LUFS* makes it clear that all the other stages have also been done. Remember this - you'll need to apply it several times below.

Quote
dB is 10*log10(value) if the value represents power. If it is voltage (the usual) then use 20*log10(value), because dB represents relative power, which is the square of relative voltage.

Instead of doing 20log10(root-mean-square), they do 10log10(mean-square) which is entirely equivalent, and therefore correct.
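A quick numerical check of that equivalence, as a small Python sketch with arbitrary made-up sample values:

```python
import math

# Arbitrary example samples, normalised to +/-1.0.
samples = [0.5, -0.25, 0.125, -0.75]

mean_square = sum(x * x for x in samples) / len(samples)
rms = math.sqrt(mean_square)

from_power = 10.0 * math.log10(mean_square)   # 10*log10 of the mean square
from_amplitude = 20.0 * math.log10(rms)       # 20*log10 of the root mean square

print(from_power, from_amplitude)
```

Both lines give the same figure up to floating-point rounding, because 20*log10(sqrt(x)) equals 10*log10(x).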

Cheers,
David.

Reply #24
I'm glad that is all spelled out, though you could have just said no.

FWIW, the original RG web pages as well as this step-by-step description blow away what is currently in our wiki for RG, IMHO.