Transparent encoding for older ears?

Reply #25 – 2007-05-20 09:29:47

I haven't tested V5 to see if it truly cuts off at 16 kHz, but you're probably right (it might also vary a little depending on LAME version). But I wouldn't recommend using -V5 just for that effect, because -V5 will still be of lower quality in other respects. There are times when -V5 and even -V2 are not transparent; someone posted a sample recently on the mp3 General forum (I'm not encouraging anyone to look for that sample, I'm just giving it as an example so people don't criticize me for saying things without proof).

buzzy, yeah you are right, LAME rounds off your --lowpass ##### setting. It only moves in coarse steps of roughly 500 Hz. Just make sure you check the range the encoding screen output says it is using. I haven't played with the width settings, so I dunno how exact those are either; you'll have to figure it out yourself, sorry.

And one last warning, as you've guessed it's probably a good idea not to assume that LAME is doing what it says it is doing. Even if it prints out the lowpass freq and width for you, if you can check it with a separate spectrograph that is better. I think LAME may be accurate in this case, but there's been a lot of times where I discovered LAME doesn't do what it says it is doing (switches not working, behind-the-scenes algorithms forcefully activated/deactivated, etc, even when the --verbose output says they are on/off or doesn't report them). This is all chaotic and depends on LAME version, etc. Bottom line, test yourself if you can. LAME gives me a headache with the way it behaves sometimes.

Reply #26 – 2007-05-20 09:53:45

Quote

LAME 3.97 32bits (http://www.mp3dev.org/)
CPU features: MMX (ASM used), SSE (ASM used)
Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz
Encoding C:\Documents and Settings\Rio\My Documents\My Music\temp.wav
to C:\Documents and Settings\Rio\My Documents\My Music\temp.mp3
Encoding as 44.1 kHz VBR(q=5) j-stereo MPEG-1 Layer III (ca. 11.9x) qval=3
Frame | CPU time/estim | REAL time/estim | play/CPU | ETA
7900/9030 (87%)| 1:06/ 1:16| 1:06/ 1:16| 3.0803x| 0:09
32 [ 7] *
40 [ 21] *
48 [ 13] *
56 [ 12] *
64 [ 7] *
80 [ 148] %***
96 [1408] %%%****************************
112 [3091] %%%%%%%%%%%%%%******************************************************
128 [1852] %%%%%%%**********************************
160 [1258] %%%%%%%*********************
192 [ 72] %*
224 [ 10] %
256 [ 0]
320 [ 1] *
-----------------------------------------------------------------00:29---------
kbps LR MS % long switch short %
120.3 16.9 83.1 98.2 1.1 0.7

LAME 3.97 @ -V5 applies polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz.
EncSpot reports lowpass filter of 16000 Hz.

But when encoding using the presets according to your lowpass preferences as stated in the LAME wiki: http://wiki.hydrogenaudio.org/index.php?title=LAME
-V5's lowpass 16538 Hz - 17071 Hz

I agree with Porcupine regarding *un*documentation on fixes of previous LAME builds. Even the wiki needs to be fixed.

At any rate, ABX yourself.

Reply #27 – 2007-05-20 17:30:03

Quote from: uart on 2007-05-20 05:56:51

Interesting. Tell me buzzy, how did you measure the "lowpass-width" to tell that those last two were the same? Or did you mean that they were literally the same in terms of a binary file compare?

To be honest, I just looked at the byte size, as the odds of it being exactly the same seemed very unlikely given the complexity of an encode. But I just did a file compare, too, to confirm that they were identical. The key point, again, is that it's not clear what arguments these switches can take, so it's good to test them to be sure they are doing what was intended.

UPDATE - I've had to revise what I posted here, this is my current guess (!):

--lowpass-width switch syntax/arguments

It seems that lowpass-width might interpret inputs as a % of the lowpass frequency, rather than as a khz number. For example, --lowpass-width 10 creates a width of 10% of the lowpass frequency. (So some of the seemingly unusual things noted above in this thread are understandable, such as why lowpass-widths of .5 or 1 get rounded to the same result because it's interpreted as 1%.)

It seems to accept various integer values - I've tried it with 1, 2, 3, 5, 10, 15, and 20, and the resulting variations in file size seem to indicate that it's working as expected. Not everything worked - 25 seemed to give the same result as 20 for lowpass 19. I didn't try any decimals (such as 7.5), no real point in that.

This part of the longhelp is a little misleading, if you ask me:

> --lowpass-width <freq> frequency(kHz) - default 15% of lowpass freq

Something more like this would be more useful:

> --lowpass-width <number> % of lowpass frequency - default 15 (%)

Though I'm not sure when the default kicks in, and I wouldn't rely on it.

What may be most interesting is that if you use lowpass without the width switch, you get a width of zero, not the supposed 15% default. See the illustration below - the no width example.

How --lowpass-width works (?!)

Lowpass width seems to create a band below the frequency set with the --lowpass switch, over which the sound is phased in.

It seems to provide a way to have a less dramatic cutoff than the typical lowpass would have.

So, for example, instead of having a sharp cutoff at 16khz, you might use a lowpass of 17khz and a lowpass-width of 6% to have something of a phased transition from 16 to 17.
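A minimal sketch of the arithmetic behind that example, assuming the guess above is right that --lowpass-width is read as a percentage of the --lowpass frequency and that the band sits just below it. The function name and units are illustrative only; none of this is confirmed LAME behaviour:

```python
def guessed_transition_band(lowpass_khz, width_percent):
    """Return (start_khz, end_khz) of the fade-out band under the
    percentage hypothesis from this post (an assumption, not LAME's
    documented behaviour)."""
    width_khz = lowpass_khz * width_percent / 100.0
    return lowpass_khz - width_khz, lowpass_khz


start, end = guessed_transition_band(17, 6)
print(f"{start:.2f} kHz to {end:.2f} kHz")  # 15.98 kHz to 17.00 kHz
```

Under the same guess, a width of 20 on a 19 kHz lowpass would reach down to about 15.2 kHz, which is the figure to compare against a spectrogram.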

See the illustrations below.

How it's best used in practice is an open question.

Illustrations

Here's an illustration of an original wav file and several decoded wav files from various encodes. The encoded files were -V 0 --vbr-new --lowpass 19 encodes (the high lowpass of that preset gives a little more room to play around with the spectral graphs). The various graphs show the same four seconds of music for:

- the original wav file
- a -V 0 --lowpass 19 encode with no --lowpass-width - it's a fairly sharp cutoff
- the same with --lowpass-width of 1 - not much effect, though maybe the surprise is that there's clearly some effect even at such a low number
- --lowpass-width of 10
- --lowpass-width of 20

A couple things to note -

- the peaks in the lower graph do correspond to the peaks in the original, which seems intuitively a good thing
- as you might guess, the corresponding file sizes get smaller as you move down the graph.

Again, the most surprising thing perhaps is that it seems that when the lowpass switch is used without the width switch, there is no width, not the supposed default of 15%.

The other curious thing that seems to be happening is that it's affecting the frequencies well below where it should - compare the bottom 2-3 graphs in the lower part of what's shown. For example, if the width is a %, even at 20% x 19 = 3.8 khz - then you'd think it wouldn't be affecting the frequencies below 15k at the left side - but it seems to be.

My first reaction is - Why not just document this a little bit and save everybody a lot of time? My second reaction is - given that it's not clear enough what lowpass-width is doing, it may be something that's not very usable in practice.

I'll leave more detailed interpretation for later posts or for others to add.

Reply #28 – 2007-05-20 17:41:21

Quote from: Porcupine on 2007-05-20 09:29:47

And one last warning, as you've guessed it's probably a good idea not to assume that LAME is doing what it says it is doing. Even if it prints out the lowpass freq and width for you, if you can check it with a separate spectrograph that is better. I think LAME may be accurate in this case, but there's been a lot of times where I discovered LAME doesn't do what it says it is doing (switches not working, behind-the-scenes algorithms forcefully activated/deactivated, etc, even when the --verbose output says they are on/off or doesn't report them). This is all chaotic and depends on LAME version, etc. Bottom line, test yourself if you can.

Very true, and I'm quoting it again to make sure word gets around.

I don't view that as a knock on the developers. It's more a matter that LAME seems so finished and elegant from the outside, so it's easy to assume all the many switches work, in a useful way. But people need to keep in mind that it (like all software) is written to do what it's tested to do, not to be perfect in every possible combination of inputs.

As far as documentation - I was definitely getting that feeling when I discovered that the only place the Y switch seems to be documented is in the LAME-generated longhelp.

Rio - based on the fact that even with --lowpass 16, file sizes get smaller and smaller as you move from V0 on down - LAME does seem to be taking out bits below 16khz. While it might be transparent down to some level, what I'm looking for is something that I could safely use for home audio too.

So I'm guessing I'll end up with something like V1 or V2 with a lowpass at 16.5 or 17 and a width of 5 or so.

But I wanted to narrow down the options some before I started ABXing, to just the options I'm thinking about using, or even just the originals and the proposed settings; I think I've saved myself some time by doing that.

Reply #29 – 2007-05-21 19:01:14

OK, well, one more thing on the lowpass-width ... here's a chart showing the relative file sizes. This only used one file, but I think it's indicative enough to post.

Notice that even with very high widths, the bitrate doesn't go down dramatically in this example. (The curve might be different at different lowpass numbers - this used 19 - but I'd still expect similar general behavior.)

So, is lowpass-width expecting a khz number, or a % of the lowpass?

Things that suggest a %:

- The effect on the bitrate seems small for what you'd expect at even 1 or 2 khz.
- Numbers that would be huge as a khz number, like 15 or 20, don't have that much effect on the bitrate.

Things that suggest a khz number:

- It seems to affect frequencies below the expected % range

Other observations would be welcome.

Anyway, the only thing I have concluded with any certainty: lowpass-width is too unpredictable and unknown to use, so I'm just going to try some straight lowpass encodes for what I need.

Reply #30 – 2007-05-21 19:50:51

It would not be surprising if changing the filtering above a certain frequency affected frequencies below that frequency. As you reduce the amplitude of the higher frequencies, they provide less masking of the lower frequencies, which are then more likely to be encoded.

Reply #31 – 2007-05-22 03:35:47

Quote from: buzzy on 2007-05-14 19:31:38

Thanks. All true, but based on what people have said, it seems like using v1 or v2 and a lowpass might give better perceived quality in less space.

I just did a simple test, encoding the same track with different settings:

-V2
-V2 --lowpass 16000

The first file yielded an average bitrate of 191, while the second file yielded 187, shaving off 4 kbps from the lowpass. Whether the saved bits would provide extra quality encoding to the sub-16khz frequencies, I have yet to know (I could assume it would, but is it audible at all?)

My next best bet is that you ABX two -V2 encoded files, the one at normal setting and the other with --lowpass 16000. If you can't hear the difference, then continue with the lowpass for your next encodings. If you can hear the diff, then REJOICE! Your ears aren't really that old at all!

EDIT: Ok, this is just 1 track...

Reply #32 – 2007-05-22 04:02:17

I just noticed the following...I'll use Rio's convenient encoding log since it's here:

Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz

Like I said earlier, the default width of the lowpass filter is generally around 500 Hz. What I never bothered to consider before is what percent is that of the lowpass frequency.

534 Hz / 16093 Hz = 3.3%, not 15%

So I guess maybe this solves the mystery. Yet another example of where the LAME documentation or descriptions are horribly out-of-date, but at least the encoding output seems to be telling the truth in this case.
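For anyone who wants to repeat that check on their own encoding log, here is a small sketch of the same arithmetic. The two band edges come from Rio's LAME 3.97 -V5 output; taking the midpoint of the band as "the lowpass frequency" is an assumption made here to match the 16093 Hz figure above:

```python
low_hz, high_hz = 15826, 16360  # transition band from the LAME 3.97 -V5 log

width_hz = high_hz - low_hz            # 534
centre_hz = (low_hz + high_hz) / 2     # 16093.0
width_percent = 100 * width_hz / centre_hz

print(f"width {width_hz} Hz = {width_percent:.1f}% of {centre_hz:.0f} Hz")
# width 534 Hz = 3.3% of 16093 Hz
```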

Looking at buzzy's spectrographs, it's hard to tell if they confirm the default width of 3% or not, but I'd say it's not inconsistent. buzzy thought it was 0% width, but it's hard to tell from looking at that. Also as pdq said, there can be secondary effects which complicate the analysis when trying to see the difference between 1% or 2%.

Regarding the shrinkage of overall filesize by a marginal amount, it doesn't surprise me. The higher "freqs" in general are encoded at a much lower resolution (very quantized) compared to the middle and low freqs. In separate tests, I crudely estimated that a given high freq only uses 20% (one fifth) of the amount of data a "middle" freq (such as 400 Hz to 1 kHz) would use. This was for encoding at the highest quality settings, where the highs are relatively unquantized (at lower quality settings, the highs get sacrificed much more, their quantization algorithm changes and their quantization goes up by 2x, 4x, or more...even if you don't apply a lowpass).

Therefore if you lowpass all freqs above 16 kHz, you might only save 6 kHz / 22 kHz / 5 (extra quantization) = 5% of all your bits. This amount isn't insignificant, but it's not as huge as some people might expect. This is one reason why I personally encode everything with -k (no filters). I am only using a little bit of bits for the highs anyway, so I encode them whether I think I can hear them or not. But that's my own encoding preferences.
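The back-of-envelope estimate above can be written out as follows. This is only a sketch of the reasoning in the post: the 22 kHz upper limit and the "one fifth" relative cost of high frequencies are the rough figures quoted there, not measured properties of LAME, and the function name is made up for illustration:

```python
def estimated_saving(cutoff_khz, nyquist_khz=22.0, relative_cost=1 / 5):
    """Fraction of bits saved by dropping everything above cutoff_khz,
    assuming the removed band costs `relative_cost` times as many bits
    per kHz as an average band (a crude assumption from the post)."""
    removed_fraction = (nyquist_khz - cutoff_khz) / nyquist_khz
    return removed_fraction * relative_cost


print(f"{estimated_saving(16):.1%}")  # 5.5%
```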

My calculation seems roughly consistent with buzzy's graph. As he increased his lowpass width by 20% of 16 kHz (about an extra 3 kHz) or so, he saved an extra 8% of bits (you start to save more and more the lower you go because the lower freqs are less quantized and take up more bits). If a lowpass width of 100% were accepted you'd be left with no sound and 0 kbps file, as long as the curve seems headed in that direction everything is fine. Hrmm, upon closer inspection that doesn't seem to be the case though, oh well! Curve should be curving the other way, maybe if you did more points it would start to curve that way.

Quote from: Rio on 2007-05-22 03:35:47

Whether the saved bits would provide extra quality encoding to the sub-16khz frequencies, I have yet to know

I dunno either. In theory a perfect VBR should output a 16 lowpassed file of the same level of sound quality as the non-lowpassed version, just smaller in size (and lowpassed). But VBR is not guaranteed to be truly perfect.

Either way though to me it's the same thing. If the file size is smaller and the quality is the same, isn't that good too? If you still want better quality then you can increase your V setting, therefore increasing the quality and maybe getting a file the same size as you used to (before applying lowpass).

Reply #33 – 2007-05-22 13:07:48

Quote from: Porcupine on 2007-05-22 04:02:17

Therefore if you lowpass all freqs above 16 kHz, you might only save 6 kHz / 22 kHz / 5 (extra quantization) = 5% of all your bits. This amount isn't insignificant, but it's not as huge as some people might expect. This is one reason why I personally encode everything with -k (no filters). I am only using a little bit of bits for the highs anyway, so I encode them whether I think I can hear them or not. But that's my own encoding preferences.

Have you heard about the sfb21 problems, bitrate bloat with highs, and whatnot? Just to give you a quick point: encoding in sfb21 (which starts at 16 kHz precisely) can make *all* bands use more bitrate, not just this band.

Reply #34 – 2007-05-22 23:14:07

I have heard about this issue but none of the descriptions I could ever find regarding it were intelligibly written. If you can point me to a reputable, well-written description of the issue I would be very appreciative.

> Just to give you a quick point: encoding in sfb21 (which starts at 16 kHz precisely) can make *all* bands use more bitrate, not just this band.

The way you've worded this, it doesn't mean anything to CBR. It could only affect VBR.

There's no way encoding more frequencies above 16 kHz could force a CBR file to suddenly use more bitrate. The bitrate is set with CBR. The only thing that the high frequencies could do is steal some bitrate from the frequencies below 16 kHz, which is of course what happens. According to my tests though (and buzzy's also, I think), the amount stolen is very small. High freqs use only about 1/5th of the bitrate an average middle frequency such as 1 kHz uses.

Regarding VBR, sure anything could happen. If what you say is true, then encoding frequencies above 16 kHz suddenly makes the VBR algorithm confused and the whole file (all frequency bands) bloats up in size and uses way more bitrate. Your file though should have improved sound quality because of this, too. So even then it's not a real problem, it's just the VBR intelligence of LAME being stupid (if what you say is true).

On the other hand, if you meant to say "wasting" bitrate, rather than "using" bitrate, that's a different story. A clear description of this "problem" or reputable reference would be appreciated. But even if LAME were suddenly to "waste" bits when told to encode high freqs, that might be a flaw in the LAME encoder, not a flaw in the mp3 codec itself.

Reply #35 – 2007-05-23 01:29:00

This is not a LAME problem but rather a design flaw of MP3 itself. My understanding (which is very limited) is that since there is not a separate gain term for frequencies above 16 kHz (as there is for all other frequency regions), the only way to adjust the gain when trying to reproduce high frequencies is to adjust the global gain, which changes the gain for ALL frequencies, even those for which this will just be wasting space. The degree of waste is very dependent on the source material so don't expect to try one or two tracks and draw conclusions from that.

The alternative is to not change the global gain, which means that the high frequencies will not get the bits that they need but the rest of the bands will get the proper amounts. This is what the -Y switch does.

Reply #36 – 2007-05-23 01:35:51

Quote from: Rio on 2007-05-22 03:35:47

I just did a simple test, encoding the same track with different settings:
-V2
-V2 --lowpass 16000
The first file yielded an average bitrate of 191, while the second file yielded 187, shaving off 4 kbps from the lowpass. Whether the saved bits would provide extra quality encoding to the sub-16khz frequencies, I have yet to know (I could assume it would, but is it audible at all?)

I'll come back and read the other comments more carefully - but before we get too far down this track, I think this is just the effects of a typo in how you entered it.

LAME is expecting a khz number, so if you literally used lowpass 16000, that's 16000 khz. (So, more pure LAME quirkiness that it even dropped 4 kbps.)

I did a quick test track and got

V 2 - 194 kbps
V 2 --lowpass 16 - 166 kbps

So a big difference, about 15%, which I would have expected given that the lowpass in the wiki is well over 16 kHz (18671 Hz - 19205 Hz for V 2).
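To put the two sets of numbers reported in this thread side by side, here is a short sketch of the percentage calculation. The bitrates are the ones posted for each person's single test track; the helper name is just for illustration:

```python
def saving_percent(before_kbps, after_kbps):
    """Relative bitrate reduction, in percent."""
    return 100 * (before_kbps - after_kbps) / before_kbps


print(f"{saving_percent(194, 166):.1f}%")  # buzzy's track: 14.4%
print(f"{saving_percent(191, 187):.1f}%")  # Rio's track: 2.1%
```

Both results come from one track each, so they say more about how much the saving depends on the material than about any typical figure.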

Reply #37 – 2007-05-23 02:24:45

What version of LAME are you using, buzzy?

The different versions may want to take in different inputs for the --lowpass switch. I currently use LAME 3.95 (will switch to 3.92 soon) and my LAME wants lowpass values in Hz....such as --lowpass 16000. But I think it depends on version so I never said anything before.

Quote from: pdq on 2007-05-23 01:29:00

This is not a LAME problem but rather a design flaw of MP3 itself. My understanding (which is very limited) is that since there is not a separate gain term for frequencies above 16 kHz, the only way to adjust the gain when trying to reproduce high frequencies is to adjust the global gain

Thanks for the description, pdq. I'd already heard this exact same thing before as well. Again, to me this description is not in-depth and reputable enough to prove to me that this problem exists. And I've seen no evidence of any problems like this when looking at the MDCT quantization levels in the mp3s I've encoded.

First off, I might be inclined to think it foolish to have a gain value for each scalefactor band, plus one global gain value. This is redundant by one gain value. I see no need for a global gain value if there were a gain value for each scalefactor band. And if there is a global gain value, I would indeed take away a gain value for one scalefactor band (doesn't matter which one) so that I don't store redundant information in my encoded files.

If you wish to scale the data in SFB21, simply adjust the global gain value then re-adjust the 20 (or however many there are) remaining gain values for each scalefactor band as necessary. What is so hard about that? If you wish to prove that there is an issue, you must at the very least provide detailed information on the bit-depth of the scalefactors (are they 8-bit integers? 16-bit integers? 24-bit integer/floating-point? etc). You might also need to provide detailed information on other things as well, depending on how your argument goes. So far the argument given does not prove anything.

Reply #38 – 2007-05-23 04:31:07

As I said, my understanding is very limited. About all that I can add is that the gain values are 8 bits and logarithmic, with a resolution of about 1.5 dB per step.

Reply #39 – 2007-05-23 06:59:46

I must say, I haven't really consulted the LAME --longhelp in applying the lowpass filter. I just used Porcupine's earlier setting. However, LAME 3.97 would still yield the same file, regardless of --lowpass 16 or 16000 (as confirmed by another simple encoding using such settings and as reported by EncSpot of lowpass filter 16000.) LAME --longhelp stated that the lowpass is in khz though (I couldn't blame the devs about the documentation, though.) "Hey, it's a free program! Why complain?"

It's just my track's idiosyncrasy that LAME was only able to shave off 4 kbps using the lowpass. At least you were able to save 15% space, a very significant savings.

At any rate, I think we are getting there to your encoding dilemma. I hope we were able to help you.

Cheers!

Reply #40 – 2007-05-23 10:35:21

Quote from: Porcupine on 2007-05-23 02:24:45

What version of LAME are you using, buzzy?

The different versions may want to take in different inputs for the --lowpass switch. I currently use LAME 3.95 (will switch to 3.92 soon) and my LAME wants lowpass values in Hz....such as --lowpass 16000. But I think it depends on version so I never said anything before.

I'm using 3.97. So apparently the inputs could be different with different versions! Thanks for the heads up.

In any case, there have to be significant bitrate savings in shaving 2-3 khz off the lowpass vs. the presets.

Quote from: Porcupine on 2007-05-22 04:02:17

I just noticed the following...I'll use Rio's convenient encoding log since it's here:

Using polyphase lowpass filter, transition band: 15826 Hz - 16360 Hz

Like I said earlier, the default width of the lowpass filter is generally around 500 Hz.

I'm not sure the presets use the defaults, though. And is it clear that the transition band is the same thing as lowpass width?

Quote

What I never bothered to consider before is what percent is that of the lowpass frequency.

534 Hz / 16093 Hz = 3.3%, not 15%

So I guess maybe this solves the mystery. Yet another example of where the LAME documentation or descriptions are horribly out-of-date, but at least the encoding output seems to be telling the truth in this case.

Looking at buzzy's spectrographs, it's hard to tell if they confirm the default width of 3% or not, but I'd say it's not inconsistent. buzzy thought it was 0% width, but it's hard to tell from looking at that. Also as pdq said, there can be secondary effects which complicate the analysis when trying to see the difference between 1% or 2%.

There's definitely less width in the no width example than the 1 example, for what that's worth. Look at the section at the far right, for example. Having looked at hundreds of these graphs over the years, I'd say that difference is meaningful. There's definitely a little less energy in the 1 than in the no width. So I don't know that the lowpass default uses 3%, either.

Just to be precise, though, no width meant no --lowpass-width setting was used. It isn't necessarily the case that there is 0 width.

Reply #41 – 2007-05-23 14:00:34

Quote from: Porcupine on 2007-05-23 02:24:45

First off, I might be inclined to think it foolish to have a gain value for each scalefactor band, plus one global gain value. This is redundant by one gain value. I see no need for a global gain value if there were a gain value for each scalefactor band. And if there is a global gain value, I would indeed take away a gain value for one scalefactor band (doesn't matter which one) so that I don't store redundant information in my encoded files.

From : http://wiki.hydrogenaudio.org/index.php?title=Scale_factor

Quote

Thus they [scalefactor] implicitly modify the bit-allocation over frequency since higher spectral values usually need more bits to be coded afterwards.

Whenever a scalefactor band is amplified, it will force the next quantization to use more bits for that band. This will result in more bits used to encode the MDCT coefficients in that band, and thus less quantization error. That is why bands with audible distortion are amplified. However, it will also result in less bits for the unamplified bands.

From http://www.mp3dev.org/ (MP3 -> MP3 Limitations)

Quote

To increase sfb21 resolution, the global gain value has to be reduced. To balance this, scalefactors of other scalefactor bands can be reduced. But once they reach a value of 0, they can not be reduced anymore, meaning that an higher than needed resolution will locally be used in those bands, leading to an inflate of the bitrate.

You can think that these links are not reputable (that's your option), but I think they say clearly enough that you have to either use more bits, or have a worse quality. In fact, *you* even say it.
Take your option.

For the sake of completeness :

LAME -V2 usually leads to ~200 kbps with usual material. Metal tracks are said to average 250 kbps. That's a whopping 25%.

Reply #42 – 2007-05-24 05:00:30

The size ranges for LAME VBRs, for a given V setting and other parameters, also depend super-humongously on the LAME version you've used. I'm not saying your ranges are wrong, just a heads-up and cautionary warning, in case people notice discrepancies.

Also, just wanted to point out that the difference in size between difficult-to-encode metal tracks and "typical material" is not necessarily of any direct relation to the difference in size between lowpassed and non-lowpassed VBRs.

Quote

To increase sfb21 resolution, the global gain value has to be reduced. To balance this, scalefactors of other scalefactor bands can be reduced. But once they reach a value of 0, they can not be reduced anymore...

That link is laughably unreputable. It's not because of the site it is from, I am judging merely by what was written. Just read that. It is not even logical.

The global gain has to be *reduced*...then balanced out by *reducing* the scalefactors of the other bands as well? One of those is supposed to be *increased*, I dunno which one because this entire argument makes no sense. And a scalefactor reaching a value of 0? That means that there is no sound, that is stupid.

From the hydrogenaudio wiki (which has been proven to contain numerous errors in the past, but oh well) link you gave:

Quote

In Mpeg layer3 the global gain defines the largest stepsize to use. The scalefactors are used to reduce the stepsizes for the special needs of the scalefactor bands.

This sounds great to me, no arguments here. So why is this a problem? This is exactly how things are supposed to be. The global gain, which in some sense corresponds to the SFB21 gain (which is "missing"), defines the largest stepsize to use. That is how it is supposed to be because SFB21 (the ultra-highs) should have the largest quantization of all the freqs. After that, the 20/21 remaining scalefactors are used to reduce the stepsizes for the other bands, it says. Yeah, that's how it should go, the lower freqs need a smaller scalefactor so that they can be encoded more carefully. Where is the problem with that? There's no flaw of the codec that I can see.

The second half of the hydrogenaudio wiki link you gave though, does not seem logically consistent with what was written in the top half I quoted. Also numerous things were written which are blatantly wrong, I think, but oh well. In any case, these arguments don't convince me, a more reputable source (or at the very least, one that is self-consistent and logically written) is needed.

EDIT: buzzy, hrm yeah I'm not sure about the 3%, 1%, vs no-width specified thing. I still think your graphs may be inconclusive in that subtle comparison (as pdq mentioned, secondary effects can complicate things). I'm not saying your conclusion is wrong. I just don't know...it's interesting, I hope you manage to get to the bottom of it.

Reply #43 – 2007-05-24 08:07:20

The outcome from reading something does not only depend on whether it's written well and correctly but also on our own background for understanding things (technical stuff can rarely be written in a self-contained style).

As for the mp3 restrictions on scalefactor handling and the potential resulting sfb21 bloat, I personally don't know the details. But I take it as a fact: I consider the odds of this hypothesis being wrong so low that I wouldn't make any effort to disprove it. There is no use for me in learning about scalefactor details, and no reason to disbelieve these commonly cited mp3 restrictions.

Reply #44 – 2007-05-24 10:24:02

OK, here is a simplified description of the sfb21 problem; for a more detailed explanation read the ISO docs or the LAME sources:

for long blocks:
a global gain, an 8 bit value: range 0-255
scalefactor bands 0-15, a 4 bit value: range 0-15
scalefactor bands 16-20, a 3 bit value: range 0-7
scalefactor band 21, 0 bits: range 0-0

oversimplified: the resulting step size for each scalefactor band:
stepsize[i] = global gain - 210 - 2 * scalefactor[i]
as there is no scalefactor for sfb21:
stepsize[21] = global gain - 210

Now, suppose your psymodel says you need a smaller step size for sfb21. Then, as you can see, every other scalefactor band will have to use a step size less than or equal to that of sfb21.
As long as sfb21 demands the largest step size, there is not much bitrate bloat; bloating starts when sfb21 demands a smaller step size than one of the other bands does.

Btw, for short blocks there is a sfb12 problem like the one above.

This is a design flaw of MPEG Layer3, not LAME specific. It seems to me, the Inventors of Layer3 had only samples with a high frequency boost to play with.
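The mechanism described in this post can be played with in a toy model. The sketch below takes the oversimplified long-block formula and the scalefactor ranges exactly as listed above; the function name, the way the "wanted" step sizes are supplied, and the rule for picking scalefactors are all assumptions made for illustration, not how LAME or the ISO reference encoder actually allocates bits:

```python
# Toy model of the oversimplified long-block formula from the post:
#   stepsize[i] = global_gain - 210 - 2 * scalefactor[i]
# Scalefactor ranges as listed there; sfb21 has no scalefactor.
SF_MAX = [15] * 16 + [7] * 5 + [0]  # bands 0-15, 16-20, 21


def allocate(wanted):
    """wanted[i] is the step size the psymodel asks for in band i (22 values).

    Returns (global_gain, scalefactors, actual_steps). The global gain is
    pinned by sfb21, because that band has no scalefactor of its own.
    """
    assert len(wanted) == 22
    global_gain = wanted[21] + 210
    base = global_gain - 210
    scalefactors, actual = [], []
    for want, sf_max in zip(wanted, SF_MAX):
        # Smallest scalefactor that brings the step down to `want` or
        # below, i.e. ceil((base - want) / 2), then clamped to the range
        # that can be stored for this band.
        sf = -((want - base) // 2)
        sf = max(0, min(sf_max, sf))
        scalefactors.append(sf)
        actual.append(base - 2 * sf)
    return global_gain, scalefactors, actual


# sfb21 tolerates the coarsest step: the other bands get what they asked for.
_, _, steps_ok = allocate([10] * 21 + [20])
# sfb21 demands a finer step than the rest: every band is dragged down to it.
_, _, steps_bloat = allocate([10] * 21 + [6])

print(steps_ok[:3], steps_ok[21])        # [10, 10, 10] 20
print(steps_bloat[:3], steps_bloat[21])  # [6, 6, 6] 6
```

In the second case the scalefactors are already at 0 and cannot go lower, so bands 0-20 end up with a finer step than they needed, which is where the extra bits go in this simplified picture.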

Reply #45 – 2007-05-24 22:18:32

Quote from: robert on 2007-05-24 10:24:02

bloating starts, when sfb21 demands a smaller stepsize than one of the other bands demands.

I see. Thank you VERY MUCH for such a clear and concise explanation of the issue. Now, the earlier quote from mp3dev.org makes sense too (before it made no sense to me because the definitions of the scalefactors weren't written). Again, thank you very much.

Quote

It seems to me, the Inventors of Layer3 had only samples with a high frequency boost to play with.

I see what you mean. But maybe the developers of Layer 3 used such quirky definitions for the scalefactors and step sizes to conserve header/sideinfo space in each frame. The scalefactor info appears to take 11 bytes to store, maybe 22 bytes for both channels (not sure); this is not insignificant compared to the total frame size of a "typical" 160 kbps mp3 (roughly 522 bytes at 44.1 kHz, I think).
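
As a rough check of these sizes, here is a small Python sketch. It assumes the simplified bit widths from the description quoted above (8 bits of global gain, 4 bits for sfb 0-15, 3 bits for sfb 16-20) and the usual MPEG-1 Layer III frame length of 144 * bitrate / sample rate bytes. The function names are illustrative only, and the real bitstream repeats this information per granule and channel, which the sketch ignores.

```python
# Bit widths taken from the simplified description in this thread.
GLOBAL_GAIN_BITS = 8
SCALEFACTOR_BITS = [4] * 16 + [3] * 5 + [0]  # sfb 0-15, sfb 16-20, sfb21


def gain_and_scalefactor_bits():
    """Bits needed for one global gain plus one set of long-block scalefactors."""
    return GLOBAL_GAIN_BITS + sum(SCALEFACTOR_BITS)


def frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Length of an MPEG-1 Layer III frame in bytes (1152 samples per frame)."""
    return 144 * bitrate_bps // sample_rate_hz + padding


bits = gain_and_scalefactor_bits()
print(bits, "bits, about", round(bits / 8), "bytes")
print(frame_bytes(160_000, 44_100), "bytes per frame at 160 kbps, 44.1 kHz")
```

Under these assumptions the gain and scalefactor info comes to 87 bits, which is where a figure of about 11 bytes would come from, against a frame of about 522 bytes.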

It looks to me like the quirky definitions and bit-depths of the various scalefactors were carefully crafted by the designers of Layer 3 to provide maximum flexibility with a minimal number of bytes needed to store the scalefactors. Besides the bloating phenomenon you just described, thanks to your definitions I see that other problems could arise from those weird scalefactor definitions as well. If the high freqs were super loud, the global gain could become too large, and scalefactor bands 0-20 might not have enough bit-depth to reach the required small stepsize (the opposite problem of the "bloating", in a sense).

---------------

To the others/everyone, now I see that the problem of bitrate bloating is real. But is this a reason to choose not to encode any freqs above 16 kHz (via lowpass filter)? I often see that given as a reason, but I still don't know if I can agree with that. I think you may just need to do the best you can, given the limitations of the codec.

Note to buzzy: I'm not criticizing your desire to lowpass at 16 kHz, your reasons are different.

Reply #46 – 2007-05-25 08:42:34

People who don't like the idea of giving away something from the original music may want to keep the highest frequencies to the utmost extent. These people are usually willing to pay the price and accept the need for a higher bitrate. Everything's fine. That's what you do, Porcupine.

People who find through listening tests that the musical content above 18, 17, or 16 kHz (whatever is appropriate individually) is of no real significance to them may prefer lowpassing, as it improves encoding efficiency (especially with mp3) and/or overall quality. With mp3, a rather low lowpass of 16 kHz or so is most welcome with respect to the sfb21 issue, provided it has no real impact on enjoying the music. That seems to be the case for most people, judging by the last 128 kbps test, where LAME 3.97 -V5 with its 16 kHz lowpass came out great.

Reply #47 – 2007-07-12 22:41:04

So, what I decided to do was to just use -V2 --lowpass 16, although whether it saves much space will depend on the kind of music you're encoding.
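
For anyone who wants to script this, a minimal Python sketch of building that command line could look like the following. The helper name `lame_command` and the file names are hypothetical; only the `-V2` and `--lowpass 16` switches come from this thread, and actually running it assumes a `lame` binary on the PATH.

```python
def lame_command(src, dst, quality=2, lowpass_khz=16):
    """Build the argument list for a VBR encode with an explicit lowpass (in kHz)."""
    return ["lame", f"-V{quality}", "--lowpass", str(lowpass_khz), src, dst]


cmd = lame_command("input.wav", "output.mp3")
print(" ".join(cmd))

# To actually encode (requires lame to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

As noted earlier in the thread, it is worth checking the encoder's screen output to see which lowpass it really applied.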

As a limited sample, I encoded several types of music with and without the lowpass switch, using both V1 and V2:

                  U2       JS      Iron     Norah
                           Bach   & Wine    Jones
Bitrates
     v1           218      212     220       190 
     v1-16        185      210     206       178 
     v2           188      189     190       163 
     v2-16        161      187     181       155 

Filesize comparison
     v1-16        85%      99%      94%      94%
     v2-16        85%      99%      95%      95%

The columns are the different CDs that were encoded:

U2 - Greatest Hits 1990-2000
JS Bach - Violin Concertos
Iron & Wine - Our Endless Numbered Days
Norah Jones - Not Too Late

The U2 album saw the largest reduction (15%), the classical the smallest (1%). The two vocal & instrumental albums saw modest reductions of about 5-6% in bit rate. And of course the U2 was the one I was using for my initial trials, so it made the potential savings look larger than they really would be for a range of music.
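
The percentages can be roughly reproduced from the average bitrates in the table, as in this small Python sketch. The dictionary layout and the `size_ratio` helper are just for illustration; results may differ by a point from the figures above, which were presumably taken from the actual file sizes rather than from the rounded bitrates.

```python
# Average bitrates (kbps) copied from the table above.
BITRATES = {
    "U2":          {"v1": 218, "v1-16": 185, "v2": 188, "v2-16": 161},
    "JS Bach":     {"v1": 212, "v1-16": 210, "v2": 189, "v2-16": 187},
    "Iron & Wine": {"v1": 220, "v1-16": 206, "v2": 190, "v2-16": 181},
    "Norah Jones": {"v1": 190, "v1-16": 178, "v2": 163, "v2-16": 155},
}


def size_ratio(album, preset):
    """Bitrate with the 16 kHz lowpass relative to the same preset without it."""
    rates = BITRATES[album]
    return rates[preset + "-16"] / rates[preset]


for album in BITRATES:
    v1 = round(100 * size_ratio(album, "v1"))
    v2 = round(100 * size_ratio(album, "v2"))
    print(f"{album:12s} v1-16: {v1}%  v2-16: {v2}%")
```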

Probably not worth fooling around with for most people, and it all just reminded me that getting into the innards of LAME is usually a waste of time, as well as turning up a lot of stuff you'd rather not know - about how only the core functions work well, and almost no functions are well documented.

I can see myself encoding it all to flac at some point, too, so I decided to go with V2 rather than V1.