Topic: Lame 3.99.5z, a functional extension

Reply #25
It simply forces the higher minimum frame sizes and raises the lowpass filter, forcing the existing algorithms to "work harder" [..]
[..] which includes restricting extremely high frequencies to 17.5 kHz.

This means it doesn't raise the lowpass but lowers it, compared to standard LAME.

So, while we're asking questions about how it works: does raising the lowpass a bit again (e.g. --lowpass 18600) hurt the quality, or does it only raise the bitrate?
(Sorry, I know you spend a lot of effort tuning it to what you believe is optimal, but curious minds like to know.)
In theory, there is no difference between theory and practice. In practice there is.

Reply #26
...So while we're asking questions about the working: does raising the lowpass a bit again (like --lowpass 18600), does that hurt the quality or does it only raise the bit rate.

Depending on the situation when encoding a given frame, a higher lowpass frequency can raise the bitrate a little or reduce the quality a little, but most of the time the difference will be next to nothing. Nothing changes fundamentally.
I personally can't imagine that music lowpassed at 17.5 kHz is an audible issue for anybody, but if you feel it's too low, don't worry: just use a higher lowpass.
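For a rough feel of what the lowpass removes, the sketch below counts the long-block MDCT lines above a given cutoff at 44.1 kHz. It assumes a hard cutoff and 576 equally spaced lines from 0 Hz to 22.05 kHz; LAME's real lowpass has a transition band, so the numbers are only indicative.

```python
LONG_BLOCK_LINES = 576  # MDCT lines per long-block granule in MPEG-1 Layer III

def lines_above_cutoff(cutoff_hz: int, sample_rate: int = 44100,
                       n_lines: int = LONG_BLOCK_LINES) -> int:
    """Count MDCT lines whose lower edge is at or above cutoff_hz (hard cutoff assumed)."""
    # Line k covers [k * w, (k + 1) * w) with w = sample_rate / (2 * n_lines),
    # so k * w >= cutoff_hz  <=>  k * sample_rate >= cutoff_hz * 2 * n_lines.
    first_line = -(-(cutoff_hz * 2 * n_lines) // sample_rate)  # integer ceiling
    return max(0, n_lines - first_line)

for cutoff in (17_500, 18_600):
    n = lines_above_cutoff(cutoff)
    print(f"{cutoff} Hz: {n} of {LONG_BLOCK_LINES} lines above the cutoff")
```

Under these assumptions 17.5 kHz leaves 118 of 576 lines above the cutoff and 18.6 kHz leaves 90, so a higher lowpass hands roughly 28 more lines per granule to the quantizer, which is one reason it can cost a few more bits.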
lame3995o -Q1.7 --lowpass 17

Reply #27
Halb, you mentioned that LAME 3.100 is in development right now - is Robert making that version available somewhere for testers?  Based on the changes you've mentioned in that version, I wouldn't mind using it instead of 3.99.5 for ripping my old CDs, even if it is still being tested.


Regarding the "CBR idea" - it sounds like Robert is indeed developing that very idea.  Let me try explaining it a different way to make sure.  Essentially, I'm looking for an encoder that always provides the maximum possible accuracy, given the 320kbps restriction.  Or, to put it a different way, a VBR whose accuracy requirement is continuously adjusted for each frame independently, until an accuracy requirement is found for each frame which fills every frame with 320kbps of data (or achieves 100% accuracy, whichever comes first).  Theoretically, this would provide the best quality possible in a standard MP3.
To give a simple example, let's say that 5 long frames in a row would require 270, 280, 270, 290 and 300 kbps to achieve -V0 quality (say, 95% accuracy). Re-encoding those frames at 96% accuracy would require 290, 310, 290, 320 and 320 kbps. The last two frames would need to stay at 96% accuracy because they have hit the cap, but the first three could be pushed further, say to 97% or even 98%. Short frames, the use of the bit reservoir, and dynamic adjustment of the lowpass/highpass filters on a frame-by-frame basis would further complicate this.
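The per-frame idea in this example can be sketched as picking, for each frame independently, the highest accuracy level whose required bitrate still fits under 320 kbps. The 95% and 96% figures are taken from the example; the 97% and 98% figures are made up for illustration, and the sketch deliberately ignores short blocks, the bit reservoir and filter changes, i.e. exactly the complications mentioned.

```python
from typing import List

CAP_KBPS = 320  # maximum MP3 frame bitrate

def best_levels(required: List[List[int]], cap: int = CAP_KBPS) -> List[int]:
    """For each frame, return the index of the highest accuracy level whose
    required bitrate still fits under the cap (-1 if none fits).

    required[f][a] is the bitrate frame f would need at accuracy level a;
    levels are ordered from least to most accurate, and requirements are
    assumed to grow with accuracy."""
    chosen = []
    for per_level in required:
        best = -1
        for level, kbps in enumerate(per_level):
            if kbps > cap:
                break  # higher levels will only need more bits
            best = level
        chosen.append(best)
    return chosen

LEVELS = [95, 96, 97, 98]  # accuracy in percent
# 95% and 96% columns from the example above; 97% and 98% are invented.
required = [
    [270, 290, 305, 320],
    [280, 310, 315, 335],
    [270, 290, 305, 318],
    [290, 320, 340, 360],
    [300, 320, 345, 370],
]
print([LEVELS[i] for i in best_levels(required)])  # [98, 97, 98, 96, 96]
```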
Does that make sense?
I recognize that such an encoder would likely be very slow, but I would argue it'd be worth it.  I'd even propose a new name for this method - something like DBR (dynamic bit rate).

Reply #28
What you describe is pretty much what 3.99.5z does when you use -V0+. I don't know the details of what robert has in mind; I'm just looking forward to it. He gave me two test versions for specific tests, which is why I know a bit about 3.100 development, and its treatment of tonal issues is really very promising.

Reply #29
What you describe is pretty much what 3.99.5z does when you use -V0+.

Interesting. I have one last question then. Why not just set 310/450 (the highest possible settings) as the minimum long/short frame sizes and run in -V0+ mode? Wouldn't this force the encoder to essentially always use 320 kbps frames, instead of averaging 317 or so (which is what -V0+ is hitting for me right now)? What would the encoder do if these settings required exceeding the 320 kbps overall limit? And would this ensure a superior rip compared to CBR/ABR, since the variable-bitrate psychoacoustic model is being used?

Reply #30
Halb, you mentioned that LAME 3.100 is in development right now - is Robert making that version available somewhere for testers?  Based on the changes you've mentioned in that version, I wouldn't mind using it instead of 3.99.5 for ripping my old CDs, even if it is still being tested.

Ripping and encoding with a lossy encoder that is still under development is a bad idea, in my opinion. If you can't wait then rip to lossless and reencode with whichever version of the lossy encoder is available now, then re-reencode when the stable version is ready.

Reply #31
@BFG:
The closer you go to the limits, the more often accuracy demands cannot be fulfilled, both because of lacking data space and because keeping the bit reservoir large for the sake of short blocks has priority. In this respect a minimum audio data bitrate of 290 kbps for long blocks is already pretty high (and overkill in 99.99..% of situations). What exact values to choose for -V0+ is more a matter of taste than a science. If you prefer higher values, go ahead, but don't expect noticeably different behavior.
As a side note: don't think in terms of frame bitrate; audio data bitrate is what counts. For audio data there's no 320 kbps limit, but there's always a limit on available audio data space (which -Vn+ tries to maximize, but it can't work miracles, depending on the encoding situation).
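To make the frame bitrate vs. audio data bitrate distinction concrete, here is some arithmetic for an MPEG-1 Layer III stereo stream at 44.1 kHz: a frame is floor(144 * bitrate / sample rate) bytes plus optional padding, of which 4 bytes are header and 32 bytes side info (without CRC), and the 9-bit main_data_begin pointer lets a frame's audio data start up to 511 bytes earlier, in space left unused by previous frames. How 3.99.5z itself defines its audio data bitrate is not stated here, so mapping these numbers onto --adbr_long/--adbr_short values is an assumption.

```python
SAMPLES_PER_FRAME = 1152      # MPEG-1 Layer III
HEADER_BYTES = 4
SIDE_INFO_BYTES_STEREO = 32   # 17 for mono
CRC_BYTES = 2                 # only if the optional CRC is present
MAX_MAIN_DATA_BEGIN = 511     # 9-bit back-pointer into earlier frames

def frame_bytes(bitrate_bps: int, sample_rate: int, padding: int = 0) -> int:
    """Total length of one MPEG-1 Layer III frame in bytes."""
    return 144 * bitrate_bps // sample_rate + padding

def main_data_bytes(bitrate_bps: int, sample_rate: int, padding: int = 0,
                    crc: bool = False) -> int:
    """Bytes left for audio (main) data after header, CRC and side info."""
    overhead = HEADER_BYTES + SIDE_INFO_BYTES_STEREO + (CRC_BYTES if crc else 0)
    return frame_bytes(bitrate_bps, sample_rate, padding) - overhead

def bytes_to_kbps(n_bytes: int, sample_rate: int) -> float:
    """Bitrate equivalent of spending n_bytes of audio data within one frame's duration."""
    return n_bytes * 8 * sample_rate / SAMPLES_PER_FRAME / 1000

own = main_data_bytes(320_000, 44_100)                             # 1008 bytes
print(round(bytes_to_kbps(own, 44_100), 2))                         # 308.7
print(round(bytes_to_kbps(own + MAX_MAIN_DATA_BEGIN, 44_100), 2))   # 465.19
```

So a 320 kbps frame carries about 308.7 kbps worth of its own audio data, while a single frame that can draw on a full reservoir can momentarily use about 465 kbps, which is why the audio data bitrate, not the frame bitrate, is the quantity that matters.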

Reply #32
Thanks again!  I'll probably give the 310/450 limits a try, just to see what happens...but as you mentioned I doubt it'll change much, since even at 290/440 (i.e. -V0+ defaults) the encoder was probably hitting the limits most of the time anyway.
And I did have "frame bitrate" and "data bitrate" confused...I appreciate the explanation.

Reply #33
Well, I'm testing some rather extreme settings tonight to see what I get.  The goal is to cram as much meaningful audio data as possible into the 320kbps audio stream, favoring short frames over long frames and lowrange/midrange frequencies over highrange, but not arbitrarily excluding any portion of the spectrum or filling any portion of the MP3 with filler data like would happen with a CBR320.  I'm not sure if I'm achieving that.

Settings used are: -v -q0 -V0+ --lowpass -1 --highpass -1 --adbr_short 450 --adbr_long 310 --replaygain-accurate
As you can imagine, encoding is SLOOOOOWWWWW...on my machine, about 2x speed versus 4x for standard -V0+ and 18x for standard -V0.

So, here are my questions:
(1) Would you agree that these settings allow for the most transparent MP3 encoding possible with the current LAME psychoacoustic model, without wasting any space?  Or, am I making a mistake by (for example) disabling the lowpass/highpass filters?
(2) In your opinion, is it worth reencoding one's entire library with these settings, or with -V0+, when it's already encoded at standard -V0?
(3) In your opinion, are these settings superior in any meaningful way to standard -V0+?

I'll be interested in the responses.  I suspect, in the end, that the only way to answer these will be for me to do some ABX tests.

Reply #34
...I suspect, in the end, that the only way to answer these will be for me to do some ABX tests.

Absolutely, I very much welcome this. As you can see from my signature, I personally prefer another solution.

Reply #35
Absolutely. I very welcome this. As you can see from my signature, I personally prefer another solution.

Well, so far I've been unable to tell the difference between this approach and -V0, let alone between this and -V0+.  But then again, I've only got laptop speakers and a modest stereo to try it on right now.  Anyone else willing to ABX the settings I published two posts above?
(I need to find some more complex songs to throw at LAME, or a hifi system, apparently.)

Reply #36
For the worst tonal and pre-echo problems I know, try lead-voice and eig, provided in the zip file. A laptop is usually fine as long as it's not too noisy, but use headphones or earphones and turn up the volume if the computer isn't totally quiet. You'll definitely hear a difference between -V0+(eco) and -V0.

Reply #37
Thanks for all the info on this, halb27 (not to mention providing the z variant itself).  After some further testing, I have settled on:
-v -q0 -V0+ --lowpass -1 --adbr_short 450 --adbr_long 110 --replaygain-accurate
This appears to use the standard -V0 approach for long frames (which I find satisfactory) but requires maximum possible accuracy on short frames. It's also quite a bit faster on my machine (about 9 to 10x) and only raises the average bitrate about 8 kbps above standard -V0. It also seems to handle the samples you mentioned quite well (though I would appreciate a second opinion!)

One last question: how does -V0+ handle the minimum requirements for switch (i.e. long-->short or short-->long) frames?

Reply #38
Granules of blocktype 'start' and 'stop' are essentially treated like long blocks, but I require them to use some extra bits.

I did a short test (I am about to leave home) with your setting for lead-voice and trumpet_myPrince, and yes, it sounded fine. I didn't expect that from --adbr_long 110.
I will investigate this further.

Reply #39
I did a short test (I am about to leave home) with your setting for lead-voice and trumpet_myPrince, and yes, it sounded fine. I didn't expect that from --adbr_long 110.
I will investigate this further.

I'm glad to know I'm not the only one surprised by this!
In some ways, however, it makes sense; -V0 (in my opinion) does fine with long blocks; it's only the short blocks that cause trouble.  And, reducing the minimum frame size for long blocks leaves more room in the bit reservoir for high-quality short blocks.
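The trade-off described here (a lower long-block minimum leaves more in the bit reservoir for short blocks) can be illustrated with a toy model: every 320 kbps frame contributes 1008 bytes of audio data space, unused bytes go into a reservoir capped at 511 bytes, and each frame takes the larger of its own demand and its block-type minimum. This is a hypothetical simplification, not LAME's actual reservoir logic; the demand figures and function names are invented for the example, and start/stop blocks are simply treated as long blocks.

```python
FRAME_MAIN_BYTES = 1008   # audio data space of a 320 kbps, 44.1 kHz stereo frame
RESERVOIR_MAX = 511       # largest possible main_data_begin back-pointer
SAMPLES_PER_FRAME = 1152

def kbps_to_frame_bytes(kbps: float, sample_rate: int = 44100) -> float:
    """Bytes of audio data per frame that correspond to an audio data bitrate."""
    return kbps * 1000 * SAMPLES_PER_FRAME / (8 * sample_rate)

def simulate(frames, long_min_kbps: float, short_min_kbps: float):
    """frames: list of (block_type, demand_bytes). Returns bytes granted per frame."""
    reservoir = 0.0
    granted = []
    for block_type, demand in frames:
        floor_kbps = short_min_kbps if block_type == "short" else long_min_kbps
        want = max(demand, kbps_to_frame_bytes(floor_kbps))
        available = FRAME_MAIN_BYTES + reservoir
        spend = min(want, available)
        # Whatever this frame leaves unused is saved for later frames, up to the cap.
        reservoir = min(RESERVOIR_MAX, available - spend)
        granted.append(spend)
    return granted

# Four moderate long blocks followed by one demanding short block.
frames = [("long", 600)] * 4 + [("short", 1500)]
print(simulate(frames, long_min_kbps=110, short_min_kbps=450)[-1])
print(simulate(frames, long_min_kbps=310, short_min_kbps=450)[-1])
```

With a 110 kbps long-block minimum the short block gets its full 1500 bytes; with 310 kbps each long block already needs slightly more than one frame can hold (about 1012 bytes), so the reservoir never fills and the short block is limited to 1008 bytes.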

As a future feature request, I'd ask that it be possible to specify the minimum size for start/stop (switch) blocks as well.

Reply #40
A preliminary remark: you can drop the '-v -q0' without changing the output.

Though I knew lead-voice has a high percentage of short blocks, I did not expect quality to improve this much just by taking special care of them when using -V0.

However, that's not all there is. Take herding_calls, for instance, and listen to seconds 1 to 4.
There are only long blocks, and -V0 (as well as your setting) is not transparent. It takes some time to hear the issue, however, and I need to find the right volume (not too quiet, not too loud). So it's best to listen to the problem at very low quality levels first to learn what the issue is. -V0+eco isn't transparent either, but -V0+eco --adbr_long 240 is (to me).
I agree, though, that both -V0 and your setting lead to results I'd call close to transparent. In the end it's a matter of taste which exact parameters to use.
At any rate I agree with you that the essential thing to improve upon -V0 is to take special care of short blocks.

Reply #41
I was asked to provide more parameters. So I have been thinking about a new version for which there are more questions.
The extension has always been about avoiding inaccurately encoded frames, providing close to the maximum possible audio data space, and enforcing minimum target bitrates for long and short blocks. The priority of these things has changed since I started, however, and right now we see that at least for -V0+(eco) the minimum target requirements for long blocks have lost significance.

Default lowpass
The default lowpass of 17.5 kHz originates from the time when avoiding inaccurately encoded frames had top priority. Questions about the lowpass have shown me that the default lowpass is not liked. And rightly so: lowpassing is not a goal of the functional extension, so I'd like to omit it and fall back to Lame's original behavior in this respect. Any objections?

Additional parameters
The problem with additional parameters is that they require a certain understanding of the mechanism used. I had a hard time already deciding upon --adbr_long and --adbr_short though these don't require too deep an understanding. The problem is that harder stuff confuses people who don't want to dig that deep. As a compromise I can provide these options, but I don't describe them here on HA. Instead I provide an extra documentation file for these with the functional extension itself. Any objections?

More priority to short block behavior
We've seen that the most important thing about the functional extension is short block behavior. Long block behavior isn't that important. So I'd like to reflect that:
a) For -V3+ to -V1+ as well as for -V0+eco I'd like to use a minimum long block bitrate default of --adbr_long 160 which is the default of current -V3+.
b) For -V0+ and -V0+eco I'd like to strengthen the criteria for providing close to the maximum possible audio data space. They are quite strong already, but a bit more can be done. I'll try to find a compromise that doesn't have a serious impact on encoding speed, especially not for -V0+eco.
Any objections?

Streamlining -Vn+ parameter
There's not much use for -V1+ as bitrate is close to that of -V0+eco. Same goes for -V3+ the bitrate of which is close to that of -V2+.
So I'd like to drop -V3+ and -V1+, or, more generally speaking, the concept of providing a continuous quality scale of -V3+ to -V0+ which covers the bitrate range of ~200...320 kbps.
Instead, the '+' in '-Vn+' should primarily stand for increased accuracy of short blocks with -Vn+ compared to -Vn. As a consequence, 3.99.5z's -V0+eco should be named -V0+. 3.99.5z's -V0+ additionally takes care of extreme quality for long blocks and should therefore be named -V0++. It's kind of falling back to a prior naming scheme, but consistent IMO. Any objections?

Reply #42
Hi, halb27.

-V3+ ends up with the same average bitrate as normal -V1.5 on a large number of samples.
I tried -V3+ on the eig sample, and I think it would be interesting to do more detailed tests at some point, as there was a substantial improvement.

ABC/HR Version 1.1 beta 2, 18 June 2004
Testname:

1L = D:\Audio\Test samples\eig\LAME 3.99.5z V3.wav
2R = D:\Audio\Test samples\eig\LAME 3.99.5 V1.5.wav

---------------------------------------
General Comments:

---------------------------------------
1L File: D:\Audio\Test samples\eig\LAME 3.99.5z V3.wav
1L Rating: 4.0
1L Comment:
---------------------------------------
2R File: D:\Audio\Test samples\eig\LAME 3.99.5 V1.5.wav
2R Rating: 3.5
2R Comment:
---------------------------------------
ABX Results:
D:\Audio\Test samples\eig\LAME 3.99.5z V3.wav vs D:\Audio\Test samples\eig\LAME 3.99.5 V1.5.wav
    5 out of 5, pval = 0.031
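The pval in the ABX line matches the one-sided binomial probability of getting at least 5 out of 5 right by pure guessing, (1/2)^5 = 1/32 ≈ 0.031. A minimal sketch of that calculation follows; I'm assuming this is how ABC/HR computes it, the match is only numerical.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials` when guessing (p = 0.5)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

print(f"{abx_p_value(5, 5):.3f}")  # 0.031
```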


Any chance of bitrates in the range of -V5 ... -V2 in the future?

Reply #43
Any chance for bitrates in range of -V5 ... -V2 in a future?


Hi halb27

I hope to get time for more testing maybe next month, and in the 3.99.5y variant that accidentally included -V5+ I did find it useful for ABXing, so if implementation isn't difficult I'd agree with IgorC's request (at least for testing, even if it requires an over-ride switch to allow access to that mode). If not, I might just keep the un-fixed y version for -V5+ tests.

I understand your rationale for limiting the -Vn range to the high-quality end, though I can see uses for lower quality -Vn+ for fixing individual problem tracks. I also understand IgorC's idea of comparing settings that have similar typical bitrates to see which is better.

I'm OK with having to force a lowpass if I want the old behaviour (though I guess the reduced lowpass you use currently could be part of 'eco' mode for any n in a more generic -Vn+eco and we could even add -Y to the commandline if desired when n <3)

As an aside for testing and diagnosis of problem samples, I presume that the old mp3x frame analyzer (see official lame site regarding tuning) that was once in lame (around 3.90.2 era) isn't simple to bundle around 3.99.5z as I haven't seen a compile of mp3x for years. I think the old one would at least show the output, just not in comparison to the input WAV, which I seem to recall being a feature (though it's a long time and memory is fallible)
Dynamic – the artist formerly known as DickD

Reply #44
OK, I can provide -Vn+ for -V5+ to -V0+. With this I think it's best to stay with 3.99.5z's naming scheme -Vn+/-V0+eco.
So far I've seen no serious objections to dropping 3.99.5z's specific lowpass, so unless one comes up I'd like to drop it. We can get any individually preferred behavior by using switches like --lowpass, -Y, --adbr-xxxx.

@Dynamic: I don't know mp3x and don't want to work with it. Did you try 3.99.5z's --frameAnalyzer option? What are you missing?

Reply #45
I haven't tried --frameAnalysis but I was thinking more about digging into problem samples to try to detect certain problems like highly tonal signals during short blocks and coming up with a detection algorithm that could be rolled back into general LAME VBR codebase, which means that the original PCM, the MP3 decoder output and the FFT analysis spectrum would be useful pointers - and just the sort of things mp3x provided, IIRC.

I had the impression that mp3x was once very closely linked to the LAME encoder, up until around the 3.90 era (especially before Dibrom's --alt-preset standard work, when early investigations were tuning it up from poor performance against the then best-in-class encoders like FhG). But I suspect getting mp3x running now takes more than a compile-time switch in the lame codebase, and would probably need an unjustifiable amount of work. If I find time to look into this, I may set up a compiler, possibly under Linux, until I can build a current standard LAME or the 3.99.5z version myself. Then I can break out at short blocks, dump the info I'm interested in to plot in LibreOffice Calc (or MS Excel), try tweaking the code to trigger extra-high bitrate only for problem samples, and perhaps generate some samples artificially to ABX the correct threshold in the absence of other variables. I might even search for a very old mp3x version and see if it provides enough info to germinate some ideas before I put in the effort of getting set up to compile the current LAME code regularly.

I'm hopeful that this is algorithmically determinable, which could crack a few of the problem samples like trumpet and Angels Fall First through specific analysis used only when short blocks are active, without increasing LAME VBR bitrate on music that is currently transparent. It clearly won't help herding calls, which only uses long blocks. Presumably that either needs a different short-block detection threshold (unlikely from the sound of it) or needs more bits in the long blocks for some frequency-resolution reason that the psymodel is missing.

This work might be some time away, as I have too much going on in life for now, but I'm quite keen to give it a go in some downtime.
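As a starting point for the kind of detection described above, a generic tonality measure such as spectral flatness (geometric mean divided by arithmetic mean of the power spectrum) separates tone-like blocks from noise-like ones. This is a swapped-in textbook technique, not LAME's own tonality estimate; the 256-sample block length, the 0.1 threshold and the function names are arbitrary, hypothetical choices for illustration. It needs NumPy.

```python
import numpy as np

def spectral_flatness(block: np.ndarray) -> float:
    """Geometric mean / arithmetic mean of the power spectrum (near 0 = tonal, ~0.5 = noise)."""
    windowed = block * np.hanning(len(block))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + 1e-12  # epsilon avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def flag_tonal_blocks(signal: np.ndarray, block_len: int = 256,
                      threshold: float = 0.1) -> list:
    """Return indices of consecutive block_len-sample blocks whose flatness is below threshold."""
    flagged = []
    for i in range(len(signal) // block_len):
        block = signal[i * block_len:(i + 1) * block_len]
        if spectral_flatness(block) < threshold:
            flagged.append(i)
    return flagged

rate = 44100
rng = np.random.default_rng(0)
noise = rng.standard_normal(256)
tone = np.sin(2 * np.pi * 1000 * np.arange(256) / rate)
print(flag_tonal_blocks(np.concatenate([noise, tone])))
```

Only the second (pure tone) block should be flagged. A real detector would run on the encoder's own analysis at short-block time resolution and would need its threshold tuned by ABX, as suggested above.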

Reply #46
You're welcome to do this work, but your ideas aren't something I'll pursue myself. Anything with a real impact on psychoacoustics is beyond my scope. My thing is introducing a certain amount of brute-force safety into -Vn while deliberately (though also necessarily, given my restricted knowledge) staying pretty much on the surface of mp3 encoding. For this, --frameAnalysis is quite appropriate for looking at the encoder's behavior IMO.

Reply #47
You should give current LAME 3.100 alpha 2--a cvs snapshot--a try. mp3x still does work, though you'll need to compile it yourself, or ask john33 to do it for you.

http://lame.cvs.sourceforge.net/viewvc/lame/lame/?view=tar

http://lame.cvs.sourceforge.net/viewvc/lam...l?revision=HEAD

Reply #48
I was asked to provide more parameters. So I have been thinking about a new version for which there are more questions.

All of your suggestions for the next version make good sense to me, with one exception.
I would argue that users of the highest possible mode (-V0+ or -V0++, whatever you decide upon), are not concerned about encoding speed.  So, I would recommend that that setting be tuned to maximum possible quality, with no concern whatsoever for the encoding speed.  -V0+eco, meanwhile, could attempt to deliver nearly the same quality but at a much faster encoding speed and slightly smaller file size.

On a different topic: do you find that -V0+ --adbr_long 160 creates a transparent "herding calls"? I have tried it, and it does sound transparent to me, but I'd like a second opinion. So right now I'm using -V0+ -q0 --adbr_long 160 --adbr_short 450.

Thirdly, I did request an --adbr_switch option earlier. I would like to set the minimum size of these frames just as you allow us to do with _long and _short. But it may be best to let the minimum size of switch frames be calculated from the long and short minima?
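Deriving the switch-block minimum from the long and short minima could, for instance, be a simple linear interpolation. The sketch below is purely hypothetical: --adbr_switch is only a requested option, and neither the weighting nor the function name comes from LAME or 3.99.5z. A small weight would mirror the described behavior of treating start/stop blocks like long blocks with some extra bits.

```python
from typing import Optional

def switch_block_min_kbps(adbr_long: int, adbr_short: int,
                          adbr_switch: Optional[int] = None,
                          weight: float = 0.5) -> float:
    """Hypothetical minimum audio data bitrate for start/stop blocks.

    An explicit adbr_switch value wins; otherwise interpolate between the
    long- and short-block minima (weight 0 = long minimum, 1 = short minimum)."""
    if adbr_switch is not None:
        return float(adbr_switch)
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return adbr_long + weight * (adbr_short - adbr_long)

print(switch_block_min_kbps(160, 450))               # 305.0
print(switch_block_min_kbps(160, 450, weight=0.2))   # 218.0
```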


Finally, Robert--I'm curious if you plan to make any further adjustments to 3.100 based on the discussion in this thread?  I see from the prerelease notes that you've made adjustments to handle "lead voice", which is great.  I'd also LOVE to see:
  • further adjustments to the psychoacoustic model to better handle "herding calls" and a few other tough samples in VBR
  • a higher accuracy requirement for short and switch frames, at least in the higher VBRs
  • a -Vmax (or whatever you'd like to call it) switch that packs the most accuracy possible into a 320kbps constant audio rate, using the VBR model.  That is, have LAME continually search for a "best fit" solution until it always uses up all available space for every frame, taking into account bit reservoir needs etc.  If I understand LAME correctly, this would be different from CBR 320 in that CBR can often "waste" space that was unneeded.  In fact, this might be a good way to replace the CBR model that some people have complained about.
  • a freeformat VBR (OK, you're right, silly idea)

Reply #49
Yes, I will take care of best quality for -V0+, and encoding speed is of very minor concern here.
Also, I will provide all the audio data bitrate parameters, including adjustments for the start and stop block types.
As herding_calls encoded with 3.99.5z -V0+eco isn't transparent to me, -V0+ --adbr_long 160 won't be either.
But you should keep things in relation. Were you able to ABX herding_calls? Sounds like: no. And to me too it's close to transparent, also with standard Lame's -V0. Though quality improvement is always welcome we can't expect transparency for any track and any listener. Being close to transparency in a rather universal way is the best we can expect (besides transparency for most tracks).
lead-voice is another story. Lame's VBR quality is inadequate here at the moment, and major improvement is very welcome. Every other tonal problem sample I know comes out very good to perfect when encoded with current Lame at -V0.