LAME 3.96 FINAL vs. 3.90.3 Test

Reply #50
In drone_short, the use of alt-preset extreme greatly improves this sample in both encoders. At first glance, it seems 3.90.3 sounds slightly better; I'll try to do some ABX testing shortly. An interesting thing to note is that 3.96 alt-preset extreme is still smaller than 3.90.3 alt-preset standard, at 204kbps and 214kbps, respectively.

Reply #51
Is this size thing, 3.96 -pe being smaller than 3.90.3 -aps, a common thing? If that is the case, perhaps we should be looking at 3.96 -pe as the new standard.

A fair comparison between the two versions of LAME does not require the same preset setting; it certainly is not being looked at that way for the 128 kbit tests.

If 3.96 -pe produces files that are consistently larger than 3.90.3 -aps, why not try 3.96 -V 1 to get the bit rates equal to 3.90.3 -aps?

I am a bit concerned that a judgment will be reached that 3.90.3 beats 3.96 by a nose, when the difference turns out to be due to a higher bit rate on average, and not just on a few difficult samples.

Reply #52
RESULTS

Tested settings:
Lame 3.90.2 --alt-preset 128 --scale 1
Lame 3.96 --preset 128 --scale 1
Lame 3.96 -V 5
Decoding: Foobar2000 dithering with strong ATH noise shape and DSP Advanced limiter.

With Atrain there are a lot of problems for all encodings: muffled cymbals, distorted trumpet, and... warbling background noise. 3.90.2 is better overall, while 3.96 -V5 seems to suffer more than the others from the warbling background.
3.90.2 : 3.1
3.96abr: 2.0
3.96_V5: 2.4

BachS1007 is very interesting because the HF issues I have noticed are evident. This sample is further proof that 3.90.2 doesn't have this problem. Moreover, again (as with Atrain), VBR is the worst with this artifact: there is even impulsive HF noise (clicks at the beginning). VBR doesn't perform well with low-volume samples like this.
3.90.2 : 4.2
3.96abr: 3.2
3.96_V5: 2.0

BeautySlept: although easy to ABX against the original, I can't differentiate between the different encodings. Equal rating for all.
3.90.2 : 3.0
3.96abr: 3.0
3.96_V5: 3.0

Flooressence: pre-echo is a bad problem here. There is a bad regression with 3.96abr. Obviously VBR handles pre-echo better, but 3.90.2 is good.
3.90.2 : 2.7
3.96abr: 1.2
3.96_V5: 2.5

ABC/HR and ABX results here: http://xoomer.virgilio.it/fofobella/396_2.zip
WavPack 4.3 -mfx5
LAME 3.97 -V5 --vbr-new --athaa-sensitivity 1
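
The per-sample ratings in this post can be averaged per encoder for a quick overview. The following is only a minimal sketch: the ratings are copied from the post, the sample name BachS1007 follows the spelling used later in the thread, and a plain mean over four samples from one listener says nothing about statistical significance.

```python
from statistics import mean

# Ratings copied from the post above, one value per sample and encoder.
ratings = {
    "3.90.2":  {"Atrain": 3.1, "BachS1007": 4.2, "BeautySlept": 3.0, "Flooressence": 2.7},
    "3.96abr": {"Atrain": 2.0, "BachS1007": 3.2, "BeautySlept": 3.0, "Flooressence": 1.2},
    "3.96_V5": {"Atrain": 2.4, "BachS1007": 2.0, "BeautySlept": 3.0, "Flooressence": 2.5},
}


def mean_ratings(table):
    """Return the unweighted mean rating for each encoder."""
    return {encoder: mean(scores.values()) for encoder, scores in table.items()}


means = mean_ratings(ratings)
for encoder, value in means.items():
    print(f"{encoder}: {value:.3f}")
```

For these four samples the means come out at 3.25 for 3.90.2, 2.35 for 3.96abr and 2.475 for 3.96_V5.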

Reply #53
Quick questions about ratings before I post results:  ABX is fairly easy most of the time with any mp3 vs. wav, but when you ABX mp3 vs mp3 how do you determine your numbers?  I don't mean the actual number like 2-3-4, I mean: 

What is most important to developers when assigning a number?

Do you give the lowest numbers to (pardon my non-tech terms since I don't know any) pre-echo, warbling, stereo image, tone distortion, etc.? For example: sample A with heavy concert percussion handles pre-echo poorly but the brass section sounds accurate, while B gets the attacks right but the brass sounds tinny. Which is most important to give consideration to? What bothers me may not be of interest to the developers, or it may be more important to know when I give a rating, so:

What issues should be marked down more than others when rating samples?

Reply #54
Further on file sizes:

These are the results for just one album, Solitude Standing (Suzanne Vega)
3.96 -ps      55.3 MB
3.96 -pe      70.3 MB
3.96 -V 1     62.7 MB
3.90.3 -aps   59.9 MB

What I suspect is that, given enough tracks, 3.96 -pe is going to be too big to be comparable to either LAME version at -ps or -aps. Output file sizes for 3.96 -V 1, however, are closer to 3.90.3 -aps than 3.90.3 -aps is to 3.96 -ps, suggesting that 3.96 -V 1 may be a fair comparison to 3.90.3 -aps.
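
The size argument can be checked with a few lines of arithmetic. This is just a sketch using the four album sizes listed in this post; the helper name `percent_diff` is made up for illustration, and one album is of course not enough to generalise from.

```python
# Album sizes in MB, copied from the list above (Solitude Standing).
sizes_mb = {
    "3.96 -ps": 55.3,
    "3.96 -pe": 70.3,
    "3.96 -V 1": 62.7,
    "3.90.3 -aps": 59.9,
}


def percent_diff(setting, reference="3.90.3 -aps"):
    """Size of `setting` relative to `reference`, as a signed percentage."""
    return (sizes_mb[setting] / sizes_mb[reference] - 1.0) * 100.0


for name in sizes_mb:
    if name != "3.90.3 -aps":
        print(f"{name:10s} {percent_diff(name):+.1f}% vs 3.90.3 -aps")
```

On this album, 3.96 -V 1 is about 4.7% larger than 3.90.3 -aps, 3.96 -ps is about 7.7% smaller, and 3.96 -pe is about 17.4% larger.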

Reply #55
Quote
Is this size thing, 3.96 -pe being smaller than 3.90.3 -aps, a common thing? If that is the case, perhaps we should be looking at 3.96 -pe as the new standard.

A fair comparison between the two versions of LAME does not require the same preset setting; it certainly is not being looked at that way for the 128 kbit tests.

If 3.96 -pe produces files that are consistently larger than 3.90.3 -aps, why not try 3.96 -V 1 to get the bit rates equal to 3.90.3 -aps?

I am a bit concerned that a judgment will be reached that 3.90.3 beats 3.96 by a nose, when the difference turns out to be due to a higher bit rate on average, and not just on a few difficult samples.

It seems like focus is on testing ~128kbps settings right now so it would be good timing to do some mass-encoding test to find out... IMO it should include 3.90.3 --alt-preset standard & extreme and 3.96 final -V 2, -V 1 and -V 0.

BTW: Among the encodings I've done for testing I've already found some tracks/CDs where 3.96 --preset standard has a higher bitrate than 3.90.3 --alt-preset standard (up to 15kbps difference), mainly classical and similar material (slow, acoustic instruments).

I've performed a test with 3.96 beta 1 for -V 2 (standard) ... -V 6. Results:


Since the minimum bitrate of -V 2 (standard) has been increased to 128kbps, the bitrates for 3.96 should be somewhat higher.
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello

Reply #56
Quote
What issues should be marked down more than others when rating samples?

Whichever you find more annoying. We want to hear your personal subjective opinion in this case. The test procedure is to be done blindly to eliminate bias or placebo. It's just like that blindfolded lady Justitia.

lol

Reply #57
Quote
What issues should be marked down more than others when rating samples?

Whichever sample sounds more annoying to you overall should be given a lower score. That's necessarily subjective; there's no way around it. As for what would help the developers most: detailed descriptions of what you find wrong with the samples.

ff123

Reply #58
Quote
3.96° --p 128 > 3.90.3 --ap 128 :: fall :: [proxima] :: 0x verified so far + missing ABX results*

---------------------------------------------------

3.96° --p 128 < 3.90.3 --ap 128 :: fall :: [proxima] :: 0x verified so far + missing ABX results*


???

Reply #59
Quote
Quote
3.96° --p 128 > 3.90.3 --ap 128 :: fall :: [proxima] :: 0x verified so far + missing ABX results*
---------------------------------------------------
3.96° --p 128 < 3.90.3 --ap 128 :: fall :: [proxima] :: 0x verified so far + missing ABX results*

???

Thanks. Fixed (there was another mistake with [proxima]'s ct_reference sample as well).

Reply #60
RESULTS

Drone_short, 0-1.2 sec

3.96pe vs. Original - 8/8
Sounds pretty good

3.90.3ape vs. Original - 8/8
Slightly more "rushing air" artifacts

3.96pe vs. 3.90.3ape - 8/8
3.96 wins.

Reply #61
Though I think 3.96 beat 3.90.3 overall in both ps and pe, 3.96 at both settings suffers from a slight blip artifact at about .4-.5 seconds into the song (I'll admit calling drone_short a song is a bit of a stretch). In ps, this blip is fairly pronounced and  appears in the center (in stereo terms). In pe, this blip is more muted and seems to exist in the far left channel. 3.90.3 and of course the original do not contain this extra  blip. It sort of sounds like a soft frog sound or fart or something.

EDIT: Here's something weird. At -V 1, this artifact is more muted than even pe. Although I think it may still exist, I cannot localize it (in stereo terms) like in ps and pe. If it exists it blends in much more.

Reply #62
RESULTS

Verifying fatboy with alt-preset standard.

3.90.3 vs. original - 8/8
Harshness of background static more pronounced; sounds like additional noise is introduced.

3.96 vs. original - unable to ABX

3.96 vs. 3.90.3 - 8/8

Reply #63
I've repeated the bitrate test with 3.96 final and -V 2 to -V 0. Results are in the attached image. (Note that there was a mistake in the last table I posted for the "Enjoy Baroque" sample (bitrate too low), which is corrected in the new table. The effect on the result is small enough not to worry about IMO.)

The difference for -V 2 (=preset standard) between 3.96b1 and 3.96 final is very small, in the 0.1% range.

My result suggests that 3.96 -V 2 comes closest to 3.90.3 --alt-preset standard bitrate-wise, but with another set of CDs (especially one focusing on styles where 3.90.3's bitrate tends to be bloated), -V 1 might come closer. I've tried to cover as broad a range of styles as possible, but maybe the focus is too much on music with acoustic instruments...

Reply #64
I'm not sure if this will be qualified as valid, because it's not aps against ps, but I will post it anyway.

Quote
Testname: MyFunnyValentine - 3.90.3aps vs 3.96 pfs

1L = C:\My Music\lab\MYF-4SEC\aps3903.mp3.wav
2R = C:\My Music\lab\MYF-4SEC\pfs396.mp3.wav

---------------------------------------
General Comments:

---------------------------------------
1L File: C:\My Music\lab\MYF-4SEC\aps3903.mp3.wav
1L Rating: 3.5
1L Comment: 1st blow is more distorted.
---------------------------------------
2R File: C:\My Music\lab\MYF-4SEC\pfs396.mp3.wav
2R Rating: 3.8
2R Comment: 1st blow is a little cleaner. A little harder to ABX.
---------------------------------------
ABX Results:
Original vs C:\My Music\lab\MYF-4SEC\aps3903.mp3.wav
    13 out of 14, pval < 0.001
Original vs C:\My Music\lab\MYF-4SEC\pfs396.mp3.wav
    13 out of 14, pval < 0.001
C:\My Music\lab\MYF-4SEC\aps3903.mp3.wav vs C:\My Music\lab\MYF-4SEC\pfs396.mp3.wav
    12 out of 16, pval = 0.038


Please note that I compared 3.90.3 APS vs 3.96 final preset FAST standard. The reason is that for this sample (MYF_4SEC.wav), 3.96 final produced much better results with the fast standard preset than with the standard preset. It turned out that it's either a close tie or 3.96 beat 3.90.3 slightly. I felt that 3.96pfs was harder to ABX than 3.90.3aps. The bitrate was 138kbps for both.

I am starting to wonder if 3.96PFS is better tuned for some samples than 3.96PS, and if we test 3.96PFS against 3.90.3APS where 3.96PS lost in the past, the results might be interesting/surprising...

PS I also tested 3.96b1 PFS vs 3.90.3APS. Although 3.90.3APS won because of 3.96b1's minimum bitrate being 96kbps, it was pretty close (3 for 3.96b1 vs 3.5 for 3.90.3 in ABC/HR).
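
For anyone wondering where the pval figures in the ABX logs come from: they are consistent with a one-sided binomial test against pure guessing (50% chance per trial). The sketch below assumes that is what the tool computes; the function name `abx_p_value` is made up for illustration, and `math.comb` needs Python 3.8 or newer.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` of `trials` right by guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


for correct, trials in [(13, 14), (12, 16), (8, 8)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

This gives roughly 0.0009 for 13 out of 14 and 0.0384 for 12 out of 16, in line with the log above, and about 0.0039 for the 8/8 runs reported elsewhere in this thread.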

Reply #65
Not very interesting, but...

fatboy_30sec.wav (uploaded the other week by me - see uploads forum).
3.90.3 --alt-preset standard
3.96 --preset standard


All 3 ABXes were 8/8 (using foobar)

The 3.96 encoding is better. There's an artefact (usual problem - the sound of paper being rubbed together) in 3.96 around 10 seconds, but there are more artefacts in more places in the 3.90.3 encoder, so 3.96 wins.

I'll say it again - if you're tuning with fatboy, use the longer sample!

Cheers,
David.

Reply #66
More interesting...

awe32_20sec.wav (I'll upload it if anyone wants - it's an old favourite!)
3.90.3 --alt-preset standard
3.96 --preset standard


All 3 ABXes were 8/8 (using foobar)

The 3.96 encoding is much much better. The 3.90.3 encoding is covered with artefacts - funny swishes and whistles over the notes, and hissing over the drum beat. These are mainly gone in 3.96, but there is a new(? - probably hidden before!) harsh sounding artefact with the hi-hat sound in the latter part of the sample. It just sounds like extra noise.

So, 3.96 much better, but certainly not transparent with this artefact.

Cheers,
David.

Reply #67
I think 3.96 quality with -V 5 is good, and this is great news for me, but I find the HF problems I previously described particularly annoying. The samples I've tested that show high-frequency problems with VBR are: Atrain, LisztBMinor, Blackwater, BachS1007. I'm experimenting a bit with settings to reduce this problem: with -V5 --athaa-sensitivity 1, all four samples are better regarding ringing and warbling background, while the bitrate increase is small.

I've noticed a certain correspondence between tigre's results and mine; I hope he or someone else can confirm my impression.

Reply #68
With proxima's ratings from Apr 17, using the 12 samples again produces no significant results (-V5 was downgraded), when averaged between tigre, proxima and myself.

ff123

Edit:  mean scores are
3.90.3(2)    3.64
3.96p128    3.59
3.96V5      3.85

Reply #69
I just tested and uploaded two more samples provided by guruboolez in the past: it seems that --athaa-sensitivity 1 improves ringing/high-pitched noise with -V 5. Both samples should be very easy; with the Bayle sample, the ringing artifacts are very obvious and annoying.

Quote
ABC/HR Version 0.9b, 30 August 2002
Testname: Bayle

1L = C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_V5.wav
2L = C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_athaa.wav

---------------------------------------
General Comments:
Focusing on 0:00 - 0:04 range.
---------------------------------------
1L File: C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_V5.wav
1L Rating: 2.0
1L Comment: ringing, fluctuant HF noise
---------------------------------------
2L File: C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_athaa.wav
2L Rating: 3.0
2L Comment: ringing, fluctuant HF noise is still present but reduced and less disturbing.
---------------------------------------
ABX Results:
Original vs C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_V5.wav
    10 out of 10, pval < 0.001
Original vs C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_athaa.wav
    10 out of 10, pval < 0.001
C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_V5.wav vs C:\LAME_GUI\Bayle - Jeita or Waters' Murmur - Etching_athaa.wav
    11 out of 12, pval = 0.003

Quote
ABC/HR Version 0.9b, 30 August 2002
Testname: Avison

1L = C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_V5.wav
2R = C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_athaa.wav

---------------------------------------
General Comments:
ringing + chirping (high pitched notes)
---------------------------------------
1L File: C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_V5.wav
1L Rating: 2.5
1L Comment: ringing with crackles at ~ 2 sec. and after
chirping (high pitched notes) starting from 22 sec.
---------------------------------------
2R File: C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_athaa.wav
2R Rating: 3.5
2R Comment: ringing is still audible but reduced
no chirping at all
all HF artifacts are reduced
---------------------------------------
ABX Results:
Original vs C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_V5.wav
    10 out of 10, pval < 0.001
Original vs C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_athaa.wav
    10 out of 10, pval < 0.001
C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_V5.wav vs C:\LAME_GUI\Track01 (Avison - Concerto Grosso 30 sec)_athaa.wav
    10 out of 10, pval < 0.001

These two samples are not isolated cases; I chose them only because I can hear obvious problems with plain -V 5. To my ears, -V5 with --athaa-sensitivity 1 is better on the Atrain, LisztBMinor, Blackwater, BachS1007, Bayle and Avison samples.

Reply #70
New results edited in.

[proxima], thanks for your attempt to improve -V 5. I also think that we both hear the same high frequency problems, but from some results it seems to me that either I can't hear them as well as you can, or you listen to samples focused on finding them somewhat more than I do...

I haven't tested yet, and I probably won't have enough time for testing before rjamorim's multiformat test starts, but regardless of whether someone can verify your results: do you have some numbers on how much adding --athaa-sensitivity 1 increases the bitrate? Maybe it's so much that we would have to use -V 6 with it instead of -V 5 to get 128kbps on average... Besides this, testing for regressions is necessary, i.e. choosing samples without an obvious HF ringing problem to find out what happens with other problems.

Reply #71
Another result...

vangelis1.wav
3.90.3 --alt-preset standard
3.96 --preset standard


3.90.3 vs orig: 7/8
3.96 vs orig: failed
3.96 vs 3.90.3: 8/8

3.96 is slightly better.

On 3.90.2 I previously heard a softening of the onset of the notes. I thought this was solved with 3.90.3, but never ABXed (not before, not today - didn't even try). However, listening today I found that there's a slight noise added behind the snare(?) hit at ~3 seconds on 3.90.3, and it's improved (solved?) with 3.96.

So, another good result for 3.96.

In case anyone has missed it, what I’m doing is going through 3.90.3 problem samples, seeing if 3.96 is an improvement. In the three cases so far, it is. That's cool! To balance it, we'll have to try to find some new problem samples for 3.96, but I guess that will take some time.

Cheers,
David.

Reply #72
Maybe someone else can check guruboolez's sample Fugue (premières notes).wav with (a)ps?

I think 3.96 may be slightly worse - it's certainly not better, but I can't ABX 3.96 against 3.90.3, so they're probably equally bad (both 8/8 ABX against the original - the last note is especially noisy).

It would be cool if future lame development could solve the "harpsichord problem".

Cheers,
David.

Reply #73
I just got the sample which triggered the gapless-playback issues yesterday. However, for some strange reason, when I encoded it with 3.90.3 and 3.96, the track change was just perfect. I DID have multiple problems with gapless playback on this album with 3.96 in the past, so either I did something wrong before (I cannot think of what that could be, since I did the same as usual; the only difference is that in the previous tests the mp3 was encoded via EAC, while this time I encoded it from FLAC via foobar), or the problem appears randomly and isn't reproducible. I currently don't have the time to investigate this in more depth, so I would like to apologise for stirring things up unnecessarily and causing a false alarm. I'm sorry.

- Lyx
I am arrogant and I can afford it because I deliver.

Reply #74
Quote
--------- 3.96 --p fast standard vs. 3.90.3 --ap fast standard
3.96 --p fast standard > 3.90.3 --ap fast standard :: myf_4sec :: LoFiYo :: 0x verified so far

Please change that to "3.96 --preset fast standard > 3.90.3 --alt-preset standard". Thanks.

Please read my post for details.
