lame 3.97 alpha 6 testing thread

Compared to 3.97a5, cbr/abr modes have been adjusted. Vbr modes are unchanged from the previous alpha.

Results from 3.97a5:
http://www.hydrogenaudio.org/forums/index....showtopic=30547

edit: the former X10 mode is now remapped to X9 and used by default on abr/cbr.

Reply #1
In case you haven't found it yet, alpha 6 is at Rarewares.

Reply #2
Actually alpha 5, not 6:
Quote
ABC/HR Version 1.0, 6 May 2004
Testname:

1L = C:\Temp\lame 3.90.3\Waiting.wav
2L = C:\Temp\lame 3.97 alpha 6\Waiting.wav

---------------------------------------
General Comments:
Sample 2 has less artifacts & sounds more vivid
---------------------------------------
ABX Results:
Original vs C:\Temp\lame 3.90.3\Waiting.wav
    16 out of 16, pval < 0.001
Original vs C:\Temp\lame 3.97 alpha 6\Waiting.wav
    16 out of 16, pval < 0.001


Added:
Quote
ABC/HR Version 1.0, 6 May 2004

1L = C:\Temp\lame 3.97 alpha 5\Waiting.wav
2L = C:\Temp\lame 3.90.3\Waiting.wav

---------------------------------------
General Comments:
---------------------------------------
ABX Results:
Original vs C:\Temp\lame 3.97 alpha 5\Waiting.wav
    14 out of 16, pval = 0.002
Original vs C:\Temp\lame 3.90.3\Waiting.wav
    16 out of 16, pval < 0.001


Quote
ABC/HR Version 1.0, 6 May 2004
Testname:

1L = C:\Temp\lame 4.0 alpha 12\Waiting.wav
2L = C:\Temp\lame 3.97 alpha 6\Waiting.wav

---------------------------------------
General Comments:
Sample 1 has less artifacts
---------------------------------------
ABX Results:
Original vs C:\Temp\lame 4.0 alpha 12\Waiting.wav
    11 out of 16, pval = 0.105
Original vs C:\Temp\lame 3.97 alpha 6\Waiting.wav
    16 out of 16, pval < 0.001

lame 4.0a is the best on this sample.
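The pval figures in these logs are consistent with a one-sided binomial test against pure guessing (a 50/50 chance on every trial): 14/16 gives 137/65536 ≈ 0.002 and 11/16 gives about 0.105. A minimal sketch, assuming that model (it is not taken from the ABC/HR source code):

```python
from math import comb

def abx_pvalue(correct, trials):
    """One-sided p-value: chance of at least `correct` right answers out of
    `trials` when every answer is a 50/50 guess."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# ABX scores reported in the three logs above
for correct, trials in [(16, 16), (14, 16), (11, 16)]:
    print(f"{correct}/{trials}: p = {abx_pvalue(correct, trials):.4f}")
```

By this measure 11/16 is not significant at the usual 5% level, which is why the 4.0 alpha 12 encode of Waiting.wav could not be reliably told apart from the original.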

Reply #3
For anyone who has downloaded the alpha 6 from Rarewares, please repeat the d/l as I had renamed the file but not updated the content!!!! 

Apologies to all, but the file is now OK.

Reply #4
Quote
For anyone who has downloaded the alpha 6 from Rarewares, please repeat the d/l as I had renamed the file but not updated the content!!!! 

Apologies to all, but the file is now OK.

Well the above test was done using alpha 5 then.

Reply #5
Nice!  The results of 3.97a testing look promising!

Can you comment on the tuning done to vbr-new that makes it seem better than the default vbr mode?

Also, I know it's late in the game, but I'd like to know whether -V 5's target rate could be increased from 130 to 132 or 136. The reason I ask is that I sometimes get results quite a bit lower than 128, and I guess a higher target rate should give me a higher percentage of rips above 128?

Also, do we have to add --athaa-sensitivity 1, or is that already in the new V tunings? Or did I miss something?

Anything new incorporated into the V tunings compared to 3.96.1?

Reply #6
This is my hunch on vbr-new.

As guruboolez mentioned, vbr-new can produce files with much higher bitrates than vbr-old (though they can be smaller as well). It seems that vbr-new uses scalefactors at the bare minimum; just look at the files with EncSpot. In some cases this will lead to bitrate bloat (like the whole VBR sfb21/-Y issue).

Now recall the whole -Z debacle from the past. People were claiming better results from "-Z 1", and this profile uses scalefactors less liberally than the old default "-Z 2". So maybe because vbr-new is even less scalefactor-dependent, it does better than vbr-old on some test tracks... which would suggest a problem with the way LAME uses scalefactors.

But this is just a guess.

Reply #7
Quote
ABC/HR Version 1.0, 6 May 2004
Testname:

1R = C:\Temp\lame 3.97 alpha 6\velvet.wav
2R = C:\Temp\lame 4.0 alpha 12\velvet.wav

---------------------------------------
General Comments:
Sample 2 has less artifacts & sounds more vivid
---------------------------------------
ABX Results:
Original vs C:\Temp\lame 3.97 alpha 6\velvet.wav
    30 out of 30, pval < 0.001
Original vs C:\Temp\lame 4.0 alpha 12\velvet.wav
    26 out of 30, pval < 0.001

That's now 2 samples on which 4.0a12 performs better than 3.97a6.

All tests were performed using --preset 128.

Sorry if I’m drifting off topic.

 

Reply #8
I have a sample where 3.97a6 generates a blip noise at the beginning, but 3.97a5 and earlier do not. I did not specify any options, so in all cases, the setting was CBR128kbps. Download it here.

Reply #9
Quote
I have a sample where 3.97a6 generates a blip noise at the beginning, but 3.97a5 and earlier do not. I did not specify any options, so in all cases, the setting was CBR128kbps. Download it here.

The above seems to occur when the qval (-q parameter) is 2 or 3 in the CBR mode at 112 up to 160 kbps.

edit: spelling
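To narrow down a regression like this one, it can help to encode the sample over a small grid of settings. A sketch that only builds the command lines, assuming `lame` is on the PATH; the input file name is hypothetical, and `-b` (bitrate in kbps) and `-q` (quality level, 0 = slowest to 9 = fastest) are LAME's standard switches:

```python
import itertools
import shlex

SAMPLE = "blip_sample.wav"   # hypothetical name for the problem sample
BITRATES = [112, 128, 160]   # kbps range reported above
QUALITIES = range(0, 10)     # every -q level

def build_commands(sample, bitrates, qualities):
    """Return one lame argument list per (bitrate, -q) combination."""
    commands = []
    stem = sample.rsplit(".", 1)[0]
    for kbps, q in itertools.product(bitrates, qualities):
        out = f"{stem}_b{kbps}_q{q}.mp3"
        commands.append(["lame", "-b", str(kbps), "-q", str(q), sample, out])
    return commands

if __name__ == "__main__":
    for cmd in build_commands(SAMPLE, BITRATES, QUALITIES):
        print(shlex.join(cmd))
```

Each argument list can be handed to `subprocess.run(cmd, check=True)` to actually encode; if the report above holds, only the -q 2 and -q 3 files should show the blip.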

Reply #10
LAME CBR 128 kbps ringing test

This test includes 10 known problem samples for ringing (including the last three by guruboolez), so the overall scores are rather low and may not be representative of generic samples. Moreover, two alpha versions are tested, and the results should be considered only as feedback for the LAME developers. In all cases 3.97a6 performs better than 3.97a5; a lower ATH may be the right way to go. Unfortunately the old 3.90.3 is still superior in the vast majority of cases. Samples and ABC/HR logs available.

Settings: --alt-preset cbr 128 --scale 1
Decoding: Foobar 2000 with DSP advanced limiter and no dithering.
Equipment: Creative SB128, Sennheiser HD497, ABC/HR 1.1 beta 2
Mood/Condition: Good attention and quiet room, all results ABXed.

Here are the scores:
WavPack 4.3 -mfx5
LAME 3.97 -V5 --vbr-new --athaa-sensitivity 1

Reply #11
TEST #1: 128 kbps ABR: ringing and --athlower switch


I began the 3.97a6 testing phase by playing with the --athlower switch. The main purpose was to find a good value that removes most of the ringing issues I could hear with lame. For this introductory phase, I selected four critical samples. I insist: these samples are not representative of ‘general music’. Last but not least, the listening conditions are also exceptional: high listening volume, in order to easily hear the ringing problems.

I’ve compared the following settings:
• lame 3.90.3 --alt-preset 128
• lame 3.97a6 --preset 128
• lame 3.97a6 --preset 128 --athlower 5
• lame 3.97a6 --preset 128 --athlower 10

Samples are available here.

RESULTS


Code:
                          3.97a5        3.97a6       3.97a6         3.97a6
                     abr128 X10,10     abr128    abr128 ATHlw5  abr128 ATHlw10

Die Lotosblume            2.0           3.0          3.5            4.5
Lamento della ninfa       1.5           3.0          3.5            4.0
LisztBMinor               1.0           2.0          2.5            4.0
Quis non posset           1.0           1.5          2.0            4.5
                Means   1.38          2.38         2.88           4.25

click here for log files

COMMENTS:


• there’s always a real difference between alpha 5 and alpha 6: less ringing with the newest alpha, but still annoying with this kind of sample at high listening volume.
• --athlower 5 slightly reduces the amount of audible ringing. A good point, but the artifacts are still annoying.
• --athlower 10 most often leads to a big quality jump, in my opinion. Files are always nearly transparent, at least at a high but not totally crazy listening level.


STATISTICS:

If anyone wants to play with a statistics tool, just copy and paste the following table:
Code:
3.97a5    a6deft  a6ATH5    a6ATH10
2.0    3.0    3.5    4.5
1.5    3.0    3.5    4.0
1.0    2.0    2.5    4.0
1.0    1.5    2.0    4.5


• ANOVA ANALYSIS
a6ATH10 is better than a6ATH5, a6deft, 3.97a5
a6ATH5 is better than 3.97a5
a6deft is better than 3.97a5

• TUKEY PARAMETRIC ANALYSIS
a6ATH10 is better than a6ATH5, a6deft, 3.97a5
a6ATH5 is better than 3.97a5
a6deft is better than 3.97a5
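For anyone who wants to check the means row before feeding the table to a statistics tool, a small sketch that parses the whitespace-separated table above (the parser assumes one header row followed by one row per sample, exactly as pasted):

```python
TABLE = """\
3.97a5    a6deft  a6ATH5    a6ATH10
2.0    3.0    3.5    4.5
1.5    3.0    3.5    4.0
1.0    2.0    2.5    4.0
1.0    1.5    2.0    4.5
"""

def column_means(table):
    """Parse a whitespace-separated score table (header row first) and
    return {column name: mean score}."""
    lines = [line.split() for line in table.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    means = {}
    for i, name in enumerate(header):
        scores = [float(row[i]) for row in rows]
        means[name] = sum(scores) / len(scores)
    return means

for name, mean in column_means(TABLE).items():
    print(f"{name:10s} {mean:.3f}")
```

The unrounded means are 1.375, 2.375, 2.875 and 4.25, which match the 1.38 / 2.38 / 2.88 / 4.25 row of the results table.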



With this kind of sample, the --athlower switch leads to a nice quality improvement. But is that always true? What happens when this additional switch is used with totally different files? Could it reduce quality by spending unnecessary bits on inaudible signals?
To answer this question, I tested the impact of this switch with the same varied samples I used last week.

Reply #12
TEST #2: 128 kbps ABR

Samples:
• all ff123’s samples selected for the 64 kbps listening test
• all ff123’s samples selected for the 128 kbps listening test
• macabre.wav (full orchestra sample, uploaded by ff123)
• SinceAlways.wav (recently uploaded by Dev0)
• castanets2.wav for testing sharpness & pre-echo
• Orion II.wav (brass instrument) for testing micro-attacks with a real instrument

Encoders and settings:
• lame 3.90.3 --preset 128
• lame 3.97a6 --preset 128
• lame 3.97a6 --preset 128 --athlower 10


RESULTS

Code:
                   3.90.3   3.97a6  3.97a6
                  ABR128   ATH 0   ATH 10
ATrain              2.0      2.5     3.5
BachS1007           4.0      3.5     5.0
BeautySlept         2.5      4.0     3.5
Blackwater          3.5      4.5     4.5
Casta.2             2.0      2.5     2.0
Dogies              3.5      4.0     2.0
FloorEssence        3.8      2.8     2.5
Fossiles            3.0      3.5     1.5
SinceAlways         1.5      2.5     2.5
Layla               4.0      4.5     4.7
LifeShatters        4.0      4.0     4.5
LisztBMinor         2.0      3.0     4.5
Macabre             2.0      3.0     4.0
MidnightVoyage      2.0      3.5     3.0
OrionII             2.0      3.0     3.0
Rawhide             3.5      4.0     4.0
thear1              4.0      4.5     4.5
TheSource           4.0      4.5     4.5
Waiting             1.3      3.3     2.8
Wayitis             2.5      3.0     2.5

------------------------------------------
· · · · · · MEANS  2.86     3.50    3.45 |
------------------------------------------

Click here for log files

COMMENTS:

• good news: according to my experience and the tested samples, lame 3.97 alpha 6 is now much better than 3.90.3 using exactly the same preset (--preset 128). Results were already positive for 3.97 alpha 5, but now the perceived difference between the old lame and the newest one is even larger.

• the complementary switch I used gives inconsistent results. In some cases, --athlower greatly increases quality (especially with low-volume classical music samples, like BachS1007.wav and LisztBMinor, already tested before). With others, the improvement is less significant or simply inaudible. But it is important to point out that --athlower 10 introduces some artefacts: more pre-echo, additional distortions… Fossiles.wav sounded really worse, and with a bunch of other samples I apparently noticed slight regressions (my impressions were fragile, and I didn’t try to ABX the difference).

• Anyway, lame 3.97a6 --preset 128 --athlower 10 appears globally similar to --preset 128, but results are clearly less pleasing if we remove all classical music samples (BachS1007, BeautySlept, LisztBMinor, Macabre, Orion II): 3.57 for lame 3.97a6, 3.27 for lame 3.97a6 --athlower 10 and 2.97 for lame 3.90.3.
Is it better to keep the --athlower 10 switch for classical music only? Or should we try an intermediate --athlower value in order to balance the results?
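The non-classical means quoted above can be reproduced by dropping the five classical samples from the results table and averaging the remaining 15 rows. A sketch (sample names follow the results table; the classical set is the one listed in the comment above):

```python
# Per-sample ratings from the TEST #2 table:
# (3.90.3 --preset 128, 3.97a6 --preset 128, 3.97a6 --preset 128 --athlower 10)
SCORES = {
    "ATrain": (2.0, 2.5, 3.5),
    "BachS1007": (4.0, 3.5, 5.0),
    "BeautySlept": (2.5, 4.0, 3.5),
    "Blackwater": (3.5, 4.5, 4.5),
    "Casta.2": (2.0, 2.5, 2.0),
    "Dogies": (3.5, 4.0, 2.0),
    "FloorEssence": (3.8, 2.8, 2.5),
    "Fossiles": (3.0, 3.5, 1.5),
    "SinceAlways": (1.5, 2.5, 2.5),
    "Layla": (4.0, 4.5, 4.7),
    "LifeShatters": (4.0, 4.0, 4.5),
    "LisztBMinor": (2.0, 3.0, 4.5),
    "Macabre": (2.0, 3.0, 4.0),
    "MidnightVoyage": (2.0, 3.5, 3.0),
    "OrionII": (2.0, 3.0, 3.0),
    "Rawhide": (3.5, 4.0, 4.0),
    "thear1": (4.0, 4.5, 4.5),
    "TheSource": (4.0, 4.5, 4.5),
    "Waiting": (1.3, 3.3, 2.8),
    "Wayitis": (2.5, 3.0, 2.5),
}

# Samples treated as classical music in the comment above
CLASSICAL = {"BachS1007", "BeautySlept", "LisztBMinor", "Macabre", "OrionII"}

def column_means(rows):
    """Mean of each encoder column over the given per-sample rows."""
    columns = list(zip(*rows))
    return [sum(col) / len(col) for col in columns]

all_samples = column_means(SCORES.values())
non_classical = column_means(
    scores for name, scores in SCORES.items() if name not in CLASSICAL
)

print("all 20 samples:   ", [f"{m:.3f}" for m in all_samples])
print("without classical:", [f"{m:.3f}" for m in non_classical])
```

The second line comes out at about 2.97, 3.57 and 3.27, i.e. the --athlower 10 column loses its lead once the quiet classical samples are removed.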



STATISTICS:

If anyone wants to play with a statistics tool, just copy and paste the following table:
Code:
3.90.3    397a6_0    397a6_10
2.0    2.5    3.5
4.0    3.5    5.0
2.5    4.0    3.5
3.5    4.5    4.5
2.0    2.5    2.0
3.5    4.0    2.0
3.8    2.8    2.5
3.0    3.5    1.5
1.5    2.5    2.5
4.0    4.5    4.7
4.0    4.0    4.5
2.0    3.0    4.5
2.0    3.0    4.0
2.0    3.5    3.0
2.0    3.0    3.0
3.5    4.0    4.0
4.0    4.5    4.5
4.0    4.5    4.5
1.3    3.3    2.8
2.5    3.0    2.5



ANOVA analysis of results
397a6_0 is better than 3.90.3
397a6_10 is better than 3.90.3

TUKEY PARAMETRIC analysis of results
397a6_0 is better than 3.90.3
397a6_10 is better than 3.90.3

Reply #13
TEST #3: 128 kbps ABR with classical music only

Samples:
• the 15 classical music samples I’ve already used for my multiformat listening tests and more recently for my Nero AAC vs QuickTime AAC and also for the Nero AAC vs Vorbis aoTuV beta 3 listening tests.
Samples are temporarily available here (11.8 MB).


Encoders and settings:
• lame 3.90.3 --preset 128
• lame 3.97a6 --preset 128
• lame 3.97a6 --preset 128 --athlower 10


RESULTS

Code:
                        3.90.3  3.97a6  3.97a6_ATH10
Bayle                    3.0     2.5     3.5
Beethoven                3.5     3.0     4.0
Mozart                   4.0     3.5     4.0
Rinaldo                  3.2     3.7     4.5
Track01 – Avison         2.0     3.0     3.0
Track04 – Hercules       4.0     3.5     5.0
Track06 – Mahler         3.0     2.5     2.5
Track07 – BWV 1034       2.0     2.0     2.5
Track08 – Brahms         3.0     2.5     4.0
Track09 – Compostelle    3.0     2.0     1.5    [WARBLING]
Track10 – Bruhns         1.5     1.0     2.0
Track11 – Mozart         3.5     3.5     3.5
Track12 – Couperin       1.0     2.5     1.8
Track13 – Van Wilder     2.5     3.2     4.0
Track14 – Barriere       2.0     3.0     3.0
                MEANS  2.75    2.76    3.25

Click here for log files



COMMENTS:

• lame 3.90.3 and lame 3.97a6 are apparently very similar with this kind of music and with --preset 128. Differences are most often very limited, and without ABX confirmation I can’t be sure they all really exist.

• the --athlower 10 switch has a real and positive impact on quality with the tested samples. I noticed only two exceptions:
- compostelle.wav: lame 3.97a6 amplifies a strange warbling, slightly audible with lame 3.90.3 and now really annoying. The --athlower switch doesn’t help reduce it, and it apparently adds slight distortions on female voices (this still has to be confirmed).
- Couperin.wav (solo harpsichord): quality is slightly worse with --athlower, which increases smearing and maybe (some impressions are hard to explain) the level of distortion. Anyway, 3.90.3 was clearly worse.
With all other samples, quality with the additional --athlower 10 switch was at least identical to the default --preset 128, most often better, and sometimes clearly improved.
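A quick way to summarise the "only two exceptions" observation is a sign test: count the samples where --athlower 10 was rated higher, lower or equal, drop the ties, and ask how likely that split would be under a 50/50 coin flip. This is a simpler, swapped-in technique, not the ANOVA/Tukey analysis used in the post; the ratings are copied from the TEST #3 table:

```python
from math import comb

# (3.97a6 --preset 128, 3.97a6 --preset 128 --athlower 10) per sample, from TEST #3
PAIRS = {
    "Bayle": (2.5, 3.5),
    "Beethoven": (3.0, 4.0),
    "Mozart": (3.5, 4.0),
    "Rinaldo": (3.7, 4.5),
    "Avison": (3.0, 3.0),
    "Hercules": (3.5, 5.0),
    "Mahler": (2.5, 2.5),
    "BWV 1034": (2.0, 2.5),
    "Brahms": (2.5, 4.0),
    "Compostelle": (2.0, 1.5),
    "Bruhns": (1.0, 2.0),
    "Mozart (Track11)": (3.5, 3.5),
    "Couperin": (2.5, 1.8),
    "Van Wilder": (3.2, 4.0),
    "Barriere": (3.0, 3.0),
}

def sign_test(pairs):
    """One-sided sign test. Returns (wins, losses, ties, p): ties are
    discarded and p is the chance of at least `wins` improvements out of
    wins + losses under a fair coin."""
    wins = sum(1 for before, after in pairs if after > before)
    losses = sum(1 for before, after in pairs if after < before)
    ties = len(pairs) - wins - losses
    n = wins + losses
    p = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins, losses, ties, p

wins, losses, ties, p = sign_test(list(PAIRS.values()))
print(f"ATH10 better: {wins}, worse: {losses}, tied: {ties}, one-sided p = {p:.3f}")
```

With 9 improvements, 2 regressions and 4 ties, p = 67/2048 ≈ 0.033, which points the same way as the ANOVA result below.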


STATISTICS:

If anyone wants to play with a statistics tool, just copy and paste the following table:
Code:
3.90.3    3.97a6    3.97a6_ATH10
3.0    2.5    3.5
3.5    3.0    4.0
4.0    3.5    4.0
3.2    3.7    4.5
2.0    3.0    3.0
4.0    3.5    5.0
3.0    2.5    2.5
2.0    2.0    2.5
3.0    2.5    4.0
3.0    2.0    1.5
1.5    1.0    2.0
3.5    3.5    3.5
1.0    2.5    1.8
2.5    3.2    4.0
2.0    3.0    3.0


ANOVA analysis of results
3.97a6_ATH10 is better than 3.97a6, 3.90.3

TUKEY PARAMETRIC analysis of results
3.97a6_ATH10 is better than 3.97a6, 3.90.3



NOTA BENE
Mixing both tests (20 general samples + 15 classical-only samples):

MIXED RESULTS (35 samples):
lame 3.90.3 = 2.81
lame 3.97a6 = 3.19
lame 3.97a6 --athlower 10 = 3.37

ANALYSIS (anova & tukey parametric):
3.97a6_ATH10 is better than 3.90.3
3.97a6 is better than 3.90.3
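The 35-sample figures are consistent with pooling every per-sample score (equivalently, weighting each test's mean by its number of samples) rather than averaging the two test means, which would give about 3.13 instead of 3.19 for 3.97a6. A sketch reproducing them from the two tables:

```python
# Per-sample ratings copied from the TEST #2 (20 samples) and TEST #3 (15 samples) tables
TEST2 = {
    "3.90.3": [2.0, 4.0, 2.5, 3.5, 2.0, 3.5, 3.8, 3.0, 1.5, 4.0,
               4.0, 2.0, 2.0, 2.0, 2.0, 3.5, 4.0, 4.0, 1.3, 2.5],
    "3.97a6": [2.5, 3.5, 4.0, 4.5, 2.5, 4.0, 2.8, 3.5, 2.5, 4.5,
               4.0, 3.0, 3.0, 3.5, 3.0, 4.0, 4.5, 4.5, 3.3, 3.0],
    "3.97a6_ATH10": [3.5, 5.0, 3.5, 4.5, 2.0, 2.0, 2.5, 1.5, 2.5, 4.7,
                     4.5, 4.5, 4.0, 3.0, 3.0, 4.0, 4.5, 4.5, 2.8, 2.5],
}
TEST3 = {
    "3.90.3": [3.0, 3.5, 4.0, 3.2, 2.0, 4.0, 3.0, 2.0, 3.0, 3.0, 1.5, 3.5, 1.0, 2.5, 2.0],
    "3.97a6": [2.5, 3.0, 3.5, 3.7, 3.0, 3.5, 2.5, 2.0, 2.5, 2.0, 1.0, 3.5, 2.5, 3.2, 3.0],
    "3.97a6_ATH10": [3.5, 4.0, 4.0, 4.5, 3.0, 5.0, 2.5, 2.5, 4.0, 1.5, 2.0, 3.5, 1.8, 4.0, 3.0],
}

def mean(xs):
    return sum(xs) / len(xs)

def pooled_mean(*groups):
    """Mean over every score in all groups, so each sample weighs the same."""
    return sum(sum(g) for g in groups) / sum(len(g) for g in groups)

for encoder in TEST2:
    pooled = pooled_mean(TEST2[encoder], TEST3[encoder])
    naive = (mean(TEST2[encoder]) + mean(TEST3[encoder])) / 2
    print(f"{encoder:14s} pooled={pooled:.2f}  average-of-means={naive:.2f}")
```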


EDIT: wrong analysis reports corrected.

Reply #14
Explanations regarding the current ATH status:

It seems clear that the ath is an important point in the current version. In 3.97, the ath is configured by two elements: the level and the shape.

In 3.97a6, the base level for abr/cbr 128k is -3 (ie similar to --athlower 3), and the shape is 4.
The shape is in fact controlling the upper freqs slope of the ath. It is controlled by --athshape xx.
A value of 10 leads to the old Painter&Spanias curve, and a value of 1 leads (nearly) to the Klemm ath.

Changing the base ath level has some impact on every freq, while changing the shape is impacting high freqs.
Of course, if we only change the base level to solve a problem only present in high freqs, we will be wasting bits on low/mid freqs.
A tradeoff must be found.
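To illustrate the level/shape distinction, a sketch of a threshold-in-quiet curve. The base curve is the commonly quoted Terhardt approximation (f in kHz); the `shape` knob that scales only the f⁴ term is a hypothetical parameterisation, chosen so that shape = 10 gives the unmodified formula. It is not LAME's actual ATH code:

```python
import math

def ath_db(f_hz, lower_db=0.0, shape=10.0):
    """Illustrative absolute threshold of hearing in dB SPL at f_hz.

    lower_db: shifts the whole curve down by that many dB (a level change).
    shape: hypothetical knob scaling only the f**4 high-frequency term;
           shape=10 reproduces the unmodified formula, smaller values give
           a gentler high-frequency slope. NOT LAME's actual ATH code.
    """
    f = f_hz / 1000.0  # formula works in kHz
    low_mid = 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
    high = (0.6 + 0.04 * shape) * 1e-3 * f ** 4
    return low_mid + high - lower_db

# Compare a pure level change with a pure shape change at a few frequencies
for f_hz in (100, 1000, 4000, 16000):
    base = ath_db(f_hz)
    level_delta = ath_db(f_hz, lower_db=3.0) - base  # -3 dB everywhere
    shape_delta = ath_db(f_hz, shape=1.0) - base     # only matters high up
    print(f"{f_hz:6d} Hz  base={base:6.1f} dB  "
          f"level={level_delta:+.1f} dB  shape={shape_delta:+.2f} dB")
```

A 3 dB level change moves every frequency by 3 dB, whereas going from shape 10 to shape 1 changes the threshold by well under 0.01 dB at 1 kHz but by roughly 23.6 dB at 16 kHz. That is the tradeoff described above: lowering the whole level to fix a high-frequency problem also spends bits at low/mid frequencies.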

Another thing that could have some influence is the automatic ath adjustment. I think I will soon enable it for abr/cbr.
note: regarding ath params, it would be helpful to know the RG value computed by Lame for the tested samples.

Curious people can check the default ath settings there:
http://cvs.sourceforge.net/viewcvs.py/lame...s.c?view=markup

Reply #15
Quote
The shape is in fact controlling the upper freqs slope of the ath. It is controlled by --athshape xx.

This switch is not recognized by 3.97a6.
What do the xx numbers mean? Maybe a sort of rolloff like in mppenc?
The explanation you give about the level and the shape is very reasonable... maybe enabling athaa for cbr/abr could also solve the problem of wasted bits; at least it already did with VBR.

Reply #16
My mistake: the switch is --athcurve xxx.
xxx is a number that affects the slope of the high-frequency part of the ath.
10 gives a strong slope, and 1 gives a weaker slope, leading to less masking in the high freqs.

lame 3.97 alpha 6 testing thread

Reply #17
Regarding vbr presets: I will adjust presets up to V4 (medium). Higher-bitrate vbr presets (i.e. V3-V0) have no planned changes, so it would be interesting to get some feedback on them.
Some 3.90.3 vs. current alpha comparisons using --preset standard would be nice.

lame 3.97 alpha 6 testing thread

Reply #18
Quote
note: regarding ath params, it would be helpfull to know the RG value computed by Lame for the tested samples.



I've taken a look at the lame headers; here are the stored RG values for the classical music sample suite:

Code:
Bayle       -3.1dB
Beethoven   -2.9dB
Mozart      -2.2dB
Rinaldo     +1.5dB
Track01     -0.8dB
Track04    +12.1dB
Track06     +2.7dB
Track07     +2.4dB
Track08     +0.9dB
Track09     +0.9dB
Track10    +12.6dB
Track11     +2.2dB
Track12     +9.4dB
Track13     +2.7dB
Track14     -1.8dB
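A positive ReplayGain value means the track is quieter than the ReplayGain reference level, so more of its content sits close to the ath. A sketch converting the values above to amplitude factors and listing the quietest samples (the 6 dB cut-off is an arbitrary choice for this sketch):

```python
# ReplayGain track values read from the LAME headers above (dB)
RG_DB = {
    "Bayle": -3.1, "Beethoven": -2.9, "Mozart": -2.2, "Rinaldo": 1.5,
    "Track01": -0.8, "Track04": 12.1, "Track06": 2.7, "Track07": 2.4,
    "Track08": 0.9, "Track09": 0.9, "Track10": 12.6, "Track11": 2.2,
    "Track12": 9.4, "Track13": 2.7, "Track14": -1.8,
}
QUIET_THRESHOLD_DB = 6.0  # arbitrary cut-off chosen for this sketch

def linear_gain(db):
    """Amplitude factor for a gain in dB (20 * log10 convention)."""
    return 10 ** (db / 20)

quietest = sorted(
    (name for name, db in RG_DB.items() if db > QUIET_THRESHOLD_DB),
    key=lambda name: RG_DB[name],
    reverse=True,
)
for name in quietest:
    print(f"{name}: {RG_DB[name]:+.1f} dB -> x{linear_gain(RG_DB[name]):.2f} amplitude")
```

Track10 (+12.6 dB) would need roughly a 4.3x amplitude boost to reach the reference loudness; Track04 and Track12 are the only other samples above the cut-off.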

Reply #19
Hi,
foobar ABX plugin with ReplayGain; mediocre Sony headphones were used

encoders:
3.96.1 and 3.97 alpha 6, --preset [cbr] xxx/medium --scale 1 --noreplaygain

First an oldie but goldie

Code:
FATBOY (5.00s)
------

    CBR 128

-96.1 (1.8)  The voice is badly distorted, as expected.
-97a6 (1.0)  Just abysmal... a6 introduces a horrible "bonus" artefact that sounds like
             a constant, warbling voice echo

ABX 3.96.1 vs alpha6: effortless


    CBR 160

-96.1 (1.8)  Sounds exactly like its 128k brother. In fact I was unable to ABX between the two.
-97a6 (1.0)  Again not ABX'able from the 128k alpha6 sample. Weird.

3.96.1 vs alpha6: effortless


    ABR 128

-96.1 164k (2.0)  A little better than 3.96.1 CBR (ABX 8/8 against CBR 160 with some effort)
-97a6 163k (1.2)  New voice echo artefact still there, a little less painful than CBR 160 (8/8)

3.96.1 vs alpha6: effortless

    ABR 160

-96.1 205k (2.0) Not ABX'able from ABR 128
-97a6 201k (1.5) Echo gets a little better if you listen closely (ABX 10/12 vs ABR 128)

3.96.1 vs alpha6: 8/8 easy


    V4 (medium)

-96.1 236k (4.0)  nice.
-97a6 233k (4.0)  yeah.

not ABX'able against each other

V4 vs. original was tricky first, but after some loops you start noticing a slight hiss in the voice


Conclusion: something went very wrong for Fatty in Alpha 6 (CBR/ABR). ABR poor despite healthy bitrate. V4 good.


Hmm ouch

Reply #20
nasty (CODE) tag problem, please delete

Reply #21
Code:
VELVET (11.878s)
------

    CBR 128

-96.1 (2.4)  The bass drum is humping and the hi-hat (or snare? I dunno) sounds altered (stronger).
             I guess that's nothing new but I'm using velvet for the first time.
             The encoder struggles, there is one very short artefact that resembles a
             stream sync "bleep" error, with the bass drum attack at 10.5s

-97a6 (2.0)  Here are 3 instead of 1 noticeable "sync" errors (1.5s, 8.9s and 10.5s)

ABX 96.1 vs 97a6: effortless if you concentrate on 8.6-10.8s


    CBR 160

-96.1 (2.4)  I think the glitch is a tad less pronounced than at 128k (ABX 13/16) after listening to 10.5s ~100 times :)
             Otherwise exactly like 128k.

-97a6 (2.1)  The 3 glitches are still there, but also a little less ugly than 128k when you listen often enough (7/8)

ABX 96.1 vs 97a6: effortless if you concentrate on 8.6-10.8s


    ABR 128

-96.1 135k (2.0)  I was surprised to find that ABR 128 is actually worse than CBR 128 here.
                  There are artefacts at 3.5 and 5.4s that are ABX'able from 96.1 CBR 128 (10/10)
                  The glitch at 10.5s is still there but got better over CBR 128.

-97a6 134k (2.4)  The glitches from 97a6 CBR 128 at 1.5 and 8.9s are gone, the one at 10.5s remains but got better.
                  The new problem of 96.1 ABR 128 at 3.6 and 5.4s is not present here.

ABX 96.1 vs 97a6:  8/8 because I concentrated on the 3.96.1 artefact at 3.5s that alpha6 does not have.
                   Sounds like a short jitter from a bad CD rip.


    ABR 160

-96.1 171k (3.0)  All singular glitches are gone, just the distinctive bass bumbling in the left channel remains :)

-97a6 167k (3.0)  ^same^...  Gah, the rumbling sounds like 2 huge rocks scraping together. rrrt brrrt rrrt brrrt...

ABX 96.1 vs 97a6: not possible


    V4 (medium)

-96.1 168k (4.0)  Hmmmm..  Very minor problem with the bass drum. Oh well... :) (ABX 18/24 vs original)
                  Also, the hi-hats? still seem too crisp, but it's barely noticeable. (8/10)
                  This is way better than ABR 160, at 3kbit less bitrate.
                 

-97a6 167k (4.0)  Like 96.1, there is a gentle tone going together with the bass drum that gets distorted
                  slightly (8/8 vs original). And the hi-hat again too strong. (10/12)


ABX 96.1 vs 97a6: not possible


Conclusion: The glitches shouldn't be there in CBR (especially 160k); ABR is disappointing for both; V4 good.

Reply #22
TEST #4: --preset standard listening test

This test is probably the most difficult one. I’m not even sure that useful or significant results will come out of such a test. I know that the --standard preset is rarely fully transparent to my ears (see for example my own listening test performed last summer on non-critical samples), but in order to find those small (and less small) differences, I generally need to be highly concentrated and to take many rests. Testing high-quality encodings therefore requires a lot of time, and even more motivation.

Currently, I’m not motivated enough to perform a complete and careful listening test, which implies plenty of successful ABX tests. The following test consequently has some important limitations:
• ABX results are not necessarily significant. To limit listening fatigue and loss of motivation, I decided to limit the number of trials to 8 (rarely more, and in some cases fewer), which allows only one mistake while staying in the significant area. The latest ABC/HR beta by ff123 doesn’t allow a second test: once the results are revealed to the user, the game is over. If the results are not good enough, or if you want to test another range or problem, you can’t launch a second test (except in training mode, which is enough for the tester but, I’m afraid, not for the reader). This implies that the ratings I gave to the encoded files are not necessarily backed by ABX results.
• ABX comparisons between the encoded files themselves are missing too.
• When differences between encodings were too small to be told apart quickly, I gave the same rating to all similar files rather than losing time and patience trying to find ridiculous differences. In other words, some files received the same rating, but that doesn’t mean the files are objectively or even perceptually (in some conditions) identical.
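The "only one mistake" limit can be checked with a binomial model of guessing (50/50 on each trial): with 8 trials, 7/8 gives p = 9/256 ≈ 0.035, while 6/8 gives about 0.14. A sketch that finds the smallest passing score for a given number of trials, assuming a one-sided test at alpha = 0.05:

```python
from math import comb

def abx_pvalue(correct, trials):
    """Chance of at least `correct` right answers out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct(trials, alpha=0.05):
    """Smallest score out of `trials` whose guessing p-value is below alpha,
    or None if even a perfect score is not significant."""
    for correct in range(trials + 1):
        if abx_pvalue(correct, trials) < alpha:
            return correct
    return None

for trials in (4, 5, 8, 12, 16):
    needed = min_correct(trials)
    print(trials, "trials ->",
          "never significant" if needed is None else f"{needed} correct needed")
```

With 16 trials the threshold is 12/16 (four mistakes allowed); with 4 trials no score reaches p < 0.05 at all, since even 4/4 gives 0.0625.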

The 20 samples used for this test are unchanged: the two ff123 suites, plus a few additional samples. Hardware & software are also the same as before.


SETTINGS

For this difficult test, I had to drastically limit the number of contenders. The first idea was simply to pit 3.90.3 --alt-preset standard against 3.97a6 --preset standard (corresponding to -V 2). But I was not entirely comfortable with that. Recent tests at lower bitrates clearly revealed that --vbr-new (used by default for fast mode) performs better than the default vbr mode. Is this still true for higher-bitrate encoding? To answer this question, I tested three settings instead of the two initially planned:

• lame 3.90.3 --alt-preset standard
• lame 3.97a6 -V 2
• lame 3.97a6 -V 2 --vbr-new

Average bitrate for the whole suite: 207 kbps (3.90.3), 201 kbps (3.97a6) & 202 kbps (3.97a6 --vbr-new).



RESULTS





Code:
               3.90.3  3.97a6    3.97a6
             standard  -V 2  -V 2 vbr-new

ATrain          5.0      3.0      4.0
BachS1007       5.0      5.0      5.0
BeautySlept     3.5      2.0      2.5
Blackwater      4.5      3.0      4.5
castanets2      1.0      1.0      1.0
dogies          4.2      3.0      3.7
FloorEssence    3.5      2.5      4.5
fossiles        3.0      4.0      4.0
SinceAlways     3.0      2.0      3.7
Layla           3.0      4.0      4.5
LifeShatters    4.5      4.0      5.0
LisztBMinor     3.0      4.5      4.5
macabre         4.0      3.5      4.5
MidnightVoyage  4.0      3.5      4.0
Orion II (2.1)  4.0      3.0      2.5
rawhide         4.0      3.5      4.5
thear1          4.5      4.5      4.5
TheSource       3.0      2.5      4.0
Waiting         2.0      1.5      3.0
wayitis         3.0      2.5      4.0
         MEANS  3.59     3.13     3.90

Click here for log files


PERSONAL ANALYSIS of results


• first, I have to say that the test was not as difficult as feared. The first results were encouraging enough to convince me to carry on rather than give up. This is thanks to the problems heard with the -V 2 setting, which were obvious enough to be distinguished first from the reference and then from the other contenders.

• lame 3.90.3 --standard preset is clearly better than 3.97a6 --standard preset. There are a few exceptions (fossiles.wav & Layla.wav: pre-echo issues and some distortions [metallic colour] on percussive signals; LisztBMinor.wav: ringing, fluctuating noise). Lame 3.90.3 is close to transparency, with minor problems and rarely annoying artifacts (except pre-echo, inherent to mp3 limitations).

• the --vbr-new switch represents (again) a big quality step compared to the standard -V 2 (3.97a6) profile. It sounded worse only once, with the micro-attacks sample (Orion II.wav). Compared to -V 2 --vbr-new, -V 2 suffers from a typical artifact: a lack of musical matter, which degrades some precious information of certain instruments, mainly cymbals (false sounding, and also pre-echo). Additional pre-echo is also noticeable, at least with non-critical samples (castanets2.wav is a special case). 3.97a6 -V 2 encodings were most often easier to ABX. In other words, --vbr-new mode leads to cleaner encodings, sharper sound and ringing-free files.

• lame 3.90.3 --standard is apparently inferior to lame 3.97a6 -V 2 --vbr-new. But I can’t be fully affirmative, for two reasons:
- Statistically, the Friedman analysis tool reaches different conclusions. According to it, both encoders are tied, despite the overall superiority of the alpha encoder. Even at a less strict confidence level (10%), only the ANOVA analysis leads to the conclusion that 3.97a6 --vbr-new is superior.
- I’m a bit perplexed when I look at the table of results. At the beginning, lame 3.90.3 was always ranked better than (or identical to) its contenders (cf. green cells). But after 6 samples, lame 3.97a6 always appeared as the best (with two exceptions out of a total of 14 samples). That looks strange. It’s hard to believe that the samples most favorable to 3.90.3 are grouped at the beginning of the series. Coincidence? Or did my sensitivity change during the test? Did my attention shift to other problems? It’s possible; I don’t really know.


Anyway, at the end of this test, I’m sure about two things:
• lame 3.97a6 --preset standard (V2) has serious quality issues
• lame 3.97a6 --preset fast standard (V2 vbr-new) will replace lame 3.90.3 --preset standard for my own use, whatever the chosen priority:
    - security/quality: lame 3.97a6 seems to offer better quality than 3.90.3 most of the time
    - speed: lame 3.97a6 --vbr-new is more than twice as fast as lame 3.90.3, approaching or surpassing the speed of modern audio formats like musepack, WMAPro, Vorbis or AAC (QT & Ahead)¹.
    - efficiency: using lame 3.97a6 will reduce the size of most encodings, with better results than 3.90.3



STATISTICAL ANALYSIS of results

Table:
Code:
3.90.3  3.97a6  3.97a6new
5.0     3.0     4.0
5.0     5.0     5.0
3.5     2.0     2.5
4.5     3.0     4.5
1.0     1.0     1.0
4.2     3.0     3.7
3.5     2.5     4.5
3.0     4.0     4.0
3.0     2.0     3.7
3.0     4.0     4.5
4.5     4.0     5.0
3.0     4.5     4.5
4.0     3.5     4.5
4.0     3.5     4.0
4.0     3.0     2.5
4.0     3.5     4.5
4.5     4.5     4.5
3.0     2.5     4.0
2.0     1.5     3.0
3.0     2.5     4.0


ANOVA (5% confidence):
Code:
Number of listeners: 20
Critical significance:  0.05
Significance of data: 5.84E-004 (highly significant)
---------------------------- p-value Matrix ---------------------------

        3.90.3   3.97a6  
3.97a6ne 0.096    0.000*  
3.90.3            0.015*  
-----------------------------------------------------------------------

3.97a6new is better than 3.97a6
3.90.3 is better than 3.97a6


ANOVA (10% confidence):
Code:
Number of listeners: 20
Critical significance:  0.10
Significance of data: 5.84E-004 (highly significant)
3.97a6new is better than 3.90.3, 3.97a6
3.90.3 is better than 3.97a6

3.97a6 < 3.90.3 < 3.97a6 --vbr-new.

TUKEY PARAMETRIC (5% confidence):
Code:
Number of listeners: 20
Critical significance:  0.05
Tukey's HSD:   0.443
(…)
-------------------------- Difference Matrix --------------------------

        3.90.3   3.97a6  
3.97a6ne   0.310    0.770*
3.90.3              0.460*
-----------------------------------------------------------------------

3.97a6new is better than 3.97a6
3.90.3 is better than 3.97a6


TUKEY PARAMETRIC (10% confidence):
Code:
Number of listeners: 20
Critical significance:  0.10
Tukey's HSD:   0.384
(…)
3.97a6new is better than 3.97a6
3.90.3 is better than 3.97a6
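In Tukey's HSD procedure, two means are declared different when their difference exceeds the HSD value. A sketch applying that rule to the printed means and to the HSD values from the tool output above (computing the HSD itself needs the studentized range distribution and is not shown here):

```python
from itertools import combinations

# Column means as printed in the TEST #4 results table
MEANS = {"3.90.3": 3.59, "3.97a6": 3.13, "3.97a6new": 3.90}

def tukey_flags(means, hsd):
    """For each pair, return (difference, exceeds_hsd) keyed by (higher, lower)."""
    flags = {}
    for a, b in combinations(means, 2):
        hi, lo = (a, b) if means[a] >= means[b] else (b, a)
        diff = means[hi] - means[lo]
        flags[(hi, lo)] = (round(diff, 2), diff > hsd)
    return flags

for hsd in (0.443, 0.384):  # HSD at 5% and 10% from the tool output
    for (hi, lo), (diff, significant) in tukey_flags(MEANS, hsd).items():
        verdict = "better" if significant else "not significantly different"
        print(f"HSD={hsd}: {hi} vs {lo}: diff={diff:.2f} -> {verdict}")
```

With either HSD value only the 0.77 and 0.46 gaps exceed the threshold; the 0.31 gap between 3.97a6 --vbr-new and 3.90.3 does not, matching the tool's conclusions.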





¹ Speed comparison based on one single track (length: 20 minutes) and an AMD Duron 800 CPU:

• MP3 lame 3.90.3² --preset standard = x2.05 [188 kbps]
• MP3 lame 3.97a6² -V 2 --vbr-new = x5.21 [186 kbps]
• MPC musepack 1.14 --standard = x6.11 [174 kbps]
• OGG vorbis aoTuV beta 3² Q6 = x4.51 [182 kbps]
• AAC Ahead 2.9.9.999 ‘fast’ VBR ::normal:: = x2.95 [208 kbps]
• WMApro 9.1³ VBR single pass Q90 = x5.92 [196 kbps]

      ² John33 compilation.
      ³ through dBpoweramp
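The "more than twice as fast" claim follows directly from the realtime factors: 5.21 / 2.05 ≈ 2.5. A small sketch converting the factors to encoding time for the 20-minute test track (the dictionary labels are shortened forms of the footnote entries):

```python
# Realtime factors from the footnote (how many times faster than playback)
SPEED = {
    "lame 3.90.3 --preset standard": 2.05,
    "lame 3.97a6 -V 2 --vbr-new": 5.21,
    "musepack 1.14 --standard": 6.11,
    "vorbis aoTuV beta 3 Q6": 4.51,
    "AAC Ahead 2.9.9.999 fast VBR": 2.95,
    "WMApro 9.1 VBR Q90": 5.92,
}
TRACK_MINUTES = 20.0  # length of the test track

def encode_minutes(realtime_factor, track_minutes=TRACK_MINUTES):
    """Wall-clock encoding time for a track at the given realtime factor."""
    return track_minutes / realtime_factor

baseline = SPEED["lame 3.90.3 --preset standard"]
for name, factor in SPEED.items():
    print(f"{name:32s} {encode_minutes(factor):5.1f} min  "
          f"{factor / baseline:.2f}x vs 3.90.3")
```

lame 3.97a6 --vbr-new comes out at about 2.54x the speed of 3.90.3, i.e. roughly 3.8 minutes instead of 9.8 for the 20-minute track.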

Reply #23
TEST #5: -q0 switch

In addition, I’ve just tried the -q 0 switch, which should slightly improve encoding quality at the price of much lower encoding speed. The switch appeared to be buggy some months ago (especially with highly tonal music) and IIRC was declared fixed (but don’t ask me to find the source; I may be wrong).

I’ve limited the test to one sample and one setting:
--preset cbr 128 using 3.97a6.
The sample is a chamber music work by Robert Schumann, including a tonal instrument (oboe). The sample is temporarily available here (2.20 MB, WavPack self-extracting format)

Code:
ABC/HR Version 1.1 beta 2, 18 June 2004
Testname:

1L = D:\lame test\3.97a6\5_Q0\Mondnacht cbr 128.wav
2R = D:\lame test\3.97a6\5_Q0\Mondnacht cbr 128 q0.wav

---------------------------------------
General Comments:

---------------------------------------
1L File: D:\lame test\3.97a6\5_Q0\Mondnacht cbr 128.wav
1L Rating: 3.5
1L Comment: some background/lifeless sounding issues.
---------------------------------------
2R File: D:\lame test\3.97a6\5_Q0\Mondnacht cbr 128 q0.wav
2R Rating: 1.0
2R Comment: awful distortions and terrible level of ringing!!!
---------------------------------------
ABX Results:
D:\lame test\3.97a6\5_Q0\Mondnacht cbr 128.wav vs D:\lame test\3.97a6\5_Q0\Mondnacht cbr 128 q0.wav
   16 out of 16, pval < 0.001


=> -q0 still produces heavy artefacts. The only change: encoding speed isn’t painfully slow anymore with this alpha. In fact, it’s not slower at all?!

For interested people, an eloquent frequency comparison of both encodings:
ftp://ftp2.foobar2000.net/foobar/397a6cbr128Q0.gif.


P.S. the switch doesn’t affect the quality/speed of VBR encodings (tested with a modified -V 5 command line).

P.S. I’ve also tested the cbr 128 -q0 command line with other samples, and I can confirm that the issue is global and doesn’t only affect the Mondnacht.wav sample, though this one is especially damaged.

Reply #24
Thank you very much for those results.

*preset standard: unexpected results from the "regular" vbr mode. I am wondering whether there was a regression between 3.90.3 and 3.96.1 that no one spotted, or a regression between 3.96.1 and 3.97a6.

*q0 test: compared to default settings, changes are introduced at q2 (substep quantization) and q0 (full outer loop). As substep quantization should only be applied to sfb21, I guess the quality problem might come from the q0 level. Of course, the current q0 should not theoretically lead to quality degradation, so I will check it.

This means that unfortunately I still have some work to do, but anyway it is still much more useful to have some feedback than no results at all.