Topic: LAME 3.96b regression examples

Reply #25
westgroveg, I was just suggesting a way to make this thread as useful as possible. I am not your mother, therefore I am just suggesting, not commanding...

David, thanks for the clarification. I was assuming that you had to set a specific number of trials before the test, and that the result would have to be taken from that session.

Not trying one session and then another until you find a result that is pleasing.
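
The concern about repeating sessions can be put into numbers. This is a minimal sketch, assuming 16-trial sessions, a session that "passes" at pval < 0.05, a listener who is purely guessing, and independent sessions. The function names are hypothetical and not part of any ABX tool.

```python
from math import comb


def tail_probability(correct: int, trials: int) -> float:
    """Chance of at least `correct` right answers out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def chance_of_lucky_session(sessions: int, trials: int = 16, alpha: float = 0.05) -> float:
    """Chance that a pure guesser sees at least one 'passing' session."""
    # Smallest score whose one-sided p-value is below alpha (None if no score qualifies).
    needed = next(
        (c for c in range(trials + 1) if tail_probability(c, trials) < alpha),
        None,
    )
    if needed is None:
        raise ValueError("no score reaches this alpha with so few trials")
    pass_once = tail_probability(needed, trials)
    # Complement of "every session fails".
    return 1 - (1 - pass_once) ** sessions


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(f"{n} session(s): {chance_of_lucky_session(n):.3f}")
```

Under these assumptions a single session passes by luck a little under 4% of the time, and the chance grows with every extra session, which is why the number of trials should be fixed beforehand.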

Reply #26
OK did some more testing on the (-) Ions sample,

Code:
ABX Results:
Original vs C:\Documents and Settings\Administrator\Desktop\TEST SAMPLES\LAME test samples\(-) Ions-APS-3.96b1.wav
   15 out of 16, pval < 0.001

Code:
ABX Results:
Original vs C:\Documents and Settings\Administrator\Desktop\TEST SAMPLES\LAME test samples\(-) Ions-APS-3.90.3.wav
   14 out of 16, pval = 0.002

Code:
ABX Results:
C:\Documents and Settings\Administrator\Desktop\TEST SAMPLES\LAME test samples\(-) Ions-APS-3.90.3.wav vs C:\Documents and Settings\Administrator\Desktop\TEST SAMPLES\LAME test samples\(-) Ions-APS-3.96b1.wav
   9 out of 10, pval = 0.011


I'm disappointed to say 3.90.3 definitely outperformed 3.96.

Edit: If it was unclear I tested --alt-preset standard.
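
For reference, the pval figures reported above are consistent with a one-sided binomial test against pure guessing, where each trial is a 50/50 coin flip. The following is a minimal sketch under that assumption. The name `abx_pvalue` is made up for illustration and is not taken from any ABX tool.

```python
from math import comb


def abx_pvalue(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` right out of `trials` by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the outcomes in the upper tail of a Binomial(trials, 0.5) distribution.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # The three results posted above, as (correct, trials).
    for correct, trials in [(15, 16), (14, 16), (9, 10)]:
        print(f"{correct} out of {trials}, pval = {abx_pvalue(correct, trials):.4f}")
```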

Reply #27
If I'm reading this right, you were able to ABX 3.96b1 15/16 times, and 3.90.3 14/16 times. That seems pretty darn close to me.  Maybe I'm misinterpreting the data.
flac > schiit modi > schiit magni > hd650

Reply #28
Quote
If I'm reading this right, you were able to ABX 3.96b1 15/16 times, and 3.90.3 14/16 times. That seems pretty darn close to me.  Maybe I'm misinterpreting the data.

Then I ABXed 3.90.3 against 3.96, and as I wrote, 3.96 sounded worse.

Quote
# Posting results in the thread requires:

    * Upload or link to sample
    * ABX results Original<->3.90.3, Original<->3.96, 3.90.3<->3.96, with detailed description of the difference(s)
    * Report about software/hardware used: Soundcard (resampling?), Player/ABXtool, DSPs (shouldn't be allowed, besides resampling to 48kHz and volume reduction/replaygain to prevent clipping <- both a 'must'), Amplifier, Speakers/Headphones

# Results must be confirmed by someone else before they are included in 'official' statistic, p-values must be < 0.05 for at least 2 people.
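
The confirmation rule quoted above can be sketched as a small check. This assumes one p-value per tester for a given sample. The function name, the argument layout and the tester names are hypothetical, and the rule is only read as "at least two testers below 0.05".

```python
from typing import Mapping


def is_confirmed(
    pvalues_by_tester: Mapping[str, float],
    alpha: float = 0.05,
    min_testers: int = 2,
) -> bool:
    """True if enough independent testers reached a p-value below alpha."""
    # Count only testers whose own result is significant.
    passing = [name for name, p in pvalues_by_tester.items() if p < alpha]
    return len(passing) >= min_testers


if __name__ == "__main__":
    # Made-up tester names and p-values, for illustration only.
    print(is_confirmed({"tester_a": 0.011, "tester_b": 0.004}))
    print(is_confirmed({"tester_a": 0.011, "tester_b": 0.402}))
```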

Reply #29
Wouldn't we rather need ABC/HR results (with ratings) in order to know which codec sounds better?

I mean, if you can ABX 3.90.3 vs. 3.96, it only proves you can hear a difference between the two, but you can't tell which one sounds better (because you don't know which one you're listening to).

If you rate the codecs with ABC/HR, it will clearly indicate which one sounds better.

Please correct me if I'm wrong.
Over thinking, over analyzing separates the body from the mind.

Reply #30
Quote
Please correct me if I'm wrong.

You are wrong. With file ABX you choose which one is A and which one is B, so you would definitely be able to make a judgement as to which sounds better. ABC/HR is designed to eliminate personal bias; with ABX one could easily skew the results (in fact you could flat out lie and say that the one that sounded worse actually sounded better). That, I believe, is one of the reasons for requiring that at least one person verifies results in this thread.
gentoo ~amd64 + layman | ncmpcpp/mpd | wavpack + vorbis + lame


Reply #32
@harashin
Did you replaygain the samples in any way?
Hustlejet suffers from clipping, even more with 3.96b.

@westgroveg
Ions is a sample where 3.90.3 is superior because of how it works.
3.90.3 generally chooses 320 kbit for short blocks, I think. Ions is full of them.
3.96b doesn't always use 320 kbit for short blocks. This may cause the slight degradation here.
Edit: If this is the case, it would be hard for 3.96b to outperform 3.90.3.

Wombat
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Reply #33
Quote
@harashin
Did you replaygain the samples in any way?
Hustlejet suffers from clipping, even more with 3.96b.

No, I didn't. So I tried again on that sample.

Hustle_Jet_LAME_RG_test

Reply #34
Tnx harashin,

these samples are hard to compare. 3.96b is about 50 kbps smaller.

Reply #35
@amano
I wouldn't hear the difference, if the average bitrate of both files were similar.

I did another test on PVNC's (-) Ions sample.
(-) Ions LAME test

Reply #36
Quote
Tnx harashin,

these samples are hard to compare. 3.96b is about 50 kbps smaller.

That doesn't matter - we're comparing --aps to --aps, if 3.96 is using too low a bitrate then that is a legitimate regression for --aps.

Reply #37
Possible regression in badvilbel (download from ff123.net):

Original vs. 3.96b1: 35 out of 50, pval = 0.003
Original vs. 3.90.3: 20 out of 52, pval = 0.965
3.90.3 vs. 3.96b1: 20 out of 46, pval = 0.849

Tested using --alt-preset standard, 4.4 -> 5.8 seconds into the clip, with ABC/HR (so I couldn't tell which version I was ABXing).

When I saw that I couldn't ABX the difference between the two compressed versions, I went back and did 8 more of each vs. original. 3/8 for the 3.90.3, 7/8 for 3.96b1. These are included in the total ABX scores.

Reply #38
Definite regression in trumpets1:

Original vs. 3.96b1: 8 out of 8, pval = 0.004
Original vs. 3.90.3: 9 out of 16, pval = 0.402
3.90.3 vs. 3.96b1: 8 out of 8, pval = 0.004

The 3.96b1 encode is really warbly, worse than any artifact I've heard with Lame 3.90 --aps, worse even than the erhu sample (which is equally bad with both Lame versions: 13/16 for both, couldn't ABX the difference).

In case anyone wants the erhu sample, the link in the erhu thread is broken, so I've attached my copy of the sample.

Reply #39
Tell me to shut up if I'm talking nonsense, but...

I've missed the post where someone said they'd adapted the (--alt)-preset standard tuning to work with 3.96.

Surely, if the tunings which worked for the old encoder are simply copied over to the new (updated, i.e. changed!) encoder, then they can't be expected to work as well - simply because they were specifically targeted at problems in the older encoder.

Or is there a general expectation that the problems which --alt-preset standard addressed in 3.90.3 are still present in 3.96, and therefore it should still work well?

Or has --alt-preset standard been updated in this version?


Otherwise, it seems to me that the only way to get anywhere is to show that some vanilla command line is better in 3.96 than in 3.90.3 at the same (approx.) bitrate, and then to make a new preset/hack/whatever to fix any remaining problems.

Sorry if I've jumped in too late and missed all this already. I'm sure Gabriel will put me right.

Cheers,
David.

Reply #40
Quote
Surely, if the tunings which worked for the old encoder are simply copied over to the new (updated, i.e. changed!) encoder, then they can't be expected to work as well - simply because they were specifically targeted at problems in the older encoder.


Of course the presets have been updated for the new code.

Reply #41
@2Bdecided
The preset's behaviour is totally different. Gabriel seems to be trying to port the preset to the newer optimizations.
The VBR mode changed, short block behaviour is totally different, and maybe other things too.
Still, the preset must be more than just a command line. Maybe Gabriel can explain?

So far it seems that only my birds and sophia2 samples have become better with 3.96b. They seem to profit from better masking or something like that. All other samples seem to have become worse.

Edit: Oops! Posted at the same time as Gabriel!

Wombat

Reply #42
Quote
All other samples seem to have become worse.

awe32_20sec is much better, and at a slightly lower bitrate.

That's why it would be helpful to have two sticky threads: samples where it's worse, and samples where it's better. That way, it would be easier to form a fair opinion.

Cheers,
David.

Reply #43
I thought fatboy was a regression example, but if you try the full intro (I've uploaded it in a separate thread) it seems that 3.90.3 and 3.96 both have similar problems, but in different places.

Cheers,
David.

Reply #44
Another definite regression, this time in death2:

Original vs. 3.96b1: 8 out of 8, pval = 0.004
Original vs. 3.90.3: 0 out of 2, pval = 1.000 (didn't bother)
3.90.3 vs. 3.96b1: 18 out of 24, pval = 0.011

3.96b1 has lots of smearing on the transients. I tested somewhere in the middle of the sample, not right at the beginning. I will try adding -b128 to the 3.96b1 commandline to see if that improves things. I also tried castanets while I was at it: 5/5 ABX for both vs. original, but 15/30 trying to ABX against each other.

I'm in the process of ABXing the Birds sample, too. I need to rest my ears and run more trials, though, before I report my results. So far, I cannot ABX Sophia2, Liebestod, or awe32_20sec any which way, but this may be due to extreme listener fatigue.

Reply #45
Sophia2
Introduces added noise from 1.2-2sec and 4-6sec

Birds
Distortion is added mostly with the "e" when she sings "become"

I just copied this from the thread where I introduced these samples. Once you have heard the problems, they are easy to spot. If you want to spot the problem more easily for learning, try disabling -Z again or use pure 3.90.
I call this distortion sandpaper noise.

The Liebestod sample has been OK since aps 3.90.3 used -Z by default; before that it was horrible, with the most annoying artifacts I have heard with aps.

btw.
Cool thing, you try so much!

Reply #46
It seems 3.95.1 has regressions similar to those in 3.96.

Reply #47
I hope more people start ABXing samples, even the same ones I did. I'm just one particular set of ears, testing specific time intervals of a few samples. It's possible that I would have heard improvements instead of regressions, had I picked other one-second snippets of each sample to analyze! There's not enough information in this thread to either absolve or condemn 3.96b1.

Quote
I just copied this from the thread where I introduced these samples. Once you have heard the problems, they are easy to spot. If you want to spot the problem more easily for learning, try disabling -Z again or use pure 3.90.
I call this distortion sandpaper noise.

btw.
Cool thing, you try so much!

Well, I just got these new headphones, so I'm having a bit of fun comparing my ABX scores to the ones I got with my old headphones in previous ABX tests. Two years ago, I wrote down for the Birds sample "successful ABX (barely)", and that's how well I'm doing now, hovering between 5% and 1% pval. Interestingly enough, I got the same score for Garf's underwater love sample this time (15/20) as I did when he first made it available!
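
To see what "between 5% and 1% pval" means for a 20-trial session, here is a small sketch assuming a one-sided binomial test against guessing. `min_correct` is a hypothetical helper, not part of any ABX tool.

```python
from math import comb
from typing import Optional


def min_correct(trials: int, alpha: float) -> Optional[int]:
    """Smallest score out of `trials` whose guessing probability is below `alpha`."""
    total = 2 ** trials
    tail = 0  # number of outcomes with at least `score` correct answers
    best = None
    # Walk down from a perfect score, growing the upper tail one term at a time.
    for score in range(trials, -1, -1):
        tail += comb(trials, score)
        if tail / total >= alpha:
            break
        best = score
    return best  # None means even a perfect score is not below alpha


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        print(f"alpha = {alpha}: need {min_correct(20, alpha)} out of 20")
```

Under these assumptions, 15 out of 20 clears the 5% threshold but not the 1% one, which needs 16.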

Reply #48
Gabriel, have you updated the bitrate presets for 3.96b1, such as --preset 128? I want to know if I should test lower bitrates as well.

Reply #49
Quote
Gabriel, have you updated the bitrate presets for 3.96b1, such as --preset 128? I want to know if I should test lower bitrates as well.

Yes, they are using fewer short blocks than 3.95.1.
At mid/low bitrates it should improve samples like dafunk, fsol or waiting.