Topic: Personal Listening Test of MP3 encoders at 224kbps

  • Kamedo2
Personal Listening Test of MP3 encoders at 224kbps
Abstract:
Blind comparison of LAME 3.100i -V2+, LAME 3.99 -V1, LAME 3.98 CBR 224kbps -q 0, Helix -V146, and BladeEnc CBR 224kbps (low anchor).

Encoders:
LAME 3.100i
http://www.hydrogenaudio.org/forums/index....showtopic=99483
LAME 3.99.5 VBR V1
http://www.rarewares.org/mp3-lame-bundle.php
LAME 3.98.4 CBR 224kbps -q 0(slowest)
Helix mp3enc v5.1 Open Source encoder 2005-12-20 -V146
http://www.rarewares.org/mp3-others.php
BladeEnc 0.94.2 CBR 224kbps (low anchor)

Settings:
lame3100i -S -V2+ input.wav  output.mp3
lame3.99.5 -S -V1 input.wav output.mp3
lame3.98.4 -S -q 0 -b 224 input.wav output.mp3
hmp3 input.wav output.mp3 -X2 -U2 -V146
bladeenc -quit -nocfg input.wav output.mp3 -224

Samples:
25 sound samples of various genres.

Hardware:
Sony PSP-3000 + RP-HT560.

Results:



Conclusions & Observations:
I could not find a significant difference between the encoders, except for the low anchor. There are no big differences in the average quality of the other four encoders.

ANOVA analysis:
Code:
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 25
Critical significance:  0.05
Significance of data: 0.00E+000 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of          Degrees      Sum of    Mean
variation          of Freedom   squares   Square   F       p

Total              124          23.56
Testers (blocks)    24           7.75
Codecs eval'd        4           9.24     2.31     33.80   0.00E+000
Error               96           6.56     0.07
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:  0.147

Means:

Helix-V1 3.100iV2 3.98CBR  3.99V1  BladeEnc
  4.60    4.58    4.57    4.54    3.90

---------------------------- p-value Matrix ---------------------------

         3.100iV2 3.98CBR  3.99V1  BladeEnc
Helix-V1 0.829    0.706    0.419    0.000*
3.100iV2          0.871    0.553    0.000*
3.98CBR                    0.666    0.000*
3.99V1                              0.000*
-----------------------------------------------------------------------

Helix-V146 is better than BladeEncCBR
3.100iV2+ is better than BladeEncCBR
3.98CBR is better than BladeEncCBR
3.99V1 is better than BladeEncCBR
Raw data:
Code:
% MP3 224kbps ABC/HR Score
% This format is compatible with my graphmaker, as well as ff123's FRIEDMAN.
3.100iV2+ 3.99V1 3.98CBR Helix-V146 BladeEncCBR
%feature 7 LAME LAME LAME Other Other
4.700 4.600 4.000 4.300 3.100
4.300 4.200 4.600 4.800 3.800
4.500 4.500 4.400 5.000 4.700
4.800 5.000 4.600 5.000 4.300
4.700 4.500 4.200 4.400 3.500
4.700 4.300 5.000 4.600 4.500
4.400 5.000 3.800 4.700 3.900
4.200 4.500 4.400 4.500 3.800
4.300 4.200 4.000 4.500 3.200
4.400 4.300 5.000 4.600 3.400
4.000 4.300 4.500 4.600 3.500
4.500 4.200 4.400 4.600 3.600
4.200 4.500 5.000 4.700 4.000
4.300 4.100 4.300 4.100 3.500
4.200 4.200 4.400 4.600 3.900
5.000 4.500 5.000 5.000 4.100
5.000 4.300 4.700 4.400 4.000
4.500 4.400 4.200 4.000 3.200
5.000 5.000 5.000 4.500 4.400
5.000 5.000 4.700 5.000 4.400
5.000 4.800 4.800 4.600 4.200
4.700 5.000 5.000 4.500 3.900
4.800 5.000 5.000 4.600 4.200
5.000 5.000 4.700 5.000 4.200
4.400 4.100 4.600 4.400 4.100
%samples 41_30sec hihats
%samples finalfantasy cemb
%samples ATrain Jazz
%samples BigYellow Pops
%samples FloorEssence Techno
%samples macabre orch
%samples mybloodrusts guitar
%samples Quizas Latin
%samples VelvetRealm Techno
%samples Amefuribana Pops
%samples Trust Gospel
%samples Waiting Rock
%samples Experiencia Latin
%samples Heart to Heart Pops
%samples Tom's Diner Vocal
%samples Reunion Blues Jazz
%samples French Speech
%samples undelete Pops
%samples Dimmu Borgir Metal
%samples Run up Pops
%samples German Speech
%samples ItCouldBeSweet Pops
%samples OnTheRoofWith Pops
%samples easy game Pops
%samples Tears Infection Pops
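The blocked ANOVA figures above can be rechecked from the raw scores. The following is only a minimal sketch, not the FRIEDMAN program itself: it treats each sample as a block, computes the treatment means, the F statistic and Fisher's LSD, and hard-codes the critical t value (1.985 for 96 degrees of freedom, two-sided 5%) as an assumption instead of computing it. The names `RAW`, `blocked_anova` and `T_CRIT_96DF` are made up for this illustration.

```python
# Raw ABC/HR scores copied from the post: one row per sample, one column per encoder.
ENCODERS = ["3.100iV2+", "3.99V1", "3.98CBR", "Helix-V146", "BladeEncCBR"]
RAW = """
4.7 4.6 4.0 4.3 3.1
4.3 4.2 4.6 4.8 3.8
4.5 4.5 4.4 5.0 4.7
4.8 5.0 4.6 5.0 4.3
4.7 4.5 4.2 4.4 3.5
4.7 4.3 5.0 4.6 4.5
4.4 5.0 3.8 4.7 3.9
4.2 4.5 4.4 4.5 3.8
4.3 4.2 4.0 4.5 3.2
4.4 4.3 5.0 4.6 3.4
4.0 4.3 4.5 4.6 3.5
4.5 4.2 4.4 4.6 3.6
4.2 4.5 5.0 4.7 4.0
4.3 4.1 4.3 4.1 3.5
4.2 4.2 4.4 4.6 3.9
5.0 4.5 5.0 5.0 4.1
5.0 4.3 4.7 4.4 4.0
4.5 4.4 4.2 4.0 3.2
5.0 5.0 5.0 4.5 4.4
5.0 5.0 4.7 5.0 4.4
5.0 4.8 4.8 4.6 4.2
4.7 5.0 5.0 4.5 3.9
4.8 5.0 5.0 4.6 4.2
5.0 5.0 4.7 5.0 4.2
4.4 4.1 4.6 4.4 4.1
"""
SCORES = [[float(x) for x in line.split()] for line in RAW.strip().splitlines()]


def blocked_anova(scores):
    """Return (treatment means, F statistic, error mean square) for a randomized block design."""
    n_blocks, n_treat = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_blocks * n_treat)
    treat_means = [sum(row[j] for row in scores) / n_blocks for j in range(n_treat)]
    block_means = [sum(row) / n_treat for row in scores]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_treat - ss_block
    ms_treat = ss_treat / (n_treat - 1)
    ms_error = ss_error / ((n_blocks - 1) * (n_treat - 1))
    return treat_means, ms_treat / ms_error, ms_error


# Assumption: two-sided 5% critical value of Student's t with 96 degrees of freedom.
T_CRIT_96DF = 1.985

means, f_stat, ms_error = blocked_anova(SCORES)
lsd = T_CRIT_96DF * (2 * ms_error / len(SCORES)) ** 0.5

for name, mean in zip(ENCODERS, means):
    print(f"{name:12s} {mean:.2f}")
print(f"F = {f_stat:.2f}, LSD = {lsd:.3f}")
```

Two encoders whose means differ by less than the LSD are not separated at the 5% level, which is why only the comparisons against BladeEnc come out significant.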
Bitrates:
Code:
259222 250962 224500 270110 224109
212206 190894 224404 200702 224012
210626 223963 224651 248472 224040
256744 246869 224559 243848 224081
272211 268745 224645 225813 224060
212126 222561 224771 234717 224101
227229 274100 224802 226008 224228
252353 237478 224475 243034 224060
264467 270433 225449 245293 224317
230309 214030 224517 219325 224051
243315 240742 224427 240749 224024
232944 226612 224709 251829 224129
256994 236299 224619 229343 224034
245990 237097 224533 243966 224121
220848 204723 224825 225298 224235
226596 224930 224500 248784 224110
274433 235266 224408 176326 224012
240666 234456 224458 230376 224032
218750 222924 224796 232772 224208
234844 237687 224774 229799 224189
210745 167946 224583 104856 224026
219320 180893 224796 161194 224211
211124 209905 224500 200196 224110
214539 204183 224648 182829 224167
226791 224161 224673 215760 224121
average:
235016 227514 224641 221256 224112

  • Gainless
Reply #1
Thanks for the work, nice to have such a detailed comparison. Did you use the -HF switch with the Helix encoder, btw?

  • halb27
Reply #2
It must have been a hard test, thank you very much.
Very interesting result. Judging from this, using 3.100i -V2+ isn't very useful compared to using -V1.
  • Last Edit: 16 May, 2013, 03:32:23 PM by halb27
lame3995o -Q1

  • Kamedo2
Reply #3
Quote from: Gainless
Thanks for the work, nice to have such a detailed comparison. Did you use the -HF switch with the Helix encoder, btw?

hmp3 input.wav output.mp3 -X2 -U2 -V146
I didn't use the -HF switch, as the default option is typically the most recommended option by the developer(s).
But it may be interesting to test -HF, along with 3.100 alpha2 and 3.99.5f. I won't start testing them now because I'll be very busy until June, and the rainy season starts in June.

  • halb27
Reply #4
I would very much welcome it if you could test LAME 3.100 alpha2 as well. In your current test, lame3.100i stands against 3.99.5, which is not the same basis (though I don't think things will change essentially; I even expect 3.100 alpha2 to come out a little better than 3.99.5 does).

  • greynol
  • Global Moderator
Reply #5
It seems the basis here is ~224 kbits.  If there is a desire to determine if there are advantages between Lame versions using the same V level, run a new test and present the results in a new discussion rather than attempt to co-opt this one.
13 February 2016: The world was blessed with the passing of a truly vile and wretched person.

Your eyes cannot hear.

  • halb27
Reply #6
???
I think it's interesting in its own right to see how 3.100a2 -V1 compares against 3.99.5 -V1.
Sure, I'm also interested to see how 3.100i -V2+ compares against 3.100a2 -V1 (because the underlying basis is the same, except for the -V level of course).

  • greynol
  • Global Moderator
Reply #7
Neither of those is on-topic. See TOS #5 and #7 if you have any questions. Further posts on the matter will be binned.

  • Destroid
Reply #8
Astounding listening test, and quite interesting. I am a long-time fan of HMP3 (a great time saver on a hum-drum machine).
Quote from: Kamedo2
I didn't use the -HF switch, as the default option is typically the most recommended option by the developer(s).
But it may be interesting to test -HF, along with 3.100 alpha2, and 3.99.5f.

I wanted to add that simply adding -HF1 / -HF2 will shift the bitrate slightly higher. Here are my quick test results (optional remarks follow):
Code:
-X2 -U2 -B146 ....... ~234kbps
-X2 -U2 -B146 -HF1 .. ~235kbps
-X2 -U2 -B145 -HF1 .. ~233kbps
-X2 -U2 -B143 -HF2 .. ~234kbps
-X2 -U2 -B142 -HF2 .. ~233kbps

Note: all these bitrates are above 224 kbps,
I am just going with the OP's switches.

Note2: Spectrogram observations of HMP3 regarding bitrate increase
(NOT a quality metric)
- Without -HF the material is clearly cut-off above 16kHz;
- Using -HF1 looks similar to LAME 3.99 -V2/-V3 (some material encoded >16kHz);
- Using -HF2 encodes like LAME 3.99 -V0/-V1 (gradual roll-off between 16-20kHz).
In regards to the 16kHz cutoff and the -HF switches, you can see from the OP that the results are not as dramatic as some may believe.
  • Last Edit: 17 May, 2013, 08:13:43 PM by Destroid
"Something bothering you, Mister Spock?"

  • IgorC
Reply #9
Kamedo2,
Thank You Very Much for sharing this test here.  Great one! 
I have followed your tests for some time, and it's clear to me why you used CBR for LAME 3.98.4: it performs better. http://d.hatena.ne.jp/kamedo2/20111214

I was reading your test and trying to process the information. Here are my thoughts.
Well, except for the low anchor, all encoders are on par. But some additional analysis would be useful to draw some extra conclusions.
  • A lowest score per encoder. 
    All individual scores are >= 4.0 per sample, except for 3.98.4 CBR (leaving the low anchor aside). That's where CBR fails, in my opinion. I would prefer a slightly lower average score as long as the score for every individual sample stays at 4.0 or above. Admittedly, there is only one sample where 3.98.4 CBR scored below 4.0, but the same thing happened in your previous test http://d.hatena.ne.jp/kamedo2/20111214
  • It's hardly a coincidence that Helix MP3 encoder ends up with a slightly higher score each time, as here  http://listening-tests.hydrogenaudio.org/s...8-1/results.htm and http://www.hydrogenaudio.org/forums/index....st&p=808142 .  Helix encoder is 7 years old  and it still shines.
  • All average scores are >4.5 (except the low anchor), and you are an experienced listener. It means these encoders will be transparent for an average listener.
  • halb27's 3.100i V2+ looks good. It has a slightly higher average score than the other LAME encoders (though the difference is not statistically significant), and all individual scores are higher than or equal to 4.0.
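The "lowest score per encoder" check in the first bullet can be redone directly from the raw scores in the opening post. This is just an illustrative sketch; the variable names (`RAW`, `lowest`, `below_4`) are invented here, and the 4.0 threshold is the one used in the bullet above.

```python
# Raw scores from the opening post: one row per sample, one column per encoder.
ENCODERS = ["3.100iV2+", "3.99V1", "3.98CBR", "Helix-V146", "BladeEncCBR"]
RAW = """
4.7 4.6 4.0 4.3 3.1
4.3 4.2 4.6 4.8 3.8
4.5 4.5 4.4 5.0 4.7
4.8 5.0 4.6 5.0 4.3
4.7 4.5 4.2 4.4 3.5
4.7 4.3 5.0 4.6 4.5
4.4 5.0 3.8 4.7 3.9
4.2 4.5 4.4 4.5 3.8
4.3 4.2 4.0 4.5 3.2
4.4 4.3 5.0 4.6 3.4
4.0 4.3 4.5 4.6 3.5
4.5 4.2 4.4 4.6 3.6
4.2 4.5 5.0 4.7 4.0
4.3 4.1 4.3 4.1 3.5
4.2 4.2 4.4 4.6 3.9
5.0 4.5 5.0 5.0 4.1
5.0 4.3 4.7 4.4 4.0
4.5 4.4 4.2 4.0 3.2
5.0 5.0 5.0 4.5 4.4
5.0 5.0 4.7 5.0 4.4
5.0 4.8 4.8 4.6 4.2
4.7 5.0 5.0 4.5 3.9
4.8 5.0 5.0 4.6 4.2
5.0 5.0 4.7 5.0 4.2
4.4 4.1 4.6 4.4 4.1
"""
rows = [[float(x) for x in line.split()] for line in RAW.strip().splitlines()]

# Transpose rows (samples) into one list of scores per encoder.
columns = dict(zip(ENCODERS, zip(*rows)))

# Worst score per encoder, and how many samples fall below the 4.0 mark.
lowest = {name: min(scores) for name, scores in columns.items()}
below_4 = {name: sum(score < 4.0 for score in scores) for name, scores in columns.items()}

for name in ENCODERS:
    print(f"{name:12s} lowest = {lowest[name]:.1f}, samples below 4.0 = {below_4[name]}")
```

Looking at the minimum alongside the mean is one simple way to see how tight each encoder's distribution is, which the average alone hides.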

  • Kamedo2
Reply #10
Yes, my thoughts are the same. Even though the advantage of the Helix MP3 encoder over LAME is slight, I like the way Helix behaves. The number of badly encoded samples is low.
I collected 3 different test results and combined them in one image. People use encoders at many bitrates and settings, and this collection is a fair approximation of the overall average quality they will experience. Average score: LAME3.98=4.27, Helix=4.33. Number of samples: 25+20+14=59

Code:
%Kamedo2's Personal Listening Test of MP3 224kbps
LAME3.98 Helix
4 4.3
4.6 4.8
4.4 5
4.6 5
4.2 4.4
5 4.6
3.8 4.7
4.4 4.5
4 4.5
5 4.6
4.5 4.6
4.4 4.6
5 4.7
4.3 4.1
4.4 4.6
5 5
4.7 4.4
4.2 4
5 4.5
4.7 5
4.8 4.6
5 4.5
5 4.6
4.7 5
4.6 4.4

%IgorC's Personal Listening Test of MP3 encoders (part II) LAME vs Helix MP3 encoders at 130 kbps.
4.1 4
3.9 3.8
3.1 3
3.4 3.8
4 3.7
3.2 3.9
4.1 4.2
3.3 3
2 4.5
4.4 3.8
3.2 3.1
4.3 4.1
4.4 3.6
3.1 4
4 4.5
4.5 3.3
3.9 4.3
4 4.3
3.3 3
4.2 4.2


%Results of the public MP3 listening test @ 128 kbps (October 2008)
3.68 4.74
4.34 4.67
4.64 4.6
4.12 4.39
4.58 4.75
4.65 4.77
4.55 4.8
4.57 4.41
4.82 4.22
4.79 4.59
4.75 4.08
4.44 4.74
4.62 4.7
4.54 4.75
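The pooled averages quoted above (LAME3.98=4.27, Helix=4.33 over 59 samples) can be rechecked with a few lines of Python. This is a rough sketch that simply pools all (LAME 3.98, Helix) pairs with equal weight per sample; the list names are made up for the example, and whether equal weighting is the right way to combine tests at different bitrates is a judgment call.

```python
# (LAME 3.98, Helix) score pairs copied from the three result lists above.
KAMEDO2_224K = [
    (4, 4.3), (4.6, 4.8), (4.4, 5), (4.6, 5), (4.2, 4.4), (5, 4.6), (3.8, 4.7),
    (4.4, 4.5), (4, 4.5), (5, 4.6), (4.5, 4.6), (4.4, 4.6), (5, 4.7), (4.3, 4.1),
    (4.4, 4.6), (5, 5), (4.7, 4.4), (4.2, 4), (5, 4.5), (4.7, 5), (4.8, 4.6),
    (5, 4.5), (5, 4.6), (4.7, 5), (4.6, 4.4),
]
IGORC_130K = [
    (4.1, 4), (3.9, 3.8), (3.1, 3), (3.4, 3.8), (4, 3.7), (3.2, 3.9), (4.1, 4.2),
    (3.3, 3), (2, 4.5), (4.4, 3.8), (3.2, 3.1), (4.3, 4.1), (4.4, 3.6), (3.1, 4),
    (4, 4.5), (4.5, 3.3), (3.9, 4.3), (4, 4.3), (3.3, 3), (4.2, 4.2),
]
PUBLIC_128K = [
    (3.68, 4.74), (4.34, 4.67), (4.64, 4.6), (4.12, 4.39), (4.58, 4.75),
    (4.65, 4.77), (4.55, 4.8), (4.57, 4.41), (4.82, 4.22), (4.79, 4.59),
    (4.75, 4.08), (4.44, 4.74), (4.62, 4.7), (4.54, 4.75),
]

# Pool every sample from the three tests with equal weight.
pairs = KAMEDO2_224K + IGORC_130K + PUBLIC_128K
lame_mean = sum(lame for lame, _ in pairs) / len(pairs)
helix_mean = sum(helix for _, helix in pairs) / len(pairs)

print(f"samples = {len(pairs)}")
print(f"LAME3.98 = {lame_mean:.2f}, Helix = {helix_mean:.2f}")
```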

  • Dynamic
Reply #11
Thanks for the effort and ability you put into this substantial test, Kamedo2.

I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

I also tend to look at the lower bound and/or the tightness of the distribution to attempt to reduce the likelihood of really nasty artifacts, though my artifact detection training is fairly poor. The problem, as always, is there might be extreme cases that one psymodel just doesn't deal with adequately that are missed in the test corpus, but a general idea of the spread and lower bounds of quality is still helpful.

HELIX VBR seems to do very well at 128 kbps and 224 kbps, and I'd feel confident using it anywhere from 128kbps upwards.

Pros and Cons
  • Encoding speed: Helix MP3 wins
  • Quality (~128 to ~224kbps): Helix MP3 and LAME tie
  • Gapless support: LAME wins


I do use Helix at ~131kbps for loudness-levelled background music compilations on hardware where gapless support is impossible. Otherwise, easy gapless support in sufficiently good players keeps me using LAME, and I'm happy that the likes of Amazon use LAME at around -V0 for that reason.


Halb27's special LAME -Vn+ does also have specific uses for certain types of track (e.g. solo harpsichord or other music having heavy transients with strong steady tones). I haven't completely kept up with how the main LAME3.100 copes with these (I think there's some improvement over 3.99), but the strategy of providing maximum bitrate for short blocks seemed to work for halb27's version where LAME 3.99 and Helix MP3 both fall down unless the bitrate gets very high generally. I might well adopt that version for specific types of content or to fix a problem sample.
Dynamic – the artist formerly known as DickD

  • Kamedo2
Reply #12
Quote from: Dynamic
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

Yes, it should be. And Hydrogenaudio knowledgebase too.

  • Kohlrabi
  • Global Moderator
Reply #13
Quote from: Dynamic
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

Quote from: Kamedo2
Yes, it should be. And Hydrogenaudio knowledgebase too.

With the conclusion that at these high bitrates all modern encoders perform equally well?
  • Last Edit: 21 May, 2013, 11:31:11 AM by Kohlrabi
It's only audiophile if it's inconvenient.

  • Kamedo2
Reply #14
Quote from: Dynamic
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

Quote from: Kamedo2
Yes, it should be. And Hydrogenaudio knowledgebase too.

Quote from: Kohlrabi
With the conclusion that at these high bitrates all modern encoders perform equally well?

Yes, the conclusion is a 4-way tie (all except BladeEnc). The word 'tie' is preferred over 'equal', for obvious reasons.

  • greynol
  • Global Moderator
Reply #15
BladeEnc != modern encoder

  • Kamedo2
Reply #16
Quote from: greynol
BladeEnc != modern encoder

That's why I said 4-way tie. (I assume the 3 LAME encoders and Helix are the modern encoders. These 4 encoders are the winners and BladeEnc is the obvious loser.)

  • greynol
  • Global Moderator
Reply #17
I just like stating the obvious.

  • shadowking
Reply #18
Very good and interesting test. It shows that for a modern MP3 encoder above a certain threshold (192k), the bitrate is a strong indicator of quality, whether VBR, CVBR or CBR. It also proves, as I've said before, that CBR will not 'starve' for bits given a sufficient bitrate, and that the popular 320 CBR encodings on the internet are a huge waste, as 224 yields excellent quality.
  • Last Edit: 24 May, 2013, 04:16:07 AM by shadowking
wavpack -b4x4s1c

  • Gecko
Reply #19
Thank you very much for the test!

I realize you have extensive artifact training, but I am still surprised that so few samples are 100% transparent.

  • greynol
  • Global Moderator
Reply #20
It seems that if you're particularly sensitive to pre-echo then just about anything with hard transients won't be transparent with mp3, even at 320kbits.

  • Kamedo2
Reply #21
The ABX criterion was 15 or more correct answers out of 20 (p=0.02). Every sample/encoder combination was ABXed 20 times. So there were 25 (samples) x 5 (encoders) = 125 tests, of which I failed 25 (14 or fewer correct answers) and therefore scored them 5.0.
The 15+/20 criterion allows me to get up to 25% of the trials in a test wrong, which explains why only 20% of the tests were transparent.
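For anyone who wants to check the p=0.02 figure: under pure guessing, each ABX trial is a fair coin flip, so the p-value of "15 or more correct out of 20" is a one-sided binomial tail. A minimal sketch follows; the function name `abx_p_value` is only for illustration.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


p = abx_p_value(15, 20)
print(f"15+/20: p = {p:.4f}")  # prints "15+/20: p = 0.0207"

# Share of tests that were failed (scored 5.0) according to the post: 25 of 125.
transparent_share = 25 / 125
print(f"transparent share = {transparent_share:.0%}")  # prints "transparent share = 20%"
```

With the same function, 14 correct out of 20 gives p of about 0.058, which is why 14 or fewer counts as a failed ABX at the 0.05 level.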

Here is the software I used to plot the graph and table in this results thread. It is a web application; feel free to use it.
http://zak.s206.xrea.com/bitratetest/graphmaker3.htm
Help page:
http://zak.s206.xrea.com/bitratetest/faq.htm

Quote from: Dynamic
I wonder if this thread should be added to the Wikipedia list of Codec Listening Tests (which includes those high bitrate individual tests by Guruboolez some years ago).

I will refrain from adding the results to Wikipedia myself, because writing articles about oneself or one's own results should be avoided.

  • greynol
  • Global Moderator
Reply #22
It would help since Guruboolez's data is classical-centric, and, quite frankly, I'm tired of seeing his results get raised during discussions where they aren't a good fit.

  • Gecko
Reply #23
Quote from: greynol
It seems that if you're particularly sensitive to pre-echo then just about anything with hard transients won't be transparent with mp3, even at 320kbits.

I'm only familiar with some of the samples, but would you say that most of them contain hard transients? The "speech" and "vocal" samples don't, but they are still not transparent.

Kamedo, could you maybe elaborate a little on the problems you heard?

  • greynol
  • Global Moderator
Reply #24
It was a general comment.

Stop consonants in speech could be qualified as hard transients, however.