Historical Lame Listening Test?

2007-05-31 10:50:35

Hi.

Is anyone interested in testing different versions of LAME to see how it has improved over the last few years?

Some (5-7) high-quality pop, rock and classical samples, rather than problem samples, would be very interesting to test.

Testing the following (or other) LAME versions at a bitrate of 96 or 128 kbps should show how LAME has improved over the last few years.

3.97 (09-2006)
3.96 (04-2004)
3.93.1 (12-2002)
3.90.1 (12-2001)
3.87a (09-2000)
3.50 (11-1999)
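
A minimal sketch of how the encodes for such a test could be prepared, assuming each version has been built or unpacked into its own folder. The folder layout (lame-builds/<version>/lame) and the output naming are hypothetical; `-b` is LAME's bitrate option, but whether every old build behaves the same way with it would need checking.

```python
# Versions from the list above, oldest first.
LAME_VERSIONS = ["3.50", "3.87a", "3.90.1", "3.93.1", "3.96", "3.97"]


def build_encode_commands(wav_path, bitrate_kbps=128, build_root="lame-builds"):
    """Return one command (as an argument list) per LAME version.

    Assumption: each binary lives at <build_root>/<version>/lame.
    Nothing is executed here; the commands are only assembled.
    """
    stem = wav_path.rsplit(".", 1)[0]
    commands = {}
    for version in LAME_VERSIONS:
        binary = f"{build_root}/{version}/lame"  # hypothetical path
        out_path = f"{stem}_{version}_{bitrate_kbps}k.mp3"
        commands[version] = [binary, "-b", str(bitrate_kbps), wav_path, out_path]
    return commands
```

Each argument list could then be passed to `subprocess.run(cmd, check=True)` once the binaries are actually in place.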

After the test, we can post the results on Wikipedia, so anyone can see that LAME at 128k should be safe for most listeners today.

Anyone interested?

Reply #1 – 2007-05-31 11:03:27

Personally, I am not interested in what LAME sounded like five years ago - I want to see how it sounds today compared to its current competitors. I also think that your argument about 128 kbps being safe is not going to make others switch to 128 kbps. Even though past listening tests demonstrated that 128 kbps is quite transparent, I still see countless posts on Internet forums where people claim to hear artifacts at 256 kbps, or say that at 320 kbps Ogg Vorbis sounds richer than MP3 at the same bitrate.

Reply #2 – 2007-05-31 11:42:30

I think a private blind test, with results posted on HA, might be interesting. Whoever is motivated enough to do it can go first, and anyone who wants to can repeat the test themselves.

Don't be surprised at how good Lame has been for a long time though!

Cheers,
David.

Reply #3 – 2007-05-31 11:52:45

You might want to consider using 3.90.3 rather than 3.90.1 as it was the HA recommended version for quite some time.

Reply #4 – 2007-05-31 13:32:36

Though it's still in beta, 3.98b3 should be added IMO.
It has improved on several issues.

Reply #5 – 2007-05-31 18:12:04

I would be OK with testing this.

Reply #6 – 2007-06-02 00:06:56

My personal view is that there is no way to 100% objectively compare any encoders at low bitrates such as 128 kbits/s. On many or most materials they are not transparent. Depending on your personal listening abilities, they may be quite far from transparent.

If you have 2 non-transparent, but different-sounding samples, how do you say which is better? You can only say what is "better" or "worse" to you but that's subjective. You can't double-blind test that. You just have to say it, and it's subject to your own opinion and what you happened to notice and what you didn't happen to notice.

For years I thought Fraunhofer 256 kbps (mp3 producer pro 2.1 version) was pretty good. Then one day I realized that on almost every song with bass notes, you can hear the bass notes wobble (this may be a violation of forum rules since I'm not providing proof, so if anyone requires me to retract this statement I will). On LAME or the original WAV file a bass note sounds powerful and steady... like this: BUUUUUUUUUUUUU. But on Fraunhofer it sounds like this: MYOINGYOINGYOINGYOING. To me, 99% of songs can be told apart from the original in a 1-second clip because you just have to listen for this bass difference. (It's pretty hard though, takes lots of practice and is probably only marginally ABX-able... again no proof, so I will retract if required.)
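
To put a number on what "marginally ABX-able" could mean: a minimal sketch, assuming a plain ABX run in which every trial is a 50/50 guess when no difference is audible. The function name and the 12-of-16 example are illustrative only and do not come from any actual test.

```python
from math import comb


def abx_p_value(correct, trials):
    """One-sided binomial p-value for an ABX run.

    Probability of getting at least `correct` answers right out of
    `trials` by pure guessing (chance of a right answer = 0.5).
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Illustrative: 12 right out of 16 gives p = 2517/65536, roughly 0.038.
print(round(abx_p_value(12, 16), 3))
```

A small p-value only says that a difference was heard; it says nothing about which version sounds better.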

Different codecs function so differently (even different codecs of the same format... such as mp3... LAME and Fraunhofer have totally different encoding strengths and weaknesses and settings) that when you compare two 128 kbits/s recordings that happen to be ABXable from each other and from the original... you're just going to be picking on whatever YOU notice... which is only a subset of all the things that can be noticed... and your judgement will be extremely biased in my opinion.

EDIT: You can test all you want and post it here, I have no problem with it. But I would raise a complaint if you tried to post the results of your personal 128 kbps listening test on Wikipedia.

Reply #7 – 2007-06-02 07:54:39

Your post raises good questions. There may be cases where the original is not really music, but just crap, in which case I suppose that acoustic defects might actually enhance it (especially dropoffs in volume, smoothings or mufflings, or mistaken encodings of golden silence)--I've often wished my neighbors would use a "bad" encoder that just encodes silence. Also, some people prefer "brightened" tracks, so a brightened lossy might sound "better," esp. if the original itself is thin, hollow, watery, flat, etc., e.g. elevator "music" (I assume that's one reason for a high anchor sanity check). But other than issues like that, why wouldn't a double-blind test be possible and valid?

True, the test picks up only what the listener notices. That's a severe limitation if being asked to rank worth or quality of music, since that would be sharply limited by possession of taste & uncommon abilities & passions (and also you can't double-blind the type of music, so a bias of bad taste can't be technically adjusted for). But purely low-level technical qualities are not really a problem like that, unless there's really a bias, as you suggest, in favor of certain types of even simple technical flaws (e.g. if HA posters were biased not to hear warbling as a defect: if their favored encoders produced warblings, and they had gradually convinced themselves not to notice or be bothered by that). It could also be a simpler bias: some people don't tend to hear a certain kind of technical defect that would bother other people. Likewise, there's the reverse (and, I suspect, the much more likely bias here): some people are bothered by defects that wouldn't even be noticed by other people.

But these bias issues don't seem to be a decisive objection to me. It's like a vision test. I doubt people are actually preferring blurrings or defects (and also there's the high-anchor check), with the possible exception of brightening or sharpening if the original is flat or blurry etc. One person may have poor eyesight or be colorblind, and perhaps most people will seem defective next to a guy with 20/10 or 20/5 vision, let alone a guy with UV or IR vision abilities, but it seems that's not a deep bias problem, as it would be with judging the worth of what's being seen or heard. It would be a deep bias problem if there were radical individual differences between most human beings in hearing (then a general lossy encoding would be impossible), but those differences seem extremely minor compared to differences in taste etc., and I don't see people generally preferring any significant types of simple technical defects or blurriness etc in this area (unlike other areas of life...). If you're saying, with one person *only*, he might rank high anchor best, but then rank e.g. discoloration as less bad than blurriness, that's true. But with more people in the test such uniform kinds of bias are unlikely (I agree there are some exceptions, e.g. children will hear frequencies that the old won't, and also have better eyesight).
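
A rough sketch of the "more people in the test" point, assuming each listener gives every encoder a blind score on a 1-5 scale. The listener names, encoder names and scores below are made up purely for illustration; they are not results from any test.

```python
from statistics import mean


def mean_scores(ratings):
    """Average each encoder's score over all listeners who rated it.

    `ratings` maps listener -> {encoder: score}.
    """
    encoders = sorted({enc for per_listener in ratings.values() for enc in per_listener})
    return {
        enc: mean(per[enc] for per in ratings.values() if enc in per)
        for enc in encoders
    }


# Made-up example: listener B disagrees with A and C about the ranking.
example = {
    "listener_a": {"enc_x": 4.0, "enc_y": 3.0},
    "listener_b": {"enc_x": 3.0, "enc_y": 4.0},
    "listener_c": {"enc_x": 4.5, "enc_y": 3.5},
}
print(mean_scores(example))
```

One listener's preference for a particular kind of defect pulls the average only a little, which is the reason for wanting several listeners rather than one.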

So: (1) there are partisans of crappy music, and partisans of good music, but you don't see partisans of types of acoustic distortion -- i.e. no deep bias / judgment issues; and (2) I suspect--but I'm just an uneducated layman--that there's no significant preference for one type of distortion over another in these tests, when talking of distortions actually hearable by a human mind, if the distortion is of significance. i.e. maybe BUUUUUUUUUUUUU over MYOINGYOINGYOINGYOING if they're both extremely and equally slight effects, but not once either gets large or gets much larger than the other. That's just a guess. Perhaps that should be tested by double-blind tests: deliberate testing of sensitivity to pre-echo versus dampening versus cutoffs, or types of warbling etc. I'd be curious what the people who write the encoders, who probably know, say about which preferences exist for what distortions, but I suspect that, if those distortions are actually hearable by a human being (i.e., rather than being beyond human auditory range or being cancelled out by other sound etc.), that they are going to be very slight and few. The more serious bias is the non-random selection of the samples themselves, i.e. if the samples are skewed (apart from, of course, being music, and preferably music one actually wants to hear) such that one encoder tends to do well on them (certain problem samples etc.) but I think they try to be careful about that.
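
For the blinding itself, a minimal sketch assuming the samples are simply handed out under neutral labels and the key is kept away from the listener (and from whoever runs the session) until the scores are in. The label format and function name are hypothetical.

```python
import random


def blind_labels(encoders, seed=None):
    """Assign neutral labels (sample_1, sample_2, ...) in random order.

    Returns the key mapping label -> encoder. The key should stay hidden
    until all ratings have been collected.
    """
    rng = random.Random(seed)
    shuffled = list(encoders)
    rng.shuffle(shuffled)
    return {f"sample_{i + 1}": enc for i, enc in enumerate(shuffled)}


key = blind_labels(["original", "encoder_a", "encoder_b"], seed=2007)
print(sorted(key))  # only the labels are shown to the listener
```

Including the original among the labelled samples is one way to get the high-anchor check mentioned above.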

Quote from: Porcupine on 2007-06-02 00:06:56

Reply #8 – 2007-06-02 08:32:30

I think any test usually brings us more information. The point is not to nitpick the results but, of course, to weigh the outcome according to one's personal preferences and abilities.
Speaking for myself, I absolutely dislike audible tonal distortions, but I am not sensitive to pre-echo and don't need very good behavior at extremely high frequencies. I can also easily live with a very slight amount of (natural-sounding) audible added noise on rare occasions. Of course I favor an encoder whose characteristics match this, and I am keen to learn about such encoders. People with other preferences can judge accordingly.
The only problem with listening tests is for people in the mood of 'easy reading' who just have a look at the final graphical (or numerical) result. But that's up to these readers.
