Topic: Proposal on listening tests (Read 20918 times)

Proposal on listening tests

Hello.

It has been a month since the (official) finish of the last listening test, and I'd like to propose some tests that could be conducted by the more courageous people out there. It's about time new tests started getting planned.

There have been several proposals. I believe the one deserving the most attention is a speech listening test, comparing several different speech codecs (GSM, WMA Voice, Speex, G.729, MPEG-4 CELP, PureVoice/CDMA...) on speech samples, in both wideband and narrowband modes. I'm confident jmvalin would be able to help the test organizer choose adequate encoders and settings.

Another test, proposed by Danchr, would compare open source encoders at some bitrate: LAME, Vorbis, FAAC, maybe Musepack... Such a test would surely be of interest to users on platforms other than Windows, especially Linux users.

Sthayashi proposed a test comparing several AAC encoders against Vorbis, to see how Vorbis performs against encoders other than iTunes. I guess the answer is now clear: Vorbis will perform better than all AAC encoders at 128kbps, since it won even over the best of them. But the proposal has been made.

Last but not least, ScorLibran proposed a transparency threshold test. It would somehow detect the average bitrate at which each codec first reaches transparency. That would show whether Musepack is still the codec that offers transparency at the lowest bitrates, or whether recent developments in all the other codecs have made Musepack obsolete in this respect.

Of course, the most courageous (or nuts) out there might be dreaming of a 160 or 192kbps test. I personally believe this is madness, but hey, don't let me stop you :B

I personally don't think that conducting another multiformat test at 128kbps, or an AAC test at 128kbps, would be justifiable right now. There has hardly been any development in the mainstream encoders since my last tests were conducted, so it would just be a waste of resources. Maybe next year?

This is what I can offer to help get things started:
- Hosting the sample packages
- Hosting the torrent tracker and helping seed from fast servers
- Answers and hints about conducting tests, to the best of my knowledge

All this for the low, low fee of zero bucks

All other responsibilities would belong to the test organizer: choosing the sample set, deciding on codecs, versions and settings, managing any pre-tests, gathering and processing the results, and the most dreaded question - VBR or CBR?

Hope I can spark some interest with this invitation. We definitely need someone to pick up where I left off.

Thanks for your attention.

Best regards,

Roberto.

Proposal on listening tests

Reply #1
Crap. The subtitle should have been "What should be conducted next?". It's a question, not an order. Please ignore that.

Proposal on listening tests

Reply #2
The test I would like to see is various codecs at 80 to 96kbps. To me, no codec can perform well in stereo at the 64kbps level. Maybe once HE-AAC with PS comes out, it will use some of its extra bits well enough to change my mind, but that's yet to be seen....errr heard.

Do something along the lines of mp3PRO, HE-AAC, Vorbis, WMA Pro, etc. at both 80 and 96kbps. The reason for using both is to see whether those two bitrates can be told apart with the same codec.


Just so ya understand, encode all the samples at both 80kbps and 96kbps with each codec.

Anyway, that's what I'd like to see the conclusion to.


***Edited Part***
Of course these won't be transparent; that isn't what I wondered. Just which sounds best to the public at large, and whether there is some breaking point at the lower bitrates.

Proposal on listening tests

Reply #3
Quote
Last but not least, ScorLibran proposed a transparency threshold test. It would somehow detect the average bitrate at which each codec first reaches transparency. That would show whether Musepack is still the codec that offers transparency at the lowest bitrates, or whether recent developments in all the other codecs have made Musepack obsolete in this respect.


this is it .... =)

also ... a transcoding test should be good ... from MPC, Vorbis, to LAME ...

Proposal on listening tests

Reply #4
Quote
Last but not least, ScorLibran proposed a transparency threshold test. It would somehow detect the average bitrate at which each codec first reaches transparency. That would show whether Musepack is still the codec that offers transparency at the lowest bitrates, or whether recent developments in all the other codecs have made Musepack obsolete in this respect.


I'll second this. It'll be interesting to know where transparency occurs, although then it wouldn't make sense to use the standard 'killer samples.' Well, I suppose if you want a general ratio it's OK (e.g. AAC reaches transparency at 80% of the bitrate of MP3), but for an absolute value (e.g. MPC is transparent at ~160) normal music should be used.
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Proposal on listening tests

Reply #5
Most people wouldn't be able to do this test. Guruboolez did test normal music (MPC vs Vorbis vs LAME) a few weeks ago, concluding that MPC was superior. I suppose where transparency occurs is dependent on the sample used.

If MPC q5 averages 170k, then we would use q5-5.5 for Vorbis and V2 or V3 for LAME.

Vorbis has quality issues below Q6; LAME 3.96.1 aps or apm will match MPC bitrates. Nero AAC 'normal' profile should be used against MPC.

My bet is that the other codecs don't stand much chance at these bitrates. At >200k things even out more or less.

Proposal on listening tests

Reply #6
I would definitely like to see a speech codec test and, even though I don't have enough time to organize it, I can help with choosing the codecs and test samples.

I think a speech codec test is probably more complicated (in some aspects at least) than one for music, because most speech codecs usually support only one bit-rate (even Speex doesn't have a continuous range like Vorbis or MP3). Actually, the only way I see to compare the codecs is to plot the results on a quality vs. bit-rate graph.

These are the codecs which I think would be the most interesting to have:
narrowband: Speex (8, 11, 15 kbps), iLBC (15.2 kbps), AMR-NB (8, 10, 12 kbps), G.729A (8 kbps), GSM-FR (13 kbps), QCELP?
wideband: Speex (12.8, 20.6, 27.8 kbps), AMR-WB (2 or 3 bit-rates), G.722 (reference, 64 kbps), VMR?
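
To get a feel for how many points such a quality vs. bit-rate graph would have, the list above can be written down as data. This is only a rough sketch: the dictionary and function names are made up for illustration, and the entries marked with "?" or without explicit bit-rates (QCELP, VMR, AMR-WB) are left out because no rates were given for them.

```python
# Candidate codecs and bit-rates (kbps) copied from the list above.
# QCELP, VMR and AMR-WB are omitted: no explicit bit-rates were stated.
narrowband = {
    "Speex": [8, 11, 15],
    "iLBC": [15.2],
    "AMR-NB": [8, 10, 12],
    "G.729A": [8],
    "GSM-FR": [13],
}
wideband = {
    "Speex": [12.8, 20.6, 27.8],
    "G.722": [64],  # reference
}


def conditions(band):
    """Return (bit-rate, codec) pairs sorted by bit-rate, i.e. the x-axis order of the graph."""
    return sorted((rate, codec) for codec, rates in band.items() for rate in rates)


if __name__ == "__main__":
    for name, band in (("narrowband", narrowband), ("wideband", wideband)):
        points = conditions(band)
        print(name, len(points), "test conditions")
        for rate, codec in points:
            print(f"  {rate:>5} kbps  {codec}")
```

Each listener would then rate every condition, and the mean score per (bit-rate, codec) pair would become one point on the graph.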

The choice of samples is also important, I think. Do we want only clean (studio-like) samples, or also samples that would cover other applications like VoIP (samples with background noise) and broadcast (samples with light background music)? Even the filtering would be important, as some codecs (especially the narrowband ones) don't react well when there is a lot of low-frequency content.

Proposal on listening tests

Reply #7
I also vote for a transcoding test, I'd like to see how Musepack performs.

Proposal on listening tests

Reply #8
Quote
I also vote for a transcoding test, I'd like to see how Musepack performs.


This test already happened. Sthayashi conducted it. He discussed and announced the test here. Nearly nobody participated :B

Proposal on listening tests

Reply #9
Quote
Most people wouldn't be able to do this test. Guruboolez did test normal music (MPC vs Vorbis vs LAME) a few weeks ago, concluding that MPC was superior. I suppose where transparency occurs is dependent on the sample used.

I think most people would be able to do this test. We're very likely not talking about bitrates above 160kbps, but rather closer to the 96-128 range. People would ABX each bitrate until they reached one they couldn't distinguish with p<0.05. That's their transparency threshold for that format and sample. Wash, rinse and repeat for each other format and sample across all participants, then average all resulting bitrate thresholds and present them by format with a standard ANOVA error margin. The whole thing would follow ITU-R BS.1116-1 as much as possible. As for making the test "participant-friendly", that's something I intend to make a high priority when this test enters the planning phase.
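
To make the p<0.05 criterion concrete, here is a minimal sketch of how one listener's threshold for one format and sample might be computed. It assumes a plain one-sided binomial test against guessing; the function names and the shape of the `results` dictionary are hypothetical, not taken from any existing ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` out of `trials` right by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def transparency_threshold(results: dict, alpha: float = 0.05):
    """Lowest bitrate (kbps) the listener could NOT distinguish at p < alpha.

    `results` maps bitrate -> (correct answers, total trials).
    Returns None if every tested bitrate was successfully ABXed.
    """
    for bitrate in sorted(results):
        correct, trials = results[bitrate]
        if abx_p_value(correct, trials) >= alpha:
            return bitrate
    return None
```

For instance, 12 correct out of 16 gives p of roughly 0.038 and counts as "distinguished", while 11 out of 16 gives roughly 0.105 and does not. The per-listener thresholds returned by `transparency_threshold` are what would then be averaged per format.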

Quote
If MPC q5 averages 170k, then we would use q5-5.5 for Vorbis and V2 or V3 for LAME.

Vorbis has quality issues below Q6; LAME 3.96.1 aps or apm will match MPC bitrates. Nero AAC 'normal' profile should be used against MPC.

My bet is that the other codecs don't stand much chance at these bitrates. At >200k things even out more or less.

This is exactly why I think we need this kind of test: to resolve these issues and eliminate the need for speculation about sound quality and efficiency based on formats tested separately and at different points in time.

Proposal on listening tests

Reply #10
Transparency test would be interesting.  I'm curious to see where most people end up considering modern codecs transparent and to see if the recent developments in codecs like Vorbis have really helped them a lot.
Nero AAC 1.5.1.0: -q0.45

Proposal on listening tests

Reply #11
Quote
Quote
Most people wouldn't be able to do this test. Guruboolez did test normal music (MPC vs Vorbis vs LAME) a few weeks ago, concluding that MPC was superior. I suppose where transparency occurs is dependent on the sample used.

I think most people would be able to do this test. We're very likely not talking about bitrates above 160kbps, but rather closer to the 96-128 range. People would ABX each bitrate until they reached one they couldn't distinguish with p<0.05. That's their transparency threshold for that format and sample. Wash, rinse and repeat for each other format and sample across all participants, then average all resulting bitrate thresholds and present them by format with a standard ANOVA error margin. The whole thing would follow ITU-R BS.1116-1 as much as possible. As for making the test "participant-friendly", that's something I intend to make a high priority when this test enters the planning phase.


Agreed. The point of this test would be to figure out at what bitrate people won't be able to do the test (so to speak). Everybody will be able to input their particular threshold, no matter how bad their ears are.

I can say for sure that my results will be around the 96 range. My hearing doesn't go above ~12kHz(*), and I have found previous listening tests quite difficult. But it would still be good to know if, for example, AAC was transparent at 80kbps and MP3 at 128.

I will definitely participate in this test, should it occur.

(*) I can hear a single sine wave at 14kHz, but I can't ABX a 12kHz lowpass on normal music
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Proposal on listening tests

Reply #12
Ouch!

Did something happen when you were younger to damage your hearing?

Last time I checked I could hear a single sine wave up to around 18kHz.  I usually can ABX a 16kHz lowpass but not always.
Nero AAC 1.5.1.0: -q0.45

Proposal on listening tests

Reply #13
Not that I recall, but maybe it damaged my memory as well 

<Pointless story>
I remember making a hearing test for myself with a 17khz sine wave. I played it, turned up the volume slowly, but couldn't hear a thing. Just then my friend opened the door, and he acted like he got hit in the head with some invisible brick. He said that's exactly what it felt like. For days he would come up to me and say "I can't believe you didn't HEAR that! Do you know how loud that was?!" Pretty loud, I guess. 
</pointless story>

What's kind of odd is that I'll worry about audio quality to no end. I kept wondering if there was something that I wasn't hearing, but that was somehow detracting from my overall enjoyment. I kept prowling this forum, looking for any codec that might be better than what I was using at the time. In the end I was using MPC Xtreme, even though I probably couldn't tell the difference at half the bitrate.
Eventually I decided to save myself the emotional stress and re-ripped to FLAC. It probably uses up 8 times the disk space I need, but it saves my mind. In the end, that's what really matters.
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Proposal on listening tests

Reply #14
I'm really bad about worrying about quality.  I had to spend 3 days talking myself out of going from -q6 to -q7 Vorbis for my portable even though I knew I wouldn't be able to really hear a difference.  That's why I really think the transparency test would be good.  Peace of mind.
Nero AAC 1.5.1.0: -q0.45

Proposal on listening tests

Reply #15
I'd really like to see the transparency test happen. But it's a tough one because it needs people who can really spot the smallest glitches in codecs.

On the other hand, I think most people would like to use the results.

Proposal on listening tests

Reply #16
I would like to know if anyone else is interested in a listening test to determine the effect of post-processing (e.g. EQ, compression etc.) on a codec's output after encoding.

Is it easier or harder to ABX?

Perhaps the same processing could be applied to the original uncompressed file and the file after compression.

Any takers?

-Iain

Proposal on listening tests

Reply #17
I've been thinking about how to do the transparency test as objectively as possible. The problem is that there would be a LOT of files. For example, if one wanted to do a test of MP3, AAC, MPC, Vorbis on 10 different samples, with 4 bitrates, that's 160 separate files, and up to 160 ABX sessions.
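
The file count is just the product of the three dimensions. A quick sketch to enumerate it (the sample names and the four bitrates are placeholders picked for illustration, not a proposed test plan):

```python
from itertools import product

codecs = ["MP3", "AAC", "MPC", "Vorbis"]
samples = [f"sample{i:02d}" for i in range(1, 11)]  # 10 placeholder sample names
bitrates_kbps = [64, 96, 128, 160]                  # 4 placeholder bitrates

# One encoded file (and potentially one ABX session) per combination.
test_files = list(product(codecs, samples, bitrates_kbps))

print(len(test_files))  # 4 codecs * 10 samples * 4 bitrates = 160
```

Dropping one bitrate or a couple of samples shrinks the total quickly, which may be the easiest way to keep the workload per participant manageable.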

Well, if one feels like trusting people, it could be an informal thing. Download a FLAC, compress it yourself, and tell whoever's doing the test your lowest non-ABXable bitrate.

But then zealots could easily tip the scales ("OMG MPC @ 300kpbs and OGG @ 64!!!1"), so if one wants a truly scientific test, the files would have to be encrypted, and compression settings determined beforehand. As far as I know, there is no program that will blindly ask you to ABX a bitrate, then, if you pass, go on to a higher bitrate, and so on. Basically, I think it will be hard to implement.

ABChr could do it, but not very efficiently. One would download a 64kbps sample, and ABX it. Then go on to 96, and ABX it, then 128... But what if they could ABX a codec at 128 but could NOT at 96? I don't really know.
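
One possible convention for that awkward case, sketched below, is to be conservative: take the threshold to be the first tested bitrate above the highest one the listener managed to ABX. This is only an assumption about how such results could be scored, not something ABC/HR does, and the function name is made up.

```python
def conservative_threshold(abxed: dict):
    """Pick a transparency threshold from possibly non-monotonic ABX results.

    `abxed` maps bitrate (kbps) -> True if the listener ABXed it, False otherwise.
    Returns the lowest tested bitrate above the highest ABXed one,
    or None if nothing was tested or the highest tested bitrate was still ABXed.
    """
    bitrates = sorted(abxed)
    if not bitrates:
        return None
    heard = [b for b in bitrates if abxed[b]]
    if not heard:
        return bitrates[0]  # nothing was distinguishable, even at the lowest bitrate
    higher = [b for b in bitrates if b > max(heard)]
    return higher[0] if higher else None
```

With results like `{64: True, 96: False, 128: True, 160: False}` this returns 160 instead of 96, so a lucky miss at 96 does not drag the threshold down. The price is that a lucky guess at 128 pushes it up, which is one more reason to use enough trials per session.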

PS. There's a very good reason this post reads like a raw brain dump. 

PPS. Was HA not working for a few hours a little while ago, or what?
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Proposal on listening tests

Reply #18
My vote is for a high bitrate listening test on problem samples. We all know that most codecs perform well and are nearly transparent at 192, with the exception of problem samples. It would be nice to see which codecs handle these samples the best when compared at "advertised" transparent settings (probably VBR near 192 avg).

Proposal on listening tests

Reply #19
Quote
My vote is for a high bitrate listening test on problem samples. We all know that most codecs perform well and are nearly transparent at 192, with the exception of problem samples. It would be nice to see which codecs handle these samples the best when compared at "advertised" transparent settings (probably VBR near 192 avg).


The problem with such a test, as I've already written here several times, is that using problem samples leads to non-representative results.

That is, you can't guarantee codec X is the best at 192kbps just because it encodes problem samples better than the competition. At most, you can say it's the best when encoding problem samples.

Proposal on listening tests

Reply #20
Add another vote for the transparency test. It would be an interesting challenge for the organizer, to say the least.
< w o g o n e . c o m / l o l >

Proposal on listening tests

Reply #21
What about a transparency test on EASY SAMPLES (just music - some proportional mix of metal, pop, classical etc. samples, but not the hard ones)?
Ogg Vorbis for music and speech [q-2.0 - q6.0]
FLAC for recordings to be edited
Speex for speech

Proposal on listening tests

Reply #22
Quote
What about a transparency test on EASY SAMPLES (just music - some proportional mix of metal, pop, classical etc. samples, but not the hard ones)?

Do you mean easy for the encoder, or easy to ABX? They're quite different.

I'd advise against using only easy-to-ABX samples, as there should be a representative difficulty level in order to reach an absolute conclusion. That is to say, if only the trouble samples (easy to ABX) were tested, the transparency bitrates would be artificially inflated.
"We demand rigidly defined areas of doubt and uncertainty!" - Vroomfondel, H2G2

Proposal on listening tests

Reply #23
Quote
Quote
My vote is for a high bitrate listening test on problem samples. We all know that most codecs perform well and are nearly transparent at 192, with the exception of problem samples. It would be nice to see which codecs handle these samples the best when compared at "advertised" transparent settings (probably VBR near 192 avg).


The problem with such a test, as I've already written here several times, is that using problem samples leads to non-representative results.

That is, you can't guarantee codec X is the best at 192kbps just because it encodes problem samples better than the competition. At most, you can say it's the best when encoding problem samples.


A number of tests have already been done to show which codecs are best at 128. I don't think many people would be able to ABX many samples at 192, so it would be pointless. However, if only problem samples are used (on the assumption that all of the codecs would be essentially transparent on normal music for most listeners, even those with tuned ears and good equipment), then the best codec would be the one that handles most of the problem samples well.

Proposal on listening tests

Reply #24
Quote
then the best codec would be the one that handles most of the problem samples well.


Yes, the best codec - for problem samples!

There's no guarantee that it will show the same behaviour on "normal" samples. And what's the point of a test that only shows results applicable to a small share of musical styles?