
www.soundexpert.info

Seems to be interesting and helpful.  www.soundexpert.info
Any other opinions?

Serge.
keeping audio clear together - soundexpert.org

www.soundexpert.info

Reply #1
Quote
For the bitrates 128 kb/s and higher, where the difference between the reference and coded sound material is less perceptible, a different rating scale with "digital audio zoom" will be used. For the mentioned bitrates the reference sound material will be coded and decoded several times to make the drawbacks of audio compression audible to most of the participants (testers).


I disagree with this technique.  I'm of the opinion that quality differences should be arrived at "fairly" by comparing only the first generation encoded file against the original.
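
For concreteness, the "digital audio zoom" they describe amounts to looping the same material through the encoder several times. A rough sketch of such a pipeline, assuming ffmpeg and its built-in MP3 encoder are available (their actual tool chain, e.g. the Nero MP3pro encoder mentioned below, isn't scripted here):

Code:
import subprocess
from pathlib import Path

def multi_generation_encode(wav_in, bitrate="128k", generations=5):
    """Encode and decode the same material repeatedly so that coding
    artifacts accumulate and become easier to hear ("digital audio zoom")."""
    current = Path(wav_in)
    for gen in range(1, generations + 1):
        mp3 = current.with_name(f"gen{gen}.mp3")
        wav = current.with_name(f"gen{gen}.wav")
        # encode the previous generation's WAV to MP3 ...
        subprocess.run(["ffmpeg", "-y", "-i", str(current), "-b:a", bitrate, str(mp3)], check=True)
        # ... then decode it back to WAV for the next pass
        subprocess.run(["ffmpeg", "-y", "-i", str(mp3), str(wav)], check=True)
        current = wav
    return current  # Nth-generation decoded file, used as the test item

if __name__ == "__main__":
    multi_generation_encode("reference.wav", bitrate="128k", generations=5)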

The use of ABC/HR is laudable, but it would be much nicer for the statistics if it were somehow possible for them to keep track of all of each listener's inputs and group them together.  I don't think that is done currently because participation is anonymous (or at least only tracked by IP address, if at all).

For example, suppose listeners submitted multiple responses, and SoundExpert was able to group all the responses together per listener.  The ABC/HR results could then be used to eliminate all the entries of a "noisy listener" (someone who is mostly guessing).  Right now, SoundExpert's only option is to eliminate individual responses where an encoded file is rated higher than the original -- each listener's results must be weighted equally to every other listener's results.
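
If responses could be grouped per listener, that filtering could look something like the sketch below. All field names (listener_id, ref_score, coded_score) and numbers are made up for illustration; this is not SoundExpert's actual data format:

Code:
from collections import defaultdict

# one submitted trial per dict: hypothetical fields, invented scores
trials = [
    {"listener_id": "a", "ref_score": 5.0, "coded_score": 3.2},
    {"listener_id": "a", "ref_score": 4.8, "coded_score": 4.9},  # ref rated worse: a miss
    {"listener_id": "b", "ref_score": 3.0, "coded_score": 4.5},  # miss
    {"listener_id": "b", "ref_score": 2.9, "coded_score": 4.8},  # miss
]

by_listener = defaultdict(list)
for t in trials:
    by_listener[t["listener_id"]].append(t)

# Keep a listener only if they rated the hidden reference above the coded
# file in a clear majority of their trials; otherwise treat them as "noisy"
# and discard everything they submitted, not just the failed trials.
def reliable(listener_trials, threshold=0.75):
    hits = sum(t["ref_score"] > t["coded_score"] for t in listener_trials)
    return hits / len(listener_trials) >= threshold

clean = [t for ts in by_listener.values() if reliable(ts) for t in ts]
print(f"kept {len(clean)} of {len(trials)} trials")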

I like the idea of the random selection of a file to listen to (called a Latin Square Design in "Sensory Evaluation Techniques.")
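
For what it's worth, a cyclic Latin square for presentation order is simple to generate; a small sketch (the codec labels are only examples):

Code:
def latin_square(items):
    """Return len(items) orderings in which every item appears in every
    presentation position exactly once (a simple cyclic Latin square)."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

codecs = ["mp3", "mp3pro", "wma", "aac"]  # example labels only
for listener, order in enumerate(latin_square(codecs), start=1):
    print(f"listener {listener}: {order}")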

I'd be interested in knowing what statistics will be used.

ff123

www.soundexpert.info

Reply #2
Serge Smirnoff:

Please do not cross post to multiple forums.  I deleted the duplicate in the main forum since this one already had a response.

I'll try to check out the page sometime later and post a more useful response.

www.soundexpert.info

Reply #3
1. I've tried, but can't download any files from them.

2. To compare the quality of 32kbps mp3 to 32kbps WMA, a more appropriate test would be MUSHRA or the like.

Let me clarify. Asking someone to listen to the CD, and then listen to the 32kbps mp3, and finally grade the difference will usually give a very very bad score. Then asking someone to listen to the CD, and then listen to the 32kbps WMA, and finally grade the difference will usually give ANOTHER very very bad score.

How "comparable" these scores are depends on how well the user can remember or maintain their own opinion of sound quality from one session to another. In other words, if the user forgets whether a certain quality of sound is "annoying" or "very annoying", then comparisons of quality across listening sessions are impossible. There are other reasons why this type of test is a poor choice for poor quality audio (e.g. everything gets a "1"), but that's the main one.

I would suggest:
1. Play the user the original. Tell them it is the original.
2. Get them to rank versions A, B, and C. A, B, and C are the original, WMA, and mp3 (in a random order). Best=closest to original. Worst=most different from original, or most annoying.
3. Discard all results where the original (hidden ref) is not chosen as the best.
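
A rough sketch of steps 2 and 3 in code (the labels and file names are made up for illustration):

Code:
import random

def present_trial(versions):
    """Step 2: shuffle the original plus the coded versions under labels A/B/C."""
    labels = ["A", "B", "C"]
    shuffled = random.sample(versions, k=len(versions))
    return dict(zip(labels, shuffled))

def keep_response(presented, ranking):
    """Step 3: accept the ranking only if the hidden reference came out best."""
    best_label = ranking[0]            # ranking is best-to-worst, e.g. ["B", "A", "C"]
    return presented[best_label] == "original"

presented = present_trial(["original", "wma_32k", "mp3_32k"])
example_ranking = ["A", "B", "C"]      # what a listener might submit
print(keep_response(presented, example_ranking))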

You shouldn't need to ABX mp3 at 32kbps!!! But when they move up to mp3 at 128kbps and above, the kind of joint ABX and grading system proposed recently on the r3mix.net message boards would be good.

Just my thoughts. The results of their test will still be interesting (if anyone else can download a test file!), but would be better at this quality level if a direct comparison was being made.

Cheers,
David.
http://www.David.Robinson.org/

www.soundexpert.info

Reply #4
It seems it's not MP3 at 32kbps, but MP3pro.

They say they used the Nero MP3pro encoder.

www.soundexpert.info

Reply #5
David,

Good point about needing to do direct comparisons of encoded files at low bitrates (not just encoded vs. original).

It would be interesting to come up with some sort of scheme to be able to determine a listener's ability, so that ratings from sensitive listeners would count for more.  In the real world, though, such a thing might encourage politicking and cheating. 

But at least a baby step in this direction would be the ability to group results of each listener together.
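
As a sketch of what such weighting might look like once per-listener grouping exists: weight each listener by how often they pick out the hidden reference, then average the codec scores with those weights. All names and numbers below are invented for illustration:

Code:
# per-listener history: how often they ranked the hidden reference best,
# plus the scores they gave to the codec under test (hypothetical data)
listeners = {
    "a": {"ref_hit_rate": 0.95, "codec_scores": [2.1, 2.4, 1.9]},
    "b": {"ref_hit_rate": 0.55, "codec_scores": [4.0, 3.8, 4.2]},  # mostly guessing
}

def weighted_codec_score(listeners, floor=0.5):
    """Weight each listener's mean score by how far their hidden-reference
    hit rate exceeds near-chance performance (the floor)."""
    num = den = 0.0
    for data in listeners.values():
        weight = max(data["ref_hit_rate"] - floor, 0.0)
        mean_score = sum(data["codec_scores"]) / len(data["codec_scores"])
        num += weight * mean_score
        den += weight
    return num / den if den else float("nan")

print(round(weighted_codec_score(listeners), 2))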

ff123

www.soundexpert.info

Reply #6
Quote
Originally posted by ff123


I disagree with this technique.  I'm of the opinion that quality differences should be arrived at "fairly" by comparing only the first generation encoded file against the original.


I'm not sure about this technique either. But I still believe that the correlation between the results of such tests and ordinary ones (ABX/HR) could be high enough. Good chance to prove that.

Serge.
keeping audio clear together - soundexpert.org

www.soundexpert.info

Reply #7
Quote
Originally posted by ff123


The use of ABC/HR is laudable, but it would be much nicer for the statistics if it were somehow possible for them to keep track of all of each listener's inputs and group them together


I don't think it's a good idea to try to meet all ITU-R requirements in Internet testing. We need to be honest - it is not possible (listening environment, specially prepared listeners and so on...). Instead, it would be much wiser to develop a new methodology for Internet testing on the basis of the classical one. The main feature of Internet testing is the broad participation of many ordinary listeners. It would be very nice if we could derive some useful information from that.



Quote
Originally posted by ff123


I'd be interested in knowing what statistics will be used.


So am I. But I suppose it will be largely determined by the "quality" of the raw data received.


Serge.
keeping audio clear together - soundexpert.org

www.soundexpert.info

Reply #8
Quote
Originally posted by 2Bdecided
1. I've tried, but can't download any files from them.
Try GetRight or IE 5.x.


Quote
Originally posted by 2Bdecided
To compare the quality of 32kbps mp3 to 32kbps WMA, a more appropriate test would be MUSHRA or the like.

Let me clarify. Asking someone to listen to the CD, and then listen to the 32kbps mp3, and finally grade the difference will usually give a very very bad score. Then asking someone to listen to the CD, and then listen to the 32kbps WMA, and finally grade the difference will usually give ANOTHER very very bad score.
Right you are. In the case of one or a few people it will be exactly so. But SoundExpert appeals to thousands of participants and will offer them more than 10 codecs. This is a completely different situation - individual psychology versus group psychology, and a few results versus thousands of results. Yes, there is some risk that the calculated ratings will look like the average temperature in a hospital. Well then, a bad result is a result as well.

Serge.
keeping audio clear together - soundexpert.org

www.soundexpert.info

Reply #9
Quote
Serge Smirnoff


did anyone say Smirnoff?

www.soundexpert.info

Reply #10
So this is the situation: I want to know which is better, A or B.

I also have X, a reference, which is vastly better than A or B.

I can find out which is better in two ways:

1. Ask "which is better, A or B?" and "which is better, A or X?"

or

2. Ask "how much worse than X is A?" and "how much worse than X is B?". Then the better one is whichever is closer to X.

In both cases, anyone who thinks A or B is better than X isn't a suitable test candidate, so I discard their results.


I believe 1 will give better results than 2, mainly for practical reasons...

The biggest problem in HQ tests is finding people who can really hear a difference. The next biggest problem is that everyone has a different idea of what the quality scale really means. And the third problem is figuring out how to filter out all the bad data, or statistically account for it.

I understand that you're saying that if enough people take part, it'll all cancel out and there'll be a nice trend. From experience, this is the last thing that'll happen - in the HQ audio tests I've taken part in without proper checks on listener ability, the noise almost swamps the data.


This test is different, because it's not HQ audio, and there is some filtering of listeners. However, in this test, there is a new handicap of not letting your expert (or otherwise) listeners actually answer the question you're asking! Rather than reducing the noise, the test methodology adds ANOTHER random variable.

If I ask "how much worse than X is A?" on Wednesday, and then ask "how much worse than X is B?" on Thursday, then the quantity by which the subject measures "worseness" will have changed. This adds noise to your data - and it's only added BECAUSE of the test methodology - it's not intrinsic to the data itself.
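
To make that concrete, here is a toy simulation of the two methods. The numbers (true quality gap, per-session drift of the listener's internal scale) are invented purely to show the mechanism, not taken from any real test:

Code:
import random

random.seed(1)

TRUE_A, TRUE_B = 2.0, 2.5      # B is genuinely a little better (invented values)
JUDGEMENT_NOISE = 0.3          # trial-to-trial noise in any single rating
SESSION_DRIFT = 1.0            # how far a listener's internal scale drifts between days

def rate(true_quality, session_offset):
    return true_quality + session_offset + random.gauss(0.0, JUDGEMENT_NOISE)

def same_session():
    # Method 1: A and B judged in one sitting, so the scale offset cancels
    offset = random.gauss(0.0, SESSION_DRIFT)
    return rate(TRUE_B, offset) > rate(TRUE_A, offset)

def cross_session():
    # Method 2: A graded on Wednesday, B on Thursday, each with its own offset
    return rate(TRUE_B, random.gauss(0.0, SESSION_DRIFT)) > rate(TRUE_A, random.gauss(0.0, SESSION_DRIFT))

N = 10_000
same = sum(same_session() for _ in range(N)) / N
cross = sum(cross_session() for _ in range(N)) / N
print(f"picked the better codec: same session {same:.0%}, across sessions {cross:.0%}")

With these made-up numbers the same-session comparison picks the genuinely better codec far more often than the cross-session grading, purely because the session offset cancels in the first case.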

So, added to the fact that some people will genuinely disagree about whether A or B sounds better, and added to the fact that some people will be deaf to artefacts but will still slip through your safeguards, you now have an extra problem: the people who really hear a difference, and can definitely say which sounds better DON'T GET TO EXPRESS THIS OPINION!


Look at it this way:
I give you two cups of water - I ask you to dip your finger in each, and tell me which one is hotter. You tell me - you're probably right. Alternatively: Today I give you one cup of water and ask what temperature it is - tomorrow I give you another cup of water and ask what temperature it is - on neither day do you have access to a thermometer, and your hands may be hotter one day than the other - but I'll still compare your temperature judgements and try to determine which cup of water WAS hotter.

It's stupid! It's making the task so much harder than it needs to be. It's preventing the people who could provide you with useful answers from doing so.

Given that you won't get millions of listeners, and that many of the listeners you do get will give random results, you need to protect the good data that you do get - not add even more noise to it. You may never get enough listeners to remove the noise.


End of rant! Having said all that, with low bitrate audio, I'd still be hopeful that trends will emerge. I just wish the test method made this more likely!

Cheers,
David.
P.S. Why would I have to use IE5 to download something? Is it a cookies issue? I thought Netscape 2 could handle FTP OK?!


www.soundexpert.info

Reply #12
Good link. Will soundexpert.info take note of it before wasting a lot of people's time?

D.