How are these listening tests conducted? I understand they are double blind ABX, which is great, but how are the listeners selected?
Ideally you'd want to weed out the listeners who cannot identify quality at a certain level - so you'd have a preliminary round with a known bad sample. Those who could not identify that would not progress to the full test session.
My worry is that we have a lot of info about the tests, but maybe not so much about the listeners. I know many people who think that a 128 kbps MP3 is as good as it gets.
This is how.
A listening test organizer may opt to include one or more low anchors among the codecs he or she really wants to test.
Below is one example, used in a public listening test conducted in 2014 (https://listening-test.coresv.net/).
The listening test organizer's real interest was in the first four (AAC iTunes, Opus, Ogg Vorbis, and MP3), but the organizer decided to include two low anchors (FAAC) alongside them.
- AAC iTunes 11.2.2 with CoreAudioToolbox 184.108.40.206 via qaac 2.41 --cvbr 96 (Equivalent to "VBR enabled" in iTunes)
- Opus 1.1 with opus-tools-0.1.9-win32 --bitrate 96
- Ogg Vorbis aoTuV Beta6.03 -q2.2
- MP3 LAME 3.99.5 *bitrate is around 136 kbps. -V 5
- AAC FAAC v1.28 (Mid-low Anchor) -b 96
- AAC FAAC v1.28 (Low Anchor) *bitrate is around 52 kbps. -q 30
These two low anchors are radically, indisputably lacking in fidelity to the original, and that's why they were deliberately selected as the low anchors.
Then the listening test organizer can filter out outlier results, like a result that says the low anchor is identical to the original, or a result that says the hidden reference has a perceptible difference from the original while reporting that a lossy file is identical to the original - it can't be.
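To make that filtering concrete, here is a minimal sketch of such post-screening, assuming a hypothetical data format where each listener's session is a dict of condition names to scores on a MUSHRA-style 1-5 scale (the condition names and the 4.5 cutoff are illustrative, not taken from the actual 2014 test):

```python
# Hypothetical post-screening sketch. Scale assumed: 1.0 (very annoying)
# up to 5.0 (imperceptible difference from the original).

def passes_post_screening(ratings):
    """ratings: dict mapping condition name -> score from one listener."""
    # A low anchor rated as (near-)transparent means the listener or their
    # setup could not hear obvious artifacts -> exclude the session.
    if ratings["low_anchor"] >= 4.5:
        return False
    # The hidden reference *is* the original; rating it below any lossy
    # condition is logically impossible for a reliable listener.
    if any(ratings["hidden_reference"] < score
           for name, score in ratings.items()
           if name != "hidden_reference"):
        return False
    return True

session = {"hidden_reference": 5.0, "low_anchor": 1.8,
           "mp3_v5": 4.6, "opus_96": 4.9}
print(passes_post_screening(session))  # True: both sanity checks satisfied
```

Real tests use more nuanced criteria, but the idea is the same: the anchors and hidden reference give the organizer objective checkpoints for throwing out unreliable sessions.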
Thank you, that clears it up.
I have seen the low anchor before, but it wasn't clear to me that this was used to remove outliers in the results. Thank you again!
how are the listeners selected?
Most people here are doing just the ABX test by themselves. ;) So it's as much a test of your ability to hear compression artifacts (or other defects/limitations) as it is a test of the system or codec (etc.). It's a way to "prove" to yourself if you can reliably hear a difference or not. Then if you want, you can report your results here on the forum - "I can ABX this" or "I can't ABX this."
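For anyone wondering what "reliably" means here: ABX results are usually judged with a one-sided binomial test against pure guessing (p = 0.5 per trial). A quick sketch, assuming nothing beyond the standard binomial formula (the 13-of-16 example is a commonly used threshold, not a universal rule):

```python
# Probability of getting at least `correct` answers out of `trials` ABX
# trials by pure guessing (one-sided binomial test, p = 0.5 per trial).
from math import comb

def abx_p_value(correct, trials):
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 13 of 16 correct: a result many ABX tools treat as a "pass"
print(round(abx_p_value(13, 16), 4))  # 0.0106
```

So 13/16 gives roughly a 1% chance of being luck, which is why it's a popular cutoff; 10/16 (p ≈ 0.23) proves nothing either way.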
If you are publishing a paper you might want to get a random selection of listeners, or you might want to use audio professionals and/or "trained" listeners, or maybe just some interested volunteers; it depends on what you're trying to do. An audio company or audio magazine is likely to use handy volunteer employees, and audio clubs or audiophile clubs might use their members. ...It looks like a lot of people occasionally "have fun" with audiophiles who are not used to blind listening tests! (AFAIK most audio companies & magazines/publications don't do blind testing.)
It’s certainly my opinion that public listening tests are next to worthless unless you have trained listeners. Trained to what extent I’m not sure, but certainly not the average Joe.
Yes, using ABX to see where your personal level of transparency lies is perfectly valid, but those results do not reflect the quality of the codec in my opinion. It’s simply a “how good is my hearing” test.
I’d also suggest that the gear used should perhaps be mentioned. I’ve recently changed my monitor setup (went from Yamaha HS7s with a sub and miniDSP to Genelec 8341As) and it’s made an enormous difference to what I can hear in codecs (no codecs mentioned here because I don’t want to break ToS).
But I guess, where do you draw the line? Do you label a codec as not transparent because a mastering engineer in a £1,000,000 studio can ABX it? No, I wouldn’t suggest so.
However, I don’t want us to call codecs transparent when an untrained person can’t tell the difference when listening in a noisy room with poor quality ear buds.
Maybe it’s always a personal thing… Thank goodness for lossless ☺️
Edit: I’d like to add that this is just an open discussion point I’m presenting. I’m by no means an authority on this, and there are certainly lossy codecs that are transparent to me at even moderate bitrates.
All of your comments are perfectly valid points requiring consideration. Which is why some formal listening test methodologies require things like
- only so-called "otologically normal" subjects as participants (age 18-25 IIRC, without any hearing disorders; this part is called pre-screening)
- pretraining of the test subjects qualified to participate - the more experience, the better usually (incl. handout of instructions before the test)
- exclusion of test results based on criteria like "mistaken hidden reference condition for a codec condition" (this part is called post-screening)
- listening in a controlled environment (listening with specific loudspeakers or headphones, with background noise below a certain threshold).
Plus what Kamedo2 said about anchors. The reasoning behind the training and those anchors is that they help the test participants understand the quality scale
they are judging stimuli on. That, in turn, makes the judgments more consistent and, therefore, facilitates identification of quality differences between codecs (or encoders) after the test since the confidence intervals usually stay smaller than without training and anchors.
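A toy illustration of that confidence-interval point, with made-up scores (the numbers are invented for demonstration, and the interval uses a plain normal approximation rather than the t-distribution a real analysis would use):

```python
# Why consistent judgments matter: mean and approximate 95% confidence
# interval for one condition's scores. Wide, overlapping intervals mean
# the test cannot statistically separate two codecs.
from statistics import mean, stdev

def ci95(scores):
    m = mean(scores)
    half = 1.96 * stdev(scores) / len(scores) ** 0.5  # normal approximation
    return m - half, m + half

trained = [4.6, 4.5, 4.7, 4.6, 4.5, 4.7]    # consistent judgments
untrained = [4.9, 3.2, 5.0, 2.8, 4.4, 3.9]  # similar mean, far more spread

print(ci95(trained))    # narrow interval
print(ci95(untrained))  # interval several times wider
```

With the spread-out scores the interval is several times wider, so two codecs whose true quality differs slightly would show overlapping intervals and the difference would be invisible in the results.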
Unfortunately, most of these things can only be realized in a lab, like the ones available to companies developing codecs. On this forum, some compromises have to be made, especially regarding the controlled environment, of course, since it can't be controlled by the test coordinator. Btw, there actually are formal test methodologies explicitly focusing on untrained listeners, especially in the speech coding area (e.g. for VoIP communication; see, e.g., section A.4 of ITU-T Rec. P.800 (https://www.itu.int/rec/T-REC-P.800-199608-I)).