New Listening Test
Reply #193 – 2006-05-21 13:02:45
Also, to be true MUSHRA, there must be at least one anchor, and the recommendation for that first anchor is a 3.5 kHz lowpass. After all, the procedure is termed MUlti Stimulus test with Hidden Reference and Anchors. I think this part is probably more important than the 100-interval scale or the labels or maybe even eliminating the side-by-side reference for each codec, because the anchors are what standardizes the ratings. I wonder if a current 48 kHz test isn't becoming too good for the original intent of MUSHRA?
ff123

My original intent was to get a better scale than the current one. I already explained why I dislike the usual 1...5 scale (in short: it is too small, and the descriptions attached to it aren't well balanced for my taste). From that perspective MUSHRA was interesting, but I didn't mean that we have to follow every MUSHRA recommendation religiously.

Anyway, previous experience shows that MUSHRA can't sensibly be used "as is" for listening tests involving HE-AAC. A 3.5 kHz lowpass as the obligatory low anchor, with contenders like HE-AAC (where the lowpass is often at or above 16 kHz), is close to nonsensical. And the high anchors (7.5 kHz lowpass) were also rated significantly lower than the HE-AAC encodings, which defeats the purpose of a high anchor!

Examples:
http://www.ebu.ch/en/technical/trev/trev_283-kozamernik.pdf
http://www.telos-systems.com/techtalk/host..._AAC_paper).pdf
and also one previous test from Roberto using the same anchors.

My idea would be to think about, and finally decide on, a new, well-balanced and universal scale for the upcoming listening tests rather than follow existing recommendations.

A quick conversion should explain why the current scale (1-5) is a problem. If you convert it to a 100% scale, it looks exactly like this:

CURRENT SCALE                          100% SCALE
transparent                     5.0    100%
perceptible but not annoying    4.0    75%
slightly annoying               3.0    50%
annoying                        2.0    25%
very annoying                   1.0    0%

75% for something that is not annoying?!
A med-iocre rating [50%: the middle of the scale] for something that is just slightly annoying? Does that sound balanced to you?

1/ I would personally say that "perceptible but not annoying" is a state, not a level, and that it leaves hardly any room for nuance. And for harsh, demanding or intolerant people, the very existence of such a state may even be nonsensical: every audible difference is already annoying (a defensible point of view). In my own perception, the "perceptible but not annoying" state corresponds to the very subtle distortions I can hear with high-bitrate lossy encodings (160 kbps or more) under specific conditions (like meticulous ABX trials), and it should sit at 95% of a 100% scale. Maybe 90% for rounding purposes, but certainly not 75% as in our current 1-5 scale.

2/ The "slightly annoying" state corresponds to artefacts or distortions I can hear without too much effort but that don't make my hackles rise. To give a concrete example, it may correspond to LAME -V5 encodings: they're not completely transparent; I don't need two hours to successfully ABX them; but the quality is pretty good despite the few audible differences I can perceive. On a 100% scale it would correspond to 75...80% (with some nuances: 85% for distortions that are close to non-irritating; 70% for artefacts that are just slightly more than slightly annoying). Would it be a scandal if collective listening tests ended with LAME VBR 130 kbps at 80...85% of full 100% quality? I don't think so.

=> My own vision of the top of a balanced scale would shift our current scale by one step (4.0 = slightly annoying and 4.8 = perceptible but not annoying).
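For anyone who wants to check the arithmetic, here is a minimal Python sketch of the linear conversion between the 1-5 scale and a 100% scale, assuming 1.0 maps to 0% and 5.0 to 100% as in the conversion quoted in this post. The function names are my own and purely illustrative; this is not part of any MUSHRA tool.

```python
def five_to_percent(grade: float) -> float:
    """Linearly rescale a 1.0-5.0 grade to 0-100% (assumed: 1.0 -> 0%, 5.0 -> 100%)."""
    if not 1.0 <= grade <= 5.0:
        raise ValueError("grade must lie between 1.0 and 5.0")
    return (grade - 1.0) / 4.0 * 100.0


def percent_to_five(percent: float) -> float:
    """Inverse mapping: 0-100% back onto the 1.0-5.0 scale."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must lie between 0 and 100")
    return 1.0 + percent / 100.0 * 4.0


if __name__ == "__main__":
    # Labels of the current 1-5 scale, as listed in the post.
    current_scale = [
        (5.0, "transparent"),
        (4.0, "perceptible but not annoying"),
        (3.0, "slightly annoying"),
        (2.0, "annoying"),
        (1.0, "very annoying"),
    ]
    for grade, label in current_scale:
        print(f"{label:<30} {grade:.1f}  {five_to_percent(grade):.0f}%")

    # Where the proposed placements would land on the old scale.
    for percent in (95, 75):
        print(f"{percent}% -> {percent_to_five(percent):.1f} on the 1-5 scale")
```

Under that assumption, 95% lands at about 4.8 and 75% at 4.0 on the old scale, which is the one-step shift described above.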
If everybody agrees with this shift, we could use the following scale:

100% = transparent / unABXable encoding
90% = perceptible but not annoying
80% = slightly annoying
70% = slightly annoying
60% = annoying but decent
50% = annoying but decent
40% = unpleasant
30% = very unpleasant
20% = bad / very bad
10% = very bad
0% = chaos, total mess or something like that

(Someone with a better English vocabulary may refine the wording.)

That way, I could imagine something like this:

MP3 ~192 kbps = 95...100%
MP3 ~160 kbps = 90%
MP3 ~130 kbps = 80%
OGG ~100 kbps = 70%
OGG ~80 kbps = 60%
HE-AAC 64 kbps = 50-55%
HE-AAC 32 kbps = 40-45%
WMA 64 kbps = 25-30%
etc...

This is of course my own vision (or imagination). But it gives me coherent positions and coherent distances between different coding solutions at different bitrates, which is something I can't get with the current scale: HE-AAC would obtain 2.0 ("annoying"), but then I don't have enough room to coherently rate a low anchor (3.5 kHz) and WMA 48 kbps. The low anchor should get 0, but the minimum is 1 out of 5.
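To make the proposal easier to play with, here is a small Python sketch that looks up the label for a given percentage on the scale proposed above. Rounding down to the nearest 10% step is my own assumption (the post does not say how in-between values such as 55% should be labelled), and the names `PROPOSED_SCALE` and `label_for` are hypothetical.

```python
# Labels of the proposed 100% scale, keyed by their 10% step.
PROPOSED_SCALE = {
    100: "transparent / unABXable encoding",
    90: "perceptible but not annoying",
    80: "slightly annoying",
    70: "slightly annoying",
    60: "annoying but decent",
    50: "annoying but decent",
    40: "unpleasant",
    30: "very unpleasant",
    20: "bad / very bad",
    10: "very bad",
    0: "chaos, total mess",
}


def label_for(percent: float) -> str:
    """Return the label of the 10% step at or below `percent`.

    Assumption: in-between ratings are rounded DOWN to the nearest step.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    step = int(percent // 10) * 10
    return PROPOSED_SCALE[step]


if __name__ == "__main__":
    # A few of the imagined placements from the post (upper bound where a range was given).
    examples = {
        "MP3 ~160 kbps": 90,
        "MP3 ~130 kbps": 80,
        "HE-AAC 64 kbps": 55,
        "WMA 64 kbps": 30,
    }
    for codec, percent in examples.items():
        print(f"{codec:<16} {percent:>3}%  {label_for(percent)}")
```

A different rounding rule (nearest step, or separate labels for the 5% marks) would be just as easy to plug in; the point is only that eleven steps leave room below HE-AAC for a low anchor.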