Pre-Test thread
Reply #21 – 2003-08-23 08:38:59
[quote]I don't think it really makes sense to do this, because it doesn't mirror a real world usage scenario. People are not going to use a different -q setting per sample to reach a set bitrate every single time they encode a different file. Instead, they pick a quality setting and stick with it. It may turn out that on this sample set Vorbis averages a little low, but on another sample set it will be the opposite. Given this, and the fact that -q0 is widely recognized as giving "64 kbps" (even the Xiph guys seem to support this on a wide scale), this is the setting that IMO should be used.[/quote]

This sounds reasonable to me. The weak point I see is the "real world usage scenario" itself. There are several theoretically conceivable ways to measure how the tested VBR codecs behave, bitrate-wise, under average real world conditions: e.g. take sales statistics for records of different genres, encode huge numbers of samples, and calculate an average bitrate weighted by those statistics (a toy sketch of that idea follows below). But every approach that comes to my mind here is simply too much effort. So two possibilities remain:

1) Take "Ogg Vorbis -q0 averages 64 kbps" as the best available assumption, because "it's widely recognized as true", OR
2) Change the overall -q setting for the test, as I've suggested.

Both have their problems:

1) We are setting up a test to extract hard, comparable figures from subjective human perception by double blind testing, statistical analysis etc., yet we would choose codec settings based on an assumption that is nothing more than "widely recognized as true". That could introduce an unquantifiable uncertainty into the results we get.
2) We "adjust" average bitrates, but we don't know how closely they mirror a real world scenario either (it's very hard to define "real world scenario" anyway), which leads to a similar uncertainty.
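Since none of those statistics actually exist, here is a purely illustrative sketch (Python, with made-up genre shares and per-genre bitrates) of what the "weighted by statistics" calculation would look like:

[code]
# Purely illustrative: estimate a "real world" average bitrate for one codec
# by weighting per-genre bitrate averages with genre market shares. All the
# numbers are invented; the real statistics are exactly what we don't have.
genre_share = {"pop": 0.40, "classical": 0.15, "latin": 0.25, "metal": 0.20}
measured_kbps = {"pop": 62.1, "classical": 58.4, "latin": 66.9, "metal": 71.3}

weighted_avg = sum(genre_share[g] * measured_kbps[g] for g in genre_share)
print(f"estimated real-world average: {weighted_avg:.1f} kbps")  # ~64.6 kbps
[/code]

Getting trustworthy values for genre_share and measured_kbps is the part that is too much effort, not the arithmetic.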
[quote]As Roberto pointed out earlier, and I agree, I think that adjusting the settings away from the common incarnations, just to "set" the bitrate on this test, calls into question its credibility.[/quote]

As I tried to explain above, both possibilities have similar problems with what you call credibility here.

[quote]IMO, it's one thing to adjust settings downward to try to reach a set bitrate (as was done with MPC in the previous test), since this should only really have the effect of worsening the results, but it's entirely different to adjust the settings upward to compensate for a lack of accuracy in encoding. If Vorbis, or any of these other codecs, happens to use too few bits per sample in its VBR mode in any given case, that points to a possible flaw in the encoding scheme, and any adjustment around this just goes to hide the very issues that we are trying to discern in the first place.[/quote]

I don't understand why you refer to Vorbis needing fewer bits on some types of music as a "lack of accuracy in encoding". One could just as well say "Vorbis is very good at encoding this type of music, because it reaches a certain quality level with fewer bits than it needs for other music". Isn't this what VBR is about? So *not* adjusting Vorbis' bitrate could be seen as punishing it for good performance.

[quote]This test is about quality, and the test subjects are VBR coders. The point of the test is to measure fidelity at a given quality mode, with bitrate being used only as a rough guideline and mode of classification (not implicit comparison). People should realize that very important fact and accept the implications that come along with it (possible VBR pitfalls) in the context of this test.[/quote]

Unfortunately, the relationship between VBR bitrate and measured quality (1-5) isn't defined mathematically; there is, for example, no linear correlation. In situations like this, a test can only deliver comparable results if one parameter is measured while the others are fixed at the same level. (You can't compare how much fuel different cars need per 100 miles by letting each one drive at a different speed.) Because of this, the bitrates have to be as close as possible; otherwise it would be useless to measure quality. (A rough sketch of how such a bitrate-matched -q setting could be found on a given sample set follows at the end of this post.)

And finally, in the test results we should be interested in the representation of, and significance for, real world usage scenarios rather than in technicalities beyond the concerns of the majority of readers (something like using -q0.x vs. -q0). As it's very hard to get a widely accepted definition of "real world scenario" (mine, for example, would consist of > 50% latin music), and even harder to get overall averaged figures for the bitrate-wise behaviour of VBR codecs, we should take what we know for sure and treat our set of test samples as a mirror of the "real world", IMHO.

Both possibilities have their pros and cons, and I can understand and will accept (do I have a choice? B) ) if rjamorim decides to stick with -q0.
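To make possibility 2) concrete, here is a minimal sketch of how one could search for the -q value at which the average bitrate over our own sample set lands near 64 kbps. oggenc's -q and -o switches are real; the sample list, the output naming and the bisection bounds are placeholders, not a tested recipe:

[code]
# Sketch of possibility 2): find the -q value at which the average bitrate
# over our own sample set hits a target (64 kbps here). Assumes oggenc is on
# the PATH and the samples are plain WAV files.
import os
import subprocess
import wave

SAMPLES = ["sample01.wav", "sample02.wav", "sample03.wav"]  # placeholder set
TARGET_KBPS = 64.0

def avg_bitrate(q: float) -> float:
    """Encode every sample at quality q, return the overall bitrate in kbps."""
    total_bits = 0.0
    total_seconds = 0.0
    for wav_name in SAMPLES:
        ogg_name = f"{wav_name}.q{q:.2f}.ogg"
        subprocess.run(["oggenc", "-q", str(q), "-o", ogg_name, wav_name],
                       check=True)
        total_bits += os.path.getsize(ogg_name) * 8
        with wave.open(wav_name) as f:
            total_seconds += f.getnframes() / f.getframerate()
    return total_bits / total_seconds / 1000.0

# Bitrate rises with q, so bisecting between two bracketing quality values
# converges on the setting whose set-wide average matches the target.
lo, hi = -1.0, 2.0
for _ in range(8):
    mid = (lo + hi) / 2
    if avg_bitrate(mid) < TARGET_KBPS:
        lo = mid
    else:
        hi = mid
print(f"use -q {(lo + hi) / 2:.2f} for ~{TARGET_KBPS:.0f} kbps on this set")
[/code]

Of course this only fixes the average on this particular sample set, which is exactly the objection discussed above.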