New Public Multiformat Listening Test (Jan 2014)
Reply #337 – 2013-12-27 09:14:24
Finally, the puzzle of equal bitrates for VBR encodes is solved. Here is how.

A set of test samples L for a listening test can be obtained in two ways: (1) sampled from some population of music S, or (2) chosen independently according to some criteria, for example problem samples.

In case (1), if the test set is sampled properly (large enough and at random), the target bitrates of the encoders come out the same whether they are calculated over S or over L, so it doesn't matter which sound material is used: the whole population S or the selected samples L. If settings can be found that make the target bitrates equal, the results of such a listening test compare the encoders' VBR efficiency. If the bitrates can't be made equal (because the q-parameters are discrete), the results compare the encoders at specific settings. Such specific settings can only be of one kind: natural (integer) ones. Since the bitrates can't be made equal over S, and consequently over L, any other setting is arbitrary and has no meaning.

In case (2), the test set L is predefined and the population of music S it is drawn from is undefined (a population of problem samples would be the best guess). Consequently, there is no way to calculate the bitrates (and the corresponding settings) over S. Any attempt to do so with some other music population leads to random variance of the bitrates on the test set L, because L is not representative of a music population chosen out of the blue. That random variance in turn adds variance to the results, making them less accurate. Thus in case (2) the target bitrates can be calculated only over the test set L (no other sound material is present in the context of such a listening test). As in the first case, there are two choices: make the bitrates equal over the test set L (the results then compare the encoders' VBR efficiency), or use natural (integer) values (the results then compare popular settings).
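A minimal sketch of the two choices for case (2), assuming the per-sample bitrates have already been measured on the test set L at each integer setting of a hypothetical quality parameter q whose bitrate rises with q. The table, its numbers, and the function names are illustrative only; `interpolated_setting` additionally assumes the encoder accepts fractional settings and that the set-average bitrate is roughly linear between neighbouring integer settings.

```python
from statistics import mean


def mean_bitrates(table: dict[int, list[float]]) -> dict[int, float]:
    """Average bitrate (kbps) over the whole test set L for each integer setting q."""
    return {q: mean(rates) for q, rates in table.items()}


def nearest_integer_setting(table: dict[int, list[float]], target: float) -> int:
    """Natural (integer) setting whose set-average bitrate is closest to the target."""
    avg = mean_bitrates(table)
    return min(avg, key=lambda q: abs(avg[q] - target))


def interpolated_setting(table: dict[int, list[float]], target: float) -> float:
    """Fractional setting that hits the target exactly on L, by linear interpolation
    between the two neighbouring integer settings (hypothetical: assumes bitrate
    rises monotonically with q and fractional settings are accepted)."""
    avg = mean_bitrates(table)
    qs = sorted(avg)
    for lo, hi in zip(qs, qs[1:]):
        if avg[lo] <= target <= avg[hi]:
            return lo + (target - avg[lo]) / (avg[hi] - avg[lo]) * (hi - lo)
    raise ValueError("target bitrate lies outside the measured range")


# Hypothetical bitrates of three test samples at settings q = 1, 2, 3.
test_set_L = {
    1: [90.0, 110.0, 100.0],   # set average 100 kbps
    2: [120.0, 140.0, 130.0],  # set average 130 kbps
    3: [150.0, 170.0, 160.0],  # set average 160 kbps
}
print(nearest_integer_setting(test_set_L, 128.0))         # 2
print(round(interpolated_setting(test_set_L, 128.0), 3))  # 1.933
```

The first function corresponds to the "popular settings" comparison, the second to the "equal bitrates over L" comparison; both are computed only from the test set itself, as the argument for case (2) requires.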
All other settings are arbitrary and have no meaning. In case (1) the results of the listening test are biased towards the population of music S chosen for the test (some genre or a mix of genres). In case (2) the results are biased towards the particular test set L. Case (1) needs many more sound samples in the test set because its results are meant to generalize to the whole population S. All listening tests that were ever conducted belong to case (2): the test set was chosen according to some criteria (problem samples, usual samples, ...) but never sampled from a population as in (1). The reason is quite obvious: with the larger number of samples that case (1) needs, the test becomes labor-intensive, yet the results are hardly better than with problem samples.
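To illustrate why case (1) needs a large test set, the sketch below repeatedly draws random sets L of different sizes from a hypothetical population S and measures how much the set-average bitrate fluctuates from draw to draw. The population is synthetic (normally distributed bitrates) and exists only for illustration; the same arithmetic applies to any per-sample quantity, including listening scores. For random sampling the spread shrinks roughly as 1/√n, so halving it takes about four times as many samples.

```python
import random
from statistics import mean, pstdev


def spread_of_sample_mean(population: list[float], n: int, trials: int,
                          rng: random.Random) -> float:
    """Standard deviation, across repeated random draws of n items, of the
    set-average value (here: average bitrate of a random test set L)."""
    means = [mean(rng.sample(population, n)) for _ in range(trials)]
    return pstdev(means)


rng = random.Random(2014)
# Hypothetical population S: per-track VBR bitrates in kbps, synthetic data.
population_S = [rng.gauss(130.0, 15.0) for _ in range(10_000)]

for n in (5, 20, 80):
    spread = spread_of_sample_mean(population_S, n, 2000, rng)
    print(f"n = {n:3d}: spread of mean bitrate over L = {spread:.2f} kbps")
```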