Proposition for database of listening test samples

Please comment on this idea:

We create an online database of listening test samples, then define some subsets:

1a. Samples that must be tested in 24-48kbps
1b. Other samples that are candidates for 24-48kbps test
2a. Samples that must be tested in 56-96kbps
2b. Other samples that are candidates for 56-96kbps test
3a. Samples that must be tested in 112-320kbps
3b. Other samples that are candidates for 112-320kbps test
4. Cases specific for particular encoder/format
5. Cases where some people hear problems even at high bitrates

Bitrates are only approximations and actually represent 3 levels of encoding quality (low bitrate, acceptable for portable, near transparent).
We can also create subsets for 4 levels.

When testing only one format, we also include samples that are a problem only for that format (from 4) in the corresponding subsets 1a-3b. For each test case in 4, the bitrate at which it starts to produce problems is recorded.
Subsets 1a, 2a and 3a should contain only a small number of known problematic samples for the target bitrate and should always be included in a test for that bitrate. They should represent problems at that bitrate, but not total misuse of low bitrate (e.g. fatboy should not be in 1a).

Samples in subsets 1b, 2b and 3b should represent a wide variety of music (different genres).
Some samples from subset 1b, 2b or 3b are chosen based on public lottery numbers.
The chosen samples, together with the corresponding 1a, 2a or 3a, then form the test samples.

Which subset a sample belongs to is decided based on previously conducted public listening tests.
A sample can be in more than one subset, but subsets 1a and 1b, 2a and 2b, 3a and 3b don't intersect.
New samples can be added, and a sample can change the subset it belongs to.
To add a new sample, a person proposes one. It should then be tested by at least 2 people using different encoders/formats.
Based on the results, it is decided which subsets it should be added to.
A sample can also be removed from the database if there is consensus that it is not interesting anymore.

Subset 5 is special: samples belonging to it can also be part of other subsets. However, 3a and 5 shouldn't intersect, because most people cannot distinguish a sample from 5 from the original at high bitrates.
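
The non-intersection rules above (1a/1b, 2a/2b, 3a/3b, and 3a/5) could be checked automatically whenever the database changes. Here is a minimal sketch, assuming the database is simply a mapping from subset name to a set of sample names. The function name `find_conflicts`, the `db` layout and the sample names are made up for illustration and are not part of the proposal.

```python
# Pairs of subsets that, per the proposal, must not share any sample.
DISJOINT_PAIRS = [("1a", "1b"), ("2a", "2b"), ("3a", "3b"), ("3a", "5")]


def find_conflicts(subsets):
    """Return (left, right, shared_samples) for each violated rule."""
    conflicts = []
    for left, right in DISJOINT_PAIRS:
        shared = set(subsets.get(left, ())) & set(subsets.get(right, ()))
        if shared:
            conflicts.append((left, right, sorted(shared)))
    return conflicts


# Hypothetical database contents; the sample names are placeholders.
db = {
    "1a": {"sample_A"},
    "1b": {"sample_A", "sample_B"},  # sample_A is in both 1a and 1b
    "3a": {"sample_C"},
    "5": {"sample_B", "sample_D"},   # overlap with 1b is allowed
}
print(find_conflicts(db))
```

Only the pairs named in the proposal are checked, so a sample can still sit in, say, 1b and 5 at the same time.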

All samples should be very short (< 20s, preferably 8-15s). If a sample previously used in a public listening test is longer, it should be shortened (by removing uninteresting parts) or split into 2 parts.
A large number and wide variety of samples should compensate for the short duration.
Short samples attract more people to a listening test and help them stay concentrated.
The online storage space issue would also be easier to deal with.

Final decisions about adding or removing a sample will be problematic in some cases, and I hope we will find a solution for that. Maybe some board of well-known HA members.


Reply #2
IMO there should be only a small number of samples in 1a, 2a and 3a (maybe 4-6 per subset).
These should be samples that all or most encoders of interest cannot encode transparently at the target bitrate of the group, and for which most people can hear differences. A good start would be samples from previous tests for which most people were able to distinguish the encoded version from the original for almost all encoders.

To make one more thing clear: by public lottery I mean results from a real public lottery like http://www.national-lottery.co.uk/player/p...ults/results.do

Reply #3
To make one more thing clear: by public lottery I mean results from a real public lottery like http://www.national-lottery.co.uk/player/p...ults/results.do
That's a good proposal, provided it is done with lottery numbers which aren't available yet. For example, assign each sample a number (using any criterion, it doesn't matter), then use next week's lottery numbers to select the samples. It might seem obvious, but it is important. Also, how do you map M samples to a lottery with balls labeled 1..N? Perhaps using another "public" random number (like the first ball drawn in next week's lottery) to seed a strong PRNG is the best way to go. That way you can choose K samples from a pool of M for any M and K.
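
One way this could look in practice is sketched below. It is a variation on the "seed a strong PRNG" idea rather than a literal PRNG: each sample is ranked by the SHA-256 hash of the public seed combined with its ID, and the K lowest-ranked samples are taken. Anyone can recompute the selection once the lottery number is published. The function name `pick_samples`, the sample IDs and the seed value `"17"` are hypothetical.

```python
import hashlib


def pick_samples(sample_ids, k, lottery_seed):
    """Deterministically pick k distinct samples using a public seed."""
    pool = sorted(set(sample_ids))
    if not 0 <= k <= len(pool):
        raise ValueError("k must be between 0 and the pool size")

    def rank(sample_id):
        # Hash seed and ID together; the digest order is the draw order.
        data = f"{lottery_seed}:{sample_id}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    # Ties are practically impossible, but break them by ID anyway.
    return sorted(pool, key=lambda s: (rank(s), s))[:k]


# Hypothetical pool of M = 30 samples; "17" stands in for the first
# ball drawn in a lottery that has not taken place yet.
pool = ["sample_%02d" % i for i in range(1, 31)]
print(pick_samples(pool, 5, "17"))
```

The result does not depend on the order in which the samples are listed, only on their IDs and the seed, so the sample list should be frozen and published before the draw.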

Still, the picking of which samples are a and which are b is a little problematic. If you are going to such lengths to ensure fairness, you need to address this step more clearly.

Reply #4
Additional note: subsets 1a-3a should exist because randomly chosen samples from 1b-3b might not provide a good enough challenge for a listening test, and encoders may falsely get too high a grade.

Reply #5
Additional note: subsets 1a-3a should exist because randomly chosen samples from 1b-3b might not provide a good enough challenge for a listening test, and encoders may falsely get too high a grade.

I agree with you completely, but the difficulty is in selecting the samples in 1a-3a in a way which is "fair" to all codecs. If you select well known problem samples for a particular codec, the result could be significantly skewed.

Reply #6
I agree with you completely, but the difficulty is in selecting the samples in 1a-3a in a way which is "fair" to all codecs. If you select well known problem samples for a particular codec, the result could be significantly skewed.

I also agree with you, it is difficult to make a fair decision.
Of course, problem samples particular to a codec or format should be in subset 4 and not in 1a-3a.

Reply #7
Addition for choosing samples:

3 groups of people are differentiated:
1. Everybody: anyone can suggest which sample should be added to or removed from a group (group and subset are synonyms)
2. Recognized HA members (i.e. members who joined before 2007 or who have more than 50 posts). ABX results from these people can be trusted
3. Administrators of the database: people interested in actively administering the database and chosen by Recognized HA members. These people decide (by voting if necessary) whether a sample should be added to or removed from a group. Their decisions should be determined only by posted ABX results of Recognized HA members, and should be logged.
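
The "Recognized HA member" rule in point 2 is simple enough to write down as a check. A rough sketch follows, assuming the forum can supply a join date and post count per member. The `Member` type, its field names and the example members are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    name: str
    joined: date
    posts: int


def is_recognized(member):
    """Joined before 2007, or more than 50 posts (either is enough)."""
    return member.joined < date(2007, 1, 1) or member.posts > 50


# Hypothetical members, only to show how the rule applies.
members = [
    Member("old_timer", date(2003, 5, 1), 12),
    Member("active_newcomer", date(2008, 2, 10), 300),
    Member("new_lurker", date(2008, 2, 10), 3),
]
trusted = [m.name for m in members if is_recognized(m)]
print(trusted)
```

Administrators would then only count ABX results posted by members for whom this check passes.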

Maybe it was not obvious, but which samples belong to 1a-3a should be decided before the decision to start a public listening test. Only small changes are allowed after discussion about a listening test has started. Any such change will be noted and should be posted together with the results of the listening test.

This makes the choice of samples for a listening test much more transparent. Previously, the organizer of a test made the final decision about which samples were chosen. With this database, some samples will be chosen before it is even decided that a test should be conducted, and some will be chosen by a public lottery. Any specific modifications to the chosen samples will be noted and known to the public.

Reply #8
Should the a) groups contain music from all genres, like at least one pop/rock, one jazz, one classical and one whatever sample, or can you also have all or multiple samples from one genre? I'd say the genres should be different.

Reply #9
Having different genres in the a) groups is a good idea IMO.

Reply #10
Is there any public interest in such a database?
Any opinions on whether having such a database would make tests more relevant?
Any opinions at all?

Reply #11
I guess the general interest is not that big since I am actually the only one who organizes public listening tests more or less regularly. However, for me, such a database would be a huge help since choosing samples for a listening test would be much easier and less time consuming.
Also, I think maintaining the DB is not going to be a huge problem and I would be more than happy to help setting it up.

Reply #12
Haha, you guys are creating a whole hierarchical political system here

So, say I wanted to conduct a listening test with 18 samples (that will probably happen by the end of the year, but I'll discuss that more thoroughly later). How would I go about using the database to select samples? I'm quite confused.

Reply #13
You define what bitrate your test should cover. Based on that, you choose samples from the respective group (there are two or three groups with samples that should fit the respective bitrates).
Each group has a set of mandatory samples that have to be tested (four or five samples covering the most important genres) and a large group of samples that can be tested. So you choose all samples from the mandatory set and then wait for the public lottery to tell you which samples to choose from the optional set.

Of course you don't have to use the DB if you don't want to. I just thought it might be helpful, and since a public lottery is a neutral party, people won't complain "Why did you choose samples xyz, zyx and yxz? You knew encoder abc performs worse on these and you did it on purpose".

 