
Topic: What Kind Of Music For Testing? (Read 9671 times)

What Kind Of Music For Testing?
What kind of music would be good for testing MP3 and other codec comparisons? I'm going to do a blind test with some friends and family to get their opinions on which codecs and bitrates sound better, and I need some good examples. Thanks ahead of time.
  • Last Edit: 13 July, 2010, 04:47:24 AM by Native_Soulja

  • pdq
What Kind Of Music For Testing?
Reply #1
I would recommend testing with whatever kind of music you typically listen to and are most familiar with. Examples of music that certain codecs have problems with are usually specific to the codec and are not typical in any case.

Also, if you have in mind testing at fairly high bit rates then your test subjects will probably not be able to hear differences. Start with relatively low bit rates and then work your way up.

Be sure to research ABX testing so that you do proper double-blind testing.
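For anyone unsure what "proper double-blind" means in practice, here is a minimal sketch (in Python; the function names are mine, purely illustrative) of the bookkeeping behind an ABX session: on each trial X is secretly A or B, the listener only scores by identifying it, and the binomial math tells you how likely the score is by pure guessing.

```python
import random
from math import comb

def run_abx_session(num_trials, answer):
    """Each trial hides X as either A or B; `answer` is the listener's call."""
    correct = 0
    for _ in range(num_trials):
        x = random.choice(["A", "B"])   # hidden assignment for this trial
        if answer(x) == x:
            correct += 1
    return correct

def p_value(correct, trials):
    """Chance of scoring at least this well by guessing (binomial, p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# A pure guesser hovers near 50%; 12/16 correct is roughly where
# guessing becomes an unlikely explanation.
print(p_value(12, 16))   # ~0.0384, i.e. below the usual 0.05 cutoff
```

The point of the hidden assignment is exactly the double-blinding pdq mentions: neither the listener nor the person scoring knows which file X was until afterwards.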

  • Meeko
What Kind Of Music For Testing?
Reply #2
Use whatever you like and listen to regularly; that way, if something stands out, you'll know it.  If you plan on testing MP3 files and the like, it would probably be best to start low at -V7 and work up to -V0, because you may find yourself getting stuck at -V5 (approx. 130 kbps).  MP3 with the LAME encoder has gotten that good!

Good luck!
foobar2000, FLAC, and qAAC -V90
It just works people!

  • C.R.Helmrich
  • Developer
What Kind Of Music For Testing?
Reply #3
And, Native_Soulja, if you don't want to test with music you listen to regularly, but with a collection of recordings which are known to challenge most modern codecs, selected from about 100 "critical recordings" after months of listening tests, take a look at this post and the ones below:

http://www.hydrogenaudio.org/forums/index....st&p=695576

This is intended for an upcoming AAC test but should apply just as well to codecs like Vorbis, WMA, and MP3.

Chris
If I don't reply to your reply, it means I agree with you.

What Kind Of Music For Testing?
Reply #4
What kind of music would be good for testing MP3 and other codec comparisons? I'm going to do a blind test with some friends and family to get their opinions on which codecs and bitrates sound better, and I need some good examples. Thanks ahead of time.



I agree with Chris Helmrich - check the outcomes of past subjective tests of a similar nature and include some of the musical selections that are already known to produce positive results.

The choice of program material is probably the most important variable in most subjective tests.

In equipment tests it is possible to make educated guesses as to what program material will be most diagnostic. For example, if you know that the amplifiers being tested vary in high-frequency performance, program material with more than the usual amount of high-frequency information may be a big help. It's also pretty certain that program material lacking highs will bias the test towards null results.

In the case of perceptual coders, it seems that music with rapidly changing content can be more challenging to code. Codec designers may reveal the kinds of music that they have used as guides in their development efforts. There is pretty good public documentation of the musical selections used in some tests done by the MPEG group.

  • southisup
What Kind Of Music For Testing?
Reply #5
..to get their opinons on which codec and bitrates sounds better

You'll have to use either "killer samples", known to cause problems, or quite low bit rates - all the major codecs will almost certainly sound completely transparent, at common bit rates, to everyone there.
  • Last Edit: 14 July, 2010, 06:31:44 AM by southisup

What Kind Of Music For Testing?
Reply #6
..to get their opinons on which codec and bitrates sounds better

You'll have to use either "killer samples", known to cause problems, or quite low bit rates - all the major codecs will almost certainly sound completely transparent, at common bit rates, to everyone there.


I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.

  • pdq
What Kind Of Music For Testing?
Reply #7
..to get their opinons on which codec and bitrates sounds better

You'll have to use either "killer samples", known to cause problems, or quite low bit rates - all the major codecs will almost certainly sound completely transparent, at common bit rates, to everyone there.


I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.

Absent some theoretical analysis of how repeated encode/decode affects sound quality, I would be a little leery of this.

What Kind Of Music For Testing?
Reply #8
..to get their opinons on which codec and bitrates sounds better

You'll have to use either "killer samples", known to cause problems, or quite low bit rates - all the major codecs will almost certainly sound completely transparent, at common bit rates, to everyone there.


I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.

Absent some theoretical analysis of how repeated encode/decode affects sound quality, I would be a little leery of this.


Every methodology that I've seen proposed suffers from the same general potential problem: it is somehow asymmetric with the real world in some way other than just the percentage of error.

I'm under the impression that coders make different kinds of mistakes at different target bitrates. So there is one potential asymmetry.

Obviously, different music triggers different artifacts, so there is another potential asymmetry.

Coding and decoding music makes at least technical changes to the music, so successive encodings may add different amounts or kinds of errors.

Another possibility is to subtract the reconstructed coded file from the original source file to create a difference file, and then add variable amounts of the difference file back in as desired to get enough errors to be readily audible.  This approach may be asymmetric because there is no guarantee that coder errors add linearly when judged by the standard of human perception.
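That difference-file idea can be sketched in a few lines (Python; the function name and the numbers are mine, purely illustrative): subtract the decoded signal from the original, then mix the error back in at whatever gain makes it audible.

```python
def mix_in_error(original, decoded, scale):
    """out = original + scale * (decoded - original).
    scale = 0 gives the original, scale = 1 the decoded version (up to
    float rounding), and scale > 1 exaggerates the coding error."""
    return [o + scale * (d - o) for o, d in zip(original, decoded)]

original = [0.0, 0.5, -0.25, 1.0]
decoded  = [0.1, 0.45, -0.30, 0.9]   # stand-in for a codec's output

exaggerated = mix_in_error(original, decoded, 3.0)  # 3x the error signal
```

As the post itself warns, this is only a sketch of the signal arithmetic: there is no guarantee that a 3x-amplified error sounds "3x as bad" to a human listener.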

In short, nothing is perfect and you do the best you can with the tools at hand.

Being able to create test files with realistic errors at any desired level is highly desirable because it has great potential for listener training.


  • C.R.Helmrich
  • Developer
What Kind Of Music For Testing?
Reply #9
I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.

I've never tried this in blind tests, so I don't know whether it has the same effect as turning down the bit rate (in terms of bits per sample) or using highly trained listeners. Essentially, the desired effect would be that all listening grades move down the scale proportionally and that the ranking between the coders stays the same.

For simplicity, Native Soulja, if you're interested in high bit rates and your friends and family have never done blind listening tests, I recommend a test where all coders run at 96 kb/sec CBR or VBR. There, you have the highest chances of hearing clear quality differences between codecs.

Chris
  • Last Edit: 15 July, 2010, 05:25:49 AM by C.R.Helmrich
If I don't reply to your reply, it means I agree with you.

  • 2Bdecided
  • Developer
What Kind Of Music For Testing?
Reply #10
I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.
It'll certainly cause audible problems.

What's not clear is whether the severity of audible problems after, say, 100 encode/decode cycles is in any way related to, or correlated with, the number of "problem samples" that reveal audible problems after a single encode/decode, or the severity of the audible problems on those samples.

It could be that the "best" encoder, with zero problem samples at a given bitrate, was found to sound worst after 100 encode/decode cycles.

Cheers,
David.

What Kind Of Music For Testing?
Reply #11
I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.
It'll certainly cause audible problems.

What's not clear is whether the severity of audible problems after, say, 100 encode/decode cycles is in any way related to, or correlated with, the number of "problem samples" that reveal audible problems after a single encode/decode, or the severity of the audible problems on those samples.


I would suggest that using 100 encode/decode cycles as the critical point for the suggested approach is more than a little extreme.

What would be a reasonable number of iterations? 5? 10? 20? 

When we studied things like this at university, back when dinosaurs roamed the earth and vinyl, tubes, and mag tape were all we had, we started out by looking at the first few iterations.

Let's assume that there is only mild degradation in each iteration.

Then the result of the first iteration is composed of a good copy of the input plus a small error signal:

output = input + f(input), where f(input) is the error and could be just about any kind of error we can think of.

The second time through, output = input + f(input) + f(input) + f(f(input)). Since the error is small, we can drop the second-order term f(f(input)) and simplify this to output = input + 2 f(input).

Repeat as needed. The error grows linearly with the number of repetitions.
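Under those assumptions the growth really is approximately linear, which is easy to check numerically. A toy model in Python (the error function f(x) = 0.01·x is my stand-in for "any small error", not a model of a real codec):

```python
def one_generation(signal, err=0.01):
    """One encode/decode pass under the linear model:
    output = input + f(input), with the toy error f(x) = err * x."""
    return [s + err * s for s in signal]

signal = [1.0, -0.5, 0.25]
gen = signal
for n in range(1, 4):
    gen = one_generation(gen)
    worst = max(abs(g - s) for g, s in zip(gen, signal))
    print(f"after {n} passes, max error = {worst:.4f}")

# The exact error after n passes is (1.01**n - 1), which is ~0.01*n while
# n is small: that is precisely the "drop f(f(input))" approximation.
```

Note that this only holds while the accumulated error stays small; once it doesn't, the dropped higher-order terms take over, which is where the later objections in this thread come in.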

Quote
It could be that the "best" encoder, with zero problem samples at a given bitrate, was found to sound worst after 100 encode/decode cycles.


Can you provide a convincing argument for this assertion?

  • pdq
What Kind Of Music For Testing?
Reply #12
In the analog world what you are saying is very reasonable.

However, in the digital world of perceptual encoding it may be very different. In attempting to encode artefacts from the previous encode, does it simply make the same artefacts twice as big? And can you generalize this to all encoders of all codecs?

What Kind Of Music For Testing?
Reply #13
In the analog world what you are saying is very reasonable.

However, in the digital world of perceptual encoding it may be very different.


Or not. Your speculations are somehow more reliable than math?

Quote
In attempting to encode artefacts from the previous encode does it simply make the same artefacts twice as big? And can you generalize this to all encoders of all codecs?


The math is classic and irrefutable as far as it goes. If you want to attack the hypothesis, you have to refute its assumptions.

Got game at the same level as the hypothesis?  ;-)

  • lvqcl
  • Developer
What Kind Of Music For Testing?
Reply #14
Quote
I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.


I tried to do this with lossyWAV (0.wav is the original file; 1.wav, 2.wav, etc. are subsequent generations).

foo_bitcompare shows the number of different samples:

0.wav and 1.wav: Differences found: 16513693 sample(s)
1.wav and 2.wav: Differences found: 1103325 sample(s)
2.wav and 3.wav: Differences found: 27242 sample(s)
3.wav and 4.wav: Differences found: 777 sample(s)
4.wav and 5.wav: No differences in decoded data found.

Oops.
  • Last Edit: 16 July, 2010, 07:34:38 AM by lvqcl

What Kind Of Music For Testing?
Reply #15
Quote
I've also used another technique - encode and decode a sample over and over again. This works very well with hardware listening tests.


I tried to do this with lossyWAV (0.wav is the original file; 1.wav, 2.wav, etc. are subsequent generations).

foo_bitcompare shows the number of different samples:

0.wav and 1.wav: Differences found: 16513693 sample(s)
1.wav and 2.wav: Differences found: 1103325 sample(s)
2.wav and 3.wav: Differences found: 27242 sample(s)
3.wav and 4.wav: Differences found: 777 sample(s)
4.wav and 5.wav: No differences in decoded data found.

Oops.


That suggests rather strongly that lossyWAV violates the assumption that the number of errors in the initial pass is small.

I would be surprised if 0.wav and 1.wav were difficult to distinguish from each other.

Can you post them?

  • pdq
What Kind Of Music For Testing?
Reply #16
That suggests rather strongly that lossyWAV violates the assumption that the number of errors in the initial pass is small.

I would be surprised if 0.wav and 1.wav are difficult to distinguish from each other.

Can you post them?

All it suggests is that, with very small adjustments to the lowest bits of each 16-bit value, the data become much easier to encode losslessly. Why would you think that the difference is necessarily audible?

  • lvqcl
  • Developer
What Kind Of Music For Testing?
Reply #17
I would be surprised if 0.wav and 1.wav are difficult to distinguish from each other.

Can you post them?

Sorry, but they are too long (about 5 minutes each).

What about a simplified version that just zeroes the 2 least significant bits of each sample? The second generation is then obviously equal to the first, and so on.

  • 2Bdecided
  • Developer
What Kind Of Music For Testing?
Reply #18
When we studied things like this at university, back when dinosaurs roamed the earth and vinyl, tubes, and mag tape were all we had, we started out by looking at the first few iterations.

Let's assume that there is only mild degradation in each iteration.

Then the result of the first iteration is composed of a good copy of the input plus a small error signal:

output = input + f(input), where f(input) is the error and could be just about any kind of error we can think of.

The second time through, output = input + f(input) + f(input) + f(f(input)). Since the error is small, we can simplify this to output = input + 2 f(input).

Repeat as needed. The error grows linearly with the number of repetitions.
You can't do that though - not with psychoacoustic codecs. The process isn't linear, and the error isn't small.

e.g. think about time domain smearing. That just spreads and spreads and spreads. Your arithmetic "output = input + 2 f(input)" doesn't work at all.


btw, I'm not sure it always worked for analogue. You had to apply some common sense even back then.

e.g. It doesn't work for VHS tape. There are lots of dynamic processes in recording and playback. At some point, the sync pulses get lost, and there's no picture at all. Let's say that happens after 10 generations. Your arithmetic says the error after one generation is only 1/10th of this. Yet it also happens after 100 generations. Your arithmetic says the error after one generation is only 1/100th of this.

I'm not sure one generation of VHS gives you 1/10th or 1/100th of total signal loss.

Cheers,
David.

What Kind Of Music For Testing?
Reply #19
That suggests rather strongly that  lossyWAV violates the assumption that the number of errors in the initial pass is small.

I would be surprised if 0.wav and 1.wav are difficult to distinguish from each other.

Can you post them?

All it suggests is that with very small adjustments in the lowest bits of each 16 bit value the data become much easier to encode losslessly. Why would you think that the difference is necessarily audible?


It suggests that as well.

It's a pathological example.  One could cut to the chase and simply strip off the low-order bit. There would be no changes at all in successive passes, since the bit is already gone and there can be no other changes in successive passes.

Knowing that this is the nature of the degradation, an intelligent person would act wisely and prepare the progressively degraded samples by stripping off an *additional* low-order bit for each pass. At least, that's what I did when I built the samples for www.pcabx.com.
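The progressive version described here, one extra low-order bit per step rather than repeating the same strip, looks like this as a sketch (Python; the function name is mine, not something from pcabx):

```python
def strip_low_bits(samples, n):
    """Zero the n lowest bits of each sample; larger n = more degradation."""
    mask = ~((1 << n) - 1)
    return [s & mask for s in samples]

pcm = [12345, -6789, 255]
for n in range(1, 5):
    print(f"{n} bit(s) stripped:", strip_low_bits(pcm, n))
```

Unlike re-running an identical one-bit strip, each step here removes genuinely new information, so the degradation keeps increasing with each "generation" instead of converging after the first.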

I'm sure that Forrest Gump had a saying that described this pathological example. ;-)

If people rejected every idea for which there was a pathological counter-example, we would probably still be back in dark ages.


  • 2Bdecided
  • Developer
What Kind Of Music For Testing?
Reply #20
If people rejected every idea for which there was a pathological counter-example, we would probably still be back in dark ages.
Funny how the examples that disprove your theories are always "pathological", "straw men", "extreme" etc etc.

If people always held on to theories that were disproved, we would probably still be back in the dark ages.

Cheers,
David.

  • Alex B
What Kind Of Music For Testing?
Reply #21
Assuming the goal is to test modern encoders using settings that are intended to produce transparent quality, it doesn't make sense to artificially reinforce the inaudible artifacts. The developers try to improve the lossy encoders only to the point where the artifacts become practically inaudible. Once the handling of a certain kind of signal has been tweaked to be good enough, the next goal is normally to fix newly found audible artifacts that occur with different problem samples, not to further reduce the already-fixed problems.

The only sensible way to test nearly transparent encoders is to find a good selection of problem samples that can produce useful test results when the encoders are used in the normal, intended way.

What Kind Of Music For Testing?
Reply #22
Assuming the goal is to test modern encoders using settings that are intended to produce transparent quality, it doesn't make sense to artificially reinforce the inaudible artifacts.


The problem with this approach is that it seems to presume that any given artifact is either inaudible to everybody or audible to everybody.

IOW, there are exactly two pigeonholes, every artifact goes into one pigeonhole or the other, and it goes into the same pigeonhole for everybody.

Reality is that our ability to hear artifacts varies from person to person, and also varies for the same person at different times.

Our highest goal would probably be that no artifact would be heard by anybody at any time.

How do we test encoders so that both the bitrate and the number of artifacts are minimized?

How do we test a coder with a finite number of tests that take a reasonable amount of time, and still end up with very high confidence in a coder that never creates an audible artifact?

With amplifiers, we can say that if all nonlinear distortion and other artifacts are 100 dB down, then nobody will ever hear any artifacts. We then build amplifiers with all artifacts 110 dB down.
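For reference, "100 dB down" corresponds to a tiny linear amplitude, which is why that engineering margin works. The conversion, sketched in Python:

```python
def db_to_amplitude_ratio(db):
    """Convert a level in dB to a linear amplitude ratio (20*log10 convention)."""
    return 10 ** (db / 20)

print(db_to_amplitude_ratio(-100))   # 1e-05: one part in 100,000
print(db_to_amplitude_ratio(-110))   # ~3.16e-06
```

The hard part for encoders, as the post says, is that there is no single number like this that bounds perceptual-coding artifacts across all program material.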

How do you do the same thing for encoders, and do so with any confidence?

What Kind Of Music For Testing?
Reply #23
I would be surprised if 0.wav and 1.wav are difficult to distinguish from each other.

Can you post them?

Sorry but they are too long (about 5 min long).

What about simplified version that just zeroes 2 least significant bits of each sample? Second generation is apparently equal to 1st, etc.


Oh, you want to change the basic nature of the examples?

I don't see what that would show, other than that you need to make basic changes to your examples in order to demonstrate your point.

What about posting shorter but useful subsets of the 5 minute selections?

One other comment: the criterion you used - any change in any part of any sample - seems to be far more nonlinear than hearing. Your example probably fails to apply because of that.

What Kind Of Music For Testing?
Reply #24
In attempting to encode artefacts from the previous encode does it simply make the same artefacts twice as big?



I'm presuming that it does not make the artefacts twice as big. I'm assuming that it makes the same audible artefacts as it did the first time, and that they in some sense add.

Quote
And can you generalize this to all encoders of all codecs?


*All* is a very big word, so the answer has to be "maybe".

This was intended to be one of those things that you try and see what happens.