
Are my ears broken?

Reply #50
So what criteria do you use in defining the performance of a standard playback system for testing codecs?

That's a good question, one for which I don't have an answer.  I believe people should conduct DBTs on their own equipment set up in the most typical fashion for them.  While I think public listening tests are useful and can provide a general baseline for people, I put more emphasis on personal listening tests.  In the case of public listening tests, I think there should be some type of control over playback hardware and environments, but I don't think it's a make or break situation.  If people really want to know what codec and settings to use, they should perform their own personal tests rather than rely on results from public tests.

My initial point to which you objected was to address the audiophile myth that audiophile grade components are required to most easily distinguish lossless from lossy.  The people who perpetuate this myth are usually the very same people who think they can tell night and day differences between lossy and lossless but have never conducted a well-controlled double blind test.

Are my ears broken?

Reply #51
So what criteria do you use in defining the performance of a standard playback system for testing codecs?

That's a good question, one for which I don't have an answer.  I believe people should conduct DBTs on their own equipment set up in the most typical fashion for them.  While I think public listening tests are useful and can provide a general baseline for people, I put more emphasis on personal listening tests.  In the case of public listening tests, I think there should be some type of control over playback hardware and environments, but I don't think it's a make or break situation.  If people really want to know what codec and settings to use, they should perform their own personal tests rather than rely on results from public tests.

My initial point to which you objected was to address the audiophile myth that audiophile grade components are required to most easily distinguish lossless from lossy.  The people who perpetuate this myth are usually the very same people who think they can tell night and day differences between lossy and lossless but have never conducted a well-controlled double blind test.


Thanks. I don't disagree with you regarding the myth of needing audiophile-grade components to hear codec artifacts. The problem I had with the term "audiophile component" is that, as defined, it's too vague and meaningless in terms of implied performance. My experience is that the term audiophile is often misused and only implies "high price tag", which is not necessarily a good indicator of a component's sound quality and reliability.

I think we agree that frequency response aberrations, distortion/noise, room reflections, and listener training/hearing/aptitude are all nuisance variables that can influence the results of codec listening tests. High-priced audio components don't necessarily help control these variables, and may in fact contribute to the problem.

If these nuisance variables are not well-controlled in public tests, then you may expect to get different results from different sites/people, and it may be difficult to reach consensus about the quality of the codec.


Cheers
Sean
Audio Musings

Are my ears broken?

Reply #52
If these nuisance variables are not well-controlled in public tests, then you may expect to get different results from different sites/people, and it may be difficult to reach consensus about the quality of the codec.

I hope by consensus you mean that one skilled listener can overrule your more mediocre participants. At higher bit rates you should expect results to diverge as people reach the limits of their hearing abilities at different points.

Are my ears broken?

Reply #53
If these nuisance variables are not well-controlled in public tests, then you may expect to get different results from different sites/people, and it may be difficult to reach consensus about the quality of the codec.

I hope by consensus you mean that one skilled listener can overrule your more mediocre participants. At higher bit rates you should expect results to diverge as people reach the limits of their hearing abilities at different points.


By consensus, I meant that you should get better agreement in test results (based on the number of positive ABX responses, or similar MOS or MUSHRA ratings) from different sites/listeners when all variables are well controlled.

If the nuisance variables are not well controlled, you would expect to get less agreement and more noise in the test results.

Cheers
Sean
Audio Musings

Are my ears broken?

Reply #54
By consensus, I meant that you should get better agreement in test results (based on the number of positive ABX responses, or similar MOS or MUSHRA ratings) from different sites/listeners when all variables are well controlled.

If the nuisance variables are not well controlled, you would expect to get less agreement and more noise in the test results.

There are other things that can cause noise in test results. Don't you also expect to get more noise in the results as you approach transparency? For a codec that is nearly transparent, you'll have some listeners that can reliably distinguish the difference and some that cannot. If you've removed every nuisance variable you can think of, you can assume that noise is not due to those variables, but you don't definitively know what variables you have not eliminated. When someone fails to distinguish a difference, you don't know if that is due to a limitation of the test or a limitation of the listener. When someone is able to reliably distinguish, in a properly calibrated and executed ABX, you know that it is due to the codec. In your analysis of results, you need to ignore the noise of those who could not distinguish and concentrate on those who could. This is not a consensus process as you've described it. Building consensus typically involves persuasion. We know that listeners are easily persuaded. We don't want to persuade anyone. We want to know what they're hearing.
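For reference, the usual way to decide whether a listener has "reliably distinguished" A from B in an ABX run is a one-sided binomial test against guessing. A minimal sketch in Python (the 16-trial run and the example scores are purely illustrative, not numbers from any test discussed here):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: probability of getting at least
    `correct` right out of `trials` purely by guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Example: 13 of 16 correct identifications
print(f"p = {abx_p_value(13, 16):.4f}")  # ~0.0106 -- unlikely to be guessing
print(f"p = {abx_p_value(10, 16):.4f}")  # ~0.2272 -- consistent with guessing
```

One caveat when "concentrating on those who could": if many listeners are tested and only the single best score is singled out, the chance of a false positive grows, so that selection has to be accounted for in the analysis.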

If you're testing for transparency, you might actually want to leave some of the "nuisance variables" in, as they make or break distinguishability for some listeners. Some examples: you will most likely degrade listening performance if you require subjects to listen on headphones, or prohibit them from adjusting their listening position to their liking. Audiophiles and recording engineers will tell you that they can't do their best listening on an unfamiliar system. There is some evidence indicating that acoustic reflections in the listening space enhance the audibility of timebase errors.

So, in answer to a question you asked earlier, I'm going with the gist of greynol's answer: you want to do codec tests with a variety of listeners in a variety of realistic listening scenarios. Give your listeners latitude to attack the challenge creatively. Results may look noisier, and you'll have more work to do in back-end analysis, but you're casting a wider net, and that's what you need to do at this point to advance the art.

Are my ears broken?

Reply #55
My point was that for any given individual, personal tests should take precedence over public tests, and that those personal tests should be conducted with the equipment and levels (volume, EQ) that the listener typically uses.  I do agree with all that you've said, though.

Are my ears broken?

Reply #56
Quote
There are other things that can cause noise in test results. Don't you also expect to get more noise in the results as you approach transparency?...

Yes, agreed.
Quote
This is not a consensus process as you've described it. Building consensus typically involves persuasion. We know that listeners are easily persuaded. We don't want to persuade anyone. We want to know what they're hearing.


I'm not talking about persuading listeners but rather convincing ourselves (i.e. scientists) that the results from the CODEC tests are valid and can be explained by differences in the CODECS and not some uncontrolled variable.

Quote
If you're testing for transparency, you might actually want to leave some of the "nuisance variables" in, as they make or break distinguishability for some listeners. Some examples: you will most likely degrade listening performance if you require subjects to listen on headphones, or prohibit them from adjusting their listening position to their liking. Audiophiles and recording engineers will tell you that they can't do their best listening on an unfamiliar system. There is some evidence indicating that acoustic reflections in the listening space enhance the audibility of timebase errors.


I agree with you 100% that we want to know what the listeners are hearing perceptually, but let's make sure that the signals delivered to their ears are physically well-defined and controlled so we can a) learn something about the psychoacoustics of the codecs and b) ensure the experiment is repeatable.
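As one concrete example of keeping the delivered signals well-defined (my example, not something specified above): a routine control in this kind of test is matching the playback level of the reference and the coded clip, since a level mismatch between them can be mistaken for a quality difference. A minimal RMS level-matching sketch, assuming both clips are already decoded to float sample arrays:

```python
import numpy as np

def match_rms(reference: np.ndarray, coded: np.ndarray) -> np.ndarray:
    """Scale `coded` so its RMS level matches `reference` (floats in [-1, 1])."""
    rms_ref = np.sqrt(np.mean(reference ** 2))
    rms_cod = np.sqrt(np.mean(coded ** 2))
    return coded * (rms_ref / rms_cod)

# Example with synthetic signals: a 1 kHz tone and a copy 1 dB quieter
fs = 44100
t = np.arange(fs) / fs
ref = 0.5 * np.sin(2 * np.pi * 1000 * t)
low = ref * 10 ** (-1 / 20)                  # 1 dB down
matched = match_rms(ref, low)
residual_db = 20 * np.log10(np.sqrt(np.mean(matched ** 2)) /
                            np.sqrt(np.mean(ref ** 2)))
print(f"residual level difference: {residual_db:.3f} dB")  # ~0.000 dB
```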

Quote
So, in answer to a question you asked earlier, I'm going with the gist of greynol's answer: you want to do codec tests with a variety of listeners in a variety of realistic listening scenarios. Give your listeners latitude to attack the challenge creatively. Results may look noisier, and you'll have more work to do in back-end analysis, but you're casting a wider net, and that's what you need to do at this point to advance the art.


Again, I have no problem with this (note: I do similar experiments to study psychoacoustic interactions between different loudspeakers, room acoustics, and trained vs. naive listeners) as long as it is part of the experimental design and analysis. Otherwise you are missing a great opportunity to better understand how these variables (trained vs. untrained listeners, loudspeakers versus headphones, room acoustics, different program material, etc.) influence the perception of CODECS. This is my main point.

If you are already doing this in the public tests, then please forgive me for stating the obvious.

Cheers
Sean
Audio Musings

Are my ears broken?

Reply #57
So I just did the audiocheck.net frequency test that was supplied earlier in this thread, and I can hear the sweep right from the beginning, when it's at 22 kHz. Also, as it goes down, it seems higher and louder to me and actually hurts my ears. I can see why it would hurt my ears ascending, because as the frequency moves more into my audible range it would be louder, I think? How is this possible, though? I'm aware that nobody can hear up to 22 kHz, and especially not me, considering I did a test the other day with a pure sine wave and could only hear up to 16.5 kHz. What am I doing wrong? Sorry if I should've started another thread; this is my first time posting on this site, btw.

Are my ears broken?

Reply #58
So I just did the audiocheck.net frequency test that was supplied earlier in this thread, and I can hear the sweep right from the beginning, when it's at 22 kHz. Also, as it goes down, it seems higher and louder to me and actually hurts my ears. I can see why it would hurt my ears ascending, because as the frequency moves more into my audible range it would be louder, I think? How is this possible, though? I'm aware that nobody can hear up to 22 kHz, and especially not me, considering I did a test the other day with a pure sine wave and could only hear up to 16.5 kHz. What am I doing wrong? Sorry if I should've started another thread; this is my first time posting on this site, btw.


Okay, and now I can only hear up to 19 kHz. This is very confusing!

Edit: Never mind, I just replicated what I heard the first time at 22 kHz with the Aliasing test. Looks like I need a new soundcard.
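For anyone who hits the same thing: a tone above half of whatever sample rate the playback chain actually runs at gets folded ("aliased") back into the audible band if the resampling isn't properly filtered, which is what audiocheck.net's aliasing test is meant to expose. A rough numerical sketch of the effect (the 32 kHz device rate and the crude sample-dropping resampler are invented for illustration, not a claim about any particular soundcard):

```python
import numpy as np

fs_in = 96_000          # rate the test tone is generated at
f_tone = 22_000         # tone near the top of / above most adults' hearing
t = np.arange(fs_in) / fs_in
x = np.sin(2 * np.pi * f_tone * t)

# Crude resampling to a hypothetical 32 kHz device rate by just dropping
# samples (no anti-alias low-pass) -- a stand-in for a badly behaved
# soundcard/driver resampler.
fs_out = 32_000
y = x[::3]              # 96 kHz / 3 = 32 kHz

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / fs_out)
print(f"strongest component after bad resampling: {freqs[np.argmax(spectrum)]:.0f} Hz")
# -> ~10000 Hz: the 22 kHz tone folds around the new 16 kHz Nyquist limit
#    to 32 kHz - 22 kHz = 10 kHz, squarely in the audible range.
```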

Are my ears broken?

Reply #59
I'm not talking about persuading listeners but rather convincing ourselves (i.e. scientists) that the results from the CODEC tests are valid and can be explained by differences in the CODECS and not some uncontrolled variable.

Isn't this handled by the ABX? You can let listeners introduce any variables they like. Then they sit down and do the testing and any uncontrolled variable affects both A and B equally.

You would definitely want to tighten things if you need to move from determining whether there's a transparency problem to determining why there's a problem.

And, no, I do not do public testing. I don't consider my listening skills to be extraordinary. But, it is an interesting topic for me. Understanding the state of the art in listening motivates realistic design goals for systems I work on.

Are my ears broken?

Reply #60
Isn't this handled by the ABX? You can let listeners introduce any variables they like. Then they sit down and do the testing and any uncontrolled variable affects both A and B equally.

"Affects both A and B" can be quite different from "affects both A and B equally". It may be that a particular setup emphasizes (or suppresses) one kind of artifact more than another, for example.

Are my ears broken?

Reply #61
I would say that such transforms are a valid part of the test as long as it remains a realistic listening environment. It is not cheating to test the envelope of how people listen. Some people do crank the tone controls. Only spoilers listen to L-R.

 

Are my ears broken?

Reply #62
And, no, I do not do public testing. I don't consider my listening skills to be extraordinary. But, it is an interesting topic for me. Understanding the state of the art in listening motivates realistic design goals for systems I work on.

I think Sean might mean private tests conducted during development, in which case he makes an excellent point.

EDIT: I don't mean to give the wrong impression here, his points are excellent regardless.

Are my ears broken?

Reply #63
I'm not talking about persuading listeners but rather convincing ourselves (i.e. scientists) that the results from the CODEC tests are valid and can be explained by differences in the CODECS and not some uncontrolled variable.

Isn't this handled by the ABX? You can let listeners introduce any variables they like. Then they sit down and do the testing and any uncontrolled variable affects both A and B equally.

You would definitely want to tighten things if you need to move from determining whether there's a transparency problem to determining why there's a problem.

And, no, I do not do public testing. I don't consider my listening skills to be extraordinary. But, it is an interesting topic for me. Understanding the state of the art in listening motivates realistic design goals for systems I work on.


Of course, I agree with you that for any given ABX test setup, the nuisance variables are being held constant for both A and B.

Sorry if I wasn't clear: my concern about controlling nuisance variables arises from pooling the results of public codec tests (using ABX, ABC/HR, or MUSHRA methods) that are conducted at multiple sites using different playback setups and listeners of unknown quality (e.g. hearing, training, ability). When you start pooling the results from these different tests, unless the different playback setups are well-defined and accounted for in the design and statistical analysis, you risk increased systematic errors/biases and may come to erroneous conclusions based on invalid results.
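To make the pooling concern concrete, here is a toy simulation (the per-site numbers are invented, not drawn from any real test): two sites run the same ABX comparison, one with a revealing setup and trained listeners, one where the setup masks the artifact. Pooling the raw trial counts produces a middling detection rate that describes neither site, and the site effect disappears unless it is kept as a factor in the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-site probability of a correct ABX response
sites = {"site_A (revealing setup)": 0.90,
         "site_B (masking setup)":  0.50}   # 0.5 = pure guessing
trials_per_site = 200

results = {name: rng.binomial(trials_per_site, p) for name, p in sites.items()}

for name, correct in results.items():
    print(f"{name}: {correct}/{trials_per_site} = {correct / trials_per_site:.2f}")

pooled = sum(results.values()) / (trials_per_site * len(sites))
print(f"pooled: {pooled:.2f}  <- looks like modest audibility everywhere,")
print("          when one site hears it clearly and the other is at chance")
```

Keeping site/listener as a factor in the design (or modelling it explicitly) is exactly the "design and statistical analysis" being asked for here.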


Cheers
Sean
Audio Musings

Are my ears broken?

Reply #64
And, no, I do not do public testing. I don't consider my listening skills to be extraordinary. But, it is an interesting topic for me. Understanding the state of the art in listening motivates realistic design goals for systems I work on.

I think Sean might mean private tests conducted during development, in which case he makes an excellent point.

EDIT: I don't mean to give the wrong impression here, his points are excellent regardless.


Yes, I guess that's what I mean -- a private test, like the kind used for product development and validation, and the sorts of CODEC tests published by standards groups like the ITU, which conducts CODEC tests at different sites throughout the world using a standardized setup.

I'm quite ignorant about how Hydrogenaudio organizes public listening tests, what their purpose is, and how they use the results. Is this strictly for hobbyists, or do companies actually use the results to help tweak the performance of their CODECS?

Perhaps you've published a document on recommended practices for public CODEC tests somewhere on HA that answers all these questions?

Thanks!

Cheers
Sean

Are my ears broken?

Reply #65
You really should conduct blind ABX tests, though, as sighted tests are flawed by the placebo effect.
Though you're correct, it's not all that hard to identify what codec was used by listening to the artifacts, at least at lower bitrates.
Mixing audio perfectly doesn't take more than onboard.

Are my ears broken?

Reply #66
You really should conduct blind ABX tests, though, as sighted tests are flawed by the placebo effect.
Though you're correct, it's not all that hard to identify what codec was used by listening to the artifacts, at least at lower bitrates.

This is not an issue in ABX tests, since the only thing being tested is whether the compressed file can be distinguished from the original.

In ABC/HR testing OTOH there certainly can be a bias based on recognising the codec, but then ABC/HR is asking for people's preferences, which can extend to factors beyond simple audible quality.