Print Page - List of typical problems and shortcoming of "common" audio t

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-15 22:41:11

First thing first: if this topic has already been discussed somewhere, feel free to point me there (publications ref. accepted ).

There are a few things that make me feel uneasy about typical audio tests. For instance (in no particular order):

Switching between sample A and B seamlessly is hard in SW/HW if the decoder/bitrate/sample-rate is different (cf. past and present problems with foo_abx).
Self-reported results are hard to trust: people can easily cheat (e.g. changing the logs is usually enough).
Codec comparison is fragile and hard to do: for instance fdk_aac lowcuts at 15000Hz@96kbit/s while opus/vorbis/etc. don't. Do my OGG and AAC @96kbit/s sound different because of the codecs themselves or because of the (optional, but set by default in fdk_aac) frequency cut?
Buggy or outdated codecs/pipelines could degrade the audio for specific formats only. People could hear a difference, but it would be completely unrelated to what we want to test.
Preference tests seems to be rarely preceded by ABX tests, so they are hard to trust.
Most people just don't do formal testing.
When people choose correctly 15 times out of 20, the test is usually considered positive (i.e. "this person can differentiate A and B") however, there is no notion like "the difference in audio is so small that people cannot pick the right one every time".
It is usually easy to identify which sample is X from its waveform/spectrum.

The purpose of this thread is to get a more exhaustive problem list.
I really want to get the big picture and I am sure this will be useful in the near future.

IMPORTANT NOTE: I'm not looking for solutions (yet). The time for solutions will definitely come, so please refrain from giving them now.

Title: List of typical problems and shortcoming of "common" audio t
Post by: xnor on 2015-12-15 23:37:25

I guess you are talking about blind tests. My comments:

1) ABX software may just decode all files before the test even starts, so the problem is mainly a difference in the decoded format (samplerate, bit depth, could even be channels or PCM vs DSD).
2) Yes, but you don't even need to forge logs. See spectrum analyzers.
3) I consider cutoffs as part of the codec. Different codecs make different tradeoffs, but that's what you want to test ... if there is an audible difference.
4) Yeah and I think this is commonly overlooked.
5) Hmm, you mean like ABC/HR? Have you checked out the multiformat listening tests, exclusion criteria (like getting N samples wrong), statistical analysis? It's pretty solid if you have enough people and samples.

7) There's been some discussion about p-values and ABX comparator results. Without some idea about statistics the results are indeed not trivial to understand.
8) That's why it is supposed to be a blind test, you shouldn't "see" which is which.

Another one: sometimes files are created by an incompatible codec (or decoded even, that would be point 1 again) which can e.g. result in added silence at the beginning.

Another big one: differences between the files that you didn't even want to test for. For example there have been several invalid resampler test in the past. The guys wanted to test whether the resampler's filter caused audible differences, but didn't notice that some resamplers introduced a time delay. On fast switching between tracks this could give away which is which ... this could have even happened without the participants noticing it as such, but just perceiving some vague difference when switching.

It's a tragedy when such mistakes are not detected in the preparation of the test but only much later, or not at all, but conclusions are still drawn as if it was a valid test. Some dishonest people won't even acknowledge the flaws...

Title: List of typical problems and shortcoming of "common" audio t
Post by: ajinfla on 2015-12-16 14:33:49

Quote from: MMime on 2015-12-15 22:41:11

There are a few things that make me feel uneasy about typical audio tests.

Most people just don't do formal testing.

What's wrong with not wearing a tuxedo when testing??
Actually, the latter should make you more uneasy. Entire industries like High Fashion audio and acoustic "treatments", etc. are based specifically on not doing any perceptual testing whatsoever. The house of cards collapses with their inclusion.

Quote from: MMime on 2015-12-15 22:41:11

The purpose of this thread is to get a more exhaustive problem list.

That or you are seeking a solution in search of a problem.
For example:

Quote from: MMime on 2015-12-15 22:41:11

Preference tests seems to be rarely preceded by ABX tests, so they are hard to trust.

Ummm, why would an ABX be needed for a preference test?

cheers,

AJ

Title: List of typical problems and shortcoming of "common" audio t
Post by: greynol on 2015-12-16 15:12:14

Preference tests need a lot of participants in order to have much hope in providing statistically significant results.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-16 16:00:33

Please, please, let's focus on finding problems with the current usual ways of doing audio tests (and I'm talking about audio testing in general, not just DBT).

Discussions about each specific problem (even to say that they actually aren't) will come. Solution finding will also come. But please refrain from doing them now.

But thank you, you three, for your input. @anjinfa: are you really working on loudspeakers? If so, I'm sure that you can think of problems related to hardware and how people setup them for listening tests.

As for the motivation behind such a list, it has been given by @xnor in his last paragraph.
So, back to the point...

As an extra (psychologocal bias) problem, people tend to think that original audio *always* sounds better.

HW/SW-related problems are hard to spot (only by ABXing sound card output records)

Stereo recordings can exhibit differences due to phase interactions in loudspeaker systems.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Green Marker on 2015-12-17 00:02:34

Quote from: xnor on 2015-12-15 23:37:25

It's pretty solid if you have enough people and samples.

I don't agree. I think that statistics are the 'soft' scientist's version of hard mathematics. Statistics mean nothing unless the original premises of the experiment are valid i.e. that certain mathematical criteria concerning the test data are met. In the case of people listening to music - a cultural, aesthetic judgement - that isn't going to happen.

Audiophile music is 'niche', shall we say. Audiophile music and audiophile systems enjoy a 'symbiotic' relationship. Audiophile music is chosen because it sounds 'good' (acceptable) on standard audiophile systems. The music selected by audiophiles for testing audiophile systems is not capable of revealing the weaknesses that plague audio systems.

One definition of science says that it is not valid for subjective judgements. Quite so. You will never be able to conclusively say that technical parameter X is inaudible or otherwise because your experiment is already biased towards the anodyne music that audiophiles 'enjoy' - because their systems sound acceptable when playing it. You can't meaningfully test listeners with synthesised test signals or music they are not familiar with, either.

This is not science.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-17 01:15:57

Please, once again, stop arguing!!! This is ABSOLUTELY NOT the right time for this. Only list problems for now.

You WILL be given plenty of time (at the very least a whole week) and opportunities to discuss them starting on next week. Until then, please refrain from arguing (at least here, feel free to open your own thread if you want to start now).

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-17 01:35:25

Though, Green Marker expressed problems (valid or not, don't argue!) :-)

- Preference tests seem non-scientific
- Test samples are not randomly chosen (biased toward familiar music and "audiophile" music on which gears were tuned).
- The applicability of usual stats should be checked (letting aside the "subjective therefore ascientific" stuff, the normality of the distribition of values is never checked for instance)

I would also add:

- Making sure people can differentiate sample A from sample B before asking for which they prefer is not obvious
- What is "part of the codec" (for codec comparison) is hard to determine.
- People basically have the program and the files: they can temper with them as much as they want.

Title: List of typical problems and shortcoming of "common" audio t
Post by: greynol on 2015-12-17 01:41:56

Regarding #5, I'm just trying to help clear up a potential source of confusion. I'm not a mind reader; I don't know what you know and what you don't know. If you think this is arguing then count me out of this discussion.

I wish you luck in whatever it is that you're trying to accomplish.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-17 02:24:34

If you are refering to my first warning, this is not related to your contribution at all: what you did was exactly what I'm asking for.

What I want to achieve for now is a list of problems related to audio testing. And to finish clearing up your doubts, this is not about what I know or not: absolutely everything is fine as long as it is a related problem. Everything. I really don't want to be in a situation as described by xnor (figuring out flaws, restarting, figuring out a new flaw, restarting, etc.).

However comments like the ones of ajinfla or Green Maker are not what I want: they are reacting, arguing and starting side discussions. What they did is perfectly normal, expected and anyone of us would have done it. That's why I'm reminding everyone about the purpose of this thread (again, I want to nail it, if someone else started such a thread, I would be the first to react, argue and start side discussions!).

Think of this thread as a wiki page named "List of typical problems and shortcomings in audio testing". Would you like to see people arguing and discuting directly within Wikipedia's article? That's not to say people can't discuss: they just have to wait until next week to do it here or they can start a new thread.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-17 22:51:30

People can use high volumes and use the noise floor to discriminate samples

Crossfading leads to a lot of problems (think crossfading to an inverted signal, samples with different gains, etc.)

Fading is necessary to avoid audible Gibbs effect and possible high frequency components (hann window over a few ms seems fine)

Samples should be perfectly aligned.

ReplayGain (or similar) should be used to guaranty similar gain (0.2 dB sufficient?)

Some sound card/drivers seem to force fade in when changing sample rates (third sample rate needed?)

People tend to spend too much time testing (they get accustomed and can't hear differences anymore)

Bad DAC: clicks with different sample (related: HW issue/SC force fade)

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 01:53:15

Double blind tests are not understood at all ("another fantasy that proves nothing")

People need solid and obvious proofs that they can actually hear differences in a blind test (they think facts are hidden by the setup/cheap gears/etc.)

Some people tend to think that the "subjectivist" way of doing things (basically training to have golden ears and then listen to the music once or very very often, in contrary) is far more accurate.

In extension, not allowing prior or parallel semi-blind test can be seen as an issue. (Note2ms: could UI be improved? Hybrid approach from a simple preference test to a differentiation test? "How would you define this sound? That one? Is this one more like this or that?")

The engagement phenomenon is really really really strong.

People tend not to be able to differentiate "different" and "better"

Some people have absolutely no notion of aliasing and are astonished that "objectivists" say there are no differences whike they can clearly hear a difference. (Note to myself: should make high res test files with high freq noise or funny frequency spec.)

Naming tests and results as "Proving that there is no difference between X and Y" is not perceived well by a whole category of people

According to some people, it takes days, even weeks to transition from the lowly CD world to the glorious HD world... (The people that did the switch may want to be "recognised")

"Subjectivist" test should last a long time (not one-shot, "would you like to get back to a previous test?").

There are groups of self-recognised expert listeners: only results provided by the group seem good enough for the people in that group

ABX testing is not performed "in situation" (e.g. with everyday noise: not everyone hear their music in a soundproof room)

People/Gear capabilities are not properly tested to make sure that everything is fine (hearing curve on given HW for instance)

Results are often too binary "difference can't be heard" (vs maybe "there might be differences but imagine, it took X minutes on average (and Y minutes min) for the people to correctly choose and yet, they were still wrong Z times out of T")

People could stress that they are not right (positive reinforcement? Sth else?)

Don't you forget the fg ref group john!!!

Oh, good one: ABX box makes A and B sound the same.

In extension to "the difference cannot be heard": the null hypo. is that the samples are different. Failing that means that we couldn't prove that they were different.

Training and all the other user actions should be reported (not as something wrong )

ABX is not about measuring differences in audio but differences in perceived audio

Title: List of typical problems and shortcoming of "common" audio t
Post by: xnor on 2015-12-18 12:16:50

I think you need to distinguish between different kinds of "problems" here. Some points are just myths or audiophile rumors or misunderstandings.

Title: List of typical problems and shortcoming of "common" audio t
Post by: pelmazo on 2015-12-18 12:20:04

I also fail to see what the point of all this is supposed to be.

You fail to distinguish between problems the ABX testing method has, problems that people have with ABX testing because of their misunderstanding or prejudice, and problems that are wrongly attributed to ABX testing.

You say you want to get the big picture, but what you show is a method of obfuscating any big picture there may be. It looks more like a thinly veiled attempt at discrediting ABX, not the least because you don't seem to be interested in any counterbalancing fact, i.e. the problems of alternatives of ABX.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 13:49:17

All that you have described, MMime, are reasons that good evidence collection is difficult and why no single piece of evidence can be considered proof in a vacuum.

You have not actually pointed out problems with ABX or double-blind testing in general but rather attributed problem with how certain people at certain times use the results (or lack thereof).

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 15:20:05

It may be the time for a small explanatory note...

As I repeated already at least thrice, this topic is about listing problems related to audio testing in general. Not only ABX, not only blind testing, all of them.
What I did not specify is that as soon as something is perceived by someone as a problem, it logically becomes a problem (think about it, even if just psychological). Example: an old lady throws peas all around her at Hyde Park to protect herself from lions. You consider it a problem (for whatever reason). Yet, acknowledging this problem does not mean that you consider that the lions are real.
I guess you really don't see the point of doing this without classification because you (I should really say we) are used to solve problems locally. So naturally, if you are given such a list of problems, which starts to be huge, you are thinking "how the hell does he want to 'solve' all of that?! some problems aren't even real!!!" because in your mind, I'll take problem #1, discuss, find a solution and then problem #2, discuss, find a solution, etc.
Starting tomorrow (or Monday depending on my free time), I'll start to order the problems in a tree. In practice, most problems are caused by deeper problems (and reciprocally, most problems induce other, shallower problems). This is what I call the big picture (of the current situation... there will be other 'big pictures').
Once this is done, I'll let some time to find other cause-/consequence- problems and I'll start step 2: identifying which specific problems to solve.
Once this is done (this should be quick), I'll start step 3 with a new 'big picture' and I expect you to discuss, comment, argue, cry... a lot.

Does that seem clearer?

@xnor: I hope you understand now why I did not classify the problems: there is no need because even if the lions are an illusion, that would not change the fact that the lady thinks this is a problem. And our own problem is that {the lady thinks lions are a problem} (not the lions). And that's even one of the toughest problems! From your own experience, for instance, if you told her there are no lions in the UK, would she just reply "ohhhh! how silly of me, you are right! oh my bad, now that I think of it, what peas would have done to it? oh oh ah ah! have a cuppa tea?"... Nah... You'd get the usual "but there are zoos!!! And what prevent them from taking the train???!!!!"

@pelmazo: your crystal ball might be slightly broken, your comfort zone might be slightly bruised and I understand that you find that frustrating, but I really don't like your tone and won't tolerate any more of it. So quit freaking out and chill down. You can question, but the part about 'I see through you and your pathetic attempt at undermining my holy ABX' was unneeded.
You would see how wrong your accusations are just by looking at my previous comments in other threads (one about FLAC, particularly).
And don't you fell any shame, telling me that I am not "interested in any counterbalancing fact, i.e. the problems of alternatives of ABX" while I AM ASKING FOR THAT EXACTLY!!! Read again, if you assumed this was about the shortcomings of ABX, this is in your mind only! I'd LOVE to hear the shortcoming of the alternatives!!!

@Soap: I pointed out problems related to ABX and DBT only because that's what I know the most. That's why I'm asking you guys. And I repeat, you absolutely don't have to focus on ABX or DBT, that's not the purpose. But I also listed problems that are linked to statistics and psychology.
As well, you may consider that "people do not trust ABX" is not a problem with ABX itself. Depending on the perspective, it is perfectly true. However the final effect is the same: if people don't trust ABX testing, they don't trust ABX testing and won't do them or be interested in them to guide their choice.
The workflow that I presented above has another advantage: "people do not trust ABX" is awfully vague. As such, it cannot be solved by a single action (well, usually). But by representing the problems in a graph, you will figure out what "people do not trust ABX" actually means! Because the cause-problems that link to that particular problem ARE what actually make people feel distrust about the technique. And chances are high that you can fix these to some extent.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 15:39:14

Quote from: MMime on 2015-12-18 15:20:05

It may be the time for a small explanation note...
As I repeated already at least thrice, this topic is about listing problems related to audio testing in general. Not only ABX, not only blind testing, all of them.

Does that seem clearer?

@Soap: I pointed out problems with ABX and DBT only because that's what I know the most. That's why I'm asking you guys. And I repeat, you absolutely don't have to focus on ABX or DBT, that's not the purpose. But I also listed problems that are links to statistics and psychology.

Again, this appears to be listing "problems" (peas / lions example is a classic one) which are only Problems if you play the game.

The old lady spreading peas is only a Problem if one thinks it is a good use of time to try to force others to view the world the way you do.

So let's address your original points one by one.

1 - Switching tests are hard: So what? Lots of evidence collecting is hard and not everyone is equipped to study gene splicing either.

2 - Can't trust self-reported tests: So what? Don't. Your happiness shouldn't rely on trusting others making outrageous claims.

3 - Codec comparison is hard: See point #1

4 - Pipelines might cause flaws: See point #1

5 - You can't trust preference tests: Broken record, you don't need to trust others. If something you need to accomplish relies upon knowledge determined through preference testing it probably needs done by you anyway.

6 - Most people don't do formal testing: See point #2. It Doesn't Matter For Your Needs And Wants Unless You Want To Pick Internet Fights.

7 - People misunderstand statistics and draw faulty conclusions from one round of test: This is you allowing the mistakes of others to make you feel angry. Anger is the killer.

8 - Spectrum: This is a restatement of #2.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 16:01:08

Sorry, I updated the part for you while you were answering.

Quote from: Soap on 2015-12-18 15:39:14

Again, this appears to be listing "problems" (peas / lions example is a classic one) which are only Problems if you play the game.

The old lady spreading peas is only a Problem if one thinks it is a good use of time to try to force others to view the world the way you do.

What are the problems if you don't play the game? It would be much more helpful to give examples of such problems instead of staying vague.

Quote from: Soap on 2015-12-18 15:39:14

So let's address your original points one by one.
[...]

That's exactly what I don't want.
Could you try to understand what I wrote, not only in the first post (which indeed is not that good), and not just focus on saying that these are not actually problems.

The day you are fired, going to jail because of debts with no one coming for you because your whole family has died in an accident, would that be perfectly OK to be said "what the hell are you talking about, you don't have problems... You are fired, so what? Don't see the problem: a lot of people do not have a job. You have debts? So what?! You happiness should not rely on material things such as this. Going to jail? See point #2. Your family died, see point #2 but with 'others' instead of 'material things'. You don't like my answer? This is you allowing the mistakes of others to make you feel angry. Anger is the killer."

So yeah, with such a reasoning, there is no problem whatsoever... And you can safely not coming in this thread again. I'm ultimately looking for flaws and ways to address them to design (or help people design) better tests that better represent the reality. If you don't see the point, fine, but refrain from posting then.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 16:18:24

Quote from: MMime on 2015-12-18 16:01:08

That's exactly what I don't want. Why? Because I can counter each of your counter arguments and we can repeat that ad nauseam... In the meanwhile, nothing is done and we don't have more insight, nor a larger picture. That's true that, now that I see the result, my first post has not been presented the right way. This was intended to give an example of list (still, with points I consider valid) to kick off the listing. This was not intended to actually represent the flaws of ABX testing that plague every day of my life.

But OK, let's consider these are not problems in audio testing for a moment, like you suggest. What such a listening test would be like? Isn't that strictly equivalent to saying "don't do nor trust tests, they are not worth it, just randomly choose something and shut up".

I'm ultimately looking for flaws and ways to address them to design (or help people design) better tests that better represent the reality.
You are saying: there is no point in doing any test at all, people, choose something random, that won't change anything anyway.

Huh? They aren't problems because none of them stop people who want to learn from learning.

All of those "problems" only exist when one chooses to argue with someone who doesn't want to do things right.

When you ask "What such a listening test would be like?" what do you mean?

"Isn't that strictly equivalent to saying "don't do nor trust tests, they are not worth it, just randomly choose something and shut up"." NO. Do proper tests and control for the variables you can control for and understand the limits of your collected data. THE SAME AS ANY OTHER SCIENCE. Nowhere did I suggest something as fundamentally stupid and insulting as "randomly choose something and shut up".

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 16:27:50

Science is also about improving the methods.

Improving the methods also means identifying flaws.

With your reasoning, non-blind tests would still be performed for drug tests for instance.

But someone figured out that there was a psychological, YES, a PSYCHOLOGICAL flaw with that.

This flaw had NOTHING to do with the drug test by itself (from your perspective anyway).

Yet, DBT allow us to buy better drugs with a minimum confidence that they are working.

What would you have said at the time? "Patients would get subconscious hints from the doctor? So what? Don't trust patients anyway."...

Title: List of typical problems and shortcoming of "common" audio t
Post by: pdq on 2015-12-18 16:33:23

In essence you are asking us to "describe a test that will fail because it is flawed". I would rather describe tests that will succeed because we have eliminated potential flaws.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 16:35:49

Think of these "problems" as "things I should think and care about if I were to design a proper audio test".

I called these "problems" because these are things to think about, not "natural" things that would "naturally" go well without a single thought to handle it.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 16:40:05

Quote from: pdq on 2015-12-18 16:33:23

In essence you are asking us to "describe a test that will fail because it is flawed". I would rather describe tests that will succeed because we have eliminated potential flaws.

So does that mean that there is a audio test methodology, without trade-off, accepted by absolutely everybody as "valid"?

If not, I want to know why "the tests that will succeed" according to you is not enough for person X or in situation Y.

AND I understand that there is prior work, hence the very first line of my very first post.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 16:40:06

Quote from: MMime on 2015-12-18 16:27:50

Science is also about improving the methods.

Improving the methods also means identifying flaws.

But you haven't identified any flaws in audio testing.
You've identified flaws in people who you can't trust! And you've identified that test setups which have not been properly designed may lead to inaccurate results.

Quote from: MMime on 2015-12-18 16:27:50

With your reasoning, non-blind tests would still be performed for drug tests for instance.

How so? That charge does not follow what I've said. This is not the first time you've charged me (without support!) with claims I did not make, but it will be the last.

Title: List of typical problems and shortcoming of "common" audio t
Post by: pdq on 2015-12-18 16:47:47

Quote from: MMime on 2015-12-18 16:40:05

Quote from: pdq on 2015-12-18 16:33:23
In essence you are asking us to "describe a test that will fail because it is flawed". I would rather describe tests that will succeed because we have eliminated potential flaws.

So does that mean that there is a audio test methodology, without trade-off, accepted by absolutely everybody as "valid"?

If not, I want to know why "the tests that will succeed" according to you is not enough for person X or in situation Y.

AND I understand that there is prior work, hence the very first line of my very first post.

If you want to learn about bad audio testing, you should go someplace else. You won't find it here.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 17:19:32

Quote from: Soap on 2015-12-18 16:40:06

But you haven't identified any flaws in audio testing.
You've identified flaws in people who you can't trust!

Just like you couldn't trust patients and doctors alike before blind testing were introduced in medical testing.

I just want you to focus on the following until it becomes clear to you (or ask for clarification, because this is central, and I hope we will agree):

What you are saying and summing up by "you can't trust people" is correct.
What you say right after that is we already have a solution to "you can't trust people".

What I am saying is that "you can't trust people" is not the root problem, nor an interesting one: there are reasons that "you can't trust people".
These reasons (for instance, they can look at a spectrogram or forge the test log, etc.) all lead to the same consequence: "you can't trust people".

What YOU tell me is that what I am doing is useless because it all comes down to "you can't trust people".
What I am telling you is that acting directly on the reasons can lead to new audio tests, with new/other limitations and trade-offs.
But first we have to identify the main problems (like "you can't trust people"), their causes (they can look at a spectrogram or forge the test log, etc.) and the consequences (uncompelling evidence).

If I apply that to the medical field...
The orginal problem was exactly the same: they couldn't trust the doctors (not that they would say if the drug was true or fake, but some would ask "are you sure there is absolutely no improvement since last week?" to a person with a 'real pill" while they would not insist for the persons with the fake ones).
In your current way of thinking, tell me if I am wrong, if I were to say "ok, so we don't trust doctors because 1-they tend to use different questions in each case 2- they use different overall tones 3- they tend to insist if the result does not go expected way (are you sure you really feel better? vs are you sure there is absolutely no improvement whatsoever?) 4- some blatantly cheat, etc.", you would take make list, find counter arguments and finally conclude, "well, in the end, it all comes down to 'we can't trust the doctors'".
However, there is some value to find the cause the this lack of trust: by doing so, you can improve, not only locally (by putting a nurse behind each doctor to check for flaws and avoid cheating for instance) but globally ("interestingly enough, these 4 cause-problem disappear if the doctors themselves do not know what they are giving"). And naturally, if you remove the causes, the consequence vanishes as well.

To get back to our case, having the cause (e.g. periodogram, fake log, etc.) give more insight and angles to strike than just saying that it all comes down to "we can't trust people".

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 17:32:11

Quote from: MMime on 2015-12-18 17:19:32

What I am saying is that "you can't trust people" is not the root problem, nor an interesting one: there are reasons that "you can't trust people".

yet the inability to trust people is the sole "problem" in most of your list NOT problems with "audio testing"!

Quote from: MMime on 2015-12-18 17:19:32

What I am telling you is that acting directly on the reasons can lead to new audio tests, with new/other limitations and trade-offs.

WHY DO WE NEED NEW TESTS? The current tests work fine unless you're trying to argue with people not acting in good faith.

Quote from: MMime on 2015-12-18 17:19:32

But first we have to identify the main problems (like "you can't trust people"), their causes (they can look at a spectrogram or forge the test log, etc.) and the consequences (uncompelling evidence).

The inability to trust people only leads to uncompelling [sic] evidence in internet fights. It does not lead to problems in scientific research.

Quote from: MMime on 2015-12-18 17:19:32

If I apply that to the medical field...
The orginal problem was exactly the same: they couldn't trust the doctors (not that they would say if the drug was true or fake, but some would ask "are you sure there is absolutely no improvement since last week?" to a person with a 'real pill" while they would not insist for the persons with the fake ones).
In your current way of thinking, tell me if I am wrong, if I were to say "ok, so we don't trust doctors because 1-they tend to use different questions in each case 2- they use different overall tones 3- they tend to insist if the result does not go expected way (are you sure you really feel better? vs are you sure there is absolutely no improvement whatsoever?) 4- some blatantly cheat, etc.", you would take make list, find counter arguments and finally conclude, "well, in the end, it all comes down to 'we can't trust the doctors'".
However, there is some value to find the cause the this lack of trust: by doing so, you can improve, not only locally (by putting a nurse behind each doctor to check for flaws and avoid cheating for instance) but globally ("interestingly enough, these 4 cause-problem disappear if the doctors themselves do not know what they are giving"). And naturally, if you remove the causes, the consequence vanishes as well.

I think your medical analogy is off the rails. Not only does it appear to be historically inaccurate, but it's not moving us anywhere useful.

Quote from: MMime on 2015-12-18 17:19:32

To get back to our case, having the cause (e.g. periodogram, fake log, etc.) give more insight and angles to strike than just saying that it all comes down to "we can't trust people".

Again, where are these "problems" limiting serious study and not just internet fights?

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 17:33:52

Quote from: pdq on 2015-12-18 16:47:47

If you want to learn about bad audio testing, you should go someplace else. You won't find it here.

Just an example (without consequence, I am not saying that it would actually be a good idea). Some "subjectivists" argue that long term tests are required. I know a little bit of the literature regarding this, and I know that the brain actually does the reverse (it will "smooth" out differences in the samples you know well).

Yet, as long as the conditions of the test are known, would a "long term" ABX test be considered worse that a "short term" one. Yes, of course, there is the difference reported in the literature. Allowing this kind of test would confirm these findings (since the conditions would be known) and the people who prefer to trust these tests would be happy.

How would that be bad testing?

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 17:46:16

@Soap, if you don't see the value in this, then fine. But then please refrain from posting here. I'm asking for willing contributors to help me here. If you are not willing, please don't spend your time imposing your thoughts here.

I understood you. The fact that my house is burning is not a problem related to my house. but having my house burning is the only problem my house have. So there was absolutely no need in the first place, to have my fireplace, heaters, electric system and devices checked since it all comes down to my house burning and that's the only thing I should try to solve. Or not, because houses will always catch fire. That's why I must apply the same solution as everyone else (living in a blockhaus without any furniture nor electricity) and absolutely never ever dare to even think about other solutions or trade-offs. Understood.

And for the ones that were possibly surprised by my strong will to avoid arguments for the moment in this thread, this was the reason why. As far as I am concerned, I'll keep up building my tree. If you are willing to contribute and also want to try and get a better picture of the state of audio testing today, feel free to contribute. However be warned that I won't read anything other that list-like posts. Thanks for your contribution.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 18:06:38

Quote from: MMime on 2015-12-18 17:46:16

If you are willing to contribute and also want to try and get a better picture of the state of audio testing today,

You haven't started painting a picture of the state of audio testing today. Do you not grok this?

Title: List of typical problems and shortcoming of "common" audio t
Post by: pelmazo on 2015-12-18 18:36:39

Quote from: MMime on 2015-12-18 15:20:05

@pelmazo: your crystal ball might be slightly broken, your comfort zone might be slightly bruised and I understand that you find that frustrating, but I really don't like your tone and won't tolerate any more of it. So quit freaking out and chill down. You can question, but the part about 'I see through you and your pathetic attempt at undermining my holy ABX' was unneeded.
You would see how wrong your accusations are just by looking at my previous comments in other threads (one about FLAC, particularly).
And don't you fell any shame, telling me that I am not "interested in any counterbalancing fact, i.e. the problems of alternatives of ABX" while I AM ASKING FOR THAT EXACTLY!!! Read again, if you assumed this was about the shortcomings of ABX, this is in your mind only! I'd LOVE to hear the shortcoming of the alternatives!!!

I was not freaking out, and I didn't accuse you of anything. I wrote what your attempt looked like to me. So I think it is up to you to chill down.

You should perhaps view the responses of several people here, not just me, as a sign that you had indeed not made yourself clear enough. I understand clearer now what you want, but I still don't understand either what it is supposed to be good for, nor how it can be useful and for whom. Particularly if you include non-problems that are only perceived by some to be a problem. That is going to include any amount of nonsense people can come up with, which is close to infinite. See your example with the old lady and the peas. Her kind of problem can be multiplied limitlessly. Want to test my imagination?

You are right that you didn't limit this to ABX testing, and I owe you an apology regarding this. Nevertheless, I still think that your attempt is ill-conceived and poorly justified. If you are doing this for some kind of scientific undertaking, that makes it worse.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Bublic on 2015-12-18 19:10:33

And from here, in the tests, taken sound samples? Maybe they were corrupted 128-32-16bit and etc. types recalculation?

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-18 23:22:11

I'll start from scratch next week then.

What makes me really sad is that there seems to be strictly no attempt to really understand what I mean. I'm not saying that I made t easy. I understand it could have been better. I understand that what I'm doing is not familiar to you. However I find very tiring to try and explain more what I did just to find it ditched because of the choice of word at ond place or because someone makes one huge generalisation and refuse to hear any further adjustement or explanation.

I would have largely preferred that you told me what you did not understand and what you think you understood. After reading all that conversation again, the main problem we had is that we had different definitions of "problems". Come on, words are tools to express a meaning. If I tell you that what I call problem is not what you call problem, you don't have to change your own definition, but at least you should try to understand what I said with the definition that I gave you and suggest a better word. The second one is that you seem not to understand that I want to take psychological issues into account. You can repeat any time you want that this is not a problem with audio tests, if I tell you, I, the guy who created the thread, that I want to take them into account, then accept it as a part of the request and don't ditch it because it does not correspond to your definition of a problem in audio tests. Again, if I was silent and you didn't understand, that would be ok. But I spent too much time trying to refine and explain what I meant just to get " So what? This is not a problem in my dictionnary".

What I meant about DBT in the medical field was not to be historically correct. I just wanted to stress out that this is a non-obvious solution (or we would have started with it right away) to a psychologocal problem that had a great negative impact on results. You may consider that these psychological bias were a problem OF drug testing or you may consider they are NOT even related... Thats purely irrelevant: these bias had great effects on the results of drug tests and something had to be done about it for people to trust the results. That's just about it.

And here we come to the last part... Trust. In fine, science is all about trust. You observe, model, understand, expand and build knowledge. The scientific method has been developed and refined years after years to try and reach unbiased results. Why would you apply these methods? To be sure that you avoid the pitfalls that would transform your great measurements, models and results into pure crap. In other words, you want to trust and have confidence in your results.

But now, what's the use if the people in general have no trust in what you did? You can try to convince them. Sometimes you will be successful and sometimes you won't. From there, you can just not give a crap about thoses people and their flawed beliefs or you can, at least, try to understand them and try to make them understand. For instance (again, just an example, not something I wanna do or defend), maybe most people would come and accept ABX tests and results with minor changes. Letting them test on a month span, asking them for similar properties ("this sample has more profound bass, that one too") instead of directly asking them "is it A or B?". That may sound minor to you but that could make a lot of difference, while keeping the same rigorous testing framework (or not, but that typically the kind of thing I'll ask you in the future when the time for actual problem solving will come).

Before you object, let me make this clear. I, as the creator of this thread, consider that any psychological issue and bias that could undermine the trust of people in general in any audio test, is worthy to appear here and to be (ultimately) discussed. Because, once again, a test that no one trust, for valid or invalid reasons is useless. And, once again, that is not to say that any bias based on invalid reasons will be addressed, but to exaggerate a little bit, if tomorrow magazine X tels that only software in the color red should be trusted, it basically costs nothing to make it red and won't impair the results in any mean. Yet that would bring more trust for free (or not... I personally wouldn't like that, but ultimately this has nothing to do wih the quality of the device).

If that's not clear, please tell me. If you want to dismiss something then REALLY explain why.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-18 23:50:00

The problem of trust only exists when trying to argue with unseen opponents on the internet. In the lab one can tell if someone is cheating.

You're chasing problems which, I'll admit, I don't value for they appear to only be problems of arguing with strangers, not problems researchers experience.

PLEASE, for the last time, tell me how these problems affect anything other than internet arguments with strangers you can't trust!

Title: List of typical problems and shortcoming of "common" audio t
Post by: Bublic on 2015-12-19 00:34:36

MMime
My question was probably still technical. It is logical to have on hand a really good tool, but without the quality of samples is not possible. To test lossy codecs apparently enough accuracy existing set, which was made through any audio editor by converting to 32bit float (I do not know this reliably). However, you have written that expand the boundaries of the application yours concept, and then I naturally became interested in the subtleties. However, it is not important. I am here for the casual observer, but you are free to do as you see fit to do.

Title: List of typical problems and shortcoming of "common" audio t
Post by: pelmazo on 2015-12-19 01:02:27

Quote from: MMime on 2015-12-18 23:22:11

I would have largely preferred that you told me what you did not understand and what you think you understood. After reading all that conversation again, the main problem we had is that we had different definitions of "problems". Come on, words are tools to express a meaning. If I tell you that what I call problem is not what you call problem, you don't have to change your own definition, but at least you should try to understand what I said with the definition that I gave you and suggest a better word. The second one is that you seem not to understand that I want to take psychological issues into account. You can repeat any time you want that this is not a problem with audio tests, if I tell you, I, the guy who created the thread, that I want to take them into account, then accept it as a part of the request and don't ditch it because it does not correspond to your definition of a problem in audio tests. Again, if I was silent and you didn't understand, that would be ok. But I spent too much time trying to refine and explain what I meant just to get " So what? This is not a problem in my dictionnary".

Ok then, here's what I still don't understand: If you include psychological issues in your definition of the word "problem", that makes the list you are trying to put together potentially infinite. Are you realizing this? How are you going to deal with it? What is the point of making the task so unwieldy?

As an example, if I told you that a potential problem of such a listening test was, that some people might want to have a dowser work on the test site before testing, to make sure that there are no negative earth rays that could hamper the test, would that be a welcome addition to the list, or would you think I'm trolling? If it should be welcome, where do you stop?

Quote

And here we come to the last part... Trust. In fine, science is all about trust. You observe, model, understand, expand and build knowledge. The scientific method has been developed and refined years after years to try and reach unbiased results. Why would you apply these methods? To be sure that you avoid the pitfalls that would transform your great measurements, models and results into pure crap. In other words, you want to trust and have confidence in your results.

But now, what's the use if the people in general have no trust in what you did? You can try to convince them. Sometimes you will be successful and sometimes you won't. From there, you can just not give a crap about thoses people and their flawed beliefs or you can, at least, try to understand them and try to make them understand. For instance (again, just an example, not something I wanna do or defend), maybe most people would come and accept ABX tests and results with minor changes. Letting them test on a month span, asking them for similar properties ("this sample has more profound bass, that one too") instead of directly asking them "is it A or B?". That may sound minor to you but that could make a lot of difference, while keeping the same rigorous testing framework (or not, but that typically the kind of thing I'll ask you in the future when the time for actual problem solving will come).

Trying to increase your own trust in your findings is quite a different thing from trying to gain someone else's trust in them. I think we've all seen instances of people not trusting your result no matter what you say or do. Relativity, evolution, even the landing on the moon still are being denied by many people despite overwhelming evidence. I doubt that anything can be done about this. Even if you understand perfectly why they reject the evidence, it doesn't help much. Those people exist in audio, too, and they appear here regularly and engage us in discussions.

Quote

Before you object, let me make this clear. I, as the creator of this thread, consider that any psychological issue and bias that could undermine the trust of people in general in any audio test, is worthy to appear here and to be (ultimately) discussed. Because, once again, a test that no one trust, for valid or invalid reasons is useless. And, once again, that is not to say that any bias based on invalid reasons will be addressed, but to exaggerate a little bit, if tomorrow magazine X tels that only software in the color red should be trusted, it basically costs nothing to make it red and won't impair the results in any mean. Yet that would bring more trust for free (or not... I personally wouldn't like that, but ultimately this has nothing to do wih the quality of the device).

The single most prominent reason that makes people distrust a listening test, as far as I have seen, is when the test yields the "wrong" result in their opinion. I have seen instances when people "discover" after the test that they were stressed during the test even though they had negated it before hearing the result. Those people are quite capable to invent excuses after the test, and even believe in them honestly and without maliciousness. If you have a recipe against that, please tell. I don't know of any.

The case that no one trusts a test is a quite artificial one, by the way. You are usually going to encounter trust in some people, and distrust in others. Therefore, you will have to consider in advance whose trust you are interested in. You can forget about winning the trust of everybody. So what is your goal? Do you seek the trust of scientifically minded, rational people, who are capable of understanding a test design and draw their conclusions? Or do you address the layman who hasn't got any idea how to conduct a good listening test? Or is your target the audiophile who distrusts controlled listening tests in principle, because of their habit of producing unwelcome results? Or who else?

Quote

If that's not clear, please tell me. If you want to dismiss something then REALLY explain why.

I'm not dismissing it. I just don't think what you are attempting makes much sense. That may be because of what you want, or because of how you explain it. I still don't know, or I misunderstand it.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-19 01:17:19

Thanks Soap, that's clearer to me.

I am thinking about two different things mainly (these are only projects, I just began experimenting with, I can't guaranty that I will be able to work on them for a long time):

1) I want to provide a simple audio encoder with an encoding guidance. Basically, you would answer a small series of questions (where would you hear this? With what?), perform some rough hearing tests and suggest an encoder with sensible options. This tool would also provide a few functions to test the complete audio pipeline if possible. And, finally, since nothing really replaces the real thing, I would like to provide difference and preference testings. For differences, the ABX methodology is obvious to me. But not to everybody. I want people to be able to find something good enough for them in 30 seconds if they just want to invest that time. But at the same time, if they want to invest 2 months in it, I personnally don't care, that's up to them. However, I don't want to provide any methodology that is not rigorous and scientifically accepted. Period. So no non-blind testing at all for instance (if they want that, they would have to do it themselves). But at the same time, I don't want to let down users who don't trust foobar ABX results: some of them don't trust these results for invalid but easily 'fixable' reasons (a workaround that does not impair the quality of the results is implementable). But that requires to actually listen to people gibberish and to compose with that (again, without introducing flaws, that's not the purpose). That's why there are invalid elements in my list: these are problem in the mind of some people and while these problem are not real, a few of them could still be 'fixed'. This application could become a foobar plugin or be an open source app, don't know yet.

With that in mind, I could implement something, ask for comments here and in some other places and randomly fix things (just like anyone would do). The advantage of building the tree I talked to you about is to see the cause-consequence relationship between what *I* consider as issues... E.g. "oh, so my users don't trust my software, why is that? ah there are 10 reasons... But 7 of them would already disapear if I only fixed this single issue. Is there a way to achieve that while keeping the scientific quality?". The other advantage is to identify trade-offs: "I can solve this whole bunch of problems either this or that way". Of course you have to trust me for now, but this is why I do it this way.

2) In parallel, I want to explore what exactly is testable online (i.e. less sensitive to cheating, more trustable). I can see you reprobative look... I'm not saying that the system would be cheat-proof. Just as you said: if you really want to trust results, do it in a lab where people are monitored. No argument there. However there are ways to remove the urge to cheat or to make it far more complex to do (in some specific cases), so while you cannot trust the results, there are things that could be tried, even to check for the cheatability of them. For instance, I was talking about including hearing tests in the application. If the result of the hearing test is "you can hear up to 16803Hz", not only would that been done without any uncertainty considerations, that would also encourage cheating. Now, if you say: "You can hear the full range of what the best ears can" (not great, but the meaning is there), why would they cheat? They would get an advice: don't use a cutoff below 20kHz. At the same time, I would get 16803Hz in the logs... In the same field (audibility of high frequencies), I could provide files to ABX, one with the real HD data, the other, at the same sample rate but with shaped noise above 20kHz. Seeing no difference would not prove that the is no difference between reasonable sample rates and "HD", however that would suggest that the same effect can be obtained by taking any CD, upsampling and adding noise. The other positive effect is that differences due to aliasing in moronic setups *may* disapear. Etc.

In that case, having a tree (or even a simple list) of "issues" like "people can easily differentiate different sample rates in a spectrogram" or "ABX false positive can come from aliasing" or "people tend to do whatever possible to get a big number associated to their name" merely allows to think about trying such things and avoiding common pitfalls at the same time (again you have to trust me, but if you imagine chains of "problematic" things with their causes and consequences, that shoukd be quite obvious).

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-19 01:47:55

Quote from: pelmazo on 2015-12-19 01:02:27

Ok then, here's what I still don't understand: If you include psychological issues in your definition of the word "problem", that makes the list you are trying to put together potentially infinite. Are you realizing this? How are you going to deal with it? What is the point of making the task so unwieldy?

In theory yes. In practice, you'll naturally stop. Once I make the first tree, you'll see that you won't reach infinity. That's also why I'll only let people add things to the tree for a limited amount of time.

Quote

Trying to increase your own trust in your findings is quite a different thing from trying to gain someone else's trust in them. I think we've all seen instances of people not trusting your result no matter what you say or do.

True, but I'm sure there are a few things easily done that would bring non-hardcore scientists to trust and see the values of these methodologies. For some it would require a small explanation, for other you'd have to provide a red-colored theme and for others, there is absolutely nothing you can do.

Quote

The single most prominent reason that makes people distrust a listening test, as far as I have seen, is when the test yields the "wrong" result in their opinion. I have seen instances when people "discover" after the test that they were stressed during the test even though they had negated it before hearing the result.

I was under the same impression from what I've read. But see, the "stress", you can simply ditch it saying this is posterior justification for failure. But if you provide an environment that cannot, in anyway, be thought as stressful that's a (small) victory. Now, with a tree, it would be easy to look at what "stressful" really means for a bunch of people... And I may very well figure out that providing a transcoding app with an ABX function to use as a "quick check" is considered far less stressful than a pure ABX software...

Quote

I'm not dismissing it. I just don't think what you are attempting makes much sense. That may be because of what you want, or because of how you explain it. I still don't know, or I misunderstand it.

Does it make more sense now? Do I need to explain something else? To go into more details?

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-19 02:28:58

Quote from: MMime on 2015-12-19 01:17:19

1) I want to provide a simple audio encoder with an encoding guidance. ..in 30 seconds if they just want to invest that time.

No such ability is known.

You are literally asking for a short cut through unconscious bias removal through blind testing AND the reduction of uncertainty through multiple trials.

The discovery of a way to accomplish those goals in anything approaching 30 seconds would be Nobel worthy.

Quote from: MMime on 2015-12-19 01:17:19

I don't want to provide any methodology that is not rigorous and scientifically accepted. Period. So no non-blind testing at all for instance (if they want that, they would have to do it themselves).

As I said. 30 seconds is orders of magnitude too short to accomplish your stated goals.

Quote from: MMime on 2015-12-19 01:17:19

But at the same time, I don't want to let down users who don't trust foobar ABX results: some of them don't trust these results for invalid but easily 'fixable' reasons (a workaround that does not impair the quality of the results is implementable).

You can not do this remotely. Full stop.

You can not prevent cheating by a dedicated cheater. Even assuming you sent them testing hardware sealed in Lucite they could still record the output of their speakers and cheat (the classic "analog hole"). Trying to prevent cheating by people whose behavior has no impact on you is a foolish errand. Learn to accept that which you can not change.

Quote from: MMime on 2015-12-19 01:17:19

With that in mind, I could implement something, ask for comments here and in some other places and randomly fix things (just like anyone would do). The advantage of building the tree I talked to you about is to see the cause-consequence relationship between what *I* consider as issues... E.g. "oh, so my users don't trust my software, why is that? ah there are 10 reasons... But 7 of them would already disapear if I only fixed this single issue. Is there a way to achieve that while keeping the scientific quality?". The other advantage is to identify trade-offs: "I can solve this whole bunch of problems either this or that way". Of course you have to trust me for now, but this is why I do it this way.

This is magic seeking. Magic does not exist. You can not prevent cheating by those using your software and thinking that if only you had the perfect logic diagram those facts would be different is flawed thinking.

Quote from: MMime on 2015-12-19 01:17:19

2) In parallel, I want to explore what exactly is testable online (i.e. less sensitive to cheating, more trustable). I can see you reprobative look... I'm not saying that the system would be cheat-proof. Just as you said: if you really want to trust results, do it in a lab where people are monitored. No argument there. However there are ways to remove the urge to cheat or to make it far more complex to do (in some specific cases), so while you cannot trust the results, there are things that could be tried, even to check for the cheatability of them.

How is this a second point and not literally a restatement of the first? Regardless - analog hole. You can't trust remote testers, end of story. Stop banging your head against the wall.

Quote from: MMime on 2015-12-19 01:17:19

For instance, I was talking about including hearing tests in the application. If the result of the hearing test is "you can hear up to 16803Hz", not only would that been done without any uncertainty considerations, that would also encourage cheating. Now, if you say: "You can hear the full range of what the best ears can" (not great, but the meaning is there), why would they cheat? They would get an advice: don't use a cutoff below 20kHz. At the same time, I would get 16803Hz in the logs... In the same field (audibility of high frequencies), I could provide files to ABX, one with the real HD data, the other, at the same sample rate but with shaped noise above 20kHz. Seeing no difference would not prove that the is no difference between reasonable sample rates and "HD", however that would suggest that the same effect can be obtained by taking any CD, upsampling and adding noise. The other positive effect is that differences due to aliasing in moronic setups *may* disapear. Etc.

Another restatement. Let's try a different track: why are you personally invested in them not cheating?

Quote from: MMime on 2015-12-19 01:17:19

In that case, having a tree (or even a simple list) of "issues" like "people can easily differentiate different sample rates in a spectrogram" or "ABX false positive can come from aliasing" or "people tend to do whatever possible to get a big number associated to their name" merely allows to think about trying such things and avoiding common pitfalls at the same time (again you have to trust me, but if you imagine chains of "problematic" things with their causes and consequences, that shoukd be quite obvious).

The very people who motivate you to create a fool-proof test are the very people who have no interest in complying. You can not make the horse drink.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Bublic on 2015-12-19 02:48:32

Quote from: MMime on 2015-12-19 01:17:19

In the same field (audibility of high frequencies), I could provide files to ABX, one with the real HD data, the other, at the same sample rate but with shaped noise above 20kHz. Seeing no difference would not prove that the is no difference between reasonable sample rates and "HD", however that would suggest that the same effect can be obtained by taking any CD, upsampling and adding noise. The other positive effect is that differences due to aliasing in moronic setups *may* disapear. Etc.

You have it appears, grand plans! Better at once - jolt of electricity through the body. Noise is not the music signal, his presence and to hear if the subject, then realizes it's somewhere in a month. Have you already made this the wrong conclusions. Recognize, you live in some totalitarian country?

Title: List of typical problems and shortcoming of "common" audio t
Post by: ajinfla on 2015-12-19 11:15:43

Quote from: MMime on 2015-12-18 23:22:11

I understand that what I'm doing is not familiar to you.

Actually you don't, or more accurately, can't. It's impossible for you to ever be cognizant of this, which makes it even funnier.

Title: List of typical problems and shortcoming of "common" audio t
Post by: MMime on 2015-12-19 12:41:06

Quote from: Soap on 2015-12-19 02:28:58

Quote from: MMime on 2015-12-19 01:17:19
1) I want to provide a simple audio encoder with an encoding guidance. ..in 30 seconds if they just want to invest that time.

No such ability is known.

You are literally asking for a short cut through unconscious bias removal through blind testing AND the reduction of uncertainty through multiple trials.

The discovery of a way to accomplish those goals in anything approaching 30 seconds would be Nobel worthy.

Once again, you are not reading. I say something not familiar to you so you try to force it into something familiar. The problem is that by doing this is that you determine what I want out of what you think and not what I said.

Can't you understand that the majority of the population outside of this forum consider reaching transparency with absolute certainty as useless nitpicking? Mostly because they are not interested in general findings, academic research and because all they want is to find something good enough for them. Where you see a successful attempt at differentiating two samples, they see a lunatic who looped over 200ms hundreds of times to look for a click, and even finding it that someone would still be wrong 1/3 of the time. They don't want to invest such a time for such a small value. Without judgement here, can you understand that a 90% certainty that 99% of the songs will sound almost exactly as intended is enough for many people (as opposed to a 99% certainty that 99% of the songs will sound exactly as intended)?

On the other hand, I don't like to see people performing non-blind tests to settle on encoder options. I want to, at least, let these people do it properly. At the top of that, some other people want to be sure to reach the absolute tranparency for their ears.

But as I warned, I wouldn't tolerate any more mind reading instead of proper actual reading, I'll stop there with you as you seem incapable of asking for more details or explanations. The "you are litterally asking..." stuff is a blattant example of that: where the fuck did I say that I wanted people to do in 30 seconds what takes hours for others to obtain? I stated the mere fact that some people will want to invest only 30 seconds, other people will want to invest months. And I want to be as useful as possible to all of them with the given time. But "being as useful as possible" does not mean "provide the same information with the same guaranty".

Another blattant example of mind reading is all the stuff about "magic thincking" and cheating in reference to my first point. My first point is NOT about remote tests. It's about a local program to determine which codec use for yourself. Even though I made a link in the second point (stating it as a possibility, not something to do) with the program of the first point, these are for two very different purposes. So all the cheating related comments regarding the first point are only the result of your mind misinterpreting things, not what I said. Read again...

Regarding the rest, all I read was "experimentation is futile, you absolutely won't find anything interesting, stay with the status quo please as our system is perfect with the most perfect trade off" (to what I insist, I'm glad DBT has been introduced BEFORE you reached such an enlightenment).

On these nices words, I'm going away.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Soap on 2015-12-19 12:55:38

Quote from: MMime on 2015-12-19 12:41:06

Once again, you are not reading. I say something not familiar to you so you try to force it into something familiar. The problem is that by doing this is that you determine what I want out of what you think and not what I said.

Incorrect. I believe I understand fully what you are saying and cut to the crux. If I do not how about restatement?

Quote from: MMime on 2015-12-19 12:41:06

Can't you understand that the majority of the population outside of this forum consider reaching transparency with absolute certainty as useless nitpicking?

Read what I said again. I said nothing about that. You're putting words in my mouth.

If anything your stated goal of listener/encoding software user testing for anything other than transparency is even more silly. "Good Enough" is a known quantity, and the difference between "Good Enough" and "Transparent for all but the oddball samples" is so slender that there is little point chasing it. What do you feel you're going to accomplish through a hearing test of your users? Is lowering the lowpass of an encoder 1K going to produce results either smaller or higher quality enough to make a difference in everyday life?

For if you aren't trying to give the users of your encoding software transparency why should they be arsed to take a test or answer questions? Where have common encoders and their default settings let users down? What is the problem you feel needs solved?

Quote from: MMime on 2015-12-19 12:41:06

The "you are litterally asking..." stuff is a blattant example of that: where the fuck did I say that I wanted people to do in 30 seconds what takes hours for others to obtain? I stated the mere fact that some people will want to invest only 30 seconds, other people will want to invest months. And I want to be as useful as possible to all of them with the given time. But "being as useful as possible" does not mean "provide the same information with the same guaranty".

Then instead of cursing the wind restate an example of what you hope to accomplish in 30 seconds and how it will be better than what we have now (sane defaults).

For, as I said "Good enough" is a known quantity and the line between "Good enough" and "corner case" can not be discovered in 30 seconds or through your software's questionnaire.

Still none of this addresses why you want to prevent "cheating." A question which has been on the table for 24 hours now.

Quote from: MMime on 2015-12-19 12:41:06

Regarding the rest, all I read was "experimentation is futile, you absolutely won't find anything interesting, stay with the status quo please as our system is perfect with the most perfect trade off" (to what I insist, I'm glad DBT has been introduced BEFORE you reached such an enlightenment).

I attack ideas. You attack people.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Bublic on 2015-12-19 13:49:08

Brainstorming sometimes it's not such a bad idea, if the brain is not the only one. But, of course, enlisted the permission of administration HA.

Title: List of typical problems and shortcoming of "common" audio t
Post by: pelmazo on 2015-12-19 15:16:13

Quote from: MMime on 2015-12-19 01:47:55

In theory yes. In practice, you'll naturally stop. Once I make the first tree, you'll see that you won't reach infinity. That's also why I'll only let people add things to the tree for a limited amount of time.

That'll make your list rather random. Is that good enough for you?

Quote

True, but I'm sure there are a few things easily done that would bring non-hardcore scientists to trust and see the values of these methodologies. For some it would require a small explanation, for other you'd have to provide a red-colored theme and for others, there is absolutely nothing you can do.

I find it pretty hopeless to come up with a-priori solutions to the non-rational of those problems. I don't know where you get your hope from. If you provide a red-colored theme, the next fellow will want a pink one. Why bother?

My own experience points in a completely different direction: You ought to try to come up with a test that can stand up to the rational objections. That's difficult enough already. Convince yourself first that you have something solid and credible. If you are confident, and can demonstrate and explain your considerations, you have a better chance of convincing others, if they are convincable at all. As with all statistical tests, absolute certainty (a term that you seem to be fascinated with) is out of reach anyway. It all comes down to raising the confidence level. If that doesn't help, trying to come up with the right "theme" is futile IMHO. When non-scientific people don't trust a test, what they usually need is more education about testing, and not a superficial change in the test to make them feel better. Or perhaps the test is indeed dubious, and their scepticism is warranted, after all, but that can only be clarified in a rational discussion.

I acknowledge that my stance is somewhat anti-marketing and contains a dose of scientific arrogance. I stand by that. There are things that need to be understood before they can be appreciated, so there's no substitute for learning.

Quote

I was under the same impression from what I've read. But see, the "stress", you can simply ditch it saying this is posterior justification for failure. But if you provide an environment that cannot, in anyway, be thought as stressful that's a (small) victory.

It would be a small victory if it were possible. In practice, the stress argument is a joker argument that can always be played, regardless of the details of the test, because stress needs no external cause. You can of course reject such excuses after the fact, and I would be with you, but that won't deter the others from playing this card. I have seen this happening before.

Quote

Does it make more sense now? Do I need to explain something else? To go into more details?

I think I understand. And I still think what you are trying is pointless.

And, no, this doesn't mean that "experimentation is futile, you absolutely won't find anything interesting, stay with the status quo please as our system is perfect with the most perfect trade off". You demonstrate that you are happy to commit the exact offense you accuse others of. I have seen nobody state that experimentation is futile, or that we have an already perfect system. Your exaggeration doesn't clarify, it makes you look offensive and abrasive. By all means experiment all you want, perhaps you come across a gold nugget. There's always a chance. But if you engage others, try not to waste their time.

Title: List of typical problems and shortcoming of "common" audio t
Post by: Bublic on 2015-12-19 17:09:20

By Post #40
And by the way and if we talk seriously, if you will append a constant electrical signal above 20 kHz, while analog amplifier transistors can switch from its standard operating mode class AB in сlass A. In theory, the listener can choose a noisy like a more natural and correct! (THD A < THD AB)

HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: MMime on 2015-12-15 22:41:11