
Topic: lecture: Critical listening/evaluation - a path to the future of quali


Reply #125


False negative does not always mean identically the same as Type II error.


Others disagree: http://en.wikipedia.org/wiki/Type_I_and_type_II_errors


As usual John, you misrepresent the references you cite. Just because a Type II error is on occasion called a "false negative" does not mean that the only meaning of "false negative" is a Type II error. In the recent past you've even made a big show out of observing in a response to me that just because all house cats are felines does not mean that all felines are house cats. I guess that not only is your understanding of logic and rhetoric fading, you can't even remember things you've written in the recent past.

Quote
Quote
Now, getting back to the chase, John: why can't you provide us with unambiguous evidence that high rez recording techniques provide even a different sound, let alone a better sound?


As I said in the message to which you originally posted, Mr. Krueger, a group of AES illuminati are working on exactly that project.


So what? I understand that those people are also eating and sleeping, but I'm not depending on them to do my eating and sleeping for me.

Furthermore, just because those people are working on a project does not mean that they will ever complete it with an unambiguous result. As I understand it, this is a volunteer project - none of them are being paid by the AES for their time and talents. Therefore, they are under no obligation to make this project a high priority, or to give it any priority at all.

Quote
Why am I obliged to duplicate their efforts?


Because, John, your past claims and comments have painted the picture that this should be a relatively easy project. You and/or your magazine have never said that it would take a team of AES illuminati to set up an audio system or make a recording that would unambiguously demonstrate the benefits of high sample rate recordings. You've made representations that would seem to say the exact opposite.

John, are you now recanting your past statements and saying that only AES illuminati can set up audio systems and make recordings that unambiguously demonstrate the benefits of high sample rate recordings?

John, are you now recanting your past statements and saying that only with great difficulty and extreme expense can audiophiles and music lovers set up audio systems and make recordings that unambiguously demonstrate the benefits of high sample rate recordings?


Reply #126
No, that was George Massenburg. Any reference I have made in this thread to difference tracks involved some dems I did last year.


OK, so the 'dem' we're talking about now featured you saying things like 'notice the change in rhythm here...the loss of low-level detail here..' as you presented A and B...things like that?  Feel free to be specific and correct me if I'm wrong.

Quote
Quote
...and then you played them the Frankenstein track that started with lossless 'hi rez' and ended with a 2003 vintage 128kbps encode.  No spoken intro, just something like, 'Now listen to this'.  And then afterwards you asked them something like 'what did that sound like'? And during the 'discussion' some number of people said something like 'well, it started out great, but sounded worse and worse".

Close?


But no cigar. I described what happened in precise terms. I refer others to that description.

 
I'd be happy to refer readers of this thread to precise terms too... how about a link?  Or just restate them.

Quote
Quote
Did any of the listeners note a three-stage reduction in quality?  Did any just note that it sounded worse at the end than at the start?


As I wrote, most listeners felt that there had been a degradation. That was all I felt necessary, given the premises underlying the presentations that I have described in earlier postings.


So, you said 'What did that sound like to you?' and they said 'That sounded worse at the end than it did at the beginning'? Or what? Feel free to be specific.

(post split in two because of quote restriction)


Reply #127
(reply continued from above)

Quote from: Stereoeditor
Quote
Now, let me see if I have it finally straight:
-you don't think the number of listeners who replied ...and how they replied...matters as much as the fact that 10 groups of 20 'audiophiles' were involved


I have not said anything like that. I refer you to my previous posts.


Where, IIRC you indicate it was groups of 20, and a total of 200, but no clear breakdown of who said what.  Let's use this thread to get all the specifics in one place, shall we? Because it's you who seems to think this 'dem' -- and particularly the audience response --  proved something about the sound of the samples you presented. Let's interrogate that claim.

Quote
Quote
-you seem to think your 'teaching demo' (which may have grossly exaggerated the audible effects) did not load any expectations into the 'audiophile' audience during the 'single blind listening demo'.


Again I didn't say that.


Hence the word 'seem'.  But does this mean you think you may have loaded the audience of audiophiles with at least some expectations of difference?

Quote
Quote
So, any recordings of this event?


Sorry, no.



Pity.  Memory can be so unreliable.


Reply #128
Oh, great. Another massive slugfest between JA and Arny. Just what I always wanted to read.

So far, the biggest take home points I'm seeing are
  • JA does not understand what an ad hominem attack is
  • Arny does a great Statler/Waldorf impression
  • Everybody seems to be insulting everybody else in this thread
  • even widely respected audio engineers like Massenburg can get the theory wrong (which meshes nicely with the debate over Dr. Kunchur's papers at the Stereophile forums, suggesting that PhDs in physics can also get the theory wrong)
  • PRaT actually does have a justifiable meaning in the case of compressors and other nonlinear devices that operate primarily in the time domain over a period of many milliseconds. But virtually all linear filters (to say nothing of other nonlinear effects like quantization error) don't fit that description
  • The AES refused to print a letter of criticism about Meyer/Moran from acknowledged pros? Wow, that kinda supports the whole 'cabal' conspiracy theory some people keep ranting on about. Bad form.
Beyond that, I'm not seeing any new points in this thread that haven't been discussed before with JA here; AFAIK these objections have never been satisfactorily resolved:
  • The use of difference signals as a methodology for identifying sound quality (and more specifically the lack of quality in lossy codecs) is fundamentally flawed, due to the explicit rejection of masking effects (which are the entire point of lossy encoding in the first place)
  • The MP3 encoders available in Audition and Pro Tools are widely believed not to be representative of the sound quality of commonly available lossy encoders today, and to be substantially worse - thus using them is something of a strawman
  • While Type II error does exist, IMHO it is strongly exaggerated in most listening situations, and might even be under 5% for many ABX tests conducted today (see the sketch below)
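Whether the miss rate really is under 5% depends entirely on the number of trials and the listener's true per-trial hit rate. A minimal sketch of the arithmetic (my illustration, under the idealizing assumptions of independent trials and a fixed hit rate - neither is asserted above):

Code:
from scipy.stats import binom

n = 16  # number of ABX trials
# pass criterion: smallest score k with P(X >= k | pure guessing) < 0.05
crit = min(k for k in range(n + 1) if binom.sf(k - 1, n, 0.5) < 0.05)

for p_true in (0.6, 0.7, 0.8, 0.9):
    miss = binom.cdf(crit - 1, n, p_true)  # Type II error: scoring below the criterion
    print("p_true=%.1f: pass needs >= %d/%d, miss rate = %.2f" % (p_true, crit, n, miss))

For a 16-trial run the criterion works out to 12/16; a listener who genuinely hears the difference on 90% of trials rarely misses that bar, while one at 60% misses it most of the time.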


Reply #129
Calling Mr Atkinson a troll is probably taking things a bit far. This is something of a hostile environment for the man and he's hardly likely to convert HA members to Stereophile readers. That in and of itself deserves some kudos, at least compared to those who hide behind their editorial pulpits.


Hi, HHGTTG man: I'm not talking about Atkinson's life-role, just his participation here.

The issue is very simple: however much perceived differences might require metaphorical and imprecise language to describe in terms of subjective perception, DBT is the only way to be absolutely certain that the differences in subjective experience are the result of audio differences, and not other factors. Anybody who denies this just ain't interested in the truth, and in the worlds of wine tasting and the grading of student work in English Literature (both fields not normally noted for scientism), something like DBT is accepted as a way of checking on results.

It is, however, always open to someone to claim that a particular method of DBT is in some way flawed, and doesn't permit expert listeners to perform at their best. In such a case, what someone interested in approximating to the truth would do would be to work on some mode of DBT which isolates the audio component of experience, whilst not interfering with proper critical listening. In a somewhat co-operative world, people on both sides of the debate would work out a number of possible methods of DBT, in an attempt to establish the limits of human hearing, and the abilities of those few people we know to exist who have statistical outlier hearing.

Atkinson is just prolonging this thread by quibbling about minutiae, verbal details of posts, and the other things beloved of fifteen-year-old competitive debaters.

I guess Atkinson may be in part motivated by a desire for revenge for participation by HA members in woo forums, where, doubtless, some participants see the cold voice of reason as a form of trolling. I've got a certain sympathy for that view, since such forums seem to contain people who are bolstering a flaky self image by claims for super hearing powers, and the willingness to spend a lot of money. They are sincere, in that they are not driven by ulterior motives, though not, in the Existentialist sense, authentic: at some level they've got to have doubts about what they're saying. And, of course, some HA people can get pretty abrasive in their pursuit of truth, and sometimes mistake subtleties, nuances, and genuine questions for endorsement of woo.

I actually think Atkinson probably knows woo when he sees it. At the most favourable, you could see his position as being like that of a steely-minded Vatican bureaucrat forced to grit his teeth and go along with the canonization of Padre Pio, because saying what he really thinks would disturb the faithful. But that is probably too charitable a view of where he really stands.


Reply #130
The issue is very simple: however much perceived differences might require metaphorical and imprecise language to describe in terms of subjective perception, DBT is the only way to be absolutely certain that the differences in subjective experience are the result of audio differences, and not other factors. Anybody who denies this just ain't interested in the truth, and in the worlds of wine tasting and the grading of student work in English Literature (both fields not normally noted for scientism), something like DBT is accepted as a way of checking on results.



Never use the words 'absolutely certain' if you are going to rely on audio DBTs -- or scientific experiments generally.  The results are going to be probabilities, not absolute certainties.  That's plenty good enough for science...and BETTER than sighted comparison!


Reply #131
The issue is very simple: however much perceived differences might require metaphorical and imprecise language to describe in terms of subjective perception, DBT is the only way to be absolutely certain that the differences in subjective experience are the result of audio differences, and not other factors. Anybody who denies this just ain't interested in the truth, and in the worlds of wine tasting and the grading of student work in English Literature (both fields not normally noted for scientism), something like DBT is accepted as a way of checking on results.


A DBT is the most robust method we currently have of determining whether there is or is not a difference in audio performance. Does that mean it will remain the only method? Who knows? Someone may come up with an equally valid method of testing today, tomorrow or in 20 years time that demonstrates superiority over DBTs in key areas.


This is not so far-fetched as it first appears. What if a method of longitudinal testing was found to return more consistent, reliable and repeatable results than DBTs, for example? Would you jump to dismiss such findings as not truth-seeking?


Note, this does not let simple sighted-test subjectivism in by the back door. That's already dismissed as flawed. Note also that even the most stringent test does not guarantee certainty, because any scientific test is subject to refutation and falsifiability.






Reply #132
It is, however, always open to someone to claim that a particular method of DBT is in some way flawed, and doesn't permit expert listeners to perform at their best. In such a case, what someone interested in approximating to the truth would do would be to work on some mode of DBT which isolates the audio component of experience, whilst not interfering with proper critical listening. In a somewhat co-operative world, people on both sides of the debate would work out a number of possible methods of DBT, in an attempt to establish the limits of human hearing, and the abilities of those few people we know to exist who have statistical outlier hearing.


Here's how the claims that a particular method of DBT is in some way flawed come to be. The claimant *knows* (based on sighted evaluations and other equally fallible means) that the audible difference exists. The DBT does not support his belief. Therefore the DBT is flawed. That's how it was when we first started presenting DBTs to high end audiophiles back in the late 1970s, and that is how it is today.

Here is how to estimate the sensitivity of a particular method of doing listening tests. Contrive a series of tests where the audible difference can be reduced in increments, and see how the listener sensitivity for the audible difference stacks up in comparison with other means of doing tests. I've done this many times, and if anything ABX equals or betters other testing methods.
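A minimal sketch of the stimulus-generation half of that calibration idea (assumptions mine: the "artifact" here is added white noise and the file names are hypothetical - the actual tests may have used quite different artifacts):

Code:
import numpy as np
from scipy.io import wavfile

rate, ref = wavfile.read("reference.wav")      # hypothetical source file
ref = ref.astype(np.float64)
rms = np.sqrt(np.mean(ref ** 2))

rng = np.random.default_rng(0)
noise = rng.standard_normal(ref.shape)
noise *= rms / np.sqrt(np.mean(noise ** 2))    # artifact scaled to 0 dB relative to the signal

for db in range(-20, -80, -10):                # -20 dB down to -70 dB, in 10 dB steps
    out = ref + noise * 10.0 ** (db / 20.0)
    wavfile.write("artifact_%ddB.wav" % -db, rate, out.astype(np.int16))  # naive requantization

Listeners are then run on the series; the level at which a listener's score falls to chance indexes the sensitivity of that test method.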


Reply #133
I've done this many times, and if anything ABX equals or betters other testing methods.
What are the (DBT) alternatives to ABX?
Are there significant differences in the results between various protocols?


Reply #134
A DBT is the most robust method we currently have of determining whether there is or is not a difference in audio performance. Does that mean it will remain the only method? Who knows? Someone may come up with an equally valid method of testing today, tomorrow or in 20 years time that demonstrates superiority over DBTs in key areas.

This is not so far-fetched as it first appears. What if a method of longitudinal testing was found to return more consistent, reliable and repeatable results than DBTs, for example? Would you jump to dismiss such findings as not truth-seeking?


Perhaps I overstated the case (I'm absolutely certain that "absolutely certain" was overstated, though "practically certain" seems pretty right). However, given that what we are concerned with is the audibility of phenomena to a listening subject; and given that we know that all sorts of factors can affect the listening experience, including expectations about the item(s) under test: then some way of testing using a human subject who does not know the identity of the items really would seem inescapable.

There might very well be ways of assuring this ignorance more subtle and more reliable than current ABX testing procedures (I think of the famous power cord test as an example), but this requirement for judgement devoid of expectations seems to be required by the nature of the situation, not by limitations of technology and human ingenuity. I mean, you could, conceivably, do brain scans of subjects listening to 24/96, but then, supposing you found measurable differences in brain function, you'd have to validate any conclusion that these differences related to subjectively important differences in audio experience. And to do that, surely you'd need to have a DBT in the loop somewhere?

And why might all this matter? I think, ultimately, it's a matter of ethics. Resources are limited. Although extravagance can be justified, waste can't. Wanting the best possible audio experience is extravagance, but we live with that. But applying resources to, say, high-priced interconnects, rather than to a nice chair or good Scotch, both of which would have a more soundly based effect on the audio experience, is Waste, and Waste is Bad.


Reply #135
I've done this many times, and if anything ABX equals or betters other testing methods.
What are the (DBT) alternatives to ABX?


The leading alternative to ABX is ABC/hr. ABC/hr is not exactly an alternative to ABX because it is a test for degree of impairment, not a pure test for audible differences. However, if two alternatives have statistically different degrees of impairment, that is very strong evidence that they do sound different.

The ABX test that you see used in articles about things like articulation of speech differs from our ABX in that there is only one sequence of AXB per trial, instead of as many sequences of AX and BX as the listener desires. Again, it is testing for something a little different than what we are testing for.

The concept of ABX can be extended to more alternatives being compared in one trial, giving you something like an ABCX or ABCDX test. The people who do taste tests on food have a number of other variations. Krab has written about them here in the past, I think.

There's a testing style that you may have encountered in an audiologist's office, where you push a button whenever you start hearing a tone that is increasing in intensity, and when you push the button the tone starts getting softer. This can be extended to any audible effect that you can smoothly increase and decrease at will. This kind of test was used in the Dolby Labs jitter tests that were written up in the JAES some years back. This test is not completely foolproof, but it's pretty good if your subjects are honest and don't try to spoof it.
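For illustration, a simulated version of that up/down tracking procedure (the listener model and all parameters are assumptions of mine, not the Dolby setup):

Code:
import random

true_threshold = -60.0      # dB: level the simulated listener can just detect
level, step = -80.0, 2.0    # start well below threshold, move in 2 dB steps
reversals, going_up = [], True

while len(reversals) < 8:
    heard = level > true_threshold + random.gauss(0, 1)  # noisy detection
    if heard == going_up:   # "button press" while rising, or release while falling
        reversals.append(level)
        going_up = not going_up
    level += step if going_up else -step

print("threshold estimate: %.1f dB" % (sum(reversals) / len(reversals)))

The average of the reversal levels brackets the threshold; a subject can spoof it simply by delaying his responses, which is the weakness noted above.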

Quote
Are there significant differences in the results between various protocols ?


There are DBT protocols that are possible, but do put the listener at a big disadvantage. We devised several of them before we hit on ABX. For example, we had a protocol that was based on just listening to a sequence of Xs, and requiring the listener to state whether each was A or B. That one put much too much burden on the listener's memory of what A and B actually sounded like. We tried it once and the next step was ABX.

Any protocol that involves manual switching is at best suspect because the switching delays are usually so long. Since this is how most audiophiles do their listening tests - with all of the false positives due to sighted listening, no level-matching, and no time-synch - the remaining responses are probably false negatives due to long switching delays.

If they are executed properly, and can possibly produce comparable results, then DBT results strongly tend to converge. The strongest determiners of the sensitivity of good listening test results are listener training and choice of music. The results of good listening tests also converge with results that can be developed by other means, such as analysis of the structure of the human ear.



Reply #136
Hmm, when he played back the "error noise" at the same level as the original, that would explain the reason for loud levels. The noise would not be very impressive unless turned up.
I don't think it was "turned up" to be louder than it would have been. He literally had the original, and the mp3, on two adjacent tracks in Pro Tools. He inverted the original, and played it at the same time as the mp3. There was no level fiddling (except at one part, where I think he inadvertently knocked one of the faders, and you could hear the music itself didn't quite cancel - he immediately heard and corrected this).

Cheers,
David.
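As an aside, the invert-and-sum null test described above is straightforward to reproduce outside Pro Tools. A minimal sketch (file names hypothetical; the mp3 is assumed already decoded to WAV and sample-aligned, which real decoders' added delay makes nontrivial; and per the masking objection raised earlier in the thread, the difference signal is not by itself a measure of audibility):

Code:
import numpy as np
from scipy.io import wavfile

rate, a = wavfile.read("original.wav")        # hypothetical reference
_, b = wavfile.read("mp3_decoded.wav")        # hypothetical decoded mp3
n = min(len(a), len(b))                       # trim to common length
diff = a[:n].astype(np.float64) - b[:n].astype(np.float64)  # invert and sum
wavfile.write("difference.wav", rate, diff.astype(np.int16))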


Reply #137
Thanks Andy.
Wouldn't it be a good idea for important tests to use a "placebo control group" for which stimulus B=A? If the test is well set up, the result should be close to 50%, right?

A few days ago there was an interesting post by Bob Katz on a mailing list about the audibility of jitter (internal/external clocking of DACs). He used a somewhat different approach, but still DBT. To me it looks like a correct test, although the conclusions are perhaps a bit optimistic according to HA criteria. Any comments? NB: the topic was about audible differences between different pressings of data-identical CDs.
Quote
>Always willing to have the 'ears' prove science wrong.
>We synced two Master Disk CD players into a common DtoA so we could A/B the playing CDs.
>No one was able to consistently pick the SHM-CD while blind swapping disc between players.

That's one of the toughest tests in the world to pass blind. And I feel you've put an obstacle in by the very nature of your test. The prime problem is the listening method you used for the blind test. What you chose was the "serial listening" method, which totally screws up because it assumes that music is identical at the switch point! You're not listening to pink noise  :-). Music is fluid, it's constantly changing. The ear can get completely confused, even by small changes in dynamics at the switch point. For example, if you cue into the drumbeat, we "expect" that the drum beat will sound the same, but if it's a human drummer, each drum beat is minutely different, and so, you get fooled. If you cue into the ambience, and the drummer hits the drum just slightly harder at the switch point, the ambience will go up! It's nearly impossible to make anything but random conclusions serially!*

The most effective method I've found for conducting blind testing of extremely subtle differences is to:

a) find a perfect 30 second passage. This passage should be "reasonably consistent" and not have any momentary dynamic surprises in it to confuse the listener

b) familiarize the listeners with the loudspeakers, room, and especially the test material

c) TRAIN the listeners on the test material

d) Now, decide amongst yourselves which of A or B you THINK you prefer. That's what you are now trying to prove: A is better than B, or vice versa. Now, let's assume for the moment that you decided A is better, sighted. Prior to the blind test, you're comfortable and as convinced as you can be that you can tell A from B and that you prefer A.

e) For each trial, play 30 seconds of A.  Then play 30 seconds of A again. Now play 30 seconds of X (the unknown). Then 30 seconds of X again. Then finish with 30 seconds of A.  That's 5 repetitions, 30 seconds each. For each trial. Note how the order of the repetitions helps the listener to feel comfortable about his decision. The fifth repetition is a further assurance. All this is geared to helping the listener to make and feel comfortable with his decision. This works, try it!

How do I know it works? Well, I recently conducted a jitter listening test for 8 listeners (5 in one session, 3 in another), doing exactly the above testing method. 10 trials times 8 subjects equals 80 trials. The score was 60% (48 trials out of 80), which is above chance, but an amazing result for such an extremely difficult and (you would admit) subtle test. I personally scored 7 out of 10 and feel I would have gotten better if I could just maintain my concentration! The test requires continuous concentration and it is no fun!!!!

What was the test: Listening to the Pro Tools HD 192 DAC with a stereo recording, and comparing Pro Tools on internal clock versus the Grimm CC 1 clock. The Grimm clock won and all listeners felt it improved the depth and stereo image. Plus, with a p of 2.5, the assurance that we got a valid result is pretty high. p of 2.5 means there's only a 2.5% chance that the listeners got these results "out of luck" or by chance.

That said, I've never been able to pass a blind test on data-identical CDs which I feel (sighted) sound different. Jitter tests on "data-identical CDs" are the most difficult blind test to conduct (see the difficulty of trying to synchronize two players and play just a 30 second passage!!!!!) and so the controversy lingers on.

* serial is an "ok" method for us mastering engineers who are working on eqs and such, though I try to average it out over time and if possible, rewind and play the passage again with the different EQ, because we can also easily get fooled by short term changes in the music getting confused with changes in EQ!

Bob Katz
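As an aside on the statistics quoted above (arithmetic mine, not Katz's): the one-tailed binomial probability of scoring 48 or more out of 80 by pure guessing can be checked directly, and it comes out nearer to 5% than to the quoted 2.5%:

Code:
from scipy.stats import binom

p = binom.sf(47, 80, 0.5)   # P(48 or more correct out of 80 by guessing)
print("p = %.3f" % p)       # prints a value close to 0.05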


Reply #138
It is not me who is saying that designing such a test is complex; instead I have been reporting what those involved in doing so - "AES Fellows, some university professors, some well-known recording and mastering engineers, and even JJ" - are finding during the planning of such a test. If you are offering the opinion that those people are "clueless," you need to rethink that position.


I don't think that you are completely honest here. You know the basic procedure: Compare a plain 88.2 kHz record to a time-synchronized 88.2 kHz -> 44.1 kHz -> 88.2 kHz record. If that shows that no difference can be heard under double blind conditions, you already have very solid results, and it doesn't get any more complicated than that.


But you haven't eliminated all variables and the result may still be a false negative, thus not transportable. All you can conclude is that with _that_ recording, _that_ hardware and _that_ testing protocol, no difference could be identified to a given degree of statistical certainty.
Presumably you'd use a combination of recording and hardware that your expert listeners say demonstrates the huge advantage hi-res has over CD. So, these are irrelevant criticisms (given that you've previously said you trust your expert listeners).

So what you're saying is that "all" you can conclude is that you may prove that "with _that_ testing protocol, no difference could be identified to a given degree of statistical certainty."

I can't believe the protagonists in this debate have the energy to run the obvious subsequent arguments again.

Cheers,
David.
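For what it's worth, the basic 88.2 kHz -> 44.1 kHz -> 88.2 kHz procedure referred to above is simple to realize. A minimal sketch (assumptions mine: integer-ratio polyphase resampling via scipy and hypothetical file names; a real test would also need level and dither checks):

Code:
from scipy.io import wavfile
from scipy.signal import resample_poly

rate, x = wavfile.read("master_88k2.wav")         # hypothetical 88.2 kHz source
assert rate == 88200

mid = resample_poly(x, up=1, down=2, axis=0)      # 88.2 kHz -> 44.1 kHz
back = resample_poly(mid, up=2, down=1, axis=0)   # 44.1 kHz -> 88.2 kHz
wavfile.write("roundtrip_88k2.wav", rate, back.astype(x.dtype))

Because resample_poly applies zero-phase filtering, the round-trip file stays time-aligned with the original (to within a sample), which is what makes a synchronized double-blind comparison possible.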


Reply #139
Wouldn't it be a good idea for important tests to use a "placebo control group" for which stimulus B=A? If the test is well set up, the result should be close to 50%, right?


Yes. I have definitely done tests where I could get very high scores even when A was nominally the same as B. In fact I can do it with the QSC ABX box any time I want to, with nothing attached to it at all. The relay noise is not random. To be effective it has to be acoustically separated from the listeners.
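The expected behavior of such a null control is easy to simulate (sketch mine, shaped like the jitter test quoted below): when B is literally A, blind "identification" is coin flipping, so a well-run test should score near 50% with ordinary binomial spread.

Code:
import random

trials, listeners = 10, 8     # same shape as the quoted jitter test
scores = [sum(random.random() < 0.5 for _ in range(trials)) for _ in range(listeners)]
total = sum(scores)
print("%d/%d correct (%.0f%%)" % (total, trials * listeners, 100.0 * total / (trials * listeners)))

A group score far from 50% on a B=A arm signals a leak in the blinding, like the relay noise example above.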

Quote
The most effective method I've found for conducting blind testing of extremely subtle differences is to:

a) find a perfect 30 second passage. This passage should be "reasonably consistent" and not have any momentary dynamic surprises in it to confuse the listener

b) familiarize the listeners with the loudspeakers, room, and especially the test material

c) TRAIN the listeners on the test material


This raises the question of how to teach people to hear a difference that they may not be able to hear. My preferred methodology is to come up with a means for varying the strength of the artifact being tested, and to start people out with a super-strong version and work down to the actual test of interest.

Quote
d) Now, decide amongst yourselves which of A or B you THINK you prefer. That's what you are now trying to prove: A is better than B, or vice versa. Now, let's assume for the moment that you decided A is better, sighted. Prior to the blind test, you're comfortable and as convinced as you can be that you can tell A from B and that you prefer A.


This violates a basic rule of testing - communication among the listeners violates the basic assumptions of most statistical analysis techniques. One of the symptoms of this problem is scores that are worse than random guessing.

Quote
e) For each trial, play 30 seconds of A.  Then play 30 seconds of A again. Now play 30 seconds of X (the unknown). Then 30 seconds of X again. Then finish with 30 seconds of A.  That's 5 repetitions, 30 seconds each. For each trial. Note how the order of the repetitions helps the listener to feel comfortable about his decision. The fifth repetition is a further assurance. All this is geared to helping the listener to make and feel comfortable with his decision. This works, try it!


I can show that in general samples this long are far less sensitive for small differences than samples on the order of just a few seconds.

I suspect that the switching was not quick enough and not under the control of the listeners.


Quote
How do I know it works? Well, I recently conducted a jitter listening test for 8 listeners (5 in one session, 3 in another), doing exactly the above testing method. 10 trials times 8 subjects equals 80 trials. The score was 60% (48 trials out of 80), which is above chance, but an amazing result for such an extremely difficult and (you would admit) subtle test. I personally scored 7 out of 10 and feel I would have gotten better if I could just maintain my concentration! The test requires continuous concentration and it is no fun!!!!

What was the test: Listening to the Pro Tools HD 192 DAC with a stereo recording, and comparing Pro Tools on internal clock versus the Grimm CC 1 clock. The Grimm clock won and all listeners felt it improved the depth and stereo image. Plus, with a p of 2.5, the assurance that we got a valid result is pretty high. p of 2.5 means there's only a 2.5% chance that the listeners got these results "out of luck" or by chance.


I've got my doubts on the grounds of questionable double blindness.

A single blind test is just a defective DBT.

General comment - these guys seem to be making the classic mistake of going for a "world class" result without working their way up with some easier tests. This is just like trying to do a 24/96 test without working your way up from 12/32. IOW, they are trying to do the audio equivalent of breaking the 4 minute mile in their first middle school track meet.


Reply #140
An mp3 of the lecture (without pre-release musical extracts) is now available on-line:

www . mediafire . com/?8exbl1v60gcw2tz
(remove the spaces from the URL to make it work)

Poor (dictaphone!) quality - though it's still easier to hear what George is saying on this recording than it was where I was sat.


The criticism of mp3 starts about 12 minutes in (though the music examples are missing - replaced by short beeps)
The post-lecture discussion of the Meyer and Moran paper starts at 1:16:40

There were many other interesting parts, but I don't have a spare 2 hours to listen to the whole thing and index them.

Cheers,
David.


Reply #141
Presumably you'd use a combination of recording and hardware that your expert listeners say demonstrates the huge advantage hi-res has over CD. So, these are irrelevant criticisms (given that you've previously said you trust your expert listeners).


I think that Atkinson is implying that using equipment that is all Stereophile Class "A" would not be good enough.

I see his concern there because his Class A list includes some real question marks like all that vacuum tube stuff, especially the SETs.

He's also implying that his staff would not be good enough.

I'm seeing his concern there because he's recently had to overrule one of his lead columnists - Fremer. If I've got their ages pegged right I'd have more than a few doubts of my own.

Quote
So what you're saying is that "all" you can conclude is that you may prove that "with _that_ testing protocol, no difference could be identified to a given degree of statistical certainty."


Of course we don't know exactly what *that* protocol is or would be. Based on past performance, Atkinson's choice of test protocol would be non-standard and have a few question marks of its own.

Quote
I can't believe the protagonists in this debate have the energy to run the obvious subsequent arguments again.


Different nuances?  ;-)


Reply #142
I described what happened in precise terms. I refer others to that description.

 
I'd be happy to refer readers of this thread to precise terms too... how about a link?  Or just restate them.


Sorry for the tardy response, Mr. Sullivan. Sometimes increasing the influence and success of what Arny Krueger has referred to as the "Evil High-End Audio Establishment" and what more level-headed fellows call "Stereophile magazine" has to take precedence over posting to Internet forums. :-)

I have to say that I don't comprehend what you are trying to achieve in this thread by posting and reposting the same questions, often offering your own incorrect paraphrase of what I said or your unsupported conjectures concerning my motivations and conclusions. I have described the presentations I gave in Colorado in considerable detail in this thread - I refer you to posts #37, #49, #55, #65, #69, #85, #93, and #112, which I know you read because you responded to many of them.

Quote
Where, IIRC you indicate it was groups of 20, and a total of 200, but no clear breakdown of who said what. Let's use this thread to get all the specifics in one place, shall we?


Reread the numbered posts of mine, in this very thread, that I listed above, Mr. Sullivan, and you will find a full description of what the events were and what happened at each one.

Quote
Because it's you who seems to think this 'dem' -- and particularly the audience response -- proved something about the sound of the samples you presented.


Really? I don't see any claim I made concerning "proof" of anything, Mr. Sullivan. All I did was to describe the reactions of listeners to the demonstration. As to what I "seemed to think," I wrote back in April on HA (on April 26, 2009, 10:40am, in the "Why We Need Audiophiles..." thread), in response to the questions you were putting to me at that time, what my motives were: I wanted the listeners to audition hi-rez PCM data under optimal circumstances and I was also interested in exposing listeners to various data-reduced formats so that they could decide for _themselves_ whether hi-rez formats are necessary, whether CD is good enough for serious listening, and whether the lossy versions are sonically compromised or not. There was no scoring, no tallying of results, and no other detail other than what I have already offered you.

If you feel that to be insufficient, my apologies. All I can suggest is that if and when I repeat the event somewhere closer to where you live, you attend and ask questions in person at that time (the same invitation I offered to all HA subscribers back in April).

John Atkinson
Editor, Stereophile



Reply #144
Quote
d) Now, decide amongst yourselves which of A or B you THINK you prefer. That's what you are now trying to prove: A is better than B, or vice versa. Now, let's assume for the moment that you decided A is better, sighted. Prior to the blind test, you're comfortable and as convinced as you can be that you can tell A from B and that you prefer A.


This violates a basic rule of testing - communication among the listeners violates the basic assumptions of most statistical analysis techniques. One of the symptoms of this problem is scores that are worse than random guessing.



I don't think Katz is saying anyone should communicate with anyone.  He's not a novice at conducting blind tests. Katz is just saying that a subject should believe that A and B really do sound different, before proceeding to try to identify them 'blind'. 




Reply #145
Quote
d) Now, decide amongst yourselves which of A or B you THINK you prefer. That's what you are now trying to prove: A is better than B, or vice versa. Now, let's assume for the moment that you decided A is better, sighted. Prior to the blind test, you're comfortable and as convinced as you can be that you can tell A from B and that you prefer A.


This violates a basic rule of testing - communication among the listeners violates the basic assumptions of most statistical analysis techniques. One of the symptoms of this problem is scores that are worse than random guessing.



I don't think Katz is saying anyone should communicate with anyone.  He's not a novice at conducting blind tests. Katz is just saying that a subject should believe that A and B really do sound different, before proceeding to try to identify them 'blind'. 
Which, btw, is really good advice.


Reply #146
Mr. Atkinson, your claims about this 'dem' have been spread over two threads. You are the one who touted it as a demonstration of something. It has been a chore getting a full description of your 'dem' from you, and even now aspects of it remain unclear. So please don't get cranky when I suggest responses be kept pertinent and comprehensive with respect to the questions asked. All you have done in the most recent reply to me is refer me to previous posts in this thread, all of which raised methodological questions of their own, which were not fully answered. The purpose of the 'progressive' degradation demo, and whether it was actually perceived as such, remains unclear; the reasons given for using an obsolete codec were, frankly, lame; the role of expectation bias in the 'results' was essentially unaddressed; and the means by which subject response was gauged seems spotty as described.

As for not trying to 'prove' anything, the first mention of your 'dem' on this thread (post #37) was as evidence that low bitrate lossy affects the perceived 'rhythm' and 'dynamics' of a track. If not to support a point, why would you mention it at all?



Quote
As to what I "seemed to think," I wrote back in April on HA (on April 26, 2009, 10:40am, in the "Why We Need Audiophiles..." thread), in response to the questions you were putting to me at that time, what my motives were: I wanted the listeners to audition hi-rez PCM data under optimal circumstances

and I was also interested in exposing listeners to various data-reduced formats so that they could decide for _themselves_ whether hi-rez formats are necessary, whether CD is good enough for serious listening, and whether the lossy versions are sonically compromised or not. There was no scoring, no tallying of results, and no other detail other than what I have already offered you.
If you feel that to be insufficient, my apologies. All I can suggest is that if and when I repeat the event somewhere closer to where you live, you attend and ask questions in person at that time (the same invitation I offered to all HA subscribers back in April).

John Atkinson
Editor, Stereophile



Those cannot possibly have been 'optimal' listening conditions, as described, and yes, we already know what you claimed to be demonstrating.

Let me try to phrase this in a way that you cannot possibly wiggle away from on specious semantic grounds  (a mighty challenge, btw):

You still seem to think that your dem offered a valid means to compare whether hi-rez formats were better than CD, and whether lossy versions are sonically compromised.  However, you do not claim anything was proved by it.

Correct?


Reply #147
Quote
d) Now, decide amongst yourselves which of A or B you THINK you prefer. That's what you are now trying to prove: A is better than B, or vice versa. Now, let's assume for the moment that you decided A is better, sighted. Prior to the blind test, you're comfortable and as convinced as you can be that you can tell A from B and that you prefer A.


This violates a basic rule of testing - communication among the listeners violates the basic assumptions of most statistical analysis techniques. One of the symptoms of this problem is scores that are worse than random guessing.



I don't think Katz is saying anyone should communicate with anyone.  He's not a novice at conducting blind tests. Katz is just saying that a subject should believe that A and B really do sound different, before proceeding to try to identify them 'blind'. 
Which, btw, is really good advice.


I'd say it's required, otherwise why bother with the blind test?


Reply #148
Quote
d) Now, decide amongst yourselves which of A or B you THINK you prefer. That's what you are now trying to prove: A is better than B, or vice versa. Now, let's assume for the moment that you decided A is better, sighted. Prior to the blind test, you're comfortable and as convinced as you can be that you can tell A from B and that you prefer A.


This violates a basic rule of testing - communication among the listeners violates the basic assumptions of most statistical analysis techniques. One of the symptoms of this problem is scores that are worse than random guessing.



I don't think Katz is saying anyone should communicate with anyone.  He's not a novice at conducting blind tests. Katz is just saying that a subject should believe that A and B really do sound different, before proceeding to try to identify them 'blind'.


I see your point. There's a possibility that he means this happens before the test and not during the test. If it happens before the test, then no harm, no foul. We did something like this before all of the amplifier and CD player tests we did for Stereo Review.


Reply #149
Quote

Katz is just saying that a subject should believe that A and B really do sound different, before proceeding to try to identify them 'blind'.


Which, btw, is really good advice.

I'd say it's required, otherwise why bother with the blind test?


So you don't think that people who are agnostic about the outcome of the test should, could, or would do the test?

I've definitely seen cases where people who did not in the slightest believe that the difference existed produced positive results. In some cases they didn't believe that they reliably heard anything even after the test!

IME at the threshold of audibility, it is not clear that a difference exists. However, the so-called guesses are more reliable than chance.

There can be a professional approach to listening where people simply do their best and let the results be what they will be.

On balance, believing that the difference exists might make people more enthusiastic, and therefore find the test easier.

But again, belief that the difference exists can and definitely has led to overconfidence and poor results. Sometimes a test with a questionable outcome leads to a sense of determination that gives better results the next time around.