HydrogenAudio

Hydrogenaudio Forum => Listening Tests => Topic started by: solive on 2010-07-10 11:08:20

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-10 11:08:20
Thomas Edison was probably the greatest stereo salesman that ever lived. He believed that "listeners will hear what you tell them to hear", and he was pretty successful at convincing thousands of listeners that his 1910 Edison Diamond Disc Phonograph reproduced recordings that sounded identical to a live performance. His secret weapon was an elaborate live-versus-recorded demonstration that managed to convince people that his phonograph sounded a lot better than it really was.

Several times over the past 10 years, I have been asked by live-versus-recorded apologists why I don't do these types of tests, since they claim they are the only truly valid measure of loudspeaker fidelity or accuracy. That is what prompted me to write about why I believe live-versus-recorded listening tests don't work, in this month's blog article (http://seanolive.blogspot.com/2010/07/why-live-versus-recorded-listening.html)

Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-10 16:53:35
I have a couple of questions.

Quote
Live and Recorded Performances Must Be Identical

For live-versus-recorded tests to be valid, the live and recorded performance should be identical, having the same notes, intonation, tempo, dynamics, loudness, balance between instruments, and the same location and sense of space of the instruments. Otherwise, there are extraneous cues that allow listeners to readily identify the live and recorded performances. MIDI-controlled instruments (e.g. player pianos) are but one example of how this problem could be resolved.


Would it be possible to design a valid test with the opposite approach? That is, instead of trying to reproduce a single performance identically, could we use various different performances and recordings every time?

I'm thinking of a scenario like this: suppose we need a test with 20 trials. Take 20 different singers with different voices and make 20 recordings. Then let some 20 more singers (again, all different voices) perform during the test, in a sort of A/B test.
This way, I'm thinking, the singers don't even have to perform the same piece of music. It could be different musical material in every trial/performance.

Would it be possible to gather any statistically significant result from such a test?
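For a rough sense of what 20 trials can resolve, assume each trial is scored as a two-alternative forced choice ("live" or "recorded") against chance. An exact binomial test (an editorial sketch, not anything proposed in the thread) shows how many correct calls out of 20 are needed before guessing becomes implausible:

```python
from math import comb

def binom_p_value(k, n, p=0.5):
    """One-sided exact binomial p-value: P(X >= k) if every trial is a coin flip."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# With 20 trials, how many correct live/recorded identifications are needed
# before chance (p = 0.5) becomes implausible at alpha = 0.05?
for k in range(10, 21):
    if binom_p_value(k, 20) < 0.05:
        print(f"{k} correct out of 20: p = {binom_p_value(k, 20):.4f}")
        break
# → 15 correct out of 20: p = 0.0207
```

So 20 trials is enough to detect a reasonably strong effect (15/20 correct), but weaker effects would need more trials.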



And my 2nd question: can we consider our everyday practice of enjoying recorded music as the "ultimate" proof that such recordings are indeed capable of creating an illusion of live performance?
After several decades of such practical experience all across the globe, perhaps we already have enough evidence to draw some statistically valid conclusions? Or still not?
I mean, okay, measuring the accuracy of a particular loudspeaker is one thing, but can't we say anything definitive about the technology in general?
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: analog scott on 2010-07-10 18:41:57
Thomas Edison was probably the greatest stereo salesman that ever lived. He believed that "listeners will hear what you tell them to hear", and he was pretty successful at convincing thousands of listeners that his 1910 Edison Diamond Disc Phonograph reproduced recordings that sounded identical to a live performance. His secret weapon was an elaborate live-versus-recorded demonstration that managed to convince people that his phonograph sounded a lot better than it really was.

Several times over the past 10 years, I have been asked by live-versus-recorded apologists why I don't do these types of tests, since they claim they are the only truly valid measure of loudspeaker fidelity or accuracy. That is what prompted me to write about why I believe live-versus-recorded listening tests don't work, in this month's blog article (http://seanolive.blogspot.com/2010/07/why-live-versus-recorded-listening.html)

Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)


Very nice article Sean. I think you make a valid point that one can't use live music as a means of judging loudspeakers per se. But it seems to me that one can use live music as a reference to judge a recording and playback system in its totality.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-11 07:46:45
Thomas Edison was probably the greatest stereo salesman that ever lived. He believed that "listeners will hear what you tell them to hear", and he was pretty successful at convincing thousands of listeners that his 1910 Edison Diamond Disc Phonograph reproduced recordings that sounded identical to a live performance. His secret weapon was an elaborate live-versus-recorded demonstration that managed to convince people that his phonograph sounded a lot better than it really was.

Several times over the past 10 years, I have been asked by live-versus-recorded apologists why I don't do these types of tests, since they claim they are the only truly valid measure of loudspeaker fidelity or accuracy. That is what prompted me to write about why I believe live-versus-recorded listening tests don't work, in this month's blog article (http://seanolive.blogspot.com/2010/07/why-live-versus-recorded-listening.html)

Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)


Very nice article Sean. I think you make a valid point that one can't use live music as a means of judging loudspeakers per se. But it seems to me that one can use live music as a reference to judge a recording and playback system in its totality.


Thanks. Yes, you can certainly judge the accuracy of the entire recording/playback chain against a live performance, but the test makes it difficult to know which component is responsible for the artifacts: the recording, the loudspeakers, or both.

In my view, the closest you can come to recreating a live performance experience is a binaural recording/room scan with a head-tracking headphone-based auditory display. Stereo just doesn't cut it, but multichannel gets a lot closer if the recording is done well.  And there are still challenges controlling all the nuisance variables.

What about recordings that are not intended to sound like a live performance? That would include about 95% of all recordings made today. How do we judge the accuracy of those? Of course, you know the answer to that question: you define the performance of the loudspeakers and their interaction with the room acoustics where the art (the recording) was created, and simply replicate the playback system in the consumer space. Science in the service of art -- not a popular concept among the live-versus-recording apologists I've met.




Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: analog scott on 2010-07-11 15:50:58
What about recordings that are not intended to sound like a live performance? That would include about 95% of all recordings made today. How do we judge the accuracy of those?



I can't say that it is an issue to me. But if one is going to ask the question then one has to define the reference. What is the reference for a studio recording if one is seeking an "accurate" reproduction of it? Unlike a recording of an original live performance a studio recording in and of itself has no intrinsic original sound.


Of course, you know the answer to that question: you define the performance of the loudspeakers and their interaction with the room acoustics where the art (the recording)  was created, and  simply replicate the playback system in the consumer space. Science in the service of art -- Not a  popular concept among the live-versus-recording apologists I've met.




Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)



Yep and no thanks.

But this points to a bigger question in audio. Why seek accuracy? Why use live music or the sound originally heard in the control room as a reference? As an audiophile I think in this quest for accuracy the forest has been lost for the trees. For me live music is really a benchmark more than a literal rigid "reference." The most beautiful sounding music I have heard has come from live acoustic music, be it a symphony orchestra at Disney Hall or that magical concert I went to in a church in Soweto with Ladysmith Black Mambazo (maybe the most beautiful thing I have ever heard) or any number of other magical moments I have experienced with live acoustic jazz or folk etc. These experiences for me have been the pinnacles of aesthetic beauty in sound. So with recording and playback of live music it consistently seems to work that the closer you get to the sounds one hears with live music (with all the qualifiers) the better the playback tends to be. Heck, if some day I hear something on a hifi that simply sounds better than anything I have ever heard live...that becomes the new benchmark for me. If live music didn't set the benchmark there would be no point in using it as a reference and no point in trying to accurately recreate those sounds.

Now with studio recordings....IMO the control rooms don't set such a lofty benchmark. IOW IMO one can do much better than "accurate" with those recordings. So I see no point in seeking accuracy with such recordings.

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-11 16:25:47
Would it be possible to design a valid test with the opposite approach? That is, instead of trying to reproduce a single performance identically, could we use various different performances and recordings every time?


A quick follow-up on my first question.

I did some googling and found that I was actually thinking of a kind of test called "Randomized controlled trial" (http://en.wikipedia.org/wiki/Randomized_controlled_trial) (RCT).
The "explanatory" type of RCT with "parallel-group" design and "allocation concealment", in particular. The goal of such RCT is to test the 'efficacy' of a treatment or medicine given to a group of patients.

So, here goes my analogy:
* 'efficacy' = ability to create illusion of a live performance.
* Participants (patients) are the singers/performers.
* Half of the participants are allocated to receive the 'treatment' (record and playback via loudspeakers),
and the other half is allocated to receive no 'treatment' (perform live).

The problem, I guess, is that it might not be quite properly triple-blind, since our 'participants' (singers) obviously know which 'treatment' they are receiving.

But maybe this bias could be eliminated, too: let's make all the singers perform live, but let some of them (the "control group") perform before dummy listeners in one room, and the other group perform before the actual audience in another room. Or something like that. Mix them up and confuse them.
Then the singers wouldn't know which of their performances would actually count, and the test becomes fully triple-blind, I hope. 

So, on the surface it seems that it should be possible to eliminate the need for identical stimuli in a live-vs-recorded test - by using RCT method.

Any thoughts? Anybody? 
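If such a parallel-group trial were actually run, one plausible analysis (an editorial sketch; the function and the counts below are hypothetical, not anything proposed in the thread) is a two-proportion z-test comparing how often listeners judged each arm to be a live performance:

```python
from math import erf, sqrt

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test (pooled variance, normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # two-sided p-value: 2 * P(Z > |z|), using the standard normal CDF via erf
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical counts: 18 of 20 listeners call the live arm "live",
# versus 9 of 20 for the loudspeaker arm.
z, p = two_prop_z_test(18, 20, 9, 20)
print(f"z = {z:.2f}, p = {p:.4f}")  # z ≈ 3.04, p ≈ 0.002: unlikely to be chance
```

The normal approximation is adequate at these group sizes; with smaller cells an exact test (Fisher) would be safer.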

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: googlebot on 2010-07-11 16:53:45
Now with studio recordings....IMO the control rooms don't set such a lofty benchmark. IOW IMO one can do much better than "accurate" with those recordings. So I see no point in seeking accuracy with such recordings.


Your problem is just a narrow understanding of the word "accurate". No one wants an accurate reproduction of a low-resonance studio room on a final record (and no one has claimed that, either). Accurate studio equipment in this context just means it does not add additional artifacts by itself. A professional skilled in the art will process it with the tools of his choice and create something not resembling anything close to a studio chamber. So your whole excursion is kind of pointless.

Even most live recordings aren't mixed down with the goal of accurate sonic reproduction of a singular listening position anymore. Since records are generally played back in rooms with walls and a finite number of speakers, that's impossible anyway. So people searching for that could be disappointed.

While going to a concert in a good hall can be an exceptional experience, I have experienced the most detail in classical music from great records. If you want every detail, IMHO, the best records have surpassed live performances (which doesn't necessarily make it the better experience) for quite some time now. But experience is hard to quantify anyway, so why not enjoy the best of both worlds?
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: analog scott on 2010-07-11 17:44:18
Now with studio recordings....IMO the control rooms don't set such a lofty benchmark. IOW IMO one can do much better than "accurate" with those recordings. So I see no point in seeking accuracy with such recordings.


Your problem is just a narrow understanding of the word "accurate".



1. I don't have a problem; I'm doing just fine, thank you.
2. A "narrow understanding" of the word accurate? Sorry, not interested in semantic arguments. I know what "accurate" means.

You might want to take this argument up with Sean as well, since he was the one who said "What about recordings that are not intended to sound like a live performance? That would include about 95% of all recordings made today. How do we judge the accuracy of those? Of course, you know the answer to that question: you define the performance of the loudspeakers and their interaction with the room acoustics where the art (the recording) was created, and simply replicate the playback system in the consumer space." I guess Sean suffers from the same "narrow understanding" of the word "accurate." 

No one wants an accurate reproduction of a low-resonance studio room on a final record (and no one has claimed that, either).



Well actually people have claimed that they do want that. I have seen it with my own eyes on other forums. Maybe not such a good idea to speak for unnamed other people.


Accurate studio equipment in this context just means it does not add additional artifacts by itself. A professional skilled in the art will process it with the tools of his choice and create something not resembling anything close to a studio chamber. So your whole excursion is kind of pointless.


Who said anything about "accurate studio equipment?" Again, you might want to take this argument up with Sean as well. Sean's words: "What about recordings that are not intended to sound like a live performance? That would include about 95% of all recordings made today. How do we judge the accuracy of those? Of course, you know the answer to that question: you define the performance of the loudspeakers and their interaction with the room acoustics where the art (the recording) was created, and simply replicate the playback system in the consumer space." In case it isn't clear, the "loudspeakers and their interaction with the room acoustics where the art (the recording) was created" is the control room.

By the way, it's not *my* excursion. I think I made my position quite clear when it comes to seeking "accuracy" with studio recordings. I believe I said "no thanks." All I did was agree with Sean that if one wants an aural reference for studio recordings in the pursuit of accuracy, one is ultimately left with what was heard in the control room. I suppose one can take it a step further and claim what was heard in the room when final approval was given for the commercial release is the final reference but...





While going to a concert in a good hall can be an exceptional experience, I have experienced the most detail in classical music from great records. If you want every detail, IMHO, the best records have surpassed live performances (which doesn't necessarily make it the better experience) for quite some time now.




I want detail to be detail. I'm not keen on exaggerated detail. I have seen forensic photographs that show more detail than one could ever see with the naked eye. But they are ugly pictures. Heck, you can hear more detail just by jacking up the treble and compressing the signal. You will certainly get more detail from an acoustic instrument with a close-miked recording in an anechoic chamber. None of that nasty reverb to obscure the sound of the instrument itself. But for me, much like a forensic photograph, exaggerated detail beyond what we hear in live music tends towards the ugly. Not my thing.

But experience is hard to quantify anyway, so why not enjoy the best of both worlds?




That is what I have been doing all along.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-11 20:16:09


Quote
I can't say that it is an issue to me. But if one is going to ask the question then one has to define the reference. What is the reference for a studio recording if one is seeking an "accurate" reproduction of it? Unlike a recording of an original live performance a studio recording in and of itself has no intrinsic original sound.


The reference is what the artist heard in the studio, plain and simple. That is the "live performance."


Quote
Yep and no thanks.

But this points to a bigger question in audio. Why seek accuracy? Why use live music or the sound originally heard in the control room as a reference? As an audiophile I think in this quest for accuracy the forest has been lost for the trees. For me live music is really a benchmark more than a literal rigid "reference." The most beautiful sounding music I have heard has come from live acoustic music, be it a symphony orchestra at Disney Hall or that magical concert I went to in a church in Soweto with Ladysmith Black Mambazo (maybe the most beautiful thing I have ever heard) or any number of other magical moments I have experienced with live acoustic jazz or folk etc. These experiences for me have been the pinnacles of aesthetic beauty in sound. So with recording and playback of live music it consistently seems to work that the closer you get to the sounds one hears with live music (with all the qualifiers) the better the playback tends to be. Heck, if some day I hear something on a hifi that simply sounds better than anything I have ever heard live...that becomes the new benchmark for me. If live music didn't set the benchmark there would be no point in using it as a reference and no point in trying to accurately recreate those sounds.

Now with studio recordings....IMO the control rooms don't set such a lofty benchmark. IOW IMO one can do much better than "accurate" with those recordings. So I see no point in seeking accuracy with such recordings.


Wow, that's quite an admission that you don't care about accuracy in sound reproduction. I would put that in your signature ("Analog Scott: I don't care about accuracy") since that would save a lot of people the hassle and time of arguing with you.

But you raise a valid point. Certainly on the recording side, most engineers/producers are not seeking to accurately capture/reproduce the "live performance"  which makes the whole live-vs-recorded method rather moot.


Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: greynol on 2010-07-11 20:28:28
These experiences for me have been the pinnacles of aesthetic beauty in sound.

It's unfortunate that these anecdotal experiences are completely subjective and were likely tainted by things that had nothing to do with actual sound.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: analog scott on 2010-07-11 20:33:17


Quote
I can't say that it is an issue to me. But if one is going to ask the question then one has to define the reference. What is the reference for a studio recording if one is seeking an "accurate" reproduction of it? Unlike a recording of an original live performance a studio recording in and of itself has no intrinsic original sound.


The reference is what the artist heard in the studio, plain and simple. That is the "live performance."


Quote
Yep and no thanks.

But this points to a bigger question in audio. Why seek accuracy? Why use live music or the sound originally heard in the control room as a reference? As an audiophile I think in this quest for accuracy the forest has been lost for the trees. For me live music is really a benchmark more than a literal rigid "reference." The most beautiful sounding music I have heard has come from live acoustic music, be it a symphony orchestra at Disney Hall or that magical concert I went to in a church in Soweto with Ladysmith Black Mambazo (maybe the most beautiful thing I have ever heard) or any number of other magical moments I have experienced with live acoustic jazz or folk etc. These experiences for me have been the pinnacles of aesthetic beauty in sound. So with recording and playback of live music it consistently seems to work that the closer you get to the sounds one hears with live music (with all the qualifiers) the better the playback tends to be. Heck, if some day I hear something on a hifi that simply sounds better than anything I have ever heard live...that becomes the new benchmark for me. If live music didn't set the benchmark there would be no point in using it as a reference and no point in trying to accurately recreate those sounds.

Now with studio recordings....IMO the control rooms don't set such a lofty benchmark. IOW IMO one can do much better than "accurate" with those recordings. So I see no point in seeking accuracy with such recordings.


Wow, that's quite an admission that you don't care about accuracy in sound reproduction. I would put that in your signature ("Analog Scott: I don't care about accuracy") since that would save a lot of people the hassle and time of arguing with you.

But you raise a valid point. Certainly on the recording side, most engineers/producers are not seeking to accurately capture/reproduce the "live performance"  which makes the whole live-vs-recorded method rather moot.


Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)



I certainly don't care about accuracy for the sake of accuracy. I care about it insofar as it serves the higher purpose of a better aesthetic experience. Sometimes it serves that purpose, sometimes it does not. Here is a great example: the Led Zeppelin BBC sessions that were mastered with Jimmy Page overseeing it. He was as much as anybody "the artist." But he is in his sixties and almost certainly is suffering from serious hearing damage. The result...ear-bleeding, bright, compressed sound. Should I want this sound as accurately reproduced as possible because it is accurate?

I don't care about accuracy. I care about aesthetic excellence.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: googlebot on 2010-07-11 20:33:34
The reference is what the artist heard in the studio plain and simple. That is the "live performance"


Seriously, I respect your technical expertise very much. But isn't it the daily studio routine to capture each instrument as dry and isolated as possible* into a multitrack setup? That should be pretty far from what any of the participating artists actually heard during the session (often a personal live mix over headphones).

* without necessarily having to record in separated spaces or takes.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Notat on 2010-07-11 23:06:25
It's unfortunate that these anecdotal experiences are completely subjective and were likely tainted by things that had nothing to do with actual sound.

Nothing unfortunate about having transcendent experiences involving sound. Eventually, though, you've got to come down and be honest that there's much more to making these experiences than the sound, as there should be.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: greynol on 2010-07-11 23:19:04
Quite unfortunate if someone wants to garner something meaningful and objective in order to advance the topic at hand.  We have TOS #8 for a reason.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: googlebot on 2010-07-12 02:23:13
I also have trouble wrapping my head around the idea that those other things have "tainted" the experience.
...
But one can still, to a large degree, separate the sound from the overall event. The cool factor certainly plays big time at a rock concert, but it doesn't "fool" me into thinking that it sounds good.
...
etc.


<edit: shortened>
When seeing an expensive amp in action can already cause a huge difference compared to blind testing (many have been there), what do you expect from peer emotions, a light show, and star appeal?
</edit>

PS

Over several threads I have got the impression that you reject anything that could even remotely interfere with your own judgment about your own subjective experiences. You try to pull every discussion down to a level of particularity and dismiss any approach to gaining objective knowledge by means of abstraction or protocol. At best, any experience as you perceive it, have perceived, or are going to perceive, is left untouched by anything that isn't rooted in your own holistic experience.

But people here are communicating with entirely different goals. They try to find out what aspects of subjective experience can be disregarded (or controlled by protocol or setup) for the proliferation of common, objective knowledge. Of course that can't be as rich and holistic as a singular, subjective experience. But that's not the goal, either. So continued bragging about the exclusivity of one's experience won't find much understanding here.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: greynol on 2010-07-12 02:26:21
Unless he can assure us that his opinion about the relative sound quality of a live event cannot possibly be influenced by factors other than sound quality, he hasn't a leg to stand on.

In case you aren't catching on, Scott, lofty esoteric comments about sound which are not backed by blind testing are generally frowned upon.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: greynol on 2010-07-12 03:09:54
analogscott's post binned (http://www.hydrogenaudio.org/forums/index.php?showtopic=82144) per TOS #2.  Further discussion that is not directly on topic with the first post resulting in off-topic disagreement will be binned and warnings will be issued.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: aclo on 2010-07-12 04:06:48
While going to a concert in a good hall can be an exceptional experience, I have experienced the most detail in classical music from great records. If you want every detail, IMHO, the best records have surpassed live performances (which doesn't necessarily make it the better experience) for quite some time now. But experience is hard to quantify anyway, so why not enjoy the best of both worlds?



This pretty much sums up my experience and position, too. Complete agreement, in fact.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: aclo on 2010-07-12 04:10:04
But people here are communicating with entirely different goals. They try to find out what aspects of subjective experience can be disregarded (or controlled by protocol or setup) for the proliferation of common, objective knowledge. Of course that can't be as rich and holistic as a singular, subjective experience. But that's not the goal, either. So continued bragging about the exclusivity of one's experience won't find much understanding here.

And that nicely sums up the point of HA, as well as the argument in favour of the existence of this whole approach.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: analog scott on 2010-07-12 05:37:00
I also have trouble wrapping my head around the idea that those other things have "tainted" the experience.
...
But one can still, to a large degree, separate the sound from the overall event. The cool factor certainly plays big time at a rock concert, but it doesn't "fool" me into thinking that it sounds good.
...
etc.


<edit: shortened>
When seeing an expensive amp in action can already cause a huge difference compared to blind testing (many have been there), what do you expect from peer emotions, a light show, and star appeal?
</edit>

PS

Over several threads I have got the impression that you reject anything that could even remotely interfere with your own judgment about your own subjective experiences. You try to pull every discussion down to a level of particularity and dismiss any approach to gaining objective knowledge by means of abstraction or protocol. At best, any experience as you perceive it, have perceived, or are going to perceive, is left untouched by anything that isn't rooted in your own holistic experience.

But people here are communicating with entirely different goals. They try to find out what aspects of subjective experience can be disregarded (or controlled by protocol or setup) for the proliferation of common, objective knowledge. Of course that can't be as rich and holistic as a singular, subjective experience. But that's not the goal, either. So continued bragging about the exclusivity of one's experience won't find much understanding here.



This thread deals with the topic of live music as a reference for judging speakers. In any such discussion we have to deal with a few axioms: 1. Live music is a reference for judging audio recording and playback. 2. Accuracy in audio is the goal. My point was to examine the roots of these axioms and consider whether they are valid and why. Sean raised a very interesting point when he brought up studio recordings that are productions unto themselves, with no live acoustic performance to use as any sort of reference for live vs. playback. He asked the following question: "What about recordings that are not intended to sound like a live performance? That would include about 95% of all recordings made today. How do we judge the accuracy of those?" To me this really segues into questioning the axiom that accuracy should be the goal in audio, as well as questioning the validity of the axiom that live music should be the reference.

My discussion of my personal experience with live music was intended to address why anyone would want to use live music as a reference. I was really surprised to see the intense anger and hostility it stirred up, not to mention the bizarre interpretations of my intent. "Bragging about the exclusivity of my experience?" No...not what I was doing. I sure didn't see my experiences with live music as a bragging point nor as anything "exclusive." Quite the opposite. I really thought others on this thread would relate to those sorts of experiences and would have had similar experiences. Guess not. So let's move on from there, shall we? Let's just change all that to an assertion that personal experience with live music is a likely common motivating factor among those who wish to use live music as a reference for judging audio. OK?

Folks, I was calling into question these two basic axioms. They do underpin the question addressed in the OP. I would think any forum looking for "common objective knowledge" would welcome an examination of its axioms. I really had no idea it would be so upsetting. I wasn't trying to brag about anything.

P.S. your impressions of me are *your impressions.* I won't go any further on that subject as we have all been warned to stay on topic and your opinion of *me* is off topic.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: greynol on 2010-07-12 05:45:08
BTW, if you wish to speak about how great an artist so-and-so is, we have a forum for it.  Listening Tests is most certainly not that forum.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-12 06:19:33
The reference is what the artist heard in the studio, plain and simple. That is the "live performance".


Seriously, I honor your technical expertise very much. But isn't it rather daily studio routine to capture each instrument as dry and isolated as possible* into a multitrack setup? That should be pretty far from what any of the participating artists has actually heard during the session (often a personal live mix over headphones).

* without necessarily having to record in separated spaces or takes.


Yes, you are right, that is generally how most recordings are made today.

Sometimes the studio performance is captured live (e.g. jazz recordings) but that is less common.

It's the final mix with all the reverb, auto-tuning, EQ, and 20:1 compression that constitutes the "reference" as heard by the main artist/producer. That would be the equivalent of the "live performance" that Edison and AR were trying to reproduce.

My point is that the "live performance" doesn't exist for most recordings today, and moreover accuracy is not necessarily the goal. But we still need some way of reliably delivering that final mix to consumers so they hear what the artist intended.


Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-12 06:46:18
I have a couple of questions.

Quote
Live and Recorded Performances Must Be Identical

For live-versus-recorded tests to be valid, the live and recorded performance should be identical, having the same notes, intonation, tempo, dynamics, loudness, balance between instruments, and the same location and sense of space of the instruments. Otherwise, there are extraneous cues that allow listeners to readily identify the live and recorded performances. Midi-controlled instruments (e.g. player pianos) are but one example of how this problem could be resolved.


Would it be possible to design a valid test with the opposite approach? That is, instead of trying to reproduce a single performance identically, could we use various different performances and recordings every time?

I'm thinking of such a scenario: suppose we need a test with 20 trials. Take 20 different singers with different voices and make 20 recordings. And then let some 20 more singers (again, all different voices) perform during the test - a sort of A/B test.
This way, I'm thinking, the singers don't even have to perform the same piece of music. It could be different musical material every trial/performance.

Would it be possible to gather any statistically significant result from such a test?



And my 2nd question: can we consider our everyday practice of enjoying recorded music as the "ultimate" proof that such recordings are indeed capable of creating an illusion of live performance?
After several decades of such practical experience all across the globe, perhaps we already have enough evidence to draw some statistically valid conclusions? or still not?
I mean, okay, measuring the accuracy of a particular loudspeaker is one thing, but can't we say anything definitive of the technology in general?


1) I think a basic tenet of a good scientific experiment is that it is repeatable. So using human musicians as sound sources is going to cause a lot of errors and biases if the live performance doesn't perfectly match the recorded one. If you can devise a way to capture the live performance (via live mic feeds with no delay) and compare that double-blind to the reproduced performance, that would eliminate some of the errors.

I don't see how your method gets around this problem. You can have 20 singers, but unless their performances perfectly match their recordings, listeners have extraneous cues besides sound quality that tell them something is different.

2) I think it has been proven that most people can  pretty well enjoy music listening to any old piece of crap. I first became really aware of how sound  quality affects my enjoyment of music when someone recorded my piano recital with 2 mics located underneath the piano, and charged me for the tape. It didn't sound at all like how I played the piano  (really boomy and dull).  I was really pissed off. At that point I realized the importance of sound recording and reproduction and decided to pursue it as a career. But most people probably never think about it, until they go to a really bad live concert, and the artist doesn't sound anything like the recordings. Even then, quantity usually matters more than quality (think rock n'roll).

Cheers
Sean Olive
Audio Musings (http://seanolive.blogspot.com)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-12 21:25:43
1) I think a basic tenet of a good scientific experiment is that it is repeatable. So using human musicians as sound sources is going to cause a lot of errors and biases if the live performance doesn't perfectly match the recorded one. If you can devise a way to capture the live performance (via live mic feeds with no delay) and compare that double-blind to the reproduced performance, that would eliminate some of the errors.

I don't see how your method gets around this problem.


It turned out to be the RCT methods that I was thinking of. Please see above, in my second post in this thread (click here (http://www.hydrogenaudio.org/forums/index.php?showtopic=82115&view=findpost&p=713474)).

The way I see it, RCT does not require identical stimuli, so this whole problem becomes a non-issue in RCT. No need for any "live mic feeds" or anything exotic.
Am I wrong? Why?


However, I must emphasize one important point:

When we think of 'live-vs-recorded' tests it is quite natural to require that the recorded sound must be indistinguishable from the corresponding live sound. That is what we would want to test, naturally. And that is, I suppose, what you are aiming at in the article. And we all know that ABX-type tests are the best way to get it done. And then, yes, we would run into all sorts of problems that would make such 'live-vs-recorded' tests practically impossible, especially w.r.t. the need for identical, repeatable stimuli.

But then, if we still want to make any sort of  'live-vs-recorded' comparison, what is the next best thing we could do?

We could relax the requirements. Instead of demanding indistinguishable sound, we could ask ourselves: is our audio system able to create an illusion of a live performance?

Please note that it is a broader, less strict demand.
Instead of demanding that the system identically reproduce some particular performance on some particular occasion, here we ask whether the reproduction could be mistaken for a live performance, or not. I think this question can be answered by using 'explanatory' RCT methods.

Is this a meaningless question to test? I think not. I think it is a meaningful question, especially from the consumer perspective.
With that in mind, I think that useful 'live-vs-recorded' tests might be possible - by using RCT methods.


(think rock n'roll).

I think it would be wise, in the context of this discussion, to limit our definition of "live performance" to acoustic instruments and human voice.
Using mic feeds/PA etc would effectively degenerate a 'live-vs-recorded' test into a 'recorded-vs-recorded'. We don't want that.


2) I think it has been proven that most people can  pretty well enjoy music listening to any old piece of crap.


We should ask them not how they enjoy it, but whether the sound is realistic enough to be mistaken for a live performance. There are plenty of classical musicians who listen to all sorts of classical records through quite decent equipment. We could ask them.
Oh, and by the way, in the RCT vocabulary, this would be the 'pragmatic' type of RCT, 'in-the-field' trials.
Would you dismiss any 'pragmatic' type of RCT as unscientific?
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-13 00:43:43
Would it be possible to design a valid test with the opposite approach? That is, instead of trying to reproduce a single performance identically, could we use various different performances and recordings every time?


A quick follow-up on my first question.

I did some googling and found that I was actually thinking of a kind of test called "Randomized controlled trial" (http://en.wikipedia.org/wiki/Randomized_controlled_trial) (RCT).
The "explanatory" type of RCT with "parallel-group" design and "allocation concealment", in particular. The goal of such RCT is to test the 'efficacy' of a treatment or medicine given to a group of patients.

So, here goes my analogy:
* 'efficacy' = ability to create illusion of a live performance.
* Participants (patients) are the singers/performers.
* Half of the participants are allocated to receive the 'treatment' (record and playback via loudspeakers),
and the other half is allocated to receive no 'treatment' (perform live).

The problem, I guess, is that it might not be quite properly triple-blind, since our 'participants' (singers) obviously know which 'treatment' they are receiving.

But maybe this bias could be eliminated, too: let's make all singers perform live, but let some of them ("control group") perform before dummy-listeners in one room, and the other group perform before the actual audience in another room. Or something like that. Mix them and confuse them.
Then the singers wouldn't know which of their performances would actually count, and the test becomes fully triple-blind, I hope. 

So, on the surface it seems that it should be possible to eliminate the need for identical stimuli in a live-vs-recorded test - by using RCT method.

Any thoughts? Anybody? 


The participants in a listening test are normally the listeners - not the singers/performers, who act as one of the stimuli.

It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers. I think you have misunderstood the original intent of the live-vs-recorded test.

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-13 01:02:21
The participants in a listening test are normally the listeners - not the singers/performers, who act as one of the stimuli.

It is pure semantics. And terminology differences.

In ABX and in other kinds of small impairment tests, yes, the listeners are usually called 'participants' and recorded/live sounds are the 'stimuli'.

RCT is a different kind of test, and in RCT test things would be labeled differently.
The performers would be called test subjects (participants).
The fact of being recorded would be called the 'treatment' or 'intervention'.
And the listeners would evaluate the effect of the 'treatment' or 'no treatment'.


It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers.

Yes. This is quite obvious. So?

And no, I have not misunderstood the intent of the live-vs-recorded test. 

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-13 04:09:51
The participants in a listening test are normally the listeners - not the singers/performers, who act as one of the stimuli.

It is pure semantics. And terminology differences.

In ABX and in other kinds of small impairment tests, yes, the listeners are usually called 'participants' and recorded/live sounds are the 'stimuli'.

RCT is a different kind of test, and in RCT test things would be labeled differently.
The performers would be called test subjects (participants).
The fact of being recorded would be called the 'treatment' or 'intervention'.
And the listeners would evaluate the effect of the 'treatment' or 'no treatment'.


It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers.

Yes. This is quite obvious. So?

And no, I have not misunderstood the intent of the live-vs-recorded test. 


Sorry, then I guess  I misunderstood you. Let me reread what you've written.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: solive on 2010-07-13 04:53:07
Would it be possible to design a valid test with the opposite approach? That is, instead of trying to reproduce a single performance identically, could we use various different performances and recordings every time?


A quick follow-up on my first question.

I did some googling and found that I was actually thinking of a kind of test called "Randomized controlled trial" (http://en.wikipedia.org/wiki/Randomized_controlled_trial) (RCT).
The "explanatory" type of RCT with "parallel-group" design and "allocation concealment", in particular. The goal of such RCT is to test the 'efficacy' of a treatment or medicine given to a group of patients.

So, here goes my analogy:
* 'efficacy' = ability to create illusion of a live performance.
* Participants (patients) are the singers/performers.
* Half of the participants are allocated to receive the 'treatment' (record and playback via loudspeakers),
and the other half is allocated to receive no 'treatment' (perform live).

The problem, I guess, is that it might not be quite properly triple-blind, since our 'participants' (singers) obviously know which 'treatment' they are receiving.

But maybe this bias could be eliminated, too: let's make all singers perform live, but let some of them ("control group") perform before dummy-listeners in one room, and the other group perform before the actual audience in another room. Or something like that. Mix them and confuse them.
Then the singers wouldn't know which of their performances would actually count, and the test becomes fully triple-blind, I hope. 

So, on the surface it seems that it should be possible to eliminate the need for identical stimuli in a live-vs-recorded test - by using RCT method.

Any thoughts? Anybody? 


OK. First, the Randomized Controlled Trial method you refer to is designed to control "selection bias" by randomly assigning different treatments to subjects. In medical/drug studies they would give a different drug or dosage to different subjects. What this means is that you need a lot more subjects, which makes the study expensive.

In audio, we typically use a repeated measures design (http://en.wikipedia.org/wiki/Repeated_measures_design) where each subject evaluates all of the available treatments (e.g. different loudspeakers) -- not just one treatment. The main advantage is that it reduces the number of subjects required to estimate the variance or effect of the treatment on the subject. So it has great efficiency benefits over what I think you are proposing. A team of 15 trained listeners is typically the statistical equivalent of 100-200 untrained listeners because the latter are less reliable and less discriminating in their judgements. If you are proposing assigning a single treatment (known as a single stimulus test) to different groups of listeners, good luck getting a meaningful result.
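The efficiency argument can be illustrated with a small simulation. This is a toy sketch with made-up numbers (the listener-bias spread, trial noise, and true speaker difference are all arbitrary assumptions, not real Harman data); it only shows why the paired difference cancels listener-to-listener bias while a between-groups comparison does not.

```python
# Toy simulation: same true speaker difference, two experiment designs.
# All numbers here are arbitrary assumptions for illustration.
import random
import statistics

random.seed(1)
TRUE_DIFF = 1.0      # true rating difference between speakers A and B
LISTENER_SD = 2.0    # spread of personal rating biases across listeners
NOISE_SD = 1.0       # trial-to-trial noise within one listener
N_LISTENERS = 20
N_EXPERIMENTS = 500

def rating(bias, effect):
    return bias + effect + random.gauss(0.0, NOISE_SD)

within_estimates, between_estimates = [], []
for _ in range(N_EXPERIMENTS):
    # Repeated measures: each listener rates both speakers, so the
    # personal bias cancels out in the paired difference.
    diffs = []
    for _ in range(N_LISTENERS):
        bias = random.gauss(0.0, LISTENER_SD)
        diffs.append(rating(bias, TRUE_DIFF) - rating(bias, 0.0))
    within_estimates.append(statistics.mean(diffs))

    # Single-treatment groups: each listener rates only one speaker,
    # so listener-to-listener bias stays in the comparison.
    grp_a = [rating(random.gauss(0.0, LISTENER_SD), TRUE_DIFF)
             for _ in range(N_LISTENERS)]
    grp_b = [rating(random.gauss(0.0, LISTENER_SD), 0.0)
             for _ in range(N_LISTENERS)]
    between_estimates.append(statistics.mean(grp_a) - statistics.mean(grp_b))

print("spread of repeated-measures estimates:",
      round(statistics.stdev(within_estimates), 2))
print("spread of between-groups estimates:  ",
      round(statistics.stdev(between_estimates), 2))
```

With these numbers the repeated-measures estimate of the speaker difference comes out noticeably tighter, i.e. the between-groups design would need several times as many listeners for the same statistical power.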

If I understand you correctly (and please correct me if I am wrong here) you are proposing having one group of listeners evaluate the recording/reproduction of a live performance on an accuracy scale. The other group of listeners would evaluate the live performance on the same scale. By definition the live performance is a 10/10 on the accuracy scale (notwithstanding Taylor Swift's wrong notes). The other group has no idea how accurate the recording is without having heard the live performance, so they would be guessing based on some internal reference of what they consider to be fidelity or accuracy. However, if the recording/reproduction group gave every recording a 10/10, would you accept that as proof that the recording is 100% accurate?

I don't think live-versus-recorded apologists would accept that as a valid result or conclusion because their fundamental argument is that accuracy in sound reproduction can only be measured against a reference, which  for them, is the live performance.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-13 17:25:32
1) I think a basic tenet of a good scientific experiment is that it is repeatable. So using human musicians as sound sources is going to cause a lot of errors and biases if the live performance doesn't perfectly match the recorded one. If you can devise a way to capture the live performance (via live mic feeds with no delay) and compare that double-blind to the reproduced performance, that would eliminate some of the errors.


The non-repeatability of live performances was illustrated to me by the following experience:

Some years back some studio techs prepared and sold sets of CDs that were designed to illustrate the characteristic colorations of microphones and mic preamps. I invested in a set.

I decided to see what would happen if I tried to ABX them. In the process of preparing the samples for ABXing, I found that the purportedly identical musical samples, which were supposed to differ only in terms of the equipment used, were different in fairly gross ways. The musical samples had different lengths if you trimmed them to be musically alike. Their average levels varied by more than enough to be audible. Once those basic issues were dealt with, there were still clearly audible differences in timing, inflection and intonation. I never had any trouble ABXing them and obtaining perfect or nearly perfect scores in short order, based on the musical differences alone.

The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: zane9 on 2010-07-13 18:50:03
...The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.


Notwithstanding the deficiencies of the other two areas mentioned by Arnold, my default preference is listening to a recording of unamplified music in a non-studio setting, done with a pair of microphones and no mix.

Not so easy to find these recordings, these days.

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-13 20:10:19
Sorry, then I guess  I misunderstood you.

That's alright, no problem. I know I'm probably not being all too clear. It's almost like I'm thinking out loud, trying to figure out what we can do about these 'live-vs-recorded' tests. And I don't have a clear picture yet.


OK. First the Randomized Control Trial method you refer to is designed to control "selection bias" by randomly assigning different treatments to subjects. In medical/drug studies they would give a different drug or dosage to a different subject.  What this means is you need a lot more subjects which makes the study expensive.

Well, an expensive and 'hard-to-do' study is still a lot easier than an 'impossible-to-do' one.

I'm no expert in statistics, so someone will have to work out all those probabilities, statistical power, sample size, etc etc.
In a quick google search I found some articles and tables, where the recommended sample size (number of test subjects) in RCT seems to vary anywhere between 50 and 200, sometimes much more, sometimes less.
I suppose that to test for a marginal effect we'd need a very large sample size, and to test for 'night-and-day' effect a smaller sample size would do.

And one more thought:
In ABX we are testing just one performer, but we need at least 10 trials, 3 performances per trial (A,B,X), so at least 30 performances in total.
In an RCT test we'd need, say, 50 or 100 test subjects (performers), but we'd use just 1 performance from each subject, so it's 50 to 100 performances in total.

So, okay, the bad news is that we'd have to invite a lot more performers for RCT than for ABX, but the workload on the listeners (the number of performances to evaluate) is not all that different - that's good news.
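As a rough check on these numbers: the chance of scoring k or more correct out of n "live or recorded?" trials by pure guessing is just a binomial tail, so we can compute how many correct answers would be needed for significance. A minimal stdlib-only sketch:

```python
# Exact binomial tail: probability of getting k or more correct
# out of n yes/no trials by pure guessing (p = 0.5 per trial).
from math import comb

def p_value(k, n, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Smallest number of correct answers needed for significance at 0.05
for n in (20, 50, 100):
    k = next(k for k in range(n + 1) if p_value(k, n) < 0.05)
    print(f"{n} trials: at least {k} correct for p < 0.05")
```

The same arithmetic underlies ordinary ABX scoring; only the number of independent trials (here, performers) changes.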


In audio, we typically use a repeated measures design (http://en.wikipedia.org/wiki/Repeated_measures_design) where each subject evaluates all of the available treatments (e.g. different loudspeakers) -- not just one treatment. The main advantage is that it reduces the number of subjects required to estimate the variance or effect of the treatment on the subject. So it has great efficiency benefits over what I think you are proposing. A team of 15 trained listeners is typically the statistical equivalent of 100-200 untrained listeners because the latter are less reliable and less discriminating in their judgements. If you are proposing assigning a single treatment (known as a single stimulus test) to different groups of listeners, good luck getting a meaningful result.

If I understand you correctly (and please correct me if I am wrong here) you are proposing having one group of listeners evaluate the recording/reproduction of a live performance on an accuracy scale. The other group of listeners would evaluate the live performance on the same scale.

I see we are not quite on the same wavelength yet. I'll try to explain a bit more what I'm proposing.

We should have only one group of listeners. It could be just one listener, or a small group of trained listeners. These listeners will evaluate a randomized sequence of reproductions/live performances. No communication between listeners, strictly individual evaluations.

(The reason I mentioned "dummy-listeners" earlier is that, perhaps, it could be part of the plot to eliminate the performers' bias. But that's a technicality.)

It is the performers (our 'test subjects') who will be split in 2 parallel groups. One group of performers will be recorded and the listeners would be exposed only to reproductions of their recordings. The other group of performers would only perform live.

Thus, the listeners would hear each performer's sound only once (either recorded or live).

The objective of the listener is to guess which is which, live or reproduction.
No scale, no accuracy grade. Just simple "yes/no" evaluation.

Just like the infamous "An artist or an ape?" quiz (http://reverent.org/an_artist_or_an_ape.html).

(The listeners) have no idea how accurate the recording is w/o having heard the live performance, so they would be guessing based on some internal reference of what they consider to be fidelity or accuracy.

Yes, exactly.

However, if the recording/reproduction  group gave every recording a 10/10, would you accept that as proof that the recording is 100% accurate?

Strictly speaking, it cannot be 'proof' (just like a failed ABX is not 'proof'), because it might be a false negative. Then, by the same logic as in ABX, we can say that it would be a strong indication that these reproductions (taken collectively) provide a realistic illusion of being in the presence of a live performance.

It doesn't prove anything w.r.t. whether any reproduction was a true, accurate copy of the original performance.

But the opposite result (when listeners successfully guess which one is recorded and which is live) would be proof that our recording-reproduction system is not at all able to create the illusion of a live performance. Then our system is a piece of junk. No need to test its accuracy any further.


I already tried to explain this in one of the previous posts (See above, in Post #24 (http://www.hydrogenaudio.org/forums/index.php?showtopic=82115&view=findpost&p=713613), right after the words "I must emphasize one important point")

So, okay, we cannot grade the accuracy of the reproduction on a fine scale, the need for identical live performances being one of the major problems.
And then I ask, if we cannot do ABX, what is the next best thing we could do?
Let's try and evaluate reproductions using a coarse scale, the simplest scale (0/1, yes/no).

I realize that it may be far from what you are interested in, in your research of loudspeakers etc., but I think it would still be an interesting test. As a consumer I would be interested to know: is my system really 'good enough'? Is it able to create a believable illusion of a live performance, or is it all marketing, hype and self-suggestion?


I don't think live-versus-recorded apologists would accept that as a valid result or conclusion because their fundamental argument is that accuracy in sound reproduction can only be measured against a reference, which  for them, is the live performance.

I'm guessing there must be some sort of eternal debate with "live-versus-recorded apologists" somewhere. I guess I totally missed out on that one. Oh, well...


/EDIT: by the way, I can see now that it may be difficult to control false negatives in the test I'm proposing.
Well, there must be some standard techniques for minimizing the damage. Hopefully...
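One standard technique is a power calculation: simulate listeners with some assumed detection ability and count how often the test reaches significance. A rough Monte Carlo sketch; the detection model here (a listener truly spots a given trial with probability p_detect and guesses 50/50 otherwise) is an arbitrary assumption for illustration:

```python
# Monte Carlo power estimate for the coarse yes/no test.
# Assumption (illustrative only): a listener truly detects a trial
# with probability p_detect and otherwise guesses 50/50.
import random
from math import comb

random.seed(0)

def p_value(k, n):
    # chance of k or more correct out of n trials by pure guessing
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def power(n_trials, p_detect, sims=1000, alpha=0.05):
    significant = 0
    for _ in range(sims):
        correct = sum(
            1 for _ in range(n_trials)
            # correct with probability p_detect + (1 - p_detect) / 2
            if random.random() < p_detect or random.random() < 0.5
        )
        if p_value(correct, n_trials) < alpha:
            significant += 1
    return significant / sims

for n in (20, 50, 100):
    print(f"{n} trials -> estimated power {power(n, p_detect=0.3):.2f}")
```

So a weak real effect can easily come out as a "false negative" on a short test; adding trials (i.e. more performers in the RCT framing) raises the power.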
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-13 20:21:50
...The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.


Notwithstanding the deficiencies of the other two areas mentioned by Arnold, my default preference is listening to a recording of unamplified music in a non-studio setting, done with a pair of microphones and no mix.


I've made in excess of 500 recordings of live, unamplified music using 2 microphones chosen for flat on-axis response, no equalization or other processing, for hire, in just the past 5 years.

Changing just the position and orientation of the 2 microphones, I can adjust the timbre and soundstaging of the recording over a fairly wide range. My choices are informed by the desires of the clients, who are professional musicians, mostly high school and college educators. I'm usually trying to duplicate the sound in a particular range of locations in the auditorium.

In no case would I consider the resulting recordings to be "sonically accurate" in the sense that they would frustrate or even challenge attempts at identification in an ABX test. I don't think they would be very hard to differentiate from live sound in a live versus recorded comparison.

I've also made a goodly number of multitrack recordings using close micing, distant micing, and even the capture of raw electrical signals from amplified electronic instruments. I think that most people would find carefully mixed recordings made this way to *not* be obviously less "lifelike" than the ones made using minimal micing and no mixing or other processing.

In terms of recreation of lifelike sound, the procedure that seems to get the closest is, IME, close-micing and loudspeaker reproduction in the same room, given that the room is extremely reverberant.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-13 20:26:41
It is the listeners who decide whether or not the reproduction of the recording is similar to the live performance -- not the performers. I think you have misunderstood the original intent of the live-vs-recorded test.


An important point. While some performers have some sense of what their music sounds like, the audience knows far better what that music sounds like to the audience. That only makes common sense.

There's a reason why the preferred location for the mixer of a live performance is generally near the middle of the audience, and not the middle of the performers! ;-)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-14 02:19:37
I sense a big fat TOS-8 violation right here:
In terms of recreation of lifelike sound, the procedure that seems to get the closest is, IME, close-micing and loudspeaker reproduction in the same room, given that the room is extremely reverberant.

Can you back up that assertion by a rigorous 'live-vs-recorded' test with statistically significant results?


Hint: this is exactly why I believe that 'live-vs-recorded' tests are necessary. To validate any such claims about "lifelike sound".

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Ed Seedhouse on 2010-07-14 02:35:49
One form of "live vs. recorded" test with fewer problems than others might be to have a high quality speaker recorded playing music in an anechoic chamber, then seeing how much another speaker would sound like it when they both play in the same room.

So we record speaker A playing musical recordings in an anechoic chamber. Then we listen in a room to speaker B playing the recording made in the chamber and compare it with speaker A playing the original recordings. Assuming speaker A has some colorations, how well would speaker B do in reproducing those colorations accurately?

We could even use speaker A as its own comparator. If it could play itself back unchanged vs. the original recordings, then we would know it was certainly highly accurate. In fact I doubt any actual production speaker could pass that test, but it would be great to be proven wrong.

This would at least be much easier to arrange than with live musicians.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-14 02:57:02
One form of "live vs. recorded" test with fewer problems than others might be to have a high quality speaker recorded playing music in an anechoic chamber, then seeing how much another speaker would sound like it when they both play in the same room.

And how do we know whether our high quality speaker is able to produce lifelike sound to start with?
Can we escape this catch 22 somehow?


Personally I have nothing against this type of testing (comparison with a reference speaker).

But I think it rather falls into the category of 'recorded-vs-recorded'.

Plus there is more to the "lifelike sound" than a good loudspeaker -
various microphones and recording techniques play as big a role as loudspeakers.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-14 03:12:27
I sense a big fat TOS-8 violation right here:
In terms of recreation of lifelike sound, the procedure that seems to get the closest is, IME, close-micing and loudspeaker reproduction in the same room, given that the room is extremely reverberant.

Can you back up that assertion by a rigorous 'live-vs-recorded' test with statistically significant results?


I think you aren't getting the point of the qualifier "that seems". If I was trying to be rigorous, I would have said "that is".

Please notice my recent comments about the non-recreatability of live music events using real world musicians.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-14 03:30:43
Please notice my recent comments about the non-recreatability of live music events using real world musicians.

Please notice my recent comments that recreatability of live music events, it seems, is not required if all we want is to verify claims of "lifelike sound" using RCT methods.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-14 10:42:44
Please notice my recent comments about the non-recreatability of live music events using real world musicians.

Please notice my recent comments that recreatability of live music events, it seems, is not required if all we want is to verify claims of "lifelike sound" using RCT methods.


I don't see where Randomized Controlled Trials do anything but add complexity.

Please show otherwise, if you can.

The non-repeatability of live music events means that there is no possibility of using the same stimulus. It's like a drug trial where every drug can be tried only once for all time.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-14 16:56:25
I don't see where Randomized Controlled Trials do anything but add complexity.

Please show otherwise, if you can.

Did you read my previous posts? Especially post #31 (http://www.hydrogenaudio.org/forums/index.php?showtopic=82115&view=findpost&p=713722).

Could you please be a bit more specific, what is not clear in my reasoning?
I'd be glad to try and rephrase and/or add more explanations.

And please keep in mind that I don't have all the answers. That is why I brought up the question  here. I was hoping that our audio experts might be able to help.


The non-repeatability of live music events means that there is no possibility of using the same stimulus.

Yes, correct.

It's like a drug trial where every drug can be tried only once for all time.

This analogy is true in the "Repeated Measures Design" category of tests, which are typically used in audio. Sean Olive noted it in one of his comments above.

However, RCT is a whole different category. Different approach. Unconventional to audio. So I'm trying to think outside-the-box. Please bear with me for a moment.


My understanding is that in the RCT framework the analogy should go like this:

Performer is the 'patient' (test subject).
Or better yet, not performer himself, but the live sound of the performer is the 'patient' (test subject).

The process of recording/playback is the 'drug' ('treatment', 'intervention') that affects our 'patient' (the sound of the performer).

Thus, live sound is a patient who didn't receive any drug.
Recorded sound is a patient who was given the drug.

Obviously, there is no such thing as two identical patients (two identical live sounds), therefore we use multiple patients (multiple live performances by different performers). One group of patients is given the drug (sound is recorded and reproduced). The control group of patients is not given any drug (sound is live).

A trained listener is the 'expert/doctor' who evaluates randomized sequence of 'patients'.
The listener ('doctor') must answer one question: is the 'patient' dead (recorded sound) or still alive ("lifelike sound")?
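The analogy above can be put into code. The following is a minimal toy simulation of such a test, not anyone's actual protocol: every number is invented for illustration, and `p_correct` merely stands in for a real trained listener's accuracy.

```python
# Toy sketch of the RCT analogy: N performances ('patients'), random
# assignment to live ('control') or recorded ('treated'), and one
# trained listener ('doctor') labelling each in random order.
import math
import random

random.seed(0)

N = 100          # performances, one per performer
p_correct = 0.5  # listener accuracy; 0.5 = pure guessing, i.e. the
                 # recording chain ('drug') is audibly transparent

# Count correct 'live'/'recorded' labels across all performances
correct = sum(random.random() < p_correct for _ in range(N))

# One-sided binomial test against chance: how likely is a score this
# good if the listener is only guessing?
p_value = sum(math.comb(N, k) for k in range(correct, N + 1)) / 2**N

print(f"{correct}/{N} labelled correctly, p = {p_value:.3f}")
```

With `p_correct` above 0.5 the p-value shrinks as N grows, which is exactly the point of using many performers instead of one repeatable stimulus.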
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-14 17:15:12
The non-repeatability of live music events means that there is no possibility of using the same stimulus.

Yes, correct.

It's like a drug trial where every drug can be tried only once for all time.

This analogy is true in the "Repeated Measures Design" category of tests, which are typically used in audio. Sean Olive noted it in one of his comments above.

However, RCT is a whole different category. Different approach. Unconventional to audio. So I'm trying to think outside-the-box. Please bear with me for a moment.

My understanding is that in the RCT framework the analogy should go like this:

Performer is the 'patient' (test subject).
Or better yet, not performer himself, but the live sound of the performer is the 'patient' (test subject).

The process of recording/playback is the 'drug' ('treatment', 'intervention') that affects our 'patient' (the sound of the performer).

Thus, live sound is a patient who didn't receive any drug.
Recorded sound is a patient who was given the drug.

Obviously, there is no such thing as two identical patients (two identical live sounds), therefore we use multiple patients (multiple live performances by different performers). One group of patients is given the drug (sound is recorded and reproduced). The control group of patients is not given any drug (sound is live).

A trained listener is the 'expert/doctor' who evaluates randomized sequence of 'patients'.
The listener ('doctor') must answer one question: is the 'patient' dead (recorded sound) or still alive ("lifelike sound")?


Thanks for the explanation.

OK, I think I get it. Yes, I think it could work. But it seems horribly expensive in terms of time and qualified test participants compared to, say, an ABX test. Note that even the amount of time required for ABX tests is widely objected to.


Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: kdo on 2010-07-14 17:58:33
Thanks for the explanation.

OK, I think I get it. Yes, I think it could work. But it seems horribly expensive in terms of time and qualified test participants compared to, say, an ABX test. Note that even the amount of time required for ABX tests is widely objected to.

You're welcome.

But how horribly expensive would it be, really?
It would be very interesting to work out some ballpark numbers. Perhaps we need to ask a statistician.


Earlier I tried to make a simple 'uneducated estimate' of the complexity. It seems the workload on the listeners (the number of performances to evaluate) in an RCT may not be so different from ABX.

I'll just quote the relevant part here:
In a quick google search I found some articles and tables, where the recommended sample size (number of test subjects) in RCT seems to vary anywhere between 50 and 200, sometimes much more, sometimes less.
...
In ABX we are testing just one performer, but we need at least 10 trials with 3 performances per trial (A, B, X), so at least 30 performances in total.
In an RCT we'd need, say, 50 or 100 test subjects (performers), but we'd use just 1 performance from each subject, so 50 to 100 performances in total.

So, okay, the bad news is that we'd have to invite far more performers for an RCT than for ABX, but the workload on the listeners (the number of performances to evaluate) is not all that different - that's good news.


But now I'm also wondering: could we get away with using just one performer to produce all the live and recorded sounds?
That might considerably reduce the complexity of the test. (No need for hundreds of performers - that would be fantastic.)
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: 2Bdecided on 2010-07-14 18:54:06
The reference is what the artist heard in the studio plain and simple. That is the "live performance"
What do you mean - in the studio while they were performing, or in the studio as they listened to playback?

I would suggest that both have little relevance. What a person sounds like to themselves is very different from how they sound to others. What a given recording sounds like through a specific set of studio loudspeakers is of little relevance to me - especially if those studio speakers are crap.

I know the engineer will make decisions based on the sound they hear through those speakers, but a skilled engineer mixes for all speakers, and isn't likely to tailor the sound specifically to overcome the deficiencies of one pair of speakers. They may do so to some extent, but the more skill and experience they have, the less this will be an issue.

I can justify this with a great example: recordings from the 1930s sound closer to "real" instruments when replayed today than they did when replayed in the 1930s, yet according to your argument, the sound heard in the control room in 1935 is "accurate", and the sound heard on "better" equipment today is less accurate.

Cheers,
David.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: 2Bdecided on 2010-07-14 19:04:58
The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.
I think this is probably factually accurate - but until we have all three areas "transparent", I wonder how we can truthfully say that any one or two of them are. How do we know?!

I suppose there are spatial dimensions of sound (e.g. directional response in the room) which are simply not present on stereo (or arguably conventional multi-channel) recordings. Since they're not present, then I guess any speaker which reproduces the other cues (which are present) correctly, can be said to be transparent. Which may mean that two arguably "transparent" speakers can sound different in a real room, due to their different spatial response patterns. I think.

(I've confused myself here. I'm not trying to start an argument).

It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at?

Cheers,
David.

Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: analog scott on 2010-07-14 20:40:29
The second issue is that the musical reproduction chain can be broken down into three general areas: microphones and microphone technique, audio signal storage and production, and speakers and room acoustics. By various means we can show that signal storage and production can be sonically transparent. It is well known that neither of the other two areas of music reproduction has attained that level of refinement.
I think this is probably factually accurate - but until we have all three areas "transparent", I wonder how we can truthfully say that any one or two of them are. How do we know?!

I suppose there are spatial dimensions of sound (e.g. directional response in the room) which are simply not present on stereo (or arguably conventional multi-channel) recordings. Since they're not present, then I guess any speaker which reproduces the other cues (which are present) correctly, can be said to be transparent. Which may mean that two arguably "transparent" speakers can sound different in a real room, due to their different spatial response patterns. I think.

(I've confused myself here. I'm not trying to start an argument).

It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at?

Cheers,
David.


This is why it is trickier to talk about speaker "accuracy." Take something like a preamp (maybe one of the easiest components to judge for accuracy): you can literally take the input and output and compare them directly. With speakers you definitely can't compare the input and the output realistically, since the output of a speaker is so fundamentally different from its input.

So long as audio recording and playback systems are designed to create an aural illusion of the original event from a single perspective, rather than an accurate reconstruction of the original soundfield, transparency of the system in its entirety becomes a bit dodgy. No matter how transparent the perceived sound may be compared to the original event, the soundfields of the original event and the playback will always be incomparable.
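The preamp input/output comparison mentioned above is essentially a null test, which can be sketched in a few lines. This is a toy model, not a real measurement: the 'preamp' here is just a gain stage with an invented dash of third-order distortion, and all numbers are made up for illustration.

```python
# Null test sketch: pass a sine through a simulated 'preamp', scale the
# output back to the input level, subtract, and report the residual.
import math

FS = 48000
N = 1000
inp = [math.sin(2 * math.pi * 440 * n / FS) for n in range(N)]

# hypothetical preamp: gain of 2 plus mild third-order nonlinearity
out = [2.0 * x + 0.001 * x**3 for x in inp]

# least-squares estimate of the gain, then null the two signals
gain = sum(o * i for o, i in zip(out, inp)) / sum(i * i for i in inp)
residual = [o / gain - i for o, i in zip(out, inp)]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

null_db = 20 * math.log10(rms(residual) / rms(inp))
print(f"residual: {null_db:.1f} dB below the input")
```

The residual lands well below -60 dB, i.e. the device is close to transparent; no analogous subtraction is possible for a loudspeaker, because its output is an acoustic soundfield rather than a voltage.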
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-16 07:58:21
It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at?


IMO way too much time has been spent playing with the part of the chain that has been sonically transparent for 2-3 decades.

I see that the AES is still fighting the battle of 24/192:

Yet another example of people who should know better wasting time building sand castles (http://www.aes.org/events/128/workshops/?ID=2268)

:-(
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: 2Bdecided on 2010-07-16 16:29:18
It would be nice to have an audio recording and reproduction system that was end-to-end transparent - even if we started with one sound source (e.g. someone talking) in an anechoic chamber. If we can't even manage that after a century, what have we been playing at?


IMO way too much time has been spent playing with the part of the chain that has been sonically transparent for 2-3 decades.
You might be right.

Quote
I see that the AES is still fighting the battle of 24/192:

Yet another example of people who should know better wasting time building sand castles (http://www.aes.org/events/128/workshops/?ID=2268)

:-(
You're upset that the discussion happened. I'm upset that the flipping AUDIO engineering society can't even record the discussion and post it on-line!!!

(unless I've missed it)

Still, a report says...
Quote
These thought-provoking presentations gave some teaching for psychoacoustic test methods and some fascinating recent results on perception thresholds. Peter Craven gave an insight into subjective testing and how the forced-decision ABX test may in fact fail to find out what the ear/brain perception is doing, where the test blocks the natural perceived response to audio-quality variation unless the differences are relatively gross. Milind Kunchur outlined the extreme care necessary to establish sensitive tests to establish a 5 µs or so temporal detection threshold, backed by a theoretical analysis of this aspect of hearing.
from http://www.hificritic.com/downloads/HDA2010.pdf (http://www.hificritic.com/downloads/HDA2010.pdf)
...so it might have been good, or it might have been nonsense. It would be good to have some papers to read.

Critics of (some caricature of) ABX need to show that some other double-blind test methodology can allow listeners to hear a difference that ABX masks. Did this happen here? Who can tell!

Cheers,
David.
Title: Why Live-vs-Recorded Listening Tests Don't Work
Post by: Arnold B. Krueger on 2010-07-16 23:59:04
Critics of (some caricature of) ABX need to show that some other double-blind test methodology can allow listeners to hear a difference that ABX masks. Did this happen here? Who can tell!


That is the meat of the discussion. It is easy to say that ABX sucks or that all blind tests suck. It seems to be very hard to actually ring up strong reliable results any other way that doesn't also give away the store by giving clues about what people are listening to, other than plain old sound quality.

As a somewhat OT aside, I am fighting a similar battle at work. We've got some people who probably have classic hypersensitive hearing (due to age and/or hearing damage) who are objecting strongly when the music peaks briefly to over 90 dB at their seats. I see their point. The music gets a little loud and they get a headache. A few other people report the same problem. The audiologist at their hearing aid dealer says that their hearing is good. Most people find that a few peaks up to 100 dB are fun. Some of the sources that get really loud are acoustic instruments so the sound guy who gets some of their wrath, can't do anything about it anyway.

The similarity is that what they perceive completely supports their viewpoint. How could they be wrong?