Topic: Evaluation of Sound Quality of High Resolution Audio

Evaluation of Sound Quality of High Resolution Audio

Reply #1
So 55.6% of the subjects preferred hi-res over 128K LAME MP3? That doesn't exactly sound like night/day to me.

OTOH, it doesn't necessarily mean that many of them could not hear a difference, only that on average they had a slight preference for the hi-res sound over MP3.

Evaluation of Sound Quality of High Resolution Audio

Reply #2
Quote
Quantization resolution has been converted into 16 bits using MATLAB.

I find no word on exactly how. Truncated? Dithered?

Quote
The reference HRA was band limited using the steep low pass filter, of which cut off frequency is 20 kHz, and downsampled into 48 kHz using MATLAB. Then, two versions were prepared with 24 bits and 16 bits accuracy.

Also strange wording. Are they talking about 16-bit and 24-bit files? Was the resampling done with 16-bit precision (accuracy), when most resamplers use 32-bit float at minimum?
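For what it's worth, here is a minimal sketch of the two conversions being asked about (Python/NumPy rather than the paper's MATLAB, and entirely my guess at what they might have done, since the paper doesn't say):

Code

# Hypothetical sketch: 24-bit -> 16-bit conversion by truncation vs. TPDF dither.
# This is NOT the paper's code; it just illustrates the two options in question.
import numpy as np

def to_16bit_truncated(x):
    # x: float samples in [-1.0, 1.0); simply drop the extra resolution
    return np.floor(x * 32768.0).clip(-32768, 32767).astype(np.int16)

def to_16bit_dithered(x, rng=None):
    # Add +/-1 LSB triangular-PDF dither before rounding, which decorrelates
    # the quantization error from the signal.
    rng = np.random.default_rng(0) if rng is None else rng
    d = rng.random(x.shape) - rng.random(x.shape)  # TPDF noise in (-1, 1) LSB
    return np.round(x * 32768.0 + d).clip(-32768, 32767).astype(np.int16)

Truncation leaves signal-correlated distortion in the low bits; dither trades it for benign noise. Which one the authors used could matter at these levels, which is why the omission is annoying.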
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Evaluation of Sound Quality of High Resolution Audio

Reply #3
55.6% hardly seems statistically significant. They did manage to get 74.1% on the comparison between 320 kbps and 128 kbps MP3, and it stands to reason that they should have been able to replicate that result for HRA vs. 128 kbps and CD vs. 128 kbps, yet the success rates for those are much lower. It speaks to the low number of participants and the influence of random chance.

The testing procedure is described very briefly. It specifies that the participants listened to pairings consisting of a recording and the same recording in lower quality, i.e. HRA vs. CD, CD vs. 320 kbps MP3, and 320 kbps MP3 vs. 128 kbps MP3, so they heard each format successively paired with all of its lower-quality counterparts.

But there is no mention of whether the higher-quality sample was always played first, or if the playback order was randomized. I get the distinct impression from the paper that the high-quality sample was always played first for 120 seconds, followed by a 30-second pause, before the lower-quality sample was finally played for 120 seconds. Hasn't ABX testing (and common sense) shown us that the presentation order needs to be randomized? And that for best results, the participant must be able to switch instantly and as often as they want, for proper comparison?
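Randomized presentation is cheap to arrange, too; a hypothetical sketch (pair labels and trial counts invented, nothing from the paper):

Code

# Hypothetical sketch of a randomized schedule for the paired trials.
import random

random.seed(1)
pairs = [("HRA", "CD"), ("CD", "MP3-320"), ("MP3-320", "MP3-128")]

schedule = []
for pair in pairs:
    for _ in range(4):                      # trials per pair: invented number
        a, b = pair
        if random.random() < 0.5:           # randomize which version plays first
            a, b = b, a
        schedule.append((a, b))
random.shuffle(schedule)                    # randomize trial order across pairs

for first, second in schedule:
    print(f"play {first}, then {second}")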

Another thing that bothers me is that they seem to be using CBR MP3, at 128kbps and 320kbps, with no mention of VBR. Does anyone actually use MP3 at CBR 128kbps for anything other than testing anymore?

Still, the fact that simple CBR 128 kbps MP3 fares so well against HRA is a clear indicator that most people's hearing is nowhere near as good as they would have you believe.

Evaluation of Sound Quality of High Resolution Audio

Reply #4
Quote
Quantization resolution has been converted into 16 bits using MATLAB.

I find no word on exactly how. Truncated? Dithered?

Quote
The reference HRA was band limited using the steep low pass filter, of which cut off frequency is 20 kHz, and downsampled into 48 kHz using MATLAB. Then, two versions were prepared with 24 bits and 16 bits accuracy.

Also strange wording. Are they talking about 16-bit and 24-bit files? Was the resampling done with 16-bit precision (accuracy), when most resamplers use 32-bit float at minimum?

Yes, the use of a MATLAB 20 kHz stopband filter caught my eye too.
Loudspeaker manufacturer


Evaluation of Sound Quality of High Resolution Audio

Reply #6
What bothers me most is that this was a forced-choice test, i.e. even if you could not distinguish between the two samples you were forced to select one at random. That goes a long way toward explaining the apparent randomness of the overall results.

Evaluation of Sound Quality of High Resolution Audio

Reply #7
If I read it correctly (alcohol at play, welcoming 2016), they used a Pioneer N-50. The N-50 may have some non-neutral behaviour at 48 kHz compared with its 44.1 kHz response.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!


Evaluation of Sound Quality of High Resolution Audio

Reply #9
I see our old friend the "super tweeter" making an appearance again; IIRC, was this not one of the likely confounding issues with the Oohashi paper? A paper which the authors treat somewhat uncritically. I am also a tad concerned about the statement "In this experiment, it is confirmed that the effect of the non-linear distortion could be negligible." How did they ensure this?

Unless I missed it, I do not see any discussion of level matching.

The protocol needs more explanation as others have mentioned.

The last section shows a possible 60.3% discrimination between 24 bits and 16 bits, which is potentially interesting, but we do not know how many trials were 'same' and how many were 'different'. As we know, there is a bias toward saying 'different' even when there is no difference; if the mix of same/different trials does not account for this, the result will be skewed. The lack of correlation between the preference trials and the (really only 4 trials per subject?) 24-vs-16-bit trials is also unexpected. I'd like to see the raw data...
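To illustrate that point about response bias (my simulation, with invented proportions): if guessing subjects lean toward answering "different" and the trial mix is not balanced, raw percent-correct drifts away from 50% with nobody hearing anything.

Code

# Simulated guessers with a bias toward answering "different".
import random

random.seed(0)

def guessing_accuracy(p_trial_is_different, p_say_different, n=100_000):
    correct = 0
    for _ in range(n):
        truth = random.random() < p_trial_is_different
        answer = random.random() < p_say_different   # pure guess, biased
        correct += (truth == answer)
    return correct / n

print(guessing_accuracy(0.5, 0.7))  # ~0.50: balanced trials cancel the bias
print(guessing_accuracy(0.7, 0.7))  # ~0.58: unbalanced trials inflate the score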
This is not a signature !

Evaluation of Sound Quality of High Resolution Audio

Reply #10
57% seems like crap to me. Even I know that.

I have been ABXing high-bitrate lossy vs. lossless since I was a teenager, so my conclusion is that lossy is no good for me.

I did a test that was HRA vs. CD quality, and I failed 7 of the 8 test tracks but passed one. My conclusion is that the difference is insignificant to non-existent.
FLAC -> JDS Labs ODAC/O2 -> Sennheiser HD 650 (equalized)

Evaluation of Sound Quality of High Resolution Audio

Reply #11
Even then, you may only have found that your equipment behaves differently at different sample rates. Re-upsampling everything to 192 kHz may have prevented errors from that.
In the case of the paper, I see no attempt to control for that.
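For anyone wanting to rule that out at home, resampling everything to one common rate before playback is simple; a sketch with SciPy (the 192 kHz target is from the suggestion above, the rest is my assumption):

Code

# Sketch: bring every test signal to a common 192 kHz before playback so the
# DAC always runs at one sample rate, removing rate-dependent behaviour.
from scipy.signal import resample_poly

def to_192k(x, source_rate):
    # 44100 * 640 / 147 = 192000 exactly; 48000 * 4 = 192000
    factors = {44100: (640, 147), 48000: (4, 1), 96000: (2, 1)}
    up, down = factors[source_rate]
    return resample_poly(x, up, down)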
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!

Evaluation of Sound Quality of High Resolution Audio

Reply #12
Quote
Will be part of an AES paper evaluating Hi Res papers.

cheers,

AJ



who will be the author(s) of that?


http://www.stereophile.com/comment/555431#...cHBXxMyDBbtT.97


A lot of fluffery, hardly unexpected, from Atkinson; a seasoned eye will nota bene how he immediately shifts the narrative to 'real but small differences' that can only be reliably validated by extremely well trained listeners and rigorously ITU-conforming tests. Yet it's hardly just 'small differences' that Stereophile touts regularly, is it? And how highly trained are Stereophile's stable of reviewers, exactly? How well do their test methods conform? Not to mention all the subscribers and camp followers who are sure the difference between hirez and CD is night and day?

That Oct 31 AES session he cites, featuring Bob Katz (who has made the occasional dubious sighted-listening claim himself) and George Massenburg (who has a travelling road show that features a mediocre mp3-vs-lossless 'difference signal' as evidence that OMG lossy is the worst), could sure have used some JJ.

As for a meta-analysis of 20 audio DBT papers (out of 'some 80') by Dr. Reiss -- that should be interesting when it's finally published, assuming it will be (I see no indication of that one way or the other). Meanwhile it's interesting to note how Meyer and Moran (2007) still bugs the sh*t out of these guys. (BTW, Moran replies in the comments to Atkinson's implication that their data was biased.)

Evaluation of Sound Quality of High Resolution Audio

Reply #13
Vinyl is now only relevant as the most easily accessible analog source, taken raw directly from the RIAA phono preamplifier.
To compare 16 vs. 24 bit properly, you would also need two ADCs and, accordingly, two DACs. Then you would need a fully analog mixer for level-matching the DACs, then switching via mechanical relays on an electronic timer (if comparing a single pair of devices), and finally an amplifier. It is also important that the operating principle and the type of filtering in the 16-bit and 24-bit DACs coincide; for example, a 1-bit delta-sigma ADC and a 20th-order Butterworth filter implemented with an equal number of stages. MP3 could also be encoded in real time as one of the ADC-DAC chains.
Happy New Year everyone! :D

Evaluation of Sound Quality of High Resolution Audio

Reply #14
Quote
A lot of fluffery, hardly unexpected, from Atkinson; a seasoned eye will nota bene how he immediately shifts the narrative to 'real but small differences' that can only be reliably validated by extremely well trained listeners and rigorously ITU-conforming tests. Yet it's hardly just 'small differences' that Stereophile touts regularly, is it? And how highly trained are Stereophile's stable of reviewers, exactly? How well do their test methods conform? Not to mention all the subscribers and camp followers who are sure the difference between hirez and CD is night and day?


Like, for instance, their star player Michael Fremer, who penned this fine review of a set of analog cables:

http://www.stereophile.com/cables/805harm/...C0P2UDJw5DLe.97

Read his review, then look at the measurements. Absolutely hopeless.

Evaluation of Sound Quality of High Resolution Audio

Reply #15
Quote
55.6% hardly seems statistically significant.


Theoretically, you can get almost any percentage above chance to be statistically significant if you run enough trials. That's why effect sizes are important. Audiophiles who claim audible differences should score much, much better than 56%; I'm thinking >>75%.
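Easy to demonstrate: the same ~56% hit rate goes from a coin flip to 'highly significant' purely as a function of trial count (counts below are arbitrary, just for illustration):

Code

# Same 56% success rate at three trial counts: significance is a function of N,
# which is why effect size matters more than the p-value alone.
from scipy.stats import binomtest

for n in (10, 100, 10000):
    k = round(0.56 * n)
    p = binomtest(k, n, 0.5, alternative="greater").pvalue
    print(f"{k}/{n} correct: one-sided p = {p:.4g}")
# 6/10       -> p ~ 0.38 (chance)
# 56/100     -> p ~ 0.14 (still nothing)
# 5600/10000 -> p << 0.001 (significant, yet still a tiny effect)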
"I hear it when I see it."

Evaluation of Sound Quality of High Resolution Audio

Reply #16
Quote
Like, for instance, their star player Michael Fremer, who penned this fine review of a set of analog cables:

I'm not a vinyl aficionado, so I don't peruse that section here, but apparently he isn't too pleased with some:
Quote
But anyway, they Hydrogen Audio guys, having lost, were, like good little terrorists, back with new demands

 
Loudspeaker manufacturer

Evaluation of Sound Quality of High Resolution Audio

Reply #17
Quote
As for a meta-analysis of 20 audio DBT papers (out of 'some 80') by Dr. Reiss -- that should be interesting when it's finally published, assuming it will be

I have a partial list and will ask him if it's OK to post here.
Some may be a bit controversial, IMO. It appears this is actually just a statistical analysis of the test results, not so much a review of the actual validity of the test methods and results.

cheers,

AJ
Loudspeaker manufacturer

Evaluation of Sound Quality of High Resolution Audio

Reply #18
Quote
As for a meta-analysis of 20 audio DBT papers (out of 'some 80') by Dr. Reiss -- that should be interesting when it's finally published, assuming it will be

I have a partial list and will ask him if it's OK to post here.
Some may be a bit controversial, IMO. It appears this is actually just a statistical analysis of the test results, not so much a review of the actual validity of the test methods and results.

cheers,

AJ



mmm-hmm... that's what a meta-analysis paper is -- stats on stats. His 'review' of the method validity would have come first, during the winnowing stage.

of course, the flavor of a sauce has a lot to do with the ingredients left out, as well as those included

Evaluation of Sound Quality of High Resolution Audio

Reply #19
Quote
Like, for instance, their star player Michael Fremer, who penned this fine review of a set of analog cables:

I'm not a vinyl aficionado, so I don't peruse that section here, but apparently he isn't too pleased with some:
Quote
But anyway, they Hydrogen Audio guys, having lost, were, like good little terrorists, back with new demands





so, Fremer is as clueless about terrorism as he is about science. 


it's always ...*interesting* to see the spin the Stereophile guys put on their confrontations with science.  Too bad Mikey didn't link to a pertinent HA thread.

Evaluation of Sound Quality of High Resolution Audio

Reply #20
Atkinson has posted a photo he took of Reiss's slide on M&M 2007.





From M&M's report, it seems they ran 554 trials, apparently(?) in blocks of 10 trials per test (which doesn't divide evenly, I know...so maybe some subjects bailed during some tests)

According to the paper, 2 subjects out of 55 achieved 7/10 correct, 1 subject achieved 8/10 correct. Those were the best scores.

A score of 7/10 has a p = 0.172, which indicates that in 100 such tests, we 'expect' that 'success' rate to occur ~17 times 'by chance'. For 55 tests, we 'expect' 7/10 to occur ~8-9 times by chance. NB: a score with a p-value of 0.172 is nowhere near the common 'threshold' for rejecting the 'sounds the same' null hypothesis, p < 0.05. For a 10-trial test, that means a score of at least 9/10.

Similarly for 8/10, p = 0.055, which is still just a hair over the 'significant' threshold. For 100 tests, we expect an 8/10 score to occur ~5-6 times by chance; so for 55 tests, we expect it to occur by chance ~2-3 times.

Reiss does a chi-square analysis, which compares what 'should' have occurred to what 'did' occur. His report is that the actual distribution -- just 3 tests whose results were 7/10 or better -- departs significantly from what's expected 'by chance' (truly random), i.e., at least ~8 such test results.

That's how I understand it.  Anyone? 

One problem I see is that M&M do not break down per-subject results -- we don't know how many tests a given subject took, for example.  We know there were 554 trials and 276 correct answers.  We have breakdowns by broad categories like gender,  audiophile/audio professional vs not,  and hearing/age.  That's it.  Can we consider all *tests* to be equivalent, when the *subjects* are not?
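A quick sanity check on those aggregate numbers (my arithmetic, not M&M's):

Code

# 276 correct out of 554 trials, tested against the 50% guessing rate.
from scipy.stats import binomtest

r = binomtest(276, 554, 0.5, alternative="greater")
print(f"{276/554:.1%} correct, one-sided p = {r.pvalue:.2f}")
# ~49.8% correct, p ~ 0.55: the pooled result is indistinguishable from guessing.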

Evaluation of Sound Quality of High Resolution Audio

Reply #21
Quote
Reiss does a chi-square analysis, which compares what 'should' have occurred to what 'did' occur. His report is that the actual distribution -- just 3 tests whose results were 7/10 or better -- departs significantly from what's expected 'by chance' (truly random), i.e., at least ~8 such test results. That's how I understand it. Anyone?


Chi-square analysis is known to be inaccurate when sample sizes (more precisely, expected counts) are small.


"The problem with small numbers
Chi-square and G–tests of goodness-of-fit or independence give inaccurate results when the expected numbers are small. For example, let's say you want to know whether right-handed people tear the anterior cruciate ligament (ACL) in their right knee more or less often than the left ACL. You find 11 people with ACL tears, so your expected numbers (if your null hypothesis is true) are 5.5 right ACL tears and 5.5 left ACL tears. Let's say you actually observe 9 right ACL tears and 2 left ACL tears. If you compare the observed numbers to the expected using the exact test of goodness-of-fit, you get a P value of 0.065; the chi-square test of goodness-of-fit gives a P value of 0.035, and the G–test of goodness-of-fit gives a P value of 0.028. If you analyzed the data using the chi-square or G–test, you would conclude that people tear their right ACL significantly more than their left ACL; . . .

. . .


Recommendation
I recommend that you always use an exact test (exact test of goodness-of-fit, Fisher's exact test) if the total sample size is less than 1000."

http://www.biostathandbook.com/small.html

Sorry, I can't comment any further because this isn't my strong point, but I thought I'd pass it on in case it's of any value to you.
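The handbook's ACL numbers are easy to reproduce if anyone wants to check them (my code, using SciPy, not the handbook's):

Code

# Reproduce the quoted ACL example: 9 right vs. 2 left tears, expected 5.5/5.5.
from scipy.stats import binomtest, chisquare

exact = binomtest(9, 11, 0.5)               # exact test of goodness-of-fit
chi2 = chisquare([9, 2], f_exp=[5.5, 5.5])  # chi-square goodness-of-fit
print(f"exact test p = {exact.pvalue:.3f}")   # ~0.065
print(f"chi-square p = {chi2.pvalue:.3f}")    # ~0.035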


Evaluation of Sound Quality of High Resolution Audio

Reply #22
Quote
From M&M's report, it seems they ran 554 trials, apparently(?) in blocks of 10 trials per test (which doesn't divide evenly, I know... so maybe some subjects bailed during some tests)

According to the paper, 2 subjects out of 55 achieved 7/10 correct, 1 subject achieved 8/10 correct. Those were the best scores.

A score of 7/10 has a p = 0.172, which indicates that in 100 such tests, we 'expect' that 'success' rate to occur ~17 times 'by chance'.

A score of 7/10 should have a probability of nchoosek(10,7) x 0.5^7 x 0.5^(10 - 7) ~ 11.72%, not 17.2%. And this is not the p-value, by the way. For a one-sided hypothesis test, you would add up the probabilities of 7/10, 8/10, etc.

Once we have 55 subjects (each of whom went through 10 trials), the distribution of outcomes is approximately normal, and you use the normal (or Student's t) distribution to do hypothesis tests.
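For what it's worth, the two figures floating around this thread are both correct; they are just different quantities. The point probability of exactly 7/10 is 11.72%, and summing the tail as described gives back the 0.172 quoted upthread:

Code

# P(X = 7) vs. the one-sided p-value P(X >= 7) for 10 fair-coin trials.
from math import comb

point = comb(10, 7) * 0.5**10                            # exactly 7 of 10
tail = sum(comb(10, k) for k in range(7, 11)) * 0.5**10  # 7, 8, 9, or 10 of 10
print(f"P(X = 7)  = {point:.4f}")   # 0.1172 -- the figure in this reply
print(f"P(X >= 7) = {tail:.4f}")    # 0.1719 -- the 'p = 0.172' in reply #20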

Evaluation of Sound Quality of High Resolution Audio

Reply #23
https://www2.ia-engineers.org/iciae/index.p...iewFile/160/146

Will be part of an AES paper evaluating Hi Res papers.


Evaluation of Sound Quality of High Resolution Audio
Naoto Kanetada, Ryuta Yamamoto, Mitsunori Mizumachi


Quote
"Duration of each stimulus was set at 120 s, and the rest of
30 s was inserted in between two successive stimuli. In total,
it took 270 s for each paired-stimuli set. There was the rest
of more than 60 s in between the sets. In general, duration
of audio stimulus is set in the range between 10 s and 30 s.
Oohashi et al. mention that the hypersonic effect is caused
with temporal delay, when we listen to sounds containing
rich high frequency components. The hypersonic effect is
considered in this experiment.


A listening test like this is very likely to be inherently insensitive to small differences because of the enforced, relatively long stimulus length. If you are forced to listen to stimuli of a certain a priori chosen length, then you have to wait for the enforced period to end before listening to the other version of the stimulus. Thus many seconds may elapse between hearing the two versions of the same stimulus, and short-term memory for small audible differences is lost.

Quote
Paired comparison was carried out in sound quality between two successive stimuli. In two-alternative forced-choice, the subjects selected the better stimulus for each pair


IOW, this was a preference test, not a same/different test. Another strong potential source of listener confusion and/or insensitivity.

Listener training appears to be very vaguely described. Was there any effective  listener selection, qualification, or training?

http://hometheaterhifi.com/pioneer-elite-n-50-network-audio-player



Frequency response showing > 10 dB variations (overall slope) in the audible band.  Seems to be very colored and perhaps even capable of masking differences.

Evaluation of Sound Quality of High Resolution Audio

Reply #24
Dr Reiss promises to look at both false negative and false positive possibilities. Given some of the tests he will cite, that's not a bad idea. 

Of course, the daydream believers, chronic peekers and audio fashion peddlers are desperate for anything, despite "officially" dismissing any form of controlled testing. The sphile column after Reiss publishes will be great reading for those fascinated with the disorder.
Loudspeaker manufacturer