Topic: Understanding ABX Test Confidence Statistics


Reply #100
What is the small impairment difference between these two files? I'm not familiar with them


Oh, right, I searched & found them - are these the same files that Tony Lauck is talking about here? I believe the files have an easy timing "click" tell between them that easily differentiates them
Quote
I looked at the waveforms in Just My Imagination. They were out of time alignment by 1021 samples. I deleted 1021 samples from the start of file A2 to put the two files in alignment, at which point the two files nulled out to about -30 dB below the original music.

At this point, I tried the same "click test" technique. I tried more than a dozen times, but no matter where the start point was selected I was unable to notice any obvious difference in the clicks. So, unlike the jangling key files, PC ABX did not have a problem with click artifact invalidation of any results. However, any positive results reported with these files are invalidated by the time alignment error, so the results must be thrown out and any debate is in allocating the fault.


Reply #101
What is the small impairment difference between these two files? I'm not familiar with them

> avs-aix-high-resolution-audio-test-take-2

The initial post already documents complete and utter failure by Waldrep PhD, and it still does not even mention the tiny time delay in the file, which I used in the ABX above.

Also see my earlier post.
"I hear it when I see it."


Reply #102
The two songs are easy to tell apart, yes, so? Whenever A and B are easy to ID, they are easy to vote on, no?
More intellectual dishonesty from mzil. ArnyK's test, in which he cheated, was of FLAC vs 256kbps MP3 of the following:
File A: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues 256kbps.mp3
SHA1: f24d8c506ae5d38fd7d3a8e7700ee8595cd5e025
File B: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues.wav
SHA1: 961320fa0baa1983130304bed02df943a32cfe25
I'm sure he will give you these files to run your test again, but quit the dishonesty - it's unbecoming


Ah, telling me the songs *proves* you are under the impression I don't know what songs were used. You are absolutely correct on that point. How on earth would I know the difficulty level in discerning between them if I don't even know what they were in the first place? [nor if Arny passed the test or instead failed to reach any level of statistical significance, although I have just now found that thread, I think, and see his results were 8 correct/16 trials] It's impossible to be intellectually dishonest when you don't know all the facts and the background!

My point that, at least in *some* scenarios, ABX tests can be passed with TRUE listening in just one or two seconds per trial, stands.



Reply #104
Again, you fail - I'm asking for more robust blind tests by addressing a known problem - what exactly is it you object to about this?

You fail in presenting a shred of evidence that ABX negative results are due to false negatives.
Show one single rerun of an ABX using more robust methods (ITU etc), that yielded positives "missed" by ABX.
Strong pecuniary interests causing blindness here. Discrediting ABX is financially beneficial for Biochemically engineered boutique DACs and $50k $cam-amps.
Bad for busine$$, these ABX tests.

cheers,

AJ
Loudspeaker manufacturer


Reply #105
What is the small impairment difference between these two files? I'm not familiar with them

> avs-aix-high-resolution-audio-test-take-2

The initial post already documents complete and utter failure by Waldrep PhD, and it still does not even mention the tiny time delay in the file, which I used in the ABX above.

Also see my earlier post.

Yep, so you are using two files with a time delay between them, which gives you a noticeable & immediate "click" tell.
Where is your honesty about running a "normal" ABX trial in 1 second between two files that don't have structural problems like these files?
I can see now why the great emphasis was put on honesty earlier in this thread - honesty is sadly lacking in all these responses.
As I said already try ArnyK's files - I'm sure he'll give you access, run this test.


Reply #106
My point that, at least in *some* scenarios, ABX tests can be passed with TRUE listening in just one or two seconds per trial, stands.

This is completely dishonest - you used two different songs in your ABX test - not what the ABX is meant for

The lengths you people will go to try to win an argument just shows how dishonest you really are

Your songs:
File A: Let's Stay Together.mp3
SHA1: 8ad8471870c0082dd8e8fcbfb3408e60a3c9cf79
File B: Eleanor Rigby [Strings Only, Take 14].mp3
SHA1: ea9d76e58ba1249e41a64d11a5ae23006de8850c


Reply #107
Oh, right, I searched & found them - are these the same files that Tony Lauck is talking about here? I believe the files have an easy timing "click" tell between them that easily differentiates them
Quote
I looked at the waveforms in Just My Imagination. They were out of time alignment by 1021 samples. I deleted 1021 samples from the start of file A2 to put the two files in alignment, at which point the two files nulled out to about -30 dB below the original music.

At this point, I tried the same "click test" technique. I tried more than a dozen times, but no matter where the start point was selected I was unable to notice any obvious difference in the clicks. So, unlike the jangling key files, PC ABX did not have a problem with click artifact invalidation of any results. However, any positive results reported with these files are invalidated by the time alignment error, so the results must be thrown out and any debate is in allocating the fault.


It may be an additional sub-sample delay, but the bolded (by me) number shows another fail. A nulling test is done in a few seconds and should result in dithered silence (below -110 dB with Audition's spectrum analyzer) up to the cutoff frequency of the resampling filter, so silence up to about 20 kHz.
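The null test described above is just sample-aligned subtraction. A minimal sketch with synthetic data (pseudo-random noise standing in for the music, and the 1021-sample offset quoted earlier; the actual files are not reproduced here, so the numbers are illustrative only):

```python
import math
import random

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

def find_delay(a, b, max_lag):
    """Brute-force cross-correlation: the lag at which b best lines up with a."""
    n = min(len(a), len(b)) - max_lag
    scores = [sum(a[i] * b[i + lag] for i in range(n)) for lag in range(max_lag + 1)]
    return scores.index(max(scores))

def null_depth_db(a, b, lag):
    """Level of the residual (a minus lag-shifted b) relative to a, in dB."""
    n = min(len(a), len(b) - lag)
    residual = [a[i] - b[i + lag] for i in range(n)]
    return 20 * math.log10(rms(residual) / rms(a[:n]))

# Synthetic stand-in for the music: noise, plus a copy delayed by 1021
# samples (the offset reported in the quote) and very slightly altered.
random.seed(0)
sig = [random.gauss(0, 1) for _ in range(4000)]
delayed = [0.0] * 1021 + [s + random.gauss(0, 1e-4) for s in sig]

lag = find_delay(sig, delayed, 1200)
print(lag)                               # recovers the 1021-sample offset
print(null_depth_db(sig, delayed, lag))  # a deep null, far below the signal
```

Aligning first and then subtracting is the whole trick: without removing the delay, the "null" is dominated by the time offset rather than by any real encoding difference.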

This is funny because their own screenshots show Adobe Audition, with which you can resample in a few seconds without delay or volume problems. Whether it was due to ignorance or done deliberately, I do not know, nor even care.
But a PhD that cannot do a null test, and Wilkinson not immediately pulling the plug after noticing the mistakes... not exactly honest. Oh, and don't even get me started on the home theater geeks' video..

So again, the problem with online ABX tests is honesty.
"I hear it when I see it."


Reply #108
This is completely dishonest - you used two different songs in your ABX test - not what the ABX is meant for

The lengths you people will go to try to win an argument just shows how dishonest you really are

Are you that desperate?

mzil simply showed you that it is easily possible to click buttons in 1 second:
please explain how to actually listen to two samples & select the XisB or YisB button in 1 second. Arny did 3 trials - each taking 1 second.

There is no reason for your paranoia. He didn't hide the tracks, because it doesn't matter what tracks you use to show how quickly one can click buttons.

Also, you don't need to listen to two samples. Another misunderstanding about ABX.
"I hear it when I see it."


Reply #109
Again, you fail - I'm asking for more robust blind tests by addressing a known problem - what exactly is it you object to about this?

You fail in presenting a shred of evidence that ABX negative results are due to false negatives.
Do you deny that all tests are prone to both false positives & false negatives? Do you know the level of false negatives in the ABX tests? If you don't know this then you cannot offer any evidence to refute my suspicions - it can only be refuted by including such controls that I outlined in the blind tests.

I even suggested what controls I would use & how to administer them - not one suggestion came from your "camp" - just denials


Reply #110
This is completely dishonest - you used two different songs in your ABX test - not what the ABX is meant for

The lengths you people will go to try to win an argument just shows how dishonest you really are

Are you that desperate?

mzil simply showed you that it is easily possible to click buttons in 1 second:
please explain how to actually listen to two samples & select the XisB or YisB button in 1 second. Arny did 3 trials - each taking 1 second.

There is no reason for your paranoia. He didn't hide the tracks, because it doesn't matter what tracks you use to show how quickly one can click buttons.

Also, you don't need to listen to two samples. Another misunderstanding about ABX.

You or mzil tried to reference auditory echoic memory earlier & its short length - now you try to maintain that you play one sample & remember the other sample from memory - it's hilarious

Ah, the twists, the turns, the disingenuous posing - it's all very entertaining but really just shows your & mzil's lack of integrity. We already knew about Arny's lack of integrity & now your exposure is also hilarious

I would avoid posting any ABX results, if I were you guys - you only show how dishonest you are by doing so. Stick with denials & abstract arguments - it's more plausible


Reply #111
This is completely dishonest - you used two different songs in your ABX test - not what the ABX is meant for

The lengths you people will go to try to win an argument just shows how dishonest you really are

Your songs:
File A: Let's Stay Together.mp3
SHA1: 8ad8471870c0082dd8e8fcbfb3408e60a3c9cf79
File B: Eleanor Rigby [Strings Only, Take 14].mp3
SHA1: ea9d76e58ba1249e41a64d11a5ae23006de8850c


More like I picked two things, A and B, that I instantly had on hand and knew would sound very different from one another, and I posted what I used plainly for all to see. I just as easily could have used two identical songs which were poorly time aligned [it can happen], or two at different levels. Remember, I had no idea what Arny had listened to, so I had no idea what was necessary to simulate his test, only that he was being accused of having not listened, based on his test logs being "impossibly fast". *That* was the only info I had at the time [though I have since learned more].

Have you ever taken a Foobar ABX test, by the way? I know little about you.


Reply #112
Do you deny that all tests are prone to both false positives & false negatives?

Show your data for ABX test negative results being overturned by ABCHR et al. positives "missed".
Show your robust ITU blind test results for your Biochemically engineered DACs, since plain old ABX is too prone to false negatives according to you.
While you're at it, have Amir avail himself for ABX testing of his $50k amps that will be observed and administered by someone else, without him seeing or touching a Windows computer.
Otherwise, keep whining about ABX while daring not subjecting the wares you peddle to them.
Loudspeaker manufacturer


Reply #113
Ah, the twists, the turns, the disingenuous posing - it's all very entertaining but really just shows your & mzil's lack of integrity. We already knew about Arny's lack of integrity & now your exposure is also hilarious

I would avoid posting any ABX results, if I were you guys - you only show how dishonest you are by doing so. Stick with denials & abstract arguments - it's more plausible

It is quite sad to see what cognitive dissonance and desperation does to people.

Almost everything you said is wrong. Is the flailing around a sign of you soon going back into lurking mode for another 5 years? .. and a personal question, are you related to amirm?
"I hear it when I see it."


Reply #114
Do you deny that all tests are prone to both false positives & false negatives?

Show your data for ABX test negative results being overturned by ABCHR et al. positives "missed".
Show your robust ITU blind test results for your Biochemically engineered DACs, since plain old ABX is too prone to false negatives according to you.
While you're at it, have Amir avail himself for ABX testing of his $50k amps that will be observed and administered by someone else, without him seeing or touching a Windows computer.
Otherwise, keep whining about ABX while daring not subjecting the wares you peddle to them.

We have ArnyK's ABX results which show a heap of trials where he didn't listen - these are false negatives.
We have Amir's & my other positive ABX results, which overturn 15 or so years of null results using the same jangling keys files, doh!

As I said, anybody with a whit of intelligence knows that tests can have false positives & false negatives.
If you've ever done an ABX test, you don't need to be a genius to know that after so many repetitions you get tired & stop actually listening (even though you may not recognise this for a while), or you lose focus (& stop actually listening), or you have an expectation bias which prevents you from actually hearing any difference, or........

The depth of your denial is astounding

I bet next I will be told that ABX test is not actually a listening test


Reply #115
Try to follow, Arny, it was your test, after all so you should be familiar with it.


Wrong. The Foobar2000  ABX test has significant differences from the ABX test and ABX Comparators that I invented. But lacking enough experience with and knowledge of ABX,  you don't know that, right?

Since you can't post my log, it is easy to conclude that you are misrepresenting it and don't want to admit where you got your purported data from - which appears to be where the sun shines not. Same place much of the rest of your claims seem to come from!

Pretty strange reading a lecture on lying from someone who has set himself up this way!


Reply #116
We have Amir's & my other positive ABX results, which overturn 15 or so years of null results using the same jangling keys files, doh!
Did Amir conveniently forget to tell you all at WTF that I also was one of the "many" able to differentiate file A from B in Arny's original published jangling keys test at AVS, but that it had nothing to do with the hi-res files' jangling-keys sound, but rather an artifact (an error Arny had made in preparing the files) acting as a tell? This of course then calls into question anyone else's findings based on "audible differences heard in the jangling keys".

At AVS I wrote: "Here are my log files using Arny's old files  [the only version of his files yet posted in this thread, up to now] where I heard no IM distortion 4/5 kHz tones/noise after the training tone, at all,  even at blaring levels, just a couple of identical clicks common to both files:

foo_abx 1.3.4 report
foobar2000 v1.3.3
2014/07/26 18:30:29
File A: C:\Users\Me\Documents\Keys jangling folder\keys jangling full band 2496 test tones 1644.wav
File B: C:\Users\Me\Documents\Keys jangling folder\keys jangling full band 2496 test tones.wav
18:30:29 : Test started.
19:03:56 : 01/01 50.0%
19:05:38 : 02/02 25.0%
19:08:15 : 03/03 12.5%
19:10:27 : 04/04 6.3%
19:12:03 : 05/05 3.1%
19:16:13 : 06/06 1.6%
19:21:46 : 07/07 0.8%
19:23:08 : 08/08 0.4%
19:41:54 : 09/09 0.2%
19:45:00 : 10/10 0.1%
19:51:02 : 11/11 0.0%
19:52:12 : 12/12 0.0%
19:53:44 : 13/13 0.0%
19:55:33 : 14/14 0.0%
19:57:20 : 15/15 0.0%
20:02:51 : 16/16 0.0%
20:03:33 : Test finished.
----------
Total: 16/16 (0.0%)

Today, using his new files, I unfortunately hear a faint IM problem so I can't do that test; however, I did want to point out that the data I provided above was accomplished by my keying on a secret, audible "tell" that I don't think I should disclose, calling into question anybody else's published data prior to mine using the same files, even if they truly had no IM problems in their system, just like I didn't.

No dogs, no bats, no children with >22kHz hearing, no analyzer, and no text editor used, nor was I comparing the click noises themselves; it was just me and my headphones listening intently for over an hour in suboptimal conditions [refrigerator compressor noise, distant train whistles, etc.]"

source: http://www.avsforum.com/forum/91-audio-the...ml#post26078698

Note this ABX test predates the signature verification system. I later disclosed I was listening to the noise floor of a tiny section he left vulnerable.
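As an aside, the running percentages in a perfect-streak log like this one are simply 0.5 raised to the number of trials; the "0.0%" entries from trial 11 on are rounding, and the true 16/16 guessing chance is 1/65536, about 0.0015%. A few lines confirm this:

```python
# Chance of a perfect n-for-n streak by coin-flipping halves each trial:
# the log's right-hand column is 0.5**n shown as a rounded percentage.
for n in (1, 2, 5, 11, 16):
    print(n, 0.5 ** n)   # e.g. 5 -> 0.03125, i.e. the log's "3.1%"
```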


Reply #117
Do you deny that all tests are prone to both false positives & false negatives?


In theory any test can be flawed that way, even a sighted evaluation. I've definitely seen sighted evaluations produce false negatives.

Quote
Do you know the level of false negatives in the ABX tests?


I generally have a reasonable estimate of them.  However, they cannot be accurately known because of the well-known (to most of us) difficulties with reading other people's minds.

Of course Mr. Keny with your claims of perfect knowledge of other people's state of mind, you probably think you do know this to perfection.

Here's a question for you to evade Mr. Keny. Is it a false negative when the listener hears no audible difference but there is some kind of a difference?

Consider this well because in fact there is a difference between the unknowns in every listening test that has ever been done.



Reply #118
The depth of [...] denial is astounding

Yes, it is. It is astounding that you still do not get what a test result means.

I don't know what Arny did, nor do I really care. If someone is not interested in listening for a difference, then so be it. A null result does not prove that the null hypothesis is correct. We've all told you this several times.
It really seems like you got infected by some kind of amirm disease; he also insisted on people posting (negative) logs, as if those somehow prove inaudibility. But he put a stupid spin on his little war game: if you don't hear a difference then you didn't train enough, so he wins; if you do hear a difference, he wins again. Too bad I am not stupid enough for his games.

Again, for the last time, an online ABX is a tool for the participant to get statistical validation of whether what he thinks he heard in a sighted comparison was actually a real audible difference.

Now please stop embarrassing yourself.
"I hear it when I see it."


Reply #119
We have ArnyK's ABX results which show a heap of trials where he didn't listen - these are false negatives.


Actually Mr. Keny, you have been challenged to provide that particular ABX log and you have failed to do so. Therefore the above would appear to be a false claim.  You don't have my ABX results, now do you? 

You also know that if you post those results they will show the lies in your relevant post. Checkmate!


Reply #120
Try to follow, Arny, it was your test, after all so you should be familiar with it.


Wrong. The Foobar2000  ABX test has significant differences from the ABX test and ABX Comparators that I invented. But lacking enough experience with and knowledge of ABX,  you don't know that, right?
I was talking about that specific Foobar ABX test you did, not ABX testing in general.

Btw, here's your ABX log, I retrieved from AVS - I thought you would still have this on your PC?

Quote
The ABX log is as follows:

foo_abx 2.0 beta 4 report
foobar2000 v1.3.5
2015-01-06 21:04:53

File A: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues 256kbps.mp3
SHA1: f24d8c506ae5d38fd7d3a8e7700ee8595cd5e025
File B: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues.wav
SHA1: 961320fa0baa1983130304bed02df943a32cfe25

Output:
DS : Primary Sound Driver

21:04:53 : Test started.
21:05:18 : 00/01
21:05:39 : 01/02 --- 21 seconds
21:06:39 : 02/03 --- 60 seconds
21:06:45 : 03/04 --- 6 seconds
21:06:47 : 04/05 --- 2 seconds!!!
21:06:50 : 04/06 --- 3 seconds!!!
21:06:54 : 04/07 --- 4 seconds!!!
21:06:56 : 05/08 --- 2 seconds!!!
21:06:58 : 06/09 --- 2 seconds!!!
21:06:59 : 07/10 --- 1 second!!!!
21:07:01 : 07/11 --- 2 seconds!!!
21:07:04 : 07/12 --- 3 seconds!!!
21:07:05 : 08/13 --- 1 second!!!
21:07:08 : 08/14 --- 3 seconds!!!
21:07:10 : 08/15 --- 2 seconds!!!
21:07:31 : 08/16 --- 21 seconds
21:07:31 : Test finished.

----------
Total: 8/16
Probability that you were guessing: 59.8%

-- signature --
b54eb2a632d09ae60dbb1c13774d4152ee32f110


Quote
Since you can't post my log, it is easy to conclude that you are misrepresenting it and don't want to admit where you got your purported data from - which appears to be where the sun shines not. Same place much of the rest of your claims seem to come from!

Pretty strange reading a lecture on lying from someone who has set himself up this way!
I've reproduced the ABX log above, along with the trial timings beside each trial.
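For reference, the "59.8%" in that log is nothing mysterious: it is the one-sided binomial tail, the chance of scoring at least 8 of 16 by coin-flipping. A quick check:

```python
from math import comb

def guessing_probability(correct, trials):
    """One-sided binomial tail: the chance of getting at least `correct`
    right out of `trials` by pure coin-flipping (reported in foobar2000's
    ABX log as 'Probability that you were guessing')."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(guessing_probability(8, 16) * 100, 1))   # 59.8, matching the log
```

An 8/16 score is therefore entirely consistent with not hearing (or not listening for) any difference at all.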


Reply #121
We have....

You have nothing. No ITU retest showing anything missed by ABX. No retest showing $cam products and the D-K gang failing to discern differences, being any way related to ABX. The fact of the matter is that your "camp" has strong pecuniary interests in discrediting anything that exposes the $cam "business". That's the real "problem" with "ABX" and why it's a dog whistle for the shysters to use for the sheep.
You know perfectly well that there will never be any ITU tests of Biochemically engineered DACs and $50k amps, so you make that the standard and thereby dodge the ABX tests the "hobbyists" use to expose the shyster $cams.
That's what we have.
Loudspeaker manufacturer

 


Reply #122
foo_abx 2.0 beta 4 report
foobar2000 v1.3.5
2015-01-06 21:04:53

File A: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues 256kbps.mp3
SHA1: f24d8c506ae5d38fd7d3a8e7700ee8595cd5e025
File B: 15 Haydn_ String Quartet In D, Op_ 76, No_ 5 - Finale - Presto + cues.wav
SHA1: 961320fa0baa1983130304bed02df943a32cfe25

Output:
DS : Primary Sound Driver

21:04:53 : Test started.
21:05:18 : 00/01
21:05:39 : 01/02 --- 21 seconds
21:06:39 : 02/03 --- 60 seconds
21:06:45 : 03/04 --- 6 seconds
21:06:47 : 04/05 --- 2 seconds!!!
21:06:50 : 04/06 --- 3 seconds!!!
21:06:54 : 04/07 --- 4 seconds!!!
21:06:56 : 05/08 --- 2 seconds!!!
21:06:58 : 06/09 --- 2 seconds!!!
21:06:59 : 07/10 --- 1 second!!!!
21:07:01 : 07/11 --- 2 seconds!!!
21:07:04 : 07/12 --- 3 seconds!!!
21:07:05 : 08/13 --- 1 second!!!
21:07:08 : 08/14 --- 3 seconds!!!
21:07:10 : 08/15 --- 2 seconds!!!
21:07:31 : 08/16 --- 21 seconds
21:07:31 : Test finished.

----------
Total: 8/16
Probability that you were guessing: 59.8%

Looks to me like he tried for 21 seconds and thought he had found a difference, but didn't. He tried for 60 seconds on the next trial. After that he lost interest, or gave up because he was not sure.
How on earth does this discredit ABX testing?

Edit: doing this and posting the log to me shows indeed some honesty you can't expect from everyone.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!


Reply #123
Looks to me like he tried for 21 seconds and thought he had found a difference, but didn't. He tried for 60 seconds on the next trial. After that he lost interest, or gave up because he was not sure.
How on earth does this discredit ABX testing?

Yep, he gave up & stopped listening (but he never owned up to this on AVS) - the test was over at that point - probably by trial 4, certainly by trial 5. Yet he presented this set of results without mentioning that he had stopped listening. Yes, honesty is needed in these tests, but that's a fairly shaky foundation on which to judge the results (as we've seen on this thread)

Who's trying to discredit ABX testing? I'm saying we could do with a measure of the false negatives in the results - in Arny's trials 5 to 15, a cannon shot could have been in one sample & a hummingbird in the other & he wouldn't have discriminated the difference, due to not listening.
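The complaint about unmeasured false negatives can at least be bounded on paper. Under an assumed model where a listener truly detects the difference on some fraction of trials, the miss rate of a 16-trial ABX at the usual 5% significance level follows directly from the binomial distribution (the per-trial detection probabilities below are hypothetical, not measured):

```python
from math import comb

def tail(n, k, p):
    """P(at least k successes in n independent trials, each with
    success probability p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

n = 16
# Smallest score out of 16 that reaches the usual p < 0.05 criterion:
need = next(k for k in range(n + 1) if tail(n, k, 0.5) < 0.05)
print(need)   # 12 -> at least 12/16 is required to "pass"

# Assumed per-trial detection abilities, and the resulting chance that
# such a listener still fails to reach significance (a false negative):
for p in (0.6, 0.7, 0.8, 0.9):
    print(p, round(1 - tail(n, need, p), 2))
```

Under this model even a listener who genuinely hears the difference 70% of the time misses significance more often than not in 16 trials, which is why a null result alone cannot establish inaudibility.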


Reply #124
Yep, he gave up & stopped listening (but he never owned up to this on AVS) - the test was over at that point - probably by trial 4, certainly by trial 5. Yet he presented this set of results without mentioning that he had stopped listening. Yes, honesty is needed in these tests, but that's a fairly shaky foundation on which to judge the results (as we've seen on this thread)

Who's trying to discredit ABX testing? I'm saying we could do with a measure of the false negatives in the results - in Arny's trials 5 to 15, a cannon shot could have been in one sample & a hummingbird in the other & he wouldn't have discriminated the difference, due to not listening.

False negative? I see a perfectly ordinary negative ABX test from Arny. I guess he is used to 16 trials, so he finished it.
Is troll-adiposity coming from feederism?
With 24bit music you can listen to silence much louder!