
Blind tests and HydrogenAudio

I've at last updated Terms of Service rule number 8 and posted a sticky explaining what an ABX test is.

I'd like to thank, in alphabetical order, Canar, Dibrom, FF123, and Garf for their previous contributions, which I partially copied into the TOS and the sticky, and KikeG for his binomial table.

Any suggestions for improvement are welcome.

Reply #1
That's a nice thread for new users here. IMO it's well explained without getting too difficult.
However, I see one thing that does confuse me:

Quote
If the probability is in the green (<5%) or, better, the yellow zone (<1%), the test is considered as successful.
Isn't the colour scheme (used in PC games, EncSpot, traffic signs and whatnot) normally ordered from good to bad as green, yellow, red?
So the quote would suggest that a 5% chance of guessing is better than 1%.
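
To make the thresholds in the quote concrete, here is a minimal Python sketch of the mapping from p-value to table colour. The function name `zone` is made up for illustration, and returning no colour for larger p-values is an assumption about how the table is laid out.

```python
def zone(p_value):
    """Return the table colour for a p-value, using the quoted thresholds."""
    if p_value < 0.01:
        return "yellow"   # below 1%: the stronger result in the quoted scheme
    if p_value < 0.05:
        return "green"    # below 5%: successful, but weaker than yellow
    return None           # assumption: larger p-values are left uncoloured


for p in (0.20, 0.038, 0.008):
    print(f"p = {p:.3f} -> {zone(p)}")
```

Read this way, yellow marks the better result and green the weaker one, which is the reverse of the usual green-is-best convention.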

As I said, a very useful, nice sticky, but to prevent confusion I think the colour thing should be changed...
Nothing but a Heartache - Since I found my Baby ;)

Reply #2
Thanks for your suggestion. I'll let KikeG see if he can update the binomial table.

By the way, I deliberately left out the "The Hydrogenaudio staff might not take action against users that post these harsh responses" part of the previous version, because I think that we can no longer put up with the "harsh responses" currently posted in many threads (I'm not talking about anyone in particular; many people have been involved in harsh or unhelpful answers recently).

Reply #3
Most ABX tools mentioned are Windows only. Are there other options for members who use a different OS?
I'm personally interested in OS X, but curious about Linux possibilities for those users also.
If no accepted ABX tools are available for a particular OS, what are the ramifications for those users and the comments they are allowed to make on HA? Just curious.

Reply #4
You can do ABX testing on any computer with Java using Schnofler's ABC/HR

http://rarewares.hydrogenaudio.org/others.html

Also, you can do it from a web browser (!) using smok3's ABX2go

http://users.volja.net/smoker/abx.htm

The Java comparator uses the same randomizing routines as the other tools, so it should be OK.

Dunno about ABX2go.


Reply #6
I read this page. I would have used it if I had it from the beginning, but for a reason I don't remember, I only read this text after having finished the draft of the new post, and I actually forgot that it was from the wiki (because I had gathered the documentation long before, and yesterday all I started from were text files).

From the beginning I considered whether the new post should go in the wiki or in a sticky. The best answer was the wiki. The problem was that I'm really short of time and have not assimilated the rules and formatting of the wiki; I thought it would have taken me one more hour. But if I had remembered this page, I would just have copied it into a sticky instead of writing a new post.

Reply #7
I updated the FAQ with the two Java programs, as well as Foobar2000 and Linabx (for Linux).

I changed the link in the TOS to the Wiki rather than the Sticky.

Reply #8
Sticking to this thread, I should mention that we had a thread in MP3 General about the differences between JS and TS. As I'm newer to the board than others, my way of working was a little outside the rules, and only step by step in that thread did I learn what ABX is and so on. Afterwards came the necessary rule change mentioned here, so I decided to ask a few things via PM before posting. After some discussion, Pio and I decided that some of the questions could be posted here to be discussed openly.

As I agree with that:


My first question was about web space, as not everybody has web space for posting samples.

WHERE TO PUT THE SAMPLES ON THE WEB?

HA offers the fine opportunity to upload them to its own server, though copyrighted material is restricted to 30 seconds. A second restriction is 9 MB, so that modem users have a chance to download them.

My next question was: MAY I UPLOAD MORE THAN 30 SECONDS IF THE MATERIAL IS NON-COPYRIGHTED AND MADE BY MYSELF, SO THAT I CAN CHOOSE SPECIAL EFFECTS OR INSTRUMENTATION TO SHOW WHAT THE PROBLEM IS?

Pio answered that if it is not copyrighted in any way and there is no legal restriction from my own label, from licensing, or from any other problem, I can upload more than 30 seconds, still restricted to 9 MB for modem users.

So I came to the quality of the samples, thinking about sample faking, and in the same way we came to: WHICH FORMAT SHOULD BE CHOSEN FOR UPLOADING THE SAMPLE?

The conclusion was that it MUST be lossless in any case. As all presets and recommendations are available, there is no need to post more than the PURE UNENCODED LOSSLESS SAMPLE, which is the basis for the DOUBLE ABX TEST. Everything else can be reproduced by anybody else, as the settings MUST BE KNOWN (see the recommendation list of this forum). The question of particular formats is still open. IT IS RECOMMENDED TO USE .FLAC as the default lossless format, as it is portable to LINUX and other OSes (I had a question about APE, which as far as I know has become open source but has still not been ported, whereas FLAC has). In some cases, if the sample is short enough, .WAV CAN BE USED TOO, though you should still think of modem users.
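
As a rough check on what "short enough" means for .WAV, the arithmetic can be sketched in Python. Everything numeric here other than the 30-second and 9 MB limits is an assumption: CD-format PCM (44.1 kHz, 16-bit, stereo), a plain 44-byte header, and 1 MB taken as 1,000,000 bytes. The function names are made up for illustration.

```python
def wav_bytes(seconds, sample_rate=44100, bits=16, channels=2, header=44):
    """Approximate size in bytes of an uncompressed PCM WAV file."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return header + seconds * bytes_per_second


def max_seconds(limit_bytes, sample_rate=44100, bits=16, channels=2, header=44):
    """Longest whole number of seconds that fits under limit_bytes."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return (limit_bytes - header) // bytes_per_second


LIMIT = 9_000_000  # assumption: the 9 MB limit read as 9,000,000 bytes

print("30 s WAV:", wav_bytes(30), "bytes")
print("longest WAV under the limit:", max_seconds(LIMIT), "s")
```

Under these assumptions a 30-second WAV is about 5.3 MB, and roughly 51 seconds fit within the limit. FLAC files are normally smaller, by an amount that depends on the material.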

I think these were the questions:


WHERE TO PUT THE SAMPLES?
ARE THERE RESTRICTIONS ABOUT FILESIZE AND SAMPLELENGTH?
WHICH FORMAT TO CHOOSE FOR THE UNENCODED SAMPLE FOR UPLOAD?

I hope I didn't forget anything, and if I did, Pio may find this post and add to it.


-max

Reply #9
IMO, samples longer than 30 seconds aren't very useful for ABX tests. Quite the contrary, the sample should be as short as possible.
Over thinking, over analyzing separates the body from the mind.

Reply #10
sooo...
any chance that the green-yellow thing could be corrected?
Nothing but a Heartache - Since I found my Baby ;)

Reply #11
Hello,

Am rather a recent poster to the board. 

Why don't any of the tests use two alternative forced choice testing?

Perhaps this has already been covered here.

Reply #12
Quote
Why don't any of the tests use two alternative forced choice testing?

If I understand correctly what a "two alternative forced choice test" is, this is exactly what ABX programs do:

At each trial you have two choices:
1. A = X (and B = Y, depending on the program) or
2. B = X (and A = Y)

You're forced to choose between either 1. or 2.
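
The trial structure described above can be sketched as a small simulation. This is only an illustration: the `p_hear` listener model (the listener either hears the difference or is forced to guess) and the function names are assumptions, not how any particular ABX program works.

```python
import random


def abx_trial(p_hear, rng):
    """Run one ABX trial and return True if the listener identified X."""
    x = rng.choice(["A", "B"])           # hidden assignment: X is A or X is B
    if rng.random() < p_hear:
        answer = x                       # the listener hears the difference
    else:
        answer = rng.choice(["A", "B"])  # forced choice: must still pick one
    return answer == x


def abx_session(n_trials, p_hear, seed=None):
    """Return the number of correct answers over n_trials trials."""
    rng = random.Random(seed)
    return sum(abx_trial(p_hear, rng) for _ in range(n_trials))


print("pure guesser:    ", abx_session(16, p_hear=0.0, seed=1), "of 16")
print("perfect listener:", abx_session(16, p_hear=1.0, seed=1), "of 16")
```

A pure guesser tends toward half of the trials correct, which is why the score only means something once it is compared against chance.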

Have I missed something?

BTW: Welcome to this place, esldude! 
Let's suppose that rain washes out a picnic. Who is feeling negative? The rain? Or YOU? What's causing the negative feeling? The rain or your reaction? - Anthony De Mello

Reply #13
Quote
Hello,

Am rather a recent poster to the board. 

Why don't any of the tests use two alternative forced choice testing?

Perhaps this has already been covered here.

A 2-AFC is typically used to determine directional preference.  For example, a typical test from the food tasting industry would ask which of two beer samples is more bitter.  The analog in music samples would be to ask which sample is "better" or "preferred."

This is subtly different from asking which sample is closer to the original.

A better test would be to use the Same/Different test, where listeners are presented with two samples, asking whether the samples are the same or different.  In half the pairs, the samples are different, and in half the pairs, they're the same (the four combinations A/A, B/B, A/B and B/A would be equally distributed).  This test is more general than the "A - Not A" test, in that A or B do not necessarily have to be the original (i.e., they can both be lossy samples).

The apparent disadvantage of this type of test is its inefficiency -- the information on possible differences is obtained by comparing responses obtained from the different pairs (A/B and B/A) with those obtained from matched pairs (A/A and B/B).  Ideally, the listener would listen to all 4 pairs to avoid bias from creeping in (repeat multiple times).  I would think that many more trials would have to be performed to achieve the same level of confidence that ABX produces.
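
A rough Python sketch of how such a balanced Same/Different schedule could be generated and tallied. The function names and the hit / false-alarm tally are illustrative assumptions, not part of any existing tool.

```python
import random

# The four pair types, each presented equally often.
PAIRS = [("A", "A"), ("B", "B"), ("A", "B"), ("B", "A")]


def make_schedule(repeats, seed=None):
    """Return a shuffled list containing each pair type `repeats` times."""
    rng = random.Random(seed)
    trials = PAIRS * repeats
    rng.shuffle(trials)
    return trials


def tally(trials, responses):
    """Count 'different' answers on different pairs and on matched pairs."""
    hits = sum(1 for (a, b), r in zip(trials, responses)
               if a != b and r == "different")
    false_alarms = sum(1 for (a, b), r in zip(trials, responses)
                       if a == b and r == "different")
    return hits, false_alarms


schedule = make_schedule(repeats=4, seed=0)
# A listener who always answers "different" scores every hit but also
# every false alarm, so hits alone say nothing about audibility.
print(tally(schedule, ["different"] * len(schedule)))
```

Comparing the two counts is what carries the information, which is the inefficiency mentioned above: only half of the trials contain an actual difference.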

The advantage of this test, of course, is that deciding whether A is the same as B is a simpler task than determining whether A or B sounds the same as X.

ff123

Reply #14
Quote
sooo...
any chance that the green-yellow thing could be corrected?

Maybe this Christmas, when I have more spare time. I'm sorry, but I don't think the colors used are something very important.

Reply #15
Quote
Quote
sooo...
any chance that the green-yellow thing could be corrected?

Maybe this Christmas, when I have more spare time. I'm sorry, but I don't think the colors used are something very important.

I agree that it is not utterly important, but it might still confuse anybody new to this area (as in every other aspect of daily life, the colour scheme is the other way round).
Of course there's no need to jump and correct it immediately, but I would appreciate a change in the long run.
However, after all, I'm not in a position to demand such a thing; I can only suggest.
Nothing but a Heartache - Since I found my Baby ;)

Reply #16
Quote
I agree that it is not utterly important, but it might still confuse anybody new to this area (as in every other aspect of daily life, the colour scheme is the other way round).

It depends. For me it's obvious that the smaller the p-value the better, since the "big" p-values have no color. And, personally, I wouldn't draw any conclusion based just on the colors used.

Reply #17
Quote
For me it's obvious that the smaller the p-value the better

Right, that's obvious.

Quote
And, personally, I wouldn't draw any conclusion based just on the colors used.
And this is where you might be wrong, IMHO. If you see a traffic light, which colour represents the right to cross? Red?
This whole scheme of the colours green, yellow and red representing abstract values like good or bad (or go / don't go, in one piece / broken, etc.) is a natural, unconscious thing in our society.
It is not just colours. It is what the colours mean, and they mean the exact opposite of the p-value (in some cases): 5% (= not as good) is green (= good), 1% (= good) is yellow (= not as good).
That would be like the sentence: I like her very much, but I can't stand her...

So, while this is maybe a little far-fetched, the basic incongruent message is still there, and thus it's confusing.
I agree that everybody who takes a closer look at it will have no problems, but why make this potential confusion possible in the first place?
Nothing but a Heartache - Since I found my Baby ;)

Reply #18
ABX is not 2AFC.

It is as ff123 describes it.

You have two choices that are different in some way.  You pick one.  The question is whether or not the difference is perceptible.  If you score 75% or better, then the difference is perceptible.  This 75% value scales over differing sample sizes.  It has some validity with 16 samples.  And of course is statistically quite sound with 30 or more samples. 

According to researchers in psychoacoustics, 2AFC is more discriminating, more sensitive than ABX or other methods.  Subjectively to test subjects, it is usually simpler and easier.  So extra sampling isn't that much of a big deal.

With some of the codecs getting pretty good at 128 kbps, I thought maybe the extra sensitivity of 2AFC would be useful.

Does anyone know of readily available software that uses a 2AFC methodology?

Reply #19
Quote
If you score 75% or better, then the difference is perceptible. This 75% value scales over differing sample sizes.  It has some validity with 16 samples.  And of course is statistically quite sound with 30 or more samples.

75% does not have much validity at all. This result may give an indication, but I would definitely not say it's statistically valid with 30 trials.

Quote
According to researchers in psychoacoustics, 2AFC is more discriminating, more sensitive than ABX or other methods
Can you point me to any links that back this up?
Nothing but a Heartache - Since I found my Baby ;)

Reply #20
"Fundamentals of Hearing" by William Yost
"Psychology of hearing" by Brian C. J. Moore

The levels of significance for 2AFC are different.  A random result will yield 50% so the scale is 50-100% not 0-100%.  And 75% is much more significant than 75% in ABX would be.

Reply #21
Quote
Quote

If you score 75% or better, then the difference is perceptible. This 75% value scales over differing sample sizes. It has some validity with 16 samples. And of course is statistically quite sound with 30 or more samples.


75% does not have much validity at all. This result may give an indication, but I would definitely not say it's statistically valid with 30 trials.

75% at 16 trials: pval = 3.8%
75% at 30 trials (22 correct): pval = 0.8%
So, yes, 75% at 30 trials is a highly significant result.
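
Figures like these can be checked with a one-sided binomial test against pure guessing (a 50% chance of being right on each trial). A minimal Python sketch under that assumption; the function name `abx_p_value` is made up for illustration, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials`
    by guessing alone, with a 50% chance per trial."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


for correct, trials in [(12, 16), (22, 30)]:
    p = abx_p_value(correct, trials)
    print(f"{correct}/{trials} correct: p = {p:.2%}")
```

Under this model 12 of 16 comes out at about 3.8% and 22 of 30 at about 0.8%, so the same percentage score becomes far more convincing as the number of trials grows.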

Reply #22
Quote
"Fundamentals of Hearing" by William Yost
"Psychology of hearing" by Brian C. J. Moore

The levels of significance for 2AFC are different.  A random result will yield 50% so the scale is 50-100% not 0-100%.  And 75% is much more significant than 75% in ABX would be.

thx. if I have some spare time, I'm gonna do some basic reading.
Nothing but a Heartache - Since I found my Baby ;)

Reply #23
Quote
According to researchers in psychoacoustics, 2AFC is more discriminating, more sensitive than ABX or other methods.  Subjectively to test subjects, it is usually simpler and easier.  So extra sampling isn't that much of a big deal.

With some of the codecs getting pretty good at 128 kbps, I thought maybe the extra sensitivity of 2AFC would be useful.

Does anyone know of readily available software that uses a 2AFC methodology?

Yes, I've read that too.  But again, 2-AFC is usually about directional preference of a single characteristic.  So 2-AFC might be more sensitive than ABX (or actually ABC/HR, which is the inverse of ABX) if a question such as "Which of the following two samples is louder?" is asked.

ABX and ABC/HR are more robust in that multiple or complex characteristics can be tested for.

The Same/Different test I described would be OK for testing large groups of people for one trial (instead of multiple trials for one person), but I think listener fatigue would be a real problem for one-person testing.  ABX is already quite fatiguing as it is, and it's more efficient than Same/Different testing.

But as far as I know, there isn't a piece of software readily available which implements 2-AFC or Same/Different for audio signals.

ff123