HydrogenAudio

CD-R and Audio Hardware => Audio Hardware => Topic started by: Pio2001 on 2003-02-18 21:57:45

Title: Blind test challenge
Post by: Pio2001 on 2003-02-18 21:57:45
Can anyone ABX any of these samples against the original? (The original is resampled and leveled, but in the digital domain; all the others are analog copies.)

Please, tell me I'm either tired or deaf.

KikeG, there are analog recordings of soundcards among them; you might be interested.

Right click to download original.flac (http://pageperso.aol.fr/lyonpio2001/samples/original.flac)
Right click to download 2.flac (http://pageperso.aol.fr/lyonpio2001/samples/2.flac)
Right click to download 3.flac (http://pageperso.aol.fr/lyonpio2001/samples/3.flac)
Right click to download 4.flac (http://pageperso.aol.fr/lyonpio2001/samples/4.flac)
Right click to download 5.flac (http://pageperso.aol.fr/lyonpio2001/samples/5.flac)

Less than 900 kB each
Title: Blind test challenge
Post by: KikeG on 2003-02-19 08:12:18
Nice, I'll give it a try when I have some time.
Title: Blind test challenge
Post by: Pio2001 on 2003-02-19 11:56:34
I was unable to ABX any of these samples against the original, though I can hear the SB64 one. Isn't that strange? I can hear the coloration added by the SB64, and a spectral analysis showed exactly the defects I hear in the frequency response, but I got only 4/8 on ABX.

I think that ABXing the sound of such samples, which show a continuous difference in the sound itself rather than localized artifacts audible at a given place, is more difficult. The difference I hear seems to vanish after listening to the samples 4 or 5 times.

I'll try a slower pace, maybe one session per day, to see if there is an improvement in my scores. It is possible that some differences in the sound are very difficult to get right in an ABX test with short samples played over and over.

I can still use another sample for the test (audiophiles would maybe prefer a Rebecca Pidgeon recording), and/or resample and level all analog recordings to 44.1 kHz so as to have a bit-perfect original to compare with various distorted files.
The latter would be interesting if tested on audiophile CD players, and if the ABX proved impossible (which would show that some devices don't add audible distortion to the sound), while the current setup is better for comparing the tested devices with each other (the analog samples are closer to the analog source).
Title: Blind test challenge
Post by: Patsoe on 2003-02-19 12:55:24
I don't really understand what this is about. But since you sent me to this thread, I guess it's about using the analog out of your soundcard with different cables?
Should we listen to the samples before knowing what they are made of? I mean, if I put them in ABX software it will still be blind even if you tell us what you did to the samples, right?

Anyway, I'm not very familiar with ABX yet, so I'll first practice with some example files before trying these probably hardly-distinguishable recordings. I'll report back in a few days, then.

Quote
...while the current setup is better for comparing the tested devices with each other (the analog samples are closer to the analog source).


I really don't get this. Could you explain some more?
Title: Blind test challenge
Post by: KikeG on 2003-02-19 16:04:58
Quote
I was unable to ABX any of these samples against the original, though I can hear the SB64 one. Isn't that strange? I can hear the coloration added by the SB64, and a spectral analysis showed exactly the defects I hear in the frequency response, but I got only 4/8 on ABX.

I think that ABXing the sound of such samples, which show a continuous difference in the sound itself rather than localized artifacts audible at a given place, is more difficult. The difference I hear seems to vanish after listening to the samples 4 or 5 times.

If this is what is happening to you (you get good scores at first and bad ones at the end, and the differences vanish as the test goes on), you need to take the test in a more relaxed manner, without stressing too much, taking your time after every trial. Sometimes listening tiredness causes this effect. On the other hand, concentrating a lot can help overcome tiredness... I guess the best procedure depends on the person.

Quote
I'll try a slower pace, maybe one session per day, to see if there is an improvement in my scores.

That could be a possible way of avoiding tiredness.

Quote
It is possible that some differences in the sound are very difficult to get right in an ABX test with short samples played over and over.

Mmm... the important thing is whether you can tell them apart reliably in a blind manner. "Usual" ABX is only a standard procedure to achieve this, but the precise method can be made flexible to suit your preferences, as long as blindness and statistical validity are assured.

Quote
The latter would be interesting if tested on audiophile CD players, and if the ABX proved impossible (which would show that some devices don't add audible distortion to the sound)...


All analog input/output devices add distortion; the question is how big the distortion is and whether it's audible or not. This also depends on the type of music used; I'd say noisy, already-distorted music makes it harder to detect further distortion.
Title: Blind test challenge
Post by: Pio2001 on 2003-02-19 23:22:24
Quote
I don't really understand what this is about. But since you sent me to this thread, I guess it's about using the analog out of your soundcard with different cables?


I sent you here because, among these samples, there is a recording of the analog output of the Marian Marc 2 soundcard, about which you asked. I can tell you in a private message which one it is. This way, listening to it, you may see if there is an obvious problem with the sound for you.
The other samples are a recording of the SoundBlaster 64V PCI output, and two others that are not recorded from soundcards.

Quote
Should we listen to the samples before knowing what they are made of? I mean, if I put them in ABX software it will still be blind even if you tell us what you did to the samples, right?


Yes, it would still be blind. But knowing what they are, you might try to find the expected sound characteristics in the samples. It's better to let people give an unbiased opinion rather than to ask them which sample sounds like this or like that, especially if we want to test, blindly, whether a device really has the sound we expect it to have.
Example: let's say I hid a vinyl recording among them. If you know it, you are going to search for clicks and noise, maybe mistake a CD click or MP3 noise for them, and conclude that a CD or MP3 file "clicks like vinyl", whereas otherwise you might not have thought about it that way.

Quote
I'll report back in a few days, then.


Thanks in advance to anyone willing to participate, but no one is forced to do so. If you haven't got the time but are interested, you can ask me in a private message which sample is which (but I'll ask you not to reveal the answers until I do so myself).

Quote
Quote
...while the current setup is better for comparing the tested devices with each other (the analog samples are closer to the analog source).


I really don't get this. Could you explain some more?


I've got a problem: the external ADC I use for recording analog sources is a Sony DTC 55ES, and it doesn't support the 44,100 Hz sample rate, only 32 and 48 kHz, like most early consumer DAT decks. However, the original is 44.1 kHz, since it is a CD.
So to compare the original against the copies, we end up comparing 44.1 kHz and 48 kHz files. This is not good, since soundcards can behave differently at different sample rates (think about resampling).

There are two solutions :
1) Resample the 48 kHz recordings to 44.1 kHz
2) Resample the original to 48 kHz.

I chose the second option, so as to provide test samples as close as possible to the analog source. This way, when comparing the samples with each other, the differences in quality are not attenuated by a common resampling process.
On the other hand, since some samples can be difficult to ABX, the first option would have allowed comparing a soundcard recording, for example, with the unmodified original. If a difference were audible, it would be difficult to know whether it was the soundcard's fault or the fault of the resampling to 44.1 kHz. But imagine that someone uses a listening system superior to the devices tested, e.g. a high-end audiophile CD player reading a CD-R with the samples burned on it. If he doesn't hear the difference between the bit-exact original and the analog resampled copy, it would mean that the soundcard tested (for example) is really very good.

I can upload 44.1 kHz versions if someone wants to burn the files on CDR and perform the tests on a high end hifi system.
Title: Blind test challenge
Post by: Patsoe on 2003-02-20 09:40:08
Thanks for the thorough explanation. I'm very interested in doing this as an abx comparison, so I won't ask for the answers just now.

Just a last question though: do you think I should be able to detect a difference, since I have no equipment like a Marian Marc (not yet, that is)? I'm using an SB128 with Academy monitoring headphones. I could borrow an SB Live from my roommate and install it with kX drivers, but I guess that wouldn't help too much...?
Title: Blind test challenge
Post by: Pio2001 on 2003-02-20 12:02:38
You may detect the SB64 sample. The others will be very difficult. It would be better to perform the test on a CD player, if you've got a good one, but in that case I should upload a 44.1 kHz version.

EDIT: I wonder if an MP3-compatible standalone DVD player can play WAV files as well (48 kHz).
Title: Blind test challenge
Post by: Patsoe on 2003-02-20 12:14:14
Quote
You may detect the SB64 sample. The others will be very difficult. It would be better to perform the test on a CD player, if you've got a good one...

I don't have a very high-end unit. It's a Denon, type DCD-6..something (a screaming front panel saying "20-bit Lambda DAC" or so). Sorry, I would have enjoyed participating. Perhaps I'll just go out and get the Marian card.
Title: Blind test challenge
Post by: KikeG on 2003-02-20 14:32:02
I've been able to ABX one of the samples conclusively.

I have to admit I used a little trick to save time and effort. I did a quick, not very accurate comparative spectral analysis to find which file seemed to have the worst frequency response, and found one with quite a lot of attenuation of frequencies above 20 kHz, so I assumed this was the worst one, and the one I tried to ABX.

The computer I used for the test has a not-very-good built-in SoundMax soundcard, and I did the tests in a quite noisy environment, with headphones connected to the soundcard.

First I tried with my somewhat boomy Sony MDR-7506, and seemed to hear some differences, but I was unable to get any good ABX scores. Then I tried plugging in my old Sennheiser HD560, which gave a rather more detailed picture, and started hearing some subtle but clear differences.

On two different parts of the song, I got 7/7 (p<0.8%) and 14/17 (p≈0.6%). I didn't try the other samples.

Pio has more details, which I will post when he thinks it's OK to do so.
Title: Blind test challenge
Post by: KikeG on 2003-02-20 15:23:17
More:

I guess I couldn't ABX the file with the Sony headphones partly because I was looking for a treble reduction, or something similar. Now that I know where the difference lies (I've also performed a detailed frequency response analysis), I've been able to ABX the file without much difficulty with the Sony headphones too (10/11 p<0.6%, 12/14 p≈0.6%).
Title: Blind test challenge
Post by: Continuum on 2003-02-21 19:19:04
I have only tried original vs. file 5 so far. I was quite confident that I heard a difference in the first 4 trials (4/4), but then the impression vanished and the results dropped.
Title: Blind test challenge
Post by: Bedeox on 2003-02-21 20:22:16
I got a 14/15 result after a half-hour meditation.
Without it, I had only 7/15... weird, isn't it?

<edit>
I'm using headphones too...

Of course it might have been a lucky shot... I'll try to do it again.

Heh. Forgot to mention I was testing against file 5.
</edit>
Title: Blind test challenge
Post by: voltron on 2003-02-22 01:35:05
Here are my ABX results. I never want to hear those samples ever again! 

(http://home.attbi.com/~voltron/abx/o2.PNG)
(http://home.attbi.com/~voltron/abx/o3.PNG)
(http://home.attbi.com/~voltron/abx/o4.PNG)
(http://home.attbi.com/~voltron/abx/o5.PNG)

I am not sure if I even did this test correctly, but my best guess is that the SB64 is sample 2. Can someone message me the correct answer? Thank you.

System specs: Turtlebeach SantaCruz with Sony MDR-v250 Headphones.

voltron
Title: Blind test challenge
Post by: Pio2001 on 2003-02-22 02:40:05
Thank you people !

Well, the results are not going in the direction I expected! You have better ears than me. This test would have been conclusive if some of the samples were not ABXable at all, but maybe I'll have to ask for two more ABX tests (between two samples each, which I have yet to record) to draw a conclusion.

If you've got extra time, you can already try to ABX 3 vs 5. According to what is usually said by the hardcore ABXers on HA, this should be impossible.
Title: Blind test challenge
Post by: Continuum on 2003-02-22 07:44:55
Quote
Here are my ABX results. I never want to hear those samples ever again! 

Uhm... wouldn't it be easier to use the text written to the log file?
Title: Blind test challenge
Post by: KikeG on 2003-02-23 22:02:13
Voltron, in order to successfully ABX a sample you must get a probability of guessing of < 5%, and < 1% is better to make totally sure it's not by chance. So the only sample you could claim to have ABXed is 4.wav.
Title: Blind test challenge
Post by: KikeG on 2003-02-23 22:28:35
By the way, the sample I ABXed was 4.wav.

Bedeox, 14/15 is below 0.1% probability of guessing; I doubt you did it by chance.
Title: Blind test challenge
Post by: Bedeox on 2003-02-24 10:17:35
It might be chance as well (that means a 0.1% hit: very rare, but possible).
I did the test again, and it IS different. (6/7) (5.wav)

<edit>
Forgot to say that I can't ABX 3.wav vs 5.wav.
</edit>
Title: Blind test challenge
Post by: KikeG on 2003-02-25 08:48:12
Yesterday I ABXed 5.wav (17/22, p=0.8%). At first I couldn't really notice any differences, just my imagination, so I didn't get any good scores. Then I compared the spectra of the files and saw some differences. After that, I repeated the test focusing on the theoretical differences, and started hearing them.
Title: Blind test challenge
Post by: Pio2001 on 2003-02-27 21:40:01
It's going to be time to reveal the results and propose further samples (the current results are not very informative, due to the limited choices I made).
Patsoe, must we wait for you, or do you give up?
Title: Blind test challenge
Post by: Patsoe on 2003-02-28 07:54:59
Quote
It's going to be time to reveal the results and propose further samples (the current results are not very informative, due to the limited choices I made).
Patsoe, must we wait for you, or do you give up?

I give up, I guess. I couldn't get hold of a decent soundcard this week. Thanks for waiting.
Title: Blind test challenge
Post by: KikeG on 2003-02-28 08:54:35
Quote
I couldn't get hold of a decent soundcard this week. Thanks for waiting.

I don't think you need a very good card to detect some of the differences found here. I did it with a mediocre one; I'd say what speakers or headphones you use could matter more.
Title: Blind test challenge
Post by: Garf on 2003-02-28 09:53:55
Quote
Bedeox, 14/15 is below 0.1% probability of guessing; I doubt you did it by chance.

It's 7/15 + 14/15, which makes 21/30, and I don't know how significant that is.

If you do multiple tries, add them up, or the probabilities become meaningless.
Title: Blind test challenge
Post by: Moneo on 2003-02-28 10:23:04
Quote
It's 7/15 + 14/15, which makes 21/30, and I don't know how significant that is.

About 1.2%.
Title: Blind test challenge
Post by: KikeG on 2003-02-28 10:37:19
Quote
Quote
It's 7/15 + 14/15, which makes 21/30, and I don't know how significant that is.

About 1.2%.

21/30 is more like 2.1%.

But if you add the other results (6/7), then you get 27/37, which is 0.4%.

BTW, at http://www.kikeg.arrakis.es/winabx (http://www.kikeg.arrakis.es/winabx) you can download a cute coloured Excel table with all binomial distribution p-values up to 100/100.
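
For anyone checking these figures without the spreadsheet: the p-values above are one-sided binomial tails (the chance of guessing that many or more correct out of the total, at 50% per trial). A minimal Python sketch, added here for illustration:

Code
from math import comb

def abx_p(correct, trials):
    # Probability of 'correct' or more right answers out of 'trials'
    # by pure guessing (one-sided binomial tail, 50% chance per trial).
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p(21, 30))  # ~0.021 -> the 2.1% quoted above
print(abx_p(27, 37))  # ~0.004 -> the 0.4% quoted above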
Title: Blind test challenge
Post by: Garf on 2003-02-28 12:36:45
I could not ABX any of the clips in a casual session.

Question to KikeG: if you add up *all* your attempts, is it still significant too? I ask because you stated things like

Quote
'First I tried with my somewhat boomy Sony MDR-7506, and seemed to hear some differences, but I was unable to get any good ABX scores.'


Quote
At first I couldn't really notice any differences, just my imagination, so I didn't get any good scores.


But you ignore those results.

This isn't nitpicking - I ABXed 2.wav 12/13, but if I add up all my attempts it's not significant.
Title: Blind test challenge
Post by: Continuum on 2003-02-28 15:19:50
The problem is: the probability of reaching 95% confidence at some point during (e.g.) 100 trials is far more than 5%; it's 20%!

So the overall test length (like 16) has to be fixed a priori.

If a listener produces divergent results in different test sessions (like 14/16, 5/12), I tend to think that he could hear a difference in one case and none in the other. Our hearing precision changes. Whatever that means...

http://www.hydrogenaudio.org/forums/index.php?act=ST&f=1&t=3175
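
Continuum's 20% figure is easy to reproduce with a short simulation (an illustrative sketch, not the exact calculation from the linked thread): a pure guesser answers coin flips and stops the moment the classical p-value dips to 5% or below.

Code
import random
from math import comb

def p_value(correct, trials):
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def guesser_passes(max_trials, alpha=0.05):
    correct = 0
    for n in range(1, max_trials + 1):
        correct += random.random() < 0.5      # a blind coin-flip answer
        if p_value(correct, n) <= alpha:
            return True                       # stops here and claims victory
    return False

runs = 2000
hits = sum(guesser_passes(100) for _ in range(runs))
print(hits / runs)  # should land near 0.20, give or take sampling noise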
Title: Blind test challenge
Post by: KikeG on 2003-02-28 15:30:36
Quote
But you ignore those results.

From a totally strict point of view, yes, I guess I should count all trials toward the total statistical probability, to be beyond dispute.

However, I think one can follow a more flexible way of doing things: one could consider the first rounds as "warm-up", a search for something to latch on to; discard them once you latch on to something audible; and consider as valid the rounds where you have latched on to something, trying to get a significantly low p (<1%). It's also quite telling that in these "warm-up" rounds one gets bad scores, but in the "latched" ones one gets good scores; I think this means something.

I think these first rounds can't be counted as having the same significance as the final "latched" ones. I don't know how statistically valid this approach is, but IIRC ff123 used to do something similar.

All this is more valid if you haven't done lots of unsuccessful trials before the "latched" ones. Even if you have, a few successful rounds once you are "latched" are proof of an audible difference for me.

Anyway, my total results are 47/61 (p<0.1%) for 4.wav and 21/37 (p=25.6%) for 5.wav. For both I did several short warm-up rounds searching for something, which I didn't consider very significant. For the first, I eventually did several successful rounds that raised the global score, but for the second I did just one, the last one. I could try more successful rounds on 5.wav, but I'm quite confident the results are significant, and I don't have much time to keep at it. However, if anyone wants absolute proof, I'll try again.
Title: Blind test challenge
Post by: Pio2001 on 2003-02-28 19:19:20
No need for additional rounds. I think we can consider the results of different rounds separately.

There are times when I'm tired and can't concentrate... but usually I keep going anyway, answer eagerly... and get everything wrong.
But after a good rest, concentrating properly, taking pauses, and patiently waiting until I'm sure I hear something before answering, I can get a perfect result on the same samples.
In that case, I discard the first round and only take the second one into account.

From a statistical point of view, the total score must be taken into account to give an accurate result about my general ability to hear a difference, but that result is meaningless because it's just a fixed number, while my hearing varies depending on whether I listen carefully or not.

The separate results of each round tell my ability to hear the difference (1) when I'm not listening carefully, and (2) when I'm listening carefully. The overall result, taking all rounds into account, is somehow a compromise between the two.
Title: Blind test challenge
Post by: Garf on 2003-03-01 16:12:01
I think the concept of "warm-up" rounds and "serious" rounds as discussed here has a problem if they are not fixed a priori.

How do you determine when you are done "warming up" and ready for the real test? You look at the ABX scores you are getting. But the ABX score is also what determines whether we get a significant result or not. So whether or not something "counts" depends on how well it proves what we're trying to prove. That's not sound.

Even more problematic is the concept of "bad tests" (when you were tired, so they don't count). How do you determine that it's a bad test? You see that you're not getting good ABX scores. Oops.

I don't think either of these is statistically sound. I'd like to point out that ff123 doesn't allow this for the MAD Challenge either. It may be passable for less serious tests, but I'm starting to be stricter with myself after noticing how easy it is to pass a test "by accident" if you're not careful.
Title: Blind test challenge
Post by: Continuum on 2003-03-01 18:13:38
Due to the following passage,
Quote
My recommendation is that the moment you achieve 95% confidence, you should stop and claim victory.
the MAD Challenge might be a bad example: everyone could pass it, theoretically, given enough time on his hands (for the reasons linked above).

But I agree: the point when the "serious" test starts should be well-defined a priori. The best way would be to start the ABX machinery only when you believe you hear a difference.
Title: Blind test challenge
Post by: Pio2001 on 2003-03-01 18:21:21
It's a question of time for me. I can very well define a priori when an ABX is serious, but that can happen no more than once a week.
When I need to test, I run the ABX whether I'm tired or not, and see the results. All the better if they match.
Title: Blind test challenge
Post by: ff123 on 2003-03-01 23:38:43
Quote
Due to the following passage,
Quote
My recommendation is that the moment you achieve 95% confidence, you should stop and claim victory.
the MAD Challenge might be a bad example: everyone could pass it, theoretically, given enough time on his hands (for the reasons linked above).

But I agree: the point when the "serious" test starts should be well-defined a priori. The best way would be to start the ABX machinery only when you believe you hear a difference.

I never went back to change the rules of the MAD Challenge after we went through the whole exercise of figuring out the best way to do ABX "profiles." Fixed profiles would eliminate the bias inherent in being able to see the ABX scores as the test is performed (so that you can stop whenever it's to your advantage to do so).

For that matter, I never incorporated the ABX profile concept into ABC/hr. So much to do, and so much laziness preventing me from actually doing it.

ff123
Title: Blind test challenge
Post by: Garf on 2003-03-02 22:15:46
Quote
Due to the following passage,
Quote
My recommendation is that the moment you achieve 95% confidence, you should stop and claim victory.
the MAD Challenge might be a bad example: everyone could pass it, theoretically, given enough time on his hands (for the reasons linked above).

Please explain; I don't see how having a lot of time allows you to pass the MAD Challenge.
Title: Blind test challenge
Post by: Pio2001 on 2003-03-02 22:20:25
Here are the present results. (Voltron, your results have disappeared; could you recover them please? They were close to success.)

The common setup for recording is the analog input of the Sony DTC55ES DAT deck, sample rate 48 kHz (it only supports 32 and 48 kHz). Optical output, Fostex optical-to-coaxial SPDIF converter, Marian Marc 2 coaxial digital input, clock set to digital input, recording in SoundForge 4.5, 48 kHz 16-bit stereo. The Marian digital recordings have been checked to be error-free with CD playback.
The leveling is done by selecting exactly the same range (with maybe two or three samples of difference across 30 seconds of selection) in the original and the copy, and computing the statistics. Then a level correction is applied (typically 1.2 dB), with two-digit accuracy. No dither, no floating-point processing (SF 4.5 has a 16-bit engine, I've been told).
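
A rough sketch of that leveling measurement (illustrative only; the file names and the 30-second range are placeholders, and it assumes 16-bit WAV files):

Code
import wave
import numpy as np

def rms_db(path, start_frame, n_frames):
    # RMS level of a selection, in dB (relative; for comparing two files).
    with wave.open(path, "rb") as w:
        w.setpos(start_frame)
        data = np.frombuffer(w.readframes(n_frames), dtype=np.int16)
    return 20 * np.log10(np.sqrt(np.mean(data.astype(np.float64) ** 2)))

n = 48000 * 30   # 30 seconds at 48 kHz
correction = rms_db("original.wav", 0, n) - rms_db("copy.wav", 0, n)
print(f"Apply {correction:+.2f} dB to the copy")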

File 1: same as File 2, with a cheap (8 €) 5-meter CINCH extension [2] in addition to the cable [1] used for File 2. Leveled.
File 2: Winamp 2.81, WinXP, WaveOut, Marian Marc 2 analog output, max volume, cheap two-meter TRS-to-CINCH adapter with a loose contact [1]. Oddly, the Marian clock was slaved to the 48 kHz input, but the Winamp 44.1 kHz playback went flawlessly. I don't know if it can set the output to 44.1 kHz while the input is 48 kHz. Leveled.
File 3: same as File 5, with the cheap 5-meter cable [2] added. Leveled.
File 4: Winamp 2.81, WinXP, WaveOut, SoundBlaster 64 PCI-V, max volume, cable [1] (see above). Leveled.
File 5: Yamaha CDX860 CD player from 1991 (450 € at the time). Shows no errors on pressed or burned CDs at the SPDIF output. Custom RG179bu CINCH cable.

Original: CD ripped in secure mode, resampled to 48 kHz, leveled equal to File 5.

KikeG: listening on a computer's built-in SoundMax soundcard + Sennheiser HD560 and Sony MDR-7506
Pio2001: external Sony DTC55ES as converter, Arcam Diva A85 amplifier, Sennheiser HD-600 headphones
Voltron: Turtlebeach SantaCruz with Sony MDR-v250 headphones
Bedeox: headphones


ABX results:

File 1 (Marian with 7 meters of cable):
Pio2001: Failure
Garf: Failure
Voltron: Failure?

File 2 (Marian with 2 meters of cable):
Garf: Failures, then 12/13
Pio2001: Failure
Voltron: Failure?

File 3 (Yamaha CD player with 6 meters of cable):
Pio2001: Failure
Garf: Failure
Voltron: Failure?

File 4 (SoundBlaster 64):
Pio2001: Failure
Garf: Failure
KikeG: Failure, then 7/7, 14/17, 10/11, 12/14; total 47/61
Voltron: Success

File 5 (Yamaha CD player with custom CINCH cables):
Pio2001: Failure
Garf: Failure
KikeG: Failures, then Success; total 21/37
Continuum: Failure
Bedeox: 7/15, then 14/15, then 6/7
Voltron: Failure?

File 3 vs File 5 (addition of 5 meters of cheap cable on the Yamaha player):
Bedeox: Failure

In conclusion, this test brings more questions than answers.

The failures tend to show that an SB64 soundcard can sound very close to the original, not to mention the CD player, and that 5 meters of cheap CINCH cable has no effect on the sound (let alone just one meter).
However, it can be objected that the listening sessions were done on computer soundcards (except Pio2001, and maybe Continuum, Garf and Bedeox) and with headphones (except maybe Continuum and Garf), while audiophile CD players and audiophile CINCH cables are supposed to improve sensitive high-end speaker systems.

The success on File 5 would show that 450 € is not enough for a CD player (at least back in 1991) to get perfect sound, and that audiophile CD players in the 1000 € range are worth the price.
But before jumping to this interesting conclusion, some obvious flaws must be eliminated.
The difference between the reference file and number 5 can also come from:
-The two processes (resampling and leveling) the reference file went through, at 16-bit processing.
-The quality of the Sony DTC 55ES recording.

The first problem can be tested by passing the same sample through two opposite leveling/resampling processes. One leveling step should be done between the two resampling steps, so as to avoid getting conjugate processes for up/downsampling: 44.1->48 kHz, level -1.5 dB, 48->44.1 kHz, level +1.5 dB.
If the result is not ABXable, the processes are ruled out as a source of audible differences.
The second problem is more difficult to test. I could record the same as File 5, but with the Marian analog input instead of the Sony. If both recordings sound the same (no ABX possible) and are both ABXable from the reference, it is likely that the difference comes from the CD player, and not the recording device.

Bedeox and KikeG, you ABXed File 5. Are you interested in going on? (Or anyone else.) I can provide the mentioned samples, along with a new reference one, if you want. This time, instead of Depeche Mode, which I chose casually, I would rather use Rebecca Pidgeon: an audiophile recording from Chesky Records (but you would need to ABX File 5 again).

In the end, we have possible proof that audiophile CD players are worth it, but that was in 1991; who knows if today's CD players are better... so it is said.
If one of you has a recent hi-fi CD player worth at least 300 € and a good ADC, it would be better to use them.

For now, one thing is sure: if a cheap line cable has any effect on the sound, it is very, very little. The RMS level loss is 0.00 +/- 0.01 dB over 5 meters.

Sorry for not providing more informative results. When I started this test, I hoped that no one could ABX the CD player, even on high-end systems, but I see that this is not the case.
Title: Blind test challenge
Post by: Garf on 2003-03-02 22:26:40
I used an SB128 into HD580s for this test.

The 12/13 ABX result was obtained by randomly hitting the keys while not wearing the headphones (Case is my witness on IRC). It has <0.2% significance; something to ponder. (That's the main reason I questioned the significance of the other tests as well.)
Title: Blind test challenge
Post by: Pio2001 on 2003-03-03 00:31:29
Did you really hit the keys randomly? Not only A, only B, or an AB pattern? I've noticed in the ABX comparator that the same sample may be played several times in a row (though I didn't compute the probability of that happening).
Are the ABX programs using trustworthy random generators? (Our programming teacher told us: "Never use the built-in random generator! Always use the one in the math library...")
Title: Blind test challenge
Post by: ff123 on 2003-03-03 03:55:13
Quote
Did you really hit the keys randomly? Not only A, only B, or an AB pattern? I've noticed in the ABX comparator that the same sample may be played several times in a row (though I didn't compute the probability of that happening).
Are the ABX programs using trustworthy random generators? (Our programming teacher told us: "Never use the built-in random generator! Always use the one in the math library...")

abchr, at least, no longer uses rand().  It uses the "Mersenne Twister" Garf found:

http://www.math.keio.ac.jp/~matumoto/ver980409.html (http://www.math.keio.ac.jp/~matumoto/ver980409.html)

Hans Heijden had found one sequence which showed moderate evidence against randomness on a runs test, which prompted me to change the random function.  However, all of the other runs I tried myself passed for randomness, so I'm not sure any change was really necessary.

ff123
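
(As an aside for anyone scripting their own trials: Python's standard random module happens to use this same Mersenne Twister, so a decent X-assignment sequence is a one-liner. A small sketch, not taken from abchr itself:)

Code
import random

def make_abx_sequence(n_trials, seed=None):
    # Assign X to A or B uniformly and independently for each trial.
    rng = random.Random(seed)   # CPython's Random is a Mersenne Twister
    return [rng.choice("AB") for _ in range(n_trials)]

print(make_abx_sequence(16))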
Title: Blind test challenge
Post by: KikeG on 2003-03-03 11:21:59
Quote
abchr, at least, no longer uses rand().  It uses the "Mersenne Twister" Garf found:

Funny, I have been using that same random number generator in some of my internal utilities for some time, but right now I don't remember for sure whether WinABX uses that one or the built-in rand() function of BC++ Builder; I think it uses the latter (I don't have access to the code right now). But it uses it in the "proper" way, the one you use too (not rand()%n), so this shouldn't be a problem.
Title: Blind test challenge
Post by: KikeG on 2003-03-03 12:07:44
Quote
The difference between the reference file and number 5 can also come from:
-The two processes (resampling and leveling) the reference file went through, at 16-bit processing.
-The quality of the Sony DTC 55ES recording.
...
In the end, we have possible proof that audiophile CD players are worth it, but that was in 1991; who knows if today's CD players are better... so it is said.

I suspect the differences heard are due more to the Sony DAT recorder. Let me explain why:

Analyzing 5.wav against the original using an FFT analyzer, it seems that the 5.wav file has some strange frequency and phase response behavior, which I think can be due to slight speed-ups and slow-downs of the recording, similar to the wow and flutter of analog recorders. To check this you could repeat the procedure with a single 1 kHz tone signal.

In this 5.wav clip, what I heard is a slight emphasis of the highs at the beginning of the song, which is really easy to hear with "fresh" ears.

By the way, I just ABXed it again, in a single round: 16/20, p=0.6%. I quickly (1 minute) got 7/7 (p=0.8%) at the beginning, but I wanted "absolute" proof and kept on; I guess my ears got a little tired or stressed, and then I failed some trials, ending at the final score.

Global score is 37/57, p=1.7%.
Title: Blind test challenge
Post by: KikeG on 2003-03-03 13:33:48
Looking a bit more into the FFT analyses, it seems that the difference is only a speed-up in the case of 5.wav.

Also, I ABXed 2.wav too, in a single round: 25/35, p=0.8%. This time the difference seems to be the opposite: 2.wav sounds a little duller, which is confirmed by FFT analyses; it seems to be slowed down a little in comparison with the original.

This is a little strange; a detailed objective analysis (measurements) should be used to see what is happening.
Title: Blind test challenge
Post by: Continuum on 2003-03-03 15:16:46
Quote
Please explain; I don't see how having a lot of time allows you to pass the MAD Challenge.

The conventional p-value calculation uses the fact that the number of trials is fixed a priori.
E.g. you decide to perform 8 trials. Then you achieve a score of 7 correct trials.
The p-value then is the probability of getting 7 or 8 trials correct.

Now consider a different situation: instead of fixing the number of trials, you decide on a certain confidence level (calculated at the current trial in the same way as above) that you want to reach.
E.g. you want to reach 95% confidence (in the classical sense) and stop as soon as this condition is satisfied. Now the following are your win conditions:
5/5, 7/8, 9/11, 10/13, 12/16, 13/18, ...
So the probability of passing this test by guessing is not just 0.05 but something like:
P(5/5) + P(7/8 and not 5/5) + P(9/11 and neither 5/5 nor 7/8) + ...
which tends to 1.

If you are interested in more information, check the Statistics for ABX thread (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=1&t=3175). There are some experimental results and calculations, and a proposed compromise between the free-length test and sufficiently-significant-while-not-too-hard tests.
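
The win-condition list can be checked by brute force: for each test length, find the first score whose classical tail probability reaches 5%. A sketch (Continuum's pairs are the subset a still-running guesser can actually reach first; the other entries would require having already passed and stopped):

Code
from math import comb

def tail(correct, trials):
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for n in range(5, 19):
    k = next(k for k in range(n + 1) if tail(k, n) <= 0.05)
    print(f"{k}/{n} passes (p = {tail(k, n):.4f})")
# prints 5/5, 6/6, 7/7, 7/8, 8/9, ... including 7/8, 9/11, 10/13, 12/16, 13/18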
Title: Blind test challenge
Post by: NumLOCK on 2003-03-03 16:42:28
Quote
Quote
Did you really hit the keys randomly? Not only A, only B, or an AB pattern? I've noticed in the ABX comparator that the same sample may be played several times in a row (though I didn't compute the probability of that happening).
Are the ABX programs using trustworthy random generators? (Our programming teacher told us: "Never use the built-in random generator! Always use the one in the math library...")

abchr, at least, no longer uses rand().  It uses the "Mersenne Twister" Garf found:

http://www.math.keio.ac.jp/~matumoto/ver980409.html (http://www.math.keio.ac.jp/~matumoto/ver980409.html)

Hans Heijden had found one sequence which showed moderate evidence against randomness on a runs test, which prompted me to change the random function.  However, all of the other runs I tried myself passed for randomness, so I'm not sure any change was really necessary.

ff123

If you don't need speed, the best-known pseudo-random generator is BBS (Blum-Blum-Shub). The difficulty of predicting a single output bit from all the previous ones is proven to be as hard as factoring an arbitrarily-sized integer.

If factoring a 500-digit number sounds too easy, it is also possible to make a PRNG based on the discrete logarithm problem.
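
For the curious, BBS is only a few lines: square modulo a product of two primes congruent to 3 mod 4, and emit the low bit. A toy sketch with deliberately tiny primes (real use needs huge secret primes):

Code
p, q = 11003, 11027     # both prime, both = 3 (mod 4); far too small for real use
M = p * q
x = 7777 ** 2 % M       # seed: a quadratic residue, coprime to M

def bbs_bits(n):
    global x
    bits = []
    for _ in range(n):
        x = x * x % M   # the whole generator: repeated squaring mod M
        bits.append(x & 1)
    return bits

print(bbs_bits(16))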
Title: Blind test challenge
Post by: Pio2001 on 2003-03-03 21:22:53
I've compared the reference with File 5. File 5 does indeed run faster... 0.002% faster (which, from a pitch point of view, is 0.0002 tones, the extreme limit of audibility being 0.01 tones for very well-trained people).

Here (http://perso.numericable.fr/laguill2/pictures/clockshift.gif)'s the sonogram of the difference between the two files (offset by 40 samples, so that the symmetry is more visible). You have to subtract the samples and listen to the result to understand the pattern.
It means that both clocks are free of wow and flutter; the two clocks (playback and record) just don't run at the same frequency.

Listening to the vanishing point of the differences, where the two clocks are in sync, it seems that there is a difference between the two files in the low frequencies.

But the speed difference can't account for the audible difference, and isn't necessarily the Sony's fault.
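
Pio2001's tone arithmetic, for anyone who wants to check it: a speed ratio r shifts pitch by 12*log2(r) semitones, i.e. 6*log2(r) whole tones.

Code
from math import log2

r = 1.00002                  # File 5 plays 0.002% fast
shift = 6 * log2(r)          # pitch shift in whole tones
print(f"{shift:.5f} tones")  # ~0.00017, i.e. the ~0.0002 quoted above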
Title: Blind test challenge
Post by: TJA on 2003-03-04 02:09:11
You just cannot use most random functions from libraries.
The only thing I know that works, for a SHORT time, is /dev/random from Linux.

Here's part of the man page for it:

      The random number generator gathers environmental noise
      from device drivers and other sources into an entropy
      pool. The generator also keeps an estimate of the number
      of bits of noise in the entropy pool. From this entropy
      pool random numbers are created.

      When read, the /dev/random device will only return random
      bytes within the estimated number of bits of noise in the
      entropy pool. /dev/random should be suitable for uses
      that need very high quality randomness such as one-time
      pad or key generation. When the entropy pool is empty,
      reads to /dev/random will block until additional
      environmental noise is gathered.

All other implementations - which mostly use mathematical functions and NOT an entropy pool - will not work!
I'm sorry if the mentioned library has the above entropy pool, but as far as I know, most libraries do NOT!
Title: Blind test challenge
Post by: KikeG on 2003-03-04 08:23:39
Quote
I've compared the reference with File 5. File 5 does indeed run faster... 0.002% faster (which, from a pitch point of view, is 0.0002 tones, the extreme limit of audibility being 0.01 tones for very well-trained people).

I think the difference is not in the perceived pitch (musical tone), but more in the fact that the 5.wav file has its high frequencies displaced slightly up the frequency scale due to this faster playback, resulting in perceived louder highs. The higher the frequency, the more it is displaced up the frequency scale.

And yes, clock speed differences aren't necessarily the DAT's fault.
Title: Blind test challenge
Post by: Pio2001 on 2003-03-04 11:50:16
Quote
the fact that the 5.wav file has its high frequencies displaced slightly up the frequency scale due to this faster playback, resulting in perceived louder highs

I can't believe it.

Do you realize that the whole 1000 to 2000 Hz octave, for example, is just changed into 1000.02 to 2000.04 Hz?

Edit: how much better is our threshold of hearing at 1000.02 Hz compared to 1000.00 Hz?
Title: Blind test challenge
Post by: KikeG on 2003-03-04 13:26:00
The effect is bigger at, say, 15 kHz. But it's still quite small, so I don't know for sure what is really happening.
Title: Blind test challenge
Post by: Garf on 2003-03-04 19:53:47
Quote
E.g. you want to reach 95% confidence (in the classical sense) and stop as soon as this condition is satisfied. Now the following are your win conditions:
5/5, 7/8, 9/11, 10/13, 12/16, 13/18, ...
So the probability of passing this test by guessing is not just 0.05 but something like:
P(5/5) + P(7/8 and not 5/5) + P(9/11 and neither 5/5 nor 7/8) + ...
which tends to 1.

Are you sure? It's counterintuitive to me (as are many statistics, but anyway).

It's P(5/5) + P(7/8 or 8/8 and not 5/5) + P(9/11 or 10/11 or 11/11 and not 5/5 or not 7/8 or not 8/8) + ...

The chances are interdependent; failure on the first influences success on the second one, and so on.

A silly test is to write a simulation that keeps guessing in ABX; if you are right, it has to pass eventually.
Title: Blind test challenge
Post by: ff123 on 2003-03-04 21:24:14
I verified Continuum's formulas by simulation.

I hadn't thought to simulate a huge ABX test which just continues until it finally passes what is basically a "no difference" situation, but I have little doubt that's what will eventually happen.

ff123
Title: Blind test challenge
Post by: KikeG on 2003-03-05 10:28:42
Regarding the sequential ABX test problems mentioned: for me it would be more comfortable to always know your "classic" p-value, but to need to reach a different value to achieve test-pass confidence, depending on the number of trials performed. Since I'm not very good at statistics: would it be possible to calculate the required p-values, or something similar, as a function of the number of trials performed?
Title: Blind test challenge
Post by: ff123 on 2003-03-05 15:22:23
Quote
Regarding the sequential ABX test problems mentioned: for me it would be more comfortable to always know your "classic" p-value, but to need to reach a different value to achieve test-pass confidence, depending on the number of trials performed. Since I'm not very good at statistics: would it be possible to calculate the required p-values, or something similar, as a function of the number of trials performed?

If the number of trials is fixed before the test starts and enforced by the tool, then the classic p-value calculations work as is.

Otherwise, I think we decided it's best to use a "profile", because one rapidly loses the ability to get a significant result if allowed to see the results and stop at any time.

ff123
Title: Blind test challenge
Post by: KikeG on 2003-03-05 16:32:31
I know, but I would prefer an alternative method like the one I suggested, which would not impose a fixed number of trials.
Title: Blind test challenge
Post by: ff123 on 2003-03-05 16:46:08
Quote
I know, but I would prefer an alternative method like the one I suggested, which would not impose a fixed number of trials.

Yes, there is a method. I think I posted a graph of it here once, but I must have deleted it earlier. Here it is again:

(http://ff123.net/export/sequential.gif)

A description of the formulas used to derive the two lines is in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the trials don't have to be fixed beforehand.

ff123
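
The graph itself isn't reproduced here, but the two-boundary shape is reminiscent of Wald's sequential probability ratio test. Purely as an illustration of that idea (an assumption; not necessarily the exact formulas ff123 plotted), testing guessing (p=0.5) against a modest real ability to hear a difference (p=0.7):

Code
from math import log

p0, p1 = 0.5, 0.7           # guessing vs. an assumed "can hear it" rate
alpha, beta = 0.05, 0.05    # target false-positive / false-negative rates

upper = log((1 - beta) / alpha)   # cross it upward: difference is audible
lower = log(beta / (1 - alpha))   # cross it downward: consistent with guessing

def llr(successes, trials):
    # log-likelihood ratio of the score under p1 vs. p0
    return successes * log(p1 / p0) + (trials - successes) * log((1 - p1) / (1 - p0))

print(llr(14, 17) > upper)  # True: KikeG's 14/17 would cross the upper line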
Title: Blind test challenge
Post by: Continuum on 2003-03-05 17:05:53
Quote
Quote
E.g. you want to reach 95% confidence (in the classical sense) and stop as soon as this condition is satisfied. Now the following are your win conditions:
5/5, 7/8, 9/11, 10/13, 12/16, 13/18, ...
So the probability of passing this test by guessing is not just 0.05 but something like:
P(5/5) + P(7/8 and not 5/5) + P(9/11 and neither 5/5 nor 7/8) + ...
which tends to 1.

Are you sure? It's counterintuitive to me (as are many statistics, but anyway).

Well, I have no proof of the limit of 1 at hand, but it is suggested by empirical results.

Quote
It's P(5/5) + P(7/8 or 8/8 and not 5/5) + P(9/11 or 10/11 or 11/11 and not 5/5 or not 7/8 or not 8/8) + ...

Yes, but this makes no difference: after less than 5/5, a result of 8/8 is impossible.

Quote
The chances are interdependent; failure on the first influences success on the second one, and so on.

Yes, but they are all disjoint. They just represent the list of winning conditions.

Quote
A silly test is to write a simulation that keeps guessing in ABX; if you are right, it has to pass eventually.

Who knows how long it takes?

The probability of passing a 0.95 test with no more than 500 trials is ~30%. (My computer already had to work half a minute to calculate this, and it becomes far worse for more trials.)
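
Continuum's calculation can be done with dynamic programming: track the probability of every (trials, correct) state for a guesser who has not yet passed, and drain off the mass that crosses the 95% line. An illustrative sketch:

Code
from math import comb

def pass_probability(max_trials, alpha=0.05):
    alive = {0: 1.0}   # correct count -> probability of not having passed yet
    passed = 0.0
    for n in range(1, max_trials + 1):
        step = {}
        for c, pr in alive.items():
            for c2 in (c, c + 1):                 # wrong guess / right guess
                step[c2] = step.get(c2, 0.0) + pr / 2
        alive = {}
        for c, pr in step.items():
            tail = sum(comb(n, k) for k in range(c, n + 1)) / 2 ** n
            if tail <= alpha:
                passed += pr                      # stops and claims victory
            else:
                alive[c] = pr
    return passed

print(pass_probability(30))   # ~0.129, the figure quoted later in the thread
print(pass_probability(100))  # ~0.20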
Title: Blind test challenge
Post by: Continuum on 2003-03-05 17:14:18
Quote
A description of the formulas used to derive the two lines is in that monster statistics thread. I think we discarded this because it was less sensitive than the profile method. However, it does have the advantage of simplicity, and the trials don't have to be fixed beforehand.

IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect they could be only approximations.

But it would definitely be possible to construct a test of infinite length. The only problem is that the test would be really hard at later stages. (The above sum must converge to e.g. 0.05, so each successive term has to be smaller and smaller.)
Title: Blind test challenge
Post by: ff123 on 2003-03-05 18:10:27
Quote
IIRC, I had some reservations about it, because I didn't understand the implied calculations. I suspect they could be only approximations.

But it would definitely be possible to construct a test of infinite length. The only problem is that the test would be really hard at later stages. (The above sum must converge to e.g. 0.05, so each successive term has to be smaller and smaller.)

I believe that method was a Bayesian approach, and the path we chose was frequentist. So we always needed to know the upper limit of trials involved.
Title: Blind test challenge
Post by: Continuum on 2003-03-05 18:34:13
Quote
So we always needed to know the upper limit of trials involved.

Not really. We constructed a sum as above, so that the terms were reasonably large (not too small). For the 28-profile this was: 0.015625 + 0.013916016 + 0.007171631 + 0.00677526 + 0.005667329 = 0.049155235.

But there is no theoretical bound for this summation. We could search for a small enough value on the fly. We only have to ensure that the sum stays below 0.05.

E.g. we could construct something like p/2 + p/4 + p/8 + ... (p=0.05), as sketched below.
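
A sketch of that idea: spend the total alpha as a geometric series over the stopping points, so the test can in principle run forever while the overall guessing probability stays below p.

Code
alpha = 0.05
budget = [alpha / 2 ** k for k in range(1, 9)]  # p/2 + p/4 + p/8 + ...
print(budget)
print(sum(budget))   # always < 0.05, no matter how many stopping points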
Title: Blind test challenge
Post by: ff123 on 2003-03-05 19:54:14
I think I was concerned with an upper bound because I wanted to keep the difficulty of passing about the same at each stopping point. This is in contrast to a test which gets progressively harder to pass at each stopping point.

Hmm. I wonder if I should just get off my butt and implement the thing. Currently none of the programs (including abchr) handles it quite correctly when displaying interim results.

ff123
Title: Blind test challenge
Post by: KikeG on 2003-03-05 20:28:27
Quote
Yes, there is a method.  I think I posted a graph of it once here, but I must have deleted it earlier.  Here it is again:

I'll take a look at THE thread again, focusing on the posts related to this. I'd prefer a method like this also because of its simplicity for a tester who has no experience with ABX or statistics, and because it makes the test easier to administer, to implement from an interface-design point of view, and to understand for the tester.
Title: Blind test challenge
Post by: Bedeox on 2003-03-07 23:56:53
Having read through the "Statistics (...)" thread, I have to mention that I only use a few types of test:
7-8 trials (short), 14 or 16 trials (long). If I need to make a long test after a short one to be sure,
I double the number of trials rather than just repeating it. I'm not expecting a certain probability.
I can state that I'm sure if I achieve > 6/7 or 7/8 and know the type of artifact.
(Otherwise I do an additional long test.)

Quote
Example: the probability of passing a "traditional" 0.95 test by guessing, when one is allowed to stop at every point up to 30, is 0.129! (You can test this with my Excel sheet from above.)

That's the point... The only proper ABX has either 'hard' stop points or no stop points.
I'm using the latter.