Hey Ivan, I just started doing your AAC vs. MP3 @128 Test #1. For those who don't know, the test announcement is below. Anyway, it seems that 16 samples to evaluate (12 different ones) is a bit too much to evaluate "correctly", even for me.
Are you really going to include this many samples in test #2 as well, along with 30-second samples?
How many results have you got so far?
Just did the first part of the test, second one to go...
Ivan Dimkovic wrote:
Hi All
OK, WavRate is almost finished. The only thing left to do is multiple-test analysis (but the project file is already compatible with that).
What's new:
- Better look & feel
- Small glitches corrected, such as the playback info
- Automatic randomization of the items prior to testing
- Slider selection of the range of playback
- Automatic encryption of the output files (for security purposes)
You can download WavRate from:
http://www.psytel-research.co.yu/downloads/wavrate.exe
OK, so the first test iteration will be the castanets test:
http://www.audiocoding.com/listening_tests/castanets_test.zip
Codec info for Test #1 can be found here:
http://www.psytel-research.co.yu/downlods/castanets_codecinfo.zip
This zip file is encrypted; I will give everyone the password when listening test #1 is finished.
Please unpack the LPAC files into the same directory and open the .wrf files in WavRate. You can rate each codec with a mark ranging from 1 (bad) to 5 (excellent) in 0.1 increments.
Please perform these tests carefully and comment on your results for each item as much as you can: tell me what you think is wrong with a particular test item and why you think it deserves a particular mark.
Please send the saved results (File/Save...) to listening_tests@psytel-research.co.yu with the subject "Castanets Test".
The next test will use a longer sample (30 seconds).
Maybe I will reduce the number of codecs to 6, with a hidden reference and an anchor. Not yet decided.
I've got 3 reports so far. Results differ for most codecs, except for one codec that all testers rated 'excellent'.
Of course, these results can't be taken as accurate unless at least 30 people submit results. After that, we will perform statistical analysis and see what the real results are...
Hmm, that's a pretty long way to go until 30 people have done it.
The program is working nicely, but it's a pity you had to drop the encryption of output files for now. Any idea when you could implement it again?
As soon as I figure out what went wrong with the encryption.
I'll take a look at pgplib. Actually, I used MD5-based encryption from the Win32 API (CryptoAPI), but it only worked on my computer...
Thanks,
-- Ivan
Well, I peered into the test a bit (after I took it and submitted my results, of course). I'm not exactly sure how best to analyze it. Probably the most conservative method would be to treat the two parts as two separate tests. Let's just say that right after taking the tests I was concerned about the coherence of the results as a whole, but I was somewhat encouraged after looking at the setup.
I may look into adding an ANOVA option to the Friedman analysis program I wrote. ANOVA makes certain assumptions, such as normally distributed scores across the listening panel and an equal-interval rating scale, but in return it is more sensitive than the Friedman test. ITU-R BS.1116-1 recommends ANOVA over non-parametric methods like the Friedman test whenever possible.
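For comparison, here is a minimal sketch of what an ANOVA option could compute, assuming a two-way layout without replication (listeners as blocks, codecs as the factor being tested) and a complete grid of grades. It is not ff123's program, only one standard way to lay this out; `two_way_anova` and `AnovaResult` are hypothetical names, and the 3x3 grades are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class AnovaResult:
    f_codec: float   # F statistic for the codec effect
    df_codec: int    # numerator degrees of freedom, k - 1
    df_error: int    # denominator degrees of freedom, (n - 1)(k - 1)
    ss_error: float  # residual sum of squares


def two_way_anova(scores: list[list[float]]) -> AnovaResult:
    """Two-way ANOVA without replication on scores[listener][codec]."""
    n = len(scores)      # number of listeners (blocks)
    k = len(scores[0])   # number of codecs (treatments)
    grand = sum(map(sum, scores)) / (n * k)
    codec_means = [sum(row[j] for row in scores) / n for j in range(k)]
    listener_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_codec = n * sum((m - grand) ** 2 for m in codec_means)
    # Removing each listener's mean offset keeps a "harsh" or "lenient"
    # listener from inflating the error term.
    ss_listener = k * sum((m - grand) ** 2 for m in listener_means)
    ss_error = ss_total - ss_codec - ss_listener
    if ss_error <= 0:
        raise ValueError("no residual variance; F is undefined")

    df_codec = k - 1
    df_error = (n - 1) * (k - 1)
    f = (ss_codec / df_codec) / (ss_error / df_error)
    return AnovaResult(f, df_codec, df_error, ss_error)


if __name__ == "__main__":
    # Hypothetical grades: 3 listeners x 3 codecs on the 1.0-5.0 scale.
    r = two_way_anova([[4.0, 3.0, 2.0], [5.0, 5.0, 2.0], [3.0, 1.0, 2.0]])
    print(f"F({r.df_codec}, {r.df_error}) = {r.f_codec:.2f}")
```

If SciPy is available, `scipy.stats.f.sf(r.f_codec, r.df_codec, r.df_error)` turns the F value into a p-value. The normality and equal-interval assumptions mentioned above are exactly what this calculation takes for granted.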
ff123
Hmm... deviations between listeners are quite big.
For example, some of them rate codecs in the range 1.0 to 3.5, while others rate them in the range 3.0 to 5.0.
I have collected 7 results to date. Maybe we will have more accurate results after 30 results are submitted.
ff123 - what do you suggest for the statistical analysis?
Hmm... deviations between listeners are quite big.
For example, some of them rate codecs in the range 1.0 to 3.5, while others rate them in the range 3.0 to 5.0.
Yeah. Some listeners are more sensitive than others.
I have collected 7 results to date. Maybe we will have more accurate results after 30 results are submitted.
ff123 - what do you suggest for the statistical analysis?
You can run my friedman.exe program on it even now. The conclusions it produces right now, though, will only be the obvious ones.
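The Friedman test sidesteps the spread between listeners by working on ranks rather than raw grades: each listener's grades become ranks 1 to k (higher grade, higher rank), and tied grades share the average of the ranks they occupy, which is where the 1.5 and 4.0 entries in the rank table further down come from. A minimal Python sketch of that conversion; `midranks` and the example grades are hypothetical, chosen so the output matches listener 1's rank row in that table.

```python
def midranks(grades: list[float]) -> list[float]:
    """Rank one listener's grades: 1 = lowest grade, ties get the mean rank."""
    order = sorted(range(len(grades)), key=lambda i: grades[i])
    ranks = [0.0] * len(grades)
    start = 0
    while start < len(order):
        end = start
        # Grow the run of items that share the same grade.
        while end + 1 < len(order) and grades[order[end + 1]] == grades[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


if __name__ == "__main__":
    # Hypothetical grades for codecs s1..s8 from one listener.
    grades = [3.0, 4.0, 2.0, 5.0, 3.0, 2.0, 3.0, 3.5]
    print(midranks(grades))
    # A listener who grades everything 1.0 lower gets exactly the same ranks.
    print(midranks([g - 1.0 for g in grades]) == midranks(grades))
```

Because only the order within each listener matters, someone grading in the 1.0 to 3.5 range and someone grading in the 3.0 to 5.0 range contribute on the same footing. The price is that the size of the gaps between grades is discarded.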
ff123
I downloaded WavRate and tried the listening test, but it crashed when I opened either of the projects, and it kept crashing every time I opened them. Sorry.
You have to unpack the files first. At least, that was the reason another person experienced a crash.
ff123
Yup, you are correct, ff123. It seems to work fine now; the only problem is that I can hardly hear any difference whatsoever between the samples. Luckily my mother could hear a few, but after 5 samples she also became tone-deaf and couldn't hear the differences anymore either. So I dunno if my results will be representative enough, at least if you're going for an "Is this audiophile-transparent?" measurement. I'll try to get my results in, though; I wanna do something in return for Ivan's excellent work.
OK, I submitted my results too. In the end it wasn't that much work, since I evaluated part 2 quite quickly.
So it's not such an awful job.
But an Archival Quality test with this number of clips would take a long time...
Nice utility. I had to figure out that I needed to decompress the files for it to work, too, though.
Not having to ABX makes it much easier and faster to do tests like this. On the other hand, it may mean that my results are essentially random (of course I don't _think_ they are).
Well, you'll see in the results I guess.
--
GCP
Ivan,
How is the test coming along?
ff123
OK, the castanets test is over, and it seems that FhG AAC is #1 (as expected), PsyTEL AACEnc is #2, etc.
I will post the results later today and prepare test #2.
...CASTANETS 128 TEST...
FRIEDMAN version 1.10 (Sept 29, 2001) http://fastforward.iwarp.com/
Friedman Analysis
t1 t2 t3 t4 t5 t6 t7 t8 person   s1   s2   s3   s4   s5   s6   s7   s8
 2  0  3  0  0  1  1  1      1  4.0  7.0  1.5  8.0  4.0  1.5  4.0  6.0
 1  1  1  2  0  3  0  0      2  7.0  1.0  4.5  7.0  2.0  7.0  3.0  4.5
 1  1  2  0  1  1  1  1      3  7.0  1.0  3.5  8.0  6.0  5.0  3.5  2.0
 1  1  1  1  1  1  1  1      4  4.0  2.0  6.0  8.0  5.0  7.0  1.0  3.0
 1  1  2  0  2  0  1  1      5  1.0  7.0  5.5  8.0  3.5  5.5  2.0  3.5
 1  1  2  0  1  1  1  1      6  3.5  6.0  5.0  8.0  3.5  7.0  1.0  2.0
 1  1  1  1  1  1  1  1      7  6.0  5.0  2.0  8.0  4.0  7.0  3.0  1.0
 1  1  1  1  1  1  1  1      8  7.0  5.0  4.0  8.0  1.0  6.0  2.0  3.0
 1  1  1  1  1  1  2  0      9  6.0  2.0  7.5  7.5  1.0  5.0  3.0  4.0
Input filename: results1.txt
Samples compared:
s1: Commercial AAC A, ranksum = 4.550E+01
s2: FhG IIS MP3Enc, ranksum = 3.600E+01
s3: PsyTEL FastAAC 2.0b, ranksum = 3.950E+01
s4: FhG IIS Reference AAC, ranksum = 7.050E+01
s5: LAME --nspsytune, ranksum = 3.000E+01
s6: PsyTEL AACEnc 1.2, ranksum = 5.100E+01
s7: FhG IIS FastENC, ranksum = 2.250E+01
s8: LAME 3.90, ranksum = 2.900E+01
Number of listeners: 9
Significance of data: 7.170E-05 (highly significant)
Critical significance of Fisher's LSD analysis: 5.000E-02
Fisher's LSD for rank sums: 2.037E+01
The following comparisons are each true with 95.0 percent confidence:
FhG IIS Reference AAC is better than Commercial AAC A
Commercial AAC A is better than FastENC
FhG IIS Reference AAC is better than FhG IIS MP3Enc
FhG IIS Reference AAC is better than PsyTEL FastAAC 2.0b
FhG IIS Reference AAC is better than LAME --nspsytune
FhG IIS Reference AAC is better than FastENC
FhG IIS Reference AAC is better than LAME 3.90
PsyTEL AACEnc 1.2 is better than LAME --nspsytune
PsyTEL AACEnc 1.2 is better than FastENC
PsyTEL AACEnc 1.2 is better than LAME 3.90
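The "Significance of data" line can be reproduced from the rank table above with the usual tie-corrected Friedman statistic, referred to a chi-square distribution with k - 1 = 7 degrees of freedom. The Python sketch below is a reconstruction, not friedman.exe's source: it assumes that formula and that large-sample approximation, and uses only the standard library (the chi-square tail has a closed form for integer degrees of freedom). With the nine rank rows from the log it gives a p-value of about 7.17E-05, in line with the output above.

```python
import math

# Rank table from the log: one row per listener, columns s1..s8.
RANKS = [
    [4.0, 7.0, 1.5, 8.0, 4.0, 1.5, 4.0, 6.0],
    [7.0, 1.0, 4.5, 7.0, 2.0, 7.0, 3.0, 4.5],
    [7.0, 1.0, 3.5, 8.0, 6.0, 5.0, 3.5, 2.0],
    [4.0, 2.0, 6.0, 8.0, 5.0, 7.0, 1.0, 3.0],
    [1.0, 7.0, 5.5, 8.0, 3.5, 5.5, 2.0, 3.5],
    [3.5, 6.0, 5.0, 8.0, 3.5, 7.0, 1.0, 2.0],
    [6.0, 5.0, 2.0, 8.0, 4.0, 7.0, 3.0, 1.0],
    [7.0, 5.0, 4.0, 8.0, 1.0, 6.0, 2.0, 3.0],
    [6.0, 2.0, 7.5, 7.5, 1.0, 5.0, 3.0, 4.0],
]


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = math.exp(-x / 2)
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return total
    # Odd df: erfc term plus a finite series in x / (1*3*5*...).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def friedman(ranks: list[list[float]]) -> tuple[list[float], float, float]:
    """Tie-corrected Friedman statistic; returns (rank sums, statistic, p-value)."""
    n, k = len(ranks), len(ranks[0])
    rank_sums = [sum(row[j] for row in ranks) for j in range(k)]
    expected = n * (k + 1) / 2  # rank sum expected if all codecs were equal
    sum_sq_ranks = sum(r * r for row in ranks for r in row)  # smaller when ties exist
    denom = sum_sq_ranks - n * k * (k + 1) ** 2 / 4
    stat = (k - 1) * sum((s - expected) ** 2 for s in rank_sums) / denom
    return rank_sums, stat, chi2_sf(stat, k - 1)


if __name__ == "__main__":
    sums, stat, p = friedman(RANKS)
    print(sums)
    print(f"statistic = {stat:.3f}, p = {p:.3E}")
```

Without the tie correction (using the untied sum of squared ranks, n k (k + 1)(2k + 1) / 6) the statistic comes out a little smaller, so the printed significance is consistent with the tie-corrected form.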
OK, the 96 kbit/s results, along with ANOVA, will follow shortly...
Too bad there were only nine listeners.
Which wav files corresponded to the various encoders?
ff123
Edit: here is the output as formatted by friedman.exe version 1.20:
http://ff123.net/export/aac128log.txt
Hmmm. How do I do pre-formatted editing on this forum?
It doesn't seem to work.
Edit by Dibrom: The correct tag is [ code] [/ code], without the spaces, of course. Anyway, here you go:
FRIEDMAN version 1.20 (Oct 8, 2001) [url]http://ff123.net/[/url]
Friedman Analysis
Number of listeners: 9
Critical significance: 0.05
Significance of data: 7.17E-05 (highly significant)
Fisher's protected LSD for rank sums: 20.369
          PsyAAC12 ComAAC_A PsyFast2   MP3Enc  LamePsy  Lame390  FastEnc
FhGAAC       19.50    25.00*   31.00*   34.50*   40.50*   41.50*   48.00*
PsyAAC12               5.50    11.50    15.00    21.00*   22.00*   28.50*
ComAAC_A                        6.00     9.50    15.50    16.50    23.00*
PsyFast2                                 3.50     9.50    10.50    17.00
MP3Enc                                            6.00     7.00    13.50
LamePsy                                                    1.00     7.50
Lame390                                                             6.50

  FhGAAC PsyAAC12 ComAAC_A PsyFast2   MP3Enc  LamePsy  Lame390  FastEnc
   70.50    51.00    45.50    39.50    36.00    30.00    29.00    22.50
FhGAAC is better than ComAAC_A, PsyFast2, MP3Enc, LamePsy, Lame390, FastEnc
PsyAAC12 is better than LamePsy, Lame390, FastEnc
ComAAC_A is better than FastEnc
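The "Fisher's protected LSD for rank sums" figure is consistent with a normal-approximation least significant difference, z(1 - alpha/2) * sqrt(n k (k + 1) / 6), which for n = 9 listeners and k = 8 codecs gives 1.96 * sqrt(108), about 20.369. The Python sketch below assumes that formula (inferred from the numbers, not taken from friedman.exe's source), ignores ties in the LSD, and rebuilds the three "is better than" lines from the rank sums; `rank_sum_lsd` and `better_than` are hypothetical names.

```python
import math
from itertools import combinations
from statistics import NormalDist

# Rank sums from the log (higher = preferred by the listening panel).
RANK_SUMS = {
    "FhGAAC": 70.5, "PsyAAC12": 51.0, "ComAAC_A": 45.5, "PsyFast2": 39.5,
    "MP3Enc": 36.0, "LamePsy": 30.0, "Lame390": 29.0, "FastEnc": 22.5,
}


def rank_sum_lsd(n_listeners: int, k_codecs: int, alpha: float = 0.05) -> float:
    """Least significant difference between two rank sums (normal approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(n_listeners * k_codecs * (k_codecs + 1) / 6)


def better_than(rank_sums: dict[str, float], lsd: float) -> dict[str, list[str]]:
    """For each codec, list the codecs whose rank sum it exceeds by more than lsd."""
    names = sorted(rank_sums, key=rank_sums.get, reverse=True)
    result: dict[str, list[str]] = {}
    for hi, lo in combinations(names, 2):  # hi always has the larger rank sum
        if rank_sums[hi] - rank_sums[lo] > lsd:
            result.setdefault(hi, []).append(lo)
    return result


if __name__ == "__main__":
    lsd = rank_sum_lsd(n_listeners=9, k_codecs=8)
    print(f"LSD = {lsd:.3f}")
    for name, losers in better_than(RANK_SUMS, lsd).items():
        print(f"{name} is better than {', '.join(losers)}")
```

"Protected" means these pairwise comparisons are only read once the overall Friedman test is significant, as it is here. Even so, with 28 pairs tested at alpha = 0.05 each, an occasional false positive somewhere in the table is still possible.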
Ah... yes...
5006933F-DCCD-4c82-A7EF-E9E9BE22A78D is the WinZip password for http://www.psytel-research.co.yu/downloads/castanets_codecinfo.zip
Critical significance of Fisher's LSD analysis: 5.000E-02
Fisher's LSD for rank sums: 2.037E+01
Ummm, where can I get a hit of that stuff? (Fisher's LSD) Is it blotter?
Originally posted by ff123
Hmmm. How do I do pre-formatted editing on this forum? It doesn't seem to work.
Hrmm, I'm not actually sure if it is possible or not. I'll have to check around, but if not I can just create a new tag which allows this behavior.
Hi Ivan,
When do you want to end the test?
For now, do you only need the results for test 1?
Regards,
Yan