Doing a large scale public listening test...

2002-12-08 21:07:23

I managed to convince my math teacher to let me run a public listening test as my math project (including a mathematical evaluation of the results: chi-square tests, deviations, etc.).
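As a rough illustration of the chi-square part: if each student's ratings are reduced to a single "which codec did I rate highest" vote, a goodness-of-fit test checks whether the votes differ from an even split. This is only a sketch. The vote counts and codec names below are invented, and 7.815 is the standard 5% critical value for 3 degrees of freedom (four codecs).

```python
def chi_square_statistic(observed):
    """Chi-square goodness-of-fit statistic against an even split of votes."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical "rated best" votes from 100 students for four codecs.
votes = {"codec_a": 40, "codec_b": 30, "codec_c": 20, "codec_d": 10}
stat = chi_square_statistic(list(votes.values()))

# Critical value of the chi-square distribution: 3 degrees of freedom, alpha = 0.05.
CRITICAL_3DF_5PCT = 7.815
print(f"chi-square = {stat:.2f}, significant at 5%: {stat > CRITICAL_3DF_5PCT}")
```

With a different number of codecs the degrees of freedom change, so the critical value has to be looked up again.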

I'm currently planning the test; here's what I've got so far:

100 randomly selected students
That's what my math teacher suggested. It will be a lot of work, but the more results the better.

ABC/HR by ff123
ABC/HR tests give the most data for evaluation (it's a math project).

64 kbps test
With completely untrained testers, anything above 64 kbps would lead to unusable results or none at all.

Only 1(!) sample
Students are impatient and I'm short on time too, so only one sample (ca. 30 sec.) will be tested. I need suggestions for which sample to use (not a problem sample). Is the sample used in the c't test any good?

Codecs to include
lame 3.93.1 @ --preset 64 --scale 1 --resample 44 --lowpass 14
lame 3.94a6 (?)
Ogg Vorbis 1.0 @ -q 0
Ogg Vorbis 1.0 @ -b 64 --managed (?)
aacenc (which setting?)
mpc --thumb/--telephone (suggestions on PNS setting?)
MP3Pro (?)
Real Audio (?)
WMA9 (?)

I would really appreciate input/suggestions on this project. Results will - of course - be made public as well as the test itself.

dev0

Reply #1 – 2002-12-08 21:53:55

I would also include FhG, because many people say it's better at lower bitrates, and I would include only one LAME sample.
For the test sample, I would maybe cut several excerpts together, so you have three 10-second samples.
I've read that "Rolling Stones - Start me up" is problematic at the beginning; this could be one of the 10-second excerpts.
It would be good if you could split the results by gender and age. So that you don't end up with only 17-year-old males, you could pick maybe 7 male and 7 female testers from all the 7th grade classes, the same for 8th, and so on. That gives much more data for evaluation. You should also select students who are trustworthy, because some would just give junk answers to annoy you. Musepack isn't tuned for that bitrate, so I wouldn't use it (people might think Musepack is the worst codec overall, not just at those bitrates). I would ask Ivan to encode your sample with his new encoder in Nero; he said it's much better at low bitrates. He probably also knows the best command line.

Reply #2 – 2002-12-08 22:02:48

The students will be randomly picked, but I'll pass out a survey asking for age, gender and experience with listening tests.
FhG is a good idea, but which one? Is FastEnc okay? Which cmd.line?

ToDo
aacenc cmd.line
mppenc cmd.line
fastenc cmd.line
Include WMA/MP3Pro/RA?
Include Vorbis managed?
Include a lame3.94 ALPHA build?
SAMPLE???

THX for your input.

dev0

Reply #3 – 2002-12-08 22:12:28

ff123 and I had a long discussion about how to analyse such a test statistically, which led to the creation of the bootstrap analysis tools. The thread should be in the archive, and it covers many things that you need to take into account. Good luck!
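For readers who have not met bootstrapping before, here is a minimal sketch of the general idea. It is not the tool that came out of that discussion, just a plain percentile bootstrap under stated assumptions: two codecs rated by the same listeners, and invented ratings on a 1.0 to 5.0 scale. The function name and its parameters are made up for this example.

```python
import random


def bootstrap_mean_diff_ci(ratings_a, ratings_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean paired difference (a - b)."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("ratings must be paired per listener")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(ratings_a, ratings_b)]
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample listeners with replacement and record the mean difference.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical ratings from six listeners for two codecs.
codec_a = [4.0, 3.5, 4.5, 3.0, 4.0, 3.5]
codec_b = [2.0, 2.5, 3.0, 2.0, 2.5, 3.0]
lower, upper = bootstrap_mean_diff_ci(codec_a, codec_b)
print(f"95% interval for the mean difference: {lower:.2f} to {upper:.2f}")
```

If the interval excludes zero, the data suggest a real difference between the two codecs for this sample and this group of listeners.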

Reply #4 – 2002-12-08 22:28:47

Quote

The students will be randomly picked, but I'll pass out a survey asking for age, gender and experience with listening tests.

You mean that, by chance, maybe 71% are male and 43% are 16? If you cannot check whether your students are trustworthy, I would make 14 groups (7th grade male, 7th grade female, ..., 13th grade female) and then randomly select 7 or 8 students from each group, so you have a very balanced test group. You could also include teachers to see whether the hearing of people over 50 is much different.
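The grouping idea can be sketched as a stratified random draw. This is only an illustration: the roster, the field names and the group size of 20 are assumptions made up for the example.

```python
import random


def stratified_sample(students, per_group, seed=None):
    """Pick up to `per_group` students at random from each (grade, gender) group."""
    rng = random.Random(seed)
    groups = {}
    for student in students:
        groups.setdefault((student["grade"], student["gender"]), []).append(student)
    selected = []
    for key in sorted(groups):
        members = groups[key]
        selected.extend(rng.sample(members, min(per_group, len(members))))
    return selected


# Hypothetical roster: 20 students per grade/gender group, grades 7 to 13.
roster = [
    {"name": f"{gender}{grade}-{i}", "grade": grade, "gender": gender}
    for grade in range(7, 14)
    for gender in ("f", "m")
    for i in range(20)
]
testers = stratified_sample(roster, per_group=7, seed=1)
print(len(testers))  # 14 groups x 7 students = 98
```

Drawing inside each group keeps the randomness but guarantees that no grade or gender dominates the panel.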
I would include WMA/MP3pro/RA, otherwise people might say "this test is old, now we have WMA9 which is soooo much better".
For the aacenc command line, see my first post. In my opinion aacenc 2.15 sounds terrible at 64 kbit/s (and I cannot tell 96 kbit/s Vorbis from the original!); that encoder is outdated.
For the lame alpha, I would encode with both 3.93.1 and 3.94, check before the listening test which sounds better, and use only the better one (maybe upload the files here and make a poll).
I have never used FhG and I don't know if it's the newest version, but for just 30 seconds you could use the demo encoder from http://www.iis.fraunhofer.de/amm/download/index.html
It's mp3enc version 3.1, but it doesn't seem to support VBR.

Edit: where is Palmer, AK? I couldn't find it in the atlas or on the internet.

Reply #5 – 2002-12-08 22:50:58

ff123/Garf discussing analysis of listening tests thread

Thanks Garf for that, I'll go through it tonight; that's exactly what I needed for my mathematical background.

Codecs to be included (updated)
lame 3.93.1 --preset 64 --scale 1 --resample 44 --lowpass 14
FhG mp3enc31 no cmd.line yet
Vorbis 1.0 --quality 0
Psytel/Nero AACenc I'd love to use Ivan's new encoder
MP3Pro
WMA

Any suggestions or additions to the choice of codecs? I decided to keep the selection focused on 'production-ready' codecs, so it represents the actual state of development. I'm still thinking about including MPC though; some people reported pretty good quality using PNS.

dev0

Reply #6 – 2002-12-09 00:09:17

Quote

lame 3.93.1 --preset 64 --scale 1 --resample 44 --lowpass 14

Why not use the default preset? It sounds better to me than this line you suggest. (I only tested two samples though...)

MPC with PNS is good in the range 80-120 kbit but below that the quality falls - even faster than mp3 imo. So my tip is to save mpc for a 128 kbit test if you ever make one.

Reply #7 – 2002-12-09 00:34:38

If I was a lazy student (and that's not hard for me to imagine), I would want to test as few codecs as possible, because I'd quickly get sick of listening to the sample repeatedly, and I'd just haphazardly rate them after a while.

On the other hand, maybe I wouldn't mind testing a couple short samples, if each sample was only ~10 seconds and there were only a few codecs to rate. ff123's 64kbit/s test showed that codec quality varies considerably with the type of music being encoded, so testing only one sample could call into question the applicability of the test's results. Also, testing with a c't-like multi-part sample might not work because your listeners likely won't bother listening to every part of the sample, or at least they won't be able to focus on everything.

Right now, you have three MP3 codecs... IMO you should try to reduce that number. Maybe bring the # of codecs down to four, so including the original would mean students have to rate five encodes of the sample. To narrow down the codecs, ask yourself: which codecs do you care about, which codecs are competitive in quality, and which codecs are in widespread use? Any of these three questions may have considerably different answers.

Maybe then you can have each student rate two samples. AND, maybe you can use six total samples, and if each student listens to two samples, then you can have each sample rated 33 times.
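The arithmetic behind "33 times" is 100 students x 2 samples / 6 samples = 33.3 ratings per sample. A minimal sketch of one way to hand out the samples evenly follows; the round-robin scheme is my own assumption, not something from ff123's test, and it assumes each student gets no more samples than exist.

```python
from collections import Counter


def assign_samples(n_students, n_samples, per_student):
    """Deal sample indices round-robin so every sample is rated about equally often."""
    assignments = []
    next_sample = 0
    for _ in range(n_students):
        picks = [(next_sample + k) % n_samples for k in range(per_student)]
        assignments.append(picks)
        next_sample = (next_sample + per_student) % n_samples
    return assignments


assignments = assign_samples(n_students=100, n_samples=6, per_student=2)
counts = Counter(sample for picks in assignments for sample in picks)
print(sorted(counts.values()))  # [33, 33, 33, 33, 34, 34]
```

This simple version always pairs the same samples together; shuffling which samples share a pair would be a reasonable refinement.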

I'm not sure how significant the results will be using that many tests per sample, but I think ff123's test used under 30 people per sample. However, he had a sensitive group to work with, whereas your population of untrained listeners may require a higher number of tests before significant results emerge. I read ff123's links about using bootstrap analysis to make the results more, but I don't understand it as well as I'd like to, so maybe someone else can comment on this idea of using multiple samples. But I did check out ff123's favorite book, Sensory Evaluation Techniques 3rd Ed., and hopefully that will clue me in.

Finally, when picking samples and codecs, ask yourself: who will be using this information? Do you want this test to be as broad as possible, or do you want to focus it for a certain application? For example, maybe you want to tell teenagers what codec to use on their portables, in which case you can omit MPC from the test, and you might want to use samples that are representative of popular music (ass-rock, gangsta rap, trashy pop) and avoid jazz/classical samples. By having a clearly defined goal, you can focus your test and get more conclusive results.

Reply #8 – 2002-12-09 02:12:50

Hi. Sounds like a fantastic project.

I think you are predisposing Lame to sound like shit by re-sampling to 44 and keeping the lowpass at 14.

There are very few bits to throw around at this bitrate.

Reply #9 – 2002-12-09 03:31:59

100 randomly selected students is most definitely overkill.

30 students would be fine, 50 students would be better, 100 students would be a major waste of time, because you are not going to find students at the same school in approximately the same age range with a large enough individual deviation at such a young age / similar environmental variables.

I think the only benefits you would get from having 100 students would be a little more certainty as to whether or not your results are statistically significant and a few more "effort points" from the teacher who's grading you :)

(a couple of well received semester-long research psychology projects and a statistics class speaking)
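To put a rough number on "a little more certainty": the standard error of a mean rating shrinks with the square root of the number of listeners, so going from 30 to 100 students narrows the error bars by a factor of about 1.8, not 3.3. The sketch below assumes a rating spread of 1.0 on a five-point scale, which is a made-up figure.

```python
import math


def standard_error(std_dev, n):
    """Standard error of the mean for n independent ratings."""
    return std_dev / math.sqrt(n)


ASSUMED_STD_DEV = 1.0  # hypothetical spread of ratings on a 1-5 scale
for n in (30, 50, 100):
    se = standard_error(ASSUMED_STD_DEV, n)
    print(f"n={n:3d}: standard error ~ {se:.3f}, 95% margin ~ {1.96 * se:.3f}")
```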

Reply #10 – 2002-12-09 07:26:02

A few comments on some things:

lame cmd.line
I got this cmd.line from Gabriel's WMA9 comparison test, where he calls it 'tuned'. To my ears it sounds better than the untuned one, because the resampling is just a pain and scale should be set to 1 when doing a listening test.

Students participating
I also think that 100 students will be total overkill, so I'll try to convince my math teacher to reduce the number to 50 tomorrow.

Codecs
lame 3.93.1 --preset 64 --scale 1 --resample 44 --lowpass 14
Vorbis 1.0 --quality 0
Psytel/Nero AACenc I'd love to use Ivan's new encoder
MP3Pro or WMA

Reply #11 – 2002-12-09 08:25:43

This lame command line is designed to sound similar to WMA9.
It is a question of taste. It seems that people unused to audio compression prefer this line, and people used to audio compression prefer plain "--preset 64 --scale 1".

With the first one you get more bandwidth, but also more artifacts.

Reply #12 – 2002-12-09 08:27:34

IMO, the biggest criticism of your proposed test would be that it uses only one sample. Is the important thing about statistics the math it uses, or is it about trying to find out what's true or not? I know, this isn't the real world, it's school.

ff123

Reply #13 – 2002-12-09 09:13:48

Heh... and with a suitably chosen sample you could probably make any of those codecs win the test.

Reply #14 – 2002-12-09 09:45:14

Quote

Thanks Garf for that, I'll go through it tonite, that's exactly what I needed for my Mathematical background.

You might also consider reading through some listening test reports from the EBU, because they do this very professionally (MUSHRA test methods, SPSS analysis etc.). These PDF reports are usually 50-70 pages long and cover all aspects of the tested objects in question. Maybe they can give you some idea of how to design your test procedure properly, too.

Quote

Codecs to be included (updated)
lame 3.93.1 --preset 64 --scale 1 --resample 44 --lowpass 14

I think resampling down to 32 kHz or even 24 kHz (with a cutoff at 12 kHz in that case) would sound better for LAME; at least that was my impression during my own low-bitrate tests using v3.92.

Quote

Psytel/Nero AACenc I'd love to use Ivan's new encoder

For the old one I'd suggest using the -radio preset with -resample 32000, because a plain -br 64 won't sound too good with PsyTEL. Of course this results in a higher average bitrate than 64 kbps: with the c't reference.wav, after converting it to MP4, it came out at 71 kbps. If you feel that you have to stick strictly to a 64 kbps average bitrate, I would recommend at least using variable bitrate with -qvbr, plus resampling: aacenc -qvbr 17 -resample 32000.

Talking about c't: I don't know if it's a good idea to use only one sample, but if you must, this could do the job. Of course your test subjects should be able to switch between the three different excerpts individually, so they don't have to remember the sound of Kylie Minogue in the first sample after listening to "Horny" and the opera excerpt.

Quote

MP3Pro
WMA

If you can, use them, but WMA9 if it's possible, not WMA8. Also a variable bitrate with mp3PRO (not enabled in the free version) would probably be better than CBR.

Quote

Any suggestions additions to the choice of codecs? I decided to keep the selection focused on 'production-ready' codecs, so it represents the actual state of development. I'm still thinking about including MPC though, some people reported pretty good quality using PNS.

I'm not so sure that PNS is usable at the moment with MPC, because guruboolez sent me some ~64 kbps samples (baroque strings and er-hu) probably done with PNS, and they sounded really awful because of a constant "shortwave radio" background noise that might have been generated by PNS.

I tested the --thumb profile recently against PsyTEL's -radio preset (with -resample 32000), and for me (and my samples) MPC v1.14 was better. I don't know if PNS is enabled in this profile with this codec version, but I would assume it's not, because I couldn't detect any of these background noises in my test. As I wrote already, this profile uses almost as many bits as PsyTEL's -radio, which means that the c't reference.wav needed 76.4 kbps on average (= too much for your test).
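Checking whether a VBR encode actually lands near the target is simple arithmetic on file size and duration. The following is a minimal sketch; the file size, the duration and the 10% tolerance are hypothetical values chosen for illustration, and container overhead is ignored.

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Average bitrate in kbit/s, using 1 kbit = 1000 bits."""
    return file_size_bytes * 8 / duration_seconds / 1000


def within_target(bitrate_kbps, target_kbps=64.0, tolerance=0.10):
    """True if the bitrate is within `tolerance` (as a fraction) of the target."""
    return abs(bitrate_kbps - target_kbps) <= target_kbps * tolerance


# Hypothetical 30-second encode that is 266,250 bytes on disk.
rate = average_bitrate_kbps(266_250, 30.0)
print(f"{rate:.1f} kbps, within 10% of 64 kbps: {within_target(rate)}")
```

How much deviation from 64 kbps is acceptable is a design decision for the test, not something the arithmetic can settle.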

Reply #15 – 2002-12-09 16:13:55

Quote

I'm not so sure that PNS is usable at the moment with MPC, because guruboolez sent me some ~64 kbps samples (baroque strings and er-hu) probably done with PNS, and they sounded really awful because of a constant "shortwave radio" background noise that might have been generated by PNS.

PNS is very useful, all quality levels under 5 use it with latest encoder. The files guruboolez sent you were encoded with 'below telephone' quality to limit their size, if you encode the original with --thumb you will notice that bitrate is a lot higher but quality is excellent.

Reply #16 – 2002-12-09 16:37:57

Quote

PNS is very useful, all quality levels under 5 use it with latest encoder. The files guruboolez sent you were encoded with 'below telephone' quality to limit their size, if you encode the original with --thumb you will notice that bitrate is a lot higher but quality is excellent.

It's a great tool when it works. But when it breaks (like it does on some of the "standard" test samples: trumpets, dogwhistle and applaud) it doesn't sound good at all.

Reply #17 – 2002-12-09 16:52:13

Quote

IMO, the biggest criticism of your proposed test would be that it uses only one sample. Is the important thing about statistics the math it uses, or is it about trying to find out what's true or not? I know, this isn't the real world, it's school.

ff123

Since this is a math project, the math side is definitely more important, but if we gained useful information from it, that would be nice too, wouldn't it?

The problem is that I have to let all those students do the test on my laptop during lunch, breaks, etc. I don't want to have them do multiple samples, because in practice it would just become too much work. Does anyone still have a download link to the c't test sample, or any other good idea for what to use?

After listening to it myself, I decided to change the lame cmd.line to --preset 64 --scale 1, for the reason Gabriel mentioned and because this is what most people would probably use.

Codecs
lame 3.93.1 --preset 64 --scale 1
Vorbis 1.0 --quality 0
AACenc 2.15 -radio (-resample 32000)
WMA9 some setting

Do you think that this selection of codecs, though small, represents the current state of development pretty well?

dev0

Reply #18 – 2002-12-09 17:14:29

Quote

Since this is a math project, the math side is definitely more important, but if we gained useful information from it, that would be nice too, wouldn't it?

I guess my real point is that sometimes the forest gets lost for the trees in school. Like the difference between a "valid" experiment and just a "reliable" one. Also, the importance of asking the right question in an experiment and in designing it properly.

However, as long as people understand that the results would apply, at most, only to the type of music your sample represents, it should be ok. I'm not sure what that means if your sample is a compilation of different samples, though.

Quote

The problem is that I have to let all those students do the test on my laptop during lunch, breaks, etc. I don't want to have them do multiple samples, because in practice it would just become too much work. Does anyone still have a download link to the c't test sample, or any other good idea for what to use?

I have the original sample available, if you like. Let me know. Also, if you want a medley, you could use one of the compilation samples available on the xiph.org site:

http://xiph.org/ogg/vorbis/listen.html

However, one might reasonably question whether or not the selections were chosen to highlight vorbis's strengths and to minimize its weaknesses.

Quote

Codecs
lame 3.93.1 --preset 64 --scale 1
Vorbis 1.0 --quality 0
AACenc 2.15 -radio (-resample 32000)
WMA9 some setting

Do you think that this selection of codecs, though small, represents the current state of development pretty well?

mp3pro is missing. It's one of the better codecs at 64 kbit/s.

ff123

Reply #19 – 2002-12-09 17:30:50

Quote

mp3pro is missing. It's one of the better codecs at 64 kbit/s.

I'm still thinking about including it... I'd end up with almost the same collection as you did in your 64 kbps test (though the intended audience is a completely different one).

dev0

Reply #20 – 2002-12-09 18:09:05

My suggestion would be to avoid the medley/combination sample idea. It's very confusing for subjects. And they'll be very confused anyway - untrained subjects brought in to listen to something during their dinner hour are always confused!

As it's more important for you to do well in your maths project than to discover something new about 64kbps audio codecs, I think you should keep this as simple as possible. Even the most simple subjective tests can become surprisingly complicated when you do them!

Be careful and clear about the question you are asking.
Be very clear about the quality grading scale. Is it absolute or relative? Or do you just want the samples ranking?
How many times can each subject listen to each sample. Is the playback automated, or under their control?

I'd suggest:

1 revealing test sample (I liked "That's just the way it is" but don't know what it will show up at 64kbps)

4 audio codecs. (Not to pre-judge the test, but it'll give you something to talk about in your analysis if two are obviously different, and two are quite similar)

Since it's a maths project, you'll be sad to find that most "official" published tests are rather thin on the statistical analysis, and some might even include bits that your maths teacher won't approve of. Garf and ff123 have done lots of good work in this, as mentioned.

Have a read of a MUSHRA test:
http://www.ebu.ch/trev_283-contents.html
(Scroll down to the Internet Audio test)

and a BS1116 test:
visit http://www.mp3-tech.org/
and go to programmers corner, Technical Audio papers, and download the last one on the list. The MPEG-4 one may be interesting as well.

Whatever you do - enjoy it. And pick a piece of music for the test sample that you're not too bothered about. Don't pick something you really like, otherwise you'll hate it by the end of the test!

Good luck!
Cheers,
David.
http://www.David.Robinson.org/

Reply #21 – 2002-12-09 21:32:28

Quote

Quote
I'm not so sure that PNS is usable at the moment with MPC, because guruboolez sent me some ~64 kbps samples (baroque strings and er-hu) probably done with PNS, and they sounded really awful because of a constant "shortwave radio" background noise that might have been generated by PNS.

PNS is very useful, all quality levels under 5 use it with latest encoder.

But what is the "latest encoder" in Musepack, only the v1.9x version or also the v1.14 I used? The files I got from guruboolez were encoded with v1.14, and he did not use a profile but rather a quality setting, as far as I know. If this obvious background noise is not caused by PNS, it must be something else, but it reminded me a little bit of the sound of PNS in PsyTEL, that's why I guessed this.

Quote

The files guruboolez sent you were encoded with 'below telephone' quality to limit their size, if you encode the original with --thumb you will notice that bitrate is a lot higher but quality is excellent.

I know (and I've repeated it more than once now)... By the way, is there a Winamp plugin for *.ape files? The one for Monkey Audio I downloaded from Winamp's website only recognizes *.mac files, it seems.

Reply #22 – 2002-12-10 08:24:48

There is a Winamp plug-in for .ape files. The one bundled with the Monkey Pack from the official website works perfectly.

The samples I sent you are not 'rare'; they are typical of many (hundreds of) CDs I can encode from my library (it is all classical plus one er-hu disc). I encoded them at approximately --quality 1.50 in order to reach the 64 kbps needed for:
· comparison with other formats
· seeing how MPC currently behaves at low bitrate.

MPC in general seems to work differently from other formats. A continuous tone needs much more bitrate than a signal interrupted by a high-energy one [sorry for the bad description]. This is not a problem for audiophile encoding [size doesn't matter too much], but when size is the priority, it is a real problem.

I tried to find a track with a 'pure' tonal instrument, followed by the same signal with drums superimposed. I found one on a Faith No More album (Midnight Cowboy, track 13 from Midlife Crisis). The first 14 seconds are accordion only; I isolated them in a single file. I then isolated the following 14 seconds (accordion + drums & cymbals), which look more complex. Then I encoded the two passages [same size as .wav] with different codecs in VBR, to show that MPC is the one that allocates more bitrate to the first sample.

SAMPLE 1
mpc --thumb        : 128 kbps  [229 kb]
vorbis -q1         :  70 kbps  [125 kb]
aac PsyTEL -radio  :  72 kbps  [129 kb]
mp3pro VBR 60      :  61 kbps  [110 kb]
Fastenc VBR [nero] :  84 kbps  [150 kb]

SAMPLE 2 (change vs. sample 1)
mpc --thumb        : 114 kbps  [204 kb]  -11%
vorbis -q1         :  80 kbps  [143 kb]  +14%
aac PsyTEL -radio  :  90 kbps  [161 kb]  +24%
mp3pro VBR 60      :  78 kbps  [141 kb]  +28%
Fastenc VBR [nero] :  98 kbps  [175 kb]  +16%

My hunch was right: MPC allocates bitrate differently. Maybe it's a difference in masking behaviour (I don't know; I don't understand anything about psychoacoustics).
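The percentage column can be recomputed from the file sizes listed for the two excerpts. This is a small sketch using only the sizes quoted above; results may differ by a point from the posted figures depending on rounding.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


# File sizes in kB for (sample 1, sample 2), as listed in the post.
sizes_kb = {
    "mpc --thumb": (229, 204),
    "vorbis -q1": (125, 143),
    "aac PsyTEL -radio": (129, 161),
    "mp3pro VBR 60": (110, 141),
    "Fastenc VBR [nero]": (150, 175),
}
for codec, (sample1, sample2) in sizes_kb.items():
    print(f"{codec:20s} {percent_change(sample1, sample2):+6.1f}%")
```

MPC is the only encoder in the list whose file gets smaller when the drums come in.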

Now just imagine a whole album with no drums and no attacks (for example string quartets, violin sonatas, flute music or quiet symphonic music...): it seems difficult to get a competitive challenger to Vorbis, mp3PRO or AAC out of MPC (at least SV7 1.14 beta and earlier). Note that I haven't listened to the files I encoded; I just know that MPC at 80 kbps will be the worst of all [-qual 1.99 gave 75 kbps for the first sample, --qual 2 = 97 kbps /// --qual 1.99 = 58 kbps with sample 2, and 85 kbps at --qual 2, but is not awful].

I suppose that MPC needs not only more tweaking in order to compete with (or beat) Vorbis and AAC, but also a change in its internal behaviour. At the moment, all the other formats are much better than MPC for classical music (in general) at 60-80 kbps.

N.B. for hans-jürgen: I don't think PNS is responsible for the 'washing-machine sound' you can hear in the files I sent to you and to Case. You can disable PNS with --pns 0 if you want to compare.

EDIT: the whole track at --thumb with mppenc 1.14 is only 99 kbps. I'm not sure a listening test at xx kbps is useful if the xx is based on a small, critical sample, except for streaming purposes.

Reply #23 – 2002-12-10 16:32:39

Quote

There is a winamp plug-in for .ape file. The one bundled with the Monkey Pack from the official website works perfectly.

OK thanks, then Winamp's plugin download site should probably be updated more often.

Quote

MPC in general seems to work differently from other formats. A continuous tone needs much more bitrate than a signal interrupted by a high-energy one [sorry for the bad description]. This is not a problem for audiophile encoding [size doesn't matter too much], but when size is the priority, it is a real problem.

Interesting, hopefully Frank Klemm knows and/or reads this.

Quote

I suppose that MPC needs not only more tweaking in order to compete with (or beat) Vorbis and AAC, but also a change in its internal behaviour. At the moment, all the other formats are much better than MPC for classical music (in general) at 60-80 kbps.

The c't reference.wav has an opera excerpt at the end which MPC v1.14 --thumb handled much better than Ogg Vorbis or MP3, and, for my taste, even a little better than PsyTEL AACEnc -radio -resample 32000 or mp3PRO, so I don't buy your generalization.

Quote

N.B. for hans-jürgen: I don't think PNS is responsible for the 'washing-machine sound' you can hear in the files I sent to you and to Case. You can disable PNS with --pns 0 if you want to compare.

Good idea, I'll have to look for the complete Monkey Audio package first then.

Reply #24 – 2002-12-10 20:58:21

Quote

I'm still thinking about including it... I'd end up with almost the same collection as you did in your 64kbps (though the intended audience is a completely different one).

I just tested RealAudio and thought it would sound horrible, but it didn't! I couldn't believe it! I encoded at 64 kbit/s with Helix Producer Basic and it sounded great compared to WMA. If you want to test the best codecs at 64 kbit/s, RA8 should definitely be one of the candidates. It's also free and playable not only on Win32 but also on Linux, Mac... You should consider adding this codec to your test.
