Digital Radio listening test


Reply #25 – 2003-03-05 09:57:46

Quote

I'd remove 112 and 160kbps for Layer II in order to reduce the number of samples.

This is the most sensible answer, looking at it purely as a listening test. The problem is that these are the two most likely bitrates for DAB stations to change to! So I have to include them in the test.

D.


Reply #26 – 2003-03-05 10:17:53

Quote

Quote
It may turn out that the bitrate is the least of the problems, or it may turn out that, given a sufficiently high bitrate, some of the others are less of a problem.

You may wish to perform a small preliminary test so that you have an idea of what's going to happen, even if that means just you sitting with headphones going through some of the permutations.

I'm sat here with my headphones on! It's quite interesting. It's quite shocking too - you have to transcode and add DRC to make it sound as bad as a typical DAB station - mp2 128kbps on its own doesn't sound that bad. Then again, I'm using the Qdesign encoder, but I have it on good authority that the BBC are using hardware encoders from 1996, while the commercial broadcasters are using dist10!!!

Quote

Let me look through my book tonight to see how it works. I don't remember seeing a statistical method for the exact way you propose (always include a high and low anchor). But now that you mention it, that does sound like the way to do it. The statistical method that EBU proposes is basically a mean and 95% confidence intervals. It doesn't adjust for the fact that many different samples might be under test, which would tend to yield higher type I errors (false positives). However, for what you're doing, it might be good enough.
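For anyone following along, the "mean and 95% confidence interval" summary can be sketched roughly as below. This is only an illustration, not the EBU procedure itself: the function name and the example scores are made up, and the interval is the plain per-codec one, so it does not correct for several codecs being compared at once.

```python
import math
import statistics

# Two-sided 95% Student-t critical values by degrees of freedom (standard
# table values). Beyond the table, the normal value 1.96 is used as an
# approximation.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086,
}


def mean_with_ci95(scores):
    """Return (mean, lower, upper) for one codec's listener scores."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores for an interval")
    mean = statistics.fmean(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    half_width = T_CRIT_95.get(n - 1, 1.96) * standard_error
    return mean, mean - half_width, mean + half_width


# Hypothetical scores on the 1.0-5.0 scale from five listeners.
example_scores = [4.0, 3.5, 4.5, 3.0, 4.0]
mean, lower, upper = mean_with_ci95(example_scores)
print(f"mean {mean:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Two codecs whose intervals overlap heavily can't really be separated by the test; with many codecs under test, some apparent differences will still turn up by chance.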

ff123

Edit: BTW, the EBU paper recommends sticking with one type of transducer for all participants, and keeping volume within +/- 4 dB of a reference level. It is my opinion that not following these recommendations will tend to reduce the sensitivity of the test, but at the same time, it may make it more representative of real-world listeners.

That's one of the distinctive points of this test - "real world" conditions and samples. Not golden ears with an excellent hi-fi system listening to harpsichord arpeggios! (OK, you and I know that the official listening tests aren't that obscure, but that's how people try to play them when they want to dismiss them). It will be much less sensitive (or at least, some of the test subjects will, due to themselves or their equipment, be much less sensitive), but that's real life!

How about this: Let's say each listening test is split in two - maybe everyone has to do both halves; maybe each listener only has to do one half.

In each half, I include the low and high anchors. If the anchors can be two of the test codecs, then the high anchor can be mp2 256 (not particularly high by the standards here, but it should be OK?), the low anchor can be mp2 112 (128 transcode may be even worse - is this a problem? - maybe a preliminary test to decide which is the low anchor!). This leaves:
mp2: 192, 160, 128, transcode, different encoder
AAC: 96 128 (problem: AAC 128 may be better than mp2 256 for some/all?)
FM simulation

8 to do (excluding anchors) - that's 4 in each half - or 6 in each half including anchors. That's a reasonable number. The split (which codec goes in which half) should be random.
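A minimal sketch of that random split, assuming the eight non-anchor conditions listed above and the two anchors suggested earlier. The condition labels and the function name are just placeholders for illustration.

```python
import random

# Anchors appear in both halves; labels are illustrative only.
ANCHORS = ["mp2 256 (high anchor)", "mp2 112 (low anchor)"]

# The eight remaining conditions from the list above.
CONDITIONS = [
    "mp2 192", "mp2 160", "mp2 128", "mp2 transcode",
    "mp2 different encoder", "AAC 96", "AAC 128", "FM simulation",
]


def split_into_halves(conditions, anchors, rng):
    """Randomly split conditions into two halves and add both anchors to each."""
    shuffled = list(conditions)
    rng.shuffle(shuffled)
    middle = len(shuffled) // 2
    halves = []
    for part in (shuffled[:middle], shuffled[middle:]):
        half = part + list(anchors)
        rng.shuffle(half)  # also randomise presentation order within the half
        halves.append(half)
    return halves


for number, half in enumerate(split_into_halves(CONDITIONS, ANCHORS, random.Random()), 1):
    print(f"Half {number}: {half}")
```

Each half ends up with six items: four test conditions plus the two anchors.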

If I remove AAC 128, then I can (hopefully) assume mp2 256 is the high anchor (though for some, maybe FM will be!) - the only problem then is that people will hear the anchor codecs twice, but the other codecs once - which may be a problem since the anchors are part of the test. Do you think this will work, or wreck it? Is there a better way? Should I make everyone do the two halves?

Maybe an alternative is to have two reference samples - not part of the test, but available for replay, basically saying "This is about as good as you can get - it's 4.9 or 5 depending on whether you can hear any problems" and "This is about as bad as you can get - it's 1.0 (or 2.0 or whatever it really is!)". Would that work? I was going to send people to your artefact examples page before the test for some training anyway!

Cheers,
David.


Reply #27 – 2003-03-05 10:40:36

Quote

Quote
You suggested ff123 that maybe they could be split across different tests, so no listener compared them all in one test, but overall, they were all tested against each other. Maybe that would work, but how?

Well, http://www.soundexpert.info works this way (interesting preliminary results, by the way... )

Hans,

Yes, the soundexpert.info test seems to be working quite well. Is it the ogg results that you don't like? I think the problems of introducing different codecs at different times will take a while to average out!

Quote

Right, that's why the EBU chose to expand their standard listening tests to the new MUSHRA method (MUlti Stimulus test with Hidden Reference and Anchors), so that the listeners would always have a low quality sample and the original within each test sequence to "rest their ears" and/or normalize their hearing ability. See http://www.ebu.ch/trev_283-kozamernik.pdf (in case you don't know it already).

It's always good to read it again!

ff123's "ABC/HR audio comparison tool" isn't quite MUSHRA, but I think it's just what's needed. BS1116 is great when the audible problems are just detectable. MUSHRA is great when the audible problems are obvious, and the problem is ranking them. Various papers suggest that "either" can be used in the region where the two tests overlap, but in truth "neither" is very useful in this case! So, when you know that you'll probably have some "can I really hear a difference?" samples, and some "yuk - that's horrible!" samples, ff123's hybrid tool is perfect. It's especially useful when using untrained listeners, because it gives the best of both (or even 3!) worlds, and whatever level they're at, it should be able to get reliable answers out of them! (I know you know this already, but I'm thinking out loud here, and letting you know that your advice was useful, and made me think!)

Quote

One or two less MP2 samples, maybe only 96 kbps AAC.

Maybe - they won't be using 128kbps AAC for a while in broadcasting, and it may prove to be a high anchor, so maybe I'll scrap it.

I'm going to try dropping all the possible codecs into ABC/HR to see how fatiguing ten codecs are in a single test.
EDIT: OK, eight

Cheers,
David.


Reply #28 – 2003-03-05 15:10:52

I'm wondering if a high anchor is really necessary: are we likely to get 4 or 5 codecs together which all sound really bad? Plus the original is always there on every choice.

Quote

I'm going to try dropping all the possible codecs into ABC/HR to see how fatiguing ten codecs are in a single test.

You'll realize in about 5 seconds that I only have space for eight, which I personally think is already too much

I've been looking through my book, and what I'll probably do is scan it in to allow you to read the relevant sections for yourself. There is one method of interest called a "Balanced Incomplete Block" (BIB) design:

----------------------

BIB DESIGNS
"In BIB designs the panelists evaluate only a portion of the total number of samples (notationally, each panelist evaluates k of the total of t samples, k < t). The specific set of k samples that a panelist evaluates is selected so that in a single repetition of a BIB design every sample is evaluated an equal number of times (denoted by r), and all pairs of samples are evaluated together an equal number of times (denoted by lambda). The fact that r and lambda are constant for all the samples in a BIB design ensures that each sample mean is estimated with equal precision and that all pair-wise comparisons between two sample means are equally sensitive."

---------------------
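To make the definition concrete, here is a small sketch that checks whether a proposed set of blocks meets the two conditions in the quote (constant r, constant lambda). The function name is made up, and the example layout is a commonly cited 7-sample, 3-per-block arrangement used purely as an illustration, not a table taken from the book.

```python
from collections import Counter
from itertools import combinations


def bib_parameters(blocks):
    """Return (t, k, r, lam) if blocks form a BIB design, otherwise None."""
    blocks = [frozenset(block) for block in blocks]
    samples = sorted(set().union(*blocks))
    block_sizes = {len(block) for block in blocks}
    # r: how many blocks each sample appears in.
    appearances = Counter(s for block in blocks for s in block)
    # lambda: how many blocks each pair of samples shares.
    pair_counts = Counter(
        pair for block in blocks for pair in combinations(sorted(block), 2)
    )
    r_values = {appearances[s] for s in samples}
    lam_values = {pair_counts[pair] for pair in combinations(samples, 2)}
    if len(block_sizes) != 1 or len(r_values) != 1 or len(lam_values) != 1:
        return None
    return len(samples), block_sizes.pop(), r_values.pop(), lam_values.pop()


# Illustrative layout: 7 samples, each listener hears 3 of them.
EXAMPLE_BLOCKS = [
    (1, 2, 3), (1, 4, 5), (1, 6, 7),
    (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6),
]
print(bib_parameters(EXAMPLE_BLOCKS))
```

Any candidate split of the codecs can be fed through the same check before sending it out to listeners.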

It sounds very much like the method you propose, 2B, but perhaps it requires more listeners than you suspect to work out the statistics.

ff123


Reply #29 – 2003-03-05 15:51:59

Quote

It sounds very much like the method you propose, 2B, but perhaps it requires more listeners than you suspect to work out the statistics.

Thanks for looking this up - it does sound like what I was thinking.

I can understand how to make r constant, but the possible process for making lambda constant is making my head spin. There must be a formula for the minimum number of tests in a BIB cycle, even if I can't presently figure out how to juggle the codecs so that they meet the criteria.

As you say, this could require many more subjects (or repetitions) than is reasonable. Is there a way of calculating the required number?

What I did in a previous test (different circumstance, same resulting problem) was to say that, OK, I should really have either 24 or 48 or n times 24 subjects (in that particular test, IIRC), to make sure all possible combinations are tested equally; BUT since that's just not going to happen, AND because keeping track of all the possible combinations would make life more complicated, AND because most of the things which made a combination "different" weren't that important (e.g. presentation order); INSTEAD I'll just randomise all the things I don't want to measure or worry about (e.g. presentation order), and say that, over all, any bias due to any one or other of these things should cancel out, even though I haven't been ultra careful to make sure each and every combination of them all is equally likely.

In fact, we do that all the time - ABC/HR randomises the order, without checking that every possible order is used once or "n times" during an entire test. We don't worry, because, we assume that as long as the statistical bias due to each sample always appearing in the same place is removed, then whatever effect remains is drastically smaller than the one we hope to measure.
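A rough sketch of the "just randomise it" approach: each listener gets an independently shuffled order, and a tally shows how often each sample landed in each position. It assumes, for illustration only, four samples (which gives 4! = 24 possible orders); the names are placeholders and this is not how ABC/HR itself is implemented.

```python
import math
import random
from collections import Counter


def randomised_orders(samples, listeners, rng):
    """Give every listener an independently shuffled presentation order."""
    return [tuple(rng.sample(samples, len(samples))) for _ in range(listeners)]


def position_tally(orders):
    """Count how often each sample appeared in each position."""
    tally = Counter()
    for order in orders:
        for position, sample in enumerate(order):
            tally[(sample, position)] += 1
    return tally


samples = ["A", "B", "C", "D"]
print("possible orders:", math.factorial(len(samples)))

orders = randomised_orders(samples, listeners=20, rng=random.Random())
for (sample, position), count in sorted(position_tally(orders).items()):
    print(f"sample {sample} in position {position + 1}: {count} times")
```

With only 20 listeners the tally will be uneven, which is the point being made: nobody checks that every order is used equally, only that no sample is stuck in one position.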

I'm not sure I can get away with this here. I did previously, because the effect I was measuring was huge (and I was only measuring 4 samples) whereas the effects which multiplied to 24 combinations were quite small. The effect of partitioning samples into two groups, where the actual content of each group may affect the judgement, may not be small. If there were no effect, then with samples with obvious artefacts, BS1116 would give the same results as MUSHRA (though more slowly) - this clearly isn't the case, because the comparison afforded by MUSHRA is useful to test subjects.

I'd be interested to know the required subject numbers for doing it the statistically correct way.

Cheers,
David.

P.S. I agree with your first statement - a high quality anchor is probably redundant in such a high quality test. Maybe a low quality anchor is too, if the BIB method means that it'll all average out anyway.


Reply #30 – 2003-03-05 15:54:43

Ah - a web search appears to answer my question about subject numbers using BIB

EDIT: I retract that statement - I don't understand a word of it!

EDIT 2: Ok, I understand some of it - it's not simple, is it?

If I've got it correct, splitting 8 codecs into blocks of 4 would require 14 blocks to complete the BIBD sequence. (Assuming I can figure out the correct sequence.)
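The figure of 14 is consistent with the standard counting identities for a BIB design. Writing b for the number of blocks and using t, k, r and lambda as in the quoted definition, the worked version is below; note that these are necessary conditions only and don't by themselves produce the sequence.

```latex
\[
\begin{aligned}
b\,k &= t\,r, \\
\lambda\,(t-1) &= r\,(k-1).
\end{aligned}
\]
% With t = 8 codecs and k = 4 per block:
\[
7\lambda = 3r
\;\Rightarrow\; \lambda = 3,\ r = 7 \ \text{(smallest positive integers)},
\qquad
b = \frac{t\,r}{k} = \frac{8 \cdot 7}{4} = 14.
\]
```

So in one full cycle each codec would be heard in 7 of the 14 blocks, and each pair of codecs would meet in 3 of them.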

This raises one huge problem: It's alright the server sending out the 14 blocks in the sequence, but if one person doesn't come back with answers, you're one block short, and the whole thing falls apart. A solution is to have all 8 codecs in each download, and ABC/HR contacts the server to check which 4 should be chosen for the current block. If someone never sends in the results for that block, someone else will have to be given it later. You need "14 times n" people for each sample.
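The server-side bookkeeping could look something like the sketch below. It's only an assumption about how such a scheme might work (ABC/HR has no such feature as far as this thread says), and the class and method names are invented: the server hands out whichever block has the fewest results so far, and a block that never comes back is simply offered again.

```python
class BlockScheduler:
    """Track which blocks of a cycle have been handed out and returned."""

    def __init__(self, n_blocks):
        self.completed = [0] * n_blocks    # results received per block
        self.outstanding = [0] * n_blocks  # handed out, not yet returned

    def assign(self):
        """Hand out the block with the fewest results (received or pending)."""
        block = min(
            range(len(self.completed)),
            key=lambda i: (self.completed[i] + self.outstanding[i], i),
        )
        self.outstanding[block] += 1
        return block

    def submit(self, block):
        """Record that a listener returned results for this block."""
        if self.outstanding[block] == 0:
            raise ValueError("block was not outstanding")
        self.outstanding[block] -= 1
        self.completed[block] += 1

    def expire(self, block):
        """Give up on a listener who never reported back."""
        if self.outstanding[block] == 0:
            raise ValueError("block was not outstanding")
        self.outstanding[block] -= 1

    def full_cycles(self):
        """Number of complete passes through all blocks so far."""
        return min(self.completed)


scheduler = BlockScheduler(14)
first = scheduler.assign()
scheduler.submit(first)
print("complete cycles so far:", scheduler.full_cycles())
```

Only complete cycles keep the design balanced, so `full_cycles()` is the number that matters when deciding whether there's enough data to analyse.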

This all adds an amazing amount of complexity to the task - do you think it's possible, or worth it? It's an interesting challenge, but it's easy for me to say that because it's not my program!

Unless this is something you want to do (since it's work for you rather than me), I'll sleep on it and try and think of something different! (Like fewer codecs).

Cheers,
David.


Reply #31 – 2003-03-06 06:02:30

Quote

If I've got it correct, splitting 8 codecs into blocks of 4 would require 14 blocks to complete the BIBD sequence. (Assuming I can figure out the correct sequence.)

Yeah, that's a problem isn't it? My book doesn't explain how to construct the table, although it does have an example of one which uses 3 out of 7 samples.
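One way to build such a table for 8 codecs in blocks of 4 is sketched below. This is a standard combinatorial construction, not something taken from the book: label the codecs 0-7 as 3-bit numbers, and for each of the 7 non-zero bit masks split the labels by the parity of the masked bits, which gives two blocks of four per mask. The function names are made up, and which codec gets which label is arbitrary.

```python
from collections import Counter
from itertools import combinations


def parity(value):
    """Return 0 if value has an even number of 1 bits, else 1."""
    return bin(value).count("1") % 2


def blocks_for_8_codecs():
    """Build 14 blocks of 4 labels (0-7) from parity splits of 3-bit labels."""
    blocks = []
    for mask in range(1, 8):        # the 7 non-zero 3-bit masks
        for wanted in (0, 1):       # two complementary blocks per mask
            block = tuple(x for x in range(8) if parity(x & mask) == wanted)
            blocks.append(block)
    return blocks


blocks = blocks_for_8_codecs()
appearances = Counter(x for block in blocks for x in block)
pair_counts = Counter(p for block in blocks for p in combinations(block, 2))

for number, block in enumerate(blocks, 1):
    print(f"block {number:2d}: {block}")
print("times each codec appears:", set(appearances.values()))
print("times each pair appears:", set(pair_counts.values()))
```

The two summary lines let you confirm the balance directly rather than taking the construction on trust.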

Quote

This raises one huge problem: It's alright the server sending out the 14 blocks in the sequence, but if one person doesn't come back with answers, you're one block short, and the whole thing falls apart. A solution is to have all 8 codecs in each download, and ABC/HR contacts the server to check which 4 should be chosen for the current block. If someone never sends in the results for that block, someone else will have to be given it later. You need "14 times n" people for each sample.

Although I don't know how many times each pair of codecs gets tested in this particular table, I'd guess it's something like 3 or 4 times. So assuming it's 4 times, you'd need 14 * 4 = 56 people to get the equivalent of 16 people listening to all codecs at a single sitting. That's a high price to pay. Not to mention the waste that occurs if the entire table isn't completed. And the complexity on the server side. I'm liking this option less and less.
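As a rough cross-check on the listener numbers, here is a sketch that counts ratings per codec rather than pair comparisons. It assumes the 14-block, 4-of-8 layout discussed above and takes "equivalent of 16 people" to mean at least 16 ratings per codec, which is only one possible reading; the function names are invented.

```python
import math


def ratings_per_codec(blocks, codecs_per_block, codecs):
    """Ratings each codec collects in one full pass through the blocks."""
    total_slots = blocks * codecs_per_block
    if total_slots % codecs:
        raise ValueError("slots cannot be shared equally between codecs")
    return total_slots // codecs


def listeners_needed(target_ratings, blocks=14, codecs_per_block=4, codecs=8):
    """Return (passes, listeners) giving every codec at least target_ratings."""
    per_pass = ratings_per_codec(blocks, codecs_per_block, codecs)
    passes = math.ceil(target_ratings / per_pass)
    return passes, passes * blocks


passes, listeners = listeners_needed(16)
print(f"{passes} passes, {listeners} listeners")
```

On this count, 56 listeners would be four full passes. Each listener still only compares four codecs side by side, so the ratings are less informative than those from a full-panel session, and the real cost may well be closer to the estimate above.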

ff123


Reply #32 – 2003-03-06 10:16:53

Quote

I'm liking this option less and less.

Me too.

I'll include 8 codecs per test, and that's that.

I'll have some trial runs to see if it's still too many. I know it probably is, but I can't see what else to leave out. I'm currently down to:

mp2: 112, 128, 160, 192
AAC: 96
FM simulation
mp2 128 transcoded
mp2 128 using a different encoder

I'll probably do the normal four mp2s with a good encoder, and the alternative with a poor one. Or maybe the other way round! I've been told that most UK broadcasters are using something based on dist10.

Sample selection next. Are there any killer/difficult samples for mp2? Preferably "normal" music - ie something that might actually make it onto the radio!

Cheers,
David.


Reply #33 – 2003-03-06 14:03:43

Well, any Beatles recording should be catastrophic, considering the joint stereo mode (IS) of mp2.
There are 2 choices for the encoder:
*do not use joint stereo, and so full stereo at 128kbps would be catastrophic
*use joint stereo and totally destroy the not-so-subtle stereo effects of Beatles recordings


Reply #34 – 2003-03-06 15:01:53

Quote

I'll include 8 codecs per test, and that's that.

I'll have some trial runs to see if it's still too many. I know it probably is, but I can't see what else to leave out. I'm currently down to:

Ouch. Perhaps there is still an alternative. You can split the codecs up into two tests and treat them separately. The disadvantage, of course is that you can't really directly compare one set against another in a statistically proper way, but maybe that's the lesser of two evils when compared with listening to 8 codecs per session.

Quote

Sample selection next. Are there any killer/difficult samples for mp2? Preferably "normal" music - ie something that might actually make it onto the radio!

How about if you have people post or upload short samples of music they like and have heard (or think they might hear) over the radio? But I hope you have a broadband connection for getting those samples by email or by ftp or by Usenet.

ff123


Reply #35 – 2003-03-07 10:54:43

I'm a big Beatles fan - I'll try some CDs.

Please - do not email me clips of what you hear on the radio! I do have a broadband connection at work, but I can easily listen to the radio myself!

I was wondering if, like mp3, there were any samples that mp2 couldn't cope with at any bitrate. I might not use them, but it would be interesting to hear them!

So far I have harpsichord.

As for 8 samples being too many - I'll try them on some unsuspecting person - but I fear you're right, because you've done this plenty of times before. I tried it and I found it easy-ish, because I could discount two as being transparent, another as obviously being the worst, which "only" left 5 to rank. Your ABC/HR program makes it a lot easier (and less biassed) than ABXing them and then playing them in Winamp!

Maybe AAC will have to go, but that removes any relevance for those in the States.

Cheers,
David.


Reply #36 – 2003-03-11 13:42:08

If anyone is following this...

The Glockenspiel from the SQAM CD is another mp2 killer sample to go alongside the Harpsichord from that same CD.

However, it's not quite the "normal music" mp2 killer sample I was looking for! If anyone has any ideas, I'm all ears.

Cheers,
David.


Reply #37 – 2003-03-11 17:31:02

On ff123.net
Filburt's test sample, the first 30 seconds from Dave Matthews Band, "#41."

This is a fairly recent popular music recording with sharp hi-hat transients that could encourage pre-echo and high frequency problems, and it's natural music.

The fatboy codec killer (Fatboy Slim's opening to "Kalifornia", track 6 on You've Come A Long Way Baby) is also popular and recent, albeit a somewhat artificially generated sample.

Or the applause sample? Live concerts are going to be played on DAB.

I don't know which of these would give problems to .mp2

What are the criteria for the sample being a legitimate test of DAB? Popular music, recent music, natural sounds (as opposed to artificial)?

Regards,

DickD


Reply #38 – 2003-03-13 14:48:56

Thanks DickD!

Because some of the worst mp3 killers do nothing with mp2 (fatboy, castanets), I hadn't tried the others. But after your suggestion, it turns out that the applause sample is absolutely terrible!

I'll try and find a different, but equally challenging piece of applause, though I will use the "well known" one if I must.

I don't have strict criteria for test samples in mind - I'm just trying to avoid anything that would never be played on mainstream radio. When normal people or radio execs read the list of samples, they should think "yeah, I play/listen to that - if digital radio can't broadcast this properly, then we have a problem on our hands!" not "who'd listen to this?!"

Thanks for the help,

Cheers,
David.

P.S. where did applaud.wav come from? I have the file, but the link from the lame samples site "with information" doesn't have the information anymore! Which CD is it taken from?


Reply #39 – 2003-03-13 15:34:19

Quote

P.S. where did applaud.wav come from? I have the file, but the link from the lame samples site "with information" doesn't have the information anymore! Which CD is it taken from?

It came from the Eagles' live version of Hotel California.

On the 64 kbit/s test, I used a sample with applause in it:

http://ff123.net/samples/Layla.flac

ff123


Reply #40 – 2003-03-13 17:29:24

David,

I haven't encoded with mp2 since about 1997 (using whatever encoder I had in CoolEdit 96 at 256 kbps or so) before I found FhG's l3enc for .mp3

Surmising that the original MiniDisc encoding (before LP) was rather similar to MP2, I guessed that fatboy.wav would trash MP2 like it trashed MiniDisc (loads of added hiss, but fairly pleasant, analogue-sounding hiss, so I'm still content to use MiniDisc, which only occasionally trips up and not too badly on the stuff I mostly listen to). Glad I'm wrong, as the sort of artificial audio manipulation used on that track isn't all that uncommon in today's music.

Cheers,
DickD


Reply #41 – 2003-03-14 10:17:43

Thanks ff123 - I guess I'll have to buy it. Is it the 1989 CD? This is the only "Live" Eagles album at cdnow. I know someone who owns it - but they have the DTS 5.1 CD - it's very nice.

DickD - There are two types of codecs: sub-band and transform. mp2 uses a filter bank only, so it's a sub-band codec (like mpc); mp3 adds a DCT (Discrete Cosine Transform) to this, making it a transform codec. AAC and OGG don't have a filterbank, but keep the DCT (or similar?) transform, so they're pure transform codecs. MiniDisc ATRAC is a hybrid codec, which (I think) means it uses both; whether this is both in series like mp3, or both in parallel/switched somehow, I'd have to look at the MD website to find out. But whatever, it's a transform codec.

There were well known problems with sub-band codecs, and mp2 in particular. The biggest problem is that they won't work well below approx 192kbps, but there are also a few problem samples at higher bitrates. Transform codecs like mp3 (and ATRAC on MD) get the bitrate down, but the transform introduces a whole new set of problems - mainly on transients - especially tonal sounds made up from repeated transients.

Many of the "problem" clips for mp3 (and, to a lesser extent, ATRAC, OGG, and AAC) fall into this category: it's the time > frequency transform that causes the problem. These clips aren't a problem for mpc (and, to a lesser extent, mp2) because it doesn't contain a time > frequency transform.

Where the mp3 test clips are a "problem" due to Joint Stereo, they do trip up lower bitrate mp2. The "problem" mpc clips I have (from the development stage - they're not a problem anymore) are so subtle that, given the general performance of mp2 at lower bitrates, you'd hardly count it as a serious problem!

If anyone reading this is thinking "Why don't we just use mp2 at higher bitrates then" - well, mpc is like mp2 in some ways, and better in many others - so it's a great choice. If you're stuck with mp2 (like in DAB) then the simple answer to decent quality is to use the codec at the higher bitrates it was designed for. We've just got to convince the UK broadcasters of this!

Cheers,
David.


Reply #42 – 2003-03-14 16:32:17

Quote

Thanks ff123 - I guess I'll have to buy it. Is it the 1989 CD? This is the only "Live" Eagles album at cdnow. I know someone who owns it - but they have the DTS 5.1 CD - it's very nice.

Hmm, I don't know. Someone has an mp3 of it at work, and I happened to notice the applause section at the end.

I'm downloading a version right now. The funny thing is that it's different from the one at work (at least I think it is). But the applause at the end seems to be the same, starting from where the singer says "Thank you," then there's a whistling, and then there's somebody who makes a distinctive whoohoo cheer!

They don't just graft on stock applause, do they?

ff123

I'll check to make sure the one at work is really different. Maybe I'm just misremembering it.


Reply #43 – 2003-03-14 16:40:26

Quote

Thanks ff123 - I guess I'll have to buy it. Is it the 1989 CD? This is the only "Live" Eagles album at cdnow. I know someone who owns it - but they have the DTS 5.1 CD - it's very nice.

No, it's off the 1994 "Hell Freezes Over". It will probably not be marked as "live" because it also includes some new studio tracks.

(Just a question, I think I've missed something here - why do you need to buy the CD if you've got the sample already?)


Reply #44 – 2003-03-21 14:12:00

I have a problem...

I have to choose 1 mp2 encoder to use for most of the clips/bitrates. I have Qdesign, tooLame, and SoloH. The latter two have psychoacoustic models 1 and 2 (toolame has -1, 0, 3, and 4 as well!). Qdesign has only a quality/speed switch - the quality setting doesn't give better quality than the speed setting 100% of the time!!!

To my ears, where there's little stereo image, Qdesign and tooLame psychoacoustic model 1 (p1) come out on top. Where there's lots of stereo image, these two come out worst (especially Qdesign) - tooLame p2 is best. I'd be tempted to use tooLame p2 as representative, but it gives problems in some vocals, making it sound like the singer has a "frog in their throat". Mike Cheng (tooLame author) notes that it sounds like Davros (of Dr Who fame - never mind - you're all too young! )

My gut feeling is that it would be wrong and unrepresentative to pick the best encoder (+ encoder option) for each sample - since no one in a radio station will sit changing encoding parameters to match source material all day. So I need to pick the overall best. Can I reliably do this myself, or shall I run a pre-test test first to determine this?

It's even more difficult than taking a sample and comparing 6 encodes (3 encoders, 2 options each) - there are different bitrates, and at some bitrates, the choice of stereo or joint stereo is not obvious. An exhaustive pre-test test would be even bigger than the planned test itself!
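To put a rough number on "even bigger than the planned test", here is a sketch that enumerates the combinations. The encoder names come from the post; the generic option labels, the four bitrates (taken from the current codec list) and the assumption that both stereo modes get tried at every bitrate are illustrative guesses, so treat the count as an upper bound.

```python
from itertools import product

ENCODERS = ["Qdesign", "tooLame", "SoloH"]
OPTIONS = ["option 1", "option 2"]       # e.g. two psy models, or quality/speed
BITRATES_KBPS = [112, 128, 160, 192]     # assumed from the current codec list
STEREO_MODES = ["stereo", "joint stereo"]

# Every encoder/option/bitrate/stereo-mode combination for one sample.
combinations_per_sample = list(
    product(ENCODERS, OPTIONS, BITRATES_KBPS, STEREO_MODES)
)

print("encodes per sample at one bitrate, one stereo mode:",
      len(ENCODERS) * len(OPTIONS))
print("encodes per sample, exhaustive:", len(combinations_per_sample))
```

Compare that with the eight conditions per sample planned for the main test.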

I'm leaning towards tooLame p2 as the best all-round encoder, and just going with that - ideas? comments?

Cheers,
David.


Reply #45 – 2003-03-21 15:40:46

I'm surprised to hear that QDesign might not be clearly better than the ISO code.

I think that you should perhaps ask David McIntyre, who is (was) working for QDesign.
Some of us probably remember him from the old mp3.com message board days.


Reply #46 – 2003-03-21 17:02:27

Quote

It's even more difficult than taking a sample and comparing 6 encodes (3 encoders, 2 options each) - there are different bitrates, and at some bitrates, the choice of stereo or joint stereo is not obvious. An exhaustive pre-test test would be even bigger than the planned test itself!

I'm leaning towards tooLame p2 as the best all-round encoder, and just going with that - ideas? comments?

So, in real life, the radio station would first choose their bitrate, and then choose the mp2 encoder they're going to use, based on which one sounds best at that bitrate?

Why am I somewhat skeptical that it actually works this way?

The pre-test is meant to save you time, not to add time! Maybe the best advice is to use your best judgment. However, short pretests wouldn't be prohibitive.

ff123


Reply #47 – 2003-03-24 09:48:32

Quote

So, in real life, the radio station would first choose their bitrate, and then choose the mp2 encoder they're going to use, based on which one sounds best at that bitrate?

Why am I somewhat skeptical that it actually works this way?

Well, listening to the output, the radio stations probably bought the first (i.e. oldest) encoders, left them set to psy model 1 (despite psy model 2 being better on most material), and now run them at 128kbps.

I'm half tempted to do the same in my test!

However, I want a test which accurately reflects the possibilities on DAB, not just its current limitations. So, if it would honestly be better to buy a new encoder rather than increase the bitrates, then so be it. (I would imagine it would be best to do both, but...)

I'll just go with my best judgement on this, since it's probably as good as that of most engineers in the field - at least in the area of audio coding - I don't claim to share their knowledge in other areas.

Cheers,
David.


Reply #48 – 2003-04-26 04:30:07

Quote

Quite OT: hey, do you think that OpenDRM (based on Vorbis) could be useful ?

Where did you see about this?

I searched for OpenDRM on Google, and all I found was info about Open Digital Rights Management. :B
