
Topic: New Listening Test (Read 105862 times)

New Listening Test

Reply #125
Quote
And Dzamburu, there is no such thing as Vorbis 2.
I mean this aoTuV Vorbis, it is amazing at 48 kbps. Did you try WMA 10? I can force MP10 to 48 kbps; I have the latest Vista build, 5342 or something like that.

New Listening Test

Reply #126
Suppose there are two samples: one of very bad quality deserving 1 point, and another even worse deserving 0.5, but the minimum mark is 1.

So a 0-100 mark scale would be more appropriate.

New Listening Test

Reply #127
Quote
Suppose there are two samples: one of very bad quality deserving 1 point, and another even worse deserving 0.5, but the minimum mark is 1.

So a 0-100 mark scale would be more appropriate.


Well, isn't this the reason we have a low anchor? The low anchor gets the worst score and the other samples are ranked relative to it. So, if the low anchor gets 1, the others get 1.3, for example. If one sample sounds even worse than the low anchor, the low anchor can get 1.5 and that sample 1.

New Listening Test

Reply #128
I agree with Sebastian that a scale of 0-100 is not really justifiable and seems too granular. When a person ranks a sample as 73, he should be able to explain why he didn't rank it 72 or 74 instead. I can only speak for myself of course, but I certainly wouldn't be able to do this.
"We cannot win against obsession. They care, we don't. They win."

New Listening Test

Reply #129
I think that a 0-100 range would be appropriate if we expect big impairments.
In this context, you are likely to use only the lower part of the scale for most contenders.
We will perhaps encounter one good contender, ranked at 70% of the scale, and several "less good" ones ranked between 25% and 50% of the scale, with the low anchor ranked at about 10%.
In that case, you only have about 25% of the dynamic range to rank most contenders. To me, a 0-100 scale makes sense.

0-100 scale and tools:
Right now we have a 40-step scale (1-5 with 0.1 resolution). The 0-100 scale is more a matter of visual presentation to the user than a real granularity matter.
I think a modified testing tool could display a 0-100 scale to the user but output results in the 1-5 range in the results file, keeping tool modifications to a minimum.
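A minimal sketch of such a display-vs-storage split (my own illustration, not code from ABC/HR), assuming a linear mapping between the 0-100 slider and the 1-5 result range:

```python
def display_to_result(display_score: int) -> float:
    """Map a 0-100 slider value to the 1.0-5.0 range used in result files."""
    if not 0 <= display_score <= 100:
        raise ValueError("display score must be between 0 and 100")
    return round(1.0 + display_score * 4.0 / 100.0, 2)

def result_to_display(result_score: float) -> int:
    """Inverse mapping, for loading saved results back into the UI."""
    return round((result_score - 1.0) * 100.0 / 4.0)
```

With this split, the stored files keep their existing 1-5 format and only the slider widget changes.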

New Listening Test

Reply #130
When a person ranks a sample as 73, he should be able to explain why he didn't rank it 72 or 74 instead.


The same applies to the 1-5 mark scale. Why choose 3.7 and not 3.6 or 3.8? Sometimes I wanted to put something like 3.75.

The idea of my previous post is that a 0-100 scale has a higher "definition" (resolution). Personally, I would feel more comfortable/free marking samples this way.

Sebastian Mares: hm, I didn't think about the low anchor at that moment. But a 0-100 scale has a resolution of 0.01 of the full range, while 1-5 in 40 steps has a resolution of 0.025.
In the previous 48 kbps HE-AAC test, CT and Nero were on par. With a resolution of 0.025, CT aacPlus would have 3.18 × 102.5% ≈ 3.26; with 0.01, 3.18 × 101% ≈ 3.21.
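The resolution comparison above can be checked with a quick sketch of my own (not from any of the thread's tools): snapping a rating to the nearest step of each scale shows what happens to an intended mark of 3.75.

```python
def quantize(score: float, lo: float, hi: float, steps: int) -> float:
    """Snap `score` to the nearest of `steps` equal divisions of [lo, hi]."""
    frac = (score - lo) / (hi - lo)               # position on the scale, 0..1
    k = min(steps, max(0, round(frac * steps)))   # nearest step index
    return lo + k * (hi - lo) / steps

# A 1-5 scale in 0.1 steps has 40 steps (2.5% resolution); a 0-100 integer
# scale has 100 steps (1% resolution). An intended mark of 3.75 survives on
# the finer scale (as 69/100, i.e. about 3.76) but snaps to 3.8 on the
# coarser one.
```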

New Listening Test

Reply #131
OK, just so the discussion is not needlessly simplified by lack of suitable software, I just put support for custom rating scales into ABC/HR for Java: Binaries, Sources

New Listening Test

Reply #132
You do quite the kick-azz job, schnofler.  Way to go!





New Listening Test

Reply #137
Thanks. Do I have to enter the additional offset before calculating the offsets, after, or doesn't it matter?

Doesn't matter. The additional offset is simply added to the individual sample offsets.

New Listening Test

Reply #138
OK, thanks schnofler!

As for the new WMA codec, after several e-mails and newsgroup/forum posts, I was told that I have to check with Microsoft's licensing department. However, I doubt that they are going to give me permission in time (or at all). There are two options now:
  • Wait until the public beta test of Windows Vista begins, in the hope that Microsoft will also change the EULA to allow results to be posted
  • Test the old WMA codec

Personally, I would go with the second option. However, I am not sure the test would be as interesting.

The public beta test is scheduled to start in May.

New Listening Test

Reply #139
Damn, I don't know, because WMA 9 doesn't belong in the group of low-bitrate encoders and has terrible quality at 48 kbps. If you go with the second option, then let the test begin; personally, I would like to see that new Microsoft beast.

New Listening Test

Reply #140
If you intend for this to be the last 48kbps test for a while then I'd say wait for the public beta.



New Listening Test

Reply #143
Small update: Microsoft is discussing my request internally and someone will be back in contact with me soon.


Wasn't MS going to start an open beta test for WMA 10 as early as May?

New Listening Test

Reply #144
Nobody ever mentioned an open beta test of WMA 10. The only open beta test planned for May is for Windows Vista, but I am not sure if they are also going to change the EULA to allow publishing of benchmark results and that sort of thing.

New Listening Test

Reply #145
There is a small chance that Microsoft will provide you with a stand-alone app, in which case Vista's EULA would not matter.
Let's wait for MS's decision, then discuss.

 

New Listening Test

Reply #147
Today's new major update to Nero's AAC codec should be relevant to this listening test.

Also, Sebastian, any love from Microsoft?


New Listening Test

Reply #149
Quote

I agree with you, range 0-100 is silly.


It is the range standardized by the ITU in the ITU-R BS.1534 recommendation for testing audio signals with large impairments, such as codecs at 48 kbps.



I don't see the point in the MUSHRA test, to be honest. It seems to be more of an excuse to allow mediocre-performing codec/bit rate combinations to get high scores. If a codec/bit rate combination does cause a large impairment, then I think it should be scored as a large impairment on the original BS.1116 test. At the end of the day, reduced bit rate audio coding should be about trying to get as close to the original as possible, IMO, not letting codecs off just because they're using a low bit rate.

And if you change the scale from 0-5 to 0-100 you run the risk of people confusing results from a BS.1116 test with a MUSHRA test, which has a totally different scale:

MUSHRA (EBU Tech Review article about MUSHRA):

80 - 100 Excellent
60 - 80 Good
40 - 60 Fair
20 - 40 Poor
0  - 20 Bad

which is not the same as the BS.1116 scale. And the EBU article says:

"MUSHRA uses an unprocessed original programme material of full bandwidth as the reference signal. In addition, at least one additional signal (anchor) – being a low-pass filtered version of the unprocessed signal – should be used. The bandwidth of this additional signal should be 3.5 kHz."

So it's not really surprising that you get very high scores for mediocre performance when you've got such an incredibly low quality anchor.

For example, CT 48kbps HE AAC scored 88 in a MUSHRA test (if I remember correctly), whereas in the recent test on HA it only got somewhere in the region of 3.5 out of 5, and such a big difference has got to be down to the difference in the testing process.

I think you need to stick with the 0-5 scale using the BS.1116 test, as you've always used, or if you're going to change the scale to 0-100 you should fully implement the MUSHRA testing methodology, but not mix the two tests together.
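As a footnote, the MUSHRA grade bands quoted above are easy to express as a lookup. A minimal sketch of my own (the band edges come from the EBU table; the function itself is hypothetical):

```python
def mushra_label(score: float) -> str:
    """Return the MUSHRA quality label for a 0-100 score (EBU band edges)."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores run from 0 to 100")
    for lower, label in [(80, "Excellent"), (60, "Good"), (40, "Fair"), (20, "Poor")]:
        if score >= lower:
            return label
    return "Bad"
```

By this table, the CT 48 kbps HE-AAC result of 88 mentioned above lands in the "Excellent" band, which is exactly the kind of inflated grade the post argues a 3.5 kHz low-pass anchor produces.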