
Topic: New Listening Test (Read 83837 times)

  • Dzamburu
  • [*][*]
  • Banned
New Listening Test
Reply #125
Quote
And Dzamburu, there is no such thing as Vorbis 2.
I meant this aoTuV Vorbis; it is amazing at 48 kbps. Did you try WMA 10? I ask because I can force MP10 to 48 kbps. I have the latest Vista build, 5342 or something like that.

  • IgorC
  • [*][*][*][*][*]
New Listening Test
Reply #126
Suppose there are 2 samples: one of very bad quality that deserves 1 point, and another that is even worse and deserves 0.5 points, but the minimum mark is 1.

So a 0-100 mark scale would be more appropriate.

New Listening Test
Reply #127
Suppose there are 2 samples: one of very bad quality that deserves 1 point, and another that is even worse and deserves 0.5 points, but the minimum mark is 1.

So a 0-100 mark scale would be more appropriate.


Well, isn't this the reason why we have a low anchor? The low anchor gets the worst score and the other samples are ranked relative to it. So, if the low anchor gets 1, the others get 1.3, for example. If one sample sounds even worse than the low anchor, the low anchor can get 1.5 and that sample 1.

  • stephanV
  • [*][*][*][*]
New Listening Test
Reply #128
I agree with Sebastian that a scale of 0-100 is not really justifiable and seems too granular. When a person ranks a sample as 73, he should be able to explain why he didn't rank it 72 or 74 instead. I can only speak for myself of course, but I certainly wouldn't be able to do this.
"We cannot win against obsession. They care, we don't. They win."

  • Gabriel
  • [*][*][*][*][*]
  • Developer
New Listening Test
Reply #129
I think that a 0-100 range would be appropriate if we expect big impairments.
In this context you are likely to use only the lower part of the scale for most contenders.
We will perhaps encounter 1 good contender that would be ranked at 70% of the scale, and several "less good" ones that would be ranked between 25 and 50% of the scale, with the low anchor being ranked at about 10%.
In such a case, you only have about 25% of the dynamic range to rank most contenders. To me, a 0-100 scale makes sense.

0-100 scale and tools:
Right now we have a 40-point scale (1-5 with a .1 resolution). The 0-100 scale is more a matter of visual presentation to the user than a real granularity matter.
I think that a modified testing tool could display a 0-100 scale to the user but write results in the 1-5 range to the results file, which would keep tool modifications to a minimum.
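A minimal sketch of the display-versus-storage idea, assuming a plain linear mapping between the two ranges. The function names and the choice to keep two decimals in the stored value are illustrative assumptions, not how any existing ABC/HR tool works.

```python
def display_to_result(display_score: float) -> float:
    """Map a score shown on a 0-100 scale to the 1-5 range used in result files.

    Assumes a linear mapping (0 -> 1.0, 100 -> 5.0). Two decimals are kept
    so that each integer display step (0.04 on the 1-5 scale) stays distinct.
    """
    if not 0 <= display_score <= 100:
        raise ValueError("display score must be within 0-100")
    return round(1.0 + display_score * 4.0 / 100.0, 2)


def result_to_display(result_score: float) -> float:
    """Inverse mapping: a stored 1-5 score back to the 0-100 display scale."""
    if not 1 <= result_score <= 5:
        raise ValueError("result score must be within 1-5")
    return round((result_score - 1.0) * 100.0 / 4.0, 1)


if __name__ == "__main__":
    # Hypothetical display scores, loosely following the percentages above.
    for shown in (10, 25, 50, 70):
        print(shown, "->", display_to_result(shown))
```

With this mapping, existing analysis scripts that expect 1-5 values would not need to change; only the slider shown to the listener would.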

  • IgorC
  • [*][*][*][*][*]
New Listening Test
Reply #130
When a person ranks a sample as 73, he should be able to explain why he didn't rank it 72 or 74 instead.


The same goes for the 1-5 mark scale. Why choose 3.7 and not 3.6 or 3.8? Sometimes I wanted to put something like 3.75.

The idea of my previous post is that a 0-100 scale has higher "definition" (range). Personally, I would feel more comfortable and free marking samples this way.

Sebastian Mares: hm, I didn't think about the low anchor at that moment. But a 0-100 scale has an error of 0.01, while the 1-5 scale with 40 steps has an error of 0.025.
In the previous 48 kbps HE-AAC test, CT and Nero were on par. With an error of 0.025, CT aac+ would have 3.18*102.5% = 3.26; with an error of 0.01, 3.18*101% = 3.21.
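The arithmetic in this post can be reproduced with a short sketch. It treats the "error" as one rating step expressed as a fraction of the full scale range, which is an interpretation of the post rather than a standard error estimate; the function name is hypothetical.

```python
def relative_step(low: float, high: float, resolution: float) -> float:
    """Smallest rating step as a fraction of the full scale range."""
    return resolution / (high - low)


# 0-100 scale in steps of 1: 100 steps, so each step is 1% of the range.
step_0_100 = relative_step(0, 100, 1)
# 1-5 scale in steps of 0.1: 40 steps, so each step is 2.5% of the range.
step_1_5 = relative_step(1, 5, 0.1)

score = 3.18  # score quoted in the post for CT aac+

# Score shifted upward by one relative step on each scale.
shifted_1_5 = round(score * (1 + step_1_5), 2)      # 3.18 * 1.025 = 3.2595
shifted_0_100 = round(score * (1 + step_0_100), 2)  # 3.18 * 1.01  = 3.2118

print(shifted_1_5, shifted_0_100)  # prints "3.26 3.21"
```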
  • Last Edit: 13 April, 2006, 09:54:34 AM by IgorC

  • schnofler
  • [*][*][*]
  • Developer
New Listening Test
Reply #131
OK, just so the discussion is not needlessly simplified by lack of suitable software, I just put support for custom rating scales into ABC/HR for Java: Binaries, Sources

  • Shade[ST]
  • [*][*][*][*][*]
New Listening Test
Reply #132
You do quite the kick-azz job, schnofler.  Way to go!

  • Gabriel
  • [*][*][*][*][*]
  • Developer
New Listening Test
Reply #133
OK, just so the discussion is not needlessly simplified by lack of suitable software, I just put support for custom rating scales into ABC/HR for Java: Binaries, Sources

Thank you very much.

New Listening Test
Reply #134
Any chance you could add an option for adding 2000 to the calculated offsets? Currently, you have to calculate the offsets and then go through each sample and manually type in the offset.

  • schnofler
  • [*][*][*]
  • Developer
New Listening Test
Reply #135
Any chance you could add an option for adding 2000 to the calculated offsets? Currently, you have to calculate the offsets and then go through each sample and manually type in the offset.

Here you go: Binaries, Sources

New Listening Test
Reply #136
Thanks. Do I have to enter the additional offset before calculating the offsets, after, or doesn't it even matter?
  • Last Edit: 14 April, 2006, 09:59:01 AM by Sebastian Mares

  • schnofler
  • [*][*][*]
  • Developer
New Listening Test
Reply #137
Thanks. Do I have to enter the additional offset before calculating the offsets, after, or doesn't it even matter?

Doesn't matter. The additional offset is simply added to the individual sample offsets.
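As an illustration only, the behaviour described here amounts to the following. The function name and the offset values are hypothetical and are not taken from the ABC/HR for Java sources.

```python
def apply_additional_offset(sample_offsets, additional_offset):
    """Return new per-sample offsets with one extra offset added to each."""
    return [offset + additional_offset for offset in sample_offsets]


if __name__ == "__main__":
    calculated = [0, 1152, 2257]  # hypothetical per-sample offsets
    print(apply_additional_offset(calculated, 2000))
```

Because the extra value is a plain addition on top of whatever the per-sample offsets are, entering it before or after the offsets are calculated gives the same final values.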

New Listening Test
Reply #138
OK, thanks schnofler!

As for the new WMA codec, after several e-mails and newsgroup / forum posts I was told that I have to check with the licensing department of Microsoft. However, I doubt that they are going to give me permission in time (or at all). There are two things that are possible now:
  • Wait until the public beta test of Windows Vista begins, in the hope that Microsoft will also change the EULA to allow results to be posted
  • Test the old WMA codec

Personally, I would go with the second option. However, I am not sure the test would be as interesting.

The public beta test is scheduled to start in May.

  • Dzamburu
  • [*][*]
  • Banned
New Listening Test
Reply #139
Damn, I don't know, because WMA9 doesn't belong in the group of low-bitrate encoders and has terrible quality at 48 kbps. If you go with the second option, then let the test begin. Personally, I would like to see that new Microsoft beast.

  • SwiftBiscuit
  • [*]
  • Members (Donating)
New Listening Test
Reply #140
If you intend for this to be the last 48kbps test for a while then I'd say wait for the public beta.

New Listening Test
Reply #141
Well, I'm not sure what I intend. Unless the big players HE-AAC and Vorbis see major improvements, it's going to be the last test at 48 kbps for at least a year.

New Listening Test
Reply #142
Small update: Microsoft is discussing my request internally and someone will be back in contact with me soon.

  • IgorC
  • [*][*][*][*][*]
New Listening Test
Reply #143
Small update: Microsoft is discussing my request internally and someone will be back in contact with me soon.


Wasn't MS going to release an open beta of WMA 10 already in May?

New Listening Test
Reply #144
Nobody ever mentioned an open beta test of WMA 10. The only open beta test that is planned for May is for Windows Vista, but I am not sure if they are also going to change the EULA to allow publishing of benchmark results and that sort of thing.

  • SirGrey
  • [*][*][*]
New Listening Test
Reply #145
There is a small chance that Microsoft will provide you with a stand-alone app, in which case Vista's EULA does not matter.
Let's wait for MS's decision, then discuss.

New Listening Test
Reply #146
Yeah, I hope so too.

  • Supacon
  • [*][*][*][*][*]
  • Members (Donating)
New Listening Test
Reply #147
Today's new major update to Nero's AAC codec should be relevant to this listening test.

Also, Sebastian, any love from Microsoft?

New Listening Test
Reply #148
Nope.

I also have problems getting the new April build to run under VMware - it keeps crashing with a BSOD. I have to squeeze my Ubuntu partition a bit.

New Listening Test
Reply #149
Quote

I agree with you, range 0-100 is silly.


It is the range standardized in ITU for the ITU-R BS.1534 recommendation - testing of audio signals with large impairment, such as codecs at 48 kbps.



I don't see the point in the MUSHRA test, to be honest. It seems to be more of an excuse to allow mediocre-performing codec/bit rate combinations to get high scores. If a codec/bit rate combination does cause a large impairment, then I think it should be scored as a large impairment on the original BS.1116 test. At the end of the day, reduced bit rate audio coding should be about trying to get as close to the original as possible, IMO, not letting codecs off just because they're using a low bit rate.

And if you change the scale from 1-5 to 0-100 you run the risk of people confusing results from a BS.1116 test with those from a MUSHRA test, which has a totally different scale:

MUSHRA (EBU Tech Review article about MUSHRA):

80 - 100 Excellent
60 - 80 Good
40 - 60 Fair
20 - 40 Poor
0  - 20 Bad

which is not the same as the BS.1116 scale. And the EBU article says:

"MUSHRA uses an unprocessed original programme material of full bandwidth as the reference signal. In addition, at least one additional signal (anchor) – being a low-pass filtered version of the unprocessed signal – should be used. The bandwidth of this additional signal should be 3.5 kHz."

So it's not really surprising that you get very high scores for mediocre performance when you've got such an incredibly low quality anchor.

For example, CT 48kbps HE AAC scored 88 in a MUSHRA test (if I remember correctly), whereas in the recent test on HA it only got somewhere in the region of 3.5 out of 5, and such a big difference has got to be down to the difference in the testing process.
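To make the size of that gap concrete, here is a rough sketch that looks up the MUSHRA quality band for a score and naively rescales a BS.1116-style grade onto 0-100. The band boundaries come from the list above (treating each lower bound as inclusive is an assumption), the 1-5 range is the one used earlier in this thread, and the linear rescaling is purely illustrative: the two methodologies are not equivalent, which is the point being made.

```python
# (lower bound, label) pairs from the MUSHRA scale listed above.
MUSHRA_BANDS = [
    (80, "Excellent"),
    (60, "Good"),
    (40, "Fair"),
    (20, "Poor"),
    (0, "Bad"),
]


def mushra_label(score: float) -> str:
    """Return the MUSHRA quality band for a 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores lie within 0-100")
    for lower_bound, label in MUSHRA_BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: the last band starts at 0")


def naive_rescale_to_100(grade: float) -> float:
    """Linearly stretch a 1-5 grade onto 0-100 (illustration only)."""
    return (grade - 1.0) * 100.0 / 4.0


if __name__ == "__main__":
    rescaled = naive_rescale_to_100(3.5)   # 62.5
    print(rescaled, mushra_label(rescaled))  # falls in the "Good" band
    print(88, mushra_label(88))              # falls in the "Excellent" band
```

Even under this generous linear stretch, 3.5 out of 5 lands at 62.5, well short of the 88 recalled for the MUSHRA test.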

I think you need to stick with the 1-5 scale using the BS.1116 test, as you've always used, or if you're going to change the scale to 0-100 you should fully implement the MUSHRA testing methodology, but not mix the two tests together.
  • Last Edit: 02 May, 2006, 10:03:45 AM by digitalradiotech