
Topic: New Listening Test (Read 105862 times)

New Listening Test

Reply #125
Quote
And Dzamburu, there is no such thing as Vorbis 2.
I mean this aoTuV Vorbis, it is amazing at 48 kbps. Did you try WMA 10? I can force MP10 to 48 kbps; I have the latest Vista build, 5342 or something like that.

New Listening Test

Reply #126
Suppose there are two samples: one of very bad quality deserving 1 point, and another even worse deserving 0.5, but the minimum mark is 1.

So a 0-100 mark scale would be more appropriate.

New Listening Test

Reply #127
Quote
Suppose there are two samples: one of very bad quality deserving 1 point, and another even worse deserving 0.5, but the minimum mark is 1.

So a 0-100 mark scale would be more appropriate.


Well, isn't this the reason we have a low anchor? The low anchor gets the worst score and the other samples are ranked relative to it. So, if the low anchor gets 1, the others get 1.3, for example. If one sample sounds even worse than the low anchor, the low anchor can get 1.5 and that sample 1.

New Listening Test

Reply #128
I agree with Sebastian that a scale of 0-100 is not really justifiable and seems too granular. When a person ranks a sample as 73, he should be able to explain why he didn't rank it 72 or 74 instead. I can only speak for myself of course, but I certainly wouldn't be able to do this.
"We cannot win against obsession. They care, we don't. They win."

New Listening Test

Reply #129
I think that a 0-100 range would be appropriate if we expect big impairments.
In this context, you are likely to use only the lower part of the scale for most contenders.
We will perhaps encounter one good contender, ranked at 70% of the scale, and several "less good" ones ranked between 25% and 50% of the scale, with the low anchor ranked at about 10%.
In that case, you only have about 25% of the dynamic range to rank most contenders. To me, a 0-100 scale makes sense.

0-100 scale and tools:
Right now we have a 40-step scale (1-5 with 0.1 resolution). The 0-100 scale is more a matter of visual presentation to the user than a real granularity matter.
I think a modified testing tool could display a 0-100 scale to the user but output results in the 1-5 range in the results file, keeping tool modifications to a minimum.
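A minimal sketch of such a display-vs-storage split (my own illustration, not code from ABC/HR), assuming a linear mapping between the 0-100 slider and the 1-5 result range:

```python
def display_to_result(display_score: int) -> float:
    """Map a 0-100 slider value to the 1.0-5.0 range used in result files."""
    if not 0 <= display_score <= 100:
        raise ValueError("display score must be between 0 and 100")
    return round(1.0 + display_score * 4.0 / 100.0, 2)

def result_to_display(result_score: float) -> int:
    """Inverse mapping, for loading saved results back into the UI."""
    return round((result_score - 1.0) * 100.0 / 4.0)
```

With this split, the stored files keep their existing 1-5 format and only the slider widget changes.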

New Listening Test

Reply #130
When a person ranks a sample as 73, he should be able to explain why he didn't rank it 72 or 74 instead.


The same applies to the 1-5 mark scale. Why choose 3.7 and not 3.6 or 3.8? Sometimes I wanted to put something like 3.75.

The idea of my previous post is that a 0-100 scale has a higher "definition" (resolution). Personally, I would feel more comfortable/free marking samples this way.

Sebastian Mares: hm, I didn't think about the low anchor at that moment. But a 0-100 scale has a resolution of 0.01 of the full range, while 1-5 in 40 steps has a resolution of 0.025.
In the previous 48 kbps HE-AAC test, CT and Nero were on par. With a resolution of 0.025, CT aacPlus would have 3.18 × 102.5% ≈ 3.26; with 0.01, 3.18 × 101% ≈ 3.21.
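The resolution comparison above can be checked with a quick sketch of my own (not from any of the thread's tools): snapping a rating to the nearest step of each scale shows what happens to an intended mark of 3.75.

```python
def quantize(score: float, lo: float, hi: float, steps: int) -> float:
    """Snap `score` to the nearest of `steps` equal divisions of [lo, hi]."""
    frac = (score - lo) / (hi - lo)               # position on the scale, 0..1
    k = min(steps, max(0, round(frac * steps)))   # nearest step index
    return lo + k * (hi - lo) / steps

# A 1-5 scale in 0.1 steps has 40 steps (2.5% resolution); a 0-100 integer
# scale has 100 steps (1% resolution). An intended mark of 3.75 survives on
# the finer scale (as 69/100, i.e. about 3.76) but snaps to 3.8 on the
# coarser one.
```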

New Listening Test

Reply #131
OK, just so the discussion is not needlessly simplified by lack of suitable software, I just put support for custom rating scales into ABC/HR for Java: Binaries, Sources

New Listening Test

Reply #132
You do quite the kick-azz job, schnofler.  Way to go!





New Listening Test

Reply #137
Thanks. Do I have to enter the additional offset before calculating the offsets, after, or doesn't it matter?

Doesn't matter. The additional offset is simply added to the individual sample offsets.

New Listening Test

Reply #138
OK, thanks schnofler!

As for the new WMA codec, after several e-mails and newsgroup/forum posts, I was told that I have to check with Microsoft's licensing department. However, I doubt that they are going to give me permission in time (or at all). There are two options now:
  • Wait until the public beta test of Windows Vista begins, in the hope that Microsoft will also change the EULA to allow results to be posted
  • Test the old WMA codec

Personally, I would go with the second option. However, I am not sure the test would be as interesting.

The public beta test is scheduled to start in May.

New Listening Test

Reply #139
Damn, I don't know, because WMA 9 doesn't belong in the group of low-bitrate encoders and has terrible quality at 48 kbps. If you go with the second option, then let the test begin; personally, I would like to see that new Microsoft beast.

New Listening Test

Reply #140
If you intend for this to be the last 48kbps test for a while then I'd say wait for the public beta.



New Listening Test

Reply #143
Small update: Microsoft is discussing my request internally and someone will be back in contact with me soon.


Wasn't MS going to start an open beta test for WMA 10 as early as May?

New Listening Test

Reply #144
Nobody ever mentioned an open beta test of WMA 10. The only open beta test planned for May is for Windows Vista, but I am not sure if they are also going to change the EULA to allow publishing of benchmark results and that sort of thing.

New Listening Test

Reply #145
There is a small chance that Microsoft will provide you with a stand-alone app, in which case Vista's EULA would not matter.
Let's wait for MS's decision, then discuss.

 

New Listening Test

Reply #147
Today's new major update to Nero's AAC codec should be relevant to this listening test.

Also, Sebastian, any love from Microsoft?


New Listening Test

Reply #149
Quote

I agree with you, range 0-100 is silly.


It is the range standardized by the ITU in the ITU-R BS.1534 recommendation for testing audio signals with large impairments, such as codecs at 48 kbps.



I don't see the point in the MUSHRA test, to be honest. It seems to be more of an excuse to allow mediocre-performing codec/bit rate combinations to get high scores. If a codec/bit rate combination does cause a large impairment, then I think it should be scored as a large impairment on the original BS.1116 test. At the end of the day, reduced bit rate audio coding should be about trying to get as close to the original as possible, IMO, not letting codecs off just because they're using a low bit rate.

And if you change the scale from 0-5 to 0-100 you run the risk of people confusing results from a BS.1116 test with a MUSHRA test, which has a totally different scale:

MUSHRA (EBU Tech Review article about MUSHRA):

80 - 100 Excellent
60 - 80 Good
40 - 60 Fair
20 - 40 Poor
0  - 20 Bad

which is not the same as the BS.1116 scale. And the EBU article says:

"MUSHRA uses an unprocessed original programme material of full bandwidth as the reference signal. In addition, at least one additional signal (anchor) – being a low-pass filtered version of the unprocessed signal – should be used. The bandwidth of this additional signal should be 3.5 kHz."

So it's not really surprising that you get very high scores for mediocre performance when you've got such an incredibly low quality anchor.

For example, CT 48kbps HE AAC scored 88 in a MUSHRA test (if I remember correctly), whereas in the recent test on HA it only got somewhere in the region of 3.5 out of 5, and such a big difference has got to be down to the difference in the testing process.

I think you need to stick with the 0-5 scale using the BS.1116 test, as you've always used, or if you're going to change the scale to 0-100 you should fully implement the MUSHRA testing methodology, but not mix the two tests together.
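As a footnote, the MUSHRA grade bands quoted above are easy to express as a lookup. A minimal sketch of my own (the band edges come from the EBU table; the function itself is hypothetical):

```python
def mushra_label(score: float) -> str:
    """Return the MUSHRA quality label for a 0-100 score (EBU band edges)."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores run from 0 to 100")
    for lower, label in [(80, "Excellent"), (60, "Good"), (40, "Fair"), (20, "Poor")]:
        if score >= lower:
            return label
    return "Bad"
```

By this table, the CT 48 kbps HE-AAC result of 88 mentioned above lands in the "Excellent" band, which is exactly the kind of inflated grade the post argues a 3.5 kHz low-pass anchor produces.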