
Topic: Listening test using 2013-03-09 build
  • RobertM
Listening test using 2013-03-09 build
I completed a listening test against Opus files encoded with the latest build (as of 2013-03-09). This time I've actually been more thorough - ABX test results from foobar2000 are attached along with the Opus-encoded files. I also took azaqiel's advice and updated the version reported by the encoder, to prevent any confusion.

"Sample 01" from the page below was used for the test. I may repeat the test later with other difficult samples.
http://people.xiph.org/~greg/opus/ha2011/


Summary:

Results were very much as expected: Opus quality has definitely improved over time and gets closer to transparency at higher bitrates.

1. 64kb/s from the above page (old Opus version) vs 64kb/s from the newest Opus version

There was a noticeable improvement in quality with the new Opus version

2. 64kb/s vs original

It was fairly easy to tell the difference, but the quality was still quite good

3. 96 kb/s vs original

I could still tell the difference, but the artifacts were noticeably reduced compared to the 64kb/s file

4. 128 kb/s vs original

I can still hear a very subtle artifact introduced by the codec (on the note between 2.155 and 2.423 seconds), but I had to strain to hear it.

5. 256 kb/s vs original

Very close to transparent. I managed to tell the difference sometimes by listening very hard for the artifact. However, my ability to tell the two apart was far from perfect.

6. 500 kb/s vs original

This was transparent to me.

  • zerowalker
Listening test using 2013-03-09 build
Reply #1
Isn't that pretty bad, not being able to reach transparency at 256kbps?
Or is this some kind of super killer sound we are talking about?

Because I think that Vorbis and AAC can pretty much reach transparency at 196-256 most of the time, though I am not some kind of master at this.
  • Last Edit: 10 March, 2013, 05:51:58 PM by db1989

  • db1989
  • Global Moderator
Listening test using 2013-03-09 build
Reply #2
Yes, it was a sample that is known to be difficult to encode, not just everyday music, as you would know if you had followed the link and read the description.

Another thing you would know if that were true is that Opus was the highest rated codec in the test overall.

Neither of these things require being “some kind of master”, just the simplest kind of research before posting.

  • saratoga
Listening test using 2013-03-09 build
Reply #3
Isn't that pretty bad, not being able to reach transparency at 256kbps?
Or is this some kind of super killer sound we are talking about?


It's one of the lowest-scored samples in that test, so evidently it's a very difficult sample for current Opus encoders.

  • zerowalker
Listening test using 2013-03-09 build
Reply #4
Yes, it was a sample that is known to be difficult to encode, not just everyday music, as you would know if you had followed the link and read the description.

Another thing you would know if that were true is that Opus was the highest rated codec in the test overall.

Neither of these things require being “some kind of master”, just the simplest kind of research before posting.


Ah well that explains it:)

Well, what I meant by "master" was more that I myself can't distinguish artifacts easily. I can feel that 128kbps MP3 is much "weaker" than 196+, but I can't really say at which point in time I hear an artifact compared to the other codec, unless it's a very easy one of course.

But yeah, my bad for not following the link; the kbps figures and results got the best of me, and I was a bit disappointed at first. Sorry for that.

  • IgorC
Listening test using 2013-03-09 build
Reply #5
RobertM,

Let me comment on two things. First, one sample isn't representative enough to conclude whether there was an improvement; the quality/quantity ratio starts to work out from about 10 samples. Second, this particular sample, like all the others, was quickly adopted by developers for tuning Opus almost 2 years ago, so it's not surprising that the latest Opus 1.1a did better on it.

Anyway it's a nice start.

P.S. It's more useful to perform tests on two samples with 7/7 each than on one sample at 14/14. The probability of guessing 7 trials correctly is already less than 1%. Personally I test on 20 samples or so with 5/5 trials (3.2%) when I'm not sure about perceived differences.
  • Last Edit: 10 March, 2013, 07:05:00 PM by IgorC
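IgorC's guessing probabilities follow directly from the binomial distribution: under the null hypothesis, each ABX trial is a fair coin flip. A quick sketch of the arithmetic behind those numbers (the function name is ours, not taken from any ABX tool):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` ABX
    trials by pure guessing (each trial is a 50/50 coin flip)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(7, 7))    # 1/128 ~ 0.78%, the "less than 1%" quoted
print(abx_p_value(5, 5))    # 1/32 ~ 3.1%, the ~3.2% quoted
print(abx_p_value(14, 14))  # 1/16384, far stronger than needed
```

Two independent 7/7 runs on different samples give the same combined p-value as a single 14/14 run, but also tell you something about two samples instead of one, which is IgorC's point.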

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #6
Its one of the lowest scored samples in that test, so evidently its a very difficult sample for current Opus encoders.


Well, it was the lowest Opus score for 1.0. The new version has significant improvements on that sample.

  • eahm
Listening test using 2013-03-09 build
Reply #7
Is there a Windows build compiled from 2013-03-09?

  • wswartzendruber
Listening test using 2013-03-09 build
Reply #8
Is there a place that houses updated builds of the alpha branch for Win32?  I'm interested in testing these on a certain demented project of mine.
  • Last Edit: 10 March, 2013, 09:51:20 PM by wswartzendruber

  • RobertM
Listening test using 2013-03-09 build
Reply #9
RobertM,

Let me comment on two things. First, one sample isn't representative enough to conclude whether there was an improvement; the quality/quantity ratio starts to work out from about 10 samples. Second, this particular sample, like all the others, was quickly adopted by developers for tuning Opus almost 2 years ago, so it's not surprising that the latest Opus 1.1a did better on it.

Anyway it's a nice start.

P.S. It's more useful to perform tests on two samples with 7/7 each than on one sample at 14/14. The probability of guessing 7 trials correctly is already less than 1%. Personally I test on 20 samples or so with 5/5 trials (3.2%) when I'm not sure about perceived differences.


I agree, and I hope to test more samples as I get the time, but it does show that Sample 1 (which was one of the hardest samples for Opus to encode back then) has been improved by the latest work on the encoder, and that it is virtually transparent (to my ears) at 256 kb/s. If you need to listen as carefully as I did and still can't tell the difference all the time, then it's practically as good as the uncompressed version.

I've also shared the compiled Windows binaries with one other member, but I'm not sure if it's OK to post them in a public thread. I can't see anything in the TOS against it, but can an admin confirm whether a link to the binaries is fine to post here?

  • RobertM
Listening test using 2013-03-09 build
Reply #10
In an effort to be "fair" to the Opus encoder, I've chosen a sample which Opus was quite good at but which the other codecs had trouble with - "Sample 16".
http://people.xiph.org/~greg/opus/ha2011/

Samples from the new encoder and ABX results attached.

Summary:

These results surprised me - I wasn't able to detect any improvement from the new encoder, although originally I thought the sample was transparent at 64kb/s. After listening many times, I was able to detect a slight difference in the first guitar chord at some bitrates.

1. 48kb/s vs original

A small amount of distortion on the guitar notes at this bitrate, but still good quality

2. 64kb/s from the above page (old opus version) vs original

It took me a long time to differentiate these two, but once I spotted the tiny difference in the first guitar chord, I was able to identify it repeatedly.

3. 64kb/s vs original

As above, I was able to hear a slight difference

4. 64kb/s from the above page (old opus version) vs 64kb/s from the newest Opus version

I was unable to differentiate these two, indicating no major difference between the new encoder and the old encoder for this sample.

5. 96kb/s vs original

This was transparent to me. The ABX results swing slightly towards a small difference, but I think it was due to chance.
  • Last Edit: 11 March, 2013, 05:06:58 AM by RobertM

  • kabal4e
Listening test using 2013-03-09 build
Reply #11
Thanks to RobertM I have an opus-tools build from 2013.03.09.
I mistakenly believed it had the variable frame size from the opus_exp branch built in. Unfortunately, it didn't, but after some ABX-ing I realised I couldn't distinguish the latest general and experimental builds anyway.
However, a while ago, maybe not in the Opus branch of HA, a sweep sample was tested, and Opus performed very badly. I was hoping to see some improvement, but there wasn't any. Please listen to the samples attached and judge for yourselves.
  • Last Edit: 11 March, 2013, 10:35:45 PM by kabal4e

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #12
Thanks to RobertM I have an opus-tools build from 2013.03.09.
I mistakenly believed it had the variable frame size from the opus_exp branch built in. Unfortunately, it didn't, but after some ABX-ing I realised I couldn't distinguish the latest general and experimental builds anyway.
However, a while ago, maybe not in the Opus branch of HA, a sweep sample was tested, and Opus performed very badly. I was hoping to see some improvement, but there wasn't any. Please listen to the samples attached and judge for yourselves.


Wow! As much as I think sine sweep tests are stupid for codecs, there's no excuse for the behaviour you're seeing on this file with 1.1-alpha and later. That sine sweep is actually hitting a corner case in the bandwidth detection code of the encoder (see commit 7509fdb8). Thankfully, it shouldn't be too hard to fix. It's quite spectacular, but not that big a deal overall because fortunately it's highly unlikely to occur on real music.
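For readers wondering how a lone sine sweep can hit a corner case in bandwidth detection: the following is a toy illustration only, not the actual Opus code behind commit 7509fdb8 (the band layout, threshold, and function names here are all made up). The point it demonstrates is that a pure tone concentrates all its energy in a single analysis band with silence above it, so a band-energy detector's "highest occupied band" decision jumps from band to band as the sweep rises, and an encoder acting on each flip can misjudge the signal's bandwidth frame by frame.

```python
import math

def band_energies(samples, n_bands=5):
    """Naive DFT magnitude spectrum, summed into n_bands equal-width
    bands (toy analysis, nothing like the real Opus code)."""
    n = len(samples)
    half = n // 2
    spec = [abs(sum(s * math.e ** (-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))) ** 2 for k in range(half)]
    width = half // n_bands
    return [sum(spec[i * width:(i + 1) * width]) for i in range(n_bands)]

def detected_bands(energies, thresh=1e-3):
    """Highest band carrying a non-negligible share of the total energy."""
    total = sum(energies) or 1.0
    return max((i + 1 for i, e in enumerate(energies) if e / total > thresh),
               default=0)

rate, n = 8000, 64
for freq in (500, 1500, 2500, 3500):  # snapshots along a rising sweep
    tone = [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]
    print(freq, "Hz ->", detected_bands(band_energies(tone)), "bands kept")
```

Every snapshot yields a different bandwidth decision, with nothing above the tone to stabilise it. A downward sweep, as jmvalin notes, would leave already-seen content below the tone and behave differently.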

  • kabal4e
Listening test using 2013-03-09 build
Reply #13
As much as I think sine sweep tests are stupid for codecs

However, Vorbis, Apple AAC and Nero AAC performed well with this sample. Vorbis even ended up with the lowest bitrate of all, given the same target bitrate.
But when a sweep is hidden in real music, such as glitch hop or dubstep, Opus performs really well. So I've got no complaints for real music samples. I could attach a few samples if people are interested.
  • Last Edit: 11 March, 2013, 10:37:04 PM by kabal4e

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #14
However, Vorbis, Apple AAC and Nero AAC performed well with this sample. Vorbis even ended up with the lowest bitrate of all, given the same target bitrate.


Sure, one of the things the Opus format does to gain efficiency is assume that it's encoding signals with a wide spectrum. This assumption saves bits on the vast majority of files and wastes bits on synthetic tests like this, so I have no problem with being less efficient in terms of bitrate. Of course, the problem here is that it doesn't even encode properly -- and that's something that needs fixing.

But when a sweep is hidden in real music, such as glitch hop or dubstep, Opus performs really well. So I've got no complaints for real music samples. I could attach a few samples if people are interested.


Sure, I understand exactly what's happening, and it's really a corner case: it not only requires no spectral content above the sine, but I think even a downward sine sweep would actually have worked fine.

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #15
Wow! As much as I think sine sweep tests are stupid for codecs, there's no excuse for the behaviour you're seeing on this file with 1.1-alpha and later. That sine sweep is actually hitting a corner case in the bandwidth detection code of the encoder (see commit 7509fdb8). Thankfully, it shouldn't be too hard to fix. It's quite spectacular, but not that big a deal overall because fortunately it's highly unlikely to occur on real music.


The problem is now fixed in git. Here's the fix for those who are curious. With the change, the sweep doesn't have dropouts anymore. It still uses a higher bit-rate than necessary, but I'm not really concerned with that.

  • RobertM
Listening test using 2013-03-09 build
Reply #16
Wow! As much as I think sine sweep tests are stupid for codecs, there's no excuse for the behaviour you're seeing on this file with 1.1-alpha and later. That sine sweep is actually hitting a corner case in the bandwidth detection code of the encoder (see commit 7509fdb8). Thankfully, it shouldn't be too hard to fix. It's quite spectacular, but not that big a deal overall because fortunately it's highly unlikely to occur on real music.


The problem is now fixed in git. Here's the fix for those who are curious. With the change, the sweep doesn't have dropouts anymore. It still uses a higher bit-rate than necessary, but I'm not really concerned with that.


That's excellent - I can confirm that the sine sweep is good now. Thanks, jmvalin!

I'll do a repeat of the listening tests soon to see if anything has changed in the music samples.

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #17
I'll do a repeat of the listening tests soon to see if anything has changed in the music samples.


Feel free to do that, but I highly doubt this impacted any music samples. In general, what's useful would be to check if there's any regression between 1.0.x and the current master.

  • kabal4e
Listening test using 2013-03-09 build
Reply #18
I highly doubt this impacted any music samples.

Did the testing and couldn't find any impact.
Foobar's bit compare tool shows only 25-50% of samples to be different, which is an amazing result. Usually, I get 99.9999%. (please, note I understand that this has nothing to do with human hearing)

In general, what's useful would be to check if there's any regression between 1.0.x and the current master.

Personally, I couldn't find any regressions between 1.0.2 and 1.1a. For me 1.1a sounds better. If I had more time I could do some ABX-ing, but not today.

  • db1989
  • Global Moderator
Listening test using 2013-03-09 build
Reply #19
Foobar's bit compare tool shows only 25-50% of samples to be different, which is an amazing result. Usually, I get 99.9999%. (please, note I understand that this has nothing to do with human hearing)
Audible or not, this is almost totally useless as a way to evaluate a lossy codec, even were it not the case that phase-shifting, etc. will completely confound naïve bit-comparisons.

Quote
For me 1.1a sounds better. If I had more time I could do some ABX-ing, but not today.
Please wait until you’ve ABXd it to make claims, in that case.
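db1989's point about phase shifts is easy to demonstrate: delay a signal by a single sample (about 21 µs at 48 kHz, far below anything audible) and a naive sample-by-sample comparison reports essentially everything as different. A minimal sketch of that confound:

```python
import math

rate = 48000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]
shifted = [0.0] + tone[:-1]  # the same tone, delayed by one sample

# Count sample positions where the two "versions" disagree exactly.
differing = sum(1 for a, b in zip(tone, shifted) if a != b)
print(f"{100 * differing / rate:.1f}% of samples differ")  # ~100%, yet inaudible
```

This is why a raw difference count says nothing about perceived quality: two perceptually identical streams can disagree at nearly every sample.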

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #20
Foobar's bit compare tool shows only 25-50% of samples to be different, which is an amazing result. Usually, I get 99.9999%. (please, note I understand that this has nothing to do with human hearing)
Audible or not, this is almost totally useless as a way to evaluate a lossy codec, even were it not the case that phase-shifting, etc. will completely confound naïve bit-comparisons.


Well, bit comparisons are very useful. If two clips are bit-identical, they have the same quality (no matter what your ABX test says), which saves a lot of time. Also, for many changes, just having a single bit change means you screwed up something.

  • db1989
  • Global Moderator
Listening test using 2013-03-09 build
Reply #21
If two clips are bit-identical, they have the same quality
But we’re talking about a lossy codec.

Quote
Also, for many changes, just having a single bit change means you screwed up something.
I presume this means it’s useful during the process of development. But again, the post was addressed to an end-user. Bit-comparing lossy streams to their uncompressed source can be confounded in so many ways and is not likely to be informative even if they’re controlled for.

  • bawjaws
Listening test using 2013-03-09 build
Reply #22
If two clips are bit-identical, they have the same quality
But we’re talking about a lossy codec.

Quote
Also, for many changes, just having a single bit change means you screwed up something.
I presume this means it’s useful during the process of development. But again, the post was addressed to an end-user. Bit-comparing lossy streams to their uncompressed source can be confounded in so many ways and is not likely to be informative even if they’re controlled for.


"Bit-identical" and "not bit-identical" both seem to give useful info for various purposes, but only "bit-identical" gives you info on comparative quality
  • Last Edit: 14 March, 2013, 12:55:41 PM by bawjaws

  • db1989
  • Global Moderator
Listening test using 2013-03-09 build
Reply #23
Please explain how a bit-comparison provides any information except from ‘this file is different from that file’, as already noted by jmvalin above, and which is very basic and limited in its utility. Please then elaborate about how the information from a bit-comparison can indicate relative quality between streams.

Can anyone provide a justification for discussion of bit-comparing in reference to a lossy codec—except from ‘this≠that’—, for example an explanation of why it isn’t even less useful than difference signals, which we already tend to advise against? If not, this is all just clutter in the thread, and I’m inclined to remove it.

  • jmvalin
  • Developer
Listening test using 2013-03-09 build
Reply #24
Can anyone provide a justification for discussion of bit-comparing in reference to a lossy codec—except from ‘this≠that’—, for example an explanation of why it isn’t even less useful than difference signals, which we already tend to advise against? If not, this is all just clutter in the thread, and I’m inclined to remove it.


The information contained in A!=B is that something actually changed. What you compare is not original to coded, but codedA to codedB. It tells you whether whatever you changed actually had *any* impact on the result. For example, in some circumstances, adding a certain option to opusenc will produce *exactly* the same output as without the option. Before you waste an hour trying to ABX, you can quickly see that the decoded files are identical. The opposite is also true: if you have two different builds of the same code that produce non-identical results (even if they sound the same), it's often worth at least investigating (it's sometimes just different rounding, but sometimes not). This is why bit comparisons are useful. They're a sanity check. I've made this error myself before: asking people to tell me which of two files sounded best when in fact they were bit-identical.
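jmvalin's workflow — check codedA against codedB before spending time on ABX — needs nothing more than a frame-level comparison of the two decoded files. A minimal sketch, assuming both encodes have already been decoded to WAV (the file names in the usage comment are placeholders):

```python
import wave

def bit_identical(path_a: str, path_b: str) -> bool:
    """True if two decoded WAV files carry identical raw sample frames.
    If they do, there is nothing to ABX; if they don't, the change had
    *some* effect on the output and may be worth investigating."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        if (a.getnchannels(), a.getsampwidth(), a.getframerate(),
                a.getnframes()) != (b.getnchannels(), b.getsampwidth(),
                                    b.getframerate(), b.getnframes()):
            return False
        return a.readframes(a.getnframes()) == b.readframes(b.getnframes())

# e.g. bit_identical("decoded_with_option.wav", "decoded_without_option.wav")
```

Note this compares two decodes against each other, the sanity check described above, not a lossy decode against its uncompressed source, which the earlier posts explain is uninformative.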