HydrogenAudio

Lossy Audio Compression => Opus => Topic started by: RobertM on 2013-03-09 09:49:42

Title: Listening test using 2013-03-09 build
Post by: RobertM on 2013-03-09 09:49:42
I completed a listening test against Opus files encoded with the latest build (as of 2013-03-09). This time I've been more thorough: ABX test results from foobar2000 are attached along with the Opus-encoded files. I also took azaqiel's advice and updated the version reported by the encoder, to prevent any confusion.

"Sample 01" from the page below was used for the test. May repeat the test later with other difficult samples.
http://people.xiph.org/~greg/opus/ha2011/ (http://people.xiph.org/~greg/opus/ha2011/)
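
For reference, producing encodes at these rates with opusenc looks roughly like the sketch below. The file names are illustrative and these are not the exact commands used for this test.

Code:
# Rough, illustrative sketch of producing the test encodes with opusenc (opus-tools).
# File names are hypothetical; the exact options used for this test are not given here.
import subprocess

SOURCE = "sample01.wav"              # original lossless sample
BITRATES = [64, 96, 128, 256, 500]   # kb/s settings compared below

for kbps in BITRATES:
    out = f"sample01_{kbps}k.opus"
    # opusenc's --bitrate option takes the target bitrate in kbit/s
    subprocess.run(["opusenc", "--bitrate", str(kbps), SOURCE, out], check=True)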


Summary:

Results were very much as expected. Opus quality has definitely improved over time and gets closer to transparency at higher bitrates.

1. 64kb/s from the above page (old opus version) and 64kb/s from the newest Opus version

There was a noticeable improvement in quality with the new Opus version

2. 64kb/s vs original

It was fairly easy to tell the difference, but the quality was still quite good.

3. 96 kb/s vs original

I could still tell the difference, but the artifacts were noticeably improved compared with the 64 kb/s file.

4. 128 kb/s vs original

I could still hear a very subtle artifact introduced by the codec (on the note between 2.155 and 2.423 seconds), but I had to strain to hear it.

5. 256 kb/s vs original

Very close to transparent. I managed to tell the difference sometimes by listening very hard for the artifact. However, my ability to tell the two apart was far from perfect.

6. 500 kb/s vs original

This was transparent to me.
Title: Listening test using 2013-03-09 build
Post by: zerowalker on 2013-03-10 21:44:51
Isn't that pretty bad, not being able to reach transparency at 256 kbps?
Or is this some kind of super killer sound we are talking about?

Because I think Vorbis and AAC can pretty much reach transparency at 196-256 kbps most of the time, though I am not some kind of master at this.
Title: Listening test using 2013-03-09 build
Post by: db1989 on 2013-03-10 21:55:40
Yes, it was a sample that is known to be difficult to encode, not just everyday music, as you would know if you had followed the link and read the description.

Another thing you would know if you had done so is that Opus was the highest-rated codec in that test overall.

Neither of these things require being “some kind of master”, just the simplest kind of research before posting.
Title: Listening test using 2013-03-09 build
Post by: saratoga on 2013-03-10 21:56:14
Isn't that pretty bad, not being able to reach transparency at 256 kbps?
Or is this some kind of super killer sound we are talking about?


It's one of the lowest-scored samples in that test, so evidently it's a very difficult sample for current Opus encoders.
Title: Listening test using 2013-03-09 build
Post by: zerowalker on 2013-03-10 22:00:59
Yes, it was a sample that is known to be difficult to encode, not just everyday music, as you would know if you had followed the link and read the description.

Another thing you would know if you had done so is that Opus was the highest-rated codec in that test overall.

Neither of these things require being “some kind of master”, just the simplest kind of research before posting.


Ah well that explains it:)

Well, what I meant by "master" was more that I myself can't distinguish artifacts easily. I can feel that 128 kbps MP3 is much "weaker" than 196+, but I can't really point to the moment in time where I hear an artifact compared to the other codec, unless it's a very obvious one, of course.

But yeah, my bad for not going to the link; the bitrates and results got the better of me, and I was a bit disappointed at first. Sorry for that.
Title: Listening test using 2013-03-09 build
Post by: IgorC on 2013-03-10 22:53:06
RobertM,

Let me comment on two things. First, one sample isn't representative enough to conclude whether there was an improvement; the quality/quantity trade-off starts to work out from about 10 samples. Second, this particular sample, like all the others, was quickly adopted by the developers for tuning Opus almost 2 years ago, so it's not surprising that the latest Opus 1.1a did better on it.

Anyway it's a nice start.

P.S. It's more useful to perform tests on two samples with 7/7 trials each than on one sample with 14/14. The probability of guessing 7 correct trials is already less than 1%. Personally, I test on 20 samples or so with 5/5 trials (3.2%) when I'm not sure about perceived differences.
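
(To check those numbers: with pure guessing each trial is a coin flip, so the chance of getting n out of n correct is 0.5^n. A quick sketch:)

Code:
# Probability of getting n/n ABX trials correct purely by guessing (p = 0.5 per trial).
for n in (5, 7, 14):
    print(f"{n}/{n} correct by chance: {0.5 ** n:.4%}")
# 5/5   -> 3.1250%  (the ~3.2% mentioned above)
# 7/7   -> 0.7813%  (already under 1%)
# 14/14 -> 0.0061%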
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-10 23:05:06
It's one of the lowest-scored samples in that test, so evidently it's a very difficult sample for current Opus encoders.


Well, it was the lowest Opus score for 1.0. The new version has significant improvements on that sample.
Title: Listening test using 2013-03-09 build
Post by: eahm on 2013-03-10 23:26:03
Is there a compiled Windows build of 2013-03-09?
Title: Listening test using 2013-03-09 build
Post by: wswartzendruber on 2013-03-11 01:51:07
Is there a place that houses updated builds of the alpha branch for Win32?  I'm interested in testing these on a certain demented project of mine.
Title: Listening test using 2013-03-09 build
Post by: RobertM on 2013-03-11 07:30:14
RobertM,

Let me comment on two things. First, one sample isn't representative enough to conclude whether there was an improvement; the quality/quantity trade-off starts to work out from about 10 samples. Second, this particular sample, like all the others, was quickly adopted by the developers for tuning Opus almost 2 years ago, so it's not surprising that the latest Opus 1.1a did better on it.

Anyway it's a nice start.

P.S. It's more useful to perform tests on two samples with 7/7 trials each than on one sample with 14/14. The probability of guessing 7 correct trials is already less than 1%. Personally, I test on 20 samples or so with 5/5 trials (3.2%) when I'm not sure about perceived differences.


I agree, and I hope to test more samples as I get the time, but it does show that Sample 01 (which was one of the hardest samples for Opus to encode back then) has been improved by the latest work on the encoder, and that it is virtually transparent (to my ears) at 256 kb/s. If you need to listen as carefully as I did and still can't tell the difference all the time, then for practical purposes it's just as good as the uncompressed version.

I've also shared the compiled Windows binaries with one other member, but I'm not sure if it's OK to post them in a public thread. I can't see anything in the TOS against it, but can an admin confirm whether a link to the binaries is fine to post here?
Title: Listening test using 2013-03-09 build
Post by: RobertM on 2013-03-11 09:05:24
In an effort to be "fair" to the Opus encoder, I've chosen a sample which Opus was quite good at but the other codecs had trouble with -  "Sample 16".
http://people.xiph.org/~greg/opus/ha2011/ (http://people.xiph.org/~greg/opus/ha2011/)

Samples from the new encoder and ABX results attached.

Summary:

These results surprised me - I wasn't able to detect any improvement from the new encoder. Originally I thought the sample was transparent at 64 kb/s, but after listening many times I was able to detect a slight difference on the first guitar chord at some bitrates.

1. 48kb/s vs original

A small amount of distortion on the guitar notes at this bitrate, but still good quality

2. 64kb/s from the above page (old opus version) vs original

It took me a long time to be able to differentiate these two but when I spotted the tiny difference in the first guitar chord, I was able to repeatedly identify it.

3. 64kb/s vs original

As above, I was able to hear a slight difference.

4. 64kb/s from the above page (old opus version) vs 64kb/s from the newest Opus version

I was unable to differentiate these two, indicating no major difference between the new and old encoders for this sample.

5. 96kb/s vs original

This was transparent to me. The ABX results swing slightly towards a small difference, but I think it was due to chance.
Title: Listening test using 2013-03-09 build
Post by: kabal4e on 2013-03-12 01:54:52
Thanks to RobertM I have an opus-tools build from 2013.03.09.
I mistakenly believed it had the variable frame size from the opus_exp branch built in. Unfortunately, it didn't, but after some ABXing I realised I couldn't distinguish the latest general and experimental builds anyway.
However, a while ago, maybe not on the Opus board of HA, a sweep sample was tested, and Opus performed very badly. I was hoping to see some improvement, but there wasn't any. Please listen to the samples attached and judge for yourself.
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-12 02:17:24
Thanks to RobertM I have an opus-tools build from 2013.03.09.
I mistakenly believed it had the variable frame size from the opus_exp branch built in. Unfortunately, it didn't, but after some ABXing I realised I couldn't distinguish the latest general and experimental builds anyway.
However, a while ago, maybe not on the Opus board of HA, a sweep sample was tested, and Opus performed very badly. I was hoping to see some improvement, but there wasn't any. Please listen to the samples attached and judge for yourself.


Wow! As much as I think sine sweep tests are stupid for codecs, there's no excuse for the behaviour you're seeing on this file with 1.1-alpha and later. That sine sweep is actually hitting a corner case in the bandwidth detection code of the encoder (see commit 7509fdb8). Thankfully, it shouldn't be too hard to fix. It's quite spectacular, but not that big a deal overall because fortunately it's highly unlikely to occur on real music.
Title: Listening test using 2013-03-09 build
Post by: kabal4e on 2013-03-12 02:34:07
As much as I think sine sweep tests are stupid for codecs

However, Vorbis, Apple AAC and Nero AAC all performed well with this, and Vorbis ended up with the lowest bitrate of all, given the same target bitrate.
But when a sweep is hidden in real music, such as glitch hop or dubstep, Opus performs really well, so I've got no complaints about real music samples. I could attach a few samples if people are interested.
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-12 02:45:20
However, Vorbis, Apple AAC and Nero AAC all performed well with this, and Vorbis ended up with the lowest bitrate of all, given the same target bitrate.


Sure, one of the things the Opus format does to gain efficiency is to assume that it's encoding signals with a wide spectrum. This assumption saves bits on the vast majority of files and wastes bits on synthetic tests like this, so I've no problem with being less efficient in terms of bitrate. Of course, the problem here is that it doesn't even encode properly -- and that's something that needs fixing.

But when a sweep is hidden in real music, such as glitch hop or dubstep, Opus performs really well, so I've got no complaints about real music samples. I could attach a few samples if people are interested.


Sure, I understand exactly what's happening, and it's really a corner case. Not only does it require no spectral content above the sine; I think even a downward sine sweep would actually have worked fine.
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-12 17:45:16
Wow! As much as I think sine sweep tests are stupid for codecs, there's no excuse for the behaviour you're seeing on this file with 1.1-alpha and later. That sine sweep is actually hitting a corner case in the bandwidth detection code of the encoder (see commit 7509fdb8). Thankfully, it shouldn't be too hard to fix. It's quite spectacular, but not that big a deal overall because fortunately it's highly unlikely to occur on real music.


The problem is now fixed in git. Here's the fix (http://git.xiph.org/?p=opus.git;a=commitdiff;h=c5e04e) for those who are curious. With the change, the sweep doesn't have dropouts anymore. It still uses a higher bit-rate than necessary, but I'm not really concerned with that.
Title: Listening test using 2013-03-09 build
Post by: RobertM on 2013-03-12 19:14:42
Wow! As much as I think sine sweep tests are stupid for codecs, there's no excuse for the behaviour you're seeing on this file with 1.1-alpha and later. That sine sweep is actually hitting a corner case in the bandwidth detection code of the encoder (see commit 7509fdb8). Thankfully, it shouldn't be too hard to fix. It's quite spectacular, but not that big a deal overall because fortunately it's highly unlikely to occur on real music.


The problem is now fixed in git. Here's the fix (http://git.xiph.org/?p=opus.git;a=commitdiff;h=c5e04e) for those who are curious. With the change, the sweep doesn't have dropouts anymore. It still uses a higher bit-rate than necessary, but I'm not really concerned with that.


That's excellent - I can confirm that the sine sweep is good now. Thanks, jmvalin.

I'll do a repeat of the listening tests soon to see if anything has changed in the music samples.
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-12 19:44:02
I'll do a repeat of the listening tests soon to see if anything has changed in the music samples.


Feel free to do that, but I highly doubt this impacted any music samples. In general, what's useful would be to check if there's any regression between 1.0.x and the current master.
Title: Listening test using 2013-03-09 build
Post by: kabal4e on 2013-03-12 22:39:05
I highly doubt this impacted any music samples.

I did the testing and couldn't find any impact.
Foobar's bit compare tool shows only 25-50% of samples to be different, which is an amazing result. Usually, I get 99.9999%. (please, note I understand that this has nothing to do with human hearing)

In general, what's useful would be to check if there's any regression between 1.0.x and the current master.

Personally, I couldn't find any regressions between 1.0.2 and 1.1a. For me 1.1a sounds better. If I had more time I could do some ABX-ing, but not today.
Title: Listening test using 2013-03-09 build
Post by: db1989 on 2013-03-12 23:13:05
Foobar's bit compare tool shows only 25-50% of samples to be different, which is an amazing result. Usually, I get 99.9999%. (please, note I understand that this has nothing to do with human hearing)
Audible or not, this is almost totally useless as a way to evaluate a lossy codec, even were it not the case that phase-shifting, etc. will completely confound naïve bit-comparisons.
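
(As a toy illustration of the phase point: delaying a signal by just one sample makes a naïve sample-by-sample comparison report essentially everything as different, even though nothing audible has changed. A rough numpy sketch:)

Code:
import numpy as np

# One second of a 440 Hz tone at 44.1 kHz, and the same tone delayed by one sample.
sr = 44100
t = np.arange(sr) / sr
a = np.sin(2 * np.pi * 440 * t)
b = np.roll(a, 1)  # one-sample shift; perceptually identical to a

print(f"samples differing: {np.mean(a != b):.1%}")  # ~100%, despite sounding the same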

Quote
For me 1.1a sounds better. If I had more time I could do some ABX-ing, but not today.
Please wait until you’ve ABXd it to make claims, in that case.
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-13 02:04:32
Foobar's bit compare tool shows only 25-50% of samples to be different, which is an amazing result. Usually, I get 99.9999%. (please, note I understand that this has nothing to do with human hearing)
Audible or not, this is almost totally useless as a way to evaluate a lossy codec, even were it not the case that phase-shifting, etc. will completely confound naïve bit-comparisons.


Well, bit comparisons are very useful. If two clips are bit-identical, they have the same quality (no matter what your ABX test says), which saves a lot of time. Also, for many changes, just having a single bit change means you screwed up something.
Title: Listening test using 2013-03-09 build
Post by: db1989 on 2013-03-13 08:35:40
If two clips are bit-identical, they have the same quality
But we’re talking about a lossy codec.

Quote
Also, for many changes, just having a single bit change means you screwed up something.
I presume this means it’s useful during the process of development. But again, the post was addressed to an end-user. Bit-comparing lossy streams to their uncompressed source can be confounded in so many ways and is not likely to be informative even if they’re controlled for.
Title: Listening test using 2013-03-09 build
Post by: bawjaws on 2013-03-14 16:54:03
If two clips are bit-identical, they have the same quality
But we’re talking about a lossy codec.

Quote
Also, for many changes, just having a single bit change means you screwed up something.
I presume this means it’s useful during the process of development. But again, the post was addressed to an end-user. Bit-comparing lossy streams to their uncompressed source can be confounded in so many ways and is not likely to be informative even if they’re controlled for.


"Bit identical" and "not bit-identical" seem to give useful info for various purposes, but only bit identical gives you info on comparitive quality
Title: Listening test using 2013-03-09 build
Post by: db1989 on 2013-03-14 17:49:06
Please explain how a bit-comparison provides any information other than ‘this file is different from that file’, which was already noted by jmvalin above and is very basic and limited in its utility. Please then elaborate on how the information from a bit-comparison can indicate relative quality between streams.

Can anyone provide a justification for discussing bit-comparison in reference to a lossy codec, beyond ‘this≠that’? For example, an explanation of why it isn’t even less useful than difference signals, which we already tend to advise against. If not, this is all just clutter in the thread, and I’m inclined to remove it.
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-14 20:38:56
Can anyone provide a justification for discussing bit-comparison in reference to a lossy codec, beyond ‘this≠that’? For example, an explanation of why it isn’t even less useful than difference signals, which we already tend to advise against. If not, this is all just clutter in the thread, and I’m inclined to remove it.


The information contained in A!=B is that something actually changed. What you compare is not original to coded, but codedA to codedB. It tells you whether whatever you changed actually had *any* impact on the result. For example, in some circumstances, adding a certain option to opusenc will produce *exactly* the same output as without the option. Before you waste an hour trying to ABX, you can quickly see that the decoded files are identical. The opposite is also true: if you have two different builds of the same code that produce non-identical results (even if they sound the same), it's often worth at least investigating (it's sometimes just different rounding, but sometimes not). This is why bit comparisons are useful. They're a sanity check. I've made that error myself before: asking people to tell me which of two files sounded best when in fact they were bit-identical.
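
(For what it's worth, that kind of sanity check is easy to script outside foobar2000: decode both encodes to WAV and compare the raw samples. A rough sketch, assuming the two builds' outputs have already been decoded with opusdec to the hypothetical files below:)

Code:
import numpy as np
import soundfile as sf  # any WAV reader works; soundfile is just one convenient option

# Compare codedA to codedB (decoded), not original to coded.
a, _ = sf.read("decoded_build_a.wav", dtype="int16")
b, _ = sf.read("decoded_build_b.wav", dtype="int16")

if a.shape == b.shape and np.array_equal(a, b):
    print("bit-identical: the change had no effect on this file, nothing to ABX")
else:
    frac = np.mean(a != b) if a.shape == b.shape else 1.0
    print(f"not identical: {frac:.1%} of samples differ -- worth investigating")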
Title: Listening test using 2013-03-09 build
Post by: db1989 on 2013-03-14 21:00:27
I definitely don’t disagree, and I can appreciate how useful that is for a developer. I see and agree with all your points about bit-comparison being used to determine that files are either identical or not, but that seems to be about as much as the technique can reveal, and I would like to think that this use should be easy to work out from first principles.

In contrast, as I said I was asking for examples “except from ‘this≠that’ ”, in reply to bawjaws’ comment about “comparative quality” and how bit-comparing can provide any other information.

Whether or not that was exactly what bawjaws meant, this tangent started because kabal4e attempted to comment on the performance of Opus by proffering statistics from a bit-comparator, albeit while not specifying what was compared to what – and while claiming to acknowledge that such a method has no useful relation to hearing, or to the complex workings of lossy encoding, yet feeling that posting it somehow remained appropriate anyway. As I’ve said before in reference to ‘I know this isn’t valid, but’–type arguments, things like that just seem like an attempt to ‘have your cake and eat it’: trying to make a point that might run contrary to the rules – or just basic principles – while securing immunity from this discord by acknowledging that it might exist… which doesn’t make sense, does it?
Title: Listening test using 2013-03-09 build
Post by: jmvalin on 2013-03-14 22:09:33
I definitely don’t disagree, and I can appreciate how useful that is for a developer. I see and agree with all your points about bit-comparison being used to determine that files are either identical or not, but that seems to be about as much as the technique can reveal, and I would like to think that this use should be easy to work out from first principles.


What I'm trying to say is that kabal4e's comment that most samples were bit-identical *is* useful. It tells me that the change I made to fix a corner case indeed only impacts corner cases because the majority of the time it's not triggered at all. That *is* more useful than "no audible difference". There's comparing quality and there's "let's figure out what's going on here". Let's not confuse the two.
Title: Listening test using 2013-03-09 build
Post by: kabal4e on 2013-03-14 23:15:36
What I'm trying to say is that kabal4e's comment that most samples were bit-identical *is* useful. It tells me that the change I made to fix a corner case indeed only impacts corner cases because the majority of the time it's not triggered at all. That *is* more useful than "no audible difference". There's comparing quality and there's "let's figure out what's going on here". Let's not confuse the two.

Hi all,

Yes. That's what I was trying to say. Thank you Jean-Marc for translating my ESOL into something clearer ))
What happened was I tried ABXing one track encoded with the 2013.03.12 and 2013.03.13 builds of opus-tools at 64 kbps. I failed to spot the difference reliably and then didn't save the ABX log, which is not unusual for me. However, just out of curiosity I ran the foobar2000 ReplayGain scanner on the lossless original and the two encoded files; all track gains were identical and there was only a small difference between the two encoded files' peak values. Then I used the foobar2000 bit compare tool, which showed a bit difference between the two encoded tracks on only 25-50% of samples, with a maximum difference between sample values of approx. 0.25.
To sum up: I couldn't ABX the difference, the track gains were identical, there was only a slight full-track peak difference, only 25-50% of sample values differed, and the maximum difference between sample values was 0.25. That's why I said there was no difference.

The main thing is that I never said I used only the bit-comparison tool to compare tracks. I'll try to attach the ABX report in the future; however, if a person desperately wants to prove some point, what stops him from faking the ABX log?
Title: Listening test using 2013-03-09 build
Post by: db1989 on 2013-03-15 00:33:46
I do apologise if I misread anything or underestimated the usefulness of such reports to a developer!

But here’s the inevitable however

To sum up: I couldn't ABX the difference, the track gains were identical, there was only a slight full-track peak difference, only 25-50% of sample values differed, and the maximum difference between sample values was 0.25. That's why I said there was no difference.
You could, and probably should, just have stopped after the ABX test. Ranking encodings based upon the statistics output by a bit-comparator is only slightly informative at best and potentially misleading at worst. Besides, if you can’t tell the difference, does it matter how many small divergences might have been introduced by the lossy encoding process?

Quote
The main thing is that I never said I used only the bit-comparison tool to compare tracks. I'll try to attach the ABX report in the future; however, if a person desperately wants to prove some point, what stops him from faking the ABX log?
If someone wants to cheat, s/he’ll find a way. They always do. That doesn’t mean people who want to promote proper practices should just abandon all their principles because some people might be dishonest. I could apply this to plenty of contexts in life, but then I’d be getting boring.  Anyway, for reference, there has been discussion here about possible ways – and, for all I know since I didn’t follow it, perhaps even the release of tools – to make ABX logs ‘cheat-proof’; so, whilst I don’t think the current vulnerability is any reason for the rest of us to stop promoting such testing using the presently available methods, you might find those previous posts interesting.