
Topic: Personal Listening Test of 2 Opus encoders

  • Kamedo2
  • [*][*][*][*]
Personal Listening Test of 2 Opus encoders
Abstract:
Blind sound quality comparison between the experimental and released Opus encoders at 32 kbps, 48 kbps, 64 kbps and 80 kbps.

Encoders:
Opus 1.1, with opus-tools 0.1.9 https://archive.mozilla.org/pub/opus/win32/opus-tools-0.1.9-win32.zip
Opus 1.1.1-rc-49-g5db9e14 branch exp_lbr_tune, with opus-tools 0.1.9.

Settings:
opusenc --bitrate 32 in.wav out.opus
opusenc --bitrate 48 in.wav out.opus
opusenc --bitrate 64 in.wav out.opus
opusenc --bitrate 80 in.wav out.opus

opusenc --bitrate 32 in.wav out.opus
opusenc --bitrate 48 in.wav out.opus
opusenc --bitrate 64 in.wav out.opus
opusenc --bitrate 80 in.wav out.opus
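The settings above can be batch-run with a small script. This is only a sketch: the `opusenc --bitrate N in.wav out.opus` form is the one listed above, while the binary path, the `tag` prefix and the output file naming are assumptions made for illustration.

```python
import subprocess

# Target bitrates in kbps, as listed in the settings above.
BITRATES_KBPS = [32, 48, 64, 80]


def build_commands(opusenc_path, wav_in, tag):
    """Return one opusenc command line (as an argument list) per bitrate.

    opusenc_path: path to the encoder binary under test (hypothetical).
    tag: prefix for the output name, e.g. "org" or "exp" (hypothetical naming).
    """
    commands = []
    for kbps in BITRATES_KBPS:
        out_name = f"{tag}{kbps}k.opus"
        commands.append([opusenc_path, "--bitrate", str(kbps), wav_in, out_name])
    return commands


def encode_all(opusenc_path, wav_in, tag):
    """Run every command and stop on the first encoder failure."""
    for cmd in build_commands(opusenc_path, wav_in, tag):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only print the commands instead of encoding.
    for cmd in build_commands("opusenc", "in.wav", "org"):
        print(" ".join(cmd))
```

Calling `encode_all` once per encoder build, each with its own binary path and tag, would produce the eight files compared in one session.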

Samples:
15 samples in total from my corpus, each tested twice.

Hardware:
Sony PSP-3000 + RP-HT560(1st), RP-HJE150(2nd).

Results:


The change of scores in exp_lbr_tune. They are mostly positive.

The bitrate vs quality graph.


Conclusions & Observations:
The Opus development branch exp_lbr_tune has solid quality improvements at 32 kbps and 48 kbps.
At 64 kbps and 80 kbps, there is no big difference in quality between exp_lbr_tune and the trunk Opus 1.1.

ANOVA analysis:
Code:
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 15
Critical significance:  0.05
Significance of data: 1.11E-016 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total              119          92.40
Testers (blocks)    14           6.66
Codecs eval'd        7          81.87   11.70   296.65  1.11E-016
Error               98           3.86    0.04
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.144

Means:

exp80k   org80k   org64k   exp64k   exp48k   org48k   exp32k   org32k  
  4.14     4.13     3.90     3.87     3.23     3.05     2.24     1.82  

---------------------------- p-value Matrix ---------------------------

         org80k   org64k   exp64k   exp48k   org48k   exp32k   org32k  
exp80k   0.854    0.001*   0.000*   0.000*   0.000*   0.000*   0.000*  
org80k            0.002*   0.001*   0.000*   0.000*   0.000*   0.000*  
org64k                     0.714    0.000*   0.000*   0.000*   0.000*  
exp64k                              0.000*   0.000*   0.000*   0.000*  
exp48k                                       0.015*   0.000*   0.000*  
org48k                                                0.000*   0.000*  
exp32k                                                         0.000*  
-----------------------------------------------------------------------

exp80k is better than org64k, exp64k, exp48k, org48k, exp32k, org32k
org80k is better than org64k, exp64k, exp48k, org48k, exp32k, org32k
org64k is better than exp48k, org48k, exp32k, org32k
exp64k is better than exp48k, org48k, exp32k, org32k
exp48k is better than org48k, exp32k, org32k
org48k is better than exp32k, org32k
exp32k is better than org32k
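The "is better than" list follows from comparing each pair of means against Fisher's protected LSD. A minimal sketch of that rule, using only the rounded means and the LSD value printed above (so borderline pairs could differ slightly from the tool's own p-values):

```python
from itertools import combinations

# Means and LSD copied from the ANOVA output above (rounded to 2-3 decimals).
MEANS = {
    "exp80k": 4.14, "org80k": 4.13, "org64k": 3.90, "exp64k": 3.87,
    "exp48k": 3.23, "org48k": 3.05, "exp32k": 2.24, "org32k": 1.82,
}
LSD = 0.144


def not_significant_pairs(means, lsd):
    """Pairs whose mean difference is smaller than the LSD."""
    return [
        (a, b)
        for a, b in combinations(means, 2)
        if abs(means[a] - means[b]) < lsd
    ]


if __name__ == "__main__":
    for a, b in not_significant_pairs(MEANS, LSD):
        print(f"{a} vs {b}: difference below LSD")
```

With these inputs only the two same-bitrate pairs at 80k and 64k fall below the LSD, which matches the two unstarred entries in the p-value matrix.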


Raw data:
Code:
exp32k	exp48k	exp64k	exp80k	org32k	org48k	org64k	org80k	
1.650 3.100 3.500 3.950 1.800 2.800 3.650 4.000
2.600 3.650 4.100 4.350 1.950 3.200 4.150 4.350
2.350 3.400 4.000 4.250 1.550 3.100 3.900 4.200
1.700 2.900 3.750 4.050 1.700 2.850 3.950 4.000
3.050 3.700 4.200 4.300 2.000 3.400 4.100 4.250
1.750 2.350 3.450 3.700 1.500 2.400 3.550 3.550
2.650 3.250 4.150 4.250 2.000 3.200 4.150 4.250
2.100 3.450 3.900 4.200 1.750 3.150 3.950 4.200
2.350 3.100 3.450 3.700 1.900 2.850 3.500 3.700
1.650 2.450 3.800 4.250 1.700 2.300 3.750 4.250
2.200 3.200 3.850 4.300 1.800 3.050 3.950 4.300
2.150 3.350 4.050 4.100 1.650 3.050 4.000 4.150
2.650 3.500 4.050 4.400 1.900 3.600 4.000 4.400
1.950 3.450 3.950 4.150 1.650 3.200 4.000 4.150
2.750 3.600 3.850 4.150 2.450 3.600 3.850 4.150
%samples 41_30sec Perc.
%samples finalfantasy Strings
%samples ATrain Jazz
%samples BigYellow Pops
%samples FloorEssence Techno
%samples macabre Classic
%samples mybloodrusts Guitar
%samples Quizas Latin
%samples VelvetRealm Techno
%samples Amefuribana Pops
%samples Trust Gospel
%samples Waiting Rock
%samples Experiencia Latin
%samples HearttoHeart Pops
%samples Tom'sDiner Acappella
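For anyone who wants to re-check the ANOVA table without the FRIEDMAN tool, here is a sketch of a textbook randomized-block ANOVA on the raw data above. It is not the tool's own code, and the t critical value (1.984 for 98 degrees of freedom, two-sided 5%) is hard-coded as an assumption instead of being computed.

```python
import math

COLUMNS = ["exp32k", "exp48k", "exp64k", "exp80k",
           "org32k", "org48k", "org64k", "org80k"]

# One row per sample (block), columns in COLUMNS order, from the raw data above.
RATINGS = [
    [1.65, 3.10, 3.50, 3.95, 1.80, 2.80, 3.65, 4.00],
    [2.60, 3.65, 4.10, 4.35, 1.95, 3.20, 4.15, 4.35],
    [2.35, 3.40, 4.00, 4.25, 1.55, 3.10, 3.90, 4.20],
    [1.70, 2.90, 3.75, 4.05, 1.70, 2.85, 3.95, 4.00],
    [3.05, 3.70, 4.20, 4.30, 2.00, 3.40, 4.10, 4.25],
    [1.75, 2.35, 3.45, 3.70, 1.50, 2.40, 3.55, 3.55],
    [2.65, 3.25, 4.15, 4.25, 2.00, 3.20, 4.15, 4.25],
    [2.10, 3.45, 3.90, 4.20, 1.75, 3.15, 3.95, 4.20],
    [2.35, 3.10, 3.45, 3.70, 1.90, 2.85, 3.50, 3.70],
    [1.65, 2.45, 3.80, 4.25, 1.70, 2.30, 3.75, 4.25],
    [2.20, 3.20, 3.85, 4.30, 1.80, 3.05, 3.95, 4.30],
    [2.15, 3.35, 4.05, 4.10, 1.65, 3.05, 4.00, 4.15],
    [2.65, 3.50, 4.05, 4.40, 1.90, 3.60, 4.00, 4.40],
    [1.95, 3.45, 3.95, 4.15, 1.65, 3.20, 4.00, 4.15],
    [2.75, 3.60, 3.85, 4.15, 2.45, 3.60, 3.85, 4.15],
]

T_CRIT_98DF = 1.984  # assumed two-sided 5% critical value of Student's t, 98 df


def blocked_anova(rows):
    """Randomized-block ANOVA: rows are blocks, columns are treatments."""
    n_blocks, n_treat = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n_blocks * n_treat)
    col_means = [sum(r[j] for r in rows) / n_blocks for j in range(n_treat)]
    row_means = [sum(r) / n_treat for r in rows]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in col_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in row_means)
    ss_error = ss_total - ss_treat - ss_block
    df_treat, df_block = n_treat - 1, n_blocks - 1
    df_error = df_treat * df_block
    ms_error = ss_error / df_error
    return {
        "col_means": col_means,
        "ss_treat": ss_treat,
        "ss_block": ss_block,
        "ss_error": ss_error,
        "df": (df_treat, df_block, df_error),
        "F": (ss_treat / df_treat) / ms_error,
        # Fisher's LSD for comparing two treatment means.
        "lsd": T_CRIT_98DF * math.sqrt(2 * ms_error / n_blocks),
    }


if __name__ == "__main__":
    result = blocked_anova(RATINGS)
    for name, mean in zip(COLUMNS, result["col_means"]):
        print(f"{name}: {mean:.2f}")
    print("F =", round(result["F"], 2), "LSD =", round(result["lsd"], 3))
```

The sums of squares, F and LSD should come out close to the values in the ANOVA table above; small differences would only reflect rounding and the hard-coded t value.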


Bitrates:
The exp_lbr_tune branch deviates more from the target bitrate, especially at the low end toward 32 kbps.

Code:
%bitrate
exp32k exp48k exp64k exp80k org32k org48k org64k org80k
36749 54937 73371 92363 33033 52076 73438 92430
48355 68543 87084 106794 34026 59325 87161 106872
37576 56210 74723 93189 33190 52377 74764 93229
35411 53122 70453 87718 33374 51285 70523 87786
42211 62890 82978 103395 33742 56865 83108 103540
34296 50768 67697 84163 33250 50278 67966 84437
31638 47318 61532 75075 33409 48583 61552 75096
38539 57412 76608 95888 33395 53455 76761 96040
41265 60008 78435 97111 34648 55392 78772 97397
36911 54502 72279 90022 33483 51861 72311 90054
34338 50941 67863 85263 33064 49980 68072 85471
38476 56844 74928 92831 34037 53583 75361 93264
36575 54928 72355 89386 33628 52318 72495 89527
33863 51076 67929 85114 33186 50203 67987 85174
33287 50391 75053 88353 33507 51445 75864 89179

%album bitrate
33564 50005 66320 82681 32480 49071 66477 82842
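One way to make the bitrate deviation concrete is to look at the per-sample spread in each column. The sketch below uses the per-sample bitrates listed above (in bits per second); the choice of min, max and range as the summary is mine, not part of the original analysis.

```python
COLUMNS = ["exp32k", "exp48k", "exp64k", "exp80k",
           "org32k", "org48k", "org64k", "org80k"]

# Per-sample bitrates in bps, one row per sample, from the %bitrate block above.
BITRATES = [
    [36749, 54937, 73371, 92363, 33033, 52076, 73438, 92430],
    [48355, 68543, 87084, 106794, 34026, 59325, 87161, 106872],
    [37576, 56210, 74723, 93189, 33190, 52377, 74764, 93229],
    [35411, 53122, 70453, 87718, 33374, 51285, 70523, 87786],
    [42211, 62890, 82978, 103395, 33742, 56865, 83108, 103540],
    [34296, 50768, 67697, 84163, 33250, 50278, 67966, 84437],
    [31638, 47318, 61532, 75075, 33409, 48583, 61552, 75096],
    [38539, 57412, 76608, 95888, 33395, 53455, 76761, 96040],
    [41265, 60008, 78435, 97111, 34648, 55392, 78772, 97397],
    [36911, 54502, 72279, 90022, 33483, 51861, 72311, 90054],
    [34338, 50941, 67863, 85263, 33064, 49980, 68072, 85471],
    [38476, 56844, 74928, 92831, 34037, 53583, 75361, 93264],
    [36575, 54928, 72355, 89386, 33628, 52318, 72495, 89527],
    [33863, 51076, 67929, 85114, 33186, 50203, 67987, 85174],
    [33287, 50391, 75053, 88353, 33507, 51445, 75864, 89179],
]


def bitrate_spread(rows, columns):
    """Return {column: (min, max, max - min)} over all samples."""
    spread = {}
    for j, name in enumerate(columns):
        values = [row[j] for row in rows]
        spread[name] = (min(values), max(values), max(values) - min(values))
    return spread


if __name__ == "__main__":
    for name, (lo, hi, rng) in bitrate_spread(BITRATES, COLUMNS).items():
        print(f"{name}: min {lo}, max {hi}, range {rng}")
```

At 32 kbps the experimental branch spans a range roughly ten times wider than Opus 1.1, while at 64 and 80 kbps the two ranges are nearly identical.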


  • saratoga
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #1
Are you blinded to which encoder/bitrate is used for which file?

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #2
Are you blinded to which encoder/bitrate is used for which file?
Yes. This test is fully blinded, with 4*2 = 8 samples shuffled in a single ABC/HR session. 30 sessions were performed in this test.
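To illustrate the design only (ABC/HR does its own randomization, and this is not its code): 15 samples tested twice gives 30 sessions, each presenting the 8 encodes in a fresh random order. The labels and the seed below are assumptions for the sketch.

```python
import random

# 2 encoders x 4 bitrates = 8 encodes per session.
ENCODES = [f"{enc}{kbps}k" for enc in ("exp", "org") for kbps in (32, 48, 64, 80)]


def make_sessions(n_samples=15, repeats=2, seed=0):
    """Build one shuffled presentation order per (sample, repeat) session."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    sessions = []
    for sample in range(n_samples):
        for rep in range(repeats):
            order = ENCODES[:]
            rng.shuffle(order)
            sessions.append({"sample": sample, "repeat": rep, "order": order})
    return sessions


if __name__ == "__main__":
    sessions = make_sessions()
    print(len(sessions), "sessions")
    print(sessions[0]["order"])
```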

  • eahm
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #3
Damn nice work! Thank you Kamedo2.

  • jmvalin
  • [*][*][*][*][*]
  • Developer
Re: Personal Listening Test of 2 Opus encoders
Reply #4
Thanks very much for this test. It not only validates my work in progress, but it also provides me with some clues for further improvements. I'm also interested in any other (qualitative) information you might have. For example, do you think the improvement at 32 kbps is mostly due to reduced noise/roughness or to the wider stereo image (or something else)?

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #5
For example, do you think the improvement at 32 kbps is mostly due to reduced noise/roughness or to the wider stereo image (or something else)?
They contribute about equally. I noticed both a better stereo image and reduced noise in the "better 32k" version. They are both great!
At 48 kbps, only the reduced noise/roughness was significant.
(Although the test is blinded, I could easily tell the 32k and 48k groups apart.)

Re: Personal Listening Test of 2 Opus encoders
Reply #6
Where can I obtain a binary of this experimental Encoder from?

  • jmvalin
  • [*][*][*][*][*]
  • Developer
Re: Personal Listening Test of 2 Opus encoders
Reply #7
I've been doing more (highly experimental) work on very low bitrates. Here are three files on which I'd be interested in getting feedback:
http://jmvalin.ca/misc_stuff/collapse0.wav
http://jmvalin.ca/misc_stuff/collapse1.wav
http://jmvalin.ca/misc_stuff/collapse2.wav
The original is at http://jmvalin.ca/misc_stuff/comp_stereo48.wav but it shouldn't be needed for such a low rate.
I'm interested both in which one you think sounds best and the qualitative aspect. Trying to see if I'm on the right track here...

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #8
Jmvalin, I tested them but the results were not significant.
Code:
ABC/HR for Java, Version 0.53a, 09 May 2016
Testname: jmvalin_opus_111798_20160506

Tester: Kamedo2

1L = C:\Users\PCC\Documents\ABC-HR\collapse1.wav
2R = C:\Users\PCC\Documents\ABC-HR\collapse2.wav
3R = C:\Users\PCC\Documents\ABC-HR\collapse0.wav

Ratings on a scale from 1.0 to 5.0

---------------------------------------
General Comments: Hard to say which is better. Not significant.
---------------------------------------
1L File: C:\Users\PCC\Documents\ABC-HR\collapse1.wav
1L Rating: 2.7
1L Comment: (4)Mid and LF are muddy.
---------------------------------------
2R File: C:\Users\PCC\Documents\ABC-HR\collapse2.wav
2R Rating: 2.7
2R Comment: (2) Less noise (6)More HF noise
---------------------------------------
3R File: C:\Users\PCC\Documents\ABC-HR\collapse0.wav
3R Rating: 2.6
3R Comment: (1) Mid Freq. is muddy.(3) Pre and Post echo problem. (6)HF noise and post-echo
---------------------------------------

ABX Results:
Scores for each track:
Code:
collapse1	collapse2	collapse0
2.8 2.9 2.7
3 3.1 3
2.2 2.2 2
2.3 2.5 2.6
2.8 2.8 2.8
3 2.6 2.7
=================
2.683333333 2.683333333 2.633333333

  • IgorC
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #9
While the differences are subtle, Collapse1 is probably somewhat better (less noisy):

Scores:
Collapse0 - 2,9833
Collapse1 - 3,0333
Collapse2 - 2,9833

Code:
	
Track              Collapse0    Collapse1    Collapse2    Observation
01 Guitar intro    3,3          3,4          3,4          Coll 0 has a bit more distortion / hoarse noise
02 Drum intro      2,6          2,7          2,7          Coll 0 has a bit more distortion / hoarse noise
03 hihat           3,1          3,2          3
04 sp. Guitar      3            3,1          3
05 wood drums      3,1          2,9          3
06 female voice    2,8          2,9          2,8

Average Score      2,983333333  3,033333333  2,983333333
Geometric Mean     2,974506988  3,024765597  2,975505423
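The two summary rows can be reproduced from the six per-track scores. A small sketch, with the scores typed in using decimal points instead of the decimal commas used in the table:

```python
import math

# Per-track scores from the table above (tracks 01 to 06).
SCORES = {
    "Collapse0": [3.3, 2.6, 3.1, 3.0, 3.1, 2.8],
    "Collapse1": [3.4, 2.7, 3.2, 3.1, 2.9, 2.9],
    "Collapse2": [3.4, 2.7, 3.0, 3.0, 3.0, 2.8],
}


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed via logs for numerical stability."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for name, values in SCORES.items():
        print(name,
              round(arithmetic_mean(values), 4),
              round(geometric_mean(values), 4))
```

The geometric mean penalizes a single low track score slightly more than the arithmetic mean does, which is why it sits a little below the average in every column.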

  • jmvalin
  • [*][*][*][*][*]
  • Developer
Re: Personal Listening Test of 2 Opus encoders
Reply #10
Kamedo2, IgorC, thanks for the feedback. It looks like the effect I'm trying to produce is still a bit too subtle. I uploaded a variant on collapse1:
http://jmvalin.ca/misc_stuff/collapse1b.wav
Can you guys compare to collapse0 and see if you can hear more of an improvement?

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #11
Can you guys compare to collapse0 and....
Did you mean collapse1? collapse0 seems to be the worst candidate.

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #12
This is the average of the two results above.
Code:
Track              collapse0  collapse1  collapse2
01 Guitar intro    3.00       3.10       3.15
02 Drum intro      2.80       2.85       2.90
03 hihat           2.55       2.70       2.60
04 sp. Guitar      2.80       2.70       2.75
05 wood drums      2.95       2.85       2.90
06 female voice    2.75       2.95       2.70

I like the behavior of the yellow one (collapse1) because it is the best in the worst-performing sample (03 hihat).
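The averaging and the "best in the worst sample" reading can be sketched as follows. The mapping of Kamedo2's six unlabeled score rows to the six tracks is an assumption, though it does reproduce the averaged table above.

```python
TRACKS = ["01 Guitar intro", "02 Drum intro", "03 hihat",
          "04 sp. Guitar", "05 wood drums", "06 female voice"]
CANDIDATES = ["collapse0", "collapse1", "collapse2"]

# Per-track scores, columns in CANDIDATES order.
KAMEDO2 = [
    [2.7, 2.8, 2.9], [3.0, 3.0, 3.1], [2.0, 2.2, 2.2],
    [2.6, 2.3, 2.5], [2.8, 2.8, 2.8], [2.7, 3.0, 2.6],
]
IGORC = [
    [3.3, 3.4, 3.4], [2.6, 2.7, 2.7], [3.1, 3.2, 3.0],
    [3.0, 3.1, 3.0], [3.1, 2.9, 3.0], [2.8, 2.9, 2.8],
]


def average_listeners(a, b):
    """Element-wise mean of two listeners' score tables."""
    return [[(x + y) / 2 for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def worst_track_and_best_candidate(avg):
    """Find the track with the lowest mean score, then the top candidate on it."""
    worst = min(range(len(avg)), key=lambda i: sum(avg[i]) / len(avg[i]))
    best = max(range(len(CANDIDATES)), key=lambda j: avg[worst][j])
    return TRACKS[worst], CANDIDATES[best]


if __name__ == "__main__":
    avg = average_listeners(KAMEDO2, IGORC)
    for track, row in zip(TRACKS, avg):
        print(track, [round(v, 2) for v in row])
    print(worst_track_and_best_candidate(avg))
```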

  • jmvalin
  • [*][*][*][*][*]
  • Developer
Re: Personal Listening Test of 2 Opus encoders
Reply #13
Can you guys compare to collapse0 and....
Did you mean collapse1? collapse0 seems to be the worst candidate.

I meant comparing to collapse0 since that's the "baseline" (i.e. what you tested in exp_lbr_tune). Though I guess you could also compare to collapse1 too.

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #14

  • lithoc
  • [*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #15
Can you guys compare to collapse0 and....
Did you mean collapse1? collapse0 seems to be the worst candidate.

I meant comparing to collapse0 since that's the "baseline" (i.e. what you tested in exp_lbr_tune). Though I guess you could also compare to collapse1 too.

The binaries are missing the dependency libopus-0.dll.

  • 2012
  • [*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #16
Opus-tools 0.1.9 Windows binaries (win32 + win64).
Linked against the experimental branch exp_lbr_tune.

Statically linked against:
* libopus (branch: exp_lbr_tune, commit: 4ed368e)
* libflac  1.3.1
* libogg  1.3.2

Built in a safe GNU/Linux environment with:
mingw-w64 4.0.6 / GCC 6.1.1

https://archive.org/download/opus-tools-exp_lbr_tune-g4ed368e/opus-tools-exp_lbr_tune-g4ed368e.zip
  • Last Edit: 16 May, 2016, 04:55:50 PM by 2012

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #17
Jmvalin, I had the time to test collapse1b.
Collapse1 seems to be the best.

Code:
collapse0	collapse1	collapse1b
3.0 2.9 3.2
2.8 3.1 3.0
2.7 2.8 2.6
2.9 2.8 2.7
3.0 3.0 2.9
2.8 2.9 3.0

Average: 2.866 2.916 2.9

  • IgorC
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #18
Collapse1b  - 2,78
Collapse1 - 2,77
Collapse0 - 2,74

Code:
Track              Collapse0  Collapse1  Collapse1b
01 Guitar intro    3,2        3,4        3,3
02 Drum intro      2,4        2,3        2,4
03 hihat           2,55       2,6        2,5
04 sp. Guitar      2,7        2,8        2,75
05 wood drums      3          2,9        3
06 female voice    2,6        2,6        2,7

Average            2,742      2,767      2,775
Geometric Mean     2,728      2,747      2,759

  • Kamedo2
  • [*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #19
This is the average of the two results above.
Code:
Track              collapse0  collapse1  collapse1b
01 Guitar intro    3.10       3.15       3.25
02 Drum intro      2.60       2.70       2.70
03 hihat           2.63       2.70       2.55
04 sp. Guitar      2.80       2.80       2.73
05 wood drums      3.00       2.95       2.95
06 female voice    2.70       2.75       2.85

  • Last Edit: 23 May, 2016, 06:33:16 AM by Kamedo2

  • IgorC
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #20
Kamedo2, IgorC, thanks for the feedback. It looks like the effect I'm trying to produce is still a bit too subtle. I uploaded a variant on collapse1:
http://jmvalin.ca/misc_stuff/collapse1b.wav
Can you guys compare to collapse0 and see if you can hear more of an improvement?
BTW was it 48 kb?

Your set covers different kinds of samples. That's excellent. But if it's possible to also try my set of samples in future tests, I would be glad.  Samples

Thank You.

  • jmvalin
  • [*][*][*][*][*]
  • Developer
Re: Personal Listening Test of 2 Opus encoders
Reply #21
BTW was it 48 kb?

Your set of samples has different kind of samples. That's excellent. But if it's possible also to try my set of samples for future tests as well, I will be glad.  Samples

The test was at 32 kb/s VBR, which for these hard samples averaged about 38 kb/s. If you have a way to build an Opus encoder from source, I can point you to the exact version I was testing.

About the results, thanks very much for these tests. It does look like I can produce some improvement, but it's still a bit marginal. The idea in the collapse* series (except for collapse0 which was the control) is to do a sort of "gradual" intensity stereo rather than a binary decision.
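On the VBR point: the realized average bitrate of an encoded file is simply its size in bits divided by its duration, so a 32 kb/s target can land higher on hard material. A minimal sketch; the file size and duration in the example are made-up numbers, and the result includes any container overhead in the file.

```python
def average_bitrate_bps(size_bytes, duration_s):
    """Average bitrate in bits per second for a file of the given size."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / duration_s


if __name__ == "__main__":
    # Hypothetical example: a 30-second clip that encodes to 142500 bytes.
    print(average_bitrate_bps(142500, 30) / 1000, "kb/s")
```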

  • IgorC
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #22
If you have a way to build an Opus encoder from source, I can point you the exact version I was testing.
Honestly, I no longer do. Lack of time. I barely have an hour or so to do a blind test. If someone here can provide binaries, and if you tell me which snapshots and settings to pay attention to, then I'll submit some results from time to time.

  • DonP
  • [*][*][*][*][*]
  • Members (Donating)
Re: Personal Listening Test of 2 Opus encoders
Reply #23
Thanks for both the testing and the intuitive graphics.  I've been using low-bitrate Opus for on-phone storage, so this is of interest!

  • eahm
  • [*][*][*][*][*]
Re: Personal Listening Test of 2 Opus encoders
Reply #24
Thanks for testing again.

Does 1.1.2 change anything?