
Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Abstract:
Blind sound quality comparison between modified experimental Opus encoders named analysis24k, build C, and build E at 36 kbps and 48 kbps.

Encoders:
https://jmvalin.ca/misc_stuff/opus-tools-analysis24k.zip
https://jmvalin.ca/misc_stuff/opus-tools-buildC.zip
https://jmvalin.ca/misc_stuff/opus-tools-buildE.zip

Settings:
opusenc --bitrate 36 in.wav out.opus
opusenc --bitrate 48 in.wav out.opus

Samples:
All 15 of Kamedo2's samples.
All 12 of IgorC's samples.

Hardware:
Sony PSP-3000 + RP-HT560.

Results:




Conclusions & Observations:
Builds C and E are better than analysis24k at 36 kbps.
At 48 kbps it is very hard to say which is better, but build C is more likely to be better than the rest (p=0.061 against analysis24k and p=0.131 against build E, paired Student's t-test).
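
The two paired t-test figures can be re-derived from the 48 kbps columns of the raw data listed further down. The following is only a minimal sketch in plain Python: the function names are made up for illustration, and the tail of the t distribution is integrated numerically with Simpson's rule instead of being taken from a statistics package.

```python
import math

# 48 kbps scores copied from the raw data table, one entry per sample (27 samples).
aexp48 = [2.8, 3.7, 3.6, 3.2, 3.9, 2.8, 3.8, 3.6, 3.4, 3.1, 3.4, 3.6, 4.0, 3.3,
          3.6, 3.5, 3.9, 3.4, 3.8, 2.9, 3.7, 3.7, 3.5, 3.6, 3.3, 3.3, 3.6]
cexp48 = [2.7, 3.8, 3.9, 3.3, 4.0, 3.0, 3.9, 3.7, 3.6, 3.4, 3.6, 3.7, 3.7, 3.1,
          3.5, 3.7, 4.0, 3.7, 3.9, 3.0, 3.9, 3.2, 3.3, 3.7, 3.3, 3.8, 3.7]
eexp48 = [2.6, 3.8, 3.6, 3.4, 3.8, 3.4, 3.7, 3.8, 3.5, 2.9, 3.4, 3.5, 3.8, 2.9,
          3.3, 3.8, 3.9, 3.5, 3.6, 2.7, 3.8, 3.6, 3.4, 3.8, 3.4, 3.7, 3.8]


def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n), n - 1


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the pdf over [0, |t|] (Simpson's rule)."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


t_ca, df = paired_t(cexp48, aexp48)
t_ce, _ = paired_t(cexp48, eexp48)
p_ca = two_sided_p(t_ca, df)
p_ce = two_sided_p(t_ce, df)
print(f"C vs analysis24k: t={t_ca:.3f}, p={p_ca:.3f}")
print(f"C vs E:           t={t_ce:.3f}, p={p_ce:.3f}")
```

Because the same 27 samples are scored for every encoder, the test works on per-sample differences, which removes the large sample-to-sample variation from the comparison.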

ANOVA analysis:
Code:
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 27
Critical significance:  0.05
Significance of data: 0.00E+000 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total              161          39.28
Testers (blocks)    26          23.75
Codecs eval'd        5          11.60    2.32   76.58  0.00E+000
Error              130           3.94    0.03
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.094

Means:

cexp48   eexp48   aexp48   cexp36   eexp36   aexp36  
  3.56     3.50     3.48     3.03     3.01     2.91  

---------------------------- p-value Matrix ---------------------------

         eexp48   aexp48   cexp36   eexp36   aexp36  
cexp48   0.186    0.103    0.000*   0.000*   0.000*  
eexp48            0.755    0.000*   0.000*   0.000*  
aexp48                     0.000*   0.000*   0.000*  
cexp36                              0.755    0.020*  
eexp36                                       0.044*  
-----------------------------------------------------------------------

cexp48 is better than cexp36, eexp36, aexp36
eexp48 is better than cexp36, eexp36, aexp36
aexp48 is better than cexp36, eexp36, aexp36
cexp36 is better than aexp36
eexp36 is better than aexp36
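
The F value and Fisher's LSD in the output above follow from the printed sums of squares. Here is a rough sketch that recomputes them; the critical t value of 1.978 (two-sided 5% at 130 degrees of freedom) is hard-coded as an assumption rather than computed, and small deviations from the printed figures come from the rounding of the printed sums of squares.

```python
import math

n_blocks = 27                      # samples (reported as "listeners" by FRIEDMAN)
ss_codecs, df_codecs = 11.60, 5    # "Codecs eval'd" row
ss_error, df_error = 3.94, 130     # "Error" row

ms_codecs = ss_codecs / df_codecs  # mean square = sum of squares / degrees of freedom
ms_error = ss_error / df_error
f_stat = ms_codecs / ms_error

T_CRIT = 1.978                     # assumed: two-sided 5% critical t at 130 df
lsd = T_CRIT * math.sqrt(2 * ms_error / n_blocks)

means = {"cexp48": 3.56, "eexp48": 3.50, "aexp48": 3.48,
         "cexp36": 3.03, "eexp36": 3.01, "aexp36": 2.91}
names = list(means)

# A pair differs significantly when its means are further apart than the LSD.
significant = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
               if abs(means[a] - means[b]) > lsd]

print(f"F = {f_stat:.2f}, LSD = {lsd:.3f}")
for a, b in significant:
    print(f"{a} is better than {b}")
```

The resulting pairs are the same ones listed in the "is better than" lines of the FRIEDMAN output.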


Raw data:
Code:
aexp36	cexp36	eexp36	aexp48	cexp48	eexp48	
2.000 2.100 2.200 2.800 2.700 2.600
3.300 3.200 3.400 3.700 3.800 3.800
2.900 3.300 3.200 3.600 3.900 3.600
2.200 2.200 2.300 3.200 3.300 3.400
3.400 3.600 3.600 3.900 4.000 3.800
2.000 2.200 2.200 2.800 3.000 3.400
3.300 3.500 3.600 3.800 3.900 3.700
3.100 3.400 3.300 3.600 3.700 3.800
3.000 2.900 2.800 3.400 3.600 3.500
2.400 2.400 2.500 3.100 3.400 2.900
3.000 3.200 3.300 3.400 3.600 3.400
3.100 3.400 3.200 3.600 3.700 3.500
3.400 3.300 3.600 4.000 3.700 3.800
2.100 2.200 2.100 3.300 3.100 2.900
2.600 2.900 2.700 3.600 3.500 3.300
2.900 3.100 3.200 3.500 3.700 3.800
3.700 3.800 3.500 3.900 4.000 3.900
3.100 3.200 3.200 3.400 3.700 3.500
3.300 3.200 3.200 3.800 3.900 3.600
2.100 2.200 2.200 2.900 3.000 2.700
2.900 3.200 3.100 3.700 3.900 3.800
3.300 3.500 3.100 3.700 3.200 3.600
3.000 2.800 2.900 3.500 3.300 3.400
3.400 3.100 3.300 3.600 3.700 3.800
2.600 2.800 2.900 3.300 3.300 3.400
3.200 3.600 3.400 3.300 3.800 3.700
3.400 3.400 3.300 3.600 3.700 3.800
%samples 41_30sec Perc.
%samples finalfantasy Strings
%samples ATrain Jazz
%samples BigYellow Pops
%samples FloorEssence Techno
%samples macabre Classic
%samples mybloodrusts Guitar
%samples Quizas Latin
%samples VelvetRealm Techno
%samples Amefuribana Pops
%samples Trust Gospel
%samples Waiting Rock
%samples Experiencia Latin
%samples HearttoHeart Pops
%samples Tom'sDiner Acappella
%samples 01 castanets inst.
%samples 02 fatboy_30sec Techno
%samples 03 eig Techno
%samples 04 Bachpsichord inst.
%samples 05 Enola Techno
%samples 06 trumpet inst.
%samples 07 applaud Live
%samples 08 velvet perc.
%samples 09 Linchpin Rock
%samples 10 spill_the_blood guitar
%samples 11 female_speech Speech
%samples 12 French_Ad Speech


Bitrates:
The three encoders are almost the same in how they use bits.

Code:
aexp36	cexp36	eexp36	aexp48	cexp48	eexp48	
%bitrate
43815 44030 43822 58140 58384 58052
53733 54628 54771 68337 68833 68888
42110 42392 42150 55727 56221 55861
40547 40598 40505 54045 54083 54015
50393 50942 50902 66508 66931 66934
39505 39464 39683 51945 52030 52369
42200 42279 42308 54842 54893 55048
43599 44058 44004 57696 58369 58324
44756 45367 45351 58109 58829 58803
43487 43569 43493 57103 57337 57116
38296 38280 38234 50953 50895 50851
44147 44510 44402 58027 58410 58290
41029 41062 40988 54681 54618 54530
40188 40411 40300 53633 53981 53856
44890 44668 44316 57636 57315 57040
47441 46995 46847 62518 61873 61543
53175 53216 53165 70884 70900 70826
50224 49817 49706 65493 65047 64897
54947 56701 57249 70772 70830 70887
42135 42418 42301 55510 55840 55746
53648 53792 53314 67355 67359 67007
46547 46561 46478 60583 60545 60506
41672 41798 41738 55071 55075 54947
36014 36026 35990 47860 47726 47612
45580 46392 46559 59591 60089 60162
36442 36442 36417 48386 48336 48277
45099 45319 45256 58969 59308 59257

%album bitrate
38400 38561 38504 50830 51064 50970
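
To put a number on "almost the same", the album bitrates above can be compared directly. A small sketch follows; it assumes the figures are in bits per second when comparing them with the --bitrate targets, and the helper name is made up for illustration.

```python
# Album bitrates from the "%album bitrate" line above.
album = {"aexp36": 38400, "cexp36": 38561, "eexp36": 38504,
         "aexp48": 50830, "cexp48": 51064, "eexp48": 50970}


def spread_percent(values):
    """Gap between the highest and lowest value, in percent of the lowest."""
    return 100.0 * (max(values) - min(values)) / min(values)


spread36 = spread_percent([v for k, v in album.items() if k.endswith("36")])
spread48 = spread_percent([v for k, v in album.items() if k.endswith("48")])
print(f"spread at 36 kbps setting: {spread36:.2f}%")
print(f"spread at 48 kbps setting: {spread48:.2f}%")

# Assumption: values are in bit/s, so the targets are 36000 and 48000.
for name, bps in album.items():
    target = 1000 * int(name[-2:])
    print(f"{name}: {100.0 * (bps / target - 1):+.1f}% relative to target")
```

Under that reading the three builds stay within half a percent of each other at both settings.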

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #1
Interesting test, Kamedo2.  :)

Can you describe how build C (or E) sounds better to you? Does it happen in particular situations (samples, frequency ranges, etc.)? That would probably be useful for the developers.

Thank You!

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #2
FYI, the only thing that buildC changes is the slope of the bit allocation, by 1/64 bit per Bark. In practice, we end up altering the LF vs HF balance by the equivalent of 1.5 dB SNR, so HF gets more bits at the expense of LF. buildE tried to be fancier about how to do it (depending on the context), but it doesn't seem to have worked.
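
A back-of-the-envelope way to relate the two figures, not taken from the Opus source: assuming the common rule of thumb of about 6.02 dB of SNR per bit per coefficient, a 1.5 dB shift corresponds to roughly a quarter of a bit, which a 1/64 bit per Bark slope accumulates over about 16 Bark. All names below are illustrative.

```python
import math

# Assumption: ~6.02 dB of SNR per extra bit per coefficient (20*log10(2)).
DB_PER_BIT = 20 * math.log10(2)

slope_bits_per_bark = 1 / 64   # change in allocation slope quoted in the post
snr_shift_db = 1.5             # LF vs HF balance change quoted in the post

bits_shift = snr_shift_db / DB_PER_BIT                 # bits equivalent to 1.5 dB
implied_span_bark = bits_shift / slope_bits_per_bark   # span needed to accumulate it

print(f"{snr_shift_db} dB is about {bits_shift:.3f} bit")
print(f"implied LF-to-HF span: about {implied_span_bark:.1f} Bark")
```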

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #3
Interesting possible improvement though not a statistically significant result.

I've wondered, with the 1.2 push towards high bandwidth at lower bitrates and higher fidelity in the HF, whether some of these developments may be overfitting a model to data from a few helpful HA listeners.

The tests we see encode samples of professionally mastered studio recorded audio and have them rated by "golden ears" listeners using high-quality setups in ideal silent listening conditions. But won't high frequency content and stereo separation matter considerably less to listeners when the original recording was noisy, their equipment and their hearing are average rather than stellar, or their listening environment has background noise? Won't the low frequencies typically be less masked? And aren't those use cases especially important for WebRTC etc?

A couple years ago I subjected a few laypeople I know to some blind listening tests, seeing what they thought of different encodings of academic conference presentations recorded with handheld recorders, mostly to decide on bitrates for MP3 and Opus. I was surprised to find they generally preferred the LP7.5 .wav to the originals! Of course that's unthinkable for the material used in most codec tests, but for these, even after intelligent postprocessing, removing HF noise more than made up for what HF signal was removed.

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #4
The tuning we're talking about here is very different from what's used in WebRTC. As in a WebRTC VoIP session isn't even going to use this code. I've been doing the speech tuning differently. In general, there's no guarantee that the feedback I get here will generalize, but then again, what else can I do. If one person prefers B to A and I have data for nobody else, it's not proof, but B is still (slightly) more likely to be a better choice than A. The alternative would be just picking the worse. And for music, the thing to actually optimize for is noise-free studio recordings, so there's no issue with that. For speech, you want to look at both clean and noisy, but as I said, it's a different kind of tuning, involving different code, different people, different clips, ...

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #5
The tests we see encode samples of professionally mastered studio recorded audio and have them rated by "golden ears" listeners using high-quality setups in ideal silent listening conditions.
First of all, "golden ears" is a myth. Maybe good hearing.
Second, Jean-Marc Valin has no other choice. Only two people (Kamedo2 and me  ::) ) have submitted tests here on Hydrogenaudio, and I haven't been active this year yet :( . So it's just Kamedo2. https://hydrogenaud.io/index.php/topic,111798.50.html

Third, some studies have demonstrated that controlled tests (experienced listeners, dedicated setups, etc.) are more useful than uncontrolled ones.

I didn't complete my test of build C, but my results were in line with Kamedo2's.
And we aren't talking about super-HF content (16 kHz+) or extremely wide stereo separation. This isn't "audiophile"/golden-ears stuff. Build C also works in the MF range (quite audibly). Opus 1.2 has better stereo separation, but without hurting the quality of any individual channel.

Everybody is welcome to try build C/E or any other build. Just imagine if 10-20 members each submitted 1-2 samples/results. That would be very useful.

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #6
I'm certainly not saying these tests aren't helpful or that the information shouldn't be used! Just that in taking the information into account one needs to consider not only statistical strength but also possible sources of bias.

If a different tuning process is happening for speech, it takes noise on both ends into account, and the classifier continues to improve, that covers basically all my concerns. Though there are use cases for noisy music I can see optimizing for that wouldn't be a priority.

Part of why I wondered about this was hearing about both the HF tilt work here and the "fullband speech starting from 14kbps" in the 1.2a changelog.

If there are ways we here at HA could be of help in testing speech improvements, I'd be interested to know.

IgorC: By "golden ears" I just mean someone who can ABX things most listeners can't (despite some training), not any kind of audiophile superstition. Also, it's certainly possible to have controlled tests with artificial noise sources in either the input or the listening setup, suboptimal equipment, or untrained listeners. But unless you have a sizable budget it's not easy to organize something like that and get enough participation to have statistical strength given the likely higher variance of responses.

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #7
Interesting possible improvement though not a statistically significant result.
The 36 kbps result is significant (p=0.020 and p=0.044).
It's a paired test, so you cannot judge significance from the error bars.

The tests we see encode samples of professionally mastered studio recorded audio and have them rated by "golden ears" listeners using high-quality setups in ideal silent listening conditions. But won't high frequency content and stereo separation matter considerably less to listeners when the original recording was noisy, their equipment and their hearing are average rather than stellar, or their listening environment has background noise? Won't the low frequencies typically be less masked? And aren't those use cases especially important for WebRTC etc?
If you listen to music on moving trains, in cars, etc., low frequencies are typically more masked. And those are the typical use cases of Opus at low bitrates.

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #8
I'd be interested in comments on the quality of buildC at higher bitrates. I suspect the difference will be inaudible at 96 kb/s, but if there's a difference at 64 kb/s, it would be good to confirm whether it still helps or if the change should only apply to low bitrates.

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #9
64 kbps, 48 kHz, stereo.

Build "A" https://jmvalin.ca/misc_stuff/opus-tools-analysis24k.zip
Build "C" https://jmvalin.ca/misc_stuff/opus-tools-buildC.zip

Average MOS:
Build C - 4.062
Build A - 4.038

Code:
64k                           A      C
01 Moth                       4      4,2
02 castanets                  3,8    4
03 eig                        4      3,8
04 Bachpsichord               3,5    3,4
05 Enola                      3,9    4,1
06 trumpet                    4,2    4,2
07 applaud                    4      4
08 velvet                     3,4    3,5
09 linchpin                   3,8    4
10 spill the blood            3,7    3,6
11 Female speech              5      5
fatboy                        4,8    4,8
13 male speech w/background   4,4    4,2

Average                       4,038  4,062


Samples: https://drive.google.com/file/d/0ByvUr-pp6BuUY0lrWTVHZWFrMG8/view

Observation: While there are no big differences in audio quality between the two builds, I personally like how build C sounds. It doesn't break anything at all and it sounds more balanced.
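
For anyone who wants to re-check the 64 kbps averages, here is a small sketch that parses the decimal-comma scores from the table and recomputes the means and the per-sample wins. The helper name and the way the table is embedded are illustrative only.

```python
# 64 kbps scores (build A, build C) in table order, kept as decimal-comma strings.
scores_64k = [
    ("4", "4,2"), ("3,8", "4"), ("4", "3,8"), ("3,5", "3,4"), ("3,9", "4,1"),
    ("4,2", "4,2"), ("4", "4"), ("3,4", "3,5"), ("3,8", "4"), ("3,7", "3,6"),
    ("5", "5"), ("4,8", "4,8"), ("4,4", "4,2"),
]


def parse_score(text):
    """Convert a decimal-comma score such as '4,2' to a float."""
    return float(text.replace(",", "."))


a = [parse_score(x) for x, _ in scores_64k]
c = [parse_score(y) for _, y in scores_64k]

mean_a = sum(a) / len(a)
mean_c = sum(c) / len(c)
c_wins = sum(1 for x, y in zip(a, c) if y > x)
a_wins = sum(1 for x, y in zip(a, c) if x > y)
ties = len(a) - c_wins - a_wins

print(f"Build A mean: {mean_a:.3f}, Build C mean: {mean_c:.3f}")
print(f"C better on {c_wins} samples, A better on {a_wins}, tied on {ties}")
```

The per-sample counts are close, which fits the observation that the two builds do not differ much at this bitrate.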

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #10
96 kbps, 48 kHz, stereo.

Average MOS:
Build C - 4.685
Build A - 4.669

Code:
96k                           A      C
01 Moth                       4,9    4,8
02 castanets                  3,9    4,2
03 eig                        4,8    4,7
04 Bachpsichord               4,7    4,6
05 Enola                      4,7    4,6
06 trumpet                    4,7    4,8
07 applaud                    4,5    4,5
08 velvet                     4,6    4,6
09 linchpin                   4,3    4,5
10 spill the blood            4,8    4,8
11 Female speech              4,9    4,9
fatboy                        5      5
13 male speech w/background   4,9    4,9

Average                       4,669  4,685
Min                           3,9    4,2

Samples: https://drive.google.com/file/d/0ByvUr-pp6BuUYzFiWTYtaEl5VHM/view?usp=sharing

Observation: The MOS are on par; however, build C was significantly better on two critical samples (Castanets and Linchpin): better handling of HF pre-echo on the castanet clicks, and better guitar on the Linchpin sample (more detail and less distortion on both guitars and on the bass drum hits).

All in all, for me build C is the way to go at 64 and 96 kbps. Absolutely yes.
 

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #11
I have also tried build C at 128 kbps, but Opus reaches the transparency region very quickly at this bitrate.

Only the most critical samples (Linchpin and Castanets) were tested at 128 kbps.

A graph of the previous 64 and 96 kbps results and the two samples at 128 kbps.


 

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #12
Thank you for your tests, IgorC. They show that Opus is great at 128 kbps and very good at 96 kbps.
lame3995o -Q1.7
opus --bitrate 140

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #13
Hi, halb27.
Well, I have tested my set of samples. It makes sense to test more critical samples at 128 kbps and higher.
If you would try yours, that would be great :)

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #14
Observation: The MOS are on par; however, build C was significantly better on two critical samples (Castanets and Linchpin): better handling of HF pre-echo on the castanet clicks, and better guitar on the Linchpin sample (more detail and less distortion on both guitars and on the bass drum hits).

All in all, for me build C is the way to go at 64 and 96 kbps. Absolutely yes.
Thanks a lot for taking the time to test this. I just released 1.2-beta, incorporating this change. To be more precise, the change I made in build C is fully enabled below 64 kb/s and then gradually phased out between 64 kb/s and 80 kb/s. For now I didn't want to change the behaviour at high bit-rate just to be on the safe side (it's already high quality so I want to make sure I don't make anything worse).
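
As an illustration of the phase-out described here, the strength of the build C change as a function of bitrate might look like the sketch below. The linear ramp between 64 and 80 kb/s is an assumption (the post only says "gradually"), and the function name is hypothetical, not something from the Opus source.

```python
def tilt_weight(bitrate_kbps, full_below=64.0, off_above=80.0):
    """Fraction of the build C change applied at a given bitrate.

    1.0 up to 64 kb/s, 0.0 from 80 kb/s, and (as an assumption) a
    straight-line fade in between.
    """
    if bitrate_kbps <= full_below:
        return 1.0
    if bitrate_kbps >= off_above:
        return 0.0
    return (off_above - bitrate_kbps) / (off_above - full_below)


for rate in (36, 48, 64, 72, 80, 96):
    print(f"{rate} kb/s -> weight {tilt_weight(rate):.2f}")
```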

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #15
@ IgorC: I did that. Just finished.
I used opus-tools-buildC.zip from your link some posts ago.

I started with herding_calls using --bitrate 96. I was not able to ABX it. For a comparison I used --bitrate 48, and though I succeeded I'd call even the --bitrate 48 result pretty good.

I continued with trumpet and could ABX it using --bitrate 96. I could not ABX it using --bitrate 128.

I was not able to ABX trumpet_myPrince using --bitrate 96. Using --bitrate 48 I could but the --bitrate 48 result is acceptable.

I could ABX lead_voice using --bitrate 96, but not using --bitrate 128 (though I got at 6/8, so there is a chance I could ABX it with a lot more ABXing pain. But some years ago I decided not to do extreme ABXing any more: no fun and questionable significance for practical listening).
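
For context on a 6/8 score: under pure guessing, the chance of getting at least that many trials right follows from the binomial distribution. A minimal sketch, with an illustrative function name:

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` right answers in `trials` coin-flip guesses."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


for correct in (6, 7, 8):
    print(f"{correct}/8 -> p = {abx_p_value(correct, 8):.4f}")
```

A 6/8 result corresponds to p = 37/256, about 0.145, so it is suggestive but well short of the usual 0.05 threshold.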

The weak point comes with harp40_1. I'd call the --bitrate 96 result pretty annoying, and also the --bitrate 128 result is easy to ABX.

With the exception of harp40_1 I'm really impressed by the quality of the --bitrate 128 and also the --bitrate 96 results.

Some remarks:
  • Opus seems to have a good sense for critical situations. The bitrates for these difficult samples were quite a bit higher than the average bitrate for the given bitrate settings.
  • For my test set of various kinds of pop music the average bitrate was ~9 per cent higher than the --bitrate value (for --bitrate 48, 64, 96, 128).
  • For the same test set the Opus track peak value was very often something like 1.36. How can I make sure that clipping is avoided on any device I'd like to play the Opus file on?
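
The thread does not answer the clipping question, but the size of the problem is easy to estimate. Assuming the peak value is relative to a full scale of 1.0, the attenuation needed to bring a peak of 1.36 back to full scale is about 2.7 dB. The function below is only an illustration, not part of any Opus tool.

```python
import math


def attenuation_db(peak):
    """Gain reduction in dB needed so that `peak` no longer exceeds full scale (1.0)."""
    return 20 * math.log10(peak) if peak > 1.0 else 0.0


needed = attenuation_db(1.36)
print(f"peak 1.36 -> reduce gain by about {needed:.2f} dB")
```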


lame3995o -Q1.7
opus --bitrate 140

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #16
Thank You, Halb

I've tried harp40_1
sample ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/sqam/harp40_1.wav
http://sound.media.mit.edu/resources/mpeg4/audio/sqam/

Agreed, 96 kbps wasn't enough for acceptable quality. 128 kbps was easy to ABX, but the differences weren't as annoying as at 96 kbps.

I just released 1.2-beta, incorporating this change. To be more precise, the change I made in build C is fully enabled below 64 kb/s and then gradually phased out between 64 kb/s and 80 kb/s. For now I didn't want to change the behaviour at high bit-rate just to be on the safe side (it's already high quality so I want to make sure I don't make anything worse).
Yes, it's understandable. Glad to see Kamedo2's and my results were useful.

Opus hits 1.2-beta.  :)  Nice.

I also think build C demonstrates the potential of a smart adaptive bit distribution between LF and HF (in the future?). Anyway, build C already does a good job here, and you guys are working hard on the AV1 video format, so I don't expect drastic quality changes that would require a huge amount of work. (Hm, and I haven't submitted any useful tests in recent months either.)

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #17
This morning I also tested harp40_1 using --bitrate 160 (172 kbps on average for my test set of various pop music) and --bitrate 192 (205 kbps for my test set) using build A and build C. I can ABX each of the encoding results.

Using --bitrate 160 I'd call the results acceptable, using --bitrate 192 the issue is negligible to me.

Considering the fact that --bitrate 128 is transparent for nearly everything, --bitrate 160 is my sweet spot giving some (but not exaggerated) headroom for very evil samples.

Ignoring very evil samples, --bitrate 96 is great and --bitrate 128 is perfect (judging from what we know so far).
lame3995o -Q1.7
opus --bitrate 140

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #18
halb27, thank you for the sample harp40_1 and tests.
Let's see if devs can do something with it.

Re: Personal Listening Test of Experimental Modified Opus Encoders at 36, 48 kbps

Reply #19
FYI, the new release I uploaded (1.2-rc1) should improve quality on signals that have a few strong tones, like glockenspiel (but not harpsichord, which has tones everywhere). I also recalibrated the average VBR rates on my collection, which means that the bitrate will be about 2% lower than with 1.2-beta.

 