
Personal blind sound quality comparison of Opus hard-CBR with framesize options

Abstract:
Personal blind sound quality comparison of the Opus codec from 28 to 80 kbps, hard-CBR, at various frame sizes.
Hard CBR is designed so that network eavesdroppers cannot guess encrypted speech from the bitrate side channel.
Opus encoding introduces 26.5 ms of delay by default. That is already short, but according to the manual it can be reduced further by setting a shorter framesize. This test examines whether that is useful for narrowband connections down to 56 kbps.
A longer framesize was tested as well.

Encoders:
The official Opus encoder was used: the official Windows build opus-tools-0.2-opus-1.3, the latest released version as of July 2021.

Settings:
opusenc --hard-cbr --bitrate 28 --framesize 20 in.wav out.opus
opusenc --hard-cbr --bitrate 28 --framesize 60 in.wav out.opus
opusenc --hard-cbr --bitrate 40 --framesize 20 in.wav out.opus
opusenc --hard-cbr --bitrate 40 --framesize 60 in.wav out.opus
opusenc --hard-cbr --bitrate 56 --framesize 20 in.wav out.opus
opusenc --hard-cbr --bitrate 56 --framesize 5 in.wav out.opus
opusenc --hard-cbr --bitrate 80 --framesize 20 in.wav out.opus
opusenc --hard-cbr --bitrate 80 --framesize 5 in.wav out.opus
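
For orientation, the settings above imply the following packet sizes, packet rates and delays. This is a rough sketch and not part of the original test. It assumes each hard-CBR packet carries exactly bitrate × framesize bits, and that the encoder lookahead is the 6.5 ms implied by the 26.5 ms default delay quoted in the abstract (26.5 − 20). The names `SETTINGS` and `packet_stats` are made up for this illustration.

```python
# Sketch: per-packet numbers implied by the tested opusenc settings.
# Assumption: hard-CBR packets carry exactly bitrate * framesize bits.
# Assumption: lookahead = 26.5 ms default delay - 20 ms default frame.

LOOKAHEAD_MS = 26.5 - 20.0  # 6.5 ms

# (bitrate in kbps, framesize in ms), as in the Settings list
SETTINGS = [(28, 20), (28, 60), (40, 20), (40, 60),
            (56, 20), (56, 5), (80, 20), (80, 5)]


def packet_stats(bitrate_kbps, framesize_ms):
    """Return (bytes per packet, packets per second, delay in ms)."""
    payload_bytes = bitrate_kbps * framesize_ms / 8  # kbps * ms = bits
    packets_per_second = 1000 / framesize_ms
    delay_ms = framesize_ms + LOOKAHEAD_MS
    return payload_bytes, packets_per_second, delay_ms


if __name__ == "__main__":
    for kbps, ms in SETTINGS:
        size, rate, delay = packet_stats(kbps, ms)
        print(f"{kbps} kbps, {ms:>2} ms: {size:5.0f} bytes/packet, "
              f"{rate:6.1f} packets/s, {delay:4.1f} ms delay")
```

Under these assumptions, a 5 ms frame at 56 kbps has 35 bytes per packet, a quarter of the 140 bytes available at 20 ms.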

Sample tracks:
15 sound samples from Kamedo2's samples.
12 sound samples from IgorC's samples.
In total, 27 diverse music and speech sound samples.
By mistake, three sample tracks (Waiting, Experiencia, Heart to Heart) were tested twice. The duplicated ratings were averaged.

Hardware:
Sony PSP-3000 + AKG K712.

Results:




Conclusions & Observations:
  • Opus at 56kbps hard-CBR, default framesize, exceeded 3.0 on a MOS scale on the majority of the test tracks.
  • Opus at 80kbps hard-CBR, default framesize, exceeded 3.0 on a MOS scale on all of the test tracks.
  • Changing the Opus framesize from the default 20 millisecond setting never improved the average fidelity: it was either worse or about the same. The default setting was the best.
  • At Opus 56kbps and 80kbps, setting a shorter framesize resulted in severe fidelity loss compared to the default setting.
  • At Opus 28kbps and 40kbps, setting a longer framesize also resulted in fidelity loss on average over the test tracks, which is in line with the official manual. Strictly speaking, it remains unclear whether the longer framesize decreases fidelity on average: the differences were significant in Student's paired t-test (p<0.05), but the multiple comparisons problem remains.
  • At Opus 56kbps, the "Trust" sample track was an exception: the shorter framesize resulted in better fidelity. It is a transient sample, with applause recorded from all directions. Adaptive use of the 5ms framesize may be worth considering in future Opus development.
  • At Opus 56kbps, the "applaud" sample track at the default framesize was an outlier. The crowd becomes notably louder at 0:02, as if something other than delight had happened in the scene. This changes the nuance of the original recording, whereas a codec should aim for fidelity and transparency. As with the "Trust" sample, the 5ms framesize resulted in better fidelity.
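
One common way to address the multiple comparisons concern is a Holm-Bonferroni step-down correction. The sketch below is an illustration only, not an analysis the author ran. It uses the two paired t-test p-values reported in Reply #4 of this thread, and treating them as a family of exactly two tests is an assumption. `holm_reject` is a name made up for this example.

```python
# Sketch: Holm-Bonferroni step-down correction for a family of p-values.
# The two p-values are the paired t-test results from Reply #4
# (20 ms vs 60 ms framesize at 28 and 40 kbps). Treating them as a
# family of exactly two tests is an assumption made for this example.

def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the k-th smallest p-value against alpha / (m - k)
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


if __name__ == "__main__":
    p = {"28 kbps, 20 ms vs 60 ms": 0.009957,
         "40 kbps, 20 ms vs 60 ms": 0.005292}
    for name, rejected in zip(p, holm_reject(list(p.values()))):
        verdict = "significant" if rejected else "not significant"
        print(f"{name}: {verdict} after Holm correction")
```

With only these two tests in the family, both differences stay significant at the 5% level. A larger family, such as every pairwise comparison in the ANOVA matrix, would impose stricter thresholds.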

ANOVA analysis:
FRIEDMAN version 1.24 (Jan 17, 2002) http://ff123.net/
Blocked ANOVA analysis

Number of listeners: 27
Critical significance:  0.05
Significance of data: 1.11E-016 (highly significant)
---------------------------------------------------------------
ANOVA Table for Randomized Block Designs Using Ratings

Source of         Degrees     Sum of    Mean
variation         of Freedom  squares   Square    F      p

Total              215         140.21
Testers (blocks)    26          33.35
Codecs eval'd        7          82.91   11.84   89.97  1.11E-016
Error              182          23.96    0.13
---------------------------------------------------------------
Fisher's protected LSD for ANOVA:   0.195

Means:

std-80k  std-56k  std-40k  short-80 long-40k short-56 std-28k  long-28k
  3.92     3.38     2.78     2.64     2.63     2.17     2.11     2.01

---------------------------- p-value Matrix ---------------------------

         std-56k  std-40k  short-80 long-40k short-56 std-28k  long-28k
std-80k  0.000*   0.000*   0.000*   0.000*   0.000*   0.000*   0.000*
std-56k           0.000*   0.000*   0.000*   0.000*   0.000*   0.000*
std-40k                    0.173    0.130    0.000*   0.000*   0.000*
short-80                            0.881    0.000*   0.000*   0.000*
long-40k                                     0.000*   0.000*   0.000*
short-56                                              0.549    0.097
std-28k                                                        0.287
-----------------------------------------------------------------------

std-80k is better than std-56k, std-40k, short-80k, long-40k, short-56k, std-28k, long-28k
std-56k is better than std-40k, short-80k, long-40k, short-56k, std-28k, long-28k
std-40k is better than short-56k, std-28k, long-28k
short-80k is better than short-56k, std-28k, long-28k
long-40k is better than short-56k, std-28k, long-28k
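
The F value and Fisher's LSD in the output above can be re-derived from the table's sums of squares. This is a minimal sketch of that arithmetic, not the FRIEDMAN tool's own code. The critical t value (about 1.973 for 182 degrees of freedom, two-sided 5%) is an assumed constant, hard-coded because the Python standard library has no t-quantile function. What the tool calls "listeners" are treated here as the 27 sample tracks.

```python
# Sketch: re-derive F and Fisher's LSD from the ANOVA table above.
# Sums of squares and degrees of freedom are copied from the table.
# T_CRIT is an assumed two-sided 5% critical value of Student's t for
# 182 degrees of freedom (about 1.973).
import math

SS_CODECS, DF_CODECS = 82.91, 7
SS_ERROR, DF_ERROR = 23.96, 182
N_BLOCKS = 27    # one rating per setting per sample track
T_CRIT = 1.973   # assumption, see above


def anova_summary():
    """Return (mean square codecs, mean square error, F, LSD)."""
    ms_codecs = SS_CODECS / DF_CODECS
    ms_error = SS_ERROR / DF_ERROR
    f_stat = ms_codecs / ms_error
    lsd = T_CRIT * math.sqrt(2 * ms_error / N_BLOCKS)
    return ms_codecs, ms_error, f_stat, lsd


def differs(mean_a, mean_b):
    """True if two setting means differ by more than the LSD."""
    return abs(mean_a - mean_b) > anova_summary()[3]


if __name__ == "__main__":
    ms_c, ms_e, f_stat, lsd = anova_summary()
    print(f"MS codecs {ms_c:.2f}, MS error {ms_e:.2f}, "
          f"F {f_stat:.2f}, LSD {lsd:.3f}")
    print("std-40k vs short-80k differ:", differs(2.78, 2.64))
    print("std-56k vs std-40k differ:", differs(3.38, 2.78))
```

Two settings are reported as different when their means are more than one LSD apart. That is why std-40k (2.78) and short-80k (2.64) are not separated, while std-56k (3.38) and std-40k (2.78) are.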


Raw data:
std-28k	long-28k	std-40k	long-40k	std-56k	short-56k	std-80k	short-80k
%feature 7 20 ms 60 ms 20 ms 60 ms 20 ms 5 ms 20 ms 5 ms
%feature 10 28kbps 28kbps 40kbps 40kbps 56kbps 56kbps 80kbps 80kbps
%feature 11 --bitrate 28 --bitrate 28 --bitrate 40 --bitrate 40 --bitrate 56 --bitrate 56 --bitrate 80 --bitrate 80
%feature 12 --framesize 20 --framesize 60 --framesize 20 --framesize 60 --framesize 20 --framesize 5 --framesize 20 --framesize 5
%genre Kamedo2's 15 sample
1.800 1.600 2.600 2.400 3.300 1.700 3.900 1.900
2.200 2.000 2.500 2.600 3.800 1.800 4.300 3.100
1.600 1.500 2.900 2.200 3.300 1.700 3.700 2.400
2.000 1.700 2.800 2.500 3.500 1.800 3.800 2.400
2.100 2.300 2.700 2.800 3.700 2.900 3.900 3.200
1.800 1.600 2.300 2.200 3.400 1.500 3.900 1.900
2.600 2.700 3.700 3.500 3.900 2.300 4.100 3.200
1.900 1.700 2.800 3.100 3.500 1.500 3.700 1.800
1.800 1.700 2.400 2.000 2.600 1.600 3.300 2.200
1.600 1.600 2.500 2.400 3.400 1.500 3.900 1.900
1.900 1.600 2.300 2.000 2.800 3.600 3.700 3.500
2.250 2.250 2.600 2.700 3.550 1.750 3.800 2.000
2.100 1.950 3.100 3.100 3.750 2.000 4.050 2.250
1.750 1.850 2.700 2.750 3.650 1.550 4.150 2.400
2.200 1.900 2.600 2.700 3.300 2.100 3.700 2.300

%genre IgorC's 12 sample
2.300 2.200 2.600 2.500 3.400 2.400 3.900 3.300
3.300 3.200 3.900 3.700 4.200 3.500 4.300 4.000
1.800 1.900 2.900 2.300 3.500 1.500 4.000 1.600
1.600 1.700 2.300 2.500 2.900 1.500 3.600 1.800
1.900 1.800 2.500 2.300 3.600 2.100 4.100 3.100
1.500 1.600 1.800 1.700 2.900 1.900 3.900 2.700
2.000 1.900 2.600 2.200 1.500 2.800 3.700 2.500
2.700 2.600 3.000 2.900 3.300 2.800 3.500 3.100
2.600 2.700 3.300 3.100 3.600 2.900 3.900 3.500
2.400 2.200 2.900 3.000 3.300 2.100 3.900 2.300
3.500 2.700 4.300 3.600 3.800 3.100 5.000 3.400
1.800 1.700 2.400 2.200 3.800 2.700 4.100 3.600

%samples 41_30sec Perc.
%samples finalfantasy Strings
%samples ATrain Jazz
%samples BigYellow Pops
%samples FloorEssence Techno
%samples macabre Classic
%samples mybloodrusts Guitar
%samples Quizas Latin
%samples VelvetRealm Techno
%samples Amefuribana Pops
%samples Trust Gospel
%samples Waiting Rock
%samples Experiencia Latin
%samples Heart to Heart Pops
%samples Tom's Diner Acappella

%samples 01 castanets inst.
%samples 02 fatboy_30sec Techno
%samples 03 eig Techno
%samples 04 Bachpsichord inst.
%samples 05 Enola Techno
%samples 06 trumpet inst.
%samples 07 applaud Live
%samples 08 velvet perc.
%samples 09 Linchpin Rock
%samples 10 spill_the_blood guitar
%samples 11 female_speech Speech
%samples 12 French_Ad Speech

Bitrates:

The tested setting was hard-cbr, so every frame was exactly the same size.
According to the official opus-tools documentation, hard-cbr delivers lower overall quality but is useful where bitrate changes might leak data in encrypted channels or on synchronous transports.
As expected, the file-based bitrates barely varied across the diverse musical content.
However, with a longer framesize the Opus file occupied slightly less storage: about 0.1 to 0.3 kbps of file-based bitrate was saved.



%y_axis File-based bitrates of each track [bps]
%bitrate
std28k long28k std40k long40k std56k short56k std80k short80k
28866 28607 40874 40748 56885 58067 80901 82075
28852 28600 40857 40738 56863 58064 80872 82073
29000 28734 41010 40877 57023 58209 81043 82222
28925 28667 40938 40813 56955 58121 80981 82132
28985 28718 40995 40861 57007 58194 81026 82206
29027 28773 41035 40915 57046 58231 81063 82240
28979 28723 40990 40868 57005 58178 81028 82189
28869 28610 40873 40748 56879 58079 80886 82087
29309 29043 41321 41188 57337 58538 81360 82561
28917 28659 40927 40802 56941 58108 80962 82115
28876 28616 40884 40758 56896 58077 80914 82086
28994 28727 41009 40876 57029 58188 81059 82200
28982 28715 40990 40856 57000 58187 81016 82197
28871 28612 40877 40752 56885 58076 80897 82083
28987 28732 41000 40878 57018 58176 81044 82184
29752 29485 41799 41665 57861 58935 81954 82974
28865 28606 40871 40745 56878 58069 80889 82076
29115 28849 41131 40997 57151 58314 81182 82329
28916 28658 40926 40801 56938 58116 80957 82126
28877 28617 40884 40758 56894 58074 80909 82081
29333 29067 41344 41211 57358 58559 81380 82581
29478 29212 41490 41356 57505 58706 81528 82730
29236 28987 41257 41142 57286 58440 81330 82463
29073 28821 41080 40960 57088 58289 81101 82302
28851 28591 40854 40728 56859 58059 80865 82066
28995 28728 41002 40869 57013 58199 81029 82208
29225 28976 41243 41127 57267 58420 81302 82436
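
One possible explanation for these differences, not confirmed anywhere in this thread, is Ogg container overhead. Each packet costs at least one byte in the page segment table ("lacing"), plus one more for every full 255 bytes, so more packets per second means more overhead. The sketch below compares that prediction with the first row of the table above. It assumes each packet is exactly bitrate × framesize / 8 bytes, and that page headers and stream headers are about the same for both framesizes and cancel out. `lacing_bps` and `ROWS` are names made up for this example.

```python
# Sketch: Ogg segment-table ("lacing") overhead per framesize.
# Assumption: each hard-CBR packet is bitrate * framesize / 8 bytes;
# a packet of n bytes needs n // 255 + 1 lacing bytes.
# Page headers and stream headers are ignored, on the assumption
# that they are about equal for both framesizes.

def lacing_bps(bitrate_kbps, framesize_ms):
    """Lacing overhead in bits per second for one setting."""
    packet_bytes = bitrate_kbps * framesize_ms // 8
    lacing_bytes = packet_bytes // 255 + 1
    packets_per_second = 1000 / framesize_ms
    return 8 * lacing_bytes * packets_per_second


# (bitrate, default framesize, other framesize, measured default bps,
#  measured other bps); measured values are the first table row.
ROWS = [(28, 20, 60, 28866, 28607),
        (40, 20, 60, 40874, 40748),
        (56, 20, 5, 56885, 58067),
        (80, 20, 5, 80901, 82075)]

if __name__ == "__main__":
    for kbps, std_ms, alt_ms, std_bps, alt_bps in ROWS:
        predicted = lacing_bps(kbps, alt_ms) - lacing_bps(kbps, std_ms)
        measured = alt_bps - std_bps
        print(f"{kbps} kbps, {alt_ms} ms vs {std_ms} ms: "
              f"predicted {predicted:+.0f} bps, measured {measured:+d} bps")
```

Under these assumptions the predicted differences are about -267, -133, +1200 and +1200 bps, against measured differences of -259, -126, +1182 and +1174 bps in the first row. The lacing bytes alone would therefore account for most of the gap.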


Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #1
Thanks for all the effort you clearly put into it.

Out of sheer curiosity: how long does an endeavour such as this one take, from methodology implementation till the final analysis?

edit: how => how long
Listen to the music, not the media it's on.
Union and reconstruction

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #2
Out of sheer curiosity: how long does an endeavour such as this one take, from methodology implementation till the final analysis?

It took me 68 days. I wrote the config on July 24th, 2021. The first track was tested on July 26th, and the 27th and last track was tested on September 19th.

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #3
Blimey! Congrats where it's due.

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #4
It is quite likely that the default framesize (20ms) gives better overall quality than 60ms at 28 kbps and 40 kbps.
p = 0.009957 and p = 0.005292, respectively.
It may seem strange that allowing the encoder more delay gives worse quality, but that is what xiph.org states, and this test confirms it.


> a = c(1.8, 2.2, 1.6, 2, 2.1, 1.8, 2.6, 1.9, 1.8, 1.6, 1.9, 2.25, 2.1, 1.75, 2.2, 2.3, 3.3, 1.8, 1.6, 1.9, 1.5, 2, 2.7, 2.6, 2.4, 3.5, 1.8)
> b = c(1.6, 2, 1.5, 1.7, 2.3, 1.6, 2.7, 1.7, 1.7, 1.6, 1.6, 2.25, 1.95, 1.85, 1.9, 2.2, 3.2, 1.9, 1.7, 1.8, 1.6, 1.9, 2.6, 2.7, 2.2, 2.7, 1.7)
> t.test(a,b, paired=TRUE)

        Paired t-test

data:  a and b
t = 2.7806, df = 26, p-value = 0.009957
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.02752335 0.18358776
sample estimates:
mean of the differences
              0.1055556

> e = c(2.6, 2.5, 2.9, 2.8, 2.7, 2.3, 3.7, 2.8, 2.4, 2.5, 2.3, 2.6, 3.1, 2.7, 2.6, 2.6, 3.9, 2.9, 2.3, 2.5, 1.8, 2.6, 3, 3.3, 2.9, 4.3, 2.4)
> f = c(2.4, 2.6, 2.2, 2.5, 2.8, 2.2, 3.5, 3.1, 2, 2.4, 2, 2.7, 3.1, 2.75, 2.7, 2.5, 3.7, 2.3, 2.5, 2.3, 1.7, 2.2, 2.9, 3.1, 3, 3.6, 2.2)
> t.test(e,f, paired=TRUE)

        Paired t-test

data:  e and f
t = 3.0437, df = 26, p-value = 0.005292
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.04869816 0.25130184
sample estimates:
mean of the differences
                   0.15
> R.version
               _                          
platform       x86_64-w64-mingw32         
arch           x86_64                     
os             mingw32                    
system         x86_64, mingw32            
status                                    
major          4                          
minor          0.4                        
year           2021                       
month          02                         
day            15                         
svn rev        80002                      
language       R                          
version.string R version 4.0.4 (2021-02-15)
nickname       Lost Library Book

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #5
Thanks for this interesting test. What surprises me is that, on the applaud sample, Opus (20ms) at 56 kbps sounds worse than at the two lower bit-rates, at least to your ears. Do you have an explanation for this? Could this be a bug?

Chris
If I don't reply to your reply, it means I agree with you.

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #6
Do you have an explanation for this? Could this be a bug?

I have no idea why this particular outlier occurs.
I tested the encodes below; cbr.52k, cbr.56k, and normal.52k have a similar volume issue at 0:02.
It is most pronounced in cbr.56k.

opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 44 --framesize 20 "07 applaud.wav" "07 applaud.cbr.44k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 48 --framesize 20 "07 applaud.wav" "07 applaud.cbr.48k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 52 --framesize 20 "07 applaud.wav" "07 applaud.cbr.52k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 56 --framesize 20 "07 applaud.wav" "07 applaud.cbr.56k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 60 --framesize 20 "07 applaud.wav" "07 applaud.cbr.60k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 64 --framesize 20 "07 applaud.wav" "07 applaud.cbr.64k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 44 --framesize 20  "07 applaud.wav" "07 applaud.normal.44k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 48 --framesize 20  "07 applaud.wav" "07 applaud.normal.48k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 52 --framesize 20  "07 applaud.wav" "07 applaud.normal.52k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 56 --framesize 20  "07 applaud.wav" "07 applaud.normal.56k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 64 --framesize 20  "07 applaud.wav" "07 applaud.normal.64k.opus"

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #7
OK, I figured it out, and I'll explain it here so that Jean-Marc (jmvalin) can read it. For stereo, 56 kbit/s CBR uses a different encoder configuration than 56 kbit/s VBR: one where the encoder adaptively switches between the speech (Silk) and music (CELT) core. On the applaud sample, the first few seconds are interpreted as speech, and Silk apparently sounds quite a bit worse on applause-like signals than CELT. Using the --music option on the opusenc command line forces CELT throughout this sample. That avoids the problem you describe on this sample but, of course, a more robust speech/music discriminator would be a better solution.

Chris

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #8
On the applaud sample, the first few seconds are interpreted as speech, and Silk apparently sounds quite a bit worse on applause-like signals than CELT.

C.R.Helmrich, thank you for the clear analysis.
Indeed, the first two seconds or so sound Silk-ish (the applause is muffled and muted) and the rest sounds CELT-ish (vivid and bright); the abrupt increase in volume alters the nuance of the scene.
Forcing the use of CELT with --music indeed solves the problem.
Now all of the applause encodes below sound lively and close to the original.

opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 48 --music --framesize 20 "07 applaud.wav" "07 applaud.cbr.music.48k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 52 --music --framesize 20 "07 applaud.wav" "07 applaud.cbr.music.52k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 56 --music --framesize 20 "07 applaud.wav" "07 applaud.cbr.music.56k.opus"
opus-tools-0.2-opus-1.3\opusenc --hard-cbr --bitrate 60 --music --framesize 20 "07 applaud.wav" "07 applaud.cbr.music.60k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 48 --music --framesize 20 "07 applaud.wav" "07 applaud.music.48k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 52 --music --framesize 20 "07 applaud.wav" "07 applaud.music.52k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 56 --music --framesize 20 "07 applaud.wav" "07 applaud.music.56k.opus"
opus-tools-0.2-opus-1.3\opusenc --bitrate 60 --music --framesize 20 "07 applaud.wav" "07 applaud.music.60k.opus"

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #9
Thank You, Kamedo2.

It's useful to see that a frame size of 60 ms doesn't bring any advantage and is actually slightly inferior to 20 ms.

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #10
From what I remember of the Opus specification, frame sizes over 20ms are just bundles of multiple 20ms frames anyway.

 

Re: Personal blind sound quality comparison of Opus hard-CBR with framesize options

Reply #11
A slightly better bitrate visualization.
A shorter frame occupies slightly more space.
A longer frame saves a very small amount of space.


std. = --framesize 20
long = --framesize 60
short = --framesize 5