Topic: Best tweaks for encoding speech with Vorbis


The title says it all: I want to encode a lot of spoken messages (mainly just human speech, mono) with Vorbis. I want to go as low in bitrate as possible without suffering too much from lossy compression artifacts.

My question is: has anybody found a relatively optimal set of oggenc tweaks to get nice-sounding audio at a low bitrate without audible lossy artifacts?

In the past I have typically encoded at 48 kbps:

oggenc -o out.ogg --bitrate=48 --downmix src.wav
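Since the goal is to encode many messages, the same command can be applied to a batch of WAV files. A minimal POSIX sh sketch, assuming the inputs end in `.wav` and that each output should sit next to its input with an `.ogg` extension; `run` and `encode_all` are hypothetical helper names, and setting `DRY_RUN=1` only prints the commands:

```sh
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Encode each WAV file given as an argument with the same oggenc options
# as above (48 kbps nominal, downmixed to mono). Assumes *.wav inputs;
# the output name replaces the trailing ".wav" with ".ogg".
encode_all() {
    for f in "$@"; do
        run oggenc -o "${f%.wav}.ogg" --bitrate=48 --downmix "$f" || return 1
    done
}
```

Usage: `DRY_RUN=1 encode_all messages/*.wav` to preview, then `encode_all messages/*.wav` to encode.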

Something like that. It yields sound that is better than MP3 (in my opinion; I could be wrong, since there are now many more MP3 encoders), but the more I listen, the more I notice a strange "echo" here and there, especially on rich sounds such as the American pronunciation of "are". The "echo" is somewhat like the "robot" voice in movies. I can upload a sample Vorbis stream to demonstrate it (please let me know how to upload it; I am new to this forum).

I have been using oggenc version 1.0.2 as provided by Ubuntu 7.04. The original stream has a 44.1 kHz sampling rate. As a simple tweak, I compiled aoTuV beta 5.5 (b5.5_20080330) and used its shared library in place of the stock libvorbis, invoking oggenc through this Bourne shell script:

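#!/bin/sh
# Run oggenc against the aoTuV build of libvorbis instead of the system one,
# keeping any existing LD_LIBRARY_PATH entries after it.
LD_LIBRARY_PATH=/usr/local/aotuv-b5.5_20080330/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
export LD_LIBRARY_PATH
exec oggenc "$@"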
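One way to confirm that the wrapper really swapped in the aoTuV library is to look at the vendor string the encoder writes into each file. A minimal sketch, assuming `ogginfo` (from vorbis-tools) prints a `Vendor:` line, and that aoTuV builds put "aoTuV" in that string while stock libvorbis reports something like "Xiph.Org libVorbis I" plus a date; both string formats are assumptions, and the function names are hypothetical:

```sh
#!/bin/sh
# Print the vendor string of the first Vorbis stream in an Ogg file.
# Assumes ogginfo reports it on a line of the form "Vendor: <string>".
vorbis_vendor() {
    ogginfo "$1" | sed -n 's/^[[:space:]]*Vendor:[[:space:]]*//p' | head -n 1
}

# Succeed if the vendor string looks like an aoTuV build
# (assumes aoTuV includes the text "aoTuV" in its vendor string).
is_aotuv_vendor() {
    case "$1" in
        *aoTuV*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `is_aotuv_vendor "$(vorbis_vendor out.ogg)" && echo "encoded with aoTuV"`.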

Still, the artifact is there.

As another attempt, I reduced the sampling rate with "ssrc" (a sample-rate converter) and then ran oggenc on the resampled audio. Here's what I got:

Encoding speech: TEST 04

Subdir: /data1/wirawan/test/vorbis/speech04
Sample: pet_30.flac
The sample was cut from LS Peter radio message #30 (1 minute long).

                          Sample Nominal     Avg   Bitrate     File      Size
  Filename                  rate bitrate bitrate inflation     size inflation  Notes
                           (kHz)  (kbps)  (kbps)       (%)  (bytes)       (%)
  16khz/oggenc-32kbps.ogg     16      32   29.81     -6.85   226969    -41.41
  16khz/oggenc-48kbps.ogg     16      48   38.84    -19.09   294674    -23.93
  16khz/oggenc-64kbps.ogg     16      64   48.52    -24.18   367650     -5.10
  16khz/oggenc-80kbps.ogg     16      80   61.93    -22.59   468206     20.86
  22khz/oggenc-32kbps.ogg     22      32   39.03     21.96   296110    -23.56
  22khz/oggenc-48kbps.ogg     22      48   59.89     24.76   452540     16.82
  22khz/oggenc-64kbps.ogg     22      64   75.60     18.12   570709     47.32
  22khz/oggenc-80kbps.ogg     22      80   91.83     14.79   692450     78.74
  32khz/oggenc-32kbps.ogg     32      32   37.83     18.22   287232    -25.86  Very robotic
  32khz/oggenc-48kbps.ogg     32      48   55.48     15.59   419620      8.32  OK, but second man's voice is not great
  32khz/oggenc-64kbps.ogg     32      64   65.38      2.16   493593     27.41
  32khz/oggenc-80kbps.ogg     32      80   74.31     -7.12   560914     44.79
  44khz/oggenc-32kbps.ogg     44      32   37.65     17.64   285854    -26.21
  44khz/oggenc-48kbps.ogg     44      48   51.18      6.64   387396  Baseline
  44khz/oggenc-64kbps.ogg     44      64   63.96     -0.06   482875     24.65
  44khz/oggenc-80kbps.ogg     44      80   70.87    -11.41   534853     38.06

  Bitrate inflation is the percentage by which the average bitrate
  differs from the nominal (target) bitrate.

  File size inflation is measured against the "baseline" 44khz/48kbps encoding.
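Both inflation columns follow directly from these definitions: bitrate inflation is (avg - nominal) / nominal x 100, and file size inflation is (size - baseline) / baseline x 100, with the 44khz/48kbps file (387396 bytes) as the baseline. A small sketch that uses awk for the floating-point arithmetic; the function names are hypothetical. Recomputing from the rounded averages shown in the table can differ from the listed percentages in the last digit (for example, 38.84 kbps against 48 kbps gives -19.08 rather than -19.09).

```sh
#!/bin/sh
# Percent difference of a measured value relative to a reference value:
#   (measured - reference) / reference * 100, printed with two decimals.
pct_inflation() {
    awk -v m="$1" -v r="$2" 'BEGIN { printf "%.2f\n", (m - r) / r * 100 }'
}

# Bitrate inflation: average kbps vs. nominal (target) kbps.
#   usage: bitrate_inflation AVG_KBPS NOMINAL_KBPS
bitrate_inflation() { pct_inflation "$1" "$2"; }

# File size inflation against the 44khz/48kbps baseline (387396 bytes
# unless another baseline is given).
#   usage: size_inflation SIZE_BYTES [BASELINE_BYTES]
size_inflation() { pct_inflation "$1" "${2:-387396}"; }
```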

Interesting! At the lower sampling rates (22 and 32 kHz), the files are actually larger than the 44 kHz encodes at the same nominal bitrate (at 48, 64 and 80 kbps). That could be a topic of its own, but my main question remains: how do I optimize the compression-vs-quality trade-off?
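To repeat this test on other recordings, the resample-then-encode sweep behind the table can be scripted. A sketch under several assumptions: the FLAC sample has already been decoded to a 44.1 kHz WAV file (for example with `flac -d`), `ssrc` is invoked as `ssrc --rate <Hz> <in.wav> <out.wav>`, and the directory and file names simply mirror the table. `--downmix` is left out because the source here is already mono. `run`, `label_to_hz` and `sweep` are hypothetical helpers; `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Map a directory label from the table to a sampling rate in Hz.
label_to_hz() {
    case "$1" in
        16khz) echo 16000 ;;
        22khz) echo 22050 ;;
        32khz) echo 32000 ;;
        44khz) echo 44100 ;;
        *) return 1 ;;
    esac
}

# Resample a 44.1 kHz WAV to each rate with ssrc (assumed CLI), then encode
# every copy at each nominal bitrate used in the table.
sweep() {
    src=$1
    for label in 16khz 22khz 32khz 44khz; do
        hz=$(label_to_hz "$label") || return 1
        run mkdir -p "$label" || return 1
        if [ "$hz" = 44100 ]; then
            input=$src  # already at the source rate, no resampling needed
        else
            input=$label/src.wav
            run ssrc --rate "$hz" "$src" "$input" || return 1
        fi
        for kbps in 32 48 64 80; do
            run oggenc --bitrate="$kbps" -o "$label/oggenc-${kbps}kbps.ogg" "$input" || return 1
        done
    done
}
```

Usage: `DRY_RUN=1 sweep src.wav` lists the 3 ssrc and 16 oggenc commands; `sweep src.wav` runs them.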

For reference, this may be relevant: the original audio may not come directly from a raw source (i.e. recorded directly, or taken from a faithful CD-quality recording). In the case above, it actually comes from a high-quality mono MP3 stream (which I guess is an 80 kbps mono stream).

The Linux "file" utility reports the following for the original file (the filename differs, but it is the same kind of file):

/data1/wirawan/test/vorbis/speech04 $ file /d/temp/ls/luk/Luke_01.mp3
/d/temp/ls/luk/Luke_01.mp3: MPEG ADTS, layer III, v1, 160 kBits, 44.1 kHz, Monaural

Any help and pointers will be appreciated. Unfortunately I don't have time to study this matter in depth, so please get straight to the point and give pointers to deeper explanations (web pages, wiki) as side notes.

Wirawan

 


Reply #2
I did try Speex a little bit, but I did not find it very satisfactory. Probably I wasn't trying seriously. Another problem, as many other members have already pointed out, is that Speex is not widely available on systems other than computers. It is not yet supported on small hardware such as portable audio players. I want to create Ogg files that can be played on computers and portable audio players alike.


Reply #3
Quote
I did try Speex a little bit, but I did not find it very satisfactory. Probably I wasn't trying seriously.


Did you try ultra-wideband mode? Speex also has echo cancellation.
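To try the ultra-wideband mode suggested here, a minimal sketch: ultra-wideband Speex is designed for 32 kHz audio, so the input is first resampled with ssrc (assumed invocation `ssrc --rate <Hz> <in.wav> <out.wav>`) and then passed to speexenc with its `-u` (ultra-wideband) option, leaving other speexenc settings at their defaults. The `speex_uwb` helper and the intermediate file name are hypothetical; `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Resample to 32 kHz (ssrc CLI assumed), then encode with Speex in
# ultra-wideband mode. usage: speex_uwb INPUT.wav OUTPUT.spx
speex_uwb() {
    src=$1
    dst=$2
    tmp=${dst%.spx}-32k.wav  # hypothetical intermediate file name
    run ssrc --rate 32000 "$src" "$tmp" || return 1
    run speexenc -u "$tmp" "$dst"
}
```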

Quote
Another problem, as many other members have already pointed out, is that Speex is not widely available on systems other than computers.


It is supported by the open-source Rockbox firmware, which runs on many DAPs. Take a look at the website:

http://www.rockbox.org/twiki/bin/view/Main/WhyRockbox


Reply #4
If you're still open to using MP3 for your application, try LAME. I find that the following parameters produce amazingly small files that are transparent to me:
lame -V8 -m m --resample 24 input.wav output.mp3

If you have the time, try it and let us know what you think.
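The same LAME settings, wrapped in a small function with each option commented and the output name derived from the input when none is given. A sketch; `lame_speech` is a hypothetical helper, and `DRY_RUN=1` prints the command instead of running it.

```sh
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Encode one file with the LAME settings suggested above:
#   -V8            VBR at quality level 8 (0 = highest quality, 9 = lowest)
#   -m m           mono output
#   --resample 24  resample the output to 24 kHz
# usage: lame_speech INPUT [OUTPUT]   (default OUTPUT: INPUT with .mp3)
lame_speech() {
    run lame -V8 -m m --resample 24 "$1" "${2:-${1%.*}.mp3}"
}
```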


Reply #5
FWIW, I have some Vorbis files (I don't recall the options used) that show as mono, 44.1 kHz sampling, 30 kbps.

They play OK in the dBpoweramp player and on my Rockbox Sansa, but won't play in foobar2000 or Winamp.

If I recall correctly, when I first started experimenting with mono, dBpoweramp played it back at double speed (as if it split the available mono samples between the L and R channels), but Spoon fixed it promptly when I reported the problem.
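The double-speed symptom follows from simple arithmetic: playback time is the total number of samples divided by (sampling rate x channels), so if a player treats a mono stream's samples as interleaved left/right pairs, the same samples last half as long. A small awk-based sketch; the function name and the 60-second example are illustrative.

```sh
#!/bin/sh
# Playback duration in seconds = total samples / (sample rate * channels),
# printed with two decimals.
playback_seconds() {
    awk -v n="$1" -v rate="$2" -v ch="$3" 'BEGIN { printf "%.2f\n", n / (rate * ch) }'
}
```

For one minute of 44.1 kHz mono audio (2646000 samples), `playback_seconds 2646000 44100 1` prints 60.00, while reading the same samples as stereo, `playback_seconds 2646000 44100 2`, prints 30.00.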