AAC for speech - as low as reasonable?

Hi,

I'm a teacher without a lot of audio knowledge. I made a teaching tool for my pupils where they need to listen to speech files on whatever devices they have lying around (so probably lots of eons-old Android devices, PCs, also modern stuff like iPads). My research says I need to offer two codecs: AAC and OGG.

Can you help me optimize the audio to keep the file sizes down as much as reasonable? It's just my voice, and it will be listened to via the phone's speaker or via headphones. Obviously I'll go with mono / 22 kHz. But what bitrate would you suggest? And which encoder? So far I planned on using Audacity for recording and exporting, but I'm happy to install free stuff. I also have Cubase 6.5 somewhere in my drawer, as well as Final Cut Pro 10.2 and a complete Adobe CS5, if any of those deliver better quality.

Thank you very much for your advice!
Harry

Re: AAC for speech - as low as reasonable?

Reply #1
I use qaac V36 mono at 44.1 kHz. It should be around 36 kbps, but in reality it is usually around 40 kbps.

22 kHz will work very well with these settings as well, but I don't know how many bits you will shave off.

Re: AAC for speech - as low as reasonable?

Reply #2
Thank you for your reply. I read something about AAC-HE and AAC-LC. High Efficiency sounds like something I would want. But does anyone know if this requires more modern decoders? Remember, students will use any old Android phone/PC/iPad … This chart only mentions AAC as compatible for all browsers. Does this mean vanilla AAC, or would it include HE-AAC?

Edit: Wiki says I should be fine. "HE-AAC v2 decoders are provided in all versions of Android. iOS 4 includes full decoding of HE-AAC v2 parametric stereo streams."

Re: AAC for speech - as low as reasonable?

Reply #3
Ah yes, I assumed LC-AAC. I don't use HE-AAC, so I can't help you with the "best" settings for voice or with its compatibility.

What about ogg? Is it required?

Re: AAC for speech - as low as reasonable?

Reply #4
If you're targeting computers and smartphones, HE-AAC should work fine, because I think anything made or updated in the last 10 years should play it out of the box. The possible rare exception could be some GNU/Linux distro with no non-free bits, but I would expect the user to be able to install a 3rd party player/decoder in that case. If one of your pupils is Richard Stallman, you might just have to deal with that individually. :)
If you want to cover all the "dumb" devices like car stereos and old phones... you're stuck with MP3, I'm afraid.
Opus would give you better quality than HE-AAC, but it would probably make at least some users install new software.

So I'd go with HE-AAC. I don't know for sure which HE-AAC encoder is the best, I'll let someone else chime in on that. But I think Apple's (iTunes) encoder should be a safe choice, with something like 48-64 kbps VBR.
And I don't think HE-AAC v2 has any advantage over HE-AAC v1 for mono signals, but correct me if I'm wrong. When I try to encode a mono file with v2 in foobar2000 I get the exact same result as if I select v1.

Re: AAC for speech - as low as reasonable?

Reply #5
Thanks, I'll go with HE then. I'm on a Mac so I can use their command line tool, neat (unless anyone advises against / has a better suggestion). As per the "supported platforms / codecs" link above, I should offer AAC with OGG as fallback. Having read about OPUS yesterday I'm intrigued but I don't know if it does any good in the field. At least my web developer friend just talked about the other two. Nobody will install anything for their homework, that's for sure. But I believe there are ways for the browser to select which of my offered files it can play.

If one of your pupils is Richard Stallman...
Damn, I knew this wouldn't be easy ;)
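
A sketch of what that could look like with afconvert, the converter shipped with macOS. The flags (`-f m4af` for an .m4a file, `-d aach` for HE-AAC, `-b` for the bitrate in bits per second) are given from memory and should be checked against `afconvert -h`; the file names and the 48 kbps figure are placeholders. Python is used here only to assemble and print the command, not to run it:

```python
import shlex


def afconvert_he_aac(src: str, dst: str, bits_per_second: int) -> list[str]:
    """Return an afconvert argv for HE-AAC in an .m4a container (flags are assumptions)."""
    return ["afconvert", "-f", "m4af", "-d", "aach", "-b", str(bits_per_second), src, dst]


if __name__ == "__main__":
    # Placeholder names; 48000 bit/s is the low end of the range suggested in Reply #4.
    print(shlex.join(afconvert_he_aac("lesson01.wav", "lesson01.m4a", 48_000)))
```

Printing the command first makes it easy to try it by hand on one file before batch-converting everything.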

Re: AAC for speech - as low as reasonable?

Reply #6
Converting files to 16 kHz and encoding them at 16 or 24 kbps HE-AAC would be more than enough if we are talking about single-channel speech audio. This will sound near transparent.
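
For reference, a sampled signal can only represent frequencies up to half its sample rate (the Nyquist limit), so downsampling directly caps the audio bandwidth. A minimal Python sketch listing that cap for the sample rates mentioned in this thread; the function name is only illustrative, and "22 kHz" is assumed to mean the common 22.05 kHz rate:

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


if __name__ == "__main__":
    # Sample rates discussed in the thread ("22 kHz" assumed to be 22.05 kHz).
    for rate in (16_000, 22_050, 32_000, 44_100, 48_000):
        limit_khz = nyquist_limit_hz(rate) / 1000
        print(f"{rate / 1000:g} kHz sample rate -> content up to {limit_khz:g} kHz")
```

At a 16 kHz sample rate nothing above 8 kHz survives, which is worth keeping in mind when comparing this suggestion with the later posts that prefer 32 kHz or no downsampling at all.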

Re: AAC for speech - as low as reasonable?

Reply #7
Try Winamp fhgaacenc.exe encoder. It has a surprisingly nice quality at low bitrates. If you are going with HE, then don't downsample.

Re: AAC for speech - as low as reasonable?

Reply #8
As per the "supported platforms / codecs" link above, I should offer AAC with OGG as fallback.
If it's for browser playback, I think you can do AAC + Opus. Opus is significantly better than OGG Vorbis and it should work just as well as an alternative for the few browsers that might not support AAC. (Opus is in fact even better than HE-AAC, so you should list it first, if you want to save bandwidth.)
The only reason to use Vorbis instead of Opus would be for something like a libre-only Linux distro with an old version of Firefox that hasn't been updated in almost 10 years -- a very unlikely scenario. (According to this, Firefox has supported Opus since 2012.)
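
To illustrate the ordering point: a browser walks the source elements of an audio tag in order and plays the first one it supports, so listing Opus first and AAC second serves the smaller file wherever possible. A small Python sketch that generates such a tag; the file names are hypothetical, and the MIME types (`audio/ogg; codecs=opus` for Opus in Ogg, `audio/mp4` for AAC in .m4a) are assumed reasonable values rather than something stated in this thread:

```python
from html import escape


def audio_tag(sources: list[tuple[str, str]]) -> str:
    """Build an HTML <audio> element; sources are (url, mime_type) in priority order."""
    lines = ["<audio controls>"]
    for url, mime_type in sources:
        lines.append(f'  <source src="{escape(url)}" type="{escape(mime_type)}">')
    lines.append("  Your browser does not support the audio tag.")
    lines.append("</audio>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file names: Opus first (smaller), AAC as the fallback.
    print(audio_tag([
        ("lesson01.opus", "audio/ogg; codecs=opus"),
        ("lesson01.m4a", "audio/mp4"),
    ]))
```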

Re: AAC for speech - as low as reasonable?

Reply #9
The OP mentioned Opus briefly above. With Opus I think you can go as low as 13 kbps and still have passable audio quality, which drastically cuts down on file size, especially if you have many hours of audio. At 13 kbps you can tell the sound quality is not as good as the original source, but it's not significantly worse.

Basically, when it comes to speech with Opus I typically use 13 kbps, and I would avoid going any lower. You can play with it a bit, and if the result is not up to the standard you want, you can raise the bitrate (maybe 16, 20, 24, or 32 kbps). Since it's just speech, we don't need music-level transparency.
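
One way to "play with it" is to encode the same recording at each of those bitrates and compare by ear. A rough Python sketch that prints one opusenc command per step; it assumes opusenc (from opus-tools) is installed, the input name speech.wav is a placeholder, and the output naming scheme is made up for the example:

```python
import shlex

# Bitrates (kbit/s) suggested in the post above.
LADDER_KBPS = (13, 16, 20, 24, 32)


def opusenc_command(src: str, kbps: int) -> list[str]:
    """Return an opusenc argv that encodes src at the given target bitrate."""
    stem = src.rsplit(".", 1)[0]
    return ["opusenc", "--bitrate", str(kbps), src, f"{stem}_{kbps}k.opus"]


if __name__ == "__main__":
    # "speech.wav" is a placeholder input; commands are printed, not executed.
    for kbps in LADDER_KBPS:
        print(shlex.join(opusenc_command("speech.wav", kbps)))
```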

As for smartphones, one can play Opus with the mobile version of foobar2000, among others, and Opus files also play in the Firefox browser.

p.s. for measure, just looking at a bunch of speech recordings I encoded with Opus @ 13kbps... 5 days 16hrs 44min 30sec of speech audio is only 783MB in size.
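
As a sanity check on those figures, the average bitrate can be worked backwards from size and duration. A short Python sketch; the post does not say whether "MB" means 10^6 or 2^20 bytes, so both readings are tried, and the helper names are only illustrative:

```python
def duration_seconds(days: int, hours: int, minutes: int, seconds: int) -> int:
    """Convert a days/hours/minutes/seconds duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def average_kbps(size_bytes: float, seconds: float) -> float:
    """Average bitrate in kbit/s (1 kbit = 1000 bits), container overhead included."""
    return size_bytes * 8 / seconds / 1000


if __name__ == "__main__":
    total = duration_seconds(5, 16, 44, 30)
    # Try both common meanings of "MB" since the post does not specify one.
    for label, megabyte in (("MB = 10**6 bytes", 10**6), ("MB = 2**20 bytes", 2**20)):
        print(f"{label}: {average_kbps(783 * megabyte, total):.2f} kbit/s")
```

Either reading lands close to the 13 kbps target, so the quoted size is plausible for that setting.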

Re: AAC for speech - as low as reasonable?

Reply #10
So far I planned on using Audacity for recording and exporting

I always had MUCH better results (quality) by using my mobile phone to record voice
and then doing a little bit of editing (cut, amplify...) using Audacity.
My house is very old. Back then proper grounding was not required
so I have lots of noise and interference.

I just did my own test.
The lowest I can go is 32 kHz. I noticed a big quality decrease at 24 kHz and lower.
I tested MP3 (LAME) and AAC-LC (qaac).
The lowest I would go is qaac V54 (56 kbit/s) and LAME -V6 (64 kbit/s)


Re: AAC for speech - as low as reasonable?

Reply #11
Thank you all very much for your time and input!

Re: AAC for speech - as low as reasonable?

Reply #12
Try Winamp fhgaacenc.exe encoder. It has a surprisingly nice quality at low bitrates. If you are going with HE, then don't downsample.
Is there a proper listening test of HE-AAC encoders somewhere?

Anyway, I did a little test with a sample that seemed somewhat problematic. I converted it with 4 AAC encoders in Foobar at around 32kbps VBR/ABR. I used a mono file for input and then again the same file converted to stereo (for compatibility and consistency). I'm attaching the files.

Some quick takeaways:
1. FDK crashes with a mono input.
2. Nero is the only one that, with a mono input, produces a mono AAC file. Is this standard compliant or can it cause problems?
3. Apple sounded the worst here (even though it's arguably the best encoder for AAC-LC).
4. Winamp sounded the best to me, but I didn't spend a lot of time matching bitrates.

Re: AAC for speech - as low as reasonable?

Reply #13
Indeed AAC files produced by Winamp with HE or HEv2 preset are reported as stereo. LC is mono. I don't know why that is. When given a mono file, the encoder appears to automatically switch off parametric stereo and the files are HE-AAC without v2.

I can't really say with confidence whether QAAC or WA is better. They have different artifacts. The quality goes down fast below 32 kbit/s. I wouldn't go any lower than that today. The aspect I was impressed with in Winamp was the parametric stereo without distracting jumps in the panning. It is reasonable to have stereo with speech content where one person is panned left and another one right or center, without sacrificing bandwidth. In AAC-LC, Winamp has slightly more bandwidth at 24 kbit/s.

Re: AAC for speech - as low as reasonable?

Reply #14
Just to follow up: if anyone is in a similar situation, check out auphonic.com. That's a service for podcasters that you can use for free for a few hours of material every month. After recording all my words I put them through their post-processing, and they adjusted all the levels etc. As I said, I have very little knowledge about this topic, and this certainly helped quite a lot.

They also have a blogpost from 2017 explaining what codecs / settings they suggest:

mp3: 80kbit (mono)
opus: 32-40kbit (mono)
HE-AAC: 40-48kbit (mono)

That's different to the suggestions here. Maybe that's because the blog post is so old. In any case I would consider this a solid starting / reference point to see how low I can get.
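
To put the two sets of suggestions side by side in terms of file size, bitrate converts to storage as kbit/s x 3600 s / 8. A small Python sketch under that assumption; it ignores container overhead, uses MB = 10^6 bytes, and the labels are only for this comparison:

```python
def megabytes_per_hour(kbps: float) -> float:
    """Approximate audio payload per hour in MB (10**6 bytes), ignoring container overhead."""
    return kbps * 1000 * 3600 / 8 / 10**6


if __name__ == "__main__":
    # (label, bitrate in kbit/s); ranges are shown by their lower and upper ends.
    candidates = [
        ("MP3, Auphonic", 80),
        ("Opus, Auphonic low", 32),
        ("Opus, Auphonic high", 40),
        ("HE-AAC, Auphonic low", 40),
        ("HE-AAC, Auphonic high", 48),
        ("Opus, Reply #9", 13),
    ]
    for label, kbps in candidates:
        print(f"{label:<22} {kbps:>3} kbit/s  {megabytes_per_hour(kbps):5.1f} MB/hour")
```

Each kbit/s costs 0.45 MB per hour, so the gap between the blog post's figures and the lowest suggestions in this thread is a factor of two to three in storage.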

Re: AAC for speech - as low as reasonable?

Reply #15
In my opinion you are trying to build a house starting from the roof; audio compression is the least of the problems to be addressed. I use 36 kbps as my bitrate. With xHE-AAC and Opus you can consider going down to 24 kbps, while with HE-AAC I prefer to stop at 40 kbps.

Your choice depends on how you have to distribute the content to students, whether through a podcast, or embedded in web pages. In the latter case the only alternative to AAC is Opus.

For now I'll leave you just a quick example; later I'll give you files to compare for every need, and I'll explain how to produce them.

https://www.celona.it/test/

Re: AAC for speech - as low as reasonable?

Reply #16
Can you help me optimize the audio to keep the size of files down as much as reasonable? It's just my voice...

Hi Harry (PupilsPet), did you hear the audio that I have embedded in the web page?

Since using AAC is easier, I preferred to show you the case of Opus with a one-minute monophonic (wrong) recording that a professional made to test the microphones. Obviously I followed the path you already took, so the recording is treated with Auphonic Leveler before compression.

The source file can be found here:
https://www.celona.it/test/balestri.wav

or if you prefer I have created a version in FLAC too:
https://www.celona.it/test/balestri.flac

In the source of the web page you will have found the following lines:
<audio controls>
<source src="balestri.caf" type="audio/x-caf">
<source src="balestri.webm" type="audio/webm">
Your browser does not support the audio tag.
</audio>

The files with the extensions .caf and .webm have the same content, compressed in Opus at 24 kbps. Since, with the exception of MPEG-1 Layer 3 (known as .mp3) and MPEG-2 or MPEG-4 AAC-LC (also known as .m4a), no other compressed format works in every browser, I have used two different containers, which doubles the disk space you will need to host them on the server.

CAF, the first container, was created by Apple and (oversimplifying) only works on its devices. At the moment it is necessary because only macOS and iPadOS support WebM (the second container), and only in the latest update of those operating systems; in the future it can be removed. In particular, we need it because only 80% of Apple customers keep their operating systems up to date, so older models would otherwise remain uncovered. Also, iPod touches and iPhones do not currently play audio contained in WebM.

I preferred WebM to the Ogg container for Opus because old versions of Android and of browsers are fragmented in which extensions they recognize for the HTML audio tag: today the extension .opus is used instead of .ogg or .oga. Anyone who updates their browser or apps will always find support for WebM, which Google created by simplifying Matroska (.mkv or .mka).

Another reason why I propose Opus at 24 kbps is that you can keep the sampling frequency at 48 kHz, the same one used in video today. When the target bitrate is reduced, the opusenc encoder removes the sounds above 12 kHz, which makes the recording a little duller at high frequencies, but without major limitations because our voice is always below 16 kHz.

Finally, you can export to Opus directly from Auphonic, but I do not recommend it. Although they get paid well, their software is not updated frequently, and the libraries they use have since been replaced by newer versions that produce fewer artifacts in the sound.

If you are interested in learning more about the topic, you can find me here, albeit occasionally.

Below I leave you the links to all the versions I have created so that you can hear them and orient yourself towards what you prefer more quickly:
https://www.celona.it/test/balestri.caf
https://www.celona.it/test/balestri.m4a
https://www.celona.it/test/balestri.m4b
https://www.celona.it/test/balestri.mka
https://www.celona.it/test/balestri.mp3
https://www.celona.it/test/balestri.mp4
https://www.celona.it/test/balestri.ogg
https://www.celona.it/test/balestri.opus
https://www.celona.it/test/balestri.ts
https://www.celona.it/test/balestri.webm

https://www.celona.it/test/exhale0.m4a
https://www.celona.it/test/exhale1.m4a
https://www.celona.it/test/exhale2.m4a
https://www.celona.it/test/exhale3.m4a
https://www.celona.it/test/exhalea.m4a
https://www.celona.it/test/exhaleb.m4a
https://www.celona.it/test/exhalec.m4a
https://www.celona.it/test/exhaled.m4a

If you prefer to use recordings in English, French or German, I recommend the following EBU tests:
https://www.celona.it/test/ebu/49.wav
https://www.celona.it/test/ebu/50.wav

https://www.celona.it/test/ebu/51.wav
https://www.celona.it/test/ebu/52.wav

https://www.celona.it/test/ebu/53.wav
https://www.celona.it/test/ebu/54.wav

Christian Celona