Topic: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #50
Yes, exhale 1.1.5 (x64), the test file is mono.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #51
Thanks for doing this test! It got me curious about xHE-AAC and indeed, it works really well in many cases.

However, I can't make it perform better than Opus for mono voice/audiobooks at ~20-30kbps.
To me, Opus at 24k sounds better than xHE-AAC at ~28k, at least with the two clips I tried. The xHE artifacts are quite obvious and unpleasant.

I used exhale 1.1.5 from rarewares and settings 0 and b in foobar (I had to resample to 32kHz to make 0 work).
I'm not too familiar with exhale settings, so maybe I messed something up... anyway, I'm attaching all the samples, source flacs included.

Thanks for your interest :)
I admit that your xHE-AAC encodings don't sound great. Opus is indeed clearer and less distorted.
I attached xHE-AAC files encoded with Fraunhofer's encoder. I must be completely tired because I can't get matching loudness :o
Tell me if they sound better to your ears.

Ah, yes, now these sound much better than the exhale encodes. After a quick examination, I'd say about as good as Opus or even a bit better, at least for clip1.

The loudness matching is probably a challenge because my samples are mono and yours are stereo. (The sources are mono.)
Switching to mono playback (with foobar's DSP) made them equally loud for me.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #52
I can't make it perform better than Opus for mono voice/audiobooks at ~20-30kbps.
To me, Opus at 24k sounds better than xHE-AAC at ~28k, at least with the two clips I tried. The xHE artifacts are quite obvious and unpleasant.

I used exhale 1.1.5 ...
Please keep in mind that xHE-AAC is a coding standard, and exhale is a particular encoder for that standard. Also, exhale does not implement the speech coding functionality which the xHE-AAC standard provides, so it's very likely that, at low bit-rates, the Fraunhofer xHE-AAC encoder (which supports the speech coding part) sounds quite a bit better than exhale on voice recordings.

Chris
If I don't reply to your reply, it means I agree with you.

 

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #53
It's a mistake of method. No encoder encodes all bands correctly at the lowest bitrates, and if you test only the human voice, especially male voices, you will stop hearing any difference between 12kHz and 15kHz, simply because those bands contain only noise.

Opus simply eliminates all information between 12kHz and 24kHz to reserve the bits for where they are most useful, in the lower bands (and at lower bitrates it cuts even more). This makes it look best when the content is voice-only, which is the prevailing kind of audio content but not the only one. It makes no sense to compare Exhale with Opus in this case, because Exhale only partially implements xHE-AAC: all the patented parts are missing, and those are the ones meant for voice at low bitrates. It makes more sense to compare with the Fraunhofer IIS encoder, but the EZ CD implementation got worse, and I can take the credit for that: I advised against using some sample rates, and in less than 24 hours the developer removed all but two of them.

If you are encoding monophonic speech and you step from 32kHz to 48kHz sampling, the bitrate has to increase by 50%. With Opus this is not obvious, because it deceives the user by removing the upper bands. For this reason I have indicated 36kbps as necessary (24kbps + 50%).

https://hydrogenaud.io/index.php?topic=120997.msg998434#msg998434
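
The 36kbps figure follows from scaling the bitrate in proportion to the sampling rate. A minimal sketch of that arithmetic, assuming the rule of thumb stated in this post (bitrate proportional to sample rate); the function name is hypothetical, and real encoders do not necessarily scale linearly:

```python
def scaled_bitrate_kbps(base_kbps: float, base_rate_hz: int, new_rate_hz: int) -> float:
    """Scale a bitrate in proportion to the sampling rate (the post's rule of thumb)."""
    return base_kbps * new_rate_hz / base_rate_hz


# 24 kbps at 32 kHz, moved to 48 kHz sampling
print(scaled_bitrate_kbps(24, 32_000, 48_000))  # 36.0
```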

Again for the same reason, I have provided a Wave file (lossless from the start) to use as a test. It also contains noise: it is an Italian TV commercial and contains 2 words capable of highlighting the limits of the encoding, plus a fade-out that shows the limit of Opus. The ACELP part, like Opus, requires the human voice to be encoded at up to 16kHz; using a sampling frequency of 44.1kHz or, even worse, 48kHz for voices (Opus's cheating aside) only serves to wreak havoc on the efficiency of the encoder, especially with Exhale. In my opinion it is more interesting to observe how they behave just above 32kbps, where the quality changes completely for everyone, for example at 36kbps (use CBR, because otherwise you will get much higher average bitrates).

Obviously, when the content is musical, higher sampling rates are required. For a monophonic source without SBR, give Exhale 12kbps more than the others; in summary, it takes 48kbps to be sure of obtaining high quality, which is what it was created for. That is double the space, but today bandwidth and storage cost almost nothing; otherwise you will have to settle for what's left. I can't listen to music encoded with these encoders at this bitrate, nor to speech: for work I archive hours of political debates every day, and I have not yet decided to use any of these new encoders. Guruboolez has understood perfectly that prolonged listening is difficult; I would add that with in-ear headphones it is also dangerous at high volume, with Opus too.

Our brains don't find hissing annoying for no good reason. If you need a stereo track, you can find it here:

http://celona.altervista.org/pelizzoni/Sky_In_treatment-s-48k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-s-44k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-s-32k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-s-24k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-s-16k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-s-12k.wav

Or mono here:
http://celona.altervista.org/pelizzoni/Sky_In_treatment-m-48k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-m-44k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-m-32k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-m-24k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-m-16k.wav
http://celona.altervista.org/pelizzoni/Sky_In_treatment-m-12k.wav

You will be able to verify for yourself that with xHE-AAC, as the bitrate decreases, you have to reduce the high frequencies, as Opus does without your knowledge. Otherwise it will sound much worse.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #54
More proof that mp3 is a dead format.

Thanks,
guruboolez

MP3 is still a very popular format and is transparent for most at 192kbps VBR. Universal support is a hard thing to shake off; with 256GB memory cards, many will just encode at V1 ~ 320kbps without much care about the newer codecs.
Got locked out on a password i didn't remember. :/

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #55
Look, it is one thing to move from 24kbps to 48kbps and another to reach 192kbps. What you call MP3 is MPEG-1 Layer 3 and you can still use it only because all MPEG-2 NBC decoders (known as AAC) support the older standard.

The advantage of using standard formats is this and should be recognized for containers too.

I think it is useless to deny that MPEG-2 brought significant improvements, allowing voice to fit in AAC-LC at 64kbps (3 times less than 192kbps), where today we could consider using between 36 and 48kbps (4 to 5.3 times less than 192kbps).
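
The ratios quoted above can be checked with a few lines. This is only a sketch of the arithmetic; the 192kbps reference and the three target bitrates come from the post, while the function name is hypothetical:

```python
REFERENCE_KBPS = 192  # the MP3 bitrate used as the baseline in the post


def reduction_factor(target_kbps: float, reference_kbps: float = REFERENCE_KBPS) -> float:
    """How many times smaller target_kbps is than the reference bitrate."""
    return reference_kbps / target_kbps


for kbps in (64, 48, 36):
    print(f"{kbps} kbps: {reduction_factor(kbps):.1f} times less than {REFERENCE_KBPS} kbps")
```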

USAC will do the same, with 1/4 of the bitrate and without the old tools of HE-AAC v1 and v2 (SBR first) which now only serve to make unacceptable compromises with quality (like MP3 Pro). My interest is not aimed at reducing the bitrate, but at improving the quality at the same bitrate.

In fact, these bitrates are not far from what can already be achieved with HE-AAC, a format from 18 years ago (2003). The bitrate can keep descending until we reach synthesized voices, at which point I have eyes to read.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #56
and you can still use it only because all MPEG-2 NBC decoders (known as AAC) support the older standard.

That statement has an easy-to-test consequence: "all MP3 players from the last couple of decades, play AAC".
True or false?

You would have had some kind of point about MP2; there were several players that would not accept the ".mp2" suffix but would happily eat MP2 files when renamed to .mp3. But people want to be able to play back the files they already have, so not supporting MP3 would remove quite a bit of your market potential. Well, nowadays cell phones have relegated dedicated portable players to a niche product, so the question is kinda moot anyway.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #57
I played around with Fraunhofer's xHE-AAC encoder (EZ CD Converter), just with the two audiobook clips I used before.

To me, FhG does worse than 24k Opus at 12k and also at 16k. At 16k it maintains a good timbre, but has some ringing/resonant artifacts, very apparent for example in the second half of clip1.
At 20k, however, I'd give a slight edge to FhG, but it's also a matter of taste... FhG sounds a bit "sharper".
Opus at 12k is a mess, but we already know this. At 16k it's not as bad, though. Maybe you would even prefer it to FhG at 16k, if you find the ringing artifacts annoying.

Anyway, I don't want to make any big conclusions from this. It's just two samples at just a few bitrates. The more interesting fight for audiobooks is probably going to be at around 30-40k, where we're already reaching transparency, or at least no obvious artifacts.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #58
More proof that mp3 is a dead format.

Thanks,
guruboolez

MP3 is still a very popular format and is transparent for most at 192kbps VBR. Universal support is a hard thing to shake off; with 256GB memory cards, many will just encode at V1 ~ 320kbps without much care about the newer codecs.

I agree with your thoughts and settings. I'd use something around V1.
In a general sense, anything from V4 or more.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #59
At this bitrate it's obvious that size is favoured over quality.
MP3 can be used similarly, say at around 96k for stereo, or less for mono.
Try one of these LAME settings:
-V5 --lowpass 12.5 -b96
-b 96 --lowpass 12.5
-b 48 -mm --lowpass 12.5
-b32 -mm --lowpass 7

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #60
I played around with Fraunhofer's xHE-AAC encoder (EZ CD Converter), just with the two audiobook clips I used before ...

Anyway, I don't want to make any big conclusions from this. It's just two samples at just a few bitrates. The more interesting fight for audiobooks is probably going to be at around 30-40k, where we're already reaching transparency, or at least no obvious artifacts.

With the second clip, the previous version of Exhale shows more difficulties. We have many variables in play (different encoder versions, different contents, different bitrates), so it is not easy to reach conclusions. If the example I proposed shows the difficulty Opus has with the fade-in and fade-out of single notes, with purely spoken content the judgment is completely overturned in favor of Opus. I think we need to start distinguishing the new encoders by their ability to behave like telephone encoders for low-bitrate voice. We write Opus, but we would get similar results with Silk, so I also tried its predecessor AMR-WB, and it seems to behave better.

We can also add the two EBU test tracks proposed by Fraunhofer, and the substance would not change much even with Opus: up to 32kbps only the Silk encoding is used, which however is used exclusively for sounds up to 8kHz and cooperatively up to 16kHz, even at higher bitrates.

https://www2.iis.fraunhofer.de/AAC/xhe-aac-compare-tab.html to check the EBU test: tracks 49 and 50 compressed by the Fraunhofer xHE-AAC encoder.

We are comparing telephone encoders with an MPEG-1 encoder; MP3 just can't stand up to any comparison, due to its simplicity and lightness.

Uncompressed test file:
Clip 1 from Brand;
Clip 2 from Brand;
Female voice, Track 49 from EBU test (2008);
Male voice, Track 50 from EBU test (2008).

Compressed files at 24kbps:
AMR-WB - Clip 1 - Clip 2 - Track 49 - Track 50;
MPEG-1 (MP3) - Clip 1 - Clip 2 - Track 49 - Track 50;
Opus - Clip 1 - Clip 2 - Track 49 - Track 50;
HE-AAC - Clip 1 - Clip 2 - Track 49 - Track 50;
xHE-AAC - Clip 1 - Clip 2 - Track 49 - Track 50;
Exhale with SBR - Clip 1 - Clip 2 - Track 49 - Track 50;
Exhale without SBR - Clip 1 - Clip 2 - Track 49 - Track 50.

I tried to create files in Opus without Silk encoding, to get more information about it, but I'm not sure if this is actually the case, because the encoder used has evolved over the years.

Opus without Silk in CAF container - Clip 1 - Clip 2 - Track 49 - Track 50;
Opus without Silk decoded in Flac - Clip 1 - Clip 2 - Track 49 - Track 50;
Opus in ISO/IEC base media file format container - Clip 1 - Clip 2 - Track 49 - Track 50.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #61
And finally I tried to slip AMR-WB, with which the new standard maintains compatibility, into the ISO container. Playing it on the iPhone is a bit complicated: it must be sent via AirDrop from another Apple product and opened with the Voice Memos app. It works.

AMR-WB in ISO/IEC base media file format container - Clip 1 - Clip 2 - Track 49 - Track 50.

Obviously ACELP is only effective with the human voice, so it is a full-fledged telephone encoder, while Opus is a generalist encoder that uses two encoders together to get better results than either achieves alone. Fraunhofer xHE-AAC does the same, while all other encoders belong to the previous generation or implement only part of the standard, such as Exhale.

However, when the content is musical or hybrid, the judgment is reversed against Opus, and even Exhale manages to obtain better results, though only at higher bitrates (see https://hydrogenaud.io/index.php?topic=120997.msg998446#msg998446 for the previous test file). Opus cannot be transparent below 32kbps because it does not use high quality encoding under these conditions. But the other encoders do not achieve sufficient results at such low bitrates either.

I won't go into the compatibility issue because it is complicated, but in summary AAC has no equal among the most recent formats.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #62
At this bitrate it's obvious that size is favoured over quality.
Yes, it's obvious :) 12, 24 and 32 kbps are extreme bitrates, especially for music encoding.

Quote
mp3 can be used similarly say around 96 k for stereo or less for mono.
Try -V5 --lowpass 12.5 -b96 , -b 96 --lowpass 12.5 , -b 48 -mm --lowpass 12.5 ,
-b32 -mm --lowpass 7
Yes, quality could be similar but efficiency is way below. The point of this test was to see how efficient new formats are at bitrates that were formerly considered unusable for music encoding :)

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #63
Opus at 12k is a mess, but we already know this. At 16k it's not as bad, though.

Yeah, that's why I tend to use 13kbps as an absolute bare minimum with Opus v1.3 (or higher) for speech: I feel at 13kbps the sound quality is 'just' good enough, and it's a solid option for those who want to keep file size as low as possible while maintaining a sound quality that's not TOO low. but if someone wants to play it a bit safer on speech sound quality then they will obviously have to increase bit rates a fair amount.

but at the same time... I don't know about everyone else but with speech I am not nearly as concerned with keeping it near transparent like I would be with music. hence, I can say it's 'good enough' at a lower point than I would with music as with music I tend to go a little higher instead of trying to run it on THE edge like I did with speech at 13kbps. but speaking of this, I might say the following with Opus in regards to speech and music...

-Opus (speech) = 13kbps
-Opus (music) = 40kbps or 48kbps

those are in regards to what I would probably consider a bare minimum with each (although I am playing it just a touch safer (i.e. higher kbps) with music though). I know opinions will vary on this stuff, but I am trying to roughly take a bare minimum approach to use with both but I played it a bit safer on the music side of things, since like I was saying, I tend to be a bit pickier on the sound quality of music than I would be on speech. so I don't quite think my 40kbps/48kbps for music is on THE edge like I was with the 13kbps speech suggestion, but it's probably close enough given I don't want to run sound quality where things start to become more obviously worse.

p.s. but like my signature currently shows, with Opus I tend to avoid any less than 64kbps for music as I feel 64kbps is a pretty strong balance of those trying to keep file size at near a bare minimum while still maintaining a sound quality in the ball park of MP3 @ V5 (130kbps average) which does well enough in a public listening test.

In a general sense, anything from V4 or more.

I tend to see V4 (LAME MP3) as more of an odd-ball/kind-of-useless setting, because I think one can see MP3 in general more along the lines of use 'V5' or 'V3 or better'. V4 is only 10kbps lower than V3 and, according to the hydrogenaudio wiki page, V3 is the start point of the highest quality settings. so I figure if someone is going to use a lower bit rate than V3 on MP3, they are probably best off sticking with V5 (130kbps average) and forgetting about it, as V5 does pretty well in public listening tests and is efficient with bitrate. so with V5, besides doing well in public listening tests, you also get a decent decrease in bit rate, unlike V4, which is only 10kbps shy of the higher quality settings. so V4 seems like a pointless setting to use if you ask me, given how quality and bitrate scale from 130kbps (V5) to 165kbps (V4) to 175kbps (V3) to 190kbps (V2) to 225kbps (V1) to 245kbps (V0).
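
To make the step sizes between presets easier to see, here is a small sketch that computes the gap between adjacent settings. The figures are the approximate averages quoted in this post, not measurements, and the last entry (245kbps) is assumed to be V0; the variable and function names are hypothetical:

```python
# Approximate average bitrates (kbps) as quoted in the post; 245 is assumed to be V0.
LAME_VBR_KBPS = {"V5": 130, "V4": 165, "V3": 175, "V2": 190, "V1": 225, "V0": 245}


def preset_gaps(table: dict[str, int]) -> dict[str, int]:
    """Return the bitrate increase from each preset to the next higher-quality one."""
    names = list(table)
    return {
        f"{low}->{high}": table[high] - table[low]
        for low, high in zip(names, names[1:])
    }


for step, gap in preset_gaps(LAME_VBR_KBPS).items():
    print(f"{step}: +{gap} kbps")
```

With these numbers the V4 to V3 step is the smallest one in the table, which is the basis of the argument above.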

so I think, at least in my mind, when it comes to MP3 it's pretty much V5 (130kbps) or V2 (190kbps) and forget about it as these two options basically should cover a high percentage of people in my opinion. like those who prefer more efficient bit rate (i.e. V5) and those who want higher quality sound but want some level of efficiency (i.e. V2) as beyond V2 efficiency pretty much goes out the window and sound quality gains got to be minimal/negligible in real world use.

but I can't directly fault you for saying "V4 or more" since basically anything from "V5 or more" is good enough in a very basic sense (hell, some might be able to get away with settings lower than V5). although I would probably say for those who prefer the higher bit rates or so, given only the 10kbps difference between V4 to V3 and given what the Hydrogenaudio wiki page says, seems like those types would think more along the lines of "V3 or more". hell, I suppose one could argue that since the difference between V3 to V2 is only 15kbps more and could give a bit of a safety buffer one could use that etc.

but with all of that said... I know storage space is cheap and all nowadays, so what I said above probably ain't going to matter to most people anymore since one could argue it won't really matter much whether someone uses V5 or all the way to the MAX of 320kbps CBR. but it's more of the thought of it for efficiency sake (for us OCD types around here) ;) ; one last little thing... I guess even with the storage space to burn argument taken into account, a more efficient file size would still be a bit wiser like in a situation one were to upload/backup a bunch of their stuff online to where storage space to burn would be less likely, or someone were to store a good amount of music on their smart phone since many are probably in the 8GB or 16GB of internal memory range etc.
For music I suggest (using Foobar2000)... MP3 (LAME) @ V5 (130kbps). NOTE: using on AGPTEK-U3 as of Mar 18th 2021. I use 'fatsort' (on Linux) so MP3's are listed in proper order on AGPTEK-U3.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #64
Excellent thread. To me, a classic until Opus gets better at 24 kbps and under, which is the only bitrate range I will accept for Internet audio streams that use my phone battery and limited GB plan.

I also listen to too many hours of audio books annually to accept files above 24 kbps. (It amazes me that people don't grasp the potential of this bitrate.)

I am wondering if Opus plans to improve its sub-24 kbps performance. And if Exhale will fix its low-bitrate voice encoding for xHE-AAC.

I am currently using ab-cable audio driver and ffmpeg. I edit the artist, album and record duration variables, then click the bat file to record voice to listen to while working. I am recording voice Opus files, trying to target 21 kbps. I might start recording to 64 kbps Opus and then use EZ CD Audio Converter to convert the 20 hour files to 18 kbps xHE-AAC.

I have noticed a grainy noise on peaks with Opus at 21 kbps, but wasn't sure if I was hallucinating, had a bad source, or it was a speaker issue. I normally use a 20 hour mono phone Bluetooth to listen to the voice files, and probably wouldn't be able to hear the grain. But if I record something really good, I will want to share it with my brother, who is in the broadcasting industry and is easily turned off by such things on his thousand dollar speakers.

As excited as I am by the huge potential of xHE-AAC and Opus at sub-24 kbps for streams and voice, I will probably stick to 192 kbps LAME MP3 for long term music that I will be listening to years from now, because there are cars that support MP3 but not a whole lot more. I have never not enjoyed a good song on a good, warm 192 kbps VBR LAME MP3 file, although I don't find it transparent in the least.

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #65
Even at these low bitrates, we are still talking several hundred of megs per audio book.
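
A rough size estimate backs this up. The sketch below assumes a constant bitrate, ignores container overhead, and uses a hypothetical 20 hour book; the function name is illustrative only:

```python
def audio_size_mb(bitrate_kbps: float, hours: float) -> float:
    """Approximate stream size in megabytes (10**6 bytes), ignoring container overhead."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * hours * 3600 / 1_000_000


# The three bitrates from the thread title, for an assumed 20 hour audio book
for kbps in (12, 24, 32):
    print(f"{kbps} kbps, 20 h: {audio_size_mb(kbps, 20):.0f} MB")
```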

Re: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music)

Reply #66
A note for the Original Post: I totally agree that 13 kbps or 14 kbps should have been tested.

I just never was able to get any book listenable at 12 kbps using LAME 3.97 and some extraordinary switches and tricks. However, I was able to get very listenable VBR files in a range of 14 kbps up to 23 kbps, letting LAME choose, depending on the speed of the speaker and voice tone. ... Thus, I cannot imagine any codec going below 14 kbps, because the extra 2 kbps made all the difference in the world to the quality compared with a 12 kbps audio book file.

It is because of my LAME audio book days that I would be skittish about going below 17 kbps, which was the average bitrate across all the books I have encoded and listened to since 2009.

I have just switched to Opus, but it sounds like xHE-AAC is a wiser choice for sound quality in the sub-24 kbps range, future compatibility, and battery, since while Opus uses less CPU, the phone will use less battery transcoding the Opus to xHE-AAC to send to the Bluetooth.

Opus is just less of a pain to record directly to than FhG xHE-AAC. Hypothetically, I could buy the ffmpeg plugin from MainConcept.com for like $79. But as a hobby, $29 is probably my spending limit on such a luxury.

The other thing to mention is that ffmpeg allows a lowpass switch for Opus, and I really could never hear much to write home about in the human voice above something like 10.3 kHz, if I recall. I also had lots of spectral tools back then, since I did LP restoration semi-professionally. Opus wisely lowpasses at 12k for voice at the low bitrates, using auto detection (which would work for non-acted-out stuff or mixed content). But perhaps the Opus 12k lowpass isn't aggressive enough, and we should try adding the ffmpeg Opus switch at a 10.3 kHz lowpass, in order to make room to eliminate the grain, etc.? (With LAME, I chose a 7.8 kHz lowpass, which worked fine for a single earbud and an audio book. A 10.3 kHz lowpass is what I used for a 44 kbps targeted mono MP3, as I recall from programming my scripts.)
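
For anyone who wants to try the 10.3 kHz idea, here is a minimal sketch that builds such an ffmpeg command line. It is an assumption about how one might do it, not the poster's actual script: it uses ffmpeg's generic `lowpass` audio filter ahead of the libopus encoder, the file names are hypothetical, and whether this actually reduces the grain is untested. The snippet only builds and prints the command; it does not run ffmpeg.

```python
import shlex


def opus_lowpass_cmd(src: str, dst: str, cutoff_hz: int = 10_300,
                     bitrate_kbps: int = 21) -> list[str]:
    """Build an ffmpeg argv list: low-pass filter, downmix to mono, encode with libopus."""
    return [
        "ffmpeg", "-i", src,
        "-af", f"lowpass=f={cutoff_hz}",  # generic audio low-pass filter (assumed approach)
        "-ac", "1",                       # mono output
        "-c:a", "libopus",
        "-b:a", f"{bitrate_kbps}k",
        dst,
    ]


# Hypothetical file names, for illustration only
cmd = opus_lowpass_cmd("book.flac", "book.opus")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg with libopus is installed. Note that the filter's default roll-off is fairly gentle, so content above the cutoff is attenuated rather than removed outright.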