exhale - Open Source USAC encoder

EDIT: PLEASE READ THE FIRST 10 POSTS OF THIS THREAD FIRST BEFORE POSTING A QUESTION OR BUG REPORT!

https://gitlab.com/ecodis/exhale

https://gitlab.com/ecodis/exhale/-/wikis/faq

Quote
exhale, which is an acronym for "Ecodis eXtended High-efficiency And
Low-complexity Encoder", is a lightweight library and application to
encode uncompressed WAVE-format audio files into MPEG-4-format files
complying with the ISO/IEC 23003-3 (MPEG-D) Unified Speech and Audio
Coding (USAC, also known as Extended High-Efficiency AAC) standard.
exhale currently makes use of all frequency-domain (FD) coding tools
in the scalefactor based MDCT processing path, except for predictive
joint stereo, which is still being integrated. Its objective is high
quality mono, stereo, and multichannel coding at medium and high bit
rates, so the lower-rate USAC coding tools (ACELP, TCX, Enhanced SBR
and MPEG Surround with Unified Stereo coding) won't be integrated.

Looks promising. 

Re: exhale - Open Source USAC encoder

Reply #1
Nice to finally see an open-sourced USAC encoder.

Two trivial patches were needed to build exhale on Linux (patches attached).
I could play an encoded USAC file on a Samsung Galaxy Tab that runs Android 9 Pie.
Since ffmpeg's native AAC decoder doesn't support USAC yet, I guess almost nothing can play it other than players based on the FDK v2 decoder.
You can still build a libfdk-aac-enabled ffmpeg on your own, though.

It seems that by defining RESTRICT_TO_AAC=1 you may build exhale as a plain AAC encoder, though I haven't tried this.

As is written in the README, exhale doesn't target low bitrate ranges. VBR mode 1 (the lowest) results in something like 64 kbps.

Re: exhale - Open Source USAC encoder

Reply #2
Nice to finally see an open-sourced USAC encoder.
Indeed

As is written in the README, exhale doesn't target low bitrate ranges. VBR mode 1 (the lowest) results in something like 64 kbps.
Instead of wasting time on a large number of low-bitrate tools (those rates aren't a target nowadays), it's a wise decision to concentrate on the high and middle rates, which actually count.

Let's hope that ffmpeg will support xHE decoding soonish.

Re: exhale - Open Source USAC encoder

Reply #3
Quote
...at medium and high bit
rates, so the lower-rate USAC coding tools (ACELP, TCX, Enhanced SBR
and MPEG Surround with Unified Stereo coding) won't be integrated.
What is considered a medium bit-rate? I'm more interested in the rates below 64 kbps, where Opus starts struggling.

Re: exhale - Open Source USAC encoder

Reply #4
I still think that low bit rates are, will be, and should be a target.
You are looking through a rather narrow lens, that of personal use, and even there it can be argued that people still care about really low bitrates. Through a wider lens, big companies, radio stations, streaming services, and so on care a lot about reducing their bandwidth and storage footprint, because at a larger scale even the slightest saving has a massive impact on running costs and profit.
So unless something extremely revolutionary happens in the computing world, a focus on lowering bitrates for any type of media content is always in the best interest of any healthy and self-respecting project.
So in the case of xHE-AAC, the whole idea of its existence is to target bitrates where other existing codecs are struggling, and the 64 kbps+ bitrate ranges are not among them. Not that there is no room for improvement, but it's definitely not a "struggling range" per se. On the contrary, 32-64 or maybe 16-64 sounds like a better target.

Re: exhale - Open Source USAC encoder

Reply #5
Nice to finally see an open-sourced USAC encoder.
Indeed

As is written in the README, exhale doesn't target low bitrate ranges. VBR mode 1 (the lowest) results in something like 64 kbps.
Instead of wasting time on a large number of low-bitrate tools (those rates aren't a target nowadays), it's a wise decision to concentrate on the high and middle rates, which actually count.

Let's hope that ffmpeg will support xHE decoding soonish.

I agree. Below 100k, lossy coding gets complex: decoding, stereo imaging, overly aggressive psychoacoustics... (audible or not, I don't want it)

I will go further: at home, with unmetered connections, lossless audio should be used (local and streaming). I would like to see more in this area, and even mid-to-high bitrate lossy (for streaming over metered connections). I like to see people with a quality-first approach. IMO very low bitrates like those of HE-AAC and Opus should not be touched except for very specific cases where a compression-first approach might be acceptable: voice, news, podcasts, etc.

Re: exhale - Open Source USAC encoder

Reply #6
Instead of wasting time on a large number of low-bitrate tools (those rates aren't a target nowadays), it's a wise decision to concentrate on the high and middle rates, which actually count.
Considering all the labors to implement low-bitrate tools, I agree that it's "wise" to avoid them.
It was a bit of a surprise, though, because USAC is considered superior to AAC mainly in the low-bitrate area.

Personally, I'm not so interested in the speech codec part.
However, MPEG Surround, that's a different story... It's not only for low bitrates, and it can drastically reduce the bitrate.
I'd like to try it as an alternative to Dolby Pro Logic II or something (although it doesn't necessarily have to be part of USAC).

Re: exhale - Open Source USAC encoder

Reply #7
But why bother with mid to high bitrate? Isn’t MP3 already perfect for that? Also the patents have expired.

Also, don’t get me started on the travesty of surround mixes.

Re: exhale - Open Source USAC encoder

Reply #8
But why bother with mid to high bitrate? Isn’t MP3 already perfect for that? Also the patents have expired.

Also, don’t get me started on the travesty of surround mixes.
Exactly.

Re: exhale - Open Source USAC encoder

Reply #9
Thanks, all, for your comments on exhale and my work! For the record, I pushed a fix to the current release today (commit 7135623a) which recovers the encoding speed of version 1.0.0 but keeps the slightly improved segmental SNR of 1.0.1 (the encoder is still pretty slow, though). That version also contains a fix for a minor issue (see https://gitlab.com/ecodis/exhale/issues/2), plus you don't have to apply nu774's patches anymore.

It seems people here associate very different value ranges with the terms "medium" and "high" bit-rate. What I meant in exhale's Read-Me is, at least for stereo, the following. I consider

- high bit-rates those at and above the sweetspot of (1990s) MPEG-2 AAC-LC, i.e., 128 kbit/s stereo
- medium bit-rates those at and above the sweetspot of (2000s) HE-AAC, i.e., 64 kbit/s stereo, up to 128 kbit/s stereo
- low bit-rates those at and above the sweetspot of the more recent (2010s) codecs, i.e., 32 kbit/s stereo or lower, up to 64 kbit/s stereo.
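
As a rough illustration of the three stereo ranges listed above, here is a minimal Python sketch. The function name and the handling of the exact boundary values (64 counted as "medium", 128 as "high") are assumptions made for illustration; exhale itself does not define such a function.

```python
def classify_stereo_bitrate(kbps: float) -> str:
    """Map a stereo bit-rate in kbit/s to the informal ranges described above."""
    if kbps >= 128:
        return "high"    # at and above the MPEG-2 AAC-LC sweet spot
    if kbps >= 64:
        return "medium"  # from the HE-AAC sweet spot up to 128 kbit/s
    return "low"         # 32 kbit/s or lower, up to 64 kbit/s


# Print the range for a few example stereo bit-rates.
for rate in (32, 48, 64, 96, 128, 200):
    print(rate, classify_stereo_bitrate(rate))
```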

Also, I initially intended exhale for file-based personal encoding - including for my own use - and as a proof-of-concept and personal challenge :) But of course, its primary purpose, besides file-based storage, is medium-rate streaming, e.g., Web radio. So yes, xHE-AAC targets the low and very low bit-rates but, as any codec should, it also performs pretty well at high bit-rates.

Quote
Considering all the labors to implement low-bitrate tools, ...

Indeed, this is the major reason why exhale starts only at 64 kbit/s stereo. xHE-AAC's advantage over its ancestors is much more obvious at rates lower than that. Adding the algorithms necessary for such low-rate coding would roughly triple the amount of source code and, possibly, work-hours (I like to write my source code from scratch and work on exhale in my free-time), and I won't be able to manage that. So I decided to leave the low rates to commercial encoders.

Quote
It seems that by defining RESTRICT_TO_AAC=1 you may build exhale as a plain AAC encoder.

No, the idea here was only to disable some specific coding tool extensions which didn't exist in (HE-)AAC. Generating AAC files with exhale is not possible because exhale only implements USAC's entropy coding, noise substitution, and TNS variant, not those of AAC.

Quote
But why bother with mid to high bitrate? Isn’t MP3 already perfect for that?

exhale's medium-rate modes (1-4 for my above definition of "medium") cover a bit-rate range on which I wouldn't consider MP3 "perfect". See also Kamedo2's personal test here: https://hydrogenaud.io/index.php?topic=117489.0
 

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source USAC encoder

Reply #10
Sorry. I was mistaken on your intentions. I should have actually acquired your source code and put it to the test, since I am capable of doing things like that.

I wrongly assumed the medium (128-192, wrong) to high (224+, also wrong) bitrates from current popular codecs, and did not even consider the much lower scale you were actually aiming for. So, by all means, go forth and advance the world in ways I hadn't considered.

(For the record, I already consider Opus's 32-48 "low" and unusable for much more than voice or mixed media VOIP, 64-80 "medium" and approaching usable (since I have a hard time discerning it in casual listening without encoding difficult samples and outright looking for errors), and 128+ "high" and probably transparent for most of my use cases, so I don't know where the hell I got off expecting your descriptions to be applied to common MP3.)

E: Okay, I've tested encoding at least, and see that it requires a preset of 3 or higher for 44100Hz material, and won't drop to 2 without a downsample to 32 kHz or lower. I'll test those two presets out anyway, just to see how they handle this one test file I'm going to throw at it, just for casual listening. I'll be using libfdk-aac to decode it.

Re: exhale - Open Source USAC encoder

Reply #11
Quick bump for a new issue: I tested this on a single track album, Secret of Mana +, which I have in lossless. I used the 3 preset, and the resulting USAC/xHE-AAC file decodes with a much lower amplitude than the original file.

Original FLAC:
Track Gain: -4.72 dB
Track Peak: 0.999969

FDK-AAC v2 decoded USAC:
Track Gain: +2.14 dB
Track Peak: 0.459747

Meanwhile, a different track, Level 3 from the cut down Intel demo of Rebel Moon Rising:

Original:
Track Gain: -5.32 dB
Track Peak: 0.93689

USAC decoded by FDK-AAC v2:
Track Gain: -5.31 dB
Track Peak: 0.891266

I can supply FLACs of both privately if you need them.
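
To put the peak values above into decibels, the following Python sketch converts each decoded/original peak ratio into a level change. It only restates the reported measurements; the function name is made up for illustration. The first file comes out a little under 7 dB quieter, in line with the 6.86 dB shift between the two track gain values, while the second changes by less than half a dB.

```python
import math


def level_change_db(original_peak: float, decoded_peak: float) -> float:
    """Peak level change of the decoded file relative to the original, in dB."""
    return 20.0 * math.log10(decoded_peak / original_peak)


# Peak values copied from the measurements reported above.
mana_db = level_change_db(0.999969, 0.459747)
rebel_db = level_change_db(0.93689, 0.891266)

print(f"Secret of Mana +:  {mana_db:.2f} dB")
print(f"Rebel Moon Rising: {rebel_db:.2f} dB")
```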

Re: exhale - Open Source USAC encoder

Reply #12
Quick bump for a new issue: I tested this on a single track album, Secret of Mana +, which I have in lossless. I used the 3 preset, and the resulting USAC/xHE-AAC file decodes with a much lower amplitude than the original file.
...
USAC decoded by FDK-AAC v2:
I don't know if this addresses the issue, but exhale writes its own track loudness metadata (MPEG-D Dynamic Range Control, DRC loudnessInfo), which are an integral part of xHE-AAC, to the encoded files. For the record, I use the MPEG reference software decoder for USAC to decode exhale's bit-streams, and there you need to use the command-line

"-if in.m4a -of out.wav -targetLoudnessLevel L",

where L is the "mobile loudness" value reported by the exhale application when encoding of in.m4a has finished. Does your FDK-AAC v2 compile have a similar functionality?

If that doesn't solve the issue, let's try to fix this privately. On a different topic, which I missed commenting on in my last reply:

... at home with unmetered connections lossless audio should be used (local and streaming) . I would like to see more in this area and even mid-high bitrate lossy (for streaming metered connections).
Given the different understandings of "mid" and "high", what does it mean to you? Do you consider exhale's CVBR mode 9 (roughly 200 kbit/s for stereo, half that for mono) sufficient or are you looking for even higher bit-rates in the lossy case?

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source USAC encoder

Reply #13
I agree. Below 100k, lossy coding gets complex: decoding, stereo imaging, overly aggressive psychoacoustics... (audible or not, I don't want it)
I don't have numbers for xHE decoding, but HE-AAC and Opus decoding were optimized a long time ago.

Today even old smartphones will play HE-AAC/Opus as efficiently (in terms of battery life) as uncompressed 44.1 kHz/16-bit .wav or any other lossy format like MP3, LC-AAC, Vorbis or Musepack.

Audio decoders consume much less power than the phone's audio DAC+amp and the Android OS in idle state.



Re: exhale - Open Source USAC encoder

Reply #14
I tested FDK-AAC v2's USAC decoding, which does indeed include a submodule, "libDRCdec", which appears to handle the volume leveling for it. It appears to default to a target level of -24 dBFS, which is close to our -23 LUFS R128 level.
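
A minimal sketch of what normalizing to a target level amounts to, assuming the decoder simply applies a static gain equal to the difference between the target level and the program loudness stored in the file. The function names and the example loudness of -14 are hypothetical, and the actual processing inside libDRCdec may well differ.

```python
def normalization_gain_db(target_level_db: float, program_loudness_db: float) -> float:
    """Static gain (in dB) needed to move the program loudness to the target level."""
    return target_level_db - program_loudness_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


# Hypothetical example: a loud track measured at -14, decoded with a -24 target.
gain_db = normalization_gain_db(-24.0, -14.0)
print(f"gain: {gain_db:+.1f} dB, amplitude factor: {db_to_linear(gain_db):.3f}")
```

Under this assumption, a loud master is attenuated noticeably on decode, which would explain a decoded peak well below that of the source file.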

As for performance numbers, on my Ryzen 7 2700, a single decode thread of preset 3 USAC appears to use ~1% of a single core, or on Windows, it would be less than 1% of the total CPU resources. This probably scales somewhat higher on mobile processors, but it appears Fraunhofer has this somewhat taken care of already.

E: Hmm, is this thing tuned for gapless encoding yet? I tested a couple of gapless transition files, and they failed marvelously.

Re: exhale - Open Source USAC encoder

Reply #15
E: Hmm, is this thing tuned for gapless encoding yet? I tested a couple of gapless transition files, and they failed marvelously.
exhale definitely should allow for gapless decoding. With the MPEG reference software decoder (see my previous post) I get perfectly gapless decodings of the exhale .m4a files. Maybe the FDK-AAC v2 decoder requires more info in the MPEG-4 file header? I'll check.

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source USAC encoder

Reply #16
Maybe the FDK-AAC v2 decoder requires more info in the MPEG-4 file header? I'll check.


Re: exhale - Open Source USAC encoder

Reply #17
foobar2000 only supports two gapless info methods. First added was the method devised by Nero’s encoder, Nero chapters. The second method added was iTunSMPB, since Apple’s encoder doesn’t write the Nero metadata. I’ll see if I can get Peter to add this edit list method. It would be helpful if you link to the line(s) of code where you write this information, if only for my personal perusal.

I’ll link to my evil decoder here later. I just recently disabled Dynamic Range Compression in my decoder, as its results can cause unpredictable effects when comparing input to output, especially huge volume differences. It needs a bit of work though, and it would be nice if foobar supported edit lists as gapless information first.

Re: exhale - Open Source USAC encoder

Reply #18
Document: https://developer.apple.com/library/archive/documentation/QuickTime/QTFF/QTFFAppenG/QTFFAppenG.html
My implementation (for decoding): https://github.com/nu774/qaac/blob/master/input/MP4Source.cpp

You'd better consult the ISO document if possible, though.

The following will output gapless information in the elst box:
qaac --gapless-mode=1
fdkaac --gapless-mode=1

The edit list is better than iTunSMPB/Nero chapters since it is per-track information, not to mention that it is the ISO-standard way.
Both iTunSMPB and Nero chapters are file-global metadata, so they are only suitable for single-track audio files.

However, the edit list has its own problems:
  • Difficult to support fully.
  • Uses two different timescales. media_time is in the media's timescale. segment_duration is in the movie's timescale. When the movie's timescale is not precise enough, we cannot achieve sample accuracy. To achieve sample accuracy, the movie's timescale must be equal to or greater than the media's timescale.
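
The timescale problem can be illustrated with a small Python sketch. It assumes a duration is rounded to the nearest unit when converted between timescales; the function names, the movie timescale of 1000 and the track length of 1,000,001 samples are hypothetical values chosen for the example.

```python
def rescale(value: int, from_timescale: int, to_timescale: int) -> int:
    """Convert a duration between timescales, rounding to the nearest unit."""
    return (value * to_timescale + from_timescale // 2) // from_timescale


def roundtrip(samples: int, media_timescale: int, movie_timescale: int) -> int:
    """Express a sample count in movie units and convert it back to samples."""
    units = rescale(samples, media_timescale, movie_timescale)
    return rescale(units, movie_timescale, media_timescale)


samples = 1_000_001  # hypothetical track length at 44.1 kHz
for movie_timescale in (1000, 44100):
    recovered = roundtrip(samples, 44100, movie_timescale)
    print(movie_timescale, recovered, recovered - samples)
```

With a movie timescale of 1000, the recovered length is off by several samples, whereas a movie timescale equal to the media's timescale reproduces the sample count exactly.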


As far as I know, typical video-oriented applications (including ffmpeg) are only concerned with the following two cases to handle A/V sync, and don't trim end padding:
  • media_time > 0: treat media_time as negative audio delay
  • media_time == -1: treat segment_duration as positive audio delay
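
The two cases above could be sketched in Python as follows. This is only an illustration of the described behaviour, not code taken from ffmpeg or any other application; the function name and the choice to return zero for every other media_time value are assumptions.

```python
def audio_delay_seconds(media_time: int, segment_duration: int,
                        media_timescale: int, movie_timescale: int) -> float:
    """Interpret a single edit list entry as an audio delay, as described above."""
    if media_time > 0:
        # Playback starts media_time units into the media: negative delay.
        return -media_time / media_timescale
    if media_time == -1:
        # Empty edit: the track starts segment_duration movie units late.
        return segment_duration / movie_timescale
    # Assumption: any other value is treated as "no delay".
    return 0.0


print(audio_delay_seconds(1024, 0, 44100, 1000))  # skip 1024 samples of priming
print(audio_delay_seconds(-1, 500, 44100, 1000))  # start half a second late
```

Note that media_time is divided by the media's timescale while segment_duration is divided by the movie's timescale, which is the dual-timescale issue mentioned earlier in this post.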

And almost nothing can correctly handle multiple edit lists. As the name implies, the edit list was originally used by QuickTime to support editing video files at arbitrary points (not limited to GOP boundaries).
Of course, the actual video stream can only be cut at GOP boundaries (otherwise delta frames cannot be properly decoded), so video frames need trimming after decoding to allow edits at arbitrary points.
The edit list was used for that purpose.

Re: exhale - Open Source USAC encoder

Reply #19
I see there are two fields that are relevant, or that may be consulted in combination to make sure the file is sane. Both the Edit List and the Time to Sample (STTS) box contain valid data, though consulting both isn't necessary for this.

I have verified that the files I wanted to test play back gaplessly on macOS, using my own macOS player, Cog. This requires a version of macOS which supports USAC, and it applies not just to Cog but to any player which decodes MP4 files with the system codecs.

I'll try to get MP4 streaming working eventually, but Apple kind of broke that for me by not offering a custom reader Audio Toolbox interface that also supports arbitrary streams, not just seekable and perfectly error free static files.

Re: exhale - Open Source USAC encoder

Reply #20
Thanks a lot, nu774, for the info and the help!

I’ll see if I can get Peter to add this edit list method. ... it would be nice if foobar supported edit lists as gapless information first.
Thanks, that would be awesome!
Quote from: kode54
It would be helpful if you link to the line(s) of code where you write this information, if only for my personal perusal. ... Both the Edit List and the Time to Sample (STTS) box contain valid data...
Search for the comment elst in app/basicMP4Writer.cpp. Two 4-byte values are written to the edit list box:
  • number of samples per channel, which is actualLength from exhaleApp.cpp
  • pre-gap size, which with exhale is frameLength * 25/16 from exhaleApp.cpp
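
For illustration, here is a minimal Python sketch of how a decoder might use these two values to obtain gapless output. The pre-gap formula is the one quoted above; the frame length of 1024 and the way the trimming is applied are assumptions made for the example, and the function names are hypothetical.

```python
def pre_gap_samples(frame_length: int) -> int:
    """Pre-gap size as described above: frameLength * 25/16."""
    return frame_length * 25 // 16


def trim_decoded(decoded: list, frame_length: int, actual_length: int) -> list:
    """Drop the pre-gap and keep exactly actual_length samples per channel."""
    start = pre_gap_samples(frame_length)
    return decoded[start:start + actual_length]


frame_length = 1024   # assumed frame length for this example
actual_length = 3000  # hypothetical number of input samples per channel
decoded = list(range(5 * frame_length))  # stand-in for one decoded channel

trimmed = trim_decoded(decoded, frame_length, actual_length)
print(pre_gap_samples(frame_length), len(trimmed), trimmed[0])
```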

The STTS stuff is a bit harder to find (forgot to add an "stts" comment), search for comment 2 entries used in basicMP4Writer.cpp.

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source USAC encoder

Reply #21
I already found them all, also by examining a file I encoded, locating the offsets, and searching for them in your code. It also helps that I was comparing against Apple's documentation of the format. I outlined the changes to him, but I'm not sure if he's noticed yet.

Re: exhale - Open Source USAC encoder

Reply #22
Bump: Curses, foiled again. No WAVEFORMATEXTENSIBLE support. Have fun supporting that!

Edit: Curses, foiled twice. FDK-AAC doesn't support decoding USAC channel configurations other than mono and stereo.

Re: exhale - Open Source USAC encoder

Reply #23
FDK-AAC doesn't support decoding USAC channel configurations other than mono and stereo.
Does it specifically disallow decoding of more than two channels, or does it allow decoding of only one channel-type element (SCE, CPE)? I'm asking since the exhale library also supports dual-mono encoding of a stereo signal using two SCEs (Edit: channel configuration 8, if I'm not mistaken).

And one more question: how does foobar2000 calculate the mean bit-rate across an MPEG-4 file? In exhale I calculate it as (file size in bytes - static header part in bytes) * 8 / (actual input audio length in seconds), which is also written to the MPEG-4 file header, but Windows Explorer e.g. reports very different values.

Chris
If I don't reply to your reply, it means I agree with you.
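
The mean bit-rate calculation described in the post above can be written out as a short Python sketch. The file size, header size and duration used here are made-up numbers, and the comparison with a naive whole-file calculation is only one possible reason for differing values, not a statement about what Windows Explorer actually does.

```python
def mean_bitrate_bps(file_size_bytes: int, header_bytes: int,
                     duration_seconds: float) -> float:
    """(file size - static header part) * 8 / actual input audio length."""
    return (file_size_bytes - header_bytes) * 8 / duration_seconds


# Hypothetical file: 3,000,000 bytes, 2,000-byte static header, 240 s of audio.
payload_rate = mean_bitrate_bps(3_000_000, 2_000, 240.0)
naive_rate = mean_bitrate_bps(3_000_000, 0, 240.0)  # header counted as audio

print(f"payload only: {payload_rate / 1000:.2f} kbit/s")
print(f"whole file:   {naive_rate / 1000:.2f} kbit/s")
```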