Topic: exhale - Open Source xHE-AAC encoder

Re: exhale - Open Source xHE-AAC encoder

Reply #325
mint 20
mpv 0.32.0 Copyright © 2000-2020 mpv/MPlayer/mplayer2 projects
 built on UNKNOWN
ffmpeg library versions:
   libavutil       56.31.100
   libavcodec      58.54.100
   libavformat     58.29.100
   libswscale      5.5.100
   libavfilter     7.57.100
   libswresample   3.5.100
ffmpeg version: 4.2.4-1ubuntu0.1
without "libfdk_aac (aac) - Fraunhofer FDK AAC"

Re: exhale - Open Source xHE-AAC encoder

Reply #326
@m14u It has to be enabled in ffmpeg/libav at build time.

Code:
% ffmpeg -decoders 2>/dev/null | rg fdk
A....D libfdk_aac           Fraunhofer FDK AAC (codec aac)

ffplay/ffmpeg work too of course. But the decoder must be set explicitly by passing -codec:a libfdk_aac (before -i in the case of ffmpeg to apply it to input).
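
The option ordering described above can be sketched as follows. This is only an illustration: `build_decode_cmd` and the file names are hypothetical, the snippet merely assembles an argument list, and it assumes an ffmpeg build that has libfdk_aac enabled.

```python
def build_decode_cmd(src, dst, decoder="libfdk_aac"):
    """Assemble an ffmpeg argument list that forces `decoder` for the input.

    Hypothetical helper: options placed before -i apply to the input file,
    so -codec:a has to precede -i to select the audio decoder.
    """
    return ["ffmpeg", "-codec:a", decoder, "-i", src, dst]


cmd = build_decode_cmd("in.m4a", "out.wav")
print(" ".join(cmd))
```

Placing `-codec:a` after `-i` would instead apply it to the output, i.e. select an encoder.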

mpv just works because it falls through from one decoder to another.


Re: exhale - Open Source xHE-AAC encoder

Reply #328
@m14u The point is that it's one --enable-libfdk-aac away from being supported. No native ffmpeg decoder is strictly needed, and the glue code to use libfdk as a decoder has already been written.

Ubuntu not enabling support in ffmpeg despite libfdk being already available in their repositories is their problem. And it's an easily fixable one.

Re: exhale - Open Source xHE-AAC encoder

Reply #329
Quote
Ubuntu not enabling support in ffmpeg despite libfdk being already available in their repositories is their problem. And it's an easily fixable one.
No, it's not.
ffmpeg requires --enable-nonfree for --enable-libfdk-aac, which means that an fdk-aac-enabled ffmpeg binary is not redistributable.

Re: exhale - Open Source xHE-AAC encoder

Reply #330
Quote
ffmpeg requires --enable-nonfree for --enable-libfdk-aac, which means that an fdk-aac-enabled ffmpeg binary is not redistributable.

ugh, fair enough.

Code:
EXTERNAL_LIBRARY_NONFREE_LIST="
decklink
libfdk_aac
openssl
libtls
"

I remember the list of libraries being larger (and Arch not caring). But things have changed/improved apparently. I may try to convince Arch to not care again.

As for Mint, deb-multimedia has it enabled. Or maybe some ppa can be used if deb-multimedia is not compatible with Ubuntu/Mint.


Re: exhale - Open Source xHE-AAC encoder

Reply #332
Quote
ffmpeg requires --enable-nonfree for --enable-libfdk-aac, which means that an fdk-aac-enabled ffmpeg binary is not redistributable.

Same goes for Arch Linux, which distributes fdk-aac by itself, but does not distribute non-free FFmpeg. You have to do something like build the ABS package modified to enable it, or build the FFmpeg-everything package from AUR, which enables bloody everything, including installing the >1.5GB CUDA package to enable CUDA support.

Re: exhale - Open Source xHE-AAC encoder

Reply #333
How difficult will it be to create a built-in decoder for ffmpeg?

Re: exhale - Open Source xHE-AAC encoder

Reply #334
exhale version 1.0.6 (identical to John's RC2 above) has been released today, see https://gitlab.com/ecodis/exhale/-/releases
Compared with RC1, this version fixed two more rarely occurring bugs. Thanks to guruboolez for reporting one of them and to m14u for testing.

Changes since exhale 1.0.5 from last month:
- bugfixes, improved quality on some transient signals, better decoder compatibility
- exhaleApp: support for Extensible WAVE format, write MP4 «prol» data (issue #10)
- exhaleApp: automatic downsampling of 48-kHz input to 32 kHz for CVBR mode 1
- exhaleLib: fine-tuning of psychoacoustic model for difficult transient input signals

@kode54, Peter: What do you think about this? https://gitlab.com/ecodis/exhale/-/issues/12

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source xHE-AAC encoder

Reply #335
Intel compiles of exhale-v.1.0.6-c7299609 now at Rarewares. :)

Re: exhale - Open Source xHE-AAC encoder

Reply #336
https://gitlab.com/ecodis/exhale/-/issues/10 was an interesting read.

So, if I understand correctly, a USAC decoder is only able to start decoding from an independent AU where usacIndependencyFlag is set, and the current default of the exhale library is one such AU every 45 frames (but the interval can be arbitrary), right?

Does that mean a USAC decoder has to seek back (preroll) an unpredictable number of frames on seek in order to find an independent AU where it can start decoding, when it does not have the preroll sample group added by the patch to help?

Re: exhale - Open Source xHE-AAC encoder

Reply #337
Quote
So, if I understand correctly, a USAC decoder is only able to start decoding from an independent AU where usacIndependencyFlag is set, and the current default of the exhale library is one such AU every 45 frames (but the interval can be arbitrary), right?
All correct. As a rule of thumb, there should be a tune-in point every second or so and, if possible, it should be synchronized with any video tune-in frames which may accompany the audio stream. With 50-Hz video, you usually have random-access points every 48 frames, and with an audio sampling rate of 48000 Hz this gives you a preferred period of (48 * 48000) / (50 * 1024) = 45 audio access units (AUs). For 60-Hz video (random-access point every 64 frames) it would be 50 AUs.
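
The arithmetic above can be checked with a short sketch. The function name is made up for illustration; the 1024-sample AU length and the 48000-Hz rate are the ones used in the formula in this post.

```python
def tune_in_period_aus(video_rap_frames, video_fps, sample_rate=48000, frame_len=1024):
    """Number of audio AUs that span one video random-access interval.

    The interval lasts video_rap_frames / video_fps seconds, and the audio
    stream carries sample_rate / frame_len AUs per second.
    """
    return (video_rap_frames * sample_rate) / (video_fps * frame_len)


print(tune_in_period_aus(48, 50))  # 50-Hz video, RAP every 48 frames -> 45.0
print(tune_in_period_aus(64, 60))  # 60-Hz video, RAP every 64 frames -> 50.0
```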

Quote
Does that mean a USAC decoder has to seek back (preroll) an unpredictable number of frames on seek in order to find an independent AU where it can start decoding, when it does not have the preroll sample group added by the patch to help?
If I understand the author of that issue correctly, no, but something similarly bad: the decoder would have to search through all AUs before or at the current seek point (start offsets of each AU are stored in the MPEG-4 file header, so that's still quite cheap) and find the closest one which is independently decodable (which may not be cheap!). The solution was to add another box to the MPEG-4 file header containing the period between successive tune-in AUs, allowing the decoder to simply pick out the independently decodable AU closest in time to the current seek point.
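
As a rough sketch of why a known period makes seeking cheap: once the distance between tune-in AUs is available, the AU to start from is a single integer operation. The function below is hypothetical, assumes 0-based AU indices with AUs 0, period, 2 * period, ... being independently decodable, and picks the tune-in AU at or before the seek point (a real decoder may choose differently).

```python
def nearest_tune_in_au(seek_au, period):
    """Index of the last independently decodable AU at or before seek_au.

    Assumes (for illustration only) 0-based indices and a constant period
    between independently decodable AUs, starting at AU 0.
    """
    if seek_au < 0 or period <= 0:
        raise ValueError("seek_au must be >= 0 and period must be > 0")
    return (seek_au // period) * period


print(nearest_tune_in_au(100, 45))  # start decoding at AU 90
```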

Chris
If I don't reply to your reply, it means I agree with you.


Re: exhale - Open Source xHE-AAC encoder

Reply #339
Quote
If I understand the author of that issue correctly, no, but something similarly bad: the decoder would have to search through all AUs before or at the current seek point (start offsets of each AU are stored in the MPEG-4 file header, so that's still quite cheap) and find the closest one which is independently decodable (which may not be cheap!). The solution was to add another box to the MPEG-4 file header containing the period between successive tune-in AUs, allowing the decoder to simply pick out the independently decodable AU closest in time to the current seek point.
Thanks for clearing this up.
To me it looks conceptually similar to video key frames, and I wonder whether the good old Sync Sample Box (stss) could simply be used to indicate them.

I'm also curious how the fb2k decoder plugin implements seeking.

Re: exhale - Open Source xHE-AAC encoder

Reply #340
foobar2000 packet decoders implement seek for their callers by reporting the maximum dependency in floating point seconds, and in frame counts. The frontend, be it MP4, ADTS, ADIF, Matroska/WebM, etc, will seek backwards by this amount, ask the decoder to start decoding, then discard the dependency time from the start of the samples. Not terribly flexible, and definitely doesn't support keyframes to my knowledge.
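
The seek strategy described above can be sketched roughly as follows. This is not the foobar2000 API; `plan_seek` is a hypothetical helper that works in frame counts only and shows the "back up by the maximum dependency, then discard" idea.

```python
def plan_seek(target_frame, max_dependency_frames):
    """Return (start_frame, frames_to_discard) for a seek to target_frame.

    Hypothetical sketch: start decoding max_dependency_frames earlier
    (clamped to the start of the stream) and throw away the decoded
    output of every frame before the target.
    """
    start = max(0, target_frame - max_dependency_frames)
    return start, target_frame - start


print(plan_seek(100, 45))  # (55, 45)
print(plan_seek(10, 45))   # (0, 10): clamped at the beginning
```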

 

Re: exhale - Open Source xHE-AAC encoder

Reply #341
Quote
foobar2000 packet decoders implement seek for their callers by reporting the maximum dependency in floating point seconds, and in frame counts. The frontend, be it MP4, ADTS, ADIF, Matroska/WebM, etc, will seek backwards by this amount, ask the decoder to start decoding, then discard the dependency time from the start of the samples. Not terribly flexible, and definitely doesn't support keyframes to my knowledge.
Yeah, that should have been enough for traditional audio codecs. For proper USAC support (seek/cut edit), more work at the container level seems to be required.

Strictly speaking, an SBR/PS decoder also cannot start decoding without seeing an SBR/PS header, which comes only occasionally (somewhat similar to the situation with the USAC independency flag), but in the AAC case the LC part can be decoded anyway, so I believe this has been neglected.

Re: exhale - Open Source xHE-AAC encoder

Reply #342
Quote
To me it looks conceptually similar to video key frames, and I wonder whether the good old Sync Sample Box (stss) could simply be used to indicate them.
Yes, those are basically key frames. Don't know about the 'stss' part; the only use of that box that I know of (the one used by the exhale application) is to signal to the decoder, by means of entryCount = 0, that not every sample is a sync sample.

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source xHE-AAC encoder

Reply #343
So wait, we basically have to just feed every packet to the decoder and hope it outputs something, or else just expand our search to double the number of packets and keep searching for keyframes?

Re: exhale - Open Source xHE-AAC encoder

Reply #344
Quote
Yes, those are basically key frames. Don't know about the 'stss' part; the only use of that box that I know of (the one used by the exhale application) is to signal to the decoder, by means of entryCount = 0, that not every sample is a sync sample.
stss (SyncSampleBox) is a box defined by ISOBMFF. It contains a list of every "sync sample" (= key frame) of a track.
It's quite common for video tracks, and I believe it's also suitable for indicating independent frames of USAC... but I'm not perfectly sure.

Formally, a "key frame" or "random access point" is defined as a "Stream Access Point" in Annex I of ISO/IEC 14496-12.
It defines six types of stream access point. The two simplest types are treated as random access points and can be indicated by SyncSampleBox.
Apparently these types were designed with video codecs in mind (closed GOP, open GOP, gradual decoding refresh, and so on), but they are not restricted to video.
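
To illustrate how a reader would use such a list, here is a minimal sketch. It assumes the stss entries have already been parsed into a sorted list of 1-based sample numbers; the function name and the example values are made up.

```python
from bisect import bisect_right


def sync_sample_at_or_before(stss_entries, target_sample):
    """Last sync sample number <= target_sample, or None if there is none.

    stss_entries: sorted list of 1-based sample numbers, as stored in the
    SyncSampleBox (assumed to be parsed elsewhere).
    """
    i = bisect_right(stss_entries, target_sample)
    return stss_entries[i - 1] if i else None


entries = [1, 46, 91, 136]  # example: an independent frame every 45 samples
print(sync_sample_at_or_before(entries, 100))  # 91
```

Binary search keeps the lookup cheap even for long tracks, which is the practical advantage over probing AUs one by one.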




Re: exhale - Open Source xHE-AAC encoder

Reply #345
According to this doc, A/342 Part 3: MPEG-H System seems to use stss to signal random access points: https://www.atsc.org/wp-content/uploads/2017/03/A342-3-2017-MPEG-H-System-2.pdf (5.2.2.2 Random Access Point and Stream Access Point).
So, I believe USAC can/should do the same.

Quote
5.2.2.2 Random Access Point and Stream Access Point

A File Format sample containing a Random Access Point (RAP), i.e., a RAP into an MPEG-H
Audio Stream, is a “sync sample” in the ISOBMFF and shall consist of the following MHAS
packets, in the following order:
• PACTYP_MPEGH3DACFG
• PACTYP_AUDIOSCENEINFO (if Audio Scene Information is present)
• PACTYP_BUFFERINFO
• PACTYP_MPEGH3DAFRAME
Note that additional MHAS packets may be present between the MHAS packets listed above
or after the MHAS packet PACTYP_MPEGH3DAFRAME, with one exception: when present, the
PACTYP_AUDIOSCENEINFO packet shall directly follow the PACTYP_MPEGH3DACFG
packet, as defined in ISO/IEC 23008-3 Amendment 3 [4] Clause 14.4.
Additionally, the following constraints shall apply for sync samples:
• The audio data encapsulated in the MHAS packet PACTYP_MPEGH3DAFRAME shall
follow the rules for random access points as defined in ISO/IEC 23008-3, Clause 5.7 [2].
• All rules defined in ISO/IEC 23008-3 Amendment 2, Clause 20.6.1 [3] regarding sync
samples shall apply.
• The first sample of an ISOBMFF file shall be a RAP. In cases of fragmented ISOBMFF
files, the first sample of each Fragment shall be a RAP.
• In case of non-fragmented ISOBMFF files, a RAP shall be signaled by means of the File
Format sync sample box “stss,” as defined in ISO/IEC 23008-3 Amendment 2, Clause 20.2
[3].
• In case of fragmented ISOBMFF files, the sample flags in the Track Run Box ('trun') are
used to describe the sync samples. The “sample_is_non_sync_sample” flag SHALL be set
to “0” for a RAP; it shall be set to “1” for all other samples.
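
For the fragmented case mentioned at the end of that quote, checking the flag could look like the sketch below. The bit position is an assumption based on the 32-bit sample_flags layout in ISO/IEC 14496-12 as I read it (sample_is_non_sync_sample being the bit with mask 0x00010000); the function name is hypothetical.

```python
# Assumed position of sample_is_non_sync_sample within the 32-bit sample_flags
# field used by 'trun' (to be verified against ISO/IEC 14496-12).
NON_SYNC_MASK = 0x00010000


def is_sync_sample(sample_flags):
    """True if the sample_is_non_sync_sample bit is 0, i.e. the sample is a RAP."""
    return (sample_flags & NON_SYNC_MASK) == 0


print(is_sync_sample(0x02000000))  # True: non-sync bit clear
print(is_sync_sample(0x01010000))  # False: non-sync bit set
```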

Re: exhale - Open Source xHE-AAC encoder

Reply #346
The problem is, I gave this encoder to Peter, and some test files, and he cannot find any STSS boxes in any of them.

Re: exhale - Open Source xHE-AAC encoder

Reply #347
Quote
• In case of non-fragmented ISOBMFF files, a RAP shall be signaled by means of the File
Format sync sample box “stss,” as defined in ISO/IEC 23008-3 Amendment 2, Clause 20.2
[3].
Is that really necessary? I'm writing chunk information ('stsc' and 'stco' boxes), with each chunk starting with an independently decodable frame and a length up to the next independently decodable frame. I thought this was sufficient.

Chris
If I don't reply to your reply, it means I agree with you.

Re: exhale - Open Source xHE-AAC encoder

Reply #348
Quote
Is that really necessary? I'm writing chunk information ('stsc' and 'stco' boxes), with each chunk starting with an independently decodable frame and a length up to the next independently decodable frame. I thought this was sufficient.

Oh, I see... so exhale expects decoders to always preroll from the beginning of the chunk in which the target sample is located?
I understand your intention, and it should indeed be possible, but I'm afraid no decoders seek that way, and it's also kind of fragile.
I mean, when an mp4 file is re-multiplexed, chunking / interleaving is done at whatever interval the muxer likes.
It may be per sample (costly, big overhead) or every 1 second or so. In other words, the original chunk intervals/boundaries are not expected to be preserved.

The very basic seek procedure is described in ISO/IEC 14496-12 Annex A.7 Random Access.
Using stts (and optionally stss), the target sample number is determined first.
Knowing which sample to retrieve, the decoder then fetches the sample directly using stco/stsc.
stco/stsc is considered the lowest-level structure for media storage, so it's usually not exposed by mp4 demuxing libraries, but your approach would require the decoder to know about it.
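
A simplified sketch of the first and last steps of that procedure is given below. It assumes the tables are already parsed: stts as (sample_count, sample_delta) pairs with non-zero deltas, and stsc as (first_chunk, samples_per_chunk) pairs with the sample description index omitted. Both function names are made up for illustration.

```python
def sample_for_time(stts, media_time):
    """1-based number of the sample containing media_time (in timescale units)."""
    sample, elapsed = 1, 0
    for count, delta in stts:
        span = count * delta
        if media_time < elapsed + span:
            return sample + (media_time - elapsed) // delta
        elapsed += span
        sample += count
    return None  # media_time lies beyond the end of the track


def chunk_for_sample(stsc, sample):
    """Return (1-based chunk number, 0-based index within that chunk)."""
    first_sample = 1
    for i, (first_chunk, per_chunk) in enumerate(stsc):
        if i + 1 < len(stsc):
            # Samples covered by this run of equally sized chunks.
            run = (stsc[i + 1][0] - first_chunk) * per_chunk
            if sample >= first_sample + run:
                first_sample += run
                continue
        offset = sample - first_sample
        return first_chunk + offset // per_chunk, offset % per_chunk


stts = [(1000, 1024)]   # 1000 AUs of 1024 samples each
stsc = [(1, 45)]        # every chunk holds 45 AUs
n = sample_for_time(stts, 102400)
print(n, chunk_for_sample(stsc, n))  # 101 (3, 10)
```

The chunk number would then index into stco to get the file offset; the stss lookup (if present) sits between the two steps.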

Re: exhale - Open Source xHE-AAC encoder

Reply #349
OK, thanks for the info. After reading my own https://gitlab.com/ecodis/exhale/-/issues/1,
I think I understand this now. The Fraunhofer whitepaper on xHE-AAC says:
Quote
All IPFs {which are frame types that exhale doesn't write yet} are signaled by means of the SyncSampleBox.
{...}
In addition to the rather expensive IPFs, all AUs that have the usacIndependencyFlag set to 1 can be used to enable random access, e.g. for seeking operations. While these Independency Frames (IF) can be used to start decoding, a full audio signal is guaranteed only after decoding a certain number of AUs. This is referred to as roll distance in file format terms and can be signaled using the AudioPreRollEntry and the AudioSampleGroupEntry, respectively.
So I could write a proper 'stss' box with entries 1, 1 + independency period, 1 + 2 * independency period, ...
However, John Calhoun, the reporter of issue #10, commented on July 4:
Quote
I've been testing the serialized 'csgp' atom for interoperability with Apple's current macOS developer preview, which is capable of parsing these atoms and of decoding xHE-AAC audio, by creating relatively lengthy encodings and seeking to a random access point near the end. Playback can proceed with little delay only if Apple's implementation has located an independently decodable frame near the point of resumption after a seek. Without 'csgp', a considerable amount of time is required to decode from the start before playback can resume.
It's also possible for me to use tools built upon Apple's APIs to convert exhale-produced .mp4 files to Apple-produced .mp4 files, which will contain the same information that the 'csgp' atom contains in the form of an 'sbgp' atom. These will accurately reflect the same random access interval.
So it seems that at least the Apple decoder is already happy with the current exhale bitstreams.

Quote
Oh, I see... so exhale expects decoders to always preroll from the beginning of the chunk in which the target sample is located?
Well, my intention was not to expect this behavior from all decoders; I thought decoders did it this way anyway. But yes, in combination with the new 'sgpd'/'csgp' code from issue #10, that was the idea. But I understand now that chunks are actually intended for something else.
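
The 'stss' layout mentioned earlier in this post (entries 1, 1 + independency period, 1 + 2 * independency period, ...) could be produced roughly as sketched below. This is not exhale's code; the function names are hypothetical, and the box layout (size, type, version/flags, entry count, 32-bit sample numbers) follows the SyncSampleBox definition in ISO/IEC 14496-12 as I understand it.

```python
import struct


def stss_sample_numbers(num_samples, period):
    """1-based sample numbers 1, 1 + period, 1 + 2 * period, ... up to num_samples."""
    return list(range(1, num_samples + 1, period))


def build_stss_box(sample_numbers):
    """Serialize a SyncSampleBox (big-endian, version 0, flags 0)."""
    payload = struct.pack(">I", 0)                    # version (8 bits) + flags (24 bits)
    payload += struct.pack(">I", len(sample_numbers))  # entry_count
    payload += b"".join(struct.pack(">I", n) for n in sample_numbers)
    return struct.pack(">I4s", 8 + len(payload), b"stss") + payload


numbers = stss_sample_numbers(100, 45)
box = build_stss_box(numbers)
print(numbers, len(box))  # [1, 46, 91] 28
```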

Chris
If I don't reply to your reply, it means I agree with you.

 