Topic: fdkaac input bitdepth

fdkaac input bitdepth

All the information I've been reading, including the info straight from the horse's mouth (and repeated on Wikipedia), indicates fdkaac is based on fixed-point math and only supports 16-bit integer PCM input.

Which confuses the hell out of me, because it seems to happily accept 24-bit input, and foobar2000 sets the "highest BPS supported" to 32 for fdkaac.
It confuses me even more that it doesn't appear to clip values above 0dBFS, and encodes them just as most other lossy encoders do.

Could someone please explain what I'm missing here?

Thank you.

Reply #1
The encoder is certainly fixed-point based. The frontend (the fdkaac command) is not: it accepts up to 32-bit int and 64-bit float input, and down-converts to 16-bit int before encoding.
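As a rough illustration of what that down-conversion involves, here is a minimal Python sketch that assumes plain scaling by 32768, rounding and clamping to the 16-bit range. The function name is hypothetical and this is not the frontend's actual code; it only shows that anything over full scale gets flattened and anything smaller than half a 16-bit step disappears.

```python
def float_to_int16(samples):
    """Convert float PCM (full scale = +/-1.0) to 16-bit integers.

    Hypothetical helper: scale by 32768, round to the nearest step and
    clamp to [-32768, 32767]. Values beyond full scale are hard clipped,
    and a sample smaller than half a 16-bit step rounds to zero.
    No dither is applied here.
    """
    out = []
    for s in samples:
        v = round(s * 32768.0)
        out.append(max(-32768, min(32767, v)))
    return out


print(float_to_int16([0.0, 0.5, 1.0, 1.5, -2.0, 1e-6]))
# prints [0, 16384, 32767, 32767, -32768, 0]
```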
Quote
It confuses me even more when it doesn't appear to be clipping values above 0dBFS and encodes them just as most other lossy encoders do.

Well, the peak can go beyond 0dBFS when decoded, but the input is certainly clipped.
However, fdkaac applies a smart limiter to floating-point input in order to minimize the audible defects of hard clipping, so the clipping might not be so obvious.
Try feeding it float input with intentionally high gain (for example, try the attached test.wv).
The resulting peak should be much lower than the original (due to clipping), but the audible defects should be less obvious than with hard clipping (LAME or MPC, for example, will simply hard clip, and the result should be obvious).
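To make the difference between hard clipping and limiting concrete, here is a deliberately simplified Python sketch. The limiter below (instant attack, exponential release, no look-ahead) is a generic textbook idea, not fdkaac's actual smart limiter; the function names and the release constant are assumptions.

```python
import math


def hard_clip(samples, ceiling=1.0):
    """Clamp every sample to [-ceiling, ceiling]; overshoots are flattened."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


def simple_limiter(samples, ceiling=1.0, release=0.999):
    """Instant-attack, exponential-release gain limiter (illustrative only).

    The gain drops at once whenever a sample would exceed `ceiling`, then
    recovers slowly towards 1.0, so the waveform is mostly scaled down
    rather than flattened.
    """
    gain = 1.0
    out = []
    for s in samples:
        # Let the gain creep back towards unity on every sample.
        gain = 1.0 - (1.0 - gain) * release
        # Gain this sample needs so that |s * gain| <= ceiling.
        needed = ceiling / abs(s) if abs(s) > ceiling else 1.0
        gain = min(gain, needed)
        out.append(s * gain)
    return out


# A 440 Hz sine at twice full scale (about +6 dBFS), 0.1 s at 44.1 kHz
loud = [2.0 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
clipped = hard_clip(loud)
limited = simple_limiter(loud)
# Count samples sitting at (or within 0.1% of) full scale in each version
print(sum(abs(x) >= 0.999 for x in clipped), sum(abs(x) >= 0.999 for x in limited))
```

Because there is no look-ahead, the rising edge of each overshoot is still briefly flattened; practical limiters delay the signal slightly so the gain can come down before the peak arrives.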

Reply #2
Thanks for the reply.

I thought I'd pretty much tried what you suggested before posting (it turns out, maybe not), but to make sure I'm not going mad...

I ran ReplayGain on your test file and the result was this:
Track Gain: -7.64dB, Track Peak 1.633424

After re-encoding with fdkaac (the LAME result was pretty much the same):
Track Gain: -7.51dB, Track Peak 1.019475

Okay, so that was clipped. Back to the drawing board.....

I dug out a random FLAC track ripped from a CD and ran a ReplayGain scan:
Track Gain: -2.55dB, Track Peak 0.905121

Re-encoded with fdkaac while applying a 10dB volume increase:
Track Gain: -12.00dB, Track Peak 1.418604

Re-encoded with LAME while applying a 10dB volume increase:
Track Gain: -11.60dB, Track Peak 1.547830

Re-encoded with NeroAAC while applying a 10dB volume increase:
Track Gain: -12.52dB, Track Peak 2.928896

Re-encoded with WavPack (32 bit) while applying a 10dB volume increase:
Track Gain: -12.55dB, Track Peak 2.862243

And finally, the WavPack (32 bit) file above re-encoded with fdkaac (no volume change):
Track Gain: -12.00dB, Track Peak 1.418604


So a question to help me get my head around all that..... what do the "Track Peak" ReplayGain values represent? Are they a percentage?

I'm assuming from the above result, the only lossy encoder which encoded the audio correctly after the volume was increased by 10dB was NeroAAC. The rest of the time the peaks were clipped and.... I guess..... the ReplayGain Track Peaks of around 1.4 represent the level at which the clipping/distortion is decoded, or something to that effect? If that's the case, I guess seeing the Track Peaks of around 1.4 was fooling me into thinking the audio was still being encoded properly.
Maybe I should have listened instead. I did this time. The distortion in the fdkaac/LAME encoded versions was pretty obvious. foobar2000's advanced limiter (which I leave in the playback chain) did a pretty good job of limiting the WavPack version, with much less distortion.

Sometimes when converting audio while applying a DSP (downmixing multichannel audio to stereo and encoding as MP3, for example) I've run a scan with MP3Gain and it's reported peaks a dB or three over 0dB. I aim to prevent that anyway, but I'd always assumed it's because the encoder could store values greater than 0dBFS, but now I'm thinking that's not possible unless the encoder input is 32 bit (which would be 32 bit float).

Am I on the right track with any of that?

Cheers.

Reply #3
Quote
So a question to help me get my head around all that..... what do the "Track Peak" ReplayGain values represent? Are they a percentage?

If you multiply it by 100, yes: 1.0 means 100%, i.e. 0dBFS. You can convert it to dB with the following formula:
Code:
20 * log10(x)

For example, 0.5 becomes 20*log10(0.5) = -6.02 dB.
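The same conversion in both directions, as a small Python sketch (the function names are just for illustration):

```python
import math


def peak_to_db(peak):
    """Linear ReplayGain peak (1.0 = 0 dBFS) to decibels: 20 * log10(peak)."""
    return 20.0 * math.log10(peak)


def db_to_peak(db):
    """Inverse conversion: decibels back to a linear peak, 10^(db / 20)."""
    return 10.0 ** (db / 20.0)


print(f"{peak_to_db(0.5):.2f}")  # prints -6.02
```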

In your example, the original peak is 0.905121. Applying a 10dB gain means multiplying by 10^(0.05*10) = 10^(10/20) ≈ 3.1623, so the expected peak is 0.905121 * (10^(0.05*10)) = 2.8622439.
The WavPack (floating point) result looks exact.
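The expected peak after a gain can be computed the same way; a small sketch using the numbers above (the helper name is hypothetical):

```python
def apply_gain(peak, gain_db):
    """Scale a linear peak by a gain given in dB: peak * 10^(gain_db / 20)."""
    return peak * 10.0 ** (gain_db / 20.0)


expected = apply_gain(0.905121, 10.0)
print(f"{expected:.7f}")  # prints 2.8622439
```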

Reply #4
The smart limiter was added to fdkaac in 0.6.0 (the latest version in the GitHub repo).
Older versions don't have it and will simply hard clip, like LAME.

Reply #5
Thanks for the info. I think I understand now. So for a Track Peak of 1.547830:

20*log10(1.547830) = +3.79 dB

1.547830 seems like a lot more until you convert it to dB. Then it looks like it's probably just extra distortion.

Why does foobar2000 display the ReplayGain scan result for Track Gain in dB but the Track Peak as a percentage? Why not dB for both so it's a tad more intuitive?
I only ask because if it did, I probably would have looked at the results and thought "a Track Peak of -0.87dB before, +3.79dB after.... that's not the 10dB difference I applied, so the audio must be clipped".
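That reasoning can be scripted: convert both peaks to dB, add the applied gain to the original, and compare with what was measured. Here is a small sketch using the peaks reported in this thread (the helper names are hypothetical). A shortfall of several dB points to clipping; a small negative value, as with NeroAAC, is consistent with the ordinary overshoot lossy codecs can produce.

```python
import math


def to_db(peak):
    """Linear peak (1.0 = 0 dBFS) to dB."""
    return 20.0 * math.log10(peak)


def peak_shortfall_db(original_peak, gain_db, measured_peak):
    """How far (in dB) the measured peak falls below original peak + gain."""
    expected_db = to_db(original_peak) + gain_db
    return expected_db - to_db(measured_peak)


# Peaks reported above after applying +10 dB to a track peaking at 0.905121
measured = {
    "fdkaac": 1.418604,
    "LAME": 1.547830,
    "NeroAAC": 2.928896,
    "WavPack (32 bit)": 2.862243,
}

for name, peak in measured.items():
    shortfall = peak_shortfall_db(0.905121, 10.0, peak)
    print(f"{name}: {to_db(peak):+.2f} dB, shortfall {shortfall:+.2f} dB")
```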

Thanks again!

PS One other quick question if I may........
If I create a custom encoder preset for the fdkaac encoder and set the maximum BPS to 24 (as opposed to the maximum of 32 that foobar2000 specifies), would I be doing the encoder a disservice? I ask because, if in the future I need to remember which encoder supports 32-bit float input and which doesn't (even in the case of fdkaac, where it's converted to integer "internally"), I can just check the converter presets to remind myself.

Well, two more... although this is more of an observation. When setting up an fdkaac encoder preset with the VBR5 quality setting, foobar2000 lists the average bitrate for CD audio as 180kbps. In my experience so far, that's not even close: it's more like 224kbps to 240kbps. I think the claimed 128kbps for VBR4 is much closer to accurate. The bitrate jump from VBR4 to VBR5 seems quite large.

Reply #6
Gain values are in dB because they are almost always applied as is, or combined with other dB settings such as a preamp.  Internally that may well mean converting it to a simple multiplier, but that's the same for all gains.  The values may also be quite large as multipliers and are more readily understood as a number no more than a few dB.

Peak values are in percentages/fractions relative to 0dB because that is more useful. As you've seen, the peak numbers are generally quite small, frequently less than 1, and the resulting very small dB values plus or minus from 0 are less easy to understand at a glance. A dB peak value wouldn't be applied immediately to anything. Instead, the peak fraction is used in conjunction with any gains that are applied to calculate whether clipping is going to occur. Then you decide what to do with that information, which might be as simple as applying a gain value to bring the peak down, or may be something more complex.

And because the spec says so.
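A sketch of that clipping check, assuming the usual convention that a peak of 1.0 is full scale and that all gains (track gain plus any preamp) are summed in dB before being turned into a multiplier. The function name is hypothetical:

```python
def will_clip(peak, gain_db, preamp_db=0.0):
    """Return True if applying gain_db + preamp_db to this linear peak exceeds 1.0."""
    multiplier = 10.0 ** ((gain_db + preamp_db) / 20.0)
    return peak * multiplier > 1.0


# The FLAC track above: peak 0.905121, track gain -2.55 dB
print(will_clip(0.905121, -2.55))                 # False
print(will_clip(0.905121, -2.55, preamp_db=4.0))  # True
```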

P.S. I agree that the FDK settings 4 and 5 are a long way apart right where you'd want one in between.  The resulting bitrate is very variable between and even within tracks, which can make the gap seem even bigger.

Reply #7
I don't know...... if I saw a Track Peak expressed as 0.2dB, 1.5dB, 4dB, 10dB or some other dB value it'd have some meaning to me. 1.064587 or 2.354671 etc are meaningless to me unless I convert them to dB.

If I was to see a Track Gain of 82dB and a Track Peak of -4.5dB it seems fairly obvious applying track gain without clipping would result in a 4.5dB increase and a new Track Gain of 86.5dB. Unless I'm missing something..... maybe it's just me.

My logic would be..... I convert an audio file. Maybe it's multi-channel and I downmix it. I run ReplayGain on the output file and it shows a track peak of 1.36498. Okay..... so I'll re-convert it while applying a volume reduction..... which I've got to specify as a reduction of "x" dB.....
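Turning a measured peak into the reduction to type in is a one-liner; here is a sketch with an optional headroom target (the function name and the `target_db` parameter are illustrative assumptions):

```python
import math


def reduction_to_reach(peak, target_db=0.0):
    """dB of attenuation needed so a linear peak lands at or below target_db.

    target_db = 0.0 means full scale; use e.g. -1.0 for 1 dB of headroom.
    Returns 0.0 if the peak is already low enough.
    """
    return max(0.0, 20.0 * math.log10(peak) - target_db)


print(f"{reduction_to_reach(1.36498):.2f}")        # 2.70
print(f"{reduction_to_reach(1.36498, -1.0):.2f}")  # 3.70
```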

I haven't played around with it much, and I've no idea what frequency is used by default, but specifying a low pass frequency of 17500 seems to reduce the bitrate for fdkaac VBR5 much of the time (-w 17500). A little more in-line with what you'd expect relative to VBR4 (around 200kbps). I was playing with 17500Hz because from memory that's the frequency the LAME V2 preset uses.
Whether it's a good idea I'm not sure, but specifying the same frequency for VBR4 increases the average bitrate for that preset a tad. A track which might end up 128kbps without -w 17500 might increase to 138kbps with it. Maybe adjusting the frequency is the key.

Edit: I checked and it seems I was remembering wrong. A CBR 128k LAME encode used a low pass of 17k. The V2 preset uses 18.5k. For V4 it's 17.5k and for V0 it's 22.1k. The LAME developers obviously think adjusting the low pass frequency is a good idea, so maybe a couple of presets "in-between" fdkaac's Q4 and Q5 could be created that way?

Reply #8
Quote
I run ReplayGain on the output file and it shows a track peak of 1.36498. Okay..... so I'll re-convert it while applying a volume reduction..... which I've got to specify as a reduction of "x" dB.....

Perhaps I am missing something here, but why do you need or want to specify the volume reduction manually in this case? foobar2000 can apply the ReplayGain settings during conversion just like it does during playback. You only have to enable this function in the Processing section of the converter settings.

Reply #9
Needs more testing with my fork of fdk-aac. I never updated it, but I did at least lay the foundation. Basically, I changed the code that was based on 32-bit integers so that it would instead accept 8.24 fixed point. The only limitation was that the supplied SBR decoder absolutely required +/- 1.0 integer-format input, from which it could then produce 8.24 fixed point. I don't know if this can be fixed, or if it also applies to the encoder.

https://github.com/kode54/fdk-aac

Commits:

Modified to output 8.24 fixed point samples [Relevant to encoder and decoder]
Fixed SBR decoding volume scale [Relevant to decoder, possibly needs to be applied to encoder]
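The practical difference between the formats is headroom. In 8.24 fixed point, a signed 32-bit word holds 8 integer bits (including the sign) and 24 fractional bits, so it can represent values up to about ±128, roughly 42 dB above full scale, whereas 16-bit PCM (Q1.15) saturates at ±1.0. A minimal Python sketch of the two conversions, with hypothetical function names; this is not code from the fork:

```python
FRAC_BITS = 24  # 8.24: 24 fractional bits in a signed 32-bit word
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def float_to_q8_24(x):
    """Float sample -> 8.24 fixed point, saturating at the int32 limits."""
    v = round(x * (1 << FRAC_BITS))
    return max(INT32_MIN, min(INT32_MAX, v))


def q8_24_to_float(v):
    """8.24 fixed point -> float."""
    return v / (1 << FRAC_BITS)


def float_to_q1_15(x):
    """Float sample -> ordinary 16-bit PCM (Q1.15), saturating at +/-1.0."""
    v = round(x * (1 << 15))
    return max(INT16_MIN, min(INT16_MAX, v))


# A sample 10 dB over full scale (about 3.16) survives in 8.24 but not in 16-bit PCM
x = 10 ** (10 / 20)
print(q8_24_to_float(float_to_q8_24(x)))  # about 3.1623
print(float_to_q1_15(x) / 32768)          # 0.999969482421875 (clipped)
```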

Reply #10
Quote
I run ReplayGain on the output file and it shows a track peak of 1.36498. Okay..... so I'll re-convert it while applying a volume reduction..... which I've got to specify as a reduction of "x" dB.....
Perhaps I am missing something here, but why do you need or want to specify the volume reduction manually in this case? foobar2000 can apply the ReplayGain settings during conversion just like it does during playback. You only have to enable this function in the Processing section of the converter settings.


It doesn't work if the source file doesn't contain ReplayGain info, and at least one format (wave files) can't contain it.

Applying DSPs while converting can change the volume significantly.

I thought I had a third reason but I got distracted for a few minutes and now it's gone....

Reply #11
Quote
P.S. I agree that the FDK settings 4 and 5 are a long way apart right where you'd want one in between.  The resulting bitrate is very variable between and even within tracks, which can make the gap seem even bigger.


After a bit of playing around I settled on using a low pass filter of 18500Hz to make a couple of encoder presets that bridged the bitrate gap a bit. Very roughly.....

The default quality setting of 4 (-m 4) = 128kbps

Quality of 4 plus 18500Hz LPF (-m 4 -w 18500) = 160kbps

Quality of 5 with 18500Hz LPF (-m 5 -w 18500) = 208kbps

The default quality setting of 5 (-m 5) = 224kbps

There's nothing to stop you adjusting it but I mostly encode with -m 4 -w 18500 now.
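To reproduce these presets outside foobar2000, a small Python sketch that builds the command lines could look like this. Only the -m and -w options quoted above are used; the preset names and the helpers are hypothetical, and fdkaac is assumed to pick its own output name when none is given (as in the `fdkaac.exe -b 128 64float.wav` run later in this thread).

```python
import subprocess

# Hypothetical preset names mapped to the fdkaac options discussed above
# (rough average bitrates from the post: 128 / 160 / 208 / 224 kbps)
PRESETS = {
    "q4": ["-m", "4"],
    "q4_lp18500": ["-m", "4", "-w", "18500"],
    "q5_lp18500": ["-m", "5", "-w", "18500"],
    "q5": ["-m", "5"],
}


def fdkaac_command(preset, wav_path):
    """Return the argument list for encoding wav_path with the chosen preset."""
    return ["fdkaac", *PRESETS[preset], wav_path]


def encode(preset, wav_path):
    """Run fdkaac (must be on PATH); raises CalledProcessError on failure."""
    subprocess.run(fdkaac_command(preset, wav_path), check=True)


print(fdkaac_command("q4_lp18500", "input.wav"))
```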

Reply #12
Quote
After a bit of playing around I settled on using a low pass filter of 18500Hz to make a couple of encoder presets that bridged the bitrate gap a bit. [...] There's nothing to stop you adjusting it but I mostly encode with -m 4 -w 18500 now.


Artificially forcing bitrates doesn't really solve the problem.  Instead of producing a quality level of -m4.5, you've just produced -m4 plus some wasted (entirely inaudible to me) high frequency bits.

Reply #13
Ten years later, there is good news and bad news.

foobar2000 can now show Track Peak in decibels, as @yetanotherid requested, but you need to enable it first: Preferences → Advanced → Tools → ReplayGain Scanner → Results dialog: advanced formatting of peak values.

Alas, the confusion around FDKAAC and bit depth remains. The HA wiki states: “…based on fixed-point math and only supports 16-bit integer PCM input”, yet fdkaac.exe 32float.wav does not throw an error, so it is tempting to believe that support has been added. By contrast, before @Case extended the Helix MP3 encoder's bit-depth support up to 32-bit float in the spring of 2024, running hmp3.exe 32float.wav would clearly reject such input: “Unsupported PCM file type”. In fact, according to @nu774's explanation (which is worth putting on the wiki), FDKAAC accepts even 64float.wav as input, which I just verified with version 1.0.5, and then internally converts it to 16-bit integer.

Code:
$ sox.exe -n -r 44100 -b 64 -e float 64float.wav synth 10 sine 10000
$ fdkaac.exe -b 128 64float.wav
[100%] 00:10.000/00:10.000 (87x), ETA 00:00.000
441000/441000 samples processed in 00:00.115

a) Is there any dither involved, or are the extra bits simply truncated when 32float.wav is converted to 16int?
b) Is further signal processing carried out in 16 bit precision?
c) I see that @kode54 tried to improve the situation, but were his patches used?

Reply #14
Quote
Alas, the confusion between FDKAAC and bit depth remains. HA wiki states: “…based on fixed-point math and only supports 16-bit integer PCM input”

The wiki page refers to the Fraunhofer FDK AAC Codec Library for Android. It is purely fixed-point code.

Quote
but fdkaac.exe 32float.wav does not throw an error, so there is a temptation to believe that support has been added.

Here 'fdkaac' means the encoder frontend developed by @nu774 that uses the FDK AAC library. The frontend is being helpful and handles the input the best way it can.


Quote
a) Is there any dither involved, or are the extra bits simply truncated when 32float.wav is converted to 16int?
b) Is further signal processing carried out in 16 bit precision?
c) I see that @kode54 tried to improve the situation, but were his patches used?

Doesn't look like there is dither. A quiet test tone turned into absolute silence with fdkaac.exe.
Everything the FDK AAC library does to the signal is done in fixed point. It will clip, and it will mutilate quiet tones.
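For question a), the difference dither makes on a very quiet tone can be shown with a short Python sketch. TPDF dither is just one common choice, used here as an assumption; nothing above says which scheme, if any, fdkaac would use. The tone level and the function name are illustrative.

```python
import math
import random


def quantize_to_int16(samples, dither=False, seed=0):
    """Round float samples to 16-bit steps, optionally with TPDF dither.

    TPDF dither = sum of two independent uniform values of +/-0.5 LSB,
    added before rounding. Output is clamped to the int16 range.
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        v = s * 32768.0
        if dither:
            v += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        out.append(max(-32768, min(32767, round(v))))
    return out


# 1 kHz tone at -100 dBFS: its peak is about 0.33 of one 16-bit step
tone = [1e-5 * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]

plain = quantize_to_int16(tone)
dithered = quantize_to_int16(tone, dither=True)
print(any(plain))     # False: the tone rounds to pure silence
print(any(dithered))  # True: the tone survives, buried in low-level noise
```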

I tested kode54's library modifications with the fdkaac frontend. I had to patch an assert in nu774's code (assert(((SAMPLE_BITS)>>3) == sizeof(INT_PCM))) to make encoding work. The modification allowed the encoder to handle signals up to 10 dB above digital full scale. That is an improvement, but it doesn't remove the limit: the test signal I was encoding had peaks 20 dB above full scale.
This change also breaks something very badly: whenever the signal loudness increases there is a popping sound, even when the signal goes nowhere near digital full scale.
Someone with more time, interest and/or familiarity with the code might be able to fix that, but I'm not sure it's worth the effort. This is after all meant for Android.
There is a proper encoder library bundled with Windows 11 (though I haven't yet tested whether it accepts float input), and the library bundled with old versions of Winamp certainly supports floats.