Topic: Is there an advantage in using bigger frame sizes (40+) for offline use?

Is there an advantage in using bigger frame sizes (40+) for offline use?

I usually just use foobar2000 as my encoder frontend for convenience, but I'm now trying fre:ac and the Opus settings dialog readily offers more options than just bitrate, including the frame size, and I'm wondering if there's any advantage (or disadvantage) in raising that when encoding for local use instead of streaming applications.
Are bigger frame sizes known to improve efficiency?

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #1
Frame sizes bigger than 20ms are only useful at fairly low bitrates. At higher rates, using them might hurt efficiency because some parameters can only change at frame boundaries.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #2
Thanks @Octocontrabass
How low are we talking about? Below 100? or more like less than 50?

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #3
Less than 50kbps. Probably a lot less, but I don't have exact numbers on hand.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #4
I remember looking more into this and doing some very simple experiments back in 2014 or so. I don't have anything written down from that any more.

IIRC, for at-rest files the overhead savings are around 0.2 kbps at most. Minor, and there are minor tradeoffs.

I think I came to the conclusion it was worth considering at <16kbps or so.

I didn't make my conclusion based purely on listening tests; did only a little of that and didn't notice differences. I may have been influenced by encoder mode behavior in 1.0.x, which has changed since then, or some particulars of my use-case at the time.

The main reason the option exists is, as you allude to, the reduction of transport overhead when streaming.
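
To put rough numbers on that transport overhead, here is a minimal Python sketch. It assumes plain RTP over UDP over IPv4 with no header extensions or options (12 + 8 + 20 = 40 bytes of headers per packet) and one Opus packet per network packet. The function name and that header figure are illustrative assumptions, not something taken from the posts above.

```python
# Assumed per-packet header cost: RTP (12) + UDP (8) + IPv4 (20) bytes,
# with no RTP extensions and no IP options.
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20


def header_overhead_kbps(frame_ms, header_bytes=RTP_UDP_IPV4_HEADER_BYTES):
    """Return the bitrate (kbps) spent on packet headers alone."""
    packets_per_second = 1000.0 / frame_ms
    return packets_per_second * header_bytes * 8 / 1000.0


for frame_ms in (20, 40, 60):
    overhead = header_overhead_kbps(frame_ms)
    print(f"{frame_ms} ms packets: {overhead:.2f} kbps of headers")
```

Under those assumptions the headers cost 16 kbps at 20 ms and roughly 5.3 kbps at 60 ms, which is a big deal next to a low-bitrate speech stream. In a file container the per-packet cost is much smaller, which fits the small savings mentioned above.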

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #5
Quote: "IIRC, for at-rest files the overhead savings are around 0.2 kbps at most. Minor, and there are minor tradeoffs."
I was thinking more about quality improvements and not so much about bandwidth savings.
In any case, it seems bigger frames may actually be a bad idea with bitrates sufficient for reasonable quality in full band audio, and I don't usually ultracompress that, or deal with many speech-only files, so that's it for me.

Speaking of speech... will it do any good to disable automatic switching between "speech" and "music" modes and force it to music if you know that's all there is? Or is it still better to let the detector do its thing regardless?

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #6
For the MDCT ("CELT") layer (and thus also for the hybrid modes), frame sizes above 20ms aren't actually larger frames but a way of packing 20ms frames together. So audio data should basically be identical for 20, 40, 60 ms settings except when using LP-only ("SILK") modes.

While old encoders used LP-only up to about 20kbps, 1.3 will normally use hybrid above about 12kbps. So frame size can only have a quality impact at those very low bitrates.

As proof here's the modes table from section 3.1 of the spec:

Configuration #s | Mode      | Bandwidth | Frame Sizes
0...3            | SILK-only | NB        | 10, 20, 40, 60 ms
4...7            | SILK-only | MB        | 10, 20, 40, 60 ms
8...11           | SILK-only | WB        | 10, 20, 40, 60 ms
12...13          | Hybrid    | SWB       | 10, 20 ms
14...15          | Hybrid    | FB        | 10, 20 ms
16...19          | CELT-only | NB        | 2.5, 5, 10, 20 ms
20...23          | CELT-only | WB        | 2.5, 5, 10, 20 ms
24...27          | CELT-only | SWB       | 2.5, 5, 10, 20 ms
28...31          | CELT-only | FB        | 2.5, 5, 10, 20 ms
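
If you want to see which of those configurations a file actually uses, the table can be applied to the first byte (the TOC byte) of each Opus packet. The sketch below assumes the layout from the same section of the spec: configuration number in the top five bits, then a stereo bit, then a two-bit frame count code. The function name is made up for illustration, and getting the packets out of the Ogg container is left out.

```python
def describe_toc(toc_byte):
    """Decode the first (TOC) byte of an Opus packet into readable fields."""
    if not 0 <= toc_byte <= 255:
        raise ValueError("TOC must be a single byte value (0..255)")

    config = toc_byte >> 3          # top five bits: configuration number 0..31
    stereo = bool(toc_byte & 0x04)  # next bit: 0 = mono, 1 = stereo
    code = toc_byte & 0x03          # low two bits: frame count code

    if config < 12:
        mode = "SILK-only"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[(config - 12) % 2]
    else:
        mode = "CELT-only"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[(config - 16) % 4]

    # The frame count code says how many frames are packed into the packet.
    packing = (
        "1 frame",
        "2 frames, equal size",
        "2 frames, different sizes",
        "arbitrary number of frames",
    )[code]

    return {
        "config": config,
        "mode": mode,
        "bandwidth": bandwidth,
        "frame_ms": frame_ms,
        "stereo": stereo,
        "packing": packing,
    }


print(describe_toc(0xFC))
```

Note that the frame size in the result never exceeds 20 ms for Hybrid and CELT-only configurations; longer packets in those modes show up as a frame count code other than 0.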
You're correct that at the bitrates normally used for music, the benefit of 20ms frames over 10ms frames is smaller than it is at very low bitrates, and >20ms MDCT modes wouldn't really have helped anyways.

In fact, at one point the developers found that they could get better results on some problem samples by manually forcing the encoder to use shorter frames for certain transient sounds, so they published experimental encoder builds that varied the frame size between 2.5 and 20ms based on the input. But that experimental encoder had problems, it seemed that any benefits of getting it right would be minor, and they ended up putting the experiment on hold indefinitely.

As far as using --music or restricting modes:

The speech/music classifier in 1.3.x is much better than it was before. And I think it tends to err on the side of encoding in music-appropriate MDCT modes rather than LP/hybrid unless it's pretty confident you're encoding speech. And it will always encode as MDCT, even if you specify --speech, at ~64kbps and up.

So I don't think it's likely to matter for your use case. I think there may still be some occasional benefit in using --speech for speech-only content at some bitrates.
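
For anyone who wants to try these settings outside a GUI frontend, here is a small sketch that builds an opusenc command line. It assumes opusenc from opus-tools is installed and relies on its --bitrate, --framesize, --music and --speech options; the helper name and file names are placeholders. It only prints the command, it does not run the encoder.

```python
import shlex


def opusenc_command(src, dst, bitrate_kbps, framesize_ms=None, tuning=None):
    """Build an opusenc argument list; tuning is None, "music" or "speech"."""
    if tuning not in (None, "music", "speech"):
        raise ValueError("tuning must be None, 'music' or 'speech'")

    cmd = ["opusenc", "--bitrate", str(bitrate_kbps)]
    if framesize_ms is not None:
        # Leave this out to keep the encoder's default frame size.
        cmd += ["--framesize", str(framesize_ms)]
    if tuning is not None:
        # Overrides the automatic speech/music detection.
        cmd.append("--" + tuning)
    cmd += [src, dst]
    return cmd


# Defaults only: usually all you need at music bitrates.
print(shlex.join(opusenc_command("input.wav", "output.opus", 128)))

# A low-bitrate speech encode where 60 ms frames and --speech may be worth testing.
print(shlex.join(opusenc_command("talk.wav", "talk.opus", 12,
                                 framesize_ms=60, tuning="speech")))
```

The returned list can be passed to subprocess.run if you actually want to encode.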

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #7
@jensend thank you for the detailed answer.
I don't normally go as low as 64kbps, so I guess I don't need to fiddle with the mode switch, then.

I'm now a bit more interested in something I've seen mentioned around: people say that Opus isn't well tuned at bitrates higher than 128 kbps, or at least that there hasn't been much focus on improving that compared to other codecs like AAC (I'm confused about the many actual variants under that umbrella), LAME MP3, or even Musepack. They say that, after the initial boom in popularity and enthusiasm, it's really only meant for low-bitrate uses like speech for communications and streaming, and that anything above 128 is best served by other codecs with more tuning work done in those areas.
Sounds a bit disheartening. But I guess that's a different topic.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #8
The main AAC encoders of note are Apple/CoreAudio, which is usually accessed via qaac, and FDK AAC, which is the AAC encoder used by Android. Both are very good. Even MP3 remains competitive if you use a high enough bitrate, so unless your goal is to save as much space as possible with reasonable quality, you can pick any format you want. MP3 is not at all competitive at the < 128 kbps range that AAC and Opus can reach, though.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #9
Well, I wasn't really looking for advice on using AAC and MP3, but, rather, knowing more on these apparent issues with Opus allegedly not being as well tuned at rates much higher than 128.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #10
Quote: "but, rather, knowing more on these apparent issues with Opus allegedly not being as well tuned at rates much higher than 128."
I have been testing Opus VBR 256 kbps for two weeks (or maybe more). It's great! No killer samples so far, fully transparent to me. Maybe it's not such an impressive achievement, but still.

I would personally say that it's better than MP3 at every possible bitrate. Compared to AAC, I'm not sure.
Opus VBR 256 + SoX

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #11
Quote: "Well, I wasn't really looking for advice on using AAC and MP3, but, rather, knowing more on these apparent issues with Opus allegedly not being as well tuned at rates much higher than 128."
I have read some claims of this, but I have yet to see anything from the Opus developers themselves stating that they ignored tuning of bitrates above 128 kbps in favour of tuning only the bottom of the bitrate range.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #12
"People say that" does not equal "apparent issues." That's confusing rumors with reality. Listening tests do not bear out the notion that Opus is worse at high bitrates than Musepack or AAC. See for instance this listening test. Opus and AAC are both superior to MP3 at any bitrate.

Encoder development "tuning" effort spent targeting particular bitrates doesn't reflect quality. Yes, over a decade after MP3 was finalized as a format, and over 5 years after LAME was first released, LAME developers were still making changes that had a notable impact at high bitrate on a wide range of music. But that's because MP3 is a particularly problematic and archaic format. Reasonably modern, well-designed formats allow encoders to do a very good job at high bitrates long before the format is finalized. That doesn't mean no further encoder improvements are possible, but they will be less common and more subtle.

At 128kbps, using Opus, AAC, or xHE-AAC, most samples are transparent to average listeners. (Often at 96kbps too for Opus and xHE-AAC.) This creates a problem trying to compare codecs beyond that point. Some problem samples are non-transparent to many listeners beyond that point. Some trained listeners can distinguish on more "normal" samples. But the differences are subtle enough that it becomes very difficult to have enough evidence to conclude, in any kind of statistically sound way, that one codec is better at that bitrate. The test I linked at the start of this post, for instance, provides some evidence that Opus is better than Musepack, AAC, and Vorbis at 192kbps, but statistically it's not enough evidence to conclude that, and the only safe conclusion is that all of them are superior to MP3.
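
To give a feel for how much evidence "statistically sound" takes, here is a minimal sketch of the usual one-sided binomial calculation for a single listener's ABX run. This is not the analysis used in the linked test (which compares ratings across several codecs); the function name and the trial counts are made up for illustration.

```python
from math import comb


def abx_p_value(trials, correct):
    """Chance of getting at least `correct` answers right by pure guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Each trial is a coin flip under the "can't hear a difference" hypothesis.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


for correct in (10, 12, 14):
    print(f"{correct}/16 correct: p = {abx_p_value(16, correct):.4f}")
```

Getting 12 out of 16 right gives p of about 0.038, which only just clears the common 0.05 threshold, and 10 out of 16 is nowhere near it. When differences are subtle, listeners hover around those scores, so it takes a lot of trials and samples to say anything firm.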

Here's a post from the primary developer of Opus saying the same kind of thing.

Re your confusion between different versions of AAC: plenty of places online (e.g. wikipedia, HA's wiki) have that information, but often not concisely put. I'll summarize.

Compared to the baseline AAC (also known as LC-AAC for Low Complexity), HE-AAC (High Efficiency) adds spectral band replication (SBR), while HE-AACv2 includes SBR and parametric stereo (PS).

SBR "fakes" high-frequency content by "replicating" the lower spectrum, i.e. adding harmonics of the low-frequency content. The encoder sends a little guidance information so the decoder can be a little less naive about doing so. This saves bits to spend on the more-audible lower frequencies without just having no high-frequency sound, but it can't faithfully reproduce high frequencies. So it's helpful for low bitrates but counterproductive at 80kbps and above.

Parametric stereo "fakes" stereo content: the signal is transmitted as mono, but with a little bit of side information to help the decoder try to reconstruct the stereo field. That's helpful for rather low bitrates but counterproductive at 48kbps and above.
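
As a compact restatement of those two thresholds, here is a tiny sketch that maps a target bitrate to the AAC flavour that would make sense under this rule of thumb. The cut-offs are the approximate figures given above, not values from any standard or encoder, and the function name is made up.

```python
def suggest_aac_profile(bitrate_kbps):
    """Pick an AAC flavour using the rough 48/80 kbps rule of thumb."""
    if bitrate_kbps >= 80:
        # SBR is counterproductive here, so plain LC-AAC.
        return "LC-AAC"
    if bitrate_kbps >= 48:
        # SBR still helps, parametric stereo no longer does.
        return "HE-AAC"
    # Rather low bitrates: SBR plus parametric stereo.
    return "HE-AACv2"


for kbps in (32, 64, 128):
    print(kbps, "kbps ->", suggest_aac_profile(kbps))
```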

xHE-AAC ("Extended High Efficiency") has enough changes from the others to be considered an entirely different codec. Like Opus, it can code speech using linear prediction to help with low-bitrate audio quality. Some other tools it has available, plus its use of longer frames, help commercial xHE-AAC encoders do better than Opus at low bitrates. (The open-source encoder, Exhale, isn't targeted for those bitrates and doesn't use those tools.) At 64kbps and up, xHE-AAC and Opus are basically tied.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #13
@jensend Thanks again for the detailed answer.

I don't currently have good enough equipment or environment to conduct effective ABX testing, but I still want to produce competent encodes that won't turn out to be problematic down the line. So that was my worry. But it seems it was for nothing, after all.

It seems my use of the word "apparent" was incorrect due to Spanish being my native language: There seems to be a case of false friends between English "apparent" and Spanish "aparente", where, at least in this use, the English term denotes something being clearly seen, while the Spanish word has a connotation of the observed thing having uncertain or even deceiving looks.
I guess I could have used "apparently" instead: "it seems apparent —used to describe something that appears to be true based on what is known" (merriam-webster)
My point was that people were saying these things, and I didn't know what to think for sure.

Also thanks for explaining the AAC formats. That was the most helpful explanation I've seen.
Wandered around trying to make sense of it all, but nobody was putting two and two together like that.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #14
You're welcome!

The only way to completely guarantee your encoded music won't be problematic (that it will have zero audible artifacts, or even that it won't have any annoying artifacts) would be to use a lossless format. But with Opus or a good AAC encoder, at reasonably high bitrates, annoying artifacts are very unlikely. That's especially true for average listeners rather than trained experts with perfect hearing. Again, that listening test showed an expert listener finding all tested tracks completely transparent at 192kbps. For me, that point would be considerably lower.

And this isn't like LAME prior to 3.94 (almost 20 years ago!) where getting the results you wanted depended strongly on figuring out which options to twiddle. The Opus encoder's defaults are fine unless you have unusual specific needs. So it's just a matter of choosing a bitrate where your storage constraints/costs and your desire for a reduced likelihood of occasional artifacts balance each other out.
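
For the storage side of that trade-off, the arithmetic is simple enough to sketch. This assumes the bitrate is the average over the whole file, ignores container overhead and tags, and uses 1 MB = 1,000,000 bytes; the function name is made up.

```python
def encoded_size_mb(bitrate_kbps, duration_s):
    """Approximate file size in MB for a given average bitrate and duration."""
    total_bits = bitrate_kbps * 1000 * duration_s
    return total_bits / 8 / 1_000_000


# Size of one hour of audio at a few candidate bitrates.
for kbps in (96, 128, 192, 256):
    print(f"{kbps} kbps: {encoded_size_mb(kbps, 3600):.1f} MB per hour")
```

One hour at 128 kbps comes to about 57.6 MB, so doubling the bitrate for peace of mind costs about as much again per hour of music.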

"Apparent" has a similar two-edged role in both English and Spanish- the inherent analogy to sight can either mean that it's obvious or that it may be only an appearance. But in English, a context where sense perception is the relevant reality would tend towards the former. An apparent red mark is a real and visually obvious red mark, not an optical illusion, unless context dictates otherwise. Yes, the adverb "apparently" is more often used to hedge your bets, or any kind of stated contrast between 'real' and 'apparent' also makes that use clear.

Re: Is there an advantage in using bigger frame sizes (40+) for offline use?

Reply #15
60ms frame sizes (CELT only, fixed-point math) yield *slightly* faster encoding times. By faster I mean a couple of ms per encoded second. But, seeing as a few ms can mean the difference between life and a hard fault in the embedded world, it's something.
The benchmarks I've done were on Cortex M (arm32) MCUs.