
Suggestion for a simple psychoacoustic improvement: transient bit-boost.

I’ve been using the OPUS codec for a couple of years now and have become familiar with how the sound quality scales with bitrate. As of the latest version, I think it’s safe to say that this is the best “all rounder” at any given quality level compared to other lossy codecs.



Something I’ve noticed is that with audio content without many transients (like ambient and classical music), subjective transparency can be reached with a substantially lower bitrate. Most classical music to my ears sounds essentially identical to the lossless version at around the 112 to 128 kbps range, generally speaking.



For music with lots of transients, and especially transients in quick succession (demanding temporal detail rather than tonal detail), I need to bump the bitrate up to around 192kbps to reach the same subjective experience of transparency for those kinds of sounds. Plucking, clicking, strumming, toms, kicks, rides, hi-hats and snares (anything with sharp attacks or high temporal resolution) all sound audibly “softer” and less distinct at a lower bitrate, even though 128kbps renders almost every other kind of sound faithfully to my ears. In some situations, a transient isn't rendered as a transient at all if it's crowded out by sound in the same decibel range; it just blends into the surrounding noise. I’m curious to hear if others can relate to my experience.



My basic assertion is that OPUS could benefit from further tuning its model, so that more bits are assigned to frames that contain transients, relative to tonal content. I could describe it the following way: with this new “boost” option implemented and toggled on, encoding at 128kbps would have the encoder treat transients (and temporally demanding content in general) as if you had selected 192kbps VBR as the bitrate, while encoding all other content “normally” (at 128kbps VBR). 
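To put numbers on the proposal, here is a minimal sketch of the per-frame budget it implies. It assumes a fixed 20 ms frame (the opusenc default) and treats the boost as a simple switch between two bitrates; real Opus VBR is continuous and far more involved, and the function names are hypothetical, not part of libopus.

```c
#include <assert.h>

/* Bits available to one frame: kbit/s multiplied by ms gives bits.
 * Assumes a fixed frame duration (20 ms is the opusenc default). */
static double frame_bits(double kbps, double frame_ms)
{
    assert(kbps > 0.0 && frame_ms > 0.0);
    return kbps * frame_ms;
}

/* Hypothetical "transient boost": a frame flagged as transient gets the
 * budget of transient_kbps, every other frame keeps the base_kbps budget. */
static double boosted_frame_bits(double base_kbps, double transient_kbps,
                                 double frame_ms, int is_transient)
{
    return frame_bits(is_transient ? transient_kbps : base_kbps, frame_ms);
}
```

At 128 kbps a 20 ms frame gets 2560 bits; with the proposed boost a transient frame would get 3840 bits, which is the "extra 50%" per frame mentioned in the TLDR.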


This would have the side effect of upping the overall bitrate and resulting size of a track if it has lots of transients, but I would be fine with this greater variability in size if it meant always meeting subjective transparency across all material, without having to encode different songs and albums at different bitrates.



The only other small tweak I would want alongside this is an option to slightly boost bit allocation to high frequency tonal content above 16kHz (but only when there’s enough energy to warrant it) at medium bitrates like 128kbps and below, but that would be a secondary enhancement and likely not as important or simple to tune.
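The "only when there's enough energy to warrant it" condition could be expressed as an energy gate. This is a hypothetical sketch, not how Opus allocates bits: it assumes per-band energies are already available (linear scale), classifies a band as "high" by its lower edge, and uses an arbitrary threshold that would need tuning.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of total energy in bands whose lower edge is at or above cutoff_hz.
 * band_hz[i] is the lower edge of band i, energy[i] its (linear) energy. */
static double hf_energy_fraction(const double *band_hz, const double *energy,
                                 size_t nbands, double cutoff_hz)
{
    double total = 0.0, high = 0.0;
    for (size_t i = 0; i < nbands; i++) {
        assert(energy[i] >= 0.0);
        total += energy[i];
        if (band_hz[i] >= cutoff_hz)
            high += energy[i];
    }
    return total > 0.0 ? high / total : 0.0;
}

/* Hypothetical gate: boost HF allocation only if enough energy is present. */
static int should_boost_hf(const double *band_hz, const double *energy,
                           size_t nbands, double cutoff_hz, double threshold)
{
    return hf_energy_fraction(band_hz, energy, nbands, cutoff_hz) > threshold;
}
```

With a 16 kHz cutoff, a frame with no energy above 16 kHz is never boosted, so silent or band-limited content would not waste bits.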



TLDR: I think subjective transparency could be reached quicker at lower bitrates if bits were more generously allocated to transients (maybe an extra 50%+ per frame), while encoding all other content normally.

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #1
That’s just variable VBR, or variable variable bit rate. I think you mean that the variability (the V) in the VBR needs to be made greater. In practice, I think the bitrate already goes up on big transients in VBR.

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #2
Quote
I think you mean that the variability (the V) in the VBR needs to be made greater.
I think the TLDR at the end of my post says it best. But yes, a kind of "VBR+" mode is essentially what I'm proposing, just by tweaking one parameter. It would be a mode where resulting file sizes would always be higher by some degree, and never lower, though. The closest analogue would be the "impulse_noisetune" option in Vorbis: https://wiki.hydrogenaud.io/index.php?title=Recommended_Ogg_Vorbis#Reducing_pre-echo

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #3
Quote
But yes, a kind of "VBR+" mode is essentially what I'm proposing, just by tweaking one parameter. It would be a mode where resulting file sizes would always be higher by some degree, and never lower, though.

I think you mean a mode where the variance in file sizes is larger but the average file size is unchanged, that is, a less constrained VBR mode. If you are OK with making file sizes larger, then you can use the --bitrate flag, which does exactly that.

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #4
Quote
My basic assertion is that OPUS could benefit from further tuning its model, so that more bits are assigned to frames that contain transients, relative to tonal content. I could describe it the following way: with this new “boost” option implemented and toggled on, encoding at 128kbps would have the encoder treat transients (and temporally demanding content in general) as if you had selected 192kbps VBR as the bitrate, while encoding all other content “normally” (at 128kbps VBR).


This would have the side effect of upping the overall bitrate and resulting size of a track if it has lots of transients, but I would be fine with this greater variability in size if it meant always meeting subjective transparency across all material, without having to encode different songs and albums at different bitrates.
...
TLDR: I think subjective transparency could be reached quicker at lower bitrates if bits were more generously allocated to transients (maybe an extra 50%+ per frame), while encoding all other content normally.

The current Opus encoder can *already* boost the transient bitrate significantly, by up to a factor of two. So far, I've heard people complaining about tonal content far more than transients, but of course people listen to different music and perceive artefacts differently. The bottom line is that any increase in the bitrate of transients has to be compensated for by a decrease for non-transients so that, when averaged over a large collection of music, the bitrate remains constant (otherwise I can just encode everything at 510 kb/s and call it a day). If you feel like playing around with knobs, I can show you where the transient boosting is applied, but keep in mind that what you call a transient isn't necessarily what the code defines as a transient, and the line is sometimes blurry.
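The compensation argument can be made concrete with a simplified model: assume a fraction p of frames across a collection is flagged as transient and those frames are boosted by a factor b. Keeping the collection-wide average fixed means solving p*b + (1 - p)*s = 1 for the scale s applied to every other frame. This is only an illustration; the real encoder's boost is continuous rather than on/off, and frames are not all equally sized.

```c
#include <assert.h>

/* Scale factor for non-transient frames so that the collection-wide average
 * bitrate is unchanged when transient frames are boosted by `boost`.
 * p is the fraction of frames flagged as transient (0 <= p < 1).
 * Solves p*boost + (1 - p)*scale = 1 for scale. */
static double non_transient_scale(double p, double boost)
{
    assert(p >= 0.0 && p < 1.0);
    assert(boost >= 1.0 && p * boost <= 1.0);
    return (1.0 - p * boost) / (1.0 - p);
}
```

For example, with 10% of frames flagged as transient and a 1.5x boost, every other frame would have to give up about 5.6% of its bits.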

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #5
Quote
But yes, a kind of "VBR+" mode is essentially what I'm proposing, just by tweaking one parameter. It would be a mode where resulting file sizes would always be higher by some degree, and never lower, though.

Quote
I think you mean a mode where the variance in file sizes is larger but the average file size is unchanged, that is, a less constrained VBR mode. If you are OK with making file sizes larger, then you can use the --bitrate flag, which does exactly that.
I might be missing your point, but the "mode" I talked about in the original post simply doesn't exist yet.

I think Andre is unintentionally throwing people off the main point I'm trying to make by calling my idea "variable VBR". It doesn't really matter what it could be called, whether file sizes would be larger, whether it needs a new quality scale, etc. IMO, it could simply be a mode you toggle on or off in the settings or with a new flag, while keeping everything else the same.

OPUS already introduced a bitrate boost for music with strong harmonics in version 1.1, called "tonality estimation", without changing their VBR scale. It's baked in, and ends up selectively increasing the bitrate and file size only when needed:
https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml
(halfway down the page)

In version 1.0, a track with strong harmonics encoded at 64kbps VBR would have ended up smaller than the same track encoded at the same bitrate with version 1.1. Even though the change was compulsory (it can't be turned off with a flag), the VBR scale remained unchanged, which is fine!

A bitrate boost for transients (beyond what's implemented right now) would be exactly the same kind of concept: you select a bitrate in VBR mode exactly as you do now, and with this feature enabled the encoder will boost the bitrate selectively depending on the density of the transients in any given track.
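Under the proposal as described (no compensation elsewhere), a track's average bitrate would scale with its transient density. A minimal sketch, assuming each frame is simply flagged transient or not and a transient frame costs base_kbps * boost; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Average bitrate of a track under the proposed, uncompensated boost:
 * frames flagged as transient are coded at base_kbps * boost, the rest at
 * base_kbps. Returns kbit/s. */
static double track_average_kbps(const int *is_transient, size_t nframes,
                                 double base_kbps, double boost)
{
    assert(nframes > 0 && boost >= 1.0);
    double sum = 0.0;
    for (size_t i = 0; i < nframes; i++)
        sum += is_transient[i] ? base_kbps * boost : base_kbps;
    return sum / (double)nframes;
}
```

With a 128 kbps base and a 1.5x boost, a track where a quarter of the frames are transient averages 144 kbps, while a track with no transients stays at 128 kbps, so the size increase is selective.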

I hope that clarifies where I'm coming from.

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #6
Quote
If you feel like playing around with knobs, I can show you where the transient boosting is applied, but keep in mind that what you call a transient isn't necessarily what the code defines as a transient and the line is sometimes blurry.
Thanks Jmvalin, if you could give me some direction that would be much appreciated.

Quote
The bottom line is that any increase in the bitrate of transients has to be compensated for by a decrease to non-transients so that when averaged over a large collection of music, the bitrate remains constant

That's curious -- I assumed that when "tonality estimation" was implemented in v1.1 for tonally rich content, the final bitrate did actually increase when averaged over a large collection, compared to v1.0. Further boosting transients wouldn't work the same way?

In any case, I wasn't making the case for this boosting feature to become the default behaviour of the encoder, just that it would be useful to be able to have it as an option if needed (much like the ability to define framesize).

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #7
Even MP3 has long and short blocks: you can spend a given amount of bits on a short block to encode a temporally demanding transient with less tonal resolution, or you can use a long block to encode a rich sustained sound with higher tonal resolution at the cost of temporal resolution. I doubt Opus would be so efficient if it didn't have similar mechanisms.
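To make the long/short trade-off concrete, here is a rough sketch that treats MP3's hybrid filterbank as a plain MDCT with a 50%-overlap window (an approximation): 576 coefficients per long block and 192 per short block. Bin spacing is fs / (2N) and the window spans 2N samples.

```c
#include <assert.h>

/* Approximate resolution of an MDCT block with n_coeffs coefficients at
 * sample rate fs, assuming a 50%-overlap window of 2 * n_coeffs samples. */
typedef struct {
    double bin_hz;    /* spacing between frequency bins */
    double window_ms; /* time span covered by one window */
} mdct_resolution;

static mdct_resolution mdct_res(double fs, int n_coeffs)
{
    assert(fs > 0.0 && n_coeffs > 0);
    mdct_resolution r;
    r.bin_hz = fs / (2.0 * n_coeffs);
    r.window_ms = 1000.0 * 2.0 * n_coeffs / fs;
    return r;
}
```

At 44.1 kHz this gives about 38 Hz bins over a ~26 ms window for a long block, versus about 115 Hz bins over ~8.7 ms for a short one: three times coarser in frequency, three times finer in time.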

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #8
Quote
Even MP3 has long and short blocks: you can spend a given amount of bits on a short block to encode a temporally demanding transient with less tonal resolution, or you can use a long block to encode a rich sustained sound with higher tonal resolution at the cost of temporal resolution. I doubt Opus would be so efficient if it didn't have similar mechanisms.
From just skimming the paper, I'm already aware that OPUS encodes transients differently: https://arxiv.org/pdf/1602.04845.pdf

I'm talking about boosting bitrate for transient frames beyond the default behaviour (either by tweaking an existing parameter, or adding a new one).

OPUS is already excellent, don't get me wrong -- just interested in slightly different tunings to squeeze a little more out of it.

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #9
Did you actually ABX the 128kbps encodes against the originals and the 192kbps versions? Especially the files with the HF content above 16kHz that you mention?

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #10
Quote
Thanks Jmvalin, if you could give me some direction that would be much appreciated.
You want to look at the celt/celt_encoder.c file. The transient_analysis() function returns 1 when the current frame is a transient and 0 otherwise. It also computes an estimate of how "strong" the transient is, which it returns in tf_estimate. That value gets used in compute_vbr() to boost the bitrate. To change the behaviour, you'd have to change the value of that estimate, and then update tf_calibration so that the average bitrate doesn't change.
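The estimate/calibration relationship can be explored with a simplified floating-point model. It assumes the boost behaves roughly like target += (tf_estimate - tf_calibration) * target; the actual compute_vbr() code is fixed-point and may differ between versions, so treat this as a thought experiment. transient_gain is a hypothetical knob that does not exist in libopus, and calibrate() assumes all frames share the same base target.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model (an assumption, not the libopus source): the frame's
 * target grows by (transient_gain * tf_estimate - tf_calibration) * target. */
static double boosted_target(double target, double tf_estimate,
                             double tf_calibration, double transient_gain)
{
    return target + (transient_gain * tf_estimate - tf_calibration) * target;
}

/* Calibration that leaves the average target unchanged over a set of frames
 * with equal base targets: the mean of the scaled estimates. */
static double calibrate(const double *tf_estimates, size_t n,
                        double transient_gain)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += transient_gain * tf_estimates[i];
    return sum / (double)n;
}
```

Under this model, doubling the effect of the estimate means doubling the calibration too, and the average over the calibration set stays put. In practice the calibration would be tuned over a large collection of music, and whether this maps onto the actual constants in celt_encoder.c needs checking against the source.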

Quote
That's curious -- I assumed that when "tonality estimation" was implemented in v1.1 for tonally rich content, the final bitrate did actually increase when averaged over a large collection, compared to v1.0. Further boosting transients wouldn't work the same way?
No, when tonality estimation was added, the bitrate of the (vast majority of) non-tonal files decreased slightly to ensure that the average over a large collection stayed the same. Of course, you can argue how representative my collection is, but I try to make sure any change I make to VBR does not change the overall average. One note about 1.0 though. While it did have transient boosting in VBR mode, the VBR was tuned to always produce the same average over an individual file. Truly unconstrained VBR only arrived with 1.1.
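The difference between the 1.0-style behaviour (same average over each individual file) and unconstrained VBR can be sketched as a per-file rescaling step. This is a simplified model of the behaviour described, not the actual 1.0 code; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* "Constrained" VBR in the 1.0 sense (simplified model): rescale one file's
 * per-frame bit targets so their mean equals the requested average. Relative
 * differences between frames are preserved. Unconstrained VBR would skip this
 * step and let the file's average drift with its content. */
static void constrain_to_file_average(double *frame_bits, size_t n,
                                      double requested_avg_bits)
{
    assert(n > 0 && requested_avg_bits > 0.0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += frame_bits[i];
    assert(sum > 0.0);
    double scale = requested_avg_bits * (double)n / sum;
    for (size_t i = 0; i < n; i++)
        frame_bits[i] *= scale;
}
```

Under this model, boosting transient frames inside a constrained file only moves bits away from that file's other frames, so a track-level increase of the kind proposed in this thread only becomes possible with unconstrained VBR.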

Quote
In any case, I wasn't making the case for this boosting feature to become the default behaviour of the encoder, just that it would be useful to be able to have it as an option if needed (much like the ability to define framesize).
If a VBR change is good, it should be on by default. If it's only good on some files, then you might as well just increase the value you pass to --bitrate.

Re: Suggestion for a simple psychoacoustic improvement: transient bit-boost.

Reply #11
Interesting, thanks for the detailed reply!

Quote
You want to look at the celt/celt_encoder.c file. The transient_analysis() function returns 1 when the current frame is a transient and 0 otherwise. It also computes an estimate of how "strong" the transient is, which it returns in tf_estimate. That value gets used in compute_vbr() to boost the bitrate. To change the behaviour, you'd have to change the value of that estimate, and then update tf_calibration so that the average bitrate doesn't change.
I'm not a programmer, so I'm a little stuck on where to make the change to the value in question (let's say I want to double it, for simplicity). I have celt_encoder.c open and there are 22 mentions of "tf_estimate" in the code (including some in comments, which I know to ignore).

If it's not too much trouble, could you point me to the lines and the numbers I need to change? I'm looking at line 400 as potentially the one I need to edit for "tf_estimate" (maybe line 2,090 also), and line 1345 for "tf_calibration". But I'm just guessing.

Anyone else who is familiar with the OPUS source code is free to chip in.

 
