Unexpectedly high bitrate?
Reply #8 – 2014-09-19 18:28:24
It's not the actual latency in the playback that matters. Opus was designed primarily to operate efficiently in low-latency applications, and thus does not include long blocks (and the signalling overhead they would incur) as an option in the bitstream. It therefore cannot use blocks in the 90-100 ms range, as are available in AAC for example.
It emerges from the complicated mathematics of windowed transforms that the longer the window (or block) in the time domain, the more narrowly it represents individual frequencies when transformed into the frequency domain. This helps concentrate the energy of tonal signals into tighter clusters of adjacent frequency-domain samples, which makes them easier to compress, and makes it easier to reduce the bit-depth (quantization accuracy) without introducing audible variations.
If you have a long window that contains a very short transient pulse or step response, then when you transform it into the frequency domain, the energy that was so concentrated in the time domain will be spread almost evenly among every single frequency-domain component, as very low energy per band. The inverse transform will only reproduce the sharp transient accurately, at the correct time, if you provide accuracy equivalent to lossless bitrates. If you reduce the bit-depth in the frequency domain (e.g. because you can't spare such a high bitrate), the pulse will be smeared out into a flatter, broader pulse in the time domain, causing less of a sharp click and more of a noisy hiss. Thus long blocks are really bad at efficiently encoding transient signals like clicks, because you need high quantization accuracy to stop the energy of the clicks from spreading out in the time domain. This is why transient detection and short-block switching are vital for efficient encoding.
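A rough way to see both effects numerically is the sketch below. It uses a plain DFT over a 64-sample block as a stand-in for the windowed transform a real codec would use; that substitution, the block size, and the helper names `dft` and `top_bins_energy_share` are assumptions made for illustration, not anything taken from Opus or AAC.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (fine for a small demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def top_bins_energy_share(x, bins=2):
    """Fraction of the total spectral energy held by the `bins` strongest bins."""
    energy = sorted((abs(c) ** 2 for c in dft(x)), reverse=True)
    return sum(energy[:bins]) / sum(energy)


N = 64  # block length in samples (arbitrary demo value)

# Pure tone that completes exactly 8 cycles within the block.
tone = [math.cos(2 * math.pi * 8 * t / N) for t in range(N)]

# Single-sample click in an otherwise silent block.
click = [0.0] * N
click[10] = 1.0

tone_share = top_bins_energy_share(tone)
click_share = top_bins_energy_share(click)

print(f"tone : {tone_share:.3f} of the energy sits in 2 bins")
print(f"click: {click_share:.3f} of the energy sits in 2 bins")
```

The tone's energy lands almost entirely in two bins (the positive and negative frequency), while the click's energy is shared equally by all 64 bins, so two bins hold only 2/64 of it. That is the situation described above: a coarse quantizer can afford to keep the tone's few big coefficients, but every one of the click's small coefficients matters.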
On the flip side, if you spot a transient and switch to a short block, time accuracy is now better, but you cannot provide the bit-depth to accurately reproduce narrow frequency peaks without supplying nearly lossless bitrates (although supplying 1400 kbps for only 4 or 5 ms uses far fewer bits than supplying 1400 kbps for 90-100 ms). For this sort of reason, problem samples like fatboy (from the song Kalifornia) needed practically 1400 kbps, momentarily but very frequently, to remain transparent in codecs like Musepack. The sample contains a succession of regularly repeating transient sounds that need high time resolution, combined with tonal frequencies that require adequate frequency resolution.
Likewise, for samples like harpsichords, the CELT layer of Opus needs to increase the bitrate markedly to represent the pure tones with strong harmonics accurately yet still capture the sharp plucking sound at the onset of each note. In Opus 1.1, that's what the tonality estimator does in the default VBR mode when it detects this situation. Opus also has some other tricks to allow medium-to-long-block frequency resolution in some frequency bands while allowing short-block time resolution in other bands. It's explained in technical detail on the Opus codec website and in the technical presentations.
For more normal degrees of tonality, where 20 ms blocks are a little shorter than ideal, Opus also features a tunable pre-/post-filter to preferentially improve the encoding of tonal sounds without employing either long blocks or excessive bitrates. In the range of normal music that is most commonly encountered, this proves to be very efficient (i.e. high quality per bitrate). It requires the encoder to spend extra bits only in relatively uncommon situations like harpsichord or certain bell sounds, yet Opus retains the ability to be used at low latency too.
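To put numbers on the remark about 1400 kbps over a short block versus a long one, here is a small back-of-the-envelope sketch. The block durations (5, 20 and 100 ms) are picked from the ranges mentioned above, and the "resolution is roughly 1 / block duration" rule of thumb and both function names are simplifying assumptions for illustration, not figures from any codec specification.

```python
def bits_spent(bitrate_kbps, block_ms):
    """Bits consumed by one block: (1000 bit/s) * (0.001 s) = 1 bit,
    so kbit/s multiplied by milliseconds gives bits directly."""
    return bitrate_kbps * block_ms


def approx_resolution_hz(block_ms):
    """Rule-of-thumb frequency resolution: about 1 / block duration."""
    return 1000.0 / block_ms


for block_ms in (5, 20, 100):
    print(
        f"{block_ms:>3} ms block at 1400 kbps: "
        f"{bits_spent(1400, block_ms):>6} bits, "
        f"frequency resolution on the order of "
        f"{approx_resolution_hz(block_ms):.0f} Hz"
    )
```

A momentary burst to 1400 kbps costs 7000 bits for a 5 ms block but 140000 bits for a 100 ms block, which is why brief near-lossless spikes on short blocks are affordable. The price is the coarser frequency resolution of the short block.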
It turns out that the constraints of aiming for low latency and eliminating out-of-band signalling led the developers of CELT and Opus to some very clever and effective solutions, which overcome most of the penalties you would expect from shooting for low latency. Only when the latency is reduced further (e.g. to 10 ms or 5 ms) does the short-block penalty become rather more significant, requiring an increase in bitrate to maintain quality.