Is this a limitation of Opus or is it Foobar2000 that at this point does not handle the opus format fully? Regards.
The Ogg Opus draft specification (http://tools.ietf.org/html/draft-terriberry-oggopus-01#section-5.2) currently with the IETF, expiring in January 2013 (and still open to amendment and comments) states that for the Comment Header, R128_TRACK_GAIN is introduced, based on the EBU R128 specification, and that it specifies the gain relative to the ID header's mandatory Output Gain field.
It is sensible to use Output Gain to match an intended level amounting to the same thing as Album Gain, again based on R128 techniques, which should be a little more accurate than ReplayGain's original algorithm and should deal better with sparse audio, such as dialogue or Audio Description for the visually-impaired thanks to gated measurement.
Output Gain SHOULD be implemented by virtually all players (a strong recommendation), but MAY be ignored, and R128_TRACK_GAIN may be implemented in addition depending on the mode required (e.g. some players may default to applying TRACK_GAIN when in shuffle mode). If Output Gain is modified, R128_TRACK_GAIN value MUST be modified accordingly if it is present (as it is applied after Output Gain and must still be correct).
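To make the relationship between the two gains concrete, here is a minimal sketch. It assumes both values are Q7.8 fixed-point numbers in dB (an integer equal to dB times 256), which is how I read the draft; the function names are hypothetical and not taken from any real library.

```python
def q78_to_db(value_q78):
    """Convert a Q7.8 fixed-point gain (assumed encoding) to dB."""
    return value_q78 / 256.0


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


def playback_scale(output_gain_q78, track_gain_q78=None, use_track_gain=False):
    """Scale factor a player would apply to decoded samples.

    Output Gain is always applied; R128_TRACK_GAIN is added on top of it
    only when the player is in a track-normalising mode and the tag exists.
    """
    total_q78 = output_gain_q78
    if use_track_gain and track_gain_q78 is not None:
        total_q78 += track_gain_q78
    return db_to_linear(q78_to_db(total_q78))


def retarget_output_gain(old_output_q78, old_track_q78, new_output_q78):
    """Change Output Gain and adjust R128_TRACK_GAIN so that their sum,
    and therefore the track-normalised level, stays the same."""
    delta = new_output_q78 - old_output_q78
    return new_output_q78, old_track_q78 - delta


# Hypothetical file: Output Gain 0 dB, track gain -2 dB (-512 in Q7.8).
new_output, new_track = retarget_output_gain(0, -512, -256)
```

In this hypothetical case, lowering Output Gain by 1 dB turns the track gain into -1 dB, so a player applying both still lands on the same level.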
The Comment Header "OpusTags" section says that, to avoid confusion, normalization schemes other than R128 SHOULD NOT be used (they can be, but doing so is rather strongly discouraged). This does not mean that the intersample-compatible peak measurements of R128 cannot be used, but it seems a shame that standardised comment tag names for R128_TRACK_PEAK and R128_ALBUM_PEAK have not been included, so as to prevent incompatible implementations (particularly in their number format) from being produced in different software.
I'd imagine that both PEAK values in the normalised domain should be calculated according to the R128 recommendation (with the same degree of oversampling to catch intersample overs) as if the audio's Output Gain were ignored (set to zero), and Track Gain was unused. The gain values could then be converted to effective peaks after whatever processing occurs (including volume controls etc).
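As a rough illustration of why oversampling matters, the sketch below compares the plain sample peak with an estimate from 4x oversampling (48 kHz to 192 kHz). The Hann-windowed sinc interpolator is my own stand-in for illustration; it is not the filter defined in the EBU or ITU documents, and all names and parameters are hypothetical.

```python
import math


def sample_peak(samples):
    """Largest absolute value among the stored samples."""
    return max(abs(s) for s in samples)


def _windowed_sinc(t, half_width):
    """Hann-windowed sinc kernel, zero for |t| >= half_width."""
    if abs(t) >= half_width:
        return 0.0
    if t == 0.0:
        return 1.0
    hann = 0.5 * (1.0 + math.cos(math.pi * t / half_width))
    return hann * math.sin(math.pi * t) / (math.pi * t)


def true_peak_estimate(samples, factor=4, half_width=16):
    """Estimate the intersample peak by interpolating `factor` points per
    sample period. Only positions where the whole kernel fits inside the
    signal are evaluated, so the edges of the input are skipped."""
    peak = 0.0
    for base in range(half_width, len(samples) - half_width):
        for phase in range(factor):
            t = base + phase / factor
            value = sum(
                samples[k] * _windowed_sinc(t - k, half_width)
                for k in range(base - half_width + 1, base + half_width + 1)
            )
            peak = max(peak, abs(value))
    return peak


# Full-scale sine at a quarter of the sampling rate, offset by 45 degrees,
# so every stored sample sits at about 0.707 and misses the crests.
signal = [math.sin(2.0 * math.pi * n / 4.0 + math.pi / 4.0) for n in range(128)]
stored_peak = sample_peak(signal)
oversampled_peak = true_peak_estimate(signal)
```

Here the stored samples peak at about 0.707 although the underlying sine reaches 1.0, and the oversampled estimate should come out close to 1.0. That 3 dB gap is the kind of intersample over a sample-peak tag would miss.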
Personally, I can manage without PEAK values as it's sensible to me to match Album Gain as my first priority and use something like fb2k's Advanced Limiter to manage any excessive peaks that remain and thus prevent hard clipping with negligible audible effect. I often also use a negative PreAmp gain or Volume control setting, meaning that Advanced Limiter less often needs to do anything.
I believe many of us on these forums are rather afraid of anything that adds distortion from time to time, possibly excessively afraid, given that it's usually transient, noise-like peaks, which are likely to mask distortion, that will trigger the Advanced Limiter (and the result would be hard to ABX). Anyone who really cares about distortion should play back the digital signal at reduced levels with more headroom. About 10 years ago, I did some quick ABXing of pure clipping distortion (not even a soft limiter) on a Rachmaninov piano concerto via LAME (with peak of around 1.2 at the old equivalent to -V2) versus lossless and couldn't tell, though that's clearly not conclusive, and in fact the lossless might have still had intersample overs (all peaks were only a few samples long).
I believe many many more people use Prevent Clipping According To Peak than do what I do, and they let the volume become incorrect instead, so it seems there will be demand for this feature and it seems very sensible to acknowledge actual user behaviour (what is surely going to happen as evidenced by questions in these forums, rather than what ought to be necessary and sensible) to specify officially how the OpusTags Comment Header SHOULD be filled if implementers choose to implement R128_TRACK_PEAK and R128_ALBUM_PEAK and how it's calculated (i.e. refer to EBU R128) and do so before mutually-incompatible implementations spring up, while clearly indicating that support is not mandatory. Interoperability is important, and it's been specified with gapless support and R128 gain, so it's surely worth reporting peaks. Specifying the intersample peaks and promoting it as part of the normal tagging of music might encourage high-end audio hardware designers to provide sufficient linear headroom beyond digital full scale to sell it as a feature.
Being a result of oversampled measurement, the precise PEAK values produced by different tools will vary by small amounts, but ought to provide engineering headroom to allow clipping distortion to be prevented in the digital or analogue domains by competent design (and in the analogue domain, even ReplayGain hasn't achieved that, given that intersample overs still cause distortion in many high-end audio devices*)
*source: Thomas Lund talk on Loudness Wars (http://www.youtube.com/watch?v=BhA7Vy3OPbc) and distortion for broadcast audio pros, which later in the talk sounds dangerously non-TOS#8 for a while when listening to SIDE channel only of Mid-Side lossy encodes, but ends up advising against lossy in the production chain - transcoding - which seems sensible, while saying it's fine for distribution, without really quashing the myths.
Wow. Thank you, Dynamic. If I get you right, you (like the article's author) also think it is a little weird that peak tags are not implemented? Regards.
I believe many many more people use Prevent Clipping According To Peak than do what I do, and they let the volume become incorrect instead, so it seems there will be demand for this feature and it seems very sensible to acknowledge actual user behaviour
I haven't seen any evidence of this and normalizing lossy audio by peaks is demonstrably a bad idea: The peak value is a highly noisy metric and the lossy compression (esp in Opus because it is able to get away with very few bits in some parts of the signal) can cause wild swings in the peak value by several dB and can cause consecutive tracks to be wildly different in apparent volume.
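A small worked example of the volume mismatch, as a sketch only: the function below is my own formulation of "apply gain but prevent clipping according to peak", not code from fb2k or any other player, and the track values are invented.

```python
import math


def gain_with_clip_prevention(gain_db, peak):
    """Reduce the requested gain just enough that peak * gain <= 1.0.

    `peak` is the stored peak as a linear value, where 1.0 is full scale.
    """
    max_gain_db = -20.0 * math.log10(peak)
    return min(gain_db, max_gain_db)


# Two hypothetical tracks that both need +3 dB to reach the target loudness.
quiet_peaks_db = gain_with_clip_prevention(3.0, peak=0.5)  # has headroom
loud_peaks_db = gain_with_clip_prevention(3.0, peak=1.0)   # already at full scale
mismatch_db = quiet_peaks_db - loud_peaks_db
```

The first track gets its full +3 dB while the second gets none, so the two play back 3 dB apart even though the loudness analysis asked for the same gain on both. A few dB of noise in the stored peak translates directly into that kind of jump.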
Also, any processing of the audio changes the peak. This means that setting the peak on an Opus file requires separately decoding it to get the peak and it may make the peak meaningless if the playing device does any filtering or equalization.
Moreover, Opus (like most of the lossy music formats) does not have a bit-accurate specified decoder, and differences in the decoder can change the peak value. Even in the reference implementation you can get back different peak values depending on whether you use a fixed-point or floating-point compiled decoder, or especially on whether you use the float or 16-bit API (which is clamped). Opus itself can also decode at different sampling rates than the original if the caller requests it to do so, and this obviously changes the peak values. (As well as the EBU loudness values, though unless you go all the way to 8kHz output it usually doesn't change them much).
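To illustrate the clamping point only, here is a sketch of how the same decoded audio can report two different peaks depending on the output format. The conversion is a generic float-to-16-bit one written for this example, not the reference decoder's actual code, and the sample values are invented.

```python
def float_to_int16(sample):
    """Generic float-to-16-bit conversion with clamping (illustrative)."""
    scaled = round(sample * 32768.0)
    return max(-32768, min(32767, scaled))


def peak_float(samples):
    """Peak of the unclamped floating-point output."""
    return max(abs(s) for s in samples)


def peak_via_int16(samples):
    """Peak as seen after conversion to 16-bit, rescaled to 1.0 = full scale."""
    return max(abs(float_to_int16(s)) for s in samples) / 32768.0


# Hypothetical decoder output that overshoots full scale, as lossy decodes can.
decoded = [0.25, -0.8, 1.2, -1.1]
float_peak = peak_float(decoded)
clamped_peak = peak_via_int16(decoded)
```

The float path reports 1.2 and the 16-bit path reports 1.0, so a tagger using one and a player using the other would disagree about the peak by more than 1.5 dB.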
Your points about intersample clipping are quite important and they highlight why 'almost right' is simply _not enough_ for clipping prevention.
To prevent clipping, what a player should be doing is (1) making use of normalization (either at the track or album level) that sets the output levels to a point which has ample headroom from clipping, and (2) placing a limiter with up-sampling based detection at the end of the processing chain. Sane normalization makes sure that the limiter will be fairly inactive and transparent, and the limiter prevents clipping of the actual signal. If a player does not do these things, then even if it uses the PEAK levels it is not guaranteed to avoid clipping, and if it does these things the PEAK levels are unnecessary.
It's my impression from reading a number of questions on these forums as well as screen-grabs of fb2k setups that use of clipping prevention in players like fb2k is commonplace, more so than use of a limiter to maintain loudness. I can't point to a body of statistical evidence.
Perhaps a poll (worded plainly without any leading questions) would be a reasonable way of gathering evidence for or against my impression of real user behaviour, albeit in the odd world of Hydrogenaudio readers (from both pragmatic and idealistic ends of our spectrum), not the general public.
If there is some demand for an implementation, and if anybody caves into that demand without a specification suggesting how, it's a recipe for potential fragmentation, which I think is a bad thing.
I completely agree with you, NullC, that "Apply Gain but prevent clipping according to peak" is not the most sensible approach (not quite so bad on a per-album basis as per-track). However, the EBU PLOUD group, in specifying R128, does specify a True Peak programme value (on a sensible intersample basis), and I think this has potential merit from an engineering perspective to ensure there is sufficient headroom in both the digital and analogue domains. It could also be useful to suggest to users a suitable "Replay Gain pre-amp" type of setting that would keep their whole collection of music reasonably within the linear range of their analogue reconstruction device (DAC), but I'm sure this is a marginal, relatively pointless use case. In unfiltered output,
Anyone applying digital EQ or other DSP will clearly break any peak value, and that's a very valid point well made. The oversampling approach at least gives a good idea for high fidelity uses (which wouldn't include decoding Opus below 48kHz) and a little engineering margin (or gradual onset of distortion) would at least be possible, or the software could be forewarned about potential clipping.
So, what's the best way forward?
If we were to agree that there's enough demand to cause implementations including some kind of R128 PEAK statistics that might implement it differently or we simply wanted to foster a sensible approach to loudness normalization among developers, then I'd suggest that the Ogg Opus spec should make the case by stating something like:
Storage of peak level metadata:
- it is most sensible that target loudness be the primary aim of a playback loudness normalization system such as R128 (implemented by digital scaling or by a software-adjusted analogue volume control), and that use of a limiter, while it might not eliminate intersample overs and clipping distortion completely, is likely to mitigate clipping distortion while preserving the desired programme loudness.
- allowing per-track PEAK values in metadata to over-ride target loudness is likely to cause annoying volume jumps between tracks, and the use of per-track or per-album PEAK values in this way can make some tracks or albums excessively quiet due to an unusually extreme peak value. In the event that EQ, filtering or DSP is performed, the peak values will no longer be valid. In addition, Opus decoders are not bit-accurate, so actual peak values may still exceed values stored in metadata.
- we thus recommend that PEAK values be ignored or at most be used to select the most appropriate limiter parameters.
- there may still be very limited use cases where PEAK values are useful. To prevent ad-hoc implementations from fragmenting machine-readable comment metadata, we recommend that R128 True Peak values (oversampled to 192kHz sampling rate) be stored in the following number format (e.g. specify same as Vorbis Comments Replaygain Peak values, whatever encoding that is) in a User Comment Field entitled R128_TRACK_PEAK or R128_ALBUM_PEAK as required.
What do you think?
What do you think?
Those suggestions sound fine except that they don't address that the peak value may simply be different from decoder to decoder (even ignoring things like decoding at other sample rates).