HydrogenAudio

Lossy Audio Compression => Other Lossy Codecs => Topic started by: Big_Berny on 2013-02-14 11:11:58

Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Big_Berny on 2013-02-14 11:11:58
The x264 video encoder has an encoding mode called Constant Rate Factor. In this mode a number (16, 17, etc.) defines the desired quality (lower means better quality and a higher bitrate), and the encoder does not care about bitrate, only about keeping the rate factor constant. The question is why nobody has invented something similar for audio encoding (except lossyWAV, which needs too much bitrate for acceptable quality)?

I think every encoder with real VBR (not ABR) does that? LAME has -V (0 to 9), QT AAC has --tvbr (0 to 127), Vorbis has -q (-2 to 10). The bitrate may vary a lot with these settings between different songs/genres.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: LigH on 2013-02-14 11:32:18
Opus has that too. It just calculates a quality factor from the given target bitrate, based on statistics. The resulting bitrate may therefore vary depending on the audio source; Opus will not try to approximate the given target bitrate in true VBR mode.

As far as I read from earlier posts.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: softrunner on 2013-02-14 18:33:20
I think every encoder with real VBR (not ABR) does that? LAME has -V (0 to 9), QT AAC has --tvbr (0 to 127), Vorbis has -q (-2 to 10). The bitrate may vary a lot with these settings between different songs/genres.

Opus has that too. It just calculates a quality factor from the given target bitrate, based on statistics. The resulting bitrate may therefore vary depending on the audio source; Opus will not try to approximate the given target bitrate in true VBR mode.

Well, if you mix an audiobook and complex electronic music in one file, which bitrate will you use for that file? Opus at 64 kbps will give good quality for the audiobook part, but the quality of the musical part will be very low. And 176 kbps will give good quality for the music, but that bitrate is excessive for the audiobook. I would like to have an encoder that takes "good quality" as its input option and gives ~64 kbps for the audiobook part of the file and ~176 kbps for the musical part. None of the modern audio encoders can do this.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-14 19:03:39
Quote
Opus 1.1 Alpha has some bugs[…] [pictures of spectrograms]
Right. But how does it sound? Not that I expect transparency at 32 kbps, but visual images are of scant relevance to audio.

The question is why nobody has invented something similar for audio encoding (except lossyWAV, which needs too much bitrate for acceptable quality)?
Um, what. Plenty of people have done this, for decades.

I would like to have an encoder that takes "good quality" as its input option and gives ~64 kbps for the audiobook part of the file and ~176 kbps for the musical part. None of the modern audio encoders can do this.
I think you must be doing something wrong. Any competent modern encoder should do exactly this when configured to use its VBR mode.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: DonP on 2013-02-14 20:01:39
I would like to have an encoder that takes "good quality" as its input option and gives ~64 kbps for the audiobook part of the file and ~176 kbps for the musical part. None of the modern audio encoders can do this.
I think you must be doing something wrong. Any competent modern encoder should do exactly this when configured to use its VBR mode.


I guess that rules out Opus babyeater (25 and 64 kb/s), LAME (-V5), and Vorbis aoTuV (-q1). I just encoded a set of 3 tracks: one voice introduction and 2 music. In all cases the plain speech encoded at the highest bitrate.

Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-14 20:06:47
Hmmm, maybe I was assuming wrongly… in which case I apologise, but it wasn’t an illogical assumption, I don’t think?

Does that affect only lower-quality settings?

As I don’t want to imply that these encoders aren’t competent, I have to presume there’s a reason for this, and I’d be interested to know what it is. Maybe I’m overlooking something really obvious.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: saratoga on 2013-02-14 20:22:22
VBR gives you constant quality. 

Having a file with two different quality levels is not what VBR is meant to do. I think to have that work you'd have to have some kind of filter or processing that attempted to classify the signal as speech or music and then adjusted the encoder's parameters from frame to frame. It's probably not too hard to do, but it's also a very strange thing to want to implement, so maybe no one has done so.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: DonP on 2013-02-14 20:24:45
Does that affect only lower-quality settings?

As I don’t want to imply that these encoders aren’t competent, I have to presume there’s a reason for this, and I’d be interested to know what it is. Maybe I’m overlooking something really obvious.


I tried again with opus bitrate=170.  This time the speech track was between the 2 music tracks at 191.  I can't say for sure why, except maybe we judge speech on how clear it is rather than whether it is ABX'able.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-14 20:36:48
Right, I see now that DonP didn’t encode a single track with three parts, but three separate tracks. That’s not what softrunner was asking about, then. But DonP’s results and possible explanation still have a good degree of relevance.

My previous posts were written under the assumption that we were talking about large regions of broadly differing complexity/amplitude/whatever in the same file encoded at a single quality, such as speech and music. Barring some other aspect of speech that makes the encoder think it’s similarly complex to music, I would have expected a sizeable difference in bitrate between the two sections.

VBR gives you constant quality. 

Having a file with two different quality levels is not what VBR is meant to do. I think to have that work you'd have to have some kind of filter or processing that attempted to classify the signal as speech or music and then adjusted the encoder's parameters from frame to frame. It's probably not too hard to do, but it's also a very strange thing to want to implement, so maybe no one has done so.
As above, what I think softrunner was asking about, and certainly what I was talking about, was a single file with two different parts and the possibility for an encoder to provide significantly differing bitrates if the two parts differ in complexity. Again, perhaps my presumption that there should be a big difference was incorrect. I have no experience with the specific scenario and vanishingly small experience with encoded speech.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: DonP on 2013-02-14 22:55:41
Right, I see now that DonP didn’t encode a single track with three parts, but three separate tracks. That’s not what softrunner was asking about, then. But DonP’s results and possible explanation still have a good degree of relevance.


I went with speech and music in separate tracks (from the same CD) because (1) it would be easier to find the rate for each part, and (2) I figured the bit rate is only based on a small window around the current instant (correct?) so it wouldn't matter that the other type of content is in a different file as long as the quality parameter is unchanged.

Yes, the poser of the question had mixed content in one file.
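
For anyone repeating this kind of per-track comparison, here is a minimal sketch of how an average bitrate can be derived from a file's size and playing time. The function name `average_bitrate_kbps` is made up for illustration, and the result includes container overhead and tags, so it will read slightly higher than the bitrate of the audio stream alone.

```python
def average_bitrate_kbps(size_bytes: int, duration_seconds: float) -> float:
    """Return the average bitrate in kbps (1 kbps = 1000 bits per second)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    bits = size_bytes * 8
    return bits / duration_seconds / 1000


# Hypothetical example: a 480,000-byte file that plays for 60 seconds.
print(average_bitrate_kbps(480_000, 60))  # 64.0
```

Comparing this figure for a speech track and a music track encoded with the same quality setting shows how much the encoder actually varies its rate between the two.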


Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-14 23:15:37
Mods, could we get a threadsplit for these quality level posts? I really think this deserves its own thread.

If you have a mixed-content file, then for an encoder to do a good job of targeting a bitrate for the whole file while providing "constant quality" it would have to do a two-pass type deal. It has no other way of knowing, when you ask for 64kbps, if e.g. half is effectively-mono speech (both channels identical) and half is stereo music and thus it can target 32kbps for the speech, 96kbps for the music, and give you basically the bitrate you asked for.

For almost all purposes, it'd be better to let the user specify the quality level. With a single user-specified quality level, a file that was all speech could come out as ~32kbps, a file that was all stereo music could come out as ~96kbps, and the mixed-content file above could come out as ~64kbps, with constant quality and without having to do two passes.
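
The arithmetic in the two paragraphs above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the idea of a per-class "relative cost" are hypothetical, and the 32/96 kbps figures are the example numbers from this post, not measurements.

```python
def mixed_average_kbps(segments):
    """Duration-weighted average bitrate of a file.

    segments: list of (duration_seconds, bitrate_kbps) pairs.
    """
    total_time = sum(duration for duration, _ in segments)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    total_kilobits = sum(duration * rate for duration, rate in segments)
    return total_kilobits / total_time


def split_target_kbps(target_kbps, segments):
    """Two-pass style allocation: hit target_kbps on average.

    segments: list of (duration_seconds, relative_cost) pairs, where
    relative_cost is an assumed measure of how many bits a class of
    content needs for the same quality (e.g. speech 1, music 3).
    Returns one bitrate per segment, proportional to its cost.
    """
    total_time = sum(duration for duration, _ in segments)
    weighted_cost = sum(duration * cost for duration, cost in segments)
    if total_time <= 0 or weighted_cost <= 0:
        raise ValueError("durations and costs must be positive")
    scale = target_kbps * total_time / weighted_cost
    return [cost * scale for _, cost in segments]


# Quality-driven encode: half speech at ~32 kbps, half music at ~96 kbps.
print(mixed_average_kbps([(600, 32), (600, 96)]))   # 64.0

# Bitrate-driven encode: the whole file must be known before 64 kbps
# can be split into per-part targets, hence the second pass.
print(split_target_kbps(64, [(600, 1), (600, 3)]))  # [32.0, 96.0]
```

The first function needs nothing but the per-part rates the encoder chose, while the second needs the durations of every part up front, which is the reason a bitrate target forces a look at the whole file.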

The "VBR bitrate setting=quality level" idea we've heard so many times says an ideal VBR encoder is supposed to encode things at a constant quality level which averages out to the target bitrate across some generic ideal reference collection. But it really makes no sense to try to say how much of an ideal reference collection is mono speech. In the opus-tools suggestion thread, NullC mentioned the "bitrate setting is for fullband stereo equivalent quality" idea i.e. considering the ideal reference collection to consist entirely of FB stereo music. As he said there, the downside is that someone encoding just mono speech ends up with their files encoded at ~1/3 of their target rate. If you shift the balance of the ideal collection you ameliorate that but give those encoding music some of the opposite problem. Multichannel users have to guess at how their bitrate translates to a stereo equivalent bitrate too.

The user specified target bitrate thus becomes sufficiently unhinged from the end result's bitrate and quality that it would no longer make sense to tell people it's a target bitrate; instead you just call it a quality mode and provide some kind of table of what range of result bitrates to expect given channel count, bandwidth, and speech vs. music.

Even if your content is not mixed but is in separate files, having such a quality setting would enable people to encode mixed collections of files- whether just tracks of the same CD (changing quality settings for different tracks when ripping=ugh) or their entire audio collection- with a single setting without worrying that they're either bloating the speech files or starving the music.

VBR gives you constant quality. Having a file with two different quality levels is not what VBR is meant to do.
But if you have a file with 64kbps effectively-mono speech and 64kbps stereo music, those are two vastly different quality levels. Being adaptive and constant-quality rather than constant-bitrate most definitely is, as you admit, what VBR is meant to do.
I think to have that work you'd have to have some kind of filter or processing that attempted to classify the signal as speech or music and then adjusted the encoder's parameters from frame to frame. It's probably not too hard to do, but it's also a very strange thing to want to implement, so maybe no one has done so.
The Opus encoder already tries, from frame to frame, to classify the audio as speech or music, determine its bandwidth, and determine channel separation. Wanting this analysis to show up in giving lower bitrates for speech and higher bitrates for music is not very strange or even slightly strange.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-14 23:26:06
If you have a mixed-content file, then for an encoder to do a good job of targeting a bitrate for the whole file while providing "constant quality" it would have to do a two-pass type deal. It has no other way of knowing, when you ask for 64kbps, if e.g. half is effectively-mono speech (both channels identical) and half is stereo music and thus it can target 32kbps for the speech, 96kbps for the music, and give you basically the bitrate you asked for.

The "VBR bitrate setting=quality level" idea we've heard so many times says an ideal VBR encoder is supposed to encode things at a constant quality level which averages out to the target bitrate across some generic ideal reference collection

At least for me, I was specifically not referring to any quality setting that targets a bitrate: I was thinking about settings that target a given quality (level of noise, whatever) without any obligation to meet a particular bitrate and therefore might be predicted to allocate different bitrates based upon material.

If I fed separate speech and loud music files to a single encoder using ‘pure’ VBR and no target bitrate, I would expect the music to come out at a higher bitrate. Extending that reasoning, I would expect the same thing if the two were placed adjacently in one file. I don’t see how two passes would be necessary if no bitrate is being targeted.

Sure, it’s nice to have some vague idea of what sort of bitrate to expect from a given setting, but IMHO, a proper VBR setting should just work with psychoacoustics and not worry about bitrate.

I think this may be exactly the point you’re making, so I don’t mean to sound like I’m repeating or contradicting you; this is just to clarify what I was trying to convey in my posts.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-14 23:37:37
I think this may be exactly the point you’re making
*nod*

I edited my post, adding a paragraph between the two parts you quoted, to try to make that a little more clear, but your reply came faster than my edit
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: softrunner on 2013-02-16 01:28:25
I think to have that work you'd have to have some kind of filter or processing that attempted to classify the signal as speech or music and then adjusted the encoder's parameters from frame to frame. It's probably not too hard to do, but it's also a very strange thing to want to implement, so maybe no one has done so.

Yes, something like this, and it is the most desirable thing to have, because it frees the user from checking every separate file (or set of files) in an effort to decide which bitrate to use. It is like the x264 video encoder: the user says "I want rate factor 21.00" and gets it, the same quality for every input file regardless of the content, whether just a voice, simple piano music, rock/pop music, electronic music or something else.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-16 02:45:10
Look, I don’t know whether you’ve actually read any of the other posts in this thread. Existing codecs with true VBR modes already target quality without caring about meeting a specific bitrate. You tell the encoder to use a level of quality that is defined by a number on a sliding scale, and then it can vary the bitrate as much as it wants based upon the signal that you supply to it. Have you done any real testing of this?

DonP had some samples of speech that did not encode to a much lower bitrate than samples of music, but this in no way proves that quality-based algorithms are unique to H.264. Please, try some more tests. Read some documentation on encoders such as LAME, oggenc, VBR AAC, etc. Then see whether you can continue to claim that no audio encoder offers a mode that targets only quality without worrying about bitrate.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Nessuno on 2013-02-16 10:02:06
Just to add my two pence to a question already worked over by other respectable fellows: my music collection, about 15000 tracks encoded with QT AAC at a target quality of 110, ranges from 74kbps (Edwin Fischer's Bach WTC, mono piano from the early thirties) to 314kbps (Henk Van Twiller's transcription for solo saxophone of Bach's cello sonatas: BTW this is quite surprising to me!). The average bitrate is around 256kbps (exactly reached by about 300 tracks). About 5000 tracks are < 240kbps, and about 2500 > 270kbps.
Of course we are speaking of VBR, so those values for a single track are still average bitrate, as shown in iTunes column browser.

This poor man's statistical analysis demonstrates, first, that when targeting quality the encoder couldn't care less about bitrate; second, that to reach quality 110 the Apple AAC encoder uses about 256kbps on average; and third, that if I had chosen to target bitrate instead of quality (say 256kbps CBR instead of quality 110, which some people consider practically equivalent), it would have been overkill (sometimes by a large margin!) for the former tracks and inadequate for the latter.
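
A summary like the one above could be produced with something along these lines. It is a rough sketch: `summarize_bitrates` and its default thresholds are illustrative names and values, and the sample list is invented rather than taken from the collection described.

```python
from statistics import mean


def summarize_bitrates(bitrates_kbps, low=240, high=270):
    """Summarise per-track average bitrates from a quality-targeted encode."""
    if not bitrates_kbps:
        raise ValueError("need at least one track")
    return {
        "tracks": len(bitrates_kbps),
        "min": min(bitrates_kbps),
        "max": max(bitrates_kbps),
        "mean": mean(bitrates_kbps),
        # Tracks a fixed 256 kbps setting would over-serve or under-serve.
        "below_low": sum(1 for b in bitrates_kbps if b < low),
        "above_high": sum(1 for b in bitrates_kbps if b > high),
    }


# Made-up bitrates for six tracks, not taken from the collection above.
example = [74, 230, 256, 256, 280, 314]
print(summarize_bitrates(example))
```

The wider the gap between the minimum and maximum, the more a single fixed bitrate would waste on easy tracks and starve the hard ones.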

On second thought: if I'm not wrong, iTunes store sells everything at 256 CBR. Maybe this is "transparent enough" for everything, but defeats the proven efficiency of their own very good encoder.

On third thought: one of these days I'm going to try to encode those two extremes at 256 CBR and try to ABX them from both the q110 and the lossless ones (though I'm rather sure I will fail all of the times... ).
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-16 16:27:40
Thanks for providing data!

if I'm not wrong, iTunes store sells everything at 256 CBR. Maybe this is "transparent enough" for everything, but defeats the proven efficiency of their own very good encoder.
A previous analysis (http://hydrogenaudio.org/forums/?showtopic=69716) of files from the iTunes Store suggests that iTunes Plus is ‘constrained VBR’, which I interpret to mean ABR:
iTunes' standard setting is identical to Quicktime's ABR setting at medium encoding quality.
iTunes' VBR setting is identical to Quicktime's VBR constrained setting at medium encoding quality.
iTunes Plus is identical to Quicktime's VBR constrained 256kbit/s setting at maximum encoding quality.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Jplus on 2013-02-16 17:31:40
A previous analysis (http://hydrogenaudio.org/forums/?showtopic=69716) of files from the iTunes Store suggests that iTunes Plus is ‘constrained VBR’, which I interpret to mean ABR:

Actually "ABR" and "Constrained VBR" are two separate modes in Apple AAC. I don't know why there are two constrained VBR modes but the ABR mode is more constrained (closer to CBR) than the CVBR mode.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: saratoga on 2013-02-16 19:08:10
I think to have that work you'd have to have some kind of filter or processing that attempted to classify the signal as speech or music and then adjusted the encoder's parameters from frame to frame. It's probably not too hard to do, but it's also a very strange thing to want to implement, so maybe no one has done so.

Yes, something like this, and it is the most desirable thing to have, because it frees the user from checking every separate file (or set of files) in an effort to decide which bitrate to use. It is like the x264 video encoder: the user says "I want rate factor 21.00" and gets it, the same quality for every input file regardless of the content, whether just a voice, simple piano music, rock/pop music, electronic music or something else.


Did you read any of what you just agreed with? I don't think so. What you are asking for is VBR. Every modern codec has it.

That's not what I was describing though.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-16 21:18:06
In repeatedly saying that every codec does what he's talking about and deriding him for asking about it, you seem to be misunderstanding or misconstruing what he's saying and you seem uncivil. English isn't his first language (Russian is, apparently), there have been some communication failures, and he's got some misconceptions that aren't central to what he's saying. Rather than pouncing on him for his misconceptions or his mode of expression and ignoring his main point, we should either have the patience and charity to try to understand and respond to the main thrust of what he's saying, or we should leave it alone.

Yes, VBR encoders generally attempt to do constant quality, they vary their bitrate substantially, especially at higher quality targets, and we have no particular reason to believe that their rate allocation for e.g. different genres of music deviates dramatically from the constant-quality ideal.

But speech is considerably easier to code than music, especially for a codec like Opus which has LP capabilities, and if you can show me a single encoder that dramatically scales back its bitrate when presented with speech, especially in a mixed-content file, I will be quite surprised. Just as an example, I ran some mono samples (music, speech, and mixed content) through LAME -V6, oggenc -q 1, and opusenc --bitrate 60. In all cases the speech content was given a bitrate within 1kbps of the average music bitrate. In that respect, these encoders' rate allocation is more like what one would expect of ABR than ideal constant-quality VBR. I'm fairly certain that Vorbis and LAME can both achieve speech quality equivalent to their 60kbps music quality at below 48kbps and that opusenc can do considerably better still.

As NullC said in the opus-tools thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=99033&view=findpost&p=824381), dramatic bitrate changes between speech and music is an idea worth trying. He warns that the speech / music classification in opus isn't yet accurate enough to avoid some audible problems*. But it certainly wouldn't have to be perfect to improve on what we have now, esp. for content that's mostly speech but has occasional music and sound effects.

*There's been some indication that in the future there may be significant relatively-low-hanging fruit in improving non-realtime use of Opus by using greater lookahead, esp. to improve the accuracy of all the kinds of additional analysis the master branch is doing. But for now the devs are continuing their focus on real-time use.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: saratoga on 2013-02-17 00:28:55
In repeatedly saying that every codec does what he's talking about and deriding him for asking about it, you seem to be misunderstanding or misconstruing what he's saying and you seem uncivil. English isn't his first language (Russian is, apparently), there have been some communication failures, and he's got some misconceptions that aren't central to what he's saying. Rather than pouncing on him for his misconceptions or his mode of expression and ignoring his main point, we should either have the patience and charity to try to understand and respond to the main thrust of what he's saying, or we should leave it alone.


If he doesn't understand, he should ask rather than just ignore.

If you understand his misconception, then you should help him to understand, rather than complain that no one else is doing what you also do not do.

But speech is considerably easier to code than music, especially for a codec like Opus which has LP capabilities, and if you can show me a single encoder that dramatically scales back its bitrate when presented with speech, especially in a mixed-content file, I will be quite surprised.


I'm curious what your source is for this statement? The fact that codecs do not decrease the bitrate as much as you expect them to suggests that transparent speech is harder to encode than you think.

Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: IgorC on 2013-02-17 01:22:02
Speech isn't that easy to code. http://research.nokia.com/files/public/%5B..._Opus_Codec.pdf (http://research.nokia.com/files/public/%5B16%5D_InterSpeech2011_Voice_Quality_Characterization_of_IETF_Opus_Codec.pdf)

Opus uses hybrid mode only at very low bitrates. Speech requires a bitrate comparable to that of music for (near-)transparent or high quality. There is no such thing as a smart encoder that does "64 kbps for speech and 128 kbps for music".
Suffice it to say that Opus 1.1 alpha (--bitrate 64) produces bitrates considerably above 64 kbps on speech. It does not go any lower.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-17 02:15:00
Speech isn't that easy to code. http://research.nokia.com/files/public/%5B..._Opus_Codec.pdf (http://research.nokia.com/files/public/%5B16%5D_InterSpeech2011_Voice_Quality_Characterization_of_IETF_Opus_Codec.pdf) […] Speech requires a bitrate comparable to that of music for (near-)transparent or high quality.
Thanks for this! It supports earlier suppositions that bitrates for speech that are similar to music point to speech being more complex than we estimate, not to any failing in VBR modes.

I guess we’re conditioned to think of speech as requiring low bitrates, when in fact it’s often just a case of people forcing low bitrates due to constraints upon bandwidth or capacity, or even just habit. I can appreciate that actually encoding speech at a level that matches music may be more of a challenge than is assumed. That was the case for me, anyway.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-17 05:30:28
The fact that codecs do not decrease the bitrate as much as you expect them to suggests that transparent speech is harder to encode than you think.

Speech isn't that easy to code. Opus uses hybrid mode only at very low bitrates. Speech requires a bitrate comparable to that of music for (near-)transparent or high quality.

If you think of audio quality only in terms of the binary transparent vs non-transparent distinction, you divorce yourselves from the realities of non-trained everyday listening by the non-golden-eared public, you exclude a whole host of uses for which people would prefer not to pay the extra bitrate costs for very rapidly diminishing quality returns, and you will be forever chasing corner cases and ephemeral differences.

24-32kbps Opus speech is quite good. While trained listeners can frequently distinguish 32kbps hybrid mode mono Opus speech from the original in careful repeated listening in controlled environments, both the Google and Nokia 2011 Opus listening tests showed that in MOS, MUSHRA, or ABC/HR testing, people rate 32kbps Opus practically on a level with the originals. It's true that those tests showed 32kbps had nonoverlapping error bars with the originals, but that's not true for 40kbps, and remember that's with a two-year-old Opus encoder and there's been plenty of improvement since then. If for mono speech current 32kbps Opus doesn't qualify as "high quality" then I don't care in the slightest what high quality is.

On the other hand, while low-bitrate Opus doesn't totally mangle music like most speech codecs, the difference is considerably more clear. I don't see any large-scale mono music tests readily available to back up my personal listening tests and observations, and if there were such their test setup wouldn't be designed for making cross-sample quality comparisons with speech. But if you look at the Google tests you'll see that subjects rated 64kbps stereo music to have a much much greater quality difference from the original than 32kbps mono speech, even though you'd expect channel coupling to have a very major benefit.

Though the difference is less dramatic in other codecs which don't have speech-oriented technologies, it's still there. Part of this is because a lowpass that butchers the sound of many music samples will not have objectionable - or, often, readily-noticed - effects on speech. (The bitrate->lowpass cutoff maps in LAME and Vorbis were designed for music content - in fact, the one in LAME isn't even well tuned for mono, basically just naively scaling the target bitrate by the arbitrary factor of 3/2 before plugging it into a table which is tuned for stereo - and overriding the lowpass can enable them to do considerably better with speech at <56kbps bitrates.) There are many other factors.

On top of that, recorded music is more likely to have important stereo separation, while for speech we're generally listening to a single source at a time and so most recorded material is either mono, "stereo" with both channels practically identical (e.g. identical except for dithering), or easily representable by intensity stereo. Any decent VBR encoder will manage to reduce its bitrate substantially when stereo separation is practically nil but if such content were in a separate file you'd be well-advised to explicitly tell it to downmix, saving a little bitrate and avoiding the possibility of some nonoptimal encoder decisions. Opus has the fairly unique capacity to switch to a true mono mode and back within the same stream, but opusenc doesn't use it, and at low bitrates it doesn't seem to reduce its bitrate as much for such content as one might anticipate.

Thanks for this! It supports earlier suppositions that bitrates for speech that are similar to music point to speech being more complex than we estimate, not to any failing in VBR modes.
It does no such thing. It has no tests relating to music quality, and most definitely no tests where people were asked to directly compare the quality of encoded speech samples to that of encoded music. It tells us that 40kbps hybrid Opus with a two-year-old still-under-heavy-development encoder was statistically tied with the fullband original speech.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Nessuno on 2013-02-17 17:18:15
The fact that codecs do not decrease the bitrate as much as you expect them to suggests that transparent speech is harder to encode than you think.

Speech isn't that easy to code. Opus uses hybrid mode only at very low bitrates. Speech requires a bitrate comparable to that of music for (near-)transparent or high quality.

If you think of audio quality only in terms of the binary transparent vs non-transparent distinction, you divorce yourselves from the realities of non-trained everyday listening by the non-golden-eared public, you exclude a whole host of uses for which people would prefer not to pay the extra bitrate costs for very rapidly diminishing quality returns, and you will be forever chasing corner cases and ephemeral differences.

I do think you are simply mistaking transparency for intelligibility: the fact that you can perfectly understand someone talking on the phone doesn't mean the phone is transparent to speech, not the way this term is used in perceptual codec evaluation, at least.

BTW: have you ever tried to understand someone speaking a foreign language on a slightly noisy line?
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: IgorC on 2013-02-17 19:01:14
The emphasis is mine.
But if you look at the Google tests you'll see that subjects rated 64kbps stereo music to have a much much greater quality difference from the original than 32kbps mono speech, even though you'd expect channel coupling to have a very major benefit.

First, those are two completely separate Google tests. http://www.opus-codec.org/comparison/GoogleTest1.pdf

32 kbps speech test - 17 listeners.
64 kbps test - 9 listeners. No wonder they got considerably fewer participants for the higher-bitrate test.

Second, it's not "much much" greater quality at all.
MUSHRA scores:
32 kbps speech (mono) - Opus - 97.2
64 kbps music  - Opus - 90.7
64 kbps music - LC-AAC - 90.7 (oh, please!)

Once a MUSHRA score is >90 (>4.5 in our world), all cats are lions.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-17 23:10:31
I do think you are simply mistaking transparency for intelligibility: the fact that you can perfectly understand someone talking on the phone doesn't mean the phone is transparent to speech, not the way this term is used in perceptual codec evaluation, at least.
You must not have understood a single thing I was saying. I'm perfectly aware that codecs with vastly better than telephone quality may still not be transparent. Try reading again.
First, those are completely different Google's tests.
What's your point here? I already said the tests are distinct and gave a disclaimer about the limits of comparability. Yes, I didn't spend a bunch of time and money to set up a professional-quality large-scale direct comparison. I'll happily do so once you wire me $10K. (Since no test protocol can make cross-sample quality comparisons blind, the usefulness of my or any single individual's listening tests and preferences is sharply limited; an aggregate test of normal people neutral to this debate would be needed.) In the meantime this data does support my point even though it's not a rigorous proof.
Quote
Second, it's not "much much" greater quality at all.
I never said much greater quality, I said much greater quality difference. Listeners gave the 64kbps stereo music a score 9/100 points lower than the reference. That's a much larger difference than giving the 32kbps mono speech a score 2/100 points lower than the reference. That's despite having the advantage that, thanks to channel coupling, coding these normal stereo samples at 64kbps is considerably easier than coding a mono version at 32kbps would have been. This indicates that 32kbps mono music would likely be rated well below 32kbps mono speech.
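To put numbers on the "difference from the reference" reading, here is a quick sketch. It assumes the hidden reference is taken as 100 on the MUSHRA scale (the reference scores the listeners actually gave are not quoted in this thread) and uses the Opus scores IgorC posted above.

```python
# MUSHRA scores quoted earlier in the thread (Opus only).
scores = {
    "32 kbps mono speech": 97.2,
    "64 kbps stereo music": 90.7,
}

REFERENCE = 100.0  # assumption: hidden reference treated as a perfect 100


def gap_from_reference(score, reference=REFERENCE):
    """Points lost relative to the reference, rounded to one decimal."""
    return round(reference - score, 1)


for name, score in scores.items():
    print(f"{name}: {gap_from_reference(score)} points below reference")
```

The two gaps come from different tests with different listeners, so this only restates the arithmetic; it says nothing about whether the difference is statistically meaningful.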

Some of you seem to be saying "maybe it's just that the speech is equally degraded but people don't find that as unacceptable as they do for music." Since people's preferences are what define quality, this makes zero sense. A VBR encoder that encodes speech at the same bitrate as music when listeners find the degradation of music at that bitrate to be annoying but would not be annoyed with speech at a marginally lower bitrate is simply not managing to maintain constant quality.

Some of you are saying "well, since the VBR encoders don't drop the bitrate for speech and the VBR encoders are absolute perfection handed down to us from Olympus by the gods, obviously speech is hard to code. The rate allocation scheme of LAME is perfect, enlightening the eyes; the bitrate->lowpass map of LAME is true and righteous altogether. Holy, holy, holy. To say otherwise is blasphemous." The authors of LAME and Vorbis, mere mortal men like ourselves, would happily tell you that their encoders' decisions are not tuned for mono speech and that their encoders have no capability to detect speech and adjust their decisions accordingly. The Vorbis devs straight up tell you in the FAQ that even though it's decent for speech they've given speech little thought and you should consider other codecs. Long ago the LAME devs added a --speech option which uses a low bitrate, forces ABR since their normal bitrate allocation is suboptimal, and forces a lower lowpass than normal (any of that sound familiar from what I've been saying?).
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: IgorC on 2013-02-18 04:27:35
jensend,

In short, interpolating between two different listening tests with different conditions (as in this case) isn't just imprecise but completely wrong. We have been through this many times.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Nessuno on 2013-02-18 07:14:00
I do think you are simply mistaking transparency for intelligibility: the fact that you can perfectly understand someone talking on the phone doesn't mean the phone is transparent to speech, not the way this term is used in perceptual codec evaluation, at least.
You must not have understood a single thing I was saying. I'm perfectly aware that codecs with vastly better than telephone quality may still not be transparent. Try reading again.

Yes, it's clearly me not understanding. But since the figures I gave in my previous post (which you did fully read, right?) clearly demonstrate that an audio encoder targeting quality doesn't care about bitrate, as per this thread's subject, could you please help me understand exactly what your point in this discussion is? Are you still speaking of signal quality evaluation or something completely different and unrelated, like subjective speech recognition (which, in my opinion, is completely outside the realm of this forum)?
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: softrunner on 2013-02-18 12:21:51
Hooh..., finally I've read the whole thread... And what can I say? The most part of misunderstandings in this thread turns around only one word, and this word is "quality" (or so called "quality level"). Can you tell me, what is quality when we are talking about sound? I can tell you: quality here means audibility i.e. closeness of the encoded sound to the source sound. It is not some virtual abstract quality, calculated by encoder. It is something, that is recognized by human ears as "unaudible difference", "very close to source", "audible, but good quality of sound", "audible, and acceptable quality, not annoying" and "unacceptable quality, annoying sound". Do you understand? I am talking about real audio listening, not about some abstract quality in terms of encoder.
And all audio encoders produce different quality/closeness to source for different sources at the same so-called "quality preset". It does not matter whether it is one source file with mixed content or a set of files. Suppose we have 1000 files, 500 of which are audiobooks and 500 of which are music like Fighter Beat by The Prodigy (very complex), and all these files are randomly mixed in one folder. Which "quality preset" would you use for encoding all this stuff to get REAL high quality on output without using excessive bitrate? You really cannot answer. Audiobooks would need a ~64 kbps VBR quality preset, and the music a ~208 kbps VBR quality preset (if it is Opus); in both cases the output will be close to the source. And you have to listen to every separate file, and only then can you get an approximate understanding (of course, if you are experienced enough) of which preset to use for each file. This listening is a HUGE pain for the user (if he is really interested in a high quality/file size ratio).
I am a user, and I want to have a tool intelligent enough to do this work for me. Nobody has invented such a tool (let's call it an encoder) so far. Also, this tool should operate not on audio files but on audio frames, because one file can contain frames of varying complexity, each needing a different bitrate to stay close to the source. Check my samples in the Uploads section. They are basically small looped pieces of other samples, because usually you can hear the difference only in some part of a sample, where the bitrate provided by the encoder is not high enough for the difference to be inaudible; or we can say that the other parts do not need the high bitrate they have, because they would still sound the same to the user at a lower bitrate.

p.s. I usually post rarely. If I do not reply fast, it's normal.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: ggf31416 on 2013-02-18 15:09:12
Hooh..., finally I've read the whole thread... And what can I say? The most part of misunderstandings in this thread turns around only one word, and this word is "quality" (or so called "quality level"). Can you tell me, what is quality when we are talking about sound? I can tell you: quality here means audibility i.e. closeness of the encoded sound to the source sound. It is not some virtual abstract quality, calculated by encoder. It is something, that is recognized by human ears as "unaudible difference", "very close to source", "audible, but good quality of sound", "audible, and acceptable quality, not annoying" and "unacceptable quality, annoying sound". Do you understand? I am talking about real audio listening, not about some abstract quality in terms of encoder.....


All audio codecs at VBR try to reach some level of audible quality as perceived by humans, however
1) There isn't a good metric that takes the input and output and gives a result that is well correlated with perceived quality as we don't have a good model of how humans perceive quality. Video compression has good metrics like SSIM but even they are far from perfect.
2) Some codecs may be too conservative. If the psycho-model predicts that a part needs, say, 32 kbps to reach the target quality when average audio takes 96, it may mean that the part really only needs 32, in which case the target quality is achieved, or it may mean that the psycho-model underestimated the bitrate, which will result in audible artifacts. So if the developers don't trust their psycho-model they may give easy parts a higher bitrate than needed (better safe than sorry).
3) Unlike Opus, not all formats are optimized to encode audio at low bitrates.
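
Point 2 can be sketched in a few lines. This is a toy illustration, not any real encoder's logic: `allocate_kbps`, `model_estimate_kbps`, `typical_kbps` and `trust` are made-up names, and the blending rule is an assumption chosen only to show the "better safe than sorry" idea with the 32 and 96 kbps figures from above.

```python
def allocate_kbps(model_estimate_kbps, typical_kbps=96.0, trust=0.5):
    """Blend the psycho-model's estimate with a typical bitrate.

    trust=1.0 means the model is fully believed; trust=0.0 means its
    low estimates are ignored. Parts the model rates as hard always get
    the bitrate the model asked for.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must be between 0 and 1")
    if model_estimate_kbps >= typical_kbps:
        # Model says the part is hard: give it what it asks for.
        return model_estimate_kbps
    # Model says the part is easy: only follow it part of the way down.
    return typical_kbps - trust * (typical_kbps - model_estimate_kbps)


# An "easy" part estimated at 32 kbps, under three levels of trust.
for trust in (1.0, 0.5, 0.0):
    print(trust, allocate_kbps(32.0, trust=trust))
```

With full trust the easy part gets the 32 kbps the model predicted; with less trust it is padded back towards the typical bitrate, which costs bits but protects against the model having underestimated.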
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: db1989 on 2013-02-18 17:53:18
The most part of misunderstandings in this thread turns around only one word, and this word is "quality" (or so called "quality level"). Can you tell me, what is quality when we are talking about sound? I can tell you: quality here means audibility i.e. closeness of the encoded sound to the source sound. It is not some virtual abstract quality, calculated by encoder. It is something, that is recognized by human ears as "unaudible difference", "very close to source", "audible, but good quality of sound", "audible, and acceptable quality, not annoying" and "unacceptable quality, annoying sound". Do you understand? I am talking about real audio listening, not about some abstract quality in terms of encoder.
That’s strange, seeing as you wrote this:
x264 video encoder has encoding mode called Constant Rate Factor. In this mode number (16, 17, etc) is used to define desired quality (lower - better quality and higher bitrate), and encoder does not care about bitrate, only about keeping rate factor constant. It is a question, why nobody has invented something similar for audio encoding (except lossyWAV, which needs too much bitrate for acceptable quality)?
I think every encoder with real vbr (not abr) does that? Lame has V(0-9), QT AAC has --tvbr (0-127), Vorbis has -q((-2)-10). The bitrate may vary a lot with these settings between different songs/genres.

Either you still don’t understand what people have been saying, or you expect perfection from perceptual encoders. Neither will make this thread useful.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: splice on 2013-02-18 20:08:22
Well, if you mix audiobook and complex electronic music in one file, then which bitrate will you use for this file? Opus 64 kbps will give good quality for that part, which contains audiobook, but the quality of musical part will be very low. And 176 kbps will give good quality for music, but that bitrate will be too excessive for audiobook. And I would like to have encoder, which takes from me "good quality" as input option, and gives ~64 kbps for audiobook part of the file and ~176 kbps for musical part. None of modern audio encoders can to this.


Your problem hinges on the point that we usually accept a lower level of quality for the spoken word than we do for music. Likewise, we can accept poor quality printing of text, so long as it is legible, but we prefer images to be high quality.

You have two choices - either develop (or have developed for you) an encoder that recognises sections containing the spoken word and adopts a different quality metric for them, or do it manually - encode the spoken words separately from the music and join (edit) the sections together afterwards. I assume that you already generate the speech and music separately and join them afterwards, so it should not be too much of a change in workflow. I know you can do this with MP3. I don't know about Opus.

Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-18 22:34:01
Your problem hinges on the point that we usually accept a lower level of quality for the spoken word than we do for music. Likewise, we can accept poor quality printing of text, so long as it is legible, but we prefer images to be high quality.
*sigh* No. Quality!=PSNR. Long ago, a wise man once said,
Some of you seem to be saying "maybe it's just that the speech is equally degraded but people don't find that as unacceptable as they do for music." Since people's preferences are what define quality, this makes zero sense. A VBR encoder that encodes speech at the same bitrate as music when listeners find the degradation of music at that bitrate to be annoying but would not be annoyed with speech at a marginally lower bitrate is simply not managing to maintain constant quality.


Nessuno: no, this isn't about recognizability either. (Recognizability could be considered for music too- e.g. "regardless of how awful it sounds, I can tell- just barely- this is Beethoven's 9th.") It's about quality. This can't be reduced to a binary distinction, but if you must have a binary distinction to start with and you want something more descriptive than "good vs bad" perhaps the best one is "annoying vs not annoying" (c.f. MUSHRA, ABC/HR).  It may be distinguishably different under ideal conditions- so what? Is it any worse, or would you be perfectly fine with listening to this instead of that? On the opposite end of the quality spectrum, it may be recognizable- so what? Is it any good, or would you tear your hair out if you had to listen to it for any substantial amount of time?

Softrunner: It appears you were substantially more confused than I thought you were. Others esp. ggf31416 and db1989 are doing a good job of explaining why.

IgorC: The test was done with the same low anchors and quite likely a subset of the same listeners using the same equipment. You think that the differences significantly biased the results in one coherent direction? Whatever. My opinion is of course not based on these tests but on my own 12-64kbps listening comparisons. Feel free to try your own. Of course, as I already said, no test protocol can make cross-sample quality comparisons blind, so whenever you can ABX the speech, there's nothing preventing you from saying "gee, I'm going to rate this a 2 and the encoded music a 4.9, just 'cause I wanna show jensend is wrong."

Please note that despite what it may look like from the pile-on in this thread, my view appears to be in the majority. Just about everybody recommends bitrates for speech they would not recommend for mono music (or recommend bitrates for speech less than half what they recommend for stereo music despite the savings of channel coupling). This is not because they expect people to just put up with being more annoyed.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Nessuno on 2013-02-19 10:36:35
Nessuno: no, this isn't about recognizability either. (Recognizability could be considered for music too- e.g. "regardless of how awful it sounds, I can tell- just barely- this is Beethoven's 9th.") It's about quality. This can't be reduced to a binary distinction, but if you must have a binary distinction to start with and you want something more descriptive than "good vs bad" perhaps the best one is "annoying vs not annoying" (c.f. MUSHRA, ABC/HR).  It may be distinguishably different under ideal conditions- so what? Is it any worse, or would you be perfectly fine with listening to this instead of that? On the opposite end of the quality spectrum, it may be recognizable- so what? Is it any good, or would you tear your hair out if you had to listen to it for any substantial amount of time?

Do you know what the quality parameter in all true VBR modes of every modern encoder stands for? Do you know that, for example, AAC accepts 128 different quality level values in VBR mode? Do you know that this quality parameter is a dimensionless number and in fact is a (guess what?!?) qualitative property of the desired output?

What misleads you is that you still think that you always set a desired bitrate (which is wrong, as I showed you before with numbers). In fact you still say:
Quote
Just about everybody recommends bitrates for speech they would not recommend for mono music (or recommend bitrates for speech less than half what they recommend for stereo music despite the savings of channel coupling).

This remark is completely out of context when you set a VBR mode.

In the end, what you want is an encoding mode which takes no parameter at all and selects the right (for what?!?!?) output quality level by understanding that its input is speech or music or whatever in between, because it knows how much in each case you'll be more or less annoyed by artifacts.

So it must be smart enough to understand that a musical piece (target: transparency) could contain a speech segment (opera anyone?), that a speech (target: not annoying) could contain background music, that even if the input is music, this time the user would accept a lower quality output (target: a few artifacts above the audible threshold) because he's planning to use it for listening on the go, that even if the input is speech, the user would like to have a higher quality output (target: better than just enough, but not that much anyway) because it is a lecture in a foreign language and the speaker also has a strong regional accent, so it is harder to comprehend...

All of the above can be easily accomplished just selecting a VBR mode and an appropriate quality level between the ones that the specified encoder accepts. Then the encoder will choose the lowest bitrate possible depending on that quality level, on its psychoacoustic model and on the instantaneous properties of the input signal. Only it's self-evident that the desired quality level must be a user choice, not an encoder one!
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: zerowalker on 2013-02-19 12:17:38
I am pretty sure that Vorbis is Bitrate based though.

But i really would like this mode in Opus, as i always use it in x264, it's easier to have a favorite Quality rate than bitrate, as it can be such a big difference of bitrate needed to produce that Quality.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Garf on 2013-02-19 16:37:24
I am pretty sure that Vorbis is Bitrate based though.


Vorbis is quality based. If you enable bitrate management, it basically shifts quality around to hit the bitrate you asked for. This is why it's slower with managed bitrate than with quality mode.

Quote
But i really would like this mode in Opus, as i always use it in x264, it's easier to have a favorite Quality rate than bitrate, as it can be such a big difference of bitrate needed to produce that Quality.


A quality mode == Bitrate mode that reaches an average bitrate over a large corpus of music.

Opus has an average bitrate mode that is simultaneously a pure quality based mode. This is not a contradiction. Encoding music at a given quality with a given codec will eventually even out to a certain bitrate. The average bitrate. You can map that back to the quality.
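
A minimal sketch of that mapping, with made-up numbers: each track encoded at one fixed quality lands on its own bitrate, and the duration-weighted mean over the corpus is the "average bitrate" that the quality setting corresponds to. The track list below is invented for illustration and does not come from any real encoder run.

```python
def corpus_average_kbps(tracks):
    """Duration-weighted average bitrate over (seconds, kbps) pairs."""
    total_seconds = sum(seconds for seconds, _ in tracks)
    if total_seconds <= 0:
        raise ValueError("corpus has no audio")
    total_kilobits = sum(seconds * kbps for seconds, kbps in tracks)
    return total_kilobits / total_seconds


# Hypothetical per-track results for one fixed quality setting.
tracks = [(240, 70), (180, 110), (300, 95), (200, 130)]
print(round(corpus_average_kbps(tracks), 1))
```

No single track has to hit the average; only the corpus as a whole does, which is why a bitrate-labelled setting can still behave as a quality setting.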
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Silversight on 2013-02-19 16:50:21
I am pretty sure that Vorbis is Bitrate based though.

I am pretty sure it is not. Vorbis can be forced to work in ABR or faux-CBR mode, but the standard -qx levels are VBR. The "nominal bitrate" written to the Vorbis header is the statistical average bitrate achieved with the given set of internal parameters, but it does not have to have a specific correlation to the actual file content. I have files encoded with -q7 that have the bitrate_nominal field set to 224 while the actual average bitrate is ~190 kbit/s. Like with all VBR codecs, it depends on the material.
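
For anyone who wants to check this on their own files, the actual average bitrate is just file size over duration. A small sketch follows; the byte count and duration in the example are hypothetical values picked to land at 190 kbit/s, not measurements of my files, and the result includes container overhead.

```python
def average_kbps(file_size_bytes, duration_seconds):
    """Average bitrate in kbit/s (1 kbit = 1000 bits), overhead included."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return file_size_bytes * 8 / duration_seconds / 1000


# Hypothetical 4-minute file of 5,700,000 bytes.
print(average_kbps(5_700_000, 240))
```

Comparing that figure with the nominal bitrate in the header shows how far a given file sits from the statistical average for its quality level.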
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: jensend on 2013-02-19 16:53:00
Nessuno, if I wasn't aware that VBR encoders try to do constant-quality, we wouldn't be having this discussion. Yes, they vary their bitrate for different kinds of input. That doesn't mean their rate allocation scheme is perfect.
What misleads you is that you still think that you always set a desired bitrate (which is wrong, as I showed you before with numbers). In fact you still say:
Quote
Just about everybody recommends bitrates for speech they would not recommend for mono music (or recommend bitrates for speech less than half what they recommend for stereo music despite the savings of channel coupling).
This remark is completely out of context when you set a VBR mode.
No, it isn't "out of context" at all. A quality setting will not give consistent bitrate results for individual samples, but it will give a fairly consistent average bitrate over diverse collections of samples, as Garf just said. For instance, for LAME, people routinely recommend using -V4/V5 for stereo music, which generally average 165/130 kbps respectively on such input, while for speech they recommend either -V7/V8 (average something like 56/48kbps for mono respectively) or ABR modes (e.g. the --preset voice setting, which downmixes to mono, resamples to 32kHz, and uses 56kbps ABR).

I'm not asking for something which will target transparency for music and non-annoyance for speech. That would not be constant quality. Nor am I asking for the encoder to magically discern what quality level the user wants. I have no idea how you're coming up with these absurd straw men.

Quote
All of the above can be easily accomplished just selecting a VBR mode and an appropriate quality level
No, constant quality is only truly accomplished if the VBR encoder's rate allocation is flawless. I'm not asking for perfection, and in most cases the rate allocation in modern encoders does a pretty decent job. The distinction between speech and music is the one area I'm aware of where current encoders seem to deviate substantially from the constant-quality ideal and where therefore substantial improvement seems possible. For many VBR encoders, using the codec for speech is somewhat rare, mixed-content files where people can't just tweak the parameters to be better suited for speech are much rarer, and so the codec developers understandably have little interest in trying to solve the problem of speech/music classification and in doing additional tuning for speech. But this discussion started by talking about Opus, which is better optimized for speech and already has code to classify input as speech or music. People are interested in seeing that classification be used to improve bitrate allocation.
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Nessuno on 2013-02-19 21:35:29
jensend, let's stop using for one moment the terms "bitrate" and "quality". What you expect from an ideal encoder is that it automatically switches to mono and allows a higher amount of noise and distortion between input and output when it recognizes the input as speech, right?

BTW: how should it consider an opus like Glenn Gould's radio documentaries?
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: m0rbidini on 2013-02-20 02:46:44
I think encoders in true VBR mode could in theory detect "pure speech" sections (although this may not be very simple to define) and end up outputting lower average bitrates for those sections while maintaining a quality level that results in higher average bitrates for non speech sections, ie with both sections having the same average results in listening tests. That may be a future improvement.

If that's not so simple maybe it's just the case that devs prefer to err on the safe side.

Are there any lossy codec devs or researchers that can share any insight on this?
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Garf on 2013-02-20 10:19:36
I think encoders in true VBR mode could in theory detect "pure speech" sections (although this may not be very simple to define) and end up outputting lower average bitrates for those sections while maintaining a quality level that results in higher average bitrates for non speech sections, ie with both sections having the same average results in listening tests. That may be a future improvement.

If that's not so simple maybe it's just the case that devs prefer to err on the safe side.

Are there any lossy codec devs or researchers that can share any insight on this?


That's basically correct, and Opus 1.1+ works that way. It doesn't only fiddle with the bitrate, it can change the underlying encoder mode even.
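
To illustrate the idea only (this is not how Opus implements it): a toy sketch in which segments arrive already labelled as speech or music and each class is given its own bitrate. The function names are invented, the labels are supplied by hand rather than detected, and the 64 and 176 kbps figures are simply the ones softrunner used for his audiobook and music example earlier in the thread.

```python
def plan_bitrates(segments, kbps_by_class):
    """Return (seconds, kbps) for each (seconds, label) segment."""
    return [(seconds, kbps_by_class[label]) for seconds, label in segments]


def total_kilobytes(plan):
    """Payload size in kB (1 kB = 1000 bytes) for a (seconds, kbps) plan."""
    return sum(seconds * kbps for seconds, kbps in plan) / 8


# Hypothetical file: 10 minutes of speech followed by 200 seconds of music.
segments = [(600, "speech"), (200, "music")]
per_class = plan_bitrates(segments, {"speech": 64, "music": 176})
flat = plan_bitrates(segments, {"speech": 176, "music": 176})

print(total_kilobytes(per_class), total_kilobytes(flat))
```

The hard part in a real encoder is the classification and the tuning behind the per-class targets, neither of which this sketch attempts.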
Title: Can audio encoders target quality w/o caring about bit rate/file size?
Post by: Nessuno on 2013-02-20 17:19:35
That's basically correct, and Opus 1.1+ works that way. It doesn't only fiddle with the bitrate, it can change the underlying encoder mode even.

I hope that when this feature reaches production stage, it will be user selectable.