Vorbis 1.1RC1 artifacts [up to -q8]

Reply #25 – 2004-09-17 17:48:25

Quote

You're correct, you never said that, but it's very easy to think you meant it from your first post when you use words like "silly", "obvious", "borderline" and "worst case scenario".

Besides, to turn it around, nobody here has drawn any conclusions related to music performance (yet). But of course you're free to warn us about it...

It's quite simple. The problem may or may not affect real music. You can stare at these test results for a long time and you'll never be able to tell, because how a codec performs in an extreme situation won't tell you much about how it will perform normally.

Reply #26 – 2004-09-17 18:37:08

Quote

Quote

BTW: I think that although these signals were generated artificially, they're as important as other natural test sounds. Natural sound is nothing more than shaped noise and a bunch of sines...

Sebastian

Udial.wav is also nothing more than some noise and a bunch of sines.

This comment is very silly - it should be pretty obvious that this is an extreme, borderline case that may present a worst case scenario to a perceptual encoder.

By "these" I meant the white/pink noise samples. Care to give good reasons why you think these samples are inappropriate for testing a psychoacoustic model?

To be honest: I'm a bit sick of statements like "oh, you're encoding artificial noise! But encoders are tuned to code music". In fact, pink/white noise is pretty natural - think of percussion (cymbal, snare) or the applause sample. Or think of electronic music that makes use of noise generators....

Yes, Udial.wav is also nothing more than some noise and a bunch of sines. So what? If your encoder fails to calculate scalefactors that keep the differences inaudible (apart from clipping in the decoder, which may occur because the samples are "loud"), then you need a more accurate psychoacoustic model.

Anyhow, I wasn't talking about udial, I was talking about pink/white noise being appropriate test samples in my opinion because I think they are not extreme borderline cases.

You think that was a very silly comment of mine?
I'm willing to learn from you in case you have something smart or interesting to say, Garf. But I also have an aversion to people who assume by default that they are smarter than the person they're talking to, and who insult that person's intelligence.

So, you don't care about possible misbehaviours of your psychoacoustic model when it comes to those (=white/pink noise) test samples ?
Don't you think that those misbehaviours may affect other ("natural") test samples as well? Shouldn't we thank Compact Dick for finding another test sample with weird encoder issues, so the psychoacoustic model can be improved?

I don't see why white noise should generally be a problem for perceptual coding. Shouldn't it sound OK at -q6 and above, at least?!
Guys... we're talking about lossy coding - not lossless coding. Yes, white noise does not leave much room for decorrelation transforms to be successful (no room at all, since the samples are independent of each other and have constant variance), but Vorbis - as a VBR coder - should be able to compute an appropriate signal-to-mask ratio for quantization anyway. Otherwise it's an indication of a psychoacoustic model that needs to be improved.
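To make the decorrelation point concrete, here is a minimal Python sketch (standard library only). It measures the lag-1 autocorrelation of white noise and how well the crudest possible predictor, "next sample equals the previous one", does on it. The one-pole lowpass with coefficient 0.95 is an arbitrary correlated signal added for contrast, and the first-difference predictor is a deliberately simple stand-in for the transforms and linear predictors real codecs use; the helper names are my own.

```python
import random


def lag1_autocorr(x):
    """Normalized correlation between each sample and the next one."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def prediction_gain(x):
    """Signal variance divided by the variance of the residual x[n] - x[n-1].

    Above 1: the trivial predictor removes redundancy.
    Below 1: the residual is bigger than the signal itself.
    """
    n = len(x)
    mean = sum(x) / n
    sig_var = sum((v - mean) ** 2 for v in x) / n
    res = [x[i] - x[i - 1] for i in range(1, n)]
    res_mean = sum(res) / len(res)
    res_var = sum((r - res_mean) ** 2 for r in res) / len(res)
    return sig_var / res_var


rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Contrast signal: the same white noise through a one-pole lowpass.
lowpassed = []
y = 0.0
for w in white:
    y = 0.95 * y + w
    lowpassed.append(y)

print(f"white:     rho1={lag1_autocorr(white):+.3f}  gain={prediction_gain(white):.2f}")
print(f"lowpassed: rho1={lag1_autocorr(lowpassed):+.3f}  gain={prediction_gain(lowpassed):.2f}")
```

For white noise the correlation should come out near zero and the gain near 0.5 (the difference of two independent samples has twice the variance), while the lowpassed signal should show a correlation near 0.95 and a gain well above 1.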

Again: I don't consider these noise samples to be borderline cases.

BTW: Did you know ? There's a musical genre called noise.
Check out the Archive.ORG page of the Orgasmo-Macabro label: http://www.archive.org/audio/collection.php?collection=orgasmo_macabro

Sebastian

Reply #27 – 2004-09-17 20:31:10

Quote

To be honest: I'm a bit sick of statements like "oh, you're encoding artificial noise! But encoders are tuned to code music". In fact, pink/white noise is pretty natural - think of percussion (cymbal, snare) or the applause sample. Or think of electronic music that makes use of noise generators....

Well, then test it with cymbals and snare, not white noise. There are many samples (some mentioned above). The most well-known are castanets and applause.

Quote

Anyhow, I wasn't talking about udial, I was talking about pink/white noise being appropriate test samples in my opinion because I think they are not extreme borderline cases.

They are. Entropy of white noise should be very high; pink noise is correlated, but in a fractal pattern, which I wouldn't expect the encoder to pick up.
With noise, a lowpass, for example, is much more audible. In the real case, pre/post-masking covers most of the noise of the attack.

Quote

Don't you think that those misbehaviours may affect other ("natural") test samples as well? Shouldn't we thank Compact Dick for finding another test sample with weird encoder issues, so the psychoacoustic model can be improved?

No, pure continuous stereo white noise doesn't exist in music. Some forms of attacks formed of pink noise, maybe, but for this case there's special tuning.

Quote

Vorbis - as a VBR coder - should be able to compute an appropriate signal-to-mask ratio for quantization anyway. Otherwise it's an indication of a psychoacoustic model that needs to be improved.

Well, Vorbis replaces the noise and uses its built-in generator to save bitrate. (I'm probably wrong, this is how PNS works) That's why it sounds different.
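The PNS (perceptual noise substitution) idea described here can be sketched in a few lines. This is only the generic concept, not a description of Vorbis or of AAC's actual bitstream: the encoder keeps a single energy value for a noise-like band, and the decoder refills the band with fresh random values scaled to that energy. The band width, the Gaussian fill, the seeds and the helper names are all arbitrary choices for the illustration.

```python
import math
import random


def pns_encode_band(coeffs):
    """Encoder side: keep only the band's energy (sum of squares)."""
    return sum(c * c for c in coeffs)


def pns_decode_band(energy, width, rng):
    """Decoder side: refill the band with new random values scaled to that energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(width)]
    noise_energy = sum(v * v for v in noise)
    scale = math.sqrt(energy / noise_energy) if noise_energy > 0 else 0.0
    return [v * scale for v in noise]


def correlation(a, b):
    """Normalized correlation between two equal-length bands."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


enc_rng = random.Random(7)
band = [enc_rng.gauss(0.0, 3.0) for _ in range(256)]  # noise-like coefficients in one band

energy = pns_encode_band(band)  # the only value that would be transmitted
decoded = pns_decode_band(energy, len(band), random.Random(8))

print(f"original energy {energy:.1f}, decoded energy {pns_encode_band(decoded):.1f}")
print(f"waveform correlation {correlation(band, decoded):+.3f}")
```

The decoded band has the same energy but an essentially unrelated waveform, which is how a substituted band can sound similar without being the same signal.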

Reply #28 – 2004-09-17 20:37:52

AFAIK PNS is patented, I don't know if Vorbis uses an unpatented replacement.

Reply #29 – 2004-09-17 20:42:28

Quote

AFAIK PNS is patented, I don't know if Vorbis uses an unpatented replacement.

I do know that, just the techniques seem and sound similar. Any developer here?

Reply #30 – 2004-09-17 21:45:08

Musepack employs PNS as well (for -q < 5) so I would assume there are ways around any patents. 'course MPC is based on MP2...

Reply #31 – 2004-09-17 22:00:51

Usage of PNS in MPC is most definitely infringing. IIRC Frank himself has said this.

Reply #32 – 2004-09-18 00:00:30

First of all... AAC and Vorbis are called general purpose audio codecs and not music only audio codecs or music-that-Garf-likes audio codecs. This naming implies that in an ideal case you can feed any kind of audio data (even an EXE file if you want to hear it interpreted as raw PCM). That's why it's called general purpose...

Quote

Quote
Anyhow, I wasn't talking about udial, I was talking about pink/white noise being appropriate test samples in my opinion because I think they are not extreme borderline cases.

They are. Entropy of white noise should be very high; pink noise is correlated, but in a fractal pattern, which I wouldn't expect the encoder to pick up.
With noise, a lowpass, for example, is much more audible. In the real case, pre/post-masking covers most of the noise of the attack.

Oh boy... is this a religious position? "Don't produce artificial sounds because they're BAAAD"
Check some Jean-Michel Jarre songs like Equinoxe Part 5 / Part 8 / Magnetic Fields Part 5 / Oxygene Part 6.... Especially in Oxygene Part 6 you'll notice very bright (almost white) and not-so-transient noise at the beginning.

Please stop talking about fractal patterns (WTF!?). These are big words with little meaning, and they don't make you look professional to me. The thing about pink noise is that the signal energy is not distributed equally across frequency (there is more energy in the lower frequency regions, as is the case for most natural sounds *wink*). An AAC/Vorbis encoder takes advantage of this by using the MDCT.
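A quick way to see the uneven energy distribution of pink noise without an FFT: generate pink noise with the Voss-McCartney algorithm (a common approximation, chosen here for brevity) and compare the energy of x[n] + x[n-1], a crude lowpass, with that of x[n] - x[n-1], a crude highpass. The two filters cross over at a quarter of the sampling rate, so white noise should give a ratio close to 1. The row count, seeds, filters and helper names are assumptions of this sketch, and it says nothing about how an MDCT codec exploits the tilt.

```python
import random


def voss_mccartney_pink(n, rows=16, seed=0):
    """Approximate pink (1/f) noise with the Voss-McCartney algorithm.

    Row k is refreshed whenever the sample index has exactly k trailing
    zero bits (every 2**(k+1) samples); a white term is added every sample.
    """
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    total = sum(values)
    out = []
    for i in range(1, n + 1):
        k = (i & -i).bit_length() - 1  # number of trailing zero bits in i
        if k < rows:
            total -= values[k]
            values[k] = rng.uniform(-1.0, 1.0)
            total += values[k]
        out.append(total + rng.uniform(-1.0, 1.0))  # extra white term
    return out


def low_high_energy_ratio(x):
    """Energy of x[n] + x[n-1] (crude lowpass) over energy of x[n] - x[n-1] (crude highpass)."""
    low = sum((x[i] + x[i - 1]) ** 2 for i in range(1, len(x)))
    high = sum((x[i] - x[i - 1]) ** 2 for i in range(1, len(x)))
    return low / high


pink = voss_mccartney_pink(20000, seed=1)
white_rng = random.Random(2)
white = [white_rng.uniform(-1.0, 1.0) for _ in range(20000)]

print(f"white noise low/high energy ratio: {low_high_energy_ratio(white):.2f}")
print(f"pink noise  low/high energy ratio: {low_high_energy_ratio(pink):.2f}")
```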

Anyway... you mentioned the word "entropy". What are we talking about? That it is difficult to store white-noise-like signals in a few kilobits per second? I did not question that. But your response seems to me like a justification for a psychoacoustic model that fails to allocate enough bits in the right places, in a VBR encoder like Vorbis from which I expect more or less constant quality.

Quote

Quote
Don't you think that those misbehaviours may affect other ("natural") test samples as well? Shouldn't we thank Compact Dick for finding another test sample with weird encoder issues, so the psychoacoustic model can be improved?

No, pure continuous stereo white noise doesn't exist in music. Some forms of attacks formed of pink noise, maybe, but for this case there's special tuning.

Are you implying that you know every song that exists and will exist in the future and checked for usage of independent noise ? Or do you need to soften your claim ?
It's that kind of ignorance I don't like. Again: general purpose formats/encoders should be able to handle anything in terms of quality. But they only need to give good compression ratios for "the usual stuff".

Consider Jean-Michel Jarre music encoded with Dolby Pro Logic II, where the atmospheric "wind noise" of Oxygene Part 6 is supposed to be played through all speakers and thus is orthogonal. There you have noise that no channel decorrelation mechanism in any existing audio codec can cope with. But I don't expect encoders to do a bad job. I expect them to use enough bits to produce a stream that cannot be identified as the encoded one.

Quote

Quote
Vorbis - as a VBR coder - should be able to compute an appropriate signal-to-mask ratio for quantization anyway. Otherwise it's an indication of a psychoacoustic model that needs to be improved.

Well, Vorbis replaces the noise and uses its built-in generator to save bitrate. (I'm probably wrong, this is how PNS works) That's why it sounds different.

Unfortunately you are wrong, AstralStorm. Take a look at the Vorbis I specification.
You probably picked up the term "noise normalization" somewhere and thought of Vorbis using a PNS-like mechanism.

BTW: AFAIK PNS is still part of the current MusePack Stream Version 8 spec (the working draft, or whatever you call what is online).

To sum up:
I don't agree that the type of audio signals an encoder should handle correctly is limited to what you call music. By "correctly" I mean a noise-to-mask ratio as close to optimal as possible.
I don't agree that white/pink noise is that different from the elements used in what you call music. Examined locally, in a scalefactor band / subband, every kind of noise looks white. In the MDCT domain you mainly have 2 different kinds of coefficient distributions: Gaussian-like in the case of noise, and Gaussian-like with some high peaks here and there in the case of tonals.
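The claim about MDCT coefficient distributions can be checked with a direct O(N^2) MDCT written from the textbook definition, using kurtosis as a rough "peakiness" measure (about 3 for Gaussian data, much larger when a few coefficients dominate). The sine window, the frame size, the tone placed on bin 20 at ten times the noise level and the kurtosis measure are all choices made for this sketch, not what Vorbis or any AAC encoder actually does.

```python
import math
import random


def mdct(frame):
    """Direct O(N^2) MDCT of one 2N-sample frame, using a sine window."""
    two_n = len(frame)
    n = two_n // 2
    windowed = [frame[i] * math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]
    return [
        sum(windowed[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n))
        for k in range(n)
    ]


def kurtosis(values):
    """Non-excess kurtosis: about 3 for Gaussian data, large when a few values dominate."""
    mean = sum(values) / len(values)
    m2 = sum((v - mean) ** 2 for v in values) / len(values)
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    return m4 / (m2 * m2)


N = 128                            # coefficients per frame; each frame has 2N samples
FRAMES = 4
TONE_FREQ = (20 + 0.5) / (2 * N)   # centre of MDCT bin 20, in cycles per sample
rng = random.Random(3)

noise_coeffs, tonal_coeffs = [], []
for f in range(FRAMES):
    start = f * 2 * N  # frames taken back to back from one long signal
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 * N)]
    tone = [10.0 * math.sin(2 * math.pi * TONE_FREQ * (start + i)) for i in range(2 * N)]
    noise_coeffs += mdct(noise)                                 # noise only
    tonal_coeffs += mdct([a + b for a, b in zip(noise, tone)])  # same noise plus a tone

print(f"kurtosis, noise only:   {kurtosis(noise_coeffs):.1f}")
print(f"kurtosis, noise + tone: {kurtosis(tonal_coeffs):.1f}")
```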

Sebastian

Reply #33 – 2004-09-18 00:06:36

Quote

First of all... AAC and Vorbis are called general purpose audio codecs and not music only audio codecs or music-that-Garf-likes audio codecs. This naming implies that in an ideal case you can feed any kind of audio data (even an EXE file if you want to hear it interpreted as raw PCM). That's why it's called general purpose...

They're called that because they're supposed to still give something *acceptable* no matter what you feed them. Which is exactly what they do. And this is why MPEG has separate voice codecs, e.g. for better voice performance at lower bitrates (which are not general purpose, since they completely blow up on music). By your reasoning these would be useless because there already is a "general purpose" codec.

I have exactly *zero* interest in discussing this any further.

Reply #34 – 2004-09-18 00:29:51

Quote

They're called that because they're supposed to still give something *acceptable* no matter what you feed them. Which is exactly what they do. And this is why MPEG has separate voice codecs, e.g. for better voice performance at lower bitrates (which are not general purpose, since they completely blow up on music). By your reasoning these would be useless because there already is a "general purpose" codec.

No, my reasoning does not imply that speech codecs are useless.
I did not say that a general purpose codec should behave optimally in terms of quality per bit for every type of source - it obviously can't with a general model of audio signals. I just said I expect a general purpose audio VBR encoder to give constant quality for any kind of audio signal. I expect Vorbis to perform well at -q6 and above, although I'm pretty pleased with -q3.

Quote

I have exactly *zero* interest in discussing this any further.

Well, you started like "blah ... very silly comment ... blah"
and I kind of lost my sympathy for you because it made you look like an elitist who is proud of his position and thinks of everyone he doesn't know as a wannabe.
But I'm sure you've no problem with that.

Sebastian

Reply #35 – 2004-09-18 00:55:06

Quote

I just said I expect a general purpose audio VBR encoder to give constant quality for any kind of audio signal.

I agree with both sides of this argument, but it seems that it IS giving a constant quality for all kinds of audio samples; it's just not as good as you want it to be for generated tones. The quality it produces is constant, though. And insulting people obviously isn't going to accomplish much in the way of improving Vorbis. Seems to me that if you're that concerned about preserving the quality of white or pink noise or whatever, you should use lossless or raw WAV. And perhaps check how MP3 or others do on the same sample. If they do better, then this may become a more reasonable argument to more people.

Reply #36 – 2004-09-18 01:13:58

I don't quite recall Vorbis using a PNS-like algorithm. I think Monty is quite aware of these new techniques which have appeared in modern codecs as well as their patent status.

Reply #37 – 2004-09-18 12:44:43

Quote

And insulting people obviously isn't going to accomplish much in the way of improving Vorbis.

OK, this was probably ruder than Garf's first post, the one I got pissy about. I apologize.
I did not mean to hurt anyone's feelings. It's just that I disliked Garf's reaction and choice of words. This is where we end up without respecting TOS #2, I guess. I'll watch my mouth.

Sebastian

Reply #38 – 2004-09-21 08:19:50

Garf: thanks for the thoughtful explanations. Compress white or pink noise with a lossless codec and I see what you mean. They are difficult samples to encode, especially white noise -- very little savings on that one.
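This observation is easy to reproduce. zlib is a general-purpose compressor, not a lossless audio codec such as FLAC, so treat the sketch only as an illustration: one second of full-scale 16-bit white noise (generated with Python's random module, an assumption of the sketch) stored as little-endian PCM should not shrink at all.

```python
import random
import struct
import zlib

rng = random.Random(5)
samples = [rng.randint(-32768, 32767) for _ in range(44100)]  # one second, full scale
raw = struct.pack(f"<{len(samples)}h", *samples)               # 16-bit little-endian PCM
packed = zlib.compress(raw, 9)                                 # level 9 = best compression

print(f"raw: {len(raw)} bytes, zlib: {len(packed)} bytes, "
      f"ratio {len(raw) / len(packed):.3f}:1")
```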

Reply #39 – 2004-09-21 12:54:49

Quote

Garf: thanks for the thoughtful explanations. Compress white or pink noise with a lossless codec and I see what you mean. They are difficult samples to encode, especially white noise -- very little savings on that one.

Ah, well.... So we get very little savings if we want to losslessly compress white noise. What does this mean? Does it directly imply that psychoacoustic models might have problems with these samples?

Let me try to elaborate on how I think lossy VBR audio transform codecs work:
1) Transform the audio data
2) Compute a masking threshold according to a certain quality level
3) Quantize the transformed samples according to this masking threshold
4) Code the quantized coeffs
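The four steps above can be sketched in Python. Everything here is a toy under stated assumptions: a direct sine-windowed MDCT for step 1; a made-up masking rule for step 2 in which the allowed noise sits a fixed number of dB below each band's mean power, growing with a hypothetical `quality` knob that only loosely mimics the idea of Vorbis's -q scale; a uniform quantizer whose noise power step^2/12 equals that threshold for step 3; and a rough bit estimate standing in for a real entropy coder in step 4.

```python
import math
import random


def mdct(frame):
    """Step 1: direct O(N^2) MDCT of a 2N-sample frame with a sine window."""
    two_n = len(frame)
    n = two_n // 2
    w = [frame[i] * math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]
    return [sum(w[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for i in range(two_n)) for k in range(n)]


def masking_threshold(coeffs, band_width, quality):
    """Step 2 (toy, hypothetical rule): allowed noise power per coefficient.

    Noise must stay smr_db below each band's mean power; smr_db grows with quality.
    """
    smr_db = 6.0 + 3.0 * quality
    thresholds = []
    for start in range(0, len(coeffs), band_width):
        band = coeffs[start:start + band_width]
        mean_power = sum(c * c for c in band) / len(band)
        thresholds += [mean_power * 10 ** (-smr_db / 10)] * len(band)
    return thresholds


def quantize(coeffs, thresholds):
    """Step 3: uniform quantizer whose noise power (step**2 / 12) matches the threshold."""
    steps = [math.sqrt(12.0 * t) for t in thresholds]
    q = [round(c / s) if s > 0 else 0 for c, s in zip(coeffs, steps)]
    return q, steps


def estimate_bits(q):
    """Step 4 (stand-in for a real entropy coder): rough size of a variable-length code."""
    return sum(math.log2(2 * abs(v) + 1) + 1 for v in q)


rng = random.Random(11)
frame = [rng.gauss(0.0, 1000.0) for _ in range(128)]  # one 2N-sample frame, N = 64
coeffs = mdct(frame)                                    # step 1
for quality in (3, 8):
    thresholds = masking_threshold(coeffs, band_width=8, quality=quality)  # step 2
    q, steps = quantize(coeffs, thresholds)                                # step 3
    print(f"quality {quality}: about {estimate_bits(q):.0f} bits")         # step 4
```

Raising `quality` shrinks the quantizer steps and raises the bit estimate. A real psychoacoustic model would derive the threshold from masking curves instead of a fixed offset, which is the hard part of step 2.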

The big problem is step 2 - no doubt. It's very difficult to build a model that properly reflects the behaviour of human auditory masking and is still simple enough to calculate efficiently. If it's a very good model, it'll compute an appropriate masking threshold for white noise, too. But the unpredictability of white noise for lossless compression has nothing to do with how well a psychoacoustic model performs.

However, in the real world we have to deal with imperfect masking models. Since our present models are imperfect, it makes sense to tune their parameters while checking quality on "normal" music, the content type that is most often used, so that they better match the real masking effects for that type of audio. Because the model is imperfect, no set of parameters works best in every case. Still, we cannot deny that weird/noticeable artifacts at -q7 (a quality mode which usually performs so well) are due to the imperfect masking model.

So, from a theoretical standpoint: we need to improve these models so that the ratio of computed masking to real masking is a flat curve for every type of signal. But in practice this is far from simple, unfortunately. And that, I guess, is the conflict Garf and I had. I was arguing more theoretically and he was arguing practically.

I could also tune the model's parameters to code white noise perfectly (it's hard to compress losslessly, but that really does not matter at all), but since the model is imperfect anyway, it's likely to fail in other cases.

Wow! Wasn't this a very neutral posting? I'm kinda proud....
Hopefully it was worth reading, too.

Greets,
Sebastian

Reply #40 – 2004-09-22 12:43:30

Quote

Garf: thanks for the thoughtful explanations. Compress white or pink noise with a lossless codec and I see what you mean. They are difficult samples to encode, especially white noise -- very little savings on that one.

However, just because a lossless approach yields a high bitrate, and one lossy approach yields files that sound different, doesn't mean you can't express a white noise signal, at a lower bitrate, without changing the sound.

The comic would say this is a compact representation of ten seconds of white noise:
"run Cool Edit, New, 44.1kHz 16-bit, Generate, White Noise, Intensity 10, Duration 10, OK."

However, one piece of white noise is perceptually different from another. Being random, you never get the same result twice, and though in the long term it sounds quite monotonous and homogeneous, in the short term you can hear "features". You could ABX two different pieces of white noise very easily.
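The "comic" representation only works because a generator plus its seed pins down one particular waveform. A small Python sketch, with the standard library's generator standing in for Cool Edit's (whose algorithm is not known here): the same seed reproduces the samples exactly, while two different seeds give waveforms that are essentially uncorrelated sample by sample, even though their long-term statistics are the same.

```python
import random


def white_noise(n, seed):
    """Uniform white noise in [-1, 1]; the seed alone determines every sample."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def correlation(a, b):
    """Normalized correlation of two equal-length, roughly zero-mean signals."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5


n = 44100                        # one second at 44.1 kHz
first = white_noise(n, seed=1)
again = white_noise(n, seed=1)   # same "recipe", same seed
other = white_noise(n, seed=2)   # same recipe, different seed

print("identical:", first == again)
print(f"correlation with a different seed: {correlation(first, other):+.4f}")
```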

So, to do what you're trying to do, you need to encode it properly, lossily, but without changing the sound.

How? Easy - convert it to 6 bits! (8-bit is easier, but less efficient). For near full scale white noise, that will do the trick. Really.

If you don't believe me, try it. The psychoacoustics are quite simple - you have (very approximately) 20 or 30dB of masking for noise on noise. What you're doing is adding quantisation noise about 36dB (6 bits × 6dB per bit) below digital full scale, which is inaudible.
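The arithmetic can be checked numerically. This sketch requantizes near full-scale 16-bit white noise to 6 bits (a step of 1024 LSBs, rounding to nearest and clipping to the 6-bit range) and measures the signal-to-noise ratio, which should land near 36 dB. It checks only the 6 dB-per-bit arithmetic, not the masking claim, and it skips the lowpass and dither options David mentions; the uniform noise and the seed are assumptions of the sketch.

```python
import math
import random

rng = random.Random(42)
x16 = [rng.randint(-32768, 32767) for _ in range(20000)]  # near full-scale 16-bit white noise

STEP = 2 ** (16 - 6)  # dropping the 10 low bits leaves a 1024-LSB step
x6 = [max(-32, min(31, round(s / STEP))) for s in x16]   # 6-bit codes, clipped to -32..31
y16 = [c * STEP for c in x6]                              # back on the 16-bit scale

signal_power = sum(s * s for s in x16) / len(x16)
noise_power = sum((s - y) ** 2 for s, y in zip(x16, y16)) / len(x16)
snr_db = 10 * math.log10(signal_power / noise_power)
print(f"SNR after 16 -> 6 bit requantization: {snr_db:.1f} dB")
```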

You can losslessly encode the result if you wish, though it won't help much. You can lowpass filter the noise at the limit of your hearing before converting to 6-bit, which may help the lossless compressor slightly.

Just converting from 16 to 6 bits gives 16/6 ≈ 2.67:1 compression. Not bad for an incompressible signal.
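The 16:6 ratio is plain storage arithmetic: four 6-bit samples fit exactly into three bytes. A minimal packing sketch, assuming signed codes in the range -32..31 like those a 16-to-6-bit conversion produces; `pack6` and `unpack6` are hypothetical helper names, not part of any existing tool.

```python
import random


def pack6(codes):
    """Pack signed 6-bit codes (-32..31) four at a time into three bytes."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        group = list(codes[i:i + 4])
        group += [0] * (4 - len(group))      # zero-pad the final group
        word = 0
        for c in group:
            word = (word << 6) | (c & 0x3F)  # 6-bit two's complement
        out += word.to_bytes(3, "big")
    return bytes(out)


def unpack6(data, count):
    """Inverse of pack6: recover the first `count` signed codes."""
    codes = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "big")
        for shift in (18, 12, 6, 0):
            v = (word >> shift) & 0x3F
            codes.append(v - 64 if v >= 32 else v)
    return codes[:count]


rng = random.Random(1)
codes = [rng.randint(-32, 31) for _ in range(4000)]
packed = pack6(codes)
ratio = 2 * len(codes) / len(packed)  # 16-bit PCM needs 2 bytes per sample
print(f"{len(packed)} bytes instead of {2 * len(codes)}: {ratio:.3f}:1")
```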

You just have to think these things through.

Cheers,
David.

EDIT: P.S. you can't do the same trick for pink noise, though pre-emphasis and noise shaped dither might help.

Reply #41 – 2004-09-22 13:03:08

Having made a facetious scientific post, can I now say this please...

I think the moderators need a slap in this thread.

The original poster was quite clearly providing this "as is" and said that he didn't know how useful or relevant it was. The attitude and swearing was way out of order - were you having a bad day dev0?

What's more, I reckon foolish assumptions about what could occur in "real music" let official MPEG listening tests give layers II and III quite glowing reports, while we know that certain artificial (though arguably quite musical) signals will cause terrible problems for these encoders. Dibrom should know that as well as anyone.

It is perfectly fair to say that some signals are unlikely to occur in most "music" and so, if they're difficult to fix, they're certainly not a priority. But to suggest that there's absolutely no point in fixing them because they don't come from a CD in your or anyone else's collection is being short sighted.

If someone decides to pop a bit of FM radio hiss (halfway between pink and white noise) at the start of a record that makes it into the US top ten next month, will you consider it then? If a synth sound built purely of impulses becomes popular, will you think it's worth tackling that problem in some codecs?

FWIW it seems to me that the slight faults in some codecs may be related to their behaviour with killer artificial signals. Don't some versions of Vorbis do strange things to tape hiss and stereo image, as shown in listening tests? Is this totally unrelated to what we have here?

If udial shows that a resampler is broken, isn't it better to fix it? It can't hurt, and you never know when it might help. Maybe it'll only help udial. So what? How much tuning was done on fatboy.wav alone!?

I've made my point. You may not agree. But I think, with all the caveats in place, there was no reason for the moderators to be so down on this thread.

Cheers,
David.

Reply #42 – 2004-09-23 01:23:08

Quote

The original poster was quite clearly providing this "as is" and said that he didn't know how useful or relevant it was. The attitude and swearing was way out of order ...

What's more, I reckon foolish assumptions about what could occur in "real music" let official MPEG listening tests give layers II and III quite glowing reports, while we know that certain artificial (though arguably quite musical) signals will cause terrible problems for these encoders.

...

It is perfectly fair to say that some signals are unlikely to occur in most "music" and so, if they're difficult to fix, they're certainly not a priority. But to suggest that there's absolutely no point in fixing them because they don't come from a CD in your or anyone else's collection is being short sighted.

If someone decides to pop a bit of FM radio hiss (halfway between pink and white noise) at the start of a record that makes it into the US top ten next month, will you consider it then? If a synth sound built purely of impulses becomes popular, will you think it's worth tackling that problem in some codecs?

Thanks for getting it right. It's one thing to snap at those providing unsubstantiated opinions without stating so, but when a genuine ABX case is presented as per the rules, I expect serious interest. Insults and immature behaviour unbecoming of a moderator are not acceptable at a site that purports to be an unbiased source of audio encoding knowledge.

Cheers,
CD

Reply #43 – 2004-10-06 16:04:17

Quote

To be honest: I'm a bit sick of statements like "oh, you're encoding artificial noise! But encoders are tuned to code music". In fact, pink/white noise is pretty natural - think of percussion (cymbal, snare) or the applause sample. Or think of electronic music that makes use of noise generators....

I agree, pink/white noise is perfectly natural. Which is why some of the comments in this thread really confuse me. I've been sort of watching this thread, mostly because I have a good deal of relaxation stuff based on nature sounds (rain, ocean surf, etc). Some has music, some doesn't. Anyways, I plan on transcoding all of my audio to Vorbis for use on my new Rio Karma and have been trying to decide how low a bitrate I can get away with, and because of this am reviewing this thread with greater interest since it sounds like something that may actually be problematic if I'm not careful. Hopefully this thread hasn't died a permanent death due to lack of interest.

Quote

Thanks for getting it right. It's one thing to snap at those providing unsubstantiated opinions without stating so, but when a genuine ABX case is presented as per the rules, I expect serious interest. Insults and immature behaviour unbecoming of a moderator are not acceptable at a site that purports to be an unbiased source of audio encoding knowledge.

Honestly hate having to say it, but you are right. Well said CD. Hope I don't get in trouble for saying that.
