Topic: VBR - how does it really work? (Read 7460 times)

VBR - how does it really work?

(This should be the right forum. Sadly I'm not entirely sure, so excuse me if it's not.)

Wikipedia says:
"VBR allows a higher bitrate (and therefore more storage space) to be allocated to the more complex segments of media files while less space is allocated to less complex segments."

"The bits available are used more flexibly to encode the sound or video data more accurately, with fewer bits used in less demanding passages and more bits used in difficult-to-encode passages."


And I'm looking for something a little less vague. I know people here have a bunch of experience with this so hopefully someone could teach me a thing or two.

Basically, this is more or less what I want to know: what types of music and sound require the most (and least) bits?

I'm wondering because a few days ago, I looked through my newly created VBR V4 MP3s to see which ones had the highest and the lowest average bitrate - and was very surprised. The three or four songs with the highest average bitrate were all from this Deep Purple singles collection CD released in '94 (my first CD!), except for one which was a (fairly low quality) demo version of a song. The two songs with the highest average bitrate each showed 209 kbps (and, as I said, they were all VBR V4s).

But what also sort of confused me was when I looked at the FLAC files these had been converted from. Now, I know that this is not the FLAC forum, but discussing the differences between the formats shouldn't be too off topic in this case, I hope. None of the songs from the Deep Purple CD (except for one, which was a live version) had a bitrate above 1000 kbps in that case. The demo song showed 725 kbps. The FLAC file with the highest bitrate showed 1202, but as an MP3, it's 158 kbps (again, VBR V4).

So I assume this is because of differences between FLAC and MP3. However, I don't know that much about FLAC so I won't bother going into what I think the differences might be. Instead, I'm sort of asking you.

But again, the main thing I'm wondering is why relatively calm classic rock songs from the '70s require a higher bitrate than progressive death metal from '08. I mean, you'd think more sound = more bits - but that's obviously not true. Is this because you can't actually hear all the sound? Or something...? Bah. I should stop guessing, and just post this.

Reply #1
I'm by no means an expert on audio encoding like some people here but my guess would be that the seeming opposites in file sizes are because mp3 and flac are totally different methods of encoding.

Mp3 compresses audio by introducing distortion. In loud, "messy" tracks like death metal, this distortion may be more likely to be hidden by the style of music, whereas in quiet tracks the distortion would be more apparent, so the bitrate needs to be higher.

Flac compresses music whilst maintaining bit-perfect replication. This is a totally different requirement, and means that music with lots of quiet passages and simple sounds like classical compress more easily than loud noisy tracks with more random noise in.

I'm totally guessing all this so someone please step in and tell me why it is all wrong :-)

Sam

Reply #2
Mp3 compresses audio by introducing distortion. In loud, "messy" tracks like death metal, this distortion may be more likely to be hidden by the style of music, whereas in quiet tracks the distortion would be more apparent, so the bitrate needs to be higher.

umm, no.

Neither mp3 compression in general nor VBR specifically compress by introducing distortion. There are several kinds of compression going on, and this is by no means a comprehensive list, just a few of the major ones.

There is simple repeating string compression - 10 zero bits in a row in a wav file are replaced by an expression equivalent to 10x0. The more changes there are in the music, the less this can be done. (This applies to FLAC, mp3 and zip files)
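A minimal sketch of the "10x0" idea described above, written as plain run-length encoding in Python. The function names are made up for illustration, and this is not taken from any real codec; it only shows why data that changes often leaves fewer runs to collapse.

```python
from itertools import groupby


def run_length_encode(bits):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(bits)]


def run_length_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


steady = "0" * 10        # ten zero bits in a row
busy = "0101101001"      # same length, but it changes almost every bit

print(run_length_encode(steady))     # one pair describes the whole run
print(len(run_length_encode(busy)))  # many short runs, so little is saved
```

The steady input shrinks to a single pair, while the busy input needs almost one pair per bit, which is the "more changes, less compression" point in the post.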

There is stereo difference compression - in a given slice of time, the difference between the L & R channels is calculated and only one channel plus the difference is kept, rather than 2 full channels. The greater the difference, the more bits are needed to describe it.
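As a rough illustration of the stereo difference idea, here is a Python sketch that keeps the left channel plus the left-minus-right difference. The sample values and helper names are invented for the example, and real encoders use more refined schemes; the point is only that similar channels give a small difference signal that needs few bits.

```python
def encode_left_plus_difference(left, right):
    """Keep the left channel and the per-sample difference left - right."""
    diff = [l - r for l, r in zip(left, right)]
    return left, diff


def decode_left_plus_difference(left, diff):
    """Rebuild the right channel exactly from left and the difference."""
    right = [l - d for l, d in zip(left, diff)]
    return left, right


def bits_per_sample(samples):
    """Bits for a signed integer wide enough to hold the largest magnitude."""
    peak = max(abs(s) for s in samples)
    return peak.bit_length() + 1


left = [1000, 1200, -900, 400]
right_similar = [998, 1203, -901, 400]    # nearly the same as left
right_wide = [-1000, 300, 900, -400]      # very different from left

_, diff_similar = encode_left_plus_difference(left, right_similar)
_, diff_wide = encode_left_plus_difference(left, right_wide)

print(bits_per_sample(right_similar), bits_per_sample(diff_similar))
print(bits_per_sample(right_wide), bits_per_sample(diff_wide))
```

With the similar channels, the difference fits in far fewer bits than the right channel itself; with the wide stereo example, storing the difference saves nothing.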

There is sample to sample difference - the difference between samples is measured and if possible, only the difference is kept. The more similar the samples, the fewer the bits needed to describe it.

This difference is measured both as frequency and amplitude. So a single note held at the same volume over several measures takes fewer bits to describe than complex music with rapidly changing volume.
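The sample-to-sample difference idea can be sketched as simple delta coding in Python. This covers only the amplitude side of the description, the sample values are invented, and the helper names are hypothetical.

```python
from itertools import accumulate


def delta_encode(samples):
    """Store the first sample as-is, then only the change from the previous one."""
    deltas = []
    previous = 0
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """A running sum of the deltas restores the original samples."""
    return list(accumulate(deltas))


held = [5000, 5001, 5000, 4999, 5000]     # steady level, tiny changes
jumpy = [5000, -3000, 4000, -4500, 100]   # level changes wildly

print(delta_encode(held))
print(delta_encode(jumpy))
```

After the first value, the steady signal turns into deltas of magnitude 1, while the jumpy signal produces deltas larger than the samples themselves.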

There is frequency substitution:

- very high or low frequencies outside the normal human hearing range are mapped to lower or higher frequencies respectively that are in hearing range, or are simply ignored. The extent to which this happens can be set.

- some frequencies cannot be easily distinguished from others and therefore can be lumped together (part of psychoacoustics)

- some frequencies can be masked by others, particularly if they occur at very different amplitudes. The one that is masked may be ignored or fewer bits may be used to describe it. (part of psychoacoustics)

Transient blurring: some changes occur so fast that they can be blurred or minimized without apparent loss of quality.

Quality settings determine the extent of frequency substitution. The more it is done, the fewer bits are used. Same for transient blurring.

Silence : Silence or near silence can require few or no bits in mp3. The more silence or near silence a piece of music has, the fewer bits needed to describe it.
EAC secure | FLAC  --best -V -b 4096 | LAME 3.97 -V0 -q0 -b32

Reply #3
Mp3 compresses audio by introducing distortion. In loud, "messy" tracks like death metal, this distortion may be more likely to be hidden by the style of music, whereas in quiet tracks the distortion would be more apparent, so the bitrate needs to be higher.

That is correct.

Another factor may be that the 1994 CD could contain a fair amount of tape hiss that falls well below lowpass of V4, whereas metal is rich in high-freq noise above the lowpass.

Reply #4

Mp3 compresses audio by introducing distortion. In loud, "messy" tracks like death metal, this distortion may be more likely to be hidden by the style of music, whereas in quiet tracks the distortion would be more apparent, so the bitrate needs to be higher.


umm, no.

Neither mp3 compression in general nor VBR specifically compress by introducing distortion.


Yes they do.

- some frequencies cannot be easily distinguished from others and therefore can be lumped together (part of psychoacoustics)

- some frequencies can be masked by others, particularly if they occur at very different amplitudes. The one that is masked may be ignored or fewer bits may be used to describe it. (part of psychoacoustics)

Transient blurring: some changes occur so fast that they can be blurred or minimized without apparent loss of quality.


These all are adding distortion, specifically quantization noise.
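To make "quantization noise" concrete, here is a small Python sketch that rounds samples to a fine and a coarse grid and measures the error. The step sizes and sample values are arbitrary choices for illustration, not anything an MP3 encoder actually uses.

```python
def quantize(samples, step):
    """Snap each sample to the nearest multiple of `step`."""
    return [step * round(s / step) for s in samples]


def max_error(original, quantized):
    """Largest absolute difference between original and quantized samples."""
    return max(abs(o - q) for o, q in zip(original, quantized))


samples = [1234, -5678, 9012, 3456]
fine = quantize(samples, 16)       # many levels: more bits, small error
coarse = quantize(samples, 1024)   # few levels: fewer bits, large error

print(fine, max_error(samples, fine))
print(coarse, max_error(samples, coarse))
```

The coarser grid needs fewer distinct levels (and so fewer bits) to describe the signal, but the rounding error, which is the added noise, grows with the step size.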

Reply #5
As others already mentioned, one aspect of how MP3 works is that it stores parts of the signal "less accurately" if the inaccuracy cannot be heard. Just to get this right from the start: MP3 does NOT work by "removing" things from the audio (except for the lowpass). It works by storing inaudible things less accurately.

To help find out what is audible and what isn't, it uses the so-called "ATH curve" (absolute threshold of hearing), a frequency curve describing how sensitive we are to individual frequencies. In addition to this, it uses other tricks... for example, inaccuracies immediately after (edit) a transient tend to be temporarily masked by the transient.

So, part of how MP3 works is that it "hides" inaccuracies behind other sounds which mask them. This means that sound material which is very complex tends to offer lots of opportunities for making use of masking. A solo instrument playing quietly, in turn, doesn't offer many opportunities for that: when listening, we will pay close attention to just that single instrument - and therefore notice inaccuracies more easily.

With lossless codecs, the opposite is the case. They cannot make use of masking because they aren't allowed to be lossy. Therefore, they MUST encode everything, no matter what is thrown at them. This in turn means that they compress "simple" sound material more easily, but "highly complex" material less well. In addition to this, ultrahigh frequencies require many more bits to encode than low-to-high freqs - this applies to both MP3 and lossless. HOWEVER, MP3, being lossy, can make use of masking and apply a lowpass (CD audio can store up to about 22 kHz, but humans typically can only hear up to 17-18 kHz in the case of TEST TONES - with music, it's even lower because of masking). Lossless codecs don't have the luxury of spending fewer bits on ultrahigh freqs, since they are not allowed to be lossy.

- Lyx
I am arrogant and I can afford it because I deliver.

Reply #6
These all are adding distortion, specifically quantization noise.


I won't argue the point - an mp3 is a distortion of the original wav, and the distortion in this context is regular and repeatable, though not reversible. As I see it the distortion in this context is the process, not just noise introduced to make the signal easier to compress. Or maybe I'm just looking at it the wrong way.

All mp3s are at some point a trade-off between compression ratio and quantization noise - any time you reduce to a lower bit depth it's going to be an issue. That's the point of VBR - using fewer bits where it's least noticeable. But yes, it is a distortion of the original signal to achieve compression.
EAC secure | FLAC  --best -V -b 4096 | LAME 3.97 -V0 -q0 -b32


Reply #8
As I see it the distortion in this context is ... not just noise introduced to make the signal easier to compress.

...

But yes it is a distortion of the original signal to achieve compression.


Confused!


Anyway, the answer to the original question is simple:

The bitrate of a FLAC file is determined by the coefficients needed to attempt to predict it mathematically, plus the difference between that prediction and the original file. More "predictable" = lower bitrate.
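A hedged sketch of the "prediction plus difference" idea in Python. It uses one fixed second-order predictor (each sample is guessed as 2*x[n-1] - x[n-2]) as a simplification; a real FLAC encoder picks among several predictors and then entropy-codes the residual, which is not shown here. The test signals are synthetic.

```python
import math
import random


def residual_order2(samples):
    """Keep two warm-up samples, then store only the prediction error."""
    warmup = samples[:2]
    residual = [
        samples[n] - (2 * samples[n - 1] - samples[n - 2])
        for n in range(2, len(samples))
    ]
    return warmup, residual


def reconstruct_order2(warmup, residual):
    """Re-run the same predictor and add the stored error back: lossless."""
    samples = list(warmup)
    for error in residual:
        samples.append(2 * samples[-1] - samples[-2] + error)
    return samples


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
tone = [round(10000 * math.sin(2 * math.pi * n / 100)) for n in range(1000)]
noise = [rng.randint(-10000, 10000) for _ in range(1000)]

tone_warmup, tone_residual = residual_order2(tone)
noise_warmup, noise_residual = residual_order2(noise)

print(mean_abs(tone), mean_abs(tone_residual))
print(mean_abs(noise), mean_abs(noise_residual))
```

For the smooth tone the residual is tiny compared with the signal, so it can be stored in few bits; for the noise the predictor is no help at all and the residual is even larger than the input.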

The bitrate of a VBR mp3 file is determined by the quantisation levels required to keep the added noise below the masked threshold calculated by the psychoacoustic model. Less audio above the masked threshold = lower bitrate.
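A toy model of that rule in Python, assuming the common rule of thumb that each extra bit of a uniform quantizer lowers the noise by roughly 6 dB. The band levels and masked thresholds below are invented numbers, and this is not how LAME or any real psychoacoustic model computes them; it only shows how "signal above the masked threshold" translates into bits.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per extra quantizer bit


def bits_for_band(signal_db, masked_threshold_db):
    """Bits needed to push quantization noise below the masked threshold."""
    headroom_db = signal_db - masked_threshold_db
    if headroom_db <= 0:
        return 0  # the whole band is masked, so spend nothing on it
    return math.ceil(headroom_db / DB_PER_BIT)


def bits_for_frame(bands):
    """Sum the per-band cost; `bands` is a list of (signal_db, threshold_db)."""
    return sum(bits_for_band(signal, threshold) for signal, threshold in bands)


# Hypothetical frames: (signal level, masked threshold) per band, in dB.
dense_frame = [(80, 70), (78, 72), (75, 74), (60, 66)]    # loud, heavy masking
sparse_frame = [(60, 20), (40, 15), (30, 12), (10, 14)]   # quiet, little masking

print(bits_for_frame(dense_frame))
print(bits_for_frame(sparse_frame))
```

In this model the loud, dense frame is cheaper than the quiet, sparse one, because most of its content sits close to or below its own masked threshold. That matches the original poster's observation that "more sound" does not automatically mean "more bits".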

The two aren't that well related, so the original poster's results aren't that surprising.


Where both codecs drop the bitrate dramatically is with mono files and silent files; these contain 50% and 100% redundant information respectively. Both codecs support joint stereo; both codecs find little to waste bits on in silence!

Cheers,
David.

 