.ape vs other lossless types

2007-12-08 18:33:22

I was hoping someone can explain to me why Monkey's Audio will compress a .wav file so much more then another standard LZ compression algorithm, say UltimateZip for example? Any help is greatly appreciated Thanks!

.ape vs other lossless types

Reply #1 – 2007-12-08 18:35:10

Perhaps the SEARCH function would help!

.ape vs other lossless types

Reply #2 – 2007-12-08 18:36:10

Quote from: jimhaddon on 2007-12-08 18:35:10

Perhaps the SEARCH function would help!

Ive been searching google for a long time for an answer, what do you recommend the search keywords be?
Im not really looking for a comparison, im just wondering how Monkey's Audio or another good .wav compressor is so much more efficient then one like UltimateZip. When i encoded the same .wav with UltimateZip i got about 20% compression, but with Monkeys Audio i got about 58%, and i was hoping someone could tell me why

.ape vs other lossless types

Reply #3 – 2007-12-08 18:59:58

Because most general file compressors are not optimized to encode audio files. Also these files are not optimized to be played back. Therefore we have all the lossless formats. Theres plenty of information on wikipedia.

I read somewhere that the wavpack algorithm should be implemented in some zipper (forgot which one). It's a nice idea, but I think the benefit of it is too low to make a big deal out of it.

.ape vs other lossless types

Reply #4 – 2007-12-08 19:01:24

http://wiki.hydrogenaudio.org/index.php?title=Lossless

Quote from: odyssey on 2007-12-08 18:59:58

I read somewhere that the wavpack algorithm should be implemented in some zipper (forgot which one). It's a nice idea, but I think the benefit of it is too low to make a big deal out of it.

WinZip does now use WavPack I believe.

.ape vs other lossless types

Reply #5 – 2007-12-08 19:10:36

Quote from: jcl43 on 2007-12-08 18:36:10

Quote from: jimhaddon on 2007-12-08 18:35:10

Perhaps the SEARCH function would help!

Ive been searching google for a long time for an answer, what do you recommend the search keywords be?
Im not really looking for a comparison, im just wondering how Monkey's Audio or another good .wav compressor is so much more efficient then one like UltimateZip. When i encoded the same .wav with UltimateZip i got about 20% compression, but with Monkeys Audio i got about 58%, and i was hoping someone could tell me why

Hey thanks, thats basically everything i needed to know. For some reason i couldnt think to wiki on audio compression, i kept looking for lossless in general and specific programs. Thanks a million for your help!

.ape vs other lossless types

Reply #6 – 2007-12-08 22:02:57

Quote from: Synthetic Soul on 2007-12-08 19:01:24

http://wiki.hydrogenaudio.org/index.php?title=Lossless

I believe I wrote the lead note explaining on that HAK article...

So I won't repeat that info here

.ape vs other lossless types

Reply #7 – 2007-12-08 22:29:16

Roughly speaking the LZ compression algorithm looks for sequences of bytes that existed in the file before and uses an alternative bit saving representation of this byte sequence by using kind of a pointer to the previously present identical sequence.
This is a good procedure for compressing text and other contents, but not to compress audio material.

Lossless codecs like Monkey do a prediction of the next wave sample on the basis of the previous samples. They also make use of the correlation between the left and right channel. They code the prediction parameters to be able to do the prediction when decoding, and they encode the prediction error by means of techiques similar to those used in the LZ kind of algorithms but with specific details most adequate to this context.

.ape vs other lossless types

Reply #8 – 2007-12-08 23:58:25

Matt Ashland, the creator of Monkey's Audio, explains the compression system like this:

Quote

Theory

This document is designed to give those interested in lossless audio compression a primer on the basics of a lossless audio compressor.
This document is by no means complete, but hopefully you will find it useful and maybe even interesting.

Digital Audio:

Sound is simply a wave, and digital audio is the digital representation of this wave. This is achieved by "sampling" the magnitude of an analog signal many times per second. This can be thought of conceptually as recording the "height" of the wave many times per second. Today's audio CD's store 44,100 samples per second. Since CD's are in stereo, they store both a left and a right value 44,100 times per second. These values are represented by 16 bit integers. Basically a WAV file is a header, followed by an array of R, L, R, L…. Since each sample takes 32 bits (16 for the left, and 16 for the right), and there are 44,100 samples per second, one second of audio takes 1,411,200 bits, or 176,400 bytes.

Lossless Compression:

1) Conversion to X,Y

The first step in lossless compression is to more efficiently model the channels L and R as some X and Y values. There is often a great deal of correlation between the L and R channels, and this can be exploited several ways, with one popular way being through the use of mid / side encoding. In this case, a mid (X) and a side (Y) value are encoded instead of a L and a R value. The mid (X) is the midpoint between the L and R channels and the side (Y) is the difference in the channels. This can be achieved:

X = (L + R) / 2

Y = (L - R)

2) Predictor

Next, the X and Y data is passed through a predictor to attempt to remove any redundancy. Basically, the goal of this stage is to make the X and Y arrays contain the smallest possible values while still remaining decompressible. This stage is what separates one compression scheme from another. There are virtually countless ways to do this. Here is a sample using simple linear algebra:

PX and PY are the predicted X and Y; X-1is the previous X value; X-2 is the X value two back

PX = (2 * X-1) - X-2

PY = (2 * Y-1) - Y-2

As an example, if X = (2, 8, 24, ?); PX = (2 * X-1) - X-2 = (2 * 24) - 8 = 40

Then, these predicted values are compared with the actual value and the difference (error) is what gets sent to the next stage for encoding.

Most good predictors are adaptive, so that they adjust to how "predictable" the data currently is. For example, let's use a factor 'm' that ranges from 0 to 1024 (0 is no prediction and 1024 is full prediction). After each prediction, m is adjusted up or down depending on whether the prediction was helpful or not.. So in the previous example, what leaves the predictor is this

X = (2, 8, 24, ?)

PX = (2 * X-1) - X-2 = (2 * 24) - 8 = 40

If ? = 45 and m = 512, then [Final Value] = ? - (PX * m / 1024) = 45 - (40 * m / 1024) = 45 - (40 * 512 / 1024) = 45 - 20 = 25

After this m would be adjusted upward because a higher m would have been more efficient

Using different prediction equations and using multiple passes through the predictor can make a fairly substantial difference in compression level. Here is a quick list of some prediction equations as shown in the Shorten technical documentation (for different orders):

P0 = 0

P1 = X-1

P2 = (2 * X-1) - X-2

P3 = (3 * X-1) - (3 *X-2) + X-3

3) Encoding of Data / Rice coding

The goal behind audio compression is to make all of the numbers as small as possible by removing any correlation that may exist between them…once this is achieved the resulting numbers must be written to disk. One of (if not the) most efficient way to do this is with rice coding.

Why are smaller numbers better? They are better because they take less bits to represent. For example, say we want to encode this array of numbers (32 bit longs):

Base 10: 10, 14, 15, 46

or in binary

Base 2: 1010, 1110, 1111, 101110

Now obviously if we want to represent these numbers in the fewest possible bits, it would be quite inefficient to represent them each as separate longs with 32 bits apiece. That would take 128 bits, and just from looking at the same numbers represented in base two, it is obvious that there must be a better way. The ideal thing would be just to slap the four numbers together using the least bits necessary, so 1010, 1110, 1111, 101110 without the commas would be 101011101111101110. The problem here is that we don't know where one number starts and the next begins. This is where rice coding comes into play.

Rice coding is a way of using less bits to represent small numbers, while still maintaining the ability to tell one from the next. Basically it works like this:

1) You make your best guess as to how many bits a number will take, and call that k

2) Take the rightmost k bits of the number and remember what they are

3) Imagine the binary number without those rightmost k bits and look at its new value (this is the overflow that doesn't fit in k bits)

4) Use these values to encode the number… This encoded value is represented as a number of zeroes corresponding to step 3, then a terminating 1 to tell that your done sending the "overflow", then the k bits from step 2.

Let's work through our example, and try to encode the fourth number in our series 10, 14, 15, 46.

1) You make your best guess as to how many bits a number will take, and call that k: since the previous 3 numbers took 4 bits, that seems like a reasonable guess so we will set k = 4

2) Take the rightmost k bits of the number and remember what they are: The right 4 bits of 46 (101110) are 1110

3) Imagine the binary number without those rightmost k bits and look at its new value (this is the overflow that doesn't fit in k bits): When you take the 1110 away from the right of 101110 you are left with 10 or 2 (in base 10)

4) Use these values to encode the number… So, we put two 0's, followed by the terminating 1, followed by the k bits 1110…altogether we have 0011110

Now to undue this operation, we just take 0011110 and k = 4 and work our way backwards… We first see that the overflow is 2 (there are two zeroes before the terminating 1) We also see that the last four bits = 1110. So, we take the value 10 (the overflow) and the values 1110 (the k) and just do a little shifting and volah! (overflow is shifted << k bits)

Here is a little more technical and mathematical description of the same process:

Assuming some integer n is the number to encode, and k is the number of bits to encode directly.

1) sign (1 for positive, 0 for negative)

2) n / (2k) 0's

3) terminating 1

4) k least significant bits of n

As an example, if n = 578 and k = 8: 100101000010

1) sign (1 for positive, 0 for negative) = [1]

2) n / (2k) 0's: n / 2k = 578 / 256 = 2 = [00]

3) terminating 1: [1]

4) k least significant bits of n: 578 = [01000010]

5) put the 1-4 together: [1][00][1][01000010] = 100101000010

During the encode process, the optimum k is determined by looking at the average value over the past however many values (16 - 128 works well), and choosing the optimum k for that average. (basically it's guessing what the next value will be, and trying to choose the most efficient k based on that) The optimum k can be calculated as [log(n) / log(2)].

Source: http://www.monkeysaudio.com/theory.html

Notice