Topic: Choosing frame size

Choosing frame size

I've written a simple codec (lossless: prediction + Golomb coding), but for now the frame size is hardcoded - the results differ a bit for various sizes, but not by much. Are there any articles about how to estimate a good frame size for compression? My first thought would be to defer framing to the prediction step and do it based on how the expected value changes, but maybe someone has tried it already?

Choosing frame size

Reply #1
AFAIK, most lossless codecs use trial-and-error based approaches.

I'm not sure what you mean by "and do it based on how the expected value changes". The predictor coefficients you calculate will be quite dependent on the framing.

Choosing frame size

Reply #2
I've written a simple codec (lossless: prediction + Golomb coding), but for now the frame size is hardcoded - the results differ a bit for various sizes, but not by much. Are there any articles about how to estimate a good frame size for compression? My first thought would be to defer framing to the prediction step and do it based on how the expected value changes, but maybe someone has tried it already?


What kind of prediction are you using? If between samples, why do you care about frame size at all?
Unless you are transforming before prediction, of course.

Choosing frame size

Reply #3
AFAIK, most lossless codecs use trial-and-error based approaches.

I'm not sure what you mean by "and do it based on how the expected value changes". The predictor coefficients you calculate will be quite dependent on the framing.
I'm doing it the Shorten way, calculating predictor residuals from the last samples (orders 0 to 3). The expected value is a good way to estimate how big these residuals are - so if I recalculate it after every new sample, I can tell when it begins to grow significantly, which would mean that I should start a new frame. An example (artificial, but it should explain what I want):
Code:
i                  |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 |  9 | 10 | 11 | 12 | 13 | 14 |
------------------------------------------------------------------------------------------
Values             |  3 |  4 |  6 |  8 |  7 |  5 | 15 | 26 | 37 | 47 | 37 | 45 | 37 | 28 |
e1                 |  3 |  1 |  2 |  2 | -1 | -2 | 10 | 11 | 11 | 10 |-10 |  8 | -8 | -9 |
sum(abs(e1))(0..i) |  3 |  4 |  6 |  8 |  9 | 11 | 21 | 32 | 43 | 53 | 63 | 71 | 79 | 88 |
E(abs(e1))(0..i)   |  3 |  2 |  2 |  2 |  2 |  2 |  3 |  4 |  5 |  5 |  6 |  6 |  6 |  6 |
So now, when I see that the expected value E begins to rise after i = 6, it would probably be good to start a new frame. This way the 1-6 part would be encoded with shorter codewords, and the codewords for the 7-14 part would be more suitable for it (choosing a better n for the Rice code would make the unary part of the codeword shorter for most values, or even a predictor of a higher order might work better for this frame). Of course there would be a minimum frame length, because every frame needs a header, and extremely short frames would have a long header compared to their bodies. And if E didn't vary a lot, this would produce an extra long frame, so I wouldn't waste bytes on frame metadata.

The problem is, of course, how to detect that the expected value has begun to grow rapidly (calculating derivatives?) and whether it would make any difference - maybe such situations won't happen, or will happen only rarely?
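
For what it's worth, here is a minimal Python sketch of the idea: track the mean of |e1| over the frame so far, compare it with the mean of the last few residuals, and cut the frame when the recent residuals are clearly larger. The constants (WINDOW, RATIO, MIN_FRAME), the 1.0 floor and the exact cut point are all made up for illustration, not tuned on real audio.
Code:
# Sketch of the frame-splitting heuristic described above.
# WINDOW, RATIO and MIN_FRAME are illustrative values, not tuned.
WINDOW = 4       # how many recent residuals to average
RATIO = 2.0      # "recent mean exceeds RATIO x past mean" triggers a split
MIN_FRAME = 16   # never close a frame shorter than this

def split_frames(samples):
    """Return (start, end) sample-index pairs of frames, starting a new frame
    when the running mean of |e1| (first-order residual) suddenly grows."""
    frames = []
    start = 0        # first sample of the current frame
    abs_sum = 0      # sum of |e1| inside the current frame
    recent = []      # magnitudes of the last few residuals
    prev = 0
    for i, x in enumerate(samples):
        e1 = x - prev                          # first-order prediction residual
        prev = x
        abs_sum += abs(e1)
        recent = (recent + [abs(e1)])[-WINDOW:]
        n = i - start + 1                      # samples in the frame so far
        past = n - len(recent)                 # samples before the recent window
        if past >= MIN_FRAME:
            past_mean = (abs_sum - sum(recent)) / past   # E(|e1|) before the jump
            recent_mean = sum(recent) / len(recent)      # E(|e1|) right now
            # The residuals just got much bigger than what the frame has seen:
            # close the old frame before the jump and restart the statistics.
            if recent_mean > RATIO * max(past_mean, 1.0):  # 1.0 floor guards silence
                cut = i - len(recent)          # last sample of the old frame
                frames.append((start, cut))
                start = cut + 1
                abs_sum = sum(recent)          # new frame begins with the recent residuals
    frames.append((start, len(samples) - 1))   # whatever is left becomes the last frame
    return frames
In practice you would probably also cap the maximum frame length and tune the three constants on real material.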

What kind of prediction are you using? If between samples, why do you care about frame size at all?
That's what I'm asking about - do such situations as I described above happen in a normal audio signal (or better, do they happen often)?

Choosing frame size

Reply #4
Ok, I think I get what you mean. You want to use arbitrary-length frames and decide to start a new frame when you observe that the signal is suddenly getting much more/less predictable. I don't see anything wrong with this as part of the total strategy, but there are a few caveats:

1) The predictor coefficients are initially calculated on an entire frame, correct? If you now split this, you have to recalculate the predictor and get the new residuals (which you would then expect to be smaller again, though).

Edit: Hmm, Shorten doesn't really have the LPC-style predictor I was thinking of, but a fixed set of smaller predictors. You can replace "predictor coefficients" with "selected predictor" and it works out approximately the same.

2) You would still have to try to split large, badly predictable blocks into several smaller ones, because it might be possible to find two predictors that produce small residuals for a block that produces large residuals with a single predictor.

Note that doing (2) supersedes (1), which is probably why most lossless codecs aren't doing what you propose. On the other hand, if you have arbitrary-sized frames, then doing a full search for (2) is prohibitive, and getting some assistance from heuristics like the one you describe might help.
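
To make the trial-and-error angle concrete, here is a rough Python sketch of the kind of search (1) and (2) imply with Shorten-style fixed predictors (orders 0-3): estimate the Rice-coded size of a block for each predictor order, then recursively check whether coding the two halves separately is cheaper than coding the block whole. The header size, the minimum block length, the halving strategy and the names are assumptions for illustration, not how any particular codec does it.
Code:
# Rough cost-based block-splitting search with Shorten-style fixed predictors.
# HEADER_BITS and MIN_BLOCK are illustrative guesses.
HEADER_BITS = 32
MIN_BLOCK = 256

def residuals(block, history, order):
    """Residuals of the fixed predictor of the given order (0..3),
    warmed up with the last three samples preceding the block."""
    h = ([0, 0, 0] + list(history))[-3:] + list(block)
    out = []
    for n in range(3, len(h)):
        if order == 0:
            pred = 0
        elif order == 1:
            pred = h[n - 1]
        elif order == 2:
            pred = 2 * h[n - 1] - h[n - 2]
        else:
            pred = 3 * h[n - 1] - 3 * h[n - 2] + h[n - 3]
        out.append(h[n] - pred)
    return out

def rice_bits(res, k):
    """Bits needed to Rice-code the residuals with parameter k
    (zig-zag mapping of signed values, unary quotient + k raw bits)."""
    bits = 0
    for e in res:
        u = 2 * e if e >= 0 else -2 * e - 1
        bits += (u >> k) + 1 + k
    return bits

def block_cost(block, history):
    """Cheapest (bits, order, k) over the four predictors and a range of k."""
    best = None
    for order in range(4):
        res = residuals(block, history, order)
        for k in range(30):
            c = HEADER_BITS + rice_bits(res, k)
            if best is None or c < best[0]:
                best = (c, order, k)
    return best

def plan(block, history=()):
    """Recursively decide whether coding `block` whole or as two halves is cheaper.
    Returns (total_bits, list_of_frame_lengths)."""
    whole = block_cost(block, history)[0]
    if len(block) < 2 * MIN_BLOCK:
        return whole, [len(block)]
    mid = len(block) // 2
    lcost, lsizes = plan(block[:mid], history)
    rcost, rsizes = plan(block[mid:], block[:mid])
    if lcost + rcost < whole:
        return lcost + rcost, lsizes + rsizes
    return whole, [len(block)]
A real search would of course prune this heavily (e.g. only try block sizes from a small fixed set), which is exactly the "prohibitive full search vs. heuristic assistance" trade-off above.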

 

Choosing frame size

Reply #5
A frame is mostly used when you take a bunch of consecutive samples and transform them (or LPC them).
From what I understand, in your case you switch frames simply because the dynamics of your prediction change.

IMO, this is OK, but it's not strictly necessary. What you want to do is to signal that you are switching coding tables. This could be done (forward-adaptively) by using an explicit escape code, or (backward-adaptively) when the decoder observes that the representation of a residual is too long. In the first case you send extra bits beforehand; in the second you don't send side information, but you pay the price of a residual sample that is longer because it is encoded with a suboptimal table.
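
As a toy illustration of the forward-adaptive variant (an explicit escape announcing the new coding table), here is a sketch in which the "bitstream" is just a list of tokens rather than real bits, and the switching rule is invented for the example; in a real codec the escape would be a reserved codeword.
Code:
def zigzag(e):
    """Map signed residuals to non-negative integers for Rice coding."""
    return 2 * e if e >= 0 else -2 * e - 1

def encode_forward_adaptive(residuals, k=2, window=8):
    """Emit ('res', quotient, k) tokens; when the recent residual magnitudes
    suggest a clearly better Rice parameter, emit an explicit ('esc', new_k)
    token so the decoder can follow along without guessing."""
    out, recent = [], []
    for e in residuals:
        u = zigzag(e)
        out.append(("res", u >> k, k))          # unary quotient + k raw bits
        recent = (recent + [u])[-window:]
        if len(recent) == window:
            mean = sum(recent) / window
            better_k = max(int(mean).bit_length() - 1, 0)  # crude k estimate
            if abs(better_k - k) > 1:           # only pay the escape when it's worth it
                k = better_k
                out.append(("esc", k))          # announce the new table to the decoder
    return out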

Another possibility is to switch predictors, with the same mechanisms.

Edit: I see that this has been suggested...

And another is to use an arithmetic coder and instead adapt the probability model to your X most recent samples.

Rice is risky, and it makes sense only if you are working on a hardware implementation (and even in that case I think it's not worth the trouble). I would at least look into a more general Golomb code. There are low-complexity methods to estimate the parameter on the fly (see for example the original LOCO-I/JPEG-LS paper by Weinberger et al.).
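
For reference, the on-the-fly estimate in LOCO-I/JPEG-LS boils down to picking the smallest k such that N * 2^k >= A, where N counts the residuals seen so far (per context) and A accumulates their magnitudes. A small sketch, feeding it the e1 values from the table above; the initial counter values are an arbitrary choice here, and the periodic counter reset used by JPEG-LS is omitted:
Code:
def golomb_k(A, N):
    """Smallest k such that N * 2**k >= A (the LOCO-I / JPEG-LS rule)."""
    k = 0
    while (N << k) < A:
        k += 1
    return k

# Backward-adaptive use: encoder and decoder both update A and N after every
# residual, so no side information is needed. (JPEG-LS additionally halves the
# counters periodically so the estimate tracks recent statistics; omitted here.)
A, N = 4, 1        # small non-zero start values, an arbitrary choice
for e in (3, 1, 2, 2, -1, -2, 10, 11, 11, 10):
    k = golomb_k(A, N)
    # ... code the residual e with Rice/Golomb parameter k here ...
    A += abs(e)
    N += 1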