Choosing frame size
Reply #3 – 2011-02-10 23:02:46
Quote: AFAIK, most lossless codecs use trial-and-error based approaches. I'm not sure what you mean by "and doing it based on how the expected value changes". The predictor coefficients you calculate will be quite dependent on the framing.

I'm doing it the Shorten way, calculating predictor residuals from the last samples (the last 0 to 3 samples). The expected value of the absolute residual is a good way to estimate how big these values are, so if I recalculate it after every new sample I would be able to tell when it begins to grow significantly, which would mean that I should start a new frame. An example (artificial, but it should explain what I want; e1 is the difference from the previous sample, and E is rounded to the nearest integer):

i                  |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |  11 |  12 |  13 |  14 |
-------------------------------------------------------------------------------------------------------
Values             |   3 |   4 |   6 |   8 |   7 |   5 |  15 |  26 |  37 |  47 |  37 |  45 |  37 |  28 |
e1                 |   3 |   1 |   2 |   2 |  -1 |  -2 |  10 |  11 |  11 |  10 | -10 |   8 |  -8 |  -9 |
sum(abs(e1))(0..i) |   3 |   4 |   6 |   8 |   9 |  11 |  21 |  32 |  43 |  53 |  63 |  71 |  79 |  88 |
E(abs(e1))(0..i)   |   3 |   2 |   2 |   2 |   2 |   2 |   3 |   4 |   5 |   5 |   6 |   6 |   6 |   6 |

So now, when I see that after i = 6 the expected value E begins to rise, it would probably be good to start a new frame there. This way the 1-6 part would be encoded with shorter codewords, and the codewords for the 7-14 part would suit it better (choosing a better n for the Rice code would make the unary part of the codeword shorter for most values, and a higher-order predictor might even work better for this frame). Of course there would be a minimum frame length, because every frame needs a header, and extremely short frames would have a long header compared to their bodies. And if E didn't vary much, this would produce an extra-long frame, so I wouldn't waste bytes on frame metadata.

The problem is of course how to measure that the expected value has begun to grow rapidly (calculating derivatives?), and whether it would make any difference. Maybe such situations won't happen, or will happen rarely?

Quote: What kind of prediction are you using? If between samples, why do you care about frame size at all?

That's what I'm asking about: do situations like the one I described above happen in normal audio signals (or better, do they happen often)?
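To make the idea concrete, here is a minimal Python sketch of it, run on the sample values from the example above. It recomputes the first-order residuals directly from the samples, tracks the running mean of their absolute values, and compares the Rice-coded size of one frame against two frames. Everything beyond the basic idea is an assumption made for illustration: the function names, the split rule (`ratio` and `min_len` are hypothetical tuning knobs), the zigzag mapping of signed residuals, and the brute-force search over the Rice parameter `k` (the n mentioned above). Header bits are left out on purpose.

```python
def first_order_residuals(samples):
    """Order-1 residuals e1[i] = x[i] - x[i-1]; the sample before the first is assumed to be 0."""
    residuals, prev = [], 0
    for x in samples:
        residuals.append(x - prev)
        prev = x
    return residuals


def running_mean_abs(residuals):
    """Mean of |e| over residuals[0..i], for every i."""
    means, total = [], 0
    for count, e in enumerate(residuals, start=1):
        total += abs(e)
        means.append(total / count)
    return means


def find_split(residuals, min_len=4, ratio=1.5):
    """Hypothetical heuristic: return the first index i >= min_len at which the
    running mean jumps by more than `ratio` in a single step, else None."""
    means = running_mean_abs(residuals)
    for i in range(min_len, len(means)):
        if means[i] > ratio * means[i - 1]:
            return i
    return None


def rice_bits(value, k):
    """Bits for one residual: zigzag map to unsigned (an assumption, other
    sign conventions exist), then unary quotient, stop bit, k remainder bits."""
    u = 2 * value if value >= 0 else -2 * value - 1
    return (u >> k) + 1 + k


def best_rice_cost(residuals, max_k=15):
    """Smallest total bit count over all Rice parameters k in 0..max_k."""
    return min(sum(rice_bits(e, k) for e in residuals) for k in range(max_k + 1))


samples = [3, 4, 6, 8, 7, 5, 15, 26, 37, 47, 37, 45, 37, 28]
residuals = first_order_residuals(samples)

# For this data the heuristic fires, so split is an integer (not None).
split = find_split(residuals)
whole = best_rice_cost(residuals)
parts = best_rice_cost(residuals[:split]) + best_rice_cost(residuals[split:])

print("split before sample", split + 1)
print("one frame :", whole, "bits")
print("two frames:", parts, "bits (plus one extra header)")
```

On these 14 samples the split lands after sample 6, and the two-frame version needs 68 bits of residual data against 71 for a single frame. A saving of 3 bits would be swallowed by any realistic frame header, which fits the point about a minimum frame length; whether longer runs in real audio gain more is exactly the open question. A single outlier can also trigger this rule, so a smoothed or windowed estimate may behave better.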