Near-lossless / lossy FLAC
Reply #75 – 2007-06-15 19:10:44
Three rather unrelated but still on-topic comments:

(1) I'd like to note that it's not only the "frame size" that should match. This preprocessor and any lossless encoder exploiting zeroed LSBs should be in perfect sync (not only the same frame sizes but also the same frame boundary positions).

(2) It's nice to have those isolated tools ("simplifier" and lossless encoder), but this also limits the performance. So one should either go for a combined tool with variable-length blocks or a modified lossless encoder that is smart enough to detect varying "wasted_bits" and partition the stream accordingly.

(3) Here's another technical thought which might be interesting for Thomas in case he wants to add lossy support to TAK: Selecting "wasted_bits" to be an integer allows an encoder to control the signal-to-noise ratio in steps of 6 dB only. Compared to other lossy codecs (MP3 and AAC control the SNR in steps of roughly 1.1 dB = 1.5*(3/4) dB), this 6 dB step size is quite large.

This is an old idea of mine for getting more resolution: make it probabilistic. You can store in each frame or subframe (you might want to allow changing the resolution within a frame) the information "wasted_bits = x with probability p and x+1 with probability (1-p)" and use the same pseudo-random number generator in encoder and decoder to decide the "wasted_bits" value per sample. You should also think about generating the actual "wasted bits" via this RNG instead of zeroing them. This would be equivalent to subtractive dithering and avoids nonlinear distortions. Entropy coding might be a bit more complicated, though.

Per-sample coding could be done like this:

    wbits = minWasted + (RNG.nextfloat() > p ? 1 : 0);  // randomly chosen wasted bits count
    waste = RNG.nextIntBits(wbits);  // randomly generated LSBs
    quantized_to_code = round( (float)(current_sample - waste) / (1 << wbits) );  // sample to code
    quantized_actual = (quantized_to_code << wbits) + waste;  // dequantized sample

Of course, the encoder's RNG state should match the decoder's (i.e. same seed).

Good news: Noise shaping doesn't need to be part of the format specification but can be added to the encoder later without breaking anything.

Cheers! SG
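For anyone who wants to play with the per-sample scheme above, here is a minimal runnable sketch in Python. It is only an illustration under assumptions of mine: the function names (`draw`, `encode`, `decode`, `expected_noise_increase_db`) are made up, Python's `random.Random` stands in for whatever RNG a real format would specify, and the dB estimate assumes the idealized model where quantization noise power scales with the square of the step size. No entropy coding or clipping is included.

```python
import math
import random


def draw(rng, min_wasted, p):
    """Draw (wbits, waste) for one sample.

    Encoder and decoder must call this in exactly the same order
    so that their RNG states stay in sync.
    """
    # min_wasted with probability p, min_wasted + 1 with probability 1 - p
    wbits = min_wasted + (1 if rng.random() > p else 0)
    # pseudo-random LSBs instead of zeros (subtractive dither)
    waste = rng.getrandbits(wbits) if wbits > 0 else 0
    return wbits, waste


def encode(samples, min_wasted, p, seed):
    """Turn integer samples into the codes a lossless coder would store."""
    rng = random.Random(seed)
    codes = []
    for s in samples:
        wbits, waste = draw(rng, min_wasted, p)
        half = (1 << wbits) >> 1  # rounding offset, 0 when wbits == 0
        codes.append((s - waste + half) >> wbits)  # round to nearest step
    return codes


def decode(codes, min_wasted, p, seed):
    """Rebuild samples; needs the same seed, min_wasted and p as the encoder."""
    rng = random.Random(seed)
    out = []
    for c in codes:
        wbits, waste = draw(rng, min_wasted, p)
        out.append((c << wbits) + waste)
    return out


def expected_noise_increase_db(p):
    """Noise increase relative to always using min_wasted bits.

    Assumes noise power proportional to step size squared, so the mix is
    p * 1 + (1 - p) * 4 times the power of the finer quantizer.
    """
    return 10.0 * math.log10(p + 4.0 * (1.0 - p))
```

With p = 1 this degenerates to plain "min_wasted" truncation, with p = 0 to one bit more, and values in between land between 0 and about 6 dB under the stated noise model, which is the finer resolution the post is after. The reconstruction error per sample stays within half of the coarser step size.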