Justin, I found another bug that may be the culprit. I've gone back to finish the speedups of the libFLAC bitbuffer that I started a long time ago, and realized it is similar to flake's. Then I noticed a problem in bitwriter_writebits():
static inline void
bitwriter_writebits(BitWriter *bw, int bits, uint32_t val)
{
    assert(bits == 32 || val < (1U << bits));
    if(bits == 0) return;
    if(bw->eof || (bw->buf_ptr+3) >= bw->buf_end) {
        bw->eof = 1;
    } else {
        if(bits < bw->bit_left) {
            bw->bit_buf = (bw->bit_buf << bits) | val;
            bw->bit_left -= bits;
        } else {
            bw->bit_buf <<= bw->bit_left; ///////////// <-------- HERE
            bw->bit_buf |= val >> (bits - bw->bit_left);
            if(bw->buffer != NULL) {
                *(uint32_t *)bw->buf_ptr = be2me_32(bw->bit_buf);
            }
            bw->buf_ptr += 4;
            bw->bit_left += (32 - bits);
            bw->bit_buf = val;
        }
    }
}
If bits==32 and bw->bit_left==32, it will get to that line and try to shift left by 32 bits. In C, shifting a 32-bit value by 32 is undefined behavior; on x86 the shift count is masked to 5 bits, so in practice it is a NOP, and bit_buf will hold junk instead of 0 when the next statement ORs val into it.
I think it's just luck that it is rarely (never?) triggered. Most writes are <32-bit quantities, and you write the Rice unary portion 31 bits at a time. But some metadata writes are 32 bits wide, and if they are word-aligned there could be trouble.
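To make the shift issue concrete, here is a minimal sketch of a guarded shift. The name shl32_safe is hypothetical (it is not in flake or libFLAC); it only illustrates that a count of 32 has to be special-cased, because the plain << operator gives no guarantee for it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of flake or libFLAC.
 * Left shift that is well defined for any count in 0..32.
 * A plain (x << 32) on a 32-bit operand is undefined behavior in C,
 * and on x86 it typically leaves x unchanged. */
static inline uint32_t shl32_safe(uint32_t x, unsigned n)
{
    assert(n <= 32);
    return (n >= 32) ? 0 : (x << n);
}
```

With a helper like this the marked line could stay where it is, at the cost of one extra compare on every word flush.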
In my new writer I am doing something like this (translated back into flake style, so it may be buggy):
} else if(bw->bit_left<32) {
    bw->bit_buf <<= bw->bit_left; ///////////// <-------- OK now
    bw->bit_buf |= val >> (bits - bw->bit_left);
    if(bw->buffer != NULL) {
        *(uint32_t *)bw->buf_ptr = be2me_32(bw->bit_buf);
    }
    bw->buf_ptr += 4;
    bw->bit_left += (32 - bits);
    bw->bit_buf = val;
} else {
    assert(bits==32);
    bw->bit_buf = val;
    if(bw->buffer != NULL) {
        *(uint32_t *)bw->buf_ptr = be2me_32(bw->bit_buf);
    }
    bw->buf_ptr += 4;
}
It can actually be optimized more, but that's the idea.
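For anyone who wants to try the idea in isolation, here is a self-contained sketch of the fixed writer. Several things in it are assumptions rather than flake's actual code: the BitWriter struct layout is guessed from the field names used above, bitwriter_init and store_be32 are hypothetical helpers (store_be32 stands in for the be2me_32() store and avoids the unaligned pointer cast), and the NULL-buffer mode is left out, so the sketch needs a real output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: field names follow the snippet above, but the real
 * flake struct may differ. */
typedef struct {
    uint32_t bit_buf;   /* bits accumulated but not yet stored */
    int bit_left;       /* free bits remaining in bit_buf, 1..32 */
    int eof;            /* set once the output buffer is exhausted */
    uint8_t *buffer, *buf_ptr, *buf_end;
} BitWriter;

/* Hypothetical init helper: empty accumulator over a caller-owned buffer. */
static void bitwriter_init(BitWriter *bw, uint8_t *buf, size_t len)
{
    bw->buffer = bw->buf_ptr = buf;
    bw->buf_end = buf + len;
    bw->bit_buf = 0;
    bw->bit_left = 32;
    bw->eof = 0;
}

/* Hypothetical helper: store one 32-bit word in big-endian byte order. */
static void store_be32(uint8_t *p, uint32_t w)
{
    p[0] = (uint8_t)(w >> 24);
    p[1] = (uint8_t)(w >> 16);
    p[2] = (uint8_t)(w >> 8);
    p[3] = (uint8_t)w;
}

static void bitwriter_writebits(BitWriter *bw, int bits, uint32_t val)
{
    assert(bits >= 0 && bits <= 32);
    assert(bits == 32 || val < (1U << bits));
    if (bits == 0) return;
    /* Same condition as (buf_ptr+3) >= buf_end: fewer than 4 bytes left. */
    if (bw->eof || (bw->buf_end - bw->buf_ptr) < 4) {
        bw->eof = 1;
        return;
    }
    if (bits < bw->bit_left) {
        /* bits <= 31 here, so the shift is well defined. */
        bw->bit_buf = (bw->bit_buf << bits) | val;
        bw->bit_left -= bits;
        return;
    }
    if (bw->bit_left < 32) {
        /* Partial word pending: top it up with the high bits of val. */
        bw->bit_buf <<= bw->bit_left;
        bw->bit_buf |= val >> (bits - bw->bit_left);
    } else {
        /* Accumulator empty, so bits == 32 and val is the whole word. */
        bw->bit_buf = val;
    }
    store_be32(bw->buf_ptr, bw->bit_buf);
    bw->buf_ptr += 4;
    bw->bit_left += 32 - bits;
    bw->bit_buf = val; /* stale high bits get shifted out by later writes */
}
```

The two flush branches share the store and pointer bump here, which is one of the small simplifications available once the bit_left==32 case is split out.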
Josh