Apparently FLAC compresses the ImageNet data better than PNG, gzip and LZMA when offered in chunks of 2048 bytes? That seems very unlikely. I'd say the results of LZMA, gzip, PNG and FLAC are suspiciously similar considering that they work with completely different methods. Sure, PNG is more or less a general-purpose compressor with a specific context-aware filter in front of it, but how would you even feed non-image data to a PNG compressor? That context-aware filter depends on the height and width of the image, which don't apply to non-image data.
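To make the point about the filter concrete, here is a minimal sketch of PNG's "Up" filter (each byte minus the byte directly above it in the previous scanline). It is not the paper's pipeline and not a full PNG encoder. The helper name `up_filter` and the synthetic test data are my own assumptions. The only point is that the residual changes completely depending on which row width you claim the buffer has:

```python
def up_filter(data: bytes, width: int) -> bytes:
    """PNG-style 'Up' filter for 1 byte per pixel: subtract the byte one row above.

    The first row has no row above it, so it is filtered against zeros.
    """
    out = bytearray(len(data))
    for i, value in enumerate(data):
        above = data[i - width] if i >= width else 0
        out[i] = (value - above) % 256
    return bytes(out)


# Synthetic "image": one 64-byte row repeated 32 times (2048 bytes in total).
row = bytes((i * 37) % 256 for i in range(64))
data = row * 32

right = up_filter(data, 64)  # the row width the data actually has
wrong = up_filter(data, 60)  # an arbitrary, wrong row width

print("zero residuals with width 64:", right.count(0))
print("zero residuals with width 60:", wrong.count(0))
```

With the correct width, everything after the first row collapses to zeros, which the compressor behind the filter handles trivially. With a wrong width the residual is no easier to compress than the input, so for data with no natural row length the filter has nothing to work with.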
Also, I don't really understand the 'chunking' of the audio data. It seems to me they have chopped the data up into chunks of 2048 bytes and concatenated them. If the data is 16 bits per sample, that means only 1024 samples per chunk, which really isn't representative of any kind of audio. Similar for pictures:
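The arithmetic behind that, as a small sketch. The 44.1 kHz sample rate is purely my assumption for illustration (I am not claiming it is what the paper's audio uses), and the function names are made up:

```python
CHUNK_BYTES = 2048


def samples_per_chunk(bits_per_sample: int, channels: int = 1) -> int:
    """Number of sample frames that fit into one 2048-byte chunk."""
    bytes_per_frame = (bits_per_sample // 8) * channels
    return CHUNK_BYTES // bytes_per_frame


def chunk_duration_ms(bits_per_sample: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Playback time covered by one chunk, in milliseconds."""
    return 1000.0 * samples_per_chunk(bits_per_sample, channels) / sample_rate_hz


# ASSUMPTION: 44100 Hz mono, chosen only as a familiar reference point.
for bits in (8, 16):
    n = samples_per_chunk(bits)
    ms = chunk_duration_ms(bits, 44100)
    print(f"{bits}-bit mono: {n} samples per chunk, about {ms:.1f} ms at 44.1 kHz")
```

At 16 bits per sample that is 1024 samples, which at the assumed rate covers only a few tens of milliseconds of audio per chunk.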
"We extract contiguous patches of size 32 × 64 from all images, flatten them, convert them to grayscale (so that each byte represents exactly one pixel) to obtain samples of 2048 bytes. We then concatenate 488 821 of these patches, following the original dataset order, to create a dataset of 1 GB."
This doesn't seem in any way representative of any real-world use case? Image and audio data don't fit 1 byte per sample most of the time, so the data seems 'crafted' to me.
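As I read that description, the preprocessing amounts to something like the sketch below. This is my reconstruction, not the paper's code. I am assuming "32 × 64" means 32 rows by 64 columns, that the patches are tiled without overlap, and that the image is already 8-bit grayscale in row-major order. The function name is hypothetical:

```python
PATCH_ROWS, PATCH_COLS = 32, 64          # ASSUMPTION: rows x columns
PATCH_BYTES = PATCH_ROWS * PATCH_COLS    # 2048 bytes, one byte per grayscale pixel
TOTAL_BYTES = 488_821 * PATCH_BYTES      # size of the concatenated dataset


def extract_patches(gray: bytes, width: int, height: int) -> list:
    """Cut a row-major 8-bit grayscale image into flattened 32x64 patches.

    ASSUMPTION: non-overlapping tiling; partial patches at the edges are dropped.
    """
    patches = []
    for top in range(0, height - PATCH_ROWS + 1, PATCH_ROWS):
        for left in range(0, width - PATCH_COLS + 1, PATCH_COLS):
            rows = []
            for r in range(PATCH_ROWS):
                start = (top + r) * width + left
                rows.append(gray[start:start + PATCH_COLS])
            patches.append(b"".join(rows))
    return patches


# Tiny synthetic 128x64 image, just to exercise the function.
image = bytes(i % 256 for i in range(128 * 64))
patches = extract_patches(image, width=128, height=64)
print(len(patches), "patches of", len(patches[0]), "bytes each")
print("dataset size in bytes:", TOTAL_BYTES)
```

Once the patches are flattened and concatenated, the compressor sees a row structure that restarts every 2048 bytes, which is nothing like compressing a whole image.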
Finally, I don't understand where the 107% figure comes from for FLAC when compressing noise. It does much better than that. When I compress noise as an 8-bit-per-sample, single-channel stream with a block size of 2048 (as suggested in the paper), I get only 0.5% overhead, not 7%.
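For anyone who wants to repeat this kind of measurement: the sketch below shows the method only, and it does not run FLAC. It uses zlib from the Python standard library as a stand-in, so its numbers say nothing about FLAC itself. The chunk count and seed are arbitrary choices of mine. It compresses each 2048-byte chunk of pseudo-random bytes independently and reports the total compressed size relative to the input:

```python
import random
import zlib

CHUNK_BYTES = 2048


def noise_overhead(num_chunks: int = 256, seed: int = 0) -> float:
    """Compressed size divided by raw size for independently compressed noise chunks.

    NOTE: zlib is a stand-in here, not FLAC; only the measurement method carries over.
    """
    rng = random.Random(seed)
    raw_total = 0
    packed_total = 0
    for _ in range(num_chunks):
        chunk = bytes(rng.getrandbits(8) for _ in range(CHUNK_BYTES))
        raw_total += len(chunk)
        packed_total += len(zlib.compress(chunk, 9))
    return packed_total / raw_total


ratio = noise_overhead()
print(f"compressed noise is {100 * ratio:.2f}% of the original size")
```

Any sane compressor falls back to storing incompressible data more or less verbatim, so what you measure on noise is essentially per-chunk framing overhead. That is why a 7% penalty looks odd to me.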