AI language models can exceed PNG and FLAC in lossless compression, says study


Reply #25 – 2023-10-02 12:31:02

Quote from: ktf on 2023-10-02 09:54:29

Quote from: fooball on 2023-10-02 08:56:54
Quote from: ktf on 2023-10-02 07:03:34
Everyone keeps talking about 16bps, but as I read it, the paper suggests that the samples have been converted to 8bps.
Where?

Under 3.1
Quote
[...] 𝐶 = 2048 bytes, i.e., 2048 tokens of 8 bits that represent the ASCII characters [...] We extract contiguous patches of size 32 × 64 from all images, flatten them, convert them to grayscale (so that each byte represents exactly one pixel) to obtain samples of 2048 bytes.
It doesn't say it out loud, but it seems reasonable to assume that a model trained on ASCII works with data points that are a single byte. This is reinforced by the statement that the PNG data is 1 byte per pixel. Why would they restrict PNG to 1 byte per pixel, but do audio with 2 bytes per sample?

Technically enwik8 is UTF-8 (~99% ASCII), so the transformer models are trained on "ASCII with some noise". Chinchilla is probably trained on pure ASCII, unless they've taken other languages into account.

Quote from: ktf on 2023-10-02 09:54:29

All in all, I think the results are highly specific and cannot be generalized to audio and image data in general, unless the model can learn that data points might be larger than one byte. If it was easy to do that, why would they restrict the PNG data to 1 byte per pixel?

Like you say further up, I think they've screwed with the corpus data heavily to make it 8bps, likely destroying any useful comparability to domain codecs.

FLAC compressing enwik9 to 88.9% is suspiciously close to 87.5% (7/8): they're almost certainly treating the input as 8 bit, and this is simply a weird way to omit the leading zero bit in a mostly-ASCII source. In fact, I just tested it with enwik8: enwik9 is definitely compressed with FLAC as 8bps in their test. I also tried random data that's "roughly ASCII" in nature, and it compresses similarly.
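
To make the 87.5% figure concrete, here is a minimal Python sketch that drops the always-zero top bit of pure ASCII bytes and packs the remaining 7 bits back to back. The helper name `pack_7bit` and the sample text are made up for illustration; this is not what FLAC does internally, only the floor you get from discarding one bit in eight.

```python
def pack_7bit(data: bytes) -> bytes:
    """Pack bytes whose top bit is zero into 7 bits each (hypothetical helper)."""
    if any(b > 0x7F for b in data):
        raise ValueError("input is not pure 7-bit ASCII")
    bits = 0    # bit accumulator
    nbits = 0   # number of valid bits currently held in the accumulator
    out = bytearray()
    for b in data:
        bits = (bits << 7) | b
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((bits >> nbits) & 0xFF)
        bits &= (1 << nbits) - 1   # keep only the bits not yet written
    if nbits:
        # Flush the leftover bits, padded with zeros on the right.
        out.append((bits << (8 - nbits)) & 0xFF)
    return bytes(out)


# Illustrative input whose length is a multiple of 8 bytes.
text = b"lossless compression of mostly-ASCII text " * 8
packed = pack_7bit(text)
ratio = len(packed) / len(text)
print(f"{len(text)} -> {len(packed)} bytes ({ratio:.1%})")
```

Any input whose length is a multiple of 8 packs to exactly 7/8 of its size, so a codec that lands slightly above 87.5% on mostly-ASCII data is doing little more than this.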

Reply #26 – 2023-10-02 15:40:49

Quote from: ktf on 2023-10-02 07:03:34

Everyone keeps talking about 16bps, but as I read it, the paper suggests that the samples have been converted to 8bps.

Looks plausible. It looks like you've got the key to solving the mystery.

Reply #27 – 2023-10-02 15:43:42

So that would be bits per sample then? In other contexts "bps" means bits per second.

Reply #28 – 2023-10-02 17:14:06

LibriSpeech original samples	librispeech-8bit.wav bytes	librispeech-8bit.flac bytes	FLAC ratio	LZMA2 bytes	LZMA2 ratio
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0006.flac	104,766	42,521	40.6%	36,968	35.3%
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0007.flac	69,164	31,835	46.0%	24,999	36.1%
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0008.flac	93,644	37,893	40.5%	31,742	33.9%
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac	210,044	76,195	36.3%	71,192	33.9%

Code:

flac-1.4.3-x64-AVX2\flac.exe -d -f --force-legacy-wave-format in.original.flac -o flac-decode-16bit.wav
SoX-14.4.2-20230624-x64\sox.exe flac-decode-16bit.wav -r 16000 -c 1 -b 8 librispeech-8bit.wav
flac-1.4.3-x64-AVX2\flac.exe -f librispeech-8bit.wav -o librispeech-8bit.flac
7-Zip\7z.exe a librispeech-8bit.7z librispeech-8bit.wav

It's close, and LZMA2 is certainly smaller than the FLAC in this particular setting.
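
For anyone without 7-Zip at hand, the LZMA2 column can be approximated with Python's standard `lzma` module. This is only a sketch: Python writes an .xz container rather than .7z, so the byte counts will differ slightly from the table above, and the test tone is a synthetic stand-in for real 8-bit PCM, not LibriSpeech audio.

```python
import lzma
import math


def lzma_ratio(raw: bytes) -> float:
    """Compressed size divided by original size, using Python's xz/LZMA2 encoder."""
    compressed = lzma.compress(raw, format=lzma.FORMAT_XZ, preset=9)
    return len(compressed) / len(raw)


# Synthetic stand-in for unsigned 8-bit PCM: one second of a 440 Hz tone at 16 kHz.
tone = bytes(
    128 + round(100 * math.sin(2 * math.pi * 440 * n / 16000))
    for n in range(16000)
)
print(f"tone: {lzma_ratio(tone):.1%}")
```

For a real file, the sample data could be read with the standard `wave` module (`wave.open(path, "rb").readframes(...)`) and passed to `lzma_ratio` in the same way.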

Reply #29 – 2023-10-02 18:53:10

I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample? How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

Quote from: rutra80 on 2023-10-02 07:43:50

Here come the times when for encoding you shouldn't consider only the size of encoded files, but executable and dictionary/model size too. Computing power and memory bandwidth requirements rose significantly too.

Exactly! That's common practice in both MPEG and 3GPP codec standardizations, btw, especially when the codec in question is supposed to run on a mobile device (so basically every codec nowadays).

Chris

Reply #30 – 2023-10-02 19:08:22

Quote from: C.R.Helmrich on 2023-10-02 18:53:10

I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample?

It is a normal bit-depth reduction command. It does not split each 16-bit sample into two 8-bit samples. As expected for 8-bit audio, the S/N is lower, and the whole file is dithered.

Quote from: C.R.Helmrich on 2023-10-02 18:53:10

How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

The MSB is easier to predict than the LSB. The dither is audible throughout the tracks. There is no section of digital silence: the whole tracks are dithered, even in the pauses between speech.
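
The difference between the two readings of "8 bit" discussed here can be sketched in a few lines of Python. Both helper names are hypothetical, and the reduction shown is plain truncation to the high byte, which is an assumption for illustration and not necessarily the rounding SoX applies.

```python
import struct


def reduce_to_8bit(samples):
    """Map signed 16-bit samples to unsigned 8-bit by keeping only the high byte.

    Plain truncation (an assumption); WAV stores 8-bit audio unsigned, hence +128.
    """
    return bytes((s >> 8) + 128 for s in samples)


def reinterpret_as_bytes(samples):
    """Leave the 16-bit little-endian data untouched and read it byte by byte."""
    return struct.pack(f"<{len(samples)}h", *samples)


samples = [0, 256, 1000, -1000, 32767, -32768]
reduced = reduce_to_8bit(samples)       # one byte per sample, low byte discarded
raw = reinterpret_as_bytes(samples)     # two bytes per sample, nothing discarded

print(list(reduced))
print(len(raw), "bytes for", len(samples), "samples")
```

The reduced stream has half as many bytes and only carries the predictable high byte, while the reinterpreted stream interleaves noisy low bytes with high bytes, which is much harder for a linear predictor.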

Reply #31 – 2023-10-02 19:24:04

This is the test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac file, converted to 8bit using sox. 76,195 bytes.

"2414-128292-0009 FOR WHEN ZARATHUSTRA SCRUTINISED HIM WITH HIS GLANCE HE WAS FRIGHTENED AS BY A SUDDEN APPARITION SO SLENDER SWARTHY HOLLOW AND WORN OUT DID THIS FOLLOWER APPEAR"

Original, distributed version:

Quote

General
Complete name : C:\~~~~~~~\test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac
Format    : FLAC
Format/Info : Free Lossless Audio Codec
File size : 235 KiB
Duration    : 13 s 125 ms
Overall bit rate mode : Variable
Overall bit rate    : 146 kb/s

Audio
Format    : FLAC
Format/Info : Free Lossless Audio Codec
Duration    : 13 s 125 ms
Bit rate mode : Variable
Bit rate    : 141 kb/s
Channel(s)    : 1 channel
Channel layout    : C
Sampling rate : 16.0 kHz
Bit depth : 16 bits
Compression mode    : Lossless
Stream size : 227 KiB (97%)
Writing library : libFLAC 1.2.1 (UTC 2007-09-17)
MD5 of the unencoded content   : C6C1AF5F80BB643A4172F406186017E7

The 16-bit original, converted to 8 bit with SoX:

Quote

General
Complete name : C:\~~~~\librispeech-8bit.flac
Format    : FLAC
Format/Info : Free Lossless Audio Codec
File size : 74.4 KiB
Duration    : 13 s 125 ms
Overall bit rate mode : Variable
Overall bit rate    : 46.4 kb/s

Audio
Format    : FLAC
Format/Info : Free Lossless Audio Codec
Duration    : 13 s 125 ms
Bit rate mode : Variable
Bit rate    : 41.4 kb/s
Channel(s)    : 1 channel
Channel layout    : C
Sampling rate : 16.0 kHz
Bit depth : 8 bits
Compression mode    : Lossless
Stream size : 66.3 KiB (89%)
Writing library : libFLAC 1.4.3 (UTC 2023-06-23)
MD5 of the unencoded content   : 17D503AC844BB657FC6BE24F755B9BE7

Reply #32 – 2023-10-02 19:30:04

Alright then, in that case I'll assume they did not use dither, just simple rounding. Then, many speech pauses will turn into digital silence, making FLAC compression to ≈30% of the WAV size possible, I guess.

Edit: To simulate this, I manually zeroed out the speech pauses in your librispeech-8bit.flac and, after re-FLACing, the compression ratio improves from 36.3% to around 32%, as expected.

And the fact that LZMA2 compresses a bit better than FLAC can probably be explained by the frame-wise header overhead in FLAC, which becomes significant with file sizes as small as those you reported.
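
The effect of dither on speech pauses can be illustrated with a rough Python sketch. It assumes simple rounding for the undithered case and triangular (TPDF) noise of one 8-bit step for the dithered case; neither is claimed to match what SoX or the paper's authors actually did, and the "pause" is synthetic low-level noise.

```python
import lzma
import random


def quantize(samples, dither=False, seed=0):
    """Reduce signed 16-bit samples to unsigned 8-bit by rounding (illustrative)."""
    rng = random.Random(seed)
    out = []
    for s in samples:
        # TPDF dither spanning +/-1 step of the 8-bit grid (256 units at 16 bit).
        noise = (rng.random() - rng.random()) * 256 if dither else 0.0
        q = round((s + noise) / 256) + 128
        out.append(min(255, max(0, q)))
    return bytes(out)


# Synthetic "speech pause": faint background noise well below one 8-bit step.
rng = random.Random(1)
pause = [rng.randint(-40, 40) for _ in range(4000)]

plain = quantize(pause)
dithered = quantize(pause, dither=True)

print("distinct values, no dither:", sorted(set(plain)))
print("distinct values, dither:   ", sorted(set(dithered)))
print("LZMA bytes:", len(lzma.compress(plain)), "vs", len(lzma.compress(dithered)))
```

Without dither every sample in the pause rounds to the mid value 128, i.e. digital silence, which any codec stores almost for free; with dither the same passage toggles between neighbouring values and costs real bits.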

Chris

P.S.: And compared with the input media (audio, color images), none of what they do is actually lossless then. Still wondering about the 16.4%, though.

Reply #33 – 2023-10-03 17:58:17

Indeed, when the 16-bit FLAC is converted to 8 bit without dither, the result gets closer to the paper's "LZMA2 is 29.9%, FLAC 30.9%" claim.
Counting the whole file, FLAC is 31.6%; counting only the audio stream (without headers), FLAC is 27.6%.

Quote

General
Complete name : C:\~~~~~~~~\librispeech-8bit-no-dither.flac
Format    : FLAC
Format/Info : Free Lossless Audio Codec
File size : 64.8 KiB
Duration    : 13 s 125 ms
Overall bit rate mode : Variable
Overall bit rate    : 40.5 kb/s

Audio
Format    : FLAC
Format/Info : Free Lossless Audio Codec
Duration    : 13 s 125 ms
Bit rate mode : Variable
Bit rate    : 35.4 kb/s
Channel(s)    : 1 channel
Channel layout    : C
Sampling rate : 16.0 kHz
Bit depth : 8 bits
Compression mode    : Lossless
Stream size : 56.7 KiB (87%)
Writing library : libFLAC 1.4.3 (UTC 2023-06-23)
MD5 of the unencoded content   : 99573F92139EF6D23F0994CA4E23243F

LibriSpeech original samples	librispeech-8bit.wav bytes	librispeech-8bit.flac bytes	FLAC ratio	LZMA2 bytes	LZMA2 ratio	librispeech-8bit-no-dither.flac bytes	Non-dither FLAC ratio	Non-dither LZMA2 bytes	Non-dither LZMA2 ratio
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac	210,044	76,195	36.3%	71,192	33.9%	66,394	31.6%	54,225	25.8%

Code:

flac.exe -d -f --force-legacy-wave-format in.original.distributed.flac -o flac-decode-16bit.wav
sox.exe flac-decode-16bit.wav -b 8 --no-dither librispeech-8bit-no-dither.wav
flac.exe -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.flac
7z.exe a librispeech-8bit-no-dither.7z librispeech-8bit-no-dither.wav
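
The "with headers" and "without headers" percentages follow directly from the MediaInfo figures and the table above. A small Python check, using only numbers quoted in this post (the constant names are mine):

```python
WAV_BYTES = 210_044        # librispeech-8bit-no-dither.wav
FLAC_FILE_BYTES = 66_394   # whole librispeech-8bit-no-dither.flac file
FLAC_STREAM_KIB = 56.7     # "Stream size" as reported by MediaInfo


def ratio(compressed_bytes: float, original_bytes: float) -> float:
    """Compressed size as a fraction of the original size."""
    return compressed_bytes / original_bytes


with_headers = ratio(FLAC_FILE_BYTES, WAV_BYTES)
without_headers = ratio(FLAC_STREAM_KIB * 1024, WAV_BYTES)
overhead = FLAC_FILE_BYTES - FLAC_STREAM_KIB * 1024

print(f"with headers:    {with_headers:.1%}")
print(f"without headers: {without_headers:.1%}")
print(f"non-stream overhead: about {overhead:.0f} bytes")
```

The overhead comes out at roughly 8 KiB, which is consistent with the 8,196-byte difference between the padded and unpadded encodes listed in reply #35, so most of it looks like metadata padding rather than frame headers.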

Reply #34 – 2023-10-03 18:56:00

As I said in a post which has gone missing, you can't trust AI to produce real results. It will seem real much of the time but then it goes off into its own fairyland when you least expect.

Reply #35 – 2023-10-07 08:05:02

As for compression ratio, this AI's result is very close to OptimFROG at its maximum compression setting.
According to the paper, "LZMA2 is 29.9%, FLAC 30.9%, Chinchilla 70B 21.0%". In this one-file test, OptimFROG reaches 20.0%.

The sample used is the distributed original, "test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac".

samples	file bytes	compression ratio
2414-128292-0009.flac	240,349	-
librispeech-8bit-no-dither.wav	210,044	100.0%
LZMA2 of 8bit	54,225	25.8%
FLAC of 8bit	66,394	31.6%
FLAC of 8bit, best	65,936	31.4%
FLAC of 8bit, best, no padding	57,740	27.5%
OptimFrog of 8bit, max compression	41,939	20.0%

command lines:

Code:

flac.exe -d -f --force-legacy-wave-format 2414-128292-0009.flac -o flac-decode-16bit.wav
sox.exe flac-decode-16bit.wav -b 8 --no-dither librispeech-8bit-no-dither.wav
7z.exe a librispeech-8bit-no-dither.7z librispeech-8bit-no-dither.wav
flac.exe -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.flac
flac.exe -l 12 -b 4096 -m -r 6 -A subdivide_tukey(3) -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.best.flac
flac.exe --no-padding -l 12 -b 4096 -m -r 6 -A subdivide_tukey(3) -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.best-no-padding.flac
ofr.exe --encode librispeech-8bit-no-dither.wav --preset max --output librispeech-8bit-no-dither.presetmax.ofr
