Topic: AI language models can exceed PNG and FLAC in lossless compression, says study

Reply #25
Everyone keeps talking about 16bps, but as I read it, the paper suggests that the samples have been converted to 8bps.
Where?

Under 3.1
Quote
[...] 𝐶 = 2048 bytes, i.e., 2048 tokens of 8 bits that represent the ASCII characters [...] We extract contiguous patches of size 32 × 64 from all images, flatten them, convert them to grayscale (so that each byte represents exactly one pixel) to obtain samples of 2048 bytes.
It doesn't say it out loud, but it seems reasonable to assume that a model trained on ASCII works with data points that are a single byte. This is reinforced by the statement that the PNG data is 1 byte per pixel. Why would they restrict PNG to 1 byte per pixel, but do audio at 2 bytes per sample?
Technically, enwik8 is UTF-8 (~99% ASCII), so the transformer models are trained on "ASCII with some noise". Chinchilla is probably trained on pure ASCII unless they've taken other languages into account.

All in all, I think the results are highly specific and cannot be generalized to audio and image data, unless the model can learn that data points may be larger than one byte. If that were easy to do, why would they restrict the PNG data to 1 byte per pixel?
Like you say further up, I think they've screwed with the corpus data heavily to make it 8 bps, likely destroying any useful comparability with domain-specific codecs.

FLAC compressing enwik9 to 88.9% is suspiciously close to 87.5% (i.e. 7/8); they're almost certainly treating the input as 8-bit samples, and this is simply a weird way to omit the leading zero bit of a mostly-ASCII source. In fact, I just tested with enwik8: enwik9 is definitely compressed as 8 bps in their FLAC test. I also tried random data that's "roughly ASCII" in nature, and it compresses similarly.
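The 7/8 arithmetic is easy to check: ASCII code points are all below 128, so the top bit of every byte in a mostly-ASCII corpus is constant zero, and a codec that merely drops that bit lands at exactly 87.5%. A minimal sketch of the idea (hypothetical ASCII bytes, not enwik9 itself):

```python
# Sketch: the 87.5% figure is just 7/8, the fraction of non-constant
# bits in an ASCII byte. Hypothetical ASCII-only data, not enwik9.
text = b"Anarchism is a political philosophy..."  # any pure-ASCII bytes
assert all(byte < 128 for byte in text)  # MSB of every ASCII byte is 0
ceiling = 7 / 8
print(f"{ceiling:.1%}")  # 87.5% achievable from the constant leading bit alone
```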

Reply #26
Everyone keeps talking about 16bps, but as I read it, the paper suggests that the samples have been converted to 8bps.

Looks plausible. You may have found the key to solving the mystery.

Reply #27
So that would be bits per sample then?  In other contexts "bps" means bits per second.
It's your privilege to disagree, but that doesn't make you right and me wrong.

Reply #28
All samples from test-other.tar\test-other\LibriSpeech\test-other\2414\128292\:

Sample                  8-bit WAV bytes   8-bit FLAC bytes   FLAC ratio   LZMA2 bytes   LZMA2 ratio
2414-128292-0006.flac           104,766             42,521        40.6%        36,968         35.3%
2414-128292-0007.flac            69,164             31,835        46.0%        24,999         36.1%
2414-128292-0008.flac            93,644             37,893        40.5%        31,742         33.9%
2414-128292-0009.flac           210,044             76,195        36.3%        71,192         33.9%
Code: [Select]
flac-1.4.3-x64-AVX2\flac.exe -d -f --force-legacy-wave-format in.original.flac -o flac-decode-16bit.wav
SoX-14.4.2-20230624-x64\sox.exe flac-decode-16bit.wav -r 16000 -c 1 -b 8 librispeech-8bit.wav
flac-1.4.3-x64-AVX2\flac.exe -f librispeech-8bit.wav -o librispeech-8bit.flac
7-Zip\7z.exe a librispeech-8bit.7z librispeech-8bit.wav

It's close, and LZMA2 is indeed smaller than FLAC in this particular setting.

Reply #29
I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample? How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

We're reaching the point where, for encoding, you shouldn't consider only the size of the encoded files, but the executable and dictionary/model size too. Computing power and memory bandwidth requirements have risen significantly as well.
Exactly! That's common practice in both MPEG and 3GPP codec standardization, btw, especially when the codec in question is supposed to run on a mobile device (so basically every codec nowadays).

Chris
If I don't reply to your reply, it means I agree with you.

Reply #30
I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample?
It's a normal bit-depth reduction command, not a command to split one file into two. As expected for 8-bit, it has a lower S/N. The whole file is dithered.
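For reference, a bit-depth reduction maps each 16-bit sample to the nearest 8-bit code; it does not reinterpret one sample as two bytes. A rough sketch of the rounding step (my own illustration of the principle, not SoX's actual code):

```python
def to_8bit(sample16: int) -> int:
    """Quantise a signed 16-bit sample to signed 8-bit by rounding."""
    q = (sample16 + 128) >> 8          # add half an 8-bit LSB, drop the low byte
    return max(-128, min(127, q))      # clamp to the signed 8-bit range

print(to_8bit(0), to_8bit(32767), to_8bit(-32768))  # 0 127 -128
```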

How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

The MSB is easier to predict than the LSB. The dither is audible throughout the tracks. There are no sections of digital silence; the whole tracks are dithered, even in the pauses between speech.
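That is exactly what dither does: noise of roughly one 8-bit LSB is added before rounding, so even a digitally silent input yields a mix of -1/0/+1 codes rather than a constant run of zeros. A sketch of the idea (assumed TPDF dither at the target bit depth, not SoX's exact implementation):

```python
import random

random.seed(0)

def dithered_to_8bit(sample16: int) -> int:
    """Quantise to signed 8-bit with ~1 LSB triangular (TPDF) dither."""
    tpdf = random.uniform(-128, 128) + random.uniform(-128, 128)  # 1 LSB = 256 in 16-bit units
    q = round((sample16 + tpdf) / 256)
    return max(-128, min(127, q))

silence = [dithered_to_8bit(0) for _ in range(1000)]
print(sorted(set(silence)))  # a mix of -1, 0, 1: no constant run for FLAC to exploit
```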

Reply #31
This is the test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac file, converted to 8-bit using SoX: 76,195 bytes.

"2414-128292-0009 FOR WHEN ZARATHUSTRA SCRUTINISED HIM WITH HIS GLANCE HE WAS FRIGHTENED AS BY A SUDDEN APPARITION SO SLENDER SWARTHY HOLLOW AND WORN OUT DID THIS FOLLOWER APPEAR"

Original, distributed version:
Quote
General
Complete name                  : C:\~~~~~~~\test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 235 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 146 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 141 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 16 bits
Compression mode               : Lossless
Stream size                    : 227 KiB (97%)
Writing library                : libFLAC 1.2.1 (UTC 2007-09-17)
MD5 of the unencoded content   : C6C1AF5F80BB643A4172F406186017E7

The 16-bit original converted to 8-bit by SoX:
Quote
General
Complete name                  : C:\~~~~\librispeech-8bit.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 74.4 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 46.4 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 41.4 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 8 bits
Compression mode               : Lossless
Stream size                    : 66.3 KiB (89%)
Writing library                : libFLAC 1.4.3 (UTC 2023-06-23)
MD5 of the unencoded content   : 17D503AC844BB657FC6BE24F755B9BE7

Reply #32
Alright then, in that case I'll assume they did not use dither, just simple rounding. Then, many speech pauses will turn into digital silence, making FLAC compression to ≈30% of the WAV size possible, I guess.
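Without dither the effect is easy to see: low-level 16-bit background noise rounds to exactly zero at 8 bits. A sketch (hypothetical noise values; plain round-to-nearest assumed):

```python
def to_8bit_nodither(sample16: int) -> int:
    """Quantise a signed 16-bit sample to 8-bit by plain rounding, no dither."""
    return max(-128, min(127, (sample16 + 128) >> 8))

pause = [-90, 40, -17, 75, -120, 3]  # hypothetical low-level room noise in a speech pause
print([to_8bit_nodither(s) for s in pause])  # [0, 0, 0, 0, 0, 0]: digital silence
```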

Edit: To simulate this, I manually zeroed out the speech pauses in your librispeech-8bit.flac and, after re-FLACing, the compression ratio improves from 36.3% to around 32%, as expected.

And the fact that LZMA2 compresses a bit better than FLAC can probably be explained by the per-frame header overhead in FLAC, which becomes significant at file sizes as small as those you reported.

Chris

P.S.: And compared with the input media (audio, color images), none of what they do is actually lossless then. Still wondering about the 16.4%, though.
If I don't reply to your reply, it means I agree with you.

Reply #33
Indeed, converting the 16-bit FLAC to 8-bit without dither gets closer to the paper's "LZMA2 is 29.9%, FLAC 30.9%" claim.
With headers FLAC 31.6%, without headers FLAC 27.6%.

Quote
General
Complete name                  : C:\~~~~~~~~\librispeech-8bit-no-dither.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 64.8 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 40.5 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 35.4 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 8 bits
Compression mode               : Lossless
Stream size                    : 56.7 KiB (87%)
Writing library                : libFLAC 1.4.3 (UTC 2023-06-23)
MD5 of the unencoded content   : 99573F92139EF6D23F0994CA4E23243F
For 2414-128292-0009.flac (8-bit WAV: 210,044 bytes):

                      bytes    ratio
FLAC (dithered)      76,195    36.3%
LZMA2 (dithered)     71,192    33.9%
FLAC (no dither)     66,394    31.6%
LZMA2 (no dither)    54,225    25.8%
Code: [Select]
flac.exe -d -f --force-legacy-wave-format in.original.distributed.flac -o flac-decode-16bit.wav
sox.exe flac-decode-16bit.wav -b 8 --no-dither librispeech-8bit-no-dither.wav
flac.exe -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.flac
7z.exe a librispeech-8bit-no-dither.7z librispeech-8bit-no-dither.wav

Reply #34
As I said in a post which has gone missing, you can't trust AI to produce real results. It will seem real much of the time, but then it goes off into its own fairyland when you least expect it.
It's your privilege to disagree, but that doesn't make you right and me wrong.

Reply #35
As for compression ratio, this AI's result is very close to OptimFROG's maximum compression setting.
According to the paper, "LZMA2 is 29.9%, FLAC 30.9%, Chinchilla 70B 21.0%". In this one-file test, OptimFROG reaches 20.0%.

Used sample is the "test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac", distributed original.

Sample                                 file bytes   compression ratio
2414-128292-0009.flac (original)          240,349                   -
librispeech-8bit-no-dither.wav            210,044              100.0%
LZMA2 of 8-bit                             54,225               25.8%
FLAC of 8-bit                              66,394               31.6%
FLAC of 8-bit, best                        65,936               31.4%
FLAC of 8-bit, best, no padding            57,740               27.5%
OptimFROG of 8-bit, max compression        41,939               20.0%
command lines:
Code: [Select]
flac.exe -d -f --force-legacy-wave-format 2414-128292-0009.flac -o flac-decode-16bit.wav
sox.exe flac-decode-16bit.wav -b 8 --no-dither librispeech-8bit-no-dither.wav
7z.exe a librispeech-8bit-no-dither.7z librispeech-8bit-no-dither.wav
flac.exe -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.flac
flac.exe -l 12 -b 4096 -m -r 6 -A subdivide_tukey(3) -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.best.flac
flac.exe --no-padding -l 12 -b 4096 -m -r 6 -A subdivide_tukey(3) -f librispeech-8bit-no-dither.wav -o librispeech-8bit-no-dither.best-no-padding.flac
ofr.exe --encode librispeech-8bit-no-dither.wav --preset max --output librispeech-8bit-no-dither.presetmax.ofr
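As a sanity check, the ratios in the table above can be reproduced as each compressed size divided by the 210,044-byte 8-bit WAV:

```python
# Recompute the compression ratios from the reported file sizes.
wav_bytes = 210_044
results = {
    "LZMA2 of 8-bit": 54_225,
    "FLAC of 8-bit": 66_394,
    "FLAC of 8-bit, best": 65_936,
    "FLAC of 8-bit, best, no padding": 57_740,
    "OptimFROG of 8-bit, max compression": 41_939,
}
for name, size in results.items():
    print(f"{name}: {size / wav_bytes:.1%}")
# -> 25.8%, 31.6%, 31.4%, 27.5%, 20.0%, matching the table
```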