
horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Hi all,
I ran into a really weird bug while compressing some WAVs to FLAC. On some small files in particular, the encoder appears to freeze for several minutes: the command prompt shows the usual flac.exe welcome message but no compression progress. During this time flac.exe's memory usage climbs to as much as 4 GB! And when it finally finishes encoding, the output is ridiculously huge (a 300 KB WAV becomes a 1 or 2 GB FLAC!). So yeah, there's definitely a problem. I guess the silver lining is that the file doesn't appear broken, just bloated beyond belief. The x86 and x64 builds don't seem to make a difference. Not sure what else I should be testing or what other info about my system is needed.

I've attached a troublesome wav file. If I compress this with -6 or more, it breaks. Can someone confirm this?

I happened to have an old copy of Flac 1.3.2 lying around and it seems to be fine. Pretty sure I got that from rarewares too.

Edit: Just downloaded the newest 1.3.3 from this thread, and that seems to be fine at the moment. Will have to test it more to be sure.
Edit 2: Hmm seems to be doing the same thing but on different files.
For now I'll downgrade to 1.3.2 but I hope to figure this out...

Any advice?
Thanks!

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #1
I've attached a troublesome wav file. If I compress this with -6 or more, it breaks. Can someone confirm this?
confirmed

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #2
I'll compile and upload latest git for you to check against. I'll post here when it's available.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #3
New compiles now at rarewares. :)

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #4
The new 32-bit compile still does not work. I tried removing all metadata from the wav, no improvement. After leaking memory, the encode fails:

test-.wav: ERROR during encoding
state = FLAC__STREAM_ENCODER_FRAMING_ERROR

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #5
The new 64-bit compile with -8 produced a 1 GB test.flac, but no error message.

 

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #6
I confirm that both 32bit and 64bit compiles fail on the test.wav, above. Similarly NetRanger's GCC 32bit and 64bit compiles of the same git generation also fail, so this is a FLAC issue. The last version I have that encodes this correctly is flac-v1.3.3.git-ce6dd6b, so I shall revert to this on rarewares.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #7
I can confirm it also.

But I tried converting the 'test.wav' file to 16-bit instead of 24-bit, and then it encoded perfectly, no issues at all. I wonder why FLAC reacts the way it does with that 24-bit wav file when converting it to 24-bit FLAC.

I have never before stumbled over any files that FLAC has had issues with.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #8
Huh, glad I'm not going crazy then!

Thanks john33 for updating the version on Rarewares. I'll test it at some point soon and run it on other similar files to make sure it is truly stable.

I wonder what's actually going wrong though?

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #9
It fails with a block size of 4096 (flac -8 -b 4096); other block sizes work fine. Hmm, weird...
gold plated toslink fan

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #10
Seems to be the combination of block size 4096 with partial_tukey(2)? Even changing the "(2)" resolves it.

Started out with -l 12 -b 4096 -m -r 6 -A tukey(0.5);partial_tukey(2);punchout_tukey(3) as that is "-8", and fiddled a bit.
Removing partial_tukey(2) resolved it. Tweaked -l even up to 32 and -r even up to 15 (with --lax). Sure, no prob. Including partial_tukey(n) for n=1,3,4,5,6 is not an issue. Even including the whole load of other functions and partial_tukey(otherthantwo) in the end ... OK. Worked my way to
--lax -l 32 -b 4096 -r 15 -m -A bartlett;bartlett_hann;blackman;blackman_harris_4term_92db;connes;flattop;gauss(0.2);hamming;hann;kaiser_bessel;nuttall;rectangle;triangle;tukey(0.5);punchout_tukey(2);partial_tukey(4) -p -e
which took some seconds, but sure, no prob. (Got down to 112973 by using arguments gauss(0.1) and tukey(0.2) and punchout_tukey(5) and partial_tukey(6).)

But -l 2 -b 4096 -r 1 -A partial_tukey(2) is A Bad Thing.



Version tried: the Win64 of the download from https://hydrogenaud.io/index.php?topic=118008.msg1001140#msg1001140

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #11
Nice find! I took a quick look and it seems this sample triggers a problem that I came across a while ago: http://lists.xiph.org/pipermail/flac-dev/2020-July/006448.html However, I thought this would only occur with 'hacked' code, but you found a sample that triggers this on current git as well. It seems there is some underlying problem here too.

After bisecting this, I found that one of my own commits is the culprit: https://github.com/xiph/flac/commit/ae288c067c03bc3404eff67063958bfe854f5a01 I'm not sure how this change causes the problem, but I will definitely dive into it.

Forget that last bit, that only changes the compression ratio display. With that fix reverted the file is still bloated, but FLAC doesn't display it as such. It is the commit before that: https://github.com/xiph/flac/commit/ced7f6829d14e38128bf0ba66412cc0541246c46 Apparently this commit is wrong and it actually worsens the problem it was supposed to fix....
Music: sounds arranged such that they construct feelings.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #12
Here's a pull request that should fix this problem:

https://github.com/xiph/flac/pull/251
Music: sounds arranged such that they construct feelings.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #13
Good news! I'll recompile once it becomes available.


Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #15
Thanks, ktf! So to clarify again, has this been an issue already in FLAC 1.3.2 or was the bug that you're fixing introduced sometime around the 1.3.3 release?

Chris
If I don't reply to your reply, it means I agree with you.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #16
That depends on how you look at it.

The original overflow possibility has probably been in FLAC since its inception. Triggering this original overflow became much more likely with the introduction of new apodization functions (partial_tukey, for example) in FLAC 1.3.1.

I tried to fix this overflow; the commit was merged into mainline git on 15 March 2020. This has not been in any release yet, so 1.3.3 is unaffected. This commit fixes the overflow in certain cases (specifically, the corner case I was researching at the time), but worsens it in other cases (like this particular sample), because I did not notice possible overflows where this value is used further on in the code.

However, I'd like to stress that hitting this problem is still very unlikely, especially on music files, and it does not produce invalid files, just hugely bloated ones.
Music: sounds arranged such that they construct feelings.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #17
it does not produce invalid files, just hugely bloated ones
With the sample from this thread and this build on 32-bit Windows 7, it does produce a truncated (i.e. not a bit-perfect copy) FLAC with the wrong length in the header when used with file input, and a truncated FLAC with unknown length when used with pipe input from foobar2000. And the created FLAC files are not bloated on a 32-bit OS.
Please keep compatibility with 32-bit OSes in mind for as long as compiling for them is officially supported.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #18
That's quite some catch. If this is also caused by what @ktf suspects to be an old bug just rarely triggered, ...

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #19
However, I'd like to stress that hitting this problem is still very unlikely, especially on music files, and it does not produce invalid files, just hugely bloated ones.

Some of the data is compressed music. What's in the rest of the file?
I have FLAC 1.3.3 but I don't know where I got it, maybe I've compiled it myself... and it compresses the test file as it should.
Error 404; signature server not available.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #20
With the sample from this thread and this build on 32-bit Windows 7, it does produce a truncated (i.e. not a bit-perfect copy) FLAC with the wrong length in the header when used with file input, and a truncated FLAC with unknown length when used with pipe input from foobar2000. And the created FLAC files are not bloated on a 32-bit OS.
Oh, I didn't know. Maybe I'll look into it.

Some of the data is compressed music. What's in the rest of the file?
All of the data is compressed music. I'll try to explain, but it is a bit technical.

FLAC works by first modelling the data and then coding the difference between the input and the model (the residual). It can do this in two ways: it can store the residual directly with a certain number of bits (which should be less than the input number of bits, or there is no improvement), or it can store it with a residual coding method, specifically a rice code. The first method is implemented in FLAC but is not used anymore (as it is not as efficient as rice coding).
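To make the "model plus residual" idea concrete, here is a minimal Python sketch. It is not FLAC's actual predictor: it assumes a toy first-order model that predicts each sample to be equal to the previous one, and the function names and sample values are made up for illustration.

```python
def residual_order1(samples):
    """Residual of a toy predictor that guesses 'same as the previous sample'.

    Assumes a non-empty list. The first sample has no predecessor,
    so it is stored verbatim as a warm-up value.
    """
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]


def reconstruct_order1(residual):
    """Undo residual_order1: add each difference back onto the running value."""
    out = [residual[0]]
    for r in residual[1:]:
        out.append(out[-1] + r)
    return out


samples = [1000, 1012, 1020, 1025, 1027, 1026]   # made-up, slowly varying input
res = residual_order1(samples)
print(res)                                       # [1000, 12, 8, 5, 2, -1]
print(reconstruct_order1(res) == samples)        # True: the round trip is lossless
```

After the warm-up value, the residual numbers are much smaller than the input numbers, which is what makes them cheap to store.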

If you have a 16-bit input file, the 'numbers' in the input will be between -32768 and +32767. If the prediction was good, the residual will consist of small numbers. For hard-to-predict music (like metal) the residual might (mostly) be 13-bit, so between -4096 and +4095. For easy-to-predict music, like a solo piano, this value might (mostly) be 6-bit, so between -32 and +31. Notice that I used the word 'mostly'. The problem is that sometimes (at the start of a new note, for example) this residual can be much larger for a short while.

Rice code works by choosing an optimal range such that most of the signal lies within that range and not too much lies outside it. Let's take the piano music as an example. The encoder chooses rice parameter 6. If for one of the samples the prediction left a residual of 8, we can simply say that it fits and store 8. This takes one bit (to say that the residual fits the 6-bit range) plus 6 bits (the rice parameter). However, there will be samples that do not fit the selected range. If it fits the range -64 to +63 instead, the residual is stored as 2 bit + 6 bit. If it fits the range -96 to +95, the residual is stored as 3 bit + 6 bit. If it fits the range -128 to +127, it is stored as 4 bit + 6 bit, etc.
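The bit counts in the paragraph above can be reproduced with a small sketch. The function name rice_bits is hypothetical, and the sketch assumes signed residuals are folded into unsigned numbers by interleaving (0, -1, 1, -2, ...) before coding, which matches the ranges given above. It only counts bits and does not write a bitstream.

```python
def rice_bits(residual, k):
    """Count the bits needed for one signed residual with Rice parameter k."""
    # Fold the sign away (assumption): 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    folded = 2 * residual if residual >= 0 else -2 * residual - 1
    # How many whole ranges of size 2**k we overshoot; stored in unary.
    quotient = folded >> k
    # Unary part (quotient bits plus one terminating bit) plus k binary bits.
    return (quotient + 1) + k


for r in (8, 40, -70, 100):
    print(r, rice_bits(r, 6))
# 8 fits -32..+31    -> 1 + 6 = 7 bits
# 40 fits -64..+63   -> 2 + 6 = 8 bits
# -70 fits -96..+95  -> 3 + 6 = 9 bits
# 100 fits -128..+127 -> 4 + 6 = 10 bits
```

Each step outside the chosen range costs only one extra bit, which is cheap for occasional outliers but adds up quickly if the parameter is far too small.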

This works very well for audio signals. FLAC has to store the model (which is quite small) too of course, but if that model is accurate, the 16-bit input signal can be transformed into a signal that is (for the largest part) 7-bit, with an occasional outlier taking 8 or 9 bit.

The problem with this bug is that something goes wrong with the selection of the range, i.e. the rice parameter. If the residual is on average 250, the encoder should pick a rice parameter of 9 to store it as 10 bit. However, in this case, the encoder wrongly picks a rice parameter of 0. Instead of needing only 10 bits to store a single sample residual, it will now need 251 bit on average per sample. This is almost 16 times larger than the original input.
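The arithmetic in that example can be checked with a few lines of Python. This is a sketch under assumptions: it treats 250 as the already-folded (unsigned) value being coded and ignores sign handling, which is the simplification the numbers above appear to use, and it assumes 16-bit input samples. bits_for is a hypothetical helper.

```python
def bits_for(value, k):
    """Bits for one unsigned value: unary quotient, one stop bit, then k low bits."""
    return (value >> k) + 1 + k


INPUT_BITS = 16                 # assumption: 16-bit input samples

good = bits_for(250, 9)         # 0 + 1 + 9 = 10 bits
bad = bits_for(250, 0)          # 250 + 1 + 0 = 251 bits
print(good, bad, bad / INPUT_BITS)   # 10 251 15.6875
```

With parameter 0 the whole value ends up in the unary part, so the cost grows linearly with the residual instead of with its number of bits.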

My explanation of rice codes is quite short and might be hard to follow. If you want to know more, please take a look at https://en.wikipedia.org/wiki/Golomb_coding and https://en.wikipedia.org/wiki/Unary_coding The unary coding is the thing that really bloats if something goes wrong.

So, this bloat is still audio data, but the data is stored not in a format that needs fewer bits, but in one that needs more bits.
Music: sounds arranged such that they construct feelings.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #21
Here's a pull request that should fix this problem:

https://github.com/xiph/flac/pull/251
New compiles below use the above although the changes are not yet incorporated into the git-master. Therefore, I am making them available here for testing but will not update Rarewares until this becomes 'official'.

www.rarewares.org/files/lossless/flac-v1.3.3-b358381-mod-x64.zip

www.rarewares.org/files/lossless/flac-v1.3.3-b358381-mod-x86.zip
 :)

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #22
All of the data is compressed music. I'll try to explain, but it is a bit technical.
And it was so good an explanation that even a layman like me could understand.
Thank you!
Listen to the music, not the media it's on.
União e reconstrução

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #23
All of the data is compressed music. I'll try to explain, but it is a bit technical.

You explained it nicely - I think I understood where the problem is.
Error 404; signature server not available.

Re: horrendous file bloating when using Flac 1.3.3 binaries from Rarewares

Reply #24
Hi all,

As said before, I've already sent a patch for this issue here: https://github.com/xiph/flac/pull/251

However, that only fixes the overflow that gets triggered. I've now sent in a patch to fix the root cause: https://github.com/xiph/flac/pull/252

In short: I explained in post #20 that sometimes FLAC chooses the wrong rice parameter, which causes enormous file bloat. Because of an overflow, the FLAC encoder actually thinks that the resulting very large frame is the smallest possible. This overflow should be fixed in PR 251, the first patch. This second patch, PR 252, makes FLAC not choose the wrong rice parameter anymore, fixing the root cause.
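How an overflow can make the worst choice look like the best one can be illustrated with a toy example. Everything here is an assumption for illustration: the 32-bit unsigned counter, the bit counts and the names are not taken from the FLAC source or from either pull request.

```python
MASK32 = 0xFFFFFFFF  # assumption: size estimates are kept in a 32-bit unsigned counter


def wrap_u32(n):
    """Simulate storing n in a 32-bit unsigned integer (wraps around on overflow)."""
    return n & MASK32


# Made-up true sizes in bits for one block under two candidate parameters.
true_bits = {
    "sensible parameter": 40_960,
    "wrong parameter": 2**32 + 1_000,   # enormous, larger than the counter can hold
}

# What the encoder "sees" after the counter has wrapped around.
stored_bits = {name: wrap_u32(bits) for name, bits in true_bits.items()}

best = min(stored_bits, key=stored_bits.get)
print(stored_bits)   # {'sensible parameter': 40960, 'wrong parameter': 1000}
print(best)          # wrong parameter
```

The comparison itself is fine; it is fed a wrapped number, so the candidate that is really the largest wins. Fixing the counter (the first patch) and not generating the bad candidate at all (the second patch) address the two halves of that picture.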
Music: sounds arranged such that they construct feelings.