
Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Next chapter in the series "Porcus does the testing so that you won't have to construct these stupid signals to feed that stupid encoder" is: Support for more than the WAVE size limit - 4 GiB PCM or more?
TL;DR: I got pretty much everything to create misbehaved files. In some cases it is by (re-) construction, but even then I would wish for better warning messages.


Initial test: Can the formats accommodate 4 GiB and can the encoders do it right?

I created some files that together were too big for the 4 GiB PCM size limit, and concatenated them using foobar2000, outputting to ALAC-in-m4a, FLAC, TAK (-e -p0 -ihs -wm0 - %d; dropping -wm0 was also tested), TTA (-eb - %d), WavPack, and .wav, where fb2k creates an invalid .wav file.
Also tried Monkey's and OptimFrog, let us do away with those once and for all:
* OptimFROG doesn't support piping, and the only way it supports 4 GiB is as .raw and decoding to .raw.  If you try to trick it using --incorrectheader, it will warn you first and scream error at you afterwards.  Don't blame it, it warned you both before and after. Acceptable behaviour indeed.
* Monkey's Audio does apparently not support > 4 GiB at all, and although something might hang if you try to force it, it will not output anything wrong (nor right). Acceptable behaviour although a bit annoying.
Then the rest:
* ALAC, FLAC, WavPack: files are fine, play fine. 
* TAK: TAK seems to have a bound at 2^31 samples per channel.  That coincides with 2^32 bytes = 4 GiB for mono with 16 bits (= 2 bytes) per sample.  On one hand, that means some 36 GiB at 5.1 surround and 24 bits/sample - but for 8 bits mono the limit is down to 2 GiB.
Above those 2^31, the encoded file is "Undecodable".  Encoding files you cannot decode is hardly well-behaved (ffmpeg can salvage the audio, see below).
* TTA by tta.exe: The enfant terrible again.  First, using the necessary "-eb" option that ignores chunk sizes, it will build a temp file with all the audio, and then encode the first size MOD 2^32.  That is: if the audio is 17 seconds too long, it will behave as an odometer going full circle, declare the audio to be "17 seconds", encode that, and discard the rest.  
Also the -eb is needed even from 2 GiB, because tta.exe seems to think that signed long integer is a good idea - slightly above 2 GiB you will see it claiming to buffer a negative number of samples.
* TTA by ffmpeg: just fine!  (Beware what I just learned: you cannot use ffmpeg to encode to TTA from 8-bit FLAC or WavPack.  Well who really wants to encode to TTA ... something that is already in a better format?)
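
The arithmetic behind the TAK figures above can be checked in a few lines. This is only a sketch: the 2^31 samples-per-channel bound is what the test observed, not a documented TAK constant, and the function name is made up for illustration.

```python
# Bound observed in the test above (an assumption, not a documented TAK constant).
MAX_SAMPLES_PER_CHANNEL = 2**31
GIB = 2**30

def pcm_limit_gib(channels: int, bits_per_sample: int) -> float:
    """PCM payload size (in GiB) at which the per-channel sample bound is hit."""
    bytes_per_frame = channels * (bits_per_sample // 8)
    return MAX_SAMPLES_PER_CHANNEL * bytes_per_frame / GIB

for label, channels, bits in [
    ("8-bit mono", 1, 8),
    ("16-bit mono", 1, 16),
    ("16-bit stereo", 2, 16),
    ("24-bit 5.1", 6, 24),
]:
    print(f"{label}: {pcm_limit_gib(channels, bits):g} GiB")
```

So the same sample bound lands anywhere between 2 GiB and 36 GiB of PCM, depending on channel count and bit depth.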

Then the WAVE files:
* foobar2000 will, for large enough signals, create a too-long ".wav" without any warning, unless you set it to show the output file and notice the little question mark.
* ffmpeg will also create a too long ".wav" (unless you permit rf64) - but at least it will warn.
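
For anyone wanting to spot such a too-long ".wav" without relying on a warning, here is a minimal sketch of a header sanity check. It relies only on the plain RIFF layout (a 32-bit little-endian size at byte offset 4 that should equal the file size minus 8). The function name is hypothetical, and what value fb2k or ffmpeg actually write into that field was not tested here.

```python
import struct

def riff_size_problem(header: bytes, actual_file_size: int):
    """Return a description of the problem, or None if the header looks consistent.

    header: at least the first 12 bytes of the file.
    actual_file_size: size of the file on disk, in bytes.
    """
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        return "not a plain RIFF/WAVE header (could be RF64, Wave64 or something else)"
    (declared,) = struct.unpack("<I", header[4:8])
    expected = actual_file_size - 8  # the RIFF size field excludes the first 8 bytes
    if expected > 0xFFFFFFFF:
        return f"file needs a size of {expected}, which does not fit in 32 bits"
    if declared != expected:
        return f"header declares {declared} bytes, file actually has {expected}"
    return None
```

To use it on a real file, read the first 12 bytes and pass them in together with the size reported by os.path.getsize().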



Next test: Can the decoders decode their formats' too-large-for-4GiB-PCM files?

* refalac: Will decode too large files to RF64, call it ".wav" and give no warning that it isn't your usual .wav.  Is that ... good?
* FLAC: Will throw "ERROR: stream is too big to fit in a single WAVE file".  Or similar if you specify -o outfile.aiff.  It can decode to RF64 or .w64, but you need to know that.  Maybe it is fine not to uncritically advocate those formats.  However, there is a risk that users will resort to ffmpeg, which defaults to its own sample-format (that is, does not default to lossless, but to something which "by coincidence" happens to be lossless for most files) ...
* WavPack will default to recreating the input file bit by bit.  That is, if you concatenated together a bunch of files with foobar2000, then fb2k will pass to WavPack an invalid WAVE header, and WavPack will output an invalid .wav file which foobar2000 will show as question mark duration again. Fair enough for an application that writes "restores" and does that, but ...
--> what about a warning?
If you try to encode this .wav file again, you will get an error that WavPack cannot encode non-standard .wav files ... which it can, with -i.
-->  what about a 'try -i'?
* TAK, the "Undecodable" files:  Nopes.
--> Use ffmpeg, it works! But beware again of ffmpeg's behaviour - which also of course deletes RIFF chunks, but you probably didn't have anything significant there if you concatenated together stuff for a too big file? Oh, and if you are dealing with 8-bits TAK: ffmpegging 8-bit to FLAC will result in a 16-bits file.
* TTA.exe & TAK, the decodable-but-too-big: While TAK attempts at restoring the file bit-by-bit, it changes the invalid length to another wrong length: TAK's odometer will go full circle and display the 17 seconds too long file as "17 seconds".  However, that is the time stamp only - the file contains all the audio, and can be salvaged. TTA does the same. TTA/TAK seem to produce same audio, but files might not be identical due to TAK trying to write the WAV header.
Also, if you use tta.exe and/or foo_input.tta (whether the official one or kode54's update), not everything is supported at all. 8 bits in a codec that boasts 8/16/24 bit support? Naaah ...
So, again: ffmpeg. With the same reservations as for TAK except RIFF chunks which you don't have in .tta in any case.
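
The "odometer" effect is easy to reproduce on paper. The sketch below assumes the wrap happens on a 32-bit byte count; whether takc.exe and tta.exe wrap on bytes or on some other counter was not established in this test, and the function name is made up.

```python
def wrapped_duration_seconds(true_bytes: int, sample_rate: int,
                             channels: int, bits_per_sample: int) -> float:
    """Duration a tool would report if it kept only the low 32 bits of the PCM byte count."""
    bytes_per_frame = channels * (bits_per_sample // 8)
    wrapped_bytes = true_bytes % 2**32  # the odometer going full circle
    return wrapped_bytes / (bytes_per_frame * sample_rate)

# CD-format audio that is 17 seconds longer than 4 GiB of PCM
byte_rate = 44100 * 2 * 2
too_long = 2**32 + 17 * byte_rate
print(wrapped_duration_seconds(too_long, 44100, 2, 16))  # 17.0
```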



Next test: Salvaging audio from bad .wav output. Since foobar2000/ffmpeg/takc.exe/tta.exe output each their own invalid .wav files: who will get the audio out?
TL;DR: Use WavPack -i (the problem will return if you wvunpack, but at least you have your audio).
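
To illustrate what "ignoring the header's length" amounts to, here is a rough sketch that counts the audio actually present after the start of the 'data' chunk, regardless of what the size fields claim. It is not a description of how WavPack -i or flac --ignore-chunk-sizes are implemented; it assumes a plain RIFF/WAVE layout with 'fmt ' before 'data' and nothing after the audio, and the function name is hypothetical.

```python
import io
import struct

def salvageable_frames(f):
    """Return (data_offset, whole_frames_until_EOF), ignoring declared lengths
    of the RIFF and 'data' chunks."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    f.seek(12)  # skip 'RIFF', the size field and 'WAVE'
    block_align = None
    while True:
        chunk_header = f.read(8)
        if len(chunk_header) < 8:
            raise ValueError("no 'data' chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
        padded = chunk_size + (chunk_size & 1)  # chunks are padded to even length
        if chunk_id == b"fmt ":
            fmt = f.read(padded)
            # nBlockAlign (bytes per frame) sits at offset 12 of the fmt chunk
            (block_align,) = struct.unpack("<H", fmt[12:14])
        elif chunk_id == b"data":
            if block_align is None:
                raise ValueError("'data' chunk came before 'fmt '")
            data_offset = f.tell()
            return data_offset, (file_size - data_offset) // block_align
        else:
            f.seek(padded, io.SEEK_CUR)

# Tiny in-memory example: header claims 1 frame (15 bytes), file holds 3 frames.
fmt = struct.pack("<HHIIHH", 1, 5, 44100, 44100 * 15, 15, 24)
wav = (b"RIFF" + struct.pack("<I", 0) + b"WAVE"
       + b"fmt " + struct.pack("<I", 16) + fmt
       + b"data" + struct.pack("<I", 15) + bytes(45))
print(salvageable_frames(io.BytesIO(wav)))  # (44, 3)
```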

For maximum sadism, I created a 5-channel 24-bit file.  Reason: one sample for each of the five channels occupies 15 bytes in total, and the WAVE upper bound 2^32-1 is a multiple of 15.  Thus a sample won't be split across the boundary - sneaking under the radar of FLAC's "partial sample" error ... or, as it turned out, that depends on the reported (possibly wrong) sample count.
Encoded it, decoded it using the respective encoders. Took the output .wav and tried to encode it with the application in the left column:
* No FLAC/refalac output, as I didn't get them to output anything invalid.
* No WavPack output, as WavPack only outputs what it encodes, and so that will be bit-identical to something else tested.
* Seems that tta.exe/takc.exe output are the same if TAK was called with -wm0, discarding the RIFF chunks
* MP3tag: just opened it, didn't try to write anything.
* "rejects" and "fixes" should be self-explanatory
* "truncates" means that they get the 2^32-1 length (fb2k sometimes displays slightly more or less) and the final 17 seconds are discarded.
* "destroys": encoding only the initial 17 seconds and discarding the remaining 4 GiB (or 8 GiB for even larger files).
* How blue I coloured an "error" cell depends a bit on how (subjectively!) clearly the error message is pushed in my face.

| Encoding application | .wav written by fb2k or ffmpeg | .wav written by takc or tta.exe | Notes |
|---|---|---|---|
| refalac | truncates | fixes | same with -D, output to rf64 |
| flac | truncates | errormsg/destroys(*) | see note |
| flac --ignor... | fixes | fixes | --ignore-chunk-sizes gives a blanket warning |
| wavpack | rejects | rejects | ... but please improve the error message to inform about the -i! |
| wavpack -i | fixes | fixes | |
| takc -e | rejects | rejects | |
| tta.exe | destroys | destroys | No warning. Rejects ffmpeg-generated without -bitexact |
| foobar2000 to wave64 | error/truncates | destroys | warns on the truncated, not on the destroyed |
| ffmpeg | fixes/errormsg | warns/destroys | although it appears to hang on the latter, "be patient" (or not) |
| mp3tag | no data | PCM "17 sec" | |

The files that become truncated are the ones produced by foobar2000 and ffmpeg.  (Are fb2k's question marks actually a substitute for 2^32-1?)  foobar2000, refalac and FLAC truncate these files.
* FLAC will throw an error if it hits EOF OR the 4 GiB limit in the middle of a sample.  That was the reason to use a 24-bit (= 3 bytes) five-channel file: the max size 2^32-1 is divisible by fifteen, and so FLAC sees 286331153 samples (per channel) and happily declares that the end of it.  Avoiding this error at 4 GiB should then only be possible for 24-bit five-channel, 24-bit mono or 8-bit 5-channel/3-channel/mono; I tested a bit of those, but not completely. The asterisked result was initially surprising: here it cries foul even if the number of bytes is divisible by fifteen ... only, the sample count written by takc.exe/tta.exe is not. (Try if you like: created one with max wave size + 1 sample.)
Note though that even if it throws error, it does produce a file. Maybe it shouldn't. Maybe there should be a "-F" without which nothing would be produced.
* For the ffmpeg-produced .wav it does not matter whether "-bitexact" is passed - except for tta.exe which refuses .wav's without.
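
The list of "lucky" formats in the FLAC note above follows from which frame sizes divide 2^32-1 evenly. A quick sketch to check it, restricted (my assumption) to 8/16/24/32-bit PCM and 1 to 8 channels:

```python
WAVE_MAX = 2**32 - 1  # largest size a 32-bit WAVE length field can express

def frame_fits_exactly(channels: int, bits_per_sample: int) -> bool:
    """True if WAVE_MAX bytes hold a whole number of frames (no partial sample)."""
    bytes_per_frame = channels * (bits_per_sample // 8)
    return WAVE_MAX % bytes_per_frame == 0

combos = [(bits, ch)
          for bits in (8, 16, 24, 32)
          for ch in range(1, 9)
          if frame_fits_exactly(ch, bits)]
print(combos)          # [(8, 1), (8, 3), (8, 5), (24, 1), (24, 5)]
print(WAVE_MAX // 15)  # 286331153 frames for 5-channel 24-bit
```

Since 2^32-1 is odd, anything with an even number of bytes per frame (all 16-bit and 32-bit formats, all stereo) can never fit exactly.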
Last two months' worth of foobar2000.org ad revenue has been donated to support war refugees from Ukraine: https://www.foobar2000.org/

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #1
I must say, I really appreciate the amount of effort you're putting into this. Even that formatted and colour-coded table... *chef's kiss*.

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #2
Thanks. And thanks to this spreadsheet-paste to BBcode table converter.

appreciate the amount of effort
Literally I don't! I mean, that effort I "had to" put into it when a new can of worms opened - no, appreciate it I certainly didn't, but I, uh, I thought I was already nearly done ...
And then by coincidence, those divergences in attempting to salvage audio caught my attention.

One clue was how refalac can convert from wavpack directly (with the appropriate dll), and using the decoding command (-D) it doesn't have to even go by way of ALAC. -D will even "decode" .wav to .wav. Why even think of that? Because refalac can split by cuesheet (even WavPack with cuesheet embedded). And if you want to split by cuesheet, it makes all sense to let -D take whatever input and output .wav. Note, refalac's ".wav" output is RF64 whenever it has to be.

But, some fun results might sometimes come out of it. I found out how horrible ALAC is at compressing silence. (Like, it does not get below 1/3 of uncompressed PCM - that's mono though. And then NTFS compression gets the .m4a down by 94 percent.)

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #3
I must say, I really appreciate the amount of effort you're putting into this. Even that formatted and colour-coded table... *chef's kiss*.
Yeah I wish HA had a thumb up or heart counter for posts, so I could express appreciation without polluting with comments.

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #4
I must say, I really appreciate the amount of effort you're putting into this. Even that formatted and colour-coded table... *chef's kiss*.
Yeah I wish HA had a thumb up or heart counter for posts, so I could express appreciation without polluting with comments.
100%
Please do this
some ANC'd headphones + AutoEq-based impulse + Meier Crossfeed (30%)

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #5
Hm:

Next test: Salvaging audio from bad .wav output. Since foobar2000/ffmpeg/takc.exe/tta.exe output each their own invalid .wav files: who will get the audio out?
TL;DR: Use WavPack -i (the problem will return if you wvunpack, but at least you have your audio).
Actually flac --ignore-chunk-sizes might be just as good - I should maybe not have been that frightened at its blanket warning.
(WavPack's default behaviour of roundtripping to exactly the same may or may not be what you ultimately want, but it is "safe" in that it is constructed not to change any information.)

For maximum sadism, I created a 5-channel 24-bit file.
Oh, but wait ... the channels may come out allocated different.

I actually tried stereo before I came up with how 1ch&5ch 24-bit and 1/3/5ch 8-bit could - in some cases - fool flac.exe, and it seems .tta-to-wav being bit-identical to -wm0 created .tak-to-wav is not necessarily true above 2 channels.

Apparently that issue is not related to size at all. Different topic. (Ugh. Don't want to.)

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #6
* OptimFROG doesn't support piping, and the only way it supports 4 GiB is as .raw and decoding to .raw.  If you try to trick it using --incorrectheader, it will warn you first and scream error at you afterwards.
Bollocks, the frog does support piping indeed but then you have to apply the --incorrectheader.
And then it will ...
... not necessarily scream:
If you pipe it the malformed .wav files produced by fb2k/ffmpeg/takc/tta, the frog will truncate. Like with flac.exe, you can get under the radar of its partial sample error message when 2^32-1 bytes form an integer number of samples per channel - and the number of channels then has to be one, as OptimFROG doesn't support anything above stereo.

And since a test with 8 bits ribbited out a different error message, "Invalid access to memory location", it means you need to take to heart for all those days when you cannot help yourself concatecatecatecatecatecatenating together 24 bits mono in invalidly big .wav: the frog will not watch over you.

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #7
Monkey's gets 4 GB support.
Two days ago, this Sunday.  And RF64 support on Saturday. Also on Saturday, different release ... allegedly with .caf support. (Which I don't get to work at all.)
And, whooshed me: AU/SND support last year - I think it is the only lossless codec that supports it. 

So, after some hours of the CPU fan spinning up while monkeying and demonkeying large near-silent PCM files, here are the updates:

** Large files:
* Monkey's will work ... often.  Not always.  Which is certainly quite normal in this thread, as I have been feeding the encoders noncompliant .wav files. Here also non-compliant .au files.
* Furthermore, the .ape files can - usually - be decoded with ffmpeg.  (5.0 for Windows - new, these Monkey's versions are newer!) 
* It fixes the malformed too-large .wav files generated by foobar2000 and ffmpeg.  It rejects those generated by takc.exe and tta.exe. No truncation observed for .wav.
Exceptions:
* Monkey's misbehaves on "large" 8-bit .wav files - and does so already at 2 GB. Try the attachment (2 GB near-silence compressed to 4 kilobytes)
* Stay away from too large .au files.   I encountered bitrates of 2626 kbit/s for near-silence (PCM was 4608, WavPack was 3 kbit/s), truncation without warning, 158857129wk 4d 23:48:40.960 (0 samples) but still jolly happy about the output - and at least one file repaired.  But, even when Monkey's can handle a too large .au file and reconstruct it by decoding, ffmpeg cannot necessarily.


File format quirks that appear to be size-independent:
* Monkey's seems to reject some ffmpeg-generated RF64 files lying around since previous test. Also attached. Produced with -bitexact or something?
* I don't get it to work with .caf at all.  Regardless of endianness.
And don't let anything loose on 8-bit .au files without listening to the result. As far as I understand, 8-bit .au are signed, unlike 8-bit .wav ... anyway, it seems that when they have gone through fb2k (to convert .wav to .au), Monkey's to .ape and ffmpeg's to convert .ape to .wv ... the audio must have changed more than once.
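
For reference, the signedness difference mentioned above is a one-bit affair: 8-bit PCM in .wav is unsigned (silence is 0x80), while 8-bit linear PCM in .au is signed (silence is 0x00). A minimal sketch of the conversion, with a made-up function name; it says nothing about which step in the fb2k/Monkey's/ffmpeg chain actually mangled the audio.

```python
def flip_8bit_signedness(pcm: bytes) -> bytes:
    """Convert 8-bit PCM between unsigned (.wav) and signed (.au) by toggling the top bit."""
    return bytes(b ^ 0x80 for b in pcm)

wav_samples = bytes([0x80, 0x00, 0xFF])          # silence, minimum, maximum (unsigned)
au_samples = flip_8bit_signedness(wav_samples)   # 0x00, 0x80 (-128), 0x7F (+127)
assert flip_8bit_signedness(au_samples) == wav_samples  # applying it twice is a no-op
```

If a tool applies (or forgets) this flip once too often, silence turns into full-scale DC offset, which is one reason to listen before trusting the result.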

Re: Tested: > 4 GiB handling by lossless codecs/encoders (including fb2k/ffmpeg)

Reply #8
I was slow to see this, but thanks for the files!

Both issues fixed here:
https://monkeysaudio.com/download.html

If you have any other issues, please email me at mail at monkeysaudio dot com since I don't often check here.

Thanks again!