Topic: Reasonable handling of non-compliant source files in lossless audio compressors (Read 4749 times) - Topic derived from FLAC-git Releases (Co...

Re: Reasonable handling of non-compliant source files in lossless audio compressors

Reply #25
@ktf You've summarized the situation with FLAC in relation to dolle.wav. But to illustrate the scale of the problem of dealing with “invalid” sources, I additionally published metacorder.wav (analyzed later by @danadam), which is not compressed at all with the --keep-foreign-metadata flag. Neither of these sources was created for the sake of chicanery; both are taken from real-life apps (ProTools and Metacorder respectively).

WavPack, Monkey's Audio and TAK find a way to store the user's belongings in a more compact form, optimizing the space they occupy, while FLAC also tries to evaluate and fix them, effectively becoming a warden. Imagine that, instead of putting the scattered pages of the King James Bible into one pile, you get these pages back after compression with omissions based on the latest findings, like the Dead Sea Scrolls. And the works of Dostoevsky in Russian are not compressed at all due to US government sanctions. I believe the compression should be FLAC priority, leaving non-audio inconsistencies with the fleeting standards at the user's discretion, of which he should be unobtrusively informed.

Quote
Was there any difference PCM-wise? ALAC doesn't keep "foreign metadata".
I don't know, sir. My intention was to describe the behavior of apps that claim to be lossless encoders. Accordingly, if the restored audio file is not identical to the original (which is confirmed by verification of checksums), I consider it a loss.
• Join our efforts to make Helix MP3 encoder great again
• Opus complexity & qAAC dependence on Apple is an aberration from Vorbis & Musepack breakthroughs
• Let's pray that D. Bryant improve WavPack hybrid, C. Helmrich update FSLAC, M. van Beurden teach FLAC to handle non-audio data

Reply #26
*sighs* ...

Behind all this pseudo-intellectualism ... go get yourself a torx key AND a hex key and use them as appropriate, rather than trying to force one into being the other.

And now you have wasted the refalac dev's time here too; you could at the very least confirm that the audio is lossless.
(ALAC doesn't even come with a file format, it uses container formats that are already around.)

Reply #27
Quote
Tell me about the obstacles, if any, as I don't see how this can be a regression.
I'll summarize what happens
1. Protools generates an invalid WAVE file.
2. FLAC stores this invalid metadata when invoked with --keep-foreign-metadata
3. FLAC restores a valid/fixed WAVE file when invoked with -d --keep-foreign-metadata
4. FLAC returns an error that the restored file is not identical to the original file. All foreign metadata is there, but the fmt chunk has been corrected

I could do a few things:
1. Do the same thing as WavPack/Monkeys/TAK, but then the restored file is invalid. I would consider this a regression, FLAC should not output invalid files
2. Return a more comprehensible warning instead of an error, explaining that the fmt chunk has been altered but all other metadata is correct
3. Warn users on encoding that the file is invalid and restoring might not give back an identical file

So, I do not think fixing this is straightforward and haven't yet made a decision
My suggestion, FWIW, would be to do (1.), above. Protools, and any other apps that create non-standard chunks, including malformed fmt chunks, have been doing so for a considerable period of time, apparently with no objections from users of those tools or of any other apps that may use those files. It could be argued that it is not the job of FLAC to correct what may be incorrect but clearly seems not to cause a problem. That may hurt intellectually and aesthetically, but is surely the pragmatic solution? Just my thoughts. ;)

Reply #28
Quote
Was there any difference PCM-wise? ALAC doesn't keep "foreign metadata".
I don't know, sir. My intention was to describe the behavior of apps that claim to be lossless encoders. Accordingly, if the restored audio file is not identical to the original (which is confirmed by verification of checksums), I consider it a loss.
Thank you for an indirect answer.
Since you have mentioned ffmpeg + ALAC and you don't seem to know this trivial fact, I have to tell you that ffmpeg NEVER bothers to preserve source container chunks on conversion, no matter what audio codec you choose.
Same goes for "lossless" audio format conversion via foobar2000 or other general audio converters that decode from one format and encode to other.
Lossless audio codec simply means encoding audio data (usually PCM) without loss.  Input format can be anything... it may be RTP packets coming from the internet, or audio track in MMT/TLV broadcast.  Lossless coder can transcode them without loss in terms of PCM data and not more than that. What you want can be achieved in VERY limited scenarios or use cases. Outside of there, it's simply impossible or nonsense.
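To make the distinction concrete, here is a minimal sketch of "lossless in terms of PCM data" versus "identical file". It assumes plain 16-bit PCM and uses Python's standard wave module; the build_wav and pcm_of helpers and the "note" chunk are made up for illustration and are not part of any encoder.

```python
import io
import struct
import wave

def build_wav(pcm: bytes, extra_chunk: bytes = b"") -> bytes:
    """Build a minimal 16-bit mono 48 kHz WAV; extra_chunk is placed before 'data'."""
    fmt = struct.pack("<HHIIHH", 1, 1, 48000, 48000 * 2, 2, 16)
    body = b"WAVE" + b"fmt " + struct.pack("<I", len(fmt)) + fmt
    body += extra_chunk
    body += b"data" + struct.pack("<I", len(pcm)) + pcm
    return b"RIFF" + struct.pack("<I", len(body)) + body

def pcm_of(wav_bytes: bytes) -> bytes:
    """Return only the audio frames, ignoring every non-audio chunk."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.readframes(w.getnframes())

pcm = struct.pack("<4h", 0, 1000, -1000, 32767)
note = b"demo"  # hypothetical non-audio payload
plain = build_wav(pcm)
tagged = build_wav(pcm, b"note" + struct.pack("<I", len(note)) + note)

same_file = plain == tagged                  # file-level comparison
same_pcm = pcm_of(plain) == pcm_of(tagged)   # audio-level comparison
print(same_file, same_pcm)  # False True
```

A file checksum answers the first question, a decoded-audio comparison the second; the disagreement in this thread is largely about which of the two "lossless" should promise.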

Reply #29
Quote
I believe the compression should be FLAC priority, leaving non-audio inconsistencies with the fleeting standards at the user's discretion, of which he should be unobtrusively informed.
This isn't really fleeting. The requirement of IFF chunks being an even number of bytes (which the metacorder.wav file violates) dates back to 1985.
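For readers who want to see what that rule looks like in practice, here is a rough sketch of a chunk walker that flags the case where the last chunk has an odd size and no pad byte follows it, as well as chunks that claim more bytes than the file holds. The walk_riff and riff helpers are illustrative names, and the thread does not say which chunk of metacorder.wav is affected, so the demo uses a made-up "note" chunk.

```python
import struct

def walk_riff(data: bytes):
    """Yield (chunk_id, offset, size, problem) for each top-level chunk."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        cid, size = struct.unpack_from("<4sI", data, pos)
        end = pos + 8 + size
        problem = None
        if end > len(data):
            problem = "chunk runs past end of file"
        elif size % 2 == 1 and end == len(data):
            problem = "odd-sized chunk without pad byte"
        yield cid, pos, size, problem
        pos = end + (size & 1)  # chunks start on even offsets

def riff(chunks: bytes) -> bytes:
    """Wrap raw chunk bytes in a RIFF/WAVE envelope (demo helper)."""
    body = b"WAVE" + chunks
    return b"RIFF" + struct.pack("<I", len(body)) + body

odd = b"note" + struct.pack("<I", 3) + b"abc"   # 3-byte payload
unpadded = [p for *_, p in walk_riff(riff(odd))]
padded = [p for *_, p in walk_riff(riff(odd + b"\x00"))]
print(unpadded)  # ['odd-sized chunk without pad byte']
print(padded)    # [None]
```

Whether a tool should refuse, repair or silently accept the flagged case is exactly the policy question being argued here; the check itself is cheap.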

I will consider adding handling to the flac command line tool for this kind of malformed file. Maybe I'll add an option to disable 'fixing' of files. But it is rather low on my priority list. I've opened an issue here: https://github.com/xiph/flac/issues/680
Music: sounds arranged such that they construct feelings.

Reply #30
Quote
I will consider adding handling to the flac command line tool for this kind of malformed file. Maybe I'll add an option to disable 'fixing' of files.

Thanks for taking a step from tyranny to freedom, @ktf.
I stand with @john33, “it is not the job of FLAC to correct what may be incorrect”.

Quote
I have to tell you that ffmpeg NEVER bothers to preserve source container chunks on conversion

Without a doubt, FFMPEG has room for improvement as well. At this stage, it at least preserves the dolle.wav metadata and is able to compress metacorder.wav, which FLAC itself does not do with the --keep-foreign-metadata flag. We can talk about how to bring that data back to WAV intact on their issue tracker once the expected change happens in the reference FLAC encoder. For example, in 2011 they implemented the -write_bext 1 flag to preserve that chunk.

Right now my heart and mind are here.

Code:
$ ffprobe -loglevel error -show_entries format_tags -of json dolle.wav
{
    "format": {
        "tags": {
            "encoded_by": "Pro Tools",
            "originator_reference": "4ut5zBTw#CdaaaGk",
            "date": "2017-12-1",
            "creation_time": "22:06:31",
            "time_reference": "158760000"
        }
    }
}

$ flac --keep-foreign-metadata dolle.wav
flac 1.4.3
dolle.wav: WARNING: legacy WAVE file has format type 1 but bits-per-sample=24
dolle.wav: wrote 1717062 bytes, ratio=0,565

$ ffprobe -loglevel error -show_entries format_tags -of json dolle.flac
{
    "format": {
    }
}

$ ffmpeg -loglevel error -i dolle.wav -bitexact -map_metadata 0 dolle.flac -y

$ ffprobe -loglevel error -show_entries format_tags -of json dolle.flac
{
    "format": {
        "tags": {
            "encoded_by": "Pro Tools",
            "originator_reference": "4ut5zBTw#CdaaaGk",
            "date": "2017-12-1",
            "creation_time": "22:06:31",
            "time_reference": "158760000"
        }
    }
}

-----------------------------------------------------------------------------------

$ mediainfo metacorder.wav
General
Complete name                            : metacorder.wav
Format                                   : Wave
Format settings                          : PcmWaveformat
File size                                : 209 KiB
Duration                                 : 2 s 189 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 782 kb/s
Producer                                 : Gallery Metacorder
Description                              : gSCENE=66a / gTAKE=002 / gTAPE=007 / gNOTE=Circle  / gUBITS=00000000
Encoded date                             : 2005:08:04 16:55:54

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 2 s 189 ms
Bit rate mode                            : Constant
Bit rate                                 : 768 kb/s
Channel(s)                               : 1 channel
Sampling rate                            : 48.0 kHz
Bit depth                                : 16 bits
Stream size                              : 205 KiB (98%)

$ flac --keep-foreign-metadata metacorder.wav
flac 1.4.3
metacorder.wav: ERROR reading foreign metadata: invalid WAVE file: unexpected EOF (010)

$ ffmpeg -loglevel error -i metacorder.wav -bitexact -map_metadata 0 metacorder.flac -y

$ mediainfo metacorder.flac
General
Complete name                            : metacorder.flac
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
File size                                : 42.0 KiB
Duration                                 : 2 s 189 ms
Overall bit rate mode                    : Variable
Overall bit rate                         : 157 kb/s
Description                              : gSCENE=66a / gTAKE=002 / gTAPE=007 / gNOTE=Circle  / gUBITS=00000000
Recorded date                            : 2005:08:04
encoded_by                               : Gallery Metacorder
creation_time                            : 16:55:54
time_reference                           : 4090608000

Audio
Format                                   : FLAC
Format/Info                              : Free Lossless Audio Codec
Duration                                 : 2 s 189 ms
Bit rate mode                            : Variable
Bit rate                                 : 126 kb/s
Channel(s)                               : 1 channel
Channel layout                           : M
Sampling rate                            : 48.0 kHz
Bit depth                                : 16 bits
Compression mode                         : Lossless
Stream size                              : 33.7 KiB (80%)
Writing library                          : ffmpeg
MD5 of the unencoded content             : 81A6AD258A5C39DE55DF2041727B9E3B

Reply #31
I’d like to clarify a couple of things. In the case of metacorder.wav, the reason that WavPack is able to handle this file without complaint is that WavPack only parses RIFF chunks up to the audio chunk. For any data past the audio, WavPack simply copies it verbatim without even parsing it or checking the length.
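The idea of "parse up to the audio, copy the rest verbatim" can be sketched roughly as follows. This is not WavPack's code, only an illustration under simple assumptions (a RIFF/WAVE file with a single data chunk); split_wav is a made-up helper name.

```python
import struct

def split_wav(data: bytes):
    """Return (head, audio, tail): bytes before the data payload, the payload, the rest."""
    pos = 12  # skip 'RIFF' <size> 'WAVE'
    while pos + 8 <= len(data):
        cid, size = struct.unpack_from("<4sI", data, pos)
        if cid == b"data":
            start = pos + 8
            end = min(start + size, len(data))
            # Everything after the audio is returned untouched and never parsed.
            return data[:start], data[start:end], data[end:]
        pos += 8 + size + (size & 1)
    raise ValueError("no data chunk found")

pcm = bytes(range(8))
fmt = struct.pack("<HHIIHH", 1, 1, 48000, 96000, 2, 16)
# Deliberately malformed trailer: claims 5 bytes but holds only 3.
trailer = b"junk" + struct.pack("<I", 5) + b"abc"
body = (b"WAVE" + b"fmt " + struct.pack("<I", 16) + fmt
        + b"data" + struct.pack("<I", 8) + pcm + trailer)
wav = b"RIFF" + struct.pack("<I", len(body)) + body

head, audio, tail = split_wav(wav)
restored = head + audio + tail
print(audio == pcm, tail == trailer, restored == wav)  # True True True
```

In a real compressor only the audio part would go through the codec, while head and tail are stored as opaque blobs. That is why a broken trailer never gets a chance to upset the parser.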

As for the dolle.wav file, the problem is that the cbSize field is wrong (as was said) and in previous versions of WavPack I basically ignored that field (it’s really kind of redundant). I ran into an issue recently where I was mistakenly using information in the format header even though the cbSize indicated that it wasn’t there, so I fixed that here. But, if the format header was any bigger than the standard extended version, I would have errored out on that too just like FLAC does. We can only anticipate so many surprises.

In summary, for both of these cases, WavPack handles them “correctly” because I am lazy, not because I’m somehow more concerned about the user’s experience. And it should be a lesson that what appears initially to be an obvious shortcoming may, when examined more closely, actually be the most reasonable compromise given the intended application (Josh never intended FLAC to be a file compressor, whereas WAV is right in WavPack’s name).

Things are often more complex and subtle than they at first appear, and the true lazy developers are the ones that create files that so blatantly violate the specs they are purporting to follow. I mean, that metacorder.wav file has an odd length, which makes it an invalid WAV file right out of the gate. It would not be completely unreasonable for a file reader to see that and refuse to even open it!

Reply #32
Dear @MonkeysAudio and @enzo, could you please shed some light, as @bryant, the WavPack author, did, on how Monkey's Audio manages to process such non-standard audio files as dolle.wav and metacorder.wav without errors and return them bit-perfect?

Code:
$ mac dolle.wav auto -c2000
--- Monkey's Audio Console Front End (v 10.55) (c) Matthew T. Ashland ---
Compressing (normal)...
Progress: 100.0% (0.0 seconds remaining, 0.3 seconds total)
Success

$ mac dolle.ape dolle.restored.wav -d
Decompressing...
Progress: 100.0% (0.0 seconds remaining, 0.4 seconds total)
Success

$ mac metacorder.wav auto -c2000
Compressing (normal)...
Progress: 100.0% (0.0 seconds remaining, 0.0 seconds total)
Success

$ mac metacorder.ape metacorder.restored.wav -d
Decompressing...
Progress: 100.0% (0.0 seconds remaining, 0.0 seconds total)
Success

$ b3sum *.wav
5bc927c245396131cf6e4dccd46c6399423c05ed0bd67d2209446217e0bb93d2  dolle.restored.wav
5bc927c245396131cf6e4dccd46c6399423c05ed0bd67d2209446217e0bb93d2  dolle.wav
4accf01292e8d2ed9ef9b2047c13841cebaeec59b22570ae8694ef88232304c4  metacorder.restored.wav
4accf01292e8d2ed9ef9b2047c13841cebaeec59b22570ae8694ef88232304c4  metacorder.wav
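The same round-trip check can be scripted. A small sketch, assuming SHA-256 from Python's standard library as a stand-in for b3sum (BLAKE3 is not in the standard library); the file names and contents in the demo are placeholders, not the real test files.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path, algo="sha256", bufsize=1 << 20):
    """Hash a file in fixed-size pieces so large recordings need little memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        while chunk := f.read(bufsize):
            h.update(chunk)
    return h.hexdigest()

def is_bit_exact(original, restored):
    """True when the restored file is byte-for-byte identical to the original."""
    return file_digest(original) == file_digest(restored)

with tempfile.TemporaryDirectory() as d:
    src = Path(d, "source.wav")
    same = Path(d, "restored.wav")
    fixed = Path(d, "fixed.wav")
    src.write_bytes(b"RIFF\x04\x00\x00\x00WAVE")
    same.write_bytes(b"RIFF\x04\x00\x00\x00WAVE")
    fixed.write_bytes(b"RIFF\x04\x00\x00\x00WAVE\x00")  # one extra byte
    identical = is_bit_exact(src, same)
    altered = is_bit_exact(src, fixed)

print(identical, altered)  # True False
```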

Reply #33
It isn't so hard to create a lossless file compressor. But for an audio file format, you also want to serve a couple of more purposes, like for example playing it.
And so you want to know precisely where the audio starts and ends, and what format it is. That is in the headers of the [WAVE or AIFF or whatever] source file, and that means you not only have to compress the file, but to understand part of its content.
Headers have to be processed into the correct information, which isn't straightforward if they don't conform to the WAVE or AIFF specification. If those file headers are not reliable, you could at worst get that wrong, and the following could happen:

* Your compression algorithm, which heavily depends on audio samples being serially correlated, would not give smaller sizes because it doesn't know where an individual sample starts and ends.
* If you try to play back the audio part, it will sound like noise.

Hypothetical you say? No, I've had Monkey's do that to weirdo AIFF files. Monkey's would uncompress them bit-exactly, but they would sound like noise. From memory: For one of them, the information was in the headers in a way that it could be processed in the next Monkey's release. For the other, the next Monkey's release did the right thing and rejected it.

You should also be aware that e.g. WAVE is a container format that need not contain PCM. It can for example contain MP3. The encoder should reject those files - not because they are non-compliant, but because they are "out of scope". This is not what I was constructed to do, and I won't just pretend.

Non-compliant files can fool the encoder into processing something it should reject. The files can also be flawed in such a way that everything goes well - not only reversing the algorithm to restore the file, which is in some sense the easier part.


With that said, the following is above my paygrade: I simply don't have the knowledge to tell why this approach has not been chosen, though I know it has been "attempted":
Upon playback, the decoder serves not the audio, but the unpacked file. It is a WAVE file, maybe non-compliant, maybe the encoder was wrong about the content, whatever - that is the player's business, not the compressed format's. The player has to treat it as if it were the original file. Any other information - like seekpoints - is an auxiliary service.

Reply #34
A simple illustration: Try to encode wrong.wav to flac/ape/wavpack, then don't just decompress to .wav and compare checksum, but play/analyze/process the encoded files via different software (e.g. foobar, MPC-HC, Audacity, SoX, Spek) or hardware like DAPs and streamers. Also, listen to wrong.wav in compressed and uncompressed form via stereo headphones, don't just look at graph and stats.

Use correct.wav as a reference, but keep in mind that such a reference won't exist in real-life situations.

I use flac, wavpack and sometimes ape. Apart from format support like DSD and float, if for some reason I need to preserve the whole file, I will use wavpack: for example, to keep a bit-perfect specimen of a potentially malformed .wav file for debugging programs I write.

Another reason to use wavpack is that, for example, Audition and Reaper can show the stored markers and regions within a wavpack file without using wvunpack. With flac, even with --keep-foreign-metadata, I still need an intermediate .wav file to get the markers and regions back. When opening a flac file directly in Audition or Reaper, this info won't show up.

Reply #35
Apart from the observed behaviour of wrong.wav posted above when using the APE/FLAC/WavPack command line tools and various 3rd party decoder/players, I also found that even though there is only one byte of difference between correct.wav and wrong.wav, 7z achieved a much smaller file size when compressing correct.wav and wrong.wav separately (652KB vs 713KB). Also, I cannot reduce file size by putting these two files into a solid 7z. It seems that 7z used different methods to compress correct.wav and wrong.wav so solid compression doesn't work. I tried different settings on the 7z GUI but still cannot find a way to defeat this behaviour, don't know if it can be overridden by using the 7z command line tool or not.

Anyway, good job on flac.exe to reject wrong.wav.


Reply #37
Quote
Putting my money on it not recognizing the "wrong" as .wav
Well, it's quite natural that 7z compressed "correct.wav" more efficiently even if 7z recognized both of them as WAV, since it can exploit the similarity of channels on correct.wav (which looks like a stereo file), but not on wrong.wav (which looks like a mono file).

Reply #38
Now it works ;)

Reply #39
Quote
Putting my money on it not recognizing the "wrong" as .wav
Quote
Well, it's quite natural that 7z compressed "correct.wav" more efficiently even if 7z recognized both of them as WAV
And it seems it does. Looking at the compression model used in test.7z, it uses delta:2 on one and delta:4 on the other. Apparently that means two bytes per sample and four bytes per sample (the latter meaning, stereo and four bytes per stereo sample).
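To illustrate why the delta distance matters, here is a rough sketch of a byte-wise delta filter of the kind delta:2 and delta:4 refer to: each byte has the byte N positions earlier subtracted from it. It is written from the general description of such filters, not taken from 7-Zip's source, and the stereo test signal and the use of zlib as the back-end compressor are assumptions made for the demo.

```python
import math
import struct
import zlib

def delta_encode(data: bytes, stride: int) -> bytes:
    """Replace each byte by its difference from the byte `stride` positions earlier."""
    out = bytearray(data)
    for i in range(stride, len(data)):
        out[i] = (data[i] - data[i - stride]) & 0xFF
    return bytes(out)

def delta_decode(data: bytes, stride: int) -> bytes:
    """Undo delta_encode by accumulating the differences again."""
    out = bytearray(data)
    for i in range(stride, len(out)):
        out[i] = (out[i] + out[i - stride]) & 0xFF
    return bytes(out)

# Synthetic 16-bit stereo: two different slow sine waves, interleaved L/R.
frames = []
for n in range(4800):
    left = int(8000 * math.sin(2 * math.pi * 55 * n / 48000))
    right = int(6000 * math.sin(2 * math.pi * 82 * n / 48000 + 1.0))
    frames.append(struct.pack("<hh", left, right))
pcm = b"".join(frames)

for stride in (2, 4):
    filtered = delta_encode(pcm, stride)
    assert delta_decode(filtered, stride) == pcm  # the filter is reversible
    print(stride, len(zlib.compress(filtered, 9)))
```

With a distance of 4, each byte is compared with the same byte of the same channel one frame earlier; with a distance of 2 on stereo data, left-channel bytes are compared with right-channel bytes. A header that misreports the channel count therefore steers the filter toward the less useful distance.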

 

Reply #40
@ktf
Is it true that flac's raw mode guarantees bit-perfectness at file level?

How about this? When the command-line tool encounters files with potentially malformed metadata, as long as the audio data encoding format can be automatically detected, it could offer an option to encode as raw, automatically insert the required raw mode parameters, suggest the required decoder version (e.g. for 32-bit int support), and show the commands required to decode the file properly.

This mechanism can force users to manually invoke the encoding and decoding processes, to ensure they understand what they are doing and what risks they should face, for example, compatibility with 3rd party tools.

For example, --keep-foreign-metadata can preserve the attached .wav file bit-perfect, but putting this .wav file into "SoX Spectogram.cmd" below will cause SoX to crash. So the mechanism mentioned above can make sure that users understand the crash is not flac's fault.
Updated 'SoX Spectogram' folder with the SoX binary from above
https://www.mediafire.com/file/u9uhy0nbotrp8hx/SoX+Spectogram.7z/file

Reply #41
Quote
@ktf
Is it true that flac's raw mode guarantees bit-perfectness at file level?
No, flac's raw mode is for reading headerless PCM files. You could of course use it to store WAV/AIFF/W64, but that is not what it was meant to do.

FLAC can store 'foreign metadata'. When this is activated on both encoding and decoding, the tool will try to store and restore the file (WAV/AIFF/W64) as it was encoded. As you probably know, FLAC tells you the following:
Quote
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

So no, the keep foreign metadata option does not guarantee anything.

Quote
How about this? When the command-line tool encounters files with potentially malformed metadata, as long as the audio data encoding format can be automatically detected, it could offer an option to encode as raw, automatically insert the required raw mode parameters, suggest the required decoder version (e.g. for 32-bit int support), and show the commands required to decode the file properly.
No. FLAC needs to know what is metadata and what is audio, otherwise you can have noise playback. So, FLAC will not store files in which it cannot detect what is audio and what isn't.

Reply #42
@bennetng Pushing helicopter parenting does no good, because the number of apps that create audio files in a non-standard way, and of new specifications that extend or reinterpret previous ones, is only growing, leaving FLAC's heuristics behind. For example, read a story about the mess that was made, trying to make WAV larger than 4 GB, told by the user and the developer. The fact that it is possible to craft a file that causes trouble, like a ZIP bomb, does not mean that compressors must have a mechanism to reject files; this is a task for watchdog apps, such as antiviruses, which regularly annoy with false positives (remember all those noobs running around yelling there is a red spot in the VirusTotal report, which calls into question the credibility of the well-known developer). Compression resembles vacuum packaging that pumps out excess air and enables you to store the contents in a more compact form, but the packaging is a transport medium and is not responsible for the contents: you can put ordinary cheese (cheddar, gouda, parmesan), blue cheese with edible mold (dorblu, gorgonzola, roquefort) or poisoned cheese in it.

As for SoX, which has not been updated for more than 8 years, although it would be worth it (say, do you know that internal operations in it are still carried out without floating point?), it draws a spectrogram without issues on my end, but I use the RareWares version 14.4.2 x64 built on June 24, 2023. Try updating yours.

Reply #43
Quote
@ktf
Is it true that flac's raw mode guarantees bit-perfectness at file level?
No, flac's raw mode is for reading headerless PCM files. You could of course use it to store WAV/AIFF/W64, but that is not what it was meant to do.

Quote
How about this? When the command-line tool encounters files with potentially malformed metadata, as long as the audio data encoding format can be automatically detected, it could offer an option to encode as raw, automatically insert the required raw mode parameters, suggest the required decoder version (e.g. for 32-bit int support), and show the commands required to decode the file properly.
No. FLAC needs to know what is metadata and what is audio, otherwise you can have noise playback. So, FLAC will not store files in which it cannot detect what is audio and what isn't.
I am talking about using flac in the same way as 7z, so it means the files are not meant to be played in compressed form and they must be decoded. For example I attached a SoundFont file which flac does not understand but it contains compressible data:

encode:
flac woodwinds.sf2 --force-raw-format --endian=little --channels=1 --bps=16 --sample-rate=44100 --sign=signed

decode:
flac woodwinds.flac -d --force-raw-format --endian=little --sign=signed -o woodwinds-decode.sf2

In the case of .wav input, flac.exe should be able to parse what raw encoding and decoding settings to use when receiving the file.

Reply #44
Quote
For example, --keep-foreign-metadata can preserve the attached .wav file bit-perfect, but putting this .wav file into "SoX Spectogram.cmd" below will cause SoX to crash. So the mechanism mentioned above can make sure that users understand the crash is not flac's fault.
Updated 'SoX Spectogram' folder with the SoX binary from above
https://www.mediafire.com/file/u9uhy0nbotrp8hx/SoX+Spectogram.7z/file
I should also mention that foobar2000 shows strange duration when reading this file.

Reply #45
Quote
@bennetng Pushing helicopter parenting does no good, because the number of apps that create files in a non-standard way, and of new specifications that extend or reinterpret previous ones, is only growing, leaving FLAC's heuristics behind. For example, read a story about the mess that was made, trying to make WAV larger than 4 GB, told by the user and the developer. The fact that it is possible to craft a file that causes trouble, like a ZIP bomb, does not mean that compressors must have a mechanism to reject files; this is a task for a separate class of watchdog apps, such as antiviruses, which regularly annoy with false positives (remember all those noobs running around yelling there is a red spot in the VirusTotal report, which calls into question the credibility of the well-known developer). Compression resembles vacuum packaging that pumps out excess air and enables you to store the contents in a more compact form, but the packaging is a transport medium and is not responsible for the contents: you can put ordinary cheese (cheddar, gouda, parmesan), blue cheese with edible mold (dorblu, gorgonzola, roquefort) or poisoned cheese in it.
First of all, generic archive formats like zip, rar and 7z are meant for generic purposes regardless of compression efficiency (e.g. they can even use the "store" method). These formats are designed to keep the data unchanged even if they contain malware etc.

For media formats, keep in mind that not everything is upgradable, or upgradable for free, especially hardware. Even on the software side, not everything is FOSS, and even in the FOSS scene, not all software receives updates. Always remember that as long as you cannot code it yourself, you rely on others to fulfill your wishes. You can wait indefinitely, but probably not everyone can.

Quote
(say, do you know that internal operations in it are still carried out without floating point?)
This is off topic, especially since flac does not support float either. But yeah, why don't you code it yourself?

Reply #46
@bennetng Code what? You came here with a WAV file that supposedly causes SoX to crash, so I suggested you get the SoX version that might not crash. Not only suggested, but also provided a download link to save your time. After all, the less obsessed you are with issues not related to FLAC, the more attention you can pay to this topic about audio files created by well-known apps that are good enough to be played and edited for decades, but not good enough to be processed by FLAC in one or both directions. This topic has already been assessed as worthy of attention and a request for improvement was created. Now all that remains is to pray and wish the developers strength and good spirits.

Reply #47
Quote
@bennetng Code what?
Quote
As for SoX, which has not been updated for more than 8 years, although it would be worth it (say, do you know that internal operations in it are still carried out without floating point?)
Then why did you mention the above? What does floating point support have to do with the crash when generating a spectrogram with that file? If you want floating point support, code it yourself.

Quote
You came here with a WAV file that supposedly causes SoX to crash, so I suggested you update the version of SoX that might not crash. Not only suggested, but also provided a download link to save your time.
You have to read my previous post. The compile of SoX that Netranger provided is newer than the rarewares one you mentioned.
https://hydrogenaud.io/index.php?msg=1039602

I also tried to use the rarewares version you mentioned to generate a spectrogram with that file, no crash, but still cannot get any useful output, I got this:
Code:
Input File     : 'E:\download\junglede.wav'
Channels       : 2
Sample Rate    : 22050
Precision      : 16-bit
Duration       : 50:30:48.27 = 4009754427 samples ~ 1.36386e+07 CDDA sectors
File Size      : 140k
Bit Rate       : 6.17
Sample Encoding: 16-bit Signed Integer PCM

In:0.00% 00:00:00.00 [50:30:48.27] Out:0     [      |      ]        Clip:0    sox WARN wav: Premature EOF on .wav input file
In:0.00% 00:00:01.59 [50:30:46.68] Out:35.1k [!=====|=====!]        Clip:0
Don't ignore that I also mentioned foobar2000 shows strange duration with this file.

Prime example of garbage in garbage out.
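The mismatch in that SoX output can be checked with a few lines of arithmetic. A small sketch using only the numbers printed above; the file size is taken as roughly 140 * 1024 bytes, which is an assumption because SoX prints a rounded "140k".

```python
def hms(seconds: float) -> str:
    """Format seconds as hh:mm:ss.ss, the way the SoX output above does."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02d}:{int(m):02d}:{s:05.2f}"

declared_samples = 4009754427          # from the header, as reported by SoX
sample_rate = 22050
channels, bytes_per_sample = 2, 2      # 16-bit stereo
file_size = 140 * 1024                 # approximate, see note above

declared = hms(declared_samples / sample_rate)
upper_bound = file_size / (sample_rate * channels * bytes_per_sample)

print(declared)               # 50:30:48.27
print(round(upper_bound, 1))  # about 1.6 seconds of audio at most
```

The header promises more than fifty hours, while the file can hold under two seconds, which fits SoX giving up with "Premature EOF" at about 00:00:01.59.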

Reply #48
@bennetng , did you mean for an option to specify the headers to be deleted and then specifying the signal like you specify raw? I think OptimFROG has such an option.
I don't see the point in using FLAC to raw-compress a header'ed WAVE file. Sure the format allows compression as raw, and for material that is not intended to be playable one can put sample rate = 0 in streaminfo - but I don't think it should be encouraged as a solution. It would just disguise a corrupted .wav as a valid but unplayable .flac, wouldn't it?

On to the wrong.wav, after having looked at it ... I don't know that WAVE specification, but https://learn.microsoft.com/en-us/windows/win32/api/mmeapi/ns-mmeapi-waveformat doesn't say anything about what information takes precedence in case of inconsistencies. Here we have nChannels=1, nBlockAlign=4 and wBitsPerSample=16, and so if we for the sake of discussion assume that no more than one of these is wrong: Is there anything that says that this file should be interpreted as
* mono with 4 bytes per sample, i.e. wBitsPerSample is wrong and should be replaced by 32?
* 4 bytes per block and 16 bits per sample, i.e. nChannels is wrong and should be replaced by 2?
* mono with wBitsPerSample=16, i.e. nBlockAlign is wrong and should be replaced by 2?
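The inconsistency and the three single-field repairs can be written down as a short sketch. It relies on the usual PCM relations nBlockAlign = nChannels * wBitsPerSample / 8 and nAvgBytesPerSec = nSamplesPerSec * nBlockAlign; the sample rate and byte rate below are made up, since only nChannels, nBlockAlign and wBitsPerSample are given in the thread, and Fmt, parse_fmt and pcm_issues are illustrative names.

```python
import struct
from typing import NamedTuple

class Fmt(NamedTuple):
    format_tag: int
    channels: int
    sample_rate: int
    avg_bytes_per_sec: int
    block_align: int
    bits_per_sample: int

def parse_fmt(payload: bytes) -> Fmt:
    """Decode the first 16 bytes of a 'fmt ' chunk payload (little-endian)."""
    return Fmt(*struct.unpack_from("<HHIIHH", payload))

def pcm_issues(f: Fmt) -> list:
    """List the ways the fields contradict each other for plain PCM."""
    issues = []
    expected_align = f.channels * ((f.bits_per_sample + 7) // 8)
    if f.block_align != expected_align:
        issues.append(f"nBlockAlign={f.block_align}, expected {expected_align}")
    if f.avg_bytes_per_sec != f.sample_rate * f.block_align:
        issues.append("nAvgBytesPerSec != nSamplesPerSec * nBlockAlign")
    return issues

# Sample rate 44100 and byte rate 176400 are assumptions for the demo.
wrong = parse_fmt(struct.pack("<HHIIHH", 1, 1, 44100, 176400, 4, 16))
issues = pcm_issues(wrong)
print(issues)  # ['nBlockAlign=4, expected 2']

# Each repair changes exactly one field so that the relation holds again.
repairs = {
    "wBitsPerSample": 8 * wrong.block_align // wrong.channels,
    "nChannels": wrong.block_align // (wrong.bits_per_sample // 8),
    "nBlockAlign": wrong.channels * wrong.bits_per_sample // 8,
}
print(repairs)  # {'wBitsPerSample': 32, 'nChannels': 2, 'nBlockAlign': 2}
```

All three repairs are equally valid arithmetically, which is the point of the question: nothing in the header itself says which field to trust.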
 
As far as I can tell,
* flac, TAK and OptimFROG reject it
* Monkey's roundtrips it to bit-exactly the original
* WavPack is fooled by it into unpacking it to something else
* ffmpeg thinks it is ten seconds long; I assume it then disregards nBlockAlign, although I didn't bother to listen
* foobar2000 seems to do the same thing when playing it.

Reply #49
Quote
@bennetng , did you mean for an option to specify the headers to be deleted and then specifying the signal like you specify raw? I think OptimFROG has such an option.
I don't see the point in using FLAC to raw-compress a header'ed WAVE file. Sure the format allows compression as raw, and for material that is not intended to be playable one can put sample rate = 0 in streaminfo - but I don't think it should be encouraged as a solution. It would just disguise a corrupted .wav as a valid but unplayable .flac, wouldn't it?
For SoundFonts, it is what I can do without writing any code, so it is a proof of concept. The actual implementation can of course be improved for .wav (or even BW64!), depending on the will of the developers. Take this as an idea rather than a feature request. The important thing is to make sure users really understand what they are doing.

Quote
On to the wrong.wav, after having looked at it ... I don't know that WAVE specification, but https://learn.microsoft.com/en-us/windows/win32/api/mmeapi/ns-mmeapi-waveformat doesn't say anything about what information takes precedence in case of inconsistencies. Here we have nChannels=1, nBlockAlign=4 and wBitsPerSample=16, and so if we for the sake of discussion assume that no more than one of these is wrong: Is there anything that says that this file should be interpreted as
* mono with 4 bytes per sample, i.e. wBitsPerSample is wrong and should be replaced by 32?
* 4 bytes per block and 16 bits per sample, i.e. nChannels is wrong and should be replaced by 2?
* mono with wBitsPerSample=16, i.e. nBlockAlign is wrong and should be replaced by 2?
There is also a "bytes per second" field to consider.