ffmpeg: A buggy MP3 (MPEG-2 mode) decoder?
2022-10-16 14:19:33
Hello there,

I'm encoding speech generated via neural Text-To-Speech (TTS) engines for use in web applications. The obvious choice is Opus, but having an MP3 fallback ensures basically 100% playback success with the <audio> tag. So I spent some time finding pleasant settings for LAME encoding. Given that the TTS engine generates 24 kHz mono PCM files, encoding them as 24 kHz MPEG-2 Layer 3 files seemed the obvious choice for low-bitrate, speech-only material.

An example file: https://maikmerten.de/public/mp3-decoder-tests/bla24-v7.mp3 (created using the settings specified in https://maikmerten.de/public/mp3-decoder-tests/encode.sh )

I feel that the results, overall, are pretty good. However, even with less aggressive VBR settings I couldn't completely get rid of some high-frequency chirp noises when encoding 24 kHz MPEG-2 Layer 3 (I find them easy to spot with headphones, not so much on speakers). I assumed that this might stem from trying to preserve 12 kHz of audio bandwidth and perhaps running into some sfb21 shenanigans (adding -Y or not did not make any difference, though), or that LAME simply isn't tuned for 24 kHz.

It turns out that the encoding is most likely fine, but that ffmpeg (which I use for a quick command-line listen) apparently has an MP3 decoder that introduces the chirp artifacts...
This is the file above, decoded with ffmpeg: https://maikmerten.de/public/mp3-decoder-tests/decoded-ffmpeg.flac

This is the same file, decoded with LAME: https://maikmerten.de/public/mp3-decoder-tests/decoded-lame.flac

And this is a 10/10 ABX result comparing those two with the graphical "abx" tool available on Ubuntu Linux: https://maikmerten.de/public/mp3-decoder-tests/abx-result.png

(The chirp artifact is pretty obvious, e.g., around the 8.4 seconds mark)

The ffmpeg decoder doesn't appear to be entirely happy with the MP3 file and reports some warnings, such as:

[mp3float @ 0x7f3f20005b40] overread, skip -6 enddists: -1 -1=0/0
[mp3float @ 0x7f3f20005b40] overread, skip -7 enddists: -5 -5=0/0
[mp3float @ 0x7f3f20005b40] overread, skip -5 enddists: -2 -2=0/0
[mp3float @ 0x7f3f20005b40] overread, skip -6 enddists: -1 -1=0/0

(Tested with current git)

Now, after this wall of text:

- Can somebody confirm that ffmpeg and LAME produce perceivably different decoding results?
- Is this "normal" and to be expected, given that the MP3 specification doesn't expect bit-exact results?
- Is the MP3 file perhaps malformed? I did tests with the "--strictly-enforce-ISO" LAME parameter, but that didn't resolve the issue.
- Is this a case of a "bad"/buggy MP3 decoder? In 2022? Deployed basically everywhere (e.g., in Firefox and Chrome - exactly my use case)?

I'd love to gather some opinions before filing a "hard to hear, thus hard to reproduce" bug ticket for ffmpeg.

edit: Perhaps this is https://trac.ffmpeg.org/ticket/1958 - filed in 2012
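
For anyone who would like to put a number on the difference instead of relying on ABX alone, here is a minimal Python sketch. It is not something that was used for the results above. It assumes both decodes have been converted to 16-bit mono WAV at the same sample rate and that they are sample-aligned (the two decoders may trim encoder delay differently, in which case one file would need to be shifted first). The function names `read_mono_pcm16` and `residual_stats` and the 50 ms window are my own choices for illustration.

```python
import array
import math
import sys
import wave


def read_mono_pcm16(path):
    """Read a 16-bit mono WAV file and return (sample_rate, samples)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM: " + path)
        rate = wav.getframerate()
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rate, samples


def residual_stats(a, b, rate, window_s=0.05):
    """Compare two aligned sample sequences (hypothetical helper).

    Returns a tuple of:
      - RMS of the sample-wise difference over the whole file, in dBFS
      - start time (seconds) of the window with the largest difference
      - RMS of the difference inside that window, in dBFS
    """
    n = min(len(a), len(b))
    diff = [a[i] - b[i] for i in range(n)]
    full_scale = 32768.0

    def rms_db(chunk):
        if not chunk:
            return float("-inf")
        mean_sq = sum(d * d for d in chunk) / len(chunk)
        if mean_sq == 0:
            return float("-inf")
        return 10.0 * math.log10(mean_sq / (full_scale * full_scale))

    win = max(1, round(rate * window_s))
    worst_start, worst_db = 0, float("-inf")
    for start in range(0, n, win):
        db = rms_db(diff[start:start + win])
        if db > worst_db:
            worst_db, worst_start = db, start
    return rms_db(diff), worst_start / rate, worst_db
```

To try it on the files above, one would first convert both FLAC files to WAV (for example with `ffmpeg -i decoded-lame.flac decoded-lame.wav`), load each with `read_mono_pcm16`, and pass the two sample arrays to `residual_stats`. If the decoders really disagree at the chirp, the reported window should land somewhere near the 8.4 seconds mark, but a large residual by itself does not prove audibility, so this complements the ABX test and does not replace it.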