is it possible...

2003-03-13 08:25:49

Say the last frame of an mp3 has 1 part sound and the rest silence... could a DLL be written that would detect the silence and play the next song as soon as the silence begins? That would create a "fake" gapless playback with mp3s (which is impossible unless you encode them with --nogap). Just a thought. I figure with the amount of stuff people have already done with foobar, this would be relatively simple, no?

word.
wes.

Reply #1 – 2003-03-13 13:36:57

Peter made Winamp 2/3 plugins that do exactly that. Whether he will ever put it into foobar is anyone's guess. I would love to see this though; IMHO it's the best thing about Winamp (and crossfade).

-J

Reply #2 – 2003-03-13 13:45:28

According to ZzZzZzZ, foobar will never support gapless MP3s because it's a "stupid hack", if I remember correctly. Although if there's enough interest, a third-party component may be written.

Reply #3 – 2003-03-13 15:40:04

so the SDK has room for that kind of thing? i was just curious.. I'm looking forward to when people feel comfortable enough to start pumping out plugins... i think we're gonna see some cool stuff "hack" or not..

but back to gapless.. it would be nice if it was part of a crossfade plugin.. very cool..

word wes.

Reply #4 – 2003-03-13 17:08:40

mp3splice (see my sig, winamp section) may be a stupid hack to get around this problem with mp3's but it's an astoundingly good hack!

It preserves intended silences (I presume that it only looks at the last frame of the MP3 to detect the gap) and it sounds essentially perfect (or at least very pleasant). I guess you'd need to intercept the wave out to the soundcard to capture its output and ABX it against the wave out from a single-file MP3 encoding of the same music to be sure whether it's up to fb2k standards and actually transparent.

Anyhow, I'm using fb2k and .MPC for everything new I encode, so gaplessness isn't a problem, but I run MP3splice when using WinAmp 2.8 unless I need a different output plugin like the SSRC.

Reply #5 – 2003-03-13 17:57:47

To be honest, I hope there never is an MP3 gapless plugin for foobar. Not having it pushes people toward encoding to better formats (.mpc). I know many of you will say "but we'll need to rip our collections again" etc., but hacking an old format to try and make it do stuff it wasn't supposed to do isn't the way to go.

Reply #6 – 2003-03-13 23:41:17

The thing with silence at the end of mp3s is that it is not digital silence and is therefore very hard to detect.

Reply #7 – 2003-03-14 02:40:15

Funkstar De Luxe - Unless the hardware mp3 player manufacturers push a different standard, I don't think you'll be seeing a lot of people switch, better format or not. If my Jukebox Zen supported MPC or Ogg I'd check them out, but the fact is, if I want 20 gigs of music in my car (or pocket for that matter), mp3 is the way to go. The only complaint I have about mp3 is the gap problem, and if that can be fixed then what's the big deal? What other people encode their music to is up to them, not really my concern...

Reply #8 – 2003-03-14 07:50:13

While there is no digital silence, most players just assume that anything below, say, -40 dB is silence anyway and then skip the rest. Winamp is able to do that, I believe.
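
A rough sketch of that threshold idea, not how Winamp or any particular player actually does it. It assumes decoded samples are floats in the range -1.0 to 1.0, and the function name, the -40 dB default and the 1152-sample search window are all illustrative choices:

```python
def trailing_silence_start(samples, threshold_db=-40.0, window=1152):
    """Return the index where trailing near-silence begins.

    Only the last `window` samples are searched, so quiet passages
    earlier in the track are never touched. If the final sample is
    already above the threshold, len(samples) is returned (no cut).
    """
    # Convert the dB threshold to linear amplitude: -40 dB -> 0.01 of full scale.
    threshold = 10.0 ** (threshold_db / 20.0)
    start = max(0, len(samples) - window)
    cut = len(samples)
    # Walk backwards from the end while samples stay under the threshold.
    while cut > start and abs(samples[cut - 1]) < threshold:
        cut -= 1
    return cut
```

Anything from the returned index onward would be treated as skippable. A real implementation would probably want to look at short-term energy rather than single samples, since one zero crossing is not silence.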

Reply #9 – 2003-03-14 10:53:18

My guess is the out_mp3splice.dll for WinAmp 2.x is smart enough (unlike most gapless crossfader plugins) to realise that it's only silence within the last frame or granule of an MP3 (i.e. 1152 or 576 samples) that could possibly be unintended silence. (I guess it only uses a 5-second buffer to give it ample time to process the join between files and allow for seek time on slow media to access the next file, something fb2k lets the user control instead.)

I may be wrong in some subtleties (which would need testing), but perhaps my comments will suggest a near-optimal way of gapkilling without risking any damage at all to the rest of the audio. I'm not claiming they're my invention or that it's the way mp3splice does it, but it's an educated guess at a good approach and might be food for thought and experimentation...

By altering only the very last frame, intentional silences are completely or almost completely preserved and beat-matching and intended timing is also correct to within about 26 or 13 milliseconds at worst (is it a frame or a granule?). None of the rest of the decoded MP3 need be altered in any way, preserving the utmost quality.
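
For reference, the 26 and 13 millisecond figures follow directly from the frame and granule sizes. This little check assumes a 44.1 kHz sample rate (CD audio); the helper name is made up for illustration:

```python
SAMPLE_RATE = 44100  # Hz, assuming CD-sourced audio


def duration_ms(num_samples, sample_rate=SAMPLE_RATE):
    """Duration of `num_samples` samples in milliseconds."""
    return 1000.0 * num_samples / sample_rate


for name, size in (("frame", 1152), ("granule", 576)):
    print(f"{name}: {duration_ms(size):.1f} ms")
# frame: 26.1 ms
# granule: 13.1 ms
```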

The description of mp3splice says it performs some careful filtering and smoothing, which I'd guess is only done within the last frame or granule to match it up to the first frame/granule of the next file (just as MP3 uses overlapping granules and windowing functions with the MDCT to produce temporal continuity). This is something you have to do with all Fourier-like Transforms and it's necessary to balance time-domain ripple/accuracy against frequency-domain accuracy/ripple - you can't have more of one without losing some of the other.

Since most tracks are from CDs, and all CDs have 1/75 s frames to denote track markers, it's also possible to see if any integer multiples of 588 samples (one CD frame) from the start of the track fall within the last frame of the MP3 and analyse it to see if this is the most likely end-point of the file. This will usually be the case because most MP3 files will be ripped from CDs, and it's rare that a track will have been edited for length before encoding, or that mp3directCut or similar will have been used to remove frames from the middle of an MP3 after encoding. Only one or zero CD frame boundaries can fall in the last 576-sample granule (for MPEG-1, layer III), but one or two could fall within the last 1152-sample frame.
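
Finding the candidate CD frame boundaries is simple arithmetic. The sketch below is only an illustration: the function name is made up, and it assumes any encoder/decoder delay has already been trimmed so that sample 0 of the decoded output lines up with sample 0 of the ripped track (in practice that offset would have to be accounted for first).

```python
CD_FRAME = 588  # samples per CD frame (1/75 s at 44.1 kHz)


def cd_boundaries_in_tail(total_samples, tail=1152, cd_frame=CD_FRAME):
    """List the multiples of `cd_frame` that fall in the last `tail` samples.

    The search interval is (total_samples - tail, total_samples], i.e.
    every possible track length that would end inside the final frame.
    """
    start = max(0, total_samples - tail)
    # First multiple of cd_frame strictly greater than `start`.
    first = (start // cd_frame + 1) * cd_frame
    return list(range(first, total_samples + 1, cd_frame))
```

For a decoded length of 10 frames (11520 samples) this gives two candidates within the last frame (10584 and 11172) and one within the last granule (11172), which matches the "one or two" and "one or zero" counts above.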

This thinking might also help when using DiskWriter mode to prepare PCM files before burning a CD, since the PCM files could be as gapless as possible (optionally smoothed to join the next track as they're written) and restricted to integer multiples of 588 samples.

It may even be possible to deduce the intended end point of a sample from the response of the window function (e.g. temporal ripple) due to MP3's MDCT padding the last frame with zeroes. I wouldn't bank on it being easy to do reliably especially as MP3 throws away some information, but it might be possible to demonstrate using test encodes that one can make a reliable guess most of the time.

It might be possible to use fuzzy logic to make a decent guess of the intended end point. (An increased fuzzy logic score for any points that could match based on silence detection or step-induced ripple, but a particularly large score for any points that match the possible CD frame boundaries would usually result in picking exactly the right cut point).
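
To make the scoring idea a bit more concrete, here is a toy version. Everything in it is an assumption for illustration: the names, the weights, and the simple "closeness to the detected silence onset" measure. It is not a real fuzzy logic system, just a weighted score where a CD frame boundary match counts for much more than a silence match.

```python
def score_cut_points(candidates, silence_start, cd_boundaries,
                     w_silence=1.0, w_cd=3.0, tolerance=16):
    """Score each candidate cut point (a sample index).

    - Up to `w_silence` for being near the detected silence onset,
      falling off linearly to zero at `tolerance` samples away.
    - A flat `w_cd` bonus for landing exactly on a CD frame boundary.
    """
    boundaries = set(cd_boundaries)
    scores = {}
    for c in candidates:
        closeness = max(0.0, 1.0 - abs(c - silence_start) / tolerance)
        score = w_silence * closeness
        if c in boundaries:
            score += w_cd
        scores[c] = score
    return scores


def best_cut(scores):
    """Return the candidate with the highest score."""
    return max(scores, key=scores.get)
```

With candidates 11172, 11180 and 11300, a silence onset at 11180 and CD boundaries at 10584 and 11172, the boundary at 11172 wins even though 11180 matches the silence detector exactly, which is the behaviour described above.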

Another option would be to try LAME's --nogap method as a preferred option to override mine, I guess, when the required information is stored in the MP3 (at least I think it adds some info or hints to the LAME header or something). The presence of this info could either override the fuzzy logic mode or simply add to the fuzzy logic score for the hinted-at cut position.

The choice of what to do with decoder output beyond the cut position (likely to be low-level ripple ending up at zero) is between discarding it or adding it to the first samples of the next file before doing any other processing (such as deglitch, smoothing or enveloping algorithms)
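
The second choice (adding the leftover output to the start of the next file) might look something like this. It is only a sketch with a made-up function name, working on plain lists of float samples:

```python
def carry_over_tail(decoded, cut, next_head):
    """Split `decoded` at `cut` and mix the leftover tail into `next_head`.

    Returns (kept, mixed): `kept` is the audio up to the cut point and
    `mixed` is a copy of `next_head` with the tail added sample by sample.
    Any tail samples beyond the length of `next_head` are dropped.
    """
    kept = list(decoded[:cut])
    tail = decoded[cut:]
    mixed = list(next_head)
    for i, t in enumerate(tail[:len(mixed)]):
        mixed[i] += t
    return kept, mixed
```

Discarding the tail instead is just the `kept` part on its own, with the next file left untouched.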

It would also be possible for fb2k users to audition the gapkilling result and to place something in the tag and/or the fb2k track database to indicate the cut point they've approved within the last frame and their choice of whether any kind of smoothing should be applied when playing in album order or in shuffle mode.

They could even be shown the fuzzy-logic scores for various possible cut points and audition them all. They could be told which points (if any) fall on CD frame boundaries, and which match for different reasons and how certain the fuzzy logic is. (This is rather like Encspot, I guess, which looks at MP3 attributes and makes a good guess at the encoder used)

I'd imagine the smoothing is mainly required to adjust the DC levels so that the two files attach at about the same sample value and slew rate without a discontinuity and a resultant click. The adjustment should be slow and gradual enough not to be at an audible frequency.
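
One way that kind of smoothing could work, purely as a guess and not a description of what mp3splice does: extrapolate where the previous track was heading (to match value and slew rate), measure how far the next track's first sample is from that, and fade the correction out slowly. The function name and the 2205-sample (50 ms at 44.1 kHz) default ramp are illustrative assumptions.

```python
def smooth_join(prev_tail, next_head, ramp_len=2205):
    """Return a copy of `next_head` adjusted to continue from `prev_tail`.

    The first output sample is forced onto the straight line through the
    last two samples of `prev_tail`; the correction then decays linearly
    to zero over `ramp_len` samples, leaving the rest untouched.
    """
    out = list(next_head)
    if len(prev_tail) < 2 or not out:
        return out
    # Linear extrapolation: continue at the same value and slope.
    predicted = prev_tail[-1] + (prev_tail[-1] - prev_tail[-2])
    offset = out[0] - predicted
    n = min(ramp_len, len(out))
    for i in range(n):
        out[i] -= offset * (1.0 - i / n)
    return out
```

A longer ramp keeps the correction itself further below audible frequencies, at the cost of altering more of the next track's opening.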

To me, despite any gapkiller being a hack, provided it leaves all but the last frame unaffected and provided it is a defeatable option in fb2k's MP3 decoder, it would still be compatible with producing excellent sound, with all modifications that might cause degradation or change completely under the user's control. It would have no effect on intrinsically gapless files like MPC, Vorbis or lossless files because it's restricted to processing formats like MP3.

For example, it would be possible to select the behaviour that the gap-killer should adopt whenever Track Gain is turned on. The volume difference may cause a sharp step and resulting click, and gapless mode could either be set to smooth the volume change over one frame (the envelope introducing little harmonic change above about 19 Hz) or to disable gap-killing entirely. In Album Gain or ReplayGain OFF modes, gapkilling would be my ideal choice for MP3s.
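
Smoothing the volume change over one frame could be as simple as a linear gain ramp at the start of the next track. This is a hypothetical sketch (the function name and parameters are made up), with gains given as linear factors, so a ReplayGain value in dB would first be converted with 10 ** (dB / 20):

```python
def ramp_gain(samples, gain_from, gain_to, ramp_len=1152):
    """Apply a gain that moves linearly from `gain_from` to `gain_to`.

    The ramp covers the first `ramp_len` samples (one MP3 frame by
    default); every later sample gets `gain_to` unchanged.
    """
    n = min(ramp_len, len(samples))
    out = []
    for i, s in enumerate(samples):
        if i < n:
            gain = gain_from + (gain_to - gain_from) * (i / n)
        else:
            gain = gain_to
        out.append(s * gain)
    return out
```

Here `gain_from` would be the previous track's Track Gain factor and `gain_to` the new track's, so the level glides from one to the other instead of stepping.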

My rationale is that such cautious gapkilling only hacks about with the very last frame of an MP3 which doesn't sound exactly as intended anyway due to the shortcomings of the format.

The kinds of quality-conscious people who use foobar2000 would still prefer to use a more efficient, artifact resistant and intrinsically gapless format like MusePaCk anyway, but many will have at least some MP3 files that are treasured, not easily replaced or re-encoded and which they'd like to avoid transcoding or editing while allowing them to flow gaplessly.

Providing user overrides and options while ensuring that the gapkiller can only ever change the last frame seems to me to be compatible with the quality-first ethos of fb2k, assuming that the gapkiller produces at the very least a "nicer" last-frame sound than the gaps we get at the moment, even if it can't manage to give us true transparency.

That's my opinion. What do you think? Are there any suggestions for experimentation (and ABXing?) or further discussion?

Regards,

DickD
