Why don't codecs take advantage of repeated parts of songs (like choruses) ?


Why don't codecs take advantage of repeated parts of songs (like choruses) ?

2023-01-15 10:34:10

Very basic question here from someone who doesn't know too much.

I'm sure there's a good reason why codecs (both lossy and lossless) don't take advantage of the fact that many songs have repeated sections. Say choruses, for example. As far as I know, no codec takes advantage of this by only storing the difference between choruses, instead of storing each (very similar) chorus three or four times per song.

I was thinking this would be even more useful for certain types of electronic music, where there really may be very little difference between certain passages of a track.

What's the reason this isn't used in codecs? I know it of course would take a lot more processing / searching during the encode, and also would totally mess up real-time decode, seeking etc. I suppose this would only be useful for a codec that would be aimed at minimizing storage space, rather than being easy to play back. Maybe you'd need to decode the entire file before playback (something like zipping a file). But has any codec looked at implementing an idea like this?

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #1 – 2023-01-15 11:03:18

Text, or even binary data, can be zipped by exploiting patterns, and music scores substantially reduce the amount of paper required by identifying repeated passages, so why not?...

The process you describe is used to great effect for video, where the similarities between successive frames are employed to dramatically reduce the amount of data required, and even moving objects are accounted for.

Audio is somewhat different. The repeat interval for video is one frame, whereas the repeat interval for any particular audio file is not known in advance, and finding it would require running the entire input file through auto-correlation. Even if repeated passages are found, that does not account for variations in the performance. Hopefully, the artist will have added some variety, so the question is whether the potential data compression justifies the effort.
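For anyone curious what that auto-correlation search looks like, here is a minimal Python sketch. It is only an illustration under friendly assumptions: the test signal is a made-up random pattern repeated exactly, and `find_repeat_lag`, `window` and `max_lag` are names chosen for the example, not part of any codec.

```python
import random

def autocorrelation(signal, lag, window):
    """Sum of signal[n] * signal[n + lag] over a fixed analysis window."""
    return sum(signal[n] * signal[n + lag] for n in range(window))

def find_repeat_lag(signal, max_lag, window):
    """Return the lag (1..max_lag) at which the signal best matches itself."""
    lags = range(1, max_lag + 1)
    return max(lags, key=lambda lag: autocorrelation(signal, lag, window))

# Hypothetical test signal: a 50-sample random pattern repeated four times.
rng = random.Random(0)
PERIOD = 50
pattern = [rng.uniform(-1.0, 1.0) for _ in range(PERIOD)]
signal = pattern * 4  # 200 samples, an exact repeat every 50 samples

# window + max_lag must not exceed len(signal).
best_lag = find_repeat_lag(signal, max_lag=99, window=100)
print(best_lag)  # 50
```

Every candidate lag costs a full pass over the analysis window, so the work grows with both the track length and the range of lags searched. Real recordings also rarely give a peak this clean, because the repeats are not exact.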

No doubt some extremely specific (synthesised?) tracks might be amenable to such compression, but not in general. Existing audio compression only considers a relatively short span of audio at a time, and can be implemented in real-time (with a small latency), whereas repeat-pattern compression requires the whole track to be analysed before compression can start.

At the reproduction end, again the player would have to (effectively) recreate the entire track from a formula, rather like fractal compression. This would impose too great an overhead for mainstream consumer players. And the format couldn't be streamed.

I think there might be some mileage in it for archival storage, but again I wonder whether the potential space savings for a typical music archive comprising many varied tracks would justify the cost of development.

The ultimate in "song" compression is to synthesise it from a MIDI file.

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #2 – 2023-01-15 11:28:25

Under normal circumstances a repeated chorus isn't copy-paste. Even most fully electronic music probably doesn't directly copy-paste sections: the masterer may copy-paste in the software, but on export the software will process each instance separately, which most likely won't result in an identical waveform. As soon as anything live is involved it's almost impossible to produce a bit-identical waveform; you could try for a thousand years to play a fraction of a second of Wonderwall and not get a match. Even if you were perfect, there's subtle randomness introduced by background noise and the mic.

It is probably possible for a codec to take advantage of repeated parts of songs, but it doesn't have as much of an impact as you'd think. Take FLAC as an example of a lossless codec: something like 98% of a FLAC file is the residual, and many lossless codecs will be similar. The residual is the part that couldn't easily be modelled (for convenience let's say this contains background noise and subtle performance differences); the rest is broad-strokes modelling and overhead. If a repeated chorus could be detected and the frames aligned appropriately, it's possible that the model from chorusA could be applied to chorusB with good results, so the model would not need to be repeated in the encoding of chorusB. Even if this extremely complicated analysis could be done, and even if the benefit of reusing models outweighed the cost of not using a slightly better model for chorusB, at best you're saving only a small amount of data.
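To make the model/residual split concrete, here is a small Python sketch using a second-order fixed predictor of the kind FLAC has among its options (predict each sample as 2*x[n-1] - x[n-2] and store only the error). It is a simplified illustration, not FLAC's actual encoder: the sine-wave input and the function names are assumptions made up for the example.

```python
import math

def fixed_order2_residual(samples):
    """Residual of the predictor x[n] ~ 2*x[n-1] - x[n-2]."""
    return [samples[n] - (2 * samples[n - 1] - samples[n - 2])
            for n in range(2, len(samples))]

def reconstruct(warmup, residual):
    """Rebuild the samples from the two warm-up samples plus the residual."""
    out = list(warmup)
    for r in residual:
        out.append(r + 2 * out[-1] - out[-2])
    return out

# Hypothetical input: a 440 Hz tone at 44.1 kHz, integer samples.
samples = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100))
           for n in range(1000)]

residual = fixed_order2_residual(samples)
print(max(abs(s) for s in samples), max(abs(r) for r in residual))
```

The "model" here is nothing more than the predictor choice and two warm-up samples; every other stored value is residual. That is why sharing a model between two choruses would save so little: the residual still has to be stored for each of them.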

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #3 – 2023-01-15 16:35:45

This is a recurring question, and it is not a bad one. You may think that with choruses being copied and pasted in a DAW (and the instrumentation of the verses too), that is redundancy to be exploited. And maybe there is. Indeed, if you were sitting at the workstation with instructions to "paste this signal X into A and B and C and then on C apply these drum fills Y ..." it could very well be exploited in the sense that X need not be duplicated until the final stage.

But even if you could retrieve that information afterwards, you would lose streamability. Suppose you are twelve minutes into a YouTube video and YouTube tells your browser: "now the following 18 seconds are to be mixed in the following way with the 18 seconds that start at 1:29.382 ... which you surely have ready, you haven't thrown them away have you?"
So even if it were feasible to do the compression, you would lose functionality. And in an age where storage is much cheaper, to the extent that people are subscribing to video streaming in HD and above (not broadcast!), no big player would go out of their way to rewrite the way applications treat audio streams only for such small bandwidth/storage savings.

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #4 – 2023-01-15 18:44:16

Actually, MP3 and FLAC do take advantage of the similarity between the left and right channels. They use "M/S" (mid-side) to create L+R and L-R. You still have a 2-channel file but it's not regular stereo. This process is mathematically lossless and perfectly reversible. The L-R usually contains less information and is quieter and easier to compress.

Basically, you are only compressing the common information once. At the extreme, if you have a regular WAV file with identical left & right channels, it's the same size as a "normal" stereo WAV file. A mono WAV is half the size. But a FLAC with identical left & right channels is the same size as a mono FLAC. (FLAC is "smart").
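Here is a tiny Python sketch of the L+R / L-R idea, just to show that it is reversible and what happens with identical channels. The sample values are made up, and real codecs pack mid and side differently at the bit level, so treat this as an illustration of the arithmetic only.

```python
def to_mid_side(left, right):
    """Convert left/right integer samples to sum (mid) and difference (side)."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Invert the transform: mid + side = 2*left and mid - side = 2*right."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right

# Hypothetical stereo samples with a lot in common between the channels.
left = [1000, 1200, -800, 50]
right = [990, 1210, -805, 50]
mid, side = to_mid_side(left, right)
print(side)  # [10, -10, 5, 0]

# Dual mono: identical channels leave nothing at all in the side channel.
_, dual_mono_side = to_mid_side(left, left)
print(dual_mono_side)  # [0, 0, 0, 0]
```

The side channel holds small numbers when the channels are similar, and small numbers take fewer bits to store, which is where the saving comes from.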

The algorithms analyze the file moment-to-moment and they only use M/S when it works better. If you have a dialog file with English on the left and French on the right it would only be used where there is silence in both channels at the same time. But silence is easy to compress already so it doesn't help that much (if at all).

With MP3 they call it "joint stereo" and it's an option. In most cases it gives you smaller files, or better quality with the same size and bitrate. Since it's analyzing the file and deciding when to use it, it's almost always better to select that option (unless you have mono, where it wouldn't make any difference but the file will show up falsely as "stereo").

As far as I know FLAC uses M/S by default (moment-to-moment when it helps), although the reference encoder does have switches to control it (-m / --mid-side and -M / --adaptive-mid-side).

Back to what you were talking about...

There are a couple of demonstrations/experiments you can do in Audacity:

You can take a file, invert it, and mix it with the original. Mixing is done by summation and since you are "adding a negative" you get subtraction and dead digital silence... This is just to show how you can get (and hear) the difference or to show if there is no difference (silence). If you slightly reduce the volume of one file, the difference will simply be a quieter copy.

But if you record yourself saying "hello" twice and subtract, the subtraction will sound (almost) exactly like a regular mix (addition). It will sound like you and your twin saying "hello" together, whether you add or subtract! The actual bits & bytes are just too different.* And just like the regular mix, the "difference" will be "louder" than the original file!

The same thing will happen if you subtract two "similar" parts of the song... They are just too different and it will sound like a regular mix or "blend".

...If you do these experiments, you should reduce the volume by 6 dB before you start. Otherwise the added & "subtracted" files will probably go over 0 dB and clip (distort).
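If you'd rather try the same experiments in code than in Audacity, here is a rough Python sketch. The two "takes" are stand-ins made up for the example (two sine tones with slightly different pitch and phase, at half scale to leave headroom), so this only mimics the idea of two similar but not identical performances.

```python
import math

RATE = 8000  # samples per second (arbitrary choice for this sketch)

def tone(freq_hz, phase, seconds=1.0):
    """Half-scale sine tone, i.e. about 6 dB of headroom."""
    count = int(RATE * seconds)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / RATE + phase)
            for i in range(count)]

def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))

def mix(a, b, sign=1):
    """Add (sign=1) or subtract (sign=-1) two signals sample by sample."""
    return [x + sign * y for x, y in zip(a, b)]

take_1 = tone(220.0, 0.0)
take_2 = tone(223.0, 0.5)  # "same note", slightly different performance

null_test = mix(take_1, take_1, sign=-1)                   # identical copy
quieter = mix(take_1, [0.9 * s for s in take_1], sign=-1)  # level changed
difference = mix(take_1, take_2, sign=-1)                  # two takes, subtracted
blend = mix(take_1, take_2, sign=1)                        # two takes, added

for name, sig in [("original", take_1), ("null test", null_test),
                  ("quieter copy", quieter), ("difference", difference),
                  ("blend", blend)]:
    print(f"{name:12s} RMS = {rms(sig):.4f}")
```

The identical copy cancels to exact silence and the level-changed copy leaves a quiet version of the original, but the two takes give a "difference" that is as loud as the blend and louder than either take on its own.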

* It's not just because you are adding or subtracting digitally. If you mix an original and inverted analog copy with an analog mixer, you'll also get silence. (There are analog inverting circuits so no digital processing is needed, but it's not as easy to play around and "experiment".)

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #5 – 2023-01-15 21:28:53

Songs/tracks where parts of the audio repeat and are exactly the same are very, very rare. I'd say they are vanishingly rare, which is why no audio codec has ever attempted to take this into consideration.

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #6 – 2024-06-07 08:18:44

Quote from: Porcus on 2023-01-15 16:35:45

This is a recurring question, and it is not a bad one. You may think that with choruses being copied and pasted in a DAW (and the instrumentation of the verses too), that is redundancy to be exploited. And maybe there is. Indeed, if you were sitting at the workstation with instructions to "paste this signal X into A and B and C and then on C apply these drum fills Y ..." it could very well be exploited in the sense that X need not be duplicated until the final stage.

But even if you could retrieve that information afterwards, you would lose streamability. Suppose you are twelve minutes into a YouTube video and YouTube tells your browser: "now the following 18 seconds are to be mixed in the following way with the 18 seconds that start at 1:29.382 ... which you surely have ready, you haven't thrown them away have you?"
So even if it were feasible to do the compression, you would lose functionality. And in an age where storage is much cheaper, to the extent that people are subscribing to video streaming in HD and above (not broadcast!), no big player would go out of their way to rewrite the way applications treat audio streams only for such small bandwidth/storage savings.

There are streaming codecs out there that exploit long-term redundancy (video codecs). The way this is done is that the standard describes mechanisms for flagging information as "keep in storage", and a compliant implementation would have the required memory etc. to cope with this. No problem.

There might be an issue with constant bitrate / unidirectional broadcast and swapping channels. If that Shakira tune included the chorus 25 seconds into the song, but you only started listening to that channel at 1m30, the chorus would not be available to you. Surely there are solutions to this (listen to all radio stations at once, delimit audio into self-contained "chunks", re-transmit the chorus on demand etc), but the solutions I can think of could rapidly become impractical with the time-scales we are talking about here.

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #7 – 2024-06-07 08:23:12

Quote from: birdie on 2023-01-15 21:28:53

Songs/tracks where parts of the audio repeat and are exactly the same are very, very rare. I'd say they are vanishingly rare, which is why no audio codec has ever attempted to take this into consideration.

"Copy and paste" has become more relevant with samplers and now computer-workstation-generated music. As you say, the way producers and masterers work, the repeats tend not to be sample-for-sample duplicates, even though the musical content is obviously copied. Something simple like a reverberation applied to the mix might not be perfectly deterministic and time-invariant, even though it sounds as if it is.

Perhaps modern machine learning will be better at interpreting the _perceptual relevance_ of audio, and deem that "these two 15-second segments are sufficiently similar that we can just copy them - at least in the 100 Hz to 5 kHz band".

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #8 – 2024-06-07 12:29:41

Quote from: knutinh on 2024-06-07 08:18:44

There are streaming codecs out there that exploit long-term redundancy (video codecs). The way this is done is that the standard describes mechanisms for flagging information as "keep in storage", and a compliant implementation would have the required memory etc. to cope with this. No problem.

Chorus at 1 minute. Chorus at 2 minutes. A streaming decoder picks up the stream at 1:50 and has "No problem" utilizing packets sent long before?

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #9 – 2024-06-07 18:32:26

In broadcast video, the "long-term redundancy" being exploited is only a few seconds. Any more than that and it would take too long to tune into a broadcast or recover from lost data. Offline video is usually limited by similar amounts to make seeking easier.

Extremely long-term redundancy is usually only possible in real-time point-to-point video connections, where the decoder can tell the encoder which frames it successfully decoded, and the encoder can adapt based on that information.

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #10 – 2024-06-07 20:27:46

Quote from: Porcus on 2024-06-07 12:29:41

Quote from: knutinh on 2024-06-07 08:18:44
There are streaming codecs out there that exploit long-term redundancy (video codecs). The way this is done is that the standard describes mechanisms for flagging information as "keep in storage", and a compliant implementation would have the required memory etc. to cope with this. No problem.
Chorus at 1 minute. Chorus at 2 minutes. A streaming decoder picks up the stream at 1:50 and has "No problem" utilizing packets sent long before?

If you read the second paragraph of the post you quoted, you will see that I mention the problem of picking up mid-stream.

Re: Why don't codecs take advantage of repeated parts of songs (like choruses) ?

Reply #11 – 2024-06-08 16:21:38

> … whereas repeat-pattern compression requires the whole track to be analysed before compression can start

The encoder/decoder could also try to work in one pass, building up a "memory" of the track as it goes and referencing it from the "now" position.
Then seeking or starting from a random position would not work well, if at all, but it could still record/encode live.
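As a toy illustration of that one-pass idea, here is a Python sketch that remembers every block it has emitted and sends a back-reference when an identical block turns up again. Everything in it is an assumption for the sake of the example (fixed-size blocks, exact matching only, the `encode`/`decode` names and the packet layout); it is not how any existing audio codec works.

```python
def encode(samples, block_size):
    """One pass: emit each block raw, or as a reference to an earlier block."""
    seen = {}     # block contents -> index of the first block with them
    packets = []
    for start in range(0, len(samples), block_size):
        block = tuple(samples[start:start + block_size])
        if block in seen:
            packets.append(("ref", seen[block]))
        else:
            seen[block] = len(packets)
            packets.append(("raw", block))
    return packets

def decode(packets):
    """Rebuild the samples; needs every earlier block kept in memory."""
    blocks = []
    for kind, payload in packets:
        blocks.append(blocks[payload] if kind == "ref" else payload)
    return [sample for block in blocks for sample in block]

# Hypothetical "song": chorus, verse, then a bit-identical chorus.
chorus, verse = [1, 2, 3, 4], [5, 6, 7, 8]
song = chorus + verse + chorus
packets = encode(song, block_size=4)
print(packets)  # [('raw', (1, 2, 3, 4)), ('raw', (5, 6, 7, 8)), ('ref', 0)]

# One changed sample in the second chorus and the match is gone.
varied = chorus + verse + [1, 2, 3, 5]
print([kind for kind, _ in encode(varied, block_size=4)])  # ['raw', 'raw', 'raw']
```

Decoding from the start works (`decode(packets) == song`), but a decoder that joins at the third packet has nothing to resolve the reference against, which is the seeking problem described above. The second example shows the other catch: exact matching is defeated by a single differing sample.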
