Topic: Why not take advantage of repetition in tracks?

Why not take advantage of repetition in tracks?
I was wondering if there's any codec that takes advantage of the fact that many songs (especially electronic ones) have sections repeated throughout the track (e.g. choruses).

Is there a codec that detects which sections are similar (or the same), encodes them only once, and then stores just the difference for each section where it repeats?

I assume there must be some reason why this isn't feasible / not worth doing?

  • [JAZ]
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #1
Hi.

The first reason is lookahead. You need a large lookahead for that (or some sort of two-pass encoding). That means it cannot be used while transmitting audio live (so it is no good for radio stations or communications in general, only for storage).
Second, sections might sound similar, but most of the time there are enough differences that they are not exact copies. This means additional work is required to detect the similarities and to encode the differences. It would be similar to adding a lossless encoder on top of the detected similarity.

In some sense, to do that really well, it would be similar to decomposing the audio to obtain the original parts of the composition.
Of course, a less ambitious approach could simply try to detect beats and, if one beat is similar to the previous one (or bars instead of beats), apply differential encoding using the first beat/bar as reference. But the problem remains that it requires a longer than usual analysis of the signal to determine the similarities.
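
To make the bar-based idea above concrete, here is a minimal Python sketch, not any real codec's method. It assumes the bar length is already known (exactly the analysis problem mentioned above) and uses a synthetic tone as a stand-in for music. The names `split_bars`, `encode`, `decode` and `peak` are made up for illustration.

```python
import math
import random

def split_bars(samples, bar_len):
    """Cut a sample list into consecutive bars of bar_len samples (tail dropped)."""
    return [samples[i:i + bar_len]
            for i in range(0, len(samples) - bar_len + 1, bar_len)]

def encode(bars):
    """Keep the first bar as the reference; store every later bar as a residual."""
    ref = bars[0]
    residuals = [[s - r for s, r in zip(bar, ref)] for bar in bars[1:]]
    return ref, residuals

def decode(ref, residuals):
    """Rebuild all bars exactly by adding each residual back onto the reference."""
    return [ref] + [[r + d for r, d in zip(ref, res)] for res in residuals]

def peak(values):
    """Largest absolute sample value."""
    return max(abs(v) for v in values)

# Synthetic 16-bit "loop" (an assumption, not real music): one bar of a
# 220 Hz tone repeated four times, with small random variation on each repeat.
random.seed(0)
bar_len = 2000
bar = [round(12000 * math.sin(2 * math.pi * 220 * n / 44100)) for n in range(bar_len)]
track = list(bar)
for _ in range(3):
    track += [s + random.randint(-20, 20) for s in bar]

bars = split_bars(track, bar_len)
ref, residuals = encode(bars)
print("reference peak:", peak(ref))
print("residual peaks:", [peak(res) for res in residuals])
print("lossless round trip:", decode(ref, residuals) == bars)
```

In this toy case the reference bar peaks near 12000 while the residuals stay within ±20, so they would need far fewer bits. Real repeats are rarely this close, which is the objection raised above.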



  • saratoga
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #2
AAC-LTP (long term prediction) does something like that iirc.

The reason this approach isn't popular is that it requires a lot of memory for buffering audio, and most applications value having very low memory requirements because that enables energy efficient decoding using hardware DSPs. 

  • AwoK
  • [*][*]
Re: Why not take advantage of repetition in tracks?
Reply #3
But not every audio stream needs perfect streamability or one pass encoding.
Video codecs allow choosing the maximum time between keyframes. It could work similarly for audio.




  • saratoga
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #4
But not every audio stream needs perfect streamability or one pass encoding.

For audio, compatibility is usually much more important than tiny differences in compression efficiency. Therefore, if any feature is optional in an encoder, it will probably end up either universally used or abandoned very quickly (as AAC-LTP was).

Re: Why not take advantage of repetition in tracks?
Reply #5
You're looking into something the Infinite Jukebox does:
http://labs.echonest.com/Uploader/index.html

  • AwoK
  • [*][*]
Re: Why not take advantage of repetition in tracks?
Reply #6
For audio compatibility is usually much more important than tiny differences in compression efficiency.
But if matching patterns across a few minutes might lead to large compression gains...
It's not necessary or desirable for every use, but in some cases compression might be a major goal. Different codecs or settings for different goals, no?

Even assuming a taxing decoding process, requiring 10s of MBs of memory and a real CPU running at 100s of MHz, it's easily doable on $20-30 Android video boxes or modern phones.
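
As a rough check on the "10s of MBs" figure, here is a small Python calculation. It assumes the decoder would keep the already-decoded audio around as plain 16-bit stereo PCM at 44.1 kHz so later sections can refer back to it. That is only one possible design, and `pcm_buffer_bytes` is a made-up helper name.

```python
def pcm_buffer_bytes(seconds, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Bytes of uncompressed PCM needed to hold `seconds` of audio."""
    return seconds * sample_rate * channels * bytes_per_sample

for minutes in (1, 3, 5):
    size = pcm_buffer_bytes(minutes * 60)
    print(f"{minutes} min of CD-quality audio: {size / 2**20:.1f} MiB")
```

Under these assumptions, holding a whole five-minute track comes to roughly 50 MiB, which is in line with the estimate above.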

  • Soap
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #7
I think we could do a better job of explaining the answer if you provided or pointed towards a piece of audio you believe would benefit from this type of compression method.

What I strongly suspect is that, outside rare corner cases and non-music content, what you consider "repetition" is much further from 1:1 than you think after a casual glance.
Creature of habit.

  • saratoga
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #8
Different codecs or settings for different goals, no?

If the world worked like this, MP3 would have been abandoned long ago. In practice, no one wants to deal with multiple formats, so they just use whatever is available, usually MP3 or AAC-LC.

  • AwoK
  • [*][*]
Re: Why not take advantage of repetition in tracks?
Reply #9
With good compression gains I think it might see success, at least as a lossless format.
And mainstream adoption is not a requirement anyway. People use APE, TAK, etc. Even FLAC took ages to reach where it is.
It could be a format for the hard core, and maybe eventually grow to be more than that.
  • Last Edit: 26 February, 2017, 10:21:01 PM by AwoK

  • saratoga
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #10
MPEG-4 ALS supports LTP, although it is very slow and does not compress as well as some more widely used formats.

  • KozmoNaut
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #11
You're looking into something the Infinite Jukebox does:
http://labs.echonest.com/Uploader/index.html

That is actually really damn cool. It doesn't work 100% on all of the tracks I tried; sometimes the transitions are quite jarring. But with more work put into it, I'm sure it could be amazing for party mixes and such.

I think we could do a better job of explaining the answer if you provided or pointed towards a piece of audio you believe would benefit from this type of compression method.

For what I strongly suspect is, outside rare corner cases and non-music content, that what you consider "repetition" is far further from 1:1 than you're thinking after a casual glance.

I think this is the main issue.

Sure, the chorus of a particular song may have the exact same musical notation and the same lyrics, but no band plays perfectly according to the notes, 100% every time. There will always be variation, some of it is deliberate, some of it isn't. If you take that away, and basically just play a recording of the chorus every time you get to that part of the song, it'll make the songs lifeless, generic and boring.

This is for music played on physical instruments, of course. For electronically produced music with heavy sample usage, glued together in Pro Tools or similar software, I guess you could exploit repetitive sections. If you look to the past, that's sort of how MOD music works. It's all constructed from short samples glued together to form music, and thanks to this repetition the resulting files are tiny: often 500 KB or less for a full 3-4 minute song, sometimes significantly less than that. I used to have a small selection, and some of them were smaller than 50 KB. But real live music, it wasn't :-)
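
A toy illustration of why tracker-style files stay small, as a Python sketch. This is not the real MOD file layout (which also stores pitch, volume and effect data). The sample values, the pattern names and the `render` function are all hypothetical.

```python
# Hypothetical miniature "module": a bank of short samples plus patterns
# that only say which sample to trigger at each step.
samples = {
    "kick": [0.9, 0.5, 0.2, 0.0],
    "hat": [0.3, -0.3, 0.1, 0.0],
}
patterns = {
    "A": ["kick", "hat", "kick", "hat"],
    "B": ["kick", "kick", "hat", "hat"],
}
order = ["A", "A", "B", "A", "A", "B", "A", "A"]  # the song's arrangement

def render(samples, patterns, order):
    """Expand the arrangement into one flat list of audio samples."""
    out = []
    for name in order:
        for hit in patterns[name]:
            out.extend(samples[hit])
    return out

audio = render(samples, patterns, order)
# Count every stored value: sample data, pattern steps, and order entries.
stored = (sum(len(s) for s in samples.values())
          + sum(len(p) for p in patterns.values())
          + len(order))
print(len(audio), "rendered samples from", stored, "stored values")
```

The repetition is stored explicitly in the arrangement, so nothing has to be detected after the fact. A codec working from a finished mixdown does not have that information.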

  • dhromed
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #12
Er, don't encoders already do this, except on a block-by-block basis, and using patterns in the data, rather than what a human perceives as a pattern?

  • AwoK
  • [*][*]
Re: Why not take advantage of repetition in tracks?
Reply #13
MPEG 4 ALS supports LTP, although it is very slow and does not compress as well as some more widely used formats.
Isn't the LTP part at best <100ms?

Sure, the chorus of a particular song may have the exact same musical notation and the same lyrics, but no band plays perfectly according to the notes, 100% every time. There will always be variation, some of it is deliberate, some of it isn't.
It's not exactly the same but it might be close enough to be used as a first approximation. You then store the difference to the target piece of audio. This is what video codecs do; they build frames based on other frames that are visually similar but not exactly the same.
  • Last Edit: 27 February, 2017, 09:26:39 AM by AwoK

  • KozmoNaut
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #14
Sure, the chorus of a particular song may have the exact same musical notation and the same lyrics, but no band plays perfectly according to the notes, 100% every time. There will always be variation, some of it is deliberate, some of it isn't.
It's not exactly the same but it might be close enough to be used as a first approximation. You then store the difference to the target piece of audio. This is what video codecs do; they build frames based on other frames that are visually similar but not exactly the same.

But only over a few seconds, at the very most. They don't look several minutes back and reuse those frames.

  • IgorC
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #15
Not sure about other codecs, but Opus does prediction from previous frames (inter) and prediction between frequency bands within the same frame (intra). It's similar to what video codecs do.

But prediction of large chunks isn't viable. The hardware requirements are too high for an audio codec.

  • j7n
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #16
Even an arrangement from a small set of instrument samples is not trivial to analyze into its constituent parts from a mixdown, unless there are sections in the MOD file (patterns or loops) that are repeated completely verbatim. Add a subsample offset to how the individual instruments are mixed together and a reverb effect (or, likely, multiple simultaneous effects in a Pro Tools project), and the problem becomes that much more complex. Even if all the original samples could somehow be extracted, the codec would still have to either encode the reverb(s) or model them without knowing how they work.

Also, I observe that a video stream has an inherently more variable complexity and bitrate compared to audio even when looking at an I-frame only encoding. Nearly static scenes are common in video, to which the closest analogy I can think of is modern electronic drone music, a solo drum loop or maybe the recording of any monophonic instrument. A video input also provides the codec a "period" as the dimensions of a frame. A music codec would have to guess what the period of the sound is if it has no knowledge of the musical tempo, the duration of a MOD pattern, or similar internal structure, to start the analysis from.

I would guess that a typical music recording, even without much pure noise, is much like a video of flames or a rainstorm, where the codec might try to set keyframes and compute differences, but not increase coding efficiency much through that process, even if it might appear that it does because the P and B frames are quantized more.

  • AwoK
  • [*][*]
Re: Why not take advantage of repetition in tracks?
Reply #17
But only over a few seconds, at the very most. They don't look several minutes back and reuse those frames.
I'd say it's a combination of high processing cost, and less benefit because in video similar frames are close by.

I would guess that a typical music recording, even without much pure noise, is much like a video of flames or a rainstorm
In rock/pop the drums could be a good "I-frame" hint. Maybe also in other music types the attack of instruments could be detected. If you can blur/simplify the audio enough, maybe it's possible to find matches in a fashion similar to text search algorithms or rsync.
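
One way the "blur, then search like text" idea could look, as a rough Python sketch. It is an assumption-heavy toy: the audio is reduced to one coarse loudness symbol per fixed block, and repeated runs of symbols are found with a dictionary lookup. Unlike rsync, it does not use a rolling checksum at every offset, so it only finds repeats that line up with the block grid. `coarse_symbols`, `repeated_windows` and `tone_blocks` are hypothetical names, and the test signal is synthetic.

```python
import random
from collections import defaultdict

def coarse_symbols(samples, block, levels, full_scale=1.0):
    """Blur the audio: one quantized loudness symbol per block of samples."""
    symbols = []
    for i in range(0, len(samples) - block + 1, block):
        mean_abs = sum(abs(s) for s in samples[i:i + block]) / block
        symbols.append(min(levels - 1, int(levels * mean_abs / full_scale)))
    return symbols

def repeated_windows(symbols, width):
    """Map each run of `width` symbols to the positions where it occurs,
    keeping only runs seen more than once (candidate repeats)."""
    seen = defaultdict(list)
    for i in range(len(symbols) - width + 1):
        seen[tuple(symbols[i:i + width])].append(i)
    return {run: pos for run, pos in seen.items() if len(pos) > 1}

def tone_blocks(amplitudes, block, jitter=0.0):
    """Synthetic test signal: one block per amplitude, alternating sign,
    with optional per-sample random jitter."""
    out = []
    for a in amplitudes:
        for n in range(block):
            sign = 1 if n % 2 == 0 else -1
            out.append(sign * a + random.uniform(-jitter, jitter))
    return out

random.seed(1)
block = 100
riff = [0.1, 0.7, 0.3, 0.9, 0.5, 0.7]
filler = [0.5, 0.5, 0.1, 0.1, 0.9, 0.3, 0.3]
# The riff comes back after the filler, slightly perturbed the second time.
audio = (tone_blocks(riff, block) + tone_blocks(filler, block)
         + tone_blocks(riff, block, jitter=0.01))

symbols = coarse_symbols(audio, block, levels=5)
matches = repeated_windows(symbols, width=4)
for run, positions in sorted(matches.items(), key=lambda kv: kv[1]):
    print(run, "at blocks", positions)
```

Here the perturbed second riff still maps to the same symbols as the first, so its runs are reported at blocks 13 apart. Whether real recordings survive this kind of blurring well enough to be useful is exactly the open question in this thread.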

  • saratoga
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #18
MPEG 4 ALS supports LTP, although it is very slow and does not compress as well as some more widely used formats.
Isn't the LTP part at best <100ms?

Sure, the chorus of a particular song may have the exact same musical notation and the same lyrics, but no band plays perfectly according to the notes, 100% every time. There will always be variation, some of it is deliberate, some of it isn't.
It's not exactly the same but it might be close enough to be used as a first approximation. You then store the difference to the target piece of audio. This is what video codecs do; they build frames based on other frames that are visually similar but not exactly the same.


50 ms is the shortest time you can begin LTP at for 44.1 kHz due to the MDCT length. 100 ms would therefore be 1 frame worth of prediction. I don't think it's limited to only one frame of prediction, but in practice the interval is usually restricted because it's very slow and gains you relatively little compression.

LTP is actually more sophisticated than that. It is done on a per-band basis in the MDCT domain. If you just tried it in the time domain you would virtually never find any redundancy unless the music was literally the same track set to loop.

Edit: sorry thought you were referring to AAC LTP.
  • Last Edit: 27 February, 2017, 12:13:49 PM by saratoga

  • Palladium
  • [*][*]
Re: Why not take advantage of repetition in tracks?
Reply #19
Even if we could, nobody would bother with the additional complexity, because current lossy audio codecs already have such a tiny bandwidth footprint that additional bandwidth savings are pointless.

Unlike HEVC vs H.264 in video, where even an inherent ~50% bitrate saving barely compensates for the 4x resolution jump from 1080p to 4K at the same framerate on an average internet connection.

  • Woodinville
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #20
I thought I had already chimed in here, but the real answer is "repetition is not what you think it is". When musical notation repeats, that does not mean that there is a useful amount of correlation between two different parts of the song, no matter how identical they should be according to notation, unless the chorus, say, was recorded once and edited in a half-dozen times with no modification.

Speaking as someone who has done short-term, long-term, longer-term and ridiculously long term (fft of a whole track, power spectrum, ifft, i.e. autocorrelation of the whole song at every lag with giant fft) it's just not there.

Oh, heck, let's do one for the heck of it. :)

Ok. Looking at that autocorrelation, there's not much long-range correlation there, now, is there? Nope. Nothing to be gained.

This is a piano piece that consists of the same thing repeated 3 times.  You really don't see that in the autocorrelation at all.

To do this in your own friendly matlab

x = audioread('file.wav'); % wavread in older releases, whatever floats your boat
x = mean(x, 2); % mix down to mono so a stereo file does not give one result per channel
xt = fft(x); % ( yes, I did just do what you think I did ) :)
xte = abs(xt) .^ 2; % convert to energy, alternatively xte = xt .* conj(xt)
xtei = real(ifft(xte)); % yep. that's what you do! real() drops rounding residue
xtei = xtei / xtei(1); % normalize autocorrelation to one at lag zero
plot(xtei)

there y'go.
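
For anyone without MATLAB, here is a rough Python sketch of the same recipe (FFT, power spectrum, inverse FFT, normalize), using only the standard library. Instead of a WAV file it runs on synthetic noise, which is an assumption made purely to show how the result reads: a verbatim repeat gives a peak at the repeat lag, while unrelated material gives next to nothing. The small radix-2 `fft` is a textbook version written for this example and needs a power-of-two length. Like the snippet above, it computes a circular autocorrelation because nothing is zero-padded.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT via the conjugate trick."""
    n = len(x)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x])]

def autocorrelation(x):
    """Circular autocorrelation, normalized so that lag 0 equals one."""
    power = [abs(v) ** 2 for v in fft(x)]   # power spectrum
    r = [v.real for v in ifft(power)]       # back to the lag domain
    return [v / r[0] for v in r]

random.seed(0)
n = 1024
take = [random.gauss(0, 1) for _ in range(n // 2)]
other = [random.gauss(0, 1) for _ in range(n // 2)]

exact = autocorrelation(take + take)    # second half is a verbatim copy
fresh = autocorrelation(take + other)   # second half is unrelated material

print("verbatim repeat, lag n/2:", round(exact[n // 2], 3))
print("unrelated halves, lag n/2:", round(fresh[n // 2], 3))
```

These two cases are the extremes. The point made above is that real recordings with "repeated" sections land much closer to the second case than the notation would suggest.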
  • Last Edit: 22 March, 2017, 04:57:01 AM by Woodinville
-----
J. D. (jj) Johnston

  • knutinh
  • [*][*][*][*][*]
Re: Why not take advantage of repetition in tracks?
Reply #21
Sure, the chorus of a particular song may have the exact same musical notation and the same lyrics, but no band plays perfectly according to the notes, 100% every time. There will always be variation, some of it is deliberate, some of it isn't. If you take that away, and basically just play a recording of the chorus every time you get to that part of the song, it'll make the songs lifeless, generic and boring....
If a (compressed) MIDI file represents the core musical information of a song, then might not "musically guided" loss represent sensible rate:distortion compromises? While repeated musical sections might be numerically different (due to noise, performer involuntary variation or performer conscious variation), it sounds like an interesting axis to work in.

I know that scores of e.g. Bach's music may not be available from the composer himself; rather, some musically gifted soul listening to Bach's performance went home and transcribed the music from memory. While I find that capability amazing, I guess that there will be errors. Hopefully, the "essence" of the music survived this operation.

Perhaps because I am interested in both DSP and music, I am intrigued by the idea of lossy codecs being able to analyze a piece of music in a musically sensible manner (i.e. waveform to score), perhaps using some (algorithm du jour) machine-learning mechanism, then figuring out what matters the most (to e.g. someone coming from a Western musical tradition), and how to spend bits most wisely.

The comparison to video codecs is interesting. AFAIK, they don't even have an explicit model of our vision (unlike audio codecs), and when they track "motion" across temporal frames, they will often find "apparent" motion that does not correspond well with actual motion. I.e. they will pick up "something" that allows them to encode the residual with fewer bits, but nothing like a plausible optical flow type modelling.

-k
  • Last Edit: 27 May, 2017, 02:18:22 PM by knutinh

  • Klimis
  • [*]
Re: Why not take advantage of repetition in tracks?
Reply #22
I had this idea too, I'm glad I wasn't the only one.
Well, for at least 90% of the Top 40 tracks you hear on the radio, if you create a duplicate in a DAW, invert its phase, and then line up the first verse with the second verse and the first chorus with the second chorus, roughly 75% of the content gets cancelled out in the verses (the rest will be mostly some reverb and the vocals), and the chorus will mostly cancel out as a whole.

I bet that a potential codec that could take advantage of this (lossless or lossy, both could benefit) would have one problem: when things start to get out of phase and are not identical anymore, it becomes inefficient, and it is hard to find a way to compress uniformly. I mean, you could have a typical Top 40 pop track that goes verse1-chorus-verse2-chorus-bridge-chorus and compresses to crazy small files (lossless or lossy) with amazing efficiency (perceivable quality or compression ratio), and then a piece of classical music in which most likely no part is a phase-exact duplicate of another, which would compress poorly.

Still, there are so many video codecs that have tuning for different types of input. Creating a long-term prediction subcodec/codec based on the idea that you may find the same thing again in the file, with potentially minor differences, wouldn't be as bad an idea as a lot of people claim.
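
The phase-inversion (null) test described in this post can be sketched in Python as follows. This is a toy under stated assumptions: pure sine tones stand in for the two choruses, and the half-sample delay is a hypothetical example of "things starting to get out of phase". `null_depth_db` and `tone` are made-up helper names.

```python
import math

def rms(values):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def null_depth_db(a, b):
    """Level of (a - b) relative to a, in dB. More negative = better cancellation."""
    residual = [x - y for x, y in zip(a, b)]
    r = rms(residual)
    if r == 0:
        return float("-inf")  # perfect null
    return 20 * math.log10(r / rms(a))

def tone(freq, n, rate=44100, delay=0.0):
    """n samples of a sine at freq Hz, delayed by `delay` samples (may be fractional)."""
    return [math.sin(2 * math.pi * freq * (i - delay) / rate) for i in range(n)]

n = 4410  # 100 ms at 44.1 kHz
identical = null_depth_db(tone(1000, n), tone(1000, n))
low_shifted = null_depth_db(tone(1000, n), tone(1000, n, delay=0.5))
high_shifted = null_depth_db(tone(10000, n), tone(10000, n, delay=0.5))

print("identical copy:", identical, "dB")
print("1 kHz, half a sample late:", round(low_shifted, 1), "dB")
print("10 kHz, half a sample late:", round(high_shifted, 1), "dB")
```

An identical copy cancels completely. With only half a sample of misalignment, the 1 kHz tone still cancels by a bit over 20 dB, but the 10 kHz tone barely cancels at all. That is one reason a pasted chorus nulls well in a DAW while anything re-rendered or re-performed leaves a large residual.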

Your only enemies are processing power, long encoding times (because of looking way ahead in the file) and the fact that you need a coder who is very smart when he writes code; he must definitely be a music producer to "get it".

Also, this idea is pretty similar to treating stereo content as quadraphonic, where the front and back channels (when the encoder finds a match in the file) are treated as mid/side in the same way that left and right are, so essentially you will have a mid/side pair for each pair of channels.
  • Last Edit: 27 May, 2017, 03:20:35 PM by Klimis

  • Klimis
  • [*]
Re: Why not take advantage of repetition in tracks?
Reply #23
To do this in your own friendly matlab

x=wavread('file.wav'); % or use audio read, whatever floats your boat
xt=fft(x); % ( yes, I did just do what you think I did ) :)
xta=abs(xt); % calculate abs of fft
xte=xta .* xta; % convert to energy, alternatively xte=xt .* conj(xt) and leave out the "abs" part
xtei=ifft(xte); % yep. that's what you do!
xtei=xtei/max(xtei); % normalize autocorrelation to one
plot(xtei)

there y'go.


Try this with, let's say, Madonna's "Hung Up". There are multiple positives returned, so it works.