AI for audio compression?

AI for audio compression?

2018-10-26 22:41:13

I have not seen this actively used currently, yet there seems to be movement with AI-based image and video conversion.

What is quite interesting is the ability of AI compression to adapt to the source material, so different routines for rock vs classical are possible.

I also think lossless compression is where AI would be best suited, I have a feeling this field can push data compression rates better than achieved today.

Re: AI for audio compression?

Reply #1 – 2018-10-26 22:43:44

This article spells it out quite well in relation to images:

http://www.wave.one/icml2017/

Re: AI for audio compression?

Reply #2 – 2018-10-26 22:53:15

Quote from: spoon on 2018-10-26 22:41:13

I also think lossless compression is where AI would be best suited, I have a feeling this field can push data compression rates better than achieved today.

I would have said exactly the opposite. Machine learning is good at picking smart ways to approximate data, but if you need exactness, what advantage does it have?

Re: AI for audio compression?

Reply #3 – 2018-10-27 06:43:07

Quote from: saratoga on 2018-10-26 22:53:15

Machine learning is good at picking smart ways to approximate data, but if you need exactness, what advantage does it have?

Well, that's exactly how lossless compressors work. They make an approximation and then store the difference between the approximation (aka prediction) and the actual value. So if the ML allows their model to produce better approximations, then it will result in better lossless compression.
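
To make the predict-then-store-the-difference idea concrete, here is a minimal Python sketch. The fixed second-order predictor and the names `predict`, `encode` and `decode` are illustrative assumptions, not taken from any particular codec; a real compressor would also entropy-code the residuals.

```python
def predict(history):
    """Guess the next sample by extending the line through the last two samples.

    Hypothetical fixed predictor, chosen only for illustration.
    """
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode(samples):
    """Replace every sample with (actual - predicted)."""
    return [s - predict(samples[:i]) for i, s in enumerate(samples)]


def decode(residuals):
    """Rebuild the exact samples by repeating the same predictions."""
    out = []
    for r in residuals:
        out.append(predict(out) + r)
    return out


samples = [0, 3, 7, 12, 18, 25, 31]  # made-up data
residuals = encode(samples)
print(residuals)                     # small numbers when the prediction is good
print(decode(residuals) == samples)  # the round trip is exact
```

A better predictor, learned or not, only changes `predict`; as long as encoder and decoder use the same one, the result stays lossless and the residuals just get smaller.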

That said, a quick read of their paper did not reveal whether their algorithm is superior to others based on absolute per-pixel value differences between images and their lossy counterparts, or whether their superiority shows up only with the MS-SSIM values that more accurately measure how similar two images will look to the eye. This is analogous to how measuring absolute sample differences for lossy audio codecs does not accurately predict how good they sound.

There's a link on their website to a blog post that shows some pretty amazing compression results on faces. In this case it's obvious though that they're not even pretending to get close on a pixel-by-pixel basis.

I have wondered whether the lossless audio compressors that achieve the highest absolute compression like OptimFROG or LA (remember that one?) are asymptotically approaching the actual entropy of the sources (in which case very little improvement is possible) or whether some new method will allow seeing through the residual noise to some higher order patterns. Of course, if new ML methods just let us have OptimFROG compression levels with FLAC speed, that would be something.
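
One rough way to get a feel for the "how close to the entropy" question is to measure the empirical entropy of the residuals a predictor leaves behind. The sketch below is only an assumption-laden illustration: `empirical_entropy` is a hypothetical helper, the numbers are made up, and a zeroth-order estimate like this ignores any dependence between values, so it is not the true entropy of the source.

```python
from collections import Counter
from math import log2


def empirical_entropy(values):
    """Zeroth-order entropy in bits per value, from symbol frequencies only."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


samples = [0, 3, 7, 12, 18, 25, 31]   # made-up raw samples
residuals = [0, 3, 1, 1, 1, 1, -1]    # made-up residuals left by some predictor

print(empirical_entropy(samples))     # every value distinct, so relatively high
print(empirical_entropy(residuals))   # values repeat, so lower
```

If a new model produced residuals that look like pure noise under every test one can think of, that would hint that little improvement is left; any remaining structure is what a smarter predictor could still exploit.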

Re: AI for audio compression?

Reply #4 – 2018-10-27 08:58:38

My first thought was like, for lossless one could always do a sanity check by comparing a practical codec with a speed-no-object; for lossies, there is no metric that is not subjectivity-based, so it would be more exciting to see how much AI it takes to outsmart those psy models that are already around.

And second thought: who - except geex like us - cares about microimprovements in lossy compression after all? Nowadays they stream 4k video. If lossless audio performance were a big thing, then ALAC would be completely out.
[That is a giant gap in my argument, yes: just because "those" customers do not care about anything but their duty to do what The Lord Thy Apple commands them to, that does not mean that there isn't a large segment that is neither geeks nor zealots.]

Quote from: bryant on 2018-10-27 06:43:07

Of course, if new ML methods just let us have OptimFROG compression levels with FLAC speed, that would be something.

I'd say the test then is encoding speed? A "smart" algorithm for finding exploitable patterns much quicker than brute-force trying the whole Swiss knife and picking the best?

Re: AI for audio compression?

Reply #5 – 2018-10-27 10:07:55

Quote from: saratoga on 2018-10-26 22:53:15

I would have said exactly the opposite. Machine learning is good at picking smart ways to approximate data, but if you need exactness, what advantage does it have?

Well, you don't need exactness. In image processing, ML can reconstruct data that is similar to what was thrown away. Denoising and upscaling done by ML for example shows rather surprising results.

It stands to reason that the same could be done for audio data. Things thrown away during encoding could be reconstructed during decoding. The data will be different when examined, but if it sounds similar to humans, then it's a win.

Re: AI for audio compression?

Reply #6 – 2018-10-27 12:11:42

Quote from: Nikaki on 2018-10-27 10:07:55

Well, you don't need exactness.

You replied to a comment on lossless compression.

Quote from: Nikaki on 2018-10-27 10:07:55

The data will be different when examined, but if it sounds similar to humans, then it's a win.

That is not lossless compression, it is lossy compression.

Re: AI for audio compression?

Reply #7 – 2018-10-27 20:53:20

A lossless codec definitely needs exactness, at least in terms of the compression result. Maybe you could train an ML model to pick more efficient compression parameters based on a training set without actually having to exhaustively search, but we're at the point where even if you want to spend a lot of power on lossless compression, you really don't get much more even when you go to extreme lengths, GPU processing, etc.
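
For reference, the exhaustive search that such a model would be trying to shortcut can be sketched as below. Everything here is an assumption for illustration: the candidate "parameters" are just how many times to take first differences, `cost` uses the sum of absolute residuals as a crude stand-in for coded size, and the names are hypothetical.

```python
def difference(values, order):
    """Apply first differencing `order` times.

    The leading element is carried through each pass so the step
    can be undone with a running sum.
    """
    res = list(values)
    for _ in range(order):
        res = res[:1] + [b - a for a, b in zip(res, res[1:])]
    return res


def cost(residuals):
    """Crude proxy for coded size: smaller magnitudes need fewer bits."""
    return sum(abs(r) for r in residuals)


def best_order(samples, max_order=4):
    """Brute force: try every order and keep the cheapest one."""
    order = min(range(max_order + 1), key=lambda k: cost(difference(samples, k)))
    return order, cost(difference(samples, order))


samples = [n * n for n in range(10)]  # made-up, smoothly curving signal
print(best_order(samples))
```

An ML-based encoder would try to guess a good `order` (or a much richer set of parameters) from the signal without evaluating every candidate; the best it can do is match what the full search finds.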

Re: AI for audio compression?

Reply #8 – 2018-10-27 21:11:53

ML power is in its ability to detect complex patterns, much better than people can, patterns = compression. The average person is good at remembering some 10 to 20 numbers before they start forgetting them, ML has a much larger ability, I think there is real unexplored possibility here. Given enough resources, ML potentially could come up with a whole new method of compression.

Re: AI for audio compression?

Reply #9 – 2018-10-27 21:40:02

Quote from: spoon on 2018-10-27 21:11:53

ML power is in its ability to detect complex patterns, much better than people can, patterns = compression.

Yes, but people don't do compression, computers do compression, and an ML algorithm isn't going to come up with a solution that brute force exhaustive searching of the same solution space wouldn't find.

The image paper you linked is instructive: it describes a generative adversarial network approach that attempts to pick optimal features to extract by using the adversarial network to improve guesses over time. However, if you didn't mind spending a lot of CPU time, you could also brute force that same solution by doing that optimization directly. The benefit of the GAN is that it finds a relatively good solution using a smaller training set, so it's hopefully much faster while still being almost as good as an optimal solution.

But is that really useful for lossless coding? We know that coding gain in lossless formats increases VERY slowly as you optimize your coding. So this will let you search more for a given coding time, but it also forces you to assume that your training set is representative of the signal you're encoding. So you get a small improvement in exchange for a lot of assumptions. That doesn't seem very appealing.

Quote from: spoon on 2018-10-27 21:11:53

Given enough resources, ML potentially could come up with a whole new method of compression.

Machine learning isn't magic. It is a heuristic approach to finding on average good solutions using a training set. You're not going to come up with something fundamentally new, though, just shortcuts for finding solutions quicker so long as your input doesn't diverge too much from your training set.

Re: AI for audio compression?

Reply #10 – 2018-10-27 22:10:37

Quote from: saratoga on 2018-10-27 21:40:02

Machine learning isn't magic. It is a heuristic approach to finding on average good solutions using a training set. You're not going to come up with something fundamentally new, though, just shortcuts for finding solutions quicker so long as your input doesn't diverge too much from your training set.

I am quite certain that in my lifetime ML will become magic: it will reach a level of ability where it can self-design the next level of ML (and the hardware on which it runs), and so on, and will leave human intelligence in the dust, so to speak.

ML has shown ability to start from nothing (playing games, driving cars), and through positive feedback learning of the results, gain not only the 'rules of the system' but also to become as efficient as possible in the realm they operate. When it comes to lossless audio compression, the rules are simple, the resulting compression is either lossless or not, and one which is 50% smaller than another is an improvement. I would not be so confident to state that people have thought of all forms of compression that are available, there will still be discoveries to be made and ML can speed that process.

Re: AI for audio compression?

Reply #11 – 2018-10-27 22:28:57

A few relevant sites:

"20 minutes for ML to learn to drive a car":

https://newatlas.com/wayve-autonomous-car-machine-learning-learn-drive/55340/

AlphaGo, which beat the world's best Go player:

https://www.theregister.co.uk/2017/10/18/deepminds_latest_alphago_software_doesnt_need_human_data_to_win/
https://en.wikipedia.org/wiki/AlphaGo

"Many top Go players characterized AlphaGo's unorthodox plays as seemingly-questionable moves that initially befuddled onlookers, but made sense in hindsight: 'All but the very best Go players craft their style by imitating top players. AlphaGo seems to have totally original moves it creates itself.'"

That one match is the reason why China has committed $150 billion to AI research...

Re: AI for audio compression?

Reply #12 – 2018-10-27 23:22:19

Quote from: spoon on 2018-10-27 22:10:37

Quote from: saratoga on 2018-10-27 21:40:02
Machine learning isn't magic. It is a heuristic approach to finding on average good solutions using a training set. You're not going to come up with something fundamentally new, though, just shortcuts for finding solutions quicker so long as your input doesn't diverge too much from your training set.

I am quite certain that in my lifetime ML will become magic: it will reach a level of ability where it can self-design the next level of ML (and the hardware on which it runs), and so on, and will leave human intelligence in the dust, so to speak.

While it is possible that this will happen, such things will have no relationship whatsoever to the algorithms you are linking to and speaking about, and so confusing them with the class of algorithms we are discussing is not a good idea. Modern machine learning has nothing in common with real machine intelligence, which appears to be what you are thinking of.

Quote from: spoon on 2018-10-27 22:10:37

ML has shown ability to start from nothing (playing games, driving cars), and through positive feedback learning of the results, gain not only the 'rules of the system' but also to become as efficient as possible in the realm they operate.

Very little of ML actually operates that way. More often algorithms start with a framework or basic structure known to be good for that problem and then use ML to tune the assumed parameters using a training set. Essentially this is an optimization process. This includes the paper you linked above, which starts with something very close to a transform codec and uses a GAN to optimize the parameters for that specific training set.

Quote from: spoon on 2018-10-27 22:10:37

I would not be so confident to state that people have thought of all forms of compression that are available, there will still be discoveries to be made and ML can speed that process.

Conversely, having taken several minutes to skim your paper, I can confidently state that it cannot think of forms of compression that people have not already developed. Are you sure you actually understand the ideas you're describing? Have you read (and understood) those papers?

Quote from: spoon on 2018-10-27 22:10:37

A few relevant sites:

"20 minutes for ML to learn to drive a car":

That is a clickbait article that is sourced from a startup's press release. That is not a real paper nor a real source, it is just some marketing material meant to impress investors.

Re: AI for audio compression?

Reply #13 – 2018-10-28 08:50:05

The linked documents are an insight into the current progress in other fields, they do not directly relate to concepts in lossless audio compression.

I would not be so quick to dismiss the self learning ability from nothing; this is key, I believe, to efficiencies and discoveries in AI.

Re: AI for audio compression?

Reply #14 – 2018-10-28 12:03:56

Quote from: spoon on 2018-10-28 08:50:05

I would not be so quick to dismiss the self learning ability from nothing

I don't dismiss the ability to learn, but if we are close to the minimum file size, it is going to be a bit like hi-rez audio: yes you can measure a difference, no it is not relevant to the end-user.

Re: AI for audio compression?

Reply #15 – 2018-10-28 12:15:45

But hey, if I look beyond single-file streamable codecs:
Spoon might have a point on file-system level (cf. his attempt at the AudioSafe cloud storage project). Suppose you gather a helluvalotta users' files (all lossless of the same song, for the sake of the argument). Some might differ only in offsets or single-frame ripping errors. Some might be the same mix, but differing in dither, and the second could be compressed down to 1/16th, say. Some might be a hi-rez version with the 15 MSBs more or less already covered - but might differ in sample rate. Some might differ in length (single files on compilation albums). Some remasters could be very much alike.
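
The simplest of those cases (same audio, different offset, a few ripping errors) could be sketched roughly as follows. This is a toy under stated assumptions, not a description of how AudioSafe or any real service works: the names `encode_against` and `decode_against` are hypothetical, only non-negative shifts are tried, and mismatching samples are stored verbatim as patches.

```python
def encode_against(ref, other, max_shift=8):
    """Describe `other` as: shift into `ref`, length, and a list of patches."""
    def patches(shift):
        # Positions where `other` cannot be copied from `ref` at this shift.
        return [(i, v) for i, v in enumerate(other)
                if i + shift >= len(ref) or ref[i + shift] != v]

    shift = min(range(max_shift + 1), key=lambda s: len(patches(s)))
    return shift, len(other), patches(shift)


def decode_against(ref, encoded):
    """Rebuild `other` exactly from `ref` plus the stored description."""
    shift, length, patch = encoded
    out = [ref[i + shift] if i + shift < len(ref) else 0 for i in range(length)]
    for i, v in patch:
        out[i] = v
    return out


ref = [5, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # made-up "reference rip"
other = [1, 5, 9, 2, 7, 5, 3]         # starts 3 samples later, one sample differs
encoded = encode_against(ref, other)
print(encoded)
print(decode_against(ref, encoded) == other)
```

The harder cases in the list (different dither, sample rate or mastering) would need a real predictor from one version to the other rather than a plain copy, which is where a learned model might plausibly help.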
