
Using AI to encode "lossy FLAC"

Hello, I was reading about Google's SoundStream codec, which is capable of really good audio quality at low bitrates. If I understand correctly, they train an encoder AI and a decoder AI that work together to fool a discriminator AI, which is trained to tell whether the audio is the original or not, and they use a lot of data for it.

Then it occurred to me that nothing stops you from doing the same thing, but instead training the AI to create a .wav that compresses really well to FLAC or other established audio formats, with good audio quality and small size. The cool thing about this is that you don't need a fancy decoder or a new file type, and you can just train the encoder to get better. I wonder how good this thing can get.

Re: Using AI to encode "lossy FLAC"

Reply #1
Using AI isn't the magical solution for everything. Just because AI can do magical things somewhere doesn't mean it can elsewhere.

Classic lossy formats, like MP3 and Vorbis, spend most of their bits on coding an 'approximation' of a signal, by converting it from the time domain to the frequency domain. Those formats have the right tools to describe these approximations. As far as I understand, in SoundStream the actual format is trained by AI, which gives it a profound advantage. The resulting tools of the format are probably much more complex than those used in MP3/Vorbis/Opus.

FLAC is very different. FLAC uses a very simple model (approximation) and spends most of the bits correcting that model. That is because a lossless codec needs to be, well, lossless, so correction is needed anyway: no model can ever be accurate enough. Other lossless codecs use (much) more complicated models to get a few percent extra compression, but none of these can do *much* better than FLAC. See here for a comparison between lossless audio codecs.
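To make the "simple model plus correction" idea concrete, here is a rough sketch in Python. It is not FLAC's actual encoder: it uses only a fixed order-2 predictor and a single Rice parameter for the whole block, whereas real FLAC also has LPC predictors and partitioned Rice parameters. The function names and the 440 Hz test tone are my own assumptions for illustration.

```python
import math

def fixed_residual(samples, order=2):
    """Residual of a FLAC-style fixed predictor (orders 0 to 2 in this sketch)."""
    res = list(samples[:order])  # warm-up samples are kept verbatim
    for n in range(order, len(samples)):
        if order == 0:
            pred = 0
        elif order == 1:
            pred = samples[n - 1]
        else:
            pred = 2 * samples[n - 1] - samples[n - 2]
        res.append(samples[n] - pred)  # the "correction" that gets stored
    return res

def reconstruct(res, order=2):
    """Invert fixed_residual exactly; this is what keeps the scheme lossless."""
    out = list(res[:order])
    for n in range(order, len(res)):
        if order == 0:
            pred = 0
        elif order == 1:
            pred = out[n - 1]
        else:
            pred = 2 * out[n - 1] - out[n - 2]
        out.append(res[n] + pred)
    return out

def rice_bits(values, k):
    """Bits to Rice-code signed integers with parameter k (zigzag mapping)."""
    total = 0
    for v in values:
        u = 2 * v if v >= 0 else -2 * v - 1  # fold the sign into the low bit
        total += (u >> k) + 1 + k            # unary quotient, stop bit, k low bits
    return total

def best_rice_bits(values, max_k=14):
    """Cheapest single Rice parameter for the whole block (a simplification)."""
    return min(rice_bits(values, k) for k in range(max_k + 1))

# Assumed test signal: 440 Hz sine, 44.1 kHz, 16-bit range, 4096 samples.
sine = [round(10000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(4096)]
residual = fixed_residual(sine)

assert reconstruct(residual) == sine  # bit-exact round trip
print("raw samples :", best_rice_bits(sine), "bits")
print("residual    :", best_rice_bits(residual), "bits")
```

On a smooth signal like this the residual is much cheaper to code than the raw samples, but it never goes to zero, which is the point above: the bits go into the correction, not the model.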

So, as FLAC has a very simple model, there is really nothing for an AI to optimize for. To fit FLAC's model, you'd have to make really large (probably audible) sacrifices. It seems unlikely the end result would be better than what processors like LossyWAV can achieve.

 

Re: Using AI to encode "lossy FLAC"

Reply #2
If the purpose is to show off, then I am sure they can get good results at significant bitrate s(h)avings. Reducing a, say, 777 kbit/s FLAC file to 256 kbit/s "within the format" would be impressive from that point of view - and not at all any competition against lossy-designed codecs that produce jolly listenable files at 64 kbit/s.

What I guess could be done with good AI training is something like a psychoacoustic model for a dynamically adjusted LossyWAV: discard more bits when they don't matter much for audibility.
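A rough Python sketch of the "discard bits per block" idea, for anyone who wants to play with it. This is not LossyWAV's actual algorithm: the decision rule here is a crude signal-to-noise target (the `min_snr_db` parameter is my own assumption) standing in for whatever a learned or psychoacoustic model would decide. The part that matters for FLAC is that zeroed low bits show up as "wasted bits", which the encoder can skip.

```python
import math

def choose_bits(block, min_snr_db=30.0, max_bits=12):
    """Pick how many low bits to drop so quantization noise stays below a target.

    Stand-in for a real audibility model: assumes uniform quantization noise
    of power step**2 / 12 and compares it with the block's average power.
    """
    power = sum(s * s for s in block) / len(block)
    bits = 0
    for b in range(1, max_bits + 1):
        noise = (1 << b) ** 2 / 12
        if power <= 0 or 10 * math.log10(power / noise) < min_snr_db:
            break
        bits = b
    return bits

def drop_bits(block, bits):
    """Round each 16-bit sample to the nearest multiple of 2**bits."""
    if bits == 0:
        return list(block)
    step = 1 << bits
    top = 32768 - step  # largest multiple of step that still fits in 16 bits
    return [max(-32768, min(top, ((s + step // 2) >> bits) << bits)) for s in block]

def wasted_bits(block):
    """Trailing zero bits shared by every sample in the block."""
    combined = 0
    for s in block:
        combined |= s
    if combined == 0:
        return 0
    return (combined & -combined).bit_length() - 1

# Assumed test blocks: a loud and a quiet 440 Hz sine at 44.1 kHz.
loud = [round(20000 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1024)]
quiet = [round(100 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(1024)]

for name, block in (("loud", loud), ("quiet", quiet)):
    bits = choose_bits(block)
    processed = drop_bits(block, bits)
    print(name, "dropped", bits, "bits, wasted bits now", wasted_bits(processed))
```

With this rule the loud block loses more bits than the quiet one. A smarter model would make the same kind of per-block decision, just based on audibility instead of a flat noise target.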
And I have a hunch that it could also learn how to gradually decimate stereo information. Slightly closer to mono could be listenable even when it is easily ABXable against the original.
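The stereo idea can be sketched in Python as a plain mid/side width control. This is only an assumption about how such a step might look; the `width` parameter is hypothetical, and a trained model would presumably vary it over time rather than use one fixed value.

```python
def narrow_stereo(left, right, width):
    """Scale the side (L-R) signal: width=1.0 keeps the input, width=0.0 is mono."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2
        side = (l - r) / 2 * width
        out_l.append(round(mid + side))
        out_r.append(round(mid - side))
    return out_l, out_r

def side_energy(left, right):
    """Sum of squared L-R differences, i.e. how much stereo information remains."""
    return sum((l - r) ** 2 for l, r in zip(left, right))

# Small made-up stereo fragment for illustration.
left = [1000, 1200, -800, 300, -1500, 40]
right = [900, 1500, -400, -300, -1000, 44]

half_l, half_r = narrow_stereo(left, right, 0.5)
print(side_energy(left, right), side_energy(half_l, half_r))
```

FLAC can code a channel pair as mid and side, so a smaller side signal should cost fewer bits there. How far the width can be pulled in before it gets annoying is exactly the part that would need listening tests or a trained model.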