HydrogenAudio

Lossy Audio Compression => Other Lossy Codecs => Topic started by: AiZ on 2022-10-29 21:36:38

Title: EnCodec: High Fidelity Neural Audio Compression
Post by: AiZ on 2022-10-29 21:36:38
Hello,

EnCodec: High Fidelity Neural Audio Compression (https://github.com/facebookresearch/encodec)

First saw here (https://encode.su/threads/3966-EnCodec-High-Fidelity-Neural-Audio-Compression). As DZgas pointed out, the results on mainstream music at 12 kbps with the 48 kHz model are impressive.
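For a sense of scale, here is a rough sketch of what 12 kbps means compared with uncompressed audio. It assumes 16-bit stereo PCM at 48 kHz as the reference and ignores container overhead. The bitrate list (3, 6, 12, 24 kbps) is my reading of the 48 kHz model's targets in the repo README, and the 4-minute track length is just an illustrative number.

```python
# Rough arithmetic: how much smaller is a low-bitrate stream than uncompressed PCM?
# Assumed reference: 16-bit, 2-channel PCM at 48 kHz (illustrative, not from the thread).

def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def size_kb(bitrate_kbps: float, seconds: float) -> float:
    """Payload size in kilobytes (1 kB = 1000 bytes) at a constant bitrate."""
    return bitrate_kbps * seconds / 8


if __name__ == "__main__":
    ref = pcm_kbps(48_000, 16, 2)  # 1536 kbps for the assumed PCM reference
    track_s = 4 * 60               # hypothetical 4-minute track
    for codec_kbps in (3, 6, 12, 24):
        ratio = ref / codec_kbps
        print(f"{codec_kbps:>2} kbps: {ratio:6.1f}x smaller than PCM, "
              f"~{size_kb(codec_kbps, track_s):.0f} kB for 4 minutes")
```

At 12 kbps that works out to 128x smaller than the assumed PCM reference, or about 360 kB for a 4-minute track.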

    AiZ
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: radorn on 2022-10-30 10:34:26
Having no more than a superficial understanding of neural networks, I was going to ask whether the decoder depends on the encoder's network, and whether updating the encoder would break compatibility with existing decoders. It seems it does, if I'm drawing the right conclusion from what's said in your second link about the decoder taking just as long as the encoder and requiring all the same software.

The compression is impressive, indeed, but does this approach have the potential to become as usable as current codecs?
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: MinPower on 2022-10-31 03:57:04
I hope it won't take as long as Google has taken to push out an encoder, which still isn't here after a year!
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: binaryhermit on 2022-10-31 09:42:17
but does this approach have the potential to become as usable as current codecs?
If you're correct, I suspect the usefulness is limited to situations where one party controls both ends of the encode/decode process and bandwidth (and/or storage) is incredibly limited and/or just prohibitively expensive.  I also suspect this requires much more CPU power/RAM than more traditional codecs, to the point where that might severely limit its usefulness as well
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: Kartoffelbrei on 2022-10-31 22:04:04
I'm trying to encode with it, but I'm getting this error:
File "C:\Users\Me\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torchaudio\backend\no_backend.py", line 16, in load
    raise RuntimeError("No audio I/O backend is available.")
RuntimeError: No audio I/O backend is available.
Any ideas?

Edit: Never mind, it just needed soundfile.
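For anyone hitting the same error: torchaudio raises "No audio I/O backend is available" when none of the libraries it uses for file I/O are installed, and on Windows that library is usually soundfile (installable with `pip install soundfile`). Below is a minimal, stdlib-only check you could run before calling encodec. The helper names are hypothetical, and the default candidate list is an assumption that soundfile is the backend you need.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None


def missing_audio_backends(candidates=("soundfile",)) -> list:
    """List which candidate backend packages are not installed.

    The default candidate list is an assumption: soundfile is what torchaudio
    typically uses on Windows; other platforms may rely on different backends.
    """
    return [name for name in candidates if not has_module(name)]


if __name__ == "__main__":
    missing = missing_audio_backends()
    if missing:
        print("Missing:", ", ".join(missing), "-> try: pip install", " ".join(missing))
    else:
        print("soundfile is installed.")
```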
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: MinPower on 2022-11-01 05:59:22
How much space were you able to save? Were you able to play that on a phone?
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: radorn on 2022-11-01 13:06:12
@binaryhermit
Oh, yes, that's also discussed in the other forum that the OP linked. It takes a beefy computer: lots of CPU power and a good chunk of RAM. I'm nowhere near qualified enough to estimate how much that can improve in the future, or to guess whether this can eventually become a viable mainstream codec.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: jensend on 2022-11-06 04:46:50
but does this approach have the potential to become as usable as current codecs?
If you're correct, I suspect the usefulness is limited to situations where one party controls both ends of the encode/decode process and bandwidth (and/or storage) is incredibly limited and/or just prohibitively expensive.  I also suspect this requires much more CPU power/RAM than more traditional codecs, to the point where that might severely limit its usefulness as well
Their paper (https://arxiv.org/abs/2210.13438) claims ~10x realtime encoding and decoding on one thread of a MacBook Pro M1 for the basic 24kHz model, down to 5x for the 48kHz model, and then down to only 2/3x realtime when also using their entropy encoder (which reduces bitrate by an additional ~25% without quality loss). So, it's anywhere from >10x to >300x slower than Opus, and uses a lot more memory.
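To make those real-time factors concrete, here is a small sketch converting "N x realtime" into wall-clock time for one track. The factors are the ones quoted above (10x, 5x, and 2/3x, i.e. about 0.67x), the ~25% saving is the figure claimed for the entropy coder, and the 4-minute track and the 12 kbps starting point are arbitrary examples.

```python
# Convert a real-time factor (audio seconds processed per wall-clock second)
# into processing time for a track of a given length.

def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock seconds needed at `realtime_factor` x realtime."""
    return audio_seconds / realtime_factor


def after_entropy_coding(bitrate_kbps: float, saving: float = 0.25) -> float:
    """Bitrate after an extra fractional saving (~25% is the claimed figure)."""
    return bitrate_kbps * (1 - saving)


if __name__ == "__main__":
    track_s = 240  # hypothetical 4-minute track
    for label, rtf in (("24 kHz model", 10), ("48 kHz model", 5),
                       ("with entropy coder", 2 / 3)):
        print(f"{label:>18}: {processing_seconds(track_s, rtf):6.1f} s")
    print(f"12 kbps -> ~{after_entropy_coding(12):.0f} kbps with entropy coding")
```

So a 4-minute track that takes about 24 s with the basic model would take about 6 minutes with the entropy coder enabled, on whatever hardware the paper measured.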

I saw a comment on HN saying that the model has about 15 million parameters. That's ~200 times bigger than the optimized version of LPCNet. It's probably a few times bigger than Lyra/SoundStream. (It's tens of thousands of times smaller than some of the big language AI models out there.)

There may be lots of room for optimization. For instance, you can often make weight matrices sparser or reduce their precision without really impacting the quality of an AI model, as jmv et al. did with LPCNet.
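As a rough illustration of those two tricks (a generic sketch, not what was actually done to LPCNet or EnCodec): 15 million weights take about 60 MB as 32-bit floats but about 15 MB as 8-bit integers. The snippet below shows symmetric per-tensor int8 quantization and magnitude pruning on a toy weight vector. The weights, the int8 scheme, and the 50% pruning ratio are assumptions chosen for the example.

```python
# Generic sketch of two model-compression tricks: int8 quantization and
# magnitude pruning. Toy data only; not EnCodec's or LPCNet's actual method.

def model_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Storage for the weights alone, in MB (1 MB = 1e6 bytes)."""
    return n_params * bytes_per_param / 1e6


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def prune_smallest(weights, fraction):
    """Zero out the floor(fraction * n) weights with the smallest magnitude."""
    k = int(len(weights) * fraction)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    drop = set(smallest)
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    print(model_megabytes(15_000_000, 4), "MB as float32")
    print(model_megabytes(15_000_000, 1), "MB as int8")

    w = [0.81, -0.02, 0.33, -0.57, 0.004, 0.12, -0.95, 0.06]
    q, s = quantize_int8(w)
    err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
    print("int8 codes:", q, "max abs error:", round(err, 4))
    print("50% pruned:", prune_smallest(w, 0.5))
```

Rounding to the nearest code keeps each weight's error within half a quantization step, which is why this kind of precision reduction often costs little quality.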

And the linear algebra steps involved in AI models are inherently parallelizable. So multi-threading may give a sizeable speedup, and GPUs and other accelerators / vector units may give large speedups or perf/W improvements. And future hardware will continue to speed up parallelizable tasks, in contrast to sequential tasks, which haven't seen rapid improvement since the breakdown of Dennard scaling (https://en.wikipedia.org/wiki/Dennard_scaling) c. 15 years ago.
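A toy illustration of why these workloads parallelize well: in a matrix-vector product (the core of a dense neural-network layer), each output element depends only on its own row, so the rows can be split into chunks and handed to separate workers with no coordination. The sketch uses Python's ThreadPoolExecutor purely to show the decomposition; in CPython the GIL means it won't actually run faster, and real inference libraries do this in native code or on GPUs.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, x):
    """Inner product of one matrix row with the input vector."""
    return sum(a * b for a, b in zip(row, x))


def matvec_serial(matrix, x):
    """Reference matrix-vector product, one row at a time."""
    return [dot(row, x) for row in matrix]


def matvec_chunked(matrix, x, workers=4):
    """Split the rows into contiguous chunks; each chunk is independent work."""
    n = len(matrix)
    step = max(1, -(-n // workers))  # ceiling division
    chunks = [matrix[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda rows: matvec_serial(rows, x), chunks))
    # pool.map preserves chunk order, so concatenation restores row order.
    return [y for part in parts for y in part]


if __name__ == "__main__":
    m = [[(i * 7 + j * 3) % 5 - 2 for j in range(6)] for i in range(10)]
    v = [1, -1, 2, 0, 3, 1]
    print(matvec_chunked(m, v) == matvec_serial(m, v))
```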

So while the speed does mean it's not immediately practical, it doesn't mean this kind of direction lacks promise or that NN codecs won't be practical in the coming years.

Optimizations and better hardware don't address things like latency or seeking, though. It looks like the encode.su thread was looking at the "non-streaming" encoder, which, at best, processes the audio in chunks of a full second. EnCodec does have a "streaming" mode which advertises 13ms latency, but the paper doesn't report results with that mode for 48kHz audio, only for 24kHz.
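The latency and bitrate figures follow from how the audio is framed. A back-of-the-envelope sketch, assuming (from my reading of the paper and repo, so treat it as approximate) that the encoder emits one frame per 320 input samples and that each residual-VQ codebook has 1024 entries, i.e. 10 bits per codebook per frame. Under those assumptions the 24 kHz model runs at 75 frames/s, one frame spans about 13.3 ms (in line with the advertised 13 ms streaming latency), and 2 to 32 codebooks give 1.5 to 24 kbps before entropy coding.

```python
import math

# Assumed EnCodec-style framing parameters (check against the paper/repo):
HOP_SAMPLES = 320      # input samples per encoder frame (assumption)
CODEBOOK_SIZE = 1024   # entries per residual-VQ codebook (assumption)


def frame_rate(sample_rate_hz: int, hop: int = HOP_SAMPLES) -> float:
    """Encoder frames per second."""
    return sample_rate_hz / hop


def frame_ms(sample_rate_hz: int, hop: int = HOP_SAMPLES) -> float:
    """Duration of audio covered by one frame, in milliseconds."""
    return 1000 * hop / sample_rate_hz


def bitrate_kbps(sample_rate_hz: int, n_codebooks: int) -> float:
    """Raw bitrate before entropy coding: frames/s * codebooks * bits each."""
    bits_per_code = math.log2(CODEBOOK_SIZE)  # 10 bits for 1024 entries
    return frame_rate(sample_rate_hz) * n_codebooks * bits_per_code / 1000


if __name__ == "__main__":
    print(frame_rate(24_000), "frames/s at 24 kHz")
    print(round(frame_ms(24_000), 1), "ms per frame at 24 kHz")
    for nq in (2, 4, 8, 16, 32):
        print(nq, "codebooks ->", bitrate_kbps(24_000, nq), "kbps")
```

The same arithmetic at 48 kHz gives 150 frames/s, so 2 codebooks already cost 3 kbps; and with the non-streaming mode buffering a full second of audio, latency is set by that chunk rather than by the frame.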

In any case, the neural network codecs show there's still plenty of room for improved compression of normal audio. A decade ago I got into an argument with someone here when he claimed there'd never be such improvements.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: radorn on 2022-11-06 09:12:27
@jensend I'll wait and see, then. Nothing I can add to it.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: ktf on 2022-11-06 09:56:51
Their paper (https://arxiv.org/abs/2210.13438) claims ~10x realtime encoding and decoding on one thread of a MacBook Pro M1 for the basic 24kHz model, down to 5x for the 48kHz model, and then down to only 2/3x realtime when also using their entropy encoder (which reduces bitrate by an additional ~25% without quality loss). So, it's anywhere from >10x to >300x slower than Opus, and uses a lot more memory.
I read 2/3x as 2 to 3 times realtime, but now I see it is 0.66x realtime (two-thirds). Also, the paper doesn't mention whether that 48kHz figure is for monophonic or stereophonic audio. From context and the numbers, it seems reasonable to assume there's another slowdown there.

Also, I can't find a mention of the M1 CPU anywhere. They mention a MacBook Pro 2019. The M1 was introduced only in 2020. As far as I can find, the MacBook they mention was still using a high-end Intel CPU.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: jensend on 2022-11-06 14:33:17
I read 2/3x as 2 to 3 times realtime
Yeah, for that I would have written 2-3x.
Quote
Also, I can't find a mention of the M1 CPU anywhere.
Sorry, entirely my bad as a non-Apple guy misremembering the M1 launch timing and forgetting Apple's Coffee Lake refresh, which this would be.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: Kartoffelbrei on 2022-11-09 19:23:26
How much space were you able to save? Were you able to play that on a phone?
Not practical. It already takes a solid minute or so to encode/decode a file on a PC. I tried it on fairly dynamic pieces at 24 kbps with --hq and then decoded back to FLAC so I could play them. It didn't sound that good, to be honest. You are better off using Opus at that bitrate.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: Kartoffelbrei on 2022-11-09 19:26:56
Btw, at ridiculously low bitrates like 1.5 kbps on the speech-optimised version, it can completely change the tonal properties of the song, which is fun to experiment with and quite alien.
It's a fun experiment; I could try it with pop music too.
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: Kartoffelbrei on 2022-11-09 19:41:23
Here is a preview of what I mean by it changing the tonal properties ;)
Title: Re: EnCodec: High Fidelity Neural Audio Compression
Post by: Kartoffelbrei on 2022-11-09 19:43:21
Here is a preview of what I mean by it changing the tonal properties ;) I think this one was at around 6 kbps.