Topic: EnCodec: High Fidelity Neural Audio Compression

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #1
Not having more than a superficial understanding of neural networks, I was going to ask whether the decoder depends on the encoder's network, and whether updating the encoder would break compatibility with existing decoders. It seems it does, if I'm drawing the right conclusion from what's said in your second link about the decoder taking just as long as the encoder and requiring all the same software.

The compression is impressive, indeed, but does this approach have the potential to become as usable as current codecs?


 

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #3
Quote
but does this approach have the potential to become as usable as current codecs?
If you're correct, I suspect the usefulness is limited to situations where one party controls both ends of the encode/decode process and bandwidth (and/or storage) is incredibly limited and/or just prohibitively expensive. I also suspect this requires much more CPU power/RAM than more traditional codecs, to the point where that might severely limit its usefulness as well.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #4
I'm trying to encode with it, but I'm getting this error:
Code:
File "C:\Users\Me\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torchaudio\backend\no_backend.py", line 16, in load
    raise RuntimeError("No audio I/O backend is available.")
RuntimeError: No audio I/O backend is available.
Any ideas?

Edit: Nvm, it just needed the soundfile package (pip install soundfile).
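For anyone hitting the same error: a minimal sketch, using only the standard library, that checks whether soundfile is importable before torchaudio tries to load audio. The helper names are hypothetical, not part of torchaudio or EnCodec, and the assumption that soundfile is the missing backend comes from the fix reported above (as far as I know, older torchaudio builds on Windows had no other I/O backend).

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def audio_backend_hint() -> str:
    """Hypothetical helper: suggest a fix for torchaudio's
    'No audio I/O backend is available' error.

    Assumes soundfile is the backend torchaudio needs here (the fix
    reported in this thread); it does not query torchaudio itself.
    """
    if has_module("soundfile"):
        return "soundfile is installed"
    return "soundfile is missing; try: pip install soundfile"


if __name__ == "__main__":
    print(audio_backend_hint())
```

If torchaudio itself imports fine, `torchaudio.list_audio_backends()` should also list `soundfile` once the package is present.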
And so, with digital, computer was put into place, and all the IT that came with it.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #5
How much space were you able to save? Were you able to play that on a phone?

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #6
@binaryhermit
Oh, yes, that's also discussed in the other forum the OP linked. It takes a beefy computer to run: lots of CPU power and a good chunk of RAM. I'm not nearly qualified enough to estimate how much that can improve, or to guess whether this can eventually become a viable common codec.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #7
Quote
but does this approach have the potential to become as usable as current codecs?
If you're correct, I suspect the usefulness is limited to situations where one party controls both ends of the encode/decode process and bandwidth (and/or storage) is incredibly limited and/or just prohibitively expensive. I also suspect this requires much more CPU power/RAM than more traditional codecs, to the point where that might severely limit its usefulness as well.
Their paper claims ~10x realtime encoding and decoding on one thread of a MacBook Pro M1 for the basic 24kHz model, down to 5x for the 48kHz model, and then down to only 2/3x realtime when also using their entropy encoder (which reduces bitrate by an additional ~25% without quality loss). So, it's anywhere from >10x to >300x slower than Opus, and uses a lot more memory.
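To make the realtime factors concrete: processing time is the audio duration divided by the realtime factor, so 2/3x realtime means a 3-minute track takes 4.5 minutes. A small sketch using the single-thread figures quoted above; the 3-minute track is just an example, and no Opus speed is assumed, so the `slowdown` helper only compares the EnCodec configurations with each other.

```python
# Realtime factors as quoted in the post above (single thread).
ENCODEC_RTF = {
    "24 kHz model": 10.0,
    "48 kHz model": 5.0,
    "48 kHz model + entropy coder": 2 / 3,
}


def processing_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Wall-clock time needed to process `audio_seconds` of audio."""
    return audio_seconds / realtime_factor


def slowdown(reference_rtf: float, rtf: float) -> float:
    """How many times slower a codec at `rtf` is than one at `reference_rtf`."""
    return reference_rtf / rtf


def entropy_coded_kbps(kbps: float, saving: float = 0.25) -> float:
    """Bitrate after the ~25% extra reduction attributed to the entropy coder."""
    return kbps * (1 - saving)


if __name__ == "__main__":
    track = 180.0  # example: a 3-minute track
    for name, rtf in ENCODEC_RTF.items():
        print(f"{name}: {processing_seconds(track, rtf):.0f} s")
    fastest = ENCODEC_RTF["24 kHz model"]
    slowest = ENCODEC_RTF["48 kHz model + entropy coder"]
    print(f"slowest vs fastest: {slowdown(fastest, slowest):.1f}x")
```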

I saw a comment on HN saying that the model has about 15 million parameters. That's ~200 times bigger than the optimized version of LPCNet. It's probably a few times bigger than Lyra/Soundstream. (It's tens of thousands of times smaller than some of the big language AI models out there.)
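Rough arithmetic for what ~15 million parameters means in memory, assuming every weight is stored at the same precision and ignoring activations and framework overhead. The 15M figure is the unverified HN number quoted above.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Storage for the weights alone, in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


if __name__ == "__main__":
    params = 15_000_000  # unverified figure from the HN comment
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_megabytes(params, dtype):.0f} MB")
```

Lower precision shrinks the weight storage proportionally.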

There may be lots of room for optimization. For instance, you can often make weight matrices sparser or reduce their precision without really impacting the quality of an AI model, as jmv etc did with LPCNet.
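A toy illustration of the two tricks mentioned: magnitude pruning (zeroing the smallest weights so the matrix becomes sparser) and symmetric int8 quantization (storing each weight as an 8-bit integer plus one shared scale). This is a generic pure-Python sketch on a flat list, not how LPCNet or EnCodec actually do it; real implementations operate on whole tensors and usually retrain or fine-tune to recover quality.

```python
def prune_smallest(weights: list[float], fraction: float) -> list[float]:
    """Magnitude pruning: zero out the `fraction` of weights closest to 0."""
    k = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approx. code * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    w = [0.5, -0.01, 0.2, -1.0, 0.03, 0.7]
    print(prune_smallest(w, 0.5))
    codes, scale = quantize_int8(w)
    max_err = max(abs(a - b) for a, b in zip(w, dequantize(codes, scale)))
    print(codes, max_err)
```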

And the linear algebra steps involved in AI models are inherently parallelizable. So multi-threading may give a sizeable speedup, and GPUs and other accelerators / vector units may give large speedups or perf/W improvements. And future hardware will keep speeding up parallelizable tasks, in contrast to sequential tasks, which haven't seen rapid improvement since the breakdown of Dennard scaling c. 15 years ago.
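To illustrate why these steps parallelize so well: in a matrix-vector product, the core operation of a dense layer, each output element depends only on its own row, so the rows can be split into blocks and handed to independent workers. The sketch below (with a hypothetical `blocked_matvec` helper) shows only the decomposition; CPython's GIL means pure-Python threads won't actually run this faster, so real speedups come from native BLAS libraries, SIMD units, or GPUs doing the same kind of split.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec_rows(matrix, vector, rows):
    """Compute the output entries for the given row indices only."""
    return [sum(a * b for a, b in zip(matrix[r], vector)) for r in rows]


def blocked_matvec(matrix, vector, workers=4):
    """Split the rows into blocks and compute each block independently."""
    n = len(matrix)
    size = max(1, -(-n // workers))  # ceiling division
    blocks = [range(start, min(start + size, n)) for start in range(0, n, size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in block order, so concatenating restores row order.
        parts = list(pool.map(lambda rows: matvec_rows(matrix, vector, rows), blocks))
    return [y for part in parts for y in part]


if __name__ == "__main__":
    m = [[1, 2], [3, 4], [5, 6]]
    print(blocked_matvec(m, [1, 1], workers=2))
```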

So while the speed does mean it's not immediately practical, it doesn't mean this kind of direction lacks promise or that NN codecs won't be practical in the coming years.

Optimizations and better hardware don't address things like latency or seeking, though. It looks like the encode.su thread was looking at the "non-streaming" encoder, which, at best, processes the audio in chunks of a full second. EnCodec does have a "streaming" mode which advertises 13ms latency, but the paper doesn't report results with that mode for 48kHz audio, only for 24kHz.
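The latency gap is easy to express in samples. A minimal sketch: the 1-second chunk comes from the post above, while the 320-sample streaming hop at 24 kHz is an assumption based on the encoder strides given in the EnCodec paper (2 × 4 × 5 × 8), which lines up with the advertised ~13 ms.

```python
def chunk_latency_ms(samples: int, sample_rate_hz: int) -> float:
    """Minimum buffering delay before a chunk of `samples` can be encoded."""
    return 1000.0 * samples / sample_rate_hz


def samples_in(ms: float, sample_rate_hz: int) -> int:
    """Number of samples in `ms` milliseconds of audio (rounded down)."""
    return int(ms * sample_rate_hz // 1000)


if __name__ == "__main__":
    # Non-streaming: one-second chunks.
    print(samples_in(1000, 24_000), samples_in(1000, 48_000))  # 24000 48000
    # Streaming (assumed): hop of 2*4*5*8 = 320 samples at 24 kHz.
    hop = 2 * 4 * 5 * 8
    print(f"{chunk_latency_ms(hop, 24_000):.1f} ms")  # 13.3 ms
```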

In any case, the neural network codecs show there's still plenty of room for improved compression of normal audio. A decade ago I got into an argument with someone here when he claimed there'd never be such improvements.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #8
@jensend I'll wait and see, then. Nothing I can add to it.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #9
Quote
Their paper claims ~10x realtime encoding and decoding on one thread of a MacBook Pro M1 for the basic 24kHz model, down to 5x for the 48kHz model, and then down to only 2/3x realtime when also using their entropy encoder (which reduces bitrate by an additional ~25% without quality loss). So, it's anywhere from >10x to >300x slower than Opus, and uses a lot more memory.
I read 2/3x as 2 to 3 times realtime, but now I see it is 0.66x realtime (two-thirds). Also, the paper doesn't mention whether that 48kHz figure is for monophonic or stereophonic audio. From context and the numbers, it seems reasonable to assume there's another slowdown there.

Also, I can't find a mention of the M1 CPU anywhere. They mention a MacBook Pro 2019, and the M1 was only introduced in 2020. As far as I can tell, that MacBook still used a high-end Intel CPU.
Music: sounds arranged such that they construct feelings.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #10
Quote
I read 2/3x as 2 to 3 times realtime
yeah, for that I would have written 2-3x
Quote
Also, I can't find a mention of the M1 CPU anywhere.
Sorry, entirely my bad as a non-Apple guy misremembering the M1 launch timing and forgetting Apple's Coffee Lake refresh, which this would be.

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #11
Quote
How much space were you able to save? Were you able to play that on a phone?
Not practical. It already takes like a solid minute to encode / decode a file on a PC. I did try it on fairly dynamic pieces @ 24 kbps -hq and then encoded back to FLAC so I could play it. It was not that good sounding, ngl. You are better off using Opus at that bitrate.
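On the "how much space" question: for a constant-bitrate stream, size is just bitrate × duration. A quick sketch comparing 24 kbps with uncompressed CD audio (16-bit stereo at 44.1 kHz, i.e. 1411.2 kbps); the 3-minute duration is only an example, and real files add some header/container overhead on top of this.

```python
def stream_bytes(kbps: float, seconds: float) -> float:
    """Payload size of a constant-bitrate stream (no container overhead)."""
    return kbps * 1000 / 8 * seconds


CD_KBPS = 44_100 * 16 * 2 / 1000  # 16-bit stereo PCM at 44.1 kHz = 1411.2 kbps

if __name__ == "__main__":
    seconds = 180  # example: a 3-minute track
    for kbps in (24, CD_KBPS):
        print(f"{kbps} kbps: {stream_bytes(kbps, seconds) / 1e6:.2f} MB")
    print(f"ratio: {CD_KBPS / 24:.1f}x")
```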

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #12
Btw., at ridiculously low bitrates like 1.5 kbps, the speech-optimised version has the potential to completely change the tonal properties of the song, which is fun to experiment with and quite alien.
It is a fun experiment; I could try it with pop music too.
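One way to see how drastic 1.5 kbps is: EnCodec uses residual vector quantization, where each extra codebook refines the previous ones, so the bitrate sets how many refinement stages are kept. Assuming the 24 kHz model's parameters from the EnCodec paper (75 latent frames per second, 1024-entry codebooks, i.e. 10 bits each), every codebook costs 750 bps. The `codebooks_for` helper below is hypothetical, not part of the encodec package.

```python
def codebooks_for(kbps: float, frame_rate_hz: int = 75, bits_per_codebook: int = 10) -> int:
    """Number of residual-VQ codebooks that fit in `kbps`.

    Defaults assume EnCodec's 24 kHz model: 75 frames/s and 1024-entry
    (10-bit) codebooks, i.e. 750 bps per codebook.
    """
    per_codebook_bps = frame_rate_hz * bits_per_codebook
    n = kbps * 1000 / per_codebook_bps
    if abs(n - round(n)) > 1e-9:
        raise ValueError(f"{kbps} kbps is not a whole number of codebooks")
    return round(n)


if __name__ == "__main__":
    for kbps in (1.5, 3, 6, 12, 24):
        print(f"{kbps} kbps -> {codebooks_for(kbps)} codebooks")
```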

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #13
Here is a preview of what I mean when I say it changes tonal properties ;)

Re: EnCodec: High Fidelity Neural Audio Compression

Reply #14
Here is a preview of what I mean when I say it changes tonal properties ;) I think this one was at around 6 kbps.


Re: EnCodec: High Fidelity Neural Audio Compression

Reply #15
Since it's from the same company, I figured I should post it here, instead of making a new thread:

MLow: Meta’s low bitrate audio codec

Quote
The MLow codec

We broke ground with our development of a new codec in late 2021. After nearly two years of active development and testing, we are proud to announce Meta Low Bitrate audio codec, aka MLow, which achieves two-times-better quality than Opus (POLQA MOS 1.89 vs 3.9 @ 6kbps WB). Even more importantly, we are able to achieve this great quality while keeping MLow’s computational complexity 10 percent lower than that of Opus.

Figure 2 below shows a MOS (Mean Opinion Score) plot on a 1-5 scale and compares the POLQA scores between Opus and MLow at various bitrates. As the chart makes evident, MLow has a huge advantage over Opus at the lowest bitrates, where it saturates quality faster than Opus.


Hacker News discussion