Just came across this current development in the Opus codec: neural encoding.
https://www.amazon.science/blog/neural-encoding-enables-more-efficient-recovery-of-lost-audio-packets
Bizarre stuff the Xiph team is up to!
These are the branches on their GitHub that correspond to this neural encoding approach:
https://github.com/xiph/opus/tree/exp_neural_fec3
https://github.com/xiph/opus/tree/exp-neural-silk-enhancement
Would anyone be so kind as to build x64 Windows binaries for this one? Thanks in advance.
https://www.rarewares.org/opus.php ;)
I am assuming that Opus 1.4 includes in-band FEC (LBRR), but that 1.4 does NOT include the "neural packet loss concealment" and "deep redundancy (DRED)" features, based on the news section of the official website (https://opus-codec.org/).
In order to use the much awaited DRED features, we need the exp_neural_fec3 branch (https://gitlab.xiph.org/xiph/opus/-/tree/exp_neural_fec3) or the exp-neural-silk-enhancement branch (https://gitlab.xiph.org/xiph/opus/-/tree/exp-neural-silk-enhancement), I guess.
OK, I'll see what I can do. :)
I'm wondering if this can improve generic speech encoding like for example 22 kbps audio books, or if all these experimental branches are just tuned for ultra-low bitrates, preferably in real time. https://github.com/xiph/opus/commits/exp-neural-silk-enhancement
Here's the 2020 research paper: "We have shown that neural synthesis can significantly improve the output of a low bit rate Opus bit stream. Previous speech coding efforts using neural synthesis were based on pure parametric coding, here we expand the scope to address also a waveform matching coder. Furthermore, when using the LPCNet architecture, real-time synthesis can be achieved even on a mobile device." https://arxiv.org/pdf/1905.04628v1.pdf
Looking at the speech performance of USAC in xHE-AAC encoders (i.e. not the open source exhale), Opus could use a boost...
At first glance it seemed to me the linked articles were only about using machine learning for enhancements like packet loss concealment, but it seems there is more. Apparently a backwards-compatible Opus v2 is in the works: https://datatracker.ietf.org/doc/charter-ietf-mlcodec/
Deliverables
1. A specification for a generic Opus extension mechanism that can be
used not only for the other proposed deliverables, but can also sustain
further extensions to Opus in the future. This document shall be a
Proposed Standard document.
2. A specification for coding large amounts of very low bitrate
redundancy information for the purpose of significantly improving the
robustness of Opus to bursts of packet loss. This document shall be a
Proposed Standard document.
3. A specification for improving the quality of SILK- and hybrid-coded
speech through decoder changes, with and without side information
provided by the encoder. This will be done in a way that does not affect
interoperability between original and extended implementations. This
document shall be a Proposed Standard document.
4. A specification for improving the quality of CELT-coded audio (both
speech and music) through decoder changes, with and without side
information provided by the encoder. This will be done in a way that
does not affect interoperability between original and extended
implementations. This document shall be a Proposed Standard document.
A build of this project would be a really interesting thing; I really want to try how this encoding actually works/sounds.
I have to agree. I want to compile this, so I plan to target the exp_neural_fec4 branch (https://gitlab.xiph.org/xiph/opus/-/tree/exp_neural_fec4), where there appears to be current activity. Does that seem useful?
I don't think this makes sense yet. This stuff is still very experimental, and the most active branches are FEC (forward error correction) and SILK. Neither is going to make a difference when playing audio locally: there is no need for FEC in local playback, and music is usually stored with CELT rather than SILK, IIRC.