In “SoundStream: an End-to-End Neural Audio Codec”, we introduce a novel neural audio codec that extends those efforts by providing higher-quality audio and expanding to encode different sound types, including clean speech, noisy and reverberant speech, music, and environmental sounds. SoundStream is the first neural network codec to work on speech and music, while being able to run in real-time on a smartphone CPU. It is able to deliver state-of-the-art quality over a broad range of bitrates with a single trained model, which represents a significant advance in learnable codecs.
SoundStream at 3 kbps outperforms Opus at 12 kbps and approaches the quality of EVS at 9.6 kbps, while using 3.2x–4x fewer bits.
Still experimental, not yet in production but the audio samples on the page are pretty impressive.