Lyra is a high-quality, very low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, we’ve applied traditional codec techniques while leveraging advances in machine learning (ML) with models trained on thousands of hours of data to create a novel method for compressing and transmitting voice signals.
The basic architecture of the Lyra codec is quite simple. Features, or distinctive speech attributes, are extracted from speech every 40ms and are then compressed for transmission. The features themselves are log mel spectrograms, a list of numbers representing the speech energy in different frequency bands, which have traditionally been used for their perceptual relevance because they are modeled after human auditory response. On the other end, a generative model uses those features to recreate the speech signal. In this sense, Lyra is very similar to other traditional parametric codecs, such as MELP.
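To make the feature step concrete, here is a minimal sketch of log mel feature extraction over 40ms frames, using only the Python standard library. It is not Lyra's actual front end: the 16kHz sample rate, the 20 mel bands, the Hann window, the non-overlapping frames and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common mel-scale formula (one of several in use).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hann-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    twiddle = [cmath.exp(-2j * math.pi * i / n) for i in range(n)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * twiddle[(k * i) % n] for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(num_bands, num_bins, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(max_mel * i / (num_bands + 1))
                for i in range(num_bands + 2)]
    bin_hz = [k * (sample_rate / 2) / (num_bins - 1) for k in range(num_bins)]
    bank = []
    for b in range(num_bands):
        lo, mid, hi = edges_hz[b], edges_hz[b + 1], edges_hz[b + 2]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def log_mel_features(samples, sample_rate=16000, frame_ms=40, num_bands=20):
    """One list of num_bands log energies per non-overlapping frame."""
    frame_len = sample_rate * frame_ms // 1000
    bank = mel_filterbank(num_bands, frame_len // 2 + 1, sample_rate)
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        spec = power_spectrum(samples[start:start + frame_len])
        features.append([
            math.log(sum(w * p for w, p in zip(weights, spec)) + 1e-10)
            for weights in bank
        ])
    return features


# 80 ms of a 440 Hz tone: two 40 ms frames, 20 numbers per frame.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1280)]
features = log_mel_features(tone)
print(len(features), len(features[0]))  # 2 20
```

Each 40ms frame is reduced to a short list of band energies, which is what gets compressed and sent. A real implementation would use an FFT rather than the naive DFT shown here.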
However, traditional parametric codecs, which simply extract from speech the critical parameters needed to recreate the signal at the receiving end, achieve low bitrates but often sound robotic and unnatural. These shortcomings have led to the development of a new generation of high-quality audio generative models that have revolutionized the field by being able to not only differentiate between signals, but also generate completely new ones. DeepMind’s WaveNet was the first of these generative models and paved the way for many to come. Additionally, WaveNetEQ, the generative model-based packet-loss-concealment system currently used in Duo, has demonstrated how this technology can be used in real-world scenarios.
A New Approach to Compression with Lyra
Using these models as a baseline, we’ve developed a new model capable of reconstructing speech using minimal amounts of data. Lyra harnesses the power of these new natural-sounding generative models to maintain the low bitrate of parametric codecs while achieving high quality, on par with state-of-the-art waveform codecs used in most streaming and communication platforms today. The drawback of waveform codecs is that they achieve this high quality by compressing and sending the signal sample by sample, which requires a higher bitrate and, in most cases, isn’t necessary to achieve natural-sounding speech.
One concern with generative models is their computational complexity. Lyra avoids this issue by using a cheaper recurrent generative model, a WaveRNN variation, that works at a lower rate, but generates in parallel multiple signals in different frequency ranges that it later combines into a single output signal at the desired sample rate. This trick enables Lyra to not only run on cloud servers, but also on-device on mid-range phones in real time (with a processing latency of 90ms, which is in line with other traditional speech codecs). This generative model is then trained on thousands of hours of speech data and optimized, similarly to WaveNet, to accurately recreate the input audio.
Comparison with Existing Codecs
Since the inception of Lyra, our mission has been to provide the best quality audio using a fraction of the bitrate of alternatives. Currently, the royalty-free, open-source codec Opus is the most widely used codec for WebRTC-based VOIP applications and, with audio at 32kbps, typically obtains transparent speech quality, i.e., indistinguishable from the original. However, while Opus can be used in more bandwidth-constrained environments down to 6kbps, its audio quality starts to degrade at those rates. Other codecs are capable of operating at bitrates comparable to Lyra's (Speex, MELP, AMR), but each suffers from increased artifacts and results in a robotic-sounding voice.
Lyra is currently designed to operate at 3kbps, and listening tests show that it outperforms any other codec at that bitrate and compares favorably to Opus at 8kbps, thus achieving more than a 60% reduction in bandwidth. Lyra can be used wherever bandwidth conditions are insufficient for higher bitrates and existing low-bitrate codecs do not provide adequate quality.
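The figures quoted above can be checked with a little arithmetic. The sketch below uses only the numbers from the article (3kbps, 8kbps, 40ms frames). The helper names are hypothetical, and the per-frame figure assumes the whole bitrate goes to feature payload, ignoring any packet or transport overhead.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Average bit budget per frame at a given bitrate (no header overhead)."""
    return bitrate_bps * frame_ms // 1000


def bandwidth_reduction(new_bps, old_bps):
    """Fraction of bandwidth saved when moving from old_bps to new_bps."""
    return 1.0 - new_bps / old_bps


lyra_bps = 3000   # Lyra's design bitrate
opus_bps = 8000   # Opus bitrate it is compared against
frame_ms = 40     # feature extraction interval

print(bits_per_frame(lyra_bps, frame_ms))                  # 120
print(f"{bandwidth_reduction(lyra_bps, opus_bps):.1%}")    # 62.5%
```

So each 40ms of speech has to fit in roughly 120 bits on average, and 3kbps against 8kbps is a 62.5% saving, consistent with the "more than 60%" claim.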
It looks like this is no longer an audio "codec" - it's basically an AI that recognizes speech and synthesizes it, which is simply amazing. Perhaps future video codecs will work similarly. NVIDIA has already created a working AI-powered video codec for video conferences which requires a much lower bitrate than standard codecs.
Exact Audio Copy v1.6 released
- Standard setup now using gnudb.org instead of freedb.org (for built-in engine)
- Several small problems with secondary encoder
- Fixed problems with the Musicbrainz plugin
- Several smaller bugs removed
No new features.
The issue "Replace spaces by underscores" (Filename and Additional Filename tab), https://hydrogenaud.io/index.php?msg=980398, was one of the bugs addressed.