Just found this blog post: https://techcommunity.microsoft.com/t5/microsoft-teams-blog/get-the-most-from-your-meetings-and-calls-with-microsoft-teams/ba-p/1911016
Attached are the two demonstration files linked to in that post (Silk.wav (https://cdn.techcommunity.microsoft.com/assets/MicrosoftTeams/Silk.wav) and Satin.wav (https://cdn.techcommunity.microsoft.com/assets/MicrosoftTeams/Satin.wav)), with the Silk version upsampled to 32 kHz and properly delay-matched with the Satin version, for a more reliable comparison.
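For anyone who wants to reproduce that kind of alignment, here is a minimal sketch of one way to do it, assuming both files are already decoded to lists of float samples. It is not the tool used for the attached files (that isn't stated). Linear interpolation stands in for a proper windowed-sinc/polyphase resampler, and the delay is found by brute-force cross-correlation over a hypothetical search window of +/- max_lag samples. The function names are made up for illustration.

```python
def upsample_linear(x, factor):
    """Upsample by an integer factor using linear interpolation.

    Stand-in only: a proper resampler would low-pass filter
    (e.g. polyphase windowed-sinc) to suppress imaging.
    """
    out = []
    for i in range(len(x) - 1):
        a, b = x[i], x[i + 1]
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    if x:
        out.append(x[-1])
    return out


def estimate_delay(ref, test, max_lag):
    """Return the lag (in samples) maximising sum(ref[i] * test[i + lag]).

    A positive result means `test` lags behind `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(test):
                score += r * test[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(ref, test, max_lag):
    """Shift `test` by the estimated delay and trim both to equal length."""
    lag = estimate_delay(ref, test, max_lag)
    if lag >= 0:
        test = test[lag:]             # drop the leading delay
    else:
        test = [0.0] * (-lag) + test  # pad so `test` starts in sync
    n = min(len(ref), len(test))
    return ref[:n], test[:n]
```

Assuming the Silk file is 8 kHz narrowband, you would upsample it by a factor of 4 to reach 32 kHz and then pass it as `test` with the Satin signal as `ref`. Brute-force correlation is O(max_lag * N), which is fine for short clips.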
Judging from the file name and sampling rate, Microsoft Teams previously used Silk (the speech coding core of Opus) in a narrowband configuration (audio only up to 4 kHz), at least at low bit-rates. The new codec seems to achieve super-wideband coding: the audio range up to 8 kHz is waveform coded, and some simple SBR-like bandwidth extension fills in the range from 8 kHz (wideband) up to 16 kHz (super-wideband).
No clue which bit-rate this demo was made at, but judging, again, from the Silk.wav, it was likely quite low. Does anybody know more about this new codec?
The packet loss concealment demo in the above blog post is also quite convincing.
Chris
I found mentions of it as early as March 2020.
- https://techcommunity.microsoft.com/t5/microsoft-teams-blog/what-s-new-in-microsoft-teams-3rd-anniversary-edition/ba-p/1234871#
- https://tomtalks.blog/2020/03/new-satin-codec-coming-to-microsoft-teams/
Can't seem to find more details about it though.
My first reaction was "oh, just when the world desperately needed another WMA!", but maybe there is more to it?
Considering the high degree of similarity between WMA and LC/HE-AAC, and between WMV/VC-1 and MPEG-4 Part 2 ASP/H.263(+), there is a good chance that MSFT's new codec (Satin) is EVS or based on it.
EVS provides WB at 7.2 kbps, and now MSFT talks about WB at 7 kbps. That's quite a coincidence, isn't it?
P.S.: As a user, I can say that MSFT Teams has very good VoIP audio quality and a very good overall experience :)
The name "satin" makes me think it's somehow related to the older "silk" codec that was part of Skype (and, IIRC, is used for speech in Opus or something like that).
It's definitely related to Silk. Microsoft acquired Skype in 2011 (https://news.microsoft.com/about/) and I assume that some of the developers of Silk (for Skype back then) improved upon that codec, and kept the naming scheme. After all, it's much easier and cheaper to gradually improve a codec than to develop a completely new one.
Chris
Oops, then my assumption about Satin being based on EVS was wrong.
Today I looked at the frequency response (FR) of Satin_32kHz.wav, and it's not similar to EVS's bandwidth extension. It looks more like Opus/CELT's band folding, or something else entirely.
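To make "band folding" a bit more concrete: the general trick is to fill an upper band that received no bits with a copy of the spectral shape of a lower band, rescaled to a transmitted band energy. The sketch below illustrates only that generic idea on a plain list of spectral coefficients. It is not CELT's actual folding code (which operates on normalized bands and adds spreading), and nothing here reflects what Satin actually does; `fold_band` and its parameters are hypothetical.

```python
import math


def fold_band(spectrum, src_start, dst_start, width, target_energy):
    """Fill spectrum[dst_start:dst_start + width] with the shape of
    spectrum[src_start:src_start + width], rescaled so that the filled
    band's energy (sum of squares) equals target_energy.

    Returns a new list; the input is not modified.
    """
    if src_start + width > len(spectrum) or dst_start + width > len(spectrum):
        raise ValueError("band exceeds spectrum length")
    src = spectrum[src_start:src_start + width]
    norm = math.sqrt(sum(c * c for c in src))
    out = list(spectrum)
    if norm == 0.0:
        # Nothing to copy; a real codec would typically inject noise here.
        return out
    gain = math.sqrt(target_energy) / norm
    for k, c in enumerate(src):
        out[dst_start + k] = c * gain
    return out
```

Only the band energy needs to be transmitted for the folded band, which is why this kind of extension is cheap. The price is that the fine structure in the upper band is a copy, not the real signal, which can show up in a frequency-response plot as repeated patterns.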
Considering a high grade of similarity between WMA with LC/HE-AAC
WMA is a stripped-down MDCT codec, so it is superficially similar to almost all modern codecs. I don't think AAC specifically was a huge inspiration; given the time frame and how similar the core codecs are, AC3 probably was. The option to let the encoder use tons of different transform lengths, all in the same file, seems like an (over)reaction to how MP3 picked poor transform sizes and was then stuck with them. So maybe AC3 and MP3 were the inspirations.
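For reference, the MDCT core these codecs share boils down to something like the following: a deliberately naive O(N^2) textbook sketch with a sine window, showing that overlap-adding the inverse transforms cancels the time-domain aliasing (TDAC). The parameter `n` is the transform length, i.e. the knob WMA lets the encoder vary. This sketch keeps it fixed and omits the window-shape transitions that real block switching needs; it is not WMA's, AC3's or MP3's actual filterbank, and the function names are illustrative only.

```python
import math


def sine_window(n2):
    """Sine window of length 2N; satisfies w[i]^2 + w[i+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / n2) for i in range(n2)]


def _basis(n, i, k):
    """MDCT cosine basis for time index i and frequency bin k."""
    return math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))


def mdct(block, window):
    """Direct MDCT: 2N samples in, N coefficients out (window applied here)."""
    n = len(block) // 2
    scale = math.sqrt(2.0 / n)
    return [scale * sum(window[i] * block[i] * _basis(n, i, k)
                        for i in range(2 * n))
            for k in range(n)]


def imdct(coeffs, window):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out (windowed)."""
    n = len(coeffs)
    scale = math.sqrt(2.0 / n)
    return [window[i] * scale * sum(coeffs[k] * _basis(n, i, k) for k in range(n))
            for i in range(2 * n)]


def analyze_synthesize(signal, n):
    """MDCT/IMDCT with hop size N and 50% overlap-add; aliasing cancels."""
    window = sine_window(2 * n)
    # Pad N zeros in front and at least N behind so every sample is covered
    # by two overlapping blocks; round the length up to a multiple of N.
    padded = [0.0] * n + list(signal) + [0.0] * (n + (-len(signal)) % n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        block = padded[start:start + 2 * n]
        for i, v in enumerate(imdct(mdct(block, window), window)):
            out[start + i] += v
    return out[n:n + len(signal)]
```

Each block spans 2N samples but yields only N coefficients, so a longer transform buys finer frequency resolution at the cost of worse time resolution (pre-echo). That tradeoff is why encoders want to switch transform lengths.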
Yes, you probably mean the older versions of WMA.
However, the latest WMA 10 Pro has SBR-like bandwidth extension, and its efficiency is on par with HE/LC-AAC.
^^^^ You meant WMA Pro. It's a completely different codec than WMA.
ok, got it.
Satin: Microsoft’s latest AI-powered audio codec for real-time communications (https://techcommunity.microsoft.com/t5/microsoft-teams-blog/satin-microsoft-s-latest-ai-powered-audio-codec-for-real-time/ba-p/2141382)
New blog post by Microsoft with more details and audio samples. Impressive audio quality at 6 kbps.
Unless I missed something, it doesn't mention whether this codec will be royalty-free or proprietary.
Satin is already being used for all Teams and Skype two-party calls and will roll out for Teams meetings soon. It currently operates in wideband voice mode within a bitrate range of 6 – 36 kbps and will be extended to support full-band stereo music at a maximum sampling rate of 48 kHz in the near future. We are very excited for you to try this new codec and let us know what you think.
I wonder about the higher bitrate performance and how it will compare to Opus.
@Spyros, thank you for the link.
It's good news that MSFT Satin won't be just an AI-based speech codec but will also support full-band music. :)
It's clear now that next generation audio codecs will be AI-based.
Hello,
Just saw this on Phoronix (https://www.phoronix.com/scan.php?page=news_item&px=Google-Lyra) :
Google AI Blog: Lyra: A New Very Low-Bitrate Codec for Speech Compression (https://ai.googleblog.com/2021/02/lyra-new-very-low-bitrate-codec-for.html)
3kbps... ???
AiZ
Very impressive. Let's hope one of these (Satin or Lyra) becomes an open standard. I wonder how they compare with Codec2.
Paper linked at the blog post: Generative Speech Coding with Predictive Variance Regularization (https://arxiv.org/abs/2102.09660)
With these latest developments, it certainly feels like we are at the endgame for lossy audio codecs, and that after 2022-2025 any further improvements will be extremely tiny, if there are any at all.
As if LPCNet at 1.6 kbps (https://jmvalin.ca/demo/lpcnet_codec/) wasn't low enough, now they have outperformed it, and at an even lower rate of 0.9 kbps (yes, less than 1 kbps!): https://arxiv.org/pdf/2102.06610.pdf
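To put these bitrates in perspective, the bit budget per frame is simply bitrate times frame duration. The 20 ms frame length below is an assumption chosen for illustration; this thread doesn't say which frame sizes Satin, Lyra or the 0.9 kbps codec actually use. The comparison baseline is uncompressed 16-bit mono PCM at 16 kHz, i.e. 256 kbps.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Bits available for one frame of frame_ms milliseconds at bitrate_bps."""
    return bitrate_bps * frame_ms / 1000


def compression_ratio(bitrate_bps, sample_rate_hz=16000, bits_per_sample=16):
    """Ratio of the uncompressed mono PCM bitrate to the coded bitrate."""
    return sample_rate_hz * bits_per_sample / bitrate_bps


# Bitrates mentioned in this thread; 20 ms frames are a hypothetical choice.
for label, rate in [("Satin (low end)", 6000), ("Lyra", 3000),
                    ("LPCNet codec", 1600), ("0.9 kbps paper", 900)]:
    print(f"{label:16s} {bits_per_frame(rate, 20):6.1f} bits/frame, "
          f"{compression_ratio(rate):6.1f}:1 vs 16 kHz/16-bit PCM")
```

At 0.9 kbps that leaves only 18 bits per 20 ms of speech, which is why such codecs lean on a generative model to synthesize most of the waveform rather than code it.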
Any samples at such a low bitrate?
I'm extremely skeptical.