Future of Audio Codecs and Acoustic Transparency Threshold

Is there any roadmap for future codecs and a ballpark estimate of further efficiency improvements (stereo music) down the road?

I was very curious as to whether there is a theoretical threshold of some sort beyond which compression is either impossible or unfeasibly difficult (computationally or otherwise). Opus has improved substantially from previous codecs throughout the bitrate spectrum (albeit varyingly).

How much more can we extract from those bits?

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #1
It's hard to predict the future, but MP3 was patented in 1989, so there have been more than 30 years of development/advancement in perceptual compression. And I don't think there have been any sound-quality-related tweaks to LAME in more than 10 years, so MP3 has probably been pushed as far as it can go. Maybe other formats can be improved a little, but probably not a lot.

Plus, increases in bandwidth and file storage space reduce the need for lossy compression every day! (Lossless compression also seems mature.)

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #2
The main driver in audio codec development is currently low-delay codecs for online conferencing, telephony, and live-streaming, and, on a much more band-limited scale, codecs for digital radio communication (2-way radios, etc.).
Another important area is codecs used for wireless transmission in relatively near fields, things like Bluetooth headsets, headphones, etc.

When it comes to streaming media, or even download media, low-delay is important, but even with quality aspects in mind, current codecs aren't really saturating most of what we'd consider bandwidth limitation. I think we'll only see relatively minute changes to that in the forthcoming years.

When it comes to archiving and when quality is the prime aspect, even when we're talking about downloads and/or streaming, there isn't really much of a reason not to just go lossless. Listening to lossless music on a phone through crappy earbuds (those that came with the phone, etc.) while on a bus or train commute, might not be a very sensible application of lossless media. But for that we have Opus and (HE-) AAC which are fine. And when you're at home and you need to stream music from a media server to a nice DAC, there's pretty much no reason to not go for lossless. Fast local networks are so cheap these days, that even streaming uncompressed audio isn't likely to saturate your network. Not only that, but lossless has gained so much hardware acceptance in the last couple years (or is it already decades?), that there's little argument for not using lossless.

As networks get faster and cheaper, simplex audio transfer is gonna edge more towards lossless, while duplex audio is gonna be more and more going for low delay.

Space constraints are almost never the issue, really. Even cloud storage is cheap enough that people can store entire lossless audio libraries for next to nothing - let alone on a 1TB hard drive.
The only exception where I see this, is perhaps things like audio books. An audio book is for instance around 10h of audio, so that'll add up if you want to keep the audio book on a mobile device etc. However, this is also an application of primarily speech codecs, etc. so what I've written at the top is also kinda true for those.

When it comes to just keeping music, stereo or however many channels you want, more people are gonna demand lossless downloads, or rip to lossless, and just keep it as that. There's not much reason to go for anything else.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #3
Is there any roadmap for future codecs and a ballpark estimate of further efficiency improvements (stereo music) down the road?

I was very curious as to whether there is a theoretical threshold of some sort beyond which compression is either impossible or unfeasibly difficult (computationally or otherwise). Opus has improved substantially from previous codecs throughout the bitrate spectrum (albeit varyingly).

How much more can we extract from those bits?
How few bits can a song possibly be represented with using currently unknown codecs, while still being perceived as "good enough"?

I don't know how you could estimate such a thing. If a song can be generated using a 2KB "MIDI file" + commodity sound generators + commodity audio effects, then that may be a significantly smaller number than any current lossy music codec achieves. But then you need a superset of all music studios in the decoder, and the search problem in the encoder is going to be really large unless you can import the parametric music generation parameters directly.

If machine learning can chew along happily on all of the music catalogs and try to express each song using a minimum amount of bits, and try to fool some perfect digital representation of our auditory system into believing that it is lossless, then it would only be a matter of compute power, time and access to those catalogs.

-k

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #4
It seems that lossless codecs are converging on ~2:1 compression on generic material. I don't know information theory well enough to say if it is possible to define a hard minimum on how small a file could possibly be represented. Trivially, it could be stored in the decoder and selected by a simple table look-up, but that is not very interesting. Similarly, one could design an encoding scheme that would compress one single file extremely well, but not anything else. Not so useful.

So perhaps I am asking for a lower bound on the total number of bits needed to losslessly express a large catalog of music while also keeping the size of the decoder < 1MB.

-k

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #5
COVID and all the video conferencing have led to more awareness of streaming quality over low-quality lines, but audio is such a small fraction of a video stream's resource requirements that I guess improvements there will be for the marketing value of "oh we got everything better than the competition!"

Another important area is codecs used for wireless transmission in relatively near fields, things like Bluetooth headsets, headphones, etc.
aptX lossless was announced for hardware support half a year ago (that's where I found this marketing out). aptX lossless itself was announced in 2009.
Likely it is low-complexity. (Whether it is lower-complexity than FLAC ... who knows. Need not be. Why should a corporate take a ready-made solution when they can make an inferior one just to license it out?)

As for using lossies (you mentioned Opus and (HE-) AAC): maybe some focus on battery efficiency will be appropriate, but whatever is acceptable for now will be considered acceptable next year too. People with a small music collection might have their entire library copied (rather than transcoded) to a mobile device, and with storage growing and streaming taking over for file ownership ... less demand for a genius lossy improvement.
Last two months' worth of foobar2000.org ad revenue has been donated to support war refugees from Ukraine: https://www.foobar2000.org/

 

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #6
I don't know information theory well enough to say if it is possible to define a hard minimum on how small a file could possibly be represented.
If you take a typical 128 kbit/s MP3 (yes it has been through a lossy procedure), then whatever is in that file can of course be compressed down to 128 kbit/s - you got the proof in front of you.

Now decode it. There is still nothing about that signal that isn't possible to represent in 128 kbit/s (+ possibly a tiny amount for roundoff errors in the decoding) - but put a lossless compressor to the task and it won't get close.

Maybe someone could make a processor that looks for patterns like these and finds them. But it is unlikely to be any priority. The way to keep such a signal small is to keep the file.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #7
I don't know information theory well enough to say if it is possible to define a hard minimum on how small a file could possibly be represented.
If you take a typical 128 kbit/s MP3 (yes it has been through a lossy procedure), then whatever is in that file can of course be compressed down to 128 kbit/s - you got the proof in front of you....
That seems right, still I would claim that it is not very relevant.

I would like to have all of my CDs backed up in a lossless manner. If I can do that with 4:1 compression rather than 2:1 compression, that would be a 2:1 improvement in disk and bandwidth usage.

Most of those songs are (hopefully) pure PCM, never having been encoded as mp3 in between the musician performing and the spinning disk in my basement.

Similarly with lossless streaming: having lossless streaming from Spotify would be a (minor) assurance. Not sure that it is super important to me personally. But being able to do that at 384kbps rather than 7-800kbps would make it less restricting. For this to make much sense, the music should never have been coded to/from mp3 anywhere upstream.
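For reference, the bitrates in this post can be related to compression ratios with a little arithmetic. This is only a sketch assuming plain CD audio (44.1 kHz, 16 bit, stereo); the function name is made up for illustration.

```python
# CD audio: 44.1 kHz, 16 bits per sample, 2 channels.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2

pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def lossless_kbps(ratio: float) -> float:
    """Average bitrate for a given compression ratio (e.g. 2.0 for 2:1)."""
    return pcm_kbps / ratio


print(f"uncompressed PCM: {pcm_kbps:.1f} kbit/s")
for ratio in (2.0, 4.0):
    print(f"{ratio:.0f}:1 -> {lossless_kbps(ratio):.1f} kbit/s")

# Ratio that a 384 kbit/s lossless stream would imply for CD audio.
print(f"384 kbit/s needs about {pcm_kbps / 384:.2f}:1")
```

So the "7-800kbps" figure corresponds to roughly 2:1, and 384 kbit/s would need a ratio somewhat below 4:1.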

-k

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #8
That seems right, still I would claim that it is not very relevant.
Practically (as opposed to theoretically), my best hunch is that it is not very relevant, no.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #9
Very good comments here, I'd say. Based on my experience as a lossy codec developer, see my previous 2 cents on this topic here:

https://hydrogenaud.io/index.php?topic=118888.msg1006908#msg1006908

Regarding lossless coding:
perhaps I am asking for a lower bound on the total number of bits needed to lossless express a large catalog of music while also keeping the size of the decoder < 1MB.
I think the performance of e.g. Monkey's Audio and OptimFROG at their slowest encoding speeds is a good indicator. Basically, it boils down to: who comes up with the most complicated audio signal predictors (for extracting a residual from the input waveforms) and entropy coding algorithms (for packing that residual similar to what Zip does). We've converged to very similar lossless coding ratios already around the turn of the century, and even if you'd make the lossless encoders try all possible options when encoding a piece of audio (which would make it 100 to 1000 times slower, I guess), the end result is unlikely to change by more than a few percent.

So sorry, I don't see us getting from 2:1 to 4:1 lossless compression.
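To make the "predictor plus entropy coder" structure concrete, here is a toy sketch. It is not how Monkey's Audio or OptimFROG work internally: it uses a fixed second-order predictor and only estimates the coded size from the empirical entropy of the residual, on a synthetic test tone. All names and the test signal are assumptions for illustration.

```python
import math
import random
from collections import Counter


def entropy_bits(values):
    """Empirical zeroth-order entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def encode(samples):
    """Fixed second-order predictor: predict x[n] as 2*x[n-1] - x[n-2]."""
    warmup = samples[:2]
    residual = [samples[n] - (2 * samples[n - 1] - samples[n - 2])
                for n in range(2, len(samples))]
    return warmup, residual


def decode(warmup, residual):
    """Invert encode(): rebuild each sample from the prediction plus residual."""
    out = list(warmup)
    for e in residual:
        out.append(e + 2 * out[-1] - out[-2])
    return out


rng = random.Random(0)
fs = 44_100
# One second of a 440 Hz tone with a little integer noise on top.
signal = [round(10_000 * math.sin(2 * math.pi * 440 * n / fs)) + rng.randint(-2, 2)
          for n in range(fs)]

warmup, residual = encode(signal)
assert decode(warmup, residual) == signal  # lossless round trip

print(f"raw samples: {entropy_bits(signal):.2f} bits/sample")
print(f"residual:    {entropy_bits(residual):.2f} bits/sample")
```

The better the predictor, the smaller the residual that the entropy coder has to pack. What the predictor cannot explain (here, the added noise) sets the floor, which fits the observation that heavier predictors only buy a few percent.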

Chris
If I don't reply to your reply, it means I agree with you.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #10
It is possible to make a database-based codec. The encoder and decoder would be huge, but the stored songs could be much smaller than current solutions—and possibly lossless. I'm sure it's more complicated than it's worth, or someone would have taken that route by now... But then, the task of breaking audio into large reusable chunks, might be a task well suited for machine learning.
Processed audio in java and python.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #11
It is possible to make a database-based codec. The encoder and decoder would be huge...
Yes, several terabytes or even petabytes. And what would you do if new audio recordings - or even only new mixes or masters - come along? Ask every user to update their decoder every time? And content providers/owners will probably ask you to encrypt the audio data stored in the encoder/decoder binaries.

I can't see how this could work.

Chris

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #12
Very good comments here, I'd say. Based on my experience as a lossy codec developer, see my previous 2 cents on this topic here:

https://hydrogenaud.io/index.php?topic=118888.msg1006908#msg1006908

Regarding lossless coding:
perhaps I am asking for a lower bound on the total number of bits needed to lossless express a large catalog of music while also keeping the size of the decoder < 1MB.
I think the performance of e.g. Monkey's Audio and OptimFROG at their slowest encoding speeds is a good indicator. Basically, it boils down to: who comes up with the most complicated audio signal predictors (for extracting a residual from the input waveforms) and entropy coding algorithms (for packing that residual similar to what Zip does). We've converged to very similar lossless coding ratios already around the turn of the century, and even if you'd make the lossless encoders try all possible options when encoding a piece of audio (which would make it 100 to 1000 times slower, I guess), the end result is unlikely to change by more than a few percent.

So sorry, I don't see us getting from 2:1 to 4:1 lossless compression.

Chris
As a mind exercise:
Say a song is generated in a computer, using plugin instruments, plugin effects, and a sequencer to orchestrate the thing. The information needed to recreate that ought in principle to be similar to a MIDI file. I would assume that a Logic file can be stored on my computer and loaded on your computer and generate the exact same bit pattern (i.e. they avoid depending on floating-point quirks of a particular SIMD implementation being compiled with non-IEEE 754 adherence). I would perhaps have to buy a few softsynths and reverbs. And that file may well be large simply because compression never was the point of the file format.

An encoder that had access to the "programmatic way of generating a given piece of music" should have excellent opportunity to find patterns that are otherwise too hard to find. Such as that it is the exact same bass drum sample being triggered 4 times a beat, only it is attenuated by 2 dB on the 2 and 4, and it is added to lots of other changing sound events. And a compressor is ducking the synth pad every time the bass drum beats. The reverb embellishing the track is a Lexicon algorithm set to "Hall1" and 0.6 seconds. The mastering engineer chose to use 3-band compression with a ratio of 4:1 over x dB.
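A toy version of the bass drum example, to put numbers on the idea. Everything here is made up for illustration: the "kick" is a synthetic decaying sine, the tempo is 120 BPM, and "4 times a beat" is read as four hits per bar with beats 2 and 4 attenuated by 2 dB.

```python
import math

FS = 44_100
BPM = 120
BEAT = FS * 60 // BPM  # samples per beat

# Hypothetical "bass drum" sample: 100 ms decaying 60 Hz sine, 16-bit range.
kick = [round(20_000 * math.exp(-n / 1000) * math.sin(2 * math.pi * 60 * n / FS))
        for n in range(FS // 10)]

# Event list for 8 bars of 4/4: (start sample, gain in dB); beats 2 and 4 are 2 dB down.
events = [(i * BEAT, -2.0 if i % 2 else 0.0) for i in range(8 * 4)]


def render(sample, events, length):
    """Mix the sample into an empty track at every event, scaled by its gain."""
    out = [0] * length
    for start, gain_db in events:
        gain = 10 ** (gain_db / 20)
        for k, s in enumerate(sample):
            out[start + k] += round(s * gain)
    return out


track = render(kick, events, 32 * BEAT)

pcm_bytes = 2 * len(track)                      # 16-bit mono PCM
recipe_bytes = 2 * len(kick) + 8 * len(events)  # sample + (4-byte start, 4-byte gain) per event
print(f"rendered PCM: {pcm_bytes} bytes, recipe: {recipe_bytes} bytes")
```

The recipe is far smaller than the rendered PCM, but only because the encoder is handed the sample and the event list. Recovering them from a finished, mastered mix is the hard search problem described in this post.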

There may usually be "novel and complex" audio stuff added to a track, such as vocals generated for that particular song. Creative studio technique may be to clip and shift around real vocals based on a small recording, disturbing the natural statistics of speech. The question then becomes whether the total piece can be more easily compressed if it is broken up into "atoms" and each of those components is compressed using tools that are suitable for its statistics.

If all of the above is true, then the same ought to be true for the case where we do not have access to the producer's sequence file, where an encoder needs to do a gigantic search to recreate the music production process and arrive at the same answer. Not a task that I think we will ever be able to do, but it still covers ideas of fundamental compressibility.


Having to be totally lossless makes this exercise hard. I would think that the ideas would be more useful in a lossy codec where you can do some search and find patterns that can be expressed "well enough", without having to encode some noise-residual. Having complete knowledge about what humans are able to distinguish and what they tend to care about, and finding apparent patterns in music sounds like an ML task that might be solvable.

-k

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #13
As a mind exercise:
If a song is generated in a computer. Using plugin instruments, plugin effects, and a sequencer to orchestrate the thing. The information needed to recreate that in principle ought to be similar to a MIDI file. I would assume that a Logic file can be stored on my computer and loaded on your computer and generate the exact same bitpattern
[...]

You can easily fool "most reasonable" (<-- handwaving the true Scotsman I am!) codecs by setting up signals they don't catch.
Creating music by taking N signal pieces and combining them in a digital audio workstation ... well not combining them to a file, but just storing the instructions on how to put them together - yep, no problem coming up with ideas like that.

Say if your song factory records chorus.wav, verseinstrumentation.wav, voxforverse1.wav and voxforverse2.wav for a song that goes chorus-verse-chorus-verse-chorus, you can surely save space. It seems to me that Monkey's Audio at "Insane" looks at blocks of nearly half a minute, so even that wouldn't catch any repetition in the final track.

And simpler: some albums have had an interlude repeating as say track 3 and track 7. They contain the same audio.
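The repeated-interlude case is easy to reproduce with general-purpose compressors standing in for audio codecs. This is a sketch only: pseudo-random bytes play the role of an otherwise incompressible piece of audio, zlib plays the codec with a short look-back (DEFLATE's history is 32 KiB), and lzma the one with a long look-back.

```python
import lzma
import random
import zlib

rng = random.Random(1)
# Stand-in for an incompressible "interlude": 200 KiB of pseudo-random bytes.
interlude = bytes(rng.getrandbits(8) for _ in range(200 * 1024))
album = interlude + interlude  # the same audio appears twice

short_window = len(zlib.compress(album, 9))  # cannot see 200 KiB back
long_window = len(lzma.compress(album))      # default dictionary spans the whole input

print(f"input:         {len(album)} bytes")
print(f"zlib (32 KiB): {short_window} bytes")
print(f"lzma:          {long_window} bytes")
```

The short-window compressor stores both copies at full size, while the long-window one pays for the second copy almost nothing. A codec that only looks at blocks of limited length behaves like the former, however strong its modelling is within a block.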


So sure, sure, there are cases where you can put up patterns the codecs are unable to catch.

But, when we speak about the compression percentage, it is understood that we average over some representative music corpus. That does not include much of this, does it?
By creating HUUUUUUGE amounts of new music, current music would asymptotically disappear from the average and you would get epsilon-close to the new music's compression range. So in year 2525, when humans have created and listened to only minimalist repeating loops for five hundred years ... Now, that is ... uh, not "moving the goalpost", but watching the goalpost slide away.

For current music ... well then.


I think the performance of e.g. Monkey's Audio and OptimFROG at their slowest encoding speeds is a good indicator [...] unlikely to change by more than a few percent.

So sorry, I don't see us getting from 2:1 to 4:1 lossless compression.

My gut feeling agrees. There are those examples that I doubt amount to much on the average (well here I blatantly disregarded the lossies), and there is something more to be fetched by throwing computing power at it - but a lot on a representative corpus? Nah.

Maybe there could be slightly more savings than we expect (but not for practical use): Just for the hell of it, I ran my least compressible CD track (Merzbow ...) through the ultra-slow https://github.com/slmdev/sac encoder first at default setting - and then, overnight, at --high --optimize high. I forgot the --sparse-pcm switch, but anyway: it took 11 hours 49 minutes to compress five and a half minutes.

* Monkey's Audio needs more bits than uncompressed PCM. TAK makes it below 1411 kbit/s on one setting.
* 58176764: the .wav file
* 56535466 for the smallest .flac I could get
* 53920654 for the smallest .wv I could get
* 53222937 for the smallest frog
* 50613838 for sac at default
* 49046602 for the twelve-hour sac compression. But hey, it decodes in less than ten minutes. (Realtime playback? Please, no irrelevant questions! O:)  )

In the words of the sac dev: "We throw a lot of muscles at the problem and archive only little gains - by practically predicting noise."
I am mildly surprised at how much smaller it could get, though; the developer's testing finds only one sample where it outfrogs the frog by five percent (and averages at not a percent reduction) - and this is nearly 8 percent on music I'd expect to be inherently less compressible.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #14
With quantum computers slowly coming into the mix, there is no telling how that will play out.
"Life's the same, I'm moving in stereo
Life's the same except for my shoes"


Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #16
Ah yes, the so-called "Black MIDI" thing. Where the entire note sheet is "black" with note density. That phenomenon led BASSMIDI to be optimized heavily. I need to revisit that again some day.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #17
Speaking of MIDI and machine learning, it seems that I have an infected MIDI file:

https://www.virustotal.com/gui/file/25ffe65bb8b2da481736901d1ab9030078a0a19159c09c3716593b8c1c9d3552
https://nvd.nist.gov/vuln/detail/CVE-2012-0003

If such an AI-MIDI-like system is infected, perhaps it will automatically compose some harpsichord music with Gamelan tuning, put some Michael Jackson vocals on top of it, and deliver it to the customers. Not a bad idea, I suppose.

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #18
aptX lossless is a funny one. Bluetooth codecs need to optimize for constant bitrate, since they have to work within a specific upper limit of bits per unit of time, and if there's even a short stretch of white noise, that's not compressible. So they might as well use plain PCM. Or there must be some "cheating" and it's technically not always lossless.
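The worst-case problem can be illustrated with a generic compressor. This says nothing about how aptX works; zlib is just a stand-in, and the two 100 ms frames (a pure tone and full-scale white noise) are synthetic.

```python
import math
import random
import struct
import zlib

FS = 44_100
FRAME = 4_410  # 100 ms of mono 16-bit audio


def pack(samples):
    """Pack integers as little-endian signed 16-bit PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


rng = random.Random(2)
tone = pack([round(8_000 * math.sin(2 * math.pi * 441 * n / FS)) for n in range(FRAME)])
noise = pack([rng.randint(-32_768, 32_767) for _ in range(FRAME)])

raw_bytes = 2 * FRAME
for name, frame in (("tone", tone), ("white noise", noise)):
    size = len(zlib.compress(frame, 9))
    print(f"{name:12s} {size:5d} / {raw_bytes} bytes")
```

The tone frame shrinks to a fraction of its size, while the noise frame does not shrink at all. A link with a fixed bit budget per frame has to be provisioned for the noise case, unless the codec is allowed to fall back to something lossy.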
some ANC'd headphones + AutoEq-based impulse + Meier Crossfeed (30%)

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #19
They claim that users can select CDDA lossless or higher-resolution lossy. Interesting, then: if you have a higher-resolution source, would you want it naively decimated to 44.1/16, or would you want a more clever lossy scheme?

(The answer is clear if you want multi-channel audio transferred - but I don't know if that is even going to be part of it. Part of me is going "why not ...?")

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #20
So sure, sure, there are cases where you can put up patterns the codecs are unable to catch.

But, when we speak about the compression percentage, it is understood that we average over some representative music corpus. That does not include much of this, does it?
I am not up to date on how popular music has been recorded over the last decade, but my understanding is that:
1) It is not as simple as copying "the verse" sample-by-sample several times
2) There are still significant "patterns" in the generation. From using a sample snare drum sound to using an off-the-shelf softsynth with preset #42, to using loops from "loop collection 9" to ...
3) Mixing and mastering practices means that these patterns are "obfuscated".
 
This ought to introduce complex patterns in "mainly computer produced music" (is that not 99% of all music listening today?) that make it in principle significantly more predictable, but in practice probably not so.

-k

Re: Future of Audio Codecs and Acoustic Transparency Threshold

Reply #21
Maybe there could be slightly more savings than we expect (but not for practical use): Just for the hell of it, I ran my least compressible CD track (Merzbow ...) through the ultra-slow https://github.com/slmdev/sac encoder first at default setting - and then, overnight, at --high --optimize high. I forgot the --sparse-pcm switch, but anyway: it took 11 hours 49 minutes to compress five and a half minutes.

[...]

I am mildly surprised at how much smaller it could get, though; the developer's testing finds only one sample where it outfrogs the frog by five percent (and averages at not a percent reduction) - and this is nearly 8 percent on music I'd expect to be inherently less compressible.

What.
I am more than mildly surprised now: I left a computer on for a week, and OptimFROG got beaten by 27 percent. Look at this:
* Monkey's Audio needs more bits than uncompressed PCM. TAK makes it below 1411 kbit/s on one setting.
* 58176764: the .wav file
* 56535466 for the smallest .flac I could get
* 53920654 for the smallest .wv I could get
* 53222937 for the smallest frog
* 50613838 for sac at default
* 49046602 for the twelve-hour sac compression. But hey, it decodes in less than ten minutes. (Realtime playback? Please, no irrelevant questions! O:)  )
* 46 224 423: sac --optimize=fast , takes "only" an hour and a half. ("Normal" mode beats fast at compression.)
* 38 767 975: sac --optimize=high --sparse-pcm

That is < 73 percent of smallest OptimFROG size. Proving that it is possible ... although it took all day and night. Yes, on a single track.
So I split it in six-second segments to see if it had just discovered repeating patterns or so. Sure it had, but still:
* 44 630 528 in total for 55 sac'd segments of the same file. Took the day rather than a day and a night.
But still ...
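The percentages can be rechecked from the byte counts listed in this post. The dictionary keys below are just labels chosen for this snippet; the sizes are copied from the list.

```python
sizes = {
    "wav": 58_176_764,
    "flac": 56_535_466,
    "wv": 53_920_654,
    "frog": 53_222_937,
    "sac default": 50_613_838,
    "sac twelve-hour run": 49_046_602,
    "sac --optimize=fast": 46_224_423,
    "sac --optimize=high --sparse-pcm": 38_767_975,
    "sac, 55 six-second segments": 44_630_528,
}

wav, frog = sizes["wav"], sizes["frog"]
for name, size in sizes.items():
    print(f"{name:34s} {100 * size / wav:6.2f}% of wav  {100 * size / frog:6.2f}% of frog")
```

The best sac result comes out at a bit under 73 percent of the smallest OptimFROG file, and the twelve-hour run at a bit under 8 percent below it, matching the figures quoted in the posts.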