
Topic: New lossless audio codec in development

New lossless audio codec in development

I got the idea to make a new lossless audio codec.

The main idea is to allow non-intra frames, as done in MLP/TrueHD, but better and with bigger frame sizes.

It's currently in the R&D phase only.

What do you think? Can using non-intra frames really improve the compression ratio?
Usually lossless audio codecs use just LPC for prediction. I think this is not always the optimal solution for compression.
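For reference, LPC-style codecs predict each sample from its recent neighbors and entropy-code only the residual. A minimal sketch of the fixed polynomial predictors (my own illustration of the common scheme, along the lines of FLAC's fixed predictors, not any one codec's exact code):

```python
def fixed_predict_residual(samples, order=2):
    """Residual of a fixed polynomial predictor (the cheap
    fallback FLAC offers next to full LPC).
    order 1: r[n] = x[n] - x[n-1]
    order 2: r[n] = x[n] - 2*x[n-1] + x[n-2]"""
    res = list(samples[:order])      # warm-up samples kept verbatim
    for n in range(order, len(samples)):
        if order == 1:
            pred = samples[n - 1]
        elif order == 2:
            pred = 2 * samples[n - 1] - samples[n - 2]
        else:
            pred = 0                 # order 0: no prediction
        res.append(samples[n] - pred)
    return res

# Any linear ramp is predicted perfectly at order 2:
ramp = list(range(0, 100, 3))
residual = fixed_predict_residual(ramp)   # zeros after the warm-up
```

The residuals cluster around zero for smooth signals, which is what makes the entropy-coding stage cheap; non-intra frames would attack the cases where this short-term view misses longer repetition.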

Re: New lossless audio codec in development

Reply #1
I'm actually not interested in lossless audio codecs, as I think lossy codecs can achieve total transparency, but I agree that lossless audio codecs have some good use cases (preventing generation loss in some cases, and scientific scenarios), so that's good. As for your question, I don't have enough knowledge to answer it. But I have three questions:

1: Will it support low bit depths like 8bps?
2: Which channel combinations will it support?
3: Will it use frequency domain? Could it provide better compression ratios even for lossless?

Re: New lossless audio codec in development

Reply #2
Well, yes, I think this is a great idea, and it would be tremendous if you could pull it off.

There have been several attempts at exploiting the as-of-yet mostly untapped potential of FLAC files with a variable blocksize. It turns out nobody has come up with a good and fast algorithm for splitting the audio into blocks in such a way that it improves compression. I would think an algorithm to determine where to use inter frames would be even more challenging, and potentially inspiring in solving the problem I just mentioned.
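If encoding speed were no object, the optimal split for a given cost model is a standard dynamic program. A sketch (the cost function here is a made-up stand-in for a real bits-per-block estimate, which is the hard part in practice):

```python
def best_split(n, steps, cost):
    """Cheapest way to cut n samples into blocks with sizes from
    `steps`, where cost(start, length) estimates the coded size
    of one block. O(n * len(steps)) time."""
    INF = float("inf")
    best = [0.0] + [INF] * n     # best[i] = cheapest coding of x[:i]
    prev = [0] * (n + 1)
    for end in range(1, n + 1):
        for size in steps:
            if size <= end:
                c = best[end - size] + cost(end - size, size)
                if c < best[end]:
                    best[end], prev[end] = c, end - size
    bounds, pos = [], n          # walk back to recover boundaries
    while pos > 0:
        bounds.append(pos)
        pos = prev[pos]
    return best[n], bounds[::-1]

# Toy cost model (an assumption for illustration): a 5-bit header
# per block, plus 1 bit per sample unless the block is constant.
x = [0] * 8 + [1] * 8
cost = lambda s, l: 5 + (0 if len(set(x[s:s + l])) == 1 else l)
size, bounds = best_split(len(x), [4, 8, 16], cost)  # cuts at the 0->1 edge
```

The catch, as noted above, is that a trustworthy cost() essentially means trial-encoding every candidate block, which is exactly what makes the exhaustive approach slow.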
Music: sounds arranged such that they construct feelings.

Re: New lossless audio codec in development

Reply #3
I have a suspicion that some of those newer machine-learning techniques could be used to make initial guesses about where the sections are that are similar enough to be useful as references for reducing redundancy.
Even then, there's the problem that pieces can sound very similar yet have rather different waveforms, so even with those references I think it won't be an easy task to really benefit from them on the many types of recordings where the repetition is not due to simple copy-paste.
a fan of AutoEq + Meier Crossfeed

Re: New lossless audio codec in development

Reply #4
There have been several attempts at exploiting the as-of-yet mostly untapped potential of FLAC files with a variable blocksize. It turns out nobody has come up with a good and fast algorithm for splitting the audio into blocks in such a way that it improves compression.
If processing time is not a problem, this can be done, even if not perfectly. However, it will definitely not be suitable for practical use. And, as side information, the size of each block will also need to be stored, which takes away some of the gain.

Re: New lossless audio codec in development

Reply #5
I'm actually not interested in lossless audio codecs, as I think lossy codecs can achieve total transparency, but I agree that lossless audio codecs have some good use cases (preventing generation loss in some cases, and scientific scenarios), so that's good. As for your question, I don't have enough knowledge to answer it. But I have three questions:

1: Will it support low bit depths like 8bps?
2: Which channel combinations will it support?
3: Will it use frequency domain? Could it provide better compression ratios even for lossless?

8-bit should be trivially supported, and that case could benefit the most, because it only has 256 different states to use at once.

As for channel combinations, I'm currently trying to make mono encoding compress well; if and when it reaches the state where it outperforms FLAC/TAK/WavPack, I will try to make >1 channels possible, but for anything >2 it would be challenging to achieve fast and really good compression at the same time.

Currently I'm working purely in the time domain, and using the frequency domain for picking split points.

The first idea is to compress all kinds of fixed sin/cos waves really well at zero extra cost.
The next step is to do similar with more complex sounds.

It would be nice to use only the frequency domain, but once you get magnitude and phase you are more or less stuck.
Magnitude changes are less demanding to compress, while phase looks like pure noise.
Unwrapping the phase could help, but that also has other problems.

Currently I'm concentrating on just residue coding for simple sine waves; later I will add and experiment with LPC prediction for less trivial waves. The most problematic part is compressing pure noise; my idea is just to split where the noise is and encode it in its own subframe.
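The period-repetition idea can be illustrated roughly like this (my own toy version, assuming an exactly integer period; the fractional-period and noise cases are the hard part):

```python
import math

def best_lag(x, min_lag, max_lag):
    """Brute-force lag search by maximum correlation (slow
    reference version; an FFT-based cross-correlation would
    do the same job faster)."""
    best, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if score > best_score:
            best, best_score = lag, score
    return best

def period_residual(x, lag):
    """Long-term residual: r[n] = x[n] - x[n - lag]; the first
    `lag` samples are stored verbatim as warm-up."""
    return x[:lag] + [x[n] - x[n - lag] for n in range(lag, len(x))]

# One quantized sine period of 50 samples, repeated exactly:
period = [round(1000 * math.sin(2 * math.pi * k / 50)) for k in range(50)]
x = period * 8
lag = best_lag(x, 20, 100)
res = period_residual(x, lag)   # zero after the first stored period
```

For an exactly repeating waveform, everything after the first period costs (nearly) nothing; real signals drift in period and amplitude, which is where the residue coding comes in.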

Re: New lossless audio codec in development

Reply #6
I guess your purpose is to find something clever that improves compression without taking aeons. And the following might be worth something, yes:

and using frequency domain for picking split points.

Yeah, as @ktf points out, it is not an easy task to do in variable-block FLAC without taking ages, and so it may call for some fast signal analysis trick to differentiate the strategy by input properties.
You probably know the block-length switching scheme of MPEG-4 ALS, see 3.5 of http://elvera.nue.tu-berlin.de/files/1216Liebchen2009.pdf - or at least IIRC something similar in TAK. The ALS strategy is confined to "successive halvings", not unlike what FLAC does with partitioning with different Rice exponents - which sometimes makes it outcompress the competition even if FLAC actually has a "design flaw" there, ruling out the combination of any significant prediction length with extremely fine partition.

Also, you could try to allow for more residual encoding methods and selection between them. Or other methods for LPC analysis, like Burg's algorithm, or specifying coefficients on the nth-order difference rather than the nth past sample. (Why try that? FLAC can make small gains from setting precision, i.e. how many bits each coefficient needs - if you want to switch it more often, then what? Storing the predictor will matter more?)
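On "more residual encoding methods": the baseline any alternative has to beat is plain Rice coding, which FLAC's partitions tune per region by choosing the parameter k. A hedged sketch of one code word (bit strings for clarity, not an efficient bitstream writer):

```python
def rice_encode(v, k):
    """Rice-code one signed residual with parameter k.
    Returns the code as a '0'/'1' string for clarity."""
    u = 2 * v if v >= 0 else -2 * v - 1      # zigzag map to unsigned
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")

def rice_decode(bits, k):
    """Inverse of rice_encode for a single code word."""
    q = bits.index("0")                      # unary quotient
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    u = (q << k) | r
    return u // 2 if u % 2 == 0 else -(u + 1) // 2
```

A small k suits near-zero residuals and a large k suits noisy ones, which is exactly why per-partition k selection (and, potentially, whole alternative methods per partition) pays off.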

There are of course other possible uses than plainly compressing mono or stereo:
 * One for the possible future, is to facilitate native handling of object-based audio.
 * One that should have been around twenty years ago, is a format that stores CD rips with subchannel/correction data that a ripping app could read upon re-reading on a different drive - and could store several concurrent rips, which are for most samples bit-identical (up to offset?). Actually for CD use, a good frame size would be say 2^N * 588, though it needs a rule to subdivide below 147.

The first idea is to compress all kinds of fixed sin/cos waves really well at zero extra cost.
That is something that doesn't necessarily follow from being good at "real-world audio" (a sine could need to tweak one parameter to arbitrary resolution) - but if you start at that from a bottom-up perspective and get it to work ...
I did a test on upsamples, which could give an idea of how strangely "artificially smooth" signals act: https://hydrogenaud.io/index.php/topic,125607.0.html - I mean, watch the difference between codecs.
Who knows what fraction of hi-rez audio is just noise, upsampling artefacts, and maybe some actual overtones ... Again, if you get something out of signal analysis in the frequency domain, you might improve compression of such space-wasters in online retail stores. (Just don't expect them to come running to pay you for exposing their signals as tons of empty bits ...)

Re: New lossless audio codec in development

Reply #7
Please remove my account from this forum.

I thought you wanted to leave? Or are you trolling? I really don't care if you are; just curious.


Re: New lossless audio codec in development

Reply #9
I guess your purpose is to find something clever that improves compression without taking aeons. And the following might be worth something, yes:

and using frequency domain for picking split points.

Yeah, as @ktf points out, it is not an easy task to do in variable-block FLAC without taking ages, and so it may call for some fast signal analysis trick to differentiate the strategy by input properties.
You probably know the block-length switching scheme of MPEG-4 ALS, see 3.5 of http://elvera.nue.tu-berlin.de/files/1216Liebchen2009.pdf - or at least IIRC something similar in TAK. The ALS strategy is confined to "successive halvings", not unlike what FLAC does with partitioning with different Rice exponents - which sometimes makes it outcompress the competition even if FLAC actually has a "design flaw" there, ruling out the combination of any significant prediction length with extremely fine partition.

Also, you could try to allow for more residual encoding methods and selection between them. Or other methods for LPC analysis, like Burg's algorithm, or specifying coefficients on the nth-order difference rather than the nth past sample. (Why try that? FLAC can make small gains from setting precision, i.e. how many bits each coefficient needs - if you want to switch it more often, then what? Storing the predictor will matter more?)

Maybe you are right about rudimentary sin/cos waves, and probably this initial approach would work only for such simple cases, and only without dramatic noise corruption.
This splitting of audio into similar chunks is not that trivial; at least, I'm not aware of any robust and fast algorithm.
Currently I use (normalized) cross-correlation and autocorrelation computed via an RDFT, and that is too crude an approach.

I will also explore the integer MDCT; maybe it can provide something more useful.
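For reference, the FFT route to autocorrelation is the Wiener-Khinchin identity (power spectrum in, correlation out). A small numpy sketch along those lines:

```python
import numpy as np

def autocorr_fft(x):
    """Autocorrelation via the Wiener-Khinchin theorem: inverse
    FFT of the power spectrum. Zero-padded to 2n so the linear
    (non-circular) autocorrelation comes out."""
    n = len(x)
    spec = np.fft.rfft(x, 2 * n)
    ac = np.fft.irfft(spec * np.conj(spec), 2 * n)[:n]
    return ac / ac[0]              # normalize: lag 0 maps to 1

t = np.arange(400)
x = np.sin(2 * np.pi * t / 50)
ac = autocorr_fft(x)               # strongest peak after lag 0 near 50
```

The crudeness complained about above is real: autocorrelation peaks blur for drifting pitch and vanish in noise, so the peak-picking step needs care even when the transform itself is cheap.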

Re: New lossless audio codec in development

Reply #10
... allow non-intra frames, as done in MLP/TrueHD, but better and with bigger frame sizes.

It's currently in the R&D phase only.

What do you think? Can using non-intra frames really improve the compression ratio?
Usually lossless audio codecs use just LPC for prediction. I think this is not always the optimal solution for compression.
I don't know much about MLP, but the non-Intra frame concept you're proposing sounds, to me, like a long-term prediction (LTP) approach, i.e., predicting samples in a given block from samples in a previous block. MPEG-4 ALS supports this, and it does seem to work well on some musical audio. From http://elvera.nue.tu-berlin.de/files/1216Liebchen2009.pdf:

Chris
If I don't reply to your reply, it means I agree with you.

Re: New lossless audio codec in development

Reply #11
I have little experience with MPEG-4 ALS, and it's a very over-engineered codec IMHO.

My idea is like the following (maybe it's exactly LTP, maybe not):

Take for example a sine wave: find the pitch and split at the correct zero crossing; if the sine period is integer rather than fractional, you get zero difference with the previous sine period. Now you can compress a sine wave, or any wave that just repeats over and over (if you find the correct period), at almost zero extra cost.
If there is no exact match, just pick the period with maximum correlation and store the difference via LPC + entropy-coded residual.
Note that both the number of samples and the lag/offset are variable here, because using fixed-size frames and then doing lags is pointless IMHO.
So each frame would be of variable length (its number of samples) when encoding a single channel.

For >1 channels, INTRA+INTER frames within one big super-frame come to mind, because the L/R/... channels may not be very correlated most of the time, so the lags and sizes are different for each channel.

This idea of picking variable-length periods works very well (at least) with ascale (a tempo-adjuster filter), but it has some limitations with extremely low-frequency content (and it does not handle background-noise periods as I would like), and multi-channel filtering has sync issues if one does not change periods to match across all channels.

I think this splitting of audio into periods of equal correlation is similar to the YIN algorithm?
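For what it's worth, a compact sketch of YIN's core steps (difference function plus cumulative-mean normalization; the threshold value is an arbitrary choice of mine):

```python
import math

def yin_cmnd(x, max_lag):
    """YIN's cumulative mean normalized difference function."""
    W = len(x) - max_lag                 # comparison window length
    d = [0.0] * (max_lag + 1)
    for tau in range(1, max_lag + 1):
        d[tau] = sum((x[j] - x[j + tau]) ** 2 for j in range(W))
    cmnd = [1.0] * (max_lag + 1)
    running = 0.0
    for tau in range(1, max_lag + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running else 1.0
    return cmnd

def yin_pitch_lag(x, max_lag, threshold=0.1):
    """First lag whose normalized difference dips below the
    threshold, refined to the local minimum (the absolute-threshold
    step of YIN)."""
    cmnd = yin_cmnd(x, max_lag)
    for tau in range(2, max_lag + 1):
        if cmnd[tau] < threshold:
            while tau + 1 <= max_lag and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return tau
    return min(range(1, max_lag + 1), key=lambda t: cmnd[t])

# A periodic test signal: one quantized sine period of 40 samples, tiled.
period = [round(1000 * math.sin(2 * math.pi * k / 40)) for k in range(40)]
x = period * 6
```

On real audio the dip is rarely an exact zero, which is where the "no exact match, pick max correlation and code the residual" fallback described above comes in.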

Re: New lossless audio codec in development

Reply #12
I tested my usual 38 CDs with the "-p" long-term prediction switch in MPEG-4 ALS, comparing -l (which searches for wasted bits and is otherwise the default, prediction order 10) with -l -p. In grand total it averaged 0.6 percent savings. That's percent of -l compressed size, not points relative to WAVE size.

As one could expect, there is more to save in classical music: 1.2 percent (ranging 0.6 to 2.3, the latter being flute)
The rest averaged 0.3 percent, ranging from 0.07 percent (Psycroptic and Sodom, that's thrash and tech. death metal) to two loners up at .9 and .8 (Sopor Aeternus and Springsteen, that's darkwave and singer/songwriter).

How much that is ... well. Up to opinion.

I also find ALS to be a bit "over-engineered", but back then they didn't know which over-engineering ideas would work out well. For example, the format allows for order-1023 prediction, and that's ... a lot.
They also have an alternative entropy encoding method, and a selection of different such methods might be worth checking out.

Re: New lossless audio codec in development

Reply #13
Since I am a layman in this topic, my perception of it comes from a different modality. I would approach the incoming audio data stream with a delta-sigma approach for each audio channel. Create a sufficient static buffer for, say, 4 seconds of audio at whatever sample rate you decide, then run a run-length encoding algorithm, perhaps Group 4 compression (ITU-T T.6). Then feed that output to another algorithm that adapts dynamically to patterns in the frequency domain (FFT). Common tones or frequencies reach optimal compression. Further break the frequencies into the most common frequency bands, each with its own data compression group. I am thinking this type of approach is for archival purposes and not for real-time streaming. I hope that makes sense.


Re: New lossless audio codec in development

Reply #15
Anyone is free to work on anything that he/she likes.

It's just my opinion that the compression gains for a new lossless format will be very limited: +10% vs FLAC, or less.
The lossless part of the MPEG-D Audio format (xHE-AAC) was improved only by 3-6% compared to older formats.

A new lossy format is a different thing. Many patents have expired or will be expiring, as in the case of HE-AAC. SBR patents will expire in 2025 and Parametric Stereo patents in 2026. It's possible to model these parametric tools to scale very well with high bitrates. Also, there is AI.
A +20-30% compression gain for a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.

Re: New lossless audio codec in development

Reply #16
Anyone is free to work on anything that he/she likes.

It's just my opinion that the compression gains for a new lossless format will be very limited: +10% vs FLAC, or less.
The lossless part of the MPEG-D Audio format (xHE-AAC) was improved only by 3-6% compared to older formats.

A new lossy format is a different thing. Many patents have expired or will be expiring, as in the case of HE-AAC. SBR patents will expire in 2025 and Parametric Stereo patents in 2026. It's possible to model these parametric tools to scale very well with high bitrates. Also, there is AI.
A +20-30% compression gain for a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.

Yes, the compression ratio in lossless audio compression is really limited. FLAC (default mode) can be improved upon by 5% on average. But we know that even this is not suitable for practical use. That's why running speed becomes more important.

If my glasses don't deceive me, my own lossless codec is currently able to beat FLAC's highest compression level, at the cost of half the processing speed. But even with this small compression gain, I can't accept halving the speed, because I don't think it's worth it.

On the other hand, with lossy codecs we always have more options. Just as with audio data, it's the same with image data. If the end user is not bothered and does not sense anything, the tricks can continue.

And AI (neural networks and deep neural networks) may seem at first glance suitable for audio compression. Compared to traditional methods, AI can achieve slightly better compression on specially selected and trained data, by expending enormous energy and time. But it is currently not suitable for practical use in the real world. We can find many academic papers on this topic, and interestingly, the majority of them only talk about the compressed result. Of course, they don't add the size of the codec itself to the compressed result. They don't talk much about the processing time, nor about the size of the model, nor the size of the decoder.

Re: New lossless audio codec in development

Reply #17
Yes, the compression ratio in lossless audio compression is really limited. FLAC (default mode) can be improved upon by 5% on average. But we know that even this is not suitable for practical use. That's why running speed becomes more important.

If my glasses don't deceive me, my own lossless codec is currently able to beat FLAC's highest compression level, at the cost of half the processing speed. But even with this small compression gain, I can't accept halving the speed, because I don't think it's worth it.

On the other hand, with lossy codecs we always have more options. Just as with audio data, it's the same with image data. If the end user is not bothered and does not sense anything, the tricks can continue.

And AI (neural networks and deep neural networks) may seem at first glance suitable for audio compression. Compared to traditional methods, AI can achieve slightly better compression on specially selected and trained data, by expending enormous energy and time. But it is currently not suitable for practical use in the real world. We can find many academic papers on this topic, and interestingly, the majority of them only talk about the compressed result. Of course, they don't add the size of the codec itself to the compressed result. They don't talk much about the processing time, nor about the size of the model, nor the size of the decoder.

I agree with you, and I hate artificial intelligence in almost everything, including codecs. I think it's completely soulless.

Re: New lossless audio codec in development

Reply #18
Anyone is free to work on anything that he/she likes.

I am gonna risk being banned, but I call BS. Countless times over the years, people have pestered me to work on very specific things, in very specific ways. And if I deviate in any form, I am Worse Than Hitler. This is why I've grown to hate things.

So no. At some point, though, for your own sanity, you have to learn not to care about what the public wants or even thinks of you, since some of them really are acting in utter bad faith, constantly.

Re: New lossless audio codec in development

Reply #19
Anyone is free to work on anything that he/she likes.

I am gonna risk being banned, but I call BS. Countless times over the years, people have pestered me to work on very specific things, in very specific ways. And if I deviate in any form, I am Worse Than Hitler.

Hey. You are free to work on anything you like in addition!  ;)

Re: New lossless audio codec in development

Reply #20
Anyone is free to work on anything that he/she likes.

I am gonna risk being banned, but I call BS. Countless times over the years, people have pestered me to work on very specific things, in very specific ways. And if I deviate in any form, I am Worse Than Hitler.

Hey. You are free to work on anything you like in addition!  ;)

Which is pretty much exactly how it is. I get assigned to work on garbage, yet I have the *privilege* to work on things I also like.

Fancy that.

Re: New lossless audio codec in development

Reply #21
My idea is like the following (maybe it's exactly LTP, maybe not):
... if the sine period is integer rather than fractional, you get zero difference with the previous sine period. ...
If there is no exact match, just pick the period with maximum correlation and store the difference via LPC + entropy-coded residual.
Note that both the number of samples and the lag/offset are variable here, because using fixed-size frames and then doing lags is pointless IMHO.
That sounds exactly like the LTP approach, except for the variable frame size. But why should fixed frame sizes be pointless in that case? You don't have to start or end at a zero-crossing, do you? Just try to find the "best" lag for the waveform segment in the given frame.
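That fixed-frame variant can be sketched as follows (a simplified single-tap LTP with a least-squares gain, my own illustration; ALS's actual LTP uses a multi-tap filter around the lag):

```python
import math

def ltp_gain(x, lag):
    """Least-squares single-tap LTP gain: minimizes
    the sum of (x[n] - g * x[n - lag])^2 over the frame."""
    num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
    den = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
    return num / den if den else 0.0

def ltp_residual(x, lag, g):
    """Residual after long-term prediction with gain g."""
    return [x[n] - g * x[n - lag] for n in range(lag, len(x))]

# A decaying periodic signal: each period is exactly 0.8x the last.
x = [math.sin(2 * math.pi * k / 50) for k in range(50)]
for n in range(50, 250):
    x.append(0.8 * x[n - 50])

g = ltp_gain(x, 50)          # recovers the 0.8 decay factor
res = ltp_residual(x, 50, g)  # near-zero residual
```

The gain term is what lets a fixed-frame LTP track amplitude changes between repeats, so starting or ending on a zero-crossing is indeed not required.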
Quote
I think this splitting audio into periods of equal correlations is similar to YIN algorithm?
Yes, I think so. YIN is a lag-search algorithm.

Quote from: Hakan Abbas
On the other hand, with lossy codecs we always have more options.  Just like with audio data, it's the same with image data. If the end user is not bothered and does not sense anything, the tricks can continue.
Well, as someone who has been involved in that craft for 20 years, I can tell you that many things have been tried, and that it's really hard to make progress over state-of-the-art lossy codec solutions like MPEG-H Audio, at least at medium-low to high bit-rates. At very low rates, AI (more precisely, machine learning) does show considerable benefit but, as mentioned, requires much more computational resources.

A +20-30% compression gain for a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.
Given my above comments, I have to say I doubt that.

Chris
If I don't reply to your reply, it means I agree with you.

Re: New lossless audio codec in development

Reply #22
A +20-30% compression gain for a codec like Opus is achievable while keeping complexity acceptable, and even larger gains for multichannel audio.
Given my above comments, I have to say I doubt that.
Opus is a low-delay format; moving to high delay can save +10% (and that's being conservative: AAC-LD loses much more than 10% to the non-LD AAC family).
Another +10% is reasonably achievable, as the processing budget of modern SoCs has increased significantly since MPEG-D/H and Opus were released.