Topic: New lossless audio codec in development

Re: New lossless audio codec in development

Reply #25
I was referring to this little gem: https://github.com/maka89/LeastSquaresFIR
It solves y = x * z for z, where x and y are known.
It is not Levinson recursion.
It contains multiple algorithms with similar output; the latest one, as used in the example script, is more powerful.
It is an iterative algorithm, so it is not as fast as the "trivial" Levinson recursion in FLAC.
I did only limited testing in my spare time, and it is pretty good.
Maybe that's my bias, since I did not do more testing...

Re: New lossless audio codec in development

Reply #26
I was referring to this little gem: https://github.com/maka89/LeastSquaresFIR
It solves y = x * z for z, where x and y are known.
It is not Levinson recursion.
It contains multiple algorithms with similar output; the latest one, as used in the example script, is more powerful.
It is an iterative algorithm, so it is not as fast as the "trivial" Levinson recursion in FLAC.
I did only limited testing in my spare time, and it is pretty good.
Maybe that's my bias, since I did not do more testing...

As I already wrote - nearly all audio codecs model audio linearly as y = x*z.
It all boils down to how you calculate the weights z:
least squares using Cholesky decomposition
least squares using Levinson-Durbin
recursive least squares
iteratively reweighted least squares
(normalized) least mean squares
etc...

He is using the conjugate gradient method with FFTs.
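For illustration, a minimal sketch of that idea (my own code, not the repository's): solve the FIR least-squares normal equations with conjugate gradient, doing every multiplication by the Toeplitz normal matrix implicitly through FFT-based convolution/correlation.

```python
import numpy as np

def fir_least_squares_cg(x, y, order, n_iter=200, tol=1e-12):
    """Sketch: find FIR weights z minimizing ||y - x*z||^2 by running
    conjugate gradient on the normal equations (X^T X) z = X^T y, where
    multiplication by X and X^T is done via FFTs instead of forming the
    Toeplitz matrix explicitly."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    n = len(x)
    nfft = 1 << int(np.ceil(np.log2(n + order)))
    X = np.fft.rfft(x, nfft)

    def xt(v):
        # X^T v: cross-correlate x with v, keep the first 'order' lags
        return np.fft.irfft(np.conj(X) * np.fft.rfft(v, nfft), nfft)[:order]

    def xz(z):
        # X z: convolve x with the candidate filter, truncated to len(y)
        return np.fft.irfft(X * np.fft.rfft(z, nfft), nfft)[:n]

    z = np.zeros(order)
    r = xt(y) - xt(xz(z))         # residual of the normal equations
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Ap = xt(xz(p))            # (X^T X) p via two FFT passes
        alpha = rs / (p @ Ap)
        z += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return z
```

The per-iteration cost is a few FFTs, so high orders stay tractable, at the price of an iteration count that the Levinson recursion does not have.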


Re: New lossless audio codec in development

Reply #27
But with this one, the impulse response it generates is so good that the residuals are very, very small.
I'm not arguing about linear audio modeling. I think this approach is superior to anything produced before.
My dream is to code a very efficient floating-point hybrid (lossless + lossy) audio codec.

Re: New lossless audio codec in development

Reply #28
Which of the ones mentioned? If you use the Toeplitz solver, I guess it must then make the same kind of approximations that the Levinson-Durbin method uses? And if then it improves, it "must" be about numerical stability (or maybe not "stability" in the strict sense, but in roundoffs)?
There are other methods than least squares available in statistics packages, of course, but if different approaches to least squares actually make a sizeable impact, it is maybe interesting in its own right.
(And then: tried Burg's method? Used in Opus.)

Also, a minor point could be a constant offset. Say for Rice-coded residuals (like FLAC): a residual of N is occasionally worse than -N if N happens to be k*(2^r) for the Rice exponent r in use. That likely happens only every now and then, and there is likely no big benefit in tweaking it, but you may want to center residuals not on 0 but on a small negative value. -1/4?

Re: New lossless audio codec in development

Reply #29
And if then it improves, it "must" be about numerical stability (or maybe not "stability" in the strict sense, but in roundoffs)?

Not necessarily - it's usually about proper "regularization".
E.g. for ordinary least squares (as used in Sac) you first make an estimate of the autocorrelation matrix and then invert this matrix.
The estimation error can be compensated by different techniques, e.g. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=433840

On the other hand you don't really want to do "least squares" (L2) but "least absolute deviation" (L1).
The latter usually cannot be computed in closed form, so you (again) rely on regularization or other methods like iteratively reweighted least squares.
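A rough sketch of that IRLS idea (my own illustration, not Sac's or ffmpeg's code): approximate the L1 fit by repeatedly solving a weighted least-squares problem, reweighting by the inverse absolute residuals, with a small ridge term for stability.

```python
import numpy as np

def irls_l1(X, y, n_iter=20, eps=1e-6):
    """Sketch of iteratively reweighted least squares: approximate a least
    absolute deviation (L1) fit of y ~ X z by solving weighted L2 problems,
    with weights 1/|residual| from the previous pass."""
    n, p = X.shape
    w = np.ones(n)
    z = np.zeros(p)
    for _ in range(n_iter):
        WX = X * w[:, None]
        # weighted normal equations (X^T W X) z = X^T W y, plus a tiny ridge term
        z = np.linalg.solve(X.T @ WX + eps * np.eye(p), WX.T @ y)
        resid = y - X @ z
        w = 1.0 / np.maximum(np.abs(resid), eps)   # large residuals get down-weighted next pass
    return z
```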

Re: New lossless audio codec in development

Reply #30
And if then it improves, it "must" be about numerical stability (or maybe not "stability" in the strict sense, but in roundoffs)?

Not necessarily - it's usually about proper "regularization".

... of the matrix to invert, then? But - as long as you stick to Toeplitzes - that is before the choice of solver? Or are the matrices so often so close to singular that method choice has impact far beyond what you would call roundoffs?

Edit: Though, if that repo offers a straightforward method that does better than the other (straightforward) methods for this particular application, then that is interesting in itself.


On the other hand you don't really want to do "least squares" (L2) but "least absolute deviation" (L1).

ffmpeg's flac encoder can do an IRLS procedure. A similar attempt was made with the reference implementation a few years ago, trying to approach an L1 estimate: https://hydrogenaud.io/index.php/topic,120158.msg1003256.html#msg1003256 .
Impact was surprisingly small, at least for the computational cost.
(And, testing ffmpeg's encoder, I found that running more than, say, six or seven reweighting passes, the files would often start to grow again. If I remember correctly, it does the N iterations and then picks the final result - rather than the best of them.)

Re: New lossless audio codec in development

Reply #31
... of the matrix to invert, then? But - as long as you stick to Toeplitzes - that is before the choice of solver? Or are the matrices so often so close to singular that method choice has impact far beyond what you would call roundoffs?

The key focus of regularization is to prevent overfitting and improve generalizability by introducing a small bias.
In the case of ordinary least squares a common technique is ridge regression.
https://en.wikipedia.org/wiki/Ridge_regression

In code it is just adding a constant to the diagonal; check Sac:
https://github.com/slmdev/sac/blob/72de44ba2eb7fe91409a04c8b9835e60418bdd4c/src/common/utils.h#L190

Effects vary with the signal but are not to be underestimated.
Determining the optimal regularization parameter is difficult.
If it's too large you have a well-posed problem, but the solution may be too far away from the noise-free solution.
If it's too small you are near the noise-contaminated solution, which may also be too far away from the noise-free solution.
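In Python terms the trick looks roughly like this (an illustrative sketch, not Sac's actual code; scaling the ridge constant by the signal power is my assumption):

```python
import numpy as np

def ridge_lpc(x, order, nu=1e-3):
    """Sketch: LPC coefficients from the normal equations with ridge
    regularization, i.e. a constant added to the diagonal of the
    autocorrelation matrix before solving."""
    x = np.asarray(x, float)
    r = np.array([x[:len(x) - k] @ x[k:] for k in range(order + 1)])  # autocorrelation estimate
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += nu * r[0] * np.eye(order)    # the regularization: bump the diagonal
    return np.linalg.solve(R, r[1:])
```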

Quote
ffmpeg's flac encoder can do an IRLS procedure. A similar attempt was made with the reference implementation a few years ago, trying to approach an L1 estimate: https://hydrogenaud.io/index.php/topic,120158.msg1003256.html#msg1003256 .
Impact was surprisingly small, at least for the computational cost.
(And, testing ffmpeg's encoder, I found that running more than, say, six or seven reweighting passes, the files would often start to grow again. If I remember correctly, it does the N iterations and then picks the final result - rather than the best of them.)

Thanks for the link. I cannot tell for sure why it is not improving the results more, but IMO
FLAC's architecture is too limited to really benefit from more accurate coefficients.
IRLS is better used with very high orders (~2^10), because you are trying to determine a sparse system.
It's unlikely you find a lot of "sparseness" in the last 32 samples - or whatever the maximum for FLAC is.

Also you really have to work on your heuristics - and test them,
because in the end you don't want to reduce a p-norm but the result of residual encoding.

I tested a lot of different cost functions for "how much benefit in compression does this parameter change give"
and came to the conclusion that L1 and L2/RMS (which are most often used) are not a good approximation of the entropy encoding stage in lossless audio compression.

You can try it yourself using Sac.
It supports L1, RMS, o0-entropy, Golomb and bitplane cost functions.

All profiles except --best use "entropy":
https://github.com/slmdev/sac/blob/72de44ba2eb7fe91409a04c8b9835e60418bdd4c/src/libsac/cost.h#L69

which is surprising, because o0-entropy does not care about the size of the coefficients, only their distribution.

Re: New lossless audio codec in development

Reply #32
Oh, a very detailed technical discussion, nice!
... o0-entropy does not care about the size of coefficients, only their distribution.
Can you clarify what you mean by "distribution" in this case? Something related to the variance of the (quantized) coefficient values?

Chris
If I don't reply to your reply, it means I agree with you.

Re: New lossless audio codec in development

Reply #33
Can you clarify what you mean by "distribution" in this case? Something related to the variance of the (quantized) coefficient values?

Sorry, that was a typo - I was not talking about the coefficients but about the errors = residuals.
When you make an approximation of the coding cost you usually use
L1: sum of absolute values (assumes Laplacian errors)
L2: sum of squared values (assumes Gaussian errors)

which makes sense, as smaller residuals tend to result in smaller encodings.
My observation was that (at least for Sac) there are better approximations of the encoding cost.

Entropy (in information theory) measures the uncertainty of the errors.
Thus, if the errors are e.g. constant, it does not matter whether they are large or small - the information content is zero.
You can check https://en.wikipedia.org/wiki/Entropy_(information_theory)

and see the formula in Sac using an order-0 model:
https://github.com/slmdev/sac/blob/72de44ba2eb7fe91409a04c8b9835e60418bdd4c/src/libsac/cost.h#L90
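Conceptually (a rough Python equivalent written for illustration, not the linked Sac code), the order-0 cost is just the histogram entropy of the residual block, in bits:

```python
import numpy as np

def o0_entropy_bits(residuals):
    """Order-0 entropy cost: total bits implied by the histogram of the
    residual values. It depends only on the distribution of the values,
    not on their magnitude."""
    _, counts = np.unique(residuals, return_counts=True)
    p = counts / counts.sum()
    return float(-(counts * np.log2(p)).sum())

# e.g. a constant residual block costs zero bits regardless of its size:
# o0_entropy_bits(np.full(4096, 1000)) -> 0.0
```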


Re: New lossless audio codec in development

Reply #34
The last one is a trivial entropy calculation from the histogram, right?

Re: New lossless audio codec in development

Reply #35
Yes, order-0 Markov just means counting the number of occurrences.

The cost function has its limits - using a Golomb coder as the approximation is better, but a lot slower.
You can try it out yourself using e.g. --optimize=0.2,250,c where c is from [L1,RMS,ENT,GLB,BPN]

I advise using --verbose to see the actual value of the objective function after optimization,
because a lower objective function does not automatically result in a smaller file.
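The coder-based cost means scoring a candidate by the bits an actual residual coder would spend on it; a minimal Rice-coder version (my own sketch, Sac's GLB cost may differ in details) looks like:

```python
import numpy as np

def rice_cost_bits(residuals, k_max=20):
    """Sketch of a Rice-coder cost: exact bit count of the block for the
    best Rice parameter k, after zigzag-mapping signed residuals."""
    e = np.asarray(residuals, dtype=np.int64)
    u = np.where(e >= 0, 2 * e, -2 * e - 1)           # signed -> unsigned mapping
    return min(int(((u >> k) + 1 + k).sum())          # unary quotient + stop bit + k remainder bits
               for k in range(k_max))
```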

 

Re: New lossless audio codec in development

Reply #36
Sorry, that was a typo - I was not talking about the coefficients but about the errors = residuals.
...
if the errors are e.g. constant, it does not matter whether they are large or small - the information content is zero.
Ah, the prediction residuals, thanks for clarifying. Makes sense, and the histogram approach for "more trivial" residuals is a good idea. Thanks!

Chris
If I don't reply to your reply, it means I agree with you.

Re: New lossless audio codec in development

Reply #37
ffmpeg's flac encoder can do an IRLS procedure. A similar attempt was made with the reference implementation a few years ago, trying to approach an L1 estimate: https://hydrogenaud.io/index.php/topic,120158.msg1003256.html#msg1003256 .
Impact was surprisingly small, at least for the computational cost.
(And, testing ffmpeg's encoder, I found that running more than, say, six or seven reweighting passes, the files would often start to grow again. If I remember correctly, it does the N iterations and then picks the final result - rather than the best of them.)

Thanks for the link. I cannot tell for sure why it is not improving the results more, but IMO
FLAC's architecture is too limited to really benefit from more accurate coefficients.
IRLS is better used with very high orders (~2^10), because you are trying to determine a sparse system.
It's unlikely you find a lot of "sparseness" in the last 32 samples - or whatever the maximum for FLAC is.
32 is max for FLAC the format, but "all the development" has focused on up to 12, because that ensures it is within the streamable subset. Making the reference encoder select well between orders all the way up to 32? That territory is only lightly explored.

One observation is that reweighting and iterating appears expensive compared to taking different shots at it. One is how reference FLAC tries different weightings (first part of the block, and last part of the block - even more for -8) and thus can often "handle" a difficult part of the signal by simply trying with and without it. Another is that it partitions the block (a "subframe", i.e. one channel and typically 4096 samples) into 2^r partitions, each with its own Golomb-Rice exponent - that sometimes does wonders.
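The partitioning part is easy to sketch (illustrative only, not libFLAC's code; the flat 4 bits of overhead per partition parameter is a simplification):

```python
import numpy as np

def best_partitioned_rice(residuals, max_order=6, k_max=15):
    """Sketch: split the residual block into 2^order partitions, give each
    its own Rice parameter, and return the cheapest (bits, order) found."""
    e = np.asarray(residuals, dtype=np.int64)
    u = np.where(e >= 0, 2 * e, -2 * e - 1)                  # zigzag mapping

    def rice_bits(part, k):
        return int(((part >> k) + 1 + k).sum())

    best = None
    for order in range(max_order + 1):
        parts = np.array_split(u, 1 << order)
        bits = sum(min(rice_bits(p, k) for k in range(k_max)) for p in parts)
        bits += (1 << order) * 4                             # per-partition parameter overhead
        if best is None or bits < best[0]:
            best = (bits, order)
    return best
```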

Maybe FLAC is quite close to how well you can actually model the signal by order 12 (and no dynamic update). TAK does 32 to 160 IIRC. ALS can go up to the 2^10 you mention.


in the end you don't want to reduce a p-norm but the result of residual encoding.
Plus the size of the predictor. IIRC reference FLAC already has a heuristic that sometimes chooses a lower prediction order to save the coefficient bits. Also, FLAC can save roughly a per-mille point by trading off the resolution of the predictor vector. How much is that? Hardly enough to care about for storage cost. Enough to care about for the sport.
(For 16 bit signals, reference FLAC will also limit the resolution to facilitate decoding with 32-bit words. I don't think ffmpeg's implementation cares about that.)


But given the order and the residual coding parameter(s) (the Golomb-Rice exponent for FLAC), it could in principle make sense to do a final slight reweighting, with high weight on the residuals that are close to gaining/losing a bit, and low weight on those that could be let slightly bigger without spending any bits. Not at all a convex optimization problem, so I guess the weights would have to be set heuristically. And how much would it gain? Maybe surprisingly little if you have flexibility in the residual coding method?

Re: New lossless audio codec in development

Reply #38
I'm not arguing about linear audio modeling. I think this approach is superior to anything produced before.
My dream is to code a very efficient floating-point hybrid (lossless + lossy) audio codec.
https://github.com/Hakan-Abbas/Dynamic-Linear-Prediction
You can try DLP. Maybe you can see what I can't. This is a new linear prediction method that I worked on a long time ago. It works much more efficiently, especially for small blocks. For larger blocks (over 4,000 samples) the efficiency may decrease.

Apart from this test tool, I haven't had time to measure its performance in a real application yet. The main reason is that the processing speed will decrease a bit. The current best parameters need to be re-determined at regular intervals, and how long that interval should be needs to be determined accurately, because while it is most efficient to go with the best parameters for each small block, that carries a significant penalty in processing speed. If I can integrate it into a real application, I plan to share how it works. I need to be completely sure of the results.

Here's how it works, briefly. First a CSV file (no, value) or a 16-bit, 2-channel WAV file is loaded. If the data is more than 100,000 samples, no graph is drawn. To see the graph, the slider at the bottom should be narrowed from the right and left to the desired range. For DLP training, samples in the range of 100-1000 are sufficient. The smallest error sum for the relevant block is determined manually by changing the parameters. Then we can see that these parameters work equally well for many previous and subsequent blocks (according to my observations). So with a quick experiment/training on a small dataset, we can create a very good predictor for a larger dataset. DLP gives better results than Levinson-Durbin in most cases. We can see this by testing, except for very large blocks and very complex data. In addition, we only need to know the 2-3 parameters obtained for DLP.

Various tests can be performed with fixed estimators, Levinson-Durbin or DLP. The parametric data on absolute error and squared error sums can be seen at the bottom.

Re: New lossless audio codec in development

Reply #39
What does it do, really? Sure fixed predictors, then a Levinson-Durbin on who knows how you have (or have not) windowed the data - and then: Does the machine learning start to learn from there on, or from scratch?

And are those coefficients at the end really ... did a clever third-order really improve that much over a different least squares algorithm? It shouldn't ... should it? 

Re: New lossless audio codec in development

Reply #40
What does it do, really? Sure fixed predictors, then a Levinson-Durbin on who knows how you have (or have not) windowed the data - and then: Does the machine learning start to learn from there on, or from scratch?

And are those coefficients at the end really ... did a clever third-order really improve that much over a different least squares algorithm? It shouldn't ... should it? 
DLP is completely different from Levinson-Durbin. However, it can be thought of as a dynamic version of the fixed predictors. The learning mechanism in DLP tries to find, from scratch, the best case for a selected block (e.g. 512 samples), i.e. the case with the least error. A decision is made by looking at only 2 or 3 samples, and the mistakes made are then corrected. This is not very interesting.

But what is interesting is that once the appropriate parameters are set, the best result can be obtained with the same parameters in the previous or subsequent blocks. Even though this depends on the shape of the data, in my tests it can sometimes hold for hundreds of blocks before or after. You can see this by trying it yourself. Maybe we will also see the negative aspects of the method.

Re: New lossless audio codec in development

Reply #41
The next step in lossless audio compression should involve
1. merging prediction + residual encoding -> model p(x_t | x_{t-1}, x_{t-2}, ...) directly
2. using LSTMs/GRUs for prediction, e.g. WaveNet: https://arxiv.org/abs/1609.03499

The issue with 2. is that, for a frame, the NN does not converge fast enough using (stochastic) gradient descent.
Possible solutions:
- use a hybrid approach, where you load a (large) base model from disk and update it during prediction
- train a (small) NN on every frame and save it -> you have to find a compromise between model size, accuracy and final file size

Having a large base model opens a discussion about how to compare such variants to classical (statistical) codecs.
See https://en.wikipedia.org/wiki/Kolmogorov_complexity




Re: New lossless audio codec in development

Reply #42
DLP is completely different from Levinson-Durbin. However, it can be thought of as a dynamic version of the fixed predictors. The learning mechanism in DLP tries to find, from scratch, the best case for a selected block (e.g. 512 samples), i.e. the case with the least error. A decision is made by looking at only 2 or 3 samples, and the mistakes made are then corrected. This is not very interesting.

But what is interesting is that once the appropriate parameters are set, the best result can be obtained with the same parameters in the previous or subsequent blocks. Even though this depends on the shape of the data, in my tests it can sometimes hold for hundreds of blocks before or after. You can see this by trying it yourself. Maybe we will also see the negative aspects of the method.
In my experiments I used blocks of 500 samples. In DLP, if the best parameters are determined for each block, the average errors can be much lower. However, even with the same parameters, many consecutive blocks seem to operate with similar efficiency. If there are no inaccuracies on my side, I have also found that the fixed predictors give very good results, compared to Levinson, for the music I selected. And they are inexpensive. But Levinson gives better results than the fixed predictors, and close to DLP, on larger blocks.

https://www.rarewares.org/test_samples
ATrain.wav
DLP 1.9 / 0.9 and Levinson degree 10

Quote
(|E| = absolute error average, RMSE = root mean square error)

Range 120,000 - 120,500
P-1: |E| = 443.106   RMSE = 562.444
P-2: |E| = 207.323   RMSE = 278.034
P-3: |E| = 233.894   RMSE = 303.262
P-4: |E| = 337.748   RMSE = 428.854
L-D: |E| = 385.817   RMSE = 512.501
DLP: |E| = 149.771   RMSE = 196.561

Range 123,000 - 123,500
P-1: |E| = 310.696   RMSE = 405.519
P-2: |E| = 125.683   RMSE = 167.74
P-3: |E| = 121.777   RMSE = 153.605
P-4: |E| = 163.332   RMSE = 200.383
L-D: |E| = 301.466   RMSE = 393.957
DLP: |E| = 81.6198   RMSE = 104.565

Range 130,000 - 130,500
P-1: |E| = 257.808   RMSE = 346.281
P-2: |E| = 134.14   RMSE = 185.864
P-3: |E| = 153.179   RMSE = 197.29
P-4: |E| = 225.078   RMSE = 281.94
L-D: |E| = 260.342   RMSE = 345.381
DLP: |E| = 106.201   RMSE = 134.686

Range 150,000 - 150,500
P-1: |E| = 191.308   RMSE = 247.628
P-2: |E| = 128.058   RMSE = 159.02
P-3: |E| = 175.956   RMSE = 222.19
P-4: |E| = 270.7   RMSE = 338.797
L-D: |E| = 196.845   RMSE = 253.754
DLP: |E| = 96.0289   RMSE = 121.365

Range 200,000 - 200,500
P-1: |E| = 174.672   RMSE = 214.922
P-2: |E| = 171.539   RMSE = 213.147
P-3: |E| = 257.679   RMSE = 320.909
P-4: |E| = 407.173   RMSE = 510.26
L-D: |E| = 179.434   RMSE = 222.833
DLP: |E| = 145.787   RMSE = 181.068

And Range 200,000 - 200,500 for DLP 1.2 / 0.9
DLP: |E| = 119.179   RMSE = 146.419

https://www.rarewares.org/test_samples
Bachpsichord.wav
DLP 0.9 / 0.6 and Levinson degree 10

Quote
(|E| = absolute error average, RMSE = root mean square error)

Range 781,000 - 781,500
P-1: |E| = 323.39   RMSE = 401.816
P-2: |E| = 246.164   RMSE = 308.946
P-3: |E| = 264.817   RMSE = 332.567
P-4: |E| = 319.684   RMSE = 406.375
L-D: |E| = 328.579   RMSE = 401.791
DLP: |E| = 236.809   RMSE = 295.039

Range 783,500 - 784,000
P-1: |E| = 576.748   RMSE = 732.896
P-2: |E| = 739.136   RMSE = 923.171
P-3: |E| = 1188.61   RMSE = 1482.96
P-4: |E| = 2095.01   RMSE = 2596.91
L-D: |E| = 565.347   RMSE = 722.512
DLP: |E| = 599.783   RMSE = 758.233

Range 785,000 - 785,500
P-1: |E| = 858.63   RMSE = 1076.89
P-2: |E| = 1068.71   RMSE = 1312.39
P-3: |E| = 1539.71   RMSE = 1930.56
P-4: |E| = 2525.31   RMSE = 3107.55
L-D: |E| = 801.279   RMSE = 1003.89
DLP: |E| = 774.577   RMSE = 961.112

Range 789,000 - 789,500
P-1: |E| = 541.868   RMSE = 674.853
P-2: |E| = 560.549   RMSE = 700.241
P-3: |E| = 763.349   RMSE = 948.568
P-4: |E| = 1119.4   RMSE = 1435.04
L-D: |E| = 533.588   RMSE = 670.675
DLP: |E| = 439.466   RMSE = 548.961

Range 795,500 - 796,000
P-1: |E| = 305.056   RMSE = 387.991
P-2: |E| = 269.337   RMSE = 337.022
P-3: |E| = 321.195   RMSE = 393.435
P-4: |E| = 412.865   RMSE = 506.12
L-D: |E| = 306.578   RMSE = 390.907
DLP: |E| = 229.701   RMSE = 290.912

And Range 795,500 - 796,000 for DLP 1.5 / 0.8
DLP: |E| = 174.152   RMSE = 207.577