Topic: Sac: state-of-the-art lossless audio compression

Sac: state-of-the-art lossless audio compression

Hi everyone,

Over the course of the last 10 years I have been working on a lossless audio compression codec.
This is a research project aiming to find the limits of compression.

Sac has an asymmetric design, which means you can spend an enormous amount of time on encoding,
while decoding time stays roughly the same.

Sac provides the best compression rates on 16-bit stereo (2-channel) audio!

There is also already a wiki page:
https://wiki.hydrogenaud.io/index.php?title=Sac

I wanted to provide a link here to my GitHub repository, which includes benchmarks:
https://github.com/slmdev/sac

If you have any questions feel free to ask :)

Thanks

Re: Sac: state-of-the-art lossless audio compression

Reply #1
I will wait for >16bit and >2 channels support.

Re: Sac: state-of-the-art lossless audio compression

Reply #2
So, for comparison: OptimFROG max encodes at ~2x realtime and decodes at ~10x realtime on modern hardware. If I take the latest results from the github readme, sac --best encodes 2000 times slower than OptimFROG and decodes 50 times slower? So encoding a single 4-minute track would take 4000 minutes (66 hours), and decoding would take 20 minutes? Is that about right?

I tried sac about a year ago, and I believe the fastest mode (normal, without any optimizations) was quite close to OptimFROG max in terms of both speed and compression. Is that still correct?
Music: sounds arranged such that they construct feelings.

Re: Sac: state-of-the-art lossless audio compression

Reply #3
I tried sac about a year ago, and I believe the fastest mode (normal, without any optimizations) was quite close to OptimFROG max in terms of both speed and compression. Is that still correct?

Why don't you check the benchmarks on github? I pushed compression massively in the last 12 months.

--normal is ~2 times slower than MP4ALS -z3 (encoding/decoding) and 3 times slower than OFR --max on encoding, but it already compresses better than everything else
--high is as slow as PAQ on encoding, but 5 times faster on decoding

you can try --insane :)

I don't really care about speed, as this project is not intended for production use.
That doesn't mean I code carelessly, but if an option provides a minimal improvement while making the codec 2x slower, I would go for it.

Re: Sac: state-of-the-art lossless audio compression

Reply #4
Frankly, it doesn't appeal to me to compress slightly better while staying slower than already-too-slow codecs like OptimFROG or MPEG-4 ALS. I always say: the fast one wins.

However, considering that it is not for normal use and is just an experimental study, this work should be appreciated.

Re: Sac: state-of-the-art lossless audio compression

Reply #5
I was the one writing that wiki entry, I hope I got it broadly right. From the descriptions you have given here and there, I guess you won't be offended that some of us (myself included) routinely add a warning to common users that it is not suited for daily playback - the "experimental research project" phrasing in the wiki text reflects that.

Sac has been touched upon a few times in this forum.
2012: https://hydrogenaud.io/index.php/topic,97310.msg812720.html#msg812720
Then a couple of years ago I tested some music that compresses quite ... weirdly. Some of the results might insult Sac O:)  - at least that old version of it. I have not re-tested the signals on more recent Sac versions. The first among these I must have deleted long ago.
 * artificially created near-mono where music changes every ten seconds. Sac does not do very well: https://hydrogenaud.io/index.php/topic,121770.msg1010590.html#msg1010590
 * a weirdly distorted piece of music - still available for free download! - where FLAC did so well that I had to check against everything: https://hydrogenaud.io/index.php/topic,122179.msg1014245.html#msg1014245 and reply #165
FLAC beating Sac means there are compression gains to be retrieved in how the residual coding method captures ultra-fast changes (rather than focusing on patterns in long blocks).
 * The infamous Merzbow: "I Lead You Towards Glorious Times", where Sac's effort did pay off https://hydrogenaud.io/index.php/topic,122040.msg1010086.html#msg1010086 . I later found how FLAC could get down to 51.9, also beating the frog. Again, the fast-changing residual coding scheme.
2024 on a "two-value" file: https://hydrogenaud.io/index.php/topic,124862.msg1038855.html#msg1038855

Oh, and it was reported that Sac (like flac) cannot handle the corrupted/noncompliant metadata blocks that ProTools creates. I bet that is not your top priority.

Re: Sac: state-of-the-art lossless audio compression

Reply #6
Thanks for the heads-up - I will check the threads to see if I can improve on that.
It seems you used some 0.6 version, which is rather old.

- near-mono files: the newest versions should handle that well, but I don't model it explicitly
- corrupted data: I checked a lot of weird files - but I will look into this

If the LSBs are constant for a short amount of time (e.g. lossyWAV), FLAC is still the better choice because Sac has a default framelength of 20s. You can check your weird samples with something like Sac --framelen=1.
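For intuition, here is a minimal sketch (not Sac's or FLAC's actual code; the 16-bit sample type is just an assumption for the example): "wasted bits" are low-order bits that are zero in every sample of a frame, which is what lossyWAV-style processing produces for short stretches. The longer the frame, the less likely all of its samples still share them.

Code:
// Minimal sketch, not Sac's or FLAC's actual code: count the low-order bits
// that are zero in every sample of a frame. A short frame is far more likely
// to have a non-zero count than a 20 s one, which is why --framelen=1 helps
// on material where the LSBs are only constant for short stretches.
#include <cstdint>
#include <vector>

int wasted_bits(const std::vector<int16_t>& frame) {
  uint16_t acc = 0;
  for (int16_t s : frame) acc |= static_cast<uint16_t>(s);
  if (acc == 0) return 16;              // all-zero frame: every bit is unused
  int n = 0;
  while (((acc >> n) & 1u) == 0) ++n;   // common trailing zero bits
  return n;
}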

Things on my to-do list:
- compress multiple frames at once
- change the way --sparse-pcm works so that lossyWAV files can be encoded more efficiently

Re: Sac: state-of-the-art lossless audio compression

Reply #7
This is interesting to look at.
Do you think it is possible to eventually design a subset based on it that would *decode* not much slower than OFR/MAC while still beating them on compression ratios?
a fan of AutoEq + Meier Crossfeed

Re: Sac: state-of-the-art lossless audio compression

Reply #8
This is interesting to look at.
Do you think it is possible to eventually design a subset based on it that would *decode* not much slower than OFR/MAC while still beating them on compression ratios?

Sure - there is a lot of stuff slowing everything down with minimal benefit.
But I don't see a reason to do so, as my intentions are different.

If I had to choose, I would use MAC --high and that's it.

There is no real "sense" in lossless audio compression.
If you do something simple (flac -8), you get 2x.
If you do something complicated, you also get about 2x.

Re: Sac: state-of-the-art lossless audio compression

Reply #9
* a weirdly distorted piece of music - still available for free download! - where FLAC did so well that I had to check against everything: https://hydrogenaud.io/index.php/topic,122179.msg1014245.html#msg1014245 and reply #165
FLAC beating Sac means there are compression gains to be retrieved in how the residual coding method captures ultra-fast changes (rather than focusing on patterns in long blocks).

I tested current master (v0.7.11) on the file
Code:
"Aylwin - Farallon - 01 My Spirit of Pine and The Outer Body Experience- A Sequence of Night and Day.wav" 
150.947.876 Bytes

Flac -8        121.068.123 Bytes
Sac v0.7.11    113.318.408 Bytes (1059 kbps, encoding time 13min)

so I don't know if there is some issue

Re: Sac: state-of-the-art lossless audio compression

Reply #10
I tested current master (v0.7.11) on the file
Code:
"Aylwin - Farallon - 01 My Spirit of Pine and The Outer Body Experience- A Sequence of Night and Day.wav" 

Wrong track.

Code:
  97 301 142	Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.flac
 100 781 825 Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.sac
 118 482 027 Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.tak
 133 182 044 Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.wav

 131 018 525 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.flac
 137 946 456 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.sac
 166 106 033 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.tak
 181 548 104 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.wav

Re: Sac: state-of-the-art lossless audio compression

Reply #11
I tested current master (v0.7.11) on the file
Code:
"Aylwin - Farallon - 01 My Spirit of Pine and The Outer Body Experience- A Sequence of Night and Day.wav" 

Wrong track.

Code:
  97 301 142	Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.flac
 100 781 825 Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.sac
 118 482 027 Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.tak
 133 182 044 Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny.wav

 131 018 525 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.flac
 137 946 456 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.sac
 166 106 033 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.tak
 181 548 104 Stellar Descent - Farallon - 03 BANDCAMP EDIT (PART 2)- Farallon- A Sequence of Subduction and Orogeny.wav

This is a known issue with many lossless codecs. I tested "sac --framelen=1" and got
Code:
   97.745.754 "Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny"

Usually large framesizes increase compression (Sac uses 20s). Some files (e.g. processed with lossyWAV) use a kind of "adaptive noise shaping" where the LSBs are constant for some time. The larger the framesize, the less likely the encoder can find constant LSBs.

Sac has a generalized model, which can compress not only these types of files better, but also other files where the PCM spectrum is sparse. This model has a lot of side information, so just reducing the framesize doesn't do it.

I already mentioned in https://hydrogenaud.io/index.php/topic,127096.msg1056114.html#msg1056114 that I am aware of that issue, but it's a lot of work to do right.
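For intuition only - a toy illustration, not Sac's actual --sparse-pcm model: a frame with a sparse PCM spectrum uses only a few distinct sample values, which can be remapped to a small, dense alphabet. The table of used values is per-frame side information, which is why simply shrinking the frame does not solve it.

Code:
// Toy illustration only, not Sac's actual --sparse-pcm model: remap the few
// distinct sample values of a frame to a dense alphabet. The table of used
// values is side information that has to be transmitted for every frame.
#include <cstdint>
#include <map>
#include <vector>

struct SparseFrame {
  std::vector<int32_t> table;    // distinct values used in this frame (side info)
  std::vector<uint32_t> indices; // samples remapped to indices into the table
};

SparseFrame remap(const std::vector<int32_t>& frame) {
  SparseFrame out;
  std::map<int32_t, uint32_t> idx;
  for (int32_t s : frame) {
    auto it = idx.find(s);
    if (it == idx.end()) {
      it = idx.emplace(s, static_cast<uint32_t>(out.table.size())).first;
      out.table.push_back(s);
    }
    out.indices.push_back(it->second);
  }
  return out;  // only worth it when table.size() is much smaller than 2^bit_depth
}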




Re: Sac: state-of-the-art lossless audio compression

Reply #12
This is a known issue with many lossless codecs. I tested "sac --framelen=1" and got
Code:
   97.745.754 "Stellar Descent - Farallon - 02 BANDCAMP EDIT (PART 1)- Farallon- A Sequence of Subduction and Orogeny"

Usually large framesizes increase compression (Sac uses 20s). Some files (e.g. processed with lossyWAV) use a kind of "adaptive noise shaping" where the LSBs are constant for some time. The larger the framesize, the less likely the encoder can find constant LSBs.

Sure, but for that particular file, FLAC did not exploit a single wasted bit.
 
The explanation is in FLAC's residual coding scheme. In most of the subframes of this file, it predicts a sample by just the immediate predecessor, and lets the residual encoder do the dirty work - by changing the Rice exponent every other sample.
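For reference, a minimal sketch (not libFLAC code) of what that predictor amounts to: the residual is just the first difference of the signal, and the residual coder then does the heavy lifting by re-choosing the Rice parameter per partition.

Code:
// Minimal sketch, not libFLAC code: FLAC's fixed order-1 predictor predicts
// each sample by its immediate predecessor, so the residual is simply the
// first difference. The warm-up sample is stored verbatim.
#include <cstdint>
#include <vector>

std::vector<int32_t> order1_residual(const std::vector<int32_t>& x) {
  std::vector<int32_t> r(x.size());
  for (size_t i = 1; i < x.size(); ++i) r[i] = x[i] - x[i - 1];
  if (!x.empty()) r[0] = x[0];  // warm-up sample
  return r;
}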

Re: Sac: state-of-the-art lossless audio compression

Reply #13
Sure, but for that particular file, FLAC did not exploit a single wasted bit.
 
The explanation is in FLAC's residual coding scheme. In most of the subframes of this file, it predicts a sample by just the immediate predecessor, and lets the residual encoder do the dirty work - by changing the Rice exponent every other sample.

Are you 100% sure about this? I checked the resulting file with "sac --listfull" and all frames have an MSB at 14 or 15 - so it looks like there is not much variation. And it would also be the first file I encountered with such behavior.

There is some adaptive block switching in Sac, but for these types of files I have to do something different. I wonder how often this happens in the wild.

Code:
Beautyslept.wav 2.757.094 Bytes
flac -8         1.746.326 Bytes
sac             1.267.817 Bytes

Re: Sac: state-of-the-art lossless audio compression

Reply #14
Sure, but for that particular file, FLAC did not exploit a single wasted bit.
 
The explanation is in FLAC's residual coding scheme. In most of the subframes of this file, it predicts a sample by just the immediate predecessor, and lets the residual encoder do the dirty work - by changing the Rice exponent every other sample.

Are you 100% sure about this?

flac -a outputs an analysis file. 16258 matches for  "   wasted_bits=0   " on 8128 stereo blocks. By the way, 7241 were encoded dual-mono.
I tried to encode with flac's minimum block size of 16 samples, got me an analysis file of 7x the .wav, and found wasted bits in < 2 percent of the subframes.

Re: Sac: state-of-the-art lossless audio compression

Reply #15
flac -a outputs an analysis file. 16258 matches for  "   wasted_bits=0   " on 8128 stereo blocks. By the way, 7241 were encoded dual-mono.
I tried to encode with flac's minimum block size of 16 samples, got me an analysis file of 7x the .wav, and found wasted bits in < 2 percent of the subframes.

Now you got me.
I patched Sac to use 0.25s frames and got it down to 96.971.880 Bytes (flac: 97.301.142).
It really seems to be a matter of efficiently encoding the noise via static block switching.

My idea is to use large frames (20s) for prediction and sub-frames for the residuals.
In this way we can use the sparse-model, wasted-bits and noise-modelling more efficiently.

I did another small test and extracted the first 20s and tested paq8px - which is usually worse than Sac but has a really sophisticated residual coder.

Code:
sample.wav    3.539.022 Bytes
paq8px -6     2.117.387 Bytes
Sac           2.269.896 Bytes
flac -8       2.554.018 Bytes

Re: Sac: state-of-the-art lossless audio compression

Reply #16
flac -a outputs an analysis file. 16258 matches for  "   wasted_bits=0   " on 8128 stereo blocks. By the way, 7241 were encoded dual-mono.
I tried to encode with flac's minimum block size of 16 samples, got me an analysis file of 7x the .wav, and found wasted bits in < 2 percent of the subframes.

Now you got me.
I patched Sac to use 0.25s frames and got it down to 96.971.880 Bytes (flac: 97.301.142).
It really seems to be a matter of efficiently encoding the noise via static block switching.

My idea is to use large frames (20s) for prediction and sub-frames for the residuals.

I presume that your "sub-frames" are just a part of the frame - closest thing in FLAC lingo is "partition" (FLAC uses "subframe" for an encoded channel, possibly an encoded mid or side).

So here goes, though you probably know most of it already. FLAC has no "partition header": the partitioning is encoded in the frame header (well, the "subframe" header, i.e. the frame-channel) as a 4-bit partition order r = 0 ... 15, signifying 2^r partitions, each with a single parameter stored: the 4-bit Rice exponent. [Details omitted about 15 and 5-bit.]
A partition is then blocksize/2^r samples long. The format constrains this to be an integer (and to exceed the predictor order); merely changing the block size from 4096 to 4095 enforces r=0 (and brings this file up to nearly 120 MB).
When a partition has been decoded, the next parameter takes over, with no further flags/headers needed.

In this particular file, most of the partitions are 2 samples long: it spends 4 bits just to tell how to encode the residuals of the upcoming two samples, and then it spends 4 bits again. 11 megabytes extra are spent just to encode those parameters - and it pays off on this file.

So Rice exponents change often in this file, and by quite a lot: exponents 1 through 9 account for < 5 percent of them. Thus if the format allowed the Rice exponents to be compressed further, one could have reduced the size. Not to mention, sparsity could then be exploited in a trade-off between the number of Rice parameters to actually employ and the penalty of choosing "wrong"; very often, the penalty of choosing "just one away from optimal" is very small.

But FLAC is of course designed for a light decoding footprint, leaving that playground to ... other codecs.
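To make the overhead concrete, here is an illustrative sketch (not libFLAC code) of the side information spent on Rice parameters: the parameter count grows as 2^r, so at r=11 a 4096-sample block pays 4 bits of parameter per 2 residual samples, and since 4095 is odd, no r > 0 divides it evenly.

Code:
// Illustrative sketch, not libFLAC code: side information spent on Rice
// parameters for one subframe, given the block size and partition order r.
// Returns -1 if the block size cannot be split evenly into 2^r partitions
// (which is why a block size of 4095 forces r = 0).
#include <cstdio>

int rice_param_bits(int blocksize, int r, int bits_per_param = 4) {
  int partitions = 1 << r;                     // 2^r partitions per subframe
  if (blocksize % partitions != 0) return -1;  // partition length must be an integer
  return partitions * bits_per_param;          // parameter bits for this subframe
}

int main() {
  for (int r = 0; r <= 11; ++r)
    std::printf("r=%2d: %5d parameter bits, %4d samples/partition\n",
                r, rice_param_bits(4096, r), 4096 >> r);
  std::printf("blocksize 4095, r=1: %d\n", rice_param_bits(4095, 1));  // -1: not allowed
}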


Re: Sac: state-of-the-art lossless audio compression

Reply #17
Thanks for the additional info. Yes, I mean "sub-frames".
Sac doesn't use Rice coding but bitplane coding. It is a lot more complicated than Rice coding:
https://github.com/slmdev/sac/blob/master/src/libsac/vle.cpp

Unfortunately this only pays off for files which are already compressible. I haven't touched the residual coding in a long time, but this file is a good motivation to do so :)
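For readers unfamiliar with the term, here is a toy sketch of the general bitplane idea; Sac's actual coder in vle.cpp is far more elaborate (adaptive context modelling plus arithmetic coding), so this only shows the basic decomposition.

Code:
// Toy sketch of the general bitplane idea only - not Sac's actual vle.cpp.
// Residuals are mapped to unsigned values (zig-zag), then split into planes
// from the most significant bit down; each plane would then be fed to an
// adaptive binary coder that models it in the context of the planes already sent.
#include <cstdint>
#include <vector>

uint32_t zigzag(int32_t v) {                       // signed -> unsigned mapping
  return (static_cast<uint32_t>(v) << 1) ^ static_cast<uint32_t>(v >> 31);
}

std::vector<std::vector<uint8_t>> bitplanes(const std::vector<int32_t>& res, int planes) {
  std::vector<std::vector<uint8_t>> out(planes, std::vector<uint8_t>(res.size()));
  for (size_t i = 0; i < res.size(); ++i) {
    uint32_t u = zigzag(res[i]);
    for (int p = 0; p < planes; ++p)
      out[p][i] = (u >> (planes - 1 - p)) & 1u;    // plane 0 = most significant
  }
  return out;
}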

Re: Sac: state-of-the-art lossless audio compression

Reply #18
Out of curiosity I tested this weird "1-Bit file" from
https://hydrogenaud.io/index.php/topic,124862.msg1038855.html#msg1038855

There seems to be a cosmetic bug in calculating the length of the file (possible overflow).
I checked that the Sac-decoded file is bit-perfect via file compare.

Code:
"C - I IV bVIIM IV Strummed 1-bit 882kHz.wav"

original              16.283.124 Bytes
ofr --preset max      13.613.463 Bytes
sac --sparse-pcm=0    10.455.952 Bytes
flac -8 -lax           3.758.114 Bytes
sac                    3.339.773 Bytes (~2min to compress)
sac --high             3.276.452 Bytes (~35min to compress)
paq8px                   897.230 Bytes (~15min to compress)

Re: Sac: state-of-the-art lossless audio compression

Reply #19
It makes no sense to try to compare two codecs that differ in operating speed by hundreds of times for a small gain in compression ratio. Normally, people cannot adjust parameters for each type of music to get better results. Moreover, FLAC can still surprise you even in its current state.

Re: Sac: state-of-the-art lossless audio compression

Reply #20
It makes no sense to try to compare two codecs that differ in operating speed by hundreds of times for a small gain in compression ratio.

Why not?
I personally don't care about speed in a research project.
It's "only" two times slower than MP4ALS while providing (much) better compression. It could be worse.

Doing speed optimizations is trivial. Improving compression after a certain point is not. And that's the fun part for me!

Nobody forces you to use anything higher than --normal.
But if you want to squeeze the last 1-2% out of your benchmark, you can spend 10h instead of 1min.
It's unlikely that the optimization is able to give more gains than that.

Sac is designed to be very flexible, catching a lot of unusual patterns (e.g. stereo modelling, long-range prediction, sparseness, etc.).
There are no "magic numbers" or "shift by 5 bits and multiply by 13" heuristics.
Its predictor design is simple and elegant (in my opinion), and it took me a lot of time to figure this out.

I wanted to counter the (partially) negative comments of Porcus by providing (newer) benchmarks for his weird samples.

If you spend 15 years on the problem of building a lossless audio codec with the best possible compression
while giving the sources away for free, it hurts to read things like

- "utterly useless"
- "Complete failure"
- "quite worthless without some engineering craft"
- "its --optimize ... is of no use"
- "It takes more than just a run-of-the-mill development corpus"
- "ultra-useless sac compressor"

How many people do you know who have achieved something like I have with Sac (in its specific niche)? Let me guess: none!

I left hydrogenaudio 10 years ago because there are a lot of stubborn people here - ungrateful for things others do in their spare time.

It should have been a warning that you chased away TBeck, and I (now) realize again that it was a mistake to come here.

Re: Sac: state-of-the-art lossless audio compression

Reply #21
Calm down. You yourself describe it as "a research project aiming to find the limits of compression" - that is a fact-finding mission, not a usefulness-finding mission.
The readme used to describe it as "throw a lot of muscles at the problem and archive only little gains", and if you want to approach the limits of what is even achievable, then usefulness is and must be expendable.
Don't whine over your own scope. And don't compare it to TAK, which is universally admired for being something completely different.

It doesn't look very good to quote so far out of context, when that very context is what justifies that Sac is even invoked. Sac is to be taken into compression tests either as a benchmark (that's why people like myself have called for it in comparisons) or as a kind of sanity check; when codec x does exceptionally well, fire up a very heavy one to see what happens. Guess what, quite often there is no value in throwing a lot more muscles at it unless you catch that particular feature of the signal.
And ordinary users should be warned it was not invoked for being useful. Apparently, my computer encodes sac --veryhigh in about 0.01x realtime per thread, which does limit how often it can be included even as a benchmark.

Re: Sac: state-of-the-art lossless audio compression

Reply #22
I left hydrogenaudio 10 years ago because there are a lot of stubborn people here - ungrateful for things others do in their spare time.

It should have been a warning that you chased away TBeck, and I (now) realize again that it was a mistake to come here.
I understand where you’re coming from, but the comments of a vocal few shouldn’t cast aspersions on everyone. It seems like the anonymity of the Internet facilitates bad behavior like this, and who knows, you might find out some of these people making comments are teenagers. What’s the expression, “haters gonna hate”?

Here are three recent query threads on the WavPack forum where I spent a decent amount of time to provide what I thought were useful answers, only to be rewarded with crickets. In fairness, one thanked me in advance right in the query, but still.

Thread #1.
Thread #2.
Thread #3.

Others are truly appreciative and they make it worthwhile. I haven’t commented on your stuff, but I have been aware of it for many years (encode.su) and have been following along, and I’m sure there are many others silently doing the same thing. Cheers!

Re: Sac: state-of-the-art lossless audio compression

Reply #23
Don't whine over your own scope.

Maybe it's a language problem - instead of "useless", maybe describe it as "impractical".
I remember having an --insane mode for SAC1 (~2012) which was slow as hell on my computer - taking 12h for 4min.
When I run it now, I laugh because it is so fast in comparison. The future will tell.

Quote
And don't compare it to TAK, which is universally admired for being something completely different.

I don't care about any (niche) software which is closed-source, be it TAK, HALAC, OFR, LA or whatnot. I was referring to those cringeworthy discussions.

Quote
It doesn't look very good to quote so far out of context, when that very context is what justifies that Sac is even invoked.

So I state "I have the best lossless audio codec" and you answer with "nah, on this weird sample it got beaten by run-length encoding".

At least provide version numbers when you publish tests. I have no idea what you tested.
There is at least some SACv0.0.6a4 floating around which is worse than OFR. There was a version on github for quite some time, which was not working at all, etc.

Quote
Guess what, quite often there is no value in throwing a lot more muscles at it unless you catch that particular feature of the signal.

That is something you can figure out in 5 minutes by checking the provided readme and looking at the benchmarks.
The optimization modes are not designed to find particular features of a signal, but to make the prediction more "optimal".
They don't offer more than 1-2% gains and cost 100 times more computing power.

If you find a better way to achieve such compression, let me know!

I appreciate your (testing) efforts nonetheless.

That said, I don't think it is useful to test compression by benchmarking whole albums.
I would be rather surprised if a codec that is the best on a 1-minute sample of a particular album is suddenly far behind on a 30-minute sample.

And while I am happy with Sac in its current state (as a benchmark tool for small test samples < 20s),
I will include processing of multiple frames at once in one of the next releases :)
So Sac --normal will run at ~5x realtime on a standard laptop.


Quote
Comments of a vocal few shouldn’t cast aspersions on everyone

you are right

Quote
I haven’t commented on your stuff, but I have been aware of it for many years (encode.su) and have been following along, and I’m sure there are many others silently doing the same thing. Cheers!

I know of some - always pushing for playback plugins :D - and I appreciate your comment.

Cheers

Re: Sac: state-of-the-art lossless audio compression

Reply #24
Maybe it's a language problem - instead of "useless", maybe describe it as "impractical".
I remember having an --insane mode for SAC1 (~2012) which was slow as hell on my computer - taking 12h for 4min.
When I run it now, I laugh because it is so fast in comparison. The future will tell.
Unfortunately, the speed increase in processors is not what it used to be. Core speeds do not change much; core counts increase. And there are only minor speed increases from other things. Intel's situation is a serious symptom of this.

I don't care about any (niche) software which is closed-source, be it TAK, HALAC, OFR, LA or whatnot. I was referring to those cringeworthy discussions.
I don't think that the very successful closed source codecs above are very interested in SAC either.

If speed is not an issue, as with SAC, paq8px is now a better option. It gives better results. It is also an open-source project. I think that if you supported it, even better work could be done.