Tested: compressing silence

I wonder if I should have made just a single topic about compressing "uncommon" signals, but heck: here goes silence.
It is known that FLAC/WavPack/TAK/OptimFROG detect and exploit wasted bits: say, if a 16-bit signal is padded to 24 bits, they can tell that the last 8 are "wasted" and compress "as 16" without size penalty.
ALAC/Monkey's/TTA cannot. I consider this a design flaw in a heavy compressor like Monkey's - not so much in ALAC, which never had the ambition of scoring high at compression.
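
A minimal sketch of what "wasted bits" detection amounts to, in Python. It is not taken from any of the codecs named above; the function name `wasted_bits` and the sample values are made up for illustration. The idea is simply to OR all samples of a block together and count the low-order bits that are zero in every one of them.

```python
def wasted_bits(samples):
    """Return how many low-order bits are zero in every sample of a block.

    Returns None for an all-zero block (digital silence), where there are
    no effective bits at all and a codec can just flag the block as constant.
    """
    combined = 0
    for s in samples:
        combined |= s          # a bit stays 0 only if it is 0 in every sample
    if combined == 0:
        return None
    count = 0
    while combined & 1 == 0:   # count trailing zero bits
        combined >>= 1
        count += 1
    return count


# Illustrative 16-bit samples padded to 24 bits (shifted left by 8).
padded = [s << 8 for s in (1234, -567, 42, -32768)]
print(wasted_bits(padded))     # the encoder could shift these down and code "as 16"
print(wasted_bits([0] * 4096)) # pure digital silence
```

A codec with this feature stores the shift count once per block and codes the shifted-down samples; a codec without it has to model all 24 bits.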

Question is, does this make all the difference when compressing something that is zero effective bits?
And - for the next post - does it matter a lot for 2 second gaps in CD images?

But to the first test: While scratching my head over multichannel performance, I created a 96/24 5.1 file of 25 minutes of silence. Results (TTA to be explained later):

Blue text: working in an NTFS compressed folder. The .wav file takes up 4 KB of actual disk space. The wav-in.zip quoted as 73 KB takes up 8 KB.
* ALAC isn't happy about this, obviously. So not only does it fail to accommodate wasted bits (i.e. when compressing 16-padded-to-24 it cannot "compress as 16" - FLAC/TAK/WavPack can) - it also spends quite a few bits compressing these 24 zeroes.
* Monkey's isn't happy either, likely for the same reason. Not unexpectedly, NTFS compression gets the "insane" ape much further down; apparently it uses bigger blocks, and thus fewer repeating patterns.
* Then TTA is actually a surprise. Like ALAC and Monkey's, it lacks the wasted bits feature, which explains why it is a couple of orders of magnitude above flac -8. But the fact that it is nevertheless so much better than Monkey's suggests a design flaw in the latter. (Not to mention in ALAC, but ALAC was never designed to be a high-compression codec anyway!)
* The two WavPack files are of identical size. Probably just block headers and something saying "silence".
* FLAC differences and TAK differences are apparently due to differences in block sizes, hence differences in number of blocks. The TAK difference looks formidable viewed as a 4x, but note that the

* And then there are the two remaining TTA entries: the 0 KB and the second-to-bottom entry.
They exhibit a flaw in the encoder: it presumably interprets the length as a signed rather than unsigned 32-bit integer, and so at 2 GB size its default operation refuses the .wav file. Forcing it to take in a signal of unknown length with the -b switch makes it first write a temporary raw PCM file to get the length, and only then does it start compressing.
That means that if you have, say, a spinning-drive NAS with a not-so-fast connection, you will have to wait quite a while before TTA even starts the encode job.
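
For reference, a quick check of how big the test file's audio data is compared with the 32-bit integer limits. The format parameters come from the test description (96 kHz, 24 bits, 6 channels, 25 minutes); whether a 32-bit length field is really what trips the TTA encoder is my assumption, not something verified against its source.

```python
SAMPLE_RATE = 96_000      # Hz
BYTES_PER_SAMPLE = 3      # 24 bits
CHANNELS = 6              # 5.1
SECONDS = 25 * 60

data_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * SECONDS
samples_per_channel = SAMPLE_RATE * SECONDS

INT32_MAX = 2**31 - 1     # largest signed 32-bit value, just under 2 GiB
UINT32_MAX = 2**32 - 1    # largest unsigned 32-bit value, just under 4 GiB

print(f"PCM data: {data_bytes:,} bytes")
print("fits signed 32-bit:  ", data_bytes <= INT32_MAX)
print("fits unsigned 32-bit:", data_bytes <= UINT32_MAX)
print(f"samples per channel: {samples_per_channel:,}")
```

The data is 2,592,000,000 bytes: too big for a signed 32-bit byte count, but well within an unsigned one. The per-channel sample count is nowhere near either limit.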

Re: Tested: compressing silence

Reply #1
Test number 2: Injecting 2 seconds of silence between tracks in a live recording.
I took one live album and inserted 2 seconds of silence before/after each song.
* Live recording because: then sudden silence is "a discontinuity". Had it been a fade out, it might already have been silence or near-silence there. This makes the test arguably less realistic, but it is a clear cut: this is silence, it is distinct from the material, what difference does it make?
* The album doesn't have a track boundary inside music, but in the talk and crowd noise between the songs. So not as abrupt as full-volume music being suddenly muted. I selected YOB: Live at Roadburn 2010, https://open.spotify.com/album/6qhc2btBAoSX153IDKYK94 for no other good reason than my missing last week-end's Roadburn festival :-/ ... only four songs. The musical content shouldn't matter much if anything.
* Encoded two full images: the entire album as is, and one with 2 seconds of silence before track 1, between 1/2, 2/3 and 3/4, and after 4.

The 10 seconds make for around 1723 KiB difference uncompressed. How do the codecs respond to that?
* Conjecture before the experiment: larger block size should inhibit the codecs' ability to compress away those extra seconds - because a larger block spanning the track boundary would then have to fit the predictor to both the recording and the silence.
* Result: True for Monkey's (badly!) and TTA (which has a block size of one second). For the rest, at most marginally noticeable.
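
The 1723 KiB figure follows from the CD audio format (44.1 kHz, 16 bits, stereo). A small check, with variable names of my own choosing:

```python
SAMPLE_RATE = 44_100      # Hz, CDDA
BYTES_PER_SAMPLE = 2      # 16 bits
CHANNELS = 2              # stereo
GAPS = 5                  # before track 1, between 1/2, 2/3, 3/4, after 4
GAP_SECONDS = 2

extra_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * GAPS * GAP_SECONDS
extra_kib = extra_bytes / 1024

print(f"{extra_bytes:,} bytes = {extra_kib:.1f} KiB")
```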

* 1723 KiB difference for WAV. That is for 10 seconds in an hour.
* 1771 difference for Monkey's Insane. That is right, the 5 pieces of 2 seconds silence make more difference to Monkey's Insane than storing those 10 seconds uncompressed. This is likely due to Insane mode's long blocks: two seconds of silence simply doesn't fit the rest of the signal in the block and has to be treated as a residual.
Apart from that, this is music where Monkey's does a good job: but the "silence gaps" penalty makes a bigger difference than the difference between Monkey's Extra High and Monkey's Insane.
* 189 for Monkey's Extra High. 110 for High.
* 107 for TTA, which has a block size of one second. All this makes sense: a downside to long blocks.
* It also makes sense that OptimFROG --preset 10 compresses this extra silence "worse" than --preset 2 does; but the difference is quite small: 24 KiB vs 10 KiB. I guess this is OptimFROG's general willingness to pay (in terms of complexity and CPU usage) to compress away patterns. And it succeeds at that.
* ALAC compressed silence the worst, but has a small block size: the difference is 19 KiB, equivalent to about 15 kbit/s. (Compress 10 seconds standalone, and it makes 3.
* Then there are some unexpected results, but things like these happen: WavPack -hx --blocksize=131072 makes for a 50 KiB smaller file when the silence is added (while -hx goes the opposite way). Also FLAC -5 is marginally smaller.
* Apart from that, results aren't that far from ALAC's 3/sec on standalone-CDDA-silence. We are talking not too many kilobytes for a full album.
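
To put the size differences in the list above on a common scale, they can be converted to the bitrate effectively spent on the 10 added seconds. A small sketch; the helper name `kbit_per_s` is mine, and the KiB figures are the ones quoted in this post.

```python
def kbit_per_s(extra_kib, seconds):
    """Bitrate (in kbit/s, 1 kbit = 1000 bits) implied by a size difference in KiB."""
    return extra_kib * 1024 * 8 / seconds / 1000


GAP_SECONDS = 10
CDDA_KBIT_PER_S = 44_100 * 2 * 16 / 1000   # uncompressed CD audio

extra_kib = {
    "WAV": 1723,
    "Monkey's Insane": 1771,
    "Monkey's Extra High": 189,
    "Monkey's High": 110,
    "TTA": 107,
    "OptimFROG --preset 10": 24,
    "ALAC": 19,
    "OptimFROG --preset 2": 10,
}

for name, kib in extra_kib.items():
    print(f"{name:24s} {kbit_per_s(kib, GAP_SECONDS):8.1f} kbit/s")
```

Monkey's Insane comes out above the uncompressed CD bitrate for those 10 seconds, which is the point made above; ALAC's 19 KiB lands between 15 and 16 kbit/s.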



As of 2022 it is a bit futile to argue against Monkey's Audio: it has always done its own thing, and - since Mr. Ashland has made up his mind not to break compatibility for new features - a Monkey's Insane file is the same as in 3.99.
But given how Monkey's once was the codec of choice for those who would rip a CD to an image file as small as possible, even if it occasionally couldn't be played back on a portable player, it is a bit strange that it omitted wasted bits and compresses silence so badly. With shorter tracks, e.g. one gap per 200 seconds rather than one per 12 minutes, a percent of the CD could be compressed badly, for a penalty of half a percentage point. I have a hunch that people who chose Insane fifteen years ago would have been jolly happy with much less than half a point of compression improvement from taking into account something that is supposed to take place on audio CDs.
Even on this CD, five gaps outweigh the difference between Extra High and Insane.

Re: Tested: compressing silence

Reply #2
Quote: "Question is, does this make all the difference when compressing something that is zero effective bits?
And - for the next post - does it matter a lot for 2 second gaps in CD images?"

I didn't think that the 2-second gap was actually silent audio - isn't it just part of the format (i.e., not recorded in the audio)?

Re: Tested: compressing silence

Reply #3
Quote: "isn't it just part of the format (ie, not recorded in the audio?)"

It can be part of the original recording or simply added in as part of the audio. The gaps on CDs are still audio, as they don't have to be silent. An artist can space the tracks on an album however they want, including hiding tracks behind long periods of silence to make you think the CD has ended.

Re: Tested: compressing silence

Reply #4
Twenty years ago, the net was full of "how to burn gapless?" - several CD authoring applications would inject two seconds of digital silence as INDEX 00 before each track.
So with many CDs actually having digital silence between tracks - and many index 00's being two seconds - well, I guess that considering this for compression of CD images amounts to at least "a decent shot at something plausibly relevant".


Now index 00 doesn't have to be two seconds (I am not sure if that was ever even part of the spec?) and it doesn't have to be digital silence - I mean I took a concert album that wasn't!
And moreover, the market had largely ditched "index points" in favour of "tracks for everything" already in the eighties before digital audio workstations became ubiquitous, so even many "tracks based" albums (disregarding those with continuous music!) would be mastered for CD by inserting index point 00 in what is "digitized analog tape hiss".
Sure. Valid objection to conclusions about average impact.

Re: Tested: compressing silence

Reply #5
I believe the main reason that flac/tak/wavpack don't have smaller results is that they all encode the sample or frame number into the frame. This means that the header differs from frame to frame in a compressible way but, more importantly for a compressed filesystem, the CRC differs in an incompressible way. Because they all have wasted bits or "constant" encoding as an efficient way of encoding silence, this test boils down to benchmarking the efficiency of the overhead: tak wins because it uses larger blocks, and wavpack loses because its overhead is relatively high (but possibly doesn't lose too much, because it can also use larger blocks).
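
A toy illustration of the frame number / CRC point, in Python. Everything here is made up for the demonstration: the 8-byte "silent block" payload, the frame layout and the frame count are not any real codec's format, and zlib is only a stand-in for whatever a compressed filesystem does. It just shows that identical frames compress to almost nothing, while frames carrying a running counter plus a checksum over it do not.

```python
import struct
import zlib

PAYLOAD = b"\x00" * 8          # hypothetical stand-in for "this block is silent"


def frames_plain(n):
    """n identical frames: no counter, no checksum."""
    return PAYLOAD * n


def frames_numbered(n):
    """n frames, each with a 32-bit frame number and a CRC-32 over the frame."""
    out = []
    for i in range(n):
        body = struct.pack(">I", i) + PAYLOAD
        out.append(body + struct.pack(">I", zlib.crc32(body)))
    return b"".join(out)


n = 10_000
plain = zlib.compress(frames_plain(n), 9)
numbered = zlib.compress(frames_numbered(n), 9)

print("identical frames:     ", len(plain), "bytes after zlib")
print("numbered + CRC frames:", len(numbered), "bytes after zlib")
```

The counter changes in a regular way, but each CRC looks random to a general-purpose compressor, so a few bytes per frame survive no matter what. With that, the number of frames (i.e. the block size) decides the outcome.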

Re: Tested: compressing silence

Reply #6
Mono silence behaves way differently from multichannel silence.
Or rather, the other way around, in terms of order of development.
Apologies to Monkey's in particular: it came out much worse than on your typical signal. I didn't realize back then that channel count affects codec performance so wildly, and the first test wasn't representative of anything but multi-channel. For example, Monkey's doesn't do as well on multi-channel as on stereo - generally. Same goes for silence.

CD-long mono silence, 44.1/16, no tags to distort the measurement.
Percent of .wav, ordered by size and rounded off to something semi-sane.

% of .wav   codec   setting / comment
0.0008%     ape     Insane; that is 8 ppm = 8 bytes per megabyte
0.0023%     ofr     could get the frog down here with --optimize high, but not with --preset max
0.003%      ape     Extra high
0.003%      ofr     default --preset 2; max and 6 and 9 tested, slightly worse, 0.003 to 0.004
0.011%      flac    at max block size of 65535
0.012%      ape     fast, normal, high
0.016%      ofr     fastest, --preset 0
0.024%      wv      biggest possible block size
0.04%       flac    block size 16384; produces ~half size of 8192 and quarter size of default 4096
0.07%       tak     highest, -p4m
0.14%       wv      default block size
0.14%       tak     default, -p2
0.16%       flac    default block size
0.19%       tak     fastest, -p0
0.29%       alac    refalac
0.39%       alac    ffmpeg
0.58%       flac    -0, it has block size 1152
>6%         tta     (I kid you not)

It seems that most codecs depend pretty much on block size. Sort of strange that a codec like OptimFROG doesn't detect silence and utilize it straight away.
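
The block size dependence can be read straight off the FLAC rows: for 16-bit mono, a block of N samples is 2N bytes of .wav, so the percentage times 2N gives the bytes spent per block of silence. A rough sketch using the rounded percentages from the table (so the results are approximate); the names are my own.

```python
BYTES_PER_SAMPLE = 2   # 16-bit mono, as in the test above

# (block size in samples, percent of .wav) for the FLAC rows in the table
flac_rows = [
    (1152, 0.58),      # -0
    (4096, 0.16),      # default block size
    (16384, 0.04),
    (65535, 0.011),    # max block size
]


def bytes_per_block(block_size, percent_of_wav):
    """Compressed bytes per block implied by a size ratio on pure silence."""
    wav_bytes_per_block = block_size * BYTES_PER_SAMPLE
    return wav_bytes_per_block * percent_of_wav / 100


implied = {bs: bytes_per_block(bs, pct) for bs, pct in flac_rows}
for bs, b in implied.items():
    print(f"block size {bs:6d}: about {b:.1f} bytes per block")
```

All four rows come out at roughly 13 to 14 bytes per block regardless of block size, which fits the explanation in Reply #5: on silence the file is nearly all per-block overhead, so the size scales with the number of blocks.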