
Recent Posts
3
FLAC / Re: FLAC v1.4.x Performance Tests
Last post by Porcus -
8192 beats 16384 size-weighted in my tests on upsampling those 38 CDs, but it varies quite a lot. Medians could very well tell a different story. I messed up something and it is still running, but I can report on those two block sizes at least.

Using -A "subdivide_tukey(3);blackman" on everything, then
** In overall size:
At 96 kHz, -b8192 beats -b16384, and -eb8192 beats -eb16384.
At 192 kHz, same happens.
At 384 kHz, -b8192 beats -b16384 by around 0.12 percent, but -eb16384 beats -eb8192 by around 0.18 percent.

192 kHz, let's look into that further: No -e here.
* Classical music benefits from larger blocksize -b16384, 12 albums to 2; all except harpsichord and (near-zero) Cage's percussion works. Total impact 0.32 percent (not percentage points!), varying from -0.15 (harpsichord) to 0.63 percent (Bruckner, vocals)
Median impact = 0.37 = median absolute value impact.
But then the rest:
* The heavier music: -b8192 wins by 7 albums against 3, switching sign on impact to signify that:
Total impact -0.14 percent, varying from -0.71 (Laibach, biggest benefit for -b8192) to 0.24 percent (Gojira, that benefits from 16384).
Median impact = -0.24. Remove the sign for median absolute value impact.
* The others. -b8192 wins by 9 albums against 5
Total impact -0.28 percent, max benefit from -b8192 is -1.31 percent (Wovenhand, in this release that is singer/songwriter) and then -0.99 (Sopor Aeternus, that is something completely different: darkwave) - and on the other end, benefiting most from larger blocksizes are the jazz albums: 0.41 percent for both Davis and Johansson. Those were near-mono before dithering I think.
Median impact = -0.32 percent. Median absolute impact: 0.38.
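
For anyone wanting to reproduce this kind of summary, here is a minimal sketch of how the three figures (size-weighted total, median, median absolute impact) could be computed. The sign convention is an assumption chosen to match the numbers above (positive means -b16384 gives the smaller file), and the album sizes are made up for illustration, not taken from the test corpus.

```python
from statistics import median

def impact_percent(size_b8192: int, size_b16384: int) -> float:
    """Percent saved by -b16384 relative to -b8192 (negative: -b8192 is smaller)."""
    return 100.0 * (size_b8192 - size_b16384) / size_b8192

# Hypothetical (size with -b8192, size with -b16384) pairs in bytes, one per album.
albums = [(1_000_000, 996_000), (2_000_000, 2_010_000), (1_500_000, 1_497_000)]

per_album = [impact_percent(a, b) for a, b in albums]

# Size-weighted total: compare the summed sizes, so big albums count for more.
total_impact = impact_percent(sum(a for a, _ in albums), sum(b for _, b in albums))

median_impact = median(per_album)
median_abs_impact = median(abs(x) for x in per_album)

print(f"total {total_impact:+.2f}%, median {median_impact:+.2f}%, "
      f"median abs {median_abs_impact:.2f}%")
```

With these made-up sizes the size-weighted total is negative while the median is positive, which is the sort of disagreement between totals and medians mentioned at the top of the post.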


For those who did not follow the previous discussions, I am talking about optimizations for >=4x upsampled data, so the parameters listed above are not suitable for encoding "real" hi-res files.

... but who knows how many hi-res files are "real".
5
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by C.R.Helmrich -
Alright then, in that case I'll assume they did not use dither, just simple rounding. Then, many speech pauses will turn into digital silence, making FLAC compression to ≈30% of the WAV size possible, I guess.

And the fact that LZMA2 compresses a bit better than FLAC can probably be explained by the frame-wise header overhead in FLAC, which becomes significant with file sizes as small as those you reported.

Chris

P.S.: And compared with the input media (audio, color images), none of what they do is actually lossless then. Still wondering about the 16.4%, though.
6
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by Kamedo2 -
This is the test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac file, converted to 8bit using sox. 76,195 bytes.

"2414-128292-0009 FOR WHEN ZARATHUSTRA SCRUTINISED HIM WITH HIS GLANCE HE WAS FRIGHTENED AS BY A SUDDEN APPARITION SO SLENDER SWARTHY HOLLOW AND WORN OUT DID THIS FOLLOWER APPEAR"

Original, distributed version:
Quote
General
Complete name                  : C:\~~~~~~~\test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 235 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 146 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 141 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 16 bits
Compression mode               : Lossless
Stream size                    : 227 KiB (97%)
Writing library                : libFLAC 1.2.1 (UTC 2007-09-17)
MD5 of the unencoded content   : C6C1AF5F80BB643A4172F406186017E7

16 bit original turned into 8 bit by sox.
Quote
General
Complete name                  : C:\~~~~\librispeech-8bit.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 74.4 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 46.4 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 41.4 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 8 bits
Compression mode               : Lossless
Stream size                    : 66.3 KiB (89%)
Writing library                : libFLAC 1.4.3 (UTC 2023-06-23)
MD5 of the unencoded content   : 17D503AC844BB657FC6BE24F755B9BE7
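
As a cross-check of the two reports, the arithmetic can be sketched as below. It assumes MediaInfo's kb/s means 1000 bits per second and KiB means 1024 bytes. The helper names are hypothetical, and the audio bit rates are the rounded values copied from the reports.

```python
DURATION_S = 13.125     # "13 s 125 ms" in both reports
SAMPLE_RATE_HZ = 16_000 # mono

def overall_kbps(file_bytes: int) -> float:
    """Overall bit rate of a file in kb/s (1 kb = 1000 bits)."""
    return file_bytes * 8 / DURATION_S / 1000

def pcm_kbps(bit_depth: int) -> float:
    """Bit rate of uncompressed mono PCM at this sample rate."""
    return SAMPLE_RATE_HZ * bit_depth / 1000

# 8-bit file: 76,195 bytes as stated in the post.
overall_8bit = overall_kbps(76_195)
size_8bit_kib = 76_195 / 1024

# Audio-only bit rates from the reports, relative to raw PCM of the same depth.
ratio_16bit = 141 / pcm_kbps(16)   # 141 kb/s against 256 kb/s
ratio_8bit = 41.4 / pcm_kbps(8)    # 41.4 kb/s against 128 kb/s

print(f"8-bit file: {size_8bit_kib:.1f} KiB, {overall_8bit:.1f} kb/s overall")
print(f"FLAC audio / PCM: 16-bit {ratio_16bit:.0%}, 8-bit {ratio_8bit:.0%}")
```

The 8-bit stream comes out at roughly a third of its raw PCM size, while the 16-bit original is a bit over half.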
7
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by Kamedo2 -
I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample?
It is a normal bit-depth reduction command. It is not a command to split one file into two. As expected for 8-bit audio, it has a lower S/N. It is dithered throughout.

How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

The MSB is easier to predict than the LSB. The dither is audible throughout the tracks. There are no sections of digital silence; the whole tracks are dithered, even in the spaces between speech.
8
FLAC / Re: FLAC v1.4.x Performance Tests
Last post by bennetng -
So I tried it. The total duration of the two attached playlists is 41h 39m 06s. The playlists contain single tracks and images, so not every single track name is shown.

-8
38.1 GB (41002782313 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
37.7 GB (40507157807 bytes)

5 files out of 306 still clipped with -2dB gain. Yes, if one keeps throwing Merzbow into the chain, which requires something like -10dB, it would harm the stats, so don't do that. Clipping should be kept as low as possible, or avoided entirely (e.g. by using RG max non-clip gain).

For simplicity and speed, I did all the conversions within foobar and foo_dsp_resampler.
[attach type=image]27188[/attach]
[attach type=image]27190[/attach]
Same corpus with same settings transcoded from pre-upsampled flac files (16/44 -> 24/192 with -2dB preamp), using foobar multi-file multi-thread, flac 1.4.3. Decoding times are single-threaded using foobar x64 benchmark, all tested on NVMe SSD, i3-12100.

-8p
Total encoding time: 1:49:49.297, 22.75x realtime
40794570053 bytes

-8b8192 -A "subdivide_tukey(3);gauss(22e-2)"
Total encoding time: 19:30.984, 128.05x realtime
40502851907 bytes

-8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:16.360, 117.47x realtime
40419431443 bytes

-8b9216 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:18.485, 117.28x realtime
40406946503 bytes

-8b16384 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:33.359, 115.93x realtime
40404848546 bytes
Decoding: 240.030x realtime

-8e
Total encoding time: 1:07:58.406, 36.76x realtime
40330004928 bytes
Decoding: 241.866x realtime

-8b8192 -A "subdivide_tukey(4);blackman;gauss(22e-2)"
Total encoding time: 30:46.781, 81.19x realtime
40053310660 bytes
Decoding: 240.803x realtime

From the data above I think one can deduce how slow -8pe on this CPU would be, so I am not going to test it.
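
To make the comparison easier to read, here is a small sketch that expresses each result relative to -8p. The byte counts and speeds are copied from the list above; the choice of -8p as the reference and the helper name are just for illustration.

```python
# setting -> (total bytes, encoding speed in x realtime), copied from the results above
results = {
    '-8p': (40794570053, 22.75),
    '-8b8192 -A "subdivide_tukey(3);gauss(22e-2)"': (40502851907, 128.05),
    '-8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)"': (40419431443, 117.47),
    '-8b9216 -A "subdivide_tukey(3);blackman;gauss(22e-2)"': (40406946503, 117.28),
    '-8b16384 -A "subdivide_tukey(3);blackman;gauss(22e-2)"': (40404848546, 115.93),
    '-8e': (40330004928, 36.76),
    '-8b8192 -A "subdivide_tukey(4);blackman;gauss(22e-2)"': (40053310660, 81.19),
}

baseline_bytes, baseline_speed = results['-8p']

def saving_percent(size_bytes: int, reference_bytes: int = baseline_bytes) -> float:
    """Percent smaller than the reference encode."""
    return 100.0 * (reference_bytes - size_bytes) / reference_bytes

# Print from largest to smallest output.
for setting, (size_bytes, speed) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{setting:56s} {saving_percent(size_bytes):5.2f}% smaller, "
          f"{speed / baseline_speed:4.1f}x the speed of -8p")

smallest = min(results, key=lambda s: results[s][0])
```

This only reorders the numbers already posted; it does not say anything about settings that were not tested, such as -8pe.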

gausswork
Gauss, like other windows, works best when coupled with an optimal blocksize. Also, the gauss parameter needs to be calculated for best performance. For example, the target 192kHz sample rate has a Nyquist frequency of 96kHz; if the resampler is lowpassed at about 23kHz like the red plot below, then the value should be 23/96 ≈ 0.24, so it would be gauss(24e-2). The green and yellow plots would be 23e-2 and 25e-2.
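
The rule of thumb above (lowpass frequency divided by the Nyquist frequency of the target rate) can be written as a short sketch. The function names are hypothetical, and the `e-2` formatting simply mimics the notation used in this thread; the output is meant to be passed to flac's -A option.

```python
def gauss_stddev(lowpass_hz: float, target_rate_hz: float) -> float:
    """Lowpass frequency as a fraction of the target rate's Nyquist frequency."""
    nyquist_hz = target_rate_hz / 2
    return lowpass_hz / nyquist_hz

def gauss_spec(lowpass_hz: float, target_rate_hz: float) -> str:
    """Format the value, rounded to two decimals, as gauss(NNe-2)."""
    hundredths = round(gauss_stddev(lowpass_hz, target_rate_hz) * 100)
    return f"gauss({hundredths}e-2)"

# Example from the post: lowpass near 23 kHz, upsampled to 192 kHz.
print(gauss_spec(23_000, 192_000))
```

Treat the result as a starting point for testing rather than a guaranteed optimum, since the best value also depends on the blocksize and the material.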

Blackman, on the other hand, describes a faster-than-expected overall decay trend in the upper spectrum without doing anything very specific, so it can catch more different filter shapes.

If the hardware does not support higher blocksizes, slightly increasing -l and -r would help a lot, like -l14 to -l16 and -r7. The increased decoding complexity should not be a big deal especially for mains-powered devices.

-b16384 can still cause inflation at 192kHz depending on the material; use with caution.

For those who did not follow the previous discussions, I am talking about optimizations for >=4x upsampled data, so the parameters listed above are not suitable for encoding "real" hi-res files.
9
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by C.R.Helmrich -
I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample? How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

Here come the times when for encoding you shouldn't consider only the size of encoded files, but executable and dictionary/model size too. Computing power and memory bandwidth requirements rose significantly too.
Exactly! That's common practice in both MPEG and 3GPP codec standardizations, btw, especially when the codec in question is supposed to run on a mobile device (so basically every codec nowadays).

Chris
10
General Audio / do you use Volume² ?
Last post by francesco -
Hi
Since I mostly use Windows, and Windows 10, 7, and even 11 have an ugly volume OSD,
reported to be 100% portable , so it does write in the registry or files
I would like to know if I can increase the volume and switch between WASAPI / ASIO or the primary sound driver

May I ask if you use Volume²?
And what do you think about it?
thanks