Recent Posts
1
FLAC / Re: FLAC v1.4.x Performance Tests
Last post by Porcus -
8192 beats 16384 size-weighted in my tests on upsampling those 38 CDs, but it varies quite a lot. Medians could very well tell a different story. I messed up something and it is still running, but I can report on those two at least.

Using -A "subdivide_tukey(3);blackman" on everything, then
** In overall size:
At 96 kHz, -b8192 beats -b16384, and -eb8192 beats -eb16384.
At 192 kHz, same happens.
At 384 kHz, -b8192 beats -b16384 by around 0.12 percent, but -eb16384 beats -eb8192 by around 0.18 percent.

192 kHz, let's look into that further: No -e here.
* Classical music benefits from larger blocksize -b16384, 12 albums to 2; all except harpsichord and (near-zero) Cage's percussion works. Total impact 0.32 percent (not percentage points!), varying from -0.15 (harpsichord) to 0.63 percent (Bruckner, vocals)
Median impact = 0.37 = median absolute value impact.
But then the rest:
* The heavier music: -b8192 wins by 7 albums against 3, switching sign on impact to signify that:
Total impact -0.14 percent, varying from -0.71 (Laibach, biggest benefit for -b8192) to 0.24 percent (Gojira, that benefits from 16384).
Median impact = -0.24. Remove the sign for median absolute value impact.
* The others. -b8192 wins by 9 albums against 5
Total impact -0.28 percent, max benefit from -b8192 is -1.31 percent (Wovenhand, in this release that is singer/songwriter) and then -0.99 (Sopor Aeternus, that is something completely different: darkwave) - and on the other end, benefiting most from larger blocksizes are the jazz albums: 0.41 percent for both Davis and Johansson. Those were near-mono before dithering I think.
Median impact = -0.32 percent. Median absolute impact: 0.38.
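
To illustrate why the size-weighted total and the median can point in different directions, here is a minimal sketch. The album sizes are made up for illustration and are not taken from the test above. Dividing by the -b8192 size is also only an assumption about the sign convention (positive means -b16384 wins).

```python
from statistics import median

# Hypothetical per-album sizes in bytes: (size with -b8192, size with -b16384).
# These numbers are invented for illustration only.
albums = {
    "album A": (1_000_000, 990_000),
    "album B": (4_000_000, 4_020_000),
    "album C": (500_000, 497_000),
}

def impact_percent(size_8192, size_16384):
    """Percent saved by -b16384 relative to -b8192; positive means -b16384 wins.
    The choice of denominator is an assumption, not the poster's stated method."""
    return 100.0 * (size_8192 - size_16384) / size_8192

# Per-album impact, each album counted equally.
per_album = [impact_percent(a, b) for a, b in albums.values()]

# Size-weighted impact: compare the summed sizes, so big albums dominate.
total_8192 = sum(a for a, _ in albums.values())
total_16384 = sum(b for _, b in albums.values())
total_impact = impact_percent(total_8192, total_16384)

median_impact = median(per_album)
median_abs_impact = median(abs(x) for x in per_album)

print(f"total impact:        {total_impact:+.2f} percent")
print(f"median impact:       {median_impact:+.2f} percent")
print(f"median abs. impact:  {median_abs_impact:.2f} percent")
```

In this made-up set, one large album that prefers -b8192 pulls the total negative, while two of the three albums (and hence the median) prefer -b16384.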


For those who did not follow the previous discussions, I am talking about optimizations for >=4x upsampled data, so the parameters listed above are not suitable for encoding "real" hi-res files.

... but who knows how many hi-res files are "real".
3
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by C.R.Helmrich -
Alright then, in that case I'll assume they did not use dither, just simple rounding. Then, many speech pauses will turn into digital silence, making FLAC compression to ≈30% of the WAV size possible, I guess.

And the fact that LZMA2 compresses a bit better than FLAC can probably be explained by the frame-wise header overhead in FLAC, which becomes significant with file sizes as small as those you reported.

Chris

P.S.: And compared with the input media (audio, color images), none of what they do is actually lossless then. Still wondering about the 16.4%, though.
4
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by Kamedo2 -
This is the test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac file, converted to 8bit using sox. 76,195 bytes.

"2414-128292-0009 FOR WHEN ZARATHUSTRA SCRUTINISED HIM WITH HIS GLANCE HE WAS FRIGHTENED AS BY A SUDDEN APPARITION SO SLENDER SWARTHY HOLLOW AND WORN OUT DID THIS FOLLOWER APPEAR"

Original, distributed version:
Quote
General
Complete name                  : C:\~~~~~~~\test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 235 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 146 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 141 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 16 bits
Compression mode               : Lossless
Stream size                    : 227 KiB (97%)
Writing library                : libFLAC 1.2.1 (UTC 2007-09-17)
MD5 of the unencoded content   : C6C1AF5F80BB643A4172F406186017E7

16 bit original turned into 8 bit by sox.
Quote
General
Complete name                  : C:\~~~~\librispeech-8bit.flac
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
File size                      : 74.4 KiB
Duration                       : 13 s 125 ms
Overall bit rate mode          : Variable
Overall bit rate               : 46.4 kb/s

Audio
Format                         : FLAC
Format/Info                    : Free Lossless Audio Codec
Duration                       : 13 s 125 ms
Bit rate mode                  : Variable
Bit rate                       : 41.4 kb/s
Channel(s)                     : 1 channel
Channel layout                 : C
Sampling rate                  : 16.0 kHz
Bit depth                      : 8 bits
Compression mode               : Lossless
Stream size                    : 66.3 KiB (89%)
Writing library                : libFLAC 1.4.3 (UTC 2023-06-23)
MD5 of the unencoded content   : 17D503AC844BB657FC6BE24F755B9BE7
5
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by Kamedo2 -
I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample?
It is a normal bit-depth reduction command, not a command to split one file into two. As expected for 8-bit audio, it has a lower S/N. It is dithered throughout.

How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

The MSB is easier to predict than the LSB. The dither is audible throughout the tracks. There is no section of digital silence; the whole tracks are dithered, even in the gaps between speech.
6
FLAC / Re: FLAC v1.4.x Performance Tests
Last post by bennetng -
So I tried it. The total duration of the two attached playlists is 41h 39m 06s. The playlists contain single tracks and images, so not every single track name is shown.

-8
38.1 GB (41002782313 bytes)

-8b8192 -A "subdivide_tukey(3);blackman"
37.7 GB (40507157807 bytes)

5 files out of 306 still clipped with -2dB gain. Yes, if one keeps feeding the chain with Merzbow that requires something like -10dB, it would harm the stats, so don't do that. Clipping should be kept as low as possible, or avoided entirely (e.g. by using the RG max non-clip gain).

For simplicity and speed, I did all the conversions within foobar and foo_dsp_resampler.
[attach type=image]27188[/attach]
[attach type=image]27190[/attach]
Same corpus with same settings transcoded from pre-upsampled flac files (16/44 -> 24/192 with -2dB preamp), using foobar multi-file multi-thread, flac 1.4.3. Decoding times are single-threaded using foobar x64 benchmark, all tested on NVMe SSD, i3-12100.

-8p
Total encoding time: 1:49:49.297, 22.75x realtime
40794570053 bytes

-8b8192 -A "subdivide_tukey(3);gauss(22e-2)"
Total encoding time: 19:30.984, 128.05x realtime
40502851907 bytes

-8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:16.360, 117.47x realtime
40419431443 bytes

-8b9216 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:18.485, 117.28x realtime
40406946503 bytes

-8b16384 -A "subdivide_tukey(3);blackman;gauss(22e-2)"
Total encoding time: 21:33.359, 115.93x realtime
40404848546 bytes
Decoding: 240.030x realtime

-8e
Total encoding time: 1:07:58.406, 36.76x realtime
40330004928 bytes
Decoding: 241.866x realtime

-8b8192 -A "subdivide_tukey(4);blackman;gauss(22e-2)"
Total encoding time: 30:46.781, 81.19x realtime
40053310660 bytes
Decoding: 240.803x realtime

From the data above I think one can deduce how slow -8pe on this CPU would be, so I am not going to test it.
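
For anyone who wants to redo the arithmetic, here is a small sketch that recomputes the realtime factor and the size saving relative to -8p from the figures listed above. It assumes that the 41h 39m 06s corpus duration applies to these runs as well. The helper names are my own.

```python
def parse_time(text):
    """Convert 'h:mm:ss.fff' or 'mm:ss.fff' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds

# Assumption: the corpus duration quoted earlier applies to these runs too.
CORPUS_SECONDS = parse_time("41:39:06")

# (setting, total encoding time, encoded bytes) copied from the list above.
runs = [
    ("-8p", "1:49:49.297", 40794570053),
    ('-8b8192 -A "subdivide_tukey(3);gauss(22e-2)"', "19:30.984", 40502851907),
    ('-8b8192 -A "subdivide_tukey(3);blackman;gauss(22e-2)"', "21:16.360", 40419431443),
    ('-8b9216 -A "subdivide_tukey(3);blackman;gauss(22e-2)"', "21:18.485", 40406946503),
    ('-8b16384 -A "subdivide_tukey(3);blackman;gauss(22e-2)"', "21:33.359", 40404848546),
    ("-8e", "1:07:58.406", 40330004928),
    ('-8b8192 -A "subdivide_tukey(4);blackman;gauss(22e-2)"', "30:46.781", 40053310660),
]

def summarize(runs, baseline_bytes):
    """Return (setting, realtime factor, percent saved versus the baseline size)."""
    rows = []
    for setting, time_text, size in runs:
        speed = CORPUS_SECONDS / parse_time(time_text)
        saving = 100.0 * (baseline_bytes - size) / baseline_bytes
        rows.append((setting, speed, saving))
    return rows

results = summarize(runs, baseline_bytes=runs[0][2])
for setting, speed, saving in results:
    print(f"{speed:7.2f}x realtime  {saving:5.2f}% smaller than -8p  {setting}")
```

The recomputed speeds can differ from the listed ones in the last digit, since the duration is only given to the second and the listed figures may be truncated rather than rounded.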

gausswork
Gauss, like other windows, works best when coupled with an optimal blocksize. Also, the gauss parameter needs to be calculated for best performance. For example, the target 192kHz sample rate has a Nyquist of 96kHz. If the resampler is lowpassed at about 23kHz like the red plot below, then the value should be 23/96 ≈ 0.24, so it would be gauss(24e-2); the green and yellow plots would be 23e-2 and 25e-2.
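
The rule of thumb above (lowpass frequency divided by Nyquist) can be sketched as a tiny helper. The function name is hypothetical, and rounding to two decimals is my assumption based on the 24e-2 example. The range check reflects the 0 < STDDEV <= 0.5 limit that flac documents for gauss().

```python
def gauss_window_spec(lowpass_hz, sample_rate_hz):
    """Build a flac -A window string from the resampler lowpass frequency.

    Heuristic from the post: parameter = lowpass / Nyquist.
    Rounding to two decimals is an assumption for readability.
    """
    nyquist_hz = sample_rate_hz / 2
    stddev = lowpass_hz / nyquist_hz
    # flac documents the gauss parameter as 0 < STDDEV <= 0.5.
    if not 0 < stddev <= 0.5:
        raise ValueError(f"gauss parameter {stddev:.3f} is outside (0, 0.5]")
    return f"gauss({round(stddev * 100)}e-2)"

# 23 kHz lowpass at a 192 kHz target rate, as in the example above.
print(gauss_window_spec(23_000, 192_000))
```

The result would then be combined with the other windows, for example -A "subdivide_tukey(3);blackman;gauss(24e-2)".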

Blackman, on the other hand, describes a faster-than-expected overall decay trend in the upper spectrum without doing anything very specific, so it can catch a wider variety of filter shapes.

If the hardware does not support higher blocksizes, slightly increasing -l and -r would help a lot, like -l14 to -l16 and -r7. The increased decoding complexity should not be a big deal especially for mains-powered devices.

-b16384 can still cause inflation at 192kHz depending on the material, so use it with caution.

For those who did not follow the previous discussions, I am talking about optimizations for >=4x upsampled data, so the parameters listed above are not suitable for encoding "real" hi-res files.
7
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by C.R.Helmrich -
I'm not familiar with that Sox command-line, so let me ask: did you just recreate the WAV header such that one 16-bit sample is interpreted as two 8-bit samples, i.e., the high-byte as one sample and the low-byte as another sample? How can this be better compressible by FLAC than a meaningful unscrambled 16-bit waveform? Have you listened to the resulting files or looked at them in an audio editor? Do they contain long passages of digital silence?

Here come the times when for encoding you shouldn't consider only the size of encoded files, but executable and dictionary/model size too. Computing power and memory bandwidth requirements rose significantly too.
Exactly! That's common practice in both MPEG and 3GPP codec standardizations, btw, especially when the codec in question is supposed to run on a mobile device (so basically every codec nowadays).

Chris
8
General Audio / do you use Volume² ?
Last post by francesco -
Hi
I mostly use Windows, and Windows 10, 7, and even 11 have an ugly volume OSD.
Volume² is reported to be 100% portable, so it does write in the registry or files.
I would like to know if I can increase the volume and switch between WASAPI / ASIO or the primary sound driver.

May I ask if you use Volume²?
And what do you think about it?
thanks
9
Scientific Discussion / What is the pre-amp input window range for dynamic microphone voltages
Last post by Maxotics -
First, thanks for your time!
I'd like to stay away from decibel scale if at all possible; that is, stick to voltages.  Thanks!
Also, I'd like to ignore impedance
Also, I'd like to ignore frequency response, noise, etc.
All I want to focus on is the voltage (pressure) at any given point of time, let's say 48kHz sampling
(let me know if you think the above impossible or just plain dumb).

A dynamic microphone outputs, let's say, a variable voltage between -x and y.  I still can't figure out, in all my research, what this range is: -2 mV to 2 mV?  -1 mV to 1 mV?  0.1...?  I believe the number is SMALL! I understand impedance is involved, etc., but I want to simplify as much as possible to answer this question as broadly as possible (if possible  :D  ).  Perhaps if you have a specific mic in mind that would be helpful.  Anyway...

I assume when I raise or lower the analog gain on my ADC I cannot amplify EVERY voltage (change) input from the microphone equally

Because of all that stuff above we can't talk about  :'( , I must pick a range to amplify.  The min and max I effectively choose will determine how much noise or clipping I might experience.

Any answers would be helpful.  Again, all this is general.  Do most pre-amps take a window of 50% of the voltages, 60%, 90%?  That is, if we have the gain set at zero, I would assume we are taking X% of the voltages from the microphone's maximum output voltage (or close to it) down to 0.  If we're at max gain then we're starting as close to 0 as possible and working our way up.

(If we do use dB, and the mic has 60 dB of sensitivity and our audio interface has 120 dB of dynamic range--wait, how does that make sense?  We shouldn't need a gain knob at all?)

Or let me put the question another way: how much of a dynamic microphone's voltage can an ADC accurately amplify at any given time?

Sorry for the confusing question.
10
General Audio / Re: AI language models can exceed PNG and FLAC in lossless compression, says study
Last post by Kamedo2 -
LibriSpeech original samples | librispeech-8bit.wav bytes | librispeech-8bit.flac bytes | FLAC ratio | LZMA2 bytes | LZMA2 ratio
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0006.flac | 104,766 | 42,521 | 40.6% | 36,968 | 35.3%
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0007.flac | 69,164 | 31,835 | 46.0% | 24,999 | 36.1%
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0008.flac | 93,644 | 37,893 | 40.5% | 31,742 | 33.9%
test-other.tar\test-other\LibriSpeech\test-other\2414\128292\2414-128292-0009.flac | 210,044 | 76,195 | 36.3% | 71,192 | 33.9%
Code:
flac-1.4.3-x64-AVX2\flac.exe -d -f --force-legacy-wave-format in.original.flac -o flac-decode-16bit.wav
SoX-14.4.2-20230624-x64\sox.exe flac-decode-16bit.wav -r 16000 -c 1 -b 8 librispeech-8bit.wav
flac-1.4.3-x64-AVX2\flac.exe -f librispeech-8bit.wav -o librispeech-8bit.flac
7-Zip\7z.exe a librispeech-8bit.7z librispeech-8bit.wav

It's close, and LZMA2 is certainly smaller than the FLAC in this particular setting.
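
If someone wants to check the ratios or try this without 7-Zip, here is a rough sketch using Python's standard lzma module. Note the assumption: it writes an .xz stream (which also uses LZMA2) rather than a .7z archive, so the byte counts will not match 7z exactly. The helper names are hypothetical.

```python
import lzma

def ratio_percent(compressed_bytes, original_bytes):
    """Compressed size as a percentage of the original, to one decimal."""
    return round(100.0 * compressed_bytes / original_bytes, 1)

def xz_size(data):
    """Size of data after LZMA2 compression in an .xz container (not .7z)."""
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=9))

# (WAV bytes, FLAC bytes, LZMA2 bytes) copied from the table above,
# keyed by the last four digits of the sample name.
table = {
    "0006": (104766, 42521, 36968),
    "0007": (69164, 31835, 24999),
    "0008": (93644, 37893, 31742),
    "0009": (210044, 76195, 71192),
}

for name, (wav, flac, lzma2) in table.items():
    print(f"{name}: FLAC {ratio_percent(flac, wav)}%  LZMA2 {ratio_percent(lzma2, wav)}%")
```

To try it on a real file, something like xz_size(open("librispeech-8bit.wav", "rb").read()) should give a figure in the same ballpark as the 7z column, give or take container overhead.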