Topic: lossyWAV Development

lossyWAV Development

Reply #175
This is an interesting thread that I keep coming back to whenever I can spare the time, and I wish you all well. This sounds promising. It's likely that predictor-based lossy coders (like WavPack lossy, as I'm sure Bryant is thinking) could use the same sort of analysis for a safe VBR lossy mode, with the additional advantage of setting the amount of permissible prediction error to match the noise floor relevant to that instant, regardless of the bit depth and block length in use.

It's interesting to see how one must look for the true noise floor and ignore the small chance troughs in the power spectrum that tend to vanish if the transform window is shifted slightly, as 2Bdecided pointed out. It's also great to see the problem-solving at work, such as Bryant's recognition of the unusually low frequency of the quietest frequency bin during the problem moments in Atem-lied.

It occurs to me that there might be ways to optimise the computation of multiple overlapping FFTs (or any roughly equivalent transform) to set the noise floor more accurately without rogue troughs. However, I can't get past the fact that one has to pre-multiply the samples in each analysis segment by a windowing function centred on that segment, which makes it difficult to efficiently re-use partial results when calculating an overlapping FFT without compromising the smoothness of the window. So I guess the averaging solution, while more temporally spread out and needing skewing adjustments to make Atem-lied unABXable, is the most computationally viable option.
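For illustration only, the overlapping windowed analysis being discussed might be sketched like this (a toy in Python, not lossyWAV's code); note how every segment must be re-windowed from scratch, which is exactly why partial FFT results can't be shared between overlapping positions:

```python
import numpy as np

def overlapped_spectra(samples, fft_length=1024, overlap=2):
    """Power spectra of overlapping Hann-windowed segments.
    Each segment is multiplied by its own centred window before the
    FFT, so no transform work is shared between overlap positions."""
    hop = fft_length // overlap
    window = np.hanning(fft_length)
    spectra = []
    for start in range(0, len(samples) - fft_length + 1, hop):
        segment = samples[start:start + fft_length] * window
        power = np.abs(np.fft.rfft(segment)) ** 2
        spectra.append(power)
    return np.array(spectra)

# Combining the minima across overlapping spectra smooths out chance
# troughs that any single window position would exaggerate.
```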

There's probably little to be gained, but I presume it's rare for anything above, say, 18 kHz to be the lowest-power bin in the power spectrum, so it should be pretty safe to ignore any bins above 18 kHz if doing so happens to yield a higher noise floor. That seems safe enough given that people have such difficulty ABXing music lowpassed at 18 kHz. And of course lossyFLAC preprocessing wouldn't actually lowpass anything in this scenario; it would just be allowed to ignore the noise floor above 16, 17 or 18 kHz (for example) when calculating the noise floor for the whole block, while still passing all frequencies unaltered except for the bit depth, and hence the exact pattern of the noise, which should be inaudible in an ABX test.
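A minimal sketch of that idea, assuming a 44.1 kHz source and an 18 kHz cutoff (both placeholder values): pick the quietest bin while ignoring everything above the cutoff.

```python
import numpy as np

def noise_floor_bin(power, sample_rate=44100, max_freq=18000.0):
    """Lowest-power bin of a one-sided power spectrum, ignoring bins
    above max_freq (a hypothetical cutoff; the post suggests 16-18 kHz
    as safe choices)."""
    n_fft = 2 * (len(power) - 1)                      # original FFT length
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    audible = power[freqs <= max_freq]                # drop HF bins
    return float(audible.min())
```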

Anyway, loving your work, guys. I'm not averse to pre-scaling (and dithering) my audio with Album Gain, which saves many percent in lossless for anything over-loudly mastered, and considering that an excellent source for encoding into lossy (which I tend to do with Album Gain pre-applied, or supplied via a --scale switch where convenient, in any case). Equally, I'd consider a safe lossy mode based on robust noise-floor calculations like this, with no other psychoacoustics, to be an excellent storage medium for sound reproduction, including heavy EQ, processing and the like, and of course a pretty robust source for transcoding to conventional lossy formats, or indeed to something like resampled-to-32 kHz WavPack lossy.
Dynamic – the artist formerly known as DickD

lossyWAV Development

Reply #176
Thanks Dynamic, the appreciation is appreciated!

Attached is v0.2.0 - superseded by v0.2.1.

Revised skewing function - skews below 3.7kHz (gleaned from ReplayGain technical data - equal loudness curves) by up to 9dB (0dB at 3.7kHz, -9dB at 20Hz);

Tidied up code, revised quality -1.

As said previously, when testing Guru's sample set there is only a difference of 50kB in 101.12MB between skewed and non-skewed (skewed bigger, as hoped). On my 50 sample set skewing increases the size of the FLAC'ed set by 203kB in 44.65MB.
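As an illustration only, the skew described above (0 dB at 3.7 kHz, -9 dB at 20 Hz) could be computed per bin like this; the log-frequency interpolation between the two endpoints is my assumption, not necessarily what lossyWAV does:

```python
import math

def skew_db(freq_hz, max_skew_db=9.0, corner_hz=3700.0, floor_hz=20.0):
    """Skew applied to FFT bin values below the corner frequency:
    0 dB at 3.7 kHz falling to -9 dB at 20 Hz. The log-frequency
    interpolation between those endpoints is an assumption."""
    if freq_hz >= corner_hz:
        return 0.0
    f = max(freq_hz, floor_hz)
    frac = math.log(corner_hz / f) / math.log(corner_hz / floor_hz)
    return -max_skew_db * frac
```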

I haven't created variants of Atem_lied to upload for ABX, but I'm confident now that if any are created they should be pretty good. Late now, will read bug reports tomorrow....
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #177
I tried Atem-lied with v0.2.0 using -s, -3 -s -t, and -3 -t, and the results were all transparent to me.
Subjectively I'd call -3 -t a tiny bit worse than -s.

Good work, Nick.

If I see it correctly, lowering the FFT bin values as you do with skewing can be used to adjust the noise threshold seamlessly.
So maybe something like quality -3, -s, additionally lowering the FFT bin values, and averaging over 3 bins instead of 4 below, say, 3.5 kHz may save more bits on average while keeping quality at the level attained by -2 -s.
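For reference, averaging over a few adjacent bins can be sketched with a simple moving average standing in for the spreading function; it also demonstrates the trade-off discussed in this thread, namely that a spectral dip narrower than the averaging width gets partly filled in:

```python
import numpy as np

def spread(power, width=4):
    """Moving average over `width` adjacent bins - a simplified
    stand-in for the spreading function, not lossyWAV's actual code."""
    kernel = np.ones(width) / width
    return np.convolve(power, kernel, mode='same')

# A dip narrower than the kernel is partly "filled in": the smoothed
# minimum sits well above the true minimum, so noise set at the
# smoothed level may land inside the real dip.
```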
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #178
You're right that, at low frequencies, this convolution might be smoothing over important dips. I tested this originally (around 1kHz, not 200Hz) and found it was OK. If you have an extreme dip of about 4 bins or less in width, then it does get partly filled with noise, but not enough to be audible (to me). The only issue was if it was narrow and short - with contrived signals, you can get something that's (just) audibly overlooked, so I included the optional 5ms FFT to catch that.

However, it seems to me in my experiments with the noise shaping version, that SebG is exactly right (and even basic masking models from decades ago support this): you need a greater SNR at lower frequencies than higher frequencies.

Hi David,

It's interesting that these low frequency bands would be an issue here. I'm sure that for conventional codecs it's a non-issue because those bands take so little data to encode accurately it's probably not worth worrying about. But here we're adding white noise, so no frequency gets a free ride...

I haven't gotten too far on my implementation yet, but I was thinking of doing the convolution in both the time and frequency domain, perhaps with a 3x3 kernel, and perhaps not uniformly weighted. This was not based on any experimentation but just because I find it more elegant, although now that I see how wide the bins are at low frequencies I like it even more. I also don't care for the idea of a filter that varies with frequency, but it might be necessary if nothing fixed will work well.
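As a sketch of that idea (the 3x3 weights here are purely hypothetical, since the post only says "perhaps not uniformly weighted"):

```python
import numpy as np

def smooth_time_freq(spectrogram, kernel=None):
    """3x3 smoothing over the time-frequency plane (rows = analysis
    windows, columns = frequency bins). The centre-weighted kernel is
    a hypothetical example, not a value from the thread."""
    if kernel is None:
        kernel = np.array([[1.0, 2.0, 1.0],
                           [2.0, 4.0, 2.0],
                           [1.0, 2.0, 1.0]])
        kernel /= kernel.sum()
    t, f = spectrogram.shape
    padded = np.pad(spectrogram, 1, mode='edge')  # replicate borders
    out = np.zeros_like(spectrogram)
    for i in range(t):
        for j in range(f):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out
```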

It looks like the LF skew is working well and since most material has a lot of LF energy I can see why it doesn't have a large effect on bitrate for most samples. However, once you start shaping the noise it's going to make a much bigger difference, so it's probably a good idea to get it accurate now.

One thing I really don't care for is the level shift always being on. I think some very low level samples (like Guru's) are only going to be transparent if unmodified. You can easily imagine the case where a sample has over 16 bits effective resolution (at some frequencies) due to noise shaping and this would be destroyed by just about any modification, especially dithering. I don't have a solution, but perhaps there would be a way to have the shift be level dependent? I realize that this would introduce dynamic compression and maybe a little harmonic distortion depending on how it was done, but I suspect both might be okay. Of course, one could argue that very low level samples should be played at very low levels, but I'm not sure everyone will buy this... 

Anyway, it's certainly looking very promising at this point. 

David

lossyWAV Development

Reply #179
@halb27: I'm glad that Atem_lied is better. It's interesting that -s works better than -3 -t; this demonstrates the value of using more fft_lengths in the analysis process. From memory, -3 -t -s produces a bigger FLAC file than -s, so it would not be a more attractive option - quicker maybe, but not better.

@Bryant: My conditional clipping reduction in the Matlab script analysed all the blocks, determined all the bits_to_remove and at the same time noted the peak amplitude for each block. The clipping reduction value for the whole file was then calculated taking into account actual bits_to_remove and block peak value for each block and taking the lowest value.

This resulted in much less level reduction (a surprising number of files did not require amplitude reduction). However, it requires two passes through the blocks - something I was initially unwilling to do, but it's probably not *that* time consuming for the analysis. I will take that on as my next modification to the code.
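In outline, the two-pass scheme might look like the following sketch; `analyse()` and the rounding-margin formula are stand-ins for the real analysis, not lossyWAV's actual code:

```python
def clipping_reduction(blocks, analyse, full_scale=32767.0):
    """Two-pass clipping-reduction sketch. Pass 1 records bits_to_remove
    and the peak amplitude of every block; pass 2 derives one scale
    factor for the whole file as the smallest per-block value that keeps
    rounding at the reduced bit depth from clipping."""
    per_block = []
    for block in blocks:                        # pass 1: analyse everything
        bits = analyse(block)                   # hypothetical analysis hook
        peak = max(abs(s) for s in block)
        per_block.append((bits, peak))
    reduction = 1.0
    for bits, peak in per_block:                # pass 2: smallest safe scale
        if bits == 0 or peak == 0:
            continue                            # this block cannot clip
        margin = (1 << (bits - 1)) + 0.5        # worst-case rounding error
        reduction = min(reduction, (full_scale - margin) / peak)
    return reduction
```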
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #180
@Bryant: My conditional clipping reduction in the Matlab script analysed all the blocks, determined all the bits_to_remove and at the same time noted the peak amplitude for each block. The clipping reduction value for the whole file was then calculated taking into account actual bits_to_remove and block peak value for each block and taking the lowest value.

This resulted in much less level reduction (a surprising number of files did not require amplitude reduction). However, it requires two passes through the blocks - something I was initially unwilling to do, but it's probably not *that* time consuming for the analysis. I will take that on as my next modification to the code.

Well, if I understand you correctly that means that a single wild sample in a file would alter the way the whole file was processed. That's a little weird too, but it might be a reasonable compromise.

I didn't mean to imply with my original post that I thought this was a critical issue, but it is something that might make a lot of samples possible to ABX under the right circumstances. I'm kind of glad I don't need to deal with it for a WavPack version... 

BTW, hats off to you and halb27 for getting this going so quickly!

David

lossyWAV Development

Reply #181
Now that we've reached the point where we can use lossyWav for practical purposes (though a lot more listening experience is most welcome), I wonder which block size to use.

A blocksize of 576 samples is attractive with FLAC performance in mind. However, 2Bdecided was worried about blocksizes below 1024 samples for the lossyWav procedure.

With a lossyWav blocksize of 1024, which blocksize should be used with FLAC? If I interpret the FLAC documentation correctly, the blocksize must be a multiple of 576. Is it wise to use a lossyWav blocksize of 1024 and a FLAC blocksize of 576?

EDIT:
I just tried, and FLAC is working with a blocksize of 1024.

Or should I use a lossyWav and FLAC blocksize of 1152?

Another question: I was quite happy using WavPack lossy with a sample rate of 32 kHz, though 32 kHz is a bit too low. I'd like to use a sample frequency of 35 kHz, which I can do with my DAP using FLAC.
Can I consider it a safe procedure to a) resample to 35 kHz, b) apply lossyWav, c) apply FLAC? That is, can I consider the current lossyWav procedure applicable to 35 kHz sampled tracks?
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #182
I tried all my usual samples with v0.2.0 using -s and couldn't ABX any difference from the original.

As before I have a suspicion that trumpet isn't totally fine (sec. 0.6 ... 2.6). However, I am not able to ABX it (my best approach towards a difference was 5/7, and I ended up at 5/10).
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #183
... One thing I really don't care for is the level shift always being on. I think some very low level samples (like Guru's) are only going to be transparent if unmodified. ...

Maybe a 'worth it' consideration on a per-block basis would help.
If only, say, 1 bit (or maybe 2 bits) would be removed in a block, the block remains untouched; this can easily be restricted to blocks with an RMS below a certain threshold, to address low-volume blocks.
In such cases the machinery isn't worthwhile and has a tendency to give a bad SNR, if only due to dithering.
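A sketch of that per-block test (the RMS threshold is a placeholder value, and the exact rule is only a suggestion from the post above):

```python
def should_skip_block(bits_to_remove, block_rms, quiet_rms_threshold=0.01):
    """Per-block 'worth it' test: leave the block untouched when only
    1 bit would be removed, or when 2 bits would be removed from a
    quiet block. Threshold and rule are illustrative, not lossyWAV's."""
    if bits_to_remove <= 1:
        return True
    if bits_to_remove == 2 and block_rms < quiet_rms_threshold:
        return True
    return False
```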
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #184
With a lossyWav blocksize of 1024, which blocksize should be used with FLAC? If I interpret the FLAC documentation correctly, the blocksize must be a multiple of 576. Is it wise to use a lossyWav blocksize of 1024 and a FLAC blocksize of 576?

Very interesting thread. FLAC itself uses 4096 by default, which isn't a multiple of 576 either.

Code:
  -b, --blocksize=#            Specify the blocksize in samples; the default is
                               1152 for -l 0, else 4096; must be one of 192,
                               576, 1152, 2304, 4608, 256, 512, 1024, 2048,
                               4096 (and 8192 or 16384 if the sample rate is
                               >48kHz) for Subset streams.

lossyWAV Development

Reply #185
I wanted to see the behavior of lossyWav together with FLAC for a set of 50 full tracks which is typical of the kind of music I usually love to listen to (pop music and singer/songwriter music).
All values given are according to what foobar says when looking at the properties of the 50 selected songs:

Original ape files (extra high mode): 703 kbps
flac (--best -e) files: 744 kbps
lossyWav (-s), followed by flac (--best -e -b 1024): 507 kbps
lossyWav (-s -c 576), followed by flac (--best -e -b 576): 503 kbps
ssrc_hp (--rate 35000 --twopass --dither 0 --bits 16), lossyWav (-s), followed by flac (--best -e -b 1024): 453 kbps.

So with this kind of music staying with a blocksize of 1024 is fine, and the average bitrate is roughly 500 kbps.
Pre-resampling to 35 kHz saves some filesize, though it is a bit disappointing (the data rate is ~20% lower, but the file size is only ~10% smaller). Quality is fine judging from listening without ABXing, taking samples from this 50-track set. However, ABXing problem samples is required, which I haven't done so far.

Remarkable was Simon & Garfunkel's short and calm Bookend Theme: the 1024-sample-block lossy.flac version was 1668 KB in size, which is more than the original ape file (1535 KB) and only slightly less than the lossless FLAC version (1699 KB).
Using debug mode I saw there wasn't any block with bits removed, so it was only the dithering that changed the file. I think samples like this are a good argument for not changing a block at all when it's not worthwhile.
The question is of course: when is it worthwhile? I think when 0 or 1 bit is removed it is not, independently of volume as measured by RMS. I also think that with low-volume blocks it's not worthwhile (and a bit dangerous) when 2 bits would be removed.
This per-block consideration can also be extended to the whole file: if the average number of bits removed is below a threshold of, say, 1 bit, the lossy.wav output should be identical to the wav input.
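The whole-file version of the rule could be as simple as this sketch (the 1-bit threshold is the suggestion from the post, everything else is illustrative):

```python
def file_should_stay_lossless(bits_removed_per_block, threshold=1.0):
    """File-level 'worth it' test: if fewer than `threshold` bits are
    removed per block on average across the track, emit the input
    unchanged instead of the dithered, bit-reduced version."""
    if not bits_removed_per_block:
        return True
    mean = sum(bits_removed_per_block) / len(bits_removed_per_block)
    return mean < threshold
```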
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #186
If I were to implement a two-pass version of lossyWAV, then where clipping_reduction=1 (i.e. file left at 100% amplitude) and bits_to_remove=0, no dither would be applied: block output = block input, and compression of that block should be identical to the original.

I have been developing a variable spreading_function_length dependent on fft_length, i.e. larger fft_length = larger spreading_function_length, switched on using the -v parameter. The -t parameter is now obsolete.

Enabling variable spreading_function_length also reduces skewing_amplitude from 9dB to 6dB and changes noise_threshold_shift to -1.5.

v0.2.1 attached. Superseded.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

 

lossyWAV Development

Reply #187
v0.2.1 attached.

Hey Nick,

I've noticed that you have been deleting the previous version every time you upload a new one. I understand why you wouldn't want people to use obsolete versions, but halb27 just did a whole bunch of testing with a specific version (v0.2.0) and I wanted to download it for a reference and it was already gone!

Perhaps you could set up a place where previous versions are archived, or maybe just put a note indicating a version is obsolete without actually deleting the download link?

Either that, or I'll write a script to download each one as it appears before you can delete it! 

Thanks,
David

lossyWAV Development

Reply #188

v0.2.1 attached.

Hey Nick,

I've noticed that you have been deleting the previous version every time you upload a new one. I understand why you wouldn't want people to use obsolete versions, but halb27 just did a whole bunch of testing with a specific version (v0.2.0) and I wanted to download it for a reference and it was already gone!

Perhaps you could set up a place where previous versions are archived, or maybe just put a note indicating a version is obsolete without actually deleting the download link?

Either that, or I'll write a script to download each one as it appears before you can delete it! 

Thanks,
David


Apologies - I'll upload v0.2.0 tonight and in future merely indicate obsolescence rather than remove the file.

Command line parameters are being re-written at the moment to allow more sensible naming of parameters and inclusion of "-nts" to force noise_threshold_shift to a specific value, among others.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #189
... If I was to implement a two-pass version of lossyWAV then where clipping_reduction=1 (i.e.file left at 100% amplitude) and bits_to_remove=0 then no dither, block output = block input and compression of that block should be identical to the original. ...

I very much welcome such a two-pass version, but keeping block and/or track output = input is independent of two-pass processing.
'bits_to_remove=0 then no dither' is logical, but why do you want to restrict it to the 'bits_to_remove=0' case?
It seems obvious to me that adding noise in the 'bits_to_remove=1' case has a poor benefit/cost ratio, and this is especially true for low-volume spots where the S/N ratio is bad anyway.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #190
I very much welcome such a two-pass version, but keeping block and/or track output = input is independent of two-pass processing.
'bits_to_remove=0 then no dither' is logical, but why do you want to restrict it to the 'bits_to_remove=0' case?
It seems obvious to me that adding noise in the 'bits_to_remove=1' case has a poor benefit/cost ratio, and this is especially true for low-volume samples where the S/N ratio is bad anyway.


If the amplitude-reduced block is not dithered then there is a strong chance of unwanted noise - all blocks are automatically reduced in amplitude to prevent potential clipping based on minimum_bits_to_keep=5, so they are also automatically dithered.
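For reference, bit reduction with triangular (TPDF) dither can be sketched like this; the exact rounding and dither amplitude used by lossyWAV may differ:

```python
import random

def quantise_with_tpdf_dither(sample, bits_to_remove):
    """Remove `bits_to_remove` LSBs with triangular-PDF dither of up to
    one step at the reduced level, then round to the coarser grid.
    A sketch of the general technique, not lossyWAV's actual code."""
    step = 1 << bits_to_remove
    dither = (random.random() - random.random()) * step  # TPDF in (-step, step)
    return int(round((sample + dither) / step)) * step   # multiple of step
```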
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #191
I see.

Maybe an approach like this can help:

- A priori, assume the track does not have to be reduced in amplitude, and use output block = input block wherever it's not worthwhile, or where it's dangerous, to apply the lossyWav mechanism.
- As soon as you find that amplitude reduction has to be done, restart the procedure for the whole track with amplitude reduction applied.

This would be especially advantageous for those tracks which currently seem to be the most critical ones for the lossyWav procedure: tracks with low-volume spots in them.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #192
Hello,

I have been developing a variable spreading_function_length dependent on fft_length, i.e. larger fft_length = larger spreading_function_length, switched on using the -v parameter. The -t parameter is now obsolete.


Sorry to bother you, but... Is it possible for a lazy man like me, who doesn't want to dive into Matlab and technical stuff, to have a LossyWav.txt in the archive that simply (to some extent) explains the parameters?
I know that this app is in its early stages and clearly oriented towards developers and golden-eared gurus, but think about the future documentation of this great tool!

Have a nice day,


        AiZ

lossyWAV Development

Reply #193
Thanks for the input Guru - where the bits_to_remove is zero, lossyWAV will still dither the samples because there's an automatic anti-clipping amplitude reduction to 95.28% (30.49/32, i.e. 32 -1 (triangular_dither amplitude) -0.5 (normal rounding) -0.01 (Nick.C's error margin)) as the file is processed, so dither is still required.
Hang on a second. You shouldn't be changing the gain (even by 0.42 dB) if you're getting people to ABX. This is especially important when there's virtually no other audible difference between the files.

With the gain change disabled, you shouldn't dither when no bits are removed. I know it's in my code as an option, but I don't think I enabled it even when I was changing the gain. In theory you should, but in practice I wouldn't.

Hope this helps.

EDIT: Now I've read the rest of the thread...

Remember the gain/declipping is only for efficiency, not sound quality. The "clipping" is only ever by 1LSB, and only happens when lossyFLAC has determined that several LSBs can be removed. In other words, the clipping will only be audible if the lossyFLAC algorithm itself is broken (and if it is, we can all go home anyway). Furthermore, the "clipping" will move the sample value closer to its original value (because it happens when lossyFLAC wants to increase it, but can't).

So I would not "leave the gain adjustment enabled unless there might be sound quality issues". I would "leave the gain adjustment disabled unless there are so many "clipped" samples that it reduces efficiency".

For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. I'm guessing lots of people wouldn't like this idea though.

Cheers,
David.

lossyWAV Development

Reply #194
Sorry to bother you, but... Is it possible for a lazy man like me, who doesn't want to dive into Matlab and technical stuff, to have a LossyWav.txt in the archive that simply (to some extent) explains the parameters?
I know that this app is in its early stages and clearly oriented towards developers and golden-eared gurus, but think about the future documentation of this great tool!
AiZ,
If it was down to me, when it was finished, it wouldn't have enough command-line options to need a manual! Just compact/default/overkill modes.

The problem is figuring out what these should be, hence all the current playing around.

If the "final version" still needs all these tweaks, then IMO we've failed!

Cheers,
David.

lossyWAV Development

Reply #195
As David said, it is his intention to limit user choice to the 3 stated quality levels. During the development phase of the command-line version, I have enabled an increasing number of command-line parameters to allow users to "tweak" settings in search of transparency.

I will attempt to more clearly illustrate what each one does in the command-line reference within lossyWAV.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #196
I wanted to see the behavior of lossyWav together with FLAC for a set of 50 full tracks which is typical of the kind of music I usually love to listen to (pop music and singer/songwriter music).
All values given are according to what foobar says when looking at the properties of the 50 selected songs:

Original ape files (extra high mode): 703 kbps
flac (--best -e) files: 744 kbps
lossyWav (-s), followed by flac (--best -e -b 1024): 507 kbps
lossyWav (-s -c 576), followed by flac (--best -e -b 576): 503 kbps

Perhaps a bit late, but I also ran lossyWav v0.2.0 (using -3 instead of -s though) and FLAC 1.2.1 on a selection more or less representative of my CD collection (at least in terms of bitrate when compressed with TAK) and got pretty much the same result. Overall 576 produces slightly smaller files, but for classical or generally highly compressible music 1024 is better.

lossyWAV Development

Reply #197
Hi again,

If it was down to me, when it was finished, it wouldn't have enough command-line options to need a manual! Just compact/default/overkill modes.

The problem is figuring out what these should be, hence all the current playing around.

As David said, it is his intention to limit user choice to the 3 stated quality levels. During the development phase of the command-line version, I have enabled an increasing number of command-line parameters to allow users to "tweak" settings in search of transparency.

I will attempt to more clearly illustrate what each one does in the command-line reference within lossyWAV.


On one hand, it's OK, I get the point. Sure, only one or two parameters are way better for the final release, no need for a manual.
But on the other hand, if someone who discovers the project today decides to help you, it would be nice for them not to have to search through this (long) thread for what all these changing parameters mean; hence, a little up-to-date doc accompanying the executable would be perfect.

I'll stop my off-topic posts here, and thank you for your dedication to better and cleverer audio.


        AiZ

lossyWAV Development

Reply #198
Atem-lied with v0.2.1 -v -s:

I got to 6/6, but ended up finishing at 6/10.
Generally speaking I have a tendency to do a bad job with the second half of my guesses.

Looking at the debug results, I guess too many blocks have 6 bits removed again in the critical area, and I think that's too much.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #199
I found a bug in the -v parameter - it was picking the wrong spreading_function_length. I will post v0.2.2 tonight. For the moment, and by special request: v0.2.0 for Bryant. Definitely superseded now.....
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)