lossyWAV Development

Reply #200
.. For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. ...

How do you do that exactly:
- ReplayGain using foobar as a 1 step procedure with encoding?
- 16 or 24 bit? dither or not dither?
- How do you make sure only negative replaygaining is applied? Manual control?

Especially the answer to the 16/24 bit dither/not dither question is relevant to me as the answer applies to resampling as well.
lame3995o -Q1.7 --lowpass 17

Reply #201
Attached v0.2.2. Superseded.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #202
Hi Nick,
Thanks for your hard work.

I think it's good to have the parameters of the mechanism adjustable in this pre-release state.
However, it's hard, as I at least have no idea what some of the parameters are really doing.

For getting the behavior of 0.2.0 -s: is a setting of -skew 9.0 sufficient in order to get exactly the same behavior?
How exactly is the spreading function varying when using -vsfl? From history I guess the variation is with fft length. But how exactly, and what should we expect from it?
lame3995o -Q1.7 --lowpass 17

Reply #203
I found a bug in the -v parameter - it was picking the wrong spreading_function_length. I will post v0.2.2 tonight. For the moment, and by special request : v0.2.0 for Bryant.

Thanks Nick, I got it.

And it looks like I wasn't the only one who wanted it! 

David

Reply #204
Just out of curiosity, will bit-reduction make ALAC compress any better? I just bought an iPod, and the idea of lossy FLAC is great, but now I'm stuck without my trusty FLAC format if I want lossless music (and Rockbox doesn't really appeal to me that much).

Thanks,
Bobby

Reply #205
For getting the behavior of 0.2.0 -s: is a setting of -skew 9.0 sufficient in order to get exactly the same behavior?
How exactly is the spreading function varying when using -vsfl? From history I guess the variation is with fft length. But how exactly, and what should we expect from it?


A skew of 9.0 is the same as v0.2.0.

In previous versions, the spreading function length was the same for every fft_length, e.g. 4 for 64 and 4 for 1024. Taking into account the bin frequency widths (see bins.txt), this seemed a bit unbalanced, so if -vsfl is selected the spreading function length vs fft_length is as follows: 2/64; 3/128; 4/256; 5/512; 6/1024. If more than 4-bin averaging is felt to be excessive, this can be changed at the next revision.
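
To make the imbalance concrete, here is a rough Python sketch (not lossyWAV code) of the -vsfl table next to the old fixed length. The 44100 Hz sample rate, the function names, and the closed form "log2(fft_length) - 4" are my assumptions; the closed form simply happens to fit the five values quoted above.

```python
SAMPLE_RATE = 44100  # assumption: CD-rate audio; the post refers to bins.txt instead

# Table exactly as quoted in the post: spreading function length per fft_length.
VSFL_TABLE = {64: 2, 128: 3, 256: 4, 512: 5, 1024: 6}


def fixed_spreading_length(fft_length: int) -> int:
    """Earlier behaviour as described: 4 bins regardless of fft_length."""
    return 4


def vsfl_spreading_length(fft_length: int) -> int:
    """Closed form that reproduces the quoted table (an observation, not the source)."""
    if fft_length not in VSFL_TABLE:
        raise ValueError("fft_length not covered by the quoted table")
    # For a power of two, bit_length() - 1 is log2(fft_length).
    return (fft_length.bit_length() - 1) - 4


def averaging_span_hz(spreading_length: int, fft_length: int,
                      sample_rate: int = SAMPLE_RATE) -> float:
    """Width in Hz covered by averaging `spreading_length` adjacent bins."""
    return spreading_length * sample_rate / fft_length


for n in sorted(VSFL_TABLE):
    old = averaging_span_hz(fixed_spreading_length(n), n)
    new = averaging_span_hz(vsfl_spreading_length(n), n)
    print(f"fft {n:5d}: fixed 4 bins = {old:7.1f} Hz, -vsfl = {new:7.1f} Hz")
```

With a fixed 4 bins the averaging span shrinks by a factor of 16 between the 64 and 1024 sample analyses; with the -vsfl lengths it shrinks more gently.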
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #206
For a promising way to continue testing (and keeping in mind that v0.2.0 -s yielded excellent results) it is important to know what we're testing.
Can you please confirm or correct the following statements:

a) just to make sure the basis:
    v0.2.2 -skew 9.0 (no other options) yields exactly the same results as v0.2.0 -s (no other options).
    Especially the v0.2.2 noise threshold default is exactly equal to that of v0.2.0?

b) to try out the fft_length dependent bin averaging the following options are useful to test
    v0.2.2 -vsfl -skew x -nts y
    with x<=9.0 (for instance x=6.0) and y>=-3.0 (for instance y=-1.0).

And as 2Bdecided said -nfc should be used for abxing to make sure no loudness difference is abxed.

What exactly does the weighted spreading function option -wsf do?
lame3995o -Q1.7 --lowpass 17

 

Reply #207
For a promising way to continue testing (and keeping in mind that v0.2.0 -s yielded excellent results) it is important to know what we're testing.
Can you please confirm or correct the following statements:

a) just to make sure the basis:
    v0.2.2 -skew 9.0 (no other options) yields exactly the same results as v0.2.0 -s (no other options).
    Especially the v0.2.2 noise threshold default is exactly equal to that of v0.2.0?

b) to try out the fft_length dependent bin averaging the following options are useful to test
    v0.2.2 -vsfl -skew x -nts y
    with x<=9.0 (for instance x=6.0) and y>=-3.0 (for instance y=-1.0).

And as 2Bdecided said -nfc should be used for abxing to make sure no loudness difference is abxed.

What exactly does the weighted spreading function option -wsf do?
a) Should be almost exactly the same (although the skew in v0.2.0 was 9.0309, i.e. 1.5 x 20 x log10(2)).
b) Sounds good.
-nfc is alright - unless the sample clips under bit-reduction. There is no clipping prevention at all when -nfc is used.

-wsf creates spreading functions as follows: [1]; [2/3,1/3]; [3/6,2/6,1/6]; [4/10,3/10,2/10,1/10]; [5/15,4/15,3/15,2/15,1/15]; [6/21,5/21,4/21,3/21,2/21,1/21];
rather than [1]; [1/2,1/2]; [1/3,1/3,1/3]; [1/4,1/4,1/4,1/4] etc. The weighted spreading function tends to 75% below midlength, 25% above midlength as length tends to infinity. I am developing with a variant of this which tends to 7/12 below midlength, 5/12 above midlength.
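
The two families of spreading function can be written down in a few lines. This is only a sketch of the weights as listed above, using exact fractions; the function names are made up, and I am assuming "below midlength" means the first half of the taps (the heavier ones).

```python
from fractions import Fraction


def flat_spreading(length: int) -> list:
    """Unweighted average: [1/n, 1/n, ..., 1/n]."""
    return [Fraction(1, length)] * length


def weighted_spreading(length: int) -> list:
    """-wsf style weights: [n, n-1, ..., 1] divided by n(n+1)/2."""
    total = length * (length + 1) // 2
    return [Fraction(w, total) for w in range(length, 0, -1)]


def first_half_share(weights: list) -> Fraction:
    """Fraction of the total weight carried by the first half of the taps."""
    return sum(weights[: len(weights) // 2], Fraction(0))


for n in (2, 4, 6, 100, 1000):
    share = first_half_share(weighted_spreading(n))
    print(n, float(share))
```

For even n the first-half share works out to (3n + 2) / (4(n + 1)), which approaches 3/4 as n grows, matching the 75% / 25% split mentioned above.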

@2Bdecided: I noticed on running the Matlab script with the same parameters several times in a row on the same input file that the average bits_to_remove value changes....? I can't pin down the cause, does it do that for you?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #208

.. For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. ...

How do you do that exactly
Hypothetically! Though I've tested it manually.
Quote
- ReplayGain using foobar as a 1 step procedure with encoding?
- 16 or 24 bit? dither or not dither?
- How do you make sure only negative replaygaining is applied? Manual control?

Especially the answer to the 16/24 bit dither/not dither question is relevant to me as the answer applies to resampling as well.
It depends on your use. If you're going to gain the files before lossyFLAC, the output should ideally be 24-bit no dither, but adding dither makes little difference (less efficient on digital silence!). Nick's portable won't play 24-bit, so 16-bit no dither. I know you _should_ dither, but it reduces efficiency and adds hiss. Without it, you can in theory introduce distortion. Pick your poison.

Whether you should enable dither within lossyFLAC is a different question. I have an artificial sample where it's required to avoid quite nasty noise pumping artefacts, but for efficiency I'm normally testing with no dither. The only place I think I've heard a difference is annoyinglyloudsong, but there it's not artefacting - it sounds louder to me without dither, that's all. I should ABX it because I'm probably talking rubbish.

Cheers,
David.

Reply #209
... Whether you should enable dither within lossyFLAC is a different question. I have an artificial sample where it's required to avoid quite nasty noise pumping artefacts, but for efficiency I'm normally testing with no dither. ...

Can you give this artificial sample please?
So you say dithering within lossyWav isn't necessarily the way to go.

Nick, can you provide a switch please to disable dithering? Or in your opinion is there a strong reason for dithering?
lame3995o -Q1.7 --lowpass 17

Reply #210
Nick, can you provide a switch please to disable dithering? Or in your opinion is there a strong reason for dithering?
Don't ask me difficult questions! I'm just the programmer!!!

I had already decided to re-implement the dither_choice option as a -dither parameter (0=none, 1=rectangular, 2=triangular).

Also, I feel that opinion regarding clipping_reduction indicates that the default option (0) should be none, with 1 = fixed reduction taking into account dither amplitude (if any) and 2 = my 2-pass conditional (but consistent across the file) clipping reduction.
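
For anyone unsure what the three dither choices mean in practice, here is a generic Python sketch of bit removal with no, rectangular and triangular dither. It is textbook requantisation, not the lossyWAV code: the function names, the dither amplitudes (half a step for rectangular, one step peak for triangular) and the seeded generator are assumptions, and there is deliberately no clipping handling.

```python
import math
import random


def dither_value(dither: int, step: int, rng: random.Random) -> float:
    """0 = none, 1 = rectangular (one uniform), 2 = triangular (sum of two uniforms)."""
    if dither == 0:
        return 0.0
    if dither == 1:
        return rng.uniform(-0.5, 0.5) * step
    if dither == 2:
        return (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * step
    raise ValueError("dither must be 0, 1 or 2")


def remove_bits(samples, bits_to_remove: int, dither: int = 0, seed: int = 0):
    """Round each integer sample to a multiple of 2**bits_to_remove."""
    step = 1 << bits_to_remove
    rng = random.Random(seed)  # seeded only so that runs are repeatable
    out = []
    for s in samples:
        noisy = s + dither_value(dither, step, rng)
        out.append(math.floor(noisy / step + 0.5) * step)
    return out


block = [100, 101, 37, -5]
for choice in (0, 1, 2):
    print(choice, remove_bits(block, 3, dither=choice))
```

Whatever the dither choice, every output sample is a multiple of the step, so the low bits are still zero for FLAC to exploit; only the rounding decisions change.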
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #211

... Whether you should enable dither within lossyFLAC is a different question. I have an artificial sample where it's required to avoid quite nasty noise pumping artefacts, but for efficiency I'm normally testing with no dither. ...

Can you give this artificial sample please?
Attached

Quote
So you say dithering within lossyWav isn't necessarily the way to go.
Probably not. You might doubt this when you hear this sample though! Remember I know exactly how lossyFLAC works, and therefore I know exactly how to break it. This sample sounds like it's just white noise, but it isn't, as you'll see if you look at the waveform (and more precisely, the sliding paired sample values) in a wave editor.

There's still an issue about rounding/truncating/clipping/dithering. They're all tied together. What's in lossyFLAC6 works well enough, but I think it could be tweaked slightly. It's not a priority.

Cheers,
David.

Reply #212
@2Bdecided: I noticed on running the Matlab script with the same parameters several times in a row on the same input file that the average bits_to_remove value changes....? I can't pin down the cause, does it do that for you?
Are you re-generating the noise threshold reference table each time? If so, yes. If not, no.

With dither off and a fixed noise threshold table, the output should be identical every time. It's a deterministic process: a computer program where you aren't changing anything!

With dither on, the output will be different every time, but the bits removed should be identical. The FLAC bitrate may vary due to the dither.

I'm still running lossyFLAC6. It's not changed much since I uploaded it on 4th July, but I'll upload it again anyway. (Attached).

Please don't be disappointed it doesn't have any of your improvements. It doesn't have any of my improvements either!

Cheers,
David.

Reply #213
Using v0.2.2 I tried Atem-lied:

a) -skew 9.0 -nfc : I abxed it 9/10, so I wonder if this really is the same procedure as with v0.2.0 -s.

b) -skew 6.0 -nfc -vsfl : sounds okay to me.

I rather missed the debug mode I had got used to.
lame3995o -Q1.7 --lowpass 17

Reply #214
Using v0.2.2 I tried Atem-lied:

a) -skew 9.0 -nfc : I abxed it 9/10, so I wonder if this really is the same procedure as with v0.2.0 -s.

b) -skew 6.0 -nfc -vsfl : sounds okay to me.

I rather missed the debug mode I had got used to.


-debug is now -detail.

I'm re-writing the part of the code which actually does the analyses. I think that there was a difference in the way that David and I determined the bounds of each individual fft analysis - I'm working to correct that now.

The only thing to make sure of at a) would be to add -nts -3.0103 to the command line to see if that makes any difference.

Sorry about the inconsistency.

[edit] I've rewritten the analysis code and the output very closely matches that of the Matlab script (thanks David for the latest version!) I've got a bit more checking to do, but I think that v0.3.0 will be released tomorrow night. [/edit]
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #215
Attached summary spreadsheet of my 50 sample set, processed using soon to be released lossyWAV alpha v0.3.0 (-3 -dither 0 -clipping 0 -nts 0), against Matlab LossyFLACv6_revised script with same settings, 1024 sample codec_block_size. Matlab script used 5000 iterations to calculate reference_threshold values.

As can be seen, although not identical, 161 extra bits removed over 29799 codec blocks is not too bad a comparison. It can (and hopefully will) be improved upon.
Superseded.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #216
.. As can be seen, although not identical, 161 extra bits removed over 29799 codec blocks is not too bad a comparison. It can (and hopefully will) be improved upon. ...

Very promising indeed.

In theory, however, there is a chance that the sum of the removed bits is more or less the same while the bit removal differs at individual spots.
Or is it a block by block comparison adding the deviations per block?
lame3995o -Q1.7 --lowpass 17

Reply #217
.. As can be seen, although not identical, 161 extra bits removed over 29799 codec blocks is not too bad a comparison. It can (and hopefully will) be improved upon. ...
Very promising indeed.

In theory, however, there is a chance that the sum of the removed bits is more or less the same while the bit removal differs at individual spots.
Or is it a block by block comparison adding the deviations per block?

I knew someone would ask that question - the number quoted is for overall change, I will get going with block by block differences (+ve and -ve) and post:

644 bits extra removed, 483 less bits removed, 161 extra bits removed overall - amended results attached.
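
The difference between the overall figure and the block-by-block figures can be sketched as below. The per-block lists are invented purely for illustration (the real ones are in the attached spreadsheet), and the function name is hypothetical; only the 644 / 483 / 161 totals come from the results above.

```python
def compare_bits_removed(reference, candidate):
    """Compare per-block bits_to_remove of two implementations.

    Returns the extra bits removed, the fewer bits removed, the net change
    and the sum of the magnitudes of all per-block differences.
    """
    if len(reference) != len(candidate):
        raise ValueError("both runs must cover the same codec blocks")
    extra = sum(c - r for r, c in zip(reference, candidate) if c > r)
    fewer = sum(r - c for r, c in zip(reference, candidate) if c < r)
    return {"extra": extra, "fewer": fewer,
            "net": extra - fewer, "absolute": extra + fewer}


# Hypothetical per-block values, NOT taken from the spreadsheet.
matlab_blocks = [3, 4, 2, 5]
delphi_blocks = [4, 3, 2, 7]
print(compare_bits_removed(matlab_blocks, delphi_blocks))

# Totals reported in this post: the net figure hides most of the disagreement.
print(644 - 483, 644 + 483)
```

A small net figure only says that the +ve and -ve differences largely cancel; the sum of the magnitudes is the better measure of how often the two implementations disagree.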

Will use the same reference_threshold creation technique in Matlab and re-process the Matlab results - to see if there is something more fundamental wrong rather than just noise_analysis results. Superseded.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #218
Thanks for your work.

So this 1127 BTR difference gives quite a different picture than the just 161 BTR summed difference.

From your remarks it looks like the BTR mechanism of the current Delphi version is different from that of the MATLAB version. I think it would be good before starting tweaking to have exactly the same mechanism and result of the Delphi version as is offered by the original version. If the versions do differ it makes things worse when starting to do changes with the MATLAB version.
lame3995o -Q1.7 --lowpass 17

Reply #219
Thanks for your work.

So this 1127 BTR difference gives quite a different picture than the just 161 BTR summed difference.

From your remarks it looks like the BTR mechanism of the current Delphi version is different from that of the MATLAB version. I think it would be good before starting tweaking to have exactly the same mechanism and result of the Delphi version as is offered by the original version. If the versions do differ it makes things worse when starting to do changes with the MATLAB version.


The starting point for the investigation has to be to remove the random element, namely the calculation of the reference_threshold values used to determine the threshold_index arrays (one for each fft analysis). The Matlab version initially used 1000 iterations x (64, 1024, 256) sample fft lengths, i.e. a maximum of 1024000 samples x iterations. The Delphi version uses constants based on a constant 2^25 count, i.e. 32M samples x iterations: 1048576 iterations at 32 sample fft; 524288 at 64; ....; 32768 at 1024 sample fft.
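
The iteration counts are just a fixed sample budget divided by the fft length. A quick Python check of the arithmetic (the constant and function names are mine, not identifiers from the Delphi source):

```python
DELPHI_SAMPLE_BUDGET = 2 ** 25   # samples x iterations, as stated above
MATLAB_ITERATIONS = 1000         # initial Matlab setting, as stated above


def delphi_iterations(fft_length: int) -> int:
    """Iterations needed so that fft_length x iterations equals the budget."""
    return DELPHI_SAMPLE_BUDGET // fft_length


for n in (32, 64, 128, 256, 512, 1024):
    print(n, delphi_iterations(n))

# Largest Matlab workload: 1000 iterations at the 1024 sample fft.
print(MATLAB_ITERATIONS * 1024)
```

So even at the 1024 sample fft the Delphi constants are based on roughly 32 times as many samples as the initial Matlab run, which is why the Matlab values carry more random variation.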
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #220
What about the predefined values?
Or values stored in a file (IIRC the MATLAB script can make use of that)?

There's no need for perfection for these values in the first step - all that counts now is an identical basis for the MATLAB and DELPHI version as you said.
lame3995o -Q1.7 --lowpass 17

Reply #221
What about the predefined values?
Or values stored in a file (IIRC the MATLAB script can make use of that)?

There's no need for perfection for these values in the first step - all that counts now is an identical basis for the MATLAB and DELPHI version as you said.


I have plugged the pre-calculated Delphi values into Matlab and the results from the Matlab script do not seem to have changed - although I can confirm that the reference_threshold values are identical.

I am in the process of a side-by-side block-by-block, sub-block-by-sub-block comparison of the fft analysis results - thankfully keys_1644ds.wav is quite short!

I've identified the error, if not immediately the solution - from the second codec block onwards, the result of the fft analysis of the first sub-block (and only the first sub-block), for each fft_length, for each channel, differs between Matlab and lossyWAV. All the rest are giving identical results to the Matlab script.

Oh well, back to debugging.......

And, I think that I've found the problem..... The audio data is not consistent between Matlab and lossyWAV - I looked at the fft outputs, then at the window_function'ed inputs, then at the bare audio data - there appear to be some discrepancies between the raw audio data, +/- 1 that I have found so far.

@David: I changed over from wavread/write to my previous wavreadraw/writeraw, removed the multiplication / division of inaudio > inaudio_int > outaudio and removed the inaudible addition, in favour of a 20*log10(max(1,min(conv......))).
This has improved the results somewhat, but it's too late to do the comparison and post it.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #222
Okay, so having narrowed down the difference to the second codec block onwards, first sub-block analysis only, and bearing in mind that the end-overlap is fft_length/2 and fft_overlap is fft_length/2 - the answer struck me (early) this morning.....

The Matlab script is removing the bits block by block just after the codec-block is analysed, and before the next codec-block is analysed - thus contaminating the audio data in the pre-block-start overlap of the next analysis block.
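
A toy Python model of the suspected effect, for illustration only: there is no fft here, the bits removed are fixed rather than derived from the analysis, and all names and sizes are invented. It only shows which analysis windows see altered samples when rounding is interleaved with analysis.

```python
def analysis_window(audio, block, block_size, overlap):
    """Samples an analysis of `block` would read, including both overlaps."""
    start = max(0, block * block_size - overlap)
    end = min(len(audio), (block + 1) * block_size + overlap)
    return audio[start:end]  # a slice, so later edits do not alter it


def round_block_in_place(audio, block, block_size, bits):
    """Round one codec block to multiples of 2**bits, modifying `audio`."""
    step = 1 << bits
    first = block * block_size
    last = min(len(audio), first + block_size)
    for i in range(first, last):
        audio[i] = ((audio[i] + step // 2) // step) * step


def windows_interleaved(audio, block_size, overlap, bits):
    """Analyse, then immediately round, one block at a time (suspected bug)."""
    work = list(audio)
    blocks = (len(work) + block_size - 1) // block_size
    seen = []
    for block in range(blocks):
        seen.append(analysis_window(work, block, block_size, overlap))
        round_block_in_place(work, block, block_size, bits)
    return seen


def windows_two_pass(audio, block_size, overlap):
    """Analyse every block from the untouched input before any rounding."""
    blocks = (len(audio) + block_size - 1) // block_size
    return [analysis_window(audio, b, block_size, overlap) for b in range(blocks)]


audio = list(range(1, 17))
interleaved = windows_interleaved(audio, block_size=4, overlap=2, bits=2)
two_pass = windows_two_pass(audio, block_size=4, overlap=2)
changed = [b for b in range(len(two_pass)) if interleaved[b] != two_pass[b]]
print(changed)
```

In this model the first block is analysed identically either way, and every later block differs only in the samples of its pre-block-start overlap, which fits the observation that only the first sub-block from the second codec block onwards disagreed.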

This hasn't yet been tested, but it seems *too* likely.

Now that I am happy with the Delphi code, please find attached lossyWAV alpha v0.3.0. Superseded.

@David:
In the script (replicated in the Delphi code) when the minimum min_bin value is calculated for an fft analysis the result is *rounded*, i.e. can be increased by up to 0.5 when looking up the threshold_index table to determine the bits_to_remove. Would it not be better to *floor* this value as it would reduce the likelihood of increasing noise above the minimum determined value?
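
The round versus floor question in miniature, as a Python sketch. The function names and sample values are invented, and the threshold_index lookup itself is not shown; this only illustrates how far each choice can move the value used for the lookup.

```python
import math


def lookup_value_rounded(min_bin: float) -> int:
    """Round to nearest (halves up): may exceed min_bin by up to 0.5."""
    return math.floor(min_bin + 0.5)


def lookup_value_floored(min_bin: float) -> int:
    """Floor: never exceeds min_bin."""
    return math.floor(min_bin)


for value in (17.2, 17.5, 17.6, 18.0):
    print(value, lookup_value_rounded(value), lookup_value_floored(value))
```

Rounding can push the lookup value above the measured minimum by as much as 0.5, whereas flooring always stays at or below it, at the cost of sometimes being up to 1 lower.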

Modified version of your LossyFLAC6_revised attached including the pre-calculated constants to re-create the reference_threshold values. Superseded.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #223
... The Matlab script is removing the bits block by block just after the codec-block is analysed, and before the next codec-block is analysed - thus contaminating the audio data in the pre-block-start overlap of the next analysis block.

This hasn't yet been tested, but it seems *too* likely. ...

Great work, Nick. So with respect to this your Delphi code is supposed to be better than the MATLAB script.

So it's worth doing intensive listening tests now.
Unfortunately (in this respect) I'm leaving for holidays tomorrow (will be back on Oct 7) and have to prepare a lot for it this evening (most of all find a B&B for the first nights which turned out to be a problem - Lake District, Cumbria, seems to be very popular these days [guess not only these days]).

Anyway I will try at least to do a short test this evening.

But it is most welcome if more members could contribute in testing.
lame3995o -Q1.7 --lowpass 17

Reply #224
The Matlab script is removing the bits block by block just after the codec-block is analysed, and before the next codec-block is analysed - thus contaminating the audio data in the pre-block-start overlap of the next analysis block.
Nick, you're a genius - that's exactly what's happening.

The two parts used to be separate (analysis loop then rounding loop) - when I put them together I didn't spot this. Interesting that the effect was so little ("644 bits extra removed, 483 less bits removed, 161 extra bits removed overall over 29799 codec blocks") – it shows how benign the added noise is. It bodes well for this being multi-generation proof and transcode-proof with a noise threshold shift of -6 or -12 dB.


Quote
In the script (replicated in the Delphi code) when the minimum min_bin value is calculated for an fft analysis the result is *rounded*, i.e. can be increased by up to 0.5 when looking up the threshold_index table to determine the bits_to_remove. Would it not be better to *floor* this value as it would reduce the likelihood of increasing noise above the minimum determined value?
Yes, probably. If you're going to do that, you should also move the threshold shift down from where it is, into that calculation, otherwise the additional accuracy is pretty meaningless.


Quote
Modified version of your LossyFLAC6_revised attached including the pre-calculated constants to re-create the reference_threshold values.
Thank you. Just for clarity: this includes those thresholds, but hasn't fixed the "contamination" bug?

Cheers,
David.