Topic: lossyWAV Development

lossyWAV Development

Reply #400
Wonderful. Something like this is what I expected.
It might be worth being less conservative with the -nts parameter, i.e. trying -nts 0 for -3 to see what that does to the bitrate. On my "problem" set:

-3 -nts -0.5 -skew 24 -snr 12 > 458.7kbps; (default -3)
-3 -nts -0.5 -skew 18 -snr 12 > 446.1kbps;
-3 -nts -0.5 -skew 18 -snr 18 > 448.3kbps;

-3 -nts 0 -skew 12 -snr 6 > 433.3kbps;
-3 -nts 0 -skew 12 -snr 12 > 433.3kbps;
-3 -nts 0 -skew 12 -snr 18 > 435.8kbps;

-3 -nts 0 -skew 18 -snr 12 > 440.2kbps;
-3 -nts 0 -skew 18 -snr 18 > 442.9kbps.

-3 -nts 0 -skew 24 -snr 12 > 452.9kbps;
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #401
Hi Nick,

I just started examining the behavior of -3 with respect to -skew and -snr.
I only started using -snr because I think there's something wrong:

Code: [Select]
        | -skew 0 | -skew 12| -skew 24| -skew 36
-snr 0  |390 / 510|390 / 510|390 / 510|390 / 510


These values can't be compared directly with my former test, because I used FLAC -b 1024 then and FLAC -b 512 now. But I wonder what's wrong here: identical results for various -skew values is not what I expected.
390/510 is a good result IMO, but it was only expected to be achieved at around -skew 24.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #402
Hi Nick,

I just started examining the behavior of -3 with respect to -skew and -snr.
I only started using -snr because I think there's something wrong:

Code: [Select]
        | -skew 0 | -skew 12| -skew 24| -skew 36
-snr 0  |390 / 510|390 / 510|390 / 510|390 / 510

These values can't be compared directly with my former test, because I used FLAC -b 1024 then and FLAC -b 512 now. But I wonder what's wrong here: identical results for various -skew values is not what I expected.
390/510 is a good result IMO, but it was only expected to be achieved at around -skew 24.
I've run some skew tests on my 52 sample set:

-3 -skew 0 -snr 0 > 433.0kbps;
-3 -skew 6 -snr 0 > 435.3kbps;
-3 -skew 12 -snr 0 > 439.1kbps;
-3 -skew 18 -snr 0 > 446.1kbps;
-3 -skew 24 -snr 0 > 458.7kbps;
-3 -skew 30 -snr 0 > 479.8kbps;
-3 -skew 36 -snr 0 > 511.3kbps.

Is it possible that *none* of your samples have a minimum result below 3.45kHz?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #403
@Mitch 1 2 - Excellent find! Should extend the userbase of David's method.

Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like the best frame size for that codec is 2048. If somebody else can confirm that this is also the case with newer versions, we may want to add a dedicated switch, to stop people using it with frame sizes of 512 or 1024.

By the way, at a frame size of 2048 WMALSL performs halfway between TAK and FLAC.

Set F, WMALSL-WMP9, 0.3.18 11236-FFFFF-1246D-FFFFF
Code: [Select]
------- ----- ----- ----- 
|      |  1  |  2  |  3  |
------- ----- ----- -----
|  512 | 434 | 427 | 425 |
| 1024 | 432 | 427 | 425 |
| 2048 | 430 | 424 | 422 |
| 4096 | 460 | 453 | 451 |
------- ----- ----- -----

lossyWAV Development

Reply #404
@Mitch 1 2 - Excellent find! Should extend the userbase of David's method.
Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like the best frame size for that codec is 2048. If somebody else can confirm that this is also the case with newer versions, we may want to add a dedicated switch, to stop people using it with frame sizes of 512 or 1024.

By the way, at a frame size of 2048 WMALSL performs halfway between TAK and FLAC.

Set F, WMALSL-WMP9, 0.3.18 11236-FFFFF-1246D-FFFFF
Code: [Select]
------- ----- ----- ----- 
|      |  1  |  2  |  3  |
------- ----- ----- -----
|  512 | 434 | 427 | 425 |
| 1024 | 432 | 427 | 425 |
| 2048 | 430 | 424 | 422 |
| 4096 | 460 | 453 | 451 |
------- ----- ----- -----
So, a -wm parameter to set codec_block_size to 2048 for all quality levels for WMALSL?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #405
Hi Nick,
I just started examining the behavior of -3 with respect to -skew and -snr. ...

I've run some skew tests on my 52 sample set:

-3 -skew 0 -snr 0 > 433.0kbps;
-3 -skew 6 -snr 0 > 435.3kbps;
-3 -skew 12 -snr 0 > 439.1kbps;
-3 -skew 18 -snr 0 > 446.1kbps;
-3 -skew 24 -snr 0 > 458.7kbps;
-3 -skew 30 -snr 0 > 479.8kbps;
-3 -skew 36 -snr 0 > 511.3kbps.

Is it possible that *none* of your samples have a minimum result below 3.45kHz?

My problem sample set should respond at least as strongly as yours to -skew variation (and it did in my v0.3.18 test).
Thanks for your test. I must have done something wrong and will look into it.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #406
So, a -wm parameter to set codec_block_size to 2048 for all quality levels for WMALSL?

Nice, we have another candidate for a codec-specific option.

As for the internal codec block size: I think if we're working internally with a blocksize of 1024, there is no problem using a blocksize of 2048 with a lossless encoder if that is most effective overall. The lossless encoder's blocksize just needs to be a multiple of the internal lossyWav blocksize.

But it brings up the question: what is the meaning of our internal lossyWav blocksize at all?
Taking the big view, without looking at internal details, we have a two-stage process:
Stage 1: Transform the input WAV file into an output WAV file, bringing as many LSBs of each sample to zero as we can while expecting no audible impact.
Stage 2: Use a lossless codec on the output of stage 1.
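Stage 1 can be sketched as a simple rounding step. This is only an illustration of the idea, not lossyWAV's actual code (which also handles clipping prevention and optional dithering):

```python
def remove_bits(sample, bits_to_remove):
    # Round to the nearest multiple of 2^b, so the lowest b bits of the
    # output are zero; the stage 2 lossless encoder can then store fewer bits.
    step = 1 << bits_to_remove
    return ((sample + step // 2) >> bits_to_remove) << bits_to_remove
```

After this, every sample in the block shares at least `bits_to_remove` trailing zero bits, which is exactly what codecs like FLAC exploit.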

In principle there is no use talking about blocks within stage 1. We can think of the stage 1 process as a process concerning each sample individually.

We should give advice on blocksize use with the various encoders. Encoders profit from short blocks, as these adapt best to what's done in stage 1. But as some encoders are not efficient with short blocks (WavPack, WMA Lossless), the best general compromise has to be found for each codec. This doesn't seem difficult.

I guess thinking of a codec blocksize within stage 1 gets mixed up with what it's really about: FFT windowing.
Once we see this more clearly we may improve things - with respect to quality as well as to practical usage.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #407
Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like best frame size for that codec is 2048. When somebody else can confirm that is the case also with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.

I came to the same conclusion, but I used a hex editor.
lossyFLAC (lossyWAV -q 0; FLAC -b 512 -e)

lossyWAV Development

Reply #408
Just for knowledge about the FFT windowing / codec block details:

Is my picture of the current way of doing it correct?
For definiteness let's talk about -3 (codec block size 512, FFT lengths 64 and 1024), and for a moment let's ignore the effects of spreading, skewing and -snr.

We're looking at a specific 512 sample codec block CB.
FFT analysis of length 1024 is done starting with the first sample of CB.
The analysis result is applied to all the 512 samples of CB.
FFT analysis of length 64 is applied to the 8 consecutive 64-sample-subblocks SB1, ... , SB8 of CB.
In principle we can look at each of SB1, ..., SB8 separately and apply the length-1024 FFT analysis to whichever subblock is currently under investigation: look for the lowest bin in both FFTs and decide the number of bits to remove based on this minimum bin. In principle this decision applies only to the 64-sample subblock, but we can use it as a temporary result, look at all the subblocks of CB, and then - based on the subresults of each subblock - decide on the bits to remove for the entire CB.

In principle we can decide on the bits to remove on a 64 sample basis, which corresponds to the short 64 sample FFT.
Sure, we mix information belonging to 1024 samples with information belonging to 64 samples, which formally is not correct. But if we want to be that strict we also may not use a codec blocksize of 512 with an FFT length of 1024 (or, as with -1, a codec blocksize of 1024 with an FFT length of 2048 [which resulted from a probably bad idea of mine - so we should either return to a blocksize of 2048 or, maybe better, skip the 2048 FFT]).
Beyond that we can improve things by thinking about more adequate FFT windows - for instance, build several 1024 sample FFT windows (8 in the extreme case) such that the 64 sample window under investigation is more or less in the center of a 1024 sample FFT window. Or something more intelligent.

This brings back, in a specific form, the idea of overlapping FFT analyses you already offered, Nick.

Anyway, with considerations like these we separate FFT analysis concerns from codec block size concerns, which should belong to the stage 2 lossless encoder alone.

Edited: nonsense removed.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #409
Just one more idea:

Though I love the idea of deciding (at least in principle) the number of bits to remove for each individual sample, we can see it a bit more practically:

Overall view:
Our stage 1 process provides blocks of 512 samples, and all samples within this block have the same number of bits removed.
We do it under all circumstances, that is especially for -1, -2, -3.
This way we are free to use any multiple of 512 as the blocksize with the stage 2 encoder, and to the best of our knowledge so far it's easy to find an appropriate blocksize (for instance 512 for FLAC and TAK, 1024 for WavPack, 2048 for WMA Lossless).
Especially for -2, bitrate would still go down a bit with FLAC and TAK at a blocksize of 512.

Detail view for stage 1:
With a 512 sample block we can easily let it consist of several consecutive length-64 FFT and (for -1, -2) length-256 FFT windows.
We can build an individual length-1024 FFT for each 512 sample block such that our 512 sample block lies in the middle of the 1024 sample FFT window. (Looking at only the length-1024 FFT windows: these cover the entire track, overlapping.)
Maybe it's good to apply a more elaborate FFT window function for the length-1024 FFT, but I guess the simple approach is good enough.
The length-1024 FFT window contains information from the 256 samples before and after the block, which makes for an inaccuracy. These excess window parts correspond to ~5.8 msec each - a pretty short period IMO. Moreover, as the shorter FFTs have an independent influence on the number of bits to remove, I don't think this is a dangerous inaccuracy. What I mean is: if one of the shorter FFTs yields a very low value bin, and there's no lower one in the length-1024 FFT, this low value bin from a shorter FFT decides the number of bits to remove.
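The centred-window arithmetic can be sketched like this (an illustration only; the function name is hypothetical):

```python
def centred_fft_start(block_start, block_size=512, fft_length=1024):
    # Place the FFT window so the 512-sample block sits in its middle:
    # (1024 - 512) / 2 = 256 samples of context on each side.
    return block_start - (fft_length - block_size) // 2

# 256 samples at 44.1 kHz is about 5.8 ms of 'foreign' context per side:
context_ms = 256 / 44100 * 1000
```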

But this is the place IMO where we should say goodbye to length-2048-FFTs.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #410
Just one more idea:

Though I love the idea of deciding (at least in principle) the number of bits to remove for each individual sample, we can see it a bit more practically:

Overall view:
Our stage 1 process provides blocks of 512 samples, and all samples within this block have the same number of bits removed.
We do it under all circumstances, that is especially for -1, -2, -3.
This way we are free to use any multiple of 512 as the blocksize with the stage 2 encoder, and to the best of our knowledge so far it's easy to find an appropriate blocksize (for instance 512 for FLAC and TAK, 1024 for WavPack, 2048 for WMA Lossless).

Detail view for stage 1:
With a 512 sample block we can easily let it consist of several consecutive length-64 FFT and length-256 FFT windows.
We can build an individual length-1024 FFT for each 512 sample block such that our 512 sample block lies in the middle of the 1024 sample FFT window. (Looking at only the length-1024 FFT windows: these cover the entire track, overlapping.)
Maybe it's good to apply a more elaborate FFT window function for the length-1024 FFT, but I guess the simple approach is good enough.
The length-1024 FFT window contains information from the 256 samples before and after the block, which makes for an inaccuracy. These excess window parts correspond to ~5.8 msec each - a pretty short period IMO. Moreover, as the shorter FFTs have an independent influence on the number of bits to remove, I don't think this is a dangerous inaccuracy. What I mean is: if one of the shorter FFTs yields a very low value bin, and there's no lower one in the length-1024 FFT, this low value bin from a shorter FFT decides the number of bits to remove.

But this is the place IMO where we should say goodbye to length-2048-FFTs.
I will happily remove 2048 sample FFTs from the analysis. Looking at FFT analysis: currently, separate FFT analyses are carried out on the data in the current codec block, some of the previous block and some of the next block (assuming we are not analysing the ends of the file). The overlap is fft_length/2 and the spacing of analyses is fft_length/2, so for a 1024 sample codec_block_size, 3 FFT analyses are performed: -512 to 511; 0 to 1023; and 512 to 1535. For a 512 sample codec_block_size 2 analyses are performed: -512 to 511; 0 to 1023 (negative sample counts are in the previous block, sample counts in excess of codec_block_size-1 are in the next block).

So, for a 1024 sample codec_block_size, 3 1024-sample, 9 256-sample and 33 64-sample FFT analyses are carried out. Spreading, minimum searching and averaging are carried out on all of them, and the smallest derived value is used to determine bits_to_remove.
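These counts can be reproduced from the 50% spacing rule described above (a sketch of the window placement only, inferred from this post, not lossyWAV's source):

```python
def analysis_starts(codec_block_size, fft_length):
    # 50% overlap: analyses are spaced fft_length/2 apart, and the first
    # one starts fft_length/2 before the codec block.
    step = fft_length // 2
    return list(range(-step, codec_block_size - step + 1, step))

# For a 1024-sample codec block: 3 / 9 / 33 analyses, matching the text.
counts = [len(analysis_starts(1024, n)) for n in (1024, 256, 64)]
```

For a 512-sample codec block with a 1024 FFT this gives the two analyses (-512 to 511 and 0 to 1023) quoted above.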
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #411
Thanks for the clarification. So you do a lot of overlapping analyses.

Looking at the current way you do it, I don't see a reason not to use a lossyWav blocksize of 512 throughout. (I'd like to call it the lossyWav blocksize because it's not necessarily the blocksize of the encoding codec.)
If there is something inappropriate about this way of doing the analyses, it applies just as much to a lossyWav block size of 1024.

A lossyWav blocksize of 512 gives way for any appropriate blocksize as a multiple of 512 in the stage 2 encoding process.

What might be wrong with doing the analyses this way?
Hopefully nothing, of course, but I'm a bit afraid of energy originating from outside the codec block influencing the analysis for that block. The way it's done, energy from ~11.5 msec before and after the block makes it into the decision making for the block. So a potential minimum bin may lose its minimum status due to energy from outside the block.
If that's fine: alright. If it can be problematic, statistics favors 1024 sample lossyWav blocks, as for each block the 1024 sample FFTs use excess samples amounting to 100% of the block, whereas with 512 sample lossyWav blocks this extends to 200%.
Anyway, it should be problem-free (or at least problem-poor) in any case.

That's all about the way it is now.
But what would be the disadvantage of the approach from my last post: overlapping only for the length-1024 FFTs (with the 512 sample lossyWav block right in the middle), and consecutive non-overlapping FFT windows for the other FFT lengths?
It would reduce the foreign-energy problem (in case there is one) and reduce the number of FFTs.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #412
It's inefficient to remove more bits than a given lossless encoder can take advantage of.

So say, for example, you run lossyWAV with 512 and FLAC with 1024.

That means, within any FLAC block, half the samples might have more zeros than FLAC can take advantage of (because the other half have fewer zeros, defining and limiting the number of "wasted_bits" within that FLAC block).

"So what?" you might think. Well, removing more bits equates to adding more noise. And the more noise you add, the less efficient a lossless codec will be (excepting the special case where the "noise" is a string of zeros which it can take advantage of).

So it's possible (and, in my very early tests, true) that lossyWAV 512 with FLAC 1024 will give a higher bitrate than lossyWAV 1024 with FLAC 1024 (and, of course, a theoretically lower quality, though hopefully both are transparent).
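The point can be illustrated with FLAC's wasted-bits rule (a sketch of the idea, not FLAC's source code):

```python
def wasted_bits(block):
    # FLAC can only discard the trailing zero bits shared by EVERY
    # sample in a block (zero samples are ignored).
    shared = None
    for s in block:
        if s != 0:
            tz = (s & -s).bit_length() - 1  # count of trailing zero bits
            shared = tz if shared is None else min(shared, tz)
    return shared or 0

# Two lossyWAV-512 halves inside one FLAC-1024 block: one half had
# 4 bits removed, the other only 2 - FLAC can exploit just 2 of them,
# while the extra noise from the 4-bit half still hurts its predictor.
half_a = [x * 16 for x in (3, -5, 7)]
half_b = [x * 4 for x in (9, -1, 3)]
```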

Cheers,
David.


The way it's done, energy from ~11.5 msec before and after the block makes it into the decision making for the block. So a potential minimum bin may lose its minimum status due to energy from outside the block.
That's intentional. One of the FFT analyses is usually concentrated on the block boundary, which for a 1024-point FFT at 44.1kHz is, as you say, about +/-11.6ms - though the windowing means the effect at the edges is pretty small. The reason for doing this is to catch low energy moments near the block boundary, which could otherwise be completely missed. If you miss them, you add too much noise; worse still, you can put a hefty transition in there as you switch to more bits removed.

More generally, if you don't overlap analysis blocks, then there are moments of the audio that you never check, so you won't know if the noise you're adding is above or below the noise floor during those moments.

Cheers,
David.

lossyWAV Development

Reply #413
It's inefficient to remove more bits than a given lossless encoder can take advantage of.
So say, for example, you run lossyWAV with 512 and FLAC with 1024. ... That means, within any FLAC block, half the samples might have more zeros than FLAC can take advantage of (because the other half have fewer zeros, defining and limiting the number of "wasted_bits" within that FLAC block). ...

Sure, if we produce 512 sample blocks with lossyWav we lose efficiency when using an encoder blocksize of 1024, in case the encoder works efficiently with a blocksize of 512. The lossyWav512/FLAC1024 combination isn't attractive and should be replaced by lossyWav512/FLAC512. But encoders like WavPack or WMA Lossless prefer larger blocksizes for efficiency, so it's about finding the sweet-spot combination. So maybe lossyWav512/WMAlossless2048 is a better combination than lossyWav512/WMAlossless512 (not at all certain). But I can't see a mechanism that makes lossyWav512/WMAlossless2048 inferior to the lossyWav2048/WMAlossless2048 combination. Of course, the encoder blocksize should always be an integer multiple of the lossyWav blocksize.
The way it's done, energy from ~11.5 msec before and after the block makes it into the decision making for the block. So a potential minimum bin may lose its minimum status due to energy from outside the block.
That's intentional. One of the FFT analyses is usually concentrated on the block boundary, which for a 1024-point FFT at 44.1kHz is, as you say, about +/-11.6ms - though the windowing means the effect at the edges is pretty small. The reason for doing this is to catch low energy moments near the block boundary, which could otherwise be completely missed. If you miss them, you add too much noise; worse still, you can put a hefty transition in there as you switch to more bits removed.

More generally, if you don't overlap analysis blocks, then there are moments of the audio that you never check, so you won't know if the noise you're adding is above or below the noise floor during those moments.

Cheers,
David.

You certainly know more about these things than I do. But with a lossyWav blocksize of 512, the overlapping length-1024 FFT covers your concerns.
So at least there is no need for this extensive overlapping with the 1024 FFTs.
The shorter FFTs, done the way they are done now, don't hurt my proposal.
Moreover, there is the problem of unwanted energy from outside the block under investigation influencing the bits to remove for the current block. With my proposal this influence is lower.
Encoding speed improves too (though IMO this is a minor aspect).

So in the end: why not just use 512 sample blocks in lossyWav, with just one 1024-sample FFT for each of these 512 sample lossyWav blocks and the lossyWav block centered in the FFT window?

BTW, is a windowing function like Hanning used? With the overlapping it would be most welcome, I think, and it would reduce potential negative side effects of the 'foreign' samples. It would also reduce errors resulting from a rectangular window.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #414
BTW, is a windowing function like Hanning used? With the overlapping it would be most welcome, I think, and it would reduce potential negative side effects of the 'foreign' samples. It would also reduce errors resulting from a rectangular window.
The Hanning window is used. I did toy with the idea of the centred analysis previously, but at that time I was more concerned with being able to duplicate exactly the results from David's Matlab script.
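For reference, the Hann(ing) taper mentioned here, as a minimal sketch (the periodic form; endpoint handling in the actual TPMAT036 implementation may differ):

```python
import math

def hanning(n):
    # Samples near the window edges are weighted toward zero, which is
    # why edge samples contribute little to each FFT analysis.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

w = hanning(1024)
# w[0] is 0 (edge), w[512] is 1.0 (centre)
```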
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #415
The Hanning window is used. I did toy with the idea of the centred analysis previously, but at that time I was more concerned with being able to duplicate exactly the results from David's Matlab script.

Yes, IMO that was the right thing to do then.
But now we're ahead of that, and it's wonderful that we have the same idea.

Edited: Removed the idea of having a smaller overlap area for the 64 and 256 sample FFT. Not a good idea.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #416
lossyWAV alpha v0.3.20 attached: Superseded.

-overlap parameter added to reduce the end_overlap of FFT analyses to 25% of FFT_length rather than 50%.
Code: [Select]
lossyWAV alpha v0.3.20 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality level (-cbs 1024 -nts -3.0 -skew 30 -snr 24)
-2            default quality level (-cbs 1024 -nts -1.5 -skew 24 -snr 18)
-3            compact quality level (-cbs  512 -nts -0.5 -skew 18 -snr 12)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 16=0)
-overlap      enable aggressive fft overlap method; default=off

-spf <3x5chr> manually input the 3 spreading functions as 3 x 5 characters;
              e.g. 44444-44444-44444; Characters must be one of 1 to 9 and
              A to Z (zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
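The 6.0206 dB figure in the -nts description is just the level change of one bit (a quick sanity check, not part of lossyWAV):

```python
import math

# Halving or doubling an amplitude changes its level by 20*log10(2) dB,
# so one bit of rounding noise corresponds to ~6.0206 dB; -nts trades
# 1 bit of removal per 6.0206 dB of noise threshold shift.
db_per_bit = 20 * math.log10(2)
```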
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #417
Oops, you're so fast!

I've read a lot about overlapping FFT windows, and I haven't seen anybody use less than 50% overlap.
I had just removed this part from my post, and a second later saw that you had already implemented it.

Thanks for your version and sorry for the confusion!

But now that you've done it: let's see what 2Bdecided and other people have to say about it.

Anyway, for the 1024 sample FFT I think we should take the single-centered-FFT approach - at least as long as we're happy with 50% overlap for the other FFTs, as this has pretty much the same feasibility background.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #418
Oops, you're so fast!

I've read a lot about overlapping FFT windows, and I haven't seen anybody use less than 50% overlap.
I had just removed this part from my post, and a second later saw that you had already implemented it.

Thanks for your version and sorry for the confusion!

But now that you've done it: let's see what 2Bdecided and other people have to say about it.

Anyway, for the 1024 sample FFT I think we should take the single-centered-FFT approach - at least as long as we're happy with 50% overlap for the other FFTs, as this has pretty much the same feasibility background.
Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #419
Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?

I guess your concern is the same as mine: for the starting and ending 'overlap', half the FFT_length outside the lossyWav block is a bit much and brings in wrong information to a major extent.
Your 25% approach seems appropriate to me and corresponds to the 50% overlap between adjacent FFT windows (meaning the central 50% of each FFT window's samples are considered well taken care of by the Hanning-windowed FFT analysis).
But why do you want to relate it to the longest FFT? IMO it should be 25% of the current FFT length.
This more general procedure matches perfectly with the single-centered-FFT approach for a lossyWav blocksize of 512 and a 1024 sample FFT.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #420
Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?

I guess your concern is the same as mine: for the starting and ending 'overlap', half the FFT_length outside the lossyWav block is a bit much and brings in wrong information to a major extent.
Your 25% approach seems appropriate to me and corresponds to the 50% overlap between adjacent FFT windows (meaning the central 50% of each FFT window's samples are considered well taken care of by the Hanning-windowed FFT analysis).
But why do you want to relate it to the longest FFT? IMO it should be 25% of the current FFT length.
This more general procedure matches perfectly with the single-centered-FFT approach for a lossyWav blocksize of 512 and a 1024 sample FFT.
alpha v0.3.20 uses 25% of the current fft_length for the -overlap option. I'd be interested to know if there is any perceptible difference in the quality of the output as using -overlap increases bits_to_remove.

Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #421
I repeated my v0.3.19 -skew and -snr analysis (using -3) which I did wrong yesterday (first result: average bitrate of my full-length regular music set; second result: average bitrate of my problem sample set):
Code: [Select]
        | -skew 0 | -skew 12| -skew 24| -skew 36
-snr 0  |382 / 480|383 / 490|390 / 510|421 / 547
-snr 12 |382 / 480|383 / 490|390 / 510|421 / 547
-snr 24 |387 / 486|393 / 501|402 / 524|429 / 560

Pretty much the same result as with v0.3.18 (keep in mind that the v0.3.18 test was done with a FLAC blocksize of 1024).
lame3995o -Q1.7 --lowpass 17

 

lossyWAV Development

Reply #422
Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?

If you overlap by 50% inside the lossyWav block, this means you are confident that the region 25% to either side of the FFT window center carries the necessary information. Let's take this as a valid assumption (otherwise we would have to increase the overlap). With 50% overlap, these '25% away from the center' regions consecutively and non-overlappingly cover the lossyWav block. At the start this means you need to begin the first window just 25% before the current lossyWav block; the lossyWav block then starts at the very beginning of the trusted region of the first FFT window. At the end it's the same thing, as only the last 25% of the FFT window makes up the trailing untrusted region.

This is most vital for the long FFT, as a lot of foreign energy makes it into the current lossyWav block analysis the way we currently do it.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #423
Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?
If you overlap by 50% inside the lossyWav block, this means you are confident that the region 25% to either side of the FFT window center carries the necessary information. Let's take this as a valid assumption (otherwise we would have to increase the overlap). With 50% overlap, these '25% away from the center' regions consecutively and non-overlappingly cover the lossyWav block. At the start (and analogously at the end) this means you need to begin the first window just 25% before the current lossyWav block; the lossyWav block then starts at the very beginning of the trusted region of the first FFT window.

This is most vital for the long FFT, as a lot of foreign energy would otherwise make it into the current lossyWav block analysis.
I see where you're coming from and agree with the logic, although the 50% end_overlap does allow the beginning and end samples to have full weight in the Hanning Window. As you said previously, we need guidance as to whether this approach has in some way been tried and discredited.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #424
....
lossyWAV alpha v0.3.20 attached:

-overlap parameter added to reduce the end_overlap of FFT analyses to 25% FFT_length rather than 50%....

Hi Nick,

Did you change this already, or did it go unnoticed by me?
Does that mean the overlap within a lossyWav block is 50% as before, but the overlap at the beginning and end of a lossyWav block stretches just 25% into the neighboring lossyWav blocks?

That would be great, as I'm really worried about the behavior with 1024 sample FFTs: with a lossyWav block size of 512 we have 2 FFT windows which get exactly as much information from the neighboring lossyWav blocks as from the block under consideration, and no other FFT window; with a lossyWav block size of 1024 there is just 1 more FFT window (so 1 out of 3), which at least gets the right information.
Min finding makes the situation worse.

Hope I interpret your -overlap option correctly, because 25% overlap in the interior wasn't a good idea. Sorry again for going wild.
lame3995o -Q1.7 --lowpass 17