lossyWAV Development

Reply #150 – 2007-09-19 19:25:22

Sorry, I abxed Atem-lied 9/10 using v0.1.6 -s.

BTW this does not mean this version is worse than the last one. I use my very open Alessandro MS-2 when my wife is not around; otherwise I use my canal phones, the Ultimate Ears super.fi 5 Pro. Both are very good but sound very different. Even with the same headphones, I'm sure my ability to hear such subtle problems varies from day to day.

Just for comparison, and to exclude the possibility that something may be slightly wrong with the Delphi implementation: is there a chance of getting an equivalent MATLAB-generated version of Atem-lied?

Reply #151 – 2007-09-19 20:28:46

Quote from: halb27 on 2007-09-19 19:25:22

Sorry, I abxed Atem-lied 9/10 using v0.1.6 -s.

BTW this does not mean this version is worse than the last one. I use my very open Alessandro MS-2 when my wife is not around; otherwise I use my canal phones, the Ultimate Ears super.fi 5 Pro. Both are very good but sound very different. Even with the same headphones, I'm sure my ability to hear such subtle problems varies from day to day.

Just for comparison, and to exclude the possibility that something may be slightly wrong with the Delphi implementation: is there a chance of getting an equivalent MATLAB-generated version of Atem-lied?

The Matlab script has not been keeping pace with the Delphi in this instance. I will attempt to incorporate the skewing function into Matlab and post the result.

Currently trying to include a Mersenne Twister random number generator instead of the Delphi standard one - just to see if it makes a difference.

In terms of input / output file naming, the drive / directory name (if any) in the input filename will now be stripped and the output file will default to the current directory, unless the -o parameter is used to indicate an alternative output directory. I am thinking of changing the -o parameter to require the whole output filepath / filename to be specified.

I will post 0.1.6b soon which will not reduce the noise_threshold_shift at all when the skewing is switched on. However, this may result in bitrate bloat of the resultant FLAC file, as David suggested.

Or, are we trying to redesign the whole method around one problem sample? I don't know which way to go right now. Have you tried Atem_lied at quality -1? If so, is it better and if better can you ABX it?

Reply #152 – 2007-09-19 21:24:59

I also see it as an option to ignore Atem-lied some day especially as the problem is extremely subtle (to me).

At the moment, however, there may be a small chance that the problem is due to a Delphi implementation error, and we shouldn't pass up the chance to find it.
Is the method of the current Delphi version, when not using skewing, identical to that of the Matlab script? If so, it makes sense to try the Matlab version.

I hope I have fixed the wavIO problem and will test it tomorrow (I'm too tired now). I'll also try quality -1 tomorrow.

Reply #153 – 2007-09-19 22:23:48

Quote from: halb27 on 2007-09-19 21:24:59

I also see it as an option to ignore Atem-lied some day especially as the problem is extremely subtle (to me).

At the moment, however, there may be a small chance that the problem is due to a Delphi implementation error, and we shouldn't pass up the chance to find it.
Is the method of the current Delphi version, when not using skewing, identical to that of the Matlab script? If so, it makes sense to try the Matlab version.

I hope I have fixed the wavIO problem and will test it tomorrow (I'm too tired now). I'll also try quality -1 tomorrow.

I have implemented the debug mode to allow the bits_to_remove for each codec_block to be examined on a block-by-block basis, i.e. to check if the Matlab and Delphi output is the same. As the average bits to remove for the Matlab version seem to be higher than the Delphi version (see comparison txt files above) I feel that the Delphi version is *slightly* more conservative than the Matlab version - why, I don't quite know - but I sense another debug session tomorrow night.........

I agree that we shouldn't ignore the possibility that there's an error in implementation!

Attached the bits_to_remove data, block-by-block for atem_lied, no skewing, 3 analyses (10,8,6 bit), triangular dither, noise_threshold_shift=-3. As can be seen, mainly the same, only a few differences.

I will go through the maths regarding the determination of the sub-blocks for analysis again tomorrow and see if the result improves.

One problem I am having is re-creating the noise analysis for creating the reference_threshold and threshold_index values - currently I am using the pre-processed constants to re-create the surface (fft bits x bits_to_remove), accurate to <0.2dB.

Reply #154 – 2007-09-20 07:58:12

Thanks for the tables.
Judging from that I don't think the Atem-lied problem is a problem of the Delphi implementation.
In the critical region ~ 6 bits are removed (with both the Delphi and the Matlab version), and I think this is really not appropriate in this situation.

So I think we have two choices:

1) try more variants, for instance averaging over 3 bins instead of 4.

2) if things don't improve substantially, accept that Atem-lied is a (very minor) problem sample that is fully transparent only at the best quality setting (I will check tonight whether the current best quality setting achieves that).
In this case, however, more listening experience than just mine would be most welcome (and not only in this case).

Reply #155 – 2007-09-20 08:24:00

Quote from: halb27 on 2007-09-20 07:58:12

Thanks for the tables.
Judging from that I don't think the Atem-lied problem is a problem of the Delphi implementation.
In the critical region ~ 6 bits are removed (with both the Delphi and the Matlab version), and I think this is really not appropriate in this situation.

So I think we have two choices:

1) try more variants, for instance averaging over 3 bins instead of 4.

2) if things don't improve substantially, accept that Atem-lied is a (very minor) problem sample that is fully transparent only at the best quality setting (I will check tonight whether the current best quality setting achieves that).
In this case, however, more listening experience than just mine would be most welcome (and not only in this case).

Last night, I got two out of three of the noise analysis calculations to give output which agrees with the Matlab output. Unfortunately they were the no-dither and rectangular-dither calculations; triangular dither still gives about 1.5dB less than it should, as if the whole surface has been shifted down by that amount.

I will add a "-a" parameter to allow the spreading function length to be reduced from 4 to 3.

Reply #156 – 2007-09-20 12:02:40

If you want to tackle atem_lied (or any other sample) I propose a very simple procedure...

Use the default original code, and change the noise threshold only. ABX these various different versions, and find out where the problem is solved.

I suggest obscuring the actual noise threshold shift from the listener - so if you're going to pass files to halb27, randomise the order and re-name them A, B, C etc. You can losslessly re-encode to FLAC at different (random) block sizes to hide the real bitrate too, since block size impacts efficiency - more so when it doesn't match that used by the pre-processor, which should be left at default.

As a result of this ABXing, you know (by checking) how many bits can be removed before a problem appears.

Then you can mess around with any options you want, and look at the number of bits removed. If it's more than the known good figure, it probably doesn't solve the problem. You can do all this playing without constant ABX testing. When you have something that seems to work numerically, then ABX to make sure that the bits are actually being removed at the correct time in the file.

Hope this suggestion makes sense.

Cheers,
David.
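The blind re-naming step in David's procedure could be scripted along these lines. This is only a minimal sketch: the function name `blind_labels` and the file names are hypothetical, and the re-encoding to FLAC at random block sizes is not covered.

```python
import random
import string

def blind_labels(filenames, seed=None):
    """Shuffle the files and assign neutral labels A, B, C, ...

    Returns a dict mapping label -> original file name. Whoever prepares
    the files keeps this key private until the ABX results are in.
    Limited to 26 files because only single letters are used.
    """
    rng = random.Random(seed)
    shuffled = list(filenames)
    rng.shuffle(shuffled)
    return {string.ascii_uppercase[i]: name for i, name in enumerate(shuffled)}

# Hypothetical inputs: one processed file per noise threshold shift under test.
candidates = ["atem_lied_shift_0.wav", "atem_lied_shift_-3.wav", "atem_lied_shift_-6.wav"]
key = blind_labels(candidates, seed=2007)

# Only the neutral labels are shown to the listener.
for label in sorted(key):
    print(f"{label}.wav")
```

Passing a seed makes the mapping reproducible for whoever holds the key; leaving it as `None` gives a fresh random order each time.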

Reply #157 – 2007-09-20 12:17:52

David,

Thanks for the valued input - you make it sound so easy! (Well, it is, but we hadn't come up with it, and you came up with the original script......)

I will try this out and post, say 3 samples tonight.

My other main debug is to make *very* sure that I'm using the correct reference_threshold values for triangular dither - the fact that no dither and rectangular dither calculate correctly has me a bit worried about the triangular dither calculation in Delphi and Matlab until I can get them to match up.

It would be *really* nice if I could get the Delphi to spit out exactly what the Matlab script produces in terms of WAV output (no dither, as you said previously). My take on the sub-blocks for calculation and end overlap may not be exactly the same as yours, so I'll see what the differences are - I re-invented the wheel a bit on that element of the coding.

Could I be having problems with the Delphi Random Number Generator? I have tried (a little bit) to find another that I could just plug in to the Delphi code, but I haven't found a suitable candidate (yet......).

[edit] It was the <censored> random number generator - at least indirectly. In a discussion in another thread it was intimated that one way to carry out triangular dither is to generate one random number per cycle, and subtract the previous random number from the new one. This was the basis upon which my triangular dither *was* coded. Unfortunately, this (in Delphi at least) does not give the same result as in the Matlab code. By switching to generating two random numbers per cycle and subtracting one from the other the problem of not matching the Matlab calculated values is solved!

Thinking it through, if there was a problem with the triangular dither in my noise_analysis code, there must also be in the bit-reducing code - therefore both have been "fixed". I will post v0.1.7 this evening. [/edit]

Nick.
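To make the difference between the two triangular dither methods concrete, here is a small sketch in Python (not the Delphi or Matlab code; the function names are made up for illustration). Both methods give the same triangular amplitude distribution, but re-using the previous random number makes consecutive dither samples negatively correlated, whereas two fresh numbers per sample give uncorrelated (white) dither.

```python
import random

def tpdf_independent(n, rng):
    """Triangular dither: difference of two fresh uniform numbers per sample."""
    return [rng.random() - rng.random() for _ in range(n)]

def tpdf_reuse_previous(n, rng):
    """Triangular dither: each new uniform number minus the previous one."""
    out = []
    prev = rng.random()
    for _ in range(n):
        cur = rng.random()
        out.append(cur - prev)
        prev = cur
    return out

def lag1_autocorrelation(x):
    """Correlation between each sample and the next (about 0 for white noise)."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den

rng = random.Random(12345)
white = tpdf_independent(100_000, rng)
shaped = tpdf_reuse_previous(100_000, rng)
print(f"two fresh numbers : lag-1 autocorrelation = {lag1_autocorrelation(white):+.3f}")
print(f"re-use previous   : lag-1 autocorrelation = {lag1_autocorrelation(shaped):+.3f}")
```

In theory the first figure is 0 and the second is -0.5. A negative lag-1 correlation means the noise energy is tilted towards high frequencies, so the two methods are not interchangeable when pre-calculated noise levels are being compared.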

Reply #158 – 2007-09-20 13:44:15

Something that struck me in looking at the spectrum of Atem_lied is that the lowest bins through the critical section are around 200 Hz, which I don't remember as that common. At those frequencies, averaging over 4 bins might get too much from higher frequencies because each bin covers such a wide range down there. Perhaps averaging fewer bins at lower frequencies might help (a little like the Bark scale).
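A quick way to put numbers on this point is to work out how many octaves a run of 4 adjacent bins covers at different frequencies. The sketch below assumes a 1024-sample FFT at 44.1 kHz and measures from the centre of the first bin to the centre of the last; both choices are assumptions for illustration, not values taken from the script.

```python
import math

def averaged_span(first_bin, n_bins, fft_length, sample_rate=44100):
    """Return (low_hz, high_hz, octaves) covered by n_bins adjacent FFT bins.

    Frequencies are bin centres; first_bin must be at least 1 so that the
    octave ratio is defined.
    """
    bin_hz = sample_rate / fft_length
    low = first_bin * bin_hz
    high = (first_bin + n_bins - 1) * bin_hz
    return low, high, math.log2(high / low)

# Bin 5 is close to 200 Hz and bin 100 is a little over 4 kHz for this FFT length.
for first_bin in (5, 100):
    low, high, octaves = averaged_span(first_bin, 4, 1024)
    print(f"bins {first_bin}-{first_bin + 3}: {low:.0f}-{high:.0f} Hz, {octaves:.2f} octaves")
```

Under these assumptions the 4-bin average starting near 200 Hz spans roughly two thirds of an octave, while the same average starting near 4 kHz spans only a few hundredths of one.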

Reply #159 – 2007-09-20 14:07:06

Nice find Bryant! Where might the crossover be say, from 3 to 4 bins or 2 to 3 to 4 bins, in frequency terms? It took me long enough with the latter (i.e. working) version of CONV

Reply #160 – 2007-09-20 15:02:23

Quote from: Nick.C on 2007-09-20 12:17:52

[edit] It was the <censored> random number generator - at least indirectly. In a discussion in another thread it was intimated that one way to carry out triangular dither is to generate one random number per cycle, and subtract the previous random number from the new one.

Don't do that here, at least not yet. It gives you high pass filtered dither noise. There are clear advantages to that, but the threshold calculations assume a flat noise floor.

btw, if you were using that, and it was working properly, and atem_lied still sounds bad, then you/we have worse problems than we thought - since the noise was already a few dB lower at the critical frequencies for that sample.

Cheers,
David.

Reply #161 – 2007-09-20 15:23:06

Quote from: 2Bdecided on 2007-09-20 15:02:23

Don't do that here, at least not yet. It gives you high pass filtered dither noise. There are clear advantages to that, but the threshold calculations assume a flat noise floor.

btw, if you were using that, and it was working properly, and atem_lied still sounds bad, then you/we have worse problems than we thought - since the noise was already a few dB lower at the critical frequencies for that sample.

Cheers,
David.

Wouldn't it sort of cancel itself out? But..... I was using the pre-calculated constants from Matlab, i.e. the ones I've just now managed to duplicate, so the constants *were* correct and I was using the filtered triangular dither method, so maybe there is indeed a problem.

As Bryant pointed out there may be merit in changing the CONV routine to average over fewer samples at low frequencies. Maybe institute a mid_frequency_bin for each analysis and average across fewer samples between lfb and mfb then across more samples between mfb and hfb? As seen from one of my previous posts, reducing the number of bins being averaged reduced the bits_to_remove value. In the same way, could the number of bins averaged be *increased* above a certain threshold?

Reply #162 – 2007-09-20 15:26:42

Quote from: bryant on 2007-09-20 13:44:15

Something that struck me in looking at the spectrum of Atem_lied is that the lowest bins through the critical section are around 200 Hz, which I don't remember as that common. At those frequencies, averaging over 4 bins might get too much from higher frequencies because each bin covers such a wide range down there. Perhaps averaging fewer bins at lower frequencies might help (a little like the Bark scale).

Thank you David. Can you help me think this through...

You could do psychoacoustically sensible (or at least slightly more like psychoacoustically sensible) spreading - but that's not what I put the spreading there for. Simply, if you FFT something, some of the bins are going to get very little energy just out of coincidence. Move the window by a few samples, and it'll be different bins which get very little energy. Those minima are pretty much irrelevant. The spreading function is there to smooth them out, otherwise they'll be chosen as the noise floor and the bit rate reduction will be very low. (You could do more FFTs, greatly overlapped, and average in time to achieve a similar thing, but that would really slow things down).

You're right that, at low frequencies, this convolution might be smoothing over important dips. I tested this originally (around 1kHz, not 200Hz) and found it was OK. If you have an extreme dip of about 4 bins or less in width, then it does get partly filled with noise, but not enough to be audible (to me). The only issue was if it was narrow and short - with contrived signals, you can get something that's (just) audibly overlooked, so I included the optional 5ms FFT to catch that.

However, it seems to me in my experiments with the noise shaping version, that SebG is exactly right (and even basic masking models from decades ago support this): you need a greater SNR at lower frequencies than higher frequencies.

So I don't know why atem_lied is failing - is it because a LF dip is smoothed too much, or is it because LFs simply require a higher SNR? Either a variable length conv, or a low frequency threshold skew, can solve this problem. The question is which is correct (and does it matter)?

Obviously I favour the LF skew because it's much easier to implement! But if it's just a "bodge" then it may leave another problem lurking elsewhere. A correct psychoacoustic-ish spreading function might make it more efficient as well as more careful.

If you really want to go mad, for both the standard version, and the noise shaping version (unreleased), you could replace the current model (which is basically "find the noise floor, keep the noise below it") with a psychoacoustic model from your favourite lossy encoder. I don't know how useful this would be, and I don't recommend it - but it'll be a nice project for someone when everything else is finished.

Cheers,
David.
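The effect of the spreading function on the chosen noise floor can be sketched as follows. This is an illustration only: it averages dB values directly with a plain moving average, which may not be what the actual CONV routine does, and the spectrum values are invented.

```python
def spread(bins_db, length=4):
    """Average each run of `length` adjacent bins (a simple moving average)."""
    return [sum(bins_db[i:i + length]) / length
            for i in range(len(bins_db) - length + 1)]

def noise_floor_estimate(bins_db, length=4):
    """Lowest value of the smoothed spectrum."""
    return min(spread(bins_db, length))

# Invented spectrum (dB) with one isolated dip of the kind an FFT produces by chance.
bins_db = [-40, -42, -41, -75, -43, -40, -41, -42]

print("raw minimum      :", min(bins_db))
print("4-bin spread min :", noise_floor_estimate(bins_db, 4))
print("3-bin spread min :", noise_floor_estimate(bins_db, 3))
```

Without spreading, the isolated dip at -75 would be taken as the noise floor. With 4-bin averaging the minimum is much higher, and with 3-bin averaging it sits a little lower than with 4, which fits the observation that averaging fewer bins reduces bits_to_remove.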

Reply #163 – 2007-09-20 20:06:27

I tried Atem-lied using v0.1.6 -1 -s and couldn't abx it.

I tried my other samples using default quality and -s and could not abx any of them (trumpet however may be a bit on the edge and I can imagine it can be abxed by someone with better hearing - but that's speculation).

-s seems to have a tendency to bring the number of removed bits down a bit.
I didn't really get what the skewing option does. Can you explain it please?

Now that I use LossyWav from the commandline I see the average number of bits removed. I was quite astonished to see for instance herding_calls having only 0.1150 bits removed on average. With such a sample I would have expected to see more bits removed.

It's not just herding_calls. From that, I think a few more bits could be removed on average. But of course in that case there should be some method to cover problems like Atem-lied.

I'd really like to see the behavior when averaging over 3 bins. Maybe it's possible to raise the noise threshold this way, compensating for the impact of averaging over 3 instead of 4 bins.

To me David Bryant's idea is plausible, and maybe a rough approximation is already valuable, such as averaging over 4 bins in the frequency range above 1.5 kHz and over 3 bins below that.
IMO it's worth trying.
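A rough version of that approximation might look like the sketch below. Everything here is an assumption made for illustration: the 1.5 kHz crossover is the figure suggested above, the averaging window starts at each bin and extends upwards, dB values are averaged directly, and a 64-sample FFT at 44.1 kHz is used so that the crossover falls inside a short example spectrum.

```python
def spreading_length(freq_hz, crossover_hz=1500.0, low_len=3, high_len=4):
    """Number of bins to average: fewer below the crossover, more above it."""
    return low_len if freq_hz < crossover_hz else high_len

def variable_spread(bins_db, fft_length, sample_rate=44100, crossover_hz=1500.0):
    """Moving average whose length depends on the frequency of the first bin."""
    bin_hz = sample_rate / fft_length
    out = []
    for i in range(len(bins_db)):
        n = spreading_length(i * bin_hz, crossover_hz)
        window = bins_db[i:i + n]
        if len(window) < n:
            break  # not enough bins left for a full window
        out.append(sum(window) / n)
    return out

# Invented spectrum (dB); with a 64-sample FFT the bins are about 689 Hz apart,
# so bins 0-2 are averaged over 3 bins and the rest over 4.
bins_db = [-40, -42, -41, -75, -43, -40, -41, -42]
smoothed = variable_spread(bins_db, fft_length=64)
print([round(v, 2) for v in smoothed])
print("lowest smoothed bin:", min(smoothed))
```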

Added:
I looked up earlier in this thread where I could not abx a MATLAB Atemlied version based on 3 analyses, noise_threshold_shift=-3, triangular_dither.
Where are we with the current Delphi version in terms of these parameters?

Reply #164 – 2007-09-20 21:44:47

Quote from: halb27 on 2007-09-20 20:06:27

I didn't really get what the skewing option does. Can you explain it please?

I'd really like to see the behavior when averaging over 3 bins. Maybe it's possible to raise the noise threshold this way, compensating for the impact of averaging over 3 instead of 4 bins.

To me David Bryant's idea is plausible, and maybe a rough approximation is already valuable, such as averaging over 4 bins in the frequency range above 1.5 kHz and over 3 bins below that.
IMO it's worth trying.

Added:
I looked up earlier in this thread where I could not abx a MATLAB Atemlied version based on 3 analyses, noise_threshold_shift=-3, triangular_dither.
Where are we with the current Delphi version in terms of these parameters?

Skewing lowers the outputs in bins at the low end of the FFT by up to 6dB, with no reduction at the high frequency bin (16kHz). The curve has a 1-cos shape.

New option -t (3 bin average) added.

~~Files attached - 2 Matlab, 2 lossyWAV. Same analyses, noise threshold shift and dither - 2 are 576 sample blocks and 2 are 1024 sample blocks.~~ Removed - flawed processing.

Also attached - I was playing with the random number generator and tried rectangular, triangular, (triangular + triangular)/2 [Tr2] and Tr3 - results attached as Dither.txt. Tr2 and Tr3 seem to have a gaussian shape to them - is this something which might be of use?

Also, looking at the frequency coverage of each bin at varying fft lengths I started something which may end up being the basis for variable bin number averaging - see Bins.txt

I'm tidying up v0.1.7 just now.

Reply #165 – 2007-09-20 22:09:57

A short intermediate result:

I abxed the w version 6/6 and ended up 9/10.
I abxed the x version 6/6 and ended up 7/10.

I don't think w is easier to abx for me. I'm just tired now (especially of listening to Atem-lied), and I think it's better to continue with the remaining versions tomorrow (maybe tomorrow morning before going to work if I have sufficient time).

Nick, I've emailed you the corrected version of the wavIO unit. Should be ok now.

Reply #166 – 2007-09-20 22:39:14

Quote from: halb27 on 2007-09-20 22:09:57

A short intermediate result:

I abxed the w version 6/6 and ended up 9/10.
I abxed the x version 6/6 and ended up 7/10.

I don't think w is easier to abx for me. I'm just tired now (especially of listening to Atem-lied), and I think it's better to continue with the remaining versions tomorrow (maybe tomorrow morning before going to work if I have sufficient time).

Nick, I've emailed you the corrected version of the wavIO unit. Should be ok now.

Thanks very much for your "ear-time" - it's much appreciated, as is the work put into the wavIO unit!

~~Attached is v0.1.7:~~ - Superseded by v0.1.8

-t parameter added : sets spreading_function_length to 3 rather than 4.
-s parameter modified : skewing function amended, no longer changes noise_threshold_shift value.
-f parameter added : sets fft_overlap to 1/n * fft_length samples, i.e. 1/4 = 5 analyses in 2 fft_lengths, 1/8 = 9 analyses in 2 fft_lengths.
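
The analysis counts quoted for -f can be checked with a few lines of Python. This is a sketch of the arithmetic only, with a hypothetical function name; it is not how lossyWAV positions its analyses internally.

```python
def analysis_starts(fft_length, n, span_in_fft_lengths=2):
    """Start offsets of FFT analyses spaced fft_length / n samples apart.

    Only analyses that fit entirely inside the span are counted.
    """
    hop = fft_length // n
    last_start = (span_in_fft_lengths - 1) * fft_length
    return list(range(0, last_start + 1, hop))

for n in (2, 4, 8):
    starts = analysis_starts(1024, n)
    print(f"1/{n} spacing: {len(starts)} analyses in 2 fft_lengths")
```

In general a spacing of 1/n gives n + 1 analyses across 2 fft_lengths, which matches the 5 and 9 quoted for 1/4 and 1/8.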

I will have a think about the mechanism whereby variable spreading_function_length can be applied in the CONV function, using a 3.5kHz transition. Is there any merit in thinking of the frequency range in octaves, i.e. spreading_function_length increases exponentially as frequency increases?

Reply #167 – 2007-09-21 07:04:33

In short my result for atem_lied.lossy.y.flac: 2/2 -> 5/7 -> 6/10, so I couldn't abx it.

I do think it's better than the w and x versions, though I don't think it's transparent. Maybe it was not a good idea to test it this morning as I'm a bit pressed to go to work now.

I'll redo the test this evening together with the z version.

Reply #168 – 2007-09-21 12:33:58

Horst's updated wavIO unit incorporated into code;

Small error in the CONV routine (yet again!) fixed;

Skewing now follows a (sin-1) [sin(pi/2*min(hfb-lfb,max(0,this_bin-lfb))/(hfb-lfb))-1] shape rather than (1-cos), now 9dB amplitude rather than 6dB;

Small error in calculation of Average Bits Removed fixed;

Small error in individual fft_analysis result calculation fixed.

[edit] Superseded - v0.2.0 [/edit]

Reply #169 – 2007-09-21 13:06:10

Perhaps we should split this thread somehow into one that contains bug reports and announcements of versions (which I'm not interested in) and a technical one where strategies and techniques are discussed.

Quote from: Nick.C on 2007-09-21 12:33:58

Skewing now follows a (sin-1) shape rather than (1-cos), now 9dB rather than 6dB;

Forgive my ignorance but what exactly is "skewing"?
Is there a relation to noise shaping?
What's the current strategy on selecting the 'wasted_bits' count / noise shaping filters (if any)?

Cheers!
SG

Reply #170 – 2007-09-21 13:19:30

Quote from: SebastianG on 2007-09-21 13:06:10

Perhaps we should split this thread somehow into one that contains bug reports and announcements of versions (which I'm not interested in) and a technical one where strategies and techniques are discussed.

Quote from: Nick.C on 2007-09-21 12:33:58

Skewing now follows a (sin-1) shape rather than (1-cos), now 9dB rather than 6dB;

Forgive my ignorance but what exactly is "skewing"?
Is there a relation to noise shaping?
What's the current strategy on selecting the 'wasted_bits' count / noise shaping filters (if any)?

Cheers!
SG

The original thread is still running in the FLAC forum - maybe technical discussion should move to that?

Skewing in this instance artificially lowers the fft bin values, in this case at the lower end of the fft results.

As applied in the code, at the low_frequency_bin the dB reduction is by the full amplitude of the selected reduction amount. At the high_frequency_bin there is no reduction at all. The shape of the dB reduction curve is a scaled (1-sin[value]) curve where value is 0 at the lfb (or lower) and pi/2 at the hfb (or higher). For a 32 sample fft_length, lfb=2, hfb=11:

Bin Freq(Hz)  sin-1  dB reduction
00     0  -1.000  -9.031
01  1378  -1.000  -9.031
02  2756  -1.000  -9.031
03  4134  -0.826  -7.463
04  5513  -0.658  -5.942
05  6891  -0.500  -4.515
06  8269  -0.357  -3.226
07  9647  -0.234  -2.113
08 11025  -0.134  -1.210
09 12403  -0.060  -0.545
10 13781  -0.015  -0.137
11 15159   0.000   0.000
12 16538   0.000   0.000
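
The table can be reproduced from the formula given earlier in the thread, sin(pi/2*min(hfb-lfb,max(0,this_bin-lfb))/(hfb-lfb))-1. The sketch below is in Python rather than Delphi, and it assumes a 44.1 kHz sample rate (which matches the 1378 Hz bin spacing) and an amplitude of 20*log10(2^1.5), about 9.031 dB, inferred from the table values.

```python
import math

def skew_db(this_bin, lfb, hfb, amplitude_db):
    """(sin - 1) skew: -amplitude_db at or below lfb, 0 at or above hfb."""
    position = min(hfb - lfb, max(0, this_bin - lfb)) / (hfb - lfb)
    return amplitude_db * (math.sin(math.pi / 2 * position) - 1.0)

SAMPLE_RATE = 44100                        # assumed sample rate
FFT_LENGTH, LFB, HFB = 32, 2, 11           # values from the example above
AMPLITUDE_DB = 20 * math.log10(2 ** 1.5)   # about 9.031 dB, inferred from the table

print("Bin Freq(Hz)  sin-1  dB reduction")
for b in range(13):
    freq = b * SAMPLE_RATE / FFT_LENGTH
    shape = skew_db(b, LFB, HFB, 1.0)      # unit amplitude gives the bare curve
    print(f"{b:02d} {freq:8.0f} {shape:7.3f} {skew_db(b, LFB, HFB, AMPLITUDE_DB):8.3f}")
```

Moving the zero-point to the bin nearest 3.5kHz, as suggested below, would only mean passing a different hfb.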

From the discussions previously, maybe the zero-point in this reduction should be the nearest bin to 3.5kHz, and maybe the amplitude of the skew should be more extreme.

The bits_to_remove value for each codec block is the threshold_index corresponding to the dB of the lowest (CONV'd) bin in any of the analyses carried out on that codec block. The threshold index is determined by calculating the dithered bit reduction noise dB for each bit_to_remove for each fft length.
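
As a rough illustration of that selection step (not the actual Delphi code), the sketch below looks up the largest bit count whose noise level stays at or below the lowest smoothed bin, then takes the most conservative answer across the analyses. The threshold tables and dB figures are hypothetical placeholders; the only general fact used is that each removed bit raises quantisation noise by about 6.02 dB.

```python
def bits_to_remove(min_bin_db, reference_threshold_db):
    """Largest bit count whose added noise stays at or below the lowest bin.

    reference_threshold_db[b] is the noise level (dB) from removing b bits
    and is assumed to rise with b. Returns 0 if no entry qualifies.
    """
    chosen = 0
    for bits, noise_db in enumerate(reference_threshold_db):
        if noise_db <= min_bin_db:
            chosen = bits
    return chosen

def codec_block_bits(analyses):
    """Most conservative result over all (lowest_bin_db, threshold_table) pairs."""
    return min(bits_to_remove(lowest_db, table) for lowest_db, table in analyses)

# Hypothetical threshold tables, one per FFT length, rising 6.02 dB per bit.
table_long = [0.0 + 6.02 * b for b in range(17)]
table_short = [-3.0 + 6.02 * b for b in range(17)]

# Hypothetical lowest (CONV'd) bin levels found by two analyses of one codec block.
analyses = [(31.0, table_long), (25.5, table_short)]
print("bits_to_remove for this block:", codec_block_bits(analyses))
```

With these made-up numbers the long analysis alone would allow 5 bits and the short one 4, so the block gets 4.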

Reply #171 – 2007-09-21 16:54:59

I tried atem_lied.lossy.z.flac and couldn't abx it (5/10).
I also retried atem_lied.lossy.y.flac and didn't arrive at a better result than this morning.

Reply #172 – 2007-09-21 17:00:43

Quote from: halb27 on 2007-09-21 16:54:59

I tried atem_lied.lossy.z.flac and couldn't abx it (5/10).
I also retried atem_lied.lossy.y.flac and didn't arrive at a better result than this morning.

The problem I found in lossyWAV today will probably be what caused "w" and "x" to be so poor - "y" and "z" were Matlab versions, the only difference being the codec_block_size (x&y=1024, w&z=576). This leads me to think that the fft_overlap will help this sample - i.e. more analyses of the same length over the same data.

I will re-process and post "w" and "x" tonight - using all the same parameters as before, as well as the same with 9dB skewing (20Hz to 3.7kHz).

Reply #173 – 2007-09-21 19:37:40

Does lossyWAV remove some noise?
I ABXed (with pain) sample No.E12 in the 01.00 - 01.50 range, encoded with 0.1.8. Issue: there's more noise... in the reference file.

On the other hand, very low volume samples played at high gain (sample No.V03 for example) have more noise (obviously) after lossyWAV processing (bitrate is higher too). I guess that's expected.

EDIT: ABX log for E12:

foo_abx v1.2 report
foobar2000 v0.8.3
2007/09/21 20:26:28

File A: file://C:\150 samples\E12_MODERN_CHAMBER_L_piano_flute.flac
File B: file://C:\150 samples\E12_MODERN_CHAMBER_L_piano_flute.lossy.flac

20:26:28 : Test started.
20:26:50 : 01/01  50.0%
20:26:58 : 01/02  75.0%
20:27:01 : 01/03  87.5%
20:27:04 : 02/04  68.8%
20:27:09 : 03/05  50.0%
20:27:16 : 04/06  34.4%
20:27:20 : 05/07  22.7%
20:27:43 : 06/08  14.5%
20:27:49 : 07/09  9.0%
20:28:07 : 07/10  17.2%
20:28:46 : 08/11  11.3%
20:28:59 : 09/12  7.3%
20:29:07 : 10/13  4.6%
20:29:12 : 11/14  2.9%
20:29:54 : 11/15  5.9%
20:30:38 : 12/16  3.8%
20:30:44 : Test finished.

 ---------- 
Total: 12/16 (3.8%)
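
For reference, the percentages in a log like this are the probability of scoring at least that well by guessing alone (a one-sided binomial test with p = 0.5). A minimal sketch, assuming Python 3.8 or later for `math.comb`:

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for correct, trials in [(12, 16), (9, 10), (7, 10), (5, 10)]:
    print(f"{correct}/{trials}: {abx_p_value(correct, trials):.1%}")
# 12/16: 3.8%
# 9/10: 1.1%
# 7/10: 17.2%
# 5/10: 62.3%
```

The 12/16 and 7/10 figures agree with the log above, and the same function puts the 9/10 and 5/10 scores reported elsewhere in this thread in context.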

Reply #174 – 2007-09-21 20:36:36

Thanks for the input Guru - where the bits_to_remove is zero, lossyWAV will still dither the samples because there's an automatic anti-clipping amplitude reduction to 95.28% (30.49/32, i.e. 32 -1 (triangular_dither amplitude) -0.5 (normal rounding) -0.01 (Nick.C's error margin)) as the file is processed, so dither is still required.

Maybe a follow-on batch file to detect which files have become bigger and annihilate them?

A bit surprised about E12 - and glad that your trained ears are not shuddering at the output from lossyWAV.

From my own processing testing I am getting ever closer to the Matlab output in terms of matching bits_to_remove on a block-by-block basis - the latest build has only 6 instances of 1 bit difference between the processors for Atem_lied, in 524 blocks. Oddly enough they cancel each other out, as 3 are +1 and 3 are -1. I am going to use the same reference threshold surface on both processors to bottom that out and then can move on, confident that the output of lossyWAV is the same as that of Matlab.

When I processed your 150 samples with and without skewing there was only about 50kB difference over the whole sample set between skewed and non-skewed (i.e. not a lot of minimum bins between 20Hz and 3.7kHz.....)
