Topic: lossyWAV Development

lossyWAV Development

Reply #925
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding the filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149], which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But H(z) isn't actually the noise shaping filter in this case; 1-z^-1*H(z) is. It's common to write the transfer function of a noise shaping filter as 1-z^-1*H(z). So if you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter, you need to use freqz([1 -b]) for the FIR cases. Since your code removes the leading coefficient and inverts the signs, you just need to skip that step for the "book filter".
Thank you. I didn't know that's what was being quoted. It's amazing it works as well as it does!

I don't need that step for any of the filters then (just drop the leading ones before typing them in!). As I said, the code is hacked from another version, which does need that process.

Quote
Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
I'm laid up in bed with a cold. I'm not even going to try to follow this now!

Quote
The following image is a "DSP circuit picture" explaining how noise shaping can be done:
[Image: http://img441.imageshack.us/my.php?image=noiseshaper2de9.png]

Cheers,
David.

Reply #926
Quote
Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
I'm laid up in bed with a cold. I'm not even going to try to follow this now!
Quote
The following image is a "DSP circuit picture" explaining how noise shaping can be done:
[Image: http://img441.imageshack.us/my.php?image=noiseshaper2de9.png]

Yeah, that rings a bell. But I can't remember agreeing on whether the patent really applies or not.

Cheers,
SG

Reply #927
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding the filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149], which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But H(z) isn't actually the noise shaping filter in this case; 1-z^-1*H(z) is. It's common to write the transfer function of a noise shaping filter as 1-z^-1*H(z). So if you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter, you need to use freqz([1 -b]) for the FIR cases. Since your code removes the leading coefficient and inverts the signs, you just need to skip that step for the "book filter".

You'll see that the response of the filter isn't that bad after all. Its deviation from the one I was suggesting is within +/-5 dB at nearly all frequencies.

Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
Code: [Select]
1 -1.1474 z^-1 +0.5383 z^-2 -0.3520 z^-3 +0.3475 z^-4
-----------------------------------------------------  =
1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4


            2.2061 -0.4707 z^-1 -0.2534 z^-2 -0.6213 z^-3
1 - z^-1 -----------------------------------------------------
         1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4
The new numerator is simply a-b with the leading zero removed (polynomial division + factoring out -z^-1). This form has its advantages when it comes to implementing noise shaping. The following image is a "DSP circuit picture" explaining how noise shaping can be done:

Still, I think the use of fixed shaping for this purpose is very limited. You could do much better with some easy signal adaptive filters like H(z)/A(z) where H(z) is some fixed filter and 1/A(z) is the LPC synthesis filter for the current frame or something like that.
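The rewritten transfer function above can be checked numerically. The sketch below (plain Python; the names B, A, H_NUM, poly and max_mismatch are made up for this example) rebuilds the numerator of H(z) as a-b without the leading zero and compares both forms at a few points on the unit circle.

```python
import cmath
import math

# Numerator and denominator of the noise transfer function, as posted.
B = [1.0, -1.1474, 0.5383, -0.3520, 0.3475]
A = [1.0, 1.0587, 0.0676, -0.6054, -0.2738]

# Numerator of H(z): a - b with the leading zero dropped.
# Rounded to 4 decimals only to hide floating-point noise.
H_NUM = [round(a - b, 4) for a, b in zip(A[1:], B[1:])]

def poly(coeffs, z):
    """Evaluate sum_k coeffs[k] * z^-k."""
    return sum(c * z ** -k for k, c in enumerate(coeffs))

def max_mismatch(n_points=16):
    """Largest difference between B/A and 1 - z^-1 * H_NUM/A on the unit circle."""
    worst = 0.0
    for i in range(n_points):
        z = cmath.exp(1j * math.pi * i / n_points)
        direct = poly(B, z) / poly(A, z)
        rewritten = 1 - z ** -1 * poly(H_NUM, z) / poly(A, z)
        worst = max(worst, abs(direct - rewritten))
    return worst

if __name__ == "__main__":
    print(H_NUM)
    print(max_mismatch())
```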


Cheers,
SG
Okay, I admit to being a bit baffled at the moment....

I looked up noise shaping in Wikipedia and found
Code: [Select]
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients, so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).

Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.

I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #928
Hi,
  I'm completely new to the forum - this is my first post so please excuse me if I'm posting in the wrong place or whatever

I'm fascinated and quite excited by lossywav and have done some testing with v0.6.7_RC2. I have not found any obvious problems using 16/44 input files but I can't get it to run with 24 bit files. I've read the item that says that most testing has been done with 16/44 but the wiki says that it can handle up to 32/48 and I'm very keen to use it on 24 bit files as all of the lossless codecs give relatively poor results in terms of file size with 24 bit files.

I have tried files generated by Adobe Audition in "24 bit packed int (type 1 - 24 bit)". This causes lossyWAV to error instantly with the message "FMT chunk wrong size".

I have also tried files in "24 bit packed int (type 1 - 20 bit)". lossyWAV manages to open the file and recognises the format as 48.00khz; 2ch.; 20 bit, but it then fails with a Windows error message: "lossyWAV.exe has encountered a problem and needs to close.  We are sorry for the inconvenience."

As I said, I'm not sure if I'm posting in the right place but I would like very much to help with testing if you think it might be useful. I should point out though that I am not very technical, I'm just a music lover. In fact this is the first time I've run anything using cmd - I've always relied on GUI front ends

Botface

 

Reply #929
Hi there,

lossyWAV will only work with PCM integer values (4 to 32 bit as in wiki article, *not* 32bit float). These are packed out to the nearest byte and stored. I am unsure what type of audio data your values apply to. [edit] If the FMT chunk is the wrong size (i.e. not integer values) then lossyWAV will exit. [/edit]

Sorry not to be much help.

Nick.

[edit] ps. Please could you post a sample (<=30 seconds in length) for me to test with? It would be very much appreciated. [/edit]

Reply #930
Okay, I admit to being a bit baffled at the moment....

It's probably the z-domain thingy. It takes a while to wrap one's head around it.

I looked up noise shaping in Wikipedia and found
Code: [Select]
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients, so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

The 1st problem with this Wikipedia article is that it's not really obvious what E is. Is it the unfiltered or the filtered noise? Btw, output-input isn't the quantization error. It's the already-filtered error. So, in your case you'll get an all-pole filter, which is a totally different beast from a FIR filter where the actual quantization errors are used. The difference is subtle: note that in the picture I made, I pick up the signal after the feedback and right before dither and quantization noise are added to compute the "quantization error" (unfiltered noise).
The 2nd problem with this wikipedia article is that it doesn't say anything about FIR or IIR filters and whether and/or how they can be used and what type of filter is actually described there.

That said: Regardless of what E is, their noise shaping formula is equivalent to the structure I drew where H(z) either corresponds to an all-pole-IIR or a FIR filter.
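The difference between feeding back the raw quantization error and feeding back output-input can be seen on a single error impulse. The sketch below is an illustration only: the function name shaped_error, the one-tap coefficient 0.5 and the sign convention (d[k] = e[k] - sum of c[i] times the delayed feedback signal) are assumptions chosen to keep the arithmetic exact, not values from lossyWAV.

```python
def shaped_error(e, c, feed_back_filtered):
    """Return d = output - input for a simple error-feedback loop.

    e: raw (unfiltered) quantization error per sample
    c: feedback coefficients (hypothetical values for illustration)
    feed_back_filtered: if True, feed back d itself (output - input);
                        if False, feed back the raw error e.
    """
    d = []
    for k, ek in enumerate(e):
        u = d if feed_back_filtered else e
        acc = sum(ci * u[k - 1 - i] for i, ci in enumerate(c) if k - 1 - i >= 0)
        d.append(ek - acc)
    return d

if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # Raw error fed back: the response stops after len(c) samples (FIR).
    print(shaped_error(impulse, [0.5], feed_back_filtered=False))
    # Filtered error fed back: the response keeps ringing (all-pole).
    print(shaped_error(impulse, [0.5], feed_back_filtered=True))
```

With the raw error the impulse dies out after one extra sample; with the filtered error it decays geometrically and never reaches exactly zero, which is the all-pole behaviour described above.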

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).

By "crashing" I guess you mean the filter went unstable. You probably used the coefficients in a wrong way. It might be a sign problem (sign of E is wrong) or you got the wrong E (filtered noise instead of unfiltered noise).

Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.
I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.

If you omit dither you can't guarantee the quantization error to be white/uncorrelated. The noise shaping stuff still works but you may get unexpected results because the filter is supposed to be applied on white/uncorrelated noise. So, that's why at least rectangular dithering should be used.

More explanations and pseudo code following...
Code: [Select]
You might have missed some information regarding the picture I drew:
X      : input signal
Y      : output signal ( = input + filtered noise )
E      : dither & quantization noise (unfiltered white noise please)
+      : is obviously mixing two signals. Note it can also be used
         for subtraction (source line(s) marked with a minus)
         Also, quantization is modelled as mixing the signal with
         errors.
[z^-1] : This is a simple filter: A delay of one sample
[H(z)] : This is any filter you like to use

So, suppose you have some given filter coefficients for H(z):
b[] = {b[0],b[1],b[2],...,b[n]}; // array, indexed starting at 0
a[] = {  1 ,a[1],a[2],...,a[m]}; // array, we don't need a[0]
The index actually corresponds to the power of 1/z for the z-domain
interpretation, 'b' holds the numerator coefficients and 'a' holds
the denominator coefficients.

x[k] and y[k] are the input and output samples.

We also need some filter memory with exactly max(n+1,m) samples. Let's
write fifo[0] for the last sample we added to the fifo, fifo[1] was
the last sample in the previous loop and so on...

Then, the inner loop over 'k' would look like this:
{
   wanted_temp = x[k] + fifo[0] * b[0]
                      + fifo[1] * b[1]
                      + ..............
                      + fifo[n] * b[n];
   y[k] = quantize( wanted_temp + dither );
   qerror_temp = wanted_temp - y[k];
   new_fifo_sample = qerror_temp - fifo[0] * a[1]
                                 - fifo[1] * a[2]
                                 - ..............
                                 - fifo[m-1] * a[m];
   fifo_add( fifo, new_fifo_sample );
   // Now: fifo[0] == new_fifo_sample
}

For implementing H(z) I used the direct form II structure where the delay-line is shared among the recursive and non-recursive filter parts.

The 4th order filter I was suggesting for 24->16 bit word length reduction @ 44 kHz sampling frequency leads to the following coefficients for H(z):
b[0..3] = { 2.2061 , -0.4707 , -0.2534 , -0.6213 };
a[1..4] = { 1.0587 , 0.0676 , -0.6054 , -0.2738 };
Again: H(z) is NOT the transfer function of the noise shaper, it is G(z) = 1 - z^-1 * H(z).
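For anyone who prefers something runnable, here is a minimal Python sketch of the loop above. It follows the pseudo code step by step, but several details are assumptions made for the example: the function name noise_shape, the quantizer (round to the nearest multiple of step), the seeded rectangular dither of +/- half a step, and the list-based fifo.

```python
import math
import random

# Coefficients of H(z) quoted above: b[0..3] and a[1..4] (a[0] = 1 is implied).
B_H = [2.2061, -0.4707, -0.2534, -0.6213]
A_H = [1.0587, 0.0676, -0.6054, -0.2738]

def noise_shape(x, b=B_H, a=A_H, step=1.0, seed=0):
    """Quantize x to multiples of `step` with error feedback through H(z).

    Returns (y, e): the quantized output and the raw (unfiltered) error
    e[k] = y[k] - wanted[k], which contains dither plus quantization noise.
    """
    rng = random.Random(seed)
    fifo = [0.0] * max(len(b), len(a))      # max(n+1, m) samples, newest first
    y, e = [], []
    for xk in x:
        wanted = xk + sum(bi * fi for bi, fi in zip(b, fifo))
        dither = (rng.random() - 0.5) * step            # rectangular dither
        yk = step * math.floor((wanted + dither) / step + 0.5)
        qerror = wanted - yk
        new_sample = qerror - sum(ai * fi for ai, fi in zip(a, fifo))
        fifo = [new_sample] + fifo[:-1]                 # fifo[0] == new_sample
        y.append(yk)
        e.append(yk - wanted)
    return y, e

if __name__ == "__main__":
    x = [1000.0 * math.sin(0.05 * k) for k in range(10)]
    y, e = noise_shape(x)
    print(y)
```

In this sketch y - x is the raw error e filtered by G(z) = 1 - z^-1 * H(z), so the shaped noise can be inspected by comparing y - x against e.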

Note: This post comes with no warranty and might contain errors.

Cheers,
SG

Reply #931
Huge thanks, Sebastian - It will take me some time to get my head round it but I will endeavour to implement it when I get back from a few days away....

Reply #932
Nick,
      I've tried to send you a test file a couple of times but my posts just don't seem to be there. I'm assuming the file was too large as the attach procedure took ages. So, here's another, smaller file. It was recorded from vinyl at 32/48 and saved as "24 bit packed int (type 1 - 24 bit)".

Let me know if you need anything else

botface

Reply #933
Code: [Select]
------------------------------------------------------------------------------
lFLCDrop v1.2.0.5
lFLC.bat for lFLCDrop v1.0.0.7
------------------------------------------------------------------------------

lFLCDrop Change Log:
v1.2.0.5
- presets updated to -1 through -7
- all presets always create correction files, except custom
- "Delete Source Files" option removed

lFLC.bat Change Log:
v1.0.0.7
- added a new set of variables for decoding
- added automatic functionality for the -merge option
- added support for auto-merging legacy lossyWAVs with proper naming convention
- added automatic encoding of .lwcdf.wav while encoding an already lossy .wav
- custom preset defaults now match -2 (default) frontend preset functionality


Let me know if you encounter any bugs.  The batch file is just getting to the level of complexity (10.4KB!) where there may be combinations of logic in the code that I just haven't thought to test fully.  But it should all be working without bugs, and there's error checking built into everything, so the main thing is that the logic would end up doing something that doesn't seem like it's what should happen.

After the command-line options (if any) for noise-shaping settle down, I'll do a release to support those additions, and I'll include documentation on which command-line options to send to lFLC.bat for encoding & decoding from other software or batch files.  That way people can implement things like tagging through batch files in EAC, and call lFLC.bat for the dirty work of encoding, and then tag afterwards.  Feel free to use the methods in lFLC.bat for creating your own.


Enjoy

Reply #934
@jesseg --- Thanks for the update!

Just been running:

lFLCDrop v1.2.0.5
lFLC.bat for lFLCDrop v1.0.0.7

First thing I noticed was that after doing the correction file encoding a single wav the DOS Window prompts "press any key to continue", is that supposed to happen - as I'd prefer it just encoded the whole batch without punctuation.

Also, is there any way you could create an option whereby the user specifies a directory (browse/create directory) for the correction files.

e.g.

Source Folder/ [inputs] *.wav  ,  [outputs] *.lossy.flac
Source Folder/Corrections Files/ [outputs] *.lwcdf.flac

C.
PC = TAK + LossyWAV  ::  Portable = Opus (130)

Reply #935
I re-zipped the directory and re-uploaded it.  Somehow I had removed that pause at the last second, but forgot to re-zip it before uploading.  My bad, thanks for catching it.    It was in the exit, so it would have happened no matter what you were trying to do.  Oops. 

[edit]
And re: a sub-folder of current folder option, could be added, but it would only be controllable through lFLC.bat, not through the frontend - unless I make my own frontend.  And if I do or anyone else does, I can imagine that it's not going to rely on a batch file at all.
[/edit]

Reply #936
... So, here's another, smaller file. ... [Budapest_10_secs.wav]

I have no problem at all with your file.
First I renamed your file to a.wav, called 'lossywav a.wav' from cmd.exe, and got a wonderful 24 bit 48 kHz a.lossy.wav file.
Then I used my standard lossyFLAC bat file with foobar on your Budapest_10_secs.wav, and this too yielded a perfect lossy.flac result.
Did you try plain 'lossywav a.wav'?
lame3995o -Q1.7 --lowpass 17

Reply #937

Funnily enough I am now able to process the file without problems either. I also have no problems with the latest beta. I've also successfully processed a 24/88.2 file. I can't imagine what went wrong the first time.

Thanks for trying it anyway

Reply #938
I haven't checked how noise shaping is implemented in lossyFLAC.m. So, if you say your implementation is equivalent to what's shown in the picture and you require the coefficients for H(z), then you need to use the numerator's and denominator's coefficients of the rewritten transfer function, because removing the leading one doesn't do it for IIR filters...
I think I cheated - can you take a look and tell me if it works or not?

Also, I think several of us would really appreciate it if you could spend some time writing a good page on noise shaping for the HA wiki. (If you don't have the time, just ignore our questions on here!).

I couldn't find a single decent reference to IIR filters in noise shaping, hence my guess at how to do it.

The other problem is that the explanations that exist are often written for mathematicians. I suppose engineers and programmers should be able to understand such explanations, but I usually find them lacking. On the one hand, I want to understand at a high level what's happening, and on the other hand I want to understand bit-by-bit what's happening. Many explanations walk a fine line down the middle, leaving both of these unclear to me.

Cheers,
David.
P.S. It wasn't a cold - it was/is a chest infection. Still laid up. 

Reply #939
Also, I think several of us would really appreciate it if you could spend some time writing a good page on noise shaping for the HA wiki.

I'm on it.

edit: I finished the article. Still waiting for wiki write access, though.

P.S. It wasn't a cold - it was/is a chest infection. Still laid up. 

Ouch! Hope you get well soon!

Cheers,
SG

Reply #940
Right, I'm back from a few days away.....

I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it. I'm posting an excel fragment which will show how I've implemented it, for comment / criticism.....

As well as that, I've been re-thinking the spreading function and have realised that the current method takes into account certain values more often than it should because of the relationship between bin-width and the frequency bands used (some bands have the same start bin at short FFT lengths due to bin-width). So, I'm in the process of rewriting the spread function and will include it as a -newspread parameter to allow back to back comparison.
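The bin-width effect described above can be illustrated with a small sketch. Everything concrete in it is hypothetical: the band edges, the helper name start_bins and the 44.1 kHz sample rate are made up for the example and are not lossyWAV's actual frequency bands.

```python
def start_bins(band_edges_hz, fft_len, sample_rate=44100):
    """Map each band's start frequency to the FFT bin that contains it."""
    bin_width = sample_rate / fft_len
    return [int(f // bin_width) for f in band_edges_hz]

# Hypothetical band start frequencies, for illustration only.
EDGES = [100, 200, 400, 800, 1600]

if __name__ == "__main__":
    # Short FFT: wide bins, so several low bands share the same start bin.
    print(start_bins(EDGES, 64))
    # Long FFT: narrow bins, so every band gets its own start bin.
    print(start_bins(EDGES, 1024))
```

When several bands share a start bin, the same FFT value is counted once per band, which is the over-weighting the rewritten spreading function is meant to avoid.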

I'm going to try to puzzle my way through the input / output directory problem as well.

[edit] Having transcribed the function to excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]

[edit2] David, I take it that I should re-calculate the reference_threshold values with noise shaping activated to get the full benefit? [/edit2]

Reply #941
lossyWAV beta v0.8.4 attached to the first post in this thread.
Table of processed bitrates, for my 53 problem sample set, using lossyWAV v0.8.4 with and without -shaping & -newspread.
Code: [Select]
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| -shaping | -newspread |    -1     |    -2     |    -3     |    -4     |    -5     |    -6     |    -7     |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    n     |     n      | 543.5kbps | 494.6kbps | 433.9kbps | 408.2kbps | 385.6kbps | 365.4kbps | 348.1kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    y     |     n      | 560.1kbps | 518.3kbps | 466.8kbps | 445.8kbps | 427.5kbps | 411.9kbps | 399.2kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    n     |     y      | 568.0kbps | 533.9kbps | 462.9kbps | 442.6kbps | 400.9kbps | 383.8kbps | 352.7kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    y     |     y      | 581.4kbps | 552.0kbps | 491.4kbps | 475.0kbps | 441.7kbps | 428.4kbps | 403.8kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|

Reply #942
I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it.
[...]
[edit] Having transcribed the function to excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]

Still, the noticeable hiss could be explained. The Fletcher-Munson equal loudness curves have different shapes at different levels. The ATH-derived noise shaping filter is only a special case for low noise levels. So, at higher noise levels the noise shaping filter might expose the high frequency part of the noise noticeably, which is why I think the use of this kind of fixed filter for lossyWAV is rather limited.

Cheers!
SG

Reply #943
The hiss I experienced in the previous build was *very* pronounced, in beta v0.8.4 I can't hear anything wrong with the output at all when using -shaping.

The only minor disappointment when using -shaping is, as David said previously, the bitrate increases quite dramatically.

I transcoded my Mike Oldfield collection (261 tracks) this evening using -7a -nts 30 -snr 6 -shaping and got an average bitrate of 340kbps. I've listened to several of the tracks and am very pleased with the results.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #944
Thinking about the added bitrate due to noise shaping, are there any 2- or 3-coefficient filters which might be useful as a compromise between SebastianG's filters (better quality, higher bitrate) and no noise shaping (lower bitrate)?

[edit] I'm not going to go any further with -7a -nts 30 -snr 6 -shaping and have reverted to -7a -shaping. I've converted 1643 tracks and the average bitrate is 374kbps.

There may be some merit in revising the skewing value when noise shaping is enabled (or even when -newspread is enabled) - however, this would take a bit of work from those who have ABXed during settings testing (and would require the -spf parameter to make a re-appearance). [/edit]

Reply #945
Thinking about the added bitrate due to noise shaping, are there any 2- or 3-coefficient filters which might be useful as a compromise between SebastianG's filters (better quality, higher bitrate) and no noise shaping (lower bitrate)?


There's a simple way of softening minimum phase filters:
Take the coefficients from the noise transfer function (N)
[1 b1 b2 b3 b4 ... ] (numerator)
[1 a1 a2 a3 a4 ... ] (denominator)
and create a new set of coefficients like this:
[1 b1*s b2*s^2 b3*s^3 b4*s^4 ... ] (numerator)
[1 a1*s a2*s^2 a3*s^3 a4*s^4 ... ] (denominator)
where s=1 leads to the original filter and s=0 to N(z)=1 which is no noise shaping at all.
All values in between are also fine.
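
The coefficient scaling described above can be sketched in a few lines. This is only an illustration: the function name `soften` and the example coefficients are made up and are not taken from lossyWAV. Multiplying the k-th coefficient by s^k is the same as evaluating N(z/s), which pulls every pole and zero towards the origin by the factor s, so a minimum phase filter stays minimum phase for any s between 0 and 1.

```python
def soften(coeffs, s):
    """Scale the k-th coefficient by s**k.

    coeffs is [1, c1, c2, ...] with the leading 1 included, so the
    leading coefficient is left untouched (s**0 == 1).
    """
    if not 0.0 <= s <= 1.0:
        raise ValueError("s is expected in the range 0..1")
    return [c * s ** k for k, c in enumerate(coeffs)]


# Hypothetical coefficients, for illustration only.
numerator = [1.0, -1.5, 0.8]
denominator = [1.0, 0.4]

for s in (1.0, 0.5, 0.0):
    # s = 1.0 keeps the filter as it is, s = 0.0 leaves only the leading 1.
    print(s, soften(numerator, s), soften(denominator, s))
```

The same value of s has to be applied to both the numerator and the denominator.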

However, you should seriously think about adaptive filters at this stage. Maybe 2B can shed some more light on the alleged danger of patent infringement. I hardly think this is an issue. Adaptive spectral noise shaping isn't big news. Pretty much every speech codec does it including Speex, by the way.

You're already very close to it: You're doing spectral analysis, psychoacoustic modelling and have a working noise shaper implementation. The only thing that's missing now is code to compute the filters. Jean-Marc Valin (jmspeex) and Monty wrote a paper about how Speex can benefit from Vorbis' psychoacoustic model. The same thing applies to LossyWAV. I don't remember how Monty and Jean-Marc did it but I guess it's something like computing the autocorrelation of the optimal noise shaping filter's impulse response via iFFT and feeding the result to the Levinson-Durbin algorithm, which would give you all the denominator's coefficients (a1, a2, ...) for an all-pole noise shaping filter (b1=b2=...=0).
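
A rough sketch of that guess, not of what the paper actually does: take a desired noise power spectrum, turn it into an autocorrelation sequence with an inverse DFT, and run Levinson-Durbin to get the denominator coefficients. The function names and the target spectrum are hypothetical. The inverse DFT is written as a plain cosine sum to keep the example self-contained; a real implementation would use an FFT routine.

```python
import math


def autocorrelation_from_power(power, lags):
    """Inverse DFT of a real, symmetric power spectrum.

    power holds N bins covering the full circle (0 .. 2*pi), so the
    result is real and only the cosine terms are needed.
    """
    n = len(power)
    return [
        sum(p * math.cos(2.0 * math.pi * k * lag / n) for k, p in enumerate(power)) / n
        for lag in range(lags + 1)
    ]


def levinson_durbin(r, order):
    """Return ([1, a1, ..., a_order], prediction error) for autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Hypothetical target: the power response of 1 / (1 - 0.5 z^-1), sampled on 64 bins.
n_bins = 64
target = [1.0 / (1.25 - math.cos(2.0 * math.pi * k / n_bins)) for k in range(n_bins)]

r = autocorrelation_from_power(target, lags=2)
a, err = levinson_durbin(r, order=2)
print(a)  # approximately [1, -0.5, 0], i.e. the denominator is recovered
```

In lossyWAV the target spectrum would have to come from the existing spectral analysis of each block instead of the made-up one used here.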

Cheers!
SG

Reply #946
Thinking about the added bitrate due to noise shaping, are there any 2- or 3-coefficient filters which might be useful as a compromise between SebastianG's filters (better quality, higher bitrate) and no noise shaping (lower bitrate)?
There's a simple way of softening minimum phase filters:
Take the coefficients from the noise transfer function (N)
[1 b1 b2 b3 b4 ... ] (numerator)
[1 a1 a2 a3 a4 ... ] (denominator)
and create a new set of coefficients like this:
[1 b1*s b2*s^2 b3*s^3 b4*s^4 ... ] (numerator)
[1 a1*s a2*s^2 a3*s^3 a4*s^4 ... ] (denominator)
where s=1 leads to the original filter and s=0 to N(z)=1 which is no noise shaping at all.
All values in between are also fine.

However, you should seriously think about adaptive filters at this stage. Maybe 2B can shed some more light on the alleged danger of patent infringement. I hardly think this is an issue. Adaptive spectral noise shaping isn't big news. Pretty much every speech codec does it including Speex, by the way.

You're already very close to it: You're doing spectral analysis, psychoacoustic modelling and have a working noise shaper implementation. The only thing that's missing now is code to compute the filters. Jean-Marc Valin (jmspeex) and Monty wrote a paper about how Speex can benefit from Vorbis' psychoacoustic model. The same thing applies to LossyWAV. I don't remember how Monty and Jean-Marc did it but I guess it's something like computing the autocorrelation of the optimal noise shaping filter's impulse response via iFFT and feeding the result to the Levinson-Durbin algorithm, which would give you all the denominator's coefficients (a1, a2, ...) for an all-pole noise shaping filter (b1=b2=...=0).

Cheers!
SG
Thanks for the pointer - I'll have a play with it, maybe even allow -shaping to take a supplementary value in the range 0..1 as you said above.

From memory, David was very reluctant to publish his code which included adaptive filtering. Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?

As an aside (and I know that looking at the spectrum in foobar is not any way to evaluate anything....) I looked at the spectral output for a lossyWAV correction file (replaygained +45dB or so) and almost all of the signal was in the high end of the spectrum - so it "looks" like my implementation of your noise shaping filter works!

Reply #947
Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?


No. The iFFT+LevinsonDurbin approach should be quite fast. But the resulting filters aren't the best ones which is why I'm currently trying to understand how this can be combined with frequency warping. I have a stack of papers about this on my desk waiting to be read by me. ;-)


so it "looks" like my implementation of your noise shaping filter works!

Cool!

Cheers!
SG

Reply #948
Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?
No. The iFFT+LevinsonDurbin approach should be quite fast. But the resulting filters aren't the best ones which is why I'm currently trying to understand how this can be combined with frequency warping. I have a stack of papers about this on my desk waiting to be read by me. ;-)
so it "looks" like my implementation of your noise shaping filter works!
Cool!

Cheers!
SG
I've added the supplementary parameter to -shaping in the range 0..1 and at 0.5 the added bitrate due to noise shaping is significantly reduced. I'll do a bit more testing with a view to posting v0.8.5 tomorrow.

[edit] Most of your post flew right over my head.... However, whatever can be added to lossyWAV to improve the quality of the output is well worth the effort - many thanks again! [/edit]

[edit2] 3556 tracks processed using -7a -shaping 1.000, 372kbps average bitrate...... [/edit2]

Reply #949
Hi,

Something struck me about dB level and lossy.wav performance which may well have implications as to how to get the best out of LossyWAV.

To date I've used MP3Gain (for MP3) and WavGain (for WAV prior to lossless encoding), as I've wanted my files to play at the same level regardless of the player (I use foobar, so I could get foobar to do this - but I'm not always listening to my files on my system). Anyway, the results of my very small test suggest that for lossy.FLAC files, encoding the original WAV rather than the WavGained WAV would be a good idea:

Using:
lFLCDrop.v1.2.0.5.lFLC.bat.v1.0.0.7
lossyWAV beta v0.7.9
FLAC 1.2.1

Test:
2 copies of the same file (original.wav and wavgained.wav), the only difference being that the latter has been through WavGain and is 4.55 dB lower in volume.

SETTINGS: lossy.wav -3, FLAC -5

original.wav (93.55 dB)  [FLAC-5 = 690kbps, lossy.FLAC = 475kbps]
<edited to make sense>
lossy.FLAC is 31% smaller than the FLAC. </edit>

wavgained.wav (89.00 dB)  [FLAC-5 = 626kbps, lossy.FLAC = 477kbps]
<edit>lossy.FLAC is 24% smaller than the FLAC.</edit>

So this tells me that if I use the original and use foobar to look after the replay gain function, my lossy.FLAC collection would be approx 2/3 of the size of my FLAC collection.

But if I WavGain my files prior to encoding (which I had been doing) my lossy.FLAC collection would be approx 3/4 of the size of my FLAC collection.
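
For anyone wanting to check the percentages, a quick sketch using only the four bitrates quoted above (the function name is made up):

```python
def saving_percent(flac_kbps, lossy_kbps):
    """How much smaller the lossy.FLAC is than the plain FLAC, in percent."""
    return 100.0 * (1.0 - lossy_kbps / flac_kbps)


# Bitrates in kbps, taken from the test above.
print(round(saving_percent(690, 475)))  # original.wav  -> 31
print(round(saving_percent(626, 477)))  # wavgained.wav -> 24
```

Note that the two lossy.FLAC bitrates are almost the same (475 and 477 kbps). The ratio changes mainly because the plain FLAC bitrate drops from 690 to 626 kbps after WavGain.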

That's a substantial difference.

If this is not news and everyone already knows this then fine, but it will be useful later to make clear to users that this is the case, as obviously it has implications regarding which method of replay gain one goes for.

C.
PC = TAK + LossyWAV  ::  Portable = Opus (130)