The error looks like filter instability
I need to work out under what conditions that occurs.
lossyWAV beta 1.2.2j attached to post #1 in this thread.
I have worked out a temporary fix by feeding only part (>80%) of the "quantization error" into the WAPL_Update function - this seems to work for the two samples provided by doccolinni.
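For illustration only, here is a minimal, self-contained sketch of that kind of damped error feedback. This is not lossyWAV's WAPL_Update; the NLMS-style update, the names and the 0.8 fraction are assumptions chosen just to show the idea of driving the adaptation with only part of the error.

```python
# Toy illustration (not lossyWAV's WAPL_Update) of damping the quantization
# error before it drives an adaptive-filter update. The NLMS-style update,
# names and the 0.8 fraction are assumptions for illustration only.
import numpy as np

def damped_adapt_step(coeffs, history, quant_error, fraction=0.8, mu=0.05):
    """One coefficient update driven by only a fraction of the error."""
    damped = fraction * quant_error                 # feed only part of the error
    norm = np.dot(history, history) + 1e-12         # avoid division by zero
    return coeffs + mu * damped * history / norm    # normalised-LMS-style step
```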
The determination of bits-to-remove has not changed. I remember you saying that the make_filter routine also determines this (as a gain?), but the value it produced was much lower than the one determined using the existing method, so it was not used. Maybe it could be used entirely on its own (i.e. with no other FFT analyses of the signal) to determine bits-to-remove, but then it wouldn't be lossyWAV as we know it.
IMHO, you should generate better curves in the first place instead of feeding "bad" curves to the filter design routines and messing with the noise shaping loop.
You should partition the spectrum into non-uniform subbands (for example, starting with bandwidths of 100 Hz and increasing towards 1000 Hz at higher frequencies) and compute a tolerable noise level in dB for each subband based on the signal's power and "tonality" in that region. As a first approximation you can use fixed SNRs depending on frequency, ranging from 40 dB (lower frequencies) to 10 dB (above 6 kHz). Make sure that the levels of neighbouring bands don't differ too much (i.e. restrict differences to +/- 15 dB); to achieve this you might have to decrease some noise levels. Keep in mind that 0 dB corresponds to the noise level of rectangular quantization with +/- 1/2 LSB, so going below, say, -20 dB hardly makes sense. Finally, interpolate the resulting data points smoothly (for example, with a 2nd-order B-spline as the "interpolator").
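For anyone wanting to experiment, here is a rough sketch of that recipe. It is not lossyWAV code; the band-edge progression, the per-band level estimate (max bin level) and the SciPy spline call are my assumptions, chosen just to show the flow from band powers to a smooth noise-target curve.

```python
# Rough sketch (assumptions throughout, not lossyWAV code) of the recipe above:
# non-uniform subbands -> per-band allowed noise via fixed SNRs -> limit
# neighbour differences by lowering levels -> floor at -20 dB -> smooth spline.
import numpy as np
from scipy.interpolate import make_interp_spline

def noise_target_curve(power_db, freqs_hz):
    """power_db/freqs_hz: per-bin signal level (dB re. 1 LSB) and bin frequencies (numpy arrays)."""
    # Non-uniform band edges: ~100 Hz wide at the bottom, widening towards 1 kHz.
    edges, width = [0.0], 100.0
    while edges[-1] < freqs_hz[-1]:
        edges.append(edges[-1] + width)
        width = min(width * 1.25, 1000.0)

    centres, targets = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sel = (freqs_hz >= lo) & (freqs_hz < hi)
        if not sel.any():
            continue
        centre = 0.5 * (lo + hi)
        # First approximation: fixed SNR per band, 40 dB at LF sliding to 10 dB above 6 kHz.
        if centre <= 1000.0:
            snr = 40.0
        elif centre >= 6000.0:
            snr = 10.0
        else:
            snr = np.interp(centre, [1000.0, 6000.0], [40.0, 10.0])
        centres.append(centre)
        targets.append(power_db[sel].max() - snr)   # allowed noise = band level - SNR

    targets = np.asarray(targets)
    # Keep neighbouring bands within +/-15 dB of each other by lowering levels only.
    for i in range(1, len(targets)):
        targets[i] = min(targets[i], targets[i - 1] + 15.0)
    for i in range(len(targets) - 2, -1, -1):
        targets[i] = min(targets[i], targets[i + 1] + 15.0)
    # 0 dB = rectangular quantization noise of +/-1/2 LSB; going far below is pointless.
    targets = np.maximum(targets, -20.0)

    # Smoothly interpolate the band targets back onto the FFT-bin grid (2nd-order spline).
    spline = make_interp_spline(centres, targets, k=2)
    return spline(np.clip(freqs_hz, centres[0], centres[-1]))
```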
To be really clear, IMO (please correct me if I'm wrong here SebG), what SebG is talking about is how to get the thing working - not how to do a really good job.
If you're going to have one noise shaping filter per block (and that seems like a good start), then as a conservative approach I think you could keep the previous method of using multiple FFT sizes to determine the lowest signal level in the block.
[...] Then smooth that target function, bringing the peaks down (not raising the troughs up). The number of bits you can remove is defined by the area under that target function.
I think any "proper" psy model should also be post-processed in this way, finding the lowest allowed noise level in each frequency band in each block, and smoothing the resulting target function so the filter is realisable and stable.
Going further, you could have multiple (successive) target filters in each lossyWAV block (though obviously only one bits_to_remove value - unless you communicate varying block size to the encoder!). Multiple target filters in one block would bring an advantage in a signal where the spectrum shape changed a lot within one block (different filter required) while the area under the spectrum didn't change much (same bits to remove required), but it wouldn't help much otherwise.
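To make the conservative approach above a bit more concrete, here is a small sketch of one way it might look: minimum level across several FFT sizes, peaks pulled down (troughs never raised), and bits_to_remove taken from the area under the smoothed target. The FFT sizes, windowing, smoothing span and the ~6 dB-per-bit conversion are my assumptions, not lossyWAV's actual analysis code.

```python
# Sketch (assumptions, not lossyWAV's implementation) of the conservative approach:
# take the minimum level across several FFT sizes, smooth the target by pulling
# peaks down (never raising troughs), and get bits_to_remove from the area
# (log-domain mean) under the smoothed target.
import numpy as np

def block_noise_target(block, fft_sizes=(64, 256, 1024), span=8, grid_bins=512):
    """block: one lossyWAV block of integer-valued samples (so 0 dB ~ 1 LSB)."""
    block = np.asarray(block, dtype=float)
    grid = np.linspace(0.0, 0.5, grid_bins)          # common normalised-frequency grid
    target_db = np.full(grid_bins, np.inf)

    for n in fft_sizes:
        if len(block) < n:
            continue
        levels = []
        # Lowest short-term level per bin over half-overlapped windows of this size.
        for start in range(0, len(block) - n + 1, n // 2):
            seg = block[start:start + n] * np.hanning(n)
            levels.append(20.0 * np.log10(np.abs(np.fft.rfft(seg)) / n + 1e-12))
        low = np.min(levels, axis=0)
        freqs = np.linspace(0.0, 0.5, n // 2 + 1)
        target_db = np.minimum(target_db, np.interp(grid, freqs, low))

    # Bring peaks down: min-filter over a small neighbourhood, then low-pass,
    # then clamp so the result never rises above the original target.
    eroded = np.array([target_db[max(0, i - span):i + span + 1].min()
                       for i in range(grid_bins)])
    kernel = np.hanning(2 * span + 1)
    smoothed = np.convolve(eroded, kernel / kernel.sum(), mode='same')
    smoothed = np.minimum(smoothed, target_db)

    # Area under the target (mean dB) ~ average headroom above the original
    # quantization noise; each removed bit costs roughly 6.02 dB.
    bits_to_remove = max(0, int(smoothed.mean() // 6.02))
    return smoothed, bits_to_remove
```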
Also IMO you should "just" grab the psy model from musepack or similar.
What would be the practical difference between using a FLAC encode processed with a Musepack-based psy model and, say, a Musepack --braindead encode?
The bitrates should be roughly similar, and I imagine there will be some minor adverse effects when transcoding to another lossy format either way.
What's the practical reason to use lossyWAV at all?
- Your DAP plays FLACs?
- You have more confidence in the longevity of FLAC (or whatever)?
- You don't want to use quite so much space as lossless?
- You like the potential for fast, efficient lossless transcoding to most other lossless formats?
- Fun?
...in order to avoid further duplication of effort, I believe LossyWAV's adaptive noise-shaping/psymodel research priority should be second-generation transparency (transparency of lossy transcodes made from the processed files), rather than Musepack's goal of first-generation transparency in direct listening.