If you want a similar preprocessing for FLAC or WavPack you'd do something like this:
- estimate LPC filter coefficients (H(z)) and temporarily filter the block to get the residual
- check the residual's power and select "wasted_bits" accordingly
- quantize the original (unfiltered) samples so that the "wasted_bits" least significant bits are zero
- use 1/H(z) as the noise shaping filter.
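The last three steps above can be sketched as follows. This is a rough illustration, not anyone's actual encoder code: the function names are made up, the "one wasted bit per ~6 dB" mapping is an assumed convention, and the LPC coefficients `a` (from H(z) = 1 - sum a[k]·z^-(k+1)) are taken as given. The error feedback keeps a history of the *total* output error, which yields a noise transfer function of 1/H(z).

```python
import math

def wasted_bits_from_power(residual):
    """Hypothetical rule: pick how many LSBs to zero from the residual's
    RMS power, roughly one bit per 6 dB (an assumption, not the spec)."""
    rms = math.sqrt(sum(r * r for r in residual) / len(residual))
    if rms < 1.0:
        return 0
    return max(0, int(math.log2(rms)))

def quantize_with_shaping(x, a, wasted_bits):
    """Zero `wasted_bits` LSBs of each sample, feeding the rounding error
    back through A(z) so the shaped noise follows 1/H(z) = 1/(1 - A(z))."""
    step = 1 << wasted_bits
    p = len(a)
    hist = [0.0] * p                 # past total output errors q[n-k] - x[n-k]
    out = []
    for s in x:
        fb = sum(a[k] * hist[k] for k in range(p))   # A(z) applied to error history
        v = s + fb
        q = step * round(v / step)   # zero the least significant bits
        hist = [q - s] + hist[:-1]   # shaped error obeys the LPC recursion
        out.append(q)
    return out
```

To see why the feedback gives 1/H(z): the stored error s[n] = q[n] - x[n] satisfies s[n] = e[n] + sum a[k]·s[n-k], i.e. S(z) = E(z)/(1 - A(z)) = E(z)/H(z), so the flat rounding noise e[n] comes out spectrally shaped like the signal itself.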
If you further check what psychoacoustic models usually do, you'll notice that most of the time they allocate more bits to lower frequencies than to higher frequencies (higher SNR for lower freqs). You can then tweak the noise shaping filter to W(z)/H(z), where W(z) is some fixed weighting, so that you get a higher SNR for lower freqs.
(I derived W(z) by feeding OggEnc with mono pink noise).
begin
  pcll := low_frequency_bin[analysis_number] - 1;
  pchl := high_frequency_bin[analysis_number] - 1;
  for pci := 0 to pchl - pcll + 1 do
  begin
    v1 := fft_result[pci];
    v2 := fft_result[pci + 1];
    v3 := fft_result[pci + 2];
    vMax := max(v1, max(v2, v3));
    vMin := min(v1, min(v2, v3));
    vTot := v1 + v2 + v3;
    vMid := vTot - vMax - vMin;
    vAvg := vTot / 3;
    case fft_bit_length[analysis_number] of
      0.. 6  : fft_result2[pci + 1] := vAvg;
      7      : fft_result2[pci + 1] := (vMax * 1.50 + vMid + vMin * 0.5) / 3;
      8      : fft_result2[pci + 1] := (vMax * 2.00 + vMid) / 3;
      9      : fft_result2[pci + 1] := (vMax * 2.50 + vMid * 0.5) / 3;
      10..15 : fft_result2[pci + 1] := vMax;
    end;
  end;
end;
To be honest I really don't understand why you guys insist on introducing white-only noise.
It's like travelling from A (lossless) to B (perceptual lossy) and stopping right in the middle where both disadvantages are combined: lossy encoding (B) + high bitrate (A) necessary due to lack of noise shaping.
IMHO the best thing to do here is to follow Edler, Faller and Schuller: "Perceptual Audio Coding Using a Time-Varying Linear Pre- and Post-Filter". Their psychoacoustic analysis results in a "pre-filter" and a "post-filter". The post-filter acts like a noise shaper. To make it work for lossy FLAC:
- just skip the pre-filter, we don't need it
- derive wasted_bits from the first sample of the post-filter's impulse response. This first sample tells you the optimal quantizer step size
- use the ("normalized") post-filter as the noise shaping filter. (Normalized: a noise shaping filter's impulse response must start with the coefficient '1' and have an average log response of 0 dB on a linear frequency scale.)

About sharing code: I'd have to locate the source code first. It's been a while since I touched it. What exactly are you interested in? The "complicated" part of it was the Levinson-Durbin algorithm. I could share a Java version if you like. It's not hard to find other source code for it with the help of Google, I suppose. If you want to follow the "Edler et al. type approach" you could borrow a lot of Speex code for handling the filters.
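For reference, the Levinson-Durbin recursion mentioned above is a textbook algorithm: it solves the Toeplitz normal equations, turning an autocorrelation sequence into LPC predictor coefficients in O(p^2). A plain Python sketch (not the poster's code) looks like this:

```python
def levinson_durbin(r, order):
    """r: autocorrelation r[0..order]; returns (a, err) where the
    predictor is x[n] ~ sum(a[k] * x[n-1-k]) and err is the final
    prediction error power."""
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        # reflection coefficient for order i+1
        acc = r[i + 1] - sum(a[j] * r[i - j] for j in range(i))
        k = acc / err
        # symmetric coefficient update from order i to order i+1
        new_a = a[:]
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

For an AR(1)-like autocorrelation r = [1, 0.5, 0.25] the recursion recovers a first-order predictor (a = [0.5, 0.0]), as expected.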
2. I didn't stop there. I've done a noise shaping version. See the previous page!
You could, of course, make this a proper psychoacoustic codec, but I'd only do this for fun - what would be the practical point? You'd be forcing the underlying issues of FLAC onto a psychoacoustic codec - why would you do that? Surely it would be much better to use Vorbis or something without these issues?
I don't think Nick or I are up for designing a new psychoacoustic model(!), though I guess we could "borrow" one.
Thank you for this. All pointers gratefully received!

Does it have any IP attached? What form is the "post-filter" in?

The reason for the first question is obvious! I ask the second because I know what a noise shaping filter should be like (you missed minimum phase off your list) and it's not trivial getting exactly what you want - the LPC-based method delivers filters which check all the boxes - does this one? If not, is "normalization"/conversion easy?
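The minimum-phase requirement above (all zeros of the shaping filter strictly inside the unit circle, so that 1/H(z) is stable) can be checked without computing roots, using the Schur-Cohn / LPC step-down recursion: the polynomial is minimum phase iff every reflection coefficient has magnitude below 1. A sketch of that test, as an illustration rather than anyone's actual tool:

```python
def is_minimum_phase(b):
    """True if 1 + b[0]*z^-1 + ... + b[p-1]*z^-p has all zeros strictly
    inside the unit circle (Schur-Cohn test via the LPC step-down)."""
    a = list(b)
    while a:
        k = a[-1]                       # reflection coefficient at this order
        if abs(k) >= 1.0:
            return False                # a zero lies on or outside the unit circle
        if len(a) == 1:
            return True
        denom = 1.0 - k * k
        # step down one order, pairing each coefficient with its mirror
        a = [(a[j] - k * a[len(a) - 2 - j]) / denom for j in range(len(a) - 1)]
    return True
```

For example, 1 - z^-1 + 0.25 z^-2 = (1 - 0.5 z^-1)^2 passes (zeros at 0.5), while 1 - 2.5 z^-1 + z^-2 = (1 - 2 z^-1)(1 - 0.5 z^-1) fails (zero at 2).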
The idea is simple: lossless codecs use a lot of bits coding the difference between their prediction and the actual signal. The more complex (hence unpredictable) the signal, the more bits this takes up. However, the more complex the signal, the more "noise like" it often is. It seems silly spending all these bits carefully coding noise/randomness. So, why not find the noise floor, and dump everything below it?

This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6 dB to make it more careful, or +6 dB to make it more aggressive!)
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
Quote from: Nick.C on 11 October, 2007, 06:43:51 AM
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...
Please don't feel discouraged. I think it's okay if somebody thinks there is no use in this approach.
Quote from: j7n on 11 October, 2007, 06:52:07 AM
Quote from: Nick.C on 11 October, 2007, 06:43:51 AM
Using fb2k bit compare as a quick way to "see" differences, lossyWAV has fewer samples which are different to the lossless original than OGG and a smaller maximum magnitude of difference than OGG.
What happened to the strong argument that audio quality should not be "seen" and codecs not evaluated by subtracting...
Yes, I know, sorry, I won't do it again. However, as lossyWAV only ever rounds a sample to fewer bits, the sample value barely changes. Surely fewer changed samples have some merit?
Sounds like you're trying to get the worst of both standard lossy and lossless codecs. What you have now is a *lossy* codec that just uses a really crappy psychoacoustic model *and* is stuck with time-domain linear prediction instead of frequency transforms. [...] I can't see any advantage of your idea compared to a lossy codec at a very high rate (e.g. Vorbis q10 or something like that).
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings), is still inferior to having no psychoacoustic model at all.
Quote from: 2Bdecided on 11 October, 2007, 07:26:19 AM
FWIW there are circumstances where a real psychoacoustic model (even backed off from the assumed threshold of audibility by several dB via the use of "insane" quality settings) is still inferior to having no psychoacoustic model at all.
I totally disagree. Having no model at all is surely inferior to having a model that's a bit off. Also, even if you don't trust the raw output of a psy model, you can still enforce some safety conditions, as is possible with MusePack (--minSMR so_and_so). Maybe we interpret "having no/some psychoacoustic model" differently. Let's say we do 2-pass VBR to achieve some target bitrate. How can an encoder without an idea of how we perceive things perform better than one that knows about psychoacoustics?
You can't shoot for a given bitrate (CBR or VBR) with lossyFLAC. You can only shoot for a given quality. Even there, options are limited!
The idea here is to have a codec which delivers transparency, or transparency plus resilience to anything upstream/downstream.