It's true that some heuristics were introduced, especially spreading and skewing, and spreading was there from the very start. Without these heuristics the method may have a better justification, but that comes at the price of a seriously increased bitrate. With the advanced options, anybody who wants to can get rid of the heuristics: for instance -skew 0 -snr 0 -fft 10101 -spf 11111-11111-11111-11111-11111 -nts 0 when using 64, 256 and 1024 sample FFTs. I personally love the reduced bitrate that spreading and skewing give, and from experience I feel secure enough with it.

I agree, however, that this raises the question of whether we should readjust the quality levels. Maybe -1 should go to Axon's pure method, and maybe -2 should be a mixture of the current -2 and -1: for instance, FFT usage like that of -1 (maybe dropping the 128 sample FFT), but with an -nts value of 2. I personally would agree with such a solution.

ADDED: I just saw your new beta, Nick. So I see -snr should be set as negative as the limit allows to avoid the skewing/snr heuristics. IMO, however, the spreading length should be 1 to avoid the spreading heuristics. As far as I can see, the constant spreading of 4 was just 2Bdecided's spreading heuristic when he started out. IMO there's no reason to use a block size of 1024; 2Bdecided just used a 1024 sample block size when he started things. Of course, not averaging the FFT outcome at all is fine in a pure sense, but it is suspected to be huge overkill, especially in the high frequency range, which brings the bitrate up.
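The point about averaging the FFT outcome can be made concrete. A minimal Python sketch, assuming spreading means averaging each run of `width` adjacent FFT bin levels (in dB) and then taking the lowest average as the level that decides how many bits may be removed; spread_min is a hypothetical helper and lossyWAV's actual code may differ in detail. Because every average is at least as large as the smallest bin it contains, any width above 1 can only raise that minimum, which is why spreading lowers the bitrate and a width of 1 is the "no spreading" case.

```python
def spread_min(levels_db, width):
    """Lowest moving average of `width` adjacent bin levels (width 1 = no spreading)."""
    if not 1 <= width <= len(levels_db):
        raise ValueError("width must be between 1 and the number of bins")
    averages = [
        sum(levels_db[i:i + width]) / width
        for i in range(len(levels_db) - width + 1)
    ]
    return min(averages)


levels = [40, 12, 38, 41, 39, 37, 15, 36]  # illustrative bin levels in dB
print(spread_min(levels, 1))  # 12.0: a single deep notch decides
print(spread_min(levels, 4))  # 31.75: the notch is averaged away
```

At roughly 6 dB per bit, the ~20 dB difference in this made-up example would mean about three more bits removed.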
As an aside, on my 53 sample set, using -0 -spf 11111-11111-11111-11111-11111 -cbs 512 -fft 10001 yields 56.47MB / 637.0kbps; changing to -fft 10101 yields 57.60MB / 649.7kbps. Bearing in mind that the source FLAC files amount to 69.36MB / 781kbps, that's not really a great saving.
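To put that aside in numbers, a quick Python check of the relative savings, using only the figures quoted above (the dictionary layout is just for illustration):

```python
# Figures quoted above for the 53 sample set: (size in MB, bitrate in kbps)
flac = (69.36, 781.0)
runs = {
    "-fft 10001": (56.47, 637.0),
    "-fft 10101": (57.60, 649.7),
}


def saving_pct(value: float, reference: float) -> float:
    """Percentage saved relative to the reference (positive = smaller)."""
    return 100.0 * (1.0 - value / reference)


for name, (size_mb, kbps) in runs.items():
    print(f"{name}: {saving_pct(size_mb, flac[0]):.1f}% smaller, "
          f"{saving_pct(kbps, flac[1]):.1f}% lower bitrate than the FLAC sources")
```

Both settings land at roughly 17-19% below the FLAC sources, and adding the 256 sample FFT costs about 1.6 percentage points.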
And the 4-bin spreading function was there from the very beginning in David's original script.
I just re-read Axon's post. I'm no longer sure he dislikes spreading, as he seems to accept the critical band heuristics, which are the most important basis for our current spreading parameters. Admittedly, this already means accepting some heuristics. Anyway, the question remains: should the -1 configuration be set up so that its details have a very high degree of theoretical justification?
I just want to check whether my understanding of the preprocessing method is roughly accurate: let's say that part of a 16-bit wave has an amplitude of +32295 (0111111000100111). lossyWAV will clip it so that the binary value contains many trailing zeros, which FLAC then compresses away as wasted_bits. The processed value of that amplitude would then become something like +32256 (0111111000000000), saving 9 bits. Is this the basic principle? Just wanting a little clarification, thanks.
808
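That is the principle as discussed in this thread. A minimal Python sketch of it, assuming plain round-to-nearest and assuming the codec can only drop the trailing zero bits shared by every sample in a block (FLAC's wasted-bits mechanism works per subframe). remove_bits and block_wasted_bits are hypothetical helpers, not lossyWAV code; in particular, how many bits to remove per block is decided by lossyWAV's FFT analysis, which is not modelled here. Note that in 16-bit two's complement, +32295 starts with a 0 sign bit.

```python
def remove_bits(sample: int, bits: int, bit_depth: int = 16) -> int:
    """Round a signed PCM sample to the nearest multiple of 2**bits (hypothetical helper)."""
    if bits <= 0:
        return sample
    step = 1 << bits
    # Floor division after adding half a step = round half up, also for negative samples.
    rounded = ((sample + step // 2) // step) * step
    # Clamp into the signed range; the top value must itself be a multiple of step.
    lo = -(1 << (bit_depth - 1))
    hi = (1 << (bit_depth - 1)) - step
    return max(lo, min(hi, rounded))


def block_wasted_bits(samples, bit_depth: int = 16) -> int:
    """Trailing zero bits shared by every sample in a block (all-zero block -> bit_depth)."""
    shift = bit_depth
    for s in samples:
        if s != 0:
            # s & -s isolates the lowest set bit (works for negative ints in Python too)
            shift = min(shift, (s & -s).bit_length() - 1)
    return shift


print(format(32295, "016b"))  # -> 0111111000100111
block = [remove_bits(s, 9) for s in (32295, -1000, 12345)]
print(block, block_wasted_bits(block))  # -> [32256, -1024, 12288] 9
```

Truncating instead of rounding would also give 32256 for this particular value; rounding simply keeps the added error within half a step.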
The primary advantage of lossless formats, it seems to me, is the future-proof factor: you can take advantage of a new and better encoder or a different format when one comes along, rather than having that option made unattractive by the huge losses in quality per bitrate involved in transcoding. So has anybody done listening tests to see how files processed by lossyWAV fare when encoded to MP3/AAC/Vorbis/whatever?

Also, where is the preferred place to discuss lossyWAV? It seems like it would belong in the "other lossy formats" forum, but all discussion of it seems to be restricted to this thread and the original thread in the FLAC forum.
Quote from: Josef Pohm on 27 November, 2007, 12:47:57 PM
...OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my OPINION, was OFR designed to work with such a small frame size).

It will still work as long as the target codec can work on a multiple of the lossyWAV codec_block_size, or if you use -cbs xxx to set the lossyWAV codec_block_size to match the target codec, or if I get off my behind and implement an -ofr parameter to specify codec-specific settings (as for WMALSL).
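Why the frame size matters can be sketched in Python under a simple model: each lossyWAV block of cbs samples has its own number of removed bits, and the codec can only drop the bits shared by all samples in one of its frames, i.e. the minimum over every lossyWAV block the frame overlaps. Frames are assumed to start at sample 0; frame_wasted_bits is a hypothetical helper and the per-block values are made up for illustration.

```python
def frame_wasted_bits(block_bits, cbs, frame_size):
    """Wasted bits per codec frame = minimum over the lossyWAV blocks it overlaps."""
    total = len(block_bits) * cbs
    result = []
    for start in range(0, total, frame_size):
        end = min(start + frame_size, total)
        first, last = start // cbs, (end - 1) // cbs
        result.append(min(block_bits[first:last + 1]))
    return result


bits = [9, 3, 7, 8]                        # illustrative bits removed per 512-sample block
print(frame_wasted_bits(bits, 512, 512))   # matching frame size: [9, 3, 7, 8]
print(frame_wasted_bits(bits, 512, 1024))  # multiple of cbs: [3, 7]
print(frame_wasted_bits(bits, 512, 768))   # not a multiple: [3, 3, 8]
```

In this model, a frame that is a multiple of cbs never straddles a block boundary, but it is still capped by its worst block, so very large frames (as in OFR or Monkey's Audio) give away more of the removed bits.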
...OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my OPINION, was OFR designed to work with such a small frame size).
The reason why Monkey's Audio uses large frames (up to 4 s at 44.1 kHz) lies in its architecture. OptimFROG suffers from the same problem: the adaptive predictors need some data to catch up...
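To illustrate the catch-up cost, a toy Python sketch with a generic normalised LMS (NLMS) predictor. This is not the actual filter cascade used by Monkey's Audio or OptimFROG, just the same kind of adaptive predictor, and it assumes the predictor state is reset at every frame boundary (needed if frames are to be decoded independently), so the warm-up error is paid again in every frame.

```python
import math


def nlms_residuals(x, order=8, mu=0.5, eps=1e-6):
    """Prediction residuals of a normalised LMS predictor that starts from zero weights."""
    w = [0.0] * order
    hist = [0.0] * order  # most recent sample first
    residuals = []
    for s in x:
        pred = sum(wi * hi for wi, hi in zip(w, hist))
        e = s - pred
        residuals.append(e)
        norm = eps + sum(h * h for h in hist)
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, hist)]
        hist = [s] + hist[:-1]
    return residuals


def total_abs_residual(x, frame_size):
    """Sum of |residual| when the predictor is reset at every frame boundary."""
    return sum(
        sum(abs(e) for e in nlms_residuals(x[i:i + frame_size]))
        for i in range(0, len(x), frame_size)
    )


signal = [1000 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(8192)]
print(round(total_abs_residual(signal, 512)))   # reset every 512 samples
print(round(total_abs_residual(signal, 8192)))  # one long frame
```

The residual, which is what the entropy coder has to pay for, is largest right after a reset, so short frames hurt this kind of codec far more than a fixed-predictor design.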
Well, insofar as nothing in psychoacoustics is set in stone and there are going to be heuristics to evaluate very complicated phenomena, you can't escape them. I mean, the Bark scale seems like a hack in the first place, as every closed-form EBW equation probably is. But clearly, spreading exists in any halfway-complete masking model. To leave such a tempting bone out there without chewing on it is madness. I'd just like to know how the predicted -spf numbers line up against the tuned ones, and to have an option to use the theoretical numbers.

I would use an option other than -1 for a setting that matches theoretical predictions, because there's still a need for -1 to -3 in their current incarnations. Moreover, whatever setting exists must still be absolutely transparent. It seems that 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
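One way to get "predicted" numbers is to compare the width of an auditory band with the bin spacing of each FFT length. The Python sketch below uses the Zwicker & Terhardt closed-form Bark and critical-bandwidth approximations and the Glasberg & Moore ERB formula at an assumed 44.1 kHz sample rate. Treating an -spf value as "number of bins per band" is an assumption for illustration only, and the usage text does not say which frequency range each of the five -spf characters covers.

```python
import math

FS = 44100  # sample rate assumed throughout


def bark(f_hz: float) -> float:
    """Zwicker & Terhardt (1980) closed-form Bark approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz: float) -> float:
    """Zwicker & Terhardt (1980) critical bandwidth approximation."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def erb_hz(f_hz: float) -> float:
    """Glasberg & Moore (1990) equivalent rectangular bandwidth."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def bins_per_band(f_hz: float, fft_len: int, bandwidth=critical_bandwidth_hz) -> float:
    """How many FFT bins of width FS / fft_len fit into one auditory band at f_hz."""
    return bandwidth(f_hz) / (FS / fft_len)


for n in (64, 128, 256, 512, 1024):
    row = [bins_per_band(f, n) for f in (250, 1000, 4000, 12000)]
    print(n, " ".join(f"{b:6.2f}" for b in row))
```

The number of bins per band rises steeply with both frequency and FFT length: a 64-sample FFT has bins much wider than a critical band at 1 kHz, while a 1024-sample FFT has several bins per band there and dozens at 12 kHz.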
... @Halb27: Maybe I'm being a little overprotective of the settings we have arrived at after quite a bit of work. Let's rename them -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

My spreading Excel sheet is attached again (to bring it closer to the conversation).
Quote from: Nick.C on 28 November, 2007, 03:39:53 AM
... @Halb27: Maybe I'm being a little overprotective of the settings we have arrived at after quite a bit of work. Let's rename them -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method. Attached again (to bring it closer to the conversation) my spreading Excel sheet.

Sorry, it was me who caused some confusion by wanting -1 to go the extremely pure way.

I've thought it over during the night (see my last post) and come to the conclusion that with our current -1 we are going the pure way. Stuff from the sausage factory like skewing doesn't hurt quality one bit; quite the contrary. We do have to make some practical decisions about how we do the FFT analyses, but here too I think this is in agreement with the pure way, though details are always debatable.

So I think we can leave -1 as is. Of course, suggestions for improvements are always welcome. -3 is typically used with DAPs, as you said, and -2 is a compromise between -3 and -1, a kind of -1 for the more practically minded.

BTW, your spreading Excel sheet was very valuable to me when deciding on the spreading details, insofar as I was the one who worked them out.

A suggestion: it looks like it will be hard to disqualify -3 quality-wise (which is a good thing, of course). Maybe for testing we can do it the other way around: start with an even less demanding quality setting, so that we do get into trouble, and then increase the quality demands until quality is fine for the problems found. This way we can get a feeling for how big the security margin of -3 is. It is expected to be small, but who knows? Essentially this means that we should be able to set -nts to a value higher than +6.
... It's easier than that: use -snr <large negative number> with v0.5.3...
Quote from: Nick.C on 28 November, 2007, 09:52:22 AM
... It's easier than that: use -snr <large negative number> with v0.5.3...

I have no idea what a negative -snr value does. I had thought that bringing in snr gives the relevant minimum the chance to go lower than it would without snr. By that understanding, any snr value can only make things more defensive compared to not using snr. Sure, as we do use an snr value of 21, we will get a lower bitrate when turning the -snr value down. However, I wonder what makes the bitrate of your problem sample set go so low. I guess a negative snr value has a specific meaning.

Anyway, I'd prefer to use a higher -nts value of up to, say, 40 instead. That would let us keep the usual skew/snr combination and go to extremes with the noise threshold to learn about lossyWAV's behavior.
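Taking the description above literally (the snr term can only pull the relevant minimum down), a large negative -snr simply switches that term off. A minimal Python sketch under that assumption: the reference level is min(lowest bin, average bin level - snr), with levels in dB and a plain arithmetic mean over bins. Both the averaging and bits_from_level (about 6.02 dB per bit, counted from an assumed 0 dB reference) are illustrative assumptions, not lossyWAV's code.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB per bit


def reference_level_db(bin_levels_db, snr_db):
    """Relevant minimum: the snr term can lower it, never raise it (assumed model)."""
    lowest = min(bin_levels_db)
    average = sum(bin_levels_db) / len(bin_levels_db)
    return min(lowest, average - snr_db)


def bits_from_level(level_db):
    """Hypothetical mapping: whole bits that fit below the reference level."""
    return max(0, math.floor(level_db / DB_PER_BIT))


bins = [30.0, 45.0, 60.0]  # illustrative FFT bin levels in dB
for snr in (21, 0, -215):
    ref = reference_level_db(bins, snr)
    print(snr, ref, bits_from_level(ref))
```

In this model any snr at or below (average - lowest), here 15 dB, has no effect at all, so -snr -215 just means "no snr limit" and the result then depends on the remaining settings such as spreading, skew and -nts.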
lossyWAV beta v0.5.4 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>
Example : lossyWAV musicfile.wav

Quality Options:
-0            emulate script [2xFFT] (-cbs 1024 -nts 0.0 -skew 0 -snr -215 -spf 44444-44444-44444-44444-44444 -fft 10001)
-1            extreme quality [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21 -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default quality [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21 -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact quality [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21 -spf 22235-22236-22347-22358-2246C -fft 10001)
-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB) (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Options:
-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:
-snr <n>      set minimum average signal to added noise ratio to n dB; (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching, from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters; These correspond to FFTs of 64, 128, 256, 512 & 1024 samples; e.g. 22235-22236-22347-22358-2246C (Characters must be one of 1 to 9 and A to F (zero excluded).
-allowable    select allowable number of clipping samples per codec block before iterative clipping reduction; (0<=n<=64, default=0).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off
-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode
-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:
David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
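The -fft and -spf strings are easy to misread, so here is a small Python sketch that decodes them the way the usage text describes: each -fft digit switches one of the 64 to 1024 sample FFTs on or off, and each 5-character -spf group belongs to one FFT length, using the characters 1 to 9 and A to F. What the five values inside a group mean (presumably different frequency ranges) is not stated in the usage text, so the sketch just returns them in order. parse_fft, parse_spf and valid_cbs are hypothetical helpers, not part of lossyWAV.

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)  # order of the -fft / -spf positions


def parse_fft(switches: str) -> list:
    """'10101' -> [64, 256, 1024]: one binary switch per FFT length."""
    if len(switches) != 5 or set(switches) - {"0", "1"}:
        raise ValueError("-fft expects 5 binary digits")
    return [n for n, bit in zip(FFT_LENGTHS, switches) if bit == "1"]


def parse_spf(spec: str) -> dict:
    """'22235-22236-...' -> {64: [2, 2, 2, 3, 5], ...}; characters 1-9, A-F (no zero)."""
    groups = spec.split("-")
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("-spf expects 5 groups of 5 characters")
    table = {}
    for n, group in zip(FFT_LENGTHS, groups):
        if "0" in group:
            raise ValueError("zero is not allowed in -spf")
        table[n] = [int(c, 16) for c in group]  # int() rejects non-hex characters
    return table


def valid_cbs(n: int) -> bool:
    """-cbs range from the usage text: 512 <= n <= 4608 and n mod 32 = 0."""
    return 512 <= n <= 4608 and n % 32 == 0


print(parse_fft("01001"))                                # [128, 1024], the usage text's example
print(parse_spf("22235-22236-22347-22358-2246C")[1024])  # [2, 2, 4, 6, 12]
```

For example, the "pure" setting from earlier in the thread, -fft 10101 -spf 11111-11111-11111-11111-11111, decodes to the 64, 256 and 1024 sample FFTs with a spreading value of 1 everywhere.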
lFLCDrop Change Log:
v18.104.22.168
- added support for "-0 (emulate script)" option

lFLC.bat Change Log:
v22.214.171.124
- improved temp file handling
- fixed quality preset bug
I just tried insane -nts settings on my problem set to get a feeling for the security margin we have when using -3:

a) -3 -nts 30 => 319/390 kbps for my regular/problem sample set
I was astonished at the quality of Atem-lied, which I tried first. badvilbel was next and also has remarkable quality. bibilolo, keys_1644ds and S37_OTHERS_MartenotWaves_A, however, have big errors (no ABXing required), and the errors of furious and triangle are also easy to perceive, though the quality isn't really bad. The big errors of bibilolo, keys_1644ds and S37_OTHERS_MartenotWaves_A are pretty much of the kind I know from WavPack lossy. Everybody who would like to hear the potential problems lossyWAV has when the accuracy demand is too low is invited to do a listening test with this setting. The problems of the bad samples mentioned are easy to hear.

b) -3 -nts 20 => 320/405 kbps for my regular/problem sample set
Results were a lot better. Only bibilolo, keys_1644ds and S37_OTHERS_MartenotWaves_A are not transparent, with bibilolo and S37_OTHERS_MartenotWaves_A already roughly acceptable. Only keys_1644ds still lacks quality very seriously, though it too has improved remarkably.

c) -3 -nts 16 => 321/419 kbps for my regular/problem sample set
Only keys_1644ds and S37_OTHERS_MartenotWaves_A are not transparent to me. S37_OTHERS_MartenotWaves_A is already very hard for me to ABX, and even keys_1644ds is not easy.

d) -3 -nts 12 => 326/438 kbps for my regular/problem sample set
Only keys_1644ds is not totally transparent to me, and I was able to ABX it only with a rather poor 7/10 result.

e) -3 -nts 9 => 333/455 kbps for my regular/problem sample set
Now keys_1644ds is also transparent to me.

Looking at these results, even -3 (-nts 6 by default) seems to me to have a remarkable security margin. The default -3 setting yields 345/474 kbps for my regular/problem sample set.
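To see what the security margin of -3 costs, a short Python calculation of the bitrate saved at each -nts value relative to the default -3, using only the figures reported above:

```python
# Bitrates (kbps) reported above: (regular set, problem set) per -nts value with -3
results = {30: (319, 390), 20: (320, 405), 16: (321, 419), 12: (326, 438), 9: (333, 455)}
default = (345, 474)  # plain -3, i.e. -nts 6


def saving_pct(kbps: float, reference: float) -> float:
    """Percentage of bitrate saved relative to the reference."""
    return 100.0 * (1.0 - kbps / reference)


for nts, (regular, problem) in sorted(results.items()):
    print(f"-nts {nts:>2}: regular {saving_pct(regular, default[0]):4.1f}% lower, "
          f"problem {saving_pct(problem, default[1]):4.1f}% lower than default -3")
```

On the regular set, going all the way from the default -nts 6 to -nts 30 saves only about 7.5%, and -nts 9 (the least demanding setting that was transparent on every sample here) saves about 3.5%, so on normal material the margin costs relatively little bitrate.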
This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.
Quote from: Synthetic Soul on 27 November, 2007, 06:02:40 AM
This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.

I second this! Thank you very much!

If lossyWAV gets enough users, I will evaluate whether some modifications to TAK can significantly improve the compression of its output. In this context, "significantly" means by at least about 20 kbps. I have some ideas, but you cannot be sure until you have tried them.

Thank you again!
Thomas
-nts amended as requested. Now you can really cause awful results...

Attached file: lossyWAV_beta_v0.5.4.zip
Quote from: Nick.C on 28 November, 2007, 11:21:15 AM
-nts amended as requested. Now you can really cause awful results...
Attached file: lossyWAV_beta_v0.5.4.zip

Just a side note again: when you're going to experiment further with settings (in the code), it would be best to call those in-between versions alpha again. When you arrive at something you're confident about, you could release another beta. (I'm not saying something isn't right, but maybe another alpha round is needed?)
... The more I listen to -3 -snr -215, the more I like it. ...
Quote from: Nick.C on 29 November, 2007, 03:09:44 AM
... The more I listen to -3 -snr -215, the more I like it. ...

Judging from the bitrate you gave for your sample set, which consists largely of problem samples, it's hard to imagine that keys_1644ds, bibilolo or MartenotWaves are fine. I will try it this weekend. Anyway, I'd like to know what a negative -snr value does.