I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as if FFT_length<256 then spreading_function_length=1. if 256 or 512 then 1,2,3,4. if 1024 or above then 2,3,4,5.

Quote from: Nick.C on 22 October, 2007, 07:59:33 AM
I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as if FFT_length<256 then spreading_function_length=1. if 256 or 512 then 1,2,3,4. if 1024 or above then 2,3,4,5.
Wonderful, thank you. If this brings the bits to remove down too much, there's still room for compromise, especially for FFT_length < 256. I guess that for the high frequency range spreading_length need not be 1 even with short FFT lengths.

I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.
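A minimal sketch of that starting-point rule, for anyone who wants to reproduce the table outside the spreadsheet. Only the expression max(1, int(log2(bins))) comes from the post; the function names, the 44.1 kHz sample rate and the 100 Hz example band are illustrative assumptions.

```python
import math


def bins_in_band(band_width_hz, fft_length, sample_rate=44100):
    """Width of a frequency band expressed in FFT bins.

    One FFT bin spans sample_rate / fft_length Hz.
    The 44100 Hz default is an assumption for illustration.
    """
    return band_width_hz * fft_length / sample_rate


def starting_spreading_length(bins):
    """max(1, int(log2(bins))), the rule quoted in the post.

    int() truncates toward zero, so every band narrower than 4 bins
    (including bands narrower than 1 bin) ends up at 1.
    """
    if bins <= 0:
        raise ValueError("bins must be positive")
    return max(1, int(math.log2(bins)))


if __name__ == "__main__":
    # Hypothetical 100 Hz wide band analysed at three fft lengths.
    for fft_length in (64, 256, 1024):
        bins = bins_in_band(100.0, fft_length)
        print(fft_length, round(bins, 2), starting_spreading_length(bins))
```

The max(1, ...) clamp is what keeps the low frequency edge at a spreading length of 1 whenever a critical band is only a bin or two wide.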

Quote from: Nick.C on 22 October, 2007, 09:56:58 AM
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.
Fine, this table shows under what circumstances the 'Width of Critical Band Width in FFT Bins' is < 1, which is most critical IMO. It should be > 1 (better: >= 2); otherwise spreading_length should be 1 wherever 'Width of Critical Band Width in FFT Bins > 1' cannot be achieved.
This is with respect to where these requirements are not fulfilled at the moment. I'm not talking about making the spreading length larger than 5 in the high frequency area with long FFTs, though to a cautiously chosen extent this may be possible, especially for -2 and more so for -3. This is something that can be considered later.

BitRate (kbps)  SNR=00  SNR=03  SNR=06  SNR=09  SNR=12  SNR=15  SNR=18  SNR=21  SNR=24  SNR=27  SNR=30
SKEW=00          468.4   468.4   468.4   468.4   468.4   469.2   471.4   476.2   483.2   494.7   508.7
SKEW=03          468.7   468.7   468.7   468.7   468.8   469.8   472.0   477.3   484.9   497.3   512.1
SKEW=06          468.9   468.9   468.9   468.9   469.0   470.3   472.8   478.5   486.9   499.9   515.5
SKEW=09          469.5   469.5   469.5   469.5   469.6   471.0   473.8   479.9   488.9   502.4   518.7
SKEW=12          470.1   470.1   470.1   470.1   470.2   471.8   474.9   481.4   491.1   505.1   522.1
SKEW=15          470.9   470.9   470.9   470.9   471.1   472.7   476.2   483.1   493.5   507.7   525.4
SKEW=18          471.9   471.9   471.9   471.9   472.1   473.9   477.6   484.8   495.9   510.2   528.7
SKEW=21          473.3   473.3   473.3   473.3   473.5   475.3   479.2   486.7   498.3   513.0   531.9
SKEW=24          475.2   475.2   475.2   475.2   475.4   477.0   481.3   488.9   500.9   515.6   535.1
SKEW=27          477.5   477.5   477.5   477.5   477.7   479.2   483.6   491.2   503.7   518.6   538.3
SKEW=30          480.5   480.5   480.5   480.5   480.6   482.0   486.4   494.0   506.6   521.7   541.6

So from this table, a higher skew value than we have used so far isn't critical as long as the snr value isn't chosen very high.
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
So values up to, say, skew=21 or 24 and snr=18 are well acceptable IMO for -1, judging from your table.
(Sure, I have headroom in mind for the variable spreading function modifications.)

I've made a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB.
When there is no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5). A less conservative version (still more conservative than the previous 2,3,3,4,5 for all fft_lengths) yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5). Another iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB.
For comparison, the current fixed spreading yields 470.2kbps / 40.49MB.

Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.
Just a question: What's your sample set? If it's regular music we should try to hold bitrate down. If it's problem samples we shouldn't care about bitrate going up. Ideally bitrate is kept rather low with regular music and increases significantly with problem samples (not necessarily individually but as classes of well- and bad-behaving samples).

04 - Black Sabbath - Iron Man.wav
06_florida_seq.wav
10 - Dungeon - The Birth- The Trauma Begins.wav
14_Track03beginning.wav
16_Track03entreaty.wav
18_Track04cakewithtea.wav
34_Gabriela_Robin___Cats_on_Mars.wav
41_30sec.wav
A02_metamorphose.wav
A03_emese.wav
Angelic.wav
annoyingloudsong.wav
aps_Killer_sample.wav
Atem_lied.wav
ATrain.wav
Bachpsichord.wav
badvilbel.wav
bibilolo.wav
BigYellow.wav
birds.wav
bruhns.wav
cricket__insect___edit_.wav
dither_noise_test.wav
E50_PERIOD_ORCHESTRAL_E_trombone_strings.wav
eig.wav
Furious.wav
glass_short.wav
harp40_1.wav
herding_calls.wav
jump_long.wav
keys_1644ds.wav
ladidada_10s.wav
Liebe_so_gut_es_ging.wav
Moon_short.wav
Poets_of_the_fall___Shallow.wav
rach_original.wav
rawhide.wav
Rush___Hold_Your_Fire___Turn_the_Page.wav
S13_KEYBOARD_Harpsichord_C.wav
S30_OTHERS_Accordion_A.wav
S34_OTHERS_GlassHarmonica_A.wav
S35_OTHERS_Maracas_A.wav
S53_WIND_Saxophone_A.wav
SeriousTrouble.wav
swarm_of_wasps__edit_.wav
thewayitis.wav
the_product.wav
triangle.wav
triangle_2_1644ds.wav
trumpet.wav
VELVET.wav
wait.wav

Quote from: halb27 on 23 October, 2007, 03:46:16 PM
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge....
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB.

Quote from: Nick.C on 23 October, 2007, 03:53:45 PM
Quote from: halb27 on 23 October, 2007, 03:46:16 PM
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge....
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB.
Thank you. So as 494.2kbps is the result of (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5), I think that's very, very promising, and this is especially true as your sample set consists more or less of short problem samples.
With this in mind I guess it's even acceptable to go a bit more conservative (as a target for -1 when we're done), something like (64:1,1,1,2,4; 256:1,1,2,3,4; 1024:1,3,3,4,5), looking at your wonderful 'Width of Critical Band Width in FFT Bins' table more closely.
I'd love to go through the collection of 51 regular songs I used before with this setting, if you can provide such a version. BTW, is the default for -skew and -snr still 12 for each of these options?

... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB. ...

I very much welcome your idea of having a fixed fft analysis strategy (fft lengths of 64, 256, 1024) for any quality setting (as done with -2 so far). It's sufficient IMO and makes fine tuning a lot easier.
For fine tuning purposes, can you provide spreading length options of the kind:
-spreading64 11234
-spreading256 12334
-spreading1024 23345
or similar?
This way anybody can try to find a promising spreading length strategy. I'd love to search for such strategies for -1, -2, -3, and I wouldn't have to bother you with building new lossyWAV versions for whatever comes to my mind.

lossyWAV alpha v0.3.17 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
Usage   : lossyWAV <input wav file> <options>
Example : lossyWAV musicfile.wav
Options:
-1, -2 or -3   quality level (1:overkill, 2:default, 3:compact)
-nts <n>       set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
               (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>       set minimum average signal to added noise ratio to n dB;
               (0dB<=n<=48dB, default=12dB)
-skew <n>      skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
               in the frequency range 20Hz to 3.45kHz
-spf <15hex>   manually input the 3 spreading functions as 3 x 5 hex characters;
               e.g. 444444444444444, default=111241123423345;
               Hex characters must be one of 1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (zero excluded).
-o <folder>    destination folder for the output file
-clipping      disable clipping prevention by iteration; default=off
-force         forcibly over-write output file if it exists; default=off
Advanced / System Options:
-dither        dither output using triangular dither; default=off
-quiet         significantly reduce screen output
-nowarn        suppress lossyWAV warnings
-detail        enable detailled output mode
-below         set process priority to below normal.
-low           set process priority to low.
Special thanks:
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...

Quote from: Nick.C on 24 October, 2007, 02:54:28 AM
... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?

Quote from: halb27 on 24 October, 2007, 04:20:27 AM
Quote from: Nick.C on 24 October, 2007, 02:54:28 AM
... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
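The mapping described here can be sketched roughly as follows. This is not the Delphi code from lossyWAV; the function names and the dictionary layout are hypothetical, and the only things taken from the thread are the 15 character string, the three 5-entry spreading functions for 64, 256 and 1024, and the copies 128=64, 512=256, 2048=1024.

```python
ALLOWED = "123456789ABCDEF"  # zero excluded, as in the -spf help text


def parse_spf15(spf):
    """Split a 15 character hex string into three 5-entry spreading functions."""
    spf = spf.upper()
    if len(spf) != 15 or any(ch not in ALLOWED for ch in spf):
        raise ValueError("expected 15 hex characters in the range 1..F")
    values = [int(ch, 16) for ch in spf]
    return values[0:5], values[5:10], values[10:15]


def spreading_by_fft_length(spf):
    """Map the three functions to 64/256/1024 and copy them to 128/512/2048."""
    s64, s256, s1024 = parse_spf15(spf)
    return {
        64: s64,
        128: list(s64),      # copy of the 64 sample function
        256: s256,
        512: list(s256),     # copy of the 256 sample function
        1024: s1024,
        2048: list(s1024),   # copy of the 1024 sample function
    }


if __name__ == "__main__":
    table = spreading_by_fft_length("111241123423345")
    for fft_length in sorted(table):
        print(fft_length, table[fft_length])
```

With this layout the 2048 entry always follows the 1024 entry, which is the coupling between fft lengths that a longer string would remove.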

Quote from: Nick.C on 24 October, 2007, 07:24:26 AM
Quote from: halb27 on 24 October, 2007, 04:20:27 AM
Quote from: Nick.C on 24 October, 2007, 02:54:28 AM
... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
I imagined it to be like that - just wanted to make sure.
In this case the user doesn't have full control of the spreading length for every fft length. If, for instance, it turns out to be important for the 1024 bin fft that there is a 1 in the spreading, as in (1,3,3,4,5), the same would apply to a 2048 bin fft as well and might have a negative impact on bitrate. These are dependencies which I'd prefer to see avoided.
I thought you wanted to be content with 3 analyses. So do you still think of using an fft length of 2048 for -1? If yes, I'd prefer a 20 character hex string covering all fft lengths used (64, 256, 1024, 2048) in this order, where you ignore the 256 and 2048 parts if -3 is used and ignore the 2048 part if -2 is used.

lossyWAV alpha v0.3.18 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
Usage   : lossyWAV <input wav file> <options>
Example : lossyWAV musicfile.wav
Options:
-1, -2 or -3   quality level (1:overkill, 2:default, 3:compact)
-nts <n>       set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
               (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>       set minimum average signal to added noise ratio to n dB;
               (0dB<=n<=48dB, default=12dB)
-skew <n>      skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
               in the frequency range 20Hz to 3.45kHz
-spf <4x5hex>  manually input the 4 spreading functions as 4 x 5 hex characters;
               e.g. 44444-44444-44444-44444, default=11124-11234-23345-34456;
               Hex characters must be one of 1 to 9 and A to F (zero excluded).
-o <folder>    destination folder for the output file
-clipping      disable clipping prevention by iteration; default=off
-force         forcibly over-write output file if it exists; default=off
Advanced / System Options:
-dither        dither output using triangular dither; default=off
-quiet         significantly reduce screen output
-nowarn        suppress lossyWAV warnings
-detail        enable detailled output mode
-below         set process priority to below normal.
-low           set process priority to low.
Special thanks:
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
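For experimenting with -spf strings before feeding them to lossyWAV, a rough Python sketch of how the 4 x 5 value might be interpreted. Several things here are assumptions rather than facts from the help text: that the four groups are ordered 64, 256, 1024, 2048 (as proposed earlier in the thread), that the hyphens are optional separators, and that the analyses per quality level are the ones listed earlier in the thread. All names are hypothetical.

```python
ALLOWED = "123456789ABCDEF"           # zero excluded, per the help text
FFT_LENGTHS = (64, 256, 1024, 2048)   # assumed order of the four groups

# fft lengths analysed per quality level, as listed earlier in the thread
ANALYSES = {1: (64, 256, 1024, 2048), 2: (64, 256, 1024), 3: (64, 1024)}


def parse_spf(spf):
    """Parse e.g. '11124-11234-23345-34456' into {fft_length: [5 values]}."""
    digits = spf.replace("-", "").upper()  # hyphens treated as optional
    if len(digits) != 20 or any(ch not in ALLOWED for ch in digits):
        raise ValueError("expected 4 x 5 hex characters in the range 1..F")
    return {
        fft_length: [int(ch, 16) for ch in digits[i * 5:(i + 1) * 5]]
        for i, fft_length in enumerate(FFT_LENGTHS)
    }


def active_spreading(spf, quality):
    """Keep only the groups for the fft lengths used at this quality level."""
    table = parse_spf(spf)
    return {n: table[n] for n in ANALYSES[quality]}


if __name__ == "__main__":
    for quality in (1, 2, 3):
        print(quality, active_spreading("11124-11234-23345-34456", quality))
```

Groups for fft lengths that a quality level does not analyse are simply ignored, so one string can be reused across -1, -2 and -3.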

Quote from: halb27 on 22 October, 2007, 12:09:08 PM
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
As I understood it, SKEW is an "offset" to SNR that gives the low freqs (where we would more easily discern noise) a better snr. (With a stretch you could call it a form of noise shaping.) So if you change SNR, this will impact the values where SKEW is applied too.
If I'm correct, the effect on quality (==snr?) would be:
- when you raise SKEW you (only) give better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.
So choose where you want the extra quality... or just vary the SNR.
BTW, has anybody found that SKEW above 9 improves a problem sample?

Quote from: GeSomeone on 24 October, 2007, 02:24:38 PM
Quote from: halb27 on 22 October, 2007, 12:09:08 PM
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
As I understood it, SKEW is an "offset" to SNR that gives the low freqs (where we would more easily discern noise) a better snr. (With a stretch you could call it a form of noise shaping.) So if you change SNR, this will impact the values where SKEW is applied too.
If I'm correct, the effect on quality (==snr?) would be:
- when you raise SKEW you (only) give better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.
So choose where you want the extra quality... or just vary the SNR.
BTW, has anybody found that SKEW above 9 improves a problem sample?
Well, the skew option is more meaningful to me than the snr option just because I have an idea of the effect of skew (though I don't really know how useful it is), but I personally don't really understand the idea behind snr. Maybe Nick can help.
I personally accept that we are doing a bit of rather wild experimenting, as long as this is done in a pretty conservative way that preserves the very good quality already achieved.

To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used.
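Read literally, that description amounts to taking the smaller of two values. A minimal Python sketch under stated assumptions: the bin results are already in dB, the average is a plain arithmetic mean of those dB values (the post does not say whether averaging happens in the dB or the linear domain), and the function name is hypothetical.

```python
def snr_limited_result(bin_results_db, snr_db=12.0):
    """Apply the -snr safety net to the relevant fft bin results (in dB).

    threshold = average of the bins minus snr_db.
    The smaller of (minimum bin result, threshold) is returned.
    """
    if not bin_results_db:
        raise ValueError("need at least one bin result")
    average = sum(bin_results_db) / len(bin_results_db)  # assumed: mean of dB values
    threshold = average - snr_db
    return min(min(bin_results_db), threshold)


if __name__ == "__main__":
    # Minimum (30) lies above the threshold (40 - 12 = 28): threshold is used.
    print(snr_limited_result([30.0, 40.0, 50.0]))
    # Minimum (10) lies below the threshold (28): the minimum is used.
    print(snr_limited_result([10.0, 40.0, 70.0]))
```

So -snr only takes effect when the bins are fairly flat; when one bin is much lower than the average, the minimum already decides the result.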

Quote from: Nick.C on 24 October, 2007, 03:36:06 PM
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used.
If that's all then they are not related, and I was wrong. I must be mixing up -SNR with some other noise threshold.