It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
Axon,

I share your unease at the way pseudo-psychoacoustics have been arrived at for lossyWAV. I wouldn't put it any stronger than that though. I don't have the time to get involved, and am very grateful to Nick and halb27 for pushing this forward with such enthusiasm.

Quote from: Axon on 28 November, 2007, 02:42:00 AM
It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.

The basic algorithm is just "find the noise floor, and quantise at or below it".

The fundamental flaw in my implementation was that it couldn't "see" dips in the noise floor at low frequencies which are audible to human listeners - so it would happily fill them with noise. The "resolution" I used wasn't sufficient for low frequencies. The solution is either to skew the results, or modify the spreading, or both (I haven't taken the time to figure out which is the "right" approach) - the current version does both, to great effect. The reason my original script got away with it most of the time is because there are very few recordings where the noise floor is lowest at low frequencies - normally, the lower limit is at a high frequency, so inaccuracies in estimating it at low frequencies have no effect on the result for most recordings.

There was also a bug in later lossyFLAC MATLAB scripts which caused it to analyse the tail end of the "noise it had just added to the previous block" when assessing the noise floor of the current block. Nick spotted that, and corrected it in his code. I haven't generated a "fixed" MATLAB version.

The obvious "extras" for lossyWAV are a hybrid/lossless mode (quite possible), and a noise-shaped mode (already implemented, but not released for IP reasons). Finally, it might make sense to delineate between a proper psychoacoustic model (borrow one?) and a non-psychoacoustic implementation (close to now, but tamed a little).

btw Nick, I don't have any objections to you leaving switches in the final release for testing - just hide them well away in the depths of the manual! And please don't feel like you have to respect my wishes or anything - you've well and truly adopted my baby now!

Cheers,
David.
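To make the "skew the results" idea concrete, here is a minimal sketch, not the lossyWAV code. It assumes the analysis yields one level in dB per FFT bin, and that the skew lowers bins between 20 Hz and 3.45 kHz (the range given in the lossyWAV help text) by an amount that tapers linearly in log-frequency. The taper shape, the function name and the bin values are all made up for illustration.

```python
import math

def lowest_bin_db(freqs_hz, levels_db, skew_db=0.0, lo_hz=20.0, hi_hz=3450.0):
    """Return the lowest spectrum level (dB) after an optional low-frequency skew.

    Assumption: the skew is skew_db at lo_hz and tapers linearly in
    log-frequency to 0 dB at hi_hz. The real lossyWAV curve may differ.
    """
    lowest = math.inf
    for freq, level in zip(freqs_hz, levels_db):
        if lo_hz <= freq < hi_hz:
            taper = math.log(hi_hz / freq) / math.log(hi_hz / lo_hz)
            level -= skew_db * taper  # push low-frequency results down
        lowest = min(lowest, level)
    return lowest

# Made-up bin levels: the unskewed minimum sits at 10 kHz, as in most recordings.
freqs = [100.0, 1000.0, 10000.0]
levels = [30.0, 40.0, 25.0]
plain = lowest_bin_db(freqs, levels)                 # the 10 kHz bin decides
skewed = lowest_bin_db(freqs, levels, skew_db=36.0)  # the 100 Hz bin now decides
```

With the skew applied, the low-frequency bin sets the estimate, so added noise would be kept further below it. How the spreading functions interact with this is not covered by the sketch.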
I corrected the Matlab script as well as my code and posted it as LossyFLAC6_x (I think).
Attached File Spread___Skew.zip ( 7.99k )
Quote from: Nick.C on 30 November, 2007, 07:33:53 AM
Attached File Spread___Skew.zip ( 7.99k )

It is very hard to see the effect of a parameter change because of the random Log FFT output.

Having read 2Bdecided's comment it might be best to ditch the -0 settings as they emulate a flawed implementation.
I'm a bit concerned about going down the route of two WAV files (one lossy, one lwcdf): if a WAV file is processed more than once, what happens if the wrong correction file is added to the lossy file? Probably nothing good....
I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.
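A minimal sketch of that checksum idea, assuming the correction data can carry a small header. The `LWCK` tag, the header layout and both function names are hypothetical and not part of any lossyWAV format; the sketch works on raw bytes and does not address how the value would be embedded in a WAV file.

```python
import struct
import zlib

MAGIC = b"LWCK"  # hypothetical tag, invented for this sketch

def tag_correction(lossy_audio: bytes, correction_audio: bytes) -> bytes:
    """Prefix the correction data with the CRC32 of the lossy result."""
    crc = zlib.crc32(lossy_audio) & 0xFFFFFFFF
    return MAGIC + struct.pack("<I", crc) + correction_audio

def matches(lossy_audio: bytes, tagged_correction: bytes) -> bool:
    """Return True only if the correction data was made for this lossy data."""
    if len(tagged_correction) < 8 or tagged_correction[:4] != MAGIC:
        return False
    (stored,) = struct.unpack("<I", tagged_correction[4:8])
    return stored == (zlib.crc32(lossy_audio) & 0xFFFFFFFF)

tagged = tag_correction(b"lossy samples, pass 1", b"correction samples")
ok = matches(b"lossy samples, pass 1", tagged)
wrong = matches(b"lossy samples, pass 2", tagged)
```

The bring-it-back-to-lossless tool would call something like `matches` first and refuse to combine the files when it returns False.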
Other than that I'm having a hard time with listening tests resulting from your -snr -215 approach.

I easily found that there's no magic with negative snr values: for my sample sets -snr -215/-100/-10/0 all gave the same average bitrate, and the result of -snr 10 was close by. So it's just the same machinery as with positive snr values: modifying the FFT min if the snr offset from the FFT average is lower. With -snr -215 or similar there's simply no modification of the FFT min, and -snr -215 simply works as if there was no snr machinery at all.
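One reading of that description, as a sketch only: the reference level is the FFT minimum unless "FFT average minus snr" is lower. The function name and the example levels are assumptions; the actual lossyWAV code may combine these values differently.

```python
def noise_reference_db(fft_min_db, fft_avg_db, snr_db):
    """Use the FFT minimum unless (average - snr) lies below it."""
    return min(fft_min_db, fft_avg_db - snr_db)

# Made-up levels in dB.
no_effect = noise_reference_db(20.0, 50.0, -215.0)  # 50 + 215 is far above the min
lowered = noise_reference_db(20.0, 30.0, 21.0)      # 30 - 21 = 9 undercuts the min
unchanged = noise_reference_db(20.0, 50.0, 21.0)    # 50 - 21 = 29 does not
```

Under this reading a very negative -snr can never undercut the minimum, which would explain why -snr -215, -100, -10 and 0 gave the same bitrate.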
-3 -snr -215 yields 313/430 kbps with my regular/problem sample sets. While this is welcome with regular tracks, it looks a bit low with the problem samples.

I listened to it (to get used to the problems I started with -nts 16), and I added more problem samples. The result wasn't good with badvilbel, bibilolo, bruhns, dithernoise_test, eig, furious, keys_1644ds, utb: there are clear artifacts/distortions audible. Sure, that was with an insane setting of -nts 16 as a warm-up.

Using -nts 9 and -nts 6 improves things a lot. The distortion-like noise is gone, and I'd even call the results 'acceptable', but I can still abx furious, dithernoise_test, keys, utb, and badvilbel.

My usual approach for improving is to bring the bitrate up for the problem set but only to a minor degree for the regular set. From the current -3 setting and previous experience I know a '1' instead of the '2' for the first frequency zone of the 1024 sample FFT should do the job. It does, but only for the statistics; my listening experience yielded pretty much the same not totally satisfying quality.

That's my current state. The interesting question is: if -2 -snr -215 is a bit poor for some problems, what is the most effective way to improve? Maybe a higher -skew value will do it, or maybe just the basic thing of the entire machinery: a lower -nts value (which would match the idea of going back a bit to the pure basics). Or maybe the snr machinery really has an essential part in preserving quality (after all, the current -3 quality is very good). Quite interesting questions, but the answers will take some time.

And of course I'll try your new suggestion for the -3 setting.
Quote from: halb27 on 02 December, 2007, 05:20:55 AM
I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.

Not sure how I will achieve this inside a WAV file....
So in the end IMO we should stick with current -3. An average bitrate of ~350 kbps for regular music is very good I think, and it seems we can't do essentially better with our weaponry without sacrificing safety margin to a considerable extent. What the investigation has shown is that -snr has its own specific part in preserving quality. It's not just an amplification of the merits of the -skew option.
So your new proposal is within the quality demand which to me is fine for -3 though it's on the cutting edge.
The average bitrate for my regular sample set is 335 kbps which is only 10 kbps lower than that of current -3. Average bitrate however of the problem essence is 446 kbps, and that's 18 kbps less than that of current -3.
Quote from: halb27 on 02 December, 2007, 04:01:23 PM
So your new proposal is within the quality demand which to me is fine for -3 though it's on the cutting edge.

Isn't that exactly where -3 should be? And -2 being "transparent as far as could be determined"?

Quote
The average bitrate for my regular sample set is 335 kbps which is only 10 kbps lower than that of current -3. Average bitrate however of the problem essence is 446 kbps, and that's 18 kbps less than that of current -3.

3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course.

Thanks for your testing and observations.
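The "3% to 4%" figure can be checked from halb27's numbers. The current -3 bitrates used here (345 and 464 kbps) are not quoted directly; they are inferred from "10 kbps lower" and "18 kbps less". The function name is just for illustration.

```python
def percent_saved(old_kbps, new_kbps):
    """Relative bitrate saving of new versus old, in percent."""
    return 100.0 * (old_kbps - new_kbps) / old_kbps

regular = percent_saved(335 + 10, 335)  # regular sample set
problem = percent_saved(446 + 18, 446)  # problem sample set
print(f"regular: {regular:.1f}%  problem: {problem:.1f}%")
```

This gives roughly 2.9% for the regular set and 3.9% for the problem set.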
lossyWAV beta v0.5.5 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>
Example : lossyWAV musicfile.wav

Quality Options:
-1 extreme settings [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21 -spf 22224-22225-11235-11246-12358 -fft 11011)
-2 default settings [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21 -spf 22224-22235-22346-12347-12358 -fft 10101)
-3 compact settings [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21 -spf 22235-22236-22347-22358-2246C -fft 10001)

Standard Options:
-o <folder> destination folder for the output file
-nts <n> set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB) (-ve values reduce bits to remove, +ve values increase)
-force forcibly over-write output file if it exists; default=off

Codec Specific Options:
-wmalsl optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:
-snr <n> set minimum average signal to added noise ratio to n dB; (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n> skew fft analysis results by n dB (0.0db<=n<=48.0db) in the frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters; These correspond to FFTs of 64, 128, 256, 512 & 1024 samples; e.g. 22235-22236-22347-22358-2246C (Characters must be one of 1 to 9 and A to F (zero excluded).
-fft <5xbin> select fft lengths to use in analysis, using binary switching, from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n> set codec block size to n samples (512<=n<=4608, n mod 32=0)
-quiet significantly reduce screen output
-nowarn suppress lossyWAV warnings
-detail enable detailled output mode
-below set process priority to below normal.
-low set process priority to low.

Special thanks:
David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
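For readers decoding the -fft, -spf and -cbs values by hand, a small sketch of how the help text describes them. This is an illustration in Python, not the Delphi parser; the function names are made up, and only the rules stated in the help text are checked.

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)
HEX_DIGITS = "123456789ABCDEF"  # zero is excluded, per the help text

def parse_fft(bits):
    """'01001' -> [128, 1024]: one binary switch per FFT length."""
    if len(bits) != 5 or set(bits) - {"0", "1"}:
        raise ValueError("expected five characters, each 0 or 1")
    return [n for n, b in zip(FFT_LENGTHS, bits) if b == "1"]

def parse_spf(spf):
    """Map each FFT length to its five spreading values."""
    groups = spf.split("-")
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("expected 5 groups of 5 characters")
    if any(c not in HEX_DIGITS for g in groups for c in g):
        raise ValueError("characters must be 1 to 9 or A to F")
    return {n: [int(c, 16) for c in g] for n, g in zip(FFT_LENGTHS, groups)}

def valid_cbs(n):
    """512 <= n <= 4608 and n mod 32 = 0."""
    return 512 <= n <= 4608 and n % 32 == 0

compact_ffts = parse_fft("10001")  # -3 preset
compact_spf = parse_spf("22235-22236-22347-22358-2246C")
```

Decoding the three presets this way gives 4, 3 and 2 FFT lengths for -1, -2 and -3, which agrees with the [4xFFT], [3xFFT] and [2xFFT] labels.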
... 3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course. ...
I can also confirm it. The increase in speed justifies the consistently negligible (<1kbps) increase in bitrate.
...On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......
Quote from: Nick.C on 05 December, 2007, 02:56:22 AM
...On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......

That's not negligible to me, but I hope that's due to the nature of your more or less problematic snippets set (guess that's still your 53 sample set). With full sized regular music as Mitch_1_2 said I expect the difference to be <1 kbps on average.

If somebody finds that on a real life sample set of several full length tracks difference is > 1 kbps please let us know. For getting the precise difference we can look at the total size of the files under consideration. I expect difference to be ~0.1%.
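A minimal sketch of the total-size comparison, assuming the two sets of encodes sit in separate folders. The folder names in the comment and the `*.flac` pattern are placeholders, not anything from the thread.

```python
from pathlib import Path

def total_size(folder, pattern="*.flac"):
    """Sum the sizes in bytes of all files in folder matching pattern."""
    return sum(p.stat().st_size for p in Path(folder).glob(pattern))

def percent_larger(baseline_bytes, candidate_bytes):
    """How much larger the candidate set is than the baseline, in percent."""
    return 100.0 * (candidate_bytes - baseline_bytes) / baseline_bytes

# Hypothetical usage with placeholder folder names:
#   diff = percent_larger(total_size("flac_8"), total_size("flac_3_e_r2_m"))
example = percent_larger(1_000_000, 1_001_000)  # a set 0.1% larger
```

Comparing total bytes avoids the rounding that creeps in when averaging per-track kbps figures.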
...Jean Michel Jarre - Oxygene / 773kbps / 454kbps / 372kbps / 377kbps...So, overall an average of 850kbps / 410kbps / 350kbps / 351kbps
Quote from: Nick.C on 05 December, 2007, 03:18:23 AM
...Jean Michel Jarre - Oxygene / 773kbps / 454kbps / 372kbps / 377kbps...So, overall an average of 850kbps / 410kbps / 350kbps / 351kbps

Thanks for your test, Nick. So in an overall sense FLAC -3 -m -e -r 2 is fine IMO, though it's quite interesting that with an album like Oxygene things aren't totally satisfying.

Do you mind trying FLAC -3 -m -e -r 3 and FLAC -3 -m -e on Oxygene?
AC/DC - Dirty Deeds Done Dirt Cheap / 781kbps / 398kbps / 331kbps / 332kbps
B52's - Good Stuff / 993kbps / 408kbps / 361kbps / 362kbps
David Byrne - Uh-Oh / 937kbps / 398kbps / 344kbps / 345kbps
Fish - Songs From The Mirror / 854kbps / 384kbps / 336kbps / 336kbps
Gerry Rafferty - City To City / 802kbps / 400kbps / 338kbps / 338kbps
Iron Maiden - Can I Play With Madness / 784kbps / 422kbps / 371kbps / 372kbps
Jean Michel Jarre - Oxygene / 773kbps / 454kbps / 372kbps / 377kbps
Marillion - The Thieving Magpie / 790kbps / 404kbps / 344kbps / 344kbps
Mike Oldfield - Tr3s Lunas / 848kbps / 421kbps / 365kbps / 366kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 354kbps / 354kbps
...So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps
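To see which albums exceed the "<1 kbps" expectation, the last two figures of each row can be compared directly. This sketch only parses the rows as posted; it makes no assumption about what each column represents beyond comparing the third and fourth figures.

```python
ROWS = """\
AC/DC - Dirty Deeds Done Dirt Cheap / 781kbps / 398kbps / 331kbps / 332kbps
B52's - Good Stuff / 993kbps / 408kbps / 361kbps / 362kbps
David Byrne - Uh-Oh / 937kbps / 398kbps / 344kbps / 345kbps
Fish - Songs From The Mirror / 854kbps / 384kbps / 336kbps / 336kbps
Gerry Rafferty - City To City / 802kbps / 400kbps / 338kbps / 338kbps
Iron Maiden - Can I Play With Madness / 784kbps / 422kbps / 371kbps / 372kbps
Jean Michel Jarre - Oxygene / 773kbps / 454kbps / 372kbps / 377kbps
Marillion - The Thieving Magpie / 790kbps / 404kbps / 344kbps / 344kbps
Mike Oldfield - Tr3s Lunas / 848kbps / 421kbps / 365kbps / 366kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 354kbps / 354kbps
"""

def parse_row(line):
    """Split 'Album / a / b / c / d' into (album, [a, b, c, d]) in kbps."""
    name, *rates = line.rsplit(" / ", 4)  # 'AC/DC' has no spaced slash
    return name, [int(r.replace("kbps", "")) for r in rates]

albums = [parse_row(line) for line in ROWS.splitlines()]
# Albums where the fourth figure exceeds the third by more than 1 kbps.
outliers = [name for name, r in albums if r[3] - r[2] > 1]
print(outliers)
```

Only Oxygene differs by more than 1 kbps between the last two figures (372 versus 377); every other album differs by 0 or 1 kbps.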