HydrogenAudio

Hydrogenaudio Forum => Uploads => Topic started by: Nick.C on 2007-07-12 08:05:34

Title: lossyWAV Development
Post by: Nick.C on 2007-07-12 08:05:34
lossyWAV 1.0.0b release thread. (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=32&t=63225)

Link to the wiki article (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV)

Change log 1.0.0b: 13/05/08
WAV chunk handling improved to allow unknown chunks before the 'data' chunk to be copied verbatim;
Error in --merge parameter associated with 24-bit files corrected.

Change log 1.0.0: 12/05/08
Code tidied up and GNU GPL references included;
Minor change to determination of RMS value of codec_block: minimum value of all channels now taken rather than average of all channels;
A SourceForge project will be created and the code posted in due course.

Change log beta v0.9.8d: 06/05/08
-spf preset values changed to: '22222-22223-22224-12234-12245-12356' in line with discussion on page 48;
Code tidied up a bit and work done on the noise shaping code for v1.1.0, including the implementation of a Fibonacci shift register PRNG for triangular dither (Thanks to DualIP for making me aware of this method of fast pseudo random number generation!).

Change log beta v0.9.8c: 04/05/08
-snr preset parameters revised to (18,22,23.5,23.5,23.5,25,28,31,34,37,40);
-impulse parameter renamed to -fft32 to more clearly indicate its function.

Change log beta v0.9.8b: 01/05/08
-snr preset parameters revised to (18,22,22,22,22,25,28,31,34,37,40);
-nts preset parameters revised to (20,16,9,6,3,0,-2.4,-4.8,-7.2,-9.6,-12);
-impulse is automatic from -q 3 (this will manifest itself as a step change in bitrate from -q 2.9999 to -q 3.0).

Change log beta v0.9.8: 01/05/08
-snr preset parameters revised to (18,19,20,21,22,25,28,31,34,37,40);
-snr and -nts parameters temporarily re-enabled to allow further testing.
-spf for 32 sample FFT set to 22222.

Change log beta v0.9.7: 29/04/08
-impulse parameter implemented in an attempt to trap impulse-based artefacts in the processed output by calculating additional overlapping 32 sample FFTs on the sample data. This additional processing unfortunately adds about 40% to the processing time.
Revised -snr values from v0.9.6 variant #1 (not released, but discussed - page 46) retained.
[edit] First 9 downloads did not recognise the -analyses parameter correctly. [/edit]
[edit2] -spf parameter re-enabled for short-term testing: -spf <6 x 5 hexchar separated by '-' characters> (35 characters long in total). [/edit2]

Change log beta v0.9.6: 24/04/08
-<n> presets removed in favour of -q <n> (0<=n<=10) quality preset selection: -q 0 = old -8; -q 5 = old -3; -q 10 = old -0.
-snr and -nts parameters removed;
-minbits <n> (0<=n<=8; resolution = 0.01; default=3;) introduced as an advanced option to allow the user to select the minimum number of bits to keep (relating to the log2 of the rms value of all the samples in the codec block);
-help and -longhelp parameters introduced and basic no parameter help reduced. System options moved to -help; Advanced options moved to -longhelp. This still needs some fleshing out.

Change log beta v0.9.5: 22/04/08
a,b or c suffix to quality preset removed in favour of the new -analyses <n> parameter (2<=n<=5);
-8 quality preset introduced, -nts=20, -snr=16;

Change log beta v0.9.4: 18/04/08
Changed the default number of FFT analyses to 2 lengths for all quality presets;
Tightened up the spreading function (same for all quality presets);
Implemented floating point quality presets (-0.0 to -7.0, resolution 0.0001);
Made highest quality preset (-0) settings more conservative.

Change log beta v0.9.3: 17/04/08
Error in skewing function preparation found and rectified - knock-on effect that bitrate reduced by around 20kbps for all quality presets and variations in bitrate between spreading functions reduced;
All quality presets now use the spreading function for -1.

Change log v0.9.2 RC3: 13/04/08
Code tidied up and slight increase in processing throughput achieved;
-shaping and -autoshape parameters removed in accordance with roadmap (should return in v1.1).

Change log beta v0.9.1: 02/04/08
-autoshape now non-linear with respect to bits-to-remove, i.e. 1-((bits-per-sample-3-bits-to-remove)/(bits-per-sample-3))^2
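The two shaping-proportion formulas (the linear one introduced in v0.8.9 below and the non-linear v0.9.1 revision above) can be sketched as follows. This is a minimal illustration with invented function names, not code from the lossyWAV source; the min_bits_to_keep default is an assumption.

```python
def autoshape_linear(bits_to_remove, bitdepth, min_bits_to_keep=3):
    # v0.8.9: shaping proportion grows linearly with bits-to-remove,
    # capped at 1 (fully on).
    return min(1.0, bits_to_remove / (bitdepth - min_bits_to_keep))

def autoshape_nonlinear(bits_to_remove, bits_per_sample):
    # v0.9.1: 1 - ((bps - 3 - btr) / (bps - 3))^2, i.e. shaping
    # engages faster for small bits-to-remove than the linear form.
    r = (bits_per_sample - 3 - bits_to_remove) / (bits_per_sample - 3)
    return 1.0 - r * r
```

For 16-bit audio both forms reach 1.0 at 13 bits to remove; they differ in how quickly shaping ramps up for small removals.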

Change log beta v0.9.0: 01/04/08
Minor correction to noise shaping code;
Further IA-32/x87 speedups found, processing rate increased by a further 10%.

Change log beta v0.8.9: 29/03/08
-autoshape parameter implemented (incompatible with -shaping <n>). This applies shaping variably depending on bits-to-remove and the bitdepth of the sample, i.e. shaping-to-apply = min(1, bits-to-remove / (bitdepth-of-sample - minimum-bits-to-keep)).

Change log beta v0.8.8: 27/03/08
Error in the -merge parameter tracked and amended;
FFT now makes use of the ability to calculate a real FFT of length 2N using a complex FFT of length N (20% to 25% speedup);
Reads and writes to disk are now larger to reduce file fragmentation.
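The "real FFT of length 2N using a complex FFT of length N" trick mentioned above can be sketched like this (a self-contained illustration, not the IA-32/x87 code used by lossyWAV): pack even-index samples into the real part and odd-index samples into the imaginary part of a half-length complex signal, transform once, then untangle the two interleaved spectra.

```python
import cmath

def fft(z):
    # Simple recursive radix-2 complex FFT (length must be a power of two).
    n = len(z)
    if n == 1:
        return z[:]
    even = fft(z[0::2])
    odd = fft(z[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft_via_half_length(x):
    # Real FFT of length 2N from one complex FFT of length N (N a power
    # of two): even samples become the real part, odd samples the
    # imaginary part, then the two spectra are separated by symmetry.
    n2 = len(x)
    n = n2 // 2
    z = [complex(x[2 * k], x[2 * k + 1]) for k in range(n)]
    Z = fft(z)
    out = []
    for k in range(n + 1):
        zk = Z[k % n]
        zc = Z[(n - k) % n].conjugate()
        ek = 0.5 * (zk + zc)            # spectrum of the even samples
        ok = -0.5j * (zk - zc)          # spectrum of the odd samples
        out.append(ek + cmath.exp(-2j * cmath.pi * k / n2) * ok)
    return out  # bins 0..N of the length-2N real FFT
```

The speedup comes from one length-N complex FFT plus an O(N) untangling pass instead of a length-2N complex FFT, consistent with the 20% to 25% figure above.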

Change log beta v0.8.7: 21/03/08
Error in the -merge parameter tracked and amended to adopt David's method of storing the difference when scaled;

Change log beta v0.8.6: 18/03/08
Error in the -merge parameter tracked and amended;
-scale <n> parameter implemented to allow WAV data to be scaled (in the range 0 to 1, resolution 0.000001) prior to processing. -scale is compatible with the -correction and -merge parameters (although combined filesize may be large);
Complete FFT unit now in IA-32/x87.

Change log beta v0.8.5: 17/03/08
-shaping parameter now takes a supplementary value between 0 and 1 (0.001 resolution) which specifies the "proportion" of noise shaping to apply (0=fully off [default], 1=fully on);
-newspread parameter removed as results are identical to the existing spreading function that I thought I had doubts about. The revised method will probably be faster when fully optimised in IA-32/x87 and will replace the existing method in the near future.

Change log beta v0.8.4: 14/03/08
Total rewrite of the -shaping parameter, in line with gratefully received guidance from SebastianG. No dither has been included (yet). The program will automatically select either the 44.1kHz or the 48kHz functions as required by the input WAV file. At present these are the only two sample rates for which noise shaping functions have been incorporated;
A rewrite of the spreading function has been included and is enabled using the -newspread parameter. This fixes a problem where some samples would be used too many times in the calculation of the average value of the FFT output;
Limits for -snr and -nts modified to 0 to 48 and -48 to 36 respectively to allow testing of the effectiveness of the noise shaping function.

Change log beta v0.8.3:
Implementation of -shaping parameter to make fixed noise shaping optional (default=off);
minor amendment to shaping code;

Change log beta v0.8.2:
First real attempt at implementing noise shaping, thanks to David for the pointers. It is currently not an optional parameter and will be applied to all quality presets.
-merge parameter "repaired" (wasn't looking in the right places for files).
-1 quality preset reduced from 4 to 3 FFT analyses; -2 quality preset reduced from 3 to 2 FFT analyses; (use a,b,c to increase if so wished).

Change log beta v0.8.1:
Revision to -snr and -nts limits to allow extremely low bitrate testing (see page 37).

Change log beta v0.8.0:
Revision of all presets in line with discussion on -7 preset (page 36).

Change log beta v0.7.9:
Implementation of -6 & -7 quality presets: -4 = -3.5; -5 = -4.0; -6 = -4.5; -7 = -5. For bitrates and detailed settings, see end of page 35.

Change log beta v0.7.8:
Implementation of -5 quality preset, as -4 except -snr=15(-4=21); -nts=12(-4=6).

Change log beta v0.7.7:
Correction made to maximum_bits_to_remove;
-merge parameter implemented.

Change log beta v0.7.6:
Addition of -4 quality preset, analogous to -3 at v0.6.4 RC1, but with 5 allowable clips per channel per codec_block;
Some work done on maximum_bits_to_remove: log2 of RMS value of all samples in a codec_block is taken and minimum_bits_to_keep is subtracted rather than bits_per_sample-minimum_bits_to_keep;
-overlap parameter removed;
-centre parameter removed.
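The revised maximum_bits_to_remove calculation described above might look like this in outline (a sketch with assumed rounding and silence handling; the actual lossyWAV code may differ):

```python
import math

def maximum_bits_to_remove(samples, minimum_bits_to_keep):
    # log2 of the RMS value of all samples in the codec_block, minus
    # minimum_bits_to_keep, instead of the old
    # bits_per_sample - minimum_bits_to_keep.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < 1.0:          # silent or near-silent block
        return 0
    return max(0, int(math.floor(math.log2(rms))) - minimum_bits_to_keep)
```

A quiet block (low RMS) now caps bits-to-remove well below what the nominal bit depth alone would allow.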

Change log beta v0.7.5:
Handling of 24-bit samples corrected.

Change log beta v0.7.4:
-extrafft parameter removed as superseded;
-1, -2 & -3 parameters augmented by -1a, -2a, -2b, -3a, -3b, -3c. The suffix character denotes how many additional FFT analysis lengths will be used in the processing of the file, a=1, b=2, c=3, i.e. 1a = 4+1 = 5; 3b = 2+2 = 4.

Change log beta v0.7.3:
-overlap parameter revised to take a value (0..16). 1024 Sample FFT end_overlap = 512-16*(overlap_value);
-centre parameter revised to add a central 1024 sample FFT to the analysis (unless overlap=16).

Change log beta v0.7.2:
-overlap parameter implemented to modify end_overlap to 448 samples (from 512 samples) for 1024 sample FFT;
-centre parameter implemented to centralise 1024 sample FFT on centre of codec_block, i.e. end_overlap = 256 samples;
Codec_blocks full of zeros are now not processed.

Change log beta v0.7.1:
Window function slightly modified and bit reduction noise constants re-calculated;
Allowable clips per channel per codec_block set to -1=0; -2=1; -3=2.
-noclips parameter implemented to allow user to set allowable clips=0 for -2 & -3;
Code optimised further in IA-32/x87;
Now checks for existence of correction file and requires -force parameter to over-write.

Change log beta v0.7.0:
Implementation of "-clips" parameter to set number of allowable clips per channel per codec_block (0<=n<=512).
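The clipping control that -clips exposes can be sketched as: count the samples in a codec_block that would fall outside the integer range after rounding, and back off bits-to-remove until the count is within the allowance. This is an illustrative reconstruction with invented names, not the actual routine:

```python
def quantise(sample, bits_to_remove):
    # Round to a multiple of 2^bits_to_remove (Python's round() uses
    # banker's rounding for exact ties).
    step = 1 << bits_to_remove
    return int(round(sample / step)) * step

def limit_clipping(block, bits_to_remove, allowable_clips, bits_per_sample=16):
    # Reduce bits_to_remove until no more than allowable_clips samples
    # in the block would exceed the integer sample range after rounding.
    lo, hi = -(1 << (bits_per_sample - 1)), (1 << (bits_per_sample - 1)) - 1
    while bits_to_remove > 0:
        clips = sum(1 for s in block
                    if not lo <= quantise(s, bits_to_remove) <= hi)
        if clips <= allowable_clips:
            break
        bits_to_remove -= 1
    return bits_to_remove
```

With allowable_clips=0 a near-full-scale block forces bits_to_remove down; a higher allowance lets occasional peaks clip in exchange for a lower bitrate.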

Change log beta v0.6.9:
Code speedup;

Change log beta v0.6.8:
Implementation of dynamic minimum_bits_to_keep=5. Dynamic in the sense that the maximum bit is determined for each codec_block (taking sign into account) rather than just assuming bits_per_sample;
Implementation of allowable_clips per channel per codec block. -1 = 0; -2 = 1; -3 = 5. Based on the 512 sample codec_block_size this will allow at most 0.1134 milliseconds of clipping per channel per codec_block.

Change log v0.6.7 RC2:
-nts values for -1, -2 & -3 changed to -4, -2 and 0 respectively;
Processing speedup identified during problem sample investigation incorporated (thanks Alex B!);
Spreading function string for -3 changed back to: 22224-22236-22347-22358-2246C;
53 sample test set processed at -3 now produces 462.2kbps; 41.0MB.

Change log beta v0.6.6:
Positive change in bits to remove limited to an increase of +2 bits per codec_block, no -ve limit;
Additional 1024 sample FFT analysis removed (reverted to -512:511; 0:1023 on a 512 sample codec_block);
Spreading Function string for -3 changed to: 22224-22236-22347-22358-22469;
53 sample test set processed at -3 now produces 440.8kbps; 39.1MB.

Change log beta v0.6.5:
Additional 1024 sample FFT analysis introduced per codec_block;
Fairly massive speedup "accidentally" found and implemented - compromised by the additional analysis;
Positive change in bits to remove limited to an increase of +1 bit per codec_block, no -ve limit;
Now able to process between 4 and 32 bit sample WAV files (I think - limited testing so far.....).

Change log v0.6.4 RC1:
Parameters kept:
-1, -2, -3; -o <folder>; -nts <n>; -snr <n>; -force; -check; -correction; -quiet; -nowarn; -below; -low.
Parameters removed:
-skew <n>; -spf <5x5hex>; -fft <5xbin>; -cbs <n>; -detail; -wmalsl.
Silence detection routine removed - very small gain for dubious benefit.
Code tidied and slight assembly optimisations implemented.

Change log beta v0.6.3:
[Implementation of experimental silence detection method using -detection parameter]. Removed - not satisfied with results.

Change log beta v0.6.2:
Fixed sample limit checking bug introduced in v0.6.1

Change log beta v0.6.1:
-correction parameter implemented which will create a .lwcdf.WAV file which, when added to the lossy.WAV file using a not yet implemented parameter of lossyWAV, will reconstitute the lossless original file.
Error finally found in remove_bits routine (which is why it's taken so long for me to implement the -correction parameter) - very slight increase in bitrate (about 0.54kbps for my 53 problem sample set).
-shaping parameter removed.
When the corresponding .lossy.wav and .lwcdf.wav files, processed using lossyWAV -3, are encoded using FLAC -3 -m -e -r 2 -b 512, the total size for my 53 sample set (69.4MB FLAC) is 76.3MB : 39.0MB .lossy.FLAC, 37.3MB .lwcdf.FLAC.
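The relationship between the .lossy.wav and .lwcdf.wav files can be illustrated with a toy split and merge (names are invented and the rounding is simplified; the real program rounds to nearest and handles scaling):

```python
def split_sample(sample, bits_to_remove):
    # The lossy stream keeps the sample rounded down to a multiple of
    # 2^bits_to_remove; the correction stream stores what was removed.
    step = 1 << bits_to_remove
    lossy = (sample // step) * step
    return lossy, sample - lossy

def merge_sample(lossy, correction):
    # Adding the correction back reconstitutes the lossless original.
    return lossy + correction
```

The lossy stream compresses well because its low bits are constant, while the correction stream is essentially noise, which is why the combined 76.3MB exceeds the 69.4MB straight FLAC above.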
Title: lossyWAV Development
Post by: shadowking on 2007-07-12 08:21:03
Thanks, I'll check these when I get a chance.
Title: lossyWAV Development
Post by: shadowking on 2007-07-15 03:56:52
It's close to DualStream quality 3, better than WavPack at 320k - similar to WavPack 350k high modes but with better bitrate efficiency. I will need to ABX these when it's dead quiet, but so far can only ABX Atemlied (slight noise) and Metamorphose (abrupt noise). Average bitrate = 339k, ranging from 315~395k

Metamorphose shows a savage burst of noise not heard in WavPack or DualStream when using the flat noise approach. Usually there is a rise in hiss, but this is something that I've heard in Shorten lossy and could be an issue.

Overall it looks good. I am more interested in overall performance at 340k than @ 480k.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-15 20:47:44
I re-processed Atem_Lied & Metamorphose using: 1.5ms & 20ms analyses, force_dither_LSB, use min(min(bits_to_remove_table))+1 bits to remove *not mean(mean...)*, experimental triangular gaussian dither and 30/32 fix_clipped reduction, minimum_bits_to_keep=6.

<files removed - obsolete>
Title: lossyWAV Development
Post by: Wombat on 2007-07-15 22:38:25
I re-processed Atem_Lied & Metamorphose using: 1.5ms & 20ms analyses, force_dither_LSB, use min(min(bits_to_remove_table))+1 bits to remove *not mean(mean...)*, experimental triangular gaussian dither and 30/32 fix_clipped reduction, minimum_bits_to_keep=6.

Well, I never tested these lossy "lossless" approaches but was a bit curious.

This Atemlied problem sounds like those problems LAME MP3 has on several tonal samples, which were only briefly improved. Like somewhere nearby you hear a quiet windblow.

I only listened to Atemlied and wonder how clearly this problem is audible. The second approach you offer here, Nick.C, is only marginally better than the one above.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-16 06:55:03
Thanks for the input Shadowking & Wombat - it seems more and more likely that removing any more bits than 2Bdecided's method calculates is going to noticeably impair quality.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-18 21:55:49
Updated script containing revised fix_clipped method.

Updated (again) - code (and my thought processes) tidied up a fair bit. (20070719)

<files removed - obsolete>
Title: lossyWAV Development
Post by: Nick.C on 2007-07-20 21:01:11
Source modified again - realised that rectangular dither = triangular dither /2 and the gaussian dither I was using equated to triangular / (4 to 6 or more....). Changed the dither routine a bit - introduced a dither_amplitude parameter - rectangular = 0.5; triangular = 1.0.

Had another go at the conditional clipping reduction factor - I think that it's closer to "right" now.

<files removed - obsolete>
Title: lossyWAV Development
Post by: 2Bdecided on 2007-07-21 23:47:37
Wow - very neat - you put me to shame!

(and you should see the state of the MATLAB scripts I write which I _don't_ release!)

Great work spotting the better codec block size. You could do a check on each file, trying various options (automatically I mean, but it would be painful). If it goes into Wavpack, I hope David does this. When I looked (though I didn't go down below 1024) the optimal lossyFLAC block size is often related (not perfectly) to performance of standard FLAC - on a lot of these samples, 1024 is better than 4096 without lossy pre-processing.


I'm a bit uncomfortable with having a different amount of scaling in each block (to prevent clipping). It's like a very weird DRC. Still, it's just an option, and quite useable for your application. If you crossfaded, it would be better still.

I think you've broken the rectangular dither. Half-amplitude triangular dither is not rectangular dither. Plot the PDFs to see why, but the clue is in the names.
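David's point can be demonstrated with the textbook definitions: rectangular (RPDF) dither is a single uniform draw, while triangular (TPDF) dither is the sum of two independent uniform draws. Halving a triangular draw narrows its range but the PDF stays triangular, so it is not rectangular dither. A sketch (illustrative, not lossyWAV code):

```python
import random

def rectangular_dither(amplitude=0.5):
    # RPDF: one uniform draw over [-amplitude, +amplitude].
    return random.uniform(-amplitude, amplitude)

def triangular_dither(amplitude=1.0):
    # TPDF: sum of two independent uniforms; the PDF is a triangle
    # over [-amplitude, +amplitude] peaking at 0.
    half = amplitude / 2.0
    return random.uniform(-half, half) + random.uniform(-half, half)
```

The variances show the difference: uniform over [-0.5, 0.5] has variance 1/12, while half-amplitude TPDF (same [-0.5, 0.5] range) has variance 1/24.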


I like the structure, but I see that just after I combined two loops (analysis then apply) into one, you split it back into two. (Unless I did that? It's late, I forget). Anyway, that will make it a bit harder for someone to come along (as they eventually must) and make this work on files on disc, rather than loading the whole file into memory. It does make it a little easier to test and develop though, which is why I started with two loops.

When I get back to it, my main planned task is noise shaping. That's either going to revolutionise it, or not work!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-22 09:55:35
Ah - sorry about the rectangular dither - easily mended....

The scaling is applied to the whole file, not just one block. It's calculated to find the minimum block value then that minimum is applied to the whole file when the bit-reduction is done.
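That whole-file scaling can be sketched as: compute the scale each block would need to avoid clipping, then apply the minimum of those to every sample. An illustration with invented names, assuming 16-bit limits:

```python
def whole_file_scale(blocks, limit=32767.0):
    # Per-block scale that would keep each block in range; the minimum
    # over all blocks is then applied uniformly to the whole file.
    scales = [min(1.0, limit / (max(abs(s) for s in block) or 1.0))
              for block in blocks]
    scale = min(scales)
    return scale, [[s * scale for s in block] for block in blocks]
```

Using one factor for the whole file avoids the block-to-block gain changes that a per-block scheme would introduce.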

Still having fun......
Title: lossyWAV Development
Post by: Nick.C on 2007-07-23 09:36:01
Rev.23: Dither "fixed" (i.e. returned back to previous working version....  )
There was about a 0.9MiB difference between proper rectangular and 0.5 x triangular when compressed (rectangular bigger, 33.9MiB vs 33.0MiB).

Rev.24: "more likely to be nearer the mark" implementation of amplitude_response modification. Fileset now: WAV: 98.6MiB; FLAC 56.8MiB; ss.FLAC 28.4MiB over the 41 samples.

<files removed - obsolete>
Title: lossyWAV Development
Post by: Nick.C on 2007-07-25 15:14:00
Rev:25 Revised implementation of equal_loudness_filter. Files now lose more bits under the equal_loudness_filter if they are louder - as might be expected. Fileset: WAV: 98.6MiB; FLAC: 56.9MiB; ss.FLAC(no elf): 35.8MiB; ss.FLAC(elf): 29.6MiB. Fileset using equal loudness filter, no dither, no clip-fixing comes in at 25.2MiB



<files removed - obsolete>
Title: lossyWAV Development
Post by: Wombat on 2007-07-25 22:13:55
Rev:25 Revised implementation of equal_loudness_filter. Files now lose more bits under the equal_loudness_filter if they are louder - as might be expected. Fileset: WAV: 98.6MiB; FLAC: 56.9MiB; ss.FLAC(no elf): 35.8MiB; ss.FLAC(elf): 29.6MiB. Fileset using equal loudness filter, no dither, no clip-fixing comes in at 25.2MiB

If it is of any help: Atemlied is still easily ABXable and sounds nearly like the second try you provided.
I called it ss2 in the ABX test.

foo_abx 1.3.1 report
foobar2000 v0.9.4.3
2007/07/25 23:03:12

File A: C:\Temp\nforce\temp\Atem-lied.wav
File B: C:\Temp\nforce\temp\Atem_lied.ss2.flac

23:03:12 : Test started.
23:04:36 : 01/01  50.0%
23:04:52 : 02/02  25.0%
23:05:06 : 03/03  12.5%
23:05:29 : 04/04  6.3%
23:05:49 : 05/05  3.1%
23:06:04 : 06/06  1.6%
23:06:19 : 07/07  0.8%
23:06:35 : 08/08  0.4%
23:06:50 : 09/09  0.2%
23:07:02 : 10/10  0.1%
23:08:48 : Test finished.

----------
Total: 10/10 (0.1%)
Title: lossyWAV Development
Post by: Nick.C on 2007-07-26 08:00:16
<file removed - obsolete>
Title: lossyWAV Development
Post by: Wombat on 2007-07-26 20:21:48
Hmmmmm....... Try this one - triangular dither, no elf, clip_reduction.

Just downloaded and tested. I have to admit this is on the edge of what I can clearly ABX, but it is still possible at 2 places I picked in the beginning. I don't think the filesize is that promising either.

foo_abx 1.3.1 report
foobar2000 v0.9.4.3
2007/07/26 21:11:08

File A: C:\Temp\nforce\temp\Atem_lied.ss3.flac
File B: C:\Temp\nforce\temp\Atem-lied.wav

21:11:08 : Test started.
21:11:30 : 01/01  50.0%
21:11:51 : 02/02  25.0%
21:12:13 : 03/03  12.5%
21:13:09 : 04/04  6.3%
21:13:49 : 05/05  3.1%
21:14:10 : 06/06  1.6%
21:14:25 : 07/07  0.8%
21:14:51 : 08/08  0.4%
21:15:13 : 09/09  0.2%
21:17:18 : 10/10  0.1%
21:17:35 : Test finished.

----------
Total: 10/10 (0.1%)
Title: lossyWAV Development
Post by: Nick.C on 2007-07-26 21:01:07
Last attempt (for tonight anyway....) - elf on (algorithm changed), triangular dither, more clip reduction.

ps. Thanks for the testing 


<file removed - obsolete>
Title: lossyWAV Development
Post by: Wombat on 2007-07-26 21:06:50
Last attempt (for tonight anyway....) - elf on (algorithm changed), triangular dither, more clip reduction.

ps. Thanks for the testing

Sorry, no need to ABX. At seconds 3-4 there is clearly more noise than in your last try.

Edit: to me it sounds even worse than the second try you lately provided, because of this more pronounced hiccup.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-26 21:27:23
<file removed - obsolete>
Title: lossyWAV Development
Post by: Wombat on 2007-07-26 21:49:50
Here's another......

Well, I can't ABX this!
I have to add that I am already tired like hell from a hard day.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-26 22:02:13
Thanks again - the filesize is going up, but compared to the FLAC file it's still quite small. I'm going to try a few permutations on block_size.....
Title: lossyWAV Development
Post by: Nick.C on 2007-07-27 13:34:52
I've been looking at the FFT_Lengths used in the analysis process and the number of analyses. For triangular dithered, fix_clipped=1, force_dither_LSB=1, no elf I get the following:

FFT_Lengths: 1024, 64: size=34.0MiB; Rate: 3.01x - 2Bdecided's original process;
FFT_Lengths: 1024, 256, 64: size=34.7MiB; Rate: 2.34x - 2Bdecided's overkill process;
FFT_Lengths: 1024, 512, 256, 128, 64: size=35.4MiB; 1.54x - Total overkill, although it covers the full set of analyses between original limits.

<file removed - obsolete>

I've now got the script storing individual bits_to_remove_table values for each block in an array for analysis.
Title: lossyWAV Development
Post by: Wombat on 2007-07-27 17:39:41
I've been looking at the FFT_Lengths used in the analysis process and the number of analyses. For triangular dithered, fix_clipped=1, force_dither_LSB=1, no elf I get the following:

FFT_Lengths: 1024, 64: size=34.0MiB; Rate: 3.01x - 2Bdecided's original process;
FFT_Lengths: 1024, 256, 64: size=34.7MiB; Rate: 2.34x - 2Bdecided's overkill process;
FFT_Lengths: 1024, 512, 256, 128, 64: size=35.4MiB; 1.54x - Total overkill, although it covers the full set of analyses between original limits.

Atem_lied appended from the 1024, 256, 64 process.

I've now got the script storing individual bits_to_remove_table values for each block in an array for analysis.

No, again no ABX result. We may be in a region here where my PC noise comes through more than anything wrong with the file.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-27 19:50:20
Considering further complicating this with some downsampling. We'll see how the code goes before I produce some results.

Right, I've implemented a not quite crude downsampler (n samples > n-1 samples, freq > old freq * (n-1)/n). For the sample attached, I went 3 > 2, 44.1kHz > 29.4kHz with triangular dither then through the bit reduction process separately.
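A minimal sketch of that kind of n-to-(n-1) downsampler, using plain linear interpolation (no anti-alias filtering, which is part of why a proper resampler sounds much cleaner); names are illustrative:

```python
def downsample_n_to_n_minus_1(samples, n):
    # Map each group of n input samples to n-1 output samples by
    # linear interpolation; new rate = old rate * (n-1)/n.
    m = n - 1
    out = []
    for i in range(0, len(samples) - n + 1, n):
        group = samples[i:i + n]
        for j in range(m):
            pos = j * n / m            # fractional source position
            k = int(pos)
            frac = pos - k
            nxt = group[k + 1] if k + 1 < n else group[k]
            out.append(group[k] * (1.0 - frac) + nxt * frac)
    return out
```

With n=3 this gives the 44.1kHz to 29.4kHz conversion mentioned above: each [a, b, c] becomes [a, (b+c)/2].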

On the other hand - I can let Foobar do a transcode from wav to wav with the resampling DSP enabled - very clean! See attached.

<file removed - obsolete>
Title: lossyWAV Development
Post by: 2Bdecided on 2007-07-30 09:55:42
There's a resampler built into MATLAB and by default it's not very good. SSRC (or fb2k, CEP/Audition etc) are much better options. Stick with 32kHz as a target rate.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-30 10:33:51
Target rate - 32kHz (used foobar2000 PPHS resampler, ultra mode), high frequency limit 15.5kHz (16kHz gave very large files.....)

.sl31 = equal loudness filter on; 3 analyses, btr_type=1 (min(min....));
.ss31 = equal loudness filter off; 3 analyses, btr_type=1 (min(min....));

Source will follow when tidied up. 

<files removed - obsolete>
Title: lossyWAV Development
Post by: 2Bdecided on 2007-07-30 11:12:49
I can't ABX, but don't have a quiet environment so please don't rely on me!

I haven't had a chance to try your code, but the bitrates are comparable to the original code with ns=6. Look back in the original thread to see what halb27 could ABX at ns=6 - I think it was "furious". It might be worth trying.

I think resampling to 32k is the way to go for lower bitrates, if your DAP supports it and your ears can't hear it (I'm OK on both counts!).

Sorry I haven't had time to add anything constructive.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-07-30 12:01:08
Right - revised source (and 1 external function) - uses wavreadraw and wavwriteraw. These are not attached, but basically they don't convert the raw audio data into the +/- 1.0 range.
Title: lossyWAV Development
Post by: BGonz808 on 2007-08-02 19:58:13
I really like this idea of a preprocessor, and of course the near-lossless small FLAC files! But how can I use the MATLAB script? I don't have MATLAB and it seems impossible to get a trial. Could this preprocessor be turned into a foobar2000 DSP plugin by any chance? Or a command-line program?
And why aren't wavreadraw and wavwriteraw attached!

Thanks
Bobby
Title: lossyWAV Development
Post by: BGonz808 on 2007-08-04 21:07:58
Please attach the wavreadraw and wavwriteraw so I can give this prog a spin. I don't know how to code MATLAB to use raw WAV.

I'm a noob!
Title: lossyWAV Development
Post by: Nick.C on 2007-08-05 09:48:30
Wavread and wavwrite are copyrighted MATLAB code and I will not post them - however, they are easily modifiable: look for the section which multiplies (wavwrite) or divides (wavread) the audio data by 32767 or 32768 and insert a "%" before that line to "REM" it out - that will sort it for 16-bit audio. Oh, and save the functions under a different name or you will have broken the originals.
Title: lossyWAV Development
Post by: BGonz808 on 2007-08-06 01:40:57
Thanks. That was a bit of an oversight on my part
Title: lossyWAV Development
Post by: Nick.C on 2007-08-06 16:36:41
Realising that there are only so many parameters to be played with without destroying the audio quality of the output.......

I've been playing around with the spreading function - previously length=4 (i.e. [0.25,0.25,0.25,0.25]) - I've tried even numbers of length from 6 to 16 and am pleasantly surprised by the results. Atem_Lied attached for spreading function lengths of 8, 12 and 16 for your listening pleasure(?!).

[edit]
Following the processing of these samples (constant spreading_function_length with variable fft_length per analysis), I've started "playing about" with variable spreading_function_length with variable fft_length per analysis. There should be some processed results later tonight.
[/edit]

[edit2]
Right, samples attached - .ssx1.flac is 3 analyses (1024,256,64 fft_lengths) with corresponding spreading_function_lengths: 16,8,4; .ssx2.flac is 3 analyses (1024,256,64 fft_lengths) with corresponding spreading_function_lengths: 64,16,4;
[/edit2]
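For reference, the spreading function with equal weights is just a moving average over the FFT magnitudes; a sketch (illustrative only - how lossyWAV actually walks the bins may differ):

```python
def spread(fft_magnitudes, length):
    # Equal-weight spreading, e.g. length=4 uses [0.25, 0.25, 0.25, 0.25]:
    # each output value is the average of `length` consecutive bins.
    w = 1.0 / length
    return [sum(fft_magnitudes[i + j] for j in range(length)) * w
            for i in range(len(fft_magnitudes) - length + 1)]
```

Longer spreading lengths smooth the spectrum more, which raises its minimum and so allows more bits to be removed - at the price of the audibility reported for the longer-length samples.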



<files removed - obsolete>
Title: lossyWAV Development
Post by: halb27 on 2007-08-06 21:52:38
The ss12, ss16, ssx1 and ssx2 versions are easily ABXable.
Not quite so with ss8 - it took me a lot of concentration. I guess with 'normal' though concentrated listening it will go unnoticed.

But: what are the advantages over 2Bdecided's original approach? Are you attaining a significantly lower bitrate?
Title: lossyWAV Development
Post by: Nick.C on 2007-08-06 22:16:15
Thanks for the listening time!

The bitrate is coming down a fair amount. For the 41 samples in the set, all using triangular dither:

WAV=98.6MiB;
FLAC=56.9MiB;
2Bdecided's (fft_length=1024,64; codec_block_length=1024; spreading_function_length=4,4)=35.4MiB;
NIC .ss20 (fft_length=1024,64; codec_block_length=576; spreading_function_length=4,4)=34.0MiB;
NIC .ss30 (fft_length=1024,256,64; codec_block_length=576; spreading_function_length=4,4,4)=34.7MiB;

Revised script appended.

<files removed - obsolete>
Title: lossyWAV Development
Post by: Wombat on 2007-08-07 21:58:03
Realising that there are only so many parameters to be played with without destroying the audio quality of the output.......

I've been playing around with the spreading function - previously length=4 (i.e. [0.25,0.25,0.25,0.25]) - I've tried even numbers of length from 6 to 16 and am pleasantly surprised by the results. Atem_Lied attached for spreading function lengths of 8, 12 and 16 for your listening pleasure(?!).

[edit]
Following the processing of these samples (constant spreading_function_length with variable fft_length per analysis), I've started "playing about" with variable spreading_function_length with variable fft_length per analysis. There should be some processed results later tonight.
[/edit]

[edit2]
Right, samples attached - .ssx1.flac is 3 analyses (1024,256,64 fft_lengths) with corresponding spreading_function_lengths: 16,8,4; .ssx2.flac is 3 analyses (1024,256,64 fft_lengths) with corresponding spreading_function_lengths: 64,16,4;
[/edit2]



<files removed - obsolete>

Ok, today I was able to ABX all 3 versions: ss12, ssx1 and ss8.

What do you want tested now out of all these attached files above?
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 07:19:44
@ Nick.C:

I appreciate 2BDecided's and your work very much.
But if you go and produce an inflation of numerous variants, I guess we're heading into a problem.
On the one hand I'm afraid not a lot of members will love to do such listening tests on the 121st of your variants, but what's worse is: you may find a variant producing a good Atem-lied encoding and saving 15% against 2Bdecided's version. But what about general quality outside of Atem-lied?

IMO it would be best if you and 2BDecided work together even more closely in the sense that you go along a specific approach which you both think is most promising. And for this provide various listening samples for us to give quality feedback to you.
Though a saving of bitrate is very welcome, the more important target at the moment, IMO, is robust, excellent quality. Don't worry, but so far it seems to me that an approach closer to 2Bdecided's original one produces the more reliable results. But I think if you bring both your ideas together something great will come out. Maybe it's not so appropriate to produce something that makes the lossy FLAC encoding competitive with, say, WavPack lossy regarding bitrate - after all, we have WavPack lossy for that. But as FLAC is widely supported on music players, there is sense in having lossy FLAC files of extremely high quality at a significantly smaller size than the lossless ones.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-08 07:58:53
Apologies for "going through the permutations" on the various options available in the script. Simplistically, it comes down to:

2 or 3 analyses? (processing time implication, slight size increase on 3);
fixed or variable spreading_function_length? (smaller size on variable);
ELF on or off - still unproven.

So, the only ones that are likely to be "better" than .ss8.flac are .ss20, .ss21, .ss30 and .ss31, i.e. .ss(2 or 3 analyses)(0=fixed; 1=variable spreading_function_length).

From Halb27's and Wombat's comments earlier I would guess .ss20 or .ss31 are realistic candidates.

Having narrowed it down to two (or possibly one - .ss20, as it's the closest to the original concept), is it worth producing a set of selected samples for ABX? If so, which samples would you recommend of those previously mentioned in the main thread (or others...)?
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 08:32:17
I welcome it most if you can narrow it down to one, all the more so if this is closest to 2Bdecided's original version and if .ss31 doesn't give hope of significantly improving things over .ss20.

I proposeThese are specific problem samples where problems with these kind of codecs should be most obvious.
We should also have samples where 'normal' hiss is most prominent.
I know just a few, but we should have more samples.
Title: lossyWAV Development
Post by: shadowking on 2007-08-08 11:20:50
I agree with Halb27. I don't have time to test all these modes and I don't know what happens at lower bitrates. Wavpack usually sounds good from 230k, but others I tested don't - Shorten, RKAU (violent bursts of noise etc). On the metamorphose sample I heard a similar phenomenon with the preprocessor.

I am happy with the original 2Bdecided method. People will be very suspicious of the thought of lossy FLAC etc. If we can from the start produce a near-lossless reduction that is virtually not *abxable* under any condition and as good as lossless from a practical point of view, then that will be more acceptable than another threshold that won't always hold. Once someone with lots of time and effort finds some fault, people will start spreading bad rumours that we are destroying lossless compression etc etc.

On the other hand 512k is not small, but still much smaller than lossless. If one desires an extremely high quality that holds up to anything, then that will be a new 'lossless' to the masses @ 512k. Size won't be the issue, but imperfection will.

So wavpack , optimfrog, flac @ 512k end-to-all quality is better than 350k - 99% perfect quality when you package the lossy mode with FLAC name.
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 11:28:00
.. So wavpack , optimfrog, flac @ 512k end-to-all quality is better than 350k - 99% perfect quality when you package the lossy mode with FLAC name. ..

Perfectly said.
Never thought about quality demands being higher for lossy .flac files but I think this is absolutely true.
A lossy flac file should be indistinguishable from the original with a probability of 1 (within the limitations of getting sure of that in practice).
Title: lossyWAV Development
Post by: Nick.C on 2007-08-08 11:42:41
Which prompts me to consider introducing a *negative* noise_threshold_shift value (say -1 or -2) to the parameter setting (i.e. reduce bits to remove slightly).

Using 2 analyses, fixed length spreading function, NTS=-1, the sample set increases from 34.0MiB to 34.9MiB lossy flac (56.9MiB flac / 98.6MiB wav).

<files all now found - thanks to Halb27!>

Atem_Lied, Badvilbel, Bruhns, Furious, Keys & Triangle_2 attached - 2 analyses; fixed length spreading_function; NTS=-1.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-08-08 12:53:40
I think there will be room for two or three settings only...

1. Transcode and multi-gen proof (or overkill option for cautious people). Re-encode it 20 times at this setting and it'll still be alright. Transcode it to anything and it'll still sound (about) as good as encoding straight from the original.

2. Normal. Chances of ABXing original from lossyFLAC normal should tend to zero, but it probably won't stand up to 20 generations of re-encoding.

3. Compact. Allows you to introduce known compromises to get the filesize down if you want to, e.g. resampling to 32kHz.


I have tried to deliver number 2 on that list. If it fails ABX with anything, then some of the parameters will need to be tightened up. So far it hasn't, but let's see. I never dreamed that someone would be as inventive as Nick in using these parameters to reduce the bitrate - I intended to use them to tweak the code to improve quality (if necessary).

I think it's obvious how to deliver number 1 - shift the noise threshold (already implemented) down and put in some extra checks (e.g. extra FFT size - already implemented, M/S checking - not yet implemented). Some of the extra checks might end up in number 2 anyway if it's ABXed - we'll see.

I believe Nick is trying to deliver number 3 on that list. To be honest, with a flat noise floor, I don't think there's much that can be done to deliver this. The noise floor is already pretty much where I think it should be - at the same level (or, if it's shifted, related to the level) of the minimum noise floor in the recording. If the existing calculation is wrong, and it puts noise above or below the existing noise level, then this should be fixed and integrated into number 2. The only extra steps you can take are to ignore stuff above a fixed frequency (already implemented by myself), or to take account of the MAF (already implemented by Nick). Anything else, however clever, must by definition be pushing the noise above the noise floor of the original recording. It may be audible, it may not - you'd need a psychoacoustic model to decide. However, I've already seen people ABX tracks with the noise threshold 6dB up (i.e. 1 more bit removed) so it doesn't seem that there's much room for improvement. There could be some - it depends on the signal, how much you want to lower the bitrate by, and how hard you're willing to work to do it.

What can deliver number 3 (at least for most signals) is to use a shaped noise floor, as suggested by SebG on page one of the original thread...

http://www.hydrogenaudio.org/forums/index....st&p=498376 (http://www.hydrogenaudio.org/forums/index.php?showtopic=55522&view=findpost&p=498376)

This is basically what's described here...

http://telecom.vub.ac.be/Research/DSSP/Pub.../AES-2002-B.pdf (http://telecom.vub.ac.be/Research/DSSP/Publications/int_conf/AES-2002-B.pdf)
(there are other similar papers by the same authors)


I tried a cheats version by designing the minimum phase noise feedback filter directly from the desired magnitude response (quite easy, and already built into MATLAB sig proc toolbox, though I'd coded it myself before I found this!), but that doesn't take account of the constraints of gain (which should average to unity on a log scale, if I understand it correctly), and needing the first filter coefficient to be 1. If scaling the coefficients to make the first coefficient be 1 also happens to result in a reasonable gain, it works well. Normally this won't happen, and you'll add tens of dB of extra noise!

So to make it work, I (or someone!) will have to implement what's described in that paper. I haven't worked on LPCs before, but they seem to describe a short cut, and I'll give it a go when I get a chance.
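For readers trying to picture the structure under discussion, here is a bare-bones error-feedback quantiser in Python (my own illustration, not the paper's algorithm or the script's code; the function name is hypothetical). It shows why the feedback filter's leading coefficient is implicitly 1, and hence why a designed filter must be scaled so its first tap equals 1:

```python
import numpy as np

def noise_shaped_quantize(x, fb_coeffs):
    # Error-feedback quantiser sketch: past quantisation errors are
    # filtered by fb_coeffs and subtracted before rounding, which
    # shapes the spectrum of the added noise.  The quantiser path
    # itself contributes an implicit leading coefficient of 1.
    fb_coeffs = np.asarray(fb_coeffs, dtype=float)
    y = np.empty(len(x), dtype=float)
    err = np.zeros(len(fb_coeffs))  # err[0] holds the most recent error
    for n in range(len(x)):
        v = x[n] - np.dot(fb_coeffs, err)
        y[n] = np.round(v)          # the quantisation step
        err = np.roll(err, 1)
        err[0] = y[n] - v           # error fed back to later samples
    return y
```

With `fb_coeffs=[1.0]` this is classic first-order (highpass) noise shaping; with `fb_coeffs=[0.0]` it degenerates to plain rounding.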

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 15:43:52
IMO the efficiency option 3 can be considered separately.
As you mentioned anyone who is out for smaller file size can achieve it right now by resampling to 32 kHz in advance (that's what I do with wavPack lossy).
This kind of noise shaping sounds interesting, but it's a new building block and can be done later.
At the moment it makes things more complicated and thus keeps us further away from what is needed most: that a nice guy comes up and creates an exe program from your idea.
Maybe it would help if you could provide a more detailed description of it that can be understood by a programmer without very detailed DSP knowledge.
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 18:35:28
... Currently hunting for those furious & bruhns - can't find them - they seem to have been removed. ...

Here they are:
[attachment=3578:attachment] [attachment=3579:attachment]
Title: lossyWAV Development
Post by: Nick.C on 2007-08-08 18:44:13
Many thanks!
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 19:04:39
...Atem_Lied, Badvilbel, Keys & Triangle attached - 2 analyses; fixed length spreading_function; NTS=-1. ...

Atem_lied: 9/10 (pretty hard for me to abx)
badvilbel: could not abx
keys: 8/10 (easier to abx than shown by the score - didn't catch the problem with my first two guesses)

triangle: guess I wasn't specific enough with the triangle sample I was thinking of. Thought of this one:
[attachment=3580:attachment]
I don't have the original of your triangle version.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-08-08 19:58:27
IMO the efficiency option 3 can be considered separately.
As you mentioned anyone who is out for smaller file size can achieve it right now by resampling to 32 kHz in advance (that's what I do with wavPack lossy).
This kind of noise shaping sounds interesting, but it's a new building block and can be done later.
At the moment it makes things more complicated and thus keeps us further away from what is needed most: that a nice guy comes up and creates an exe program from your idea.
Maybe it would help if you could provide a more detailed description of it that can be understood by a programmer without very detailed DSP knowledge.


Last point first: I'd have thought that "a programmer without very detailed DSP knowledge" could work from the MATLAB code (and an FFT library) more easily than from a description. If there's anything confusing about the code, I'd be more than happy to help. I would stress that it's not optimised. It's there for people to find problem samples, and update it. However, I guess this will be much easier if it's an exe, so to solve the chicken and egg situation, an exe would be great!

The noise shaping will have to wait until someone has the time to do it anyway. It might end up in option 2 if it works well enough, or be switchable separately.

So yes, certainly, if anyone can take on the task of coding it properly, please go for it. Nick's code is clearer than mine, but I don't think the experimental quality-reducing options should be included, unless they work.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-08 20:14:31
............but I don't think the experimental quality reducing options should be included, unless they work.


Neither do I - I'm only now realising the importance of maintaining excellent quality in any processing to be implemented and subsequently encoded in the flac format - the last thing I would want to do is adversely skew "public" opinion against flac due to a poor lossy implementation.

I will post a clean version of the script without any extraneous experimental gubbins - in the hope that someone can turn it into a usable binary.
Title: lossyWAV Development
Post by: halb27 on 2007-08-08 21:10:06
... Bruhns, Furious, ....Triangle_2 ...

Bruhns: Did two sessions on two different spots that were suspicious to me and got 7/10 in each session.
            Very hard for me.
Triangle: Could not abx a difference..
Furious: Could not abx a difference.

So as far as my results on these samples go, the quality of your variant is very good to me, keeping in mind that these are hard problems for wavPack lossy and suspected not to be easy for this preprocessor either.
A good candidate for 2BDecided's option 3 when it comes to that.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-08 21:40:08
If you're up to some more listening, 2Bdecided originally added a third analysis as an "overkill" option. The other way to increase bitrate is to introduce a negative noise_threshold_shift. The attached samples were processed with 3 analyses, noise_threshold_shift=-2; triangular_dither; force_dither_lsb=1; fix_clipped automatically if necessary after bit reduction and rounding.


Revised script attached - no longer requires external amplitude function but still requires modified wavread/write functions.
Title: lossyWAV Development
Post by: Wombat on 2007-08-08 22:29:56
In case it is of any interest, I abxed the bruhns sample you offered in post 41.

foo_abx 1.3.1 report
foobar2000 v0.9.4.3
2007/08/08 22:57:53

File A: C:\Temp\nforce\temp\bruhns.ss.flac
File B: C:\Temp\nforce\temp\bruhns.wv

22:57:53 : Test started.
22:59:55 : 01/01  50.0%
23:00:37 : 02/02  25.0%
23:01:08 : 03/03  12.5%
23:02:02 : 04/04  6.3%
23:03:03 : 05/05  3.1%
23:03:52 : 06/06  1.6%
23:05:06 : 06/07  6.3%
23:05:37 : 06/08  14.5%
23:06:27 : 07/09  9.0%
23:07:12 : 07/10  17.2%
23:07:49 : Test finished.

----------
Total: 7/10 (17.2%)

Not that well, but I wasn't able to tell anything wrong with the one offered in post 50.

After realizing the offered wavPack file is nearly the same size as the lossy FLAC versions, I'm getting doubts about this approach.
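As an aside for readers of these logs: the percentage foo_abx prints after each trial is the one-sided binomial probability of scoring at least that well by guessing. It can be reproduced with a few lines of Python (my sketch, not foo_abx's code):

```python
from math import comb

def abx_pvalue(correct, trials):
    # Probability of getting at least `correct` of `trials` right by
    # pure guessing (p = 0.5): the upper tail of a binomial distribution.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
```

For example, 7 correct out of 10 gives 176/1024 ≈ 17.2%, matching the final line of the log above.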
Title: lossyWAV Development
Post by: halb27 on 2007-08-09 07:09:37
.. the bruhns sample ... After realizing the offered wavPack file is nearly the same size as the lossy FLAC versions, I'm getting doubts about this approach. ...

Classical music, as well as other music with a considerable amount of quiet spots, compresses relatively well losslessly, so with this kind of music we can't expect a big file size saving (which of course doesn't make it very attractive to lovers of these genres).
Popular music however compresses pretty badly when done losslessly, so there will be a big saving in file size. So far something like 500 kbps is realistic, and this means roughly half the file size of lossless encodings.

So I think this approach is not only intelligent, but also of real practical importance to many music lovers.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-09 08:08:53
Thanks Wombat - it's nice to know that with improved settings the problem samples seem to become less of a problem.



Are we approaching 2Bdecided's option 2 with these settings?
Title: lossyWAV Development
Post by: shadowking on 2007-08-09 09:00:54
A short test on a dozen or so classical samples:

Wavpack lossless -x: 16.25 MB - 722k vbr
Wavpack 550k -x :    11.45 MB - 509k abr

This is a significant saving IMO. Even on very quiet CDs there will be some 15% saving.
Title: lossyWAV Development
Post by: halb27 on 2007-08-09 18:44:26
If you're up to some more listening...

Atemlied: 7/10, extremely hard
badvilbel: could not abx the difference
bruhns: could not abx the difference
furious: could not abx the difference
keys: could not abx the difference
triangle: could not abx the difference

Very good quality.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-09 21:52:17
Thanks very much for the additional ABX'ing Halb27! Hopefully we're nearer the mark with the following settings:

Okay, 3 analyses, noise_threshold_shift=-3, triangular_dither, smart clipping reduction as before.

49 files: WAV=111MiB; FLAC=63.4MiB; LossyFLAC=42.0MiB.

So, a 1/3rd reduction over the original FLAC filesize - that can't be bad? This equates to approx 536kbps average for this (problematic samples) fileset.

Just processed "The Travelling Wilburys Collection" Disc 1: WAV: 431MiB; FLAC: 307MiB (1005kbps); LossyFLAC: 143MiB (468kbps), using the same settings as above.

[edit] I've noticed that the reference_threshold values calculated just prior to the calculation of the threshold_index values are *extremely* close to linear in two senses - in the bits sense and in the fft_length sense - so the whole set of results can be calculated (closely) as in the attached code:

This gives the same average bits to remove figures (to 0.001 accuracy) for a file for dither_choice=1 or 2 and within 0.006 bits average for dither_choice=0.

The variables_filename is no longer dependent on noise_threshold_shift - that's done later, so less calculating of constants.....
[/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-08-10 22:26:49
Tried to abx the atemlied version from your last post but no chance.
Very good.
The bitrate achieved with your sample album is also very promising.
Title: lossyWAV Development
Post by: Wombat on 2007-08-11 00:25:03
After all, this is just what us noobs can abx with these samples...
Title: lossyWAV Development
Post by: Nick.C on 2007-08-11 08:29:15
Thanks guys - now, as has been said before, we need an executable version to distribute for further testing....

Most importantly, thanks to 2Bdecided for instigating this and providing the original application of the method in script form - the only thing I've added is the conditional fix_clipped method - all the other possible settings were there.....

However, a Foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.
Title: lossyWAV Development
Post by: halb27 on 2007-08-11 10:27:25
... Most importantly, thanks to 2Bdecided for instigating this and providing the original application of the method in script form - the only thing I've added is the conditional fix_clipped method - all the other possible settings were there .....

Wonderful, because I think the more variations we have, the bigger the risk of not getting extremely good quality, especially at this early stage.
And as Wombat pointed out, the quality verification status at present isn't an extremely good one, though it probably reflects what can be expected at the moment.
I personally don't care too much about it because, different from the highly efficient codecs, IMO not too much can go wrong with this approach as far as I understand it, especially as we are in the 450+(+) kbps range.

However, a Foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.

At the moment I think it's more important to have a standalone exe. For integrating into foobar we can have a simple .bat file that combines the preprocessing with the flac (or whatever) encoding. I painlessly use a bat file that resamples to 32 kHz using ssrc_hp and encodes the result to wavPack.

Well, I've looked a bit into the script in order to find out whether I should try to produce an exe (at the moment I'm too busy, but maybe that's different in a few weeks).
At first view it's not unrealistic, because it's not a very large script and a big part of it, I think, is not too hard to write in other languages. Anyway there seems to be a lot of stuff that's pretty MATLAB specific (the non-scalar operations) and wouldn't be easy to understand.
Moreover, questions (like rounding) may be vital in the context of MATLAB's internal representation and the properties of numerical data in MATLAB.
Moreover, I dislike doing something just formally and blindly, not knowing what I'm really doing. It's not necessary (though welcome) to know the exact DSP background, but a more logical and less technical procedure in bringing this code to another language would be welcome.
Looking at your last script, it seems well documented though not easy to understand. I can't see directly, for instance, which operations are done on the entire audio data of the wav file and which are done on a block basis. Maybe it's because everything is done in one large sequence of statements corresponding to a wav file. It would be easier to understand if, instead of this large sequence of rather atomic statements (though well documented), we had a rather short sequence of high level statements (aka procedure calls) of the kind
            a) do (a procedural call) logical operation aaa on the entire audio track
            b) do logical operation bbb on the entire audio track
            ...................................
            c) loop through the blocks of n samples:
                c1) do logical operation aaa on the block
                c2) do logical operation bbb on the block
            ....................................
and keep any initialization operations (configurational settings as well) as much as possible inside the corresponding operation itself.

Talking about a logical operation I mean (in contrast to an internal technical operation) an operation that addresses a logical detail of the encoding preprocessing method as such, and not an operation that is computationally necessary in a technical sense.

Sure these things are all a matter of taste. I just write about what I feel if I were to transcode it into another language.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-11 16:16:14
Which language would you re-code the script into (assuming you got the time to do it)? I have used Turbo Pascal  / x86 Assembler very successfully in the past and more recently have hacked about with Visual Basic (inside Excel, for work related engineering calculations). If it is a language to which I can get access then I will more than happily contribute to the coding exercise.

I will try to "compartment"  and / or sub-function the code and add comments which make it more clear which element does what. At this time, it may be useful to remove the portion which loops through and processes a number of files - each file would be the subject of a single call to the executable.

Also, it may be sensible to reduce the possible settings to something like -1, -2 or -3 (per 2Bdecided's quality level statement previously), with the settings corresponding to the most recent processed sample set which has ABX'd *very* well being those for "-2". With that in mind, I would suggest that only triangular dithering be used and also that force_dither_LSB=1, i.e. always dither, even if no bits removed.

I will also try some multi-generational processing - to try and determine which settings might be appropriate for the "-1" setting.

The settings for "-3" might be more difficult - we know that there *will* be noticeable artifacts in *some* samples, but without a side-by-side ABX will they be particularly noticeable - these settings will be the subject of quite a lot of discussion, I think.
Title: lossyWAV Development
Post by: halb27 on 2007-08-12 09:05:10
Which language ...

Apart from time (at the moment), that's a problem for me which keeps me from saying enthusiastically 'I'll do it'.
I'm skilled in VBA and VB programming, but I definitely won't do it this way. Good for small to medium sized applications within my company, but not for this purpose - especially as it wouldn't result in a standalone exe file.
The next language I'm most used to in recent years is Euphoria, but as this is so special, and the code should be shared, this is also not the way to go.
Next comes Pascal aka Delphi, so this is the most probable language I'd use.
Best for shared code would be C, and because my Delphi experience has aged a bit, and as I did code in C a long time ago, I will consider this too. But I am aware I have to listen (not entirely, but also) to my own emotions, and I definitely prefer Pascal coding over C coding.
So I guess I'd do it in Delphi. Delphi performance is good, so this shouldn't be a problem, especially as there is the possibility to use Assembler, which should be restricted to minor parts of the code of course, if used at all.

... I have used Turbo Pascal  / x86 Assembler very successfully in the past and more recently have hacked about with Visual Basic (inside Excel, for work related engineering calculations). If it is a language to which I can get access then I will more than happily contribute to the coding exercise. ...

Wonderful. So let's go Pascal/Delphi.
I wonder a bit about why you transcoded the code to this MATLAB clone instead of directly going Pascal. You obviously have a deep understanding of the code involved.
If it's just about the understanding of reading/writing a wav file which is a black box in the MATLAB script I can help you out. I've done it in my wavPack quality checker but it's pretty simple anyway at least when restricting to the basic wav structure used on Windows based pcs (going more general can be done later).

I will try to "compartment"  and / or sub-function the code and add comments which make it more clear which element does what. At this time, it may be useful to remove the portion which loops through and processes a number of files - each file would be the subject of a single call to the executable.

Wonderful. That should make the logics clearer and invite other programmers to take part in coding.

Also, it may be sensible to reduce the possible settings to something like -1, -2 or -3 (per 2Bdecided's quality level statement previously), with the settings corresponding to the most recent processed sample set which has ABX'd *very* well being those for "-2". With that in mind, I would suggest that only triangular dithering be used and also that force_dither_LSB=1, i.e. always dither, even if no bits removed.

Great. This will clear things up even more and make things easier to understand.
Most consistent would be a restriction to exactly what's used right now (while keeping in mind, and/or keeping track of in another place, what can be changed to arrive at other options/settings).
Title: lossyWAV Development
Post by: Nick.C on 2007-08-14 09:25:02
Okay, latest version of the script, more heavily commented.

I will be installing Turbo Delphi tonight and expect to have absolutely *nothing* useful for a few days as I work out simply how to set about creating a win32 command line executable.......

I have implemented the "single-command-line-option" principle and have further developed the use of pre-calculated constants for calculating reference_threshold values.

Using the 4 settings contained in the script (-1=VHQ (estimate), -2=ABX'ed good quality settings, -3=estimate at "reduced quality" settings and -0 = 2Bdecided's original settings):

WAV=111.9MiB; FLAC=63.4MiB; -0=39.9MiB; -1=48.3MiB; -2=42.0MiB and -3=38.9MiB.
1411kbps; 800kbps; 503kbps; 609kbps; 530kbps; 491kbps.
Title: lossyWAV Development
Post by: halb27 on 2007-08-14 13:30:34
Thanks for your work.
It's a good idea to have the more technical parameters bundled as details of quality options.
Makes things a lot clearer.

I've done a first more detailed look at the script.

If I see it correctly, the script is not self-contained for transcoding to Delphi with respect to the conv, fft, and hanning function (apart from wavread/write), which have to be coded from other sources and/or own understanding. The hanning function should be easy to implement if I have taken that correctly from a short google search.
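For reference, MATLAB's hanning(n) (unlike hann(n)) returns the symmetric Hann window *without* the zero endpoints, so it is indeed easy to port. If I've read the convention correctly, a direct Python equivalent is:

```python
import math

def hanning(n):
    # MATLAB-style hanning(n): symmetric Hann window omitting the zero
    # endpoints, w(k) = 0.5*(1 - cos(2*pi*k/(n+1))) for k = 1..n.
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / (n + 1)))
            for k in range(1, n + 1)]
```

The window is symmetric and strictly positive, e.g. hanning(1) is just [1.0].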

The script can be made easier if it would restrict to the case use_calculated_reference_threshold = 1 used with any compression_option except for option 4.
Though I'd like to know how to arrive at the reference_threshold by simulated noise it looks to me like this can be done in a special tool (MATLAB welcome) to arrive at the rt_b_b constants used with use_calculated_reference_threshold = 1.

Many MATLAB specifics become clear when asking Google, but what do the curly braces mean in, for instance,
spreading_function{analysis_number}=ones(spreading_function_length,1)/spreading_function_length; ?
The right hand side is clear, it's just a vector of the spreading weights.
So spreading_function must be this vector. But this vector does not depend on analysis_number, and even if it did: what's the meaning of the curly braces?

Moreover: What's
peaks_over=length(find(inaudio==peak_max));
In short it sounds like the number of samples with the peak_max value. But as inaudio is composed of the vectors of samples for the left and right channel: is peaks_over an array giving the number of peak samples for the left and the right channel separately, or is it a scalar counting the peak levels of both channels together? From usage it looks like it's a scalar.

Sorry for asking such stupid questions but I'm totally new to MATLAB code.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-14 14:24:38
1: the script is not self-contained for transcoding to Delphi with respect to the conv, fft, and hanning function (apart from wavread/write), which have to be coded from other sources and/or own understanding. The hanning function should be easy to implement if I have taken that correctly from a short google search.

Yes;

The script can be made easier if it would restrict to the case use_calculated_reference_threshold = 1 used with any compression_option except for option 4.

Absolutely - if those with clearer knowledge of the topic are happy with this shortcut;

Though I'd like to know how to arrive at the reference_threshold by simulated noise it looks to me like this can be done in a special tool (MATLAB welcome) to arrive at the rt_b_b constants used with use_calculated_reference_threshold = 1.

My only concern at the moment is that the calculated constants relate to specific low and high frequency limits, therefore high_frequency_bin / low_frequency_bin values. Scratch that, I have just started looking at 20Hz to Nyquist frequency and the constant *seems* to be very close to that calculated for 20Hz to 15848Hz (23/32*44100) on only 128 iterations........

Many MATLAB specifics become clear when asking Google, but what do the curly braces mean in for instance: spreading_function{analysis_number}=ones(spreading_function_length,1)/spreading_function_length; ?


The curly brackets allow you to refer to an array (which need not be of constant dimensions) from another array (or at least that's the way that I have rationalised it out), more like a pointer.
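To put a name on it: the curly braces index a MATLAB cell array, a container whose elements may be arrays of different sizes and types. A Python dict of NumPy arrays behaves similarly; the lengths below are illustrative, not the script's actual values:

```python
import numpy as np

# Python analogue of MATLAB cell-array indexing such as
# spreading_function{analysis_number} = ones(len,1)/len:
# each entry can hold a vector of a different length.
spreading_function = {}
for analysis_number, spreading_function_length in enumerate((2, 4, 8), start=1):
    spreading_function[analysis_number] = (
        np.ones(spreading_function_length) / spreading_function_length
    )
```

Each stored vector is a uniform set of spreading weights summing to 1, just as on the right-hand side of the MATLAB line.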

Moreover: What's peaks_over=length(find(inaudio==peak_max));

find(inaudio==peak_max) produces a list of indices of values which are equal to the peak_max value, looking at both channels (in the case of stereo). length gives the total number of instances, i.e. the length of the array.
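The same idiom in NumPy, with a hypothetical two-channel array, confirms that the result is a single scalar counting matches across both channels:

```python
import numpy as np

# NumPy equivalent of MATLAB's length(find(inaudio == peak_max)).
# Data below is made up for illustration; columns are left/right.
inaudio = np.array([[0.1, 1.0],
                    [1.0, -0.3],
                    [0.5, 1.0]])
peak_max = 1.0
# One scalar for the whole array, both channels together:
peaks_over = int(np.count_nonzero(inaudio == peak_max))
```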
Title: lossyWAV Development
Post by: halb27 on 2007-08-14 15:31:41
... My only concern at the moment is that the calculated constants relate to specific low and high frequency limits, therefore high_frequency_bin / low_frequency_bin values. Scratch that, I have just started looking at 20Hz to Nyquist frequency and the constant *seems* to be very close to that calculated for 20Hz to 15848Hz (23/32*44100) on only 128 iterations........
Thanks for your answer.
What about different sampling frequencies like 32 kHz?
Is the script taking full care of that (for instance concerning the constants which make up the reference_threshold) or are there some holes to be filled?
(Of course I ask because I'm a 32 kHz lover).
Title: lossyWAV Development
Post by: Nick.C on 2007-08-14 20:43:01
Thanks for your answer.
What about different sampling frequencies like 32 kHz?
Is the script taking full care of that (for instance concerning the constants which make up the reference_threshold) or are there some holes to be filled?
(Of course I ask because I'm a 32 kHz lover).


The high_frequency_limit will influence the high_frequency_bin, i.e. a 16kHz hfl gives hfb=32 (16000/32000*64) on a fft_length of 64. So, the calculated reference_threshold *should* work for all input sampling frequencies - I think.

I tried badvilbel at 32kHz using PPHS and it was nasty even before I processed it. However PPHS worked well at 29.4kHz (i.e.44.1kHz * 2/3). Not sure if my iPAQ plays 29.4kHz accurately.
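The bin arithmetic above can be captured in a small helper (the function name is mine, not the script's):

```python
def frequency_bin(frequency_hz, sample_rate_hz, fft_length):
    # Bin index of a frequency in an fft_length-point FFT: bins span
    # 0 Hz to the sample rate in fft_length steps, so the Nyquist
    # frequency falls at bin fft_length/2.
    return round(frequency_hz / sample_rate_hz * fft_length)
```

For example, a 16 kHz limit at a 32 kHz sample rate with fft_length 64 lands on bin 32, as in the post above.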
Title: lossyWAV Development
Post by: halb27 on 2007-08-14 21:26:16
...I tried badvilbel at 32kHz using PPHS and it was nasty even before I processed it. However PPHS worked well at 29.4kHz (i.e.44.1kHz * 2/3). Not sure if my iPAQ plays 29.4kHz accurately.

29.4 kHz is a bit too low for really good quality (32 kHz is on the edge for me).

But your bad 32 kHz quality seems to be a PPHS problem. I use ssrc_hp and I'm very happy with it (after having found out to use the --twopass option to avoid clipping).
You can get it from http://shibatch.sourceforge.net/ (http://shibatch.sourceforge.net/) if you like to try it.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-14 21:33:26
You can get it from http://shibatch.sourceforge.net/ (http://shibatch.sourceforge.net/) if you like to try it.


Thanks for the pointer - I'll install it and try it out.....

Back to something that you said earlier - you use ssrc to resample to 32kHz, using a batch file, if I remember correctly? Could you please post a copy of the relevant batch file as I'm interested in how it achieves the resampling / FLAC & tag operations.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-08-14 21:38:08
There are lots of places where the code is written to allow lots of tweaking. If such tweaking is not going to happen, it could be simplified.

The reference thresholds are one example. If fixed, with flat dither (as now) they can be calculated without all that simulation and are independent of low and high frequency limits. They depend on noise amplitude and fft size only.

Please don't ask for a formula - it's too late. (I have young kids and a job - 9:30pm is late!). I think it's already in the unfinished unworking noise shaping version - I'll have a look some time this week, if it helps.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-14 21:54:12
Thanks David - the apparently planar nature of the reference_threshold values for different fft_length values seems to be too good an opportunity to miss. I'm trying to determine constants for different dither amplitudes too.

I should be up and running with Delphi tomorrow - tonight was scratched because I received a 2nd hand RAID card (eBay ftw!) today, so I *had* to reconfigure my home server  .

Ditto with the kids and job  - addicted to playing with the script I guess...... Thanks again for the script to play with - it's been great fun trying out all the various dead-end methods of reducing even further - then discarding them in favour of what you already had in there.
Title: lossyWAV Development
Post by: halb27 on 2007-08-14 22:11:43
...Back to something that you said earlier - you use ssrc to resample to 32kHz, using a batch file, if I remember correctly? Could you please post a copy of the relevant batch file as I'm interested in how it achieves the resampling / FLAC & tag operations.

No problem, but it probably won't be much help to you as you want to take care of tagging. My bat file just joins ssrc_hp and wavPack:

C:\Programme\wavPack\ssrc_hp.exe --rate 32000 --twopass --dither 0 --bits 16 %2 tmp.wav
C:\Programme\wavPack\wavPack.exe %1 tmp.wav %3
del tmp.wav

%1: wavPack options
%2: input wav file (foobar's %s)
%3: output wavPack file (foobar's %d)

My personal tagging strategy is simple: I only use the title, artist and album tags, and they make up the filename of my lossless ape files.
This filename 'tagging' makes the encoding procedure with my bat file easy.
As a final step I use mp3tag to convert the filename 'tags' into real wavPack tags.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-15 23:41:59
Thanks for the information - I'll try to get my head around applying it to <optional SSRC>, lossyFLAC.exe, FLAC.exe (with tags) later.......

I've installed Turbo Delphi (36214 days of licence left!) and started with the basics - set parameters from the command line. I will post code when it is a little more advanced and also hunt for code to read / write .WAV files.

To make the process quicker and less memory hungry, I think that the variable fix_clipped method *may* have to bite the dust - we would have to read the (potentially *enormous*) .WAV file twice and we almost certainly couldn't read it all into RAM - again assuming unlimited filesize. So, the next step is to use 2Bdecided's 30/32 multiplier (for triangular_dither) to reduce the amplitude of the audio data block by block.
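A minimal sketch of that block-by-block 30/32 amplitude reduction in Python (illustrative only - the real code operates on WAV sample data in Delphi; the function name is mine):

```python
def scale_block(samples, num=30, den=32):
    # Reduce the amplitude of one codec block of integer PCM samples,
    # leaving headroom so that adding triangular dither cannot clip.
    # Rounding to nearest keeps the output as integer PCM.
    return [int(round(s * num / den)) for s in samples]

# full-scale 16-bit values now peak at 30719 / -30720
print(scale_block([32767, -32768, 1024]))
```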

Trying to write in Delphi / Pascal after Matlab is painful - I must stop writing in Matlab.........
Title: lossyWAV Development
Post by: halb27 on 2007-08-16 08:19:33
.. fix_clipped method ...

On that point, David Bryant's remark comes to mind: when preprocessing for wavPack, clipping should not be avoided, as wavPack benefits not only from a sequence of trailing zero bits but also from a sequence of leading 1 bits.
So I think it's a good idea to have a corresponding option on the command line.

Maybe it's good to think of these things in a purely logical way. This means having an optimize option for potentially various target formats, something like '-optimize <format-extension>', i.e. '-optimize wv' when it's up to wavPack. The optimization potential for various formats is restricted (maybe restricted to just not doing clipping prevention for wavPack) but it keeps open the possibility for anything that will come up.

As you are about to start coding right now, which I can't (and you're the expert anyway):
How can I help you with things that don't take me too much time at the moment? Shall I look for Delphi fft and conv implementations or corresponding Pascal code?
Title: lossyWAV Development
Post by: Nick.C on 2007-08-16 08:50:55
On that point, David Bryant's remark comes to mind: when preprocessing for wavPack, clipping should not be avoided, as wavPack benefits not only from a sequence of trailing zero bits but also from a sequence of leading 1 bits.
So I think it's a good idea to have a corresponding option on the command line.

Maybe it's good to think of these things in a purely logical way. This means having an optimize option for potentially various target formats, something like '-optimize <format-extension>', i.e. '-optimize wv' when it's up to wavPack. The optimization potential for various formats is restricted (maybe restricted to just not doing clipping prevention for wavPack) but it keeps open the possibility for anything that will come up.

As you are about to start coding right now, which I can't (and you're the expert anyway):
How can I help you with things that don't take me too much time at the moment? Shall I look for Delphi fft and conv implementations or corresponding Pascal code?
 

No problems with trying to make this WAV processor work with more than the initially targeted FLAC format - the more the merrier! Maybe "-f" for FLAC and "-w" for WavPack? I am a fan of simplistic command lines with single character switches (if possible - and this is not going to be *too* complex......).

I am just beginning to start coding - if you could find fft and conv implementations that would be excellent - I'll get going on the functional elements and introduce procedures / functions in great number to reduce the complexity of the main code.
Title: lossyWAV Development
Post by: halb27 on 2007-08-16 09:41:40
... Maybe "-f" for FLAC and "-w" for WavPack? I am a fan of simplistic command lines with single character switches (if possible - and this is not going to be *too* complex......).
...

Fine, however - for definiteness and greater clarity - what about -<format-extension> like -flac or -wv as the optimization option?

I'll go and find fft and conv implementations.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-16 10:06:58
Fine, however - for definiteness and greater clarity - what about -<format-extension> like -flac or -wv as the optimization option?
Yes, I see your point and agree - "-flac" and "-wv" it is! Thanks for volunteering to go hunting for code...... I'll start on the wavread / wavwrite implementations tonight.
Title: lossyWAV Development
Post by: halb27 on 2007-08-16 12:12:25
... I'll start on the wavread / wavwrite implementations tonight. ...

Oh.. I forgot that. I can transcode my Euphoria reading and writing of wav files to Delphi.
Can do it this weekend if you can wait for that.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-16 14:52:24
Oh.. I forgot that. I can transcode my Euphoria reading and writing of wav files to Delphi.
Can do it this weekend if you can wait for that.
Absolutely! I'll try to get the rest of the algorithm side of it as far advanced as possible while waiting for fft, conv, wavread and wavwrite.

Many thanks!

[edit] May have found viable FFT / CONVOL routines - TPMAT036 - certainly look promising, and free! Available at http://www.unilim.fr/pages_perso/jean.debo...math/tpmath.htm (http://www.unilim.fr/pages_perso/jean.debord/tpmath/tpmath.htm) [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-08-16 22:29:15
[edit] May have found viable FFT / CONVOL routines - TPMAT036 - certainly look promising, and free! Available at http://www.unilim.fr/pages_perso/jean.debo...math/tpmath.htm (http://www.unilim.fr/pages_perso/jean.debord/tpmath/tpmath.htm) [/edit]

Oh, you're real fast !!!
I looked up the documentation and it looks very promising as you said.
Title: lossyWAV Development
Post by: halb27 on 2007-08-19 14:27:00
Well, I've done some Delphi coding and created a wavIO unit. As for bits per sample, I think 8 bits per sample need not be supported. I reject such input files.
At the moment I also reject 24 bit per sample files. The structure of the unit is such that 24 bit is supported, but the actual reading and writing of it is not yet implemented. I will add this within the next days (now I'm gonna prepare dinner for friends).

As for the sampling frequency, I reject any sample frequency below 32 kHz and above 48 kHz. I am afraid the logical details of the preprocessing procedure will depend on the sampling frequency, as the number of samples taken into account corresponds to a certain time period. If everything is optimized for 44.1 kHz, things may work fine for 48 kHz because this is just ~10% off. With 32 kHz it's worse (~30% off).
Anyway I think we should be conscious about it and take care of everything we support.
In order to make things precise (what we support) I've restricted sample frequency to the range mentioned which probably is the most common range anyway.

[attachment=3625:attachment]
Title: lossyWAV Development
Post by: Nick.C on 2007-08-19 20:15:58
Thanks for the code - I'm back from a weekend away, so should be able to devote some time to the project this week.
Title: lossyWAV Development
Post by: halb27 on 2007-08-20 22:56:22
24 bit input files now supported in unit wavIO:

[attachment=3628:attachment]
Samples are read and written blockwise where a block corresponds to a FLAC/wavPack/TAK etc. block.
wavIO deals with sample blocks for channel 0 and 1 (in the case of stereo) of the kind:
sampleBlockCh0, sampleBlockCh1: array[0..blocksize] of LongInt;

Thus sample values are always 32 bit integers. With 16 (24) bit files the 16 (24) bits make up the 16 (24) MSBs and the remaining bits are set to 0.

(In my previous version the 16 bit samples were just 16 bit ints (judging from value range) in a 32 bit integer container, which was not a good idea as 16 bit and 24 bit input files would not have had a matching representation).
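The MSB-aligned 32 bit container described above can be illustrated in Python (the function name is mine, not wavIO's):

```python
def to_container(sample, bits_per_sample):
    # Place a 16- or 24-bit sample in the top bits of a 32-bit container,
    # zero-filling the low bits, so 16- and 24-bit input files share one
    # representation inside the processor.
    return sample << (32 - bits_per_sample)

# the same signal level expressed as 16-bit and as 24-bit lines up:
print(to_container(32767, 16) == to_container(32767 << 8, 24))
```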

Edited: Link is to new zip file. Sorry I forgot to remove testing statements in the previous version.
Title: lossyWAV Development
Post by: halb27 on 2007-08-21 17:36:45
I'm just playing around with the bibilolo sample from the recent 64 kbps listening test.
As it's a bandwidth testing sample I wanted to find out whether or not my 32 kHz resampling has an audible effect for me with this sample. However, what I found was much more of a concern: it's a very problematic sample for wavPack lossy, for instance at sec. 17.2-19.2.

So it may be worthwhile testing with this preprocessor. Nick.C, do you mind processing it?
AlexB showed me it's sample 3 from Gabriel's samples for a 48 kbps AAC test:

http://www.mp3-tech.org/tests/aac_48/samples/ (http://www.mp3-tech.org/tests/aac_48/samples/).
Title: lossyWAV Development
Post by: Nick.C on 2007-08-21 23:20:03
I'm just playing around with the bibilolo sample from the recent 64 kbps listening test.
As it's a bandwidth testing sample I wanted to find out whether or not my 32 kHz resampling has an audible effect for me with this sample. However, what I found was much more of a concern: it's a very problematic sample for wavPack lossy, for instance at sec. 17.2-19.2.

So it may be worthwhile testing with this preprocessor. Nick.C, do you mind processing it?
AlexB showed me it's sample 3 from Gabriel's samples for a 48 kbps AAC test:

http://www.mp3-tech.org/tests/aac_48/samples/ (http://www.mp3-tech.org/tests/aac_48/samples/).
Not a problem at all - attachment processed using -2 presets as agreed in the previous posts.
I'm having "fun" with Delphi - my head hurts after a few hours with it and it's late now. I will try to have a (very limited) version of lossyWAV.exe available later this week.
Title: lossyWAV Development
Post by: halb27 on 2007-08-21 23:35:03
Thank you.
The result is very good, no audible problem to me (though I will do it again more carefully tomorrow).
The preprocessor really shines. It knows when not to throw away a lot (negligible saving in bitrate with this sample).
wavPack lossy does it the other way around and uses a bitrate lower than the nominal one (rare with wavPack). This is a sample where some kind of quality control in wavPack lossy is badly missed.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-22 22:40:02
  Okay then - v0.0.1 of lossyWAV.exe.
It *will* crash occasionally. badvilbel always makes it crash for instance;
Settings are not yet fully implemented.
Quality checks are not yet implemented.
Only one fft length (1024) is used at present.
Posting just for those who want to play with it at this early stage.
syntax: lossyWAV <inputfilename> <outputfilename>
Have fun!


[edit 20070825] Too little, too early - sorry. File removed.[/edit]
Title: lossyWAV Development
Post by: Nick.C on 2007-08-23 10:31:46
Foobar2000 compatible batch file to use as an external encoder:

Code: [Select]
@echo off
set lossyWAV_path="c:\data_nic\_wav\lossyWAV.exe"
set flac_path="c:\program files\flac\flac.exe"
%lossyWAV_path% %1 "%~D1%~P1%~N1.ss.wav"
%flac_path% -8 -f -b 1024 -o"%~D1%~P1%~N2%~X2" "%~D1%~P1%~N1.ss.wav"
del "%~D1%~P1%~N1.ss.wav"
set lossyWAV_path=
set flac_path=
Remember (on multi-processor / multi-core processor PC's) to set affinity of Foobar2000 to only one processor - or it will crash when trying to process the second file on the convert list.

See attached image for settings in Foobar2000. Superseded.
Title: lossyWAV Development
Post by: collector on 2007-08-23 14:35:54
Code: [Select]
.WAV   59317484  Same Thing -org.wav      1411   org

.FLA   31141612  Same Thing -1.flac        741   org
.FLA   29462093  Same Thing -8.flac        701   org
.FLA   13009242  Same Thing -lf.flac       309   flossy
.FLA   20282435  Same Thing shi.flac       482   32 kHz samplerate

Just a quick test. For DOS/Win98 lovers: BatchEnc 1.51 from Speek works too 
I think it's a promising project. Thanks. Not abx'ed yet.
Noticed that not only are the high frequencies cut off at 32 kHz, but I'm also missing the deep bass in my test sample, which is "Same Thing" by Bonnie Raitt.
Title: lossyWAV Development
Post by: halb27 on 2007-08-23 15:23:15
For resampling I suggest to use ssrc_hp with the --twopass option (to avoid clipping). Haven't found any problem with it so far.

32 kHz is a standard sample rate and as such has its own merits, but maybe something like 35 kHz together with flossy is a very attractive choice.
35 kHz can be played back for instance by foobar, winAmp (and also my rockbox armed iRiver DAP). It may depend on your sound card however.
Title: lossyWAV Development
Post by: guruboolez on 2007-08-23 16:51:50
Thanks for this tool (and the screenshot!!!). I was curious to see how much space it would spare with some quiet classical music (lossless bitrate <400 kbps).
I gave it a try: as expected it didn't spare that much bitrate (7 kbps). But the bad thing is that it's easily ABXable, even at low playback volume (no replaygain and volume knob in a quiet position):

http://www.megaupload.com/?d=WAEP6D5F (http://www.megaupload.com/?d=WAEP6D5F)





EDIT:

I found even worse:
now the lossy encoding is ~50% bigger than lossless but awfully noisy?!

http://www.megaupload.com/?d=WL4G98P7 (http://www.megaupload.com/?d=WL4G98P7)
Title: lossyWAV Development
Post by: 2Bdecided on 2007-08-23 16:54:38
But guru, this isn't a complete port of lossyFLAC - it's a first stab. It's missing half the analysis and won't be anywhere near transparent!

And you're right - it's less useful for "your kind of music" which is usually (intelligently) encoded with little loss, which is exactly what's required.

If you have any test samples which you'd like encoded properly with the MATLAB version, just post them here.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-23 16:55:06
Disappointed (but not *really* surprised) to hear that - sorry if I raised false hopes / expectations.

I will try to implement the multi-length FFT analyses and also re-introduce the noise_threshold_shift tonight at the same time as reinstating the settings derived from Wombat and Halb27's ABXing earlier in the thread.

Additional comments as the build quality increases / becomes measurably closer to the Matlab script will be very gratefully received - observations are always useful.

I intend to carry out some side-by-side testing to allow codec-block-by-codec-block checking of the bits_to_remove for each analysis fft_length - to see if the Delphi version matches the Matlab version.

I also freely admit that v0.0.1 is a "quick win", i.e. the first build that actually uses the fft analysis and threshold_index values to determine number of bits to remove, and does not (hopefully) represent the quality of output of later versions.

Possibly I shouldn't have posted it publicly at such an early stage - I may need to resort to a more private alpha test scenario.
Title: lossyWAV Development
Post by: guruboolez on 2007-08-23 17:01:37
I'm sorry... I thought this tool included the full analysis.
If you have any test samples which you'd like encoded properly with the MATLAB version, just post them here.

Cheers,
David.

The problem is: I don't know what kind of sample may be interesting with this kind of processing. That's why I was waiting to experiment on my side with a wide library of samples.
Anyway, I recall that my gallery of 150 samples (http://gurusamples.free.fr/) is still online, and if something must go wrong with this kind of PCM processor this collection may help to find it.
Title: lossyWAV Development
Post by: collector on 2007-08-23 17:04:16
For resampling I suggest to use ssrc_hp with the --twopass option (to avoid clipping).

I already do, thanx to you. The aac 'problem' isn't a problem to me, it was merely a test. I only use flac, and mp3 for the wife's portable.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-08-23 17:21:03
I intend to carry out some side-by-side testing to allow codec-block-by-codec-block checking of the bits_to_remove for each analysis fft_length - to see if the Delphi version matches the Matlab version.


As long as you have dither switched off, you can compare the resulting .wav files. They'll be identical _if_ you use the same reference noise thresholds for both. In reality, since the reference thresholds are set by measuring a sample of noise, they probably won't be - don't let that surprise you or make you look for bugs that aren't there!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-08-23 18:55:44
...I intend to carry out some side-by-side testing to allow codec-block-by-codec-block checking of the bits_to_remove for each analysis fft_length - to see if the Delphi version matches the Matlab version. ...

I understand very well that you are proud of being so extraordinarily quick in creating this first Delphi version of lossyWAV, but I also thought it produces what you've arrived at with the MATLAB script.
An intermediate state isn't so useful, I think.
It's a very good idea to test parts of the Delphi version so that it arrives at exactly the same result as the MATLAB version (with the exception of possible errors found in the MATLAB version). After all, you've already arrived at a quality with the MATLAB version which seems to be very good and which should be preserved.
So I think it's worth waiting some more days (or weeks) and having a really productive version for public testing.

With what parts can I help getting on? Having started contributing a little bit, I want to go ahead right now. Cleaning up the photos from my summer holidays, which I'll otherwise be busy with for some weeks, can wait.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-25 18:37:33
@halb27: I would very much appreciate it if you could further develop the cliParameters unit to allow extra settings of the type "-'char'" followed by a numeric or text parameter depending on the char, e.g. -b 16 to force 16 bit output regardless of input bitdepth; -c 1024 to set codec_block_size to 1024 bytes; etc.

I would also expect to automatically derive the output filename from the input, i.e. outfile = name.lossy.wav; or possibly specify an output directory (-d pathname\ ?)

I know that the latter concept was to limit the user specifiable options to one, i.e. -1, -2 or -3, but at this stage it might be useful to allow the user to over-ride certain settings in the pursuit of settings less easy to ABX.........

p.s. please PM me an e-mail address and I will forward the latest project code.
p.s.2. the code is getting neater and more understandable - there are some no-parameter functions and procedures - horrible coding, but fast, and it makes the code clearer - as you already mentioned. I will be looking at multiple analyses tonight and coding a CONV routine - very simplistic with the [0.25,0.25,0.25,0.25] spreading_function.
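A CONV with the [0.25,0.25,0.25,0.25] spreading_function amounts to a 4-point moving average over the FFT bin magnitudes; a Python sketch of the idea (illustrative only, not the Delphi code):

```python
def spread(magnitudes, sf=(0.25, 0.25, 0.25, 0.25)):
    # 'Valid' convolution of bin magnitudes with a short spreading
    # function: each output bin is a weighted sum of len(sf) consecutive
    # input bins; with the [0.25 x 4] kernel that is just their average.
    n = len(sf)
    return [sum(magnitudes[i + j] * sf[j] for j in range(n))
            for i in range(len(magnitudes) - n + 1)]

print(spread([4.0, 8.0, 4.0, 8.0, 4.0]))
```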

Best regards,

Nick.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-27 22:00:43
@halb27 - ygem.
Title: lossyWAV Development
Post by: Wombat on 2007-08-27 22:09:40
Anyway, I recall that my gallery of 150 samples (http://gurusamples.free.fr/) is still online, and if something must go wrong with this kind of PCM processor this collection may help to find it.

Really looking forward to this input!
Title: lossyWAV Development
Post by: Nick.C on 2007-08-28 08:07:25
Guru, thanks for the pointer to the samples - links to E16, E36, S10 & S43 are broken.

I have processed the 146 (of 150) files I downloaded (using the Matlab script) and the size results are as follows:

WAV: 252.3MiB; FLAC: 122.2MiB; Matlab Script lossyFLAC:100.9MiB (current lossyWAV.exe: 93.3MiB)

Matlab script output attached.

The Delphi translation is coming on well - the exe will now carry out multiple analyses *without* overlapping FFT analysis. My task for tonight is to reinstate the overlapping FFT analyses, firstly within a 2^n block length and secondly either side of an arbitrary block length.

The crashing bug has been identified as being caused by digital silence and has been fixed.
Title: lossyWAV Development
Post by: guruboolez on 2007-08-28 11:56:24
The four links are corrected. Thanks for these notifications
Title: lossyWAV Development
Post by: Nick.C on 2007-08-30 11:52:43
Still implementing the script in Delphi - now capable of analysis overlapping the edges of the codec_block for "power-of-2" length codec_blocks. At the moment, the output is not exactly the same as the Matlab script, however the bits removed values have come down to a closer approximation of the script output, indeed over Guru's 150 samples the flossy files are almost exactly the same size in total, if not individually.

Input parameters are being developed by Halb27 as is the wavIO unit - very gratefully received, although not yet selectable by the user - current code only processes 3 analyses, 10, 8, 6 bit fft_lengths, noise_threshold_shift=-3.0, triangular_dither.

I hope to have lossyWAV alpha v0.0.2 available this weekend, which will only be released if it more closely matches the Matlab script output.

Trying to speed optimise the code as well, but that necessarily has to wait until I actually achieve matched output.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-08-30 16:44:42
I know it's in this thread somewhere, but which sample was ABXed with noise_threshold=0? What were the other parameters?

I'm not saying it shouldn't happen, just wondering where it did.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-08-31 19:21:42
I think that it may have been atem_lied - but I'm not sure.  It was in a spread of 2 or 3 analyses with or without my experimental perceived noise algorithm - also, I was (at that time) varying the length of the spreading_function.
Title: lossyWAV Development
Post by: halb27 on 2007-09-01 11:48:54
I guess it was when I abxed atemlied, but I don't know how what I listened to was related to internal quality settings.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-11 11:20:20
lossyWAV alpha v0.0.2 attached: - superseded by v0.1.0 - see later post.

Command line parameters for quality and codec_block_size over-ride implemented;

Overlapping fft analyses implemented;

Output not yet identical to 2Bdecided's script - examination of low level fft output information required.

Feedback extremely welcome - unless tied to a brick...... 

Nick.


[edit]
Crashing bug found and fixed (fingers crossed) - v0.0.2b attached. Superseded.
[/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-09-11 19:22:45
Hi Nick,

Thank you very much for this version which is already pretty good.

I tried 10 samples most of them known as problematic to wavPack, and out of these 10 I could only abx 2:
badvilbel has a tiny amount of added hiss which I could abx 8/10 at sec. 5.6-7.2. Not transparent, but a negligible issue IMO.
bibilolo is worse, and even at low volume there's a kind of distortion at sec. 17.1-19.2. No abxing necessary.

Atemlied is fine to me.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-12 09:49:12
I think that I've found it - my poor attempt at a CONV function seems to be the root of the problem. I will start to re-write it tonight.

Also, I think that I've finally determined the cause of the random crashing - a poorly written Magnitude (of a complex number) routine - fixed.

[edit] I've convinced myself that the fft routine that I am using is indeed accurate - so, the next problem *really* is the conv routine. ~sigh~. [/edit]
Title: lossyWAV Development
Post by: Nick.C on 2007-09-14 12:15:59
<minor fanfare> lossyWAV alpha v0.1.0 attached. Superseded.

Closely approximates matlab script by 2Bdecided.

CONV function fixed, window function fixed (was the *real* problem).

Have fun!

[edit] window function properly fixed (finally). Even more closely approximates matlab script. lossyWAV alpha v0.1.0b attached[/edit]

[edit2] v0.1.0b superseded. [/edit2]
Title: lossyWAV Development
Post by: halb27 on 2007-09-16 21:22:02
Thanks a lot, Nick. Great News.
I've just come home from visiting friends this weekend and I'm too tired for a listening test right now. Will do it tomorrow.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-16 22:36:22
Thanks a lot, Nick. Great News.
I've just come home from visiting friends this weekend and I'm too tired for a listening test right now. Will do it tomorrow.

Don't try *that* one - try *this* one.....  v0.1.1 - Superseded by v0.1.2

Added :
-w option to suppress warnings,
-q option to reduce screen output.
Title: lossyWAV Development
Post by: bryant on 2007-09-17 02:06:26
Hey guys, it looks like you're making good progress! 

I was playing around with alpha v0.1.1 and ran into a couple problems. The first was that I couldn't open files that were write protected. Not a big deal really, but I'm sure that's not how you intended it.

The second problem is that I found several files that caused a range error exception right near the end of the file and truncated the output (I suspect some error in handling the last block). I am attaching two of the failing files.

Cheers!

David
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 07:08:34
Hey guys, it looks like you're making good progress! 

I was playing around with alpha v0.1.1 and ran into a couple problems. The first was that I couldn't open files that were write protected. Not a big deal really, but I'm sure that's not how you intended it.

The second problem is that I found several files that caused a range error exception right near the end of the file and truncated the output (I suspect some error in handling the last block). I am attaching two of the failing files.

Cheers!

David

Many thanks - which quality setting were you using for the failed files (although it might not make a difference)?

The read-only file problem is much easier to fix. - Fixed for v0.1.2

Thanks again!

Nick.

[edit] I cannot seem to replicate the range exception failure - what CPU are you using? [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-09-17 08:42:54
I'll look into the wavIO unit tonight for why write-protected files aren't usable.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 09:13:45
I'll look into the wavIO unit tonight for why write-protected files aren't usable.

I've already done it - filemode:=0 before opening infile; filemode:=2 before opening outfile - I'll e-mail you the revised source.

I've changed from extended reals to double reals (which has speeded things up a bit) - v0.1.2 attached. - Superseded by v0.1.3

I've been playing with the spreading function length (again....). For badvilbel, quality 2:

SFL :  BTR
1    : 0.4119
2    : 1.2374
3    : 1.6652
4    : 1.9018 (default)
5    : 2.1192
6    : 2.3611
7    : 2.5811
8    : 2.7572

This may be a setting to be changed when defining permanent quality level settings. I may implement a switch to allow the spreading function length to be reduced only.

[edit] Minor bug - required enter to be pressed if an error occurred. [/edit]
Title: lossyWAV Development
Post by: user on 2007-09-17 10:02:23
btw., do you take care, that lossyFlac cannot be mistaken as *.flac ie. Lossless flac ?
(or exchange flac with any lossless format, which this algorithm is used on)

The difference between lossyFlac and Flac should be recognizable already by file extension, .flac vs. .lfla or whatever.

it is somehow bad wording also, if you call something:

lossy free lossless audio codec.

Either it is lossy or lossless.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 10:15:29
btw., do you take care, that lossyFlac cannot be mistaken as *.flac ie. Lossless flac ?
(or exchange flac with any lossless format, which this algorithm is used on)

The difference between lossyFlac and Flac should be recognizable already by file extension, .flac vs. .lfla or whatever.

it is somehow bad wording also, if you call something:

lossy free lossless audio codec.

Either it is lossy or lossless.


A FLAC file is a FLAC file - how the input WAV file has been processed is anybody's guess. At present, how do you know that a FLAC file has not been created from MP3?

Yes, the extension is changed from ".WAV" to ".lossy.WAV", but apart from that there is no change other than to the WAV data itself.

I am probably going to write a test program to determine if a WAV file has been processed with this method - however with variable codec_block_size this may take quite a while to determine with accuracy for a given WAV file.

For me, this processor is for someone (i.e. me, and others) to use on their own lossless files to create processed FLAC files. It is not intended for files to be distributed in this manner - but that would probably be illegal, wouldn't it.

In the same way as Wavpack has a lossless and lossy mode, this processor allows a user to decide for themself to use a processed WAV file to create a (smaller) FLAC file (than unprocessed, most often) for personal use.

Another option is to create a Wavpack style correction file to allow the processed data to be restored to fully lossless.
Title: lossyWAV Development
Post by: user on 2007-09-17 10:42:17
well, I knew, the wavpack lossy/hybrid example would be thrown in.

Difference:

if you create wavpack-lossy with file extension .wv, it could be mistaken for lossless wavPack (indeed, even I would prefer another extension, so that lossy .wv files and lossless .wv files are distinct),
but you do get clear information if you look at the properties of a wv file: it is written inside the wv file whether it was encoded as lossy-wv or lossless-wv.

From my very rough understanding, this new "lossyFlac-wav" method is similar to what wavpack-lossy and OptimFROG DualStream already do.

The problem I foresee (maybe for me, maybe also for you, if you use lossy-wav-to-flac) is that you have lossless flac files in your system, but also these lossy-wav-to-flac files.
There might come a time when you yourself cannot remember which was which.
A program to detect lossy-wav-to-flac files (and to discern them from true lossless) will be another pain, and from the current point of view unnecessary (work for end-users to separate lossy-wav-to-flac files from true-flac files).

From a user's point of view, I would simply use wavPack lossy or, better, hybrid, as obviously users of lossy-wav-to-flac want to circumvent the danger of typical lossy-codec artifacts/problems, and want to buy a somewhat higher bitrate than lossy but a lower bitrate than lossless.
IMO, this is already achieved by OptimFROG DS and wavPack hybrid/lossy. Maybe the new algorithm could be integrated into existing formats, making existing lossy wavPack/DualStream work more efficiently.

Or it could be implemented as an add-on to FLAC - a FLAC lossy/hybrid - but with a clear solution to the problem described: at the very least, clear information inside the file as to whether it is lossless or lossy (as in WavPack's .wv), or maybe, better, new extensions.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 10:51:41
The problem I foresee (maybe for me, maybe also for you, if you use lossy-wav-to-flac) is that you have lossless FLAC files in your system, but also these lossy-wav-to-flac files.


If the processed files are converted directly, without renaming (i.e. left as filename.lossy.wav) then the resultant FLAC file will be filename.lossy.FLAC - surely that is instantly recognisable?

Yet another option would be to add a metadata tag to the FLAC file indicating that the WAV file from which it was created had been processed.

Unfortunately, to maintain compatibility with hardware / software players the ".FLAC" extension cannot be changed.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-17 11:07:39
We had this discussion in the original thread...

http://www.hydrogenaudio.org/forums/index....showtopic=55522 (http://www.hydrogenaudio.org/forums/index.php?showtopic=55522)

In short: it's up to someone to create a FLAC decoder or tool which checks for the wasted_bits feature being used.

If "wasted_bits" changes block by block, then it's almost certainly a lossyFLAC file. If it's the same every block, then it's probably a strange source (very rare). If it's not used at all, then it's not a lossyFLAC file (or it's lossyFLAC, but the pre-processor decided not to remove any bits at all, or just maybe it removed very few and someone changed the FLAC block size so that they didn't ever all fall within one block).


More generally, as Nick has said, you could check for 0s in the LSBs of consecutive (blocks of) samples. There is an issue in not knowing the block size, but even in a generic file editor (set show 16-bit binary values) it's going to be pretty obvious.
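That generic LSB check could be sketched as a small script. This is a hypothetical helper, not part of lossyWAV or any released tool; it assumes 16-bit PCM and a guessed block size:

```python
import struct
import wave

def min_trailing_zero_bits(wav_path, block_samples=1024):
    """For each block, count the minimum number of zeroed LSBs across all
    16-bit samples. Counts that vary block-to-block suggest lossyWAV-style
    processing; a constant 0 suggests an untouched file."""
    counts = []
    with wave.open(wav_path, 'rb') as w:
        if w.getsampwidth() != 2:
            raise ValueError('this sketch handles 16-bit PCM only')
        while True:
            raw = w.readframes(block_samples)
            if not raw:
                break
            samples = struct.unpack('<%dh' % (len(raw) // 2), raw)
            tz = 16  # an all-zero block reports 16
            for s in samples:
                if s == 0:
                    continue
                b = 0
                while (s & 1) == 0 and b < tz:
                    s >>= 1
                    b += 1
                tz = min(tz, b)
                if tz == 0:
                    break  # no point scanning further in this block
            counts.append(tz)
    return counts
```

Since the true codec_block_size is unknown, one would try a few candidate block sizes and look for the one giving the most consistent non-zero pattern.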


There is simply no failsafe way to mark these files as lossy - no more than a decoded mp3, encoded to FLAC, is marked as lossy.

There is also no way to say "this is a different format" because the whole point is that it's not. No more than a .wav created from a decoded mp3 is a different format from a .wav from a CD.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-09-17 11:41:36
These things are always a matter of taste to some extent, and there will always be people who feel strongly that FLAC means lossless in an overall sense, so that to them using lossy FLAC is not an option.
The problem only starts when they think 'FLAC should mean lossless (in an overall sense) to everybody'. Surely nobody can dictate what something should mean to everybody - and worse, as was pointed out already, even without the idea of lossy FLAC preprocessing we never know the source quality of a FLAC file we receive from somewhere. But we're usually talking about FLAC files produced by ourselves, so this shouldn't be a problem, especially as .lossy.flac files are named as such.

The most neutral point of view towards lossy FLAC, IMO, is to see it as a welcome usage extension for lossless codecs like FLAC, TAK and WavPack. Nobody needs to use these codecs with a lossy preprocessor, but whoever does gets the benefit of roughly half the file size with popular music and no audible quality impact (this is the weakest point so far, due to limited listening experience).
The most important usage is with FLAC, as FLAC has rather good hardware support. This is the major advantage over, say, OptimFROG DualStream. WavPack lossy is interesting because of existing hardware support, and David Bryant has recently made big progress by implementing dynamic noise shaping that works very well, and he seems to be adopting the lossy FLAC idea as a quality control. So in a sense this is the best way to go. Anyway, people who can't play WavPack lossy on their hardware player now have a good alternative in lossy FLAC.
Title: lossyWAV Development
Post by: bryant on 2007-09-17 16:59:09
[edit] I cannot seem to replicate the range exception failure - what CPU are you using? [/edit]

Oh, that's surprising. I had guessed it would be easy (which is why I didn't provide more info).

Anyway, my CPU is a Pentium 4 Willamette 1700.

When the error occurs I get a popup with "The exception unknown software exception (0x0eedfade) occurred in the application at location 0x7c57e592". After I click through I see on the console:

Code: [Select]
Processing : 5.08MB; Exception ERangeError in module lossyWav.exe at 0000A1F6.
Range check error.


That's all with v0.1.2. With v0.1.1 the location displayed on the console is 0000A2CA and the location on the popup is the same.

Examining the resulting WAV files I see that Track68.lossy.wav is 36470 bytes short and Track104 is 22470 bytes short.

Maybe it has something to do with uninitialized data being interpreted as FP. I assume there must be some special processing with leftover data at the end of the file...

I verified that the open on read-only files is fixed. 

David
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 19:32:58
Hmmm.... No uninitialised data is processed, I think. I will try this evening to replicate the bug and get back to you. Better yet, I'll implement a -d option to switch on a debug mode - which will tell you the number of bits being removed block by block - so the block where it crashes will be easier to determine.

Assuming that you were using the default block size (1024 samples) then 2 channels by 2 bytes per sample by 1024 = 4096 bytes per block, therefore the first crash is 8.9(04) blocks from the end and the second is 5.4(86) blocks from the end. So, unless there are additional WAV chunks after the data then I don't really understand..... Unless, that is, the last data is in the buffer within the wavIO unit. If so, it may well be down to my maths.....
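As a sanity check, the block arithmetic above works out (assuming the default 1024-sample blocks, 2 channels, 16-bit):

```python
# 2 channels x 2 bytes/sample x 1024 samples per codec_block
bytes_per_block = 2 * 2 * 1024  # 4096 bytes

print(36470 / bytes_per_block)  # Track68: ~8.904 blocks from the end
print(22470 / bytes_per_block)  # Track104: ~5.486 blocks from the end
```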

Oh well, back to debugging 
Title: lossyWAV Development
Post by: halb27 on 2007-09-17 21:38:41
I've tried the new version (default quality) with the samples I used last time.
bibilolo is fine to me now, as are the other samples with the exception of Atemlied. I was able to abx Atemlied 8/10 though it was pretty hard.

I also tried to reproduce the problem with track68 and track104, but with my cpu (AMD mobile Athlon = low power Barton) I couldn't reproduce the problem.
Nick, are you using an AMD cpu?
I'll take lossyWav and the failing tracks with me to work tomorrow where I use an Intel cpu.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 21:57:30
I've tried the new version (default quality) with the samples I used last time.
bibilolo is fine to me now, as are the other samples with the exception of Atemlied. I was able to abx Atemlied 8/10 though it was pretty hard.

I also tried to reproduce the problem with track68 and track104, but with my cpu (AMD mobile Athlon = low power Barton) I couldn't reproduce the problem.
Nick, are you using an AMD cpu?
I'll take lossyWav and the failing tracks with me to work tomorrow where I use an Intel cpu.


I tried it on a Centrino Duo T2500 (laptop) and a Core2Duo E6600 (main PC) - no problem. The kids' PC has the same CPU as yours.

Atem_lied is proving to be a particularly problematic sample - I'll increase the noise_threshold_shift by one (when I get through this debugging frenzy) and recompile - will be posted as v0.1.3, probably tomorrow.
Title: lossyWAV Development
Post by: halb27 on 2007-09-17 22:05:40
@David Bryant:

Do you mind trying this tiny test program which just copies in.wav to in.lossy.wav?:

[attachment=3750:attachment]

This way we can see whether the error is in the wavIO unit or not.

usage: lossyWavTest track104.wav

and it should produce a track104.lossy.wav file which is bitidentical to track104.wav.
Title: lossyWAV Development
Post by: bryant on 2007-09-17 22:36:31
@David Bryant:

Do you mind trying this tiny test program which just copies in.wav to in.lossy.wav?:

[attachment=3750:attachment]

This way we can see whether the error is in the wavIO unit or not.

usage: lossyWavTest track104.wav

and it should produce a track104.lossy.wav file which is bitidentical to track104.wav.

Hmm. I'm at work now on another P4 (Prescott, 3 GHz) and I still get the problem with the alpha on the 2 failing files (at the same spot).

I couldn't get the lossywavtest program to work. It immediately aborts with Access violation at address 004080F4 in module 'LossyWavTest.exe'. Read of address 0096C000. Sorry...

I'll try on my wife's 64-bit AMD laptop when I get home.

BTW, my system here is XP and at home I'm using Win2K. Also, I forgot to mention I'm not using any other options.

David
Title: lossyWAV Development
Post by: Nick.C on 2007-09-17 22:42:31
Update: v0.1.3. - superseded by v0.1.4.

-d parameter added : debug output mode - should help to identify codec_block number if (when) a crash occurs - command line use;

noise_threshold_shift increased in magnitude for default quality from -3 to -4.

ps. I don't know exactly what I did to the compiler, but the .exe file is much smaller - works okay though
Title: lossyWAV Development
Post by: halb27 on 2007-09-17 22:47:51
...I couldn't get the lossywavtest program to work. It immediately aborts with Access violation at address 004080F4 in module 'LossyWavTest.exe'. Read of address 0096C000. Sorry...

Thanks for the test. So it looks like it's within the wavIO unit.
Title: lossyWAV Development
Post by: bryant on 2007-09-17 23:12:47

...I couldn't get the lossywavtest program to work. It immediately aborts with Access violation at address 004080F4 in module 'LossyWavTest.exe'. Read of address 0096C000. Sorry...

Thanks for the test. So it looks like it's within the wavIO unit.

I was actually confused. I thought that the lossywavtest program wasn't working at all because the error came so quickly, but I hadn't realized how quickly it would go without processing!

But I just tried it on another WAV file and it worked fine; the copy is the exact same size as the input. On the failing cases the output is truncated by the same amount as with the alpha.

Sorry for the confusion, although it seems like I was the only one confused! 
Title: lossyWAV Development
Post by: halb27 on 2007-09-18 08:43:04
Sorry for my duplicated post - I had connection problems to HA and tried to send it several times, expecting HA would recognise them as the same post.

I'm at work now and tried to reproduce the problem with my Intel Pentium 4 HT 2.8 GHz powered business pc.
LossyWav and LossyWavTest are working without any problem with track104 and track68.

I'll look into wavIO this evening but will have only restricted time to do so. Can do it carefully tomorrow.
Title: lossyWAV Development
Post by: Josef Pohm on 2007-09-18 09:06:10
Update: v0.1.3.

ps. I don't know exactly what I did to the compiler, but the .exe file is much smaller - works okay though


Hmmm... not really working here, unless I'm missing something. With version 0.1.3 my pc is complaining about rtl100.bpl not being present...

Apart from that, my sincere admiration for the impressive work goes to you, 2Bdecided and halb27. Also, Bryant's interest in an integration into WavPack sounds really promising. Thank you guys!
Title: lossyWAV Development
Post by: Nick.C on 2007-09-18 09:09:42
Update: v0.1.3.

ps. I don't know exactly what I did to the compiler, but the .exe file is much smaller - works okay though

Hmmm... not really working here, unless I'm missing something. With version 0.1.3 my pc is complaining about rtl100.bpl not being present...

Apart from that, my sincere admiration for the impressive work goes to you, 2Bdecided and halb27. Also, Bryant's interest in an integration into WavPack sounds really promising. Thank you guys!

Was v0.1.2 (or earlier) okay? I'll try to "undo" the changes which caused the .exe size shrink.

Okay, .exe shrink changes undone - plus experimental fft_result skewing: -s parameter.

The fft_result skewing reduces overall bits removed by weighting the results across the frequency spectrum.

v0.1.4 attached. Superseded by v0.1.5.
Title: lossyWAV Development
Post by: Josef Pohm on 2007-09-18 09:31:44
Was v0.1.2 (or earlier) okay? I'll try to "undo" the changes which caused the .exe size shrink.
Okay, .exe shrink changes undone - plus experimental fft_result skewing: -s parameter.
v0.1.4 attached.

Yes, v0.1.2 was fine for me and so is v0.1.4.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-18 09:40:39

Was v0.1.2 (or earlier) okay? I'll try to "undo" the changes which caused the .exe size shrink.
Okay, .exe shrink changes undone - plus experimental fft_result skewing: -s parameter.
v0.1.4 attached.

Yes, v0.1.2 was fine for me and so is v0.1.4.

Many apologies for breaking it - feedback on the skewing function will be used to formulate the permanent parameter settings for -1, -2 & -3 quality settings.

Best regards,

Nick.
Title: lossyWAV Development
Post by: halb27 on 2007-09-18 09:56:11
... Apart from that, my sincere admiration for the impressive work goes to you, 2Bdecided and halb27. Also, Bryant's interest in an integration into WavPack sounds really promising. Thank you guys! ....


2BDecided is the one with the great idea behind lossyWav.
NickC is the one who is driving the Delphi project.
I just did a little contribution to it.

Nick, when I tried lossyWav at work I did it from the commandline (before I always did it from within foobar) and I saw you mention the authors when running lossyWav.
You list me there in first place. That's very kind of you, but if you do want to mention me, please put me last.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-18 09:59:33
I've tried the new version (default quality) with the samples I used last time.
bibilolo is fine to me now, as are the other samples with the exception of Atemlied. I was able to abx Atemlied 8/10 though it was pretty hard.

What's the problem you're hearing with Atemlied?

We need to get this tied down, if you're willing to listen some more halb27?

I don't think just lowering the threshold is an answer. It will increase the bitrate for everything, which seems very wasteful.

It's possible that the problem with Atemlied stems from the quietest frequency bin being at a low-ish frequency, rather than a high-ish one. lossyFLAC currently treats all frequencies as equally important in this respect, which isn't sensible - just easy!

I suggested the frequency dependent threshold skewing to Nick, but only gave an example of what values to try. If we get this right, I hope we can crack Atemlied without bloating bitrates across the board. SebG suggested a way of setting it (by copying the skew from what OggVorbis does with white noise - see first page of original thread) - I think we should try that.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-18 10:00:02
... Apart from that, my sincere admiration for the impressive work goes to you, 2Bdecided and halb27. Also, Bryant's interest in an integration into WavPack sounds really promising. Thank you guys! ....

2BDecided is the one with the great idea behind lossyWav.
NickC is the one who is driving the Delphi project.
I just did a little contribution to it.

Nick, when I tried lossyWav at work I did it from the commandline (before I always did it from within foobar) and I saw you mention the authors when running lossyWav.
You list me there in first place. That's very kind of you, but if you do want to mention me, please put me last.

Okay - thanks! The skewing function may be a bit flaky - I'm testing a variant which may be more effective.

v0.1.5 attached. - Superseded by v0.1.6.

6dB skewing optional (-6dB at 0Hz, 0dB at Nyquist). Attached comparison of sample set "default" quality with and without skewing. Noise_threshold_shift at default quality changed back to -3dB.

Further comparison of Guru's sample set - Matlab, lossyWAV no skewing and lossyWAV skewing.

[edit] there's an error in the size information in the Guru comparison, Matlab didn't really output 122MB of FLAC files, that's unprocessed flac! [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-09-18 11:30:11
What's the problem you're hearing with Atemlied?

I'd call it a low midrange problem. It's not really a distortion, but a bit like one. It's very subtle, but it's there. From memory it's at ~ second 5, after the similar-sounding opening seconds, where it's most audible to me.
I'll give the exact spot tonight.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-18 11:45:42

What's the problem you're hearing with Atemlied?

I'd call it a low midrange problem. It's not really a distortion, but a bit like one. It's very subtle, but it's there. From memory it's at ~ second 5, after the similar-sounding opening seconds, where it's most audible to me.
I'll give the exact spot tonight.


Could you possibly try Atem_lied with the v0.1.5 "-s" parameter at default quality, please?

Thanks,

Nick.
Title: lossyWAV Development
Post by: halb27 on 2007-09-18 13:31:17
I will do it when I'm at home this evening.
Title: lossyWAV Development
Post by: bryant on 2007-09-18 16:11:52
I'm at work now and tried to reproduce the problem with my Intel Pentium 4 HT 2.8 GHz powered business pc.
LossyWav and LossyWavTest are working without any problem with track104 and track68.

Here's another data point. My wife's laptop (AMD Turion 64 Mobile, 1.8 GHz, XP) also fails with both v0.1.5 and lossywavtest. Now this is getting a little spooky. It fails on all 3 different computers I have tried, and nobody else has seen it! I wouldn't worry about it until at least one other person sees it. 
Title: lossyWAV Development
Post by: bryant on 2007-09-18 16:39:05
BTW, I just noticed that both of those files have extra RIFF information after the audio, and eliminating that (by unpacking them with -w) fixes the problem. Perhaps you guys used -w, or ran the files through FLAC before testing with them...?

Anyway, I'd say it was even less important to worry about now. Obviously you will want to handle this eventually, especially now that FLAC handles the RIFF stuff too.

Sorry for the false alarm... 
Title: lossyWAV Development
Post by: halb27 on 2007-09-18 17:36:05
Thanks for the info.
So I'll copy anything behind the data chunk to the output file.
Title: lossyWAV Development
Post by: halb27 on 2007-09-18 18:58:40
I tried Atem-lied using v0.1.5 -s.
I think it's better than v0.1.2, and I abxed it 7/10. It's very hard to abx (for me) as seen by the result.
The critical spot is at ~ sec. 3-5, but it's easier to hear when starting before the spot.

Sorry I don't have the time this evening to do more tests and to finish fixing the wavIO problem.
Have to go now.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-18 20:10:38
I tried Atem-lied using v0.1.5 -s.
I think it's better than v0.1.2, and I abxed it 7/10. It's very hard to abx (for me) as seen by the result.
The critical spot is at ~ sec. 3-5, but it's easier to hear when starting before the spot.

Sorry I don't have the time this evening to do more tests and to finish fixing the wavIO problem.
Have to go now.

Many thanks Horst - I think there's some merit in developing this latest idea of David's further. I'll have a think about it tonight and probably post v0.1.6 in the morning. In the latest version the debug mode shows not only the block but also the time - this may help to examine whether too many bits seem to be being removed at a certain spot, so those blocks can be examined in more detail. I'll have a look at Atem_lied at 3 to 5 seconds (assuming 1024 sample codec_block_size, default quality) and revert.

Bryant, it wasn't a false alarm as the files are valid WAV files - as Horst said, he'll work to pass through the additional chunks without problems. Thanks for the input!
Title: lossyWAV Development
Post by: Nick.C on 2007-09-19 08:57:41
v0.1.6 attached. - Superseded by v0.1.7

Current noise_threshold_shift values [-6.0206, -3.0103, -1.50515] for quality levels -1,-2 & -3 respectively.

Skewing function modified. Skewing amplitude values [12.0412,9.0309,6.0206] dB attenuation at 0Hz, no attenuation at High_Frequency_Bin (circa 16kHz) using a 1-cos shaping. When skewing is enabled, noise_threshold_shift is reduced to 25% of its value when skewing is disabled.

Debug mode tweaked.
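One plausible reading of the 1-cos skewing curve described above, sketched in Python. The exact formula is my assumption - the released code may differ:

```python
import math

def skew_db(freq_hz, hfb_hz=16000.0, amp_db=12.0412):
    """Attenuation applied to the FFT results: amp_db dB at 0 Hz,
    tapering to 0 dB at the high_frequency_bin via a raised cosine."""
    if freq_hz >= hfb_hz:
        return 0.0
    return -amp_db * 0.5 * (1.0 + math.cos(math.pi * freq_hz / hfb_hz))
```

With amp_db taken from the quality-level list (12.0412, 9.0309 or 6.0206), this gives full attenuation at DC and none at ~16kHz, which matches the description.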
Title: lossyWAV Development
Post by: halb27 on 2007-09-19 09:53:40
I'll try it this evening.
Title: lossyWAV Development
Post by: halb27 on 2007-09-19 19:25:22
Sorry, I abxed Atem-lied 9/10 using v0.1.6 -s.

BTW this does not mean this version is worse than the last one. I use my very open Alessandro MS-2 when my wife is not around; otherwise I use my ultimate ears super.fi 5 pro canal phones. They are both very good but sound very different. But even with the same headphones I'm sure my ability to hear such subtle problems differs from day to day.

Just for comparison, and to exclude the possibility that something may be slightly wrong with the Delphi implementation: is there a chance to get an analogous MATLAB-generated version of Atem-lied?
Title: lossyWAV Development
Post by: Nick.C on 2007-09-19 20:28:46
Sorry, I abxed Atem-lied 9/10 using v0.1.6 -s.

BTW this does not mean this version is worse than the last one. I use my very open Alessandro MS-2 when my wife is not around; otherwise I use my ultimate ears super.fi 5 pro canal phones. They are both very good but sound very different. But even with the same headphones I'm sure my ability to hear such subtle problems differs from day to day.

Just for comparison, and to exclude the possibility that something may be slightly wrong with the Delphi implementation: is there a chance to get an analogous MATLAB-generated version of Atem-lied?


The Matlab script has not been keeping pace with the Delphi in this instance. I will attempt to incorporate the skewing function into Matlab and post the result.

Currently trying to include a Mersenne Twister random number generator instead of the Delphi standard version - just to see if it makes a difference.

In terms of input / output file naming, the drive / directory name (if any) in the input filename will now be stripped and the output file will default to the current directory, unless the -o parameter is used to indicate an alternative output directory. I am thinking of changing the -o parameter to require the whole output filepath / filename to be specified.

I will post 0.1.6b soon which will not reduce the noise_threshold_shift at all when the skewing is switched on. However, this may result in bitrate bloat of the resultant FLAC file, as David suggested.

Or, are we trying to redesign the whole method around one problem sample? I don't know which way to go right now. Have you tried Atem_lied at quality -1? If so, is it better and if better can you ABX it?
Title: lossyWAV Development
Post by: halb27 on 2007-09-19 21:24:59
I also see it as an option to ignore Atem-lied some day especially as the problem is extremely subtle (to me).

At the moment, however, there may be a small chance that the problem is due to a Delphi implementation error, and we shouldn't pass up the chance to find it.
Is the method of the current Delphi version when not using skewing identical to that of the Matlab script? Then it makes sense to try the Matlab version.

I hope I have fixed the wavIO problem and will test it tomorrow (I'm too tired now). I'll also try quality -1 tomorrow.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-19 22:23:48
I also see it as an option to ignore Atem-lied some day especially as the problem is extremely subtle (to me).

At the moment, however, there may be a small chance that the problem is due to a Delphi implementation error, and we shouldn't pass up the chance to find it.
Is the method of the current Delphi version when not using skewing identical to that of the Matlab script? Then it makes sense to try the Matlab version.

I hope I have fixed the wavIO problem and will test it tomorrow (I'm too tired now). I'll also try quality -1 tomorrow.


I have implemented the debug mode to allow the bits_to_remove for each codec_block to be examined on a block-by-block basis, i.e. to check if the Matlab and Delphi output is the same. As the average bits to remove for the Matlab version seem to be higher than the Delphi version (see comparison txt files above) I feel that the Delphi version is *slightly* more conservative than the Matlab version - why, I don't quite know - but I sense another debug session tomorrow night.........

I agree that we shouldn't ignore the possibility that there's an error in implementation!

Attached is the bits_to_remove data, block-by-block, for atem_lied: no skewing, 3 analyses (10, 8, 6 bit), triangular dither, noise_threshold_shift=-3. As can be seen, they are mainly the same, with only a few differences.

I will go through the maths regarding the determination of the sub-blocks for analysis again tomorrow and see if the result improves.

One problem I am having is re-creating the noise analysis for creating the reference_threshold and threshold_index values - currently I am using the pre-processed constants to re-create the surface (fft bits x bits_to_remove), accurate to <0.2dB.
Title: lossyWAV Development
Post by: halb27 on 2007-09-20 07:58:12
Thanks for the tables.
Judging from that I don't think the Atem-lied problem is a problem of the Delphi implementation.
In the critical region ~ 6 bits are removed (with both the Delphi and the Matlab version), and I think this is really not appropriate in this situation.

So I think we have two choices:

1) try more variants, for instance averaging over 3 bins instead of 4.

2) if things don't essentially improve, accept that Atem-lied is a (very minor) problem sample that we only get fully transparent with the best quality setting (I'll try tonight whether the current best quality setting manages it).
In that case, however, more listening experience than just mine is most welcome (not only true in this case).
Title: lossyWAV Development
Post by: Nick.C on 2007-09-20 08:24:00
Thanks for the tables.
Judging from that I don't think the Atem-lied problem is a problem of the Delphi implementation.
In the critical region ~ 6 bits are removed (with both the Delphi and the Matlab version), and I think this is really not appropriate in this situation.

So I think we have two choices:

1) try more variants, for instance averaging over 3 bins instead of 4.

2) if things don't essentially improve, accept that Atem-lied is a (very minor) problem sample that we only get fully transparent with the best quality setting (I'll try tonight whether the current best quality setting manages it).
In that case, however, more listening experience than just mine is most welcome (not only true in this case).


Last night, I got two out of three of the noise analysis calculations to give output which agrees with the Matlab output - unfortunately they were the no-dither and rectangular-dither calculations. Triangular dither still gives about 1.5dB less than it should; it's as if the whole surface has been shifted down by that amount.

I will add a "-a" parameter to allow the spreading function length to be reduced from 4 to 3.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-20 12:02:40
If you want to tackle atem_lied (or any other sample) I propose a very simple procedure...

Use the default original code, and change the noise threshold only. ABX these various different versions, and find out where the problem is solved.

I suggest obscuring the actual noise threshold shift from the listener - so if you're going to pass files to halb27, randomise the order and re-name them A, B, C etc. You can losslessly re-encode to FLAC at different (random) block sizes to hide the real bitrate too, since block size impacts efficiency - more so when it doesn't match that used by the pre-processor, which should be left at default.

As a result of this ABXing, you know (by checking) how many bits can be removed before a problem appears.

Then you can mess around with any options you want, and look at the number of bits removed. If it's more than the known good figure, it probably doesn't solve the problem. You can do all this playing without constant ABX testing. When you have something that seems to work numerically, then ABX to make sure that the bits are actually being removed at the correct time in the file.
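The blinding step described above could be scripted along these lines. The letter-naming scheme and function name are purely illustrative, not an existing tool:

```python
import random
import shutil
import string
from pathlib import Path

def blind_copies(files, dest_dir, seed=None):
    """Copy candidate files to dest_dir under letter names (A.flac, B.flac, ...)
    in shuffled order, and return the key mapping letter -> original name.
    The tester works from dest_dir; the key is revealed only after ABXing."""
    rng = random.Random(seed)
    files = list(files)
    rng.shuffle(files)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    key = {}
    for letter, src in zip(string.ascii_uppercase, files):
        src = Path(src)
        shutil.copyfile(src, dest / (letter + src.suffix))
        key[letter] = src.name
    return key
```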

Hope this suggestion makes sense.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-20 12:17:52
David,

Thanks for the valued input - you make it sound so easy! (Well, it is - but we hadn't come up with it, and you came up with the original script......)

I will try this out and post, say 3 samples tonight.

My other main debug is to make *very* sure that I'm using the correct reference_threshold values for triangular dither - the fact that no dither and rectangular dither calculate correctly has me a bit worried about the triangular dither calculation in Delphi and Matlab until I can get them to match up.

It would be *really* nice if I could get the Delphi to spit out exactly what the Matlab script produces in terms of WAV output (no dither, as you said previously). My take on the sub-blocks for calculation and end overlap may not be exactly the same as yours, so I'll see what the differences are - I re-invented the wheel a bit on that element of the coding.

Could I be having problems with the Delphi Random Number Generator? I have tried (a little bit) to find another that I could just plug in to the delphi code, but I haven't found a suitable candidate (yet......).

[edit] It was the <censored> random number generator - at least indirectly. In a discussion in another thread it was intimated that one way to carry out triangular dither is to generate one random number per cycle, and subtract the previous random number from the new one. This was the basis upon which my triangular dither *was* coded. Unfortunately, this (in Delphi at least) does not give the same result as in the Matlab code. By switching to generating two random numbers per cycle and subtracting one from the other the problem of not matching the Matlab calculated values is solved!

Thinking it through, if there was a problem with the triangular dither in my noise_analysis code, there must also be in the bit-reducing code - therefore both have been "fixed". I will post v0.1.7 this evening. [/edit]

Nick.
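For illustration, the two TPDF dither variants under discussion can be sketched as follows. This is a generic Python illustration of the technique, not the actual Delphi code:

```python
import random

def tpdf_independent(n, seed=0):
    """Plain TPDF dither: two fresh uniform samples per output value,
    giving a spectrally flat (white) triangular noise floor."""
    rng = random.Random(seed)
    return [rng.random() - rng.random() for _ in range(n)]

def tpdf_reuse_previous(n, seed=0):
    """The 'one new random number per cycle' variant: new minus previous.
    Still triangular, but the differencing high-pass filters the noise,
    which conflicts with a threshold model that assumes a flat floor."""
    rng = random.Random(seed)
    prev = rng.random()
    out = []
    for _ in range(n):
        cur = rng.random()
        out.append(cur - prev)
        prev = cur
    return out
```

Adjacent samples of the second variant are negatively correlated (each uniform draw appears once with + and once with -), which is exactly the high-pass behaviour David points out in the next post.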
Title: lossyWAV Development
Post by: bryant on 2007-09-20 13:44:15
Something that struck me in looking at the spectrum of Atem_lied is that the lowest bins through the critical section are around 200 Hz, which I don't remember being that common. At those frequencies, averaging over 4 bins might pull in too much from higher frequencies, because each bin covers such a wide range down there. Perhaps averaging fewer bins at lower frequencies might help (a little like the Bark scale).
Title: lossyWAV Development
Post by: Nick.C on 2007-09-20 14:07:06
Nice find Bryant! Where might the crossover be say, from 3 to 4 bins or 2 to 3 to 4 bins, in frequency terms? It took me long enough with the latter (i.e. working) version of CONV 
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-20 15:02:23
[edit] It was the <censored> random number generator - at least indirectly. In a discussion in another thread it was intimated that one way to carry out triangular dither is to generate one random number per cycle, and subtract the previous random number from the new one.
Don't do that here, at least not yet. It gives you high pass filtered dither noise. There are clear advantages to that, but the threshold calculations assume a flat noise floor.

btw, if you were using that, and it was working properly, and atem_lied still sounds bad, then you/we have worse problems than we thought - since the noise was already a few dB lower at the critical frequencies for that sample.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-20 15:23:06
Don't do that here, at least not yet. It gives you high pass filtered dither noise. There are clear advantages to that, but the threshold calculations assume a flat noise floor.

btw, if you were using that, and it was working properly, and atem_lied still sounds bad, then you/we have worse problems than we thought - since the noise was already a few dB lower at the critical frequencies for that sample.

Cheers,
David.


Wouldn't it sort of cancel itself out? But..... I was using the pre-calculated constants from Matlab, i.e. the ones I've just now managed to duplicate, so the constants *were* correct and I was using the filtered triangular dither method, so maybe there is indeed a problem.

As Bryant pointed out, there may be merit in changing the CONV routine to average over fewer bins at low frequencies. Maybe institute a mid_frequency_bin for each analysis and average across fewer bins between lfb and mfb, then across more bins between mfb and hfb? As seen from one of my previous posts, reducing the number of bins being averaged reduced the bits_to_remove value. In the same way, could the number of bins averaged be *increased* above a certain threshold?
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-20 15:26:42
Something that struck me in looking at the spectrum of Atem_lied is that the lowest bins through the critical section are around 200 Hz, which I don't remember as that common. At those frequencies, averaging over 4 bins might get too much from higher frequencies because each bin covers such a wide range down there. Perhaps averaging fewer bins at lower frequencies might help (a little like the Bark scale).
Thank you David. Can you help me think this through...

You could do psychoacoustically sensible (or at least slightly more like psychoacoustically sensible) spreading - but that's not what I put the spreading there for. Simply, if you FFT something, some of the bins are going to get very little energy just out of coincidence. Move the window by a few samples, and it'll be different bins which get very little energy. Those minima are pretty much irrelevant. The spreading function is there to smooth them out, otherwise they'll be chosen as the noise floor and the bit rate reduction will be very low. (You could do more FFTs, greatly overlapped, and average in time to achieve a similar thing, but that would really slow things down).

You're right that, at low frequencies, this convolution might be smoothing over important dips. I tested this originally (around 1kHz, not 200Hz) and found it was OK. If you have an extreme dip of about 4 bins or less in width, then it does get partly filled with noise, but not enough to be audible (to me). The only issue was if it was narrow and short - with contrived signals, you can get something that's (just) audibly overlooked, so I included the optional 5ms FFT to catch that.
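To illustrate the point about coincidental minima (a toy Python sketch of the idea, not lossyWAV's actual CONV routine): averaging runs of adjacent bins keeps a single near-zero bin from being chosen as the noise floor.

```python
def spread(bins, length=4):
    """Smooth an FFT magnitude spectrum by averaging each run of
    `length` adjacent bins, so an isolated near-zero bin cannot be
    picked as the noise floor."""
    return [sum(bins[i:i + length]) / length
            for i in range(len(bins) - length + 1)]

# A single rogue trough in an otherwise uniform spectrum:
spectrum = [1.0] * 8
spectrum[4] = 0.001
smoothed = spread(spectrum)
```

Without spreading, the 0.001 bin would set the noise floor and almost no bits would be removed; after spreading, the minimum is around 0.75, which is the behaviour described above. The trade-off, also described above, is that a *real* narrow LF dip gets partly filled in the same way.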


However, it seems to me in my experiments with the noise shaping version, that SebG is exactly right (and even basic masking models from decades ago support this): you need a greater SNR at lower frequencies than higher frequencies.

So I don't know why atem_lied is failing - is it because a LF dip is smoothed too much, or is it because LFs simply require a higher SNR? Either a variable length conv, or a low frequency threshold skew, can solve this problem. The question is which is correct (and does it matter)?

Obviously I favour the LF skew because it's much easier to implement! But if it's just a "bodge" then it may leave another problem lurking elsewhere. A correct psychoacoustic-ish spreading function might make it more efficient as well as more careful.


If you really want to go mad, for both the standard version, and the noise shaping version (unreleased), you could replace the current model (which is basically "find the noise floor, keep the noise below it") with a psychoacoustic model from your favourite lossy encoder. I don't know how useful this would be, and I don't recommend it - but it'll be a nice project for someone when everything else is finished.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-09-20 20:06:27
I tried Atem-lied using v0.1.6 -1 -s and couldn't abx it.

I tried my other samples using default quality and -s and could not abx any of them (trumpet however may be a bit on the edge and I can imagine it can be abxed by someone with better hearing - but that's speculation).

-s seems to have a tendency to bring the number of removed bits down a bit.
I didn't really get what the skewing option does. Can you explain it please?

Now that I use LossyWav from the commandline I see the average number of bits removed. I was quite astonished to see for instance herding_calls having only 0.1150 bits removed on average. With such a sample I would have expected to see more bits removed.

It's not just with herding_calls. From that I think a few more bits can be removed on average. But of course in this case there should be some method to cover problems like Atem-lied.

I'd really like to see the behavior when averaging over 3 bins. Maybe it's possible to raise the noise threshold this way, compensating for the impact of averaging over 3 instead of 4 bins.

To me David Bryant's idea is plausible, and maybe a rough approximation is already valuable, like averaging over 4 bins in the frequency range above 1.5 kHz and 3 bins below that.
IMO it's worth trying.

Added:
I looked up earlier in this thread where I could not abx a MATLAB Atemlied version based on 3 analyses, noise_threshold_shift=-3, triangular_dither.
Where are we with the current Delphi version in terms of these parameters?
Title: lossyWAV Development
Post by: Nick.C on 2007-09-20 21:44:47
I didn't really get what the skewing option does. Can you explain it please?

I'd really like to see the behavior when averaging over 3 bins. Maybe it's possible to raise the noise threshold this way, compensating for the impact of averaging over 3 instead of 4 bins.

To me David Bryant's idea is plausible, and maybe a rough approximation is already valuable, like averaging over 4 bins in the frequency range above 1.5 kHz and 3 bins below that.
IMO it's worth trying.

Added:
I looked up earlier in this thread where I could not abx a MATLAB Atemlied version based on 3 analyses, noise_threshold_shift=-3, triangular_dither.
Where are we with the current Delphi version in terms of these parameters?
Skewing lowers the outputs in bins at the low end of the FFT by up to 6dB, with no reduction at the high frequency bin (16kHz); the curve has a (1-cos) shape.

New option -t (3 bin average) added.

Files attached - 2 Matlab, 2 lossyWAV. Same analyses, noise threshold shift and dither - 2 are 576 sample blocks and 2 are 1024 sample blocks. Removed - flawed processing.

Also attached - I was playing with the random number generator and tried rectangular, triangular, (triangular + triangular)/2 [Tr2] and Tr3 - results attached as Dither.txt. Tr2 and Tr3 seem to have a Gaussian shape to them - is this something which might be of use?

Also, looking at the frequency coverage of each bin at varying fft lengths I started something which may end up being the basis for variable bin number averaging - see Bins.txt

I'm tidying up v0.1.7 just now.
Title: lossyWAV Development
Post by: halb27 on 2007-09-20 22:09:57
A short intermediate result:

I abxed the w version 6/6 and ended up 9/10.
I abxed the x version 6/6 and ended up 7/10.

I don't think w is easier to abx for me. I'm just tired now (especially of listening to Atem-lied), and I think it's better to continue with the remaining versions tomorrow (may be tomorrow morning before going to work if I have sufficient time).

Nick, I've emailed you the corrected version of the wavIO unit. Should be ok now.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-20 22:39:14
A short intermediate result:

I abxed the w version 6/6 and ended up 9/10.
I abxed the x version 6/6 and ended up 7/10.

I don't think w is easier to abx for me. I'm just tired now (especially of listening to Atem-lied), and I think it's better to continue with the remaining versions tomorrow (may be tomorrow morning before going to work if I have sufficient time).

Nick, I've emailed you the corrected version of the wavIO unit. Should be ok now.

Thanks very much for your "ear-time" - it's much appreciated, as is the work put into the wavIO unit!

Attached is v0.1.7: - Superseded by v0.1.8

-t parameter added : sets spreading_function_length to 3 rather than 4.
-s parameter modified : skewing function amended, no longer changes noise_threshold_shift value.
-f parameter added : sets fft_overlap to 1/n * fft_length samples, i.e. 1/4 = 5 analyses in 2 fft_lengths, 1/8 = 9 analyses in 2 fft_lengths.
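A throwaway Python check of the -f arithmetic (the helper name is made up): with a hop of fft_length/n, a span of two fft_lengths holds n+1 window start positions, which gives the 5 and 9 analyses quoted above.

```python
def analyses_per_two_blocks(fft_length, overlap_denominator):
    """Count fft_length-sample windows, hopped by
    fft_length/overlap_denominator, whose start positions fit
    inside a span of 2*fft_length samples."""
    hop = fft_length // overlap_denominator
    span = 2 * fft_length
    # start positions 0, hop, 2*hop, ... with start + fft_length <= span
    return (span - fft_length) // hop + 1
```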

I will have a think about the mechanism whereby variable spreading_function_length can be applied in the CONV function, using a 3.5kHz transition. Is there any merit in thinking of the frequency range in octaves, i.e. spreading_function_length increases exponentially as frequency increases?
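One possible reading of the octave idea, sketched in Python (all names and constants here are my own placeholders, not lossyWAV's): the spreading length grows by one bin per octave above some base frequency, up to a cap.

```python
import math

def spreading_length(bin_index, fft_length=1024, sample_rate=44100,
                     base_freq=1000.0, base_length=2, max_length=4):
    """Hypothetical sketch: spreading_function_length grows by one
    for each octave above base_freq, capped at max_length."""
    freq = bin_index * sample_rate / fft_length
    if freq <= base_freq:
        return base_length
    octaves = int(math.log2(freq / base_freq))
    return min(base_length + octaves, max_length)
```

This keeps the smoothing narrow where each bin already covers a wide relative frequency range (the low end), which is roughly the Bark-scale-like behaviour Bryant suggested.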
Title: lossyWAV Development
Post by: halb27 on 2007-09-21 07:04:33
In short my result for atem_lied.lossy.y.flac: 2/2 -> 5/7 -> 6/10, so I couldn't abx it.

I do think it's better than the w and x versions, though I don't think it's transparent. Maybe it was not a good idea to test it this morning as I'm a bit pressed to go to work now.

I'll redo the test this evening together with the z version.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-21 12:33:58
Horst's updated wavIO unit incorporated into code;

Small error in the CONV routine (yet again!) fixed;

Skewing now follows a (sin-1) [sin(pi/2*min(hfb-lfb,max(0,this_bin-lfb))/(hfb-lfb))-1] shape rather than (1-cos), now 9dB amplitude rather than 6dB;

Small error in calculation of Average Bits Removed fixed;

Small error in individual fft_analysis result calculation fixed.

[edit] Superseded - v0.2.0 [/edit]
Title: lossyWAV Development
Post by: SebastianG on 2007-09-21 13:06:10
Perhaps we should split this thread somehow into one that contains bug reports and announcements of versions (which I'm not interested in) and a technical one where strategies and techniques are discussed.

Skewing now follows a (sin-1) shape rather than (1-cos), now 9dB rather than 6dB;

Forgive my ignorance but what exactly is "skewing"?
Is there a relation to noise shaping?
What's the current state strategy on selecting the 'wasted_bits' count / noise shaping filters (if any) ?

Cheers!
SG
Title: lossyWAV Development
Post by: Nick.C on 2007-09-21 13:19:30
Perhaps we should split this thread somehow into one that contains bug reports and announcements of versions (which I'm not interested in) and a technical one where strategies and techniques are discussed.

Skewing now follows a (sin-1) shape rather than (1-cos), now 9dB rather than 6dB;
Forgive my ignorance but what exactly is "skewing"?
Is there a relation to noise shaping?
What's the current state strategy on selecting the 'wasted_bits' count / noise shaping filters (if any) ?

Cheers!
SG
The original thread is still running in the FLAC forum - maybe technical discussion should move to that?

Skewing in this instance artificially lowers the fft bin values, in this case at the lower end of the fft results.

As applied in the code, at the low_frequency_bin the dB reduction is by the full amplitude of the selected reduction amount. At the high_frequency_bin there is no reduction at all. The shape of the dB reduction curve is a scaled (sin[value]-1) curve where value is 0 at the lfb (or lower) and pi/2 at the hfb (or higher). For a 32 sample fft_length, lfb=2, hfb=11:
Code: [Select]
Bin  Freq. sin-1  dB reduction
00     0  -1.000  -9.031
01  1378  -1.000  -9.031
02  2756  -1.000  -9.031
03  4134  -0.826  -7.463
04  5513  -0.658  -5.942
05  6891  -0.500  -4.515
06  8269  -0.357  -3.226
07  9647  -0.234  -2.113
08 11025  -0.134  -1.210
09 12403  -0.060  -0.545
10 13781  -0.015  -0.137
11 15159   0.000   0.000
12 16538   0.000   0.000
From the discussions previously, maybe the zero-point in this reduction should be the nearest bin to 3.5kHz, and maybe the amplitude of the skew should be more extreme.
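As a sanity check, the table values can be reproduced from the [sin(pi/2*min(hfb-lfb,max(0,this_bin-lfb))/(hfb-lfb))-1] expression quoted earlier - a quick Python sketch (9.031 dB corresponds to 1.5 bits, since 20*log10(2^1.5) ≈ 9.031):

```python
import math

lfb, hfb = 2, 11                           # bins for a 32-sample FFT at 44.1kHz
amplitude_db = 20 * math.log10(2 ** 1.5)   # ~9.031 dB, i.e. 1.5 bits

def skew_db(this_bin):
    """dB reduction applied to a bin by the skewing function."""
    frac = min(hfb - lfb, max(0, this_bin - lfb)) / (hfb - lfb)
    return amplitude_db * (math.sin(math.pi / 2 * frac) - 1.0)
```

Bins at or below lfb get the full -9.031 dB, bins at or above hfb get 0 dB, matching the table row by row.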

The bits_to_remove value for each codec block is the threshold_index corresponding to the dB of the lowest (CONV'd) bin in any of the analyses carried out on that codec block. The threshold index is determined by calculating the dithered bit reduction noise dB for each bit_to_remove for each fft length.
Title: lossyWAV Development
Post by: halb27 on 2007-09-21 16:54:59
I tried atem_lied.lossy.z.flac and couldn't abx it (5/10).
I also retried atem_lied.lossy.y.flac and didn't arrive at a better result than this morning.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-21 17:00:43
I tried atem_lied.lossy.z.flac and couldn't abx it (5/10).
I also retried atem_lied.lossy.y.flac and didn't arrive at a better result than this morning.


The problem I found in lossyWAV today will probably be what caused "w" and "x" to be so poor - "y" and "z" were Matlab versions, the only difference being the codec_block_size (x&y=1024, w&z=576). This leads me to think that the fft_overlap will help this sample - i.e. more analyses of the same length over the same data.

I will re-process and post "w" and "x" tonight - using all the same parameters as before, as well as the same with 9dB skewing (20Hz to 3.7kHz).
Title: lossyWAV Development
Post by: guruboolez on 2007-09-21 19:37:40
Does LossyWav remove some noise?
I ABXed (with pain) sample No.E12 (http://gurusamples.free.fr/samples/E12_MODERN_CHAMBER_L_piano_flute.wv) on the 01.00 - 01.50 range encoded with v0.1.8. Issue: there's more noise... on the reference file.

On the other side very low volume sample played at high level gain (sample No.V03 (http://gurusamples2.free.fr/samples/V03_CHORUS_Female_A.wv) for example) have more noise (obvious) after LossyWav processing (bitrate is higher too). I guess it's expected.


EDIT: ABX log for E12:
Code: [Select]
foo_abx v1.2 report
foobar2000 v0.8.3
2007/09/21 20:26:28

File A: file://C:\150 samples\E12_MODERN_CHAMBER_L_piano_flute.flac
File B: file://C:\150 samples\E12_MODERN_CHAMBER_L_piano_flute.lossy.flac

20:26:28 : Test started.
20:26:50 : 01/01  50.0%
20:26:58 : 01/02  75.0%
20:27:01 : 01/03  87.5%
20:27:04 : 02/04  68.8%
20:27:09 : 03/05  50.0%
20:27:16 : 04/06  34.4%
20:27:20 : 05/07  22.7%
20:27:43 : 06/08  14.5%
20:27:49 : 07/09  9.0%
20:28:07 : 07/10  17.2%
20:28:46 : 08/11  11.3%
20:28:59 : 09/12  7.3%
20:29:07 : 10/13  4.6%
20:29:12 : 11/14  2.9%
20:29:54 : 11/15  5.9%
20:30:38 : 12/16  3.8%
20:30:44 : Test finished.

 ----------
Total: 12/16 (3.8%)
Title: lossyWAV Development
Post by: Nick.C on 2007-09-21 20:36:36
Thanks for the input, Guru - even where bits_to_remove is zero, lossyWAV will still dither the samples, because there's an automatic anti-clipping amplitude reduction to 95.28% (30.49/32, i.e. 32 - 1 (triangular_dither amplitude) - 0.5 (normal rounding) - 0.01 (Nick.C's error margin)) as the file is processed, so dither is still required.
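For what it's worth, the arithmetic behind that 95.28% figure checks out (working in units where full scale = 32):

```python
full_scale = 32.0   # full scale in these working units
dither = 1.0        # triangular_dither amplitude
rounding = 0.5      # normal rounding
margin = 0.01       # Nick.C's error margin

# the automatic anti-clipping scale factor
scale = (full_scale - dither - rounding - margin) / full_scale
```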

Maybe a follow on batch file to detect which files have become bigger and annihilate them?

A bit surprised about E12 - and glad that your trained ears are not shuddering at the output from lossyWAV.

From my own processing testing I am getting ever closer to the Matlab output in terms of matching bits_to_remove on a block-by-block basis - the latest build has only 6 instances of a 1 bit difference between the processors for Atem_lied, over 524 blocks. Oddly enough they cancel each other out, as 3 are +1 and 3 are -1. I am going to use the same reference threshold surface on both processors to bottom that out, and then I can move on, confident that the output of lossyWAV is the same as that of Matlab.

When I processed your 150 samples with and without skewing there was only about 50kB difference over the whole sample set between skewed and non-skewed (i.e. not a lot of minimum bins between 20Hz and 3.7kHz.....)
Title: lossyWAV Development
Post by: Dynamic on 2007-09-21 21:38:14
This is an interesting thread that I keep getting back to between being busy and I wish you all well. This sounds promising. It's likely that predictor-based lossy coders (like Wavpack lossy, as I'm sure Bryant is thinking) could use the same sort of analysis for a safe VBR lossy mode with the additional advantage of setting the amount of permissible prediction error to match the noise-floor relevant to that instant regardless of the bit-depth and block length in use.

It's interesting to see how one must look for the true noise floor and ignore the small chance troughs in the power spectrum that tend to vanish if we shift the transform window slightly, as 2Bdecided pointed out, and it's great to see the problem-solving at work, such as Bryant's recognition of the unusually low frequency of the quietest frequency bin during the atem-lied problem moments.

It occurs to me that there might be ways to optimise the computation of multiple overlapping FFTs (or any roughly equivalent transform) to set the noise floor more accurately without rogue troughs. However, I can't get past the fact that one has to pre-multiply the samples in each analysis segment by a windowing function centred on that segment, which makes it difficult to efficiently re-use results from one analysis when calculating an overlapping FFT without compromising the smoothness of the windowing function. So I guess the averaging solution, while more temporally spread out and needing skewing adjustments to make it unABXable for atem-lied, is the most computationally-viable option.

There's probably little to be gained, but I presume it's rare that anything above, say, 18 kHz is the lowest-power bin in the power spectrum, and it's presumably pretty safe to ignore any bins above 18 kHz if doing so should happen to yield a higher noise floor. It should be safe enough given that people have such difficulty ABXing music lowpassed at 18 kHz. And of course LossyFLAC preprocessing wouldn't actually lowpass anything in this scenario; it would just be capable of ignoring the noise floor in any frequencies above 16, 17 or 18 kHz, for example, in its calculation of the noise floor for the whole block, while still passing all frequencies unaltered except for the bit-depth and hence the exact pattern of the noise, which one should not be able to ABX.

Anyway, loving your work, guys. I'm not averse to pre-scaling (and dithering) my audio with Album Gain, which saves many percent in lossless for anything over-loudly mastered, and I consider that an excellent source for encoding into lossy (which I tend to do with Album Gain pre-applied, or supplied via a --scale switch where convenient, in any case). I'd equally consider a safe lossy mode based on robust noise-floor calculations like this, and no other psychoacoustics, to be an excellent storage medium for sound reproduction, including heavy EQ, processing and the like - and of course a pretty-darned robust source for transcoding to conventional lossy formats, or indeed something like resampled-to-32 kHz wavpack lossy.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-21 22:14:40
Thanks Dynamic, the appreciation is appreciated!

Attached is v0.2.0 : - Superseded by v0.2.1.

Revised skewing function - skews below 3.7kHz (gleaned from ReplayGain technical data - equal loudness curves) by up to 9dB (0dB at 3.7kHz, -9dB at 20Hz);

Tidied up code, revised quality -1.

As said previously, when testing Guru's sample set there is only a difference of 50kB in 101.12MB between skewed and non-skewed (skewed bigger, as hoped). On my 50 sample set skewing increases the size of the FLAC'ed set by 203kB in 44.65MB.

I haven't created variants of Atem_lied to upload for ABX, but I'm confident now that if any are created they should be pretty good. Late now, will read bug reports tomorrow....
Title: lossyWAV Development
Post by: halb27 on 2007-09-21 23:03:55
I tried Atem-lied with v0.2.0 using -s, -3 -s -t, and -3 -t, and the results were all transparent to me.
From feeling I'd call -3 -t a tiny bit worse than -s.

Good work, Nick.

If I see it correctly, lowering the fft bin values as you do with skewing can be used for seamlessly adjusting the noise threshold.
So maybe something like quality -3, -s, additional lowering the fft bin values, and averaging over 3 bins instead of 4 below say 3.5 kHz may save more bits on average while keeping quality at the level attained by -2 -s.
Title: lossyWAV Development
Post by: bryant on 2007-09-22 07:33:38
You're right that, at low frequencies, this convolution might be smoothing over important dips. I tested this originally (around 1kHz, not 200Hz) and found it was OK. If you have an extreme dip of about 4 bins or less in width, then it does get partly filled with noise, but not enough to be audible (to me). The only issue was if it was narrow and short - with contrived signals, you can get something that's (just) audibly overlooked, so I included the optional 5ms FFT to catch that.

However, it seems to me in my experiments with the noise shaping version, that SebG is exactly right (and even basic masking models from decades ago support this): you need a greater SNR at lower frequencies than higher frequencies.

Hi David,

It's interesting that these low frequency bands would be an issue here. I'm sure that for conventional codecs it's a non-issue because those bands take so little data to encode accurately it's probably not worth worrying about. But here we're adding white noise, so no frequency gets a free ride...

I haven't gotten too far on my implementation yet, but I was thinking of doing the convolution in both the time and frequency domain, perhaps with a 3x3 kernel, and perhaps not uniformly weighted. This was not based on any experimentation but just because I find it more elegant, although now that I see how wide the bins are at low frequencies I like it even more. I also don't care for the idea of a filter that varies with frequency, but it might be necessary if nothing fixed will work well.

It looks like the LF skew is working well and since most material has a lot of LF energy I can see why it doesn't have a large effect on bitrate for most samples. However, once you start shaping the noise it's going to make a much bigger difference, so it's probably a good idea to get it accurate now.

One thing I really don't care for is the level shift always being on. I think some very low level samples (like Guru's) are only going to be transparent if unmodified. You can easily imagine the case where a sample has over 16 bits effective resolution (at some frequencies) due to noise shaping and this would be destroyed by just about any modification, especially dithering. I don't have a solution, but perhaps there would be a way to have the shift be level dependent? I realize that this would introduce dynamic compression and maybe a little harmonic distortion depending on how it was done, but I suspect both might be okay. Of course, one could argue that very low level samples should be played at very low levels, but I'm not sure everyone will buy this... 

Anyway, it's certainly looking very promising at this point. 

David
Title: lossyWAV Development
Post by: Nick.C on 2007-09-22 08:25:29
@Halb27 : I'm glad that Atem_Lied is better. It's interesting that -s works better than -3 -t; this demonstrates the value of more fft_lengths in the analysis process. From memory, -3 -t -s produces a bigger FLAC file than -s, so this would not be a more attractive option - quicker maybe, but not better.

@Bryant: My conditional clipping reduction in the Matlab script analysed all the blocks, determined all the bits_to_remove and at the same time noted the peak amplitude for each block. The clipping reduction value for the whole file was then calculated taking into account actual bits_to_remove and block peak value for each block and taking the lowest value.

This resulted in much less level reduction (a surprising number of files did not require amplitude reduction at all). However it requires two passes through the blocks - something I was initially unwilling to do, though it's probably not *that* time consuming for the analysis. I will take that on as my next modification to the code.
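A rough Python sketch of that two-pass idea (the function name, headroom model and constants are my own guesses, not the Matlab script's): pass 1 records each block's peak and bits_to_remove; pass 2 takes the smallest per-block scale needed to avoid clipping.

```python
def file_clipping_scale(blocks, full_scale=32767):
    """Given per-block (peak, bits_to_remove) pairs gathered in a
    first pass, return the lowest amplitude scale factor so that no
    block can clip after rounding to 2**bits steps plus dither."""
    scale = 1.0
    for peak, bits in blocks:
        if bits == 0 or peak == 0:
            continue  # an untouched or silent block cannot clip
        headroom = 2 ** (bits - 1) + 2 ** bits  # rounding + dither, worst case
        scale = min(scale, min(1.0, (full_scale - headroom) / peak))
    return scale
```

This captures why so many files need no reduction at all: a file only gets scaled when a block combines a near-full-scale peak with a nonzero bits_to_remove.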
Title: lossyWAV Development
Post by: bryant on 2007-09-22 14:01:43
@Bryant: My conditional clipping reduction in the Matlab script analysed all the blocks, determined all the bits_to_remove and at the same time noted the peak amplitude for each block. The clipping reduction value for the whole file was then calculated taking into account actual bits_to_remove and block peak value for each block and taking the lowest value.

This resulted in much less level reduction (a surprising number of files did not require amplitude reduction at all). However it requires two passes through the blocks - something I was initially unwilling to do, though it's probably not *that* time consuming for the analysis. I will take that on as my next modification to the code.

Well, if I understand you correctly that means that a single wild sample in a file would alter the way the whole file was processed. That's a little weird too, but it might be a reasonable compromise.

I didn't mean to imply with my original post that I thought this was a critical issue, but it is something that might make a lot of samples possible to ABX under the right circumstances. I'm kind of glad I don't need to deal with it for a WavPack version... 

BTW, hats off to you and halb27 for getting this going so quickly!

David
Title: lossyWAV Development
Post by: halb27 on 2007-09-22 18:38:40
Now that we've reached the point where we can use lossyWav for practical purposes (though a lot more listening experience is most welcome) I wonder which block size to use.

A blocksize of 576 samples is attractive to use thinking of FLAC performance. However 2Bdecided worried about blocksizes below 1024 samples for the lossyWav procedure.

With a blocksize of 1024 which blocksize should be used with FLAC? If I interpret the FLAC documentation correctly blocksize must be a multiple of 576. Is it wise to use a lossyWav blocksize of 1024 and a FLAC blocksize of 576?

EDIT:
I just tried, and FLAC is working with a blocksize of 1024.

Or should I use a lossyWav and FLAC blocksize of 1152?

Another question: I was quite happy using wavPack lossy with a sample rate of 32 kHz though 32 kHz is a bit too low. I'd like to use a sample frequency of 35 kHz which I can do with my DAP using FLAC.
Can I consider it a safe procedure to a) resample to 35 kHz b) apply lossyWav c) apply FLAC, that is: can I consider the current lossyWav procedure applicable to 35 kHz sampled tracks?
Title: lossyWAV Development
Post by: halb27 on 2007-09-22 21:16:24
I tried all my usual samples with v0.2.0 using -s and couldn't abx any difference to the original.

As before I have a suspicion that trumpet isn't totally fine (sec. 0.6 ... 2.6). However I am not the one who can abx it (my best approximation towards a difference was 5/7, and I ended up 5/10).
Title: lossyWAV Development
Post by: halb27 on 2007-09-23 08:52:42
... One thing I really don't care for is the level shift always being on. I think some very low level samples (like Guru's) are only going to be transparent if unmodified. ...

Maybe a 'worthwhile' consideration on a per-block basis may be some help.
In case only, say, 1 bit is removed in the block (or maybe 2 bits), the block remains untouched, and this can easily be restricted to blocks with an RMS below a certain threshold to address low-volume blocks.
In this case the machinery isn't worthwhile and has a tendency to give a bad SNR, be it only due to dithering.
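That rule could be sketched like this (names and the RMS threshold are arbitrary placeholders, just to make the proposal concrete):

```python
def block_worth_processing(bits_to_remove, block_rms, low_rms_threshold=512):
    """Leave a block untouched (no rounding, no dither) when the
    saving doesn't justify the added noise: 0-1 bits always, and
    2 bits when the block is low volume (low RMS)."""
    if bits_to_remove <= 1:
        return False
    if bits_to_remove == 2 and block_rms < low_rms_threshold:
        return False
    return True
```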
Title: lossyWAV Development
Post by: collector on 2007-09-23 10:45:25
With a blocksize of 1024 which blocksize should be used with FLAC? If I interpret the FLAC documentation correctly blocksize must be a multiple of 576. Is it wise to use a lossyWav blocksize of 1024 and a FLAC blocksize of 576?

Very interesting thread. FLAC itself uses 4096 by default, which isn't a multiple of 576 either.

Code: [Select]
  -b, --blocksize=#            Specify the blocksize in samples; the default is
                               1152 for -l 0, else 4096; must be one of 192,
                               576, 1152, 2304, 4608, 256, 512, 1024, 2048,
                               4096 (and 8192 or 16384 if the sample rate is
                               >48kHz) for Subset streams.
Title: lossyWAV Development
Post by: halb27 on 2007-09-23 23:17:10
I wanted to see the behavior of lossyWav together with FLAC for a set of 50 full tracks which is typical of the kind of music I usually love to listen to (pop music and singer/songwriter music).
All values given are according to what foobar says when looking at the properties of the 50 selected songs:

Original ape files (extra high mode): 703 kbps
flac (--best -e) files: 744 kbps
lossyWav (-s), followed by flac (--best -e -b 1024): 507 kbps
lossyWav (-s -c 576), followed by flac (--best -e -b 576): 503 kbps
ssrc_hp (--rate 35000 --twopass --dither 0 --bits 16), lossyWav (-s), followed by flac (--best -e -b 1024): 453 kbps.

So with this kind of music staying with a blocksize of 1024 is fine, and the average bitrate is roughly 500 kbps.
Pre-resampling to 35 kHz saves some filesize though it is a bit disappointing (data flow is ~20% lower, but for the file size it's only ~10%). Quality is fine judging from listening without abxing and taking samples from this 50 track set. However abxing problem samples is required which I haven't done so far.

Remarkable was Simon & Garfunkel's short and calm Bookend Theme: the 1024 sample-block lossy.flac version was 1668 KB in size, which is more than the original ape file size (1535 KB) and only slightly less than the lossless FLAC version (1699 KB).
Using debug mode I saw there wasn't any block with bits removed, so it was only the dithering which changed the file. I think samples like this are a good argument not to change a block at all when it's not worthwhile.
The question is of course: when is it worthwhile? I think when 0 or 1 bit is removed it is not, independent of volume as measured by RMS. I also think with low-volume blocks it's not worthwhile (and a bit dangerous) in case 2 bits should be removed.
This per-block consideration can also be extended to a whole-file consideration: if the average number of bits removed is below a threshold of, say, 1 bit, the lossy.wav output should be identical to the wav input.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-23 23:34:23
If I were to implement a two-pass version of lossyWAV, then where clipping_reduction=1 (i.e. file left at 100% amplitude) and bits_to_remove=0, there would be no dither, block output = block input, and compression of that block should be identical to the original.

I have been developing a variable spreading_function_length dependent on fft_length, i.e. larger fft_length = larger spreading_function_length, switched on using the -v parameter. The -t parameter is now obsolete.

Enabling variable spreading_function_length also reduces skewing_amplitude from 9dB to 6dB and changes noise_threshold_shift to -1.5.

v0.2.1 attached. Superseded.
Title: lossyWAV Development
Post by: bryant on 2007-09-23 23:54:12
v0.2.1 attached.

Hey Nick,

I've noticed that you have been deleting the previous version every time you upload a new one. I understand why you wouldn't want people to use obsolete versions, but halb27 just did a whole bunch of testing with a specific version (v0.2.0) and I wanted to download it for a reference and it was already gone!

Perhaps you could set up a place where previous versions are archived, or maybe just put a note indicating a version is obsolete without actually deleting the download link?

Either that, or I'll write a script to download each one as it appears before you can delete it! 

Thanks,
David
Title: lossyWAV Development
Post by: Nick.C on 2007-09-24 08:49:27

v0.2.1 attached.

Hey Nick,

I've noticed that you have been deleting the previous version every time you upload a new one. I understand why you wouldn't want people to use obsolete versions, but halb27 just did a whole bunch of testing with a specific version (v0.2.0) and I wanted to download it for a reference and it was already gone!

Perhaps you could set up a place where previous versions are archived, or maybe just put a note indicating a version is obsolete without actually deleting the download link?

Either that, or I'll write a script to download each one as it appears before you can delete it! 

Thanks,
David


Apologies - I'll upload v0.2.0 tonight and in future merely indicate obsolescence rather than remove the file.

Command line parameters are being re-written at the moment to allow more sensible naming of parameters and inclusion of "-nts" to force noise_threshold_shift to a specific value, among others.
Title: lossyWAV Development
Post by: halb27 on 2007-09-24 08:52:21
... If I were to implement a two-pass version of lossyWAV, then where clipping_reduction=1 (i.e. file left at 100% amplitude) and bits_to_remove=0, no dither would be applied, block output = block input, and compression of that block should be identical to the original. ...

I welcome very much such a two-pass version but keeping block and/or track output = input is independent of two-pass processing.
'bits_to_remove=0 then no dither' is logical, but why do you want to restrict it to the 'bits_to_remove=0' case?
It seems obvious to me that adding noise in the 'bits_to_remove=1' case has a poor advantage/disadvantage ratio, and this is especially true for low-volume spots where the S/N ratio is bad anyway.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-24 09:00:50
I welcome very much such a two-pass version but keeping block and/or track output = input is independent of two-pass processing.
'bits_to_remove=0 then no dither' is logical, but why do you want to restrict it to the 'bits_to_remove=0' case?
It seems obvious to me that adding noise in the 'bits_to_remove=1' case has a poor advantage/disadvantage ratio, and this is especially true for low-volume samples where the S/N ratio is bad anyway.


If the amplitude-reduced block is not dithered then there is a strong chance of unwanted noise - all blocks are automatically reduced in amplitude to prevent potential clipping based on minimum_bits_to_keep=5, so they are also automatically dithered.
Title: lossyWAV Development
Post by: halb27 on 2007-09-24 09:09:14
I see.

Maybe an approach like this can help:

- a priori, assume the track does not have to be reduced in amplitude, and use output block = input block wherever it's not worthwhile, or where it's dangerous, to apply the lossyWav mechanism.
- as soon as you find that amplitude reduction has to be done, restart the procedure for the whole track using amplitude reduction.

This would be advantageous especially for those tracks which at the moment seem to be the most critical ones for the lossyWav procedure: tracks with low-volume spots in them.
Title: lossyWAV Development
Post by: AiZ on 2007-09-24 09:25:14
Hello,

I have been developing a variable spreading_function_length dependent on fft_length, i.e. larger fft_length = larger spreading_function_length, switched on using the -v parameter. The -t parameter is now obsolete.


Sorry to bother you, but... Is it possible for a lazy man like me, who doesn't want to dive into Matlab and technical stuff, to have a LossyWav.txt in the archive that simply (to some extent) explains the parameters?
I know that this app is in its early stages and clearly developer- and golden-ears-guru-oriented, but think about the future documentation of this great tool! 

Have a nice day,


        AiZ
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-24 10:34:44
Thanks for the input Guru - where the bits_to_remove is zero, lossyWAV will still dither the samples because there's an automatic anti-clipping amplitude reduction to 95.28% (30.49/32, i.e. 32 -1 (triangular_dither amplitude) -0.5 (normal rounding) -0.01 (Nick.C's error margin)) as the file is processed, so dither is still required.
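The arithmetic quoted above can be checked directly; the "0.42dB" figure in the reply below is exactly the gain change implied by the 30.49/32 scale factor.

```python
import math

# Worked arithmetic for the anti-clipping amplitude reduction quoted above:
# 32 units minus the triangular-dither amplitude (1), normal rounding
# headroom (0.5) and Nick.C's error margin (0.01).
scale = (32 - 1 - 0.5 - 0.01) / 32        # = 30.49 / 32
print(round(scale * 100, 2))              # 95.28 (%)

gain_db = 20 * math.log10(scale)
print(round(gain_db, 2))                  # -0.42 (dB)
```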
Hang on a second. You shouldn't be changing the gain (even by 0.42dB) if you're getting people to ABX. This is especially important when there's virtually no other audible difference between the files.

With the gain change disabled, you shouldn't dither when no bits are removed. I know it's in my code as an option, but I don't think I enabled it even when I was changing the gain. In theory you should, but in practice I wouldn't.

Hope this helps.

EDIT: Now I've read the rest of the thread...

Remember the gain/declipping is only for efficiency, not sound quality. The "clipping" is only ever by 1LSB, and only happens when lossyFLAC has determined that several LSBs can be removed. In other words, the clipping will only be audible if the lossyFLAC algorithm itself is broken (and if it is, we can all go home anyway). Furthermore, the "clipping" will move the sample value closer to its original value (because it happens when lossyFLAC wants to increase it, but can't).

So I would not "leave the gain adjustment enabled unless there might be sound quality issues". I would "leave the gain adjustment disabled unless there are so many "clipped" samples that it reduces efficiency".

For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. I'm guessing lots of people wouldn't like this idea though.

Cheers,
David.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-24 10:55:02
Sorry to bother you, but... Is it possible for a lazy man like me, who doesn't want to dive into Matlab and technical stuff, to have a LossyWav.txt in the archive that simply (to some extent) explains the parameters?
I know that this app is in its early stages and clearly developer- and golden-ears-guru-oriented, but think about the future documentation of this great tool! 
AiZ,
If it was down to me, when it was finished, it wouldn't have enough command-line options to need a manual! Just compact/default/overkill modes.

The problem is figuring out what these should be, hence all the current playing around.

If the "final version" still needs all these tweaks, then IMO we've failed!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-24 11:06:51
As David said, it is his intention to limit user choice to the 3 stated quality levels. During the development phase of the command-line version, I have enabled an increasing number of command-line parameters to allow users to "tweak" settings in search of transparency.

I will attempt to more clearly illustrate what each one does in the command-line reference within lossyWAV.
Title: lossyWAV Development
Post by: gaekwad2 on 2007-09-24 11:49:43
I wanted to see the behavior of lossyWav together with FLAC for a set of 50 full tracks which is typical of the kind of music I usually love to listen to (pop music and singer/songwriter music).
All values given are according to what foobar says when looking at the properties of the 50 selected songs:

Original ape files (extra high mode): 703 kbps
flac (--best -e) files: 744 kbps
lossyWav (-s), followed by flac (--best -e -b 1024): 507 kbps
lossyWav (-s -c 576), followed by flac (--best -e -b 576): 503 kbps

Perhaps a bit late, but I also ran lossyWav 0.2.0 (using -3 instead of -s though) and FLAC 1.2.1 on a selection more or less representative of my CD collection (at least in terms of bitrate when compressed with TAK) and got pretty much the same result. Overall 576 produces slightly smaller files, but for classical or generally highly compressible music 1024 is better.
Title: lossyWAV Development
Post by: AiZ on 2007-09-24 13:44:11
Hi again,

If it was down to me, when it was finished, it wouldn't have enough command-line options to need a manual! Just compact/default/overkill modes.

The problem is figuring out what these should be, hence all the current playing around.

As David said, it is his intention to limit user choice to the 3 stated quality levels. During the development phase of the command-line version, I have enabled an increasing number of command-line parameters to allow users to "tweak" settings in search of transparency.

I will attempt to more clearly illustrate what each one does in the command-line reference within lossyWAV.


On the one hand, it's OK, I get the point. Sure, only one or two parameters are way better for the final release; no need for a manual.
But on the other hand, if someone who discovers the project today decides to help you, it would be nice for him not to have to search through this (long) topic for what all these changing parameters mean; hence, an up-to-date little doc accompanying the executable would be perfect.

I'll stop my off-topic posts here, and thank you for your dedication to better and cleverer audio.


        AiZ
Title: lossyWAV Development
Post by: halb27 on 2007-09-24 19:11:26
Atem-lied with v0.2.1 -v -s:

I got to 6/6, but only managed to finish at 6/10.
Generally speaking I have a tendency to do a bad job with the second half of my guesses.

Looking at the debug results I guess too many blocks have 6 bits removed again in the critical area, and I think that's too much.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-24 20:11:43
I found a bug in the -v parameter - it was picking the wrong spreading_function_length. I will post v0.2.2 tonight. For the moment, and by special request: v0.2.0 for Bryant. Definitely superseded now.....
Title: lossyWAV Development
Post by: halb27 on 2007-09-24 20:41:03
.. For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. ...

How do you do that exactly:
- ReplayGain using foobar as a 1 step procedure with encoding?
- 16 or 24 bit? dither or not dither?
- How do you make sure only negative replaygaining is applied? Manual control?

Especially the answer to the 16/24 bit dither/not dither question is relevant to me as the answer applies to resampling as well.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-24 20:49:39
Attached v0.2.2. Superseded.
Title: lossyWAV Development
Post by: halb27 on 2007-09-24 22:43:32
Hi Nick,
Thanks for your hard work.

I think it's good to have the parameters of the mechanism adjustable in this pre-release state.
However it's hard, as I at least have no idea what some of the parameters are really doing.

For getting the behavior of 0.2.0 -s: is a setting of -skew 9.0 sufficient in order to get exactly the same behavior?
How exactly is the spreading function varying when using -vsfl? From history I guess the variation is with fft length. But how exactly, and what should we expect from it?
Title: lossyWAV Development
Post by: bryant on 2007-09-25 03:48:58
I found a bug in the -v parameter - it was picking the wrong spreading_function_length. I will post v0.2.2 tonight. For the moment, and by special request : v0.2.0 for Bryant.

Thanks Nick, I got it.

And it looks like I wasn't the only one who wanted it! 

David
Title: lossyWAV Development
Post by: BGonz808 on 2007-09-25 03:56:57
Just out of curiosity, will bit reduction make ALAC compress any better? I just bought an iPod and the idea of lossy FLAC is great, but now I'm stuck without my trusty FLAC format if I want lossless music (and Rockbox doesn't really appeal to me that much).

Thanks,
Bobby
Title: lossyWAV Development
Post by: Nick.C on 2007-09-25 06:49:24
For getting the behavior of 0.2.0 -s: is a setting of -skew 9.0 sufficient in order to get exactly the same behavior?
How exactly is the spreading function varying when using -vsfl? From history I guess the variation is with fft length. But how exactly, and what should we expect from it?


A skew of 9.0 is the same as v0.2.0;

In previous versions, the spreading function length was the same for each fft_length, regardless of the length of the fft, i.e. 4 for 64 and 4 for 1024. Taking into account the bin frequency widths (see bins.txt), this seemed a bit unbalanced, so if -vsfl is selected the spreading function length vs fft_length is as follows: 2/64; 3/128; 4/256; 5/512; 6/1024. If it is felt that more than 4-bin averaging is excessive, this can be changed at the next revision.
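The -vsfl mapping described above (2/64 through 6/1024) amounts to one extra averaged bin per doubling of fft_length. A one-line sketch of that reading (the function name is illustrative, not the Delphi source):

```python
import math

# Sketch of the -vsfl mapping described above: spreading_function_length
# grows by one per doubling of fft_length, giving 2/64 ... 6/1024.
def spreading_function_length(fft_length: int) -> int:
    return int(math.log2(fft_length)) - 4

for n in (64, 128, 256, 512, 1024):
    print(n, spreading_function_length(n))   # 64 2 ... 1024 6
```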
Title: lossyWAV Development
Post by: halb27 on 2007-09-25 07:14:19
For a promising way to continue testing (and keeping in mind that v0.2.0 -s yielded excellent results) it is important to know what we're testing.
Can you please confirm or correct the following statements:

a) just to make sure the basis:
    v0.2.2 -skew 9.0 (no other options) yields exactly the same results as v0.2.0 -s (no other options).
    Especially the v0.2.2 noise threshold default is exactly equal to that of v0.2.0?

b) to try out the fft_length dependent bin averaging the following options are useful to test
    v0.2.2 -vsfl -skew x -nts y
    with x<=9.0 (for instance x=6.0) and y>=-3.0 (for instance y=-1.0).

And as 2Bdecided said -nfc should be used for abxing to make sure no loudness difference is abxed.

What exactly does the weighted spreading function option -wsf do?
Title: lossyWAV Development
Post by: Nick.C on 2007-09-25 07:55:32
For a promising way to continue testing (and keeping in mind that v0.2.0 -s yielded excellent results) it is important to know what we're testing.
Can you please confirm or correct the following statements:

a) just to make sure the basis:
    v0.2.2 -skew 9.0 (no other options) yields exactly the same results as v0.2.0 -s (no other options).
    Especially the v0.2.2 noise threshold default is exactly equal to that of v0.2.0?

b) to try out the fft_length dependent bin averaging the following options are useful to test
    v0.2.2 -vsfl -skew x -nts y
    with x<=9.0 (for instance x=6.0) and y>=-3.0 (for instance y=-1.0).

And as 2Bdecided said -nfc should be used for abxing to make sure no loudness difference is abxed.

What exactly does the weighted spreading function option -wsf do?
a) Should be almost exactly the same (although the skew in v0.2.0 was 9.0309, i.e. 1.5 x 20 x log(2)).
b) Sounds good.
-nfc is alright - unless the sample clips under bit-reduction. There is no clipping prevention at all when -nfc is used.

-wsf creates spreading functions as follows: [1]; [2/3,1/3]; [3/6,2/6,1/6]; [4/10,3/10,2/10,1/10]; [5/15,4/15,3/15,2/15,1/15]; [6/21,5/21,4/21,3/21,2/21,1/21];
rather than [1]; [1/2,1/2]; [1/3,1/3,1/3]; [1/4,1/4,1/4,1/4] etc. The weighted spreading function tends to 75% below midlength, 25% above midlength as length tends to infinity. I am developing with a variant of this which tends to 7/12 below midlength, 5/12 above midlength.
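The -wsf coefficients listed above follow a simple pattern: for length L the weights are L, L-1, ..., 1 normalised by the triangular number L(L+1)/2. A short sketch (illustrative, not the Delphi source) that also checks the 75%/25% limit claim:

```python
from fractions import Fraction

# Sketch of the -wsf weights described above: for length L the weights are
# L, L-1, ..., 1 divided by the triangular number L*(L+1)/2.
def weighted_spreading(L: int):
    total = L * (L + 1) // 2
    return [Fraction(L - k, total) for k in range(L)]

print(weighted_spreading(3))   # [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)], i.e. [3/6, 2/6, 1/6]

# As L grows, the mass below midlength tends to 3/4 (the "75% below
# midlength, 25% above" behaviour mentioned above).
print(float(sum(weighted_spreading(1000)[:500])))   # ~0.75
```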

@2Bdecided: I noticed on running the Matlab script with the same parameters several times in a row on the same input file that the average bits_to_remove value changes....? I can't pin down the cause, does it do that for you?
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-25 10:10:39

.. For my personal use, I would disable the lossyFLAC gain adjustment entirely. Instead, I'd run a ReplayGain album analysis, and apply only the negative ones, before using lossyFLAC. ...

How do you do that exactly
Hypothetically! Though I've tested it manually.
Quote
- ReplayGain using foobar as a 1 step procedure with encoding?
- 16 or 24 bit? dither or not dither?
- How do you make sure only negative replaygaining is applied? Manual control?

Especially the answer to the 16/24 bit dither/not dither question is relevant to me as the answer applies to resampling as well.
It depends on your use. If you're going to apply gain to the files before lossyFLAC, the output should ideally be 24-bit with no dither, but adding dither makes little difference (less efficient on digital silence!). Nick's portable won't play 24-bit, so 16-bit, no dither. I know you _should_ dither, but it reduces efficiency and adds hiss. Without it, you can in theory introduce distortion. Pick your poison.

Whether you should enable dither within lossyFLAC is a different question. I have an artificial sample where it's required to avoid quite nasty noise-pumping artefacts, but for efficiency I'm normally testing with no dither. The only place I think I've heard a difference is annoyinglyloudsong, but there it's not artefacting - it sounds louder to me without dither, that's all. I should ABX it because I'm probably talking rubbish.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-09-25 11:26:41
... Whether you should enable dither within lossyFLAC is a different question. I have an artificial sample where it's required to avoid quite nasty noise-pumping artefacts, but for efficiency I'm normally testing with no dither. ...

Can you give this artificial sample please?
So you say dithering within lossyWav isn't necessarily the way to go.

Nick, can you provide a switch please to disable dithering? Or in your opinion is there a strong reason for dithering?
Title: lossyWAV Development
Post by: Nick.C on 2007-09-25 11:38:12
Nick, can you provide a switch please to disable dithering? Or in your opinion is there a strong reason for dithering?
Don't ask me difficult questions! I'm just the programmer!!!

I had already decided to re-implement the dither_choice option as a -dither parameter (0=none, 1=rectangular, 2=triangular).
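The three -dither options above correspond to the standard rectangular and triangular (TPDF) dither distributions, in units of one LSB at the reduced bit depth. A minimal sketch of the two distributions (illustrative only; the actual lossyWAV code differs, e.g. later versions use a Fibonacci-LFSR PRNG):

```python
import random

# Sketch of the -dither options above (0=none, 1=rectangular, 2=triangular),
# in units of one LSB at the reduced bit depth. Illustrative, not the
# actual lossyWAV implementation.
def dither_sample(mode: int, rng: random.Random) -> float:
    if mode == 0:
        return 0.0
    if mode == 1:                  # rectangular: uniform in [-0.5, 0.5)
        return rng.random() - 0.5
    if mode == 2:                  # triangular: sum of two rectangles, in (-1, 1)
        return (rng.random() - 0.5) + (rng.random() - 0.5)
    raise ValueError("mode must be 0, 1 or 2")
```

The dither value would be added to each sample before the rounding step that removes the low-order bits.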

Also, I feel that opinion regarding clipping_reduction indicates that the default option (0) should be none, with 1 = fixed reduction taking into account dither amplitude (if any) and 2 = my 2-pass conditional (but consistent across the file) clipping reduction.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-25 14:51:17

... Whether you should enabled dither within lossyFLAC is a different question. I have an artificial sample where it's required to avoid quite nasty noise pumping artefacts, but for efficiency I'm normally testing with no dither. ...

Can you give this artificial sample please?
Attached

Quote
So you say dithering within lossyWav isn't necessarily the way to go.
Probably not. You might doubt this when you hear this sample though! Remember I know exactly how lossyFLAC works, and therefore I know exactly how to break it. This sample sounds like it's just white noise, but it isn't, as you'll see if you look at the waveform (and more precisely, the sliding paired sample values) in a wave editor.

There's still an issue about rounding/truncating/clipping/dithering. They're all tied together. What's in lossyFLAC6 works well enough, but I think it could be tweaked slightly. It's not a priority.

Cheers,
David.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-25 15:01:43
@2Bdecided: I noticed on running the Matlab script with the same parameters several times in a row on the same input file that the average bits_to_remove value changes....? I can't pin down the cause, does it do that for you?
Are you re-generating the noise threshold reference table each time? If so, yes. If not, no.

With dither off and a fixed noise threshold table, the output should be identical every time. It's a deterministic process: a computer program where you aren't changing anything!

With dither on, the output will be different every time, but the bits removed should be identical. The FLAC bitrate may vary due to the dither.

I'm still running lossyFLAC6. It's not changed much since I uploaded it on 4th July, but I'll upload it again anyway. (Attached).

Please don't be disappointed it doesn't have any of your improvements. It doesn't have any of my improvements either!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-09-25 21:56:51
Using v0.2.2 I tried Atem-lied:

a) -skew 9.0 -nfc : I abxed it 9/10, so I wonder if this really is the same procedure as with v0.2.0 -s.

b) -skew 6.0 -nfc -vsfl : sounds okay to me.

I missed a bit the debug mode I got used to.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-25 22:08:09
Using v0.2.2 I tried Atem-lied:

a) -skew 9.0 -nfc : I abxed it 9/10, so I wonder if this really is the same procedure as with v0.2.0 -s.

b) -skew 6.0 -nfc -vsfl : sounds okay to me.

I missed a bit the debug mode I got used to.


-debug is now -detail.

I'm re-writing the part of the code which actually does the analyses. I think that there was a difference in the way that David and I determined the bounds of each individual fft analysis - I'm working to correct that now.

The only thing to make sure of at a) would be to add -nts -3.0103 to the command line to see if that makes any difference.

Sorry about the inconsistency.

[edit] I've rewritten the analysis code and the output very closely matches that of the Matlab script (thanks David for the latest version!) I've got a bit more checking to do, but I think that v0.3.0 will be released tomorrow night. [/edit]
Title: lossyWAV Development
Post by: Nick.C on 2007-09-26 13:48:23
Attached summary spreadsheet of my 50 sample set, processed using soon to be released lossyWAV alpha v0.3.0 (-3 -dither 0 -clipping 0 -nts 0), against Matlab LossyFLACv6_revised script with same settings, 1024 sample codec_block_size. Matlab script used 5000 iterations to calculate reference_threshold values.

As can be seen, although not identical, 161 extra bits removed over 29799 codec blocks is not too bad a comparison. It can (and hopefully will) be improved upon.
Superseded.
Title: lossyWAV Development
Post by: halb27 on 2007-09-26 15:42:32
.. As can be seen, although not identical, 161 extra bits removed over 29799 codec blocks is not too bad a comparison. It can (and hopefully will) be improved upon. ...

Very promising indeed.

In theory, however, there is a chance that the sum of the removed bits is more or less the same while the bit removal differs at different spots.
Or is it a block-by-block comparison, adding up the deviations per block?
Title: lossyWAV Development
Post by: Nick.C on 2007-09-26 16:33:10
.. As can be seen, although not identical, 161 extra bits removed over 29799 codec blocks is not too bad a comparison. It can (and hopefully will) be improved upon. ...
Very promising indeed.

In theory, however, there is a chance that the sum of the removed bits is more or less the same while the bit removal differs at different spots.
Or is it a block-by-block comparison, adding up the deviations per block?

I knew someone would ask that question - the number quoted is for overall change, I will get going with block by block differences (+ve and -ve) and post:

644 bits extra removed, 483 less bits removed, 161 extra bits removed overall - amended results attached.

Will use the same reference_threshold creation technique in Matlab and re-process the Matlab results - to see if there is something more fundamentally wrong rather than just noise_analysis results. Superseded.
Title: lossyWAV Development
Post by: halb27 on 2007-09-26 18:15:37
Thanks for your work.

So this 1127-bit BTR difference gives quite a different picture than the summed difference of just 161 bits.

From your remarks it looks like the BTR mechanism of the current Delphi version is different from that of the MATLAB version. I think it would be good, before starting to tweak, for the Delphi version to use exactly the same mechanism and produce the same results as the original version. If the versions do differ, it makes things worse to start making changes to the MATLAB version.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-26 18:28:32
Thanks for your work.

So this 1127-bit BTR difference gives quite a different picture than the summed difference of just 161 bits.

From your remarks it looks like the BTR mechanism of the current Delphi version is different from that of the MATLAB version. I think it would be good, before starting to tweak, for the Delphi version to use exactly the same mechanism and produce the same results as the original version. If the versions do differ, it makes things worse to start making changes to the MATLAB version.


The starting point for the investigation has to be to remove the random element, namely the calculation of the reference_threshold values used to determine the threshold_index arrays (one for each fft analysis). The Matlab version initially used 1000 iterations x (64, 1024, 256) sample fft lengths, i.e. a maximum of 1024000 sample x iterations. The Delphi version uses constants based on a constant 2^25 count, i.e. 32M sample x iterations: 1048576 iterations at 32 sample fft; 524288 at 64; ....; 32768 at 1024 sample fft.
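The constant-cost scheme described above is easy to verify: each fft_length gets 2^25 / fft_length iterations, so every length sees the same total of 2^25 (~32M) sample-iterations.

```python
# The Delphi reference_threshold scheme described above: a constant
# 2**25 sample-iteration budget divided by each fft length.
TOTAL = 2 ** 25

for fft_length in (32, 64, 128, 256, 512, 1024):
    iterations = TOTAL // fft_length
    print(fft_length, iterations)   # 32 1048576 ... 1024 32768
```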
Title: lossyWAV Development
Post by: halb27 on 2007-09-26 20:53:06
What about the predefined values?
Or values stored in a file (IIRC the MATLAB script can make use of that)?

There's no need for perfection for these values in the first step - all that counts now is an identical basis for the MATLAB and DELPHI version as you said.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-26 21:05:31
What about the predefined values?
Or values stored in a file (IIRC the MATLAB script can make use of that)?

There's no need for perfection for these values in the first step - all that counts now is an identical basis for the MATLAB and DELPHI version as you said.


I have plugged the pre-calculated Delphi values into Matlab and the results from the Matlab script do not seem to have changed - although I can confirm that the reference_threshold values are identical.

I am in the process of a side-by-side block-by-block, sub-block-by-sub-block comparison of the fft analysis results - thankfully keys_1644ds.wav is quite short!

I've identified the error, if not immediately the solution - from the second codec block onwards, the result of the fft analysis of the first sub-block (and only the first sub-block), for each fft_length, for each channel, differs between Matlab and lossyWAV. All the rest are giving identical results to the Matlab script.

Oh well, back to debugging.......

And, I think that I've found the problem..... The audio data is not consistent between Matlab and lossyWAV - I looked at the fft outputs, then at the window_function'ed inputs, then at the bare audio data - there appear to be some discrepancies in the raw audio data (+/- 1) that I have found so far.

@David: I changed over from wavread/write to my previous wavreadraw/writeraw, removed the multiplication / division of inaudio > inaudio_int > outaudio and removed the inaudible addition, in favour of a 20*log10(max(1,min(conv......))).
This has improved the results somewhat, but it's too late to do the comparison and post it.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-27 07:54:14
Okay, so having narrowed down the difference to the second codec block onwards, first sub-block analysis only, and bearing in mind that the end-overlap is fft_length/2 and fft_overlap is fft_length/2 - the answer struck me (early) this morning.....

The Matlab script is removing the bits block by block just after the codec-block is analysed, and before the next codec-block is analysed - thus contaminating the audio data in the pre-block-start overlap of the next analysis block.
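The contamination described above can be demonstrated with a toy model: overlapping analysis windows reach back into the previous codec block, so quantising block N before analysing block N+1 changes what the first sub-block of N+1 sees. This is an illustrative sketch, not lossyWAV code; block sizes and the quantiser are arbitrary.

```python
# Toy illustration of the order-of-operations bug described above.
def quantise(samples, bits):
    """Crude stand-in for bit removal: round to multiples of 2**bits."""
    q = 1 << bits
    return [round(s / q) * q for s in samples]

audio = list(range(100))          # dummy samples
block, overlap = 20, 10

# Buggy order: quantise each block as soon as it has been analysed, so the
# next block's first overlapping window reads already-quantised samples.
buggy = audio[:]
first_windows_buggy = []
for start in range(0, len(buggy), block):
    first_windows_buggy.append(buggy[max(0, start - overlap):start + overlap])
    buggy[start:start + block] = quantise(buggy[start:start + block], 3)

# Correct order: take all analysis windows from the untouched audio first.
first_windows_clean = [audio[max(0, s - overlap):s + overlap]
                       for s in range(0, len(audio), block)]

print(first_windows_buggy[1] != first_windows_clean[1])   # True: contaminated
```

As in the thread, only the first overlapping window of each block from the second onwards is affected; the very first window is identical in both orders.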

This hasn't yet been tested, but it seems *too* likely.

Now that I am happy with the Delphi code, please find attached lossyWAV alpha v0.3.0 Superseded.

@David:
In the script (replicated in the Delphi code) when the minimum min_bin value is calculated for an fft analysis the result is *rounded*, i.e. can be increased by up to 0.5 when looking up the threshold_index table to determine the bits_to_remove. Would it not be better to *floor* this value as it would reduce the likelihood of increasing noise above the minimum determined value?

Modified version of your LossyFLAC6_revised attached including the pre-calculated constants to re-create the reference_threshold values. Superseded.
Title: lossyWAV Development
Post by: halb27 on 2007-09-27 08:44:55
... The Matlab script is removing the bits block by block just after the codec-block is analysed, and before the next codec-block is analysed - thus contaminating the audio data in the pre-block-start overlap of the next analysis block.

This hasn't yet been tested, but it seems *too* likely. ...

Great work, Nick. So in this respect your Delphi code is supposed to be better than the MATLAB script.

So it's worth doing intensive listening tests now.
Unfortunately (in this respect) I'm leaving for holidays tomorrow (will be back on Oct 7) and have a lot to prepare for it this evening (most of all, finding a B&B for the first nights, which turned out to be a problem - the Lake District, Cumbria, seems to be very popular these days [guess not only these days]).

Anyway, I will at least try to do a short test this evening.

But it would be most welcome if more members could contribute to testing.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-27 11:11:40
The Matlab script is removing the bits block by block just after the codec-block is analysed, and before the next codec-block is analysed - thus contaminating the audio data in the pre-block-start overlap of the next analysis block.
Nick, you're a genius - that's exactly what's happening.

The two parts used to be separate (analysis loop then rounding loop) - when I put them together I didn't spot this. Interesting that the effect was so small ("644 bits extra removed, 483 less bits removed, 161 extra bits removed overall over 29799 codec blocks") – it shows how benign the added noise is. It bodes well for this being multi-generation proof and transcode-proof with a noise threshold shift of -6 or -12 dB.


Quote
In the script (replicated in the Delphi code) when the minimum min_bin value is calculated for an fft analysis the result is *rounded*, i.e. can be increased by up to 0.5 when looking up the threshold_index table to determine the bits_to_remove. Would it not be better to *floor* this value as it would reduce the likelihood of increasing noise above the minimum determined value?
Yes, probably. If you're going to do that, you should also move the threshold shift down from where it is, into that calculation, otherwise the additional accuracy is pretty meaningless.


Quote
Modified version of your LossyFLAC6_revised attached including the pre-calculated constants to re-create the reference_threshold values.
Thank you. Just for clarity: this includes those thresholds, but hasn't fixed the "contamination" bug?

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-27 11:22:12
If you're going to do that, you should also move the threshold shift down from where it is, into that calculation, otherwise the additional accuracy is pretty meaningless.
I will do that for the next rev.

Thank you. Just for clarity: this includes those thresholds, but hasn't fixed the "contamination" bug?
Yes - I didn't change the bit-removal procedure.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-27 12:58:52
Comparison of "round" method vs "floor" method used when determining bits_to_remove from min_bin values. Noise_threshold_shift moved into this calculation rather than in threshold_index calculation.

Attached comparison for the 50 sample set. Removed.

Rounding or flooring is currently selected using the command-line parameter -floor; however, it may be prudent to remove this option and always use floor.
[edit]I feel that, due to the small change in bits_to_remove, and to make added noise above the min_bin value less likely, the code should be changed to always floor this result. v0.3.1 will include this.[/edit]

@David, as an aside, I mentioned earlier about randomness in the bits_to_remove in the Matlab script - this would happen if the samples were dithered when bits were removed.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-09-27 13:24:17
@David, as an aside, I mentioned earlier about randomness in the bits_to_remove in the Matlab script - this would happen if the samples were dithered when bits were removed.
Yes, it would. Where's the smiley for "hangs head in shame"!
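For readers following along: the dither under discussion is added at the bit-removal stage, just before truncation, which is why it randomises bits_to_remove. A minimal sketch of the idea (illustrative only - names and details are assumptions, not the lossyWAV code, whose default is no dither):

```python
import random

def remove_bits(sample, bits, dither=0, rng=random.random):
    """Truncate 'bits' LSBs from an integer sample, optionally dithering first.

    dither=0: none; 1: rectangular (one uniform value over one step);
    2: triangular (difference of two uniforms, spanning two steps).
    """
    q = 1 << bits                        # quantisation step
    if dither == 1:
        sample += (rng() - 0.5) * q      # rectangular dither
    elif dither == 2:
        sample += (rng() - rng()) * q    # triangular dither
    # round to the nearest multiple of the step, zeroing the low bits
    return int(round(sample / q)) * q
```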

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-27 13:28:46
Where's the smiley for "hangs head in shame"!
Nope, no need - yours was a minor oversight. Floor implemented with noise_threshold_shift moved. v0.3.1 tonight, I think.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-27 20:12:06
Revised LossyFLAC6_x.m - no pollution of audio data when removing bits, floor rather than round used in determining min_bin.

lossyWAV alpha v0.3.1 attached.

Code: [Select]
lossyWAV alpha v0.3.1 : WAV file bit depth reduction by 2Bdecided.
Transcoded by Nick.C & Halb27 from a script, www.hydrogenaudio.org

lossyWav usage: <input wav file> <options>

Options:
-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-cbs <n>      analysis codec_block_size (512<=n<=4608, default=576 samples)
              (should match codec block size used in target compression codec)
-o <folder>   destination folder for the output file
-dither <n>   dither selection, 0<=n<=2, default=0. 0=no dither; 1=rectangular
              dither; 2=triangular dither.
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).
-vsfl         select variable spreading function lengths
              (sfl = number of fft bins averaged during convolution of fft
              results. short fft = short sfl, long fft = long sfl)
-wsf          select weighted spreading functions.
              (weighted average of fft bins during convolution of fft results
              weighted towards lower frequency fft bins, 5/8:3/8)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-overlap <n>  fft_overlap = fft_length/n (2<=n<=8, default=2)
              (increases number of fft analyses per codec block)
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially increase low frequency bins to take into account
              higher SNR requirements at low frequencies)
-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode

Options not yet implemented:
-bitdepth <n> forced output bitdepth (16 or 24)
-flac         optimizations for use with FLAC
-wv           optimizations for use with wavPack
-tak          optimizations for use with TAK


Default noise_threshold_shift magnitude reduced (-3.0db to -1.5db) at the same time as the change from round to floor.

Output now *identical*(!) in terms of bits removed per block.

[edit]Default clipping prevention bug report acted upon - v0.3.1 removed, v0.3.1b below.[/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-09-27 22:10:17
Nick, you're the king, incredibly fast in bringing out the new versions with all the current items included, and wonderful quality judging so far:

I did a quick test with v0.3.1 using -cbs 1024 -skew 9.0 -clipping 0 (an explicit setting of -clipping 0 was necessary - it seems 0 isn't the default):

Atem-lied: Could not abx, which is very remarkable as more bits are removed than with say v0.2.0 -  quality improvement due to not dithering?

bibilolo was fine to me as well.

Even with 2Bdecided's dither_noise_test.wav there was no audible issue to me, and this is a very artificial sample, and many bits are removed.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-27 22:20:47
I did a quick test with v0.3.1 using -cbs 1024 -skew 9.0 -clipping 0 (an explicit setting of -clipping 0 was necessary - it seems 0 isn't the default):

Atem-lied: Could not abx, which is very remarkable as more bits are removed than with say v0.2.0 -  quality improvement due to not dithering?

bibilolo was fine to me as well.

Even with 2Bdecided's dither_noise_test.wav there was no audible issue to me, and this is a very artificial sample, and many bits are removed.
-clipping 0 *should* be the default setting, I'll investigate. [edit] Ahem...... mea culpa! Corrected in alpha v0.3.1b[/edit] Superseded by v0.3.2

My gut feeling is that the move from round to floor has made quite a difference where previously bits_to_remove would have been 1 more than the new version (all else being equal) for certain codec_blocks in the sample.

I've been listening to my 51 sample set (added a Black Sabbath extract from Iron Man  ) at -3 -nts 0 -dither 2 -clipping 1 -skew 9 -wsf -vsfl and I'm *really* pleased at the result - so much so that this may become my DAP conversion setting.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-28 08:22:49
Just out of curiosity, will bit reduction make ALAC compress any better? I just bought an iPod    and the idea of lossy FLAC is great, but now I'm stuck without my trusty FLAC format if I want lossless music (and Rockbox doesn't really appeal to me that much).

Thanks,
Bobby
Sorry for the late reply, Bobby - I think in the original thread there's some discussion about the applicability of the method to various codecs; from memory, only FLAC, WavPack & TAK will really benefit. The other way of looking at it is to just give it a try and see if the compressed file is indeed smaller!

Regards,

Nick.

(or, you could Rockbox your iPod for FLAC support!)
Title: lossyWAV Development
Post by: Nick.C on 2007-09-28 12:25:23
Performance results for v0.3.1b:

Guruboolez's 150 sample set:

WAV: 252.36MB; 1411.2kbps;
FLAC: 122.17MB; 683kbps;
lossy -1: 95.7MB; 535kbps;
lossy -2: 89.6MB; 501kbps;
lossy -3: 88.1MB; 492kbps.

My "working" 52 sample set:

WAV: 121.53MB; 1411.2kbps;
FLAC: 68.2MB; 792kbps;
lossy -1: 42.6MB; 495kbps;
lossy -2: 39.5MB; 458kbps;
lossy -3: 38.6MB; 449kbps.

I am currently processing a number of albums and will revert with "real-world" results:

AC/DC - Who Made Who :
WAV: 385.3MB; 1411.2kbps;
FLAC: 238.9MB; 875kbps;
lossy -3: 112.2MB; 411kbps;

Gerry Rafferty - City To City:
WAV: 541.8MB; 1411.2kbps;
FLAC: 307.9MB; 802kbps;
lossy -3: 161.2MB; 420kbps;

Jean Michel Jarre - Oxygene:
WAV: 400.7MB; 1411.2kbps;
FLAC: 219.5MB; 773kbps;
lossy -3: 134.8MB; 475kbps;

The Shamen - Boss Drum:
WAV: 663.3MB; 1411.2kbps;
FLAC: 433.4MB; 922kbps;
lossy -3: 170.0MB; 362kbps;

Van Morrison - Astral Weeks:
WAV: 477.0MB; 1411.2kbps;
FLAC: 255.9MB; 757kbps;
lossy -3: 142.6MB; 422kbps.

I carried out a multi-generational experiment and was pleasantly surprised to see that the output matches the input after surprisingly few generations. This was with no dither or clipping prevention. On clipping prevention: up to now there has been no check that the new sample value lies within the range of permitted sample values - v0.3.2 will incorporate this and also issue warnings at the end of processing if samples have had to be clipped to the maximum or minimum permissible value respectively.
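The range check described above amounts to clamping each processed sample into the legal range for its bit depth, counting the two directions separately so that over- and under-range warnings can be issued independently. A minimal sketch with illustrative names (not lossyWAV's actual identifiers):

```python
def clamp_block(samples, bits=16):
    """Clamp samples into the legal range for the given bit depth,
    returning the clamped block plus separate over/under counts."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # e.g. -32768..32767
    over = under = 0
    out = []
    for s in samples:
        if s > hi:
            s, over = hi, over + 1
        elif s < lo:
            s, under = lo, under + 1
        out.append(s)
    return out, over, under
```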
Title: lossyWAV Development
Post by: user on 2007-09-28 15:28:00
In the end you arrive at roughly 50% of lossless (e.g. FLAC) bitrates - in numbers, around 450 kbit/s (±50).
The technical approach of removing bits reminds me of WavPack lossy, which also gives great, more or less transparent results at these high to very high lossy bitrates.
Isn't the technical approach of WavPack lossy and this algorithm very similar, i.e. (nearly) no psychoacoustic model which might cause artefacts, but instead a higher noise floor, be it perceptible or not?

(As a side note, the results of both WavPack lossy and this algorithm could also be compared to DAT at 32 kHz, 12 bit, i.e. the way a DAT recorder in long-play mode downsamples from 44.1 kHz, 16 bit to 32 kHz, 12 bit to halve the data rate.
For many people without the possibility of X/Y or A/B comparison, not to speak of ABX, DAT long-play at 32 kHz, 12 bit is also "transparent".)


Can one of the inventors (of this algorithm) explain in a few simple words what the difference is between WavPack lossy and this new algorithm?
(Or link to the post where that has already been explained.)
Title: lossyWAV Development
Post by: bryant on 2007-09-28 16:23:20
Isn't the technical approach of WavPack lossy and this algorithm very similar, i.e. (nearly) no psychoacoustic model which might cause artefacts, but instead a higher noise floor, be it perceptible or not?

Can one of the inventors (of this algorithm) explain in a few simple words what the difference is between WavPack lossy and this new algorithm?

There are several differences, but I'll mention what I think are the most important.

The first is that this is designed to work with FLAC, or any other lossless compressor that can detect and optimize away redundant zeros in the audio data words. The advantage of this is that it can be used with the many existing FLAC hardware (and software) players without modification or update.
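FLAC's "wasted bits" mechanism is what detects those redundant zeros: if every sample in a block shares b trailing zero bits, the encoder stores b once and right-shifts the residual before prediction. A sketch of the detection (illustrative; not FLAC's actual source):

```python
def wasted_bits(block):
    """Number of trailing zero bits shared by every sample in a block.
    Zeroing LSBs uniformly raises this count, which is why the file shrinks."""
    acc = 0
    for s in block:
        acc |= s            # OR accumulates every set bit position
    if acc == 0:
        return 0            # all-zero block: convention, no wasted bits
    b = 0
    while acc & 1 == 0:     # count shared trailing zeros
        acc >>= 1
        b += 1
    return b
```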

The other major difference is that lossyFLAC is intelligent. WavPack lossy is CBR and simply gets as close as it can to the source given the allowed bitrate. LossyFLAC is VBR because it analyzes the source and tries to determine the most quantization noise it can add and still be completely transparent. In this way it's more like the WavPack lossy “quality” mode I worked on (and abandoned) several years ago.

Both methods rely on some psychoacoustic principles to achieve their results. In the WavPack case it chooses noise shaping and joint stereo encoding to make the noise less audible. LossyFLAC uses them to help determine how much white noise it can add (i.e., the fact that we are less sensitive to added noise at higher frequencies). However, neither has a psychoacoustic model in the strict sense and, of course, neither uses any sort of frequency domain encoding or digital filtering in the signal path.
Title: lossyWAV Development
Post by: Brent on 2007-09-28 20:57:03
Just out of curiosity, will bit-reduction cause ALAC compress any better. I just bought an iPod    and the idea of lossy flac is great but now I'm stuck without my trusty FLAC format if I want lossless music (and rockbox doesn't really appeal to me that much).

Thanks,
Bobby

I ran a test with .3.1b -3:

(http://xs319.xs.to/xs319/07395/lossyalac.PNG)
(I'm too lazy to type it out; I attached the image in case the host goes down.)

So it has no effect on filesize, slightly negative in fact. LossyWav causes some bad clipping in the case of Ali's Here and Unbreakable btw.
Too bad actually, cause it seemed a good way to save some space on the iPod.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-28 22:42:41
I ran a test with .3.1b -3:

So it has no effect on filesize, slightly negative in fact. LossyWav causes some bad clipping in the case of Ali's Here and Unbreakable btw.
Too bad actually, cause it seemed a good way to save some space on the iPod.
Thanks for the testing. You can switch on fixed amplitude reduction clipping prevention by using the -clipping 1 parameter.

Attached lossyWAV alpha v0.3.2. Superseded by v0.3.3

Code tidied up further; lossyWAV will now require the -force parameter to over-write an existing file. If clipping occurs, warning(s) are issued after processing. I say warning(s) because it counts clipping over maximum and under minimum separately.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-09-29 13:33:34
Could you please add a command-line parameter to set the process priority to "low"?
Title: lossyWAV Development
Post by: Nick.C on 2007-09-29 13:47:11
Could you please add a command-line parameter to set the process priority to "low"?
Mitch,

I'll certainly look into it and see if I can - I've never tried it before, but there must be code available to implement it (somewhere....).

Thanks for your input.

Nick.
Title: lossyWAV Development
Post by: guruboolez on 2007-09-29 14:00:03
@Nick.C
Did the settings change since version ~0.20? I'm unable to make the last versions (>0.30) work with foobar2000.

What I'm using is:
encoder: C:\windows\system32\cmd.exe
extension: lossy.flac
Parameters: /d /c C:\CODEC\lossywav\LossyWav.bat %s %d
Format is: lossy
bps: 16 bits

my batch files is the following one:
Code: [Select]
@echo off
set lossyWAV_path="C:\CODEC\lossywav\lossyWav.exe"
set flac_path="C:\CODEC\lossywav\flac.exe"
rem echo %lossyWAV_path% %1 -c 576 >>flac_lossy.txt
rem echo %flac_path% -8 -f -b 576 -o"%~DPN2.flac" "%~DPN1.lossy.wav" >>flac_lossy.txt
rem echo del "%~DPN1.lossy.wav" >>flac_lossy.txt
%lossyWAV_path% %1 -c 576
%flac_path% -8 -f -b 576 "%~N1.lossy.wav" -o"%~N2.flac"
del "%~N1.lossy.wav"
set output_file=
set lossyWAV_path=
set flac_path=
Title: lossyWAV Development
Post by: Nick.C on 2007-09-29 14:50:59
@Nick.C
Did the settings change since version ~0.20? I'm unable to make the last versions (>0.30) work with foobar2000.

What I'm using is:
encoder: C:\windows\system32\cmd.exe
extension: lossy.flac
Parameters: /d /c C:\CODEC\lossywav\LossyWav.bat %s %d
Format is: lossy
bps: 16 bits

my batch files is the following one:
Code: [Select]
@echo off
set lossyWAV_path="C:\CODEC\lossywav\lossyWav.exe"
set flac_path="C:\CODEC\lossywav\flac.exe"
rem echo %lossyWAV_path% %1 -c 576 >>flac_lossy.txt
rem echo %flac_path% -8 -f -b 576 -o"%~DPN2.flac" "%~DPN1.lossy.wav" >>flac_lossy.txt
rem echo del "%~DPN1.lossy.wav" >>flac_lossy.txt
%lossyWAV_path% %1 -c 576
%flac_path% -8 -f -b 576 "%~N1.lossy.wav" -o"%~N2.flac"
del "%~N1.lossy.wav"
set output_file=
set lossyWAV_path=
set flac_path=


Settings did change a bit - "-c" is now "-cbs", but a codec_block_size of 576 is now the default setting, so you can remove "-c 576" altogether.
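With 576 now the default block size, the batch file above reduces to the following (the same Windows batch file, with the obsolete "-c 576", the commented-out rem debug lines and the unused output_file variable dropped; paths as in the original):

```bat
@echo off
set lossyWAV_path="C:\CODEC\lossywav\lossyWav.exe"
set flac_path="C:\CODEC\lossywav\flac.exe"
%lossyWAV_path% %1
%flac_path% -8 -f -b 576 "%~N1.lossy.wav" -o"%~N2.flac"
del "%~N1.lossy.wav"
set lossyWAV_path=
set flac_path=
```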
Title: lossyWAV Development
Post by: guruboolez on 2007-09-29 15:13:52
It works fine again. Thank you very much!
Title: lossyWAV Development
Post by: guruboolez on 2007-09-29 15:53:39
Lossywav is impressive on harpsichord recordings.
It brings a 1062 kbps flac encoding to a pretty nice 436 kbps: 60% reduction!!!

Sound quality was fine, but strangely (because I didn't hear anything wrong at first and didn't expect to hear anything different from the original) my ABX score isn't bad at all:

Code: [Select]
foo_abx 1.3.1 report
foobar2000 v0.9.4.5 beta 1
2007/09/29 16:36:04

File A: F:\Fiocco - [Demeyere] Pièces de Clavecin, Oeuvre Premier\CD 1\12-Vivace.flac
File B: N:\12-Vivace.lossy.flac

16:36:04 : Test started.
16:36:29 : 01/01  50.0%
16:36:35 : 02/02  25.0%
16:36:58 : 02/03  50.0%
16:37:04 : 03/04  31.3%
16:37:14 : 04/05  18.8%
16:37:31 : 04/06  34.4%
16:37:36 : 05/07  22.7%
16:37:41 : 06/08  14.5%
16:37:54 : 06/09  25.4%
16:37:59 : 07/10  17.2%
16:38:11 : 08/11  11.3%
16:38:18 : 08/12  19.4%
16:38:21 : 08/13  29.1%
16:38:24 : 09/14  21.2%
16:38:29 : 10/15  15.1%
16:38:40 : 11/16  10.5%
16:38:45 : 12/17  7.2%
16:38:51 : 13/18  4.8%
16:39:00 : 14/19  3.2%
16:39:08 : 15/20  2.1%
16:39:15 : 16/21  1.3%
16:39:23 : 16/22  2.6%
16:39:31 : 16/23  4.7%
16:39:36 : 17/24  3.2%
16:40:04 : 18/25  2.2%
16:40:18 : 19/26  1.4%
16:40:27 : 19/27  2.6%
16:40:40 : 20/28  1.8%
16:40:51 : 20/29  3.1%
16:41:02 : 21/30  2.1%
16:41:07 : 21/31  3.5%
16:41:11 : 22/32  2.5%
16:41:32 : Test finished.

----------
Total: 22/32 (2.5%)


The second half was better than the first (it's usually the opposite). There was no noise, no artefact, but something hard to define (an audiophile would call it "lack of soundstage" or something similar).
I'm not completely sure that this ABX score is valid (since I didn't really fix a target: first 16 trials, then I decided to extend it to 24, and when I reached 24 I was so confident that I extended it again to 32 trials). Pio2001 once explained why such a test needs a better score to be valid.

I uploaded a short excerpt from this sample (the part I ABXed here):
http://rapidshare.com/files/59085975/fiocco.zip.html (http://rapidshare.com/files/59085975/fiocco.zip.html)
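The percentages in the foo_abx log above are one-sided binomial probabilities of guessing at least that well by chance, and can be reproduced directly (a sketch, assuming a fair 50/50 guess per trial):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided probability of scoring at least 'correct' out of 'trials'
    ABX trials by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials
```

For 22/32 this gives about 2.5%, matching the log - though, as guruboolez notes, extending the trial count mid-test means the effective significance is weaker than the nominal figure.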
Title: lossyWAV Development
Post by: Nick.C on 2007-09-29 17:43:58
From the look of your command line, you're not changing anything that isn't default in v0.3.2. If you're looking to maintain the same processing rate, use "-nts <n>" where n is less than -1.5; this will reduce the bits removed in the processing - at the rate of about 1 bit per -6 dB of nts (noise_threshold_shift).

The other way to do it would be to use the "-1" quality option to see if that makes a difference to your sample.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-29 20:29:16
@Mitch 1 2: I have found a method of lowering process priority, although the permissible priority settings vary with Windows version. Are you running Windows later than NT, ME, 98 & 95? If so then I can provide 3 settings: Normal, Below_Normal and Low priority. If not, Below_Normal vanishes as it is not supported by the O/S mentioned.

I will implement "-below" and "-low" ("-normal" unavailable, but the default) parameters to lower the process priority - but -below won't work on all versions of Windows.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-29 22:12:25
lossyWAV alpha v0.3.3 attached. Immediately superseded by v0.3.3b - output file access bug.

-below and -low parameters incorporated, setting below_normal process priority and low process priority respectively.

Now checks for read-only output files and will attempt to set read/write access, will also report if read/write access setting fails.

Code: [Select]
lossyWAV alpha v0.3.3 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-cbs <n>      analysis codec_block_size (512<=n<=4608, default=576 samples)
              (should match codec block size used in target compression codec)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither <n>   dither selection, 0<=n<=2, default=0
              (0=no dither; 1=rectangular dither; 2=triangular dither)
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).
-vsfl         select variable spreading function lengths (sfl)
              (sfl = number of fft bins averaged during convolution of fft
              results. short fft = short sfl, long fft = long sfl)
-wsf          select weighted spreading functions.
              (weighted average of fft bins during convolution of fft results
              weighted towards lower frequency fft bins, 5/8:3/8)
-overlap <n>  fft_overlap = fft_length/n (2<=n<=8, default=2)
              (increases number of fft analyses per codec block)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode
-below        set process priority to below normal.
-low          set process priority to low.

Options not yet implemented:

-bitdepth <n> forced output bitdepth (16 or 24)
-flac         optimizations for use with FLAC
-wv           optimizations for use with wavPack
-tak          optimizations for use with TAK
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-09-30 03:08:42
I've tested lossyWAV 0.3.3 on Windows XP SP2, and there's a serious problem with output file permissions which is not present in the 0.3.2 version. lossyWAV fails to gain write access even when there is no existing file.
Title: lossyWAV Development
Post by: Nick.C on 2007-09-30 08:41:45
I've tested lossyWAV 0.3.3 on Windows XP SP2, and there's a serious problem with output file permissions which is not present in the 0.3.2 version. lossyWAV fails to gain write access even when there is no existing file.
That sounds like inadequate testing on my part prior to release. It *should* work if the output file already exists; however, this is embarrassing - I'll get v0.3.3b out as soon as possible.

This version should work - apologies for the error. - File removed, superseded.

Nick.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-09-30 14:02:58
It's alpha software, so I forgive you.
Now lossyWAV works as expected, and it even handled my conflicting parameter tests!
Title: lossyWAV Development
Post by: Nick.C on 2007-10-03 19:49:46
lossyWAV alpha v0.3.4 attached. Superseded by alpha v0.3.5.

new parameter -info to show WAV file rate, channels, bps and length;
code tidy up and speed up.

Have fun!
Title: lossyWAV Development
Post by: Nick.C on 2007-10-05 09:11:50
lossyWAV alpha v0.3.5 attached. Superseded by alpha v0.3.6

new parameter -spread, replaces -vsfl. An experimental take on spreading.
code tidy up and (quite significant) speed up.

Have fun!
Title: lossyWAV Development
Post by: halb27 on 2007-10-09 22:40:35
Lossywav is impressive on harpsichord recordings.
...
There were no noise, no artefact, but something hard to define (audiophile would call it "lack of soundstage" or something similar).

Do you mind trying -cbs 1024? 2Bdecided once mentioned that he created the procedure with such a blocksize in mind and was a bit unsure about the outcome of shorter block sizes. Resulting FLAC filesize should be roughly the same according to my experience.

If this isn't sufficient can you please try -nts x as suggested by Nick.C or maybe also -skew y and -spread?

Sorry I can't do it myself as I'm not able to abx your provided samples.
Title: lossyWAV Development
Post by: halb27 on 2007-10-09 23:01:54
... -spread, replaces -vsfl. An experimental take on spreading. ...

Sorry, but I'm not sure whether it's a promising procedure to try out different weights in building the average of 3 or 4 bins. My feeling is that, in the overall view, that's not a significant variation and may produce better results in one case and worse in others.

I'm still a bit worried about David Bryant's comment on the spreading function: that the critical bands have different widths, with corner frequencies according to Bark of 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500 Hz.

So to me it's plausible to vary the number of bins over which to build the average not only according to the fft_length, as you did already with the previous -vsfl option, but also according to the frequency range the corresponding bin belongs to. Taking it into account roughly may be sufficient (for instance averaging over 2 bins in the range up to 1720 Hz, 3 bins in the range up to 3700 Hz and over 4 bins otherwise).
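The frequency-dependent window proposed here can be sketched as follows (hypothetical helper names, and ignoring the additional fft_length dependence for brevity; lossyWAV's real convolution differs):

```python
def spreading_length(bin_freq_hz):
    """halb27's rough proposal: widen the averaging window with frequency,
    approximating the widening of the Bark critical bands."""
    if bin_freq_hz <= 1720.0:
        return 2
    if bin_freq_hz <= 3700.0:
        return 3
    return 4

def spread(spectrum_db, bin_width_hz):
    """Average each FFT bin with its lower-frequency neighbours over a
    frequency-dependent window (illustrative sketch only)."""
    out = []
    for i in range(len(spectrum_db)):
        w = spreading_length(i * bin_width_hz)
        lo = max(0, i - (w - 1))          # w bins ending at bin i
        seg = spectrum_db[lo:i + 1]
        out.append(sum(seg) / len(seg))
    return out
```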
Title: lossyWAV Development
Post by: halb27 on 2007-10-10 10:20:05
@ Nick.C:

I also feel a bit uncomfortable about the many options. It is not inviting for potential listeners, who have a hard job as quality is already very good - and sufficient listening experience is what is missing most at the moment.

It's all a matter of taste, but I think it would be good to return to the essentials of primary quality settings.
As for additional options, I think it's good to have the dithering and clipping options (default: no dithering and no anti-clipping strategy).
But other than that my feeling is that everything should go into -1, -2, -3.
Moreover we should concentrate on getting an extremely good quality in the ~500 kbps range. I think current experience is enough to show that achieving significantly lower bitrate while keeping up excellent quality is not possible with the current approach without additions like those proposed by SebastianG.
So I think we should leave the -3 option behind until more details about such an approach are available.

Concentrating on -1 and -2, I think -2 should target a level that makes any known sample transparent to any listener, and -1 should be kept only slightly above that quality-wise. This gives any listener the chance to switch from -2 to -1 in case he has a sample which is not transparent.
As a consequence what was -1 should then become -2 in the next version (or a small promising variant of -1), and a new -1 should be created.

Suggestion for -2: what is -2 right now with no skewing, and a spreading function which does just arithmetic averaging, but with the number of bins participating in averaging depending on bin frequency as described in my last post, and also depending on fft length as you did already. Moreover I'd welcome a blocksize of 1024 instead of 576. No serious disadvantage in resulting bitrate but more secure.

Suggestion for -1: specifics of -1 like in the existing version, other details like with -2 but a tiny bit more demanding, for instance a slightly lowered noise threshold and a small skewing factor (as the first trial - can be increased if necessary).
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-10 10:34:43
You must leave noise threshold shift in as a command line option.

Either the frequency skewing or the variable spreading length appears to be needed to make it work properly.

I agree that lots of options are confusing, but I thought they were only in there for testing. There will eventually be no direct user control of any of them, I hope, because someone will figure out the optimal settings and default them.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-10 12:17:26
You must leave noise threshold shift in as a command line option.

Sure, that is an advantage per se. But on the other hand it breaks the simple division between a simple quality parameter and options targeting additional features like dithering. Moreover, differences in threshold shift are incorporated in the difference between -2 and -1.

Quote
Either the frequency skewing or the variable spreading length appears to be needed to make it work properly.
This is the case with my suggestion for -1 and -2.
Quote
I agree that lots of options are confusing, but I thought they were only in there for testing. There will eventually be no direct user control of any of them, I hope, because someone will figure out the optimal settings and default them.

Sure, but I'm afraid the fact that we don't have a lot of listeners in the testing phase is not only due to the difficulty of ABXing samples at the high quality already achieved, but to some extent also to the number of options, which not everybody knows the purpose of.
Looking at guruboolez (certainly the most welcome tester) it looks like he doesn't want to play around with options.

Sure, these things are also related to my personal opinion that varying the spreading function by varying weights in the average formula is not worthwhile. Variable spreading length, however, is promising IMO.
It's also related to my belief that a significant saving in bitrate is not possible with the current approach, and I don't care much about whether it's finally 530 kbps or 480 kbps on average. After all, we're targeting a significantly lower bitrate than going lossless, while keeping up transparency with a high degree of security. The latter part is what I care about most, and IMO we should do everything to encourage testers.
Title: lossyWAV Development
Post by: shadowking on 2007-10-10 12:48:52
I would like people to feed in all their transform problem samples and start testing lossyWAV. The problem is that hybrids make easy work of most transform problems. It would still be useful, I think, even though I don't think we will see a good ABX result even for -2 (hopefully).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-10 13:14:53
I hear what's being said, but my ears / listening environment are not up to finalising the settings by myself.

The current (unreleased alpha v0.3.6) command line parameter list is as follows:

Code: [Select]
lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-cbs <n>      analysis codec_block_size (512<=n<=4608, default=576 samples)
              (should match codec block size used in target compression codec)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-spread       select variable spreading functions.(incompatible with -weight)
-weight       select weighted spreading functions.(incompatible with -spread)
              (weighted average of fft bins during convolution of fft results
              weighted towards lower frequency fft bins, 5/8:3/8)
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither <n>   dither selection, 0<=n<=2, default=0
              (0=no dither; 1=rectangular dither; 2=triangular dither)
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).
-overlap <n>  fft_overlap = fft_length/n (2<=n<=8, default=2)
              (increases number of fft analyses per codec block)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode
-info         display WAV file information
-below        set process priority to below normal.
-low          set process priority to low.

Options not yet implemented:

-bitdepth <n> forced output bitdepth (16 or 24)
-flac         optimizations for use with FLAC
-wv           optimizations for use with wavPack
-tak          optimizations for use with TAK

However, I think that it may be beneficial to reduce this to

Code: [Select]
lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  Classic quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-dither       dither output using triangular dither, default=off
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-below        set process priority to below normal.
-low          set process priority to low.
and tweak the parameters implicit in -1, -2 & -3. Possibly implement additional test settings to see whether a listener prefers -2 or -20? Codec block size needs to be stated for each quality setting or the user will not know how to optimally compress the output.

As an aside, I used v0.3.5 to compress 30GB of FLAC files at quality -2 and got 15.2GB out - average bitrate approx 420kbps.

As there are no real process developments (other than code optimisation) in v0.3.6, I will defer release until a way forward is agreed on internal quality settings development.

Nick.


... -spread, replaces -vsfl. An experimental take on spreading. ...

Sorry, but I'm not sure whether trying out different weights in building the average of 3 or 4 bins is a promising procedure. My feeling is that, in the overall view, it's not a significant variation and may produce better results in one case and worse in others.

I'm still a bit worried about David Bryant's comment on the spreading function: that the critical bands have a different width, with corner frequencies according to Bark of 0, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500 Hz.

So to me it's plausible to vary the number of bins over which to build the average not only according to the fft_length as you did already with the previous -vsfl option, but also on the frequency range the corresponding bin belongs to. Taking it into account roughly may be sufficient (for instance averaging over 2 bins in the range up to 1720 Hz, 3 bins in the range up to 3700 Hz and over 4 bins otherwise).

The most recent experimental take on spreading (in the original thread) uses simple 3 bin average at short FFT lengths (2 to 64 samples) and shifts gradually to max of adjacent bins and current bin (a simple attempt at masking) at long FFT lengths (1024 to 32768 samples). If anyone has any algorithmic ideas with regard to spreading, then please let me know. Bear in mind that the default quality settings have always used 4 bin averaging (-2 & -3) and 3 bin averaging (-1).
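The two regimes described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual Delphi code; the function names are mine:

```python
def spread_average(spectrum):
    """3-bin moving average, as used at short FFT lengths (2 to 64 samples)."""
    out = []
    n = len(spectrum)
    for i in range(n):
        window = spectrum[max(0, i - 1):i + 2]   # current bin plus neighbours
        out.append(sum(window) / len(window))
    return out

def spread_max(spectrum):
    """Max of the current and adjacent bins: the crude masking approximation
    used at long FFT lengths (1024 to 32768 samples)."""
    n = len(spectrum)
    return [max(spectrum[max(0, i - 1):i + 2]) for i in range(n)]
```

The max variant is more permissive than the average: a single loud bin lifts its neighbours, so more bits get removed near strong tones.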
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-10 14:31:40

You must leave noise threshold shift in as a command line option.

Sure, that is an advantage per se. But on the other hand it breaks the simple division between a simple quality parameter and options targeting more or less additional features like dithering. Moreover, differences in threshold shift are incorporated in the difference between -2 and -1.


I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.

Also, if anyone does find a problem sample, the obvious question is how far the threshold must shift before it's solved. If you remove the switch, no one can answer this!

Besides, the noise threshold shift is the most fundamental parameter in lossyFLAC. It was probably the first line of code that I coded! I wrote threshold_shift=0; with the assumption that really it shouldn't be zero and I'd figure it out later!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-10 14:38:58
I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.
This doesn't appear to correlate with my findings with multi-generational processing - with no dither the output matches the input after about 4 or 5 generations - beyond that, generation n = generation n-1.
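A plausible explanation for generation n equalling generation n-1: without dither, bit removal is a rounding to a coarser grid, and rounding is idempotent, so once a sample sits on the grid it stays there. A minimal sketch (the exact rounding scheme is my assumption, not necessarily lossyWAV's):

```python
def remove_bits(sample, bits_to_remove):
    """Round an integer sample to a multiple of 2**bits_to_remove (assumed scheme)."""
    step = 1 << bits_to_remove
    return int(round(sample / step)) * step

g1 = remove_bits(12347, 4)   # first generation: snaps to the coarser grid
g2 = remove_bits(g1, 4)      # second generation: already on the grid, unchanged
```

If the audio is edited between generations (as David describes), samples leave the grid and each pass re-quantises, which is where the noise accumulation comes from.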
Title: lossyWAV Development
Post by: halb27 on 2007-10-10 14:39:10
...However, I think that it may be beneficial to reduce this to

Code:
lossyWAV alpha v0.3.6 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage: lossyWAV <input wav file> <options>

Options:

-1, -2 or -3  Classic quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

-dither       dither output using triangular dither, default=off
-clipping <n> clipping prevention selection, 0<=n<=1, default=0. 0=none;
              1=fixed clipping prevention amplitude reduction, taking into
              account dither amplitude (if any).

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-below        set process priority to below normal.
-low          set process priority to low.
and tweak the parameters implicit in -1, -2 & -3. Possibly implement additional test settings to see whether a listener prefers -2 or -20? Codec block size needs to be stated for each quality setting or the user will not know how to optimally compress the output. ...

I welcome such an approach very much.
As for codec block size sure it must be known. At the moment I think it's best to concentrate on FLAC and use a blocksize of 1024 (with any quality setting).
Whenever the demand comes for other lossless codecs I think it's best to bring the -tak etc. parameters to life and use a codec-specific blocksize. Or maybe bring them to life immediately with a promising blocksize.

Quote
The most recent experimental take on spreading (in the original thread) uses simple 3 bin average at short FFT lengths (2 to 64 samples) and shifts gradually to max of adjacent bins and current bin (a simple attempt at masking) at long FFT lengths (1024 to 32768 samples). If anyone has any algorithmic ideas with regard to spreading, then please let me know. Bear in mind that the default quality settings have always used 4 bin averaging (-2 & -3) and 3 bin averaging (-1).

To me this sounds plausible and IMO should be incorporated into the quality settings for -2 and (slightly more demanding) for -1. A rigid justification for such a procedure isn't necessary IMO.
You don't write about David Bryant's concern about the varying width of the critical bands. Isn't it plausible to you? Are there problems with implementation?
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-10 17:10:17
I know. But if you want to use lossyFLAC in multiple generations of encoding (50 or more) you ought to use about -12.
This doesn't appear to correlate with my findings with multi-generational processing - with no dither the output matches the input after about 4 or 5 generations - beyond that, generation n = generation n-1.


I didn't know that! Still, I can see why it might be true. Not sure I'm certain it's "proven" behaviour yet.

Still, useful multi-generational encoding means that you're actually going to do something with the audio between encodes. So the audio will keep being changed, and then re-quantised by lossyFLAC/WAV. When I tested this (early on) I ended up with 12dB more noise than I wanted after 50 iterations (which is quite amazingly good, because standard 16-bit dither can be audible after 50 iterations!). Lowering the noise threshold shift will solve this, though I should check that with the current version I guess.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-10 20:16:59
You don't write about David Bryant's concern about the varying width of the critical bands. Isn't it plausible to you? Are there problems with implementation?
I take the point that critical bands have varying widths - but as someone who has only become recently aware of most of the concepts being used in the method, I am a bit at a loss as to how to proceed with implementing an element of the method which would take this into account.

One thought that has just occurred to me:

Is there any merit in averaging / taking the minimum of FFT results across the analyses carried out for each FFT length within a codec block, rather than (or as well as) along the bins of each FFT analysis? Would this give some time spreading? Or have I just drunk too much coffee today?
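The idea of combining results across the several overlapping analyses of one FFT length within a codec block might look like this (a hypothetical sketch; the function and its modes are my naming):

```python
def combine_analyses(spectra, mode="min"):
    """spectra: one list of per-bin magnitudes per overlapping FFT analysis.
    Returns one combined spectrum per codec block: the per-bin minimum
    (conservative: the quietest analysis decides) or the per-bin average."""
    if mode == "min":
        return [min(bins) for bins in zip(*spectra)]
    return [sum(bins) / len(bins) for bins in zip(*spectra)]
```

Taking the minimum across time protects quiet passages inside a block; averaging would smear a brief quiet moment against louder surroundings.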

Thinking about default settings:

-1 : codec block size=2304 samples; 4 analyses; 64, 256, 1024 & 4096 sample FFT lengths; noise_threshold_shift=-3.0; spreading_function_length=3;

-2 : codec block size=1152 samples; 3 analyses; 64, 256 & 1024 sample FFT lengths; noise_threshold_shift=-1.5; spreading_function_length=4;

-3 : codec block size=576 samples; 2 analyses; 64 & 1024 sample FFT lengths; noise_threshold_shift=-1.0; spreading_function_length=4;

or, should the spreading_function_length=n be replaced by the experimental 3 bin average to 3 bin max spreading?

I am stripping excess command line parameters out and will play with the temporal fft averaging / minimum algorithm.
Title: lossyWAV Development
Post by: GeSomeone on 2007-10-10 23:19:02
.. I'm afraid the fact that we don't have a lot of listeners in the testing phase is due not only to the difficulty of ABXing samples at the high quality already achieved, but to some extent also to the number of options whose purpose not everybody understands.

Looking from the sideline I add my 2 cents.
The only thing that really matters is: the default should be "the Right Thing?".

Having a scale like -1 -2 -3 also helps things appear simple, but first you must get -2 (the default) right, before worrying about the user interface. The others should be sufficiently different in size (or maybe speed) with a tradeoff in quality.

Right now you need the options to find out what the best strategy is. (e.g. why remove -skew when it has proven useful?)

There could be another reason, the concept of lossylossless might not appeal to many and is certainly hard to ABX once you reach a certain low noise level.


BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. I played around with replaygain a bit and (in a very sub-optimal listening environment, not even through a headphone) somewhere at about +40 dB replaygain the noise was masked by harddisk and fan noise. Not very useful I'm sure.

However, a foobar2000 DSP plugin has to be at the top of my wishlist - it would make it all *so* much easier, and would more easily preserve tagging information.

I was wondering if that would work, as in the foobar2000 0.9 DSP pipeline everything is passed as 32-bit floats. It might be no problem to remove bits, though.
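Bit removal in a float pipeline should indeed be possible: quantise the float sample on the grid the 16-bit output would use. A minimal sketch, assuming samples normalised to -1.0..1.0 (the scaling convention and function name are mine, not lossyWAV's or foobar2000's):

```python
def remove_bits_float(sample, bits_to_remove, source_bits=16):
    """Quantise a float sample in -1.0..1.0 to the grid that a 16-bit
    output with 'bits_to_remove' zeroed LSBs would occupy (assumed scaling)."""
    step = (1 << bits_to_remove) / float(1 << (source_bits - 1))
    return round(sample / step) * step
```

The catch is that the DSP output must eventually be converted back to 16-bit without further dither, or the zeroed bits get filled in again.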
Title: lossyWAV Development
Post by: Nick.C on 2007-10-10 23:25:20
...the default should be "the Right Thing?".
I wholeheartedly agree!

Having a scale like -1 -2 -3 also helps things appear simple, but first you must get -2 (the default) right, before worrying about the user interface. The others should be sufficiently different in size (or maybe speed) with a tradeoff in quality.
That's what we've tried to do; the settings in v0.3.5 are close to those arrived at with Halb27 and Wombat in this thread.

Right now you need the options to find out what the best strategy is. (e.g. why remove -skew when it has proven useful?)
Point taken; coincidentally, it hasn't yet been removed, and I won't remove it yet.

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. I played around with replaygain a bit and (in a very sub-optimal listening environment, not even through a headphone) somewhere at about +40 dB replaygain the noise was masked by harddisk and fan noise. Not very useful I'm sure.
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.

Thanks for the input!

Nick.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-11 00:01:37
lossyWAV alpha v0.3.6 attached. Superseded, see later.
Title: lossyWAV Development
Post by: halb27 on 2007-10-11 08:11:17
I take the point that critical bands have varying widths - but as someone who has only become recently aware of most of the concepts being used in the method, I am a bit at a loss as to how to proceed with implementing an element of the method which would take this into account.

Don't the coefficients returned by the FFT relate to frequencies which equidistantly cover the frequency range (linear partitioning)?
That's my maybe naive imagination.
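That imagination is correct: for an N-point FFT at sample rate fs, bin k sits at k*fs/N, so the bins are linearly spaced while critical bands widen with frequency. A one-line check:

```python
def bin_frequency(k, fft_length, sample_rate=44100.0):
    # FFT bins cover the spectrum equidistantly: f_k = k * fs / N
    return k * sample_rate / fft_length
```

So at fft_length 1024 and 44.1 kHz the bin spacing is about 43 Hz everywhere, which is why a fixed bin count per average spans very different fractions of a critical band at low versus high frequencies.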

lossyWAV alpha v0.3.6 attached. ...

Thank you.

a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
b) The spreading_length of -1/-2/-3 is now 3/4/4
    and simple averaging is done in the spreading function when not using advanced option -spread?
c) Noise threshold shift default of -1/-2/-3 is now -3.0/-1.5/-1.0?

What does -spread do?
Title: lossyWAV Development
Post by: Nick.C on 2007-10-11 09:03:56
a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
b) The spreading_length of -1/-2/-3 is now 3/4/4
    and simple averaging is done in the spreading function when not using advanced option -spread?
c) Noise threshold shift default of -1/-2/-3 is now -3.0/-1.5/-1.0?

What does -spread do?
a) Yes;
b) Yes;
c) Yes;

-spread carries out the spreading which varies with fft length. See code fragment in original thread.
Title: lossyWAV Development
Post by: halb27 on 2007-10-11 11:56:27
Wonderful.

So at the moment we're left with guruboolez' problem where he could abx a harpsichord sample.

@guruboolez: Are you out there?
It would be great if you could give your sample another try with this new version.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-11 12:17:18
@Halb27: Taking on board what you were saying about using Bark band width to determine how many bins to average, I will start to work out a new spreading option which does just that (inspired by one of J.M.Valin's papers).
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-11 12:29:58
I think you're heading down a slippery slope here!

First you'll find yourself averaging over 100 bins at the highest frequency, and before you know it you'll be implementing a proper psychoacoustic model to sort it all out!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-11 12:42:59
At least I don't think of a sophisticated implementation of this principle.

For the extreme cases, maybe a spreading_length of 1 at the low end and a spreading_length of 5 at the high end, or something like that, or maybe even less variation, depending on fft_length. The principle may be worth implementing for a low or moderate fft_length.

For quality reasons (my main concern at the moment) the low end is the critical range, as a spreading_length of 4 or even 3 may not be appropriate here in some cases. So taking this into account may be essential.

Allowing a very large spreading_length for the high frequency range is another story and might allow for a lower bitrate on average while keeping up excellent quality. At the moment however I see this rather as an option for the future.
Title: lossyWAV Development
Post by: GeSomeone on 2007-10-11 17:45:47
BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.

It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with replaygain. But it seems to prove that even from silence bits can be removed.

Update: I am now convinced the dithering from the foobar2000 converter was to blame. Even though it was set to "only dither lossy sources" it seemed to have kicked in somewhere (I marked lossyFLAC as a lossy destination). Retesting with it set to "Never Dither" was OK: no extra noise.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-11 18:47:00
BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.

It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with replaygain. But it seems to prove that even from silence bits can be removed.

It's not the normal dither - silence and near-silence should be (and with the MATLAB script, are) transparent irrespective of system gain or dither chosen, because lossyFLAC won't touch silence - it won't even re-dither it.

Nick, did you have "always dither" set to on in that version?

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-11 19:07:46

BTW. just a little anecdote. By accident I lossyFlac-ed a track that had only silence. Foobar2000 replaygained +64 dB and the noise was very clear to hear. ...
Which version of lossyWAV was that? Recent versions default to no dither, so this problem should not happen unless you dither.
Nick.

It was v0.3.5 with -skew 7 and nothing else.
I don't see it as a real problem though; it is more like a side effect in combination with replaygain. But it seems to prove that even from silence bits can be removed.

It's not the normal dither - silence and near-silence should be (and with the MATLAB script, are) transparent irrespective of system gain or dither chosen, because lossyFLAC won't touch silence - it won't even re-dither it.

Nick, did you have "always dither" set to on in that version?

Cheers,
David.

There shouldn't have been - it was removed at about v0.3.2.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-12 09:07:22
lossyWAV alpha v0.3.7 attached. Removed due to suspect spreading function and superseded by alpha v0.3.8 below.

"-spread" parameter now enables Bark spreading function rather than previous experimental 3 bin average to 3 bin max spreading function.

As stated in the original thread, for my 52 sample set:

WAV : 121.5MB;
FLAC : 68.2MB;
lossyWAV -2 : 39.5MB;
lossyWAV -2 -spread : 35.3MB;

The reassuring thing about the new spreading function is that those files from which (on simple 3 or 4 bin averaging) you would expect very few bits to be removed still have very few bits removed.
Title: lossyWAV Development
Post by: halb27 on 2007-10-12 09:32:42
I don't believe it: Nick, did you work throughout the night? How do you manage to be so fast?

A big, big thank you to you!

And the result looks very, very promising.

Sure I'll try my usual test samples with this new version using -spread.
Title: lossyWAV Development
Post by: TBeck on 2007-10-12 13:05:29
a) The codec blocksize of -1/-2/-3 is now 2304/1152/576?
a) Yes;

Could you please add a switch to set the block sizes to 2048/1024/512? I would like to evaluate the optimum encoder settings for TAK. Unfortunately TAK currently only supports block sizes which are powers of 2...

I am very impressed by your (and 2BDecided's) work! For me LossyFlac is an exciting new option. Thanks also to the hard working testers.

  Thomas
Title: lossyWAV Development
Post by: Nick.C on 2007-10-12 13:17:24
Could you please add a switch to set the block sizes to 2048/1024/512? I would like to evaluate the optimum encoder settings for TAK. Unfortunately TAK currently only supports block sizes which are powers of 2...


Thomas, I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.

I would also welcome any feedback whatsoever regarding my Bark spreading function - I can't hear anything wrong with the output, but I want independent critical input to determine whether it's worth keeping, needs work, or just needs to be trashed.

Nick.
Title: lossyWAV Development
Post by: GeSomeone on 2007-10-12 18:26:32
I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.

But the blocksizes of -tak could also be used with FLAC. 
Maybe it's somewhere in this thread, but where did the 576 size come from again?
Title: lossyWAV Development
Post by: halb27 on 2007-10-12 18:50:03
Sorry, but -spread as of this version isn't so good.

I've got used to producing only the .lossy.wav files via the command interpreter and watching the messages lossyWAV produces, and I was very astonished at the rather high bits-removed average of Atem-lied and keys_1644ds. So I was very curious about the audio quality.

Atem-lied is relatively good with so many bits removed (acceptable for -3 IMO), but I could abx it 9/10.
keys_1644ds however is bad (no abxing required).

So I guess the current implementation is a bit aggressive.

Nick: For experimentation maybe you can provide a parameter for the -spread option.
Something like:
One of the parameter values represents a spreading_length of 1 for low frequencies and a short or moderate fft_length, as well as a strong overall restriction like 4 on any spreading_length.
Another parameter value represents a spreading_length of 1 for low frequencies and a short fft_length, a spreading_length of 2 for low frequencies and a moderate fft_length, as well as a rather strong overall restriction like 6 on any spreading_length, but switches to a spreading_length of 6 only when fft_length is long.
These parameter values have quality in mind. More parameter values are welcome, of course, switching gradually from the pure quality target towards the efficiency target.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-12 19:44:32
Sorry, but -spread as of this version isn't so good.

I've got used to producing only the .lossy.wav files via the command interpreter and watching the messages lossyWAV produces, and I was very astonished at the rather high bits-removed average of Atem-lied and keys_1644ds. So I was very curious about the audio quality.

Atem-lied is relatively good with so many bits removed (acceptable for -3 IMO), but I could abx it 9/10.
keys_1644ds however is bad (no abxing required).

So I guess the current implementation is a bit aggressive.

Nick: For experimentation maybe you can provide a parameter for the -spread option.
Something like:
One of the parameter values represents a spreading_length of 1 for low frequencies and a short or moderate fft_length, as well as a strong overall restriction like 4 on any spreading_length.
Another parameter value represents a spreading_length of 1 for low frequencies and a short fft_length, a spreading_length of 2 for low frequencies and a moderate fft_length, as well as a rather strong overall restriction like 6 on any spreading_length, but switches to a spreading_length of 6 only when fft_length is long.
These parameter values have quality in mind. More parameter values are welcome, of course, switching gradually from the pure quality target towards the efficiency target.

Before abandoning the Bark averaging method, I think that it should be expanded. At the moment each of the first 25 Bark ranges (0 to 24) is averaged, then the minimum average value is taken as the value from which to calculate bits to remove. I feel that this is too coarse and the granularity should be made finer by using half or even quarter Bark averaging. I will have a think about this and post v0.3.8 soon.

The -spread in v0.3.6 used 3 bin averaging at short FFT lengths (<=64 samples) and gradually changed to 3 bin maximum at long FFT lengths (>=1024 samples). This seems to be closer to what you mention above (although not exactly).

Thanks for the listening time!

I will enable the "-flac" and "-tak" parameters tonight which will set the codec_block_size for FLAC to 2304/1152/576 and for TAK to 2048/1024/512.
But the blocksizes of -tak could also be used with FLAC. 
Maybe it's somewhere in this thread, but where did the 576 size come from again?

Maybe what is required is a CD sector related codec_block_size (2304/1152/576 samples) or a power of two equivalent (2048/1024/512 samples). This could be easily implemented by a "-cd" or "-CD" switch to change from power of two blocks to CD sector multiple blocks. I will incorporate this for v0.3.8.

Thanks for the input!
Title: lossyWAV Development
Post by: halb27 on 2007-10-12 20:08:11
I see: you immediately did the whole thing and averaged over an entire critical band.

Not exactly what I have in mind.
I wouldn't bring the critical band as such so much into focus. Guess that's what 2Bdecided is afraid of.
I'd rather have the original averaging in primary focus, but with (cautious) corrections according to the widths of the critical bands.
Qualitywise I think it is essential to concentrate on the lower spectrum and use the critical band idea to hold the spreading_length very small when only one or few bins fall into a critical band.
With this it's not even necessary to look at every single critical band, but just do the averaging differently within larger frequency ranges (for instance for low to moderate fft_length use a spreading_length of 1 below ~ 800 Hz, 2 in the ~ 800-2000 Hz range, 3 in the ~ 2-8 kHz range, and 4 for the ~ 8+ kHz range, and increase these spreading_lengths very softly with increasing fft_length).

This is all with quality in mind.
Once high quality is settled (we still have an open problem with guruboolez' sample) we might become less cautious and try a bit more adventurous tactics.
Title: lossyWAV Development
Post by: halb27 on 2007-10-12 20:21:42
...Maybe it's somewhere in this thread, but where did the 576 size come from again?

Maybe what is required is a CD sector related codec_block_size (2304/1152/576 samples) or a power of two equivalent (2048/1024/512 samples). This could be easily implemented by a "-cd" or "-CD" switch to change from power of two blocks to CD sector multiple blocks. I will incorporate this for v0.3.8.

This would break the idea of -flac, -tak, etc. as targeting specific lossless encoders. Why do you want to do that?
-tak does everything that is needed.
I think GeSomeone's question is about why the blocksizes are currently 2304/1152/576 and maybe why they should be like that for -flac.
According to the FLAC documentation it looks like the FLAC blocksize should be a multiple of 576, but this is not so, as I have used FLAC with a blocksize of 1024. That is why I suggested using a default blocksize of 1024 with -1, -2, and -3 when not using -flac, -tak, etc., especially as my experiments didn't show a significant saving in bitrate when using 576 instead of 1024.

Anyway I welcome the activation of -tak, -flac, etc.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-12 20:30:05
From the FLAC format page:

Code:
Block size in inter-channel samples:

    * 0000 : reserved
    * 0001 : 192 samples
    * 0010-0101 : 576 * (2^(n-2)) samples, i.e. 576/1152/2304/4608
    * 0110 : get 8 bit (blocksize-1) from end of header
    * 0111 : get 16 bit (blocksize-1) from end of header
    * 1000-1111 : 256 * (2^(n-8)) samples, i.e. 256/512/1024/2048/4096/8192/16384/32768

I like 576 because it increases the bits_to_remove by processing over a shorter time frame. If the consensus is that the standard codec_block_size should be 1024 samples, then so be it.

The reason that -flac and -tak have not yet been activated is that, basically, there are no codec-specific settings yet. The only reason to implement them now would be because of the codec_block_size issue.
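The table quoted above can be turned into a small decoder for the 4-bit block-size field, which makes it easy to see which sizes FLAC can encode compactly (the function name is mine):

```python
def flac_block_size(code):
    """Decode the 4-bit block-size field of a FLAC frame header,
    per the table quoted above."""
    if code == 0b0001:
        return 192
    if 0b0010 <= code <= 0b0101:
        return 576 * 2 ** (code - 2)      # 576 / 1152 / 2304 / 4608
    if 0b1000 <= code <= 0b1111:
        return 256 * 2 ** (code - 8)      # 256 / 512 / ... / 32768
    return None  # reserved, or block size stored explicitly at end of header
```

So both the CD-sector-related sizes (576 multiples) and the power-of-two sizes have dedicated codes; other sizes need the 8/16-bit explicit forms.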
Title: lossyWAV Development
Post by: halb27 on 2007-10-12 20:53:50
The reason that -flac and -tak have not yet been activated is that, basically, there are no codec-specific settings yet. The only reason to implement them now would be because of the codec_block_size issue.

Yes, but it already brings certainty to any user, whatever lossless codec he uses. By using -tak, a TAK user knows lossyWAV will work fine with TAK. There's no need IMO to think of -tak etc. as super-optimized versions for the specific codec. Things start with the codec blocksize.

As for the blocksize without a target codec option I still think it's good to default it to 1024 universally. Clear thing, easy to memorize, and should also do it efficiently in any situation known so far. Optimizing blocksize is then the clear task of -flac, etc. However it's not really of primary concern. To me it's fine also with the way it is.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-12 21:13:30
Defaulting to 1024 for all quality settings will be implemented in v0.3.8.
Title: lossyWAV Development
Post by: halb27 on 2007-10-12 21:29:30
Thank you.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-12 22:21:49
I see: you immediately did the whole thing and averaged over an entire critical band.

Not exactly what I have in mind.
I wouldn't bring the critical band as such so much into focus. Guess that's what 2Bdecided is afraid of.
I'd rather have the original averaging in primary focus, but with (cautious) corrections according to the widths of the critical bands.
Qualitywise I think it is essential to concentrate on the lower spectrum and use the critical band idea to hold the spreading_length very small when only one or few bins fall into a critical band.
With this it's not even necessary to look at every single critical band, but just do the averaging differently within larger frequency ranges (for instance for low to moderate fft_length use a spreading_length of 1 below ~ 800 Hz, 2 in the ~ 800-2000 Hz range, 3 in the ~ 2-8 kHz range, and 4 for the ~ 8+ kHz range, and increase these spreading_lengths very softly with increasing fft_length).

This is all with quality in mind.
Once high quality is settled (we still have an open problem with guruboolez' sample) we might become less cautious and try a bit more adventurous tactics.
Oops - missed this post entirely. I'm getting disillusioned with my approach to Bark averaging - will park it and start on something akin to what you've just mentioned, i.e. spreading_function_lengths increase with both frequency and fft_length. Looking at the geometric fft_length increase, should the spreading_function_length also increase in that manner, i.e. sfl[n+1]:=sfl[n]*2; or should it increase more slowly?
Title: lossyWAV Development
Post by: halb27 on 2007-10-13 00:44:41
More slowly. At the moment I think it would be good to keep the spreading_length pretty much in the region we're used to, even for long FFT lengths. The spreading length must not increase with each increase of fft_length.
As with the frequency dependency of the spreading length, I am thinking of only a very rough dependency on fft_length.
Something like: use the frequency dependency I mentioned (spreading length 1 to 4 according to a rough frequency classification - let's call this the basic frequency dependency rule) for an fft_length <= 256, add 1 to the spreading length of the basic frequency dependency rule for an fft_length > 256 but <= 1024, and add 2 for an fft_length > 1024. Maybe add 3 to the spreading length of the basic frequency dependency rule for extremely long FFTs.

You see: even with highest frequency and longest fft length a spreading length of 6 or 7 as a maximum.

I guess this is a bit too conservative, but as long as we don't know it's better to play it safe. Variations can be done later (or by means of a -spread parameter value).
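The rule sketched above (the basic frequency dependency plus an fft_length bump, topping out at 6 or 7) might look like this; all thresholds are the approximate ones from the posts in this thread, not settled values:

```python
def spreading_length(freq_hz, fft_length):
    """Basic frequency dependency rule plus an FFT-length bump (sketch)."""
    if freq_hz < 800.0:
        base = 1            # only one or a few bins per critical band down here
    elif freq_hz < 2000.0:
        base = 2
    elif freq_hz < 8000.0:
        base = 3
    else:
        base = 4
    if fft_length <= 256:
        bump = 0
    elif fft_length <= 1024:
        bump = 1
    else:
        bump = 2            # "maybe add 3" for extremely long FFTs
    return base + bump
```

With these numbers the maximum is 6 (highest frequency, longest FFT), matching the conservative cap described above.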
Title: lossyWAV Development
Post by: Nick.C on 2007-10-14 22:45:47
lossyWAV alpha v0.3.8 attached. Superseded.

Having made an abortive attempt at Bark-related bit reduction determination, I have been changing the spreading method a bit, firstly having reverted to the original FFT bin averaging (3 or 4 bins depending on quality level). As can be seen below, I have introduced two elements to the method: firstly, averaging 3 bins below 3.7kHz and 4 bins above; secondly, using the "square mean root" value as a slightly more conservative result (compared to simple averaging).

Reducing to very few bins (i.e. 1 or 2) drastically reduces the bits_to_remove and has not been implemented.

Code:
lossyWAV alpha v0.3.8 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C & Halb27 from a script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

Advanced Options:

-spread <n>   select spreading method : 0<=n<=3; default=0
              0 = fft bin averaging : 3 or 4 bins, (original method);
              1 = fft bin averaging : 3 bins below 3.7kHz, 4 bins above;
              2 = fft bin square mean root : 4 bins;
              3 = fft bin square mean root : 3 bins below 3.7kHz, 4 bins above
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither       dither output using triangular dither; default=off
-noclip       clipping prevention amplitude reduction; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode
-below        set process priority to below normal.
-low          set process priority to low.
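The "-1 bit = -6.0206dB" equivalence in the -nts help text is just the amplitude ratio of one bit; a quick check:

```python
import math

# One bit of sample resolution corresponds to a factor of 2 in amplitude,
# i.e. 20*log10(2) dB - the 6.0206dB per bit quoted in the help text.
db_per_bit = 20 * math.log10(2)
print(round(db_per_bit, 4))  # 6.0206
```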
Title: lossyWAV Development
Post by: bryant on 2007-10-15 05:57:14
I'm not sure how useful this is, or whether it makes any sense to integrate into lossyWAV, but I have created a “smart” normalization program that I think might fix one of the troubling issues of lossyWAV (at least for me). It might even work well in other situations where normalization is desired, although I don't know enough about those to say.

Most normalization programs work by applying a scaling factor on every audio sample such that a maximum value sample (i.e., -32768/+32767) is reduced to some desired lower value. After applying the scale factor, they may or may not apply dither and noise-shaping (they probably should, but most I've seen don't). This works great at normal audio levels, but can cause trouble at very low levels. The problem is that by using various forms of noise shaping, well produced CDs contain information below the LSB. To preserve this information (and the characteristics of the original noise floor spectrum) it is important to preserve the exact sample values at low levels.

This suggests an alternative algorithm that maps low-level samples to the output exactly, but then goes non-linear at higher values to ensure that the desired peak limit is not exceeded (this is sometimes called soft clipping). This fixes the low-level sample problem, however soft-clipping introduces unacceptably high levels of harmonic distortion in full-scale signals.

The algorithm I chose for this program combines the two methods by calculating a running RMS level (with attack and decay) and using that to determine the ideal transfer function. At low levels it maps samples without modification to the output (with rogue high samples being softly clipped). At high levels it uses the simple scaling factor (where there's enough signal that dither and noise-shaping are not needed). In between the high and low level areas is a 12 dB transition zone where the program linearly interpolates between the two methods based on the position in the zone. In this transition zone a small amount of odd harmonic distortion is added to the signal, but it's very low in level.
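A rough sketch of that combined method follows; all constants here (scale factor, zone thresholds, attack/decay coefficients) are invented for illustration and are not taken from the attached program:

```python
import math

SCALE = 0.5                       # desired peak reduction (example: -6 dB)
LOW_DB, HIGH_DB = -60.0, -48.0    # 12 dB transition zone (assumed position)
ATTACK, DECAY = 0.01, 0.001       # one-pole RMS smoothing (assumed values)

def normalize(samples):
    """Blend exact pass-through and plain scaling via a running RMS level.

    Samples are floats in [-1.0, 1.0]; the running mean-square uses a
    faster coefficient when the signal rises (attack) than when it falls.
    """
    rms_sq = 0.0
    out = []
    for x in samples:
        coef = ATTACK if x * x > rms_sq else DECAY
        rms_sq += coef * (x * x - rms_sq)
        level_db = 10.0 * math.log10(rms_sq + 1e-12)
        if level_db <= LOW_DB:
            blend = 0.0           # quiet: map samples through exactly
        elif level_db >= HIGH_DB:
            blend = 1.0           # loud: plain scaling
        else:                     # transition zone: linear interpolation
            blend = (level_db - LOW_DB) / (HIGH_DB - LOW_DB)
        y = (1.0 - blend) * x + blend * (SCALE * x)
        out.append(max(-1.0, min(1.0, y)))  # crude stand-in for soft clipping
    return out
```

The real program presumably uses a proper soft-clip curve for rogue high samples rather than the hard clamp above; this only shows the RMS-driven blend between exact pass-through and scaling.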

I am attaching a zip file with the program source and a Windows executable (the program compiles fine on Ubuntu Linux and probably most others). This has not been tested too much (especially in error conditions) so be careful!

David
Title: lossyWAV Development
Post by: Nick.C on 2007-10-15 08:01:23
Thanks for the code - I will certainly have a look at it to see how you did it!

On amplitude reduction, lossyWAV no longer reduces amplitude by default - the user has to specify the "-noclip" parameter.

Many thanks,

Nick.
Title: lossyWAV Development
Post by: halb27 on 2007-10-15 08:59:29
Reducing to very few bins (i.e. 1 or 2) drastically reduces the bits_to_remove and has not been implemented.

Thank you for the new version. Will try it out as soon as possible.
If averaging over 1 or 2 bins yields inappropriate bitrates, your current approach is the most appropriate, I think.
Just for clarity:
a) is bits_to_remove too low also when applied to very short fft lengths when averaging over 1 or 2 bins in the frequency range below ~ 700 Hz?
    In the end you must have done something like that when averaging over entire critical bands - bits_to_remove was not too low then.
b) is it also not worthwhile averaging over, say, 2 bins in the low frequency range with very short fft lengths when considering it being applied to quality mode -1?

Another question as -tak etc. is not enabled yet:
Is codec blocksize now a constant 1024 with any quality mode?

BTW, as you are doing the hard work: please remove me from the author list of lossyWav.exe. It's not appropriate. I'm glad I could contribute a bit with the wavIO unit, but in the end it's an absolutely minor contribution. Of course I will continue to maintain wavIO, so feel free to tell me about any changes you would like to have realised.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-15 12:09:22
"-tak" is not yet enabled, default codec_block_size is 1024 samples for all quality levels as previously discussed.

I am looking at other permutations of spreading, including one which has 3 intermediate frequency splits and averages as follows:

20Hz to 800Hz : 2 bins;
800Hz to 3.7kHz : 3 bins;
3.7kHz to 8kHz : 4 bins;
8kHz to 16kHz : 5 bins;
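That band scheme maps straightforwardly to a bins-per-band lookup; a minimal sketch (band edges as listed above, everything else - including out-of-range handling - is my own assumption):

```python
def bins_to_average(freq_hz):
    # Band edges follow the list above; behaviour below 20Hz / above
    # 16kHz is not specified there, so this fall-through is assumed.
    if freq_hz < 800:
        return 2
    elif freq_hz < 3700:
        return 3
    elif freq_hz < 8000:
        return 4
    else:
        return 5

print(bins_to_average(1000))  # 3
```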

I'll let you know how this one works out.

Nick.
Title: lossyWAV Development
Post by: halb27 on 2007-10-15 12:35:10
Sounds good.

Please don't see it as a bad thing in case bits_to_remove should go down a bit.
After all we are still left with guruboolez' sample he could abx.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-15 14:27:29
lossyWAV alpha v0.3.9 attached. Superseded.

Default spreading method made slightly more conservative;
Code rationalised for spreading methods 1 to 3;
Spreading method 4 introduced, 2 fft bin averaging 20Hz to 800Hz; 3 fft bin averaging 800Hz to 3.7kHz; 4 bin averaging 3.7kHz to 16kHz. (5 fft bin averaging 8kHz to 16kHz was not successful - too many bits removed).

Code: [Select]
lossyWAV alpha v0.3.9 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

Advanced Options:

-spread <n>   select spreading method : 0<=n<=4; default=0
              0 = fft bin averaging : 3 or 4 bins, (less aggressive than orig.);
              1 = fft bin averaging : 3 bins below 3.7kHz, 4 bins above;
              2 = fft bin square mean root : 4 bins;
              3 = fft bin square mean root : 3 bins below 3.7kHz, 4 bins above
              4 = fft bin averaging : 2 bins from 20Hz to 800Hz; 3 bins from
                  800Hz to 3.7kHz; 4 bins from 3.7kHz to 16kHz.
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither       dither output using triangular dither; default=off
-noclip       clipping prevention amplitude reduction; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode
-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Title: lossyWAV Development
Post by: halb27 on 2007-10-15 14:31:15
Wonderful. Thanks a lot.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-15 21:45:44
I have been processing permutations with v0.3.10 (identical to v0.3.9, only faster) and -spread 4 seems to be a candidate for default spreading function. However, I feel that the 800Hz / 3.7kHz / 8kHz intermediate steps might need to be moved to more suitable points in the frequency range between 20Hz and 16kHz.

Another thing I need advice with is licensing - portions of the code are (heavily modified) LGPL, so LGPL seems to be the way to go, however, I don't know exactly what I need to add to the .exe or license.txt file to enact it. As well as that, the method is David Robinson's implementation of an idea - all I have done is transcode and tweak a bit.
Title: lossyWAV Development
Post by: halb27 on 2007-10-16 07:23:30
Looks like we have to be pretty resistant towards frustration (guess that's true for everybody working on lossy codecs):

Sorry, but using -spread 4 I just abxed keys_1644ds.wav 8/10 (in a very quick test cause I have to go to work now. Guess it's easy to abx it 10/10). The encoded result isn't bad but it sounds different - like volume is a bit lower at the beginning (though replaygain values are identical) especially in the lower frequency range. Based on a very quick test I'd say this is not the case without the spreading option and not the case with -spread 3 though with these settings bits_to_remove is higher.

Perhaps a small implementation error has found its way into the code, because -spread 4 should be more conservative than -spread 3 or no -spread at all - at least if what lossyWAV says about -spread 4 is correct: 4 bins average from 3.7kHz to 16kHz.
This is a bit of a contradiction, however, with your talk of 'the 800Hz / 3.7kHz / 8kHz intermediate steps'. So are you doing a 5 bin averaging above 8 kHz as you wrote before? (I guess that is no problem, but at the moment it is about finding out what might cause the problem with keys.)
But maybe the problem is with the start as that's the place where it's rather easily audible.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-16 08:06:17
Sorry, but using -spread 4 I just abxed keys_1644ds.wav 8/10 (in a very quick test cause I have to go to work now. Guess it's easy to abx it 10/10). The encoded result isn't bad but it sounds different - like volume is a bit lower at the beginning (though replaygain values are identical) especially in the lower frequency range. Based on a very quick test I'd say this is not the case without the spreading option and not the case with -spread 3 though with these settings bits_to_remove is higher.
-spread 4 uses simple averaging, -spread 2 & 3 use square-mean-root (SMR). SMR is inherently more conservative, e.g. average[1,4,9]=4.67; SMR[1,4,9]=4.00.
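The "square mean root" here is literally the square of the mean of the square roots; a quick check of the figures quoted above:

```python
import math

def average(values):
    return sum(values) / len(values)

def square_mean_root(values):
    # Square of the mean of the square roots - never larger than the
    # arithmetic mean for non-negative inputs, hence more conservative.
    return (sum(math.sqrt(v) for v in values) / len(values)) ** 2

print(round(average([1, 4, 9]), 2))  # 4.67
print(square_mean_root([1, 4, 9]))   # 4.0
```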

I don't understand about -spread 4 being worse than default though, it should *only* remove more bits.
Perhaps a small implementation error has found its way into the code, because -spread 4 should be more conservative than -spread 3 or no -spread at all - at least if what lossyWAV says about -spread 4 is correct: 4 bins average from 3.7kHz to 16kHz.
Always a possibility of a bug in the code  .
This is a bit of a contradiction, however, with your talk of 'the 800Hz / 3.7kHz / 8kHz intermediate steps'. So are you doing a 5 bin averaging above 8 kHz as you wrote before? (I guess that is no problem, but at the moment it is about finding out what might cause the problem with keys.)
But maybe the problem is with the start as that's the place where it's rather easily audible.
Currently only 4 bins are averaged above 3.7kHz, however I am not ruling out re-introducing the 8kHz (move it out to 12kHz?) intermediate. I will dig deeper into keys and re-examine the code to try to spot bugs (they're there, I just haven't found them).

Code: [Select]
lossyWAV alpha v0.3.9 keys_1644ds.wav 

-spread 0
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  2  3  4  4  2  2  3  2  2  3  3  3  3  2  2  2  42
   16    0.37s.  2  2  3  5  3  3  2  2  2  3  3  3  4  3  3  2  45
   32    0.74s.  2  3  4  4  3  2  2  3  2  2  -  -  -  -  -  -  27
====================================================================
Average    : 2.7143 bits; [114/42; 5.16x; CBS=1024]

-spread 1
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  2  3  4  3  2  2  2  2  2  3  3  3  2  2  1  1  37
   16    0.37s.  2  2  3  4  3  2  2  2  2  3  3  3  3  3  2  1  40
   32    0.74s.  2  3  4  4  3  1  2  2  2  2  -  -  -  -  -  -  25
====================================================================
Average    : 2.4286 bits; [102/42; 5.13x; CBS=1024]

-spread 2
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  2  3  4  3  2  2  3  1  1  3  3  3  3  2  2  2  39
   16    0.37s.  2  2  3  4  3  3  2  2  2  3  3  3  3  3  3  2  43
   32    0.74s.  2  3  4  4  3  2  2  3  2  2  -  -  -  -  -  -  27
====================================================================
Average    : 2.5952 bits; [109/42; 5.11x; CBS=1024]

-spread 3
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  2  3  4  3  2  2  2  1  1  3  3  3  2  2  1  1  35
   16    0.37s.  2  2  3  4  3  2  2  2  1  3  3  3  3  3  2  1  39
   32    0.74s.  2  3  4  4  3  1  1  2  2  2  -  -  -  -  -  -  24
====================================================================
Average    : 2.3333 bits; [98/42; 4.93x; CBS=1024]

-spread 4
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  1  3  4  3  1  1  1  0  0  2  2  2  1  1  1  1  24
   16    0.37s.  2  2  2  4  3  2  1  2  1  3  1  2  3  3  2  1  34
   32    0.74s.  1  3  3  4  3  0  1  1  2  2  -  -  -  -  -  -  20
====================================================================
Average    : 1.8571 bits; [78/42; 5.36x; CBS=1024]


Well, the results are in: -spread 4 does not remove more bits for any codec_block than -spread 3 or 0.
Title: lossyWAV Development
Post by: halb27 on 2007-10-16 08:43:38
I see. Strange. I will test the no -spread version more carefully this evening. Maybe my impression was wrong this morning - I was pretty much in a hurry.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-16 10:57:41
Another thing I need advice with is licensing - portions of the code are (heavily modified) LGPL, so LGPL seems to be the way to go, however, I don't know exactly what I need to add to the .exe or license.txt file no enact it. As well as that, the method is David Robinson's implementation of an idea - all I have done is transcode and tweak a bit.
LGPL is fine by me. I don't know how you enact it or how legally binding it is - I'm sure Google can answer the first issue!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-16 19:07:51
I tested keys_1644ds.wav again using -spread 4: I tried several times, also with different head/earphones, but couldn't abx it.

Sorry for the confusion. I have no idea what went on this morning. Apart from the not-too-bad abx result of 8/10, there was a pretty clearly distinguishable difference in loudness this morning to me. Must have been imagination.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-16 20:25:05
I tested keys_1644ds.wav again using -spread 4: I tried several times, also with different head/earphones, but couldn't abx it.

Sorry for the confusion. I have no idea what went on this morning. Apart from the not-too-bad abx result of 8/10, there was a pretty clearly distinguishable difference in loudness this morning to me. Must have been imagination.
That's reassuring...... I have tweaked the -spread 4 method anyway. It's now

20Hz > 2 Bin > 800Hz > 3 Bin > 6kHz > 4 Bin > 13.5kHz > 5 Bin > 16kHz.

But, I'll add it as -spread 5.

Methods 2 and 3 are quite nice in terms of conservatism, but they are slower than 0, 1 & 4. Unless anyone says otherwise, they will be removed.

Thanks David - I'll start to compose the necessary text and run it past you before inclusion in the .exe and .zip files.
Title: lossyWAV Development
Post by: halb27 on 2007-10-17 09:14:45
Sounds good.
Just a remark as I see one of the corner frequencies is 6 kHz:
I remember the fact that we are most sensitive to noise in the 6 kHz region with sensitivity quickly dropping beyond. So maybe it's a good idea to use something like 7 kHz instead of 6.
Just an idea.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-17 10:08:19
Just a remark as I see one of the corner frequencies is 6 kHz:
I remember the fact that we are most sensitive to noise in the 6 kHz region with sensitivity quickly dropping beyond. So maybe it's a good idea to use something like 7 kHz instead of 6.
Will do.
Title: lossyWAV Development
Post by: halb27 on 2007-10-17 10:33:39
Maybe you want to look it up in Wikipedia under 'Audio noise measurement'.
I just did to brush up my memory. Judging from this article it may even be advantageous to go a bit beyond 7 kHz with the corner frequency, as sensitivity is still pretty flat around the 6 kHz peak.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-17 13:05:00
I've tweaked the intermediate frequencies again:

20Hz > 2 Bin > 1.38kHz > 3 Bin > 3.45kHz > 3 Bin > 8.27kHz > 4 Bin > 13.8kHz > 5 Bin > 16.5kHz.

Other than 20Hz these frequencies relate to integer bins for a 64 sample FFT at 44.1kHz.
Superseded.
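The integer-bin claim is easy to verify: a 64-sample FFT at 44.1kHz has a bin width of 44100/64 ≈ 689.06Hz, and bins 2, 5, 12, 20 and 24 land on the quoted corner frequencies:

```python
# Bin width of a 64-sample FFT at 44.1kHz, and the bin numbers the
# quoted corner frequencies correspond to.
bin_width = 44100 / 64              # ~689.06 Hz
for k in (2, 5, 12, 20, 24):
    print(k, round(k * bin_width))  # 1378, 3445, 8269, 13781, 16538 Hz
```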

-spread 0 and 1 are the only options, 1 being equivalent to tweaked 4 from alpha v0.3.9
Code: [Select]
lossyWAV alpha v0.3.10 : WAV file bit depth reduction method by 2Bdecided.
Transcoded to Delphi by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0<=n<=0.0, default -1.5dB)
              (reduces overall bits to remove, -1 bit = -6.0206dB)
-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists.

Advanced Options:

-spread <n>   select spreading method : 0 or 1; default=0
              0 = fft bin averaging : 3 or 4 bins, (less aggressive than orig.);
              1 = fft bin averaging : 2 bins from 20Hz to 1.38kHz; 3 bins from
                  1.38kHz to 8.27kHz; 4 bins from 8.27kHz to 13.8kHz; 5 Bins
                  from 13.8kHz to 16.5kHz.
-skew <n>     skew results of fft analyses by n dB (0.0<=n<=12.0, default=0.0)
              with a (sin-1) shaping over the frequency range 20Hz to 3.7kHz.
              (artificially decrease low frequency bins to take into account
              higher SNR requirements at low frequencies)

-dither       dither output using triangular dither; default=off
-noclip       clipping prevention amplitude reduction; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode
-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Title: lossyWAV Development
Post by: halb27 on 2007-10-17 13:36:15
Wonderful. Though all these new things are pure heuristics, I think they are good at keeping things defensive, and in the end we want the perceived quality of lossless at a probability of close to 1 while keeping bitrate significantly below that of lossless.

I will try out -spread 1 this evening.

As David Bryant addressed the potential clipping issue, just another thought about a simple clipping prevention scheme:
When clipping occurs due to bit depth reduction: reduce the number of bits to remove until no clipping occurs.
No quality issue, no pre-scanning of the entire track, no serious negative impact on bitrate (if my guess is correct: clipping usually occurs only on rare occasions).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-17 14:41:48
As David Bryant addressed the potential clipping issue, just another thought about a simple clipping prevention scheme:
When clipping occurs due to bit depth reduction: reduce the number of bits to remove until no clipping occurs.
No quality issue, no pre-scanning of the entire track, no serious negative impact on bitrate (if my guess is correct: clipping usually occurs only on rare occasions).


I like the idea, although, when a value rounds to 32768 I currently reduce it to int(32767/(2^bits_to_remove))*(2^bits_to_remove). I'll have a look and revert.
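A sketch of that clip-guarded rounding for the 16-bit case (Python; the function name and the handling of the negative bound are my own):

```python
def quantise(sample, bits_to_remove):
    # Round a 16-bit sample to a multiple of 2**bits_to_remove; if the
    # rounding would overshoot 32767, clamp to the largest representable
    # multiple - the int(32767/(2^b))*(2^b) expression from the post.
    step = 1 << bits_to_remove
    q = round(sample / step) * step
    if q > 32767:
        q = (32767 // step) * step   # == (32767 shr b) shl b
    elif q < -32768:
        q = -32768                   # already a multiple of any power of 2
    return q

print(quantise(32760, 4))  # would round to 32768 -> clamped to 32752
```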
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-17 14:52:03
When clipping occurs due to bit depth reduction: reduce the number of bits to remove until no clipping occurs.
It isn't always useful. If you have a track which is constantly near clipping (most modern pop music) lossyFLAC should make huge gains (throw away many bits), but if it has to back off the bit reduction to prevent clipping, it'll throw away a few or none.

If you have a track which is well away from clipping, it doesn't matter anyway. So this approach only works well when you have a track which is quite close to clipping.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-17 19:24:19
Sure, with modern compressed music near or at the clipping level it may be that this approach isn't good, because bits_to_remove may get very low often. But did you try? IMO that's not necessarily so, or at least not necessarily typical behavior with such music.

I'd welcome giving it a try. Bitrate will go up on average with modern music, I'm sure, but who knows, maybe the extent to which it does is negligible.

Thinking of other kinds of music that are not highly compressed, clipping may also occur in some cases via the bit depth reduction. In these cases this simple approach would be valuable, I think.

Anyway as we certainly don't want to have the user differentiate too much: everything depends on the average behavior with music published the loudness war way.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-17 19:49:26
On the licensing front I've been in touch with the copyright holder of TPMAT036 and he is happy that lossyWAV is released under the LGPL.

@Halb27: Are you happy to release your wavIO unit under the LGPL?

On the clipping front: If a sample clips, the old method just set it to 32767 / -32768 accordingly. This was a minimalist response to those few samples which exceed the upper or lower bound on rounding / dithering.

With no dither, a simple amplitude reduction to 31.5/32 of the original value should stop clipping on rounding (32768 > (32767 shr bits_to_remove) shl bits_to_remove).

I will implement a -noclip 2 (existing -noclip goes to -noclip 1) parameter which will try the iterative approach mentioned previously.

Nick.

[edit] Question: should I reduce bits_to_remove if *any* sample clips, or only if adjacent samples clip? Also, how many samples need to clip before bits_to_remove is reduced? [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-10-17 23:11:10
@Halb27: Are you happy to release your wavIO unit under the LGPL?

Yes, I am. But please feel free to think and behave as if you are the one who developed lossyWav.exe in its current form. In fact you did 99.9% of the job, so we shouldn't make things so complicated. No need also to mention me at all in the lossyWav message text. If you want to make any changes to wavIO, feel free to do it (but please send me the modified version), and if you want me to do some modifications or extensions, let me know and I'll do it.

As for the threshold for the number of clipping samples before the bit depth reduction is modified to prevent clipping, I have no idea. I thought of doing it immediately whenever it happens, but I am also afraid that bitrate will go up too much. But I also have no idea whether it really does in an unacceptable way.
Maybe a special -noclipexp option with a parameter 0...1024 for the number of clipping samples which are ignored before the clipping prevention method is started can help us find out.
Luckily no listening test is needed to answer the question how low we can go with the parameter value and still find the 'bitrate bloat' acceptable.

Adjacent clipping samples are more of concern than scattered clipping samples but for a first result I think we shouldn't make it so complicated.

Sorry I wasn't able to test the current lossyWav version this evening. I had a lot of trouble with my system (shouldn't have installed new software today).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-17 23:19:15
@Halb27: Are you happy to release your wavIO unit under the LGPL?

Yes, I am. But please feel free to think and behave as if you are the one who developed lossyWav.exe in its current form. In fact you did 99.9% of the job, so we shouldn't make things so complicated. No need also to mention me at all in the lossyWav message text. If you want to make any changes to wavIO, feel free to do it (but please send me the modified version), and if you want me to do some modifications or extensions, let me know and I'll do it.
Thanks very much - however, you were there, so you will be mentioned.

As for the threshold for the number of clipping samples before the bit depth reduction is modified to prevent clipping, I have no idea. I thought of doing it immediately whenever it happens, but I am also afraid that bitrate will go up too much. But I also have no idea whether it really does in an unacceptable way.
I have attached alpha v0.3.11 which has "if any samples clip, reduce bits_to_remove and repeat" implemented. Not too bad an effect on the sample set.
Maybe a special -noclipexp option with a parameter 0...1024 for the number of clipping samples which are ignored before the clipping prevention method is started can help us find out.
Maybe - but that would be samples per codec_block.....
Luckily no listening test is needed to answer the question how low we can go with the parameter value and still find the 'bitrate bloat' acceptable.

Adjacent clipping samples are more of concern than scattered clipping samples but for a first result I think we shouldn't make it so complicated.
Probably in the next day or so though.....

Sorry I wasn't able to test the current lossyWav version this evening. I had a lot of trouble with my system (shouldn't have installed new software today).
Nothing serious, I hope!

Nick.

[edit]attachment removed, superseded.[/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-10-18 09:33:39
I have attached alpha v0.3.11 which has "if any samples clip, reduce bits_to_remove and repeat" implemented. Not too bad an effect on the sample set.

Wonderful, and a very promising result.
Maybe a special -noclipexp option with a parameter 0...1024 for the number of clipping samples which are ignored before the clipping prevention method is started can help us find out.
Maybe - but that would be samples per codec_block.....
Sure. But if your results with the current version turn out to have a wide scope with clipping-critical material, maybe this isn't necessary at all. Qualitywise this would be the perfect solution.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-18 12:51:21
I have attached alpha v0.3.11 which has "if any samples clip, reduce bits_to_remove and repeat" implemented. Not too bad an effect on the sample set.
Wonderful, and a very promising result.
Maybe a special -noclipexp option with a parameter 0...1024 for the number of clipping samples which are ignored before the clipping prevention method is started can help us find out.
Maybe - but that would be samples per codec_block.....
Sure. But if your results with the current version turn out to have a wide scope with clipping-critical material, maybe this isn't necessary at all. Qualitywise this would be the perfect solution.
I've been developing the clipping prevention method further - it's now faster. Also, the minimum value of each fft analysis is being determined for use in determining bits_to_remove - should some thought also be given to the difference between the maximum value of each fft analysis and the minimum? This difference is to some extent a signal-to-noise ratio, as removing bits adds noise. If the difference should be taken into consideration, what would be a reasonable minimum difference? I'll start playing with it and try 12dB initially - we'll see what happens......
Title: lossyWAV Development
Post by: halb27 on 2007-10-18 13:38:45
As far as I understand the lossyWav method I can't see a meaning in the difference. But this doesn't say much. Maybe 2Bdecided can bring some light into it.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-18 13:59:46
As far as I understand the lossyWav method I can't see a meaning in the difference. But this doesn't say much. Maybe 2Bdecided can bring some light into it.
Maximum was the wrong choice; I'm now comparing the average spread result to the minimum spread result, and if the difference is less than 15dB then new_minimum = average - 15dB.

The bits_to_remove average comes down a bit, so it *should* (?) improve the quality of the output.

As no one has spoken up, I will remove -spread 0 and the only spreading function will be the most recent -spread 1 (previously -spread 4).
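The floor rule described above can be sketched as follows (values in dB; purely illustrative, with my own function and constant names):

```python
MARGIN_DB = 15.0

def floored_minimum(average_db, minimum_db):
    # If the minimum spread result sits less than 15dB below the average,
    # push the working minimum down to average - 15dB; otherwise keep it.
    if average_db - minimum_db < MARGIN_DB:
        return average_db - MARGIN_DB
    return minimum_db

print(floored_minimum(-50.0, -60.0))  # gap 10dB < 15 -> -65.0
print(floored_minimum(-50.0, -70.0))  # gap 20dB      -> -70.0
```

Lowering the working minimum this way leads to fewer bits being removed, consistent with the bits_to_remove average coming down a bit.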
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-18 17:08:12
Below you can see the removed-bits table evaluated on my SetF, for lossyWAV 0.3.8, 0.3.9, 0.3.10 and 0.3.11, all presets, frame sizes 512 / 1024 / 2048 / 4096.
As we already know, recent versions are removing fewer bits than previous ones.

Code: [Select]
------- -------------------- -------------------- -------------------- -------------------- -------------------- 
|      |       0.3.8        |       0.3.9        |       0.3.10       |       0.3.11       |  0.3.11 vs. 0.3.8  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------
| 512  | 4,69 | 5,58 | 5,80 | 4,67 | 5,50 | 5,73 | 4,38 | 5,18 | 5,41 | 4,04 | 4,80 | 5,03 | -,65 | -,78 | -,77 |
| 1024 | 4,46 | 5,21 | 5,37 | 4,44 | 5,14 | 5,31 | 4,16 | 4,84 | 5,00 | 3,77 | 4,42 | 4,58 | -,69 | -,79 | -,79 |
| 2048 | 4,12 | 4,95 | 5,11 | 4,10 | 4,89 | 5,06 | 3,83 | 4,59 | 4,75 | 3,39 | 4,11 | 4,27 | -,73 | -,84 | -,84 |
| 4096 | 3,81 | 4,63 | 4,79 | 3,79 | 4,57 | 4,74 | 3,53 | 4,28 | 4,45 | 3,05 | 3,75 | 3,91 | -,76 | -,88 | -,88 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------ ------


In the next table, you can see TAK 1.0.2e08 -p3m bitrates for lossyWAV files produced by 0.3.8 and 0.3.11.

Code: [Select]
------- ----------------- ----------------- ----------------- 
|      |  TAK on 0.3.8   |  TAK on 0.3.11  | 0.3.11 vs 0.3.8 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
| 512  | 499 | 430 | 413 | 552 | 490 | 472 | +53 | +60 | +59 |
| 1024 | 501 | 443 | 431 | 557 | 504 | 491 | +56 | +61 | +60 |
| 2048 | 519 | 453 | 441 | 578 | 520 | 507 | +59 | +67 | +66 |
| 4096 | 539 | 473 | 461 | 602 | 545 | 532 | +63 | +72 | +71 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----


As you can see, on average we get about 13% bigger files. I am aware that in this phase lossyWAV development is heading for less aggressive settings, and I'm confident the sound quality improvements may be worth the loss in compression efficiency. Nevertheless, in my opinion, as we are approaching bitrates which are not so attractive when compared to pure lossless (862kbps for TAK 1.0.2e08 -p3m on my SetF), I would keep some more aggressive settings available as a preset.
Title: lossyWAV Development
Post by: halb27 on 2007-10-18 18:36:50
As we wanted to have -2 very safe, and -1 very safe with a rather large safety margin, and as guruboolez ABXed a sample with the not-so-cautious approach, IMO the current development is appropriate for -1 and -2.

But maybe the -3 details should be chosen such that the focus is kept on saving bits more than on very safe quality.

BTW do you mind trying the new clipping prevention method of 3.11 on your set and compare it to the pure 3.11 result without clipping prevention?
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-18 20:00:03
As we wanted to have -2 very safe, and -1 very safe with a rather large safety margin, and as guruboolez ABXed a sample with the not-so-cautious approach, IMO the current development is appropriate for -1 and -2.
But maybe the -3 details should be chosen such that the focus is kept on saving bits more than on very safe quality.

I definitely agree. If it was up to me I would go for "<level -3> best we can do in the 384-448kbps range (generally transparent except for very rare problem samples)";
BTW do you mind trying the new clipping prevention method of 3.11 on your set and compare it to the pure 3.11 result without clipping prevention?

Bits removed table.
Code:
------- -------------------- -------------------- --------------------
|      |   0.3.11 -noclip   |       0.3.11       | noclip vs default  |
|       ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
|  512 | 4,11 | 4,89 | 5,12 | 4,04 | 4,80 | 5,03 | 0,07 | 0,09 | 0,09 |
| 1024 | 3,89 | 4,56 | 4,72 | 3,77 | 4,42 | 4,58 | 0,12 | 0,14 | 0,14 |
| 2048 | 3,57 | 4,31 | 4,48 | 3,39 | 4,11 | 4,27 | 0,18 | 0,20 | 0,21 |
| 4096 | 3,29 | 4,02 | 4,19 | 3,05 | 3,75 | 3,91 | 0,24 | 0,27 | 0,28 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------

Tak 1.02e08 -p3m table.
Code:
------- ----------------- ----------------- -----------------
|      |   TAK -noclip   |   TAK default   | noclip vs def   |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 543 | 481 | 462 | 552 | 490 | 472 | - 9 | - 9 | -10 |
| 1024 | 544 | 490 | 477 | 557 | 504 | 491 | -13 | -14 | -14 |
| 2048 | 560 | 500 | 487 | 578 | 520 | 507 | -18 | -20 | -20 |
| 4096 | 579 | 519 | 506 | 602 | 545 | 532 | -23 | -26 | -26 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
Title: lossyWAV Development
Post by: Nick.C on 2007-10-18 20:00:11
Thanks for the testing, Josef. As halb27 said, it was agreed some time ago that the -2 quality level would be a "very good" default option, with -1 as overkill and -3 as compact. Once the settings for -2 are determined and set, work can start on -1 and -3.

Personally, I will probably use -3 as I too want to keep the bitrate down, but I am being led by my more golden-eared colleagues with respect to level -2.

I will release alpha v0.3.12 tonight with a new parameter "-snr" (as in signal to noise, with signal being average of the spread results and noise being the added noise due to bit reduction) which will allow input from 0dB to 48dB. I've tentatively set the default to 12dB as it doesn't affect the bitrate too much.
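As a rough illustration of the -snr idea described above, here is a Python sketch (the actual implementation is Delphi; the function name and the flat noise-floor model are assumptions, not lossyWAV code) of how a minimum signal-to-added-noise ratio caps bits_to_remove:

```python
import math

def max_bits_to_remove(signal_db, noise_floor_db, snr_min_db=12.0):
    """Hypothetical helper: cap bits_to_remove so that
    signal - added noise >= snr_min.  Each removed bit raises the
    added quantisation noise by 20*log10(2) ~= 6.0206 dB."""
    db_per_bit = 20.0 * math.log10(2.0)        # ~6.0206 dB per bit
    headroom = signal_db - snr_min_db - noise_floor_db
    return max(0, int(headroom / db_per_bit))  # whole bits only
```

With a 60 dB average spread-FFT signal over a 0 dB noise floor and the default 12 dB minimum, this would allow at most 7 bits to be removed.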

As an aside, never forget that you can always over-ride the noise_threshold_shift (-nts) default of -1.5 for quality level 2 and set it to a more aggressive value.

In some ways I would like to see the default noise_threshold_shift set to zero, but only if the spreading, anti-clipping and signal-to-noise functions work as I hope they will to reduce artifacts to a minimum.

Thanks again for the input!

Nick.

[edit] posts at the same time...... Could you post a list of samples in your set and I will re-examine settings for -3. Initially, -3 was only going to have 2 FFT analyses (64 and 1024 samples) but from using -2 a lot I feel that maybe -3 should also include the 256 sample FFT analysis but have different noise_threshold_shift and signal_to_noise settings. In alpha v0.3.11, the "check-for-clipping-and-recursively-reduce-bits_to_remove-until-no-clipping" mechanism is on by default (an oversight on my part, I didn't put a parameter in place to switch it off). I'll change "-noclip" from overall amplitude reduction (31.49/32 for no dither) to select this bits_to_remove-reduction method. [/edit]
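The "check-for-clipping-and-recursively-reduce-bits_to_remove-until-no-clipping" mechanism can be sketched roughly like this (an illustrative Python sketch with assumed names; the real Delphi code handles dither and multi-channel codec blocks):

```python
def quantise(samples, bits):
    """Round 16-bit samples to multiples of 2**bits (simplified, no dither)."""
    step = 1 << bits
    return [int(round(s / step)) * step for s in samples]

def safe_bits_to_remove(samples, bits_to_remove, limit=32767):
    """Lower bits_to_remove until the quantised block no longer clips."""
    while bits_to_remove > 0:
        if all(abs(q) <= limit for q in quantise(samples, bits_to_remove)):
            break                      # no clipping at this depth
        bits_to_remove -= 1
    return bits_to_remove
```

A near-full-scale sample that would round up past the 16-bit limit forces one fewer bit to be removed, which is why loudness-war material pushes the bitrate up under this scheme.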
Title: lossyWAV Development
Post by: halb27 on 2007-10-18 21:24:38
Thank you, Josef.
So with your test set the new clipping prevention scheme doesn't have a seriously bad influence on the bitrate.
I'll try and find some clipping-prone samples and test it too.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-18 22:00:14
I would like to remove the "-skew" parameter, as I believe that the new spreading function takes care of low frequencies by averaging fewer bins, i.e. giving more weight to the lowest FFT analysis results at low frequencies.

Please state any interest in retaining this parameter......

One thing to remember (and it's important) is that lossyWAV does not have a target bitrate in mind, only a quality level specified by the input parameters.

Also, due to the relatively small increase in total filesize on my sample set of the new -noclip (by bits_to_remove reduction) I would like feedback as to whether it should be on by default rather than have to be selected.

lossyWAV alpha v0.3.12 attached: Superseded.

In this version, the upper limits of the spreading function have been reduced to 12.4kHz and 15.8kHz respectively.

Code:
lossyWAV alpha v0.3.12 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      noise_threshold_shift=n (-15.0dB <= n <= 0.0dB, default -1.5dB)
              (reduces overall bits to remove, -1 bit = -6.0206dB)
-snr <n>      set minimum average signal to added noise ratio;
              (0.0dB <= n <= 48.0dB, default = 12.0dB)
-o <folder>   destination folder for the output file
-noclip       clipping prevention amplitude reduction; default=off
-force        forcibly over-write output file if it exists.

Advanced Options:

-skew <n>     skew fft analysis results by n dB
              (0.0db <= n <= 12.0db, default = 0.0dB) with a (sin-1) shaping in
              the frequency range 20Hz to 3.45kHz. (decrease low frequency bins
              to take into account higher SNR requirements at low frequencies)
-dither       dither output using triangular dither; default=off

System / Output options:

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode
-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.

Have fun now!
Title: lossyWAV Development
Post by: halb27 on 2007-10-18 22:15:38
I finished my listening test with my usual samples using v0.3.11 -spread 1.
Everything is fine.
This time it was furious where I had an effect similar to the one I described last time for keys: sometimes I believed that the original was a subtle bit lower in tonality than the lossyWAVed version. I arrived at 6/7 but finished at 7/10, so this can't be judged as ABXed. Anyway, a bit strange, especially as it's déjà vu.

It would be great if someone could try furious and keys_1644ds.

Thank you, Nick, for the new version. I will test FLAC bitrate with the lossyWAV -noclip option and without as soon as I have found a set of clipping-prone samples.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-19 10:54:39
Personally, I will probably use -3 as I too want to keep the bitrate down, but I am being led by my more golden-eared colleagues with respect to level -2.
Apart from the fantastic dedication of halb27, I think your main problem, Nick, is that you're not getting input from anyone - golden-eared or otherwise. (Consider the number of people who were involved in trying to make lame 3.90.3 transparent!)

I wish I had time to help with the skew vs variable-spreading choice from a theoretical standpoint. Without a fleet of golden ears you're stabbing in the dark.

If variable spreading was fundamentally correct, then extending it to higher frequencies wouldn't break things. It seems that it does. Nevertheless, given that it works at lower frequencies, that's good enough for me! I suspect that there are two possible approaches that could both work: variable-spreading, and fixed-spreading with frequency skew. Maybe the best is a combination of both. But again, if you have something that works, go for it. Just be aware that you don't have an army of golden ears backing you up.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-19 12:16:49
If variable spreading was fundamentally correct, then extending it to higher frequencies wouldn't break things. It seems that it does.
I take it from that you think that the 5 bin spreading between 12.4kHz and 15.8kHz has in some way reduced the quality of the output. The simple way round that would be to revert to 4 bin averaging in this frequency band as well.
Never the less, given that it works at lower frequencies, that's good enough for me! I suspect that there are two possible approaches that could both work: variable-spreading, and fixed-spreading with frequency skew. Maybe the best is a combination of both.
Taking on board what you said here, I have revised the skewing function and increased the upper skew limit to 48dB (too high, overkill, like the -snr maximum) and set a default -skew of 12dB (again like -snr, the user can actively switch it off by -skew 0). This coupled with the 2/3/4/5 bin weighting seems to give good reduction while maintaining higher bitrate on known problem samples. I am considering changing the default -nts values, but need feedback.

lossyWAV alpha v0.3.13 attached. Superseded.

clipping prevention default, use -clipping to disable;
12dB skew default, use -skew 0 to disable;
12dB average signal to added noise default, use -snr 0 to disable;

Code:
lossyWAV alpha v0.3.13 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB <= n <= 48dB, default = 12dB)
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists.

Advanced Options:

-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-dither       dither output using triangular dither; default=off

System / Output options:

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode
-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Title: lossyWAV Development
Post by: halb27 on 2007-10-19 14:17:05
Apart from the fantastic dedication of halb27, I think your main problem, Nick, is that you're not getting input from anyone - golden-eared or otherwise. (Consider the number of people who were involved in trying to make lame 3.90.3 transparent!)

So true, unfortunately.
Luckily we're not exactly in the same situation as the Lame devs.
Starting from your very first MATLAB version, quality was already very high, and as long as variations are made in a defensive way, lossyWAV development isn't very dangerous quality-wise.
IMO, taking it all in all, the variants brought into lossyWAV have a high probability of making progress towards ensuring quality even in difficult situations.
So IMO lossyWAV can already be used for practical purposes.

Hopefully guruboolez will have time and test with the current version especially the problem he could abx.
Title: lossyWAV Development
Post by: Bourne on 2007-10-19 16:39:56
Did you guys drop "lossyFLAC" and adopt "lossyWAV"?
Title: lossyWAV Development
Post by: Nick.C on 2007-10-19 16:54:26
Did you guys drop "lossyFLAC" and adopt "lossyWAV"?
Yes, simply because the pre-processing method is not limited to the FLAC codec. I believe that it works well with TAK and WavPack as well. Also, in the "old school" programming world, naming had to be done within 8 characters......

I'll release v0.3.14 tonight - I am dabbling in FPU assembly and have made one notable speedup already (changed Magnitude function, which calculates the magnitude of a complex number, to assembler). Having trouble picking double real values out of an array of complex numbers though.......
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-19 17:05:56
Comparison of 0.3.13 and 0.3.11 on my SetF.

Bits to remove table.
Code:
------- -------------------- -------------------- --------------------
|      |       0.3.13       |       0.3.11       |      13 vs 11      |
|       ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
|  512 | 5,13 | 5,64 | 5,85 | 4,04 | 4,80 | 5,03 | 1,09 | 0,84 | 0,82 |
| 1024 | 4,88 | 5,25 | 5,40 | 3,77 | 4,42 | 4,58 | 1,11 | 0,83 | 0,82 |
| 2048 | 4,48 | 4,93 | 5,09 | 3,39 | 4,11 | 4,27 | 1,09 | 0,82 | 0,82 |
| 4096 | 4,11 | 4,55 | 4,70 | 3,05 | 3,75 | 3,91 | 1,06 | 0,80 | 0,79 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------

TAK 1.0.2e08 -p3m bitrate table (lossless 862kbps).
Code:
------- ----------------- ----------------- -----------------
|      |  TAK on 0.3.13  |  TAK on 0.3.11  |    13 vs 11     |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 465 | 426 | 411 | 552 | 490 | 472 | -87 | -64 | -61 |
| 1024 | 470 | 441 | 430 | 557 | 504 | 491 | -87 | -63 | -61 |
| 2048 | 492 | 457 | 445 | 578 | 520 | 507 | -86 | -63 | -62 |
| 4096 | 517 | 482 | 471 | 602 | 545 | 532 | -85 | -63 | -61 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
Title: lossyWAV Development
Post by: SebastianG on 2007-10-19 17:06:52
I'll release v0.3.14 tonight - I am dabbling in FPU assembly and have made one notable speedup already (changed Magnitude function, which calculates the magnitude of a complex number, to assembler). Having trouble picking double real values out of an array of complex numbers though.......


Regarding performance: How about using floats with single precision?

Cheers!
SG
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-19 17:42:27
If variable spreading was fundamentally correct, then extending it to higher frequencies wouldn't break things. It seems that it does.
I take it from that you think that the 5 bin spreading between 12.4kHz and 15.8kHz has in some way reduced the quality of the output.
No, not the current version - I thought halb27 had ABXed a previous version with wider spreading? Sorry if I was mistaken, or if that's the one he retracted.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-19 17:56:54
.... I thought halb27 had ABXed a previous version with wider spreading? Sorry if I was mistaken, or if that's the one he retracted.

In the end I couldn't ABX a problem with any of the recent versions.

There is a small suspicion that someone else may really be able to ABX a problem with keys_1644ds and/or furious. I did believe I could hear a subtle difference here, and my ABX results were such that listening results from other members are highly welcome.


@Nick: Looking at Josef's results: What makes 0.3.13 bring down the bitrate so significantly compared to 0.3.11?
Title: lossyWAV Development
Post by: GeSomeone on 2007-10-19 18:00:54
.. I think your main problem Nick is that you're not getting input from anyone ...
David.

It doesn't help that this "development" thread is in the Upload forum instead of the FLAC forum where it was.
On the other hand, it is true that it no longer belongs in the lossless section 
Title: lossyWAV Development
Post by: Bourne on 2007-10-19 18:24:26

.. I think your main problem Nick is that you're not getting input from anyone ...
David.

It doesn't help that this "development" thread is in the Upload forum instead of the FLAC forum where it was.
On the other hand, it is true that it no longer belongs in the lossless section 


If you guys could upload the "eig" sample processed with lossyWAV (alongside the lossless sample) I could try to ABX it. But I doubt I would succeed, since I did not with OGG at -q6. From what I have seen it would take a combination of a couple of golden ears plus killer samples for lossy codecs (guruboolez included..LOL).
Title: lossyWAV Development
Post by: halb27 on 2007-10-19 19:31:59
If you guys could upload the "eig" sample ....

File space is restricted here.
I provide the sample for a limited time on my webspace:
zip version of eig.original.flac and eig.lossy.flac (http://home.arcor.de/horstalb/Xfer2/eig.zip).

eig.lossy.flac is encoded with 0.3.13 (no special options).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-19 19:33:43
.. I think your main problem Nick is that you're not getting input from anyone ...
David.
It doesn't help that this "development" thread is in the Upload forum instead of the FLAC forum where it was.
On the other hand, it is true that it no longer belongs in the lossless section 
I can't upload in the FLAC forum.....
Title: lossyWAV Development
Post by: Nick.C on 2007-10-19 21:02:58
@Nick: Looking at Josef's results: What makes 0.3.13 bring down the bitrate so significantly compared to 0.3.11?
There was a bug - there was no 5 bin averaging between 12.4kHz and 15.8kHz, the program was using 4 bin averaging.

@Sebastian: I tried that at an early stage but the output didn't match that of Mathcad, so I didn't pursue it. I may revert to single if I have no luck with 80x87 assembly language....
Title: lossyWAV Development
Post by: halb27 on 2007-10-19 21:27:23
I searched my collection looking for clipping samples, which turned out not to be easy.

Tracks which I know have strong distortions in them didn't get those distortions from clipping. And when I did find clipping, it usually was so isolated that it wasn't worthwhile trying. CDs with the typical loudness-war style of compression and clipping didn't have the clipping level very close to the 100% CD level.

In the end I found a CD wonderfully suited for worst-case testing: Francoise Hardy: Le temps des Souvenirs. Not the kind of music I expected to be clipping-prone, but it seems to be terribly remastered in this respect.
I took 16 tracks from this 2-CD album (everything from my productive collection plus everything I could perfectly re-rip from the 1st CD). I added 15 other tracks of various kinds which gave at least a slight impression that bitrate might go up with the iterative clipping prevention method.

I encoded these 31 tracks using lossyWav 0.3.13 with and without the -clipping option (no other option).

All the tracks: Average bitrate with -clipping:      419 kbps.
All the tracks: Average bitrate without -clipping:  467 kbps.

A closer look:

Francoise Hardy tracks: Average bitrate with -clipping:      441 kbps.
Francoise Hardy tracks: Average bitrate without -clipping:  551 kbps.

Non-Francoise Hardy tracks: Average bitrate with -clipping:      401 kbps.
Non-Francoise Hardy tracks: Average bitrate without -clipping:  402 kbps.

So with loudness-war style tracks, with clipping at or very close to the peak level, this clipping prevention scheme certainly increases bitrate significantly.
On the other hand, at least judging from my collection (and I looked at quite a lot of CDs published in recent years), the conditions for such bitrate bloat aren't met very often.

In order to take everything into account, a more elaborate clipping prevention method could work like this:
Encode with the iterative clipping prevention strategy. While doing so, also compute the average bits removed for the variant where the clipping prevention strategy is pretended not to be used (this can easily be done in parallel - no extra encoding necessary). If in the end the percentage difference between the two strategies turns out to be too high, re-encode with a scale factor to prevent clipping.
From all we know so far, re-encoding is rarely necessary, and in these strong clipping cases it's easy to accept the theoretical quality impact of scaling.

The threshold for re-encoding can vary with quality level, for instance like this:
-1: no re-encoding with scaling, no matter what the bits-to-remove difference is
-2: re-encoding with scaling in case the bits-to-remove difference is greater than 10%.
-3: re-encoding with scaling in case the bits-to-remove difference is greater than 5%.
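The proposal above can be sketched in Python (all names and the bookkeeping interface are assumptions for illustration, not actual lossyWAV code):

```python
def needs_rescale_pass(avg_bits_with, avg_bits_without, quality_level):
    """Decide whether to re-encode with an amplitude scale factor.
    avg_bits_with / avg_bits_without: average bits removed with and
    without iterative clipping prevention, gathered in a single pass.
    Per-quality-level thresholds as suggested (None = never re-encode)."""
    thresholds = {1: None, 2: 0.10, 3: 0.05}
    t = thresholds[quality_level]
    if t is None or avg_bits_without <= 0:
        return False
    shortfall = (avg_bits_without - avg_bits_with) / avg_bits_without
    return shortfall > t
```

For example, dropping from 4.5 to 4.0 average bits removed is an ~11% shortfall, so -2 would trigger the rescale pass while -1 never would.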
Title: lossyWAV Development
Post by: halb27 on 2007-10-19 22:19:15
I listened to my usual sample set with v0.3.13 (no option).
Everything is fine.
Amazing for this rather low bitrate on average with usual samples.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-19 22:31:52
I listened to my usual sample set with v0.3.13 (no option).
Everything is fine.
Amazing for this rather low bitrate on average with usual samples.
  I'm *really* glad that the new spreading function / skew combination is working [good shout, David!]! Also, keeping the bitrate down is nearly as important.

Is there any appetite for correction files? I have been toying with the idea of this option but it's not worth doing if there is no "market".

lossyWAV alpha v0.3.14 attached: - superseded.

No parameter changes, just faster.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-21 23:21:59
lossyWAV alpha v0.3.15 attached. Superseded.

Code speeded up again.

Changes made to quality level -3 to make it slightly more aggressive: codec_block_size changed from 1024 samples to 512; -nts default changed from -1 to -0.5.

I'm considering making quality level -1 more conservative by setting the codec_block_size to 2048 samples. Input required please.

Code:
lossyWAV alpha v0.3.15 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Title: lossyWAV Development
Post by: halb27 on 2007-10-22 06:29:37
...I'm considering making quality level -1 more conservative by setting the codec_block_size to 2048 samples. Input required please. ...

I welcome the approach of putting more detail into -1, -2, -3, as well as a more pronounced differentiation, security- and bitrate-wise, between -1 and -3, with -2 remaining the standard as ever, with a good but not exaggerated security margin.

But what is the advantage of a blocksize of 2048 for -1?

Independent of the block size question for -1, I personally would like to see a new approach with spreading_length=1 for low frequencies and short FFTs, because in this case the current approach averages over more than one critical band.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-22 08:14:11
...I'm considering making quality level -1 more conservative by setting the codec_block_size to 2048 samples. Input required please. ...
I welcome the approach of putting more detail into -1, -2, -3, as well as a more pronounced differentiation, security- and bitrate-wise, between -1 and -3, with -2 remaining the standard as ever, with a good but not exaggerated security margin.

But what is the advantage of a blocksize of 2048 for -1?

Independent of the block size question for -1, I personally would like to see a new approach with spreading_length=1 for low frequencies and short FFTs, because in this case the current approach averages over more than one critical band.
A codec_block_size of 2048 will reduce bits to remove by effectively taking the lower of two bits_to_remove values for consecutive small blocks and applying that to the large block. It's a possibility, but more conservatism might better be achieved by increasing the default "-snr" value for -1 from 12dB (the same for all quality levels at the moment) to a larger value.
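The effect described can be sketched like this (hypothetical Python illustration; the function name and list interface are assumptions):

```python
def doubled_block_bits(small_block_bits, factor=2):
    """Doubling codec_block_size: each large block can only remove as
    many bits as the most demanding (lowest) of the small blocks it
    now spans, so bits_to_remove can only stay the same or drop."""
    return [min(small_block_bits[i:i + factor])
            for i in range(0, len(small_block_bits), factor)]
```

For example, four 1024-sample blocks that would allow 5, 3, 4 and 4 bits collapse into two 2048-sample blocks allowing only 3 and 4 bits.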

At present for my 52 sample set - WAV: 121.53MB; FLAC: 68.20MB, 791.9kbps; -1: 42.97MB, 499.0kbps (106.1% of -2); -2: 40.49MB, 470.2kbps; -3: 37.68MB, 437.5kbps (93.1% of -2). A quick test of "-1 -snr 24" gave 44.67MB, 518.7kbps (110.3% of -2).

I will have a look at spreading_function_length with respect to Critical Bandwidth and post.

[edit] See attached Excel sheet. I will have a think as to how best to implement this, without reverting to my abortive previous attempt at Bark averaging. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-10-22 12:36:58
A codec_block_size of 2048 will reduce bits to remove by effectively taking the lower of two bits_to_remove values for consecutive small blocks and applying that to the large block. It's a possibility, but more conservatism might better be achieved by increasing the default "-snr" value for -1 from 12dB (the same for all quality levels at the moment) to a larger value.

I understand well that you want to be conservative quality-wise with -1, and a blocksize of 2048 certainly will reduce bits to remove, but from my naive understanding I cannot see a promising impact of a 2048-sample block on quality. The -snr approach or an increased value of -nts or -skew seems more promising to me.

My personal feeling, however, goes strongly in the direction that for FFT lengths <= 256 the spreading length is too long if the width of critical bands is worth taking into account.
Sure, there must be compromise, and IMO the current results are very good. But in favor of conservatism for -1, a shorter spreading length at least for an FFT length of 64 at low frequencies is welcome IMO (spreading_length=1 in this case). And in case the influence on bits removed remains acceptable, even for an FFT length of 128 and maybe 256 the spreading lengths should be tried lower than they are now, and if possible not only at the low frequency end.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-22 12:59:33
A codec_block_size of 2048 will reduce bits to remove by effectively taking the lower of two bits_to_remove values for consecutive small blocks and applying that to the large block. It's a possibility, but more conservatism might better be achieved by increasing the default "-snr" value for -1 from 12dB (the same for all quality levels at the moment) to a larger value.
I understand well that you want to be conservative quality-wise with -1, and a blocksize of 2048 certainly will reduce bits to remove, but from my naive understanding I cannot see a promising impact of a 2048-sample block on quality. The -snr approach or an increased value of -nts or -skew seems more promising to me.
I agree, a -snr / -skew combination will probably be a better solution. I am running a matrix calculation of -snr 0 to 30 step 3, -skew 0 to 30 step 3 to see what happens with bitrate for my sample set.
My personal feeling, however, goes strongly in the direction that for FFT lengths <= 256 the spreading length is too long if the width of critical bands is worth taking into account.
Sure, there must be compromise, and IMO the current results are very good. But in favor of conservatism for -1, a shorter spreading length at least for an FFT length of 64 at low frequencies is welcome IMO (spreading_length=1 in this case). And in case the influence on bits removed remains acceptable, even for an FFT length of 128 and maybe 256 the spreading lengths should be tried lower than they are now, and if possible not only at the low frequency end.
I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as: if FFT_length < 256 then spreading_function_length = 1; if 256 or 512 then 1,2,3,4; if 1024 or above then 2,3,4,5.
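That rule of thumb sketches out to the following (Python illustration; the four-tuple stands for the spreading lengths from the lowest to the highest frequency zone, and all names are assumptions):

```python
def spreading_lengths(fft_length):
    """Suggested spreading_function_length per frequency zone,
    lowest zone first, as a function of FFT length."""
    if fft_length < 256:
        return (1, 1, 1, 1)
    if fft_length in (256, 512):
        return (1, 2, 3, 4)
    return (2, 3, 4, 5)        # 1024 samples and above
```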
Title: lossyWAV Development
Post by: halb27 on 2007-10-22 14:39:19
I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as: if FFT_length < 256 then spreading_function_length = 1; if 256 or 512 then 1,2,3,4; if 1024 or above then 2,3,4,5.

Wonderful, thank you. In case this brings bits to remove down too much, there's still room for compromise, especially for FFT_length < 256. I guess for the high frequency range spreading_length need not be 1 even with short FFT lengths.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-22 14:56:58

I am currently looking at what impact a spreading_function_length of 1 would have and how to implement it. It could be as simple as: if FFT_length < 256 then spreading_function_length = 1; if 256 or 512 then 1,2,3,4; if 1024 or above then 2,3,4,5.

Wonderful, thank you. In case this brings bits to remove down too much, there's still room for compromise, especially for FFT_length < 256. I guess for the high frequency range spreading_length need not be 1 even with short FFT lengths.
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.
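As a quick check of that formula (a Python sketch; the helper names are assumptions, and the bin spacing of a length-N FFT at sample rate fs is fs/N):

```python
import math

def bins_in_band(bandwidth_hz, fft_length, sample_rate=44100):
    """Width of a critical band expressed in FFT bins."""
    return bandwidth_hz * fft_length / sample_rate

def spreading_start(bins):
    """max(1, int(log2(bins))) starting point from the spreadsheet."""
    return max(1, int(math.log2(bins))) if bins >= 1 else 1
```

A 100 Hz wide critical band spans about 2.3 bins of a 1024-sample FFT at 44.1kHz, giving a starting spreading length of 1.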
Title: lossyWAV Development
Post by: halb27 on 2007-10-22 15:25:34
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.

Fine - this table shows under what circumstances the width of a critical band in FFT bins is < 1, which is the most critical case IMO. IMO it should be > 1 (better: >= 2), or alternatively spreading_length should be 1 where 'width of critical band in FFT bins > 1' cannot be achieved.
This is with respect to where these requirements are not fulfilled at the moment. I'm not talking about making spreading length larger than 5 in the high frequency area with long FFTs though to a cautiously chosen extent this may be possible - especially for -2 and more so -3. This is something that can be considered later.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-22 15:51:10
I added a final table to the bottom of the spreadsheet which takes the max(1,int(log2(number_of_bins_in_critical_band_width))) - this yields a sensible starting point.

Fine - this table shows under what circumstances the width of a critical band in FFT bins is < 1, which is the most critical case IMO. IMO it should be > 1 (better: >= 2), or alternatively spreading_length should be 1 where 'width of critical band in FFT bins > 1' cannot be achieved.
This is with respect to where these requirements are not fulfilled at the moment. I'm not talking about making spreading length larger than 5 in the high frequency area with long FFTs though to a cautiously chosen extent this may be possible - especially for -2 and more so -3. This is something that can be considered later.
I see where you're coming from.

With respect to the matrix calculation mentioned earlier, please note the average bitrates for my 52 sample set, processed at quality level -2 with -SNR and -SKEW as the only other parameters.
Code: [Select]
BitRate   SNR=00  SNR=03  SNR=06  SNR=09  SNR=12  SNR=15  SNR=18  SNR=21  SNR=24  SNR=27  SNR=30
SKEW=00   468.4   468.4   468.4   468.4   468.4   469.2   471.4   476.2   483.2   494.7   508.7
SKEW=03   468.7   468.7   468.7   468.7   468.8   469.8   472.0   477.3   484.9   497.3   512.1
SKEW=06   468.9   468.9   468.9   468.9   469.0   470.3   472.8   478.5   486.9   499.9   515.5
SKEW=09   469.5   469.5   469.5   469.5   469.6   471.0   473.8   479.9   488.9   502.4   518.7
SKEW=12   470.1   470.1   470.1   470.1   470.2   471.8   474.9   481.4   491.1   505.1   522.1
SKEW=15   470.9   470.9   470.9   470.9   471.1   472.7   476.2   483.1   493.5   507.7   525.4
SKEW=18   471.9   471.9   471.9   471.9   472.1   473.9   477.6   484.8   495.9   510.2   528.7
SKEW=21   473.3   473.3   473.3   473.3   473.5   475.3   479.2   486.7   498.3   513.0   531.9
SKEW=24   475.2   475.2   475.2   475.2   475.4   477.0   481.3   488.9   500.9   515.6   535.1
SKEW=27   477.5   477.5   477.5   477.5   477.7   479.2   483.6   491.2   503.7   518.6   538.3
SKEW=30   480.5   480.5   480.5   480.5   480.6   482.0   486.4   494.0   506.6   521.7   541.6
Title: lossyWAV Development
Post by: halb27 on 2007-10-22 17:09:08
So, judging from this table, a higher skew value than used so far isn't critical as long as the snr value isn't chosen very high.
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
So values up to, say, skew=21 or 24 with snr=18 are quite acceptable IMO for -1, judging from your table.
(Sure I have headroom in mind for the variable spreading function modifications).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-22 22:31:23
So, judging from this table, a higher skew value than used so far isn't critical as long as the snr value isn't chosen very high.
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
So values up to, say, skew=21 or 24 with snr=18 are quite acceptable IMO for -1, judging from your table.
(Sure I have headroom in mind for the variable spreading function modifications).
I think that the higher skew values increase bitrate on some samples, but not all, e.g. Atem_Lied.

I have re-written the spread procedure and it is now prepared to accept spreading_function_lengths which vary with fft_length, although I have not yet nailed down the exact relationship between fft_length / bin frequency and spreading_function_length - that's a job for tomorrow. The price of the re-write is about 5% added to the processing time.
Title: lossyWAV Development
Post by: halb27 on 2007-10-23 06:58:31
I wouldn't care about the 5% added processing time.

Sure, everybody is different, but as a first approximation I guess anybody who accepts the file size increase from ~200 kbps of a transform codec to ~450 kbps of this approach, in favour of an expected extremely high quality, doesn't care very much about encoding speed (which is a two-stage process here anyway).
Though more speed is welcome, everything is fine as long as processing time doesn't really hurt.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-23 20:01:10
I've tried a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB

1st iteration: no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5).

2nd iteration : less conservative version, -2 yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5).

3rd iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB. This same iteration with "-nts 0" yields 491.7kbps / 42.35MB.

This in comparison with the current fixed spreading yields 470.2 kbps / 40.49MB.

I've decided to release the 3rd iteration as alpha v0.3.16 - attached. Superseded.
Title: lossyWAV Development
Post by: halb27 on 2007-10-23 20:46:16
I've tried a first attempt at spreading which varies with every fft_length. Reference: FLAC=788.6kbps / 67.91MB

When there is no averaging at 64 sample fft_length, -2 yields 619.6kbps / 53.36MB (64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5).

A less conservative version (still more conservative than previous 2,3,3,4,5 for all fft_lengths) yields 485.8kbps / 41.84MB (64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5).

Another iteration (64:2,2,2,2,2; 256:2,2,2,3,3; 1024:2,3,3,4,5) yields 510.3kbps / 43.95MB

This in comparison with the current fixed spreading yields 470.2 kbps / 40.49MB.

Thank you.
IMO this shows the routes that are not promising and those that are:

(64:1,1,1,1,1; 256:1,1,2,2,3; 1024:2,3,3,4,5): much too conservative, probably due to spreading_length being too short in the mid and high frequency range.

(64:2,2,2,3,3; 256:2,2,3,3,4; 1024:2,3,3,4,5): this or a variation of this is a promising candidate IMO for a -1 spreading length strategy.

Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.

Just a question: What's your sample set? If it's regular music we should try to hold bitrate down. If it's problem samples we shouldn't care about bitrate going up. Ideally bitrate is kept rather low with regular music and increases significantly with problem samples (not necessarily individually but as classes of well- and bad-behaving samples).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-23 20:53:45
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.

Just a question: What's your sample set? If it's regular music we should try to hold bitrate down. If it's problem samples we shouldn't care about bitrate going up. Ideally bitrate is kept rather low with regular music and increases significantly with problem samples (not necessarily individually but as classes of well- and bad-behaving samples).
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB. Superseded.

My sample set is:
Code: [Select]
04 - Black Sabbath - Iron Man.wav
06_florida_seq.wav
10 - Dungeon - The Birth- The Trauma Begins.wav
14_Track03beginning.wav
16_Track03entreaty.wav
18_Track04cakewithtea.wav
34_Gabriela_Robin___Cats_on_Mars.wav
41_30sec.wav
A02_metamorphose.wav
A03_emese.wav
Angelic.wav
annoyingloudsong.wav
aps_Killer_sample.wav
Atem_lied.wav
ATrain.wav
Bachpsichord.wav
badvilbel.wav
bibilolo.wav
BigYellow.wav
birds.wav
bruhns.wav
cricket__insect___edit_.wav
dither_noise_test.wav
E50_PERIOD_ORCHESTRAL_E_trombone_strings.wav
eig.wav
Furious.wav
glass_short.wav
harp40_1.wav
herding_calls.wav
jump_long.wav
keys_1644ds.wav
ladidada_10s.wav
Liebe_so_gut_es_ging.wav
Moon_short.wav
Poets_of_the_fall___Shallow.wav
rach_original.wav
rawhide.wav
Rush___Hold_Your_Fire___Turn_the_Page.wav
S13_KEYBOARD_Harpsichord_C.wav
S30_OTHERS_Accordion_A.wav
S34_OTHERS_GlassHarmonica_A.wav
S35_OTHERS_Maracas_A.wav
S53_WIND_Saxophone_A.wav
SeriousTrouble.wav
swarm_of_wasps__edit_.wav
thewayitis.wav
the_product.wav
triangle.wav
triangle_2_1644ds.wav
trumpet.wav
VELVET.wav
wait.wav
If you're worried about the low frequency range, use more -skew.....
Title: lossyWAV Development
Post by: halb27 on 2007-10-23 21:23:38
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.
...
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB.

Thank you. So as 494.2kbps is the result of (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5) I think that's very, very promising, and this is especially true as your sample set consists more or less of short problem samples.
With this in mind I guess it's even acceptable to go a bit more conservative (as a target for -1 when we're done), something like
(64:1,1,1,2,4; 256:1,1,2,3,4; 1024:1,3,3,4,5) - looking at your wonderful 'Width of Critical Band Width in FFT Bins' table more closely.

I'd love to go through the 51-song regular music collection I used before with this setting, if you can provide such a version. BTW, is the default for -skew and -snr still 12 for each of these options?
Title: lossyWAV Development
Post by: Nick.C on 2007-10-23 21:40:33
Do you mind trying: (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5)? I still care most about the very low frequency edge.
...
Done - attached alpha v0.3.16b : 494.2kbps / 42.56MB.
Thank you. So as 494.2kbps is the result of (64:1,1,2,3,4; 256:1,2,3,3,4; 1024:2,3,3,4,5) I think that's very, very promising, and this is especially true as your sample set consists more or less of short problem samples.
With this in mind I guess it's even acceptable to go a bit more conservative (as a target for -1 when we're done), something like
(64:1,1,1,2,4; 256:1,1,2,3,4; 1024:1,3,3,4,5) - looking at your wonderful 'Width of Critical Band Width in FFT Bins' table more closely.

I'd love to go through the 51-song regular music collection I used before with this setting, if you can provide such a version. BTW, is the default for -skew and -snr still 12 for each of these options?
Your wish is my command...... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB. Superseded. Yes, -snr 12 -skew 12 is the default for both options. The spreading table is fixed (currently - this will change) for all quality levels.

Looking at the quality levels more carefully, maybe all 3 should use the 64/256/1024 sample fft analyses that -2 uses and the only other variables would be -snr, -skew, codec_block_size (512 samples for -3) and -nts.

Oh, and I realised that I was taking min(min_result, average_result-snr) and then adding the (negative) noise_threshold_shift to that. I've changed this to min(min_result+noise_threshold_shift, average_result-snr), which will reduce bitrate slightly, but not carelessly. lossyWAV alpha v0.3.16d attached : 527.8kbps / 45.45MB. Superseded, although default spreading is the same in v0.3.18.
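The change described in the last paragraph can be illustrated like this (a sketch only - the variable names are illustrative, not taken from the Delphi source):

```python
def threshold_old(min_result, average_result, noise_threshold_shift, snr):
    # previous behaviour: the shift was applied after taking the lower of the two
    return min(min_result, average_result - snr) + noise_threshold_shift

def threshold_new(min_result, average_result, noise_threshold_shift, snr):
    # revised behaviour: the (negative) shift only affects the minimum term
    return min(min_result + noise_threshold_shift, average_result - snr)
```

When the average-based term is the smaller one, the new formula no longer lowers it by the noise_threshold_shift, so the resulting threshold is slightly higher in that case - consistent with the slightly reduced bitrate reported.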
Title: lossyWAV Development
Post by: halb27 on 2007-10-23 22:14:26
... lossyWAV alpha v0.3.16c attached : 536.5 kbps / 46.20MB. ...

Thank you. Appropriate result for your more-or-less problem sample set IMO. But behavior on regular music is important. I'll run this version on my regular music sample set tonight and will report tomorrow.
Title: lossyWAV Development
Post by: halb27 on 2007-10-24 07:07:51
Results (average bitrate according to foobar) for my 50 (51 was wrong) regular song collection:

a) prior result I had a few weeks ago (don't remember the version but certainly with a fixed spreading_length of 4): 507 kbps.

b) result of 0.3.16d: 475 kbps.

c) For comparison result of 0.3.15: 425 kbps.

No special options specified.


So I think for -1 this is an adequate spreading length strategy (more exactly: a good start - fine tuning is necessary).

A closer look at fiocco, the sample guruboolez was on the edge of ABXing (his result was at least good enough to show that fiocco quality should improve, though it was very good already).
guruboolez' version (I guess it was 0.3.1, but certainly a version with a fixed spreading_length of 4): 436 kbps.
0.3.16d result: 507 kbps. So it is reasonable to expect that the small remaining problem is gone this way. Sure, it would be most welcome if guruboolez could confirm.
0.3.15 result for comparison: 472 kbps. Already a very good step in the right direction, and all the more remarkable as average bitrate in general came down when switching from the fixed spreading length of 4 to the variable spreading length.

As for fine tuning:
Judging from what we got so far:
- to keep average bitrate low, it is essential to keep the spreading length relatively long at the high frequency edge. Luckily this can easily be done while also respecting the heuristic requirement that several bins (at least 1) should fall into each critical band.
- to uphold the heuristic requirement that several bins should fall into each critical band (as far as that is possible at all), it is essential to keep the spreading length low (usually 1) at the low frequency edge. Luckily, if done carefully, this doesn't seem to have an unacceptable impact on average bitrate.

So fine tuning (finding promising compromises) can be done with these considerations for the extreme ends in mind, and especially with respect to the target that the average bitrate of regular samples should be kept low, while it is welcome if it goes up for problem samples - all within the restricted possibilities we have, of course.

I most welcome your idea of having a fixed fft analysis strategy (fft lengths of 64, 256, 1024) for every quality setting (as done with -2 so far).
Sufficient IMO, and it makes fine tuning a lot easier:
For fine tuning purposes can you provide spreading length options of the kind:
-spreading64 11234
-spreading256 12334
-spreading1024 23345
or similar.
This way anybody can try to find a promising spreading length strategy.
I'd love to search for such strategies for -1, -2, -3, and I wouldn't have to bother you with building new lossyWav versions for whatever comes to my mind.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-24 07:54:28
I most welcome your idea of having a fixed fft analysis strategy (fft lengths of 64, 256, 1024) for every quality setting (as done with -2 so far).
Sufficient IMO, and it makes fine tuning a lot easier:
For fine tuning purposes can you provide spreading length options of the kind:
-spreading64 11234
-spreading256 12334
-spreading1024 23345
or similar.
This way anybody can try to find a promising spreading length strategy.
I'd love to search for such strategies for -1, -2, -3, and I wouldn't have to bother you with building new lossyWav versions for whatever comes to my mind.
I was thinking about this early this morning: it might be easier to implement a -spread parameter that takes a 15 character hexadecimal numeric input (would we ever exceed spreading_function_length=15?) and puts the results in the spreading_function table for each analysis length. This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). I'm very glad that the problem samples are improving while the average bitrate is not growing too much.

So, expect a new build with the possibility to use "-spread 112341233423345" to control the spreading function. Now, where's the cliParameter unit, I must rip it apart and rebuild it.......

Okay, cliParameter unit duly ripped and rebuilt. There is an unexplained considerable slowdown of processing, but for evaluation of spreading functions it should be okay. lossyWAV alpha v0.3.17 attached. Superseded, slowdown "cured".

Code: [Select]
lossyWAV alpha v0.3.17 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-spf <15hex>  manually input the 3 spreading functions as 3 x 5 hex characters;
              e.g. 444444444444444, default=111241123423345; Hex characters
              must be one of 1,2,3,4,5,6,7,8,9,A,B,C,D,E,F (zero excluded).
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Title: lossyWAV Development
Post by: halb27 on 2007-10-24 09:20:27
Excellent. Thank you.
Means I will have a lot of (interesting) work this evening.

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...

Sorry, I don't understand this. Can you please explain it a bit?
Title: lossyWAV Development
Post by: Nick.C on 2007-10-24 12:24:26

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
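A sketch of that mapping in Python (the layout is as described: three groups of five hex digits for the 64, 256 and 1024 sample FFTs; the function name is mine):

```python
def parse_spread(hex15):
    """Unpack a 15-character -spread string into a spreading table."""
    assert len(hex15) == 15
    groups = [[int(c, 16) for c in hex15[i:i + 5]] for i in (0, 5, 10)]
    table = {64: groups[0], 256: groups[1], 1024: groups[2]}
    # the longer lengths reuse their shorter neighbour (128=64, 512=256, 2048=1024)
    table[128], table[512], table[2048] = table[64], table[256], table[1024]
    return table
```

For example, the string "112341233423345" from the post above yields 11234 for the 64 sample FFT, 12334 for 256 and 23345 for 1024, with 128, 512 and 2048 copying those.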
Title: lossyWAV Development
Post by: halb27 on 2007-10-24 12:56:49

... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.

I imagined it to be like that - just wanted to make sure.
In this case the user doesn't have full control of the spreading length for every fft length.
If for instance it turns out to be important for the 1024 bin fft that there is a 1 in the spreading like in (1,3,3,4,5), it would be so for a 2048 bin fft as well and might have a negative impact on bitrate.
There are dependencies here which I'd prefer to see avoided.

I thought you wanted to be content with 3 analyses. So do you still think of using a fft length of 2048 for -1?
If yes, I'd prefer a 20-character hex string covering all fft lengths used (64, 256, 1024, 2048) in this order; you then ignore the 256 and 2048 parts if -3 is used, and the 2048 part if -2 is used.
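The 20-character variant proposed here could be sketched like this (a hypothetical helper; the dashed 4 x 5 form matches the -spf syntax that v0.3.18 went on to adopt, and the per-quality FFT lengths are those listed in Nick's explanation above):

```python
FFT_ORDER = (64, 256, 1024, 2048)
# FFT lengths analysed at each quality level, per the discussion above
USED_BY_QUALITY = {1: (64, 256, 1024, 2048), 2: (64, 256, 1024), 3: (64, 1024)}

def parse_spf(spf, quality):
    """Parse e.g. '11124-11234-23345-34456', ignoring groups unused at this level."""
    groups = spf.split('-')
    table = {n: [int(c, 16) for c in g] for n, g in zip(FFT_ORDER, groups)}
    return {n: table[n] for n in USED_BY_QUALITY[quality]}
```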
Title: lossyWAV Development
Post by: Nick.C on 2007-10-24 13:21:11
... This would be independent of the number of actual analyses (128=64, 512=256, 2048=1024). ...
Sorry, I don't understand this. Can you please explain it a bit?
Basically, you will need to input a 15 character hexadecimal string, regardless of how many analyses will actually be carried out at the specified quality level (-1 = 2048/1024/256/64 sample fft_length; -2 = 1024/256/64 sample fft_length; -3 = 1024/64 sample fft_length). What would happen is that the user always inputs 3 spreading functions and those three are mapped to 64, 256 and 1024 fft_length spreading. Then, copies are made into the spreading functions for 128, 512 and 2048 fft_length spreading functions.
I imagined it to be like that - just wanted to make sure.
In this case the user doesn't have full control of the spreading length for every fft length.
If for instance it turns out to be important for the 1024 bin fft that there is a 1 in the spreading like in (1,3,3,4,5), it would be so for a 2048 bin fft as well and might have a negative impact on bitrate.
There are dependencies here which I'd prefer to see avoided.

I thought you wanted to be content with 3 analyses. So do you still think of using a fft length of 2048 for -1?
If yes, I'd prefer a 20-character hex string covering all fft lengths used (64, 256, 1024, 2048) in this order; you then ignore the 256 and 2048 parts if -3 is used, and the 2048 part if -2 is used.
Changed to a 20 hexchar string; 128 & 512 fft_lengths removed. I do want to move to only 3 analyses, just don't want to upset anybody..... lossyWAV alpha v0.3.18 attached. Superseded.
Code: [Select]
lossyWAV alpha v0.3.18 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Options:

-1, -2 or -3  quality level (1:overkill, 2:default, 3:compact)
-nts <n>      set noise_threshold_shift to n dB (-15dB<=n<=0dB, default=-1.5dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB, default=12dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db, default=12dB)
              in the frequency range 20Hz to 3.45kHz
-spf <4x5hex> manually input the 4 spreading functions as 4 x 5 hex characters;
              e.g. 44444-44444-44444-44444, default=11124-11234-23345-34456;
              Hex characters must be one of 1 to 9 and A to F (zero excluded).
-o <folder>   destination folder for the output file
-clipping     disable clipping prevention by iteration; default=off
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Test results for v0.3.18:

My 52 sample set: WAV: 121.53MB; FLAC: 68.2MB / 792.0kbps; -1: 46.42MB / 539.0kbps; -2: 45.45MB / 527.8kbps; -3: 38.88MB / 451.5kbps.

Guru's 150 sample set: WAV: 252.36MB; FLAC: 122.17MB / 683.2kbps; -1: 95.95MB / 536.5kbps; -2: 93.81MB / 524.6kbps; -3: 84.96MB / 475.1kbps.
Title: lossyWAV Development
Post by: halb27 on 2007-10-24 13:42:46
Wonderful. Thank you.
Title: lossyWAV Development
Post by: GeSomeone on 2007-10-24 19:24:38
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.

As I understood it, SKEW is an "offset" to SNR to give the low freqs (where we would more easily discern noise) a better snr (with a stretch you could call it a form of noise shaping).

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct, the effect on quality (== snr?) would be:
- when you raise SKEW you (only) give a better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality...  or just vary the SNR.

BTW. Has anybody found that SKEW above 9 improves a problem sample?
Title: lossyWAV Development
Post by: halb27 on 2007-10-24 20:28:32

We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.

As I understood it, SKEW is an "offset" to SNR to give the low freqs (where we would more easily discern noise) a better snr (with a stretch you could call it a form of noise shaping).

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct, the effect on quality (== snr?) would be:
- when you raise SKEW you (only) give a better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality...  or just vary the SNR.

BTW. Has anybody found that SKEW above 9 improves a problem sample?

Well, the skew option is more meaningful to me than the snr option just because I have a mental picture of the effect of skew (though I don't really know how useful it is), whereas I don't really understand the idea behind snr. Maybe Nick can help.
I personally accept that we are partially doing a bit of rather wild experimenting, as long as this is done in a pretty conservative way that safeguards the very good quality already achieved.
I have liked the idea of skew because IMO there has always been too much averaging at the low frequency edge. Now that this is going to change due to variable spreading, maybe the skew option will partially lose its usefulness. For the sake of conservatism, especially with -1, skew may however still be welcome.
I also see snr as favouring conservatism, but for lack of insight so far my heart is more with skew.
Let's see what will come out.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-24 20:36:06
We're in a world of heuristics, but to me the skew option is more meaningful than the snr option.
As I understood it, SKEW is an "offset" to SNR to give the low freqs (where we would more easily discern noise) a better snr (with a stretch you could call it a form of noise shaping).

So if you change SNR, this will impact the values where SKEW is applied too.

If I'm correct, the effect on quality (== snr?) would be:
- when you raise SKEW you (only) give a better snr to the lower frequencies
- when you raise SNR and lower SKEW (at the same time) you (only) give the high freqs a better snr.

So choose where you want the extra quality...  or just vary the SNR.

BTW. Has anybody found that SKEW above 9 improves a problem sample?
Well, the skew option is more meaningful to me than the snr option just because I have a mental picture of the effect of skew (though I don't really know how useful it is), whereas I don't really understand the idea behind snr. Maybe Nick can help.
I personally accept that we are partially doing a bit of rather wild experimenting, as long as this is done in a pretty conservative way that safeguards the very good quality already achieved.
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used. It is easily disabled with -snr 0.
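In other words (a sketch with illustrative names; the "results" here stand for the spread/averaged FFT values in dB):

```python
def snr_safety_net(relevant_bins, snr=12.0):
    """Reference level per the description above: the minimum bin result,
    capped from above by (average - snr)."""
    average = sum(relevant_bins) / len(relevant_bins)
    return min(min(relevant_bins), average - snr)
```

With -snr 0 the cap becomes the plain average, which can never be below the minimum, so the minimum result is always used - matching the remark that -snr 0 disables it.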
Title: lossyWAV Development
Post by: GeSomeone on 2007-10-24 22:14:34
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used.

If that's all, then they are not related, and I was wrong. I must be mixing up -SNR with some other noise threshold.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-24 22:18:10
To me, -snr is a safety net that calculates the average of all the relevant fft bins and then deducts the value (default=12) to derive a threshold value. If the minimum result of the relevant fft bins is below the threshold value then the minimum result is used, if above then the threshold value is used.
If that's all, then they are not related, and I was wrong. I must be mixing up -SNR with some other noise threshold.
If you introduce a large -skew value then the minimum *may* be affected, but the average will definitely be affected as the fft results are skewed before the spreading / averaging is done.
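A sketch of what "skewing before spreading" might look like (illustrative only: the post states the 20 Hz to 3.45 kHz range and the dB amount, but not the taper shape, so the logarithmic taper below is my assumption):

```python
import math

def skew_results(bin_freqs_hz, results_db, skew_db=12.0):
    """Lower FFT results in the 20 Hz - 3.45 kHz range by up to skew_db
    before spreading/averaging, giving low frequencies a safety margin."""
    lo, hi = 20.0, 3450.0
    skewed = []
    for f, r in zip(bin_freqs_hz, results_db):
        if f <= lo:
            skewed.append(r - skew_db)               # full skew at/below 20 Hz
        elif f >= hi:
            skewed.append(r)                         # no skew above 3.45 kHz
        else:
            frac = math.log(hi / f) / math.log(hi / lo)
            skewed.append(r - skew_db * frac)        # assumed log taper between
    return skewed
```

Lowering the low-frequency results pulls down the minimum (and the average) that the bits-to-remove decision is based on, which is why higher -skew values raise bitrate, as in the matrix posted earlier.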
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-24 23:18:06
Comparison of 0.3.18 and 0.3.15 on my SetF.

Bits to remove table.

Code: [Select]
------- -------------------- -------------------- --------------------
|      |       0.3.15       |       0.3.18       |      18 vs 15      |
|       ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
|  512 | 5,13 | 5,64 | 5,93 | 5,22 | 5,36 | 5,88 |  ,09 | -,28 | -,05 |
| 1024 | 4,88 | 5,25 | 5,48 | 4,84 | 4,93 | 5,44 | -,04 | -,32 | -,04 |
| 2048 | 4,48 | 4,93 | 5,17 | 4,40 | 4,52 | 5,11 | -,08 | -,41 | -,06 |
| 4096 | 4,11 | 4,55 | 4,78 | 3,91 | 4,05 | 4,71 | -,20 | -,50 | -,07 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------


TAK 1.0.2b1 -p3m bitrate table (lossless 862kbps).
Code: [Select]
------- ----------------- ----------------- -----------------
|      |  TAK on 0.3.15  |  TAK on 0.3.18  |    18 vs 15     |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 465 | 426 | 405 | 458 | 447 | 409 | - 7 |  21 |   4 |
| 1024 | 470 | 441 | 424 | 472 | 465 | 426 |   2 |  24 |   2 |
| 2048 | 492 | 457 | 439 | 498 | 488 | 443 |   6 |  31 |   4 |
| 4096 | 517 | 482 | 465 | 532 | 521 | 470 |  15 |  39 |   5 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
Title: lossyWAV Development
Post by: halb27 on 2007-10-24 23:59:12
Well, I have a lot of work left (I'll have to do it the day after tomorrow, as I'll be busy tomorrow), but I can report my first findings, which I think already show the way to go to a rather large extent.

I took 12 full tracks of regular music and 8 samples suspected to be problematic for lossyWAV and went through a lot of tests. Here's an extract which shows the way to go in a rather consistent manner:

I used only the new -spf parameter, so I drop the spreading values for 2048 here.

a) I started with 23345-23345-23345 as this was the reference setting for a long time yielding good results. => regular tracks: 425 kbps on average, problem tracks: 481 kbps.
Quite a good differentiation already.

b) With the critical band approach it's most vital to have a spreading length of 1 at the low frequency edge.
13345-13345-13345 => 428 kbps (regular) vs. 499 kbps (problems).
So obeying the critical band principle is nearly free here, and we get improved differentiation between regular and problem tracks.

c) Looking at the next frequency range from the low edge: the spreading length should be 1 for FFT lengths 64 and 256, and can be up to 4 with a 1024-bin FFT.
11345-11345-13345 => 434 kbps (regular) vs. 512 kbps (problems).
A pretty acceptable bitrate increase IMO and an improved spread between regular and problematic tracks.
Using 2 instead of 3 for the 1024-bin FFT gives nearly the same result (435 vs. 512 kbps).

d) For an FFT length of 64 the spreading length should be 1 in the next-lowest frequency range. This however increases bitrate significantly, and IMO should only be done with -1.
So let's make a compromise and use a spreading length of 2 here. With an FFT length of 64 in the next frequency range the spreading length should be 3. So we get 11235 for the 64-bin FFT if we stick with 5 for the spreading length at the HF end.
With 256 FFT bins the spreading length should be 3 for the 3.4...8.3 kHz range. With everything else unchanged we arrive at 11235-11345-13345, and this yields 437 kbps (regular) vs. 515 kbps (problems).
Corresponds closely to c).

e) With the digits in spreading formula d) that aren't fixed yet we are free to do some variations, trying for instance, cautiously, a rather long spreading length, especially at the high frequency edge, but also, more cautiously, in the 8.3...12.4 kHz range.
Remember that changing the spreading strategy to 23345 brought bitrate down significantly due to this small increase in spreading length at the HF end.
I'm just trying these things and will report about them. I think this is a good area of differentiation among the different quality modes.
Just a promising candidate for -2: 11235-11357-13379 => 416 kbps (regular music) vs. 512 kbps (problems).

Not bad, is it? I'll try to make it a bit more defensive for -2 while keeping these good properties to a large extent.
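For anyone following along, the -spf strings above decode as one spreading length per frequency region, five per FFT length. A minimal sketch of that decoding (my reading of this thread, not lossyWAV's actual Pascal source; the function name and FFT-length ordering are assumptions):

```python
# Hypothetical decoder for the -spf strings discussed above (not taken from
# lossyWAV's source). Assumed layout: one 5-digit group per FFT length, one
# spreading length per frequency region; digits above 9 use letters (A=10...).

def decode_spf(spf, fft_lengths=(64, 256, 1024, 2048)):
    """Map e.g. '11235-11336-1234D-1245F' to {64: [1, 1, 2, 3, 5], ...}."""
    def digit(ch):
        return int(ch, 36)  # '1'-'9' -> 1-9, 'A'-'Z' -> 10-35
    return {n: [digit(ch) for ch in group]
            for n, group in zip(fft_lengths, spf.split('-'))}

print(decode_spf('11235-11336-1234D-1245F')[1024])  # -> [1, 2, 3, 4, 13]
```

So the 'D' in 1234D would mean averaging over 13 bins in the top frequency region of the 1024-sample FFT.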
Title: lossyWAV Development
Post by: halb27 on 2007-10-26 07:56:33
Spreading strategy for -2 settled:

11235-11336-1234D for 64-256-1024 FFT length.

Yields 420 kbps (regular music) and 514 kbps (problem samples) on average with my sample sets.

This is roughly the same bitrate as that of v0.3.15 (using 23345-23345-23345), but with a significantly improved security against problems.
Up to 12.4 kHz the spreading length is lower than or equal to that of the v0.3.15 strategy.
For the 12.4+ kHz range and an FFT length >=256 the longer spreading length shouldn't be an issue with so many bins in this range (each of the averages covers only a small frequency range). Moreover our ears' sensitivity drops quickly in this area, and this is especially true for our sensitivity to noise, which peaks at around 6 kHz.

Just a bit strange looking at an FFT length of 1024 and the 12.4+ kHz range:
If I lower the 'D' to '5', average bitrate for regular music increases to 436 kbps.
So noise behavior in the 12.4+ kHz range has an influence on deciding between '5' and 'D'.
But that somewhat contradicts the fact that the 23345 setting yields a bitrate of 425 kbps.
I know these things can happen in a world of averages, but this behavior is quite strong and I wonder whether another issue may be causing it.

Anyway, I suggest using 11235-11336-1234D-1245F as an internal default (with 1245F to be refined later).

I'll find a spreading strategy for -3 next.

Thanks, Nick, for your foresight in using hex values for the spreading length. When I was thinking about a spreading length parameter I only had spreading lengths of up to 9 in mind.

@Josef Pohm: Do you mind trying your setF with option -spf 11235-11336-1234D-1245F (just quality -2)?

@Nick: As I'll be working with -3 for the first time, the codec block size question arises. I guess from your and Josef Pohm's results for -3 it makes sense to use a smaller codec blocksize. As I'm using FLAC, I guess 576 is the most welcome blocksize for -3.
How can I achieve this?
This brings back the question of how to control blocksize. For experimenting, your former codec blocksize option wasn't bad.
But we can move a bit closer to the final direction, I think. It could be something like:
Default blocksize without special codec options (like -tak): 1024 as a general default (current behavior).
-tak behavior: 1024 for -1 and -2, 512 for -3.
-flac behavior: 1024 for -1 and -2, 576 for -3.
-wv behavior: IIRC David Bryant said WavPack doesn't like small blocksizes. Ideally he can say what's best. Right now I think we should just stick to the default blocksize of 1024.
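The proposed defaults can be restated as a small lookup (a hypothetical helper just to summarise the proposal; lossyWAV itself would take these as command-line options, and the codec names here are illustrative):

```python
# Restating the proposed per-codec blocksize defaults as a lookup
# (hypothetical helper, not part of lossyWAV).

def default_blocksize(codec, quality):
    """quality is 1, 2 or 3, matching the -1/-2/-3 presets."""
    if quality == 3:
        if codec == 'tak':
            return 512
        if codec == 'flac':
            return 576
    return 1024  # general default, and WavPack at all presets

print(default_blocksize('flac', 3))  # -> 576
```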
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-26 10:50:47
In regular lossless coding, the choice of most efficient block length depends on the content.

The same is true of lossy/lossless coding, but the sweet spot is probably a shorter block length.

Unless some adaptive block length switching is employed (I don't suggest this!) then the optimum block length should be judged on a wide range of content, and possibly judged on different genres separately and the results published.

With the block length tied to the encoding quality pre-set, you risk the bizarre (though possibly inevitable) situation of certain content giving a higher bitrate at lower quality, because the short block length is inappropriate for that content.

Just a thought. I don't have an answer!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-26 11:54:03
... With the block length tied to the encoding quality pre-set, you risk the bizarre (though possibly inevitable) situation of certain content giving a higher bitrate at lower quality, because the short block length is inappropriate for that content. ...

Hopefully there will be progress for a long time, covering more and more special situations, but at the moment I'd be very content if we arrive at a very good average bitrate.
I wouldn't care much about certain 'unnatural' bitrate increases as long as they're limited and the average bitrate is good.
I am more worried about 'bizarre' quality drops; that's why I didn't want to consider a block size lower than 1024 until recently. But I think with -3 it's okay. On one hand I don't see a real a priori danger that we'll run into trouble, and on the other hand -3 users do want a relatively low bitrate while keeping up excellent quality - accepting that they expose their encodings a bit more to the risk of suboptimal quality. Within this framework it's okay by me to use a blocksize in the 5xx range for -3.
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-26 16:03:12
Quote
@Josef Pohm: Do you mind trying your setF with option -spf 11235-11336-1234D-1245F (just quality -2)?

Great work Halb! Comparison of 0.3.18-Halb and 0.3.18 on my SetF. While I was at it, I tried your settings also on -1 and -3.

Bits to remove table.
Code: [Select]
------- -------------------- -------------------- --------------------
|      |       0.3.18H      |       0.3.18       |     18H vs 18      |
|       ------ ------ ------ ------ ------ ------ ------ ------ ------
|      |   1  |   2  |   3  |   1  |   2  |   3  |   1  |   2  |   3  |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
|  512 | 5,69 | 5,92 | 6,21 | 5,22 | 5,36 | 5,88 | 0,47 | 0,56 | 0,33 |
| 1024 | 5,40 | 5,55 | 5,80 | 4,84 | 4,93 | 5,44 | 0,56 | 0,62 | 0,36 |
| 2048 | 4,99 | 5,19 | 5,46 | 4,40 | 4,52 | 5,11 | 0,59 | 0,67 | 0,35 |
| 4096 | 4,57 | 4,78 | 5,07 | 3,91 | 4,05 | 4,71 | 0,66 | 0,73 | 0,36 |
------- ------ ------ ------ ------ ------ ------ ------ ------ ------
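For context, 'bits to remove' counts the low-order bits lossyWAV rounds away per codec block; the trailing zero bits are what lets FLAC/TAK/WavPack compress further. A minimal sketch of what removing b bits from a 16-bit sample could look like (assumed rounding and clipping behaviour, not the actual implementation):

```python
# Sketch of removing b low-order bits from a 16-bit sample by rounding it
# to the nearest multiple of 2**b (assumed behaviour, not lossyWAV's source).

def remove_bits(sample, b):
    step = 1 << b
    q = (sample + (step >> 1)) // step      # round to a multiple of 2**b
    return max(-32768, min(32767, q * step))  # clip to the 16-bit range

print(remove_bits(12345, 5))  # -> 12352, a multiple of 32
```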


TAK 1.0.2b1 -p3m bitrate table (lossless 862kbps).
Code: [Select]
------- ----------------- ----------------- -----------------
|      |  TAK on 0.3.18H |  TAK on 0.3.18  |    18H vs 18    |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 423 | 406 | 385 | 458 | 447 | 409 | -35 | -41 | -24 |
| 1024 | 430 | 418 | 401 | 472 | 465 | 426 | -42 | -47 | -25 |
| 2048 | 453 | 437 | 418 | 498 | 488 | 443 | -45 | -51 | -25 |
| 4096 | 481 | 465 | 443 | 532 | 521 | 470 | -51 | -56 | -27 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----


... With the block length tied to the encoding quality pre-set, you risk the bizarre (though possibly inevitable) situation of certain content giving a higher bitrate at lower quality, because the short block length is inappropriate for that content. ...


I had a short test session on that matter in the early days of LossyFLAC.

From my post here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=500896) a frame size of 512 (and also 256) SEEMS to offer better compression ratios for all codecs but wavpack. That said, WavPack SEEMS to work well with a frame size of 1024, where it performs, in any case, slightly better than Flac.

Frame size of 128, on the other hand, SEEMS to result in generalized loss of compression performance for all codecs.

Moreover, David Bryant unveiled here (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=501042) a couple of quite promising pieces of news for possible further optimizations...

So I agree on using a frame size of 1024/512 for TAK and 1152/576 for FLAC (though -1:1024 / -2:512 / -3:256 may also be tempting, even if in the past I've heard that use of smaller frames is not considered completely safe), and on clarifying WavPack's status.
Title: lossyWAV Development
Post by: halb27 on 2007-10-26 17:28:05
Thanks a lot. Looks good.

So you suggest a codec blocksize of 1152/576 for FLAC.
Does anybody see a problem in that this does not correspond to the FFT lengths?
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-26 22:13:37
...So you suggest a codec blocksize of 1152/576 for FLAC...

No, sorry, I wasn't clear enough, but I didn't mean that. Actually I don't have a definite opinion whether to go for <1152;576>, <1024;512> or a mixed solution for FLAC.

I only meant I agree that [1152 (or 1024) for <-1;-2>] and [576 (or 512) for <-3>] should be okay for FLAC. I wanted to keep it simple and ended up being inaccurate. Sorry again.
Title: lossyWAV Development
Post by: halb27 on 2007-10-26 22:15:39
No problem, thank you for clarification.
Title: lossyWAV Development
Post by: halb27 on 2007-10-26 22:29:47
Spreading strategy for -3 settled:

11236-1246E for 64-1024 FFT length.

Yields 390 kbps (regular music) and 493 kbps (problem samples) on average with my sample sets (using FLAC with a blocksize of 1024).

Quite remarkable is the difference of 103 kbps between regular and problematic samples.
This is more than the 94 kbps difference of the -2 setting I found. So maybe in combination with a more defensive value for -nts or another option this setting may be a better basis for -2. Will try later when I have found out more about the other options.

Everybody who wants to try this -3 setting may use the options: -3 -spf 11236-FFFFF-1246E-FFFFF.

Before finding an adequate setting for -1 I will try to analyze the effects of -skew and -snr.
My regular and problematic sample sets seem to be quite adequate to find out about differentiating behavior of option values in this respect.

As I said before my heart is pretty much with -skew, but after thinking about your remark, Nick, that -snr strengthens the effect of skew, I'm curious to learn about the behavior of both of these options.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-10-27 08:03:01
Out of curiosity, I processed a whole album with lossyWAV, and encoded it to Windows Media Audio 9.2 Lossless (WMALSL).
For comparison, I used FLAC -5 (default), and used lossyWAV -spf 11235-11336-1234D-1245F with both codecs.

Code: [Select]
Comparison of FLAC and WMALSL, with and without lossyWAV pre-processing

Format      | Total Size        | % of WAV Size | % of Unpreprocessed Size | Avg. Bitrate
        WAV | 691 905 184 bytes | 100.00        |                          | 1411 kbps
       FLAC | 384 957 786 bytes | 55.64         |                          |  785 kbps
  lossyFLAC | 233 040 957 bytes | 33.68         | 60.54                    |  475 kbps
     WMALSL | 373 569 806 bytes | 53.99         |                          |  822 kbps
lossyWMALSL | 208 287 236 bytes | 30.10         | 55.76                    |  448 kbps


Clearly, WMA Lossless benefits significantly from lossyWAV pre-processing.
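The averages in the table can be reproduced from the byte totals, assuming standard CD audio (44.1 kHz, 16-bit, stereo, i.e. 176,400 bytes per second of audio):

```python
# Reproduce the table's average bitrates from the file sizes, assuming
# CD audio: 44100 samples/s * 2 channels * 2 bytes = 176400 bytes/s.

WAV_BYTES = 691_905_184
SECONDS = WAV_BYTES / 176_400  # album play time, roughly 65 minutes

def kbps(nbytes):
    return nbytes * 8 / SECONDS / 1000

print(round(kbps(384_957_786)))  # FLAC      -> 785
print(round(kbps(233_040_957)))  # lossyFLAC -> 475
```

The FLAC rows reproduce exactly; the WMALSL rows come out lower this way (roughly 762 and 425 kbps), so those bitrates presumably reflect the encoder's own reporting rather than file size over play time. The size percentages are consistent for all rows.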
Title: lossyWAV Development
Post by: shadowking on 2007-10-27 08:26:31
Out of curiosity, I processed a whole album with lossyWAV, and encoded it to Windows Media Audio 9.2 Lossless (WMALSL).
For comparison, I used FLAC -5 (default), and used lossyWAV -spf 11235-11336-1234D-1245F with both codecs.

Code: [Select]
Comparison of FLAC and WMALSL, with and without lossyWAV pre-processing

Format      | Total Size        | % of WAV Size | % of Unpreprocessed Size | Avg. Bitrate
        WAV | 691 905 184 bytes | 100.00        |                          | 1411 kbps
       FLAC | 384 957 786 bytes | 55.64         |                          |  785 kbps
  lossyFLAC | 233 040 957 bytes | 33.68         | 60.54                    |  475 kbps
     WMALSL | 373 569 806 bytes | 53.99         |                          |  822 kbps
lossyWMALSL | 208 287 236 bytes | 30.10         | 55.76                    |  448 kbps



Clearly, WMA Lossless benefits significantly from lossyWAV pre-processing.


Nice. Good work Nick.C, halb27, 2Bdecided and everyone else involved.
Title: lossyWAV Development
Post by: halb27 on 2007-10-27 11:10:37
I finished my analysis of -skew and -snr:

Results for -3 -spf 11236-FFFFF-1246E-FFFFF -skew x -snr y (encoded with FLAC using a blocksize of 1024)
Results are given as bitrate in kbps for regular / for problematic tracks:

Code: [Select]
        | -skew 0 | -skew 12| -skew 20| -skew 24| -skew 36
-snr 0  |389 / 483|390 / 493|393 / 504|398 / 514|435 / 551
-snr 12 |         |390 / 493|         |398 / 514|435 / 551
-snr 24 |         |397 / 500|         |407 / 524|439 / 560

Looking at the first row (-snr 0):
Nick's default -skew 12 yields a significant security margin practically for free!
-skew 20 increases it, and it's still more or less for free.
From roughly -skew 24 on there's a price to pay: the bitrate of the problematic samples increases, but so does the bitrate for regular music. The relation is still favorable at around -skew 24, but we're starting to get diminishing returns in the ratio of bitrate increase for problematic vs. regular tracks.

Looking at the other rows:
-snr 12 yields the same results as -snr 0 and thus is not interesting.
Looking at -snr 24 there is something going on. Roughly speaking, however, it's more of a general bitrate increase, which can be achieved more directly via -nts. That's not exactly true with -skew 36 -snr 24, where the bitrate increase is higher for the problematic tracks, but -skew 36 isn't very interesting (see below).

-skew has an astonishing effect on security, and it's more or less free (for -skew <~ 24).
However we have to face the fact that it covers improved security only in the frequency range below 3.5 kHz (and most of the effect goes into the 1.5- kHz region).
So IMO it wouldn't be a balanced strategy to use a large -skew value. We would pay for benefits restricted to this frequency area. It's okay to pay a little bit, but if we want to pay much, IMO we should do it more generally (use a more defensive -nts value).
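To make the discussion concrete: my understanding, from this thread only, is that -skew biases the minimum search over the spread spectrum by penalising bins below about 3.45 kHz, so quiet low-frequency content forces fewer bits to be removed. A purely illustrative sketch, with the linear taper and all names assumed (the real implementation may differ):

```python
# Purely illustrative sketch of a skewed minimum search (assumed mechanics,
# not lossyWAV's source): bins below ~3.45 kHz are lowered by up to skew_db,
# so quiet low-frequency content drives the bits-to-remove decision harder.

def skewed_min(spread_db, bin_freqs_hz, skew_db):
    def penalty(f):
        if f >= 3450.0:
            return 0.0
        return skew_db * (1.0 - f / 3450.0)  # assumed linear taper to 0 dB
    return min(v - penalty(f) for v, f in zip(spread_db, bin_freqs_hz))
```

With skew_db = 0 this reduces to a plain minimum; raising it makes a quiet bin at, say, 1 kHz dominate over a quieter bin at 10 kHz, which would match Nick's later question about whether any minimum falls below 3.45 kHz.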

I thought -3 was based on a codec blocksize of 1024 but I was wrong: it's 512. So it's wise to use this blocksize with FLAC as well.
For -2, of course, I used FLAC with a blocksize of 1024.

So my final settings for -2 and -3 and the results for my test sets are:

-3 -spf 11236-FFFFF-1246E-FFFFF -skew 24 -snr 0      => 386 kbps (regular music) / 508 kbps (problem samples)

-2 -spf 11235-11336-1234D-FFFFF -skew 24 -snr 0    => 426 kbps (regular music) / 534 kbps (problem samples)

Now that we've reached the 3xx kbps region, hopefully some kind souls will come forward and do some listening tests.
It's not just about problem samples; regular music may also be harmed by our rather simple method when in the 3xx kbps range.

I'll work on the -1 setting within the next days, but first will give me some rest.

Edited: -skew 20 changed to -skew 24 in the final setting. IMO that's a better security-vs-price trade-off.
Title: lossyWAV Development
Post by: halb27 on 2007-10-27 23:23:29
Spreading strategy for -1 settled.

To put everything in one place:

-1 -spf 11124-11225-11236-12347 -skew 24 -snr 0 (yielding 488 / 560 kbps on avg. for my regular resp. problem sample set)
-2 -spf 11235-11336-1234D-FFFFF -skew 24 -snr 0 (yielding 426 / 534 kbps on avg. for my regular resp. problem sample set)
-3 -spf 11236-FFFFF-1246E-FFFFF -skew 24 -snr 0 (yielding 386 / 508 kbps on avg. for my regular resp. problem sample set)

Even -1 yields a bitrate below 500 kbps (with my set).

Looks pretty well graduated, both in resulting bitrate and in how defensively the averaging is done within the 5 frequency regions obeying the critical band criterion.

Nick, what do you think about fixing these in the software and leaving -nts as the only quality-related option for the user? Not right now, but after some time allowing room for improvement.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 07:54:40
Spreading strategy for -1 settled.

To put everything in one place:

-1 -spf 11124-11225-11236-12347 -skew 24 -snr 0 (yielding 488 / 560 kbps on avg. for my regular resp. problem sample set)
-2 -spf 11235-11336-1234D-FFFFF -skew 24 -snr 0 (yielding 426 / 534 kbps on avg. for my regular resp. problem sample set)
-3 -spf 11236-FFFFF-1246E-FFFFF -skew 24 -snr 0 (yielding 386 / 508 kbps on avg. for my regular resp. problem sample set)

Even -1 yields a bitrate below 500 kbps (with my set).

Looks pretty well graduated, both in resulting bitrate and in how defensively the averaging is done within the 5 frequency regions obeying the critical band criterion.

Nick, what do you think about fixing these in the software and leaving -nts as the only quality-related option for the user? Not right now, but after some time allowing room for improvement.
I don't know - my home broadband goes down on Friday morning, I have no access to the internet for 3 days - and all hell breaks loose on the thread!!!

@Halb27 - Many thanks for the *extensive* testing to get the spreading function parameters fixed. I will implement your latest as default (including -skew 24).

@Mitch 1 2 - Excellent find! Should extend the userbase of David's method......

As an aside, I noticed a bug in v0.3.18: -snr was not working correctly. I have amended and will post.

In the interim, I've been playing with assembler and have optimised the code somewhat, so it should run faster. I only have Intel C2D platforms for testing, so (selfishly?) the optimisations are for this chip.
Title: lossyWAV Development
Post by: halb27 on 2007-10-29 09:03:59
Hi Nick,

I was really worried about what had happened to you, as you've always been so busy with this thread and we hadn't heard from you for so long. Just an internet outage: not too bad, leaving room for other things to do.

Well, as -snr wasn't working correctly in v0.3.18, I'll try the -skew/-snr combinations again as soon as you can provide a fixed version.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 09:16:37
Hi Nick,

I was really worried about what had happened to you, as you've always been so busy with this thread and we hadn't heard from you for so long. Just an internet outage: not too bad, leaving room for other things to do.

Well, as -snr wasn't working correctly in v0.3.18, I'll try the -skew/-snr combinations again as soon as you can provide a fixed version.
lossyWAV alpha v0.3.19 attached: Superseded; faster, -snr now working "correctly", -spf now allows 1..9;A..Z input to allow up to 35 bin averaging(!).

Having no broadband at home is really tedious......

[edit] My 52 sample set (default parameters other than -1, -2 & -3): WAV: 121.53MB; FLAC: 68.2MB / 791.9kbps; -1: 50.15MB / 582.3kbps; -2: 44.09MB / 512kbps; -3: 39.5MB / 458.7kbps [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-10-29 11:44:44
Looking at your result I guess you already include the -spf values for -1, -2, -3 which I found.

Your sample set is to a large extent a set of hard samples. For bitrate considerations I think it is good to have a hopefully representative set of full-length tracks from your collection on one hand, and a set of sample tracks expected to require a very high bitrate on the other.
As your 52 sample set can be considered more or less a set of the second kind, an additional set of regular music would be most welcome IMO. The bitrate of that set will be considerably lower.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 12:01:52
Looking at your result I guess you already include the -spf values for -1, -2, -3 which I found.

Your sample set is to a large extent a set of hard samples. For bitrate considerations I think it is good to have a hopefully representative set of full-length tracks from your collection on one hand, and a set of sample tracks expected to require a very high bitrate on the other.
As your 52 sample set can be considered more or less a set of the second kind, an additional set of regular music would be most welcome IMO. The bitrate of that set will be considerably lower.
Yes, I forgot to mention that the revised -spf defaults are per your testing. I will start to transcode a selection from my archive and revert.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 17:04:35
Following testing of alpha v0.3.19 on a few albums:
Code: [Select]
lossyWAV alpha v0.3.19

| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |

| AC-DC - Dirty Deeds Done Dirt Cheap               |  220MB / 781 kbps |  122MB / 435 kbps |  112MB / 399 kbps |
| B-52's - Good Stuff                               |  398MB / 993 kbps |  184MB / 459 kbps |  169MB / 423 kbps |
| China Crisis - Flaunt The Imperfection            |  238MB / 774 kbps |  132MB / 431 kbps |  121MB / 394 kbps |
| Chris Isaak - Chris Isaak                         |  227MB / 878 kbps |  114MB / 441 kbps |  104MB / 404 kbps |
| Climie Fisher - Everything                        |  308MB / 910 kbps |  149MB / 440 kbps |  137MB / 406 kbps |
| Dave Stewart and the Spiritual Cowboys - Honest   |  330MB / 835 kbps |  172MB / 436 kbps |  157MB / 397 kbps |
| Fish - From The Mirror                            |  274MB / 854 kbps |  136MB / 425 kbps |  125MB / 390 kbps |
| Gary Moore - Out In The Fields (The Very Best Of) |  495MB / 976 kbps |  226MB / 447 kbps |  208MB / 412 kbps |
| Gerry Rafferty - City To City                     |  307MB / 802 kbps |  165MB / 431 kbps |  150MB / 392 kbps |
| Iron Maiden - Can I Play With Madness             |  206MB / 784 kbps |  118MB / 450 kbps |  110MB / 419 kbps |
| Jean Michel Jarre - Oxygene                       |  219MB / 773 kbps |  143MB / 506 kbps |  130MB / 459 kbps |
| Marillion - Real to Reel (Live)                   |  305MB / 821 kbps |  172MB / 464 kbps |  158MB / 425 kbps |
| Mike Oldfield - Discovery                         |  237MB / 804 kbps |  129MB / 438 kbps |  118MB / 399 kbps |
| Mike Oldfield - QE2                               |  243MB / 855 kbps |  133MB / 469 kbps |  121MB / 425 kbps |
| Scorpions - Best of Rockers'N'Ballads             |  451MB / 922 kbps |  225MB / 460 kbps |  209MB / 428 kbps |
| The Shamen - Boss Drum                            |  433MB / 922 kbps |  220MB / 470 kbps |  202MB / 431 kbps |
| Van Morrison - Astral Weeks                       |  255MB / 757 kbps |  148MB / 440 kbps |  136MB / 404 kbps |
| Voice of the Beehive - Honey Lingers              |  213MB / 938 kbps |   99MB / 434 kbps |   92MB / 402 kbps |

| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2567MB / 412 kbps |
Title: lossyWAV Development
Post by: user on 2007-10-29 17:25:11
Congratulations on the great development to all the nice people involved!

Cool to see these results and the team spirit !

(Though on a side note, I ask myself whether I will try it out in future, whether I should get a FLAC-capable device, and whether many people will use it rather than only a few tech-experienced HA folk.
It would mean another transcoding or parallel encoding step and additional storage space, since the true lossless is kept anyway; I think (myself included) that people interested enough in quality to consider 400 kbit/s have a true lossless interest anyway.
Though it is now possible to halve lossless bitrates from e.g. 860 kbit/s down to 400-450 kbit/s, it isn't lossless anymore, and it's still well above e.g. the "standard" 320 kbit/s, which most consider either overkill or already nearly transparent in most cases, depending on the codec and one's point of view. <-- uu, long sentence.
I think lossy WavPack e.g. could reach similar bitrates and probably the same transparency at these bitrates (as lossy WavPack is tested down to only ca. 200 kbit/s). Nevertheless only tech-experienced guys, and only a few even at HA, use lossy WavPack (or other modern codecs at their highest quality settings; consider Ogg, MPC, AAC at such bitrates).
Of course, for FLAC it is interesting, due to increasing hardware/portable support, to offer a space-limit-oriented bitrate solution. (Well, still, who seriously uses e.g. 320k MP3 for portable use?)
For home hi-fi use you have nearly unlimited space thanks to big and quite cheap HDs, or DVD±R as even cheaper storage, so it doesn't really matter whether the bitrate is 400-450 or averages 800-1000 as for lossless FLAC.)
Title: lossyWAV Development
Post by: halb27 on 2007-10-29 19:05:17
Following testing of alpha v0.3.19 on a few albums:
Code: [Select]
lossyWAV alpha v0.3.19
...
| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |
| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2567MB / 412 kbps |

Nice results - though I'm a bit disappointed about the -3 result, which I had expected to have a lower bitrate.

Should we try to go a bit deeper in bitrate with -3?
But maybe, having been so busy with -3, I'm striving too hard to achieve a bitrate below 400 kbps on average with regular music.

Feedback welcome.
Title: lossyWAV Development
Post by: halb27 on 2007-10-29 19:27:19
... It would mean another transcoding or parallel encoding step and additional storage space, since the true lossless is kept anyway; I think (myself included) that people interested enough in quality to consider 400 kbit/s have a true lossless interest anyway. ...

The practical use of this procedure is certainly limited to rather few people. MP3, Vorbis, AAC or MPC make nearly everybody happy for portable use or even for home hi-fi use. For archiving purposes, storage technology is such that most of us can afford to archive losslessly. But there are niches where people might find it useful. I personally want to use it as a space-saving alternative to a lossless codec on my DAP. With my 40 GB DAP and selective collection I can afford a codec that requires an average bitrate in the 400 kbps range, and it can be used this way right now. Another interest may be using the -1 quality level for archiving, which can be useful even today for owners of huge music collections. In this case however it may be wise to wait until some more feedback on quality is available.

In that regard everybody is highly welcome to share practical experience. With -3, I guess there is a chance to show that the current state of this approach is worth improving further.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 20:35:24
Following testing of alpha v0.3.19 on a few albums:
Code: [Select]
lossyWAV alpha v0.3.19
...
| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |
| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2567MB / 412 kbps |
Nice results - though I'm a bit disappointed about the -3 result, which I had expected to have a lower bitrate.

Should we try to go a bit deeper in bitrate with -3?
But maybe, having been so busy with -3, I'm striving too hard to achieve a bitrate below 400 kbps on average with regular music.

Feedback welcome.
User error I'm afraid - I forgot to re-encode with FLAC at -b 512. Amended results as follows:
Code: [Select]
lossyWAV alpha v0.3.19

| Artist - Album                                    | Lossless; FLAC -8 |    -2; FLAC -8    |    -3; FLAC -8    |

| AC-DC - Dirty Deeds Done Dirt Cheap               |  220MB / 781 kbps |  122MB / 435 kbps |  110MB / 391 kbps |
| B-52's - Good Stuff                               |  398MB / 993 kbps |  184MB / 459 kbps |  162MB / 404 kbps |
| China Crisis - Flaunt The Imperfection            |  238MB / 774 kbps |  132MB / 431 kbps |  117MB / 382 kbps |
| Chris Isaak - Chris Isaak                         |  227MB / 878 kbps |  114MB / 441 kbps |  101MB / 392 kbps |
| Climie Fisher - Everything                        |  308MB / 910 kbps |  149MB / 440 kbps |  131MB / 387 kbps |
| Dave Stewart and the Spiritual Cowboys - Honest   |  330MB / 835 kbps |  172MB / 436 kbps |  152MB / 385 kbps |
| Fish - From The Mirror                            |  274MB / 854 kbps |  136MB / 425 kbps |  120MB / 377 kbps |
| Gary Moore - Out In The Fields (The Very Best Of) |  495MB / 976 kbps |  226MB / 447 kbps |  202MB / 400 kbps |
| Gerry Rafferty - City To City                     |  307MB / 802 kbps |  165MB / 431 kbps |  147MB / 383 kbps |
| Iron Maiden - Can I Play With Madness             |  206MB / 784 kbps |  118MB / 450 kbps |  106MB / 405 kbps |
| Jean Michel Jarre - Oxygene                       |  219MB / 773 kbps |  143MB / 506 kbps |  127MB / 450 kbps |
| Marillion - Real to Reel (Live)                   |  305MB / 821 kbps |  172MB / 464 kbps |  154MB / 414 kbps |
| Mike Oldfield - Discovery                         |  237MB / 804 kbps |  129MB / 438 kbps |  115MB / 390 kbps |
| Mike Oldfield - QE2                               |  243MB / 855 kbps |  133MB / 469 kbps |  118MB / 416 kbps |
| Scorpions - Best of Rockers'N'Ballads             |  451MB / 922 kbps |  225MB / 460 kbps |  203MB / 415 kbps |
| The Shamen - Boss Drum                            |  433MB / 922 kbps |  220MB / 470 kbps |  190MB / 405 kbps |
| Van Morrison - Astral Weeks                       |  255MB / 757 kbps |  148MB / 440 kbps |  133MB / 395 kbps |
| Voice of the Beehive - Honey Lingers              |  213MB / 938 kbps |   99MB / 434 kbps |   88MB / 385 kbps |

| Average                                           | 5369MB / 863 kbps | 2796MB / 449 kbps | 2484MB / 399 kbps |
Title: lossyWAV Development
Post by: halb27 on 2007-10-29 20:47:35
Wonderful. Something like this is what I expected.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 21:23:12
Wonderful. Something like this is what I expected.
  It might be worth being less conservative with the -nts parameter, i.e. try -nts 0 for -3 to see what that does to the bitrate. On my "problem" set:

-3 -nts -0.5 -skew 24 -snr 12 > 458.7kbps; (default -3)
-3 -nts -0.5 -skew 18 -snr 12 > 446.1kbps;
-3 -nts -0.5 -skew 18 -snr 18 > 448.3kbps;

-3 -nts 0 -skew 12 -snr 6 > 433.3kbps;
-3 -nts 0 -skew 12 -snr 12 > 433.3kbps;
-3 -nts 0 -skew 12 -snr 18 > 435.8kbps;

-3 -nts 0 -skew 18 -snr 12 > 440.2kbps;
-3 -nts 0 -skew 18 -snr 18 > 442.9kbps.

-3 -nts 0 -skew 24 -snr 12 > 452.9kbps;
Title: lossyWAV Development
Post by: halb27 on 2007-10-29 21:51:56
Hi Nick,

I just started examining the behavior of -3 with respect to -skew and -snr.
I only started using -snr cause I think there's something wrong:

Code: [Select]
        | -skew 0 | -skew 12| -skew 24| -skew 36
-snr 0  |390 / 510|390 / 510|390 / 510|390 / 510


These values can't  be identical to my former test, cause I used FLAC -b 1024 then and FLAC -b 512 now. But I wonder what's wrong hear: identical results with various -skew values is not what I expected.
390/510 is a good result IMO, but is expected to be achieved with around -skew 24.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-29 22:13:07
Hi Nick,

I just started examining the behavior of -3 with respect to -skew and -snr.
I only started using -snr cause I think there's something wrong:

Code: [Select]
        | -skew 0 | -skew 12| -skew 24| -skew 36
-snr 0  |390 / 510|390 / 510|390 / 510|390 / 510

These values can't  be identical to my former test, cause I used FLAC -b 1024 then and FLAC -b 512 now. But I wonder what's wrong hear: identical results with various -skew values is not what I expected.
390/510 is a good result IMO, but is expected to be achieved with around -skew 24.
I've run some skew tests on my 52 sample set:

-3 -skew 0 -snr 0 > 433.0kbps;
-3 -skew 6 -snr 0 > 435.3kbps;
-3 -skew 12 -snr 0 > 439.1kbps;
-3 -skew 18 -snr 0 > 446.1kbps;
-3 -skew 24 -snr 0 > 458.7kbps;
-3 -skew 30 -snr 0 > 479.8kbps;
-3 -skew 36 -snr 0 > 511.3kbps.

Is it possible that *none* of your samples have a minimum result below 3.45kHz?
Title: lossyWAV Development
Post by: Josef Pohm on 2007-10-30 07:44:19
@Mitch 1 2 - Excellent find! Should extend the userbase of David's method......

Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like best frame size for that codec is 2048. When somebody else can confirm that is the case also with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.

By the way @2048 WMALSL performs halfway between TAK and FLAC.

Set F, WMALSL-WMP9, 0.3.18 11236-FFFFF-1246D-FFFFF
Code: [Select]
------- ----- ----- ----- 
|      |  1  |  2  |  3  |
------- ----- ----- -----
|  512 | 434 | 427 | 425 |
| 1024 | 432 | 427 | 425 |
| 2048 | 430 | 424 | 422 |
| 4096 | 460 | 453 | 451 |
------- ----- ----- -----
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 07:55:23
@Mitch 1 2 - Excellent find! Should extend the userbase of David's method......
Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like best frame size for that codec is 2048. When somebody else can confirm that is the case also with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.

By the way @2048 WMALSL performs halfway between TAK and FLAC.

Set F, WMALSL-WMP9, 0.3.18 11236-FFFFF-1246D-FFFFF
Code: [Select]
------- ----- ----- ----- 
|      |  1  |  2  |  3  |
------- ----- ----- -----
|  512 | 434 | 427 | 425 |
| 1024 | 432 | 427 | 425 |
| 2048 | 430 | 424 | 422 |
| 4096 | 460 | 453 | 451 |
------- ----- ----- -----
So, a -wm parameter to set codec_block_size to 2048 for all quality levels for WMALSL?
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 08:02:20
Hi Nick,
I just started examining the behavior of -3 with respect to -skew and -snr. ...

I've run some skew tests on my 52 sample set:

-3 -skew 0 -snr 0 > 433.0kbps;
-3 -skew 6 -snr 0 > 435.3kbps;
-3 -skew 12 -snr 0 > 439.1kbps;
-3 -skew 18 -snr 0 > 446.1kbps;
-3 -skew 24 -snr 0 > 458.7kbps;
-3 -skew 30 -snr 0 > 479.8kbps;
-3 -skew 36 -snr 0 > 511.3kbps.

Is it possible that *none* of your samples have a minimum result below 3.45kHz?

My problem sample set should respond at least as heavy as yours on -skew variation (and it did with my v0.3.18 test).
Thanks for your test. I must have done something wrong and will look into it.
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 08:20:44
So, a -wm parameter to set codec_block_size to 2048 for all quality levels for WMALSL?

Nice we have another candidate for a codec-specific option.

As for the internal codec block size: I think if we're working internally with a blocksize of 1024, there is no problem to use a blocksize of 2048 with a lossless encoder if this is most effective with it in an overall sense. Lossless encoder blocksize should be just a multiple of the internal lossyWav blocksize.

But it brings up the question: what is the meaning of our lossyWav internal blocksize at all?
Taking the big view not looking at internal details we have a two stage process:
Stage 1: Transform the input wav file to an output wav file with the effect of bringing as many LSBs of each sample to zero as long as we can expect this doesn't have an audible impact.
Stage 2: Use a lossless codec on the output of stage 1.

In principle there is no use talking about blocks within stage 1. We can think of the stage 1 process as of a process concerning each sample individually.

We should give advice for blocksize use with the various encoders. Encoders take profit from short blocks as this adapts best to what's done in stage 1. But as encoders are partially not efficient with short blocks (wavPack, WMAlossless) a best general compromise has to be found for each codec. This seems to be not difficult.

I guess thinking of a codec blocksize within stage 1 is mixed up with what it's really up to: FFT windowing.
When getting it clearer we may improve things - maybe as well as with respect to quality as well as with respect to practical usage.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-10-30 10:26:15
Nice to see that WMALSL is working. I gave it a quick run and, with my old version of WMALSL, it looks like best frame size for that codec is 2048. When somebody else can confirm that is the case also with newer versions, we may want to add a dedicated switch, to avoid people using it with frame size 512 or 1024.

I came to the same conclusion, but I used a hex editor.
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 10:35:57
Just for knowledge about the FFT windowing / codec block details:

Is my imagination about the current way of doing it correct?:
For definiteness let's talk about -3 (codec block size: 512, FFT lengths of 64 and 1024), and for a moment let's ignore the effect of spreading, skewing and using
-snr.

We're looking at a specific 512 sample codec block CB.
FFT analysis of length 1024 is done starting with the first sample of CB.
The analysis result is applied to all the 512 samples of CB.
FFT analysis of length 64 is applied to the 8 consecutive 64-sample-subblocks SB1, ... , SB8 of CB.
In principle we can look at each of the SB1, ..., SB8 seperately and apply the FFT analysis of length 1024 to any of these subblocks currently under investigation: look for the lowest bin in both FFTs and decide about the number of bits to remove based on this minimum bin. In principle this needs to be restricted to only to the 64-sample-subblocks, but we use it as a temporary result, look at all the subblocks of CB, and then - based on the subresults of each subblock decide on the bits to remove for the entire CB.

In principle we can decide on the bits to remove on a 64 sample block basis which corresponds to the short 64 sample FFT.
Sure we mix information that belong to 1024 samples with information that belong to 64 samples which formally is not correct. But if we want to be that correct we also may not use a codec blocksize of 512 with a FFT length of 1024 (or as with -1 a codec blocksize of 1024 with a FFT length of 2048 [which resulted from a probably bad idea of mine - so we should either return back to a blocksize of 2048 or maybe better skip the 2048 FFT]).
Other than that we can improve when thinking of more adequate FFT windows - for instance build several 1024 sample FFT windows (8 in the extreme case) in a way that the 64 sample window under investigation is more or less in the center of a 1024 sample FFT window. Or something more intelligent.

Brings back the idea of overlapping FFT analysis you offered already, Nick, in a specific form.

Anyway, by a considerations like these we seperate FFT analysis considerations from codec block size considerations which should belong to the lossless encoder of stage 2 alone.

Edited: nonsense removed.
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 12:29:46
Just one more idea:

Though I love the idea of deciding (at least in principle) for each individual sample about the number of bits to remove we can see it a bit more practically:

Overall view:
Our stage 1 process provides blocks of 512 samples, and all samples within this block have the same number of bits removed.
We do it under all circumstances, that is especially for -1, -2, -3.
This way we are free with the stage 2 encoder to use any multiple of 512 as the blocksize, and for our best knowledge so far it's easy to find an appropriate blocksize (for instance 512 for FLAC and TAK, 1024 for wavPack, 2048 for WMAlossless).
Especially bitrate for -2 would still go down a bit with FLAC and TAK and a blocksize of 512.

Detail view for stage 1:
With a 512 sample block we can easily let it consist of several consecutive length-64-FFT and (for -1, -2) length-256-FFT windows.
We can build for each 512 sample block an individual length-1024-FFT in a way that our 512 sample block lies in the middle of the 1024 sample FFT window. (Looking at only the length-1024-FFT windows: these cover the entire track overlappingly).
May be it's good to apply a complex FFT window function for the length-1024-FFT, but I guess the simple approach is good enough.
The length-1024-FFT window contains information from 256 samples in front of and after the block which make up for an inaccuracy. These access sample window parts correspond to ~5.8 msec each - a pretty short period IMO. Moreover in case the shorter FFTs have an independent influence on the number of bits to remove I don't think this is a dangerous inaccuracy. What I mean is: if one of the shorter FFTs yields a very low value bin, and if there's no lower one in the length-1024-FFT, this low value bin from a shorter FFT decides on the number of bits to remove.

But this is the place IMO where we should say goodbye to length-2048-FFTs.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 13:04:57
Just one more idea:

Though I love the idea of deciding (at least in principle) for each individual sample about the number of bits to remove we can see it a bit more practically:

Overall view:
Our stage 1 process provides blocks of 512 samples, and all samples within this block have the same number of bits removed.
We do it under all circumstances, that is especially for -1, -2, -3.
This way we are free with the stage 2 encoder to use any multiple of 512 as the blocksize, and for our best knowledge so far it's easy to find an appropriate blocksize (for instance 512 for FLAC and TAK, 1024 for wavPack, 2048 for WMAlossless).

Detail view for stage 1:
With a 512 sample block we can easily let it consist of several consecutive length-64-FFT and length-256-FFT windows.
We can build for each 512 sample block an individual length-1024-FFT in a way that our 512 sample block lies in the middle of the 1024 sample FFT window. (Looking at only the length-1024-FFT windows: these cover the entire track overlappingly).
May be it's good to apply a complex FFT window function for the length-1024-FFT, but I guess the simple approach is good enough.
The length-1024-FFT window contains information from 256 samples in front of and after the block which make up for an inaccuracy. These access sample window parts correspond to ~5.8 msec each - a pretty short period IMO. Moreover in case the shorter FFTs have an independent influence on the number of bits to remove I don't think this is a dangerous inaccuracy. What I mean is: if one of the shorter FFTs yields a very low value bin, and if there's no lower one in the length-1024-FFT, this low value bin from a shorter FFT decides on the number of bits to remove.

But this is the place IMO where we should say goodbye to length-2048-FFTs.
I will happily remove 2048 sample fft's from the analysis. Looking at fft analysis, currently there are separate fft analyses carried out on the data in the current codec block, some of the previous block and some of the next block (assuming we are not analysing the ends of the file). The overlap is fft_length/2 and the spacing of analyses is fft_length/2, so for a 1024 sample codec_block_size, 3 fft analyses are performed: -512 to 511; 0 to 1023 and 512 to 1535. For a 512 sample codec_block_size 2 analyses are performed: -512 to 511; 0 to 1023 (-ve samples counts are in the previous block, +ve sample counts in excess of codec_block_size-1 are in the next block).

So, for a 1024 sample codec_block size there are 3 1024 sample fft analyses carried out; 9 256 sample fft analyses carried out and 33  64 sample fft analyses carried out. Spreading, minimum searching and averaging is carried out on all of them and the smallest derived value used to determine bits_to_remove.
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 14:41:02
Thanks for clarification. So you do a lot of overlapping analyses.

Looking at this current way you do it I don't see a reason why not use a lossyWav blocksize of 512 throughout. (I'd like to call it lossyWav blocksize cause it's not neccesarily the blocksize of the encoding codec).
In case there should be something not appropriate with this way of doing the analyses it is so as well with a lossyWav block size of 1024.

A lossyWav blocksize of 512 gives way for any appropriate blocksize as a multiple of 512 in the stage 2 encoding process.

What might be wrong with doing the analyses this way?
Hopefully nothing of course, but I'm a bit afraid of the energy that originates from outside of the codec block influencing the analysis for the codec block. The way it's done energy from ~11.5 msec before and after the block make it into the decision making for the block. So a potential min bin may loose its min status due to energy from outside the block.
If that's fine: alright, if it can be problematic statistics is in favor of 1024 sample lossyWav blocks as for each block say the 1024 sample FFTs provide for a 100% access samples being used whereas with 512 sample lossyWav blocks this extends to 200%.
Anyway it should be problem free (at least problem poor) in any case.

That's all about the way it is.
But what would be the disadvantage with the approach of my last post: overlapping only in the case of length-1024-FFTs (with the 512 sample lossyWav block right in the middle), and with consecutive non-overlapping FFT windows for the other FFT lengths.
Would reduce the foreign energy problematic (in case there is one) and would reduce the number of FFTs.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-10-30 15:20:19
It's inefficient to remove more bits than a given lossless encoder can take advantage of.

So say, for example, you run lossyWAV with 512 and FLAC with 1024.

That means, within any FLAC block, half the samples might have more zeros than FLAC can take advantage of (because the other half have fewer zeros, defining and limiting the number of "wasted_bits" within that FLAC block).

"So what?" you might think. Well, removing more bits equates to adding more noise. And the more noise you add, the less efficient a lossless codec will be (excepting the special case where the "noise" is a string of zeros which it can take advantage of).

So it's possible (and in my very early tests, true) that lossyWAV 512 with FLAC 1024 will give a higher bitrate than lossyWAV 1020 with FLAC 1024 (and, of course, a theoretically lower quality, though hopefully both are transparent).

Cheers,
David.


The way it's done energy from ~11.5 msec before and after the block make it into the decision making for the block. So a potential min bin may loose its min status due to energy from outside the block.
That's intentional. One of the FFT analyses is usually concentrated on the block boundary, which for a 1024-point FFT at 44.1kHz is, as you say, about +/-11.6ms - though the windowing means the effect at the edges is pretty small. The reason for doing this is to catch low energy moments near the block boundary, which could otherwise be completely missed. If you miss them, you add too much noise; worse still, you can put a hefty transition in there as you switch to more bits removed.

More generally, if you don't overlap analysis blocks, then there are moments of the audio that you never check, so you won't know if the noise you're adding is above or below the noise floor during those moments.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 16:02:20
It's inefficient to remove more bits than a given lossless encoder can take advantage of.
So say, for example, you run lossyWAV with 512 and FLAC with 1024. ... That means, within any FLAC block, half the samples might have more zeros than FLAC can take advantage of (because the other half have fewer zeros, defining and limiting the number of "wasted_bits" within that FLAC block). ...

Sure if we provide 512 sample blocks with lossyWav we loose efficiency when using an encoder with a blocksize of 1024 in case the encoder works efficiently with a blocksize of 512. The lossyWav512/FLAC1024 isn't attractive and should be replaced by lossyWav512/FLAC512. But encoders like wavPack or WMAlossless prefer larger blocksizes for efficiency so it's about finding the sweet spot combination. So maybe lossyWav512/WMAlossless2048 is the better combination than lossyWav512/WMAlossless512 (not for sure at all). But I can't see a mechanism that makes the lossyWav512/WMAlossless2048 inferior to the lossyWav2048/WMAlossless2048 combination. Sure encoder blocksize should always be an integer multiple of lossyWav blocksize.
The way it's done energy from ~11.5 msec before and after the block make it into the decision making for the block. So a potential min bin may loose its min status due to energy from outside the block.
That's intentional. One of the FFT analyses is usually concentrated on the block boundary, which for a 1024-point FFT at 44.1kHz is, as you say, about +/-11.6ms - though the windowing means the effect at the edges is pretty small. The reason for doing this is to catch low energy moments near the block boundary, which could otherwise be completely missed. If you miss them, you add too much noise; worse still, you can put a hefty transition in there as you switch to more bits removed.

More generally, if you don't overlap analysis blocks, then there are moments of the audio that you never check, so you won't know if the noise you're adding is above or below the noise floor during those moments.

Cheers,
David.

You certainly know more about these things than I do. But with a lossyWav blocksize of 512 the length-1024-FFT which is overlapping covers your fears.
So at least there is no need to do this extensive overlapping with the 1024 FFTs.
The shorter FFTs don't hurt my proposal done the way it is done now.
Moreover there is the problem of unwanted energy from outside the block under investigation having an influence in bits to remove for the current block. With my proposal this influence is lower.
Encoding speed improves (though IMO this is a minor aspect).

So in the end: why not just use only 512 sample blocks in lossyWav and just 1 1024-FFT for each of these 512-lossyWav blocks, with the lossyWav block centered in the FFT window?

BTW is there a windowing function like hanning used? With the overlapping it would be most welcome I think and it would reduce potential negative side effects of the 'foreign' samples. It would also reduce errors resulting from a rectangular window.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 18:35:41
BTW is there a windowing function like hanning used? With the overlapping it would be most welcome I think and it would reduce potential negative side effects of the 'foreign' samples. It would also reduce errors resulting from a rectangular window.
The Hanning window is used. I did toy with the idea of the centred analysis previously, but at that time I was more concerned with being able to duplicate exactly the results from David's Matlab script.
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 20:38:40
The Hanning window is used. I did toy with the idea of the centred analysis previously, but at that time I was more concerned with being able to duplicate exactly the results from David's Matlab script.

Yes, IMO that was the right thing to do then.
But now we're ahead of that, and it's wonderful that we have the same idea.

Edited: Removed the idea of having a smaller overlap area for the 64 and 256 sample FFT. Not a good idea.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 22:19:34
lossyWAV alpha v0.3.20 attached: Superseded.

-overlap parameter added to reduce the end_overlap of FFT analyses to 25% FFT_length rather than 50%.[!--sizeo:1--][span style=\"font-size:8pt;line-height:100%\"][!--/sizeo--]
Code: [Select]
lossyWAV alpha v0.3.20 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality level (-cbs 1024 -nts -3.0 -skew 30 -snr 24)
-2            default quality level (-cbs 1024 -nts -1.5 -skew 24 -snr 18)
-3            compact quality level (-cbs  512 -nts -0.5 -skew 18 -snr 12)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 16=0)
-overlap      enable aggressive fft overlap method; default=off

-spf <3x5chr> manually input the 3 spreading functions as 3 x 5 characters;
              e.g. 44444-44444-44444; Characters must be one of 1 to 9 and
              A to Z (zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
[/size]
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 22:28:00
OOPs, you're so fast!!

I've read a lot about fft overlapping windows, and I haven't seen anybody doing less than 50% overlapping.
I've just removed this part from my post, and a second later seen you having realized it.

Thanks for your version and sorry for the confusion!

But now that you've done it: let's see what 2Bdecided and other people have to say about it.

Anyway for the 1024 sample FFT I think we should do the 1 FFT center approach - at least as long as we're happy with a 50% overlapping of the other FFTs as this has pretty much the same feasibility background.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 22:32:38
OOPs, you're so fast!!

I've read a lot about fft overlapping windows, and I haven't seen anybody doing less than 50% overlapping.
I've just removed this part from my post, and a second later seen you having realized it.

Thanks for your version and sorry for the confusion!

But now that you've done it: let's see what 2Bdecided and other people have to say about it.

Anyway for the 1024 sample FFT I think we should do the 1 FFT center approach - at least as long as we're happy with a 50% overlapping of the other FFTs as this has pretty much the same feasibility background.
Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 22:53:14
Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?

I guess your concern is the same as mine: for the starting and ending 'overlap' half the FFT_length for the area outside the lossyWav block is a bit much and brings in wrong information to a major extent.
Your approach of 25% seems appropriate to me and corresponds to the 50% overlap between adjacent FFT windows (meaning the most central 50% samples of the FFT windows are considered to take good care of by the hanning windowed FFT analysis).
But why do you want to relate it to the longest FFT? IMO it should be 25% of the current FFT length.
This more general procedure matches perfectly with the 1 FFT center spproach for a lossyWav blocksize of 512 and a 1024 sample FFT.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 23:01:53
Or, what about a fixed proportion of the largest FFT_length as the end_overlap? Say, 256, i.e. 0.25 of the 1024, for *all* analyses?

I guess your concern is the same as mine: for the starting and ending 'overlap' half the FFT_length for the area outside the lossyWav block is a bit much and brings in wrong information to a major extent.
Your approach of 25% seems appropriate to me and corresponds to the 50% overlap between adjacent FFT windows (meaning the most central 50% samples of the FFT windows are taken good care of by the hanning windowed FFT analysis).
But why do you want to relate it to the longest FFT? IMO it should be 25% of the current FFT length.
This more general procedure matches perfectly with the 1 FFT center spproach for a lossyWav blocksize of 512 and a 1024 sample FFT.
alpha v0.3.20 uses 25% of the current fft_length for the -overlap option. I'd be interested to know if there is any perceptible difference in the quality of the output as using -overlap increases bits_to_remove.

Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 23:08:31
I repeated my v0.3.19 skew and -snr analysis (using -3) which I did wrong yesterday (first result: average bitrate of my full length regular music set, second result: average bitrate from my problem sample set):
Code: [Select]
        | -skew 0 | -skew 12| -skew 24| -skew 36
-snr 0  |382 / 480|383 / 490|390 / 510|421 / 547
-snr 12 |382 / 480|383 / 490|390 / 510|421 / 547
-snr 24 |387 / 486|393 / 501|402 / 524|429 / 560

Pretty much the same result as with v0.3.18. (keep in mind that the v0.3.18 test was done with a FLAC blocksize of 1024).
Title: lossyWAV Development
Post by: halb27 on 2007-10-30 23:20:19
Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?

If you overlap 50% inside of the lossyWav block this means you have confidence that the region 25% to either side of the FFT window center carries the necessary information. Let's take this as valid assumption (otherwise we would have to increase the overlapping). With a 50% overlap these '25% away from the center' regions  consecutively and nonoverlappingly cover the lossyWav block. At the start this means you need to start the first window just 25% before the current lossyWav block. The lossyWav block then starts at the very beginning of our trusted region of the first FFT window. At the end it's the same thing as only the last 25% of the FFT window makes up for the trailing untrusted region.

Most vital it's for the long FFT as a lot of foreign energy makes it into the current lossyWav block analysis with the current form we do it.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-30 23:30:04
Although, niggling at the back of my mind is the thought that if it holds that you should overlap by 50% inside a codec block, why would we change that when looking outside the codec block in the end_overlap area?
If you overlap 50% inside of the lossyWav block this means you have confidence that the region 25% to either side of the FFT window center carries the necessary information. Let's take this as valid assumption (otherwise we would have to increase the overlapping). With a 50% overlap these '25% away from the center' regions  consecutively and nonoverlappingly cover the lossyWav block. At the start (analogously for the end) this means you need to start the first window just 25% before the current lossyWav block. The lossyWav block then starts at the very beginning of our trusted region of the first FFT window.

Most vital it's for the long FFT as a lot of foreign energy would make it into the current lossyWav block analysis.
I see where you're coming from and agree with the logic, although the 50% end_overlap does allow the beginning and end samples to have full weight in the Hanning Window. As you said previously, we need guidance as to whether this approach has in some way been tried and discredited.
Title: lossyWAV Development
Post by: halb27 on 2007-10-31 07:09:46
....
lossyWAV alpha v0.3.20 attached:

-overlap parameter added to reduce the end_overlap of FFT analyses to 25% FFT_length rather than 50%....

Hi Nick,

Did you change this already or did it go unnoticed to me:
Does that mean the overlap within a lossyWav block is 50% as before, but the overlap at the beginning and end of a lossyWav blocks stretches just 25% into the neighboring lossyWav blocks?

Would be great, as I'm really worried about the behavior with 1024 sample FFTs where we have 2 FFT windows which get exactly the same amount of information from the neighboring lossyWav blocks as from the block under consideration, and no other FFT window in the case of lossyWav block size = 512 resp. just 1 more FFT window (so 1 out of 3) in the case of lossyWav block size = 1024 (this ione at least gets the right information).
Min finding makes the situation worse.

Hope I interpret your -overlap option correctly cause 25% overlap in the interior wasn't a good idea. Sorry again for going wild.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-31 08:30:25
....
lossyWAV alpha v0.3.20 attached:

-overlap parameter added to reduce the end_overlap of FFT analyses to 25% FFT_length rather than 50%....
Hi Nick,

Did you change this already or did it go unnoticed to me:
Does that mean the overlap within a lossyWav block is 50% as before, but the overlap at the beginning and end of a lossyWav blocks stretches just 25% into the neighboring lossyWav blocks?

Would be great, as I'm really worried about the behavior with 1024 sample FFTs where we have 2 FFT windows which get exactly the same amount of information from the neighboring lossyWav blocks as from the block under consideration, and no other FFT window in the case of lossyWav block size = 512 resp. just 1 more FFT window (so 1 out of 3) in the case of lossyWav block size = 1024 (this ione at least gets the right information).
Min finding makes the situation worse.

Hope I interpret your -overlap option correctly cause 25% overlap in the interior wasn't a good idea. Sorry again for going wild.
Initially I changed both end_overlap and fft_overlap to 25%, however I think that you were the only one to download that version, so I changed the fft_overlap back to 50% and attached the executable as alpha v0.3.20 without incrementing the version.

I will start to implement a variable end_overlap which will never exceed 25% of the codec_block_size (except at the ends). This will require that the max permissible fft_length is limited to double the codec_block_size.
Title: lossyWAV Development
Post by: halb27 on 2007-10-31 10:16:29
...I will start to implement a variable end_overlap which will never exceed 25% of the codec_block_size (except at the ends). This will require that the max permissible fft_length is limited to double the codec_block_size.

I'm a bit confused. Isn't that what the current v0.3.20 is doing?

For definiteness let's talk about the 1024 sample FFT (my main concern anyway).
Is it correct for the current v0.3.20 version that
a) with a lossyWav blocksize (your codec_block_size) of 512 we just have 1 1024 sample FFT with the lossyWav block situated in the center?
b) with a lossyWav blocksize of 1024 1 1024 sample FFT window identical to the lossyWav block, 1 FFT window starting 25% in front of the lossyWav block and 1 ending 25% after the lossyWav block?

Or does your end_overlap right now only affect the way you start with a certain FFT length? (no problem then for a FFT length of 1024 with a lossyWav block size of 512, but worse then with a lossyWav block of 1024 samples as the last FFT windows lies 75% in the next lossyWav block).
Title: lossyWAV Development
Post by: Nick.C on 2007-10-31 12:18:50
...I will start to implement a variable end_overlap which will never exceed 25% of the codec_block_size (except at the ends). This will require that the max permissible fft_length is limited to double the codec_block_size.
I'm a bit confused. Isn't that what the current v0.3.20 is doing?

For definiteness let's talk about the 1024 sample FFT (my main concern anyway).
Is it correct for the current v0.3.20 version that
a) with a lossyWav blocksize (your codec_block_size) of 512 we just have 1 1024 sample FFT with the lossyWav block situated in the center?
b) with a lossyWav blocksize of 1024 1 1024 sample FFT window identical to the lossyWav block, 1 FFT window starting 25% in front of the lossyWav block and 1 ending 25% after the lossyWav block?

Or does your end_overlap right now only affect the way you start with a certain FFT length? (no problem then for a FFT length of 1024 with a lossyWav block size of 512, but worse then with a lossyWav block of 1024 samples as the last FFT windows lies 75% in the next lossyWav block).
The end_overlap is applied at both ends, so in the case of a 25% overlap on a 1024 sample FFT and 1024 sample codec_block_size with a 50% FFT_overlap: the first analysis looks at -256:767, the second at 256:1279, i.e. only two analyses. If in this case end_overlap = 50% then the first analysis looks at -512:511; the second - 0:1023; the third - 512:1535.
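As a rough sketch of how those analysis positions fall out of the two overlap fractions (Python with hypothetical names; the actual implementation is the Delphi code):

```python
def fft_windows(block_size, fft_length, end_overlap=0.25, fft_overlap=0.5):
    """Enumerate (first, last) sample indices of the FFT analyses for one
    codec block. end_overlap is the fraction of fft_length reaching into the
    neighbouring blocks; fft_overlap is the fraction of fft_length shared by
    consecutive analyses. A sketch of the scheme described above, not the
    real lossyWAV code."""
    start = -int(end_overlap * fft_length)
    step = int(fft_length * (1 - fft_overlap))
    last_start = block_size - fft_length + int(end_overlap * fft_length)
    windows = []
    while start <= last_start:
        windows.append((start, start + fft_length - 1))
        start += step
    return windows

# 25% end_overlap, 50% fft_overlap -> only two analyses per 1024-sample block
print(fft_windows(1024, 1024, 0.25))  # [(-256, 767), (256, 1279)]
# 50% end_overlap -> three analyses
print(fft_windows(1024, 1024, 0.5))   # [(-512, 511), (0, 1023), (512, 1535)]
```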
Title: lossyWAV Development
Post by: halb27 on 2007-10-31 12:35:03
The end_overlap is applied at both ends, so in the case of a 25% overlap on a 1024 sample FFT and 1024 sample codec_block_size with a 50% FFT_overlap: the first analysis looks at -256:767, the second at 256:1279, i.e. only two analyses. If in this case end_overlap = 50% then the first analysis looks at -512:511; the second - 0:1023; the third - 512:1535.

Wonderful, so everything's fine with the 1024 sample FFT no matter whether lossyWav block is size 512 or 1024.
Hope I'll find the time this evening to do a listening test with my common problem samples using -3 -overlap.

I guess bitrate will come down a bit using -overlap.
Title: lossyWAV Development
Post by: halb27 on 2007-10-31 21:54:07
1) v0.3.20 -3 -overlap for my regular / problem set yields 383 kbps / 496 kbps.
    v0.3.20 -3 yields 385 kbps / 498 kbps.
    So the effect of -overlap regarding efficiency seems to be very low.

2) I checked v0.3.20 -3 -overlap with my usual problem samples and everything was fine.
    I also checked regular music and found a small issue:
    [attachment=3942:attachment] (Rickie Lee Jones: Under The Boardwalk), sec. 18.6-21.3: I abxed it 8/10.
    It's not up to -overlap as according to -detail -3 and -3 -overlap produce the same output in the 18.6+ sec. region.
    Maybe we went a bit too far. Maybe -nts -1.0 or -1.5 is necessary, maybe -skew 24 is the solution.
    At least we have a sample this way for doing fine tuning.
Title: lossyWAV Development
Post by: Nick.C on 2007-10-31 22:28:08
1) v0.3.20 -3 -overlap for my regular / problem set yields 383 kbps / 496 kbps.
    v0.3.20 -3 yields 385 kbps / 498 kbps.
    So the effect of -overlap regarding efficiency seems to be very low.

2) I checked v0.3.20 -3 -overlap with my usual problem samples and everything was fine.
    I also checked regular music and found a small issue:
    [attachment=3942:attachment] (Rickie Lee Jones: Under The Boardwalk), sec. 18.6-21.3: I abxed it 8/10.
    It's not up to -overlap as according to -detail -3 and -3 -overlap produce the same output in the 18.6+ sec. region.
    Maybe we went a bit too far. Maybe -nts -1.0 or -1.5 is necessary, maybe -skew 24 is the solution.
    At least we have a sample this way for doing fine tuning.
Could it be the spreading function? Maybe the high-frequency averaging is a bit too coarse......
Title: lossyWAV Development
Post by: halb27 on 2007-10-31 22:51:05
...Maybe we went a bit too far. Maybe -nts -1.0 or -1.5 is necessary, maybe -skew 24 is the solution.
    At least we have a sample this way for doing fine tuning.
Could it be the spreading function? Maybe the high-frequency averaging is a bit too coarse......

Maybe of course: with a spreading of 11236/1246E for 64/1024 FFT length maybe a spreading length of 6 is too long for the 12+ kHz region with a 64 sample FFT.
My problem is my hearing isn't so good, and the problem is not at all a big issue. Confirmation of a problem found by 1 person is welcome anyway. So it would be great if somebody confirms the problem and maybe fixes it by using -nts -1.0 or -1.5 or -skew 24 (or slightly higher) or a more demanding -spf setting.
Title: lossyWAV Development
Post by: halb27 on 2007-11-01 18:04:11
Well, Rickie Lee Jones' Under The Boardwalk at ~ sec. 19.5 turns out to be pretty hard stuff for lossyWav.

I tried more defensive -nts values, more defensive -skew values, and more defensive -spf parameters for the high frequencies, but none of these trials was really satisfying.

Using -detail you can see bits to remove doesn't go down well at ~ 19.5 sec.

I tried -2, and with plain -2 I can't abx the problem, but already when using -cbs 512 I'm on the edge of being able to abx it (7/10).

Nevertheless it's a subtle problem and I wonder whether with -3 we should really care about it.

But I do care about -2. As it's not okay with -cbs 512 I wonder whether we have a general problem. To me -cbs 1024 is more defensive than -cbs 512 only by pure chance (roughly speaking -cbs 1024 takes the bits to remove as the min of the 2 consecutive 512 block bits to remove values).

As we cared about FFT overlapping recently:
I think -overlap is a good thing as it carries the idea of the 50% central trusted area of each FFT window to the edges.
But maybe a trusted area of 50% within the FFT window is too much? When reading about windowing functions and overlapping, a stepsize of 50% of the window length was the absolute maximum people allowed for - usually it was less than that.

Maybe we should consider a smaller trusted region than 50%.
The question then is: what should be the size of the trusted region?
Possible candidates: 25%, 33%, 38% resp. integer approximations to that when translated to the number of samples.
38% corresponds to the golden section (38%/62%), and I often use it as a reasonable ratio in cases where there is no really good reasoning. Sure, it's the coward's way out, but my experience with this kind of decision making is pretty good, though I agree it's a bit like doing voodoo.

If we decide to use a trusted region like that, this means that for each lossyWav block we use the latest starting, earliest ending, and minimum overlapping FFT window sequence in such a way that each sample of our block falls into the 25%, 33%, or 38% central area of one of the FFT windows.

My personal feeling is going with a 33% or 38% trusted region.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-01 18:08:50
I'll have a think after the kids are in bed and revert.....
Title: lossyWAV Development
Post by: 2Bdecided on 2007-11-01 19:08:24
I don't claim to be keeping up with this 100%, so please don't take this as gospel...


Firstly, it's good you've found a new problem sample - it was getting a bit suspicious that things were working "perfectly".

Secondly, overlapping:

Most importantly, there needs to be an FFT that hits pretty much the centre of the block. With a hanning window (and most windows) 50% overlap is appropriate. You can overlap more for better accuracy. Unless this is needed, don't do it - it's a huge speed hit. You shouldn't overlap less than 50%, because you'd be leaving gaps in the analysis coverage. A lot of the time, this won't matter, but depending on where the most critical moment comes, it might.

You can (I think should) have FFTs which are centred on the ends of the block. There is no issue with "energy from outside the block leaking in" - of course it will in that FFT, but across all the FFTs you're looking for minimum energy, not maximum. So these edge FFTs only have an effect if they find a part with less energy than within the block - they'll be ignored if they find a part with more energy.

Consider a fast transient start to a sound: silence to very loud instantaneously. If this happens just after a block start, then the lossyWAV noise added will start right at the block start - i.e. before the loud sound. Pre-masking is pretty minimal in trained listeners, so this could be audible. That's why it's best to keep that initial silent part pretty much silent - which is why it's useful to have an analysis centred on the start of the block. IMO.
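The "minimum energy, not maximum" point above can be shown in miniature (a toy sketch, not lossyWAV's actual spectral analysis; `permitted_noise` is an invented name):

```python
def permitted_noise(analysis_minima):
    """Each value is the lowest spectral energy one FFT analysis found in
    the region it covers; the noise the block may tolerate follows the
    minimum across all analyses covering it."""
    return min(analysis_minima)

# An edge FFT that picks up loud energy leaking in from the neighbouring
# block cannot raise the noise floor...
assert permitted_noise([0.4, 0.9]) == 0.4
# ...but one that catches a near-silent block start pulls the floor down,
# keeping the quiet part before a transient quiet.
assert permitted_noise([0.4, 0.01]) == 0.01
```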


If you're going to change anything critical, think through what would be a problem sample for that change, and test it. I ran noise bursts starting and stopping at various times relative to block start/end, and also filters within white noise switched in/out at various times (and for various durations) relative to block start/end and analysis duration/start/end. You can listen to the results, but also check them in waveform and spectral view. That latter part doesn't just matter for listening; it gives another "hope" that it'll transcode OK too.


Have fun. I really admire your energy!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-01 19:54:23
Hoping that I'm reading this correctly, fft_overlap should be at least 50% and end_overlap should be exactly 50%. So, for a power of two block length and power of two fft lengths, this does not pose a problem at all. However, for non power of two block lengths, some extra maths is required to set up the first fft centred on the beginning of the block and increment by less than fft/2 until the last fft is centred on the end of the block, having centred an fft on the centre of the block in the process. I will work out the maths (relatively simple in Excel....) and implement. You might reasonably expect alpha v0.4.0 tomorrow morning (vX.Y.(Z>20) seems a bit extreme.....)
Title: lossyWAV Development
Post by: halb27 on 2007-11-01 20:17:41
Hoping that I'm reading this correctly, fft_overlap should be at least 50% and end_overlap should be exactly 50%....

I interpret 2Bdecided as 'fft_overlap should be at least 50% and the first and last FFT window should have their center exactly at the lossyWav block edge (for best sensitivity towards transients near a lossyWav edge).'

Because of 2Bdecided's remark on speed with a more defensive FFT overlapping I suggest to use a central 'trusted' region of 38% - in case we try it at all.

Should we really think of blocksizes other than a power of 2? FLAC is the only codec I know which has some preference for a multiple of 576, but even FLAC works fine with a multiple of 512.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-01 20:45:23
Hoping that I'm reading this correctly, fft_overlap should be at least 50% and end_overlap should be exactly 50%....
I interpret 2Bdecided as 'fft_overlap should be at least 50% and the first and last FFT window should have their center exactly at the lossyWav block edge (for best sensitivity towards transients near a lossyWav edge).'

Because of 2Bdecided's remark on speed with a more defensive FFT overlapping I suggest to use a central 'trusted' region of 38% - in case we try it at all.

Should we really think of blocksizes other than a power of 2? FLAC is the only codec I know which has some preference for a multiple of 576, but even FLAC works fine with a multiple of 512.
I'd be only too delighted to drop non power of two codec block sizes - it makes the maths *so* much easier and would minimise the number of fft analyses. When you say a trusted region of 38%, are you really saying 3/8? In which case you mean an actual overlap of fft's of 5/8 or 62.5%.

For the 1024 sample fft on a 1024 sample codec block this would mean fft #1: -512:511; #2: -128:895; #3: 0:1023 (centred on the block centre); #4: 384:1407; #5: 512:1535. But a more evenly spread 5 analysis set would have an overlap of 75%, i.e. -512:511; -256:767; 0:1023; 256:1279; 512:1535. This is the worst case as the fft length equals the block length.

For a 256 fft length, -128:127; -32:223; 64:319; 160:415; 256:511; 352:607; 384:639 (centre to centre); 448:703; 544:799; 640:895; 736:991; 832:1087; 896:1151 (centred on the end) and yields 13 analyses. Basically a step of 512/6 (85+1/3) would be the most even.

[edit]Another way of looking at it would be to centre the centre fft on the centre of the block, as in the original script and see where the first analysis end_overlap takes you taking into account the desired overlap. e.g. if overlap = 5/8 of a 256 analysis, then the analysis step is 3/8, i.e. 96 samples as above - but it has already been seen that it would be better as 85.33 samples, rounded per analysis.
This would yield the 13 analyses (rather than 9 at the moment). For 64 sample fft length, the step would be 21.33 which would yield 49 analyses (rather than 33 at the moment).

So 3 > 5; 9 > 13 and 33 > 49 analyses respectively, estimated increase in processing time of 53.2%.[/edit]

[edit2]Or..... simply add one analysis either side of centre so, 3>5, 9>11, 33>35. This would have more effect at longer fft lengths, but would keep down the added processing overhead.[/edit2]

[edit3]Maybe the problem with the problem sample is due to the fact that -3 only uses 64 & 1024 sample fft's; -1 & -2 use 64, 256 & 1024 sample fft's. Simply remedied by changing from 2 to 3 analyses for -3, but keeping the codec_block_size = 512 and -snr, -skew & -nts parameters (or tweaking them.....) This can be duplicated by using lossyWAV wavfilename.wav -cbs 512 -skew 18 -snr 12 -nts -0.5. Don't worry about the spreading function - I would advocate using the same as for -2 anyway.[/edit3]
Title: lossyWAV Development
Post by: halb27 on 2007-11-02 00:33:45
Yeah, a stepsize 3/8 the FFT window length (an overlapping area of 5/8 the window length) would be a good thing to try IMO.

With such a rather narrow partitioning I think it's not necessary to have the edge of a lossyWav block exactly in the center of a FFT window. So for a lossyWav blocksize of 1024 and a FFT length of 1024 the FFT windows can be -384...639, 0:1023, 384:1407, so 3 FFTs.
The idea is to have a stepsize of 3/8 the FFT length, start at least 3/8*FFT length in front of the lossyWav block (thus the first sample of the block belongs to the trusted region), and position the FFT windowing such that all the FFT windows extend by the same amount beyond either side of the block.
For a lossyWav blocksize of 512 and a FFT length of 1024 two FFT windows will do it: -448:575, -64:959.

In general, for a lossyWav blocksize b and an FFT length fl, if n is the number of FFT windows and d is the extent on either side of the lossyWav block still covered by the FFT windows, find n and d as the smallest positive integers such that (n-1)*3/8*fl + fl = b + 2*d under the restriction that d >= 3/8 * fl. Start the FFT windowing at -d.
Title: lossyWAV Development
Post by: halb27 on 2007-11-02 07:05:24
Maybe the problem with the problem sample is due to the fact that -3 only uses 64 & 1024 sample fft's; -1 & -2 use 64, 256 & 1024 sample fft's. Simply remedied by changing from 2 to 3 analyses for -3, but keeping the codec_block_size = 512 and -snr, -skew & -nts parameters (or tweaking them.....) This can be duplicated by using lossyWAV wavfilename.wav -cbs 512 -skew 18 -snr 12 -nts -0.5. Don't worry about the spreading function - I would advocate using the same as for -2 anyway.

Yes, probably we will have to make it more defensive.
3 FFT lengths is a good thing at any rate IMO, and getting closer to the spreading of -2 or use the identical one is promising as well.
Guess we will have to make -2 more defensive too. At the moment I'm worried that -2 has (more or less) an issue with -cbs 512. So far it backs up the theory that a blocksize of 512 should not be used. But from the machinery I can't see a good reason for that. From the machinery -cbs 1024 yields a 'blind' bitrate increase against -cbs 512 due to the fact that the min bits to remove are taken from two consecutive 512 sample blocks.
Moreover it would make things easier if we had one universal lossyWav blocksize of 512. Sure only in case we don't sacrifice quality.
But let's see: maybe the improved overlapping changes things.
Title: lossyWAV Development
Post by: halb27 on 2007-11-02 10:12:35
For the math of post #439:

(1)    (n-1)*3/8*fl + fl = b + 2*d

under the restriction d >= 3/8 * fl

means  (n-1)*3/8*fl + fl >= b + 6/8 * fl

and this means n = 1/3 * (1 + 8 * b/fl), rounded up to the next integer.

d then is computed via (1).

Can be calculated with Excel and yields for our block sizes and FFT lengths:

Code: [Select]
    b  |  fl |    n |   d
  1024 | 1024|    3 | 384
  1024 | 256 |   11 |  96
  1024 | 16  |  171 |   6
  512  | 1024|    2 | 448
  512  | 256 |    6 | 112
  512  | 16  |   86 |   7


Remark: A blocksize of 512 keeps the center of the first resp. last FFT window very close to the block's edges.
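The formula and table can be cross-checked with a few lines of Python (just the arithmetic from the derivation above; function and variable names are mine):

```python
import math

def windows_needed(b, fl):
    """n = 1/3 * (1 + 8*b/fl), rounded up to the next integer, then d
    recovered from (n-1)*3/8*fl + fl = b + 2*d (all integer here, since
    fl is a multiple of 8)."""
    n = math.ceil((fl + 8 * b) / (3 * fl))
    d = ((n - 1) * 3 * fl // 8 + fl - b) // 2
    return n, d

for b in (1024, 512):
    for fl in (1024, 256, 16):
        print(b, fl, windows_needed(b, fl))
# reproduces the table, e.g. (1024, 1024) -> (3, 384) and (512, 16) -> (86, 7)
```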
Title: lossyWAV Development
Post by: bryant on 2007-11-02 14:02:53
I have added an enhancement to WavPack to significantly improve its performance with lossyWAV files (especially with shorter blocks). See post here (http://www.hydrogenaudio.org/forums/index.php?showtopic=58716).

At this point I don't think there's any reason to have any special block size considerations with respect to WavPack. However, it still might be possible to take advantage of the fact that WavPack can efficiently handle blocks that have samples clipped to +32767.

BTW, you guys are having way too much fun! 

David
Title: lossyWAV Development
Post by: Josef Pohm on 2007-11-03 02:12:28
I have added an enhancement to WavPack to significantly improve its performance with lossyWAV files.
At this point I don't think there's any reason to have any special block size considerations with respect to WavPack...

I had a quick test session on the matter.

Comparison with an older WavPack version (SetF, some LossyWAV 0.3.18 settings):
Code: [Select]
------- ----------------- ----------------- -----------------
|      | WV 4.42a2 -hhx4 | WV 4.41.0 -hhx4 |   42a2 vs. 41   |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 407 | 401 | 399 | 453 | 444 | 442 | -46 | -43 | -43 |
| 1024 | 417 | 412 | 410 | 438 | 434 | 432 | -21 | -22 | -22 |
| 2048 | 437 | 430 | 428 | 446 | 439 | 438 | - 9 | - 9 | -10 |
| 4096 | 462 | 455 | 453 | 467 | 460 | 458 | - 5 | - 5 | - 5 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----


Comparison with FLAC (SetF, some LossyWAV 0.3.18 settings):
Code: [Select]
------- ----------------- ----------------- -----------------
|      | WV 4.42a2 -hhx4 |  FLAC 1.2.1 -8  |   WV vs. FLAC   |
|       ----- ----- ----- ----- ----- ----- ----- ----- -----
|      |  1  |  2  |  3  |  1  |  2  |  3  |  1  |  2  |  3  |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----
|  512 | 407 | 401 | 399 | 405 | 395 | 394 | + 2 | + 6 | + 5 |
| 1024 | 417 | 412 | 410 | 419 | 415 | 415 | - 2 | - 3 | - 5 |
| 2048 | 437 | 430 | 428 | 443 | 436 | 435 | - 6 | - 6 | - 7 |
| 4096 | 462 | 455 | 453 | 474 | 467 | 466 | -14 | -12 | -13 |
------- ----- ----- ----- ----- ----- ----- ----- ----- -----


Effect of using a WavPack frame size which is a multiple of the LossyWAV frame size, to clarify whether that may improve performance (mostly when a codec is not well optimized for smaller frame sizes). It seems that this is not the case.
Code: [Select]
------------------- ----- ----- -----
|                  |  1  |  2  |  3  |
------------------- ----- ----- -----
| LW0512-WV0512    | 407 | 401 | 399 |
| LW0512-WV1024    | 418 | 411 | 409 |
| LW0512-WV2048    | 440 | 433 | 431 |
| LW0512-WV4096    | 477 | 470 | 468 |
------------------- ----- ----- -----

I would confirm that WavPack seems now safe to be used with both 1024 and 512 frame size LossyWAV files. As for compression ratio on LossyWAV files, WavPack may now be considered more or less on par with FLAC.

Gap closed, a new nice feature for WavPack, once again thanks to David for his impressive work.
Title: lossyWAV Development
Post by: halb27 on 2007-11-03 10:19:30
Thank you Josef, wonderful result.

As a side remark this also backs up the idea of having just one universal lossyWav frame size of 512 as long as we don't sacrifice quality. It would make things clearer, easier, and simpler, and as far as efficiency (low bitrate) is concerned, everything speaks for a frame size of 512. And as we currently have a (small) problem here it's motivating to fix it.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-03 21:54:09
@Bryant: Thanks for taking lossyWAV into account in the development of your codec!

@Josef: Thanks again for the testing - it's good to see that the bitrate is working well with Wavpack.

@Halb27: It would be worth some testing using -cbs 512 for -1 and -2 to ensure that no artifacts occur.

I've been trying to convert the FFT routine into 80x87 floating point assembler - it will be quite a speedup when I actually get it working.......

On the overlap front, I'm still working on the maths - I'll get back to you.
Title: lossyWAV Development
Post by: bryant on 2007-11-04 18:36:09

I have added an enhancement to WavPack to significantly improve its performance with lossyWAV files.
At this point I don't think there's any reason to have any special block size considerations with respect to WavPack...

I had a quick test session on the matter.

Thanks again for your typically thorough testing! 

I hadn't thought about the possibility of using a multiple of the lossyWAV block size to overcome the inefficiency of the smaller block sizes, but it's nice to know it's not needed at 512 samples. If they ever decide to play around with 256 sample blocks (or even smaller) it might help, but we'll burn that bridge when we come to it... 

David
Title: lossyWAV Development
Post by: Nick.C on 2007-11-05 13:11:46
I hadn't thought about the possibility of using a multiple of the lossyWAV block size to overcome the inefficiency of the smaller block sizes, but it's nice to know it's not needed at 512 samples. If they ever decide to play around with 256 sample blocks (or even smaller) it might help, but we'll burn that bridge when we come to it... 
256 sample codec_block_size could be enabled, but at the expense of the 1024 sample fft_length analysis. codec_block_size must now be a multiple of 32 in the range 512 to 4608.

lossyWAV alpha v0.4.0 attached: Superseded.

- slight speedup;
- -overlap ensures fft overlap of 62.5% of fft_length between analyses. end_overlap of 50% of fft_length remains unchanged.
Code: [Select]
lossyWAV alpha v0.4.0 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [4xFFT] (-cbs 1024 -nts -3.0 -skew 30 -snr 24
              -spf 11124-ZZZZZ-11225-11225-11236)
-2            default quality [3xFFT] (-cbs 1024 -nts -1.5 -skew 24 -snr 18
              -spf 11235-ZZZZZ-11336-ZZZZZ-1234D)
-3            compact quality [3xFFT] (-cbs  512 -nts -0.5 -skew 18 -snr 12
              -spf 11235-ZZZZZ-11336-ZZZZZ-1234D)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-overlap      enable conservative fft overlap method; default=off

-spf <5x5chr> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 44444-44444-44444-44444-44444 (Characters must be one of
              1 to 9 and A to Z (zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
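As an aside on the help text above: the 6.0206 dB per bit figure quoted for -nts is simply the wordlength/noise relationship 20*log10(2) dB per bit, which a one-liner confirms (illustrative only):

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # noise-floor change per bit of wordlength
print(round(DB_PER_BIT, 4))       # 6.0206
# so e.g. a noise_threshold_shift of -12 dB reduces the overall
# bits to remove by roughly 12 / 6.0206, i.e. about 2 bits
```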
Title: lossyWAV Development
Post by: halb27 on 2007-11-05 14:24:17
Thank you very much. I'm very curious about the quality, especially with Rickie Lee Jones' Under The Boardwalk.
Title: lossyWAV Development
Post by: halb27 on 2007-11-05 23:15:09
I tested 'Under The Boardwalk' using -3, -2, both with a block size of 512 and 1024 samples.
Everything is alright, though with plain -3 I got results like 5/7 or 6/8 before I missed badly, or 7/10.
Anyway this is not a valid abx differentiation so we should be content.

I also tried my usual problem samples using -3, and everything is fine.

Bitrate is fine as well: my 12 full tracks I used before yield 415 kbps on average using -3, and 441 kbps when encoded with -2.

I am very content with these results.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-06 08:05:40
I tested 'Under The Boardwalk' using -3, -2, both with a block size of 512 and 1024 samples.
Everything is alright, though with plain -3 I got results like 5/7 or 6/8 before I missed badly, or 7/10.
Anyway this is not a valid abx differentiation so we should be content.

I also tried my usual problem samples using -3, and everything is fine.

Bitrate is fine as well: my 12 full tracks I used before yield 415 kbps on average using -3, and 441 kbps when encoded with -2.

I am very content with these results.
Thanks again for your tireless abx'ing. I am also content that the "compact" quality level is not "perfect" - how many times (other than abx'ing) will the differences between -3 output and the original be annoyingly noticeable? (especially as when listening to music we're not abx'ing.....)

I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).
Title: lossyWAV Development
Post by: halb27 on 2007-11-06 08:21:57
...I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).

Nice that you're doing optimization. IMO you're absolutely right in not going too far with specialization. Speed is welcome, but even more so is being able to use your exe without getting into trouble (including your personal trouble, as extreme optimizing can be troublesome).
Title: lossyWAV Development
Post by: TBeck on 2007-11-06 08:35:00
I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).

Do you know the bible of IA-32 optimization? If not: Optimizing assembly code (Agner Fog) (http://www.agner.org/assem/)

  Thomas
Title: lossyWAV Development
Post by: Nick.C on 2007-11-06 09:01:30
I will continue my quest to further optimise and speed-up the code. FP assembly language is not as painful as I first thought. I did download the Intel IA-32 Software Developers Manual and it's got lots of nice instructions in it.... However I would be worried about using instructions only available on later processors as I don't wish to alienate any users (and am not in the position [yet] to maintain separate builds).
Do you know the bible of IA-32 optimization? If not: Optimizing assembly code (Agner Fog) (http://www.agner.org/assem/)

  Thomas
Ooooh! Thanks for that Thomas, I will certainly have a read before I get too heavily down the "Delphi wrapper around an assembly language program" route.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-07 21:32:19
Well, big thanks to Thomas for the pointer to Agner Fog's excellent guide to optimising assembly language. The FFT routine is now completely in IA-32 assembler using only 32bit registers and the FPU. Even so, it is considerably faster than alpha v0.4.0.

lossyWAV alpha v0.4.1 attached: Superseded.

Code optimisation of FFT routine;

Slight change to the -overlap calculations regarding number of fft analyses to carry out for a given block and size of fft_overlap. This means that the "central" fft analysis may not be exactly in the centre of the codec block, but the end_overlap value is exactly half of the fft_length.
Title: lossyWAV Development
Post by: halb27 on 2007-11-08 21:31:02
Hallo Nick,

Thank you very much for your new version.
Looks like quality has improved: With 'Under The Boardwalk' using plain -3 I'm far away now from being able to abx it. No chance at all.

I'm about to encode part of my collection using -3.
Doing so I wanted to try the 128 sample and 512 sample FFT using a full -spf string but with no effect. Are these FFT lengths reserved to -1?
Title: lossyWAV Development
Post by: Nick.C on 2007-11-08 22:32:35
Hallo Nick,

Thank you very much for your new version.
Looks like quality has improved: With 'Under The Boardwalk' using plain -3 I'm far away now from being able to abx it. No chance at all.

I'm about to encode part of my collection using -3.
Doing so I wanted to try the 128 sample and 512 sample FFT using a full -spf string but with no effect. Are these FFT lengths reserved to -1?
YGPM!
Title: lossyWAV Development
Post by: Nick.C on 2007-11-09 13:00:26
Doing so I wanted to try the 128 sample and 512 sample FFT using a full -spf string but with no effect. Are these FFT lengths reserved to -1?
Not any more. I have implemented a "-fft" parameter which takes a 5 character binary numeric input, each character of which corresponds to a specific fft_length, i.e. character 1 > 64 samples, character 2 > 128 samples, etc, character 5 > 1024 samples. So, the default for -2 would be -fft 10101, for -1 would be 10111.
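The mapping just described can be sketched as follows (illustrative Python, not the actual Delphi implementation; the helper name is invented):

```python
# Illustrative sketch of the -fft flag mapping described above.
# This is not the Delphi source; parse_fft_flags is an invented helper name.

FFT_LENGTHS = [64, 128, 256, 512, 1024]  # character 1 .. character 5

def parse_fft_flags(flags: str) -> list:
    """Return the FFT lengths enabled by a 5-character binary string."""
    if len(flags) != 5 or any(c not in "01" for c in flags):
        raise ValueError("-fft expects exactly five binary characters")
    return [length for c, length in zip(flags, FFT_LENGTHS) if c == "1"]

# Defaults quoted above:
# parse_fft_flags("10101") -> [64, 256, 1024]        (-2)
# parse_fft_flags("10111") -> [64, 256, 512, 1024]   (-1)
```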

I have also converted the spread and remove_bits procedures to IA-32 / FP assembly, so there's been a bit of a speed up as well. Incidentally, I found out that the size of the data segment for each unit will adversely affect the program speed if not carefully aligned to 8 or 16 byte boundaries (not exactly sure which).

lossyWAV alpha v0.4.2 attached: Superseded.
Code: [Select]
lossyWAV alpha v0.4.2 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [4xFFT] (-cbs 1024 -nts -3.0 -skew 30 -snr 24
              -spf 11124-ZZZZZ-11225-11225-11236 -fft 10111)
-2            default quality [3xFFT] (-cbs 1024 -nts -1.5 -skew 24 -snr 18
              -spf 11235-ZZZZZ-11336-ZZZZZ-1234D -fft 10101)
-3            compact quality [3xFFT] (-cbs  512 -nts -0.5 -skew 18 -snr 12
              -spf 11235-ZZZZZ-11336-ZZZZZ-1234D -fft 10101)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0dB<=n<=48dB) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xchr>  select fft lengths to use in analysis (1=on, 0=off)
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5chr> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 44444-44444-44444-44444-44444 (Characters must be one of
              1 to 9 and A to Z; zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
As a quick test, I ran my 52 sample set through at "-3 -fft 00100 -skew 36 -cbs 1024", which gives about a 3x speed increase (over -2):

WAV: 121.53MB; FLAC: 68.08MB, 790.6kbps; lossyWAV -2: 44.16MB, 512.8kbps, 46 secs.; lossyWAV -3 -fft 00100 -skew 36 -cbs 1024: 41.30MB, 479.6kbps, 16secs.

Surprisingly(?), this produces output which is satisfactory for my preferred DAP, in a third of the time and 94% of the disk space.
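Incidentally, the 6.0206dB figure in the -nts usage text above is just 20·log10(2), the noise-floor rise caused by discarding one bit of word length. A quick sketch (Python; the helper name is invented, nothing here is from the lossyWAV source):

```python
import math

# The usage text says -nts "reduces overall bits to remove by 1 bit for
# every 6.0206dB"; that constant is simply 20*log10(2), the noise-floor
# rise caused by discarding one bit of word length.
DB_PER_BIT = 20 * math.log10(2)  # ~6.0206 dB

def nts_bit_shift(nts_db: float) -> int:
    """Illustrative helper (invented name): whole bits per -nts shift."""
    return int(nts_db / DB_PER_BIT)
```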
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 14:52:20
Good news, thank you.

As I'm about to go productive, I very much welcome the possibility of having further FFTs, especially at the short end, as the 64 sample FFT has some shortcomings in the low/mid frequency range.
Going productive, I'll try to play it safe while staying mostly within the current -2 framework (but with -cbs 512 and -nts -1.0).
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 18:27:20
Hallo Nick,

There seems to be a problem. I tried v0.4.2 this way:

lossyWAV.exe utb.wav -2 -cbs 512 -nts -0.5 -skew 24 -snr 18 -spf 11235-11236-11336-12348-1234D -fft 11111

and lossyWav starts to output a .lossy.wav file but immediately after that stops working. No crash, it just hangs, doesn't come back to the command line, and produces no output. Sorry.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-09 18:51:16
Ah.... It may be down to my inexperience with FP assembly - probably too few "FWAIT" instructions. I will amend and re-attach.

Nick.

[edit]lossyWAV alpha v0.4.3 attached: added a few more "FWAIT" instructions and reduced the permissible range of "-spf" input values back to hexadecimal characters (could cause problems at shorter FFT lengths).

Having fun with "-fft" and I was astounded to get casual listening compatible results on my 52 sample set (as above) with "-3 -fft 00100 -spf fffff-fffff-44579-fffff-fffff -skew 36" : 33.25MB, 386.1kbps

Regarding the crashing - could anyone else with this problem please speak up and also, if you could indicate CPU type that would be very welcome. The only machines I have to test on are Intel C2D.....

On a more serious note, and with regard to IA32 / FPU assembly language: when *should* I insert an FWAIT instruction into the code?[/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 22:11:44
Sorry same issue with new version and

lossyWAV.exe utb.wav -2 -cbs 512 -nts -0.5 -spf 11235-11236-11336-12348-1234D -fft 11111

lossyWav.exe echoes the options, starts producing the lossy.wav file, then hangs and produces no output. It does not crash. I can finish lossyWav by pressing Ctrl-C.

My cpu is a (32 bit) AMD mobile Athlon (= low power Barton), and I'm running Windows XP.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-09 22:14:02
Sorry same issue with new version and

lossyWAV.exe utb.wav -2 -cbs 512 -nts -0.5 -spf 11235-11236-11336-12348-1234D -fft 11111

lossyWav.exe echoes the options, starts producing the lossy.wav file, then hangs and produces no output. It does not crash. I can finish lossyWav by pressing Ctrl-C.

My cpu is a (32 bit) AMD mobile Athlon (= low power Barton), and I'm running Windows XP.
Thanks - now for a bit more debugging 
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 22:25:18
I should add I had no problem with v0.4.1, which already had a lot of assembler code in it.
Title: lossyWAV Development
Post by: verbajim on 2007-11-09 22:29:35
It crashes immediately here when I run lossyWAV.exe file.wav. My processor is an AMD Athlon 64. I also had no problem with 0.4.1.

Edit: on second thought, it doesn't terminate; I just get the Windows crash report, but it hangs like halb27 says.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-09 22:37:40
I may have found a possible culprit.....

lossyWAV alpha v0.4.3b attached.
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 22:44:56
Sorry, same effect.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-09 22:48:17
Sorry, same effect.
Is it doing this with no input parameters, or only when input parameters (other than name of file to process) are used?
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 22:54:20
Same effect with plain lossyWav.exe utb.wav.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-09 23:10:39
Seems to be a problem with AMD processors at the moment......

lossyWAV alpha v0.4.3c attached: Maybe?

lossyWAV alpha v0.4.3d attached: FWAIT instructions removed. Just to see if that is it.
Title: lossyWAV Development
Post by: halb27 on 2007-11-09 23:33:37
Nothing changes with v0.4.3c and with v0.4.3d.
Title: lossyWAV Development
Post by: robert on 2007-11-09 23:43:16
It seems to work on my Athlon64X2
Code: [Select]
E:\dev-privat\lossy-wav>lossyWAV "Q:\CD\Anastacia\2000-Not That Kind\01 Not That Kind.wav" -o .\
lossyWAV alpha v0.4.3 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
Processing : 01 Not That Kind.wav
Format     : 44.10kHz; 2 ch.; 16 bit; 8858220 samples; 200.87 sec.
Average    : 6.4714 bits; [55984/8651]; 11.72x; CBS=1024]
%lossyWAV Warning% : 47 bits not removed due to clipping.
Title: lossyWAV Development
Post by: shadowking on 2007-11-10 00:00:35
Nothing changes with v0.4.3c and with v0.4.3d.



Crashing here too. PIII 550
Title: lossyWAV Development
Post by: robert on 2007-11-10 00:05:07
Does it happen often that no bits are removed at all?
Code: [Select]
E:\dev-privat\lossy-wav>lossyWAV "Q:\CD\Various\1990-Classic Hits der 20er Jahre
- CD 1\01 Am Sonntag will mein Süsser mit mir segeln gehn - Edith d'Amara.wav"
lossyWAV alpha v0.4.3 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
Processing : 01 Am Sonntag will mein S³sser mit mir segeln gehn - Edith d'Amara.
wav
Format     : 44.10kHz; 2 ch.; 16 bit; 7926828 samples; 179.75 sec.
Average    : 0.0000 bits; [0/7742]; 11.95x; CBS=1024]
Title: lossyWAV Development
Post by: robert on 2007-11-10 00:25:58
It doesn't work on my Notebook, CPU is a Pentium-M.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-11-10 04:43:04
lossyWAV 0.4.2 has no issues with my laptop's AMD Mobile Sempron 3000+ (32-bit) CPU, except it crashes when the specified output folder doesn't exist.
Title: lossyWAV Development
Post by: [JAZ] on 2007-11-10 09:01:24

Nothing changes with v0.4.3c and with v0.4.3d.



Crashing here too. PIII 550



Nick.C: Have you added SSE2 instructions (operations with doubles)? The PIII and Athlon XP don't have them, although an Athlon 64 does.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-10 09:40:40
Thanks for the responses guys..... It seems to fail on some AMD and older Intel CPUs.

Sometimes no bits will be removed - that's the beauty of David's method - nothing is removed if it is not safe to do so.

No SSE / SSE2 instructions used, only 80x87 FPU instructions. I will try to revert to v0.4.1 with the functionality of v0.4.3 and attach.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-10 10:31:34
I will try to revert to v0.4.1 with the functionality of v0.4.3 and attach.
lossyWAV alpha v0.4.3e attached: Superseded.

Spread and Remove_Bits procedures have been rolled back to v0.4.1;

"-fft " parameter functionality remains.

Where's the smiley for "fingers-crossed" when you want it....?
Title: lossyWAV Development
Post by: halb27 on 2007-11-10 10:47:26
Yeah, it works.
Thank you.

BTW: From my personal experience with performance optimization, the most important thing is to have a good and adequate software architecture. Low-level optimization is often important in only isolated spots.
Sure, this needn't necessarily apply to lossyWAV.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-10 10:50:20
Yeah, it works.
Thank you.

BTW: From my personal experience with performance optimization, the most important thing is to have a good and adequate software architecture. Low-level optimization is often important in only isolated spots.
Sure, this needn't necessarily apply to lossyWAV.
I *was* only optimising the most frequently called procedures / functions. FFT, Spread and Remove_Bits are the functional core of the whole method. With all three converted to assembler I got an extra 10% speed compared to just FFT.

However, thankfully it is now working, and close to optimal speed. Have fun with your testing / transcoding!
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-11-10 11:08:33
lossyWAV 0.4.2 has no issues with my laptop's AMD Mobile Sempron 3000+ (32-bit) CPU, except it crashes when the specified output folder doesn't exist.


This problem still hasn't been fixed.
Title: lossyWAV Development
Post by: halb27 on 2007-11-10 11:13:13
... and close to optimal speed. ...

Yes, I'm very pleased by the speed.

ADDED:

As a result of the new possibilities of 5 fft lengths:

lossyWav -2 -cbs 512 -nts -1.0 -fft 11111 -spf 11235-11236-11336-12348-1234D    (my favorite for going productive)

followed by FLAC --best -e -f -b 512

yields 438 kbps for my regular set and 546 kbps for my problem set. I'm very pleased with this ratio.
Title: lossyWAV Development
Post by: shadowking on 2007-11-10 11:28:46
It works now.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-10 12:39:49
lossyWAV 0.4.2 has no issues with my laptop's AMD Mobile Sempron 3000+ (32-bit) CPU, except it crashes when the specified output folder doesn't exist.
This problem still hasn't been fixed.
Sorry Mitch, I will endeavour to fix it for the next revision. Thanks for the feedback people!

[edit]Thinking about the crashing - maybe it was an infinite loop....... Much investigation to come.[/edit]

[edit2] Some moron was using the FISTTP to store a truncated real to a mem32 integer....  .... which is apparently an SSE3 instruction. I will rework the routines to avoid using this instruction and re-attach as alpha v0.4.3f (hopefully with the output directory crashing bug rectified). [/edit2]
Title: lossyWAV Development
Post by: Nick.C on 2007-11-10 21:23:00
lossyWAV alpha v0.4.4 attached: Superseded.

Use of FISTTP instruction (SSE3!) eradicated; Thanks for the pointer [JAZ] - I found it very quickly when I googled "80x87 instruction set" and FISTTP isn't on the list........

Spread and Remove_Bits procedures now assembler (again....);

Now checks for access to output directory if specified.
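For readers wondering what FISTTP does differently: it stores a float as an integer truncated toward zero regardless of the x87 rounding mode, whereas the pre-SSE3 FIST honours the FPU control word, which normally means round-to-nearest. A sketch of the two behaviours (Python stand-ins used purely to illustrate the semantics, not the assembly itself):

```python
import math

# FISTTP (SSE3) stores a float as an integer truncated toward zero,
# irrespective of the x87 rounding mode; plain FIST uses the control
# word, which normally means round-to-nearest.  Python equivalents
# shown purely as an illustration of the two semantics.

values = [2.7, -2.7, 3.5]
truncated = [math.trunc(v) for v in values]  # FISTTP-like: toward zero
rounded = [round(v) for v in values]         # FIST-like: to nearest (even)

# truncated == [2, -2, 3]; rounded == [3, -3, 4]
```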
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 11:47:51
I encoded part of my collection using v0.4.4 without any problem, and according to my listening experience so far everything is very fine.
I used a variant of -2 which made me think more deeply afterwards about what's really important.

I'd like to suggest a discussion on two points concerning default behavior:

1)
I would welcome - as I said before - a general default cbs of 512 samples. This will make most lossless codecs behave more efficiently on the one hand, and on the other hand I can't see a logical reason not to use it. If it's about holding the average bitrate up for defensive reasons, we should use a more direct approach targeted at overcoming potential weaknesses.

2)
With -2 I suggest using an additional 128 sample FFT; to be precise, I'd like to see default behavior according to -fft 11101 -spf 11235-11236-11336-FFFFF-1234D.
The 64 sample FFT yields only a few bins in the low and lower-mid frequency range, so IMO it is welcome to have another rather short FFT which significantly improves the situation in the important lower-mid frequency range.
So I think a 128 sample FFT is a meaningful addition.
Moreover, it doesn't really hurt, as lossyWAV is very fast now and the increase in average bitrate is very low.
With -1, btw (not much in my focus), I suggest using the full 5 analyses.

What do you think?
Title: lossyWAV Development
Post by: Nick.C on 2007-11-12 12:01:29
I encoded part of my collection using v0.4.4 without any problem, and according to my listening experience so far everything is very fine.
I used a variant of -2 which made me think more deeply afterwards about what's really important.

I'd like to suggest a discussion on two points concerning default behavior:

1)
I would welcome - as I said before - a general default cbs of 512 samples. This will make most lossless codecs behave more efficiently on the one hand, and on the other hand I can't see a logical reason not to use it. If it's about holding the average bitrate up for defensive reasons, we should use a more direct approach targeted at overcoming potential weaknesses.

2)
With -2 I suggest using an additional 128 sample FFT; to be precise, I'd like to see default behavior according to -fft 11101 -spf 11235-11236-11336-FFFFF-1234D.
The 64 sample FFT yields only a few bins in the low and lower-mid frequency range, so IMO it is welcome to have another rather short FFT which significantly improves the situation in the important lower-mid frequency range.
So I think a 128 sample FFT is a meaningful addition.
Moreover, it doesn't really hurt, as lossyWAV is very fast now and the increase in average bitrate is very low.
With -1, btw (not much in my focus), I suggest using the full 5 analyses.

What do you think?
Sounds entirely reasonable. I have no problem with a 512 sample codec_block_size. I will implement the changes to the -2 and -1 quality levels.

On another topic, do we *really* need a -dither option? I have no problems with the quality of the output. Similarly, the -clipping option to switch off the iterative clipping reduction method also seems redundant. Removing these would increase throughput a bit, which would in turn offset the increased processing time due to the extra analyses.
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 13:03:56
I personally don't see a real reason for the -dither option.
But as it's not defaulted I don't care much about it. You created a good separation between standard options and advanced options, and -dither is well situated in the advanced options IMO.
Good reasons for eventually saying good bye to the -dither option are IMO
- if you should run into trouble with your software architecture keeping up the -dither option (guess you won't) when at the same time nobody seems to use -dither.
- if it comes to cleaning up all the advanced options - but as they're separated well into 'advanced options' there's no real need for such a cleaning procedure IMO. Sure the time may come where these things may be thought of being obsolete.

As we're talking about default behavior: what about -3?
I see two targets for -3:

a) -3 as a minor variant of -2, expected to be excellent under all circumstances as we expect from -2, but with detail behavior which is not as defensive as that of -2. Your choice of using the same -spf values as -2 points in this direction. If we want to have it like this, I suggest we increase the -skew value a bit.

b) as a seriously less defensive alternative to -2, targeting a larger average bitrate gap than what we have at the moment. To be more precise: if -2 yields say ~440 kbps on average, -3 should yield ~400 kbps. I guess that's achievable while still getting excellent quality. Maybe an even larger gap makes sense, being aware that quality may be sacrificed on hopefully rare occasions.
For b) the default setting should change quite a lot IMO.
Having extremely good encoding speed (like your doing just one FFT) as a target fits rather well into this framework.

I personally don't have a favorite for a) or b).
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 13:22:33
I totally forgot about the -clipping option.
If it weren't for David Bryant's remark about WavPack being able to make use of the MSBs being 1, I would easily say -clipping makes no sense. It looks like the 'iterative' anti-clipping strategy not only preserves quality but also doesn't hurt efficiency in a global sense.
David Bryant brought this WavPack feature back to mind recently, so I think it's not so simple to drop the -clipping option (keeping in mind it was David Bryant who brought us the idea of taking care of the critical bands, and I think this idea was one of the major improvements in the progress of lossyWAV).
My personal feeling, however, is that as the 'iterative' anti-clipping strategy doesn't have a negative impact on efficiency in a global sense, WavPack won't benefit significantly from letting clipping happen. Moreover, even if it did, it would do so because of allowing clipping to occur. But I'd like to see David Bryant comment on this. Maybe I understand this WavPack feature totally wrong.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-12 13:37:46
As we're talking about default behavior: what about -3?
I see two targets for -3:

a) -3 as a minor variant of -2, expected to be excellent under all circumstances as we expect from -2, but with detail behavior which is not as defensive as that of -2. Your choice of using the same -spf values as -2 points in this direction. If we want to have it like this, I suggest we increase the -skew value a bit.

b) as a seriously less defensive alternative to -2, targeting a larger average bitrate gap than what we have at the moment. To be more precise: if -2 yields say ~440 kbps on average, -3 should yield ~400 kbps. I guess that's achievable while still getting excellent quality. Maybe an even larger gap makes sense, being aware that quality may be sacrificed on hopefully rare occasions.
For b) the default setting should change quite a lot IMO.
Having extremely good encoding speed (like your doing just one FFT) as a target fits rather well into this framework.

I personally don't have a favorite for a) or b).
My preference would be for b). Thinking about it, if at the end of the day the only options were -1, -2, -3, -nts and -fft; with -skew, -snr & -spf fixed according to the quality settings, then the user could decide how aggressive the processing was by using -fft and -nts alongside the -1, -2 or -3 quality setting.

On the other hand, maybe all of the analyses should use the same -skew, -snr and -spf values?

However, taking David's preference for only 4 command line options (-1, -2, -3 & -nts) then *maybe* other parameters should only be available when using the -3 quality option. The thinking being: "I've already accepted that I want reduced quality by selecting quality level -3, so the program will now let me foul it up myself rather than using presets....."

On -dither and -clipping: from listening to undithered output, and given that the process never reduces amplitude, -dither seems to be expendable. Similarly, the iterative approach used in the current clipping prevention method has little impact on bitrate, so the -clipping parameter also seems to be expendable.
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 14:13:11
Target b) for -3: OK, so we should think about the details.

Identical -spf and -skew values for all of the three quality levels? I don't like the idea.

From my tests when finding useful values for -spf, I know some values really hurt bitrate efficiency-wise (most of all the bold 1 in '11124' for the 64 sample FFT of -1) but may be vital for being really defensive with respect to the critical band at the lower edge of the corresponding frequency range. So I think it's necessary for -1 (and would be most welcome for -2 too, but it's expensive, and the more economic way of treating this within -2 may be the additional 128 sample FFT).

With -skew it's similar. -skew is important for differentiating the resulting bitrate between regular and problematic spots, but with a value >24 the improved defensiveness gets more and more expensive. So I think a value of 24 is very appropriate for -2, and it should be significantly higher only for -1. For -3 it should be <24.

Using very high values for -snr helps differentiate between regular and problematic spots too, but with these values there's a rather high price to pay bitrate-wise. So again, high values of -snr should be used with -1 only IMO.

So I strongly think -1, -2, and -3 should consist of different -fft, -spf, -skew, -snr, and -nts settings, in such a way that overkill defensiveness, standard defensiveness, and reduced defensiveness are each represented best.

If you want to keep -1 and -2 clean of user options I suggest you do it for -3 as well, and instead create an experimental quality option -x which enables all the advanced options. advanced options = any option except for -1, -2, -3, -nts x (and -flac etc. in case these are ever needed - guess they won't).
Title: lossyWAV Development
Post by: Nick.C on 2007-11-12 14:17:06
Target b) for -3: OK, so we should think about the details.

Identical -spf and -skew values for all of the three quality levels? I don't like the idea.

From my tests when finding useful values for -spf, I know some values really hurt bitrate efficiency-wise (most of all the bold 1 in '11124' for the 64 sample FFT of -1) but may be vital for being really defensive with respect to the critical band at the lower edge of the corresponding frequency range. So I think it's necessary for -1 (and would be most welcome for -2 too, but it's expensive, and the more economic way of treating this within -2 may be the additional 128 sample FFT).

With -skew it's similar. -skew is important for differentiating the resulting bitrate between regular and problematic spots, but with a value >24 the improved defensiveness gets more and more expensive. So I think a value of 24 is very appropriate for -2, and it should be significantly higher only for -1. For -3 it should be <24.

Using very high values for -snr helps differentiate between regular and problematic spots too, but with these values there's a rather high price to pay bitrate-wise. So again, high values of -snr should be used with -1 only IMO.

So I strongly think -1, -2, and -3 should consist of different -fft, -spf, -skew, -snr, and -nts settings, in such a way that overkill defensiveness, standard defensiveness, and reduced defensiveness are each represented best.

If you want to keep -1 and -2 clean of user options I suggest you do it for -3 as well, and instead create an experimental quality option -x which enables all the advanced options. advanced options = any option except for -1, -2, -3, -nts x (and -flac etc. in case these are ever needed - guess they won't).
I like the idea of the -x quality parameter (-0?) enabling the advanced options and also keeping -1, -2 & -3 "clean". This would start as a copy of -2; only those settings that the user inputs would be overwritten, the rest being taken as per -2 for the processing.

On the -skew, -spf and -snr settings I am inclined to agree with you. The only difficult bit being agreeing what those settings will be.....
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 14:58:12
I like the idea of the -x quality parameter (-0?) enabling the advanced options and also keeping -1, -2 & -3 "clean".

On the -skew, -spf and -snr settings I am inclined to agree with you. The only difficult bit being agreeing what those settings will be.....

When first thinking of the experimental option I also thought of -0 because it matches the current naming scheme. But within the current scheme it makes the experimental quality level look superior to the standard quality levels. Though hopefully somebody might find a great setting this way, I think -x (or an explicit -experimental) is more appropriate.

'The only difficult bit being agreeing what those settings will be.....'. Maybe, let's see - but with -2 I think we're pretty much done already (better ideas always welcome):

-2 = -fft 11101 -spf 11235-11236-11336-FFFFF-1234D -cbs 512 -nts -1.5 -skew 24 -snr 18

With -1 I suggest to use

-1 = -fft 11111 -spf 11124-11125-11225-11225-11236 -cbs 512 -nts -3.0 -skew 30 -snr 24.

Most disputable may be -3.
Due to the 'significantly reduced defensiveness' target I suggest we use those -spf values I found in my -spf value testing. I think it's necessary for a significantly reduced average bitrate, and it still provided excellent quality. So the mixture of this and the current setting is

-3 = -fft 10001 -spf 11236-FFFFF-FFFFF-FFFFF-1246E -cbs  512 -nts -0.5 -skew 18 -snr 12.

All these settings are pretty much what they are right now, and IMO they're just working out a little bit more what the various accents of the different quality levels stand for.
I don't care much about such details like whether -skew value for -3 should be rather 20 and -snr value 0 (my very personal preference but worth nothing).
Title: lossyWAV Development
Post by: Nick.C on 2007-11-12 22:25:21
-1 = -fft 11111 -spf 11124-11125-11225-11225-11236 -cbs 512 -nts -3.0 -skew 30 -snr 24.
-2 = -fft 11101 -spf 11235-11236-11336-FFFFF-1234D -cbs 512 -nts -1.5 -skew 24 -snr 18
-3 = -fft 10001 -spf 11236-FFFFF-FFFFF-FFFFF-1246E -cbs  512 -nts -0.5 -skew 18 -snr 12.

I don't care much about such details like whether -skew value for -3 should be rather 20 and -snr value 0 (my very personal preference but worth nothing).
The quality settings in the next revision will reflect those above (unless anyone else indicates a strong preference for something different).

I've been playing with the -fft parameter again and -3 -fft 00100 -spf ....-23346-..... -skew 24 yields 403kbps on my problematic sample set with no immediately apparent artifacts. I say immediately apparent because I don't believe that ABX'ing -3 is useful - to me -3 is the equivalent of listening in a car or on a train or plane - there is background noise already, so some minor changes to the original may / will be obscured by the noise floor of the listening environment. My "acceptability" testing takes place in an open-plan office environment with earbuds & DAP.

I am wondering about the clipping reduction method. At the moment, if it finds one or more samples which clip after rounding, it reduces bits_to_remove by one and tries again; when bits_to_remove=0 it just stores the original values. Is zero permissible clipping samples a bit too harsh? At the time the iterative clipping was introduced, I put in an "allowable" variable, implying that a number of clipping (but rounded) samples may be permitted. I think that I should implement an "-allowable" parameter (1<=n<=64 (maximum permissible codec_block_size)) to set the allowable value as a clipping detection "threshold".
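The iteration just described can be sketched as follows (Python; 16-bit samples and the simple rounding rule are assumptions, and every name here is illustrative rather than taken from the Delphi source). It includes the proposed "allowable" threshold:

```python
# Minimal sketch of the iterative clipping prevention described above,
# including the proposed "allowable" threshold.  Assumes 16-bit samples
# and simple round-to-nearest; names are illustrative, not from the source.

MAX_16BIT, MIN_16BIT = 32767, -32768

def round_to_bits(sample: int, bits_to_remove: int) -> int:
    """Round a sample onto the coarser grid left after removing bits."""
    step = 1 << bits_to_remove
    return ((sample + step // 2) // step) * step

def reduce_bits(block, bits_to_remove: int, allowable: int = 0):
    """Back off bits_to_remove until at most `allowable` samples clip."""
    while bits_to_remove > 0:
        rounded = [round_to_bits(s, bits_to_remove) for s in block]
        clipped = sum(1 for s in rounded if s > MAX_16BIT or s < MIN_16BIT)
        if clipped <= allowable:
            return rounded, bits_to_remove
        bits_to_remove -= 1          # one or more samples clip: try again
    return list(block), 0            # nothing removable without clipping
```

With allowable=0 this is the current behaviour; raising it would let a few clipping samples through before backing off.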
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 22:50:01
The quality settings in the next revision will reflect those above (unless anyone else indicates a strong preference for something different).

I've been playing with the -fft parameter again and -3 -fft 00100 -spf ....-23346-..... -skew 24 yields 403kbps on my problematic sample set with no immediately apparent artifacts.  ....

Thanks a lot.

As for your -3 approach (just one FFT, targeting a significantly lower bitrate than ~400 kbps for regular music) I can try to help and do listening tests, especially with your setting. I wouldn't lower the quality demand extremely, however, because after all we will stay at a pretty high bitrate, and with that I think we should have a distinction from what we can get with mp3 at moderate bitrate (though this is always a matter of taste).
Sorry I won't be able to do it within this week as I'm leaving for my father in law's 90th birthday (got some trouble at the moment producing a photo based dvd movie, and neither my old nor the new dvd player (present for my father in law) are playing it fine).
Title: lossyWAV Development
Post by: Nick.C on 2007-11-12 23:22:41
As for your -3 approach (just 1 FFT, targeting a significantly lower bitrate than ~400 kbps for regular music ) I can try to help and do listening tests, especially with your setting. I wouldn't lower quality demand extremely however cause after all we will stay with pretty high bitrate, and with that I think we should have a distinction from what we can get with mp3 at moderate bitrate (though this is always a matter of taste).
Sorry I won't be able to do it within this week as I'm leaving for my father in law's 90th birthday (got some trouble at the moment producing a photo based dvd movie, and neither my old nor the new dvd player (present for my father in law) are playing it fine).
Don't worry about the timescale, I will keep on trying to optimise the code..... I hope you have a great time at the party! Have you checked whether the DVD is written as UDF or not? This may make a difference.

I also tried -3 -fft 01100 -spf ffff-22335-22346-fffff-fffff -skew 24 which yielded 420kbps - not too bad at all. Second opinion definitely required. [edit] I will test some "real" music tomorrow and see what the bitrate comes out at. Maybe 400kbps for "real music" should be the target rather than approaching that for my problem set. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-12 23:39:51
... Have you checked whether the DVD is written as UDF or not? This may make a difference. ...

The DVD plays well on my PC so I think the DVD is fine. My own dvd player simply is broken and doesn't play any dvd any more. The new player plays the 'movie', but from time to time it skips the current spot a bit which especially sounds very ugly as the music skips. Guess it's a VBR problem and that's what I'm playing with all evening long but with limited success. Guess we'll exchange the player tomorrow.

As for your new -3 setting I like the new one better as it's more demanding. Let's hear how it sounds.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-13 13:59:34
As for your new -3 setting I like the new one better as it's more demanding. Let's hear how it sounds.
Another variation:

At the moment the method uses the Hanning window function on the input to the FFT analysis. Looking for "window function" in my favourite resource (Wikipedia) gives quite a long list. I have added a "-window" parameter to select which one to use. This allows the selection of 7 window functions (for evaluation / elimination at this stage): Hanning, Bartlett-Hann, Blackman, Nuttall, Blackman-Harris, Blackman-Nuttall and Flat-Top.
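For anyone curious how these candidates differ, they are all sums of cosine terms applied to the codec-block samples before the FFT. A minimal Python sketch of two of them, using the standard textbook coefficients (this is an illustration, not lossyWAV's actual Delphi code):

```python
import math

def hann(N):
    # Hann ("Hanning") window: 0.5 - 0.5*cos(2*pi*n/(N-1))
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def blackman(N):
    # Classic Blackman window: a0=0.42, a1=0.5, a2=0.08
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1)) for n in range(N)]

def windowed(samples, window):
    # Multiply the block samples by the chosen window before FFT analysis
    return [s * w for s, w in zip(samples, window)]
```

The practical difference between the windows is the trade-off between main-lobe width and side-lobe leakage, which affects how the FFT magnitudes (and thus the bit-removal decisions) come out.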

Will post revision tonight.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-11-13 15:19:17
What do you get for your test set resampled to 32kHz, processed with -2?

Does 32k resampling followed by ReplayGain (only negative values applied) help even more?

It makes sense to have a -3 along the lines you're proposing, but I suspect the above will be dramatically more efficient, and still artefact-free (though with a 16k LPF and, with RG, loud tracks becoming quieter).

Cheers,
David.
Title: lossyWAV Development
Post by: GeSomeone on 2007-11-13 17:41:06
Target b) for -3: OK, so we should think about the details.

Just following your dialog here.. 
This seems the right basic choice: there has to be a benefit in return for giving up a (little) bit of quality. IMO that means a significantly lower bit rate for -3 (compared with -2).

(Would -skew of -12 -18 -24 (for -3 -2 -1) be too aggressive?)
I am wondering about the clipping reduction method - at the moment, if it finds 1 or more samples which clip after rounding then it reduces bits_to_remove by one and tries again, until bits_to_remove=0, at which point it just stores the original values. Is 0 permissible clipping samples a bit too harsh? At the time that the iterative clipping was introduced, I put in an "allowable" variable, implying that a number of clipping (but rounded) samples may be permitted.

I suppose you mean consecutive samples of the maximum (or minimum) value?  To me in this case 0, 1 or 2 would make sense, only already badly clipping music would be affected by other values.

And yes, the dither function is obsolete as you no longer opt to lower the amplitude.
I also tried -3 [..] which yielded 420kbps [..] [edit]Maybe 400kbps for "real music" should be the target rather than approaching that for my problem set. [/edit]

The problem with this is that from the outset this method aims for constant quality (I like that BTW) so the bit rate will vary. I found for example that music that already compresses well losslessly (into the 600s kbps, say) will not reach half that bit rate with the help of lossyWav, but rather still around 420.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-13 21:25:40
What do you get for your test set resampled to 32kHz, processed with -2?

Does 32k resampling followed by ReplayGain (only negative values applied) help even more?

It makes sense to have a -3 along the lines you're proposing, but I suspect the above will be dramatically more efficient, and still artefact-free (though with a 16k LPF and, with RG, loud tracks becoming quieter).

Cheers,
David.
I tried it with revised -3 settings (see below) and got:
WAV: 125.11MB;
FLAC: 69.36MB, 782kbps;
-1: 52.59MB, 593kbps;
-2 @ 44.1kHz: 45.10MB, 509kbps;
-2 @ 32.0kHz: 38.97MB, 440kbps;
-3 @ 44.1kHz: 38.49MB, 434kbps;
-3 @ 32.0kHz: 33.95MB, 383kbps.
  - a further saving of 13.6% at -2 or 11.8% at -3 by resampling to 32kHz! The results didn't sound bad at all.

[edit] I tried a couple of albums and the results were a bit of a surprise: FLAC: 773MB, 914kbps; -3 @ 44.1kHz: 321MB, 381kbps; -3 @ 32.0kHz: 313MB, 371kbps. The size difference is welcome, but resampling adds a time overhead as well as the 16kHz LPF. [/edit]

Target b) for -3: OK, so we should think about the details.
Just following your dialog here.. 
This seems the right basic choice: there has to be a benefit in return for giving up a (little) bit of quality. IMO that means a significantly lower bit rate for -3 (compared with -2).

(Would -skew of -12 -18 -24 (for -3 -2 -1) be too aggressive?)
I am wondering about the clipping reduction method - at the moment, if it finds 1 or more samples which clip after rounding then it reduces bits_to_remove by one and tries again, until bits_to_remove=0, at which point it just stores the original values. Is 0 permissible clipping samples a bit too harsh? At the time that the iterative clipping was introduced, I put in an "allowable" variable, implying that a number of clipping (but rounded) samples may be permitted.
I suppose you mean consecutive samples of the maximum (or minimum) value?  To me in this case 0, 1 or 2 would make sense, only already badly clipping music would be affected by other values.

And yes, the dither function is obsolete as you no longer opt to lower the amplitude.
I also tried -3 [..] which yielded 420kbps [..] [edit]Maybe 400kbps for "real music" should be the target rather than approaching that for my problem set. [/edit]
The problem with this is that from the outset this method aims for constant quality (I like that BTW) so the bit rate will vary. I found for example that music that already compresses well losslessly (into the 600s kbps, say) will not reach half that bit rate with the help of lossyWav, but rather still around 420.
I've settled on a set of settings which are in some ways similar to -2 but using different fft lengths, -nts and -spf, see below. 434kbps is a reasonable bitrate at a reasonable quality (I can't hear anything wrong, but my ears are 39 years old......). The -allowable parameter only counts individual clips, it doesn't look for multiples (although it could, at a slight speed penalty). The -window parameter hasn't made it into this revision as I have to check the bit reduction noise calculations for each new spreading function to ensure that I'm not adding the "wrong" amount of noise per bit removed.

Feedback, as always, is requested and valued.

lossyWAV alpha v0.4.5 attached: Superseded.

-3 settings tweaked;

-allowable parameter implemented to allow a number of clips per codec block (total per block per channel).
Code: [Select]
lossyWAV alpha v0.4.5 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [5xFFT] (-cbs 512 -nts -3.0 -skew 30 -snr 24
              -spf 11124-11125-11225-11225-11236 -fft 11111)
-2            default quality [4xFFT] (-cbs 512 -nts -1.5 -skew 24 -snr 18
              -spf 11235-11236-11336-12348-1234D -fft 11101)
-3            compact quality [2xFFT] (-cbs 512 -nts -0.5 -skew 24 -snr 18
              -spf 22236-22237-22347-22358-2234E -fft 01010)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=0dB)
              (reduces overall bits to remove by 1 bit for every 6.0206dB)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB)
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 44444-44444-44444-44444-44444 (Characters must be one of
              1 to 9 and A to F; zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
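A small aside on the "1 bit for every 6.0206dB" in the -nts description above: that constant is 20·log10(2), the change in quantisation noise level per bit of bit-depth. A quick check in Python (hypothetical helper name, just illustrating the arithmetic):

```python
import math

# Each bit of quantisation moves the noise floor by 20*log10(2) ~= 6.0206 dB
DB_PER_BIT = 20 * math.log10(2)

def nts_bit_shift(nts_db):
    # Hypothetical helper: how many bits a given noise_threshold_shift (dB)
    # adds to or removes from bits_to_remove (negative nts reduces them,
    # matching the help text above).
    return nts_db / DB_PER_BIT
```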
Title: lossyWAV Development
Post by: GeSomeone on 2007-11-14 13:55:43
Code: [Select]
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).

I tried with/without -allowable 1 on a track that hits full scale.
Doesn't make a lot of difference here.
Code: [Select]
%lossyWAV Warning% : Codec_block_size forced to 512 bytes.
%lossyWAV Warning% : Allowable clipping samples set to 1 per codec block.
%lossyWAV Warning% : Process priority set to low.
temp-6605F1A4E1877D7AA8BA0D93BF92EA95.wav;5.2624;67932;12909;8.92x
%lossyWAV Warning% : 9 sample(s) clipped to maximum +ve amplitude.

%lossyWAV Warning% : Codec_block_size forced to 512 bytes.
%lossyWAV Warning% : Process priority set to low.
temp-6605F1A4E1877D7AA8BA0D93BF92EA95.wav;5.2606;67909;12909;9.01x
%lossyWAV Warning% : 23 bits not removed due to clipping.


BTW shouldn't the logging say 512 samples 
Title: lossyWAV Development
Post by: Nick.C on 2007-11-14 15:50:26
Code: [Select]
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).

I tried with/without -allowable 1 on a track that hits full scale.
Doesn't make a lot of difference here.

BTW shouldn't the logging say 512 samples 
  erm, yes, you would be correct in that assertion!

-allowable 1 will only allow 1 sample per channel per codec_block to clip - try with -clipping instead to see what the maximum bits to remove for the track in question would be (this will also give you a count of samples which clip over or under) and then play about with -allowable. The parameter will take up to 64 permitted clips per channel per codec_block.
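The iterative scheme described here can be sketched as follows: round to the reduced bit depth, count the samples that clip, and back off one bit at a time until the count is within the allowance. This is a hypothetical Python illustration with invented names (16-bit samples assumed), not the actual Delphi code:

```python
def quantise_block(samples, bits_to_remove, allowable=0, full_scale=32767):
    # Iterative clipping prevention: reduce bits_to_remove until no more
    # than `allowable` samples would clip, then clamp any permitted clips.
    while bits_to_remove > 0:
        step = 1 << bits_to_remove
        rounded = [int(round(s / step)) * step for s in samples]
        clips = sum(1 for r in rounded if r > full_scale or r < -full_scale - 1)
        if clips <= allowable:
            return [min(max(r, -full_scale - 1), full_scale) for r in rounded], bits_to_remove
        bits_to_remove -= 1
    # bits_to_remove reached 0: just store the original values
    return samples[:], 0
```

With allowable=0 a single clipping sample forces a retry at one fewer bit; with allowable=1 that same block keeps the larger bits_to_remove and the offending sample is clamped to full scale.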
Title: lossyWAV Development
Post by: jesseg on 2007-11-15 01:59:25
I was thinking today that it would be nice to be able to just drop a wav (or multiple wavs) onto a small app and get lossyFLAC files in return.

So after about 8 hours of opcode hexing, and batch file scripting...  lFLCDrop is born.    See attached.


You will have to download and/or copy flac.exe and lossyWAV.exe into the folder you will use lFLCDrop in, because I'm not sure what the licenses are for redistributing that stuff yet.  I'll have to check that out, but if anyone knows off hand, you could let me know to save me the hassle. 

I should note that lFLC.bat for lFLCDrop v1.0.0.0 is forcing 576 sample blocks for lossyWAV and FLAC, due to Winamp's in_flac plugin not showing the spectrum in the classic visualization when 512 sample blocks are used.  (i don't have modern skins installed to test).  If this fix is not ok with you, feel free to change it in the batch file.  For quick reference here's the command line used for both:
lossyWAV [input] [quality] -o [output] -nowarn -cbs 576
flac -8 -o [output] --delete-input-file -f -b 576 [input]
and you should note that FLAC is deleting a temp file, not your source file.  If you want to delete your source files, the option is available if you right-click on the lFLCDrop GUI. 


The next thing I plan to do is create an lFLC.bat for use with EAC, including passing in variables for use in tagging.  It might take a bit longer to test due to the possibility of it being impossible to get around certain characters being passed in.  Mainly double-quotes & percent signs, but it will need some testing for sure.


at any rate, enjoy

p.s. thanks to all of the people involved with lossyWAV and of course FLAC, and to Layer3Maniac for making the original FlacDrop. Without all of you, this would not have been possible, and I take no credit for anything here that rightly belongs to you all. This is mentioned in the readme file, but I thought it would be considerate to have it here as well.

[edit] removed, newer version posted later in the thread [/edit]
Title: lossyWAV Development
Post by: Nick.C on 2007-11-15 12:25:13
So after about 8 hours of opcode hexing, and batch file scripting...  lFLCDrop is born.    See attached.
Nice to hear that you think enough of the processor to create a method of using it! lossyWAV is LGPL (although exactly what that means, I still need to get my head round.....), by the way.

Possible bug report:

I am in the process of batch converting circa 1500 tracks in Foobar2000 v0.9.5 beta1 using FLAC v1.2.1 and lossyWAV v0.4.5. I got a bit concerned when after a while I noticed that the total time of the output files is less than that of the input files. Narrowing it down, I find that some tracks are exactly 8 codec blocks (4096 samples / 16kB) shorter than they should be. I am at a loss as to why this is occurring.

[edit] I've looked at the throughput as one album with 2 affected tracks processes: the input and processed WAV files are the same length..... [/edit]

[edit2] As an aside, I'm 623 tracks in and the processing (-3) has brought the bitrate down from 854kbps to 392kbps (8.27GB / 18.0GB). [/edit2]
Title: lossyWAV Development
Post by: Nick.C on 2007-11-16 07:59:13
In reply to PM regarding conversion in Foobar2000: [edit] See wiki article [/edit]

I'm still working on the user selectable window function parameter, this should be ready tonight or tomorrow.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-16 21:32:53
lossyWAV alpha v0.4.6 attached: Superseded - bug report.

Added noise due to bit reduction calculations re-done, now calculated for each of the seven user-selectable window functions. Slight increase in bits_to_remove compared to v0.4.5;

"-nts n" parameter now valid in the range -18dB to +6dB;

"-window n" parameter (0<=n<=6) selects window function to use in FFT analysis.

Code: [Select]
lossyWAV alpha v0.4.6 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [5xFFT] (-cbs 512 -nts -3.0 -skew 30 -snr 24
              -spf 11124-11125-11225-11225-11236 -fft 11111)
-2            default quality [4xFFT] (-cbs 512 -nts -1.5 -skew 24 -snr 18
              -spf 11235-11236-11336-12348-1234D -fft 11101)
-3            compact quality [2xFFT] (-cbs 512 -nts -0.5 -skew 24 -snr 18
              -spf 22236-22237-22347-22358-2234E -fft 01010)

-o <folder>   destination folder for the output file
-force        forcibly over-write output file if it exists; default=off

Advanced / System Options:

-nts <n>      set noise_threshold_shift to n dB (-18dB<=n<=+6.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0dB<=n<=48dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0db<=n<=48db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 44444-44444-44444-44444-44444 (Characters must be one of
              1 to 9 and A to F; zero excluded).
-clipping     disable clipping prevention by iteration; default=off
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).
-window       select windowing function n (0<=n<=6, default=0); 0=Hanning
              1=Bartlett-Hann; 2=Blackman; 3=Nuttall; 4=Blackman-Harris;
              5=Blackman-Nuttall; 6=Flat-Top.
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
The "-wmalsl" parameter, forcing codec_block_size to 2048 samples, will be implemented in the next revision.

[edit] Possible candidate for -3: -3 -nts 6 -skew 36 -snr 21. Currently processing FLAC > lossyFLAC 1496 tracks, 859kbps > 337kbps. 40.8GB > 16.0GB

Really quite palatable to listen to. I think the interplay between the -nts +6 (take the minimum value found and add 6dB) and -snr 21 (take the average of all relevant bins and subtract 21dB), then take the lower of the modified minimum and the modified average, produces quite a robust check against added noise. I am listening to a lot of the output (4d17h27m27.333s) trying to find the artifacts I *really* expect to be there at that bitrate. None yet. Quite pleased.[/edit]
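The interplay described in the edit can be sketched numerically (hypothetical helper, dB values, not the actual implementation): the permitted noise level is the lower of (minimum bin + nts) and (average bin − snr), so neither a loud average nor a single quiet bin alone licenses extra noise.

```python
def noise_threshold(bin_levels_db, nts_db=6.0, snr_db=21.0):
    # Sketch of the -3 candidate's check: take the minimum value found
    # and add nts dB, take the average of the relevant bins and subtract
    # snr dB, then use the lower of the two as the noise allowance.
    modified_min = min(bin_levels_db) + nts_db
    modified_avg = sum(bin_levels_db) / len(bin_levels_db) - snr_db
    return min(modified_min, modified_avg)
```

With fairly flat spectra the average-based limit tends to govern; when one region is much quieter than the rest, the minimum-based limit takes over.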
Title: lossyWAV Development
Post by: BGonz808 on 2007-11-18 02:57:38
Thanks to everyone for bettering LossyWAV!! I don't know exactly what is happening here, but when I try to run version 0.4.6 it just outputs a wav header and no data. I attached a screenshot of the commandline, and it appears that LossyWAV doesn't even try to render any audio  for me. I'm running an Intel Celeron processor @ 2.4ghz (the P4 based style) and I'm wondering if something SSE-wise just isn't meshing with my processor. If anybody has any answers they will be greatly appreciated, but I'm gonna for now hope that newer versions will work again for me.

Thanks!
-808
Title: lossyWAV Development
Post by: Nick.C on 2007-11-18 08:13:34
Thanks to everyone for bettering LossyWAV!! I don't know exactly what is happening here, but when I try to run version 0.4.6 it just outputs a wav header and no data. [...]
Did v0.4.5 work properly with the same settings? I thought that I had got rid of all SSE instructions in v0.4.5 and don't think that I've added any into v0.4.6 (although I'll check anyway). I'll look for bugs and revert.

[edit]There's a bug in the -detail parameter which seems to prematurely end the process. I'll amend and include the fix in the next revision.[/edit]
Title: lossyWAV Development
Post by: Nick.C on 2007-11-18 10:54:29
[edit]There's a bug in the -detail parameter which seems to prematurely end the process. I'll amend and include the fix in the next revision.[/edit]
lossyWAV alpha v0.4.7 attached: Superseded.

-detail bug corrected.

Thanks BGonz808!
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-11-18 11:39:39
Using lossyWAV -3 -nts 6 -skew 36 -snr 21, my (small) test set achieved an average 344kbps with FLAC, compared to ~400 with -3 alone. Some files were smaller using FLAC, while others were smaller using WMALSL, and the difference between the two codecs over the whole set was negligible.
Title: lossyWAV Development
Post by: halb27 on 2007-11-18 20:24:06
I tried 0.4.7 on my regular/problem test set and got the following average bitrates:

-1: 512/585 kbps for my regular/problem sample set
-2: 430/539 kbps for my regular/problem sample set
-3: 388/481 kbps for my regular/problem sample set
-3 -nts 6 -skew 36 -snr 21: 338/468 kbps for my regular/problem sample set.

To me these are very attractive bitrate variations for the various quality levels, and the average bitrate differences between regular and problem samples show, at least in a statistical sense, that lossyWav can differentiate well what to do according to the different situations.

Your new -3 candidate looks extremely attractive judging from the statistics, Nick.
Statistics however doesn't really tell about quality, so I tried -3 -nts 6 -skew 36 -snr 21 on my problem samples as well as on some tracks of regular music.
Surprisingly, the only issue I found was with badvilbel at ~sec. 19.0, where I could abx the added hiss 8/10. This added hiss is so negligible to me that it is well within the excellent quality I'd like to see with -3.
I have never thought before that lossyWav is that good at an average bitrate of ~340 kbps with regular music.
Great work, Nick.

So this is the way to go for -3 IMO as long as we don't get bad news. Maybe even for -2 in an adapted and more cautious way.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-18 21:05:10
I tried 0.4.7 on my regular/problem test set and got the following average bitrates: [...]

So this is the way to go for -3 IMO as long as we don't get bad news. Maybe even for -2 in an adapted and more cautious way.
Well, I'm very glad to hear that you like the new -3 proposal. I will implement this in v0.4.8. I was pretty astonished when I got to the end of the 1496 track processing and the output was 16GB from 40.8GB input.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-18 22:29:56
lossyWAV alpha v0.4.8 attached: Superseded.

-wmalsl parameter implemented : sets codec_block_size to 2048 samples, incompatible with -cbs parameter;

-3 quality level changed to -cbs 512 -fft 01010 -snr 21 -skew 36 -nts +6.0 -spf 22236-22237-22347-22358-2246E;

Code speeded up a bit further - I still don't understand the speed increases available by properly aligning variables......
Title: lossyWAV Development
Post by: Nick.C on 2007-11-20 22:48:58
Thinking further on what Halb27 was saying about hiss in badvilbel, I have been iterating with the -3 settings and have arrived at:

-3:  -fft 10001 -spf 22235-22236-22347-22358-2247F -snr 21 -skew 36 -nts 6

This gives a lossyFLAC output of 35.95MB / 405.5kbps with a fairly significant reduction in bits_to_remove for badvilbel, and also reverts to the original two fft lengths in David's script. This is in contrast to 34.62MB / 390.5kbps for alpha v0.4.8 -3 settings. Slightly more conservative, but if it reduces noticeable hiss, then I'm all for it (however, I haven't heard any added hiss on my iPAQ at existing -3 settings - but the noise floor of its audio output is not great).

I intend to implement these settings for the next revision, unless of course anyone feels strongly that I shouldn't (alternative settings welcomed).

Nick.
Title: lossyWAV Development
Post by: halb27 on 2007-11-20 23:42:32
Hmm,

If it's only about the (to me) negligible hiss added on top of the hiss that is already there in the original badvilbel, I personally wouldn't care about it. I've grown to love your current -3 setting. I've been listening to a lot of music with current -3, trying to abx problems at suspicious spots, and I'm very happy with it. To me it's a very good solution for people who want great quality on a FLAC-enabled DAP.
Sure it's all within the usual restriction of experience so far. But remember it's about -3 here.
Everybody can increase -3 quality to his liking by lowering -nts.

Anyway I'll try your new -3 proposal tomorrow.

I've tried a lot of settings for -2 with your -3 idea in mind: using a rather high -skew and -snr value, a rather high -nts value, and being very restrictive with using spreading_length = 1, and I ended up with

-2 -fft 11011 -spf 33335-22236-22348-123FF-23FFF -nts 0.0 -skew 36 -snr 24

It yields an average of 405/549 kbps on my regular/problem sample set which compares favorably with the 430/539 kbps of the current -2 setting.
Moreover -nts 0 should be defaulted for security IMO but I guess using a positive user chosen -nts value is fine. Trading -nts 2.5 for -nts 0 for instance yields 388/540 kbps for my regular/problem sample set.

I will do a listening test with it (using -nts 2.5) tomorrow.

The idea behind the -spf setting is (apart from merging current setting with your -3 setting):
a) Make the 64 and 128 sample FFT the primary decision basis for deciding on the 2 highest frequency ranges. Give the 64 sample FFT a minor influence on decision making for the 3 lower frequency ranges.
b) Make the 512 and 1024 sample FFT the primary decision basis for deciding on the 2 lowest frequency ranges. Give the 1024 sample FFT a negligible influence on decision making for the 3 higher frequency ranges. Same for the 512 sample FFT with respect to the 2 highest frequency ranges.
c) Make the 128 and 512 sample FFT the primary decision basis for deciding on the 3rd of your 5 frequency ranges.
d) Details are chosen on a cost consideration. For instance the 2s in the 128 sample FFT setting cost next to nothing (at first I wanted to have them as a 3 as with the 64 sample FFT setting).
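For reference, the -spf value discussed above is five groups of five hex characters, one group per FFT length and one character per frequency range. A hypothetical parser sketch (Python, invented names; the real tool is Delphi):

```python
def parse_spf(spf):
    # Parse an "-spf" string like "33335-22236-22348-123FF-23FFF":
    # five dash-separated groups of five hex characters (1-9, A-F),
    # one group per FFT length (64..1024 samples), one character
    # (the spreading length) per frequency range.
    groups = spf.split('-')
    assert len(groups) == 5 and all(len(g) == 5 for g in groups)
    fft_lengths = (64, 128, 256, 512, 1024)
    return {n: [int(c, 16) for c in g] for n, g in zip(fft_lengths, groups)}
```

Reading the proposed -2 string this way makes halb27's design visible at a glance: small spreading lengths where an FFT should dominate a range, F (15) where its influence should be negligible.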

I will report on the listening test.

BTW I've found a little bug: -nts 1 doesn't do what it should do: -nts 0.99 is fine as is -nts 1.01, but with -nts 1.00 bits removed are far too low (less than with -nts 0.0).
Title: lossyWAV Development
Post by: halb27 on 2007-11-21 07:15:54
I couldn't resist and try your new -3 proposal this morning, Nick.

The statistics says 343/473 kbps on average for my regular/problem set which is very close to the 338/468 kbps of the current -3 setting.
I also tried the 'hiss spot' of badvilbel, and I can't abx the difference.

Looking more closely at the new setting, it is a bit of what I have in mind for -2: let the short FFT do the main decision job for the high frequencies (the short FFT is good at that), and let the long FFT do the main decision job on the low frequencies (the short FFT isn't good at that).

Sorry for having been pretty negative about the new -3 setting. Guess I was a bit upset cause I've done a lot of listening effort with the current -3 setting. But I think this wasn't useless when switching to the new setting. The major principle is the same, and it is a little bit more defensive. Sure I'll try the new setting with my usual problem samples tonight. To me this is sufficient and I won't go through part of my regular collection again.

What's more relevant IMO: why is this -nts x setting, with x>0 to a rather high degree, so good? Can we trust it enough to use a positive -nts also for the higher quality settings?

A high -skew value is a good thing for differentiating good and bad spots (with respect to 'number of bits to remove') in the music. But -skew is effective only at rather low frequencies. Together with a high -skew value -snr also does a good job differentiating. But because of this interconnection I'm afraid -snr is effective also only in the low to lower medium frequency range below ~3 kHz.
If this is correct using a positive -nts value leaves the high frequency range under reduced noise control.
However from what we experienced so far this doesn't seem to have a practical negative impact.
Maybe dropping the same amount of LSBs in an entire block usually gives a noise floor with frequencies below 3 kHz which is caught well by the skew/snr machinery even with a rather high positive -nts value?
Or maybe the ATH curve is relevant here, which gives reduced sensitivity to the 3+ kHz range for low level signals?

In either case it would be very welcome if younger members could contribute listening. If for instance everything's fine in the high frequency range to my old ears this doesn't say a lot.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-21 08:14:44
I couldn't resist and try your new -3 proposal this morning, Nick.

The statistics says 343/473 kbps on average for my regular/problem set which is very close to the 338/468 kbps of the current -3 setting.
I also tried the 'hiss spot' of badvilbel, and I can't abx the difference.

Looking more closesly at the new setting it is a bit of what I have in mind with -2: let the short FFT do the main decision job for the high frequencies (the short FFT is good at that), and let the long FFT do the main decision job on the low frequencies (the short FFT isn't good at that).

Sorry for having been pretty negative about the new -3 setting. Guess I was a bit upset cause I've done a lot of listening effort with the current -3 setting. But I think this wasn't useless when switching to the new setting. The major principle is the same, and it is a little bit more defensive. Sure I'll try the new setting with my usual problem samples tonight. To me this is sufficient and I won't go through part of my regular collection again.

What's more relevant IMO: why is this -nts x setting, with x>0 to a rather high degree so good? Can we trust it so much to use a positive -nts also for the higher quality settings?

A high -skew value is a good thing for differentiating good and bad spots (with respect to 'number of bits to remove') in the music. But -skew is effective only at rather low frequencies. Together with a high -skew value -snr also does a good job differentiating. But because of this interconnection I'm afraid -snr is effective also only in the low to lower medium frequency range below ~3 kHz.
If this is correct using a positive -nts value leaves the high frequency range under reduced noise control.
However from what we experienced so far this doesn't seem to have a practical negative impact.
Maybe dropping the same amount of LSBs in an entire block usually gives a noise floor with frequencies below 3 kHz which is caught well by the skew/snr machinery even with a rather high positive -nts value?
Or maybe the ATH curve is relevant here, which gives reduced sensitivity to the 3+ kHz range for low-level signals?

In either case it would be very welcome if younger members could contribute listening. If for instance everything's fine in the high frequency range to my old ears this doesn't say a lot.
I'm glad that the badvilbel hiss has disappeared - I tried quite a few permutations before arriving at this latest proposal - I have also done quite a bit of listening at current -3 .

I think that the high skew value, weighting as it does in favour of the lower frequencies, will quite often produce minimum values at low frequencies. As these are artificially weighted, adding 6dB to them has no major detrimental effect on the output.

-snr is currently the average of the skewed & spread fft results. I had thought about making it the plain average of the relevant bins (pre-skewing) to see what effect that has, but put it off as I feel that this would effectively weight the higher frequencies. Another option would be to take the average of the skewed results, pre-spreading.
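The three averaging options can be illustrated on toy data. The skew curve and spreading function below are simplified stand-ins, not lossyWAV's actual internals:

```python
# Sketch of the three -snr averaging options described above, on toy data.
# The linear skew ramp and running-minimum "spreading" are illustrative
# stand-ins; the real lossyWAV curves differ.
import numpy as np

rng = np.random.default_rng(0)
bins = rng.uniform(-90.0, -30.0, 32)          # toy spectrum, dB per bin

skew = np.linspace(36.0, 0.0, bins.size)      # stand-in: boost low freqs by up to 36 dB
def spread(x, width=3):
    # stand-in spreading: running minimum over `width` adjacent bins
    return np.array([x[max(0, i - width + 1):i + 1].min() for i in range(x.size)])

avg_skewed_spread = spread(bins + skew).mean()   # current method
avg_plain         = bins.mean()                  # pre-skew, pre-spread
avg_skewed_only   = (bins + skew).mean()         # skewed, pre-spread

print(avg_skewed_spread, avg_plain, avg_skewed_only)
```

As the post notes, the pre-skew average effectively weights the higher frequencies more, since they no longer receive the low-frequency boost before averaging.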

I'll take a look to see what's wrong with -nts 1.0.

Ditto your request for younger ears to test the output - it would be very much appreciated.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-21 12:37:29
lossyWAV alpha v0.4.9 attached: Superseded.

-3 quality settings modified to: -fft 10001 -nts 6.0 -snr 21 -skew 36 -spf 22235-22236-22347-22358-2246C;

-nts 1.0 bug rectified.

This results in 406.9kbps / 36.08MB for my 53 sample set. [edit] Currently re-processing my 1496 track set, 536 tracks in: 5.18GB / 345kbps output from 13.4GB / 895kbps input. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-21 13:43:17
I see you made things a little more defensive for -3.

I've been thinking about listening tests. In order to make listening experience transferable across our quality levels, and with regard to the very good quality of these -3 settings, I think we should make -2 a more defensive version of -3 in every detail (and -1 a more defensive version of -2 in every detail).
This way everybody can try -3 where problems can be heard most easily in case they exist. The resulting improvements on -3 can then be carried over analogously to -2 and -1.
It would be different if a certain say -2 detail wasn't necessarily more defensive than the corresponding -3 detail.
Moreover meanwhile I think we can use a slightly positive -nts value with -2 too when using a high -skew and -snr value.
I also feel that 3 analyses should be enough for -2, so speed can be improved compared to the current 4 analyses used.
So I have to change my -2 suggestion I wanted to listen to tonight.
Title: lossyWAV Development
Post by: Josef Pohm on 2007-11-21 16:25:30
Well, while we are talking about default settings... these days I've been working, just for the fun of it, on a very simple algorithm which, using the official defaults as a base, applies some morphing between them and slowly approaches pure lossless, so that you can input a floating point value in the range [0.00 .. 4.00] instead of (-1;-2;-3) as a quality setting.

Here are some examples (please note that 1.0, 2.0 and 3.0 are the official defaults). Though the numbers look fine, it is also possible that many of these combinations are worth nothing, as they are obtained by pure morphing. All this is just to show a possible feature.

SetF, LossyWAV 0.4.8, Tak 1.02 -p3m
Code: [Select]
-------------------------------------------------------------------------
Qual. String                                              Rem.Bits  kb/s
-------------------------------------------------------------------------
0,2 | 4096 -15,0 44,4 43 1111211112111121111211112 11111 | 0,3797 | 830 |
0,4 | 2048 -12,0 40,8 38 1111211113111131111311123 11111 | 1,1347 | 770 |
0,6 | 2048  -9,0 37,2 34 1112311123112231122311224 11111 | 2,2097 | 678 |
0,8 | 1024  -6,0 33,6 29 1112311124112241122411235 11111 | 3,3363 | 593 |
1,0 |  512  -3,0 30,0 24 1112411125112251122511236 11111 | 4,3924 | 523 |
1,4 |  512  -2,4 27,6 22 111351113611236112381124D 11111 | 4,7898 | 491 |
1,6 |  512  -2,1 26,4 20 112351123611336113481134D 11101 | 5,2201 | 458 |
2,0 |  512  -1,5 24,0 18 112351123611336123481234D 11101 | 5,4594 | 440 |
2,4 |  512   1,5 28,8 19 112361123711347123581234E 11010 | 5,9919 | 401 |
2,7 |  512   3,8 32,4 20 122361223712347123581234E 01010 | 6,4358 | 370 |
3,0 |  512   6,0 36,0 21 222362223722347223582234E 01010 | 6,9055 | 337 |
3,4 |  512   6,0 21,6 13 7778A7778A7788A7789B7788E 01010 | 7,6182 | 295 |
3,8 |  512   6,0  7,2  4 CCCDDCCCDDCCDDDCCDDECCDDF 00100 | 8,1760 | 269 |
-------------------------------------------------------------------------
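The morphing method itself isn't given in the post; one plausible reading is piecewise-linear interpolation of each numeric parameter between the official presets. A sketch, using the -nts/-snr values from the later v0.5.0 help text (which differ from the 0.4.8 values in the table above):

```python
# Sketch: piecewise-linear "morphing" between official quality presets, as one
# plausible reading of the approach described above. The -nts/-snr preset
# values are taken from the v0.5.0 help text; purely illustrative.

PRESETS = {1.0: {"nts": -3.0, "snr": 21.0},
           2.0: {"nts": 1.5, "snr": 21.0},
           3.0: {"nts": 6.0, "snr": 21.0}}

def morph(q):
    """Interpolate each numeric parameter for a fractional quality q."""
    keys = sorted(PRESETS)
    q = min(max(q, keys[0]), keys[-1])        # clamp to [1.0, 3.0]
    lo = max(k for k in keys if k <= q)
    hi = min(k for k in keys if k >= q)
    t = 0.0 if hi == lo else (q - lo) / (hi - lo)
    return {p: PRESETS[lo][p] + t * (PRESETS[hi][p] - PRESETS[lo][p])
            for p in PRESETS[lo]}

print(morph(2.4))   # 40% of the way from the -2 preset to the -3 preset
```

Non-numeric parameters like the -spf strings would need per-digit interpolation, which the table's intermediate rows suggest is what was actually done.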
Title: lossyWAV Development
Post by: halb27 on 2007-11-21 18:25:51
.... apply some morphing between them ...

How did you do the morphing?
Title: lossyWAV Development
Post by: Nick.C on 2007-11-21 19:43:08
Well, while we are talking about default settings... these days I've been working, just for the fun of it, on a very simple algorithm....
If you could post / pm / em a copy of the algorithm to me (which language?), I will certainly have a look. This could have interesting possibilities.

[edit] Proposal for -2: -fft 10101 -nts 1.5 -snr 24 -skew 36 -spf 11224-12235-12346-22357-22459; 44.23MB / 498.9kbps for my 53 sample set. [edit]

[edit] Proposal for -1: -fft 11111 -nts -3.0 -snr 27 -skew 36 -spf 11124-11125-11225-11226-11236; 53.21MB / 600.2kbps for my 53 sample set. [edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-21 23:51:08
My statistics for the 0.4.9 -3 setting: 345/474 kbps on average for my regular/problem sample set which is very fine to me: pretty low bitrate for the regular samples and probably sufficiently high bitrate for the problems which is confirmed also by the listening experience so far.

As for your new proposals for -2 and -3: honestly speaking I don't like them very much.
Your -2 proposal yields 420/543 kbps on average for my regular/problem samples, and this is not a lot better in comparison to the 430/539 kbps on average of the current -2 setting, which doesn't 'suffer' from the somewhat questionable positive -nts setting. I do favor a positive -nts value for -2 as much as you do, but when doing so I would expect a lower bitrate for regular tracks and/or a higher bitrate for problematic tracks.
With -1 it's 523/601 kbps for regular/problematic tracks, and this too isn't a real progress from the 512/585 kbps for the current -1 setting.

I did a lot of variations for also finding a hopefully improved -2 and -1 setting.
As you do I also favor a small positive -nts value together with -skew 36 -snr 24 when it's up to -2. I decided for -nts 2 but I really don't care whether it's 1.5 or 2.0.
With the fft setting however my approach is different. I want to let the longer FFTs decide on the low frequencies because only they have good resolution there. This also improves the differentiation between good and problematic spots, which is enhanced by the high skew/snr setting. A spreading length of 1 with the short FFTs, by contrast, tends to be rather counterproductive in this sense. So in principle a 64 and a 1024 sample FFT should do the job, but I'm still a bit worried about the 1024 sample FFT stretching so far beyond the block borders. So I decided to use a 64, 512, and 1024 sample FFT.
I tuned the details and ended up with

-2 -skew 36 -snr 24 -fft 10011 -spf 22235-22236-22347-12359-1236C -nts 2

which yields 395/551 kbps for my regular/problematic tracks.
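The worry about the 1024-sample FFT stretching beyond the 512-sample codec block can be put in numbers. A minimal sketch, assuming the analysis window is centred on the block (the actual lossyWAV overlap arrangement may differ):

```python
# Sketch: how far each analysis FFT extends beyond a 512-sample codec block,
# assuming the window is centred on the block. The real lossyWAV overlap
# scheme may differ; this just illustrates the scale of the concern.

CODEC_BLOCK = 512
SAMPLE_RATE = 44100

for fft_len in (64, 128, 256, 512, 1024):
    overhang = max(0, (fft_len - CODEC_BLOCK) // 2)   # samples per side
    ms = 1000.0 * overhang / SAMPLE_RATE
    print(f"FFT {fft_len:4d}: {overhang:3d} samples (~{ms:.1f} ms) per side "
          f"outside the codec block")
```

Only the 1024-sample FFT reaches outside the block at all under this assumption (256 samples, roughly 5.8 ms per side), which is energy from neighbouring blocks feeding into the current block's minimum values.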

With -1 I also wanted to use a negative -nts value like you do (more exactly: 0 as the utmost limit).
I found differentiation between good and bad still improves a bit when going to -skew 40, but there's no real improvement in good/bad spot differentiation when using a higher -snr value. Going from -snr 21 to -snr 24 to -snr 27 pushed up the bitrate by the same amount for the regular as well as the problematic set. -snr 30 was already counterproductive. So I used -snr 21 and decided on -nts -1 (with a larger -snr value I would have preferred -nts 0).
I added a 128 sample FFT because even for the higher mid frequency range the resolution of the 64 sample FFT is a bit restricted.
So I ended up with

-1 -skew 40 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -1

which yields 452/576 kbps for my regular/problematic set.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 08:01:32
Your -2 proposal yields 420/543 kbps on average for my regular/problem samples, and this is not a lot better in comparison to the 430/539 kbps on average of the current -2 setting.....
With -1 it's 523/601 kbps for regular/problematic tracks, and this too isn't a real progress from the 512/585 kbps for the current -1 setting......
I wasn't trying for a revolutionary change in bitrate, rather a slight evolutionary reduction - I also tried to keep a logical progression in parameters between quality levels (i.e. -nts 6.0,1.5,-3.0, step -4.5; -snr 21,24,27, step 3.0, skew 36,36,36, step 0.0).
-2 -skew 36 -snr 24 -fft 10011 -spf 22235-22236-22347-12359-1236C -nts 2
which yields 395/551 kbps for my regular/problematic tracks.

-1 -skew 40 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -1
which yields 452/576 kbps for my regular/problematic set.
Personally, I would prefer to keep -skew constant. I think I see where you're coming from with respect to not using spread length=1 at short fft lengths. Time for more iterations....
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 09:06:28
... I would prefer to keep -skew constant.

No problem for -1 for me. Though -skew 40 brought a little progress for differentiating good and bad, it was not very significant. Same goes for -snr 21/24/27. It doesn't really matter.
So we can use -skew 36 throughout all the quality levels, and maybe -snr 24 or 27 for -1 for the good of a certain systematics.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 10:06:19
... I would prefer to keep -skew constant.
No problem for -1 for me. Though -skew 40 brought a little progress for differentiating good and bad, it was not very significant. Same goes for -snr 21/24/27. It doesn't really matter.
So we can use -skew 36 throughout all the quality levels, and maybe -snr 24 or 27 for -1 for the good of a certain systematics.
Taking into account what you were saying and after some iteration:

-2 quality settings: -fft 10101 -snr 21 -skew 36 -nts 1.5 -spf 22224-22235-22336-12347-12358; Yields 42.22MB / 476.3kbps.
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 11:57:50
-2 quality settings: -fft 10101 -snr 21 -skew 36 -nts 1.5 -spf 22224-22235-22336-12347-12358; Yields 42.22MB / 476.3kbps.

I will try that out this evening to see what it means for regular/problem tracks.

-snr 21 -skew 36 -nts 1.5 is fine to me.

Not using 1s for the short blocks also perfectly matches my ideas.

Using a spreading of 22224 on the 64 sample FFT instead of 22235 as in my proposal will bring the average bitrate up for regular music, but it is also more defensive and in a sense more appropriate to the rather crude frequency resolution of a 64 sample FFT, even at higher frequencies. I considered such a thing too but dropped it for efficiency. Just a matter of taste. To take care of the restricted resolution of the 64 sample FFT, an alternative is to use a 64 and a 128 sample FFT with a spreading of 22235 for FFT64 and 22236 for FFT128. This doesn't push up the average bitrate significantly but costs another analysis.
I don't care very much about these details but would prefer the 64 and 128 sample FFT solution because efficiency isn't seriously affected.

-fft 10101 looks nice being so symmetric, but what is the idea behind the 256 sample FFT?
I prefer a 512 sample FFT as an additional alternative to the 1024 sample FFT, which reaches out pretty far into the neighbouring blocks and because of this might push up the decisive min values due to energy from the neighbouring blocks. I don't know whether this can really be a problem, but I don't feel very secure using only a 1024 sample FFT.

A spreading of 12358 for the 1024 sample FFT looks a bit demanding to me for the high frequency regions, especially as we take good care of HF with the short FFT(s) in your proposal and in my two-short-FFT alternative. But maybe it doesn't matter much, as the demanding values for the short FFTs push up the bitrate for the high frequencies already. In case we agree on my proposal for the two short FFTs, my feeling is not to care much about HF with the long FFTs, not to be very demanding here, and to leave this up to the short FFTs.

Let's see what the statistics says for regular/problem tracks.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 12:11:19
I will try that out this evening to see what it means for regular/problem tracks.

-snr 21 -skew 36 -nts 1.5 is fine to me.

Not using 1s for the short blocks also perfectly matches my ideas.

Using a spreading of 22224 on the 64 sample FFT instead of 22235 as in my proposal will bring the average bitrate up for regular music, but it is also more defensive and in a sense more appropriate to the rather crude frequency resolution of a 64 sample FFT, even at higher frequencies. I considered such a thing too but dropped it for efficiency. Just a matter of taste. To take care of the restricted resolution of the 64 sample FFT, an alternative is to use a 64 and a 128 sample FFT with a spreading of 22235 for FFT64 and 22236 for FFT128. This doesn't push up the average bitrate significantly but costs another analysis.
I don't care very much about these details but would prefer the 64 and 128 sample FFT solution because efficiency isn't seriously affected.

-fft 10101 looks nice being so symmetric, but what is the idea behind the 256 sample FFT?
I prefer a 512 sample FFT as an additional alternative to the 1024 sample FFT, which reaches out pretty far into the neighbouring blocks and because of this might push up the decisive min values due to energy from the neighbouring blocks. I don't know whether this can really be a problem, but I don't feel very secure using only a 1024 sample FFT.

A spreading of 12358 for the 1024 sample FFT looks a bit demanding to me for the high frequency regions, especially as we take good care of HF with the short FFT(s) in your proposal and in my two-short-FFT alternative. But maybe it doesn't matter much, as the demanding values for the short FFTs push up the bitrate for the high frequencies already. In case we agree on my proposal for the two short FFTs, my feeling is not to care much about HF with the long FFTs, not to be very demanding here, and to leave this up to the short FFTs.

Let's see what the statistics says for regular/problem tracks.
I tried to merge your idea for short fft's and my previous -2. 12358 at 1024 samples doesn't make a huge difference (if any) to my sample set.

As an aside, I finished my 1496 track processing at -3: 16.8GB / 348kbps output from 40.8GB / 859kbps input
Title: lossyWAV Development
Post by: M on 2007-11-22 12:23:16
This may have already been addressed somewhere in the previous twenty-two pages (I have not read them all!)... but has anyone done a listening comparison to high-bitrate MP3 or AAC, so average users will have a plain-English frame of reference?

    - M.
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 12:29:20
I tried to merge your idea for short fft's and my previous -2. 12358 at 1024 samples doesn't make a huge difference (if any) to my sample set.

Your sample set is made up of problem tracks to a large extent. This is welcome information, but it is good to match it against the average bitrate of a regular track set. I use a 12 sample full regular track set for this purpose.
As an aside, I finished my 1496 track processing at -3: 16.8GB / 348kbps output from 40.8GB / 859kbps input

So pretty much the same as the 343 kbps average of my 12 sample set.
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 12:45:13
This may have already been addressed somewhere in the previous twenty-two pages (I have not read them all!)... but has anyone done a listening comparison to high-bitrate MP3 or AAC, so average users will have a plain-English frame of reference?

    - M.

The usual thing: everything is fine for high-quality AAC and MP3 users under nearly all circumstances.
For mp3 it's easy of course to find samples that show the superiority of lossyWav (try eig as a sample).

Roughly speaking, lossyWav can be attractive to people who like lossless codecs but don't like file sizes ~2/3 that of the corresponding WAV files, which is inherent in losslessly encoding musical genres similar to rock music.

Using -1 it's for instance a disc space saving alternative to lossless archiving for people with a huge musical collection.
Using -3 it's for instance a high quality method for usage on a FLAC or wavPack enabled DAP.
Using -2 for instance allows for a universally usable compromise.

All according to everybody's likings.

Sure, usefulness only applies if the quality is really extremely good. Unfortunately we don't get a lot of listening feedback. Anyway, all we know so far is that we can be very content with lossyWav's quality.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 12:52:02
This may have already been addressed somewhere in the previous twenty-two pages (I have not read them all!)... but has anyone done a listening comparison to high-bitrate MP3 or AAC, so average users will have a plain-English frame of reference?

    - M.
My ears aren't good enough...... Also, I don't have access to online storage (at all) so, I couldn't host one. If anyone else wishes to host a listening test, I would be only too happy to assist in the preparatory work.

Nick.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 13:33:28
Your sample set is made up of problem tracks to a large extent. This is welcome information, but it is good to match it against the average bitrate of a regular track set. I use a 12 sample full regular track set for this purpose.
I'm currently processing a selection of my 1496 track set.

Oh, one thing: when I refer to lossyFLAC I am referring to lossyWAV preprocessed WAV encoded to FLAC -8. Similarly, lossyTAK, lossyWAVPACK (as opposed to WAVPACK lossy ), etc.

Artist - Album / FLAC / lossyFLAC -2 / lossyFLAC-3;

Code: [Select]
AC/DC - Dirty Deeds Done Dirt Cheap    / 781kbps / 398kbps / 331kbps
B52's - Good Stuff                     / 993kbps / 408kbps / 361kbps
David Byrne - Uh-Oh                    / 937kbps / 398kbps / 344kbps
Fish - Songs From The Mirror           / 854kbps / 384kbps / 335kbps
Gerry Rafferty - City To City          / 802kbps / 400kbps / 338kbps
Iron Maiden - Can I Play With Madness  / 784kbps / 422kbps / 370kbps
Jean Michel Jarre - Oxygene            / 773kbps / 454kbps / 372kbps
Marillion - The Thieving Magpie        / 790kbps / 404kbps / 344kbps
Mike Oldfield - Tr3s Lunas             / 848kbps / 421kbps / 364kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 353kbps

So, overall an average of 850kbps / 410kbps / 350kbps
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 14:18:30
So, overall an average of 850kbps / 410kbps / 350kbps

Is the 410 kbps result for your new -2 proposal or for the current default setting?
With the -2 default I got 430 kbps (with v0.4.7) and my sample set is expected to yield a slightly lower bitrate than your set as can be seen for -3. Or was there a change for the -2 default since v0.4.7?
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 18:26:13
Your -2 setting of -fft 10101 -snr 21 -skew 36 -nts 1.5 -spf 22224-22235-22336-12347-12358
yields 404/539 kbps on average with my regular/problem set.
As it is a bit more conservative than my setting I like this result and agree with this setting.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 18:44:55
Your -2 setting of -fft 10101 -snr 21 -skew 36 -nts 1.5 -spf 22224-22235-22336-12347-12358
yields 404/539 kbps on average with my regular/problem set.
As it is a bit more conservative than my setting I like this result and agree with this setting.
That's great. With that in mind, I've been playing with Josef Pohm's Excel morphing..... See attached. Assigning the "corners" creates some quite reasonable numbers.
Title: lossyWAV Development
Post by: halb27 on 2007-11-22 18:50:07
Finally I managed to do my listening test for current -3 with my problem sample set as well as utb.flac  I once had problems with.
Everything is fine, and I will use this setting for my DAP.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-22 20:28:08
Finally I managed to do my listening test for current -3 with my problem sample set as well as utb.flac  I once had problems with.
Everything is fine, and I will use this setting for my DAP.
Great, -2 & -3 settings now fixed (subject to usual caveat that if anyone hears any artifacts, please let us know).

lossyWAV alpha v0.5.0 attached: Superseded.
Code: [Select]
lossyWAV alpha v0.5.0 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [5xFFT] (-cbs 512 -nts -3.0 -skew 36 -snr 21
              -spf 12224-12225-12225-11226-11236 -fft 11111)
-2            default quality [4xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact quality [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-18.0dB<=n<=+6.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).

-window       select windowing function n (0<=n<=6, default=0); 0=Hanning
              1=Bartlett-Hann; 2=Blackman; 3=Nuttall; 4=Blackman-Harris;
              5=Blackman-Nuttall; 6=Flat-Top.
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Once -1 settings are fixed, then I'll remove excess options and we're probably ready to go beta.
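The -fft binary switching described in the help text ("e.g. 01001 = 128,1024") is easy to illustrate:

```python
# Sketch: decoding the -fft binary switch string from the help text.
# The five positions map to FFT lengths of 64, 128, 256, 512 and 1024 samples.

FFT_LENGTHS = (64, 128, 256, 512, 1024)

def decode_fft_mask(mask):
    """Return the FFT lengths enabled by a 5-character '0'/'1' mask."""
    assert len(mask) == 5 and set(mask) <= {"0", "1"}
    return [n for bit, n in zip(mask, FFT_LENGTHS) if bit == "1"]

print(decode_fft_mask("01001"))   # the help text's example: [128, 1024]
print(decode_fft_mask("10101"))   # the -2 preset: [64, 256, 1024]
```

The same positional convention applies to the five groups of the -spf spreading string.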
Title: lossyWAV Development
Post by: GeSomeone on 2007-11-23 16:29:47
Hi, regarding my remark some pages back (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=35&t=56129&st=250#) about noise when encoding silence with lossyFlac. I updated my post as I think I know what must have happened. It had nothing to do with lossyWav.

dithering from the foobar2000 converter must have been to blame. Even though it was set to "only dither lossy sources" it seemed to have kicked in somewhere. A new test with setting to "Never Dither" (and latest lossyWav) was as expected. No more noise.

I wanted to clear that up. 

the result (that doesn't matter IMO):
silence.flac bit rate 3
silence.lossy.flac bit rate 12
Title: lossyWAV Development
Post by: Nick.C on 2007-11-23 19:28:57
Hi, regarding my remark some pages back (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=35&t=56129&st=250#) about noise when encoding silence with lossyFlac. I updated my post as I think I know what must have happened. It had nothing to do with lossyWav.

dithering from the foobar2000 converter must have been to blame. Even though it was set to "only dither lossy sources" it seemed to have kicked in somewhere. A new test with setting to "Never Dither" (and latest lossyWav) was as expected. No more noise.

I wanted to clear that up. 

the result (that doesn't matter IMO):
silence.flac bit rate 3
silence.lossy.flac bit rate 12
Thanks for the clarification - I think that it just reinforces the opinion that dither should not be used, as David said some time ago (along with amplitude reduction, although that is now not required due to the iterative clipping prevention).

[edit] Oh, and I think that the four-fold increase in bitrate is due to FLAC encoding at -b 512 rather than the default -b 4096. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-25 09:06:24
For -1:

My proposal some posts ago was

-1 -skew 40 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -1
which yields 452/576 kbps for my regular/problematic set.

Now that we want to have -skew 36 -snr 21 throughout the quality levels I suggest we trade -skew 40/36 for
-nts -1/-2.

So I suggest we use

-1 -skew 36 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -2
Title: lossyWAV Development
Post by: Nick.C on 2007-11-25 09:49:50
For -1:

My proposal some posts ago was

-1 -skew 40 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -1
which yields 452/576 kbps for my regular/problematic set.

Now that we want to have -skew 36 -snr 21 throughout the quality levels I suggest we trade -skew 40/36 for
-nts -1/-2.

So I suggest we use

-1 -skew 36 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -2
Done, -1 quality settings fixed. Will post as alpha (beta?) v0.5.1 tonight.
Title: lossyWAV Development
Post by: halb27 on 2007-11-25 10:41:32
IMO it's alright giving beta state to lossyWAV.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-26 10:05:22
lossyWAV beta v0.5.1 attached. Superseded.

-1 quality settings : -skew 36 -snr 21 -fft 11011 -spf 22224-22225-11235-11246-12358 -nts -2;
Code: [Select]
lossyWAV beta v0.5.1 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme quality [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default quality [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact quality [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-18.0dB<=n<=+6.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-snr <n>      set minimum average signal to added noise ratio to n dB;
              (0.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).

-window       select windowing function n (0<=n<=6, default=0); 0=Hanning
              1=Bartlett-Hann; 2=Blackman; 3=Nuttall; 4=Blackman-Harris;
              5=Blackman-Nuttall; 6=Flat-Top.
-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.

[edit] For Foobar2000 converter settings, see wiki article. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-26 11:25:42
Thank you, Nick.
Title: lossyWAV Development
Post by: GeSomeone on 2007-11-26 16:16:20
IMO it's alright giving beta state to lossyWAV.

The program is stable enough, I would think too, but a little "How to use" blurb should be added when you go beta (maybe the example foobar script + screenshot too). The info is in this thread, but you can't expect a test user to go through all that.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-26 16:20:21
IMO it's alright giving beta state to lossyWAV.
The program is stable enough, I would think too, but a little "How to use" blurb should be added when you go beta (maybe the example foobar script + screenshot too). The info is in this thread, but you can't expect a test user to go through all that.
Yes, that would be a sensible idea. A guide and a licence document have to be the next items on the agenda.
Title: lossyWAV Development
Post by: jesseg on 2007-11-26 22:59:33
new lFLCDrop version...
Quote
lFLC.bat Change Log:
v1.0.0.1
- added line to delete all wavs from temp directory before encoding

this should get rid of problems caused by closing the batch window in the middle of encoding.

It still works fine with lossyWAV as of v0.5.1.22 Beta. 


re: the batch file for EAC, what I'm probably going to end up doing is creating an app to handle it.  No GUI or command line is really needed, other than for a critical error message I guess, and most people have EAC set up to run the encoders hidden anyway.  This will allow me to do some fancier stuff than a batch file allows, and it should be able to handle Unicode in the tagging (assuming EAC and tag.exe can handle it; I haven't had time to look into it yet)

[edit] removed, newer version posted later in the thread [/edit]
Title: lossyWAV Development
Post by: Axon on 2007-11-27 06:53:22
Forgive me for asking a fundamental (and admittedly critical) question; I'm very late to this particular party. Before I start, I must say this idea (and all the work that has gone into it) is incredible, and I would not hesitate to use it once the kinks are ironed out. From the original post by 2BDecided:
Quote
This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

This doesn't seem like the sort of algorithm that lends itself to tuning. If the technique is independent of psychoacoustics, then the only advanced setting that ought to exist is -skew.

Is that too harsh? Perhaps I'm being overly critical on beta code?
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 08:48:08
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

a) We have a system of lossyWAV + a lossless codec which makes up for a lossy codec.
So you do lossy encoding when using lossyWAV, and the good and bad of the procedure must be measured against that of other lossy codecs, which is of course a very subjective thing when comparing lossy codecs of very good quality.
b) AFAIK nobody ever experienced an artifact even with our lowest quality mode -3. Quality was extremely good from the very start. You're welcome to do some listening tests and report about it.
This doesn't seem like the sort of algorithm that lends itself to tuning.

??? For a very long period we had great quality but at a bitrate of ~500 kbps on average. But we've investigated and optimized David Bryant's idea of doing the averaging of the FFT outcome according to the length of the critical bands, and we differentiate on doing this depending on FFT length. We've optimized the -skew parameter, where a rather high -skew value does an extremely good job at differentiating between spots in the music which have to be handled defensively or not. We've introduced the -snr parameter which adds benefits for the differentiation work of -skew. We've found a solution to the theoretical clipping issue. We've improved the way the FFT analyses cover the lossyWAV blocks for security reasons.
So we ended up with an average bitrate of ~350 kbps for -3 with not the least quality issue known. -2 and -1 IMO provide internals that vary enough in defensiveness to make them promising for the variously cautious-minded.
As a consequence IMO the only really useful option apart from the quality parameter is -nts. I personally however wouldn't mind if the advanced options are kept even in the final release if they are clearly marked as such (maybe hidden in the commandline help, but documented in the external documentation).
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 10:00:11
Forgive me for asking a fundamental (and admittedly critical) question; I'm very late to this particular party. Before I start, I must say this idea (and all the work that has gone into it) is incredible, and I would not hesitate to use it once the kinks are ironed out. From the original post by 2BDecided:
Quote
This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

This doesn't seem like the sort of algorithm that lends itself to tuning. If the technique is independent of psychoacoustics, then the only advanced setting that ought to exist is -skew.

Is that too harsh? Perhaps I'm being overly critical on beta code?
The beta nature only really reflects the status of the code with respect to bug reports which will (probably) come in. This method / pre-processor was initially intended to allow the benefits of a lossy codec to be "wrapped" in a lossless codec. The method is David's; Halb27 and I have only implemented it in Delphi and added a few tweaks along the way.

At various points along the way, people have assisted with setting determination through personal ABX'ing of particularly problematic samples (Big thanks to Halb27, Shadowking, Wombat & Gurubooleez). Valued input has been made by 2Bdecided, Bryant, TBeck, Mitch 1 2, Josef Pohm, SebastianG, user, collector, Dynamic, GeSomeone, Robert, verbajim, [JAZ], BGonz808, M & Jesseq.

At the present time I don't think that the method is "known" to produce any artifacts with default settings (however if anyone can tell me differently, I would be very appreciative of the particular sample to try and iron it out).

Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.

From a purely personal perspective, I have found the drive to develop it through feedback from those who have made comments along the way and from a desire to use lossyFLAC on my iPAQ (GSPlayer v2.25 & GSPFlac.DLL).

In keeping with David's wishes, the only command line options in the final revision will be quality levels -1, -2 & -3 and the -nts parameter (unless, as Halb27 has indicated, we leave the advanced options in the code but don't "advertise" them outside of the accompanying PDF / TXT file).

Why don't you give it a try? It's certainly robust enough to handle a Foobar2000 transcode of about 1500 files without falling over (the largest of which was circa 60 minutes).
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 11:02:40
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to be testing very high quality lossy audio, or the expertise to offer technical advice or cause debate.

This gives me an opportunity to thank you all though for the work that you have put in.  I think this is an extremely exciting development.

I think Axon's question was well worth asking:  much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature.  Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while.  It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'.  I am highly unlikely to find an issue.  What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 11:58:22
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to be testing very high quality lossy audio, or the expertise to offer technical advice or cause debate.

This gives me an opportunity to thank you all though for the work that you have put in.  I think this is an extremely exciting development.

I think Axon's question was well worth asking:  much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature.  Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while.  It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'.  I am highly unlikely to find an issue.  What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'.
Thanks are always appreciated.

I totally agree that the question is valid and requires an answer. Technically, I am not really the person to answer it, just the programmer.

Also, I will be using my lossyFLAC collection in tandem with my FLAC collection rather than replacing the latter with the former; essentially, lossyFLAC is my lossy transcode.

Until more ears have validated the current quality level settings, we're not going to be in the position to reassure new users of the quality of the output.
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 12:21:29
Personally I have been considering a Wavpack lossy backup of my music for a while.  It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'.  I am highly unlikely to find an issue.  What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'.

The more I'm into audio compression the more I think it's up to personal decisions (and personal a priori preferences) what codec and setting to use. Objective findings always have a limited scope.
My personal key event was the 128 kbps listening test of Lame 3.97, where Lame came out more or less on par with codecs like Vorbis. I have no doubt this test was done with great care, but I personally would never use 3.97 at a bitrate of 128 kbps (due to the 'sandpaper' noise and similar problems). Luckily 3.98 has overcome these problems, and is still improving.

So it's true that more listening experience by especially well-respected ears is most welcome, but IMO it's not a sine qua non thing. Technical knowledge can't assure transparency anyway.

So in the end what IMO counts is that all experience so far tells that everything is fine (finally we do have public experience, though we'd like to get more). And of course any potential user must like the idea of being close to lossless (from the technical view of the overall procedure, which is not necessarily related to quality), and must not care about a bitrate of 350 kbps or higher. Otherwise he wouldn't use it.

As you have considered using wavPack lossy you don't care about extremely high bitrate, and you like the idea of the clean signal path associated with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used wavPack lossy. We can expect wavPack lossy high mode at 400 kbps using dynamic noise shaping to give transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without real quality control so far. With lossyWAV the situation is the same (hopefully even better due to the existing quality control, which can be said to have proved effective).

The main problem with very high quality codecs is: while it's easy to prove a codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end, once very high quality is assured at least in a basic sense, the most adequate attitude IMO is: don't worry as long as no counterexamples are given.
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 13:10:28
Thank you both for your responses.

Technical knowledge can't assure transparency anyway.
If it's technical knowledge of a lossless operation then it can.

The techniques that are being used in lossyWAV are complete gibberish to me.  In my limited understanding though, what was originally proposed was the removal of near-useless bits from the WAVE, to make more efficient use of basic compression routines within the encoders (e.g.: FLAC's wasted_bits).  You speak below of "a clean signal path": this is really what I am discussing.  If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision making process was more simple and therefore less prone to erratic behaviour.  I don't think I'm making myself clear.

As you have considered using wavPack lossy you don't care about extremely high bitrate, and you like the idea of the clean signal path associated with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used wavPack lossy. We can expect wavPack lossy high mode at 400 kbps using dynamic noise shaping to give transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without real quality control so far. With lossyWAV the situation is the same (hopefully even better due to the existing quality control, which can be said to have proved effective).
Exactly.

The main problem with very high quality codecs is: while it's easy to prove a codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end, once very high quality is assured at least in a basic sense, the most adequate attitude IMO is: don't worry as long as no counterexamples are given.
Agreed.  And, of course, such claims will be taken with a pinch of salt until a lot of testing has been undertaken.  And, of course, testing high quality encodes is not easy.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 13:18:03
If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision making process was more simple and therefore less prone to erratic behaviour.  I don't think I'm making myself clear.
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 14:08:42
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
I would be very interested to read a non-technical explanation of the processes involved; however I feel awful for increasing your workload.

Please only do so if you believe that it will be necessary for other users to make the decision also.

Thanks again.
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 14:15:09
... If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision making process was more simple and therefore less prone to erratic behaviour.  ...

Yes, that's what makes the procedure attractive to me too though I'm afraid we won't get a kind of security from the mere process itself.
I can try to describe the procedure from my understanding which isn't perfect at all:

As you write, the basic idea is to form (now) 512 sample blocks and decide for each block how many of the least significant bits not to use (set to 0). Lossless codecs like FLAC can make use of the reduced number of bits per sample in these blocks, and in order to be effective the block size of the lossless codec should be identical to the lossyWAV block size (or an integer multiple of it, in case the lossless codec works more efficiently in an overall sense with longer blocks). FLAC works fine with a blocksize of 512.

The usual 16 bit accuracy of wave samples is necessary mainly to give good accuracy to low volume spots in the music and allow for a good dynamic range. At moderate to low volume spots far fewer than 16 bits are used for signal representation (that's why lossless codecs yield a good compression ratio in these cases). Even at high volume spots, usually not all 16 bits are needed. Roughly speaking a certain number of rather high value bits are needed for loud spots (while the lower value bits can be zero), and a certain number of low value bits are needed for quieter music (while the high value bits are zero). That's the main background of the method. We care about the louder spots and reduce the accuracy of representation there.
Dropping a certain number of least significant bits means adding noise to the original. This added noise is not necessarily perceived as the kind of analog noise/hiss known from, for instance, tape recordings.
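To make that "added noise" concrete, here is a toy sketch (my own illustration in Python, not lossyWAV code; `drop_bits` is a hypothetical name) of what rounding 16-bit samples to fewer bits does to the signal:

```python
import numpy as np

# Toy illustration (not lossyWAV code): round fake 16-bit samples to a
# multiple of 2**b (the b lowest bits end up zero) and measure the noise.
rng = np.random.default_rng(0)
samples = rng.integers(-2**15, 2**15, size=4096)  # stand-in for 16-bit audio

def drop_bits(x, b):
    """Round samples to the nearest multiple of 2**b, zeroing the b lowest bits."""
    q = 2 ** b
    return np.round(x / q).astype(np.int64) * q

for b in (2, 4, 6):
    noise = drop_bits(samples, b) - samples
    # The rounding error is bounded by half a quantisation step, so its
    # RMS grows roughly as 2**b / sqrt(12): each extra bit dropped doubles it.
    print(b, noise.std(), 2**b / 12**0.5)
```

Running it shows the added-noise level doubling with each extra bit removed, which is why the per-block decision about how many bits are safe to drop matters so much.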

So the main thing is to decide on how many least significant bits to drop. From a bird's eye view, the frequency spectrum of the 512 sample block is calculated and the frequency region with the lowest energy is searched for. The idea is to preserve this energy and not let it get drowned in the added noise, and this is done by keeping sample accuracy high enough: this minimum energy level is looked up in a table that tells how many bits can be removed depending on energy level and frequency. The table was found a priori by examining white noise behavior with respect to our purposes.

The real process is a bit more complicated letting several FFTs do the frequency spectrum analysis according to what they're best at: short FFTs responding to quickly changing signals but with a very restricted resolution at low to medium frequencies, and long FFTs giving good frequency resolution but not responding very quickly. Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.
Moreover, in order not to have to keep up high accuracy due to pure chance, a certain averaging is done over the outcome of the FFT analyses. A lot of tuning has been done on this in order to achieve good quality and a relatively low bitrate.
A huge sensitivity bias is given to the low to medium frequency range by using the -skew and -snr options. This is done in analogy to the fact that the usual transform codecs give priority to the accurate representation of low to medium frequencies. The improvement in quality control by using -skew is so strong that we have decided that a noise threshold of +6 is sufficient for -3 (in the a priori theory -nts should be 0).
For -2 we also default to the slightly positive -nts 2, and only with -1 do we use a defensive -2. Other than that, the different quality levels differ for the main part in how they do the FFT analyses. With -3 we use 2 different FFT lengths for each block, -2 uses 3 different FFT lengths, and it's a total of 4 FFT lengths for -1. Moreover the averaging of the FFT results is done in an increasingly defensive way when going from -3 to -1.

After having decided how many least significant bits to remove (set to 0), the samples of the lossyWAV block are rounded to the corresponding values. This rounding can lead to clipping, but we have found a solution to avoid it (by simply dropping fewer bits in the block until no clipping occurs).
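A minimal sketch of that rounding-with-clipping-guard step, as I read the description (hypothetical Python, not the actual Delphi source; the function name and bounds are my own assumptions):

```python
import numpy as np

# Sketch of the step described above: round each sample in the block to a
# multiple of 2**bits_to_remove; if any rounded sample would clip outside
# the 16-bit range, retry the whole block with one bit fewer.
def remove_bits(block, bits_to_remove, lo=-2**15, hi=2**15 - 1):
    while bits_to_remove > 0:
        q = 2 ** bits_to_remove
        rounded = np.round(block / q).astype(np.int64) * q
        if rounded.min() >= lo and rounded.max() <= hi:
            return rounded, bits_to_remove
        bits_to_remove -= 1              # rounding would clip: be less aggressive
    return block.copy(), 0               # fall back to removing no bits

# A near-full-scale sample forces the guard to back off until nothing clips.
block = np.array([32760, -100, 5000], dtype=np.int64)
out, used = remove_bits(block, 6)
```

The guard only ever backs off within a single block, so quieter blocks elsewhere in the file keep their full bit reduction.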

Hope that helps.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-11-27 14:28:13
I've started a new wiki article here (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV). The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 14:35:12
I've started a new wiki article here (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV). The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

Wonderful idea, good job.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 15:22:09
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 15:36:25
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes.  Thank you for your time.  I'm slowly getting there.

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering about the differences between -1, -2 and -3.  Is -3 thought to be transparent in all known situations now?  The obvious next question being: so why bother with -2 and -3?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk.  Apologies if the description of these presets has been discussed recently elsewhere.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 15:42:13
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes.  Thank you for your time.  I'm slowly getting there.

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering about the differences between -1, -2 and -3.  Is -3 thought to be transparent in all known situations now?  The obvious next question being: so why bother with -2 and -3?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk.  Apologies if the description of these presets has been discussed recently elsewhere.
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = (may, although not yet proven) introduce a slight amount of risk.

The reason for -1 is that you may want to do other things with the output of lossyWAV; -2 is considered to be a very robust intermediate between -1 and -3; -3 is the "I want a lower bitrate and I want "acceptable" (rather than transparent) output" setting, which at the moment is better than its target.

My view of the process:

Read WAV header from input file;
Write WAV header to output file;

Create reference_threshold tables for each fft_length for each bits_to_remove (1 to 32) - not required as precalculated data is used to re-create the surface for each window / dither combination (yes, it changes with both..... ) - This calculates the mean fft output from the analysis of the difference between the random noise signal and its bit_removed compatriot;

Create threshold_indices from selected reference_threshold table (window / dither combo) - basically, determine how many bits_to_remove for a given minimum dB value;

Read WAV data in a codec_block_size chunk (all channels at once) and for each channel:

Carry out FFT analyses (3 for 1024 sample fft on 512 codec_block_size up to 33 for a 64 sample fft on 512 codec_block_size) on each channel of the codec_block, for each fft_analysis:

  Calculate magnitudes of FFT output (from complex number);

  Skew magnitudes (currently -36dB at 20Hz to 0dB at 3545Hz, following a 1-sin(angle) curve where angle is the proportion of 1 given by (log(this_bin_frequency)-log(min_bin_frequency))/(log(max_bin_frequency)-log(min_bin_frequency)))    by the relevant amount;

  Spread skewed magnitudes using the relevant spreading function (e.g. 23358-...... means average 2 bins in the first zone, 3 in the second and third zones, 5 in the fourth zone and 8 in the fifth zone), retaining the minimum value and the average value of the skewed results;

  minimum_threshold=floor(min(minimum_skewed_result+nts,average_skewed_result-snr));

  Look up Threshold_Index table for the relevant fft_length to determine bits to remove for that particular fft_analysis;

When all fft_analyses for a particular codec_block are complete, determine the minimum bits_to_remove value and use that to:

Remove_bits: For each sample in each channel of the codec_block bit_removed_sample:=round(sample/(2^bits_to_remove))*(2^bits_to_remove). If in the remove_bits process a sample falls outwith the upper or lower bound then decrease bits_to_remove and start the remove_bits process again.

Write processed codec_block and repeat;

Close files and exit.
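For the skewing step in the flow above, here is a small sketch of the curve as I read the description (an assumption on my part that the sin() argument runs over a quarter period, so the skew goes from -36dB at 20Hz to exactly 0dB at 3545Hz, with no attenuation above that):

```python
import math

# Hypothetical reading of the magnitude-skewing curve described above:
# -36 dB at 20 Hz rising to 0 dB at 3545 Hz along a 1-sin() curve over the
# log-frequency axis; bins outside that range are clamped at the endpoints.
def skew_db(f, max_skew=-36.0, f_min=20.0, f_max=3545.0):
    if f <= f_min:
        return max_skew
    if f >= f_max:
        return 0.0
    # proportion of the way from f_min to f_max on a log scale
    angle = (math.log(f) - math.log(f_min)) / (math.log(f_max) - math.log(f_min))
    return max_skew * (1.0 - math.sin(angle * math.pi / 2))
```

Applying this attenuation before the spreading / minimum search biases the bits_to_remove decision towards protecting low to medium frequencies, matching the intent described earlier in the thread.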
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 16:20:27
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = (may, although not yet proven) introduce a slight amount of risk.
Excellent news.  I will have to spend some time reading your explanation as it, on a quick skim, still seems quite technical to me.  Perhaps, as I try to comprehend myself, I can suggest a n00b translation to your technical explanation, that may help to produce the final documentation?

Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus (http://www.synthetic-soul.co.uk/comparison/lossless/corpus.asp), and am astounded by the savings, using the default settings.

Code: [Select]
File   FLAC   lossyWAV+FLAC
===========================
00     1054   376
01      728   366
02      765   390
03     1013   413
04      883   425
05      860   469
06     1084   455
07      981   419
08     1052   399
09      873   393
10     1026   511
11      853   367
12      834   422
13     1016   435
14      954   403
15      867   390
16     1068   397
17      861   376
18      787   442
19      909   394
20     1142   400
21      760   384
22     1022   410
23     1030   394
24      917   433
25      914   384
26      810   401
27      878   354
28     1040   449
29      912   442
30      895   419
31      913   411
32     1010   402
33     1018   397
34      831   429
35      939   410
36     1038   402
37     1084   439
38      825   381
39      999   413
40     1007   408
41     1037   505
42     1054   408
43      897   418
44      839   364
45      924   425
46      898   431
47      890   398
48     1014   414
49      999   412
Bloody good work gentlemen!

I am under the impression that I can also use TAK and WavPack already.  I need to do some more reading to see what, if anything, I need to do to test these also.
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 16:37:02
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.

Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before and it takes the -overlap option to do the new overlapping (we discussed something like 8 pages back)?
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 16:39:44
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case, the original method overlaps the ends of the codec_block by half an fft_length and overlaps fft's by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps fft's by 5/8 of an fft_length.
Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before and it takes the -overlap option to do the new overlapping (we discussed something like 8 pages back)?
Yes, exactly - the new 5/8th fft_length overlapping system doesn't have me totally "sold" to make it the default, but it is still a selectable option.

@Synthetic Soul -  Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?

@Axon - Thanks for stimulating a very interesting series of posts!
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-11-27 16:55:45
Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus (http://www.synthetic-soul.co.uk/comparison/lossless/corpus.asp), and am astounded by the savings, using the default settings.
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 17:17:47
@Synthetic Soul - Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?
I've been casually listening to the files while testing, and of course can hear no discernible difference.  Default settings for both lossyWAV (-2) and FLAC (-5).

I will soon be posting results for WavPack and TAK defaults also.

@Axon - Thanks for stimulating a very interesting series of posts!
Indeed.  I've not felt it was the time to get involved before now, but I think it's now time for us more casual testers to show our interest.
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 17:30:37
OK, here are my results for FLAC, WavPack and TAK on default settings:

Code: [Select]
Encoder          |  Command
===================================================================
FLAC 1.2.1       |  flac -b 512 <source>
WavPack 4.42a2   |  wavpack --merge-blocks --blocksize=512 <source>
TAK 1.0.2 Final  |  takc -e -fsl512 <source>

Code: [Select]
===============================================================
File  |    FLAC    Lossy  |    WavPack Lossy  |    TAK    Lossy
00    |    1054    376    |    1048    367    |    1034    360
01    |    728    366    |    728    374    |    708    359
02    |    765    390    |    766    395    |    742    378
03    |    1013    413    |    1013    421    |    997    406
04    |    883    425    |    880    421    |    867    413
05    |    860    469    |    858    491    |    798    445
06    |    1084    455    |    1077    458    |    1071    447
07    |    981    419    |    976    418    |    955    410
08    |    1052    399    |    1046    395    |    1040    391
09    |    873    393    |    871    401    |    823    372
10    |    1026    511    |    1029    524    |    1011    504
11    |    853    367    |    853    374    |    827    355
12    |    834    422    |    832    429    |    811    414
13    |    1016    435    |    1010    435    |    1000    425
14    |    954    403    |    948    402    |    927    396
15    |    867    390    |    864    397    |    841    380
16    |    1068    397    |    1066    400    |    1059    393
17    |    861    376    |    860    382    |    829    365
18    |    787    442    |    783    440    |    774    431
19    |    909    394    |    907    393    |    879    382
20    |    1142    400    |    1140    396    |    1130    394
21    |    760    384    |    767    390    |    740    370
22    |    1022    410    |    1014    408    |    1004    400
23    |    1030    394    |    1025    391    |    1022    385
24    |    917    433    |    913    444    |    888    423
25    |    914    384    |    910    381    |    884    371
26    |    810    401    |    811    404    |    784    383
27    |    878    354    |    871    366    |    855    346
28    |    1040    449    |    1033    459    |    1019    443
29    |    912    442    |    911    444    |    877    421
30    |    895    419    |    889    431    |    843    403
31    |    913    411    |    914    415    |    874    389
32    |    1010    402    |    1003    401    |    992    393
33    |    1018    397    |    1009    398    |    994    387
34    |    831    429    |    859    457    |    793    411
35    |    939    410    |    940    417    |    908    395
36    |    1038    402    |    1032    399    |    1027    393
37    |    1084    439    |    1088    453    |    1071    430
38    |    825    381    |    829    392    |    796    367
39    |    999    413    |    993    408    |    986    399
40    |    1007    408    |    999    405    |    990    398
41    |    1037    505    |    1029    516    |    1012    497
42    |    1054    408    |    1046    403    |    1035    395
43    |    897    418    |    901    426    |    882    408
44    |    839    364    |    830    377    |    798    354
45    |    924    425    |    920    425    |    909    414
46    |    898    431    |    899    435    |    881    426
47    |    890    398    |    882    393    |    875    384
48    |    1014    414    |    1006    412    |    997    401
49    |    999    412    |    992    409    |    984    400
==============================================================
Avg  |    940    412    |    937    415    |    917    400
Title: lossyWAV Development
Post by: Josef Pohm on 2007-11-27 17:47:57
I've started a new wiki article here (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV). The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

Since your documentation reports which codecs support lossyWAV and which don't, the following is my experience with the missing ones.

MP4ALS and LPAC support LossyWAV very very well.

SHN should, but I didn't bother to actually check.

On the other hand, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't appear to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512 sample frame size (nor, in my opinion, was OFR designed to work with such a small frame size).
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-27 17:54:19
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
Using -8 does little for my corpus by the looks of it.  I've only tested the first 25 files so far, but it only takes the average bitrate from 933 to 930 for those files.

In fact, using lossyFLAC and encoding using -5 yields, on average, a file 43.90% the size of the standard FLAC, but with -8 it is merely 43.93% the size.

Edit: Sorry, in my haste to test I have forgotten that I'm still using lossyWAV files processed using -2.  Perhaps with -3 there is a more drastic improvement.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 19:44:38
I've started a new wiki article here (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV). The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
As your documentation reports which codecs support LossyWAV and which don't, the following is my experience about the missing ones.

MP4ALS and LPAC support LossyWAV very very well.

SHN should, but I didn't bother to actually check.

On the other hand, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't appear to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512 sample frame size (nor, in my opinion, was OFR designed to work with such a small frame size).
As long as the target codec can work on a multiple of the lossyWAV codec_block_size it should be fine; otherwise, use -cbs xxx to set the lossyWAV codec_block_size to the same as the target codec's, or I could get off my behind and implement a -ofr parameter to specify codec specific settings (as for WMALSL).

We may be early beta, but if anyone has any ideas as to improvements / additions / changes they might like to see, then let me know - you can PM me or e-mail me from here if you don't want to post publicly.

I am gratified to see that the code is quite robust as the error reports have dwindled.... <avalanche!>

Mitch 1 2 is doing a great job with the wiki article, I should get round to my bit of it.
Title: lossyWAV Development
Post by: Axon on 2007-11-27 20:33:04
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be solely proved based on one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined a priori to listening tests, based on the literature.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it only relies upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be proven transparent prior to any listening tests, based entirely on signal processing principles, and only minimal psychoacoustic principles (based only on masking the quantization noise with the background noise). But instead, the settings seem like they are based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being able to be formally verified? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm is shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and to show that the algorithm is immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 20:53:54
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be solely proved based on one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined a priori to listening tests, based on the literature.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it only relies upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be proven transparent prior to any listening tests, based entirely on signal processing principles, and only minimal psychoacoustic principles (based only on masking the quantization noise with the background noise). But instead, the settings seem like they are based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being able to be formally verified? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm is shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and to show that the algorithm is immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?
Certainly not talking out of line, but beyond my limited knowledge, as I said - I'm just the programmer. The -skew and -spread (and -snr I suppose) functions and settings have certainly been arrived at heuristically. I've worked up beta v0.5.2 (attached) Superseded... to allow the original concept settings to be implemented using a -0 parameter (as closely as possible due to slight changes in the conv / spread combined function). Use -0 -clipping to emulate the original method settings, -0 -fft 10101 -clipping to emulate the three analysis version. -nts is the only other parameter available to you under the original method.

As to number of tunings, -fft, -nts, -snr, -skew and -spread are the only tunings used in the 3 default quality settings, others such as -clipping, -dither, -overlap, -window, -allowable are all defaulted to off.

I must stress that looking at the file sizes of the output of vanilla -0, I am fairly certain that artifacts will show in Atem_lied at the very least.

***** -0 is not a permanent quality setting, merely a response to a request. *****
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 21:17:20
It's true some heuristics were introduced, especially spreading and skewing - spreading from the very start. Without these heuristics the method may have a better justification, but it comes at the price of a seriously increased bitrate.
With the advanced options everybody who wants to can get rid of the heuristics: -skew 0 -snr 0 -fft 10101 -spf 11111-11111-11111-11111-11111 -nts 0 for instance when using a 64, 256, and 1024 sample FFT.
I personally love the reduced bitrate given by spreading and skewing, and I feel secure enough with it according to experience.

I agree however that this gives rise to the question whether we should readjust the quality levels. Maybe -1 should go to Axon's pure method, and maybe -2 should be a mixture of current -2 and -1, for instance the FFT usage like that of -1 (maybe dropping the 128 sample FFT), but with an -nts value of 2.
I personally would agree with such a solution.

ADDED:
I just saw your new beta, Nick. So I see -snr should be negative to the limit to avoid the skewing/snr heuristics. Spreading length should be 1, however, IMO, to avoid the spreading heuristics. The constant spreading of 4 was just 2Bdecided's initial spreading heuristic, as far as I can see. There's no reason IMO to use a blocksize of 1024; 2Bdecided just used a 1024 sample block size when he started things.
Of course, not averaging the FFT outcome at all is fine in a pure sense, but it is likely huge overkill, especially in the high frequency range, driving the bitrate up.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-27 21:52:44
It's true some heuristics were introduced, especially spreading and skewing - spreading from the very start. Without these heuristics the method may have a better justification, but it comes at the price of a seriously increased bitrate.
With the advanced options everybody who wants to can get rid of the heuristics: -skew 0 -snr 0 -fft 10101 -spf 11111-11111-11111-11111-11111 -nts 0 for instance when using a 64, 256, and 1024 sample FFT.
I personally love the reduced bitrate given by spreading and skewing, and I feel secure enough with it according to experience.

I agree however that this gives rise to the question whether we should readjust the quality levels. Maybe -1 should go to Axon's pure method, and maybe -2 should be a mixture of current -2 and -1, for instance the FFT usage like that of -1 (maybe dropping the 128 sample FFT), but with an -nts value of 2.
I personally would agree with such a solution.

ADDED:
I just saw your new beta, Nick. So I see -snr should be negative to the limit to avoid the skewing/snr heuristics. Spreading length should be 1, however, IMO, to avoid the spreading heuristics. The constant spreading of 4 was just 2Bdecided's initial spreading heuristic, as far as I can see. There's no reason IMO to use a blocksize of 1024; 2Bdecided just used a 1024 sample block size when he started things.
Of course, not averaging the FFT outcome at all is fine in a pure sense, but it is likely huge overkill, especially in the high frequency range, driving the bitrate up.
At present you can't use a negative -snr value, it's safely forced in the code.

As an aside, using -0 -spf 11111-11111-11111-11111-11111 -cbs 512 -fft 10001 yields: 56.47MB / 637.0kbps; changing to -fft 10101 yields: 57.60MB / 649.7kbps on my 53 sample set.

Bearing in mind that the source FLAC files amount to 69.36MB / 781kbps, that's not really a great saving.

[edit] And the 4 bin spreading function was there from the very beginning in David's original script. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-27 22:01:31
As an aside, using -0 -spf 11111-11111-11111-11111-11111 -cbs 512 -fft 10001 yields: 56.47MB / 637.0kbps; changing to -fft 10101 yields: 57.60MB / 649.7kbps on my 53 sample set.

Bearing in mind that the source FLAC files amount to 69.36MB / 781kbps, that's not really a great saving.

The pure method isn't attractive to you, and it isn't attractive to me. But it's intrinsically safe as Axon said.
[edit] And the 4 bin spreading function was there from the very beginning in David's original script. [/edit]
Yes, 2Bdecided used this spreading heuristic from the very start, and we've improved upon it - both with respect to quality and bitrate saving.

ADDED:
I just re-read Axon's post. I'm not sure any more if he dislikes spreading, as he seems to accept the critical band heuristics as the most important basis for our current spreading parameters. Sure, this already means accepting some heuristics.
Anyway the question remains: should we have the -1 configuration in such a way that configuration details have a very high degree of theoretical justification?
Title: lossyWAV Development
Post by: jensend on 2007-11-28 00:02:46
The primary advantage of lossless formats, it seems to me, is the future-proof factor (being able to benefit from it when a new and better encoder or a different format comes around rather than having that option made unattractive by the huge quality per bitrate losses involved in transcoding). So has anybody done listening tests to see how files processed by lossyWAV do when encoded into MP3/AAC/Vorbis/whatever?

Also, where is the preferred place to discuss lossyWAV? It seems like it would belong in the "other lossy formats" forum, but all the discussion of it seems to be restricted to this thread and the original thread in the FLAC forum.
Title: lossyWAV Development
Post by: BGonz808 on 2007-11-28 03:10:17
I'm just wanting to see if my understanding of the preprocessing method is somewhat accurate:
Let's say that an amplitude of part of a 16-bit wave is +32295 (0111111000100111). LossyWAV will simplify (not "clip", oops - maybe I meant snip?) it so that the binary value contains many trailing zeros, which FLAC will compress away as wasted_bits. The processed value of that amplitude will then become something like +32256 (0111111000000000) and save 9 bits. Is this the basic principle? Just wanting a little bit of clarification, thanks

808
Title: lossyWAV Development
Post by: Axon on 2007-11-28 07:42:00
I just re-read Axon's post. I'm not sure any more if he dislikes spreading, as he seems to accept the critical band heuristics as the most important basis for our current spreading parameters. Sure, this already means accepting some heuristics.
Anyway the question remains: should we have the -1 configuration in such a way that configuration details have a very high degree of theoretical justification?
Well, insofar as nothing in psychoacoustics is set in stone and there are going to be heuristics to evaluate very complicated phenomena, you can't escape them. I mean, the Bark scale seems like a hack in the first place, as every closed-form EBW equation probably is.

But clearly, spreading exists in any halfway-complete masking model. To leave such a tempting bone out there without chewing on it is madness. I'd just like to know how the predicted -spf numbers line up against what the tunings are, and have an option to use the theoretical numbers.

I would use a different option than -1 for a setting that matched theoretical predictions, because there's still a need for -1 to -3 in their current incarnations. Moreover, whatever setting exists must still be absolutely transparent. It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
Title: lossyWAV Development
Post by: halb27 on 2007-11-28 08:23:15
I'm just wanting to see if my understanding of the preprocessing method is somewhat accurate:
Let's say that an amplitude of part of a 16-bit wave is +32295 (0111111000100111), LossyWAV will clip it so that the binary value contains many trailing zeros so that FLAC will compress those away as wasted_bits. The processed value of that amplitude will then become something like +32256 (0111111000000000) and save 9 bits. Is this the basic principle? Just wanting a little bit of clarification, thanks

808

Yes, that's essentially it. It's only a bit the other way around, and clipping isn't a correct description. LossyWAV decides, on a per-block analysis, how many least significant bits are considered not essential for the 512 samples in the block. If it decides for instance that 9 (that's unusually many; let's also consider 3) least significant bits can be ignored, then a sample of 0111111000100111 in the block is rounded to 0111111000000000 (resp. 0111111000101000).
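halb27's rounding description can be sketched in a few lines of Python (an illustration of the principle only, not lossyWAV's actual implementation):

```python
def round_to_fewer_bits(sample: int, bits_to_remove: int) -> int:
    """Sketch of lossyWAV-style bit reduction: round a PCM sample so its
    bits_to_remove least significant bits become zero, letting the
    lossless codec store them as wasted bits."""
    q = 1 << bits_to_remove
    # round to nearest: add half the step, then truncate to a multiple of q
    return ((sample + q // 2) // q) * q

print(round_to_fewer_bits(32295, 9))  # 32256 (0111111000000000)
print(round_to_fewer_bits(32295, 3))  # 32296 (0111111000101000)
```

This reproduces both of the worked examples above: removing 9 bits rounds +32295 down to +32256, while removing 3 bits rounds it up to +32296.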
Title: lossyWAV Development
Post by: Nick.C on 2007-11-28 08:39:53
The primary advantage of lossless formats, it seems to me, is the future-proof factor (being able to benefit from it when a new and better encoder or a different format comes around rather than having that option made unattractive by the huge quality per bitrate losses involved in transcoding). So has anybody done listening tests to see how files processed by lossyWAV do when encoded into MP3/AAC/Vorbis/whatever?

Also, where is the preferred place to discuss lossyWAV? It seems like it would belong in the "other lossy formats" forum, but all the discussion of it seems to be restricted to this thread and the original thread in the FLAC forum.
In its purest sense, it's lossy, so lossy it is.

All the discussion and uploading lives in here as I am not a member of the developers group and cannot upload in any other forum.

@Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation) my spreading excel sheet.
Title: lossyWAV Development
Post by: Josef Pohm on 2007-11-28 09:51:01
...OFR supports wasted bits but I can't see a way for it to use a 512 sample frame size (nor, in my opinion, was OFR designed to work with such a small frame size).
As long as the target codec can work on a multiple of the lossyWAV codec_block_size, or use -cbs xxx to set the lossyWAV codec_block_size to the same as the target codec, or I get off my behind and implement a -ofr parameter to specify codec specific settings (as for WMALSL).

I think OFR support is a story of its own. From a certain point of view, the facts that it supports wasted bits detection and that it shares with LA the crown for the best compression ratios around were very promising. On the other hand, I couldn't find any information about the frame sizes OFR uses, or a possible undocumented switch to make it work with a frame size fixed by the user.

As a last chance, I took an OFR file (encoded at default settings), damaged a single sample with a hexadecimal editor and checked what happened.
As a result, I got exactly five seconds of silence in the middle of the music.

So I couldn't do any better than assume that OFR works with a frame size of 220,500 samples (at least for 44.1kHz material at default settings), which means practically no chance of using it with lossyWAV.

That's a risky assumption, but it is the little I could do. Obviously, I can't be sure at all about such a conclusion, so if somebody knows better, corrections would be welcome.
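The arithmetic behind that inference is simple: five seconds of corrupted output at CD sample rate corresponds to

```python
sample_rate = 44_100      # samples per second for 44.1kHz material
silence_seconds = 5       # length of the damaged region heard
frame_size = sample_rate * silence_seconds
print(frame_size)         # 220500 samples, the frame size inferred above
```

which is consistent with a single damaged frame wiping out the whole five-second span.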
Title: lossyWAV Development
Post by: Synthetic Soul on 2007-11-28 10:09:43
The only information I could find on the board:

The reason why Monkey uses large frames (up to 4s at 44.1kHz) lies in its architecture.
OptimFROG suffers from the same problem. The adaptive predictors have to catch up on some data...
Title: lossyWAV Development
Post by: halb27 on 2007-11-28 10:12:53
Well, insofar as nothing in psychoacoustics is set in stone and there are going to be heuristics to evaluate very complicated phenomena, you can't escape them. I mean, the Bark scale seems like a hack in the first place, as every closed-form EBW equation probably is.

But clearly, spreading exists in any halfway-complete masking model. To leave such a tempting bone out there without chewing on it is madness. I'd just like to know how the predicted -spf numbers line up against what the tunings are, and have an option to use the theoretical numbers.

I would use a different option than -1 for a setting that matched theoretical predictions, because there's still a need for -1 to -3 in their current incarnations. Moreover, whatever setting exists must still be absolutely transparent. It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.

I'm glad to see we're all pretty close to each other.
In particular, I have done a rather bad job of explaining the ingredients from the sausage factory. I'll try to do better:

a) the skew and snr options

These options I think have the worst theoretical justification.
But: the only thing they can do is decrease the number of bits removed, i.e. increase the sample accuracy, and thus potentially increase quality compared to not using them.
And it was found that they do a very good job of differentiating between 'good' spots, where many bits can be ignored, and 'bad' spots, where we have to keep nearly all the bits.
While I was working on that, I did not find good skew/snr values by listening tests. Instead I have a set of regular music where many bits on average are expected to be removable, and a set of problem samples where it is known that only few bits can be safely removed. I looked at the resulting bitrate of these sample classes when deciding on skew and snr. I did only a few listening tests for the skew/snr value finding, due to the exclusively defensive nature of using these parameters.

A certain danger creeps in with our decision to use a positive -nts value for -2 and -3. We do this because skew/snr gives us an excellent good/bad spot indicator, and because the skew value acts something like nts applied to the low-to-medium frequency range, so we can safely relax the nts demand in that respect. However, this adds a certain risk for the higher frequencies.
We do not do this with -1 which is the option best suited to perfectionists.
A -nts value of 2 for quality level -2 is so close to 0 that I think the practical advantages of skewing with respect to good/bad spot differentiation outperform the small danger introduced. Sure we can discuss forever whether the default -nts value should be +2 or +2.5 or +1.5 or maybe 0. In practice it's not very important. Moreover -nts is our main option apart from the quality parameter and everybody can set it easily to 0 with -2 or -3.
In the end the -nts values for -2 and -1 match very much IMO what we have in mind for these quality levels.
BTW at least I don't have this very strong demand for 'secure' transparency with -2 and -3. I do with -1, but with -2 (more so with -3) I accept a very slight risk that the result is not transparent on rare occasion in case I can expect to get only a negligible problem. So in the end it's the typical lossy approach with -2 and -3, but with extremely high demands for -2, and very high demands for -3.

b) spreading

I'm glad you have a positive attitude towards spreading. When allowing for spreading, I think David Bryant's idea of taking care of the width of the critical bands is a good starting point for deciding on the spreading details. While I was working on the spreading details, my target was to have several FFT bins in every critical band. With this in mind, what at first glance looks a bit dangerous about our -spf values (the rather long spreading length of the highest frequency zone with the 1024 sample FFT) is in fact only a small danger. The problems come rather from the other end, as frequency resolution is pretty low there. But as our spreading length is short there with the long FFTs, I think this is adequate. Moreover we do several FFTs, and especially with -1 this should give a very secure result. Last but not least, we have skewing to bring a big additional safety margin to low frequencies.
While I was working on the critical bands, my primary consideration was the number of FFT bins in the critical bands, and I backed these things up again by checking with my regular and problematic sample sets, looking at the resulting bitrate. Bitrate should be high with the difficult tracks, and rather low with the regular tracks. The final result was that we got a significantly improved security margin for the difficult tracks (compared to what we had before), and a bitrate decrease with the regular tracks. I also did listening tests, but to a minor degree.
Of course we can discuss endlessly the details of spreading, as well as other details of how to do the FFT analysis and simplify the result. For instance, I personally would prefer a different FFT covering of the blocks, and I would prefer a 512 sample FFT instead of the 256 sample FFT with -2, in favour of giving additional security to the low end. But after all it's not vital to me (beyond myself it's an open question whether that's useful at all), and IMO we have adequate considerations for the various aspects with our current settings.
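As a purely illustrative sketch of the spreading idea halb27 describes (averaging neighbouring FFT bins before taking the minimum; the windowing details here are assumptions, not lossyWAV's actual -spf logic):

```python
def spread_minimum(bin_magnitudes, spreading_length):
    """Slide a window of spreading_length bins over the FFT magnitudes,
    average within each window, and return the smallest averaged value
    as a (conservative) noise-floor estimate."""
    n = spreading_length
    windows = [
        sum(bin_magnitudes[i:i + n]) / n
        for i in range(len(bin_magnitudes) - n + 1)
    ]
    return min(windows)

# e.g. spread_minimum([4, 2, 2, 4, 8], 2) averages pairs to [3, 2, 3, 6]
```

A longer spreading_length smooths over isolated quiet bins, raising the estimated floor and allowing more bits to be removed; a length of 1 disables spreading entirely, which is the "pure" setting discussed above.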

So I think your aspects which originate from the theoretical basis (ensuring quality a priori without listening tests) are covered well by using -1. This is your quality level, as what we have in mind with -2 and -3 isn't in full congruence with your targets.
Sure any practical suggestion for improving things is welcome.
Title: lossyWAV Development
Post by: halb27 on 2007-11-28 11:42:53
... @Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation) my spreading excel sheet.

Sorry it was me who brought in some confusion wanting to have -1 going the extremely pure way.
I've thought it over at night (see my last post) - and come to the conclusion that with our current -1 we're going the pure way. Stuff from the sausage factory like skewing doesn't hurt quality a bit - the contrary is true. We do have to make some practical considerations for the way we do the FFT analyses, but here too I think this is in agreement with the pure way though details are always disputable.

So I think we can leave -1 as is. Sure suggestions for improvements are always welcome.

-3 is typically used with DAPs as you said, and -2 is a compromise for -3 and -1, kind of a -1 for the more practically minded.

BTW your spreading excel sheet was of high value for me on deciding about the spreading details - as far as it was me who worked out the details.

A suggestion:
It looks like it will be hard to disqualify -3 qualitywise (which is a good thing of course). Maybe for testing we can do it the other way around, start with an even less demanding quality setting in such a way that we do get into trouble, and increase the quality demands until quality is fine with the problems found. This way we can get a feeling of how big the security margin of -3 is. It is expected to be small, but who knows?
Essentially this means that we should be able to set -nts to a value higher than +6.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-28 14:52:22
... @Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation) my spreading excel sheet.
Sorry it was me who brought in some confusion wanting to have -1 going the extremely pure way.
I've thought it over at night (see my last post) - and come to the conclusion that with our current -1 we're going the pure way. Stuff from the sausage factory like skewing doesn't hurt quality a bit - the contrary is true. We do have to make some practical considerations for the way we do the FFT analyses, but here too I think this is in agreement with the pure way though details are always disputable.

So I think we can leave -1 as is. Sure suggestions for improvements are always welcome.

-3 is typically used with DAPs as you said, and -2 is a compromise for -3 and -1, kind of a -1 for the more practically minded.

BTW your spreading excel sheet was of high value for me on deciding about the spreading details - as far as it was me who worked out the details.

A suggestion:
It looks like it will be hard to disqualify -3 qualitywise (which is a good thing of course). Maybe for testing we can do it the other way around, start with an even less demanding quality setting in such a way that we do get into trouble, and increase the quality demands until quality is fine with the problems found. This way we can get a feeling of how big the security margin of -3 is. It is expected to be small, but who knows?
Essentially this means that we should be able to set -nts to a value higher than +6.
It's easier than that: use -snr <large negative number> with v0.5.3.....

Using -3 -snr -215 on my 53 sample set yields: 32.16MB; 362.8kbps.......

lossyWAV beta v0.5.3 attached: Superseded.

-snr parameter now valid in range -215<=n<=48.
-window parameter fully removed.

I intend to fully remove the following parameters unless there is objection:

-dither;
-clipping;
-overlap.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-11-28 15:05:51
I don't object, and I also don't see the use in keeping -allowable.
Title: lossyWAV Development
Post by: halb27 on 2007-11-28 15:15:00
... It's easier than that: use -snr <large negative number> with v0.5.3.....

I have no idea what a negative -snr value does. I had thought bringing in snr means giving the relevant min the chance to go lower than when not using snr. From this understanding any snr value can only make things more defensive compared to not using snr. Sure, as we use an snr value of 21 we will get a lower bitrate when turning the -snr value down. However I wonder what makes your problem sample set go so low in bitrate. I guess there's a specific meaning to a negative snr value.

Anyway I'd prefer to use a higher -nts value of up to say 40 instead. It would give us the chance to keep the usual skew/snr combination and go extreme with noise threshold for learning about lossyWAV behavior.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-28 16:21:15
... It's easier than that: use -snr <large negative number> with v0.5.3.....
I have no idea what a negative -snr value does. I had thought bringing in snr means giving the relevant min the chance to go lower than when not using snr. From this understanding any snr value can only make things more defensive compared to not using snr. Sure, as we use an snr value of 21 we will get a lower bitrate when turning the -snr value down. However I wonder what makes your problem sample set go so low in bitrate. I guess there's a specific meaning to a negative snr value.

Anyway I'd prefer to use a higher -nts value of up to say 40 instead. It would give us the chance to keep the usual skew/snr combination and go extreme with noise threshold for learning about lossyWAV behavior.
I am beginning to feel that -snr is a bit of packing in the sausage. When I tried -3 -snr -215 (modified average = average - snr_value, i.e. average +215 in this case, effectively removing it from consideration) I got palatable results.

[edit] I would go further than saying palatable: 32.17MB / 362.8kbps on my 53 sample set. I've started a speculative 1496 track transcode - so far: 256 tracks, 2.20GB / 302kbps vs 6.43GB / 881kbps..... [/edit]
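The parenthetical above (modified average = average - snr_value) can be sketched like this; this is my reading of the described behaviour, not the actual Delphi code:

```python
# Sketch: the per-FFT minimum used to set bits_to_remove is capped at
# (average - snr) dB, so added noise always stays at least snr dB below the
# average. A large negative snr lifts the cap far out of reach, i.e. it
# effectively switches the -snr machinery off.
def effective_minimum(spread_min_db, average_db, snr_db):
    return min(spread_min_db, average_db - snr_db)

effective_minimum(-35.0, -20.0, 21.0)    # cap at -41 binds: returns -41.0
effective_minimum(-35.0, -20.0, -215.0)  # cap at +195 never binds: -35.0
```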

-nts amended as requested.

Now you can really cause awful results.......

Try: -3 -nts 48 -skew 0 -snr -215

This gave 9.504MB / 107.2kbps. 

lossyWAV beta v0.5.4 attached. Superseded.
Code: [Select]
lossyWAV beta v0.5.4 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-0            emulate script  [2xFFT] (-cbs 1024 -nts  0.0 -skew  0 -snr -215
              -spf 44444-44444-44444-44444-44444 -fft 10001)
-1            extreme quality [4xFFT] (-cbs  512 -nts -2.0 -skew 36 -snr   21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default quality [3xFFT] (-cbs  512 -nts +1.5 -skew 36 -snr   21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact quality [2xFFT] (-cbs  512 -nts +6.0 -skew 36 -snr   21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-overlap      enable conservative fft overlap method; default=off

-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-allowable    select allowable number of clipping samples per codec block
              before iterative clipping reduction; (0<=n<=64, default=0).

-clipping     disable clipping prevention by iteration; default=off
-dither       dither output using triangular dither; default=off

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
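The -fft <5xbin> binary switching described in the usage text above can be illustrated in a few lines (a sketch; the length order follows the usage text):

```python
# The five binary flags select which FFT lengths are run during analysis,
# e.g. "01001" enables the 128 and 1024 sample FFTs.
FFT_LENGTHS = [64, 128, 256, 512, 1024]

def selected_ffts(flags):
    return [n for bit, n in zip(flags, FFT_LENGTHS) if bit == '1']

selected_ffts('01001')   # [128, 1024]
selected_ffts('10001')   # [64, 1024]  (the 2xFFT presets -0 and -3)
```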
Title: lossyWAV Development
Post by: jesseg on 2007-11-28 19:53:39
Quote
lFLCDrop Change Log:
v1.2.0.2
-added support for "-0 (emulate script)" option

lFLC.bat Change Log:
v1.0.0.2
- improved temp file handling
- fixed quality preset bug
Fixed a pretty massive FUBAR on my part: the variable name for passing in the quality preset wasn't right, so it was always defaulting to -2. That's been fixed. That's what I get for initially working on it for 9 hours straight without breaks.
 

[edit] removed, newer version on later post [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-11-28 21:57:39
I just tried insane -nts settings on my problem set to get a feeling about the security margin we have when using -3:

a) -3 -nts 30    => 319/390 kbps for my regular/problem sample set

I was astonished by the quality of Atem-lied, which I tried first. badvilbel was next and also has remarkable quality. bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A however have big errors (no abxing required), and the errors of furious and triangle are also easy to perceive though quality isn't really bad.
The big errors of bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are pretty much of the kind I know from WavPack lossy.
Everybody who would like to hear the potential problems lossyWAV has when the accuracy demand is too low is invited to do a listening test with this setting. The problems of the bad samples mentioned are easy to hear.

b) -3 -nts 20    => 320/405 kbps for my regular/problem sample set

Results were a lot better. Only bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are not transparent, with bibilolo and S37_OTHERS_MartenotWaves_A already being roughly acceptable. Just keys_1644ds is still seriously lacking in quality, though it too has improved remarkably.

c) -3 -nts 16    => 321/419 kbps for my regular/problem sample set

Only keys_1644ds and S37_OTHERS_MartenotWaves_A are not transparent to me. S37_OTHERS_MartenotWaves_A is already very hard to abx for me, and even for keys_1644ds it's not easy.

d) -3 -nts 12    => 326/438 kbps for my regular/problem sample set

Only keys is not totally transparent to me - and I was able to abx keys only with a pretty bad 7/10 result.

e) -3 -nts 9    => 333/455 kbps for my regular/problem sample set

Now also keys_1644ds is transparent to me.


Looking at these results to me even -3 (-nts 6 defaulted) seems to have a remarkable security margin.
The default -3 setting yields 345/474 kbps for my regular/problem sample set.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-28 22:06:53
I just tried insane -nts settings on my problem set to get a feeling about the security margin we have when using -3:

a) -3 -nts 30    => 319/390 kbps for my regular/problem sample set

I was astonished by the quality of Atem-lied, which I tried first. badvilbel was next and also has remarkable quality. bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A however have big errors (no abxing required), and the errors of furious and triangle are also easy to perceive though quality isn't really bad.
The big errors of bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are pretty much of the kind I know from WavPack lossy.
Everybody who would like to hear the potential problems lossyWAV has when the accuracy demand is too low is invited to do a listening test with this setting. The problems of the bad samples mentioned are easy to hear.

b) -3 -nts 20    => 320/405 kbps for my regular/problem sample set

Results were a lot better. Only bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are not transparent, with bibilolo and S37_OTHERS_MartenotWaves_A already being roughly acceptable. Just keys_1644ds is still seriously lacking in quality, though it too has improved remarkably.

c) -3 -nts 16    => 321/419 kbps for my regular/problem sample set

Only keys_1644ds and S37_OTHERS_MartenotWaves_A are not transparent to me. S37_OTHERS_MartenotWaves_A is already very hard to abx for me, and even for keys_1644ds it's not easy.

d) -3 -nts 12    => 326/438 kbps for my regular/problem sample set

Only keys is not totally transparent to me - and I was able to abx keys only with a pretty bad 7/10 result.

e) -3 -nts 9    => 333/455 kbps for my regular/problem sample set

Now also keys_1644ds is transparent to me.


Looking at these results to me even -3 (-nts 6 defaulted) seems to have a remarkable security margin.
The default -3 setting yields 345/474 kbps for my regular/problem sample set.
That's a lot of listening! It's reassuring that the previously determined -3 settings have been confirmed by your test.

I went down a slightly different path with -snr <large negative number> to effectively remove it from the calculation of the minimum value for each FFT result. I think that some of your large -nts values would sound *very* different without the -snr safety net. That's not to say that -snr is necessarily bad, but I think it bloats the bitrate a bit.
Title: lossyWAV Development
Post by: TBeck on 2007-11-28 22:23:12
This gives me an opportunity to thank you all though for the work that you have put in.  I think this is an extremely exciting development.

I second this!

Thank you very much!

If lossyWAV gets enough users, I will evaluate whether some modifications of TAK can significantly improve the compression of its output. In this context "significantly" means by at least about 20 kbps. I have some ideas, but you can't be sure until you've tried it.

Thank you again!

  Thomas
Title: lossyWAV Development
Post by: Nick.C on 2007-11-28 22:35:53
This gives me an opportunity to thank you all though for the work that you have put in.  I think this is an extremely exciting development.
I second this!

Thank you very much!

If lossyWAV gets enough users, I will evaluate whether some modifications of TAK can significantly improve the compression of its output. In this context "significantly" means by at least about 20 kbps. I have some ideas, but you can't be sure until you've tried it.

Thank you again!

  Thomas
*Another* 20kbps saving! On top of everything else, that would probably push the average output of -3 down to circa 320kbps using TAK.......

Congratulations on the piping by the way, I may have to beseech aid in implementing it in lossyWAV - though how you pipe in and pipe out of lossyWAV and then ensure that the output pipe goes to the lossless encoder, I haven't the faintest clue........
Title: lossyWAV Development
Post by: GeSomeone on 2007-11-28 23:18:14
-nts amended as requested.

Now you can really cause awful results...

Attached File  lossyWAV_beta_v0.5.4.zip

Just a side note again .. when you're going to experiment further (in the code) with settings it would be best to call those (in between) versions Alpha again. When you arrive at something you're confident about you could release another beta. (I'm not saying something isn't right, but maybe another alpha round is needed?)
Title: lossyWAV Development
Post by: Nick.C on 2007-11-29 08:09:44
-nts amended as requested.

Now you can really cause awful results...

Attached File  lossyWAV_beta_v0.5.4.zip
Just a side note again .. when you're going to experiment further (in the code) with settings it would be best to call those (in between) versions Alpha again. When you arrive at something you're confident about you could release another beta. (I'm not saying something isn't right, but maybe another alpha round is needed?)
Well, all I did was change the input range of a particular parameter; I did not substantially change the code. I see what you mean though.

[edit] On reflection, no settings per se have been changed (other than the inclusion of the ability to revert to a close approximation of David's original script), only the ability to change settings has been augmented.

The more I listen to -3 -snr -215, the more I like it. I still think that there is a place for -snr, however I feel that it needs better explanation. I'll work up a spreadsheet which will graphically demonstrate the effects of the -skew, -nts and -snr parameters on a suitably small fft_length.

The bottom line though is that there is only one process which actually modifies the audio data, namely the bits_to_remove procedure - no heuristics in that process at all. The number of bits_to_remove may depend on a heuristically generated minimum_value, but the added noise caused by the subsequent bit reduction has already been calculated - hence the link between minimum_value and bits_to_remove. [/edit]
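The bit-removal step itself can be sketched as follows (an assumed form of the rounding, not the actual Delphi procedure):

```python
# Sketch: each sample is rounded to the nearest multiple of
# 2^bits_to_remove, so the lowest bits_to_remove bits of every sample in the
# codec block become zero.
def remove_bits(samples, bits_to_remove):
    q = 1 << bits_to_remove   # quantisation step: 8 when removing 3 bits
    return [int(round(s / q)) * q for s in samples]

remove_bits([13, -13, 16], 3)   # each value snaps to a multiple of 8
```

Lossless codecs then code the zeroed bits away very cheaply; FLAC, for instance, detects constant trailing zero bits as "wasted bits", which is why the processed file compresses so much better.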
Title: lossyWAV Development
Post by: halb27 on 2007-11-30 09:03:33
... The more I listen to -3 -snr -215, the more I like it. ...

From the bitrate you gave for your sample set, which consists of problem samples to a high degree, it's hard to imagine that keys_1644ds, bibilolo, or MartenotWaves are fine. I will try it this weekend. Anyway I'd like to know what a negative -snr value does.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-30 12:33:53
... The more I listen to -3 -snr -215, the more I like it. ...
From the bitrate you gave for your sample set, which consists of problem samples to a high degree, it's hard to imagine that keys_1644ds, bibilolo, or MartenotWaves are fine. I will try it this weekend. Anyway I'd like to know what a negative -snr value does.
Attached spreadsheet shows how -skew, -snr and -nts interact on a 64 sample FFT (random numbers used for FFT output, F9 to recalculate for another iteration).

As an aside:

Bibilolo -3: 1487438 bytes; -3 -snr -215: 1470329 bytes;
Keys_1644ds -3: 105088 bytes; -3 -snr -215 : 105088 bytes;
S37_OTHERS_MartenotWaves_A -3: 711469 bytes; -3 -snr -215: 711469 bytes.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-11-30 14:26:11
Axon,

I share your unease at the way pseudo-psychoacoustics have been arrived at for lossyWAV. I wouldn't put it any stronger than that though. I don't have the time to get involved, and am very grateful to Nick and halb27 for pushing this forward with such enthusiasm.

It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
The basic algorithm is just "find the noise floor, and quantise at or below it".

The fundamental flaw in my implementation was that it couldn't "see" dips in the noise floor at low frequencies which are audible to human listeners - so it would happily fill them with noise. The "resolution" I used wasn't sufficient for low frequencies. The solution is either to skew the results, or modify the spreading, or both (I haven't taken the time to figure out which is the "right" approach) - the current version does both, to great effect. The reason my original script got away with it most of the time is because there are very few recordings where the noise floor is lowest at low frequencies - normally, the lower limit is at a high frequency, so inaccuracies in estimating it at low frequencies have no effect on the result for most recordings.
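The skewing fix described here can be sketched roughly as below; the ramp shape and endpoints are assumptions taken from the -skew usage text (which names the 20Hz to 3.45kHz range), not from the released code:

```python
import math

def skew_db(freq_hz, skew, f_lo=20.0, f_hi=3450.0):
    """Assumed shape: full skew dB of attenuation at f_lo, tapering to 0 dB
    at f_hi on a log-frequency ramp; no change above f_hi."""
    if freq_hz >= f_hi:
        return 0.0
    f = max(freq_hz, f_lo)
    t = math.log(f_hi / f) / math.log(f_hi / f_lo)   # 1 at f_lo, 0 at f_hi
    return -skew * t

# Low-frequency bins are pulled down, so a dip in the noise floor at low
# frequencies is far more likely to set the minimum the quantiser respects:
skew_db(20.0, 36.0)     # -36.0
skew_db(3450.0, 36.0)   # 0.0
```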

There was also a bug in later lossyFLAC MATLAB scripts which caused it to analyse the tail end of the "noise it had just added to the previous block" when assessing the noise floor of the current block. Nick spotted that, and corrected it in his code. I haven't generated a "fixed" MATLAB version.


The obvious "extras" for lossyWAV are a hybrid/lossless mode (quite possible), and a noise-shaped mode (already implemented, but not released for IP reasons). Finally, it might make sense to delineate between a proper psychoacoustic model (borrow one?) and a non-psychoacoustic implementation (close to now, but tamed a little).


btw Nick, I don't have any objections to you leaving switches in the final release for testing - just hide them well away in the depths of the manual! And please don't feel like you have to respect my wishes or anything - you've well and truly adopted my baby now!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-30 14:46:51
Axon,

I share your unease at the way pseudo-psychoacoustics have been arrived at for lossyWAV. I wouldn't put it any stronger than that though. I don't have the time to get involved, and am very grateful to Nick and halb27 for pushing this forward with such enthusiasm.

It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.
The basic algorithm is just "find the noise floor, and quantise at or below it".

The fundamental flaw in my implementation was that it couldn't "see" dips in the noise floor at low frequencies which are audible to human listeners - so it would happily fill them with noise. The "resolution" I used wasn't sufficient for low frequencies. The solution is either to skew the results, or modify the spreading, or both (I haven't taken the time to figure out which is the "right" approach) - the current version does both, to great effect. The reason my original script got away with it most of the time is because there are very few recordings where the noise floor is lowest at low frequencies - normally, the lower limit is at a high frequency, so inaccuracies in estimating it at low frequencies have no effect on the result for most recordings.

There was also a bug in later lossyFLAC MATLAB scripts which caused it to analyse the tail end of the "noise it had just added to the previous block" when assessing the noise floor of the current block. Nick spotted that, and corrected it in his code. I haven't generated a "fixed" MATLAB version.


The obvious "extras" for lossyWAV are a hybrid/lossless mode (quite possible), and a noise-shaped mode (already implemented, but not released for IP reasons). Finally, it might make sense to delineate between a proper psychoacoustic model (borrow one?) and a non-psychoacoustic implementation (close to now, but tamed a little).


btw Nick, I don't have any objections to you leaving switches in the final release for testing - just hide them well away in the depths of the manual! And please don't feel like you have to respect my wishes or anything - you've well and truly adopted my baby now!

Cheers,
David.
Thanks David, I'll look after her..... As to switches, I agree with the consensus that they should remain, although hidden from the attentions of casual users. I would also probably limit the input ranges so that truly awful results can be avoided.

A hybrid / lossless mode is totally possible - either at the same time as the processing, or as a stand alone program. If I venture down the piping route, it would have to be at the same time.

I corrected the Matlab script as well as my code and posted it as LossyFLAC6_x (I think).

All the best, and thanks again.

Nick.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-11-30 15:52:17
I corrected the Matlab script as well as my code and posted it as LossyFLAC6_x (I think).
Yes, you did, thanks. I didn't get the chance to merge the fix back into what I had.

It would be interesting to put all your tweaks into the noise shaping version, but the wait for (a) time and (b) the IP to expire means I'm looking at, er, sometime after I retire! (I'm currently 30-something!). I think I'll just release what I have when the IP expires and let someone else play with it. It would be so cool to have the option of going from true lossless to virtually lossless to high VBR mp3-like lossy (but with fewer problem samples) in the one codec.

Mind you, you're pretty much there already, without the noise shaping!

Cheers,
David.
Title: lossyWAV Development
Post by: GeSomeone on 2007-11-30 17:32:44
Attached File  Spread___Skew.zip ( 7.99k )

It is very hard to see the effect of a parameter change because of the random Log FFT output 

Having read 2Bdecided's comment it might be best to ditch the -0 settings as they emulate a flawed implementation.
Title: lossyWAV Development
Post by: Nick.C on 2007-11-30 18:02:10

Attached File  Spread___Skew.zip ( 7.99k )

It is very hard to see the effect of a parameter change because of the random Log FFT output 

Having read 2Bdecided's comment it might be best to ditch the -0 settings as they emulate a flawed implementation.
You could copy the random number column and paste it in place as values to fix it. That would allow you to see the effects more clearly on a static example. Try looking again at the relativities between the two lines for minimum and the two lines for average....
Title: lossyWAV Development
Post by: jesseg on 2007-12-01 05:50:44
Having read 2Bdecided's comment it might be best to ditch the -0 settings as they emulate a flawed implementation.


I agree to some extent. But perhaps a command-line switch, something like -allowbadsettings, which would allow people to use -0 as well as remove the limits on the restricted settings. This would of course be another great option to hide deeeep in the manual.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-01 22:20:51
The -0 setting is no longer required as it can be re-created from the relevant parameters. -clipping, -dither and -allowable will also be removed at the next revision.

I have started the coding for correction files and can now create a WAV file (.lwcdf.WAV : lossyWAV correction data file) of the difference between the source and bit_removed data. It's basically just hiss and compresses less well than the lossy.WAV file.

There's still a lot to do on the correction file side of things, but it shouldn't be too difficult - just time consuming.

I'm a bit concerned that, if I go down the route of two WAV files (one lossy, one lwcdf) and a WAV file is processed more than once, the wrong correction file might be added to the lossy file. What happens then? Probably something not too good......

@Halb27: I've narrowed down my variations to -3: -snr 18 -skew 36 -nts 6 -spf 22235-22236-22347-22358-22469 -fft 10001 -cbs 512.

This permutation yields 34.77MB / 392.2kbps on my 53 sample set.
Title: lossyWAV Development
Post by: halb27 on 2007-12-02 10:20:55
I'm a bit concerned that, if I go down the route of two WAV files (one lossy, one lwcdf) and a WAV file is processed more than once, the wrong correction file might be added to the lossy file. What happens then? Probably something not too good....

I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.

Other than that I'm having a hard time with listening tests resulting from your -snr -215 approach.
I easily found that there's no magic with negative snr values: for my sample sets -snr -215/-100/-10/0 all gave the same average bitrate, and the result of -snr 10 was close by. So it's just the same machinery as with positive snr values: modifying the FFT min if the snr offset from the FFT average is lower. With -snr -215 or similar there's simply no modification of the FFT min, and -snr -215 simply works as if there were no snr machinery at all.

-3 -snr -215 yields 313/430 kbps with my regular/problem samples set. While this is welcome with regular tracks, it looks a bit low with the problem samples.
I listened to it (to get used to problems I started with -nts 16), and I added more problem samples. The result wasn't good with badvilbel, bibilolo, bruhns, dithernoise_test, eig, furious, keys_1644ds, utb. There are clear artifacts/distortions audible. Sure, that was with an insane setting of -nts 16 as a warm-up.
Using -nts 9 and -nts 6 improves a lot, the distortion like noise is gone, I'd even call the results 'acceptable', but I can still abx furious, dithernoise_test, keys, utb, and badvilbel.

My usual approach for improving is to bring the bitrate up for the problem set while affecting the regular set only to a minor degree. From the current -3 setting and previous experience I know a '1' instead of the '2' for the first frequency zone of the 1024 sample FFT should do the job. It does, but only for the statistics; my listening experience yielded pretty much the same not totally satisfying quality.

That's my current state. The interesting question is: if -3 -snr -215 is a bit poor for some problems, what is the most effective way to improve? Maybe a higher -skew value will do it, or maybe just the basic thing of the entire machinery: a lower -nts value (which would match the idea of going back a bit to the pure basics), or maybe the snr machinery really plays an essential part in preserving quality (after all the current -3 quality is very good). Quite interesting questions, but the answers will take some time.

And of course I'll try your new suggestion for the -3 setting.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-02 20:01:16
I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.
Not sure how I will achieve this inside a WAV file....
Other than that I'm having a hard time with listening tests resulting from your -snr -215 approach.
I easily found that there's no magic with negative snr values: for my sample sets -snr -215/-100/-10/0 all gave the same average bitrate, and the result of -snr 10 was close by. So it's just the same machinery as with positive snr values: modifying the FFT min if the snr offset from the FFT average is lower. With -snr -215 or similar there's simply no modification of the FFT min, and -snr -215 simply works as if there were no snr machinery at all.
That was exactly the point, to be able to switch off the -snr setting.
-3 -snr -215 yields 313/430 kbps with my regular/problem samples set. While this is welcome with regular tracks, it looks a bit low with the problem samples.
I listened to it (to get used to problems I started with -nts 16), and I added more problem samples. The result wasn't good with badvilbel, bibilolo, bruhns, dithernoise_test, eig, furious, keys_1644ds, utb. There are clear artifacts/distortions audible. Sure, that was with an insane setting of -nts 16 as a warm-up.
Using -nts 9 and -nts 6 improves a lot, the distortion like noise is gone, I'd even call the results 'acceptable', but I can still abx furious, dithernoise_test, keys, utb, and badvilbel.

My usual approach for improving is to bring the bitrate up for the problem set while affecting the regular set only to a minor degree. From the current -3 setting and previous experience I know a '1' instead of the '2' for the first frequency zone of the 1024 sample FFT should do the job. It does, but only for the statistics; my listening experience yielded pretty much the same not totally satisfying quality.

That's my current state. The interesting question is: if -3 -snr -215 is a bit poor for some problems, what is the most effective way to improve? Maybe a higher -skew value will do it, or maybe just the basic thing of the entire machinery: a lower -nts value (which would match the idea of going back a bit to the pure basics), or maybe the snr machinery really plays an essential part in preserving quality (after all the current -3 quality is very good). Quite interesting questions, but the answers will take some time.

And of course I'll try your new suggestion for the -3 setting.
I've come to the realisation that the -snr setting is what (along with -skew and -nts) makes -3 so acceptable. Before -snr, we didn't have a way to stop minimum values which were close to the average from introducing noise close to the average. Now we do: if we set -snr to 21 then we will never add noise above the (average - 21dB) level.

I think that -3 is close to finished - I await your listening results with anticipation!

Nick.
Title: lossyWAV Development
Post by: halb27 on 2007-12-02 21:01:23

I'd store a checksum of the lossyWAV result in the correction file so you can figure out a wrong combination in the bring-it-back-to-lossless application.
Not sure how I will achieve this inside a WAV file....

Depends on the overall procedure. I guess you want to compress the correction file (though the compression ratio may be small - which just says that lossyWAV is working efficiently), so the final representation of the correction file won't have a WAV format. If you compress by your own method you can take care of the checksum easily, and if you use FLAC or similar, you can use tags to store the checksum of the lossyWAV result.

I've finished my investigations on -3.
First I wave-edited all the old and new serious problem samples so that they consist only of the problematic spots. This way I hope to get more meaningful statistics. With current -3 the average bitrate of this problem essence is 464 kbps.
When using -3 -snr -215 I got good, but not perfect, results qualitywise, and I had already tried, without success, to increase quality by using a spreading length of 1 for the lowest frequency zone of the 1024 sample FFT.
Next I tried to improve by using a higher -skew value. But this doesn't bring the solution either: using -3 -snr -215 -skew 44 yields an average bitrate of 422 kbps for my problem essence, which is too low.
Next I lowered the -nts value, and -3 -snr -215 -nts 3 yields a bitrate of 444 kbps for my problem essence. I listened to it and was content with the result, though to me it's a bit close to the edge, as my furious result was 7/10 and I also suspect that utb isn't perfect, though my ABX results don't back this up. With my regular sample set the average bitrate is 344 kbps, which is nearly identical to the 345 kbps of current -3. Qualitywise the current -3 setting is more secure IMO, so I prefer it.
Then I used your new -3 proposal, but with the -spf value of current -3, that is, -3 -snr 18, and the statistics are 331 kbps for my regular set and 445 kbps for my problem essence. Listening to the problems showed that nearly everything is fine to me, with the exception of dithernoise_test, which was easy to abx (10/10) due to one spot where the noise-like sound suddenly changes in the lossyWAV result, contrary to the original. With utb I again suspect it's not totally correct, though I couldn't abx it and thus may be wrong.
Finally I tried your very new -3 proposal, -3 -snr 18 -spf 22235-22236-22347-22358-22469. dithernoise_test is better now - it was harder for me to abx, and I arrived at 8/10. For utb my suspicion of it being not perfect is gone.

So your new proposal is within the quality demand, which to me is fine for -3, though it's on the cutting edge. But that's just my listening, with my old ears, to not very many samples (considered to be extraordinarily problematic though). The average bitrate for my regular sample set is 335 kbps, which is only 10 kbps lower than that of current -3. The average bitrate of the problem essence, however, is 446 kbps, and that's 18 kbps less than that of current -3.
So we lose a lot more kbps in the problem area, where a higher number of kbps is wanted, than we gain in the regular area. Having become sensitive to dithernoise_test in particular, I tested it again with current -3, and everything is fine to me. As is utb.

So in the end IMO we should stick with current -3. An average bitrate of ~350 kbps for regular music is very good I think, and it seems we can't do essentially better with our weaponry without sacrificing safety margin to a considerable extent.
What the investigation has shown is that -snr has its own specific part in preserving quality. It's not just an amplification of the merits of the -skew option.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-02 21:10:37
So in the end IMO we should stick with current -3. An average bitrate of ~350 kbps for regular music is very good I think, and it seems we can't do essentially better with our weaponry without sacrificing safety margin to a considerable extent.
What the investigation has shown is that -snr has its own specific part in preserving quality. It's not just an amplification of the merits of the -skew option.
Thank you very much my friend for spending a lot of time on settings validation. I was nearly at the same conclusion when you posted. Therefore, -3 is fixed - permanently (unless we find a particularly awkward sample......).

I am tidying up the code and removing redundant parameters. Will post beta v0.5.5 tonight or tomorrow.

Thanks again.

Nick.
Title: lossyWAV Development
Post by: GeSomeone on 2007-12-03 16:52:58
So your new proposal is within the quality demand which to me is fine for -3 though it's on the cutting edge.

Isn't that exactly where -3 should be? And -2 being "transparent as far as could be determined"?

Quote
The average bitrate for my regular sample set is 335 kbps which is only 10 kbps lower than that of current -3. Average bitrate however of the problem essence is 446 kbps, and that's 18 kbps less than that of current -3.

3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course.

Thanks, for your testing and observations.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-03 21:49:30
So your new proposal is within the quality demand which to me is fine for -3 though it's on the cutting edge.
Isn't that exactly where -3 should be? And -2 being "transparent as far as could be determined"?
Quote
The average bitrate for my regular sample set is 335 kbps which is only 10 kbps lower than that of current -3. Average bitrate however of the problem essence is 446 kbps, and that's 18 kbps less than that of current -3.
3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course.

Thanks, for your testing and observations.
I take on board what you're saying, but I agree with Halb27 that we're aiming for transparency at -3 with increasing resilience at -2 and -1. The initial aim of the process was to "slightly" reduce bitrate - what we have currently with -3 is significant reduction using the interplay of -nts, -skew and -snr. Maybe -3 -snr 18 -nts 7.5 would produce adequate results, maybe not. However, while there's only really Halb27 doing the ABX'ing, I will unconditionally accept his opinion.

Anyway,

lossyWAV beta v0.5.5 attached: Superseded.

-allowable, -dither, -clipping and -overlap removed;

Reference_threshold values used to determine bits_to_remove from calculated minimum_value have been re-calculated. Very slight increase in bitrate (406.9 v0.5.4 vs 407.3 v0.5.5 for my 53 sample set).

Code tidied.

Code: [Select]
lossyWAV beta v0.5.5 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme settings [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default settings [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact settings [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

Standard Options:

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Specific Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (characters must be one of
              1 to 9 and A to F; zero excluded).
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
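For readers puzzling over the -spf format in the help text above, here is a small parser sketch. The interpretation of each hex digit as a per-zone spreading length is my assumption from the help text, not taken from lossyWAV's source:

```python
FFT_LENGTHS = (64, 128, 256, 512, 1024)

def parse_spf(spf):
    """Parse a -spf string such as '22235-22236-22347-22358-2246C' into
    a dict mapping FFT length -> five per-zone values (assumed to be
    spreading lengths).  Digits are hex 1-F; zero is rejected, per the
    help text."""
    groups = spf.split('-')
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError('expected 5 groups of 5 characters')
    result = {}
    for fft_len, group in zip(FFT_LENGTHS, groups):
        values = [int(c, 16) for c in group.upper()]
        if 0 in values:
            raise ValueError('zero is excluded')
        result[fft_len] = values
    return result

print(parse_spf('22235-22236-22347-22358-2246C')[1024])  # [2, 2, 4, 6, 12]
```

So the '-3' default's final group '2246C' gives the 1024-sample FFT spreading lengths 2, 2, 4, 6 and 12 across its five frequency zones, under that assumption.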
Title: lossyWAV Development
Post by: halb27 on 2007-12-03 22:10:09
... 3% to 4% extra compression is something lossless codecs would have to work very hard for, so nothing to give away easily, except for a reason of course. ...

Please keep in mind that we haven't had a big amount of testing so far, and I did abx dithernoise_test 8/10 with my 58 year old ears for this 10 kbps saving setting. We are not in the situation of lossless codecs, where lossless is lossless after all; with -3 too IMO we should be pretty safe qualitywise, because otherwise there's no good distinction from mp3 etc. If after years of lossyWAV usage a sample should come up which isn't totally transparent but has a negligible issue, this is an acceptable situation for -3 IMO, but we should take some care not to be in this situation at the very start of lossyWAV. Not for the advantage of just having an average bitrate of 340 kbps instead of 350.

As -nts is an official option, you can easily save some kbps by increasing the -nts value, as Nick said, if you prefer to be a little adventurous.
Title: lossyWAV Development
Post by: jesseg on 2007-12-04 06:20:17
newer version of lFLCDrop, check the last page(s)
Title: lossyWAV Development
Post by: Nick.C on 2007-12-04 08:40:34
It's competition time for all the graphically creative users out there.... As the wiki is now up and running (many thanks to Mitch 1 2!), complete with Foobar2000 converter settings, I/we need an icon for lossyWAV.

Answers on the back of used large denomination currency of your choice () to: this thread.....
Title: lossyWAV Development
Post by: halb27 on 2007-12-04 19:43:04
It's not very important, but those who use lossyWAV together with FLAC may find this useful:

Synthetic Soul already found that FLAC -5 yields nearly the same file size as -8. I can confirm and extend this:

For FLAC used in our context, in many respects it makes nearly no difference whether we use -8, -5, or -3.
What's important for many tracks is the -m parameter (on by default with -8 and -5, but not with -3).
To a small degree the -e parameter also makes a difference (on by default with -8, but not with -5 and -3).

So -8, -5 -e, or -3 -m -e all yield an identical file size in a practical sense (at least with my test set), and -3 -m -e is the fastest encoding procedure among these.
If you allow for another option -3 -m -e -r 2 speeds things up a bit more while not really sacrificing file size (with my test set).
Dropping -e speeds things up further. File size increases a bit more noticeably than with the -m -e variants, but to most users it's probably still negligible. Use -3 -m for amazing speed (together with -r 2 if you like), or -5. File size for -3 -m and -5 is usually identical in a practical sense.

Keep in mind though that with these speed settings overall encoding time is dominated by lossyWAV. So it may not be wise to hunt for the ultimate FLAC speed.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-12-05 01:41:02
I can also confirm it. The increase in speed justifies the consistently negligible (<1kbps) increase in bitrate.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-05 07:56:22
I can also confirm it. The increase in speed justifies the consistently negligible (<1kbps) increase in bitrate.
Good find Halb27, thanks for the confirmation Mitch 1 2! So, for those on a time budget, flac -3 -e -m -r 2 -b 512 is the way to go.....

On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......
Title: lossyWAV Development
Post by: halb27 on 2007-12-05 08:11:39
...On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......

That's not negligible to me, but I hope that's due to the nature of your more or less problematic snippets set (guess that's still your 53 sample set). With full sized regular music as Mitch_1_2 said I expect the difference to be <1 kbps on average.

If somebody finds that on a real-life sample set of several full-length tracks the difference is > 1 kbps, please let us know. To get the precise difference we can look at the total size of the files under consideration. I expect the difference to be ~0.1%.
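The size-to-bitrate arithmetic behind such a comparison is simple; a tiny helper just to make the conversion explicit (hypothetical, not part of any tool mentioned here):

```python
def avg_kbps(total_bytes, total_seconds):
    """Average bitrate in kbps from total file size and playing time."""
    return total_bytes * 8 / total_seconds / 1000

# A 1 kbps average difference over one hour of music is 450,000 bytes:
print(avg_kbps(450_000, 3600))  # 1.0
# ...and ~0.1% of a ~350 kbps encode is only about 0.35 kbps.
```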
Title: lossyWAV Development
Post by: Nick.C on 2007-12-05 08:18:23
...On my 53 sample set, this increases the bitrate from 407.5 (-8) to 413.4 (-3 -e -r 2 -m -b 512) - Fast though......
That's not negligible to me, but I hope that's due to the nature of your more or less problematic snippets set (guess that's still your 53 sample set). With full sized regular music as Mitch_1_2 said I expect the difference to be <1 kbps on average.

If somebody finds that on a real-life sample set of several full-length tracks the difference is > 1 kbps, please let us know. To get the precise difference we can look at the total size of the files under consideration. I expect the difference to be ~0.1%.
I'll run a "real-world" conversion test - the same as my previous set, about 10 albums - and report back.
Title: lossyWAV Development
Post by: halb27 on 2007-12-05 09:36:49
...
Jean Michel Jarre - Oxygene         / 773kbps / 454kbps / 372kbps / 377kbps
...So, overall an average of 850kbps / 410kbps / 350kbps / 351kbps

Thanks for your test, Nick. So in an overall sense FLAC -3 -m -e -r 2 is fine IMO, though it's quite interesting that with an album like Oxygene things aren't totally satisfying.
Do you mind trying FLAC -3 -m -e -r 3 and FLAC -3 -m -e on Oxygene?
Title: lossyWAV Development
Post by: Nick.C on 2007-12-05 10:24:44

...
Jean Michel Jarre - Oxygene         / 773kbps / 454kbps / 372kbps / 377kbps
...So, overall an average of 850kbps / 410kbps / 350kbps / 351kbps

Thanks for your test, Nick. So in an overall sense FLAC -3 -m -e -r 2 is fine IMO, though it's quite interesting that with an album like Oxygene things aren't totally satisfying.
Do you mind trying FLAC -3 -m -e -r 3 and FLAC -3 -m -e on Oxygene?
Apologies - using the revised calculated constants for Reference_Threshold in beta v0.5.5, Oxygene has increased to 372kbps, with a 5kbps increase for -3 -e -m -r 2 -b 512. I forgot I did the last comparison using a previous version. I will do it again with vanilla -3 / -8.

Artist - Album / FLAC / lossyFLAC -2 / lossyFLAC-3; lossyFLAC -3/-3 -e -m -r 2 -b 512;

Code: [Select]
AC/DC - Dirty Deeds Done Dirt Cheap    / 781kbps / 398kbps / 331kbps / 332kbps
B52's - Good Stuff                     / 993kbps / 408kbps / 361kbps / 362kbps
David Byrne - Uh-Oh                    / 937kbps / 398kbps / 344kbps / 345kbps
Fish - Songs From The Mirror           / 854kbps / 384kbps / 336kbps / 336kbps
Gerry Rafferty - City To City          / 802kbps / 400kbps / 338kbps / 338kbps
Iron Maiden - Can I Play With Madness  / 784kbps / 422kbps / 371kbps / 372kbps
Jean Michel Jarre - Oxygene            / 773kbps / 454kbps / 372kbps / 377kbps
Marillion - The Thieving Magpie        / 790kbps / 404kbps / 344kbps / 344kbps
Mike Oldfield - Tr3s Lunas             / 848kbps / 421kbps / 365kbps / 366kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 354kbps / 354kbps

So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps

I'm not worried about one spurious result - Oxygene, after all, is a fairly specific type of music.
Title: lossyWAV Development
Post by: halb27 on 2007-12-05 11:33:43
...
So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps

This matches perfectly my experience with -3 -e -m -r 2 -b 512 as well as that of Mitch 1 2 as of his post.
You're right: we shouldn't care too much about specific music, especially as the result isn't extraordinarily bad.

Thanks again for your test.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-05 22:29:10
Having listened to the comments on noise shaping, I had a look on wikipedia and found the basic principles.

As I already have a mechanism to store the difference between the original sample and the bit_removed sample, part of a noise shaping algorithm is already in place.

The coefficients have so far eluded me.

One simple possibility that springs to mind is to start with zero at the codec block / channel start and then add the first difference then divide by two. Then add the next difference and divide by two. And so on.

We'll see how it sounds.
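The running-value idea above can be sketched like this; whether the feedback is applied before quantisation and with which sign is my guess, so treat it as a hypothetical reconstruction rather than lossyWAV's actual code:

```python
def shape_block(samples, quantise):
    """First-order error feedback: a running value starts at zero for
    each codec block / channel; each sample's quantisation difference
    is added and the total is halved, so the feedback expands to
    diff(-1)/2 + diff(-2)/4 + ...  `quantise` stands in for lossyWAV's
    bit removal (rounding to a coarser grid) and is an assumption."""
    out = []
    feedback = 0.0  # start at zero per codec block / channel
    for x in samples:
        q = quantise(x + feedback)   # remove bits with the feedback applied
        diff = (x + feedback) - q    # error introduced at this sample
        feedback = (feedback + diff) / 2.0
        out.append(q)
    return out
```

With an identity quantiser the differences are all zero and the output equals the input; a real run would pass the rounding-to-fewer-bits operation instead.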
Title: lossyWAV Development
Post by: jesseg on 2007-12-06 00:11:28
[edit]
nasty 1st version logo removed, check the 1st post on the next page for the new one.
[/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-12-06 08:17:27
Having listened to the comments on noise shaping, I had a look on wikipedia and found the basic principles.

... part of a noise shaping algorithm is already in place.

The coefficients have so far eluded me.

One simple possibility that springs to mind is to start with zero at the codec block / channel start and then add the first difference then divide by two. Then add the next difference and divide by two. And so on.

We'll see how it sounds.

You're moving on fast. Wonderful!
When considering noise shaping: into what frequency range do you want to put the noise?
Title: lossyWAV Development
Post by: Nick.C on 2007-12-06 08:47:57
Having listened to the comments on noise shaping, I had a look on wikipedia and found the basic principles.

... part of a noise shaping algorithm is already in place.

The coefficients have so far eluded me.

One simple possibility that springs to mind is to start with zero at the codec block / channel start and then add the first difference then divide by two. Then add the next difference and divide by two. And so on.

We'll see how it sounds.
You're moving on fast. Wonderful!
When considering noise shaping: into what frequency range do you want to put the noise?
Ah, that's the problem - I don't yet know how to determine that.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-06 13:25:37
I've implemented a simplistic bit_removal noise shaping function, enabled with the -shaping parameter. As yet, I haven't re-calculated the reference_threshold values, nor have I included any dithering - I need advice from someone with more of a clue than myself. I'm going to re-read the wikipedia article and a PDF I found.

lossyWAV beta v0.5.6 attached. Superseded.

Code: [Select]
lossyWAV beta v0.5.6 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme settings [4xFFT] (-cbs 512 -nts -2.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default settings [3xFFT] (-cbs 512 -nts +1.5 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact settings [2xFFT] (-cbs 512 -nts +6.0 -skew 36 -snr 21
              -spf 22235-22236-22347-22358-2246C -fft 10001)

Standard Options:

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Specific Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-shaping      enable fixed shaping using bit_removal difference of previous
              samples [value = brd(-1)/(2^1)+brd(-2)/(2^2)+...+brd(-n)/(2^n)];
              default=off
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (characters must be one of
              1 to 9 and A to F; zero excluded).
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailed output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Please have a listen to this and let me know......
Title: lossyWAV Development
Post by: halb27 on 2007-12-06 13:55:50
...
When considering noise shaping: into what frequency range do you want to put the noise?
Ah, that's the problem - I don't yet know how to determine that.

Well, I can't contribute other than with these thoughts:
Usually noise would be most welcome IMO in the frequency range > 16 kHz, because we're not sensitive there. But with our approach of nts +6, which means reduced control in the high frequency area, this may be a bit dangerous. Maybe noise in the > 18 kHz region could do it, but for the sake of tweeters it may be wise to do only mild noise shaping.
Maybe doing it the other way around (noise in the < 3 kHz range) is the better way to go, because we have better control there thanks to the work of -skew, -snr and -spf.

Anyway, as you have already provided features like -snr and a positive -nts value, which have proven very advantageous, I have full confidence you will arrive at a good result.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-06 14:55:28
Anyway, as you have already provided features like -snr and a positive -nts value, which have proven very advantageous, I have full confidence you will arrive at a good result.
I tried -3 -shaping -snr 18 -nts 15 and got 31.28MB / 352.9kbps for my 53 sample set - quite reasonable on my DAP.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-07 08:10:59
I tried -3 -shaping -snr 18 -nts 15 and got 31.28MB / 352.9kbps for my 53 sample set - quite reasonable on my DAP.
Artist - Album / FLAC / lossyFLAC -2 / lossyFLAC-3 / lossyFLAC -3 & FLAC -3 -e -m -r 2 -b 512 / lossyFLAC -3 -shaping -snr 21 -nts 15 & FLAC -3 -e -m -r 2 -b 512:

Code: [Select]
AC/DC - Dirty Deeds Done Dirt Cheap    / 781kbps / 398kbps / 331kbps / 332kbps / 294kbps
B52's - Good Stuff                     / 993kbps / 408kbps / 361kbps / 362kbps / 329kbps
David Byrne - Uh-Oh                    / 937kbps / 398kbps / 344kbps / 345kbps / 315kbps
Fish - Songs From The Mirror           / 854kbps / 384kbps / 336kbps / 336kbps / 306kbps
Gerry Rafferty - City To City          / 802kbps / 400kbps / 338kbps / 338kbps / 300kbps
Iron Maiden - Can I Play With Madness  / 784kbps / 422kbps / 371kbps / 372kbps / 334kbps
Jean Michel Jarre - Oxygene            / 773kbps / 454kbps / 372kbps / 377kbps / 316kbps
Marillion - The Thieving Magpie        / 790kbps / 404kbps / 344kbps / 344kbps / 307kbps
Mike Oldfield - Tr3s Lunas             / 848kbps / 421kbps / 365kbps / 366kbps / 322kbps
Scorpions - Best Of Rockers N' Ballads / 922kbps / 421kbps / 354kbps / 354kbps / 318kbps

So, overall an average of 850kbps / 410kbps / 351kbps / 351kbps / 314kbps

Also, Mitch 1 2 has indicated that values of -nts in excess of 15 are acceptable while maintaining the -snr 21 value.
[edit] Using -3 -nts 48 -snr 21 -shaping, I get 363.6kbps on my 53 sample set. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-12-07 08:29:04
Sounds very interesting, but I'd really like to know a bit about where the noise goes.

I also have a problem about the target and how it fits into what we have so far. So far we have the quality targets -3, -2, and -1 which should all be transparent (-1 in an overkill sense, -2 in a sense with a certain but not overkill safety margin, -3 with only a minor safety margin).
What's the target when using noise-shaping?
Most important:
Do we still want transparency with a certain though small safety margin (equivalent to: should we use it as -3 with the current meaning of -3)?
Or do we want to have something like -4, which should be transparent nearly all the time but is allowed to be non-transparent, though only in an acceptable way, on rare occasions?
Or should we use the meaning I just described for a potential -4 for our final -3, and readjust the internal details of -2 and -1 so that the new -2 is somewhere between the current -3 and -2, and the new -1 is somewhere between the current -2 and -1?
Title: lossyWAV Development
Post by: Nick.C on 2007-12-07 09:02:01
Sounds very interesting, but I'd really like to know a bit about where the noise goes.

I also have a problem about the target and how it fits into what we have so far. So far we have the quality targets -3, -2, and -1 which should all be transparent (-1 in an overkill sense, -2 in a sense with a certain but not overkill safety margin, -3 with only a minor safety margin).
What's the target when using noise-shaping?
Most important:
Do we still want transparency with a certain though small safety margin (equivalent to: should we use it as -3 with the current meaning of -3)?
Or do we want to have something like -4, which should be transparent nearly all the time but is allowed to be non-transparent, though only in an acceptable way, on rare occasions?
Or should we use the meaning I just described for a potential -4 for our final -3, and readjust the internal details of -2 and -1 so that the new -2 is somewhere between the current -3 and -2, and the new -1 is somewhere between the current -2 and -1?
I know what you mean about wanting to know where the noise goes.

I am hoping that you will / have had a play about with the -shaping parameter and also -snr / -nts to find a good compromise.

If we can get transparency using -shaping on the existing problem samples below the current -3 bitrate, then I think that we should revise -3. If not, then I would not be averse to the introduction of a carefully crafted -4 quality setting which would be "very-nearly-transparent-on-problem-samples" if there was a noticeable reduction in bitrate compared to the existing -3.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2007-12-07 10:03:53
@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening, I've found that I can't hear any difference between files processed with this setting and the originals. Nick also found that he couldn't hear the difference. The -snr 21 default setting seems to be preventing audible distortion, even with the maximum nts value.
Title: lossyWAV Development
Post by: halb27 on 2007-12-07 11:18:27
@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening, I've found that I can't hear any difference between files processed with this setting and the originals. Nick also found that he couldn't hear the difference. The -snr 21 default setting seems to be preventing audible distortion, even with the maximum nts value.

Yes I will.
'Casually listening'? That was exactly my question about what we should target.
I'll hold that back for the moment and just try it to see what it sounds like.
-nts 48 is a huge value, though even without noise shaping -nts 20 isn't really bad except for bad samples.
Title: lossyWAV Development
Post by: halb27 on 2007-12-07 18:58:50
@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening ...

Oops, I didn't read carefully: I missed 'without noise shaping'.
Anyway, I did listen carefully to my regular sample set using -3 -nts 48, and to me too the quality is OK. I even did some abxing on several spots and couldn't find a difference.
But it's different with spots that are hard to encode: I already proved that for -nts 30 and -nts 20. And my regular set yielded 320 kbps with -3 -nts 48.
If we allow for really bad results, even on rare occasions, we're better off using vorbis, aac, mpc, or mp3 in the 200- kbps range.

Anyway I'll test the -shaping version.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-07 19:57:44
@halb27:

Would you mind testing out lossyWAV -3 -nts 48 (without noise shaping) on your test set? Casually listening ...
Oops, I didn't read carefully: I missed 'without noise shaping'.
Anyway, I did listen carefully to my regular sample set using -3 -nts 48, and to me too the quality is OK. I even did some abxing on several spots and couldn't find a difference.
But it's different with spots that are hard to encode: I already proved that for -nts 30 and -nts 20. And my regular set yielded 320 kbps with -3 -nts 48.
If we allow for really bad results, even on rare occasions, we're better off using vorbis, aac, mpc, or mp3 in the 200- kbps range.

Anyway I'll test the -shaping version.
I don't think that we want to go into the <300kbps range for normal music - there's plenty of good quality competition there. Thinking about it, I don't really want to implement a -4 quality setting if it's going to let through artefacts. Going back to the beginning, the stated aim is transparency for all quality settings.

That said, I've been playing with -3 -shaping -nts 18 -snr 18 and I can't notice any problems at all.

Maybe a reasonable target for a -shaping setting would be a bitrate slightly below the existing -3 setting.
Title: lossyWAV Development
Post by: halb27 on 2007-12-07 21:17:18
I tried -shaping -snr 18 -nts 15.
My regular sample set was encoded with an average bitrate of 308 kbps, and for my problem sample essence it was 425 kbps.
With the problem sample set I can easily abx keys_1644ds (9/10). I also have the suspicion that furious and utb aren't totally transparent, but I'm not the one who can prove it.
So far not so bad.

Then I decided to listen to some regular music, and already with the very first track (Blackbird, Yesterday from The Beatles: Love, sec. 31.2-34.4) things didn't sound fine to me. I tried to abx it and got to 7/7, then 8/10.
It's an inaccuracy with the voice, so I don't think noise shaping moves noise into the high frequency range.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-07 21:21:10
I tried -shaping -snr 18 -nts 15.
My regular sample set was encoded with an average bitrate of 308 kbps, and for my problem sample essence it was 425 kbps.
With the problem sample set I can easily abx keys_1644ds (9/10). I also have the suspicion that furious and utb aren't totally transparent, but I'm not the one who can prove it.
So far not so bad.

Then I decided to listen to some regular music, and it was the very first track (Blackbird, Yesterday from The Beatles: Love, sec. 31.2-34.4) that didn't sound fine to me. I tried to abx it and got 7/7, then 8/10.
It's an inaccuracy with the voice, so I don't think noise shaping moves noise into the high frequency range.
Okay, back to the drawing board with -shaping then.... I'll need to research fixed shaping coefficients. Thanks for the listening time.

[edit] lossyWAV beta v0.5.7 attached: Superseded. modified (even simpler) noise shaping feedback function. [/edit]
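The post doesn't show the "even simpler" feedback function itself, and lossyWAV is written in Delphi; purely as an illustrative sketch in Python, a first-order error-feedback quantizer (the simplest noise shaper of this kind) might look like:

```python
def quantize_error_feedback(samples, bits_to_remove):
    """First-order error-feedback quantizer: the rounding error made on
    each sample is subtracted from the next one, which pushes the
    quantization noise toward higher frequencies (a (1 - z^-1) high-pass
    shape). Illustrative sketch only, not lossyWAV's actual code."""
    step = 1 << bits_to_remove          # quantization step in LSBs
    out = []
    err = 0                             # rounding error of previous sample
    for x in samples:
        target = x - err                # feed back the previous error
        q = round(target / step) * step # round onto the coarser grid
        err = q - target                # error made on this sample
        out.append(q)
    return out
```

Note how a constant input dithers between the two neighbouring grid values so the average level is preserved, which is exactly the behaviour plain rounding lacks.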
Title: lossyWAV Development
Post by: halb27 on 2007-12-08 08:00:59
My results for v0.5.7 -snr 18 -nts 15 -shaping:

My regular sample set gets at a very good 293 kbps.
The Blackbird, Yesterday problem has gone for me, and I don't have a suspicion on furious any more.
However I abxed utb 7/8 (and ended up 7/10), eig 7/10, and bruhns 9/10. There's a rather strong inaccuracy with bruhns at ~sec. 7.

Other than that I agree with you, Nick, that we should have only 3 quality presets, and -3 should be transparent to the best of our experience when we go final. If we really end up with a final average bitrate of ~300 kbps for -3, I personally see no need to talk about a security margin for -3.
If we really arrive at that bitrate for -3, we should readjust -2 and -1 IMO: -2 being near the current -3 but a little more demanding, and -1 being more where -2 is now (but definitely with -nts <= 0).

Just an idea: you do static noise shaping right now, and the noise shaping machinery is supposed to be simple, that is, it shifts noise up or down in frequency. In the case of shifting up: wouldn't it be more or less equivalent (or maybe at least a clearer approach) to allow for a weakened noise threshold in the 12+ kHz range?
Title: lossyWAV Development
Post by: Nick.C on 2007-12-08 10:16:31
My results for v0.5.7 -snr 18 -nts 15 -shaping:

My regular sample set gets at a very good 293 kbps.
The Blackbird, Yesterday problem has gone for me, and I don't have a suspicion on furious any more.
However I abxed utb 7/8 (and ended up 7/10), eig 7/10, and bruhns 9/10. There's a rather strong inaccuracy with bruhns at ~sec. 7.

Other than that I agree with you, Nick, that we should have only 3 quality presets, and -3 should be transparent to the best of our experience when we go final. If we really end up with a final average bitrate of ~300 kbps for -3, I personally see no need to talk about a security margin for -3.
If we really arrive at that bitrate for -3, we should readjust -2 and -1 IMO: -2 being near the current -3 but a little more demanding, and -1 being more where -2 is now (but definitely with -nts <= 0).

Just an idea: you do static noise shaping right now, and the noise shaping machinery is supposed to be simple, that is, it shifts noise up or down in frequency. In the case of shifting up: wouldn't it be more or less equivalent (or maybe at least a clearer approach) to allow for a weakened noise threshold in the 12+ kHz range?
I still don't really know where the noise is going. If you could try -3 -shaping -snr 21 -nts 15, I feel that this may be better. Selective -nts parameter for the bin in which the min value is found is possible to implement - I'll have a think and revert tonight.
Title: lossyWAV Development
Post by: [JAZ] on 2007-12-08 13:12:17
I got a bit lost lately with the addition of "noise shaping", so I'm going to give my thoughts, in case any of them are useful:

Usual objective of noise shaping: reduce the effect (noise) of a produced artifact (usually when applying dither), changing (shaping) it from white noise (flat) to a curve that is less perceptible.

lossyWav tries to reduce the bitdepth of a portion of audio, so that the lossless encoder can benefit and reduce the bitrate demands.
Right now, lossyWAV works like this: it runs different FFTs to verify the requirements at different resolutions, can define a minimum signal to (quantization) noise margin, and has the skew function to correct a misinterpretation of the FFT analysis. I have to admit that I don't completely know what the -spf function does (does it affect the analysis, or the generated audio?), and now there is the noise shaping function to reduce the artifacts in some bad cases.

Context of noise shaping within lossyWAV: lossyWAV artefacts are the consequence of a reduced SNR caused by the bitdepth reduction. This amounts to quantizing a signal, which arguably should be dithered. That would mean applying noise shaping in the context of dither, and as such, noise shaping determines where the dither noise goes.

Consequences: Two things to have in mind:
a) noise shaping exploits the fact that we're less sensitive to higher frequencies, but the lower the bitdepth, the lower the SNR is.
b) noise is generally hard to compress losslessly, and more so, in the higher frequencies.

From the above: it should be used only where the engine detects that the lowest signal (in the block being processed) is too low compared to the quantization level, while ensuring the bitdepth is not too small (I recall reading here that applying noise shaping at 8 bits is already not recommended).


Conclusion: I believe that right now you are only doing dither, not noise shaping. Shaping is the output of a filter with white noise as input, similar to a notch/band filter (at least, the way I understand it). You should find out whether the problems you're trying to fix are really in soft signals, or in strong signals.
If the latter, then the problem really is too small bitdepth, and there you should not apply noise shaping.
Title: lossyWAV Development
Post by: SebastianG on 2007-12-08 14:00:40
Okay, back to the drawing board with -shaping then.... I'll need to research fixed shaping coefficients. Thanks for the listening time.

Ok, I see there's an increased interest in using noise shaping. I'm not sure whether I understand what you are trying to do and why -- and please forgive me for not following the discussion too closely. To be honest, it looks a bit like groping in the dark. In case you have an idea of what "noise shaping" is actually supposed to do in your case, I might be able to show you how it could be implemented. In case you don't, you might wanna try my very first suggestion (see the first page of 2B's lossy flac thread).

Cheers!
SG
Title: lossyWAV Development
Post by: Nick.C on 2007-12-08 14:49:51
Okay, back to the drawing board with -shaping then.... I'll need to research fixed shaping coefficients. Thanks for the listening time.
Ok, I see there's an increased interest in using noise shaping. I'm not sure whether I understand what you are trying to do and why -- and please forgive me for not following the discussion too closely. To be honest, it looks a bit like groping in the dark. In case you have an idea of what "noise shaping" is actually supposed to do in your case, I might be able to show you how it could be implemented. In case you don't, you might wanna try my very first suggestion (see the first page of 2B's lossy flac thread).

Cheers!
SG
I'd be the first to admit that I'm groping in the dark when it comes to noise shaping. Thanks for the reminder about your post on the first page of the original thread. I'll go and re-read and try to interpret / formulate an algorithm to enable implementation.

I really do need help with noise shaping, I'm a noob when it comes to audio processing - the offer is very welcome SebastianG!

@[JAZ]: Nothing apart from the window function (Hanning) affects the FFT analyses themselves. Any other parameters, i.e. -skew, -spf, -snr & -nts, modify the process of taking the FFT output and working out the "lowest" signal for that particular analysis. At the moment, there is no dither in the process at all, only rounding on bit-reduction.

What I'm really looking for is, as has been said above, a method of shifting the noise to the >16kHz band.

Any aid in comprehending this difficult topic would be greatly appreciated.
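Since Nick states above that the current process applies no dither at all, only rounding on bit-reduction, that operation can be sketched as follows. This is an illustrative Python version under that description, not lossyWAV's Delphi code:

```python
def reduce_bits(sample, bits_to_remove):
    """Plain rounding bit-reduction: the lowest `bits_to_remove` bits of
    an integer PCM sample are rounded away (half away from zero), with
    no dither applied. The result is a multiple of 2**bits_to_remove,
    which is what lets a lossless codec store the block more cheaply.
    Sketch only, based on the behaviour described in the thread."""
    step = 1 << bits_to_remove
    if sample >= 0:
        return ((sample + step // 2) // step) * step
    return -(((-sample + step // 2) // step) * step)
```

Because every output value shares `bits_to_remove` trailing zero bits, the quantization error (up to half a step per sample) is exactly the added noise the -nts and -snr thresholds are trying to keep inaudible.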
Title: lossyWAV Development
Post by: halb27 on 2007-12-08 20:27:53
We have a problem.

I tried v0.5.7 -snr 21 -nts 15 -shaping according to your proposal, Nick. It yields 318 kbps with my regular set and 444 kbps with my problem essence set.
I listened to the beginning of Blackbird, Yesterday, and quality was very fine to me.
I started to try to abx the problems from my last test, and used eig as the first example. abx result was 9/10.

For a comparison I also tried to abx plain -3. Now used to the kind of problem (smearing, kind of an echo) I was able to abx plain -3 9/10 as well, though to me quality is better.

I think we should fix this before continuing the noise shaping way.
I am not very sensitive to temporal resolution problems, so it would be very kind if somebody could help testing lossyWAV on samples known to be pre-echo prone to mp3 etc.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-08 21:58:05
We have a problem.

I tried v0.5.7 -snr 21 -nts 15 -shaping according to your proposal, Nick. It yields 318 kbps with my regular set and 444 kbps with my problem essence set.
I listened to the beginning of Blackbird, Yesterday, and quality was very fine to me.
I started to try to abx the problems from my last test, and used eig as the first example. abx result was 9/10.

For a comparison I also tried to abx plain -3. Now used to the kind of problem (smearing, kind of an echo) I was able to abx plain -3 9/10 as well, though to me quality is better.

I think we should fix this before continuing the noise shaping way.
I am not very sensitive to temporal resolution problems, so it would be very kind if somebody could help testing lossyWAV on samples known to be pre-echo prone to mp3 etc.
Bummer...... I agree that -shaping may need to wait until we have ironed out the problems with this newly discovered artefact. It may be possible that it can be dealt with using existing settings; however, I am beginning to think that removing -dither was not the best idea I've ever had.

Out of interest, how does -3 -shaping sound with this smearing sample? [edit2] More importantly, does the smearing exist at vanilla -2? [/edit2]

[edit] And a (very) belated thank you to jesseg for taking the time to come up with an icon. I love the rows of bits - the font is a bit too curly for my taste, but I'll go with the consensus opinion. I still haven't worked out how to change the default console icon in Delphi though...... [/edit]
Title: lossyWAV Development
Post by: halb27 on 2007-12-09 11:27:39
v0.5.7 -2: eig is fine to me.
v0.5.7 -3 -nts 0: some partial results and the final result: 4/5, 5/7, 6/8, 6/10. So though I couldn't abx it according to the final result, it is probably not transparent. But as the effect is extremely subtle to me, and respecting the very special nature of this artificial sample, I personally can live with it for -3. But a lot more experience with potential temporal resolution problems is most welcome (for instance with the castanets sample and other percussion instruments).
v0.5.7 -3 -shaping: eig is ok to me, but testing bruhns I got at 9/10. As the problem to me sounds like an artefact in the HF range I guess shifting noise up is working.

I don't want to spoil the party, but for the next period of time (never say never) I'm not in the mood for testing noise shaping.

As for the classical non-noise-shaping approach, we should try to work out -nts, -skew, -snr, -fft, and -spf default values by moving the current values to more defensive ones, making samples like eig sound transparent.
I'll take my part in this.
Title: lossyWAV Development
Post by: sundance on 2007-12-09 13:01:16
Nick.C,
Quote
I still haven't worked out how to change the default console icon in Delphi though......

In Delphi 7 it's "Project | Options | Application" to assign a custom icon. Hope you'll find it, since this is from a German version of Delphi 7.

.sundance.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-09 13:10:55
Nick.C,
Quote
I still haven't worked out how to change the default console icon in Delphi though......

In Delphi 7 it's "Project | Options | Application" to assign a custom icon. Hope you'll find it, since this is from a German version of Delphi 7.

.sundance.
Thanks sundance - I'll try that this evening.

@Halb27: I think there might be some benefit in reducing the C at the end of the 1024 fft spf to, say, 9, to reduce the number of bins being averaged.

It may be that a more conservative approach to HF spreading will allow -shaping to become more useful.
Title: lossyWAV Development
Post by: jesseg on 2007-12-10 06:24:32
well, i got bored again...



[edit]
click here to see it on different colored backgrounds (http://ictybtihky.com/lossywav/)
all of those are actually the same exact PNG file as the one i put in this post. 
[/edit]

[edit2]
here's the logo, "naked", if you wanna see it alone.

(http://ictybtihky.com/lossywav/logo2_2_bare.png)

[/edit2]
Title: lossyWAV Development
Post by: halb27 on 2007-12-10 09:00:26
...@Halb27: I think there might be some benefit in reducing the C at the end of the 1024 fft spf to, say, 9, to reduce the number of bins being averaged.

IMO that's the right direction, and I did first trials, not with the 1024 sample FFT but with the 64 sample FFT, whose resolution is fine IMO for judging the highest frequency zones and whose good time resolution may be essential for samples like eig. So far I've seen that the second highest frequency zone is most important for eig. 22225 yields quite a good though not perfectly transparent result. I'm pretty busy now but I'll try whether 22224 (as in -2) will improve things. But I guess we'll also have to come down a bit from -nts +6. We'll see.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-10 09:34:34
...@Halb27: I think there might be some benefit in reducing the C at the end of the 1024 fft spf to, say, 9, to reduce the number of bins being averaged.
IMO that's the right direction, and I did first trials, not with the 1024 sample FFT but with the 64 sample FFT, whose resolution is fine IMO for judging the highest frequency zones and whose good time resolution may be essential for samples like eig. So far I've seen that the second highest frequency zone is most important for eig. 22225 yields quite a good though not perfectly transparent result. I'm pretty busy now but I'll try whether 22224 (as in -2) will improve things. But I guess we'll also have to come down a bit from -nts +6. We'll see.
I've tried -3 -spf 22234-22235-22346-22357-22468 and it raises the bitrate to 412.3kbps for my sample set. It takes about 0.1bits off the number removed from eig.
Title: lossyWAV Development
Post by: halb27 on 2007-12-10 11:48:34
I've tried -3 -spf 22234-22235-22346-22357-22468 and it raises the bitrate to 412.3kbps for my sample set. It takes about 0.1bits off the number removed from eig.

My trial yesterday was with -3 -spf 22225-22235-22346-22357-224FF and bits to remove for eig went down significantly (~ 1 bit in the critical first seconds). So I think the 2nd highest frequency zone is essential here, maybe the highest zone as well. Average bitrate of regular music did not go up significantly btw.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-10 14:02:29
I've tried -3 -spf 22234-22235-22346-22357-22468 and it raises the bitrate to 412.3kbps for my sample set. It takes about 0.1bits off the number removed from eig.

My trial yesterday was with -3 -spf 22225-22235-22346-22357-224FF and bits to remove for eig went down significantly (~ 1 bit in the critical first seconds). So I think the 2nd highest frequency zone is essential here, maybe the highest zone as well. Average bitrate of regular music did not go up significantly btw.
You're right - it does indeed bring down eig quite a lot. How about a combination: 22225-22235-22346-22357-22468? This yields 414.0kbps for my 53 sample set.
Title: lossyWAV Development
Post by: halb27 on 2007-12-10 20:36:01
I tried eig again using -3 -spf 22224-22236-22347-22358-2246C thus being more demanding with the two highest frequency zones at the 64 sample FFT. Now I can't abx eig any more.
But this is pretty much on the cutting edge for my listening experience, and I'm sure there are a lot of people out there with a better sensitivity towards temporal resolution problems. So I suggest we reduce the positive nts values and use -nts +3 for -3, and -nts 0 for -2.

-3 -spf 22224-22236-22347-22358-2246C -nts 3 yields 375 kbps with my regular set, -2 -nts 0 yields 422 kbps.

To me this is still a very good result, not far from the results of the current setting, and it brings us back a considerable amount towards the solid basis where in theory -nts should be 0.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-10 21:23:27
I tried eig again using -3 -spf 22224-22236-22347-22358-2246C thus being more demanding with the two highest frequency zones at the 64 sample FFT. Now I can't abx eig any more.
But this is pretty much on the cutting edge for my listening experience, and I'm sure there are a lot of people out there with a better sensitivity towards temporal resolution problems. So I suggest we reduce the positive nts values and use -nts +3 for -3, and -nts 0 for -2.

-3 -spf 22224-22236-22347-22358-2246C -nts 3 yields 375 kbps with my regular set, -2 -nts 0 yields 422 kbps.

To me this is still a very good result, not far from the results of the current setting, and it brings us back a considerable amount towards the solid basis where in theory -nts should be 0.
I will happily agree the -spf parameter for -3.

As you can't abx any of the problem samples using -3 -spf 22224-22236-22247-22358-2246C -nts +6.0, I feel that a reduction of 3dB (0.5 bits potentially) for the -nts parameter is a bit too much.

-3 -spf 22224-22236-22347-22358-2246C -nts +3.0 results in 433.5kbps for my sample set and changing the +3.0 to +4.5 results in 422.7kbps for my sample set.

So, I suggest we use -spf 22224-22236-22347-22358-2246C -nts +4.5 for quality preset -3.
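To make the -spf values being traded here easier to read: the argument packs five spreading functions, one per FFT length, as five hex digits each. A small Python sketch of how such a string might be decoded (the layout is inferred from this thread and the program's help text, not taken from lossyWAV's source):

```python
FFT_LENGTHS = [64, 128, 256, 512, 1024]  # the five analysis FFT sizes

def parse_spf(spf):
    """Decode an -spf string like '22224-22236-22347-22358-2246C' into a
    dict mapping FFT length -> five per-zone spreading widths. Each hex
    digit (1..F, zero excluded) is the number of FFT bins averaged in
    that frequency zone; larger digits smooth over more bins and so are
    less sensitive to narrow peaks. Illustrative sketch only."""
    groups = spf.split('-')
    if len(groups) != 5 or any(len(g) != 5 for g in groups):
        raise ValueError("expected 5 groups of 5 hex characters")
    table = {}
    for fft_len, g in zip(FFT_LENGTHS, groups):
        widths = [int(ch, 16) for ch in g]
        if 0 in widths:
            raise ValueError("zero is not a valid spreading width")
        table[fft_len] = widths
    return table
```

Under this reading, halb27's change from 2246C to 22236 on the 128-sample FFT tightens the two highest zones, which is why it raises the bitrate on transient-heavy samples like eig.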
Title: lossyWAV Development
Post by: halb27 on 2007-12-10 21:49:08
As you can't abx any of the problem samples using -3 -spf 22224-22236-22247-22358-2246C -nts +6.0, I feel that a reduction of 3dB (0.5 bits) for the -nts parameter is a bit too much.

-3 -spf 22224-22236-22347-22358-2246C -nts +3.0 results in 433.5kbps for my sample set and changing the +3.0 to +4.5 results in 422.7kbps for my sample set.

So, I suggest we use -spf 22224-22236-22347-22358-2246C -nts +4.5 for quality preset -3.

For a real-life impression, why don't you take a restricted selection of full-length tracks? If bitrate goes up for problematic tracks like those in your sample set, this is welcome. It's not so welcome, of course, with regular music.
My regular set consists of just 12 full-length tracks of various musical directions, so I can get an impression very fast. I know from posted experience that my 375 kbps result is a bit low compared to other musical mixtures reported, but the difference isn't a big one. From this experience I think it's safe to say that when the result for my regular set is 375 kbps, the overall average bitrate is ~380 kbps.
Sure ~380 kbps is a bit more than the ~350 kbps of the current -3 setting, but it's not by much IMO.
If it comes down to deciding between -nts +3 and -nts +4.5, the difference is even smaller.
The reason why I dislike a rather small lowering from +6 to +4.5 is that I think small -nts steps don't have a significant effect. This is partly due to my listening experience with insane positive nts values.
So I think in order to have a significant quality effect we shouldn't consider a delta lower than 3 for nts.
Not for the sake of saving ~15 kbps.
Of course this is because if in doubt I want to play it safe. Just my attitude towards it.
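As background for the dB figures being traded in this exchange: each bit of quantization depth is worth about 6.02 dB of SNR, which is where "3 dB is roughly half a bit" comes from. A quick Python check of that rule of thumb (a general dB-to-bits conversion, not lossyWAV's exact internal maths):

```python
import math

# One bit of quantization resolution corresponds to 20*log10(2) dB of SNR,
# i.e. about 6.02 dB per bit.
DB_PER_BIT = 20 * math.log10(2)

def nts_delta_in_bits(delta_db):
    """Convert a change of the -nts threshold (in dB) to the approximate
    change in bits-to-remove headroom. Rule-of-thumb sketch only."""
    return delta_db / DB_PER_BIT
```

So lowering -nts from +6 to +4.5 (1.5 dB) is about a quarter of a bit, while a 3 dB step is about half a bit, matching the "(0.5 bits)" aside above.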
Title: lossyWAV Development
Post by: Nick.C on 2007-12-10 21:59:35
As you can't abx any of the problem samples using -3 -spf 22224-22236-22247-22358-2246C -nts +6.0, I feel that a reduction of 3dB (0.5 bits) for the -nts parameter is a bit too much.

-3 -spf 22224-22236-22347-22358-2246C -nts +3.0 results in 433.5kbps for my sample set and changing the +3.0 to +4.5 results in 422.7kbps for my sample set.

So, I suggest we use -spf 22224-22236-22347-22358-2246C -nts +4.5 for quality preset -3.

For a real-life impression, why don't you take a restricted selection of full-length tracks? If bitrate goes up for problematic tracks like those in your sample set, this is welcome. It's not so welcome, of course, with regular music.
My regular set consists of just 12 full-length tracks of various musical directions, so I can get an impression very fast. I know from posted experience that my 375 kbps result is a bit low compared to other musical mixtures reported, but the difference isn't a big one. From this experience I think it's safe to say that when the result for my regular set is 375 kbps, the overall average bitrate is ~380 kbps.
Sure ~380 kbps is a bit more than the ~350 kbps of the current -3 setting, but it's not by much IMO.
If it comes down to deciding between -nts +3 and -nts +4.5, the difference is even smaller.
The reason why I dislike a rather small lowering from +6 to +4.5 is that I think small -nts steps don't have a significant effect. This is partly due to my listening experience with insane positive nts values.
So I think in order to have a significant quality effect we shouldn't consider a delta lower than 3 for nts.
Not for the sake of saving ~15 kbps.
Of course this is because if in doubt I want to play it safe. Just my attitude towards it.
Tomorrow morning I'll process the 10 albums previously used for bitrate comparison using your proposal for the -3 quality preset.

I had a thought - if we end up with, say, 380kbps then that's still a bit less than OGG q 10 (circa 400kbps) and I'm not worried as it will be only about 60kbps above the upper bitrate limit for standard MP3.

I would be content with that.

Overall I am more concerned with the quality of the processed output than I am with the bitrate.

As an aside, I recently ordered a 16GB compact flash card (to go with my 3 x 4GB SD cards) for my iPAQ - lots more space for my .lossy.FLAC collection! Combined with Mortplayer using GSPFLAC.DLL it's working well.

[edit] Corrected OGG max bitrate [/edit]
Title: lossyWAV Development
Post by: Nick.C on 2007-12-11 14:02:49
I'll process the 10 albums previously used for bitrate comparison using your proposal for the -3 quality preset. ........
lossyWAV beta v0.5.8 attached: Superseded.

Modified -1, -2 & -3 quality presets.
Code: [Select]
lossyWAV beta v0.5.8 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-1            extreme settings [4xFFT] (-cbs 512 -nts -3.0 -skew 36 -snr 21
              -spf 22224-22225-11235-11246-12358 -fft 11011)
-2            default settings [3xFFT] (-cbs 512 -nts  0.0 -skew 36 -snr 21
              -spf 22224-22235-22346-12347-12358 -fft 10101)
-3            compact settings [2xFFT] (-cbs 512 -nts +3.0 -skew 36 -snr 21
              -spf 22224-22235-22347-22358-2246C -fft 10001)

Standard Options:

-o <folder>   destination folder for the output file
-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+48.0dB)
              (-ve values reduce bits to remove, +ve values increase)
-force        forcibly over-write output file if it exists; default=off

Codec Specific Options:

-wmalsl       optimise internal settings for WMA Lossless codec; default=off

Advanced / System Options:

-shaping      enable fixed shaping using bit_removal difference of previous
              samples [value = brd(-1)/4]; default=off
-snr <n>      set minimum average signal to added noise ratio to n dB;
              (-215.0dB<=n<=48.0dB) Increasing value reduces bits to remove.
-skew <n>     skew fft analysis results by n dB (0.0db<=n<=48.0db) in the
              frequency range 20Hz to 3.45kHz
-spf <5x5hex> manually input the 5 spreading functions as 5 x 5 characters;
              These correspond to FFTs of 64, 128, 256, 512 & 1024 samples;
              e.g. 22235-22236-22347-22358-2246C (Characters must be one of
              1 to 9 and A to F (zero excluded).
-fft <5xbin>  select fft lengths to use in analysis, using binary switching,
              from 64, 128, 256, 512 & 1024 samples, e.g. 01001 = 128,1024
-cbs <n>      set codec block size to n samples (512<=n<=4608, n mod 32=0)

-quiet        significantly reduce screen output
-nowarn       suppress lossyWAV warnings
-detail       enable detailled output mode

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it in Delphi.
Dr. Jean Debord for the use of TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.
Summary of bitrates for 10 album test set.
Code: [Select]
 Conversion using lossyWAV beta v0.5.8, FLAC -8
|=======================================|=========|=========|=========|=========|
|Album                                  | FLAC -8 |  lW -1  |  lW -2  |  lW -3  |
|=======================================|=========|=========|=========|=========|
|AC/DC - Dirty Deeds Done Dirt Cheap    | 781kbps | 468kbps | 417kbps | 366kbps |
|B52's - Good Stuff                     | 993kbps | 476kbps | 421kbps | 376kbps |
|David Byrne - Uh-Oh                    | 937kbps | 464kbps | 413kbps | 363kbps |
|Fish - Songs From The Mirror           | 854kbps | 451kbps | 399kbps | 357kbps |
|Gerry Rafferty - City To City          | 802kbps | 468kbps | 416kbps | 366kbps |
|Iron Maiden - Can I Play With Madness  | 784kbps | 486kbps | 437kbps | 387kbps |
|Jean Michel Jarre - Oxygene            | 773kbps | 538kbps | 475kbps | 422kbps |
|Marillion - The Thieving Magpie        | 790kbps | 473kbps | 421kbps | 373kbps |
|Mike Oldfield - Tr3s Lunas             | 848kbps | 491kbps | 436kbps | 389kbps |
|Scorpions - Best Of Rockers N' Ballads | 922kbps | 492kbps | 437kbps | 378kbps |
|=======================================|=========|=========|=========|=========|
|Average                                | 850kbps | 480kbps | 426kbps | 376kbps |
|=======================================|=========|=========|=========|=========|
|53 sample "problem" set                | 784kbps | 543kbps | 491kbps | 434kbps |
|=======================================|=========|=========|=========|=========|
Title: lossyWAV Development
Post by: halb27 on 2007-12-11 14:50:32
Thank you, Nick.
Title: lossyWAV Development
Post by: UED77 on 2007-12-11 17:40:21
For all of the developers involved with this project, I'd first like to compliment you for coming up with such a wonderful idea. I've been tracking this thread for a while, and am greatly impressed by the progress.

My question relates to the identification of lossyWAV files. I'm sure it's a lot of [unnecessary?] hassle, but it might be advantageous to somehow note in a RIFF chunk the fact that the file is a lossyWAV file, and perhaps include a note of the settings used to create it. APEv2 tags are also an option, though I hear [I have not confirmed this personally] that some software has trouble with APE tags at the end of RIFF files. RIFF mechanisms would be preferable, as this information would be stored by some codecs (WavPack for sure, others?) and could be re-pasted into a file when uncompressed.

Yes, I am aware that lossily compressed audio can be decompressed to WAV files without the WAV file being tagged in any special way, but if some tagging mechanism that would be performed by LossyWAV were in effect, it would be immediately obvious that the file is not losslessly compressed.

Then again, if it's more trouble than it's worth, then forget it, but if it's doable, it would be a nice feature to have. In an ideal world, lossless encoding tools could even read this header/footer/info tag and adjust blocksizes accordingly, allowing the user to get efficient results without personally knowing that the file has been pre-processed.

Regards,

UED77

[Edit: I just browsed back a couple pages to some posts I've previously missed and spotted a discussion about the possibility of including checksums. Unfortunately, I found the response to that a bit complicated to understand, so I don't know if it was ruled feasible or not.]
Title: lossyWAV Development
Post by: Nick.C on 2007-12-11 22:11:14
.... but it might be advantageous to somehow note in a RIFF chunk the fact that the file is a LossyWAV file, and perhaps include a note of the settings used to create the file....

....spotted a discussion about the possibility of including checksums....
Thanks for the input and appreciation. We're having fun with this project!

When I work out how the WAV format works (Halb27 did all the difficult bit by writing the WAV I/O routines) I'll try to add a check for a relevant chunk with lossyWAV data in it and if none exists, create one with something like "lossyWAV <version> <quality setting> <list of other settings that were actually used> <CRC32 of output samples>"
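As a sketch of what such a chunk could look like: a RIFF chunk is just a four-character id, a little-endian 32-bit size, and the payload. The Python below builds a hypothetical chunk of that shape; the 'lsyw' id, the field layout, and the text format are all assumptions for illustration, since no such chunk is defined by lossyWAV at this point:

```python
import struct
import zlib

def make_lossywav_chunk(version, settings, pcm_bytes):
    """Build a hypothetical 'lsyw' RIFF chunk carrying the lossyWAV
    version, the settings string, and a CRC32 of the processed sample
    data. Chunk id and layout are invented for this sketch."""
    crc = zlib.crc32(pcm_bytes) & 0xFFFFFFFF
    payload = f"lossyWAV {version} {settings} ".encode("ascii")
    payload += struct.pack("<I", crc)           # little-endian CRC32
    size = len(payload)                         # RIFF size excludes the pad
    chunk = b"lsyw" + struct.pack("<I", size) + payload
    if size % 2:                                # RIFF chunks are word-aligned
        chunk += b"\x00"
    return chunk
```

A decoder aware of the convention could then verify the CRC against the 'data' chunk to prove the file is the processed (not lossless) version, which is exactly the identification UED77 asked about.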

I've just removed from the code any reliance on knowing how large the file is (with a view to piped input, though I haven't a clue yet how I'd pipe in from, say, Foobar2000 and then pipe out to, say, FLAC. Could it be as simple as "lossywav - <quality setting> | flac -8 - -o<output_filename>"?).

Another thing on the list of things to do is to allow the parallel creation of a correction file, presumably with the same RIFF chunk in it, to allow recreation of the lossless original.

Thanks again,

Nick.
Title: lossyWAV Development
Post by: jesseg on 2007-12-12 02:56:41
bug:
Title: lossyWAV Development
Post by: Nick.C on 2007-12-16 10:20:10
I had a PM from stel with some concerns:

Quote
I've come across what I think is a problem sample. It's a bit strange in that the sample seems to play OK on one of my DAPs, but I can hear sound distortion / the sound breaking up in places when played on the other DAP. What puzzles me more is that the original FLAC plays without a problem on both DAPs, and this is what leads me to believe it's an encoder problem. I first spotted the issue on beta 5.4 but I've just tried it on 5.8 and get the same results.

The DAPs are both rockboxed (Sansa E280 & iAudio M5) and the earphones are Shure SE530. I can only hear the problem on the M5. The sample is 'Groove Armada - Soundboy Rock - Lightsonic' and although I hear issues throughout the track I've noted that it definitely happens at 4.30 sec; the average bitrate for this track is 528kbps when encoded using the standard -3 lossyWAV settings.

Even more annoying is that I cannot hear the distortion on my PC using an AMP & Sennheiser HD650's either.

I've also come across the problem on a different album but I need to dig this out again because I forgot to take a note when it happened.

Are you interested in investigating this further? What would you need?

I've also come across two samples from the 'Shakespears Sister\Long Live The Queens!' album where the average bitrate for -3 encoding is 696kbps & 711kbps. Could these prove useful to you?
If anyone else has had any experience of this, please add samples of up to 30 seconds of the portion of the track in question to this thread.

Many thanks,

Nick.
Title: lossyWAV Development
Post by: stel on 2007-12-16 19:24:21
Sorry for not posting these sooner. I've been out all day.
I've attached 10 second samples. The issue happens 5 seconds in when encoding the sample.flac file.
Sorry, the M5_issue.flac isn't great quality but you can clearly hear the issue I'm experiencing.
If you need any more info, give me a shout.
I'm going to try and find the other track I've got problems with.

Edit: Oh no, look at the post number... I'm not the devil, honest

Thanks
Steve
Title: lossyWAV Development
Post by: Nick.C on 2007-12-16 21:46:20
Sorry for not posting these sooner. I've been out all day.
I've attached 10 second samples. The issue happens 5 seconds in when encoding the sample.flac file.
Sorry, the M5_issue.flac isn't great quality but you can clearly hear the issue I'm experiencing.
If you need any more info, give me a shout.
I'm going to try and find the other track I've got problems with.

Edit: Oh no, look at the post number... I'm not the devil, honest

Thanks
Steve
Thanks for the samples - I'll have a listen....

Just a thought, but which FLAC setting are you using? It appears to me that -8 will require more CPU than, say, -3. Halb27 and Mitch 1 2 found that -3 -e -m -r 2 -b 512 works really well (only about 1kbps difference from -8), and probably takes less effort to decode.
Title: lossyWAV Development
Post by: halb27 on 2007-12-16 21:52:23
Thanks for your sample, stel.

I can hear the distortion with your M5 sample at ~5 sec, but not when encoding sample.flac myself using lossyWAV -3 - the same experience you have on your computer.

How can we figure out whether it's an encoder problem (the fact that lossless FLAC works fine suggests this) or a problem specific to the iAudio M5 (the fact that there is no problem on a computer or on your other DAP suggests that)?

I suggest you do some other encodings using -2, -1, -1 -nts 6, -1 -nts 9, -1 -nts 12, ... and report what happens.
There's a lot of clipping in this sample. Can you reduce the volume a bit, using for instance a wav editor, and have a look at whether the problem remains?

Nick, I guess when using -detail if you report a -1 as the number of bits removed this means no bit is removed due to clipping prevention (or does it say: due to clipping prevention not all the bits have been removed that could have been removed if we wouldn't use clipping prevention)?
Can you imagine there is a problem when no bit is removed due to clipping prevention in one block, and ~10 bits are removed in the next block?
Title: lossyWAV Development
Post by: Nick.C on 2007-12-16 22:04:40
Nick, I guess when using -detail if you report a -1 as the number of bits removed this means no bit is removed due to clipping prevention (or does it say: due to clipping prevention not all the bits have been removed that could have been removed if we wouldn't use clipping prevention)?
Can you imagine there is a problem when no bit is removed due to clipping prevention in one block, and ~10 bits are removed in the next block?
It *shouldn't* report -1 bits removed *ever*. That means there's a bug in that bit of the code.

On your point about 0 btr in one codec_block then 10 btr in the next, I don't really know.....
Title: lossyWAV Development
Post by: halb27 on 2007-12-16 23:02:36
It *shouldn't* report -1 bits removed *ever*. That means there's a bug in that bit of the code.

There are a lot of -1's when using -3 -detail.
On your point about 0 btr in one codec_block then 10 btr in the next, I don't really know.....

Just my thought when looking at the -detail report interpreting '-1' as '0 btr'.
Title: lossyWAV Development
Post by: stel on 2007-12-17 07:28:28
Thanks for your replies gents,
Nick, I'm using flac settings -3 -m -e -r 2 -b 512 at the moment, but I had the same problem using -5 & -8

One thing I haven't tried yet is to play the actual lossyWAV file, I will try this tonight along with your suggestions halb27.

Also, I have seen the -1 bits removed on several of my encodings in the past due to clipping prevention - are you saying this could have a large impact on the encoding?

Also, regarding the clipping, should I be using replaygain on this type of track? I've never used replaygain before because I've always been under the impression that it's processing the sound so it no longer sounds like the original.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-17 07:52:51
Thanks for your replies gents,
Nick, I'm using flac settings -3 -m -e -r 2 -b 512 at the moment, but I had the same problem using -5 & -8

One thing I haven't tried yet is to play the actual lossyWAV file, I will try this tonight along with your suggestions halb27.

Also, I have seen the -1 bits removed on several of my encodings in the past due to clipping prevention - are you saying this could have a large impact on the encoding?

Also, regarding the clipping, should I be using replaygain on this type of track? I've never used replaygain before because I've always been under the impression that it's processing the sound so it no longer sounds like the original.
Replaygain (applied to the file, rather than appended) will only ever change amplitude and should only affect volume - not sound.

I'll have a look at the -1 btr issue, although, thinking about it, it should have no impact at all as when btr falls to zero then the block is merely stored and is not processed at all.

Playing the WAV should be a good test to see whether the problem lies with playback or processing.
Title: lossyWAV Development
Post by: halb27 on 2007-12-17 07:56:25
a) Nick said a '-1' shouldn't be seen with -detail, so probably there's something wrong and he'll certainly find out. Maybe only the display is affected. We'll see.
From what it's meant to be, the current clipping prevention is there to maintain quality - the downside is that the bitrate with strongly clipping tracks can get pretty high, as with your reported samples. But maybe there are other side effects of the clipping prevention strategy, like no bit removed in one block due to clipping prevention and 10 bits removed in the next block due to the normal lossyWAV mechanism, leading to distortion. I can't imagine it's like that, because that would be audible also outside of your iAudio M5 environment. But until things are clear we should keep it in mind. For clarification it would be fine if you could encode a volume-reduced variant of your sample. I can do the loudness-reduced encoding in case you're not used to wave editing.

b) The replaygain procedure doesn't process the sound. It just computes a volume correction value and stores it in the file so that the playback machinery is able to adjust volume according to this value (in case the playback machinery has a replaygain feature).
The target is to have each track in a series of tracks originating from different albums at its adequate loudness.
Sound impression varies with different volume due to the volume-dependent frequency characteristics of our audio perception.
The only form of 'sound processing' because of replaygain can occur on playback when there is something like a 'soft clipping prevention' feature with replaygain which usually can be switched off if not wanted.
Anyway whether or not you want to use replaygain has nothing to do with our problem here in case it should be an encoding problem.
Title: lossyWAV Development
Post by: jesseg on 2007-12-17 11:48:15
b) The replaygain procedure doesn't process the sound. It just computes a volume correction value and stores it in the file so that the playback machinery is able to adjust volume according to this value (in case the playback machinery has a replaygain feature).
[snip]
The only form of 'sound processing' because of replaygain can occur on playback when there is something like a 'soft clipping prevention' feature with replaygain which usually can be switched off if not wanted.


Well that's not entirely true.  In fact, if the replaygain code is handled in 16bits, it can easily cause an audible effect to be heard.  Given it's a portable, and probably only supports 16bit (and FIR), I would bet money that the output of the replaygain multiplier is a 16bit number, thereby truncating bits.

The first thing to do would be to find out if your player's DAC even supports 24bit output.  If not, then it's at least running dithering on the end of the replaygain function (or somewhere before it's sent to the DAC), if not just truncating anyways.
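The truncation jesseg describes can be pictured numerically. This is a hypothetical illustration (my own sketch, not the M5's actual signal path): a ReplayGain attenuation applied in 16-bit integer arithmetic collapses very low-level samples.

```python
def apply_gain_16bit(samples, gain_db):
    """Scale 16-bit samples by a dB gain, truncating back to integers."""
    scale = 10 ** (gain_db / 20.0)
    # int() truncates toward zero: the fractional (low-order) detail is lost
    return [max(-32768, min(32767, int(s * scale))) for s in samples]

# A very quiet passage collapses after a -6 dB ReplayGain adjustment:
print(apply_gain_16bit([1, 2, 3, -2, -1], -6))  # → [0, 1, 1, -1, 0]
```

Dithering before the final word-length reduction (as jesseg notes) randomises this error instead of correlating it with the signal.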
Title: lossyWAV Development
Post by: halb27 on 2007-12-17 13:49:49
b) The replaygain procedure doesn't process the sound. ...


Well that's not entirely true.  In fact, if the replaygain code is handled in 16bits, it can easily cause an audible effect to be heard.  Given it's a portable, and probably only supports 16bit (and FIR), I would bet money that the output of the replaygain multiplier is a 16bit number, thereby truncating bits. ....

OK, but that's not exactly what I would call sound processing.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-17 14:30:03
I'll have a look at the -1 btr issue, although, thinking about it, it should have no impact at all as when btr falls to zero then the block is merely stored and is not processed at all.
The bug is merely presentational - when btr is indicated as -1 then it is actually 0 and the codec_block is stored. This will be amended for beta v0.5.9

As an aside, could this method be applied to 32-bit float values, i.e. round the mantissa only? And would any of the lossless codecs make use of zero LSB's in the mantissa?
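To picture the float question (again a sketch of my own; whether any lossless codec actually exploits zeroed mantissa LSBs I can't say): a 32-bit float keeps its sign and exponent, and only the low mantissa bits are cleared.

```python
import struct

def zero_mantissa_lsbs(x, bits):
    """Clear the lowest 'bits' of a float32 mantissa (simple truncation)."""
    (i,) = struct.unpack('<I', struct.pack('<f', x))   # reinterpret as uint32
    i &= ~((1 << bits) - 1) & 0xFFFFFFFF               # zero the low mantissa bits
    (y,) = struct.unpack('<f', struct.pack('<I', i))   # back to float32
    return y

print(zero_mantissa_lsbs(1.0, 10))  # → 1.0 (mantissa of 1.0 is already zero)
```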
Title: lossyWAV Development
Post by: 2Bdecided on 2007-12-17 14:56:45
b) The replaygain procedure doesn't process the sound. It just computes a volume correction value and stores it in the file so that the playback machinery is able to adjust volume according to this value (in case the playback machinery has a replaygain feature).
[snip]
The only form of 'sound processing' because of replaygain can occur on playback when there is something like a 'soft clipping prevention' feature with replaygain which usually can be switched off if not wanted.


Well that's not entirely true.  In fact, if the replaygain code is handled in 16bits, it can easily cause an audible effect to be heard.
No, not "easily". "Potentially".

Of course, the effects of ReplayGain are almost always audible - it changes the volume!

However, 16bits are more than enough to accomplish this transparently for most audio signals. Not all, but most.

The potential for detecting the re-quantisation (or re-dithering) exists with extremely dynamic signals, where ReplayGain reduces the volume because substantial sections are relatively loud, but the listener cranks the volume up to concentrate on some relatively quiet sections. In these sections, the dither or quantisation may be audible.


A far greater and more common problem is owning a poor DAP with a hissy amplifier, low power etc, where ReplayGain will make everything too quiet and drop it into the noise floor.

Cheers,
David.


Sorry for not posting these sooner. I've been out all day.
I've attached 10 second samples. The issue happens 5 seconds in when encoding the sample.flac file.
Sorry, the M5_issue.flac isn't great quality but you can clearly hear the issue I'm experiencing.
Just to check...

M5_issue is mono. I assume your DAP is actually playing back in stereo?

Cheers,
David.
Title: lossyWAV Development
Post by: stel on 2007-12-17 21:07:43
OK, I've spent some time this evening playing around with different encoder settings on my problem sample and I'm kicking myself with the results. The good news is that it's not lossyWAV that's at fault...

My problem was caused by the 'Bit Depth Control' settings in foobar2000. The 'Format is:' was set to lossy and 'Highest BPS mode supported:' set to 24. Setting the format to lossless (or hybrid) fixed the issue. I guess I've encoded everything at 24bit and my DAP isn't happy about it.

Apologies to everyone who spent time looking into it.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-17 21:15:00
OK, I've spent some time this evening playing around with different encoder settings on my problem sample and I'm kicking myself with the results. The good news is that it's not lossyWAV that's at fault...

My problem was caused by the 'Bit Depth Control' settings in foobar2000. The 'Format is:' was set to lossy and 'Highest BPS mode supported:' set to 24. Setting the format to lossless (or hybrid) fixed the issue. I guess I've encoded everything at 24bit and my DAP isn't happy about it.

Apologies to everyone who spent time looking into it.
Many thanks for the subsequent investigations! My iPAQ won't even play 24bit.....

[edit] Interestingly, the difference between my 53 sample test set at 16 bit and 24 bit (Foobar WAV output) is about 40kB in 38.4MB [/edit]
Title: lossyWAV Development
Post by: jesseg on 2007-12-17 23:04:32
The 'Format is:' was set to lossy and 'Highest BPS mode supported:' set to 24. Setting the format to lossless (or hybrid) fixed the issue.


You're the second person in this thread alone to run into problems from that setting.  I would think they would have it set to lossless by default, but I guess not (or is it?)
Title: lossyWAV Development
Post by: Nick.C on 2007-12-18 12:12:22
I've been toying with implementing (& optimising in IA-32) mixed-radix FFT's (i.e. non-power-of-two length).

At present, they're *very* slow.

Is there a feeling that changing the timeframe of the fft analysis from 1.45msec (64 samples @ 44.1kHz) to, say, 1.497msec (66 samples @ 44.1kHz) or 23.22msec (1024 samples @ 44.1kHz) to 20msec (882 samples @ 44.1kHz) would be beneficial?

This would also allow better handling of WAVs with sample rates other than 44.1kHz, as the optimal FFT length could be calculated directly rather than using the existing 64 / 128 / 256 / 512 / 1024 sample lengths, e.g. 72 samples = 1.5msec @ 48kHz; 960 samples = 20msec @ 48kHz.

If it is considered to be of merit then I will progress the implementation / optimisation over the festive period, initially alongside the existing power-of-two FFT routine unless (until?) the mixed-radix method becomes acceptably fast.
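As a sketch of the idea (my own illustration, not lossyWAV's code): pick the FFT length nearest a target timeframe, restricted to lengths whose prime factors are small, so a mixed-radix FFT stays efficient.

```python
def fft_length_for(target_ms, sample_rate, radices=(2, 3, 5, 7)):
    """Find the FFT length nearest target_ms milliseconds whose prime
    factors all lie in 'radices' (so a mixed-radix FFT stays fast)."""
    target = target_ms * sample_rate / 1000.0

    def smooth(n):
        for p in radices:
            while n % p == 0:
                n //= p
        return n == 1

    candidates = [n for n in range(8, 4097) if smooth(n)]
    return min(candidates, key=lambda n: abs(n - target))

print(fft_length_for(20.0, 44100))  # → 882 (= 2 * 3^2 * 7^2)
print(fft_length_for(20.0, 48000))  # → 960
print(fft_length_for(1.5, 48000))   # → 72
```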
Title: lossyWAV Development
Post by: halb27 on 2007-12-18 14:32:13
The best of the possibilities arising from such a generalization IMO comes with the long FFTs, where a finer differentiation becomes possible. Right now we only have the choice between a 512 and a 1024 sample FFT.

Anyway I can't imagine it's a real benefit, even for a 48 kHz source, which makes for a ~10% difference in timing.

I once was worried about the overlapping of the 1024 sample FFT into the neighboring blocks (and I would still prefer the 5/8 partitioning I once suggested), which can also be reduced by using a, say, 882 sample FFT, but because of the very good quality we have I'm content with the way you do the FFT right now.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-12-18 15:10:10
I think it's a waste of time. The exact numbers aren't important, and even in lossy codecs (where the exact numbers are important) they still stick to fixed values irrespective of the sample rate, with reasonable results.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-18 15:34:16
The best of the possibilities arising from such a generalization IMO comes with the long FFTs, where a finer differentiation becomes possible. Right now we only have the choice between a 512 and a 1024 sample FFT.

Anyway I can't imagine it's a real benefit, even for a 48 kHz source, which makes for a ~10% difference in timing.

I once was worried about the overlapping of the 1024 sample FFT into the neighboring blocks (and I would still prefer the 5/8 partitioning I once suggested), which can also be reduced by using a, say, 882 sample FFT, but because of the very good quality we have I'm content with the way you do the FFT right now.
I think it's a waste of time. The exact numbers aren't important, and even in lossy codecs (where the exact numbers are important) they still stick to fixed values irrespective of the sample rate, with reasonable results.

Cheers,
David.
Thanks guys - I'll park it. Only the presentational bug of -1 btr to be corrected for beta v0.5.9.

-shaping may get the bullet as it is really beyond my grasp of audio processing.

As things seem to be ticking along nicely, I think I'll concentrate on producing the correction file next.
Title: lossyWAV Development
Post by: 2Bdecided on 2007-12-19 10:16:45
I probably said this before, but I strongly advise borrowing the wavpack file format for that, as much as possible/relevant, because I'm sure David Bryant has worked through some of the issues that might be faced here. It might not be entirely applicable, but it's always good to re-use someone else's (good) work when they're happy for you to do so!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-19 10:31:52
I probably said this before, but I strongly advise borrowing the wavpack file format for that, as much as possible/relevant, because I'm sure David Bryant has worked through some of the issues that might be faced here. It might not be entirely applicable, but it's always good to re-use someone else's (good) work when they're happy for you to do so!

Cheers,
David.
My approach was going to be even more simplistic - write two WAV files, one lossy, one correction, then encode both in FLAC. I'm working on adding a FACT chunk to the WAV file - containing a null terminated string which will include lossyWAV version information, parameters used and date/time of processing (and ultimately the CRC32 of the lossless and lossy data) which would be written to both files.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-20 08:49:29
I'm working on adding a FACT chunk to the WAV file - containing a null terminated string which will include lossyWAV version information, parameters used and date/time of processing (and ultimately the CRC32 of the lossless and lossy data) which would be written to both files.
I've successfully implemented a mechanism to insert a 160 byte FACT chunk immediately after the FMT chunk in the WAV file. This takes the form:
Code:
fact/152/lossyWAV v0.5.9 : 20/12/2007 08:46:57
-2 -cbs 512 -nts 0.00 -snr 21.00 -skew 36.00
-spf 22224-22235-22336-12347-12358 -fft 10101
If a file has already been processed, the FACT chunk will be found and lossyWAV will exit. When encoding in FLAC the --keep-foreign-metadata switch must be used to preserve the lossyWAV FACT chunk.

Thinking about it, I should make a bit more effort and make the FACT chunk variable length (up to a sensible maximum). In this way, the total length of the FACT chunk will be (8 + string_length + (string_length and 1)).
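That size arithmetic can be sketched as follows (a hypothetical helper, not lossyWAV's Pascal code): the chunk is the 8-byte header plus the null-terminated string, with one pad byte when the string length is odd, since RIFF chunks are word-aligned.

```python
import struct

def make_fact_chunk(text):
    """Build a variable-length 'fact' chunk around a null-terminated string."""
    data = text.encode('ascii') + b'\x00'   # null-terminated payload
    pad = len(data) & 1                     # RIFF chunks are word-aligned
    chunk = b'fact' + struct.pack('<I', len(data)) + data + b'\x00' * pad
    # i.e. total length = 8 + string_length + (string_length and 1)
    assert len(chunk) == 8 + len(data) + (len(data) & 1)
    return chunk

print(len(make_fact_chunk('lossyWAV v0.5.9 : -2 -cbs 512')))  # → 38
```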
Title: lossyWAV Development
Post by: Nick.C on 2007-12-21 09:49:12
lossyWAV beta v0.5.9 attached at post 1 of the thread.

Fixed btr -1 bug;
Implementation of FACT chunk inclusion in output when processed. lossyWAV will exit if it finds a lossyWAV FACT chunk in a WAV file. FLAC requires the --keep-foreign-metadata switch to retain the FACT chunk.
Title: lossyWAV Development
Post by: jesseg on 2007-12-22 07:52:54
Could you have the exe return... something besides error level 1, when it actually has an error, such as the WAV file already having the lossyWAV flag?  That will allow me, and others, to add handling for it into batch files and software.

Very good idea though.  I'll release a new batch for iFLCDrop when I can get back a specific code for the "lossyWAV flag exists" error.  With the error code returned I can make the script copy the source-file into the temp directory, and have it get encoded into FLAC anyways.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-22 11:37:45
Could you have the exe return... something besides error level 1, when it actually has an error, such as the WAV file already having the lossyWAV flag?  That will allow me, and others, to add handling for it into batch files and software.

Very good idea though.  I'll release a new batch for iFLCDrop when I can get back a specific code for the "lossyWAV flag exists" error.  With the error code returned I can make the script copy the source-file into the temp directory, and have it get encoded into FLAC anyways.
Very good idea, I'll start thinking about the codes needed and will post as beta v0.6.0 in a couple of days.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-24 09:55:09
Could you have the exe return... something besides error level 1, when it actually has an error, such as the WAV file already having the lossyWAV flag?  That will allow me, and others, to add handling for it into batch files and software.

Very good idea though.  I'll release a new batch for iFLCDrop when I can get back a specific code for the "lossyWAV flag exists" error.  With the error code returned I can make the script copy the source-file into the temp directory, and have it get encoded into FLAC anyways.
Very good idea, I'll start thinking about the codes needed and will post as beta v0.6.0 in a couple of days.
Another thought - would it be useful to have, say, "lossyWAV <wavfile.wav> -check" to allow the user to check for a processed file without trying to process it again?
Title: lossyWAV Development
Post by: jesseg on 2007-12-24 12:16:05
Yes, especially if it returned an error code if it was already processed.  It would allow an application to check or batch check for lossyWAV or non-lossyWAV files... for whatever reason someone might want to do that.  But even without an error code, it would be ok for non-automated or non-batch use I guess.
Title: lossyWAV Development
Post by: Nick.C on 2007-12-24 12:20:01
Yes, especially if it returned an error code if it was already processed.  It would allow an application to check or batch check for lossyWAV or non-lossyWAV files... for whatever reason someone might want to do that.  But even without an error code, it would be ok for non-automated or non-batch use I guess.
I would of course include an error code following your request. You can even see the RIFF fact chunk stored at the beginning of FLAC files thanks to Josh's latest --keep-foreign-metadata switch, and the Tiny Hexer binary file viewer / editor.
Title: lossyWAV Development
Post by: singaiya on 2007-12-24 19:03:20
Is lossyWAV's method similar to Wavpack lossy's method? I thought it was a different approach, but then seeing mention of correction files maybe they're more similar than I thought?
Title: lossyWAV Development
Post by: Nick.C on 2007-12-24 19:10:13
The concept of the correction file will be similar in that the lossless original can be recomposed from the lossy.wav and the lwcdf.wav files by simple sample addition.

lossyWAV rounds LSB's to zero where the added noise of the rounding is calculated to be below a threshold value.
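As a numeric sketch of the two sentences above (the function and variable names are mine, not lossyWAV's): round the lowest bits of each sample to zero, and keep the residual as the matching correction-file sample.

```python
def split_sample(sample, bits_to_remove):
    """Round the lowest bits of a sample to zero; the residual becomes
    the matching correction-file sample."""
    step = 1 << bits_to_remove
    lossy = ((sample + step // 2) >> bits_to_remove) << bits_to_remove
    return lossy, sample - lossy

for s in (12345, 100, -5, 0):
    lossy, correction = split_sample(s, 4)
    assert lossy % 16 == 0           # four LSBs of the lossy sample are zero
    assert lossy + correction == s   # simple sample addition restores original
```

The zeroed LSBs are what make the lossy file compress so well in FLAC or WavPack.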
Title: lossyWAV Development
Post by: Nick.C on 2007-12-24 20:17:17
I would of course include an error code following your request. You can even see the RIFF fact chunk stored at the beginning of FLAC files thanks to Josh's latest --keep-foreign-metadata switch, and the Tiny Hexer binary file viewer / editor.
lossyWAV beta v0.6.0 attached at post 1 of the thread.

Error code = 16 on exit if WAV file has already been processed.

-check parameter included to allow checking without trying to process the file, error code = 16 if processed. [edit]It's not clear in the parameters that -check should only be used in the context: "lossyWAV <wavfile.wav> -check" as it will always exit after determining whether a lossyWAV FACT chunk exists.[/edit]
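From a script's point of view, the exit-code handling jesseg asked for might look like this (a sketch: only exit code 16 is documented above, and the executable name is an assumption):

```python
import subprocess

ALREADY_PROCESSED = 16  # exit code documented for an existing FACT chunk

def is_already_processed(wav_path, exe='lossyWAV.exe'):
    """Run 'lossyWAV <wavfile.wav> -check' and test its exit code."""
    result = subprocess.run([exe, wav_path, '-check'])
    return result.returncode == ALREADY_PROCESSED
```

A batch file could do the same by comparing ERRORLEVEL against 16 after the call.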
Title: lossyWAV Development
Post by: jaybeee on 2007-12-29 17:06:34
Thanks for the latest update Nick.C.
I personally think it's best to add download links to the first post in the thread; that way it's always easy to find 

I use foobar, set up as per the lossywav wiki entry (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV#Example_Foobar2000_Converter_Settings), so do you think it's possible to preserve Replaygain tags? (I'm unsure if this is a foobar or lossywav issue).
Title: lossyWAV Development
Post by: Nick.C on 2007-12-29 20:29:17
Thanks for the latest update Nick.C.
I personally think it's best to add download links to the first post in the thread; that way it's always easy to find 

I use foobar, set up as per the lossywav wiki entry (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV#Example_Foobar2000_Converter_Settings), so do you think it's possible to preserve Replaygain tags? (I'm unsure if this is a foobar or lossywav issue).
David will correct me if I am wrong, but I've seen slightly different Replaygain values for the same track pre- as opposed to post-processing, so I don't know if retaining them is a good idea. It wasn't much though. lossyWAV does nothing at all with tags - that's all Foobar (thankfully).

I will edit the first post in this thread to reflect both its content and as the download point.
Title: lossyWAV Development
Post by: halb27 on 2007-12-29 20:36:44
How do you make foobar pass the replaygain information to the resulting (say) FLAC file?
I'd love to do that as my input ape files do have replaygain information.

I personally wouldn't care about slightly incorrect replaygain values (and I can't imagine they are anything but small).
The bigger problem to me with replaygain is that sometimes the replaygain values have to be corrected manually to achieve an equal loudness impression. I'd be happy if I could do these manual corrections just once in my ape files, and benefit from it whenever I encode them.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-03 16:26:15
lossyWAV beta v0.6.1 now appended to post #1 of this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-01-03 18:08:32
Thank you very much for the new version, especially for improving the bits-to-remove routine and for implementing the correction file mechanism.
Hopefully some more people might find lossyWAV attractive because of this feature.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-03 18:11:47
Thank you very much for the new version.
It's just occurred to me, but the error was in the detection of -ve clipping samples - so conceivably it has adversely influenced recent fine-tuning of the -3 quality preset settings......
Title: lossyWAV Development
Post by: halb27 on 2008-01-03 18:15:42
... so conceivably it has adversely influenced recent fine-tuning of the -3 quality preset settings......

What is your opinion on that? Do you think we can try again to arrive at a lower average bitrate? Or should we be more cautious?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-03 18:51:09
... so conceivably it has adversely influenced recent fine-tuning of the -3 quality preset settings......
What is your opinion on that? Do you think we can try again to arrive at a lower average bitrate? Or should we be more cautious?
I think that with the bug allowing undetected -ve clipping "through the net" and not activating the clipping-prevention mechanism when it should have, some samples will have sounded worse than they should have - therefore we could probably be a bit more aggressive.
Title: lossyWAV Development
Post by: halb27 on 2008-01-03 19:28:34
I see, clipping was not detected on -ve occasion, and thus the clipping prevention strategy was not triggered.

With my personal kind of thinking this is not a sufficient reason for going less defensive, but I see you'd like to have the average bitrate a bit lower, and maybe that's true for other potential users as well.

Well, there is no strict reason to have the -nts defaults where they are right now. I personally am happy with them, but I'm happy too with other defaults especially as we've always wanted to have -nts as a major option.

I think we should encourage users to use other -nts values than the defaults, and the defaults themselves can be like that, though better not at the extreme edges.
With -3 for instance I think an -nts usage from -nts 0 to -nts 10 is okay, with -nts 0 playing it very safe, and -nts 10 allowing for potential but very minor audible deviations from the original.
With -1 any -nts value >=0 makes sense, and everybody can choose the security margin he likes.
For -2 any -nts value <= 10 is useful (though a large positive value may be more adequate together with -3).

Maybe we should try to write something like this in the wiki (shall I do it in case I'm allowed to?).
Title: lossyWAV Development
Post by: Nick.C on 2008-01-03 19:32:25
I see, clipping was not detected on -ve occasion, and thus the clipping prevention strategy was not triggered.

With my personal kind of thinking this is not a sufficient reason for going less defensive, but I see you'd like to have the average bitrate a bit lower, and maybe that's true for other potential users as well.

Well, there is no strict reason to have the -nts defaults where they are right now. I personally am happy with them, but I'm happy too with other defaults especially as we've always wanted to have -nts as a major option.

I think we should encourage users to use other -nts values than the defaults, and the defaults themselves can be like that, though better not at the extreme edges.
With -3 for instance I think an -nts usage from -nts 0 to -nts 10 is okay, with -nts 0 playing it very safe, and -nts 10 allowing for potential but very minor audible deviations from the original.
With -1 any -nts value >=0 makes sense, and everybody can choose the security margin he likes.
For -2 any -nts value <= 10 is useful (though a large positive value may be more adequate together with -3).

Maybe we should try to write something like this in the wiki (shall I do it in case I'm allowed to?).
I agree with your reasoned response - I think that I was just a bit eager to reduce the bitrate (as usual!).

Feel free to edit the wiki article - you are a major contributor to the project after all....
Title: lossyWAV Development
Post by: halb27 on 2008-01-03 21:07:15
...Feel free to edit the wiki article - you are a major contributor to the project after all....

Done, but can you please look it over and correct errors as I'm not a native English speaker.

BTW, I've moved the player comparison link to the codecs section, and added a remark on Rockbox supporting FLAC and wavPack. Guess the codecs section is the more appropriate place especially as the preset section got a bit fatter by my contribution.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-03 22:07:48
Done, but can you please look it over and correct errors as I'm not a native English speaker.

BTW, I've moved the player comparison link to the codecs section, and added a remark on Rockbox supporting FLAC and wavPack. Guess the codecs section is the more appropriate place especially as the preset section got a bit fatter by my contribution.
Looking good - thanks again for your input and restraining influence....
Title: lossyWAV Development
Post by: Nick.C on 2008-01-04 11:19:27
lossyWAV beta v0.6.1 now appended to post #1 of this thread.
"Fixed" the bug but introduced another (although fairly benign....) expect beta v0.6.2 shortly.

lossyWAV beta v0.6.2 now appended to post #1 of this thread. beta v0.6.1 removed due to bug.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-04 16:07:13
I was playing about with FLAC settings on my processed 53 sample set and found the following (all at -b 512):

-0; 45415797B; 3.89s
-1; 46004820B; 4.89s
-2; 44305745B; 4.75s
-3; 41716318B; 4.89s
-4; 42151483B; 5.45s
-5; 40646358B; 8.20s
-6; 40637026B; 9.03s
-7; 40497737B; 29.49s
-8; 40305661B; 38.44s

-3 -m -e -r 2; 40894438B; 13.52s

Is it just me, my sample set or my PC - but does -5 seem to be the sweet spot for encoding?
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-04 16:19:49
Is that not the FLAC default?

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-04 16:23:12
Is that not the FLAC default?

Cheers,
David.
Erm, yes.....  I should possibly have stopped to consider why the default (-5) corresponded to the best time / compression point.
Title: lossyWAV Development
Post by: halb27 on 2008-01-04 16:43:42
I was playing about with FLAC settings on my processed 53 sample set and found the following (all at -b 512):

-0; 45415797B; 3.89s
-1; 46004820B; 4.89s
-2; 44305745B; 4.75s
-3; 41716318B; 4.89s
-4; 42151483B; 5.45s
-5; 40646358B; 8.20s
-6; 40637026B; 9.03s
-7; 40497737B; 29.49s
-8; 40305661B; 38.44s

-3 -m -e -r 2; 40894438B; 13.52s

Is it just me, my sample set or my PC - but does -5 seem to be the sweet spot for encoding?

File sizes are plausible, but the encoding times are strange, as the internal effort of -5 should be higher compared to -3 -m -e -r 2 - at least that's what I learnt from the documentation, if I've read it correctly.
That was the motivation behind my search for a fast and space-saving setting, and that is how it behaves on my system.
I'm a bit unhappy with your 53 sample set for such tests - in this case because it consists of small snippets to a pretty large extent. Would you mind encoding a small set of full-length tracks with -5 and -3 -m -e -r 2?
Other than that -5 is a good FLAC setting of course.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-04 16:53:50
Nick,

This is looking really good. Do you feel you're close to a release candidate yet? It seems that the code works well enough, and it's just a case of settling the command line options and documentation.

Would you be able to find time to explain the algorithm as it currently stands? Or should I just read the code?  Obviously I'm most interested in the changes, but other people might want a full overview without reading 29 pages!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-04 21:31:56
Nick,

This is looking really good. Do you feel you're close to a release candidate yet? It seems that the code works well enough, and it's just a case of settling the command line options and documentation.

Would you be able to find time to explain the algorithm as it currently stands? Or should I just read the code?  Obviously I'm most interested in the changes, but other people might want a full overview without reading 29 pages!

Cheers,
David.
Thanks very much David, I appreciate it.

I'm happy with the code - it's fairly robust (now that I've found that bug in the bits-to-remove routine that only caused a crash if I was trying to write a correction file as well...) and the only outstanding item is the reconstitution of the lossy and lwcdf files to recreate the lossless original.

I will try to put the algorithm into engineering speak (my working language) and pass on to you for formalising into audio-processing language.

I should also try to adequately comment the code to allow sense to be made of it by someone other than myself - as it will form part of the release package.
Title: lossyWAV Development
Post by: jesseg on 2008-01-06 20:49:57
I had an idea, it might already be this way.  I didn't check.  But if the correction file was encoded with the bits reversed (16=1/15=2/etc) for whatever bit-depth needed, wouldn't lossless codecs encode that more efficiently?  Since it's not really an audio file someone would use directly, that makes sense for starters, and it shouldn't complicate things if lossyWAV-compliant decoding were ever to be built into any lossless decoders.  And there should be practically no speed hit at all.


Also, here's a massively improved version of lFLCDrop.    Hopefully the only thing I'll have to add later is the support for automatic re-assembling of a lossless WAV when the correction file is present.  And if anyone has any suggestions about things to add or change with the custom settings area in the batch file - now is the perfect time to suggest something.

Quote
lFLCDrop Change Log:
v1.2.0.4
- added correction file settings
- added custom encoding setting

lFLC.bat Change Log:
v1.0.0.4
- added flac decoding support
- added correction file encoding support
- added custom settings section
- improved temp file handling


a newer version is out, check for a later post
Title: lossyWAV Development
Post by: carpman on 2008-01-06 21:32:25
Hi all,

very nice to see LossyWav development coming on so quickly!

@jesseg

just ran flacdrop v1.2.0.4 with lossywav v.0.6.0.2

settings were

Left=33
Top=550
Encoder=D:\active_encoders_all\lossywav\iFLC.bat
Switches=-2
Image=
OutputDir=
StayOnTop=1
Minimize=1
DeleteSource=0

but unlike when I was using
flacdrop v1.2.0.2 with lossywav v.0.5.4.4 (the last versions I ran) no FLAC files were created. The DOS window came up and it was clearly processing -- I checked the temp directory and could see the lossy.wav files being created but no FLAC files in the Output directory (set as the same as the input directory).

Any ideas?

It could be me but I'm not doing anything differently than with previous version.

ps. I did a search on my hard drive for [inputfilenames].flac but nothing came up.

C.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-06 21:58:54
I had an idea, it might already be this way.  I didn't check.  But if the correction file was encoded with the bits reversed (16=1/15=2/etc) for whatever bit-depth needed, wouldn't lossless codecs encode that more efficiently?  Since it's not really an audio file someone would use directly, that makes sense for starters, and it shouldn't complicate things if lossyWAV-compliant decoding were ever to be built into any lossless decoders.  And there should be practically no speed hit at all.
Unfortunately, as not all of the differences are positive, this wouldn't help as the new bit 1 of every negative difference would be 1.

Something David said *ages* ago popped back into my head and I want to get a second opinion with respect to my understanding:

Looking for zero values in the sample data was mentioned.

I took this to mean "look for FFTs where all the input values are zero" and have implemented a checking mechanism as follows:

When filling up the FFT input array, OR each sample value into a "running total" which is initialised to zero before the filling starts.

If the resultant "running total" is zero then the FFT input is all zeros and the FFT does not need to be calculated.

For every FFT not calculated, do not take a 0dB value into account when calculating the bits_to_remove for the codec_block in question (as rounded zeros are still zeros = no added noise).

If *every* FFT was full of zeros, set bits_to_remove to zero and simply store the codec_block.

[edit] This approach reduces my FLAC'd processed 53 sample set by a whole 95 bytes. However, it may slightly increase processing throughput..... [/edit]
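For anyone following along, the OR-accumulator trick described above can be sketched like this (a Python sketch for illustration only; lossyWAV itself is written in Pascal, and the function name here is made up):

```python
def fft_block_is_silent(samples):
    """Return True if every sample in the FFT input block is zero.

    OR-ing integer PCM samples into a running total can never cancel bits,
    so the total is zero if and only if all samples are zero - and a known
    all-zero FFT need not be calculated at all.
    """
    acc = 0
    for s in samples:
        acc |= s
    return acc == 0
```

The cost is one OR per sample during the array fill, which is why the saving shows up mostly as throughput rather than file size.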
Title: lossyWAV Development
Post by: jesseg on 2008-01-06 22:11:17
That makes sense.  I guess I understand little about how FLAC and other lossless compression works.


carpman, check your system drive for a directory that was created with the same path as the files you were trying to encode.  You should find the FLAC files there and can delete them.  I made a silly booboo and forgot to parse and use the drive letter from the input & output locations.

Quote
lFLC.bat Change Log:
v1.0.0.5
- fixed drive letter bug


see attached
Title: lossyWAV Development
Post by: halb27 on 2008-01-06 22:46:08
... look for FFT's where all the input values are zero ...

The input values to the FFT: aren't those the wave samples of a block?
Title: lossyWAV Development
Post by: carpman on 2008-01-06 22:57:01
carpman, check your system drive for a directory that was created with the same path as the files you were trying to encode.  You should find the FLAC files there and can delete them.


There weren't any FLAC files created on my system-drive [I searched all drives for the FLAC files and none came up]

Anyway, no matter -- I'll use the new version now -- thanks for the quick reply.

C.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-06 23:17:20
... look for FFT's where all the input values are zero ...
The input values to the FFT: aren't those the wave samples of a block?
Yes, but with 64 samples and 50% end_overlap and 50% fft_overlap, there are 17 FFT analyses performed per 512 sample codec_block. So, using the existing method, if one whole FFT is zero then this brings the whole block's bits_to_remove to zero because of 64 samples of silence (which round to silence anyway).....
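As a sanity check on those counts, here is a small Python sketch (helper name hypothetical) that enumerates window start positions given the FFT length, a hop of half an FFT (50% fft_overlap) and windows extending half an FFT past each block edge (50% end_overlap):

```python
def fft_analyses_per_block(block_size, fft_len):
    """Count overlapping FFT windows per codec block, assuming a hop of
    fft_len // 2 (50% fft_overlap) and window starts running from half an
    FFT before the block to half an FFT past its end (50% end_overlap)."""
    hop = fft_len // 2
    starts = range(-hop, block_size - fft_len + hop + 1, hop)
    return len(starts)
```

This gives 17 for the 64-sample FFT on a 512-sample block, and 2 for the 1024-sample FFT - matching the -512:511 / 0:1023 pair of analyses mentioned elsewhere in the thread.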
Title: lossyWAV Development
Post by: halb27 on 2008-01-07 09:28:56
I see. You look at the codec blocks' sub-blocks formed by the particular FFTs and ignore the FFT result (not doing the FFT) if the sub-block contains silence.
Sounds reasonable, though I think for a silent passage only the first and last blocks (containing partial silence) will benefit from this mechanism, as the main silent part affects entire blocks.
I'm not quite sure about the window function. Can the windowing bring the FFT input to zero even though it's not zero in the wave input? I guess it can't, but I'm not sure.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-07 10:06:09
I see. You look at the codec blocks' sub-blocks formed by the particular FFTs and ignore the FFT result (not doing the FFT) if the sub-block contains silence.
Sounds reasonable, though I think for a silent passage only the first and last blocks (containing partial silence) will benefit from this mechanism, as the main silent part affects entire blocks.
I'm not quite sure about the window function. Can the windowing bring the FFT input to zero even though it's not zero in the wave input? I guess it can't, but I'm not sure.
The silence check is carried out on raw audio samples.

As the checking mechanism is now in place, I am wondering about "what constitutes digital (near) silence"?

I have put in place a few lines of code which take the absolute value of the sample rather than the actual value when performing the silence check.

In determining whether to process a single FFT the threshold for "silence" can be set to zero (total silence) or some value above zero (near silence).

Determining what constitutes "near silence" is the tricky bit......
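A minimal sketch of that absolute-value check (Python; the names and the threshold default are mine, not lossyWAV's):

```python
def block_is_near_silent(samples, threshold=0):
    """True if every sample's magnitude is <= threshold.

    threshold = 0 tests strict digital silence; a small positive value
    would define "near silence" - choosing that value is the hard part.
    """
    return all(abs(s) <= threshold for s in samples)
```

With threshold = 0 this reduces to the existing all-zero test.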
Title: lossyWAV Development
Post by: Nick.C on 2008-01-07 11:14:28
lossyWAV beta v0.6.3 appended to post #1 of this thread.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-07 11:29:46
[edit] This approach reduces my FLAC'd processed 53 sample set by a whole 95 bytes. However, it may slightly increase processing throughput..... [/edit]
Why would it change the output at all? However you round (or not), zeros are still zeros. (Where's the "confused" smiley!)

It should be quicker though.

Be very careful if you intend to convert near-silence into digital silence. If you do it, please only for the "lower quality" mode -3. Near-silence is quite easy to encode losslessly anyway, and you could hit all kinds of problems for little benefit.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-07 11:44:40
[edit] This approach reduces my FLAC'd processed 53 sample set by a whole 95 bytes. However, it may slightly increase processing throughput..... [/edit]
Why would it change the output at all? However you round (or not), zeros are still zeros. (Where's the "confused" smiley!)

It should be quicker though.

Be very careful if you intend to convert near-silence into digital silence. If you do it, please only for the "lower quality" mode -3. Near-silence is quite easy to encode losslessly anyway, and you could hit all kinds of problems for little benefit.

Cheers,
David.
I am not changing the samples themselves, merely disregarding FFT results which *are* going to be zero by not even calculating them - therefore not including a known 0dB result in the minimum_of_all_fft_results calculation when determining the final bits_to_remove.

At the moment the -detection parameter is optional and I would intend to keep it that way.
Title: lossyWAV Development
Post by: halb27 on 2008-01-07 12:22:48
[edit] This approach reduces my FLAC'd processed 53 sample set by a whole 95 bytes. However, it may slightly increase processing throughput..... [/edit]
Why would it change the output at all? However you round (or not), zeros are still zeros. ... Near-silence is quite easy to encode losslessly anyway, and you could hit all kinds of problems for little benefit.

I've thought this over again, and I think the -detection mechanism as it is does already affect a near-silence situation: in a temporal sense, not in a sense of amplitude. If we look at a codec block with partial silence, some short-term FFT results can remain unconsidered. This does lower the accuracy compared to not using -detection, as shown with the 53 sample set. And it's not clear whether this is a welcome thing in every situation (think of a strong transient starting or stopping within a block, with silence just before or after the transient).

In the end I also see a bad ratio of benefits against risks, even when considering the current just temporal-near-silence detection.

ADDED: I think our short blocksize of 512 samples (~12 msec) is enough to take care of temporal silence.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-07 12:36:26
[edit] This approach reduces my FLAC'd processed 53 sample set by a whole 95 bytes. However, it may slightly increase processing throughput..... [/edit]
Why would it change the output at all? However you round (or not), zeros are still zeros. ... Near-silence is quite easy to encode losslessly anyway, and you could hit all kinds of problems for little benefit.
I've thought this over again, and I think the -detection mechanism as it is does already affect a near-silence situation: in a temporal sense, not in a sense of amplitude. If we look at a codec block with partial silence, some short-term FFT results can remain unconsidered. This does lower the accuracy compared to not using -detection, as shown with the 53 sample set. And it's not clear whether this is a welcome thing in every situation (think of a strong transient starting or stopping within a block, with silence just before or after the transient).

In the end I also see a bad ratio of benefits against risks, even when considering the current just temporal-near-silence detection.

ADDED: I think our short blocksize of 512 samples (~12 msec) is enough to take care of temporal silence.
Fair enough - I'll park the thought and continue to clean up the code with a view to going RC1 later this week.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-07 17:05:40
Right, to achieve consensus on which parameters should be included in lossyWAV RC1, can I have your thoughts on the following:

Keep:
-1, -2, -3;
-o <folder>
-nts <n>
-snr <n>
-force
-check
-correction
-wmalsl
-quiet
-nowarn
-below
-low

Remove:
-skew <n>
-spf <5x5hex>
-fft <5xbin>
-cbs <n>
-detail
Title: lossyWAV Development
Post by: halb27 on 2008-01-07 22:19:09
A good selection IMO. I'd just prefer to remove -snr as well and keep only -nts as a quality affecting parameter apart from -1/-2/-3.

About -wmalsl: We wanted to have options to optimize the lossyWAV procedure for specific lossless codecs. So far we have just this switch. Do we really need it? IIRC the switch only addresses codec block size, but isn't the codec blocksize a multiple of 512? I don't see a problem with respect to quality and efficiency with having lossyWAV blocksize = 512 and the codec blocksize of the lossless codec a multiple of 512.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-08 21:45:46
A good selection IMO. I'd just prefer to remove -snr as well and keep only -nts as a quality affecting parameter apart from -1/-2/-3.

About -wmalsl: We wanted to have options to optimize the lossyWAV procedure for specific lossless codecs. So far we have just this switch. Do we really need it? IIRC the switch only addresses codec block size, but isn't the codec blocksize a multiple of 512? I don't see a problem with respect to quality and efficiency with having lossyWAV blocksize = 512 and the codec blocksize of the lossless codec a multiple of 512.
lossyWAV v0.6.4 RC1 appended to post #1 in this thread.

At your suggestion I ditched -wmalsl - 2048 is a multiple of 512 after all as you point out. I kept -snr as it has been determined to be an intrinsic element in quality retention during the testing phase.
Title: lossyWAV Development
Post by: jesseg on 2008-01-09 03:23:11
Quote
lFLC.bat Change Log:
v1.0.0.6
- fixed bugs caused by directories and filenames with certain characters
- updated to reflect changes in lossyWAV v0.6.4 RC1 command-line parameters
- improved handling of file extensions



I found a bug when there were certain characters in directory names or file names that weren't handled right even inside of quotes; they are now handled properly.  Heck, it should even work with Unicode now, but I wouldn't bet on the lFLCDrop front-end handling it properly.

The batch file has been updated to lossyWAV v0.6.4 RC1 standards.  It should still be compatible with older versions of lossyWAV, but the custom settings area doesn't specify the block-size anymore.  FLAC blocksize can still be set in the "enc_cust_flacoptions_string" variable at the top, and it's 512 samples by default.

Regarding the file extension handling - If a WAV file has the lossyWAV chunk and already has a ".lossy.wav" extension, the FLAC file's extension will not have another ".lossy" tacked onto it.  However, if a WAV file has the lossyWAV chunk and does not already have a ".lossy.wav" extension, then the FLAC file's extension will have ".lossy" tacked onto it.    If anyone thinks this is an annoying option, I could certainly provide an option in the batch file to turn off the renaming of WAV files that already have lossyWAV chunks, and I suppose I could also provide the option to turn off the FLAC encoding of those files altogether.  Let me know if this would be useful to you.

And finally...  the batch file is now over 6 KB 

[edit] link removed, newer version later in the thread [/edit]
Title: lossyWAV Development
Post by: GeSomeone on 2008-01-09 17:36:24
At your suggestion I ditched -wmalsl - 2048 is a multiple of 512 after all as you point out.

If you would mention in the documentation (or help file if there is one) that for WMA lossless -cbs 2048 is recommended, we can forget about WMA lossless
Title: lossyWAV Development
Post by: Nick.C on 2008-01-09 18:09:30
At your suggestion I ditched -wmalsl - 2048 is a multiple of 512 after all as you point out.
If you would mention in the documentation (or help file if there is one) that for WMA lossless -cbs 2048 is recommended, we can forget about WMA lossless
But, as 2048 is exactly 4 codec_blocks, is there really a need to retain the -cbs parameter (removed at RC1, along with -wmalsl)?
Title: lossyWAV Development
Post by: halb27 on 2008-01-09 18:22:54
At your suggestion I ditched -wmalsl - 2048 is a multiple of 512 after all as you point out.

If you would mention in the documentation (or help file if there is one) that for WMA lossless -cbs 2048 is recommended, we can forget about WMA lossless

I don't really understand what you mean. Would you like to have the possibility of using a lossyWAV blocksize of 2048 for the sake of WMA lossless? Or do you want to force a block size of 2048 on the WMA lossless side?
Title: lossyWAV Development
Post by: carpman on 2008-01-10 03:10:32
jesseg,

I ran lFLCDrop v1.2.0.4 and lFLC.bat v1.0.0.6 with lossyWAV v0.6.4 RC1.

I got the same problem as last time (see: http://www.hydrogenaudio.org/forums/index.php?showtopic=56129&st=716)

DOS window comes up and does the lossy.wav processing and creates the lossy.wav temp file but then the DOS window closes and no FLAC processing is initiated and the lossy.wav file is deleted.

I'm running WinXP SP2
The FLAC Drop and Lossy Wav programs are running from my 2nd HD (D:) and the Input / Output directory is on my Primary (System Drive) Drive (C:).

Again I've searched for *.FLAC (which would obviously include *Lossy.FLAC) and nothing has been created on either drive and this mirrors what is shown in the DOS process window.

Didn't have this problem with flacdrop v1.2.0.2 & lossywav v.0.5.4.4.

Any ideas?

C.
Title: lossyWAV Development
Post by: jesseg on 2008-01-10 05:16:52
Are you using the latest version of FLAC?  (v1.2.1B)  You have to use that or else it will close instantly because older versions of FLAC don't support the foreign metadata feature. 
Title: lossyWAV Development
Post by: carpman on 2008-01-10 06:54:49
Are you using the latest version of FLAC?  (v1.2.1B)  You have to use that or else it will close instantly because older versions of FLAC don't support the foreign metadata feature. 


Thanks!
Problem solved. I was using 1.2.0 (i think).

C.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-10 11:27:42
Reading the FLAC Wikipedia article, I notice that FLAC will encode any integer WAV between 4 and 32 bits.

When I try to output 32bit from Foobar, I get 32bit Float rather than integer, so I can't test lossyWAV properly, but the internals allow 16, 24 & 32bit integer WAV files to be processed.

I will amend the WAV reading / writing routines to allow 4 <= bits <= 32 integer values to be scaled properly (internally, lossyWAV only uses 32-bit integers for sample storage and 64-bit floats for most calculations).
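For what it's worth, scaling an n-bit integer sample into a 32-bit internal word can be as simple as a left shift, which is exactly reversible. A Python sketch under that assumption (not lossyWAV's actual routines):

```python
def to_internal(sample, bits):
    """Place an n-bit signed sample (4 <= bits <= 32) into the top of a
    32-bit word; the value scales by 2 ** (32 - bits)."""
    return sample << (32 - bits)


def from_internal(value, bits):
    """Inverse of to_internal: an arithmetic right shift recovers the
    original n-bit sample exactly."""
    return value >> (32 - bits)
```

Round-tripping any sample through these two shifts is lossless, which is what "scaled properly" requires.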
Title: lossyWAV Development
Post by: Alex B on 2008-01-10 14:00:54
Thanks for your hard work. I have not had particular interest in this quality/bitrate range, but I decided to try the release candidate.

I think I have stumbled on a problem sample on my very first try.

I browsed my recent albums and selected a track that is a bit difficult for the usual lossy encoders. It is Livin' In The Future from Bruce Springsteen's latest album. This track would score well in the "highest lossless bitrate" thread. FLAC 1.2.1 -8 produces 1133 kbps for the complete track, and my 30 s. sample is 1162 kbps.

I used the "-3" lossyWAV compression option and my settings were exactly as instructed on the wiki page.

At first I noticed that something may be different, but didn't know what to look for. However, after some trials I understood what I heard and was able to ABX it:

Code:
foo_abx 1.3.1 report
foobar2000 v0.9.4.5
2008/01/10 10:31:26

File A: D:\lossyWAV\Livin_In_The_Future.flac
File B: D:\lossyWAV\Livin_In_The_Future.lossy.flac

10:31:26 : Test started.
10:32:45 : 01/01  50.0%
10:34:24 : 02/02  25.0%
10:34:55 : 03/03  12.5%
10:35:21 : 04/04  6.3%
10:35:38 : 05/05  3.1%
10:36:50 : 06/06  1.6%
10:37:19 : 07/07  0.8%
10:38:30 : 08/08  0.4%
10:39:33 : 09/09  0.2%
10:39:49 : 10/10  0.1%
10:40:05 : Test finished.

----------
Total: 10/10 (0.1%)


I wonder if anyone else is able to hear the difference. (I can give hints later if needed.)

I uploaded a 30 s. sample here:
http://rs274.rapidshare.com/files/82598575/Livin_In_The_Future.flac (4.15 MB)
(I didn't have enough attachment space at HA.)
Title: lossyWAV Development
Post by: halb27 on 2008-01-10 14:22:37
Thanks for testing and providing a problematic sample. I'll try it tonight.
Would you mind trying whether -2 solves the issue?
Title: lossyWAV Development
Post by: GeSomeone on 2008-01-10 14:37:23
At your suggestion I ditched -wmalsl - 2048 is a multiple of 512 after all as you point out.
If you would mention in the documentation that for WMA lossless -cbs 2048 is recommended, we can forget about WMA lossless
I don't really understand what you mean.

Sorry, I didn't pay enough attention to the fact that -cbs would also be ditched. In that case my remark makes no sense. Nevermind
Title: lossyWAV Development
Post by: Nick.C on 2008-01-10 14:39:48
Thanks for the sample Alex B, it's people like you who allow us to refine and improve the quality presets. I've downloaded it and as you say, it's high bitrate - for both FLAC and lossyFLAC -3/-5 (1162.0kbps vs 544.1kbps).

My ears / listening environment have not allowed me to find the problem area(s?) of the sample.

Out of interest, what was your listening environment when you identified the issue?

[edit2] Additionally, maybe instead of trying -2, could you try -3 -nts 2? This may be enough to address the as yet broadly unidentified problem..... [/edit2]

[edit3] This seems to be a sample which activates the anti-clipping mechanism regularly: I tried
lossywav -3 and got 8.7384 bits removed, 0.2577 not removed;
lossywav -3 -nts 0 and got 8.6029 bits removed, 0.2492 not removed;
lossywav -3 -nts -3 and got 8.3549 bits removed, 0.2337 not removed;

lossywav -3 -snr 24 and got 8.3618 bits removed, 0.2295 not removed;
lossywav -3 -snr 27 and got 7.8982 bits removed, 0.1974 not removed;

lossywav -3 -nts -3 -snr 27 and got 7.8069 bits removed, 0.1950 not removed.[/edit3]

[edit1]
Sorry, I didn't pay enough attention to the fact that -cbs would also be ditched. In that case my remark makes no sense. Nevermind
Don't worry about it - it was a fairly brutal reduction in settings ![/edit1]
Title: lossyWAV Development
Post by: halb27 on 2008-01-10 19:05:12
... I wonder if anyone else is able to hear the difference. (I can give hints later if needed.) ...

I can't hear the difference. Can you give a hint please?
My lossyFLAC -3 bitrate is 417 kbps (filesize = 1528 KB) BTW.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-10 19:47:18
... I wonder if anyone else is able to hear the difference. (I can give hints later if needed.) ...
I can't hear the difference. Can you give a hint please?
My lossyFLAC -3 bitrate is 417 kbps (filesize = 1528 KB) BTW.
I made a mistake, my bitrate is 418.4kbps for the lossy.flac version. I still can't hear it - however, it prompted me to re-examine the process_codec_block routine and I've managed to speed the processing up by about 50%.

One thing just occurred to me - at present only 2 [edit2] 1024 sample FFT[/edit2] analyses are carried out per 512 sample codec_block: -512:511 and 0:1023. This gives 50% overlap over the length of the file. What it doesn't do is carry out an FFT analysis with the middle of the codec_block as the middle of the FFT. I can easily add in the extra analysis - with no speed penalty [edit2] compared to v0.6.4 [/edit2] due to the vast speedup I tripped over.

Also, maybe the -spf for the 1024 sample FFT should be 22469 rather than 2246C (assuming we have a potential high frequency problem.....)?
Title: lossyWAV Development
Post by: halb27 on 2008-01-10 20:09:04
Nick, I think we should learn about the problem before trying to fix it.
Title: lossyWAV Development
Post by: Alex B on 2008-01-10 21:13:11
Hmm... it may be difficult to hear the problem without any hints.

The first occurrence is immediately after the first snare hit, when the drum is still sounding. It is as if the tuning was adjusted slightly. The drums have a slightly different pitch after the sharp hit of the drum stick. I ABXed the 0-1 s range. I think I could hear the same later, but it is more difficult because the other instruments and the singer's voice are partially masking the effect.

There is also at least one short passage where I could hear a similar effect in the singer's voice, but I need to recheck the exact position. I'll try to find it again and report back.

I used a Terratec DMX 6fire 24/96 & Koss PortaPro in the ABX test, but before that I compared the complete tracks using small powered Genelec studio monitors and became suspicious.

It may well be that -2 makes the problem vanish totally. It wasn't easy to ABX it even though I had a feeling that something is different.

I am not sure if I can do further ABXing just now. I've had an exhausting day...
Title: lossyWAV Development
Post by: Nick.C on 2008-01-10 21:29:37
Hmm... it may be difficult to hear the problem without any hints.

The first occurrence is immediately after the first snare hit, when the drum is still sounding. It is as if the tuning was adjusted slightly. The drums have a slightly different pitch after the sharp hit of the drum stick. I ABXed the 0-1 s range. I think I could hear the same later, but it is more difficult because the other instruments and the singer's voice are partially masking the effect.

There is also at least one short passage where I could hear a similar effect in the singer's voice, but I need to recheck the exact position. I'll try to find it again and report back.

I used a Terratec DMX 6fire 24/96 & Koss PortaPro in the ABX test, but before that I compared the complete tracks using small powered Genelec studio monitors and became suspicious.

It may well be that -2 makes the problem vanish totally. It wasn't easy to ABX it even though I had a feeling that something is different.

I am not sure if I can do further ABXing just now. I've had an exhausting day...
Alex B, thanks for the clarification of the problem.
Title: lossyWAV Development
Post by: halb27 on 2008-01-10 21:36:47
... I've had an exhausting day...

So relax.
Maybe tomorrow (or whenever) it would be nice if you could test -2.
I can't contribute because even with your hint I can't ABX it. With much earlier versions, however, I also had the experience with specific samples that the pitch was changed somehow.
So with your excellent hearing you can make a valuable contribution to lossyWAV improvement.

What's not totally correct with the current setting of -3 is that we have decreased the noise sensitivity threshold a bit. We thought we could allow for that because we have other precautions, which however are less effective in the high frequency range.
So this is the first thing to consider.
With -2 we aren't that little bit aggressive, so your ABX result using -2 is very much welcome.
In case -2 is alright, it would be nice if you could try -3 -nts 0 as well, as this keeps that slightly aggressive mode out of -3 too.
If however even -2 isn't totally satisfying, it would be very much appreciated if you could try -2 -nts 3, as this makes the noise sensitivity more defensive.
Title: lossyWAV Development
Post by: halb27 on 2008-01-10 21:54:43
... at present only 2 [edit2] 1024 sample FFT[/edit2] analyses are carried out on a 512 sample codec_block -512:511 and 0:1023. This gives 50% overlap over the length of the file. ...

As we once decided to have an overlap of more than 50% of the window length, it would be good to have an improvement here. I remember my proposal of using these windows: -448:575, -64:959.
I think overlapping is good, as is coverage of the edges. What do you think?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-10 22:11:14
... at present only 2 [edit2] 1024 sample FFT[/edit2] analyses are carried out on a 512 sample codec_block -512:511 and 0:1023. This gives 50% overlap over the length of the file. ...
As we once decided to have an overlap of more than 50% of the window length, it would be good to have an improvement here. I remember my proposal of using these windows: -448:575, -64:959.
I think overlapping is good, as is coverage of the edges. What do you think?
My most recent speedup is reliant on 50% overlap either side of the codec_block. Adding in the extra analysis gives: -512:511; -256:767 & 0:1023 - at no speed penalty compared to v0.6.4. Any other overlap would not give even coverage - look at what happens with adjacent codec_blocks and plot the FFT lengths....
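A quick way to check the even-coverage claim is to count how many analyses cover each sample (a Python sketch; note that each block's -512:511 window is the previous block's 0:1023 window, so duplicate starts collapse in the set):

```python
def interior_coverage(n_blocks=6, block=512, fft=1024, offsets=(-512, -256, 0)):
    """Count how many FFT analyses cover each sample well away from the
    file edges; a single-element result set means coverage is even."""
    # unique window start positions across all blocks
    starts = {b * block + off for b in range(n_blocks) for off in offsets}
    hits = {}
    for st in starts:
        for s in range(st, st + fft):
            hits[s] = hits.get(s, 0) + 1
    # only look at interior samples, away from the file edges
    return {hits[s] for s in range(block, (n_blocks - 2) * block)}
```

With the three windows above, interior_coverage() returns {4}: every interior sample falls in exactly four 1024-sample analyses.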
Title: lossyWAV Development
Post by: Alex B on 2008-01-10 22:45:51
Let's see if I can do some testing tomorrow. As we know, trying to test codecs at this quality level is exhausting.

Hearing a small pitch change is like a visual experience. One tiny bit of sound ends at a bit higher "position" than the other. If you lose concentration for a second you are out and it may take a while before you can hear the difference again.

As far as I understand, the problem may well be caused by small differences in the reproduction of the highest harmonics.

Edit: typo
Title: lossyWAV Development
Post by: shadowking on 2008-01-11 00:11:09
This track is compressed rock. It's very strange that one would ABX it, because all the instruments and vocals are going at it at once. With WavPack (it should be similar with other hybrids) those were the hardest to ABX.
Title: lossyWAV Development
Post by: halb27 on 2008-01-11 08:16:21
The problem may be small but so far we should consider it a pitch problem to be solved.
I couldn't sleep this night and so I could think about it a lot.
I constructed the error file last night and listened to it (and looked at it with a wave editor).
I am convinced that the primary problem isn't caused by the noise level being too high. When listening to the error file what's most annoying is not the noise itself but the fluctuation in noise. Especially at the blocks' edges this fluctuation can form a strong transient.

I was a bit sceptical before about this abrupt noise level change with respect to the anti-clipping strategy. But that's too short-sighted. We have this potential problem whenever there's a strong change in bits to remove from one block to the next.

To work against this we should take care that bits to remove changes by only 1 bit at the blocks' edges. If for a sequence of 10 blocks bits to remove is 1, and for the next 10 blocks bits to remove is 8, we should not immediately go from 1 bit to remove to 8 bits to remove, but do it gradually, so the bits to remove in the 20 blocks is 1,1,1,1,1,1,1,1,1,1,2,3,4,5,6,7,8,8,8,8. If bits to remove of the first 10 blocks is 8 and of the next 10 blocks is 1, bits to remove should be 8,8,8,8,7,6,5,4,3,2,1,1,1,1,1,1,1,1,1,1. Unfortunately this means potentially having to rework past blocks, so this means buffering and deferred output.

I think we should do it this way for -2 and -1.
For -3 the number of intermediate steps, with their restricted benefit in removed bits, should be lowered IMO. For -3 I think we can allow a step size of 2 bits to remove when going from one block to the next. But we should do it in a way that the error level never has an immediate change of 2 bits to remove. We can easily do this by changing bits to remove by 1 bit for the first 256 samples in the block and another 1 bit for the last 256 samples. Looking at just 1 block, this doesn't bring a compression improvement compared to changing bits to remove by 1 for the entire block. The advantage is in the fact that we have roughly half of the intermediate blocks. So going from 1 bit to remove to 8 bits to remove as in the example above looks like this: 1,1,1,1,1,1,1,1,1,1,2 resp. 3,4 resp. 5,6 resp. 7, 8,8,8,8,8,8,8.
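The smoothing described above amounts to clamping each block's bits_to_remove against both its neighbours. A minimal Python sketch (hypothetical, not lossyWAV's actual Delphi code) that reproduces the two sequences given:

```python
def smooth_bits_to_remove(btr, max_step=1):
    # Limit block-to-block changes in bits_to_remove to max_step.
    # The backward pass revisits past blocks, which is why a real
    # encoder would need buffering and deferred output.
    out = list(btr)
    for i in range(1, len(out)):           # limit increases
        out[i] = min(out[i], out[i - 1] + max_step)
    for i in range(len(out) - 2, -1, -1):  # limit decreases
        out[i] = min(out[i], out[i + 1] + max_step)
    return out

# 1 -> 8 ramps up gradually; 8 -> 1 ramps down across earlier blocks
assert smooth_bits_to_remove([1]*10 + [8]*10) == \
       [1]*10 + [2, 3, 4, 5, 6, 7] + [8]*4
```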
Title: lossyWAV Development
Post by: Nick.C on 2008-01-11 08:31:21
The problem may be small but so far we should consider it a pitch problem to be solved.
I couldn't sleep this night and so I could think about it a lot.
I constructed the error file last night and listened to it (and looked at it with a wave editor).
I am convinced that the primary problem isn't caused by the noise level being too high. When listening to the error file what's most annoying is not the noise itself but the fluctuation in noise. Especially at the blocks' edges this fluctuation can form a strong transient.

I was a bit sceptical before about this abrupt noise level change with respect to the anti-clipping strategy. But that's too short-sighted. We have this potential problem whenever there's a strong change in bits to remove from one block to the next.

To work against this we should take care that bits to remove changes by only 1 bit at the blocks' edges. If for a sequence of 10 blocks bits to remove is 1, and for the next 10 blocks bits to remove is 8, we should not immediately go from 1 bit to remove to 8 bits to remove, but do it gradually, so the bits to remove in the 20 blocks is 1,1,1,1,1,1,1,1,1,1,2,3,4,5,6,7,8,8,8,8. If bits to remove of the first 10 blocks is 8 and of the next 10 blocks is 1, bits to remove should be 8,8,8,8,7,6,5,4,3,2,1,1,1,1,1,1,1,1,1,1. Unfortunately this means potentially having to rework past blocks, so this means buffering and deferred output.

I think we should do it this way for -2 and -1.
For -3 the number of intermediate steps, with their restricted benefit in removed bits, should be lowered IMO. For -3 I think we can allow a step size of 2 bits to remove when going from one block to the next. But we should do it in a way that the error level never has an immediate change of 2 bits to remove. We can easily do this by changing bits to remove by 1 bit for the first 256 samples in the block and another 1 bit for the last 256 samples. Looking at just 1 block, this doesn't bring a compression improvement compared to changing bits to remove by 1 for the entire block. The advantage is in the fact that we have roughly half of the intermediate blocks. So going from 1 bit to remove to 8 bits to remove as in the example above looks like this: 1,1,1,1,1,1,1,1,1,1,2 resp. 3,4 resp. 5,6 resp. 7, 8,8,8,8,8,8,8.
Given the way that lossyWAV adds noise / reduces bits, I do not understand how pitch can be changed.

It would be relatively simple to ensure that each codec_block will have no more than 1 more bit removed than the last codec_block. To go the other way as well would be a large amount of coding.

I think that one initial approach would be to re-examine the -spf 22224 / 2246C for 64 / 1024 samples to see if the problem can be eradicated. I will re-post beta v0.6.2 to allow manipulation of those parameters removed at v0.6.4 RC1. I will also post beta v0.6.5 which incorporates the speedup and the extra 1024 sample FFT analysis per block.

[edit] Right, beta v0.6.5 appended to post #1 of this thread along with beta v0.6.2 as mentioned previously. Beta v0.6.5 limits the increase in bits_to_remove between blocks to 1 bit and incorporates the 3 1024 sample FFT analyses amendment. For my 53 sample set, beta v0.6.5 -3 / flac -5 produces 445.2kbps; -2 / flac -5 produces 508.7kbps and -1 / flac -5 produces 559.5kbps. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2008-01-11 09:16:53
Given the way that lossyWAV adds noise / reduces bits, I do not understand how pitch can be changed.

It would be relatively simple to ensure that each codec_block will have no more than 1 more bit removed than the last codec_block. To go the other way as well would be a large amount of coding.

I think that one initial approach would be to re-examine the -spf 22224 / 2246C for 64 / 1024 samples to see if the problem can be eradicated. I will re-post beta v0.6.2 to allow manipulation of those parameters removed at v0.6.4 RC1. I will also post beta v0.6.5 which incorporates the speedup and the extra 1024 sample FFT analysis per block.

Pitch of the original signal can't change of course, but the way we add noise can give the impression that pitch has changed. I did have this very impression in earlier listening tests. And I'm absolutely convinced it's not the noise due to bits to remove but the modulation of the noise due to the abrupt noise level changes. The way we realize 2Bdecided's basic principles at the moment causes this particular problem. We take good care of the low to medium frequency range when doing the bits to remove analysis, but we add a significant amount of noise there afterwards because of the noise modulation side effect.

You may convince yourself by first looking at the error signal with a wave editor. See how artificially strange this signal looks because of the abrupt changes in noise level. Then listen to it from within the wave editor. You can hear the noise as such, but what's really annoying isn't the noise itself, it's the noise modulation due to abrupt changes in level.

Sorry that working backwards causes you a lot of trouble, and I can understand that you'd like to have another solution. But I definitely don't see the sense in giving the -spf setting a higher sensitivity for the HF range. I guess it's already unnecessarily high there (maybe the last change in this respect, which was prompted by problems with eig, wasn't a good choice, because maybe that problem was caused by the very problem we're talking about). Maybe gradually changing bits to remove gives room for being less defensive in the -spf and -nts settings with -3, thus giving the chance to arrive at a lower average bitrate. Just speculation of course, but what I want to say is there's no way around tackling the real problem. I think if you look at and listen to the error signal you can understand.
Of course we can always bring bits to remove down and thus reduce the problem. But I think that's not the way to go.

I've thought about the working-backward procedure. It's not nice of course, but I think the amount of effort necessary isn't extremely high. Whenever you would output a block right now, you can instead write it to a buffer holding 16 blocks. You also record the current number of bits to remove for the block and store it alongside the block in the buffer. So whenever you have to work backwards you just adjust the bits-to-remove state of the blocks in the buffer.
The buffer is organized as a ring: before putting the current block into the buffer, you first output the block that has been in the buffer the longest.
Sure, the ring buffer has to be managed, but I think that's not very difficult. Of course it's easy for me to talk about it while you'd be the one having to do it, should you choose to. Sorry about that.
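The ring-buffer idea above could look something like the following sketch (hypothetical Python; the name BlockBuffer and its interface are invented for illustration). Blocks are held back for up to 16 blocks, so a sudden drop in bits_to_remove can still be ramped down across the buffered blocks before they are finally written:

```python
from collections import deque

class BlockBuffer:
    def __init__(self, size=16):
        self.size = size
        self.ring = deque()  # (samples, bits_to_remove) pairs, oldest first

    def push(self, samples, bits_to_remove, max_step=1):
        # If this block forces a ramp-down, revisit buffered blocks:
        # walking backwards, each older block may remove at most
        # max_step more bits than the one after it.
        for i in range(len(self.ring) - 1, -1, -1):
            s, btr = self.ring[i]
            limit = bits_to_remove + (len(self.ring) - i) * max_step
            if btr > limit:
                self.ring[i] = (s, limit)
            else:
                break
        self.ring.append((samples, bits_to_remove))
        if len(self.ring) > self.size:
            return self.ring.popleft()  # oldest block, now safe to write
        return None
```

Pushing blocks with bits_to_remove 8, 8, 8 and then 1 leaves the buffer holding 4, 3, 2, 1: the drop is spread over the earlier blocks instead of happening in one step.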
Title: lossyWAV Development
Post by: Nick.C on 2008-01-11 09:43:27
I've incorporated the bits_to_remove delta limit = +1 for subsequent codec_blocks in beta v0.6.5 - I think that it would be worth listening to, to see if we are more sensitive to increases in noise than to decreases in noise - this version limits the increase in noise to 6dB per codec_block. [edit] The extra 1024 sample FFT analysis is also incorporated. [/edit]

I will think on your method of looping the blocks to be written and revert.
Title: lossyWAV Development
Post by: halb27 on 2008-01-11 09:47:53
Thanks a lot.
Title: lossyWAV Development
Post by: Alex B on 2008-01-11 11:06:10
halb27,

I created a few smaller clips of the original and the lossy version. I tried to isolate the possible problems. Maybe these help in confirming that I have heard something. The clips should be accurately cut (I used exact numerical values when creating the selections). While cutting these I inspected the difference signal (invert-mix-paste) in Audition. I saw the abruptly changing noise you explained. In addition, the Spectral Phase and Pan displays show small differences when the original and lossy version are compared.

I have yet to try to ABX them, except the first snare drum hit (00000_00595ms) which I already did. I think I can hear similar differences in the other clips, but ABXing them is more difficult.

For example, the cymbal crash in the 09400_10400ms clip may be slightly altered. I'm not saying that the actual pitch has changed, but the crash may be a bit brighter in one of the clips, which creates the impression of changed tuning.

The new lossyWAV clips are directly cut from my first (-3) lossyWAV sample. I think it would be useful if someone else could hear one or more differences before trying other settings.

[attachment=4186:attachment]
[attachment=4187:attachment]
[attachment=4188:attachment]
[attachment=4189:attachment]
[attachment=4190:attachment]
[attachment=4191:attachment]
[attachment=4192:attachment]
[attachment=4193:attachment]
Title: lossyWAV Development
Post by: Nick.C on 2008-01-11 11:11:53
Halb27,

I created a few smaller clips of the original and the lossy version. I tried to isolate the possible problems. Maybe these help in confirming that I have heard something. The clips should be accurately cut (I used exact numerical values when creating the selections). While cutting these I inspected the difference signal (invert-mix-paste) in Audition. I saw the abruptly changing noise you explained. In addition, the Spectral Phase and Pan displays show small differences when the original and lossy version are compared.

I have yet to try to ABX them, except the first snare drum hit (00000_00595ms) which I already did. I think I can hear similar differences in the other clips, but ABXing them is more difficult.

For example, the cymbal crash in the 09400_10400ms clip could be slightly altered. I'm not saying that the actual pitch has changed, but the crash may be a bit brighter in one of the clips, which creates the impression of changed tuning.

These are all from my first (-3) lossy sample. I think it would be useful if someone else could hear one or more differences before trying other settings.


I processed the sample in v0.6.2 and v0.6.5 (-detail re-enabled...) and got the following:
Code:
lossyWAV beta v0.6.2 : WAV file bit depth reduction method by 2Bdecided.
Delphi implementation by Nick.C from a Matlab script, www.hydrogenaudio.org
%lossyWAV Warning% : Quality level 3 selected.
%lossyWAV Warning% : Forcibly over-write output file if it exists.
%lossyWAV Warning% : Detailled output mode enabled
Processing : livin_in_the_future.wav
Format     : 44.10kHz; 2 ch.; 16 bit.
Progress   :
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  8   8
   16    0.19s.  8  9  9 10 10  9  7  7  7  7  6  6  8  8  8  8 127
   32    0.37s.  9  9  9  7  7  7  7  5  5  8  7  7  9  8  8  8 120
   48    0.56s.  8  8  7  7  8  7  7  8  8  6  6  8  9  7  9 10 123
   64    0.74s. 10  9 10 10 10 10 10 10 10  0 10 10 10 10  8 10 147
   80    0.93s.  0 10 10 10  0 10  0 10 10 10  0  9 10 10 10 10 119
   96    1.11s. 10 10  9 10 10 10 10 10  9  9 10 10 10  9  9 10 155
  112    1.30s.  9 10 10  0 10 10 10 10 10 10 10 10  8 10 10  9 146
  128    1.49s. 10 10  9  9  9  9 10  9  9 10 10 10  9  9  9  9 150
  144    1.67s.  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9 10 145
  160    1.86s.  0 10 10 10  8  9 10 10  9  9 10  9 10  9  9  9 141
  176    2.04s.  9  9  9  9  9  9  9  9 10 10 10 10 10  9  9  9 149
====================================================================
Average    : 8.7384; bits; [22580/2584; 22.65x; CBS=512]
%lossyWAV Warning% : 666 bits not removed due to clipping.

lossyWAV beta v0.6.5, Copyright (C) 2007,2008 Nick Currie. Portions (C) 1996
Don Cross. lossyWAV is issued with NO WARRANTY WHATSOEVER and is free software.
%lossyWAV Warning% : Detailled output mode enabled
Processing : livin_in_the_future.wav
Format     : 44.10kHz; 2 ch.; 16 bit.
Progress   :
Block    Time   00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 Tot.
====================================================================
    0    0.00s.  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1   1
   16    0.19s.  2  3  4  5  6  7  7  7  7  7  6  6  7  8  7  8  97
   32    0.37s.  8  9  7  7  7  7  7  5  5  6  7  7  8  8  8  8 114
   48    0.56s.  8  8  7  7  8  7  7  8  8  6  6  7  8  7  8  9 119
   64    0.74s. 10  9 10 10 10 10 10 10 10  0  1  2  3  4  5  6 110
   80    0.93s.  0  1  2  3  0  1  0  1  2  3  0  1  2  3  4  5  28
   96    1.11s.  6  7  8  9 10 10 10 10  9  9 10 10 10  9  9 10 146
  112    1.30s.  9 10 10  0  1  2  3  4  5  6  7  8  8  9 10  9 101
  128    1.49s.  9  7  8  9  8  9 10  9  9 10 10 10  9  9  9  9 144
  144    1.67s.  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9 10 145
  160    1.86s.  0  1  2  3  4  5  6  7  8  9 10  9  9  9  9  9 100
  176    2.04s.  9  9  9  9  9  9  9  9 10 10 10 10 10  9  9  9 149
  ...    ......  ..................................................
====================================================================
Average    : 8.0232 bits; [20732/2584; 20.17x; CBS=512]
%lossyWAV Warning% : 0.1947 bits not removed due to clipping.

Alex B, if you have time, could you try the sample with beta v0.6.5?
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-11 11:27:20
Be careful with restricting the deltas. It could increase the bitrate quite a lot for (as yet) no proven gain.

I was worried by the abrupt changes in noise to start with, and had strategies for cross fading block boundaries in the dithered and noise shaped versions. I didn't bother with the non-dithered version, but it would be possible here too by adding extra noise briefly and fading it out/in.

I couldn't find any situation where this cross fading was needed, so I dumped it.


If lossyWAV goes from no noise to 48dB of noise (8-bits) in a single block, that's because it believes that the audio in the entirety of that block (and slightly either side - remember overlap!) can take it.


Psychoacoustically, there are different thresholds for constant noise vs modulated noise, though I'm not sure if anyone has tested switched noise. I guess it too could be fractionally more audible.

There were almost no psychoacoustics in lossyFLAC, but my intention was to keep the noise well below both these thresholds (if they are indeed different). However, if it's below the threshold for constant noise, and above the threshold for modulated noise, then of course smoothing transitions or restricting deltas will help.

However, if the noise is simply too high in a given block because the calculations are wrong, and you introduce restricted deltas which happen to drag it down in that block, then of course you will stop the noise being audible, but you won't know if restricted deltas were really needed to solve it. Single block unlimited deltas (as now) with a slightly lower noise for that block might be the "better" solution.


I fear that raises more questions than it answers. Sorry!

Cheers,
David.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-11 11:49:10
Hang on a moment though - I think you guys are overreacting.

Isn't this what you designed the setting "-3" for? Probably transparent almost all the time. If someone can ABX something, does that mean that setting wants changing?

If it can be ABXed at -2, then you have work to do!


I can't ABX it at -3, but I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k? Or not?

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-01-11 11:54:42
So you too see the switching noise as a potential problem.
So why not try to avoid it? Sure, average bitrate may come down significantly, but we don't know in advance. Moreover, even in this case we have the option to gradually change the number of bits removed within a block, as suggested for -3, to minimize the number of intermediate blocks while still smoothing the error level.

I don't see it as a viable argument that this procedure would hide other problems. In principle this can be an unwanted side effect of any quality-improving action. With this very action I think it's rather the other way around: decreasing bits to remove by increasing -nts, -snr or whatever may well hide this very problem. If there should be a problem with the decision about how many bits to remove due to the input analysis, it is expected to show up sooner or later when using this smoothing strategy as well.
Title: lossyWAV Development
Post by: halb27 on 2008-01-11 12:11:07
Isn't this what you designed the setting "-3" for? Probably transparent almost all the time. If someone can ABX something, does that mean that setting wants changing?

If it can be ABXed at -2, then you have work to do!  ....

You are right, but unfortunately we haven't had a lot of testing so far. I guess it was me who has done most of the testing so far, especially in recent months, and my 58-year-old ears aren't very good witnesses.
We are very thankful for AlexB's testing, especially as his hearing seems to be excellent.
So we should take any reported issue seriously and look for improvement. This does not necessarily mean that something is changed in the end.
The problem in this case is that Nick would have to do a lot of work if he follows my suggestions, and it cannot be excluded that it is good for nothing.
... I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k? Or not? ....

Well that's an important finding. So maybe a higher -nts value is the solution. But it's still an open question to what extent the noise level in the 10+ kHz region is generated by the switch noise. Do you mind trying -3 -nts 0 and -3 -nts 3? In case the switch noise participates in the problem the SNR in the 10+ kHz region is not expected to improve very much.
Title: lossyWAV Development
Post by: Alex B on 2008-01-11 12:14:59
Alex B, if you have time, could you try the sample with beta v0.6.5?


It's better. I tried it with the 00000_00595ms sample. I couldn't reliably ABX it.

In addition I compared 0.64rc vs 0.65b. The ABX result was 9/10.

The bitrate increased from 494 to 555 kbps
(using FLAC -8 --padding 80. The small padding block is for the replay gain tag. foobar seems to take the tags into account when it calculates bitrates.)
Title: lossyWAV Development
Post by: Nick.C on 2008-01-11 12:46:34
I can't ABX it at -3, but I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k?
The cutoff is 16kHz - however, I already suggested changing 2246C to 22469 for the 1024 sample FFT - this brings bits_to_remove down a bit by reducing the spreading at high frequencies.

As an aside is it better to carry out 2 x FFT's (-512:511; 0:1023) or 1 (-256:767) at 1024 samples? The thinking behind the single FFT is that it is centred on the codec_block in question and is still overlapped 50% with the next FFT.
Title: lossyWAV Development
Post by: Alex B on 2008-01-11 12:49:25
Hang on a moment though - I think you guys are over reacting.

Isn't this what you designed the setting "-3" for? Probably transparent almost all the time. If someone can ABX something, does that mean that setting wants changing?

If it can be ABXed at -2, then you have work to do!

Those are my thoughts too. Unless my finding gets backed up by others and several similar samples are found, I don't think you need to worry too much.

I can't ABX it at -3, but I can see that the added noise is getting quite close to the signal over the 10-16k region, and is above it over 16k. (see attached pictures). I assume (because I haven't seen you mention it) that you still ignore things over 16k? Or not?

Perhaps a young tester who can easily hear up to 20 kHz or more would find it easier to ABX this. My practical limit is about 17-18 kHz, I think.

I guess it was me who has done most of the testing so far, especially in the recent months, and my 58 year old ears aren't very good witnesses.
We are very thankful for AlexB's testing, especially as his hearing seems to be excellent. ...

I think we all hear things a bit differently. You have often pinpointed things that I might not have noticed. I may be sensitive to this kind of problem which sounds like a slight pitch change to me. I heard a similar effect in your "French lady" LAME -V0 sample, if you remember.


Edit: a typo again
Title: lossyWAV Development
Post by: Nick.C on 2008-01-11 13:08:46
Perhaps a young tester who can easily hear up to 20 kHz or more would find it easier to ABX this. My practical limit is about 17-18 kHz, I think.

I guess it was me who has done most of the testing so far, especially in the recent months, and my 58 year old ears aren't very good witnesses.
We are very thankful for AlexB's testing, especially as his hearing seems to be excellent. ...
I think we all hear things a bit differently. You have often pinpointed things that I might not have noticed. I may be sensitive to this kind of problem which sounds like a slight pitch change to me. I heard a similar effect in your "French lady" LAME -V0 sample, if you remember.
I'd like to re-iterate halb27's thanks for initially identifying the problem and subsequently carrying out the ABX tests.

Thinking about the problem, it seems that the drop from 10 to 0 and back to 10 at codec_block 72/73/74 is due to clipping prevention rather than low minimum signal.

I agree that 10/0/10 is a bit of a steep change, but is a restricted_delta of +1 a bit conservative? Would +2 or +3 be acceptable? The higher the restricted_delta value, the fewer subsequent codec_blocks are required to get back to the actual calculated value rather than sequential last_btr+restricted_delta values, i.e. 10,0,10,10,10,10,10 with restricted_delta=2 becomes 10,0,2,4,6,8,10.
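The one-sided scheme discussed here is just a running clamp on increases; decreases pass through unchanged, which is why 10 can still fall to 0 in one block. A hedged Python sketch (the function name is invented for illustration):

```python
def apply_restricted_delta(btr, delta=1):
    # Each codec_block may remove at most `delta` bits more than its
    # predecessor; decreases in bits_to_remove are not restricted.
    out = [btr[0]]
    for b in btr[1:]:
        out.append(min(b, out[-1] + delta))
    return out
```

With the example above, apply_restricted_delta([10,0,10,10,10,10,10], 2) gives [10, 0, 2, 4, 6, 8, 10], while delta=1 would need five more blocks, yielding [10, 0, 1, 2, 3, 4, 5].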
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-11 14:37:50
But it's still an open question to what extent the noise level in the 10+ kHz region is generated by the switch noise.
The switching doesn't "generate" noise. With white noise, the transient at the start is exactly as "loud" (if you want to put it that way) as the noise itself - no more or less.

It's not like a tone, where an instant start could be perceived as a click.

Cheers,
David.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-11 14:50:16
I don't see it as a viable argument that this procedure would hide other problems. In principle this can be an unwanted side effect of any quality-improving action. With this very action I think it's rather the other way around: decreasing bits to remove by increasing -nts, -snr or whatever may well hide this very problem. If there should be a problem with the decision about how many bits to remove due to the input analysis, it is expected to show up sooner or later when using this smoothing strategy as well.
Of course either approach can be the wrong one, yet appear to solve the problem.

All I was pointing out is that, for this reason, you really need to figure out a way of finding out which is right, but this is necessarily difficult.

My bet would be that it has nothing to do with switching transients, and everything to do with a simple nts.

At worst, it might be that the nts is "more wrong" for noise-like signals than tone-like signals - and that, specifically, it needs to find the peaks in the spectrum (as well as the troughs) and ensure that the noise is always at least 25dB (say) below them. Noise 18dB down from a peak can change the peak by 1dB, noise 25dB down can change it by 0.5dB. For most signals, the added noise is already much lower than 25dB below the spectral peak, but for signals which are originally noise-like anyway, it can currently get close to this limit.
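The 1dB/0.5dB figures above follow from simple dB arithmetic, assuming the noise adds constructively at the peak (a sketch of the calculation, not part of lossyWAV):

```python
import math

def peak_change_db(noise_below_peak_db):
    # Worst-case level change of a spectral peak when noise sitting
    # `noise_below_peak_db` dB below it adds in phase:
    # 20*log10(1 + 10**(-x/20)).
    return 20 * math.log10(1 + 10 ** (-noise_below_peak_db / 20))
```

peak_change_db(18) is about 1.03 dB and peak_change_db(25) about 0.48 dB, matching the figures quoted.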

Just a thought - IIRC you might well have (something like) this in there already!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-01-11 15:33:33
.... I agree that 10/0/10 is a bit of a steep change, but is a restricted_delta of +1 a bit conservative? Would +2 or +3 be acceptable? ...

Maybe this is the best way out. Within the intermediate block(s) the total change can still be done in 1-bit steps - the way I suggested it for -3. Thus only a few intermediate blocks, and still a smoothly changing resolution. Resolution can change by 1 bit every 128 samples, for instance, thus allowing a total resolution change of 4 bits from block to block.
We can even adapt the analysis to this 128-sample subblock scheme and let only those FFT results influence the bits-to-remove calculation which really relate to the actual 128-sample subblock. This makes the analysis more exact and has the potential to lower the average bitrate.
Title: lossyWAV Development
Post by: GeSomeone on 2008-01-11 15:36:58
Thinking about the problem, it seems that the drop from 10 to 0 and back to 10 at codec_block 72/73/74 is due to clipping prevention rather than low minimum signal.
But then again, wouldn't this be around a peak value, where masking (of the noise or a change thereof) would work optimally?

I agree that 10/0/10 is a bit of a steep change, but is a restricted_delta of +1 a bit conservative? Would +2 or +3 be acceptable?

Those are good questions. First it has to be determined whether switching the noise is the problem; secondly, if so, what to do to make it not a problem.
The whole method is based on modulating noise. Even with a restricted delta, the noise is still modulated, only in a different way, which might cause different side effects (maybe lower-frequency artifacts?).

Sorry, I can only think along with you a little about the theory, but can't really help with ABXing all these possibilities.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-12 13:08:41
lossyWAV beta v0.6.6 attached to first post of this thread.
Title: lossyWAV Development
Post by: Bourne on 2008-01-12 15:50:28
Can we expect full transparency when it reaches 1.0 final? This is pretty cool.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-12 17:29:26
Can we expect full transparency when it reaches 1.0 final? This is pretty cool.
The stated aim is full transparency for -3, with -2 and -1 being more conservative options for the user. At present -3 is pretty near transparent (to my ears it is, but my ears certainly aren't the best on the planet....), but we're trying to iron out the problem for v0.6.4 RC1 with Bruce Springsteen's Livin In The Future identified by Alex B. Beta v0.6.5 was a pretty good comeback, as Alex B's ABXing was inconclusive (I take it that means somewhere between being able and not being able to ABX the resulting WAV file).

With more finely tuned ears listening out for artefacts we'll get closer and closer to transparent (though to get there absolutely would probably take an infinite amount of time).

As I said, -3 is currently transparent for me, but transparency is in the ear of the beholder....
Title: lossyWAV Development
Post by: halb27 on 2008-01-12 21:46:06
...
Your samples 00000_00595ms, 09400_10400ms, 19800_21000ms, 21600_23100ms
...

Finally I found the time to ABX your samples (I had a lot of trouble trying to bring my system to an up-to-date state - now I'm back to my old configuration).

With your 00000_00595 samples I got to 6/7, which in the end was 7/10.
With 19800_21000 I also have the suspicion that something's wrong, but I could not ABX it.
With 21600_23100 I got to 6/8 and ended up 6/10.

Though these aren't good results I think it's enough for a confirmation.

I tried 0.6.6 on your samples. The results are better, but with 00000_00595 I got to 7/9 and ended up 7/10.
So the problem is still there.

I went back to 0.6.4RC1 and used a setting of -3 -nts 0.
Now I can't abx the problem any more.

So this is evidence that 2Bdecided is right and it's just a -nts problem.

As for this I suggest we default -3 to -nts 0, -2 to -nts 2 and -1 to -nts 4, and keep -spf the way it was done with 0.6.4RC1 (IMO the high frequency range is covered already well by the short FFT with its low spreading value).

I still feel uncomfortable with abrupt noise level changes, but maybe this is a wrong idea. At least it's not backed up by this sample.

Average bitrate will increase again - something which isn't liked, especially with -3. In the wiki there's already encouragement to use a higher -nts value than the default for people who prefer a smaller filesize and accept minor errors. Maybe we should find a formulation which reinforces this encouragement.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-13 01:48:10

...
Your samples 00000_00595ms, 09400_10400ms, 19800_21000ms, 21600_23100ms
...

Finally I found the time to ABX your samples (I had a lot of trouble trying to bring my system to an up-to-date state - now I'm back to my old configuration).

With your 00000_00595 samples I got to 6/7, which in the end was 7/10.
With 19800_21000 I also have the suspicion that something's wrong, but I could not ABX it.
With 21600_23100 I got to 6/8 and ended up 6/10.

Though these aren't good results I think it's enough for a confirmation.

I tried 0.6.6 on your samples. The results are better, but with 00000_00595 I got at 7/9 and ended up 7/10.
So the problem is still there.

I went back to 0.6.4RC1 and used a setting of -3 -nts 0.
Now I can't ABX the problem any more.

So this is evidence that 2Bdecided is right and it's just a -nts problem.

As for this I suggest we default -3 to -nts 0, -2 to -nts 2 and -1 to -nts 4, and keep -spf the way it was done with 0.6.4RC1 (IMO the high frequency range is already well covered by the short FFT with its low spreading value).

I still feel uncomfortable with abrupt noise level changes, but maybe this is a wrong idea. At least it's not backed up by this sample.

Average bitrate will increase again - something which isn't liked, especially with -3. In the wiki there's already encouragement to use a higher -nts value than default for people who prefer a smaller filesize and accept minor errors. Maybe we should find a formulation which reinforces this encouragement.
[Vino Rosso]Meh - oh well, just back from my company's Christmas party to a variation order for lossyWAV - no problem..... On the plus side, if v0.6.4 RC1 with -3 -nts 0 solves the problem then we will all benefit from the 50% speedup found when I started investigating Alex B's problem and potential solutions. Not the end of the world then - just a few kbps extra.....

On the face of it, maybe -nts 0 is the only acceptable starting point for the lowest quality option - so -nts -2 for -2 and -nts -4 for -1?

Ouch - 462kbps for my 53 sample set (40.98MB). But, we want transparency at all quality presets - so be it.[/Vino Rosso]
Title: lossyWAV Development
Post by: halb27 on 2008-01-13 11:25:19
I've tried 0.6.4.RC1 -3 -nts 0 on my small regular track sample set, which however has proven to be pretty representative of regular music. The average bitrate is 402 kbps.

I was a little fast last night with conclusions, probably because I was so happy having been able to abx the problem finally. What is missing at the moment IMO is AlexB's opinion towards -3 -nts 0.
AlexB, do you mind trying 0.6.4.RC1 -3 -nts 0?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-13 17:23:08
I've tried 0.6.4.RC1 -3 -nts 0 on my small regular track sample set, which however has proven to be pretty representative of regular music. The average bitrate is 402 kbps.

I was a little fast last night with conclusions, probably because I was so happy having been able to abx the problem finally. What is missing at the moment IMO is AlexB's opinion towards -3 -nts 0.
AlexB, do you mind trying 0.6.4.RC1 -3 -nts 0?
Spooky - my 10 album test set got 402kbps as well [edit] at -3 -nts 0; 450kbps at -2 -nts -2 and 494kbps at -1 -nts -4 [/edit] .....
Title: lossyWAV Development
Post by: halb27 on 2008-01-13 20:40:43
This is an adequate and pretty evenly spread increase in bitrate to me for -3, -2, -1.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-13 21:18:36
This is an adequate and pretty evenly spread increase in bitrate to me for -3, -2, -1.
Ok, I'll post v0.6.7 RC2 in the thread. You should notice a fairly impressive improvement in processing throughput.
Title: lossyWAV Development
Post by: halb27 on 2008-01-13 22:02:53
Thank you, Nick.
Speed is very good.
Guess -nts defaults are -nts 0 for -3, -nts -2 for -2, and -nts -4 for -1. Right?
But what else is different compared to 0.6.4RC1? Average bitrate for my regular sample set is now 403 kbps.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-13 22:10:30
Thank you, Nick.
Speed is very good.
Guess -nts defaults are -nts 0 for -3, -nts -2 for -2, and -nts -4 for -1. Right?
But what else is different compared to 0.6.4RC1? Average bitrate for my regular sample set is now 403 kbps.
If you were one of the first two to download v0.6.7 RC2 then you downloaded a version which still had the "maximum additional bits_to_remove increase per codec_block" mechanism active, with a delta of +2 bits. Sorry, I tried to remove it as quickly as I could - try re-downloading....
Title: lossyWAV Development
Post by: halb27 on 2008-01-13 22:49:32
It looks fine now (I tried with AlexB's sample).

I'll change the wiki where I described the -nts defaults.
Title: lossyWAV Development
Post by: lexor on 2008-01-14 00:29:22
Sup all, I've got a couple of questions about lossyWAV:

1) the wiki is angling at standard lossless decoders (like flac, etc) decoding lossy.flac/etc., but will a standard WAV decoder decode lossyWAV correctly?

2) if all level settings are aiming at transparency... why have level settings?
Title: lossyWAV Development
Post by: TBeck on 2008-01-14 01:02:01
2) if all level settings are aiming at transparency... why have level settings?

I second this.

If -3 is being tuned to be transparent under any known condition, it would make sense for me to have one safer setting which handles possibly unknown problem files better. Being the more paranoid one, I probably would choose this (-2). But I would never like to go even higher (-1). For me there is also some kind of a psychological barrier: for my taste, lossy (wave) files should not have more than half the size of lossless files (on average)...

But this is just my taste...
Title: lossyWAV Development
Post by: carpman on 2008-01-14 02:12:03
I second this.


I don't.

I hope you'll keep the 3 levels.

So far I've been using lossyWAV -2 then encoding to FLAC (testing with vinyl restoration projects, and the results are very good).

For me it's like this:

-1 when it HAS to be transparent (eg. if I'd spent many many hours working on a piece in whatever capacity)
-2 when I really want it to be transparent (and figure that only in extreme cases it won't be, -2 is the perfect setting between MP3 320 and Lossless, and for me preferable to WavPack Hybrid).
-3 when I'd like it to be transparent, but I'm not too fussed if it isn't (I've got plenty of music which springs to mind).

So please keep the 3 levels -- and thanks for all your hard work.

C.

By the way -- has anyone done listening tests to MP3s transcoded from lossy.wav versus .wav?

In theory should there be any perceptual difference?

C.
Title: lossyWAV Development
Post by: buktore on 2008-01-14 02:48:43
Since it is still "lossy", I think having options to choose from is still the better way to go. I mean, lossless codecs do have options even though they would work just fine without them, or if the developers decided not to include them - and still we've got a lot of options anyway (which is good, BTW).

Oops, nearly forgot what I'm here for. I dropped by to show my gratitude & encouragement to everyone involved in this (2Bdecided, Nick.C, halb27 and anyone else I haven't mentioned). Thanks for your time and effort.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-14 07:43:59
1) the wiki is angling at standard lossless decoders (like flac, etc) decoding lossy.flac/etc., but will a standard WAV decoder decode lossyWAV correctly?
The WAV file is still a WAV file - there is no decoding to do, as all that is different between the original lossless WAV file and the lossyWAV file is that some LSBs are zero.
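To illustrate (a minimal, hypothetical Python sketch - not lossyWAV's actual code, and the function name is mine): zeroing LSBs amounts to rounding each sample to a multiple of a power of two, which a lossless codec such as FLAC can then exploit as "wasted bits". Clipping protection, discussed later in this thread, is omitted for brevity.

```python
def zero_lsbs(sample, bits_to_remove):
    """Round a signed integer sample to the nearest multiple of
    2**bits_to_remove, so that many least-significant bits end up zero."""
    step = 1 << bits_to_remove
    return ((sample + step // 2) // step) * step  # round to nearest

# e.g. zero_lsbs(12345, 4) -> 12352, whose 4 LSBs are zero
```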
2) if all level settings are aiming at transparency... why have level settings?
Every lossy codec I've come across has quality settings - all presets aim at transparency; some fail on some tracks, with decreasing likelihood as output bitrate increases.
So please keep the 3 levels -- and thanks for all your hard work.

By the way -- has anyone done listening tests to MP3s transcoded from lossy.wav versus .wav?

In theory should there be any perceptual difference?
I found a post on anythingbutipod.com (http://www.anythingbutipod.com/forum/showthread.php?t=22464) which tends to suggest that an OGG file transcoded from lossyWAV was bigger than lossless > OGG. As to perceptual differences, I think that's a question for David....
I second this.

If -3 is being tuned to be transparent under any known condition, it would make sense for me to have one safer setting which handles possibly unknown problem files better. Being the more paranoid one, I probably would choose this (-2). But I would never like to go even higher (-1). For me there is also some kind of a psychological barrier: for my taste, lossy (wave) files should not have more than half the size of lossless files (on average)...

But this is just my taste...
It is as you say, but -3 at v0.6.4 RC1 has proven *not* to be transparent within a couple of days of release. I can't say I was very happy, but I was delighted that Alex B's ears are so good that he was able to identify a problem with the track in question. So, -1 for the paranoid, -2 for most people and -3 for DAP users (my preference being -3).
Since it is still "lossy", I think having options to choose from is still the better way to go. I mean, lossless codecs do have options even though they would work just fine without them, or if the developers decided not to include them - and still we've got a lot of options anyway (which is good, BTW).

Oops, nearly forgot what I'm here for. I dropped by to show my gratitude & encouragement to everyone involved in this (2Bdecided, Nick.C, halb27 and anyone else I haven't mentioned). Thanks for your time and effort.
Thanks for the appreciation - we've all had fun with this project!
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-14 11:03:14
I found a post on anythingbutipod.com (http://www.anythingbutipod.com/forum/showthread.php?t=22464) which tends to suggest that an OGG file transcoded from lossyWAV was bigger than lossless > OGG. As to perceptual differences, I think that's a question for David....
I saw that too. It matches my early tests with mp3. It's not a big deal.

What is interesting is taking mp3 problem samples, and trying to ABX WAV>mp3 vs lossy.WAV>mp3. It would be nice if -1 (at least) could make that difference unABXable - but this might be unrealistic. I should get back to playing around with trumpet.wav or whatever it was called.

Cheers,
David.
Title: lossyWAV Development
Post by: GeSomeone on 2008-01-14 21:08:32
to my taste lossy (wave) files should not have more than half the size of lossless files (on average)..


[pedantic]I think you mean lossyFlac (or lossyTAK  ), as lossyWav files are the same size as the source wavs[/pedantic]

Yes, I would wish that too, but I found out that the nature of the source file makes a big difference.
just some examples:
a reasonably quiet track (a singer and a guitar) that rates 553 with FLAC -8 and 429 with lossyFlac -3 -nts 0
a lot louder track (another singer with just a guitar) rates 857 in FLAC -8 but 347 with lossyFlac -3 -nts 0

go figure 
Title: lossyWAV Development
Post by: Nick.C on 2008-01-14 22:19:52
Yes, I would wish that too, but I found out that the nature of the source file makes a big difference.
just some examples:
a reasonably quiet track (a singer and a guitar) that rates 553 with FLAC -8 and 429 with lossyFlac -3 -nts 0
a lot louder track (another singer with just a guitar) rates 857 in FLAC -8 but 347 with lossyFlac -3 -nts 0
It seems counter-intuitive, but looking at the nearly 3700 tracks that I've processed, the higher the initial bitrate, the lower the processed bitrate and vice-versa (subject to the usual caveats about tracks which do not follow the general trend) [both processed bitrates less than the lossless bitrate].
Title: lossyWAV Development
Post by: halb27 on 2008-01-15 07:24:01
... a reasonably quiet track (a singer and a guitar) that rates 553 with FLAC -8 and 429 with lossyFlac -3 -nts 0
a lot louder track (another singer with just a guitar) rates 857 in FLAC -8 but 347 with lossyFlac -3 -nts 0 ...

When there are only very few instruments, the probability is high that parts of the spectrum have low energy. The lossyWAV principle is based on preserving the low-energy parts with reasonable accuracy, so 'simple' music needs more bits as a rule.
The more instruments there are, the more noise-like the music becomes - technically speaking - and the harder it gets for a lossless codec.

lossyWAV looks worst compared to pure lossless with quiet 'simple' music. lossyWAV has no chance to save a significant amount of bits in this case.

I see it in a positive way: in many cases lossyWAV saves a lot of bits compared to lossless. In those cases where the relation isn't so good it's for the most part because lossless is already very efficient.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-15 12:27:06
Question: David mentioned in another thread about the number of actual bits remaining after rounding.

Is there any perceived benefit to be gained by implementing a(nother) safety net as follows:

When filling FFT array, OR a mask variable with the absolute value of each sample. This will allow the determination of the maximum set bit in the codec_block for that channel (max_bit).

Limit the bits_to_remove to the lower of the calculated value and Max(0,(max_bit-minimum_bits_to_keep)), thereby retaining at least minimum_bits_to_keep bits of actual resolution in that codec_block.
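If I follow the proposal, it might be sketched like this (hypothetical Python with illustrative names, not lossyWAV's actual code; 5 is the minimum_bits_to_keep value mentioned later in the thread):

```python
def limit_bits_to_remove(samples, calculated_bits_to_remove,
                         minimum_bits_to_keep=5):
    """Cap bits_to_remove so that at least minimum_bits_to_keep bits of
    actual resolution remain in this codec_block (one channel)."""
    mask = 0
    for s in samples:
        mask |= abs(s)              # OR of absolute sample values
    max_bit = mask.bit_length()     # position of the highest set bit
    cap = max(0, max_bit - minimum_bits_to_keep)
    return min(calculated_bits_to_remove, cap)

# e.g. if the highest filled bit is the 8th, at most 3 bits are removed,
# regardless of what the FFT analysis produced
```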

[edit] Also, if the number of clipped samples were restricted to, say, 5 per channel per codec_block (i.e. max of 10 for stereo, 0.977% of samples in the codec_block), would that seem reasonable? Even if they were all in series that would only be 0.1134 milliseconds. The reason I ask is that when I apply this to the livin_in_the_future problem track, although it clips, the bits_to_remove lost due to clipping is zero with only 196 clipping samples in the whole file (1323000 samples x 2 channels). [/edit]

[edit2] Say, -1 = 0 clips; -2 = 1 clip; -3 = 5 clips? [/edit2]
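For what it's worth, the 0.977% and 0.1134 ms figures in the edit above follow from a 512-sample codec_block at the CD sample rate (both values inferred from context - 5/512 gives the quoted percentage, and 1323000 samples equals 30 s at 44.1 kHz):

```python
BLOCK_SAMPLES = 512      # codec_block length implied by 5/512 ~ 0.977 %
SAMPLE_RATE = 44100      # CD sample rate in Hz

def clip_stats(clips_per_channel, channels=2):
    """Maximum clipped samples per block, their share of the block (per
    channel), and the worst-case duration if they all occur in series."""
    total = clips_per_channel * channels
    fraction = clips_per_channel / BLOCK_SAMPLES * 100       # percent
    duration_ms = clips_per_channel / SAMPLE_RATE * 1000     # milliseconds
    return total, fraction, duration_ms

# clip_stats(5) -> (10, ~0.977, ~0.1134), matching the figures quoted
```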
Title: lossyWAV Development
Post by: halb27 on 2008-01-15 15:14:01
I guess it doesn't hurt, but I also think it won't reduce bitrate in a significant way.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-15 15:26:09
I guess it doesn't hurt, but I also think it won't reduce bitrate in a significant way.
The first will increase the bitrate, the second certainly reduces it. I will post beta v0.6.8 in the first post of this thread, using minimum_bits_to_keep=5 and maximum_clips = (0,1,5).
Title: lossyWAV Development
Post by: halb27 on 2008-01-15 15:56:41
Sorry for not being clear - I only addressed your second suggestion.
As for your first: certainly it's another defensive action, but it looks a bit like not having confidence in the lossyWAV principle.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-15 16:59:25
Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.

I'm not sure what you're getting at with the clipping. If you let one sample clip in a block, then there are no wasted bits in that block, surely? The sample is 1111111111111111, so no zeros, so wasted_bits=0. Not sure how other codecs handle it - I remember Bryant saying WavPack was different.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-15 17:58:03
Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.

I'm not sure what you're getting at with the clipping. If you let one sample clip in a block, then there are no wasted bits in that block, surely? The sample is 1111111111111111, so no zeros, so wasted_bits=0. Not sure how other codecs handle it - I remember Bryant saying WavPack was different.

Cheers,
David.
On the clipping front, if bits_to_remove=6 then what would have been 10000000 00000000 (assuming there was no sign bit - that bit is done with floats) would be clipped to 01111111 11000000, i.e. as if it had been rounded down not up.
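As I understand the behaviour described above, a hypothetical Python sketch (illustration only, 16-bit samples assumed - not lossyWAV's actual code) would be:

```python
def remove_bits(sample, bits_to_remove, bits_per_sample=16):
    """Quantise a signed sample; if rounding to nearest would clip,
    round the other way instead."""
    step = 1 << bits_to_remove
    max_val = (1 << (bits_per_sample - 1)) - 1    # 32767 for 16-bit
    min_val = -(1 << (bits_per_sample - 1))       # -32768
    q = ((sample + step // 2) // step) * step     # round to nearest
    if q > max_val:
        q -= step   # rounded down not up: 0x8000 becomes 0x7FC0 for 6 bits
    elif q < min_val:
        q += step
    return q

# remove_bits(32767, 6) -> 32704, i.e. 01111111 11000000 as described above
```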

On the minimum_bits_to_keep front, at present maximum_bits_to_remove=bits_per_sample-minimum_bits_to_keep = 16-5 = 11 for *all* codec_blocks. With the new proposal, if the highest filled bit (taking the ABS of -ve numbers first) is the 8th then at most 3 bits would be removed, regardless of what the algorithm produced.

I do have faith in the method, I just like belt, braces and hands in pockets keeping trousers up......
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 08:01:56
I wasn't aware that there is already a minimum_bits_to_keep of 5 out of 16.
Based on this I second your suggestion and prefer a minimum_bits_to_keep of 5 out of the number of bits used in the block.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 08:25:22
I wasn't aware that there is already a minimum_bits_to_keep of 5 out of 16.
Based on this I second your suggestion and prefer a minimum_bits_to_keep of 5 out of the number of bits used in the block.
Thanks, I will keep this then - however, it may be that we would want to "tweak" the value for different quality presets, in the same way as has been done for allowable clips.

On the allowable clips front, if you were to process livin_in_the_future at -3 and ABX it, looking for clipping artefacts, I would be very grateful, as it did suffer a bit from lost bits due to clipping reduction with v0.6.7 RC2 and doesn't with beta v0.6.8.
Title: lossyWAV Development
Post by: TBeck on 2008-01-16 08:58:10
Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.

I agree.

On the minimum_bits_to_keep front, at present maximum_bits_to_remove=bits_per_sample-minimum_bits_to_keep = 16-5 = 11 for *all* codec_blocks. With the new proposal, if the highest filled bit (taking the ABS of -ve numbers first) is the 8th then at most 3 bits would be removed, regardless of what the algorithm produced.

I do have faith in the method, I just like belt, braces and hands in pockets keeping trousers up......

I don't...

If 2Bdecided's approach is working right, you will gain nothing except probably worse compression.

A long time ago I was using old-fashioned logarithmic quantization to compress audio: high resolution for low levels, little resolution for high levels. That's very similar to your proposal.

Unfortunately this often doesn't work well. Think about a combination of a low frequency signal with high amplitude (bass guitar) and a higher frequency signal with low volume. Your approach will calculate the bits_per_sample from the low frequency signal and introduce distortion for the high frequency signal.

It's also likely to fail with a pure low frequency signal of high amplitude. Here I always got annoying distortions with the logarithmic approach.

I don't think it would be a good idea to sacrifice compression for a very questionable improvement of the sound quality.

Sorry: bad explanation because of my very limited English...
Title: lossyWAV Development
Post by: jido on 2008-01-16 09:45:40
Are you using minimum_bits_to_keep in a defensive way already? Sorry, I'm not keeping up. If it's key to maintaining quality as it is, then maybe you should add what you propose. If it's not, then extending downwards to help quieter blocks doesn't seem necessary. If it is necessary, it would be better to keep the noise floor at least x dB below the peaks in the spectral domain, rather than in the time domain - which is what I was trying to get at in a post on the last page.

I'm not sure what you're getting at with the clipping. If you let one sample clip in a block, then there are no wasted bits in that block, surely? The sample is 1111111111111111, so no zeros, so wasted_bits=0. Not sure how other codecs handle it - I remember Bryant saying WavPack was different.

Cheers,
David.
On the clipping front, if bits_to_remove=6 then what would have been 10000000 00000000 (assuming there was no sign bit - that bit is done with floats) would be clipped to 01111111 11000000, i.e. as if it had been rounded down not up.

On the minimum_bits_to_keep front, at present maximum_bits_to_remove=bits_per_sample-minimum_bits_to_keep = 16-5 = 11 for *all* codec_blocks. With the new proposal, if the highest filled bit (taking the ABS of -ve numbers first) is the 8th then at most 3 bits would be removed, regardless of what the algorithm produced.

I do have faith in the method, I just like belt, braces and hands in pockets keeping trousers up......


Would that introduce a varying noise floor for same volume samples? If so, it may be detrimental to perceived audio quality.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-16 09:55:34
TBeck,

What Nick is proposing won't make anything sound worse, because it's just extending a safety net to lower amplitudes. It's never used to throw away more bits than the "find the noise floor and quantise below it" approach - only to keep at least 5 bits when fewer than 5 bits were going to be kept.

Nick: does this kick in very often?


Where I do agree with you TBeck is that it's not a great safety catch, for exactly the reasons you've explained. It needs to be done in a spectral domain, not the time domain.

Whether it's worth doing in either domain is open to question. It might be, but it's extra complexity. It's heading even further down the route of having a "psychoacoustic" model. There will come a point when it's better to "borrow" someone else's.

For now, I'm more inclined to be happy with what we have.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 09:59:00
.. Unfortunately this often doesn't work well. Think about a combination of a low frequency signal with high amplitude (bass guitar) and a higher frequency signal with low volume. ...

Yes, but it's just an additional safety measure - not really necessary IMO, but as it's done already, Nick.C's new approach is just more consistent than the old one.
It's low-level signals that benefit from the new approach with respect to this safety measure.

EDIT: 2Bdecided was faster.
Title: lossyWAV Development
Post by: TBeck on 2008-01-16 10:08:52
What Nick is proposing won't make anything sound worse, because it's just extending a safety net to lower amplitudes. It's never used to throw away more bits than the "find the noise floor and quantise below it" approach - only to keep at least 5 bits when fewer than 5 bits were going to be kept.


Yes, but it's just an additional safety measure - not really necessary IMO, but as it's done already, Nick.C's new approach is just more consistent than the old one.
It's low-level signals that benefit from the new approach with respect to this safety measure.

I was aware of this but failed to express it right. 
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 11:18:33
So, reading the last few posts:

I will remove the revised minimum_bits_to_keep method (no, it doesn't kick in very often at all, and it slows down the processing slightly).

Seeking consensus:

Should we retain the recent implementation of allowing a few clipped samples to be "rounded the other way"?
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 11:33:56
I don't care much about it, with a slight negative feeling towards letting 5 samples clip, though I don't think that would be audible.
I feel positive about letting isolated samples clip, but that's only because of AlexB's provided track where it happens that bits removed changed abruptly due to only 1 clipped sample.

For differentiating -3 from -2 my favorite is: let 1 sample per block clip for -2, let 2 samples clip for -3.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 11:35:48
I don't care much about it, with a slight negative feeling towards letting 5 samples clip, though I don't think that would be audible.
I feel positive about letting isolated samples clip, but that's only because of AlexB's provided track where it happens that bits removed changed abruptly due to only 1 clipped sample.

For differentiating -3 from -2 my favorite is: let 1 sample per block clip for -2, let 2 samples clip for -3.
I was thinking more: -1 = 0; -2 = 1; -3 = 5;

Code has speeded up yet again - now approaching 50% faster than v0.6.4 RC1....
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-16 13:10:20
On the clipping front, if bits_to_remove=6 then what would have been 10000000 00000000 (assuming there was no sign bit - that bit is done with floats) would be clipped to 01111111 11000000, i.e. as if it had been rounded down not up.
If you're normally rounding to the nearest number, then rounding down when you should be rounding up means you're jumping further away from the wanted value on this sample than on any other, doesn't it? I.e. you're adding more noise - potentially 50% more.

As it's level dependent, it's not strictly noise - it's distortion.

With my apologies in advance if I've misunderstood!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 13:18:54
If you're normally rounding to the nearest number, then rounding down when you should be rounding up means you're jumping further away from the wanted value on this sample than on any other, doesn't it? I.e. you're adding more noise - potentially 50% more.

As it's level dependent, it's not strictly noise - it's distortion.

With my apologies in advance if I've misunderstood!

Cheers,
David.
No, exactly right - but given the duration, does the benefit not outweigh the potential cost?
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 14:00:07
It would be good to see the resulting difference in bitrate due to allowing this kind of restricted clipping for entire tracks. I think the difference is small. I will try with the only seriously clipping album in my collection.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 14:30:22
It would be good to see the resulting difference in bitrate due to allowing this kind of restricted clipping for entire tracks. I think the difference is small. I will try with the only seriously clipping album in my collection.
Using livin_in_the_future, there is a 3.5% reduction at -2 (1 clip per channel per codec_block allowed) and 5% reduction at -3 (5 clips...)

lossyWAV beta v0.6.9 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 16:43:39
I guess that's the result for the 30 sec part - but maybe it's representative of the entire track.
3.5% resp. 5% isn't bad, but it also shows for this case that we get most of the effect with fewer than 5 samples allowed to clip, which is a more cautious approach. Do you mind trying 2 allowed clipped samples per block?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 16:46:27
I guess that's the result for the 30 sec part - but maybe it's representative of the entire track.
3.5% resp. 5% isn't bad, but it also shows for this case that we get most of the effect with fewer than 5 samples allowed to clip, which is a more cautious approach. Do you mind trying 2 allowed clipped samples per block?
Tried -2 and -3 at 2 allowable clips.
-2 : the FLAC file decreases from 1715683 to 1712152 bytes (no clips 1777874) : -3.50% to -3.70%;
-3 : the FLAC file increases from 1523876 to 1524359 bytes (no clips 1603204) : -4.95% to -4.92%

I think that -1 = 0; -2 = 1; -3 = 2 may be optimal.
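For reference, the quoted percentages follow directly from the FLAC file sizes (a trivial check in Python, illustration only):

```python
def reduction(clipped_size, no_clips_size):
    """Percentage change in file size relative to the no-clips case."""
    return (clipped_size - no_clips_size) / no_clips_size * 100

# -2, 2 clips: reduction(1712152, 1777874) -> about -3.70 %
# -3, 2 clips: reduction(1524359, 1603204) -> about -4.92 %
```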
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 18:30:17
Looks good. Looking at the -2 result, there's only a negligible difference between allowing 1 or 2 samples to clip. So allowing just 1 sample to clip may be preferable also with -3.

But that's only for AlexB's sample.
I just encoded 7 full-length tracks (those in my selective collection) from Francoise Hardy's album 'Le temps des souvenirs', which I know has a lot of clipping. 0.6.9 (5 allowed clipping samples) provides a decrease in total filesize of 19.4% (against using 0.6.7RC2).
So for clipped tracks your suggestion yields a significant improvement.
I'd like to try these tracks with less samples allowed clipping. Can you provide such an experimental version, please?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 18:52:16
Will do - should be up in an hour or so......
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 19:05:59
Meanwhile I can report on my listening test with 0.6.9 on AlexB's sample and 2 of my 7 tracks. It's alright to me - I could not ABX any of the spots where I suspected a slightly audible issue. What I considered 'wrong' in the encoding was also 'wrong' in the original - Francoise Hardy's album has pretty bad quality.

So from this, even 5 allowed clipped samples per block are OK to use. Anyway, allowing only a smaller number of clipped samples provides a higher degree of safety and might be the better choice in case file size remains similar.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 19:53:39
lossyWAV beta v0.7.0 attached to post #1 in this thread.

Very pleased with the speed now - beta v0.7.0 processes 125MB of WAV in 14 seconds (average 53.1x) on an Intel C2D E6600 @ 3.0GHz, 2 x 80GB HDD in RAID0, Windows XP SP2
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 20:55:38
My 0.7.0 result for AlexB's track:

-clips 3: -5.0%
-clips 2: -5.0% (file size with my FLAC setting: 1523674)
-clips 1: -4.7%

My 0.7.0 result for my 7 Francoise Hardy tracks:

-clips 3: -19.2%
-clips 2: -18.4%
-clips 1: -14.2%


So from this, the essential reduction in filesize is already achieved with just 1 allowed clipped sample per block.
2 allowed clipped samples per block is attractive to some extent, but to a minor degree.
3 allowed clipped samples brings only an insignificant advantage and is not attractive.
More than 3 allowed clipped samples is useless in a practical sense.

So I think we have 4 useful choices:

a) 1 allowed clipped sample per block for -2, 2 allowed clipped samples per block for -3.
b) 1 allowed clipped sample per block for -2 and -3.
c) full clipping prevention with -2, 1 allowed clipped sample per block for -3.
d) full clipping prevention with -2, 2 allowed clipped samples per block for -3.

I personally don't care much about whether we should allow for 1 or 2 clipped samples with -3. I think both choices are fully in congruence with what we want to achieve with -3.
I'm more worried about whether or not we should allow for clipped samples with -2. I feel a bit uncomfortable due to the nature of -2 and the distortion 2Bdecided mentioned when allowing clipping to occur though I don't think that this can be audible.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 21:03:25
So I think we have 4 useful choices:

a) 1 allowed clipped sample per block for -2, 2 allowed clipped samples per block for -3.
b) 1 allowed clipped sample per block for -2 and -3.
c) full clipping prevention with -2, 1 allowed clipped sample per block for -3.
d) full clipping prevention with -2, 2 allowed clipped samples per block for -3.

I personally don't care much about whether we should allow for 1 or 2 clipped samples with -3. I think both choices are fully in congruence with what we want to achieve with -3.
I'm more worried about whether or not we should allow for clipped samples with -2. I feel a bit uncomfortable due to the nature of -2 and the distortion 2Bdecided mentioned when allowing clipping to occur though I don't think that this can be audible.
Is *one* "distorted" sample (22.68 microseconds) going to be of any real significance? Personally, unless over-ruled by someone with more expert knowledge, I'd suggest -1=0; -2=1; -3=2, i.e. as per beta v0.7.0.

I'm really pleased about the reduction on your badly clipping tracks : -18.4% is excellent!

This modification may bring down the average bitrate for -3 a bit, to bring it a bit closer to that for v0.6.4 RC1.
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 21:29:23
Is *one* "distorted" sample (22.68 microseconds) going to be of any real significance? Personally, unless over-ruled by someone with more expert knowledge, I'd suggest -1=0; -2=1; -3=2, i.e. as per beta v0.7.0.

So OK then.
This modification may bring down the average bitrate for -3 a bit, to bring it a bit closer to that for v0.6.4 RC1.

This is another story and depends heavily on the degree of clipped tracks in the user's collection.

For a short impression I encoded the small regular track set I used so often to find out about the average bitrate for a lossyWAV version:

0.6.7RC2 -3: Total filesize: 141231970
0.7.0 -3 -clips 2: Total filesize: 141227370

Difference: -0.003%
Title: lossyWAV Development
Post by: Nick.C on 2008-01-16 21:38:09
For a short impression I encoded the small regular track set I used so often to find out about the average bitrate for a lossyWAV version:

0.6.7RC2 -3: Total filesize: 141231970
0.7.0 -3 -clips 2: Total filesize: 141227370

Difference: -0.003%
Hehehe..... there is a small penalty for including the -clips parameter : " -clips n" is added to the parameter string in the "fact" chunk in the wav file.....

I just transcoded my Mike Oldfield collection (261 tracks, 24h30m12s) in 40m48s : an average throughput (FLAC [from NAS] > WAV [local]> lossyWAV > FLAC) of 36x - and lo and behold, there was *no* difference at all in the total filesize (from v0.6.7 RC2 and beta v0.7.0) that couldn't be explained by the extra 9 bytes per file!
Title: lossyWAV Development
Post by: halb27 on 2008-01-16 22:36:42
When we were introducing our current clipping prevention scheme, I searched hard for clipped tracks, with the result that there is next to no clipping in my collection.

IMO we should stick to your suggestion for -3. After all, clipping exists. But I think my clipping album is an extreme case of clipping, and AlexB's sample is more representative of clipped tracks.
Because of this, and the fact that clipping is very rare, I suggest allowing only 1 clipped sample in a block for -3, and keeping the clipping prevention scheme in full action with -2.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-19 15:20:38
When we were introducing our current clipping prevention scheme, I searched hard for clipped tracks, with the result that there is next to no clipping in my collection.

IMO we should stick to your suggestion for -3. After all, clipping exists. But I think my clipping album is an extreme case of clipping, and AlexB's sample is more representative of clipped tracks.
Because of this, and the fact that clipping is very rare, I suggest allowing only 1 clipped sample in a block for -3, and keeping the clipping prevention scheme in full action with -2.
I would certainly agree that -1 should have full clipping prevention. For -2, maybe 1 clip per channel per codec_block would be acceptable. For -3, 2 clips per channel per codec_block seems to work well. I will strip out the -clips parameter and post v0.7.1 RC3 sometime tomorrow.

I've been optimising in IA-32 / x87 again and the speed is getting marginally better.

Given that v0.6.7 RC2 has 95 downloads at the moment with no negative comments, I feel that we're *really* close to v1.0.0 final - we just have to agree amongst ourselves as to the exact number of (rounded down) clips acceptable for each quality preset.
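The scheme being converged on here, full clipping prevention for -1 and a small per-block allowance for higher presets, could be sketched roughly as follows. This is a hypothetical Python reconstruction, not lossyWAV's actual (Delphi) code: it counts, per channel, how many 16-bit samples in a codec_block would exceed full scale after the low bits are rounded away, and lowers bits_to_remove until the clip count is within the allowance.

```python
def clipped_count(samples, bits_to_remove, full_scale=32767, min_val=-32768):
    """Count samples that would clip after rounding away the low bits."""
    step = 1 << bits_to_remove
    count = 0
    for s in samples:
        q = ((s + step // 2) // step) * step   # round to nearest multiple of step
        if q > full_scale or q < min_val:
            count += 1
    return count

def limit_bits_for_clipping(channel_blocks, bits_to_remove, allowed_clips):
    """Reduce bits_to_remove until no channel exceeds the allowed clip count."""
    b = bits_to_remove
    while b > 0 and any(clipped_count(ch, b) > allowed_clips for ch in channel_blocks):
        b -= 1
    return b
```

With `allowed_clips=0` this behaves like full clipping prevention; with a small allowance, a handful of near-full-scale samples no longer forces the whole block to keep extra bits.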
Title: lossyWAV Development
Post by: halb27 on 2008-01-19 21:58:29
I would certainly agree that -1 should have full clipping prevention. For -2, maybe 1 clip per channel per codec_block would be acceptable. For -3, 2 clips per channel per codec_block seems to work well ... we just have to agree amongst ourselves as to the exact number of (rounded down) clips acceptable for each quality preset.

Well, to me the pretty rare event of clipping is an argument not to circumvent the clipping protection scheme with -2, keeping -2 in a 'pure' form. But we shouldn't continue this forever, so do go ahead with your favorite choice. I also don't think giving away clipping protection for 1 sample per channel and block will be audible.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-22 21:13:43
Well, to me the pretty rare event of clipping is an argument not to circumvent the clipping protection scheme with -2, keeping -2 in a 'pure' form. But we shouldn't continue this forever, so do go ahead with your favorite choice. I also don't think giving away clipping protection for 1 sample per channel and block will be audible.
lossyWAV beta v0.7.1 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-01-22 22:16:20
Thank you Nick, especially for -noclips.
Can you tell a bit about the new window function and the noise constants? Are these changes conservative, or is it necessary to do some testing?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-22 22:24:12
Thank you Nick, especially for -noclips.
Can you tell a bit about the new window function and the noise constants? Are these changes conservative, or is it necessary to do some testing?
I had a brief PM discussion with David and I realised that I was using a zero-ended window function - values 0.5 fft_length apart did not sum to 1. I modified this slightly and the values 0.5 fft_length apart now sum to 1. The noise constants were re-calculated and incorporated into the code.
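For illustration, the property being described can be shown with a small sketch (hypothetical code, not lossyWAV's implementation): a "periodic" Hann window, computed with denominator N rather than N-1, has values 0.5*fft_length apart summing to exactly 1, while the zero-ended (symmetric) variant does not.

```python
import math

def hann(n, periodic=True):
    # Periodic Hann divides by n; the zero-ended (symmetric) form divides by n - 1.
    d = n if periodic else n - 1
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * i / d)) for i in range(n)]

N = 1024
w_per = hann(N, periodic=True)    # values N/2 apart sum to exactly 1
w_sym = hann(N, periodic=False)   # values N/2 apart do NOT sum to exactly 1

per_ok = all(abs(w_per[i] + w_per[i + N // 2] - 1.0) < 1e-9 for i in range(N // 2))
sym_ok = all(abs(w_sym[i] + w_sym[i + N // 2] - 1.0) < 1e-9 for i in range(N // 2))
```

The sum-to-1 property matters here because overlapped windows half an FFT length apart then weight every sample equally.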

The bitrate has come down slightly (462.22kbps [v0.6.7 RC2] to 461.54kbps [beta v0.7.1]) at -3 for my 53 sample set.

If you have the time, I would welcome validation of the new window function. However, I feel that all it does is to use more samples per codec_block (64 not 62, etc.) so it should not sacrifice quality.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-23 21:43:08
I had a thought - at present -1 uses 4 FFT's (64, 128, 512 & 1024 samples); -2 uses 3 FFT's (64, 256 & 1024 samples) and -3 uses 2 FFT's (64 & 1024 samples).

I am thinking about implementing a "-extrafft" parameter to add an extra FFT analysis to the existing quality preset at the user's discretion, which will basically increase the processing time, but also increase the scope of the analysis.

In this way, -1 would use 5 FFT's (64, 128, 256, 512 & 1024 samples); -2 would use 4 FFT's (64, 128, 512 & 1024 samples) and -3 would use 3 FFT's (64, 256 & 1024 samples).

Thoughts, anyone?
Title: lossyWAV Development
Post by: halb27 on 2008-01-24 15:06:38
From my understanding we need more than 1 FFT because for getting a good temporal resolution (catching transients) we need a rather short FFT, and for a good frequency resolution in the low to medium frequency range we need a long FFT (for the very high frequency range the short FFT should be sufficient).

From this and from practical results I don't see why we should have more than 2 FFTs. It's okay to have an additional FFT for safety reasons when going from -3 to -2, and from -2 to -1, but I don't see why we should go beyond that.

Sorry for being so negative towards your recent suggestions. I see you're eager to get further improvements one way or another.
To me personally I think things are very good and don't need refinement as long as no issues come up in practice.

The only thing I personally would like to have reconsidered one last time is the coverage of the block by the 1024 sample FFT windows.
We have a different view on this, as you feel the need for a 50% overlap of FFT analyses between adjacent blocks. I don't see any overlapping between blocks as necessary, and I think your view is based on one of 2Bdecided's remarks, but I believe this is a misunderstanding. The bits-to-remove decision, in my understanding, is not a global decision, not a block-overlapping decision, IMO not even a block-orientated decision (but I think the latter has no practical impact). IMO we can assign to each individual sample a number of bits to remove based on the analysis of those FFT windows to which the specific sample contributes.
Block consideration comes in because the lowest number of bits to remove (per sample) must be chosen in order to assign a bits-to-remove number to the block. Moreover, it's useful to base the FFT window partitioning on the block. What's necessary is a good overlapping of the FFTs within the block under consideration. According to 2Bdecided, the overlapping of the FFT windows (within the block!) should be 50% or more. We had a discussion before from which I thought we had an overlap of 5/8, but IIRC this is not current practice.
My suggestion was, in order that the overlap not reach far into neighboring blocks (as their samples have nothing to do with the block under consideration), to place the centre points of the outermost FFT windows a little bit inside the block. 2Bdecided preferred the edge position because of good temporal resolution at the very beginning and end of the block, but this cannot be an issue with the 1024 sample FFT, simply because that job falls to the short FFT.

So can you please reconsider using the following 1024 sample FFT windows: -448:575, -64:959. With these, the centre of each FFT window is just 64 samples (1/16 of the window length) away from the edges, and I think this isn't a problem for catching problems at the edges (temporal resolution issues are caught by the short FFT, not the 1024 sample FFT, and the 64 sample FFTs can be centred at the edges). The advantage is in the middle area of the block, as this area is now covered better by the 2 FFT windows. With the centre points at the very edges, we are already 50% of the window length away from the FFT centres at the middle of the block, and the samples there participate only partially in the FFT analyses. If you do a third long FFT centred at the block centre the way you wrote about (but I'm not sure whether this is in action right now), things are alright of course, but at the cost of an additional FFT window.
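halb27's per-sample view could be sketched like this (an illustrative, hypothetical toy, not lossyWAV code): each sample takes the minimum bits-to-remove over the FFT analyses whose windows contain it, and the block then takes the minimum over its samples.

```python
def block_bits_to_remove(block_start, block_length, analyses):
    """analyses: list of (window_start, window_length, bits_estimate).
    Each FFT analysis yields a bits-to-remove estimate from its spectrum."""
    per_sample = []
    for i in range(block_start, block_start + block_length):
        covering = [bits for (s, n, bits) in analyses if s <= i < s + n]
        per_sample.append(min(covering))   # most demanding analysis wins
    return min(per_sample)                 # block takes the overall minimum

# Two 1024-sample windows over the 0:511 block, as discussed (-448:575 and -64:959),
# with hypothetical bits-to-remove estimates of 5 and 3.
demo = block_bits_to_remove(0, 512, [(-448, 1024, 5), (-64, 1024, 3)])
```

Since every sample of the 0:511 block falls inside both windows, the block gets the lower of the two estimates.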
Title: lossyWAV Development
Post by: Nick.C on 2008-01-24 19:50:44
From my understanding we need more than 1 FFT because for getting a good temporal resolution (catching transients) we need a rather short FFT, and for a good frequency resolution in the low to medium frequency range we need a long FFT (for the very high frequency range the short FFT should be sufficient).

From this and from practical results I don't see why we should have more than 2 FFTs. It's okay to have an additional FFT for safety reasons when going from -3 to -2, and from -2 to -1, but I don't see why we should go beyond that.

Sorry for being so negative towards your recent suggestions. I see you're eager to get further improvements one way or another.
To me personally I think things are very good and don't need refinement as long as no issues come up in practice.

The only thing I personally would like to have reconsidered one last time is the coverage of the block by the 1024 sample FFT windows.
We have a different view on this, as you feel the need for a 50% overlap of FFT analyses between adjacent blocks. I don't see any overlapping between blocks as necessary, and I think your view is based on one of 2Bdecided's remarks, but I believe this is a misunderstanding. The bits-to-remove decision, in my understanding, is not a global decision, not a block-overlapping decision, IMO not even a block-orientated decision (but I think the latter has no practical impact). IMO we can assign to each individual sample a number of bits to remove based on the analysis of those FFT windows to which the specific sample contributes.
Block consideration comes in because the lowest number of bits to remove (per sample) must be chosen in order to assign a bits-to-remove number to the block. Moreover, it's useful to base the FFT window partitioning on the block. What's necessary is a good overlapping of the FFTs within the block under consideration. According to 2Bdecided, the overlapping of the FFT windows (within the block!) should be 50% or more. We had a discussion before from which I thought we had an overlap of 5/8, but IIRC this is not current practice.
My suggestion was, in order that the overlap not reach far into neighboring blocks (as their samples have nothing to do with the block under consideration), to place the centre points of the outermost FFT windows a little bit inside the block. 2Bdecided preferred the edge position because of good temporal resolution at the very beginning and end of the block, but this cannot be an issue with the 1024 sample FFT, simply because that job falls to the short FFT.

So can you please reconsider using the following 1024 sample FFT windows: -448:575, -64:959. With these, the centre of each FFT window is just 64 samples (1/16 of the window length) away from the edges, and I think this isn't a problem for catching problems at the edges (temporal resolution issues are caught by the short FFT, not the 1024 sample FFT, and the 64 sample FFTs can be centred at the edges). The advantage is in the middle area of the block, as this area is now covered better by the 2 FFT windows. With the centre points at the very edges, we are already 50% of the window length away from the FFT centres at the middle of the block, and the samples there participate only partially in the FFT analyses. If you do a third long FFT centred at the block centre the way you wrote about (but I'm not sure whether this is in action right now), things are alright of course, but at the cost of an additional FFT window.
The -448/-64 method does not benefit from the code speedup as the 0:1023 existing FFT is recycled as the -512:511 in the next codec_block, neither does it benefit from a 50% overlap between FFT analyses.

I would rather go down the -256:767 route if we are going to deviate from the -512:511;0:1023 route. Someone with more knowledge than me should ultimately make the decision, but if the existing -512:511;0:1023 is not acceptable then my preference is clear.
Title: lossyWAV Development
Post by: halb27 on 2008-01-24 22:35:02
The -448/-64 method does not benefit from the code speedup as the 0:1023 existing FFT is recycled as the -512:511 in the next codec_block ...

I see, this was the speedup trick. Cleverly done.
... neither does it benefit from a 50% overlap between FFT analyses. ....

I do not understand why you want a 50% FFT overlap for neighboring blocks. We have a per-block analysis and determination of bits to remove. Ideally we wouldn't consider samples from neighboring blocks at all; it is a negative side effect that we have to accept due to the nature of the FFT window. Sure, we want accuracy at the block's edges, so the FFT windows will reach into the neighboring block. But doing so to a smaller degree than 50%, if we can, is better than reaching 50% into the neighborhood.

But the speedup thing is valuable, especially for -3 with its excellent speed.
What do you think about the -448:575, -64:959 windows for -2, or at least for -1?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-25 08:45:40
The -448/-64 method does not benefit from the code speedup as the 0:1023 existing FFT is recycled as the -512:511 in the next codec_block ...
I see, this was the speedup trick. Cleverly done.
... neither does it benefit from a 50% overlap between FFT analyses. ....
I do not understand why you want a 50% FFT overlap for neighboring blocks. We have a per-block analysis and determination of bits to remove. Ideally we wouldn't consider samples from neighboring blocks at all; it is a negative side effect that we have to accept due to the nature of the FFT window. Sure, we want accuracy at the block's edges, so the FFT windows will reach into the neighboring block. But doing so to a smaller degree than 50%, if we can, is better than reaching 50% into the neighborhood.

But the speedup thing is valuable, especially for -3 with its excellent speed.
What do you think about the -448:575, -64:959 windows for -2, or at least for -1?
I hear what you say - all three options are now available in lossyWAV beta v0.7.2, attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-01-25 12:22:28
... I hear what you say - all three options are now available in lossyWAV beta v0.7.2, attached to post #1 in this thread.

Wonderful - you make me happy. Thank you very much.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-25 13:07:05
Ideally we don't consider samples at all from neighboring blocks, it is a negative side effect that we have to accept due to the nature of the FFT window.
Not quite - the concepts of time and frequency are linked, and you can only have the frequency accuracy of a 1024-point FFT by looking at 1024 samples. If you want that accuracy (and I believe we do) you need that many samples. So no, even "ideally" we need to consider samples from neighboring blocks - just as, at the limit, a single sample tells you nothing.

As long as there is at least 50% overlap, and the whole block is covered by parts of the window function at 0.5 or higher, it really doesn't matter which of the two or three proposed schemes you use.

You're looking for the quietest part, and that could be anywhere in the block. Focusing on the start, middle, end, or any point(s) in between has no advantage in this respect.

What we do know is that something special can happen at block boundaries which cannot happen anywhere else (we introduce a transition), so focussing on these has some merit, but I wouldn't argue to the death for it!


The worst case scenario is this: you have a notch in the frequency spectrum that's narrow enough that you need a 1024 point FFT to catch it (otherwise the shorter FFT will catch it anyway, and the position of the 1024 point FFT doesn't matter!). Now, switch this notch in and out at block boundaries, so one block has it, and the next doesn't. If the notch is in white noise, we won't hear the switching transients, so we can switch in a single sample.

So, if you use a 64-point FFT, you can't see this notch - it's too narrow.

Yet if you use a 1024-point FFT, you'll hit your problem - the centred window sees more of the notch than the edge window.

Does it make any audible difference? I can't tell, but I've attached a sample if you want to check. It's only 1 second long, the first 1/2 has alternate filtered/not filtered 512-sample blocks. The second 1/2 is all unfiltered. You can clearly hear the difference between these two, but does lossywav processing change it at all, with either window position?
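For reference, a signal of this general shape can be generated with a short sketch. This is hypothetical code, not David's actual sample, and his narrow notch is approximated here by a simple first-order high-pass that strips the lowest frequencies from alternate 512-sample blocks of white noise:

```python
import random

SAMPLE_RATE = 44100
BLOCK = 512

def remove_lows(block, alpha=0.995):
    # First-order high-pass: strips the lowest frequencies from a block.
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in block:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

random.seed(0)                      # deterministic white noise
noise = [random.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE)]

half = len(noise) // 2
signal = []
for start in range(0, len(noise), BLOCK):
    block = noise[start:start + BLOCK]
    # First half: filter alternate 512-sample blocks; second half: plain noise.
    if start < half and (start // BLOCK) % 2 == 0:
        block = remove_lows(block)
    signal.extend(block)
```

The point of the construction is that only a long FFT can distinguish the filtered blocks from the unfiltered ones, so the positioning of the 1024-sample window determines whether the processor notices the alternation.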

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-25 13:22:26
Thanks David,

The idea of using the centred analysis, i.e. -256:767, has the whole codec_block in the 50% or higher zone and will also include 256 samples from the codec_blocks either side, although that does mean that the block edge samples are only at 50%.

However, prioritising the codec_block edges, the existing method (-512:511; 0:1023) has the samples at each end of the codec_block at 100% in one or other analysis.

[edit] Your sample using v0.7.2, FLAC -5, -3: (e) 54031 bytes; -3 -overlap: (o) 54158 bytes; -3 -centre: (c) 53326 bytes. (attached) Will try listening to them. [/edit]

[edit2] All I'm getting is a slight difference in tone of a sub-frequency that isn't the noise itself..... Mind you, my ears have visited !loud! environments a few times too often. [/edit2]
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-25 16:19:23
It was the lowest frequencies that I removed, but my ears at least can't hear that they're absent/present/absent/present every 512 samples in the original file - they just sound quieter overall for the first half of the file.

Looking at the bits removed, the different modes are doing what you'd expect: the centred mode clearly picks up the blocks with the notch filter and removes fewer bits (and removes more bits where there's pure white noise), the others don't really notice a difference.

However, during the second 1/2 of the file (which is just white noise), the centred mode jumps around a lot in bits removed even though there's no difference (other than the noise being random) between blocks. I've attached an image showing how the added noise jumps around (all three lossy-original=difference signals boosted by 42dB for display).

All three lossy versions sound the same as the original to me.

If anyone can think of a more critical test sample, please post.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-01-25 18:05:19
Ideally we don't consider samples at all from neighboring blocks, it is a negative side effect that we have to accept due to the nature of the FFT window.
Not quite - the concepts of time and frequency are linked, and you can only have the frequency accuracy of a 1024-point FFT by looking at 1024 samples. If you want that accuracy (and I believe we do) you need that many samples. So no, even "ideally" we need to consider samples from neighboring blocks - just as, at the limit, a single sample tells you nothing.

As long as there is at least 50% overlap, and the whole block is covered by parts of the window function at 0.5 or higher, it really doesn't matter which of the two or three proposed schemes you use.

You're looking for the quietest part, and that could be anywhere in the block. Focusing on the start, middle, end, or any point(s) in between has no advantage in this respect.

What we do know is that something special can happen at block boundaries which cannot happen anywhere else (we introduce a transition), so focussing on these has some merit, but I wouldn't argue to the death for it!


The worst case scenario is this: you have a notch in the frequency spectrum that's narrow enough that you need a 1024 point FFT to catch it (otherwise the shorter FFT will catch it anyway, and the position of the 1024 point FFT doesn't matter!). Now, switch this notch in and out at block boundaries, so one block has it, and the next doesn't. If the notch is in white noise, we won't hear the switching transients, so we can switch in a single sample. ...

I think it's all a misunderstanding, probably I didn't make my point clear enough.
Of course we want a 1024 sample FFT, and of course every sample in the 1024 sample window counts, and of course if we want accuracy at the edge any 1024 sample FFT window which takes good care of the edges stretches its samples in a significant way into the neighboring block.
I just call this a negative side effect as we do want to assign a number of bits to remove to the block under consideration, and in this respect it's a negative (though necessary) side effect in my understanding.

In the end: do you think the edges are not covered well by the two FFT windows -448:575, -64:959 for the 0:511 block?

As for your sample, I didn't understand what you wanted to show other than that good accuracy for the edge region is needed for the 1024 sample FFT. So the question to me remains: don't you think the -448:575, -64:959 windows are a good choice for preserving the accuracy of the 1024 FFT at the edges?
According to your graphs for your sample, BTW, noise is (slightly) lower with these windows than with the exactly edge-positioned ones.

I guess we have the same thing in mind: accuracy at the edges, but for that IMO the centre point needn't be exactly at the edge but can be a little bit interior to the block. The advantage is that with such a choice the centre region is taken better care of which is a bit underexposed with the center of the 2 FFT windows situated exactly at the edges.
Title: lossyWAV Development
Post by: halb27 on 2008-01-25 22:57:58
As there were several changes since I tested lossyWAV the last time, I did it again (using -3 -noclips -overlap) and tried to abx my usual problem samples and 2 regular tracks with french female voices.
Everything's fine. The only slight suspicion was with badvilbel where I thought I could hear more noise than in the original. I arrived at 4/4 which turned into a 5/10 finally. So I can't abx it.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-27 19:47:55
As there were several changes since I tested lossyWAV the last time, I did it again (using -3 -noclips -overlap) and tried to abx my usual problem samples and 2 regular tracks with french female voices.
Everything's fine. The only slight suspicion was with badvilbel where I thought I could hear more noise than in the original. I arrived at 4/4 which turned into a 5/10 finally. So I can't abx it.
To allow better tuning of this particular variable, I'll revise the -overlap parameter to take a value (0..16) which will set the overlap of the 1024 sample FFT to 512-16*(overlap_value), i.e. 512..256 samples. I will revise the -centre parameter to add in a central 1024 sample FFT where overlap size>256.

lossyWAV beta v0.7.3 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-01-27 21:36:10
Great, Nick! Thank you very much (because I was already thinking that it wouldn't be bad to have the centre of the 1024 sample windows a bit further inside the block).

Just so that I don't do something wrong, can you please confirm or tell me I'm wrong:

a) -overlap 6 means: we have 2 1024 sample FFT windows per block with the center of each being in the block and 96 samples away from the edges?

b) -centre means: we have 3 1024 sample FFT windows per block, 1 with the centre at the block's centre and 2 with the centre at the block's edges?
Title: lossyWAV Development
Post by: Nick.C on 2008-01-27 21:41:50
Great, Nick! Thank you very much (because I was already thinking that it wouldn't be bad to have the centre of the 1024 sample windows a bit further inside the block).

Just so that I don't do something wrong, can you please confirm or tell me I'm wrong:

a) -overlap 6 means: we have 2 1024 sample FFT windows per block with the center of each being in the block and 96 samples away from the edges?

b) -centre means: we have 3 1024 sample FFT windows per block, 1 with the centre at the block's centre and 2 with the centre at the block's edges?
To summarise:

-overlap 0 := -512:511;0:1023;
-overlap 4 := -448:575;-64:959;
-overlap 8 := -384:639; -128:895;
-overlap 12 := -320:703; -192:831;
-overlap 16 := -256:767;

-centre := additional -256:767 (unless -overlap 16 has been specified, obviously).
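For clarity, the mapping from the -overlap value to the window ranges above can be reproduced with a small helper. This is an illustrative Python sketch, not lossyWAV code; ranges are relative to a 512-sample codec_block at 0:511, and each overlap step moves the window centres 16 samples inward:

```python
def fft_windows(overlap, fft_length=1024):
    """Return the 1024-sample FFT window ranges for -overlap values 0..16."""
    shift = 16 * overlap
    first = (-fft_length // 2 + shift, fft_length // 2 - 1 + shift)
    second = (-shift, fft_length - 1 - shift)
    # At -overlap 16 the two windows coincide, so only one FFT is needed.
    return [first] if first == second else [first, second]
```

For example, `fft_windows(4)` reproduces the -448:575 and -64:959 windows discussed earlier in the thread.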

Have fun!
Title: lossyWAV Development
Post by: halb27 on 2008-01-27 21:53:09
Thanks a lot!
Title: lossyWAV Development
Post by: Nick.C on 2008-01-27 22:06:16
Oops - immediate bug-fix (affects v0.7.2 and the first two downloaders of v0.7.3). The end_overlap of the second (possibly third) FFT analyses at 1024 sample length was being calculated incorrectly (it was still assuming end_overlap = fft_length div 2). Apologies for the error.
Title: lossyWAV Development
Post by: halb27 on 2008-01-28 18:58:06
Using new v0.7.3 -3 -noclips -overlap 8 I tried my usual killer samples as well as some regular music again.
Everything's fine.
Average bitrate for my sample set of full length regular tracks is exactly 400 kbps.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-28 19:22:40
Using new v0.7.3 -3 -noclips -overlap 8 I tried my usual killer samples as well as some regular music again.
Everything's fine.
Average bitrate for my sample set of full length regular tracks is exactly 400 kbps.
Good to hear - I'll leave the -overlap parameter where it is at the moment.

I've added a few options to the quality presets:

-1 now has an additional variant -1a (1 added FFT length);
-2 now has 2 additional variants -2a & -2b (1 and 2 added FFT lengths respectively);
-3 now has 3 additional variants -3a, -3b & -3c (1, 2 and 3 added FFT lengths respectively);

In this way, the user can opt to spend a bit more time on the processing (if time is not an important factor) by carrying out FFT analyses at additional FFT lengths.

-extrafft parameter removed as superseded.

lossyWAV beta v0.7.4 attached to post #1 in this thread.

[edit] Immediate update required: I must have "broken" the 24-bit handling some time ago.... Now fixed at beta v0.7.5 in the usual place. [/edit]
Title: lossyWAV Development
Post by: silverfire on 2008-01-31 12:38:32
Not a big issue, but 0.7.5 beta still says 0.7.4 

Quote
lossyWAV beta v0.7.4, Copyright © 2007,2008 Nick Currie.
Title: lossyWAV Development
Post by: Nick.C on 2008-01-31 13:01:34
Not a big issue, but 0.7.5 beta still says 0.7.4 
Quote
lossyWAV beta v0.7.4, Copyright © 2007,2008 Nick Currie.
  Oops.... Will be corrected in beta v0.7.6 - I'm trying to implement the -merge parameter to revert lossy + lwcdf to lossless.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-01-31 17:00:18
In the end: do you think the edges are not covered well by the two FFT windows -448:575, -64:959 for the 0:511 block?
Yes, I already said...
As long as there is at least 50% overlap, and the whole block is covered by parts of the window function at 0.5 or higher, it really doesn't matter which of the two or three proposed schemes you use.
...what I meant was that it is good enough - I am happy with it.
(Even one in the centre is good enough!)

Think about it the opposite way:
1. forget the blocks!
2. consider that the moment with the lowest noise floor could be anywhere
3. pick an amount of window overlap that you're happy will catch this moment adequately, wherever it is relative to the window
4. now remember the blocks again, and use that lowest noise floor to set the bits_to_remove in the appropriate block.

Your suggestion increases the block overlap slightly, in a non-uniform way. It's fine. It may be beneficial (either because it overlaps more in the current block, or ignores more of the adjacent blocks), or it may be wasteful (because the existing method is fine already and efficient). I don't know.

Quote
I guess we have the same thing in mind: accuracy at the edges
No, I'm happy with 50% overlap and centred anywhere. But if you're going to centre it anywhere, it might as well be at the edges.

Quote
but for that IMO the centre point needn't be exactly at the edge but can be a little bit interior to the block.
That's true - but it's only useful if 50% overlap isn't good enough - i.e. if it's too little for within the block, or too much for outside the block. I prefer the solution (if there's a problem) which adjusts the thresholds etc so that 50% overlap is sufficient and resilient to wherever the minimum happens to be relative to the window, but that may be because 50% overlap gives equal and efficient coverage over time, and I like that.

Quote
The advantage is that with such a choice the centre region is taken better care of which is a bit underexposed with the center of the 2 FFT windows situated exactly at the edges.
That's the thing though: if it is underexposed (i.e. it ever causes a problem), I would conclude that the algorithm is wrong and the thresholds or overlap need to be adjusted to compensate. I might do what you've proposed, or something different, but I'd want to find something where it went wrong to decide what's most appropriate to fix it. The sample I provided, if no one can hear any difference, seems to indicate that there's nothing to fix.

But I'll say it again - any solution with 50% or more overlap is fine by me wherever the windows fall. (unless a problem sample due to this crops up! if deleting or adding half a block of silence to any sample causes a dramatic change, then it really needs to be looked at).

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-02-01 09:39:37
Essentially you say that everything should be fine as long as each sample is offset from the centre of the corresponding FFT window by no more than 50% of the FFT length.

Guess it's due to my paranoid nature towards audio that I would prefer a lower value than 50%, but you certainly are more experienced. And as for practical experience you're right: everything looks fine with the 50% overlap.

I see we're getting a lot of variations and options and have arrived at that point already. As far as this is due to me: let's forget about my personal preferences. Now that we're close to the final version it's more important to have the options clean and simple.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-01 21:50:13
My preference vis-a-vis FFT end_overlap and FFT_overlap is 50%/50%, i.e. -512:511;0:1023 for 1024 sample FFT. Yes, this takes into account 3 codec blocks for the 1024 sample analysis, but I feel that this works better than simply using -256:767 as we will have the block ends at 100% not 50%.
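As an aside, the range notation here can be read as inclusive sample spans relative to the start of a 512-sample codec block; a tiny sketch (my reading of the notation, not anything from lossyWAV itself) showing where each window's centre falls:

```python
def centre(span: str) -> int:
    """Midpoint sample index of an inclusive 'lo:hi' sample range."""
    lo, hi = map(int, span.split(':'))
    return (lo + hi + 1) // 2

# -512:511;0:1023 places one 1024-sample FFT on each edge of the 512-sample block:
assert centre('-512:511') == 0     # centred on the block's leading edge
assert centre('0:1023') == 512     # centred on the trailing edge
# ...whereas the single-window alternative sits mid-block:
assert centre('-256:767') == 256
```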

I'm still hurting my head on the -merge parameter.....
Title: lossyWAV Development
Post by: halb27 on 2008-02-01 22:11:56
With -256:767 the block's edges are 50% the FFT length away from the centre, so the general 50% strategy applies.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-01 22:43:50
With -256:767 the block's edges are 50% the FFT length away from the centre, so the general 50% strategy applies.
I know, but as more bits are removed, I am worried that quality might be suffering.... -overlap 16 (-256:767) = 455.1kbps for my 53 sample set vs 461.5kbps for the existing -512:511;0:1023 processing.
Title: lossyWAV Development
Post by: amors on 2008-02-24 12:50:20
What about new beta versions? Or has development stopped?
Title: lossyWAV Development
Post by: Nick.C on 2008-02-24 13:54:28
What about new beta versions? Or has development stopped?
I've been working (slowly) on the -merge parameter - it's taking some time.

The settings for each of the quality presets are pretty much cast in stone now (pending identification of new problem samples). However.....

Reading back through the thread, I'm almost tempted to include a "-4" parameter which would be the same as -3 was at v0.6.4 RC1 but with 5 allowable clips per channel per codec_block - as was said at the time, re-writing the settings for -3 due to only one problem sample might be considered a knee-jerk reaction.

lossyWAV beta v0.7.6 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: amors on 2008-02-24 15:22:14
Thank you for the answer and your work.
Title: lossyWAV Development
Post by: stel on 2008-02-25 08:05:01
I, for one, appreciate your -4 option. Currently testing, but I doubt that I will hear any problems.
I'm another one saying thanks to everyone who's been involved in this project.
Great sound quality and an increase in battery life on my DAP - what more could my ears ask for...
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 07:57:30
I, for one, appreciate your -4 option. Currently testing, but I doubt that I will hear any problems.
I'm another one saying thanks to everyone who's been involved in this project.
Great sound quality and an increase in battery life on my DAP - what more could my ears ask for...
 

Thinking about the -4 preset and bearing in mind the following table which shows the processed sizes for each of the current presets (and a/b/c variants):

Code: [Select]
53 "Problem" Sample Set Processing Results
==========================================
WAV 131,183,096 bytes 1411.2kbps
FLAC 72,652,785 bytes  781.6kbps (-8)
-1a  52,746,167 bytes  567.4kbps (-5)
-1   51,856,977 bytes  557.9kbps (-5)
-2b  49,032,764 bytes  527.5kbps (-5)
-2a  48,851,896 bytes  525.5kbps (-5)
-2   47,865,987 bytes  514.9kbps (-5)
-3c  43,742,164 bytes  470.6kbps (-5)
-3b  43,497,733 bytes  467.9kbps (-5)
-3a  43,235,774 bytes  465.1kbps (-5)
-3   42,976,155 bytes  462.3kbps (-5)
-4c  39,396,622 bytes  423.8kbps (-5)
-4b  39,238,991 bytes  422.1kbps (-5)
-4a  39,016,370 bytes  419.7kbps (-5)
-4   38,821,415 bytes  417.6kbps (-5)
I am tempted to make -4 equivalent to -4c (accepting the performance hit as a trade off for less likelihood of lack of transparency). A delta of 6.3kbps (-4c compared to -4) is not a large increase in bitrate for increased confidence......
Title: lossyWAV Development
Post by: halb27 on 2008-02-27 08:01:52
Hi Nick,
pretty high bitrate in the table - I'm used to ~400 kbps on average with -3. Has anything changed for -3?
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 08:04:52
Hi Nick,
pretty high bitrate in the table - I'm used to ~400 kbps on average with -3. Has anything changed for -3?
My bad, I should have prefaced the table with "following results from my 53 sample set".... Nothing has changed, except I've "improved" the maximum_bits_to_remove process to take into account the actual RMS value of the codec_block.

[edit] I've found an oversight in the codec_block RMS calculation (and amended it), -3 now 42,976,155 bytes, 462.3kbps, -4 now 38,821,415 bytes, 417.6kbps. [edit2] Table above corrected.[/edit2]
The -merge parameter will now add a .lossy.wav and corresponding .lwcdf.wav file to re-create the lossless original.

lossyWAV beta v0.7.7 attached to the first post in this thread.
[/edit]
Title: lossyWAV Development
Post by: GeSomeone on 2008-02-27 10:45:13
Code: [Select]
-4c  39,396,622 bytes  423.8kbps (-5)
-4b  39,238,991 bytes  422.1kbps (-5)
-4a  39,016,370 bytes  419.7kbps (-5)
-4   38,821,415 bytes  417.6kbps (-5)
I am tempted to make -4 equivalent to -4c (accepting the performance hit as a trade off for less likelihood of lack of transparency). A delta of 6.3kbps (-4c compared to -4) is not a large increase in bitrate for increased confidence......
The -merge parameter will now add a .lossy.wav and corresponding .lwcdf.wav file to re-create the lossless original.

Thanks Nick,
it's appreciated that you're still tying up the "loose" ends even though the fun has worn off a bit - that rebuild with correction file kept you busy for a while.

As for -4 becoming -4c ... I don't see the point.
At first the goal was to make -2 transparent and -3 great for just listening;
next -3 had to be transparent and -4 was (re)created for slightly lower bit rates.
Now you want -4 transparent too. Where does it end? 

The nice thing about your results is that the settings scale so nicely. This gives users a chance to pick a sweet spot (size vs. chance of audible noise) according to their needs.
Don't be too anxious about the lowest settings not being totally transparent all of the time - it's supposed to be lossy after all. lossyWav at such settings might even be attractive to another group of users who need <400k bit rates and prefer the (possible) artifacts introduced by these settings over those that a normal lossy codec might give.

The main reason for someone not adding the a-c variants would be speed, so a worse bit rate together with worse speed (-4 -> -4c) doesn't seem right for a default.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2008-02-27 11:05:28
Don't be too anxious about the lowest settings not being totally transparent all of the time - it's supposed to be lossy after all.

I agree. While I appreciate all the tuning work by Nick.C and halb27, I think that the lowest setting is being held to too high a standard. I only casually listen to music, so there's no point exhaustively ABXing a preset which is not supposed to be perfect anyway. I would also like lossyWAV to have a -4 preset, which would, of course, entail more of a risk.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 11:20:08
Now you want -4 transparent too. Where does it end? 

The nice thing about your results is that the settings scale so nicely. This gives users a chance to pick a sweet spot (size vs. chance of audible noise) according to their needs.
Don't be too anxious about the lowest settings not being totally transparent all of the time - it's supposed to be lossy after all. lossyWav at such settings might even be attractive to another group of users who need <400k bit rates and prefer the (possible) artifacts introduced by these settings over those that a normal lossy codec might give.

The main reason for someone not adding the a-c variants would be speed, so a worse bit rate together with worse speed (-4 -> -4c) doesn't seem right for a default.
I hear you! I will leave -4 as is and allow the more paranoid user (  ) to use the extra FFT's.

Don't be too anxious about the lowest settings not being totally transparent all of the time - it's supposed to be lossy after all.
I agree. While I appreciate all the tuning work by Nick.C and halb27, I think that the lowest setting is being held to too high a standard. I only casually listen to music, so there's no point exhaustively ABXing a preset which is not supposed to be perfect anyway. I would also like lossyWAV to have a -4 preset, which would, of course, entail more of a risk.
Do you mean keep the existing -4 or go even further? You can still use -nts to increase the bits to remove, however this makes the -snr limiter kick in more often, so you would have to change both at once.

[edit] A quick check shows that -4 -nts 12 -snr 15 yields 33,168,675 bytes, 356.8kbps for the same sample set. [/edit]
Title: lossyWAV Development
Post by: shadowking on 2008-02-27 12:01:52
Although it's not a major issue, I am still slightly bothered by the positive ABX of Alex B at > 400 k bitrate VBR. I know it's a minute difference and hard to define, but you can't just grab ordinary music (Springsteen drums) and ABX it at 400 k unless it was a fluke sample (even so it's a very, very tough chance). The fact that it was the solo intro section means that again there is not enough masking of HF stuff despite such a high bitrate. I think that there might not be advantages over Bryant's new Wavpack --dns, which can outperform lossyWAV in its current state using a much lower bitrate - maybe even under 300k using --dns on some samples like Alex B's. The other question is what chance would Alex B have of pulling off a random ABX using Vorbis, AAC, MPC etc. @ 320k, let alone 400k?

I know it's like comparing apples to oranges, and not everyone using the format is interested in 500k or total transparency, but my head just says that @ 400k I don't want to see people pulling off ABX tricks.

I am thinking maybe only -1 and -2 should have been available as fully transparent. But I would like many more options - 256k would be plenty for some people. Maybe -3 has too high expectations? Personally, wavpack --dns at 270k lossy + correction files looks attractive to me.

I think the scale should be flexible and direct:

-1 - For multi-transcoding ++ overkill
-2 - Transparent, suitable for archiving (Default)
-3 - High quality. Normally indistinguishable from original.
-4 - Medium
-5 - Portable

Or a starters guide for lossywav settings:

+Highest quality: Archiving / editing (-1 .. -2)
+High quality / Hifi (-3 .. -4)
+Medium (-5 .. -6)
+Portable / outdoor (-7 ...-8....)
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 13:00:16
The fact that it was the solo intro section means that again there is not enough masking of HF stuff despite such a high bitrate.
There is no masking of any frequency - bit reduction will add noise across the whole spectrum.
I think that there might not be advantages over Bryant's new Wavpack --dns, which can outperform lossyWAV in its current state using a much lower bitrate - maybe even under 300k using --dns on some samples like Alex B's. The other question is what chance would Alex B have of pulling off a random ABX using Vorbis, AAC, MPC etc. @ 320k, let alone 400k?

I know it's like comparing apples to oranges, and not everyone using the format is interested in 500k or total transparency, but my head just says that @ 400k I don't want to see people pulling off ABX tricks.

I am thinking maybe only -1 and -2 should have been available as fully transparent. But I would like many more options - 256k would be plenty for some people. Maybe -3 has too high expectations? Personally, wavpack --dns at 270k lossy + correction files looks attractive to me.
lossyWAV is and always has been pure VBR. The sample set I use for testing purposes will produce a higher bitrate of output than any real music I've found so far. Previous testing at -3 had my sample set at 462kbps and my 10 album test set at 402kbps. I will process my 10 album test set this evening and post the results.

I think the scale should be flexible and direct:

-1 - For multi-transcoding ++ overkill
-2 - Transparent, suitable for archiving (Default)
-3 - High quality. Normally indistinguishable from original.
-4 - Medium
-5 - Portable

Or a starters guide for lossywav settings:

+Highest quality: Archiving / editing (-1 .. -2)
+High quality / Hifi (-3 .. -4)
+Medium (-5 .. -6)
+Portable / outdoor (-7 ...-8....)
Using the settings at the end of my last post, I will add a "-5" parameter which might yield about 310 to 320kbps. This will need to be listened to in order to validate it as a meaningful / acceptable preset, as forcing down the bitrate is meaningless unless the quality of the output remains fit for its intended use.

In the interim, I will post beta v0.7.8 which includes the -5 preset. I will also process my 10 album test set at -5 this evening and post the results.

[edit] Thinking about Alex_B's livin_in_the_future_sample, could someone with good ears try to ABX it against v0.7.8 -4? This would let me know if the "active" maximum_bits_to_remove recently introduced has any beneficial effect on this sample. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2008-02-27 13:35:48
... I am still slightly bothered by the positive ABX of Alex B at > 400 k bitrate VBR. I know it's a minute difference and hard to define, but you can't just grab ordinary music (Springsteen drums) and ABX it at 400 k unless it was a fluke sample (even so it's a very, very tough chance). The fact that it was the solo intro section means that again there is not enough masking of HF stuff despite such a high bitrate. I think that there might not be advantages over Bryant's new Wavpack --dns, which can outperform lossyWAV in its current state using a much lower bitrate - maybe even under 300k using --dns on some samples like Alex B's.

IIRC there had been two changes after Alex B's ABXing: one which made the mechanism more sensitive to the HF area, and one which reduced the noise a bit in an overall sense. After that Alex B couldn't ABX the problem any more with -3, IIRC.
As for the comparison with wavPack lossy: IMO it's true that when targeting a relatively low bitrate, say 300 kbps or below, wavPack lossy --dns is the more appropriate choice. With lossyWAV + a lossless codec we have the issue that a small codec blocksize is usually best in an overall sense, which however usually makes the lossless codec a bit inefficient. wavPack lossy doesn't suffer from this. That's why I personally wouldn't target a bitrate like 300 kbps with lossyWAV.
lossyWAV's advantage is its quality-reassuring mechanism, which however needs a quality setting of at least the current -3. Anyway, loosening it a bit, as with the current -4 or -5 approach, is a good option for those people who don't need transparency but want very high quality at bitrates around 350 kbps or even a bit below.

Previous testing at -3 had my sample set at 462kbps and my 10 album test set at 402kbps. I will process my 10 album test set this evening and post the results.

Would it hurt a lot if you skipped your 52 sample set (with a lot of problem samples where a high bitrate is welcome) in favor of a regular music set? It's not necessary to encode 10 complete albums (a lot of work); a hopefully representative sample set from these albums will do. IMO it's more important to have the results for regular tracks, even if not very representative, than the results for problem sample snippets.
Title: lossyWAV Development
Post by: GeSomeone on 2008-02-27 14:26:30
lossyWAV is and always has been pure VBR.

Technically it's FLAC, WavPack, TAK etc. that are VBR. lossyWav is fixed bit rate because WAVs have a fixed bit rate. 
(Of course what lossyWav does is influence the bitrate of the lossless part by making the wav easier to compress.)
Title: lossyWAV Development
Post by: 2Bdecided on 2008-02-27 15:21:26
No, it's conceptually VBR, but packed into a CBR linear PCM bitstream for output because that's how the world works.

Can I just say - "preset 5" - please, no!

The lossyWAV principle works well, but it goes from "fine" to "poor" to "useless" over a range of 6-12dB (1 to 2 bits).

It's splitting hairs to define 3 presets between "fine" and "useless". Unlike mp3, I don't believe there's that amount of useful room to play with. You very quickly hit something with a bitrate far higher than mp3, and an audio quality far lower.
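The 6 dB-per-bit figure behind that range is easy to check numerically: rounding samples to a step twice as coarse roughly doubles the RMS quantisation error. A minimal sketch, using an illustrative random signal and rounding scheme of my own choosing, not lossyWAV's internals:

```python
import numpy as np

# A 16-bit-style test signal; each removed bit coarsens the quantisation
# step by 2x, so the added noise RMS rises by 20*log10(2) ~= 6.02 dB per bit.
rng = np.random.default_rng(0)
signal = rng.integers(-2**15, 2**15, size=1 << 16).astype(np.float64)

def noise_db_after_removing(bits: int) -> float:
    q = float(1 << bits)                  # quantisation step for `bits` removed
    rounded = np.round(signal / q) * q    # clear the lowest `bits` bits, with rounding
    error = rounded - signal
    return float(20.0 * np.log10(np.sqrt(np.mean(error ** 2))))

step_db = noise_db_after_removing(5) - noise_db_after_removing(4)
# step_db comes out close to the theoretical 6.02 dB per removed bit
```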


Still, I guess it's a good thing if people are asking for lower quality!

Cheers,
David.


Although it's not a major issue, I am still slightly bothered by the positive ABX of Alex B at > 400 k bitrate VBR. I know it's a minute difference and hard to define
...and it was at the preset that's not supposed to be transparent. IIRC it wasn't ABXed at the transparent preset, and was subsequently fixed on the non-transparent preset.

I'm not being defensive. I'm very keen for people to find genuine problem samples. This wasn't one IIRC (it's been 38 pages - I'm sorry if I'm thinking of the wrong one!).

Quote
but you can't just grab ordinary music (Springsteen drums) and ABX it at 400 k unless it was a fluke sample (even so it's a very, very tough chance). The fact that it was the solo intro section means that again there is not enough masking of HF stuff despite such a high bitrate. I think that there might not be advantages over Bryant's new Wavpack --dns, which can outperform lossyWAV in its current state using a much lower bitrate - maybe even under 300k using --dns on some samples like Alex B's.
You should see the bitrate of lossyWAV if the noise is allowed to be non-flat!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-02-27 16:05:22
...
Can I just say - "preset 5" - please, no!

The lossyWAV principle works well, but it goes from "fine" to "poor" to "useless" over a range of 6-12dB (1 to 2 bits). ...

From the lossyWAV principle: yes, but with the added skew and snr mechanism there is a certain room for this IMO.

I once tried -3 with -nts 10 and higher, and to me quality was still good with -nts 10. That was before the HF sensitivity increase due to AlexB's sample, and I arrived at a bitrate ~330 kbps on average. I think something like this can make sense for -5. -4 can be -nts 3 or similar (more attractive to me than -5, but I'll stick with -3).
Title: lossyWAV Development
Post by: shadowking on 2008-02-27 16:11:02
Okay guys, thanks for the explanations. I just don't know where lossyWAV quality drops sharply (wavpack falls over below 235 k), so go for whatever you think is the max point for non-offensive lossyWAV listening.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 17:11:58
I'll do some processing of my 10 album test set tonight and post the results. For my 53 sample set and a slightly modified set of quality presets (-4=-3.5; -5=-4; -6=-4.5; -7=-5, but all slightly changed) which may feature in beta v0.7.9:
Code: [Select]
Preset  [Equiv. Settings]    Total Size      Bitrate  [Delta.BR]     10 Album Test Set
==========================================================================================
  FLAC  [---------------] 72,652,785 bytes, 781.6kbps [--------] 3.35GB, 854kbps (-------)
   -1   [-nts -4 -snr 25] 52,138,258 bytes, 560.9kbps [--------] 1.94GB, 496kbps (-65kbps)
   -2   [-nts -2 -snr 23] 48,177,581 bytes, 518.3kbps [42.6kbps] 1.78GB, 453kbps (-65kbps)
   -3   [-nts  0 -snr 21] 42,976,155 bytes, 462.3kbps [56.0kbps] 1.58GB, 403kbps (-59kbps)
   -4   [-nts  3 -snr 20] 40,324,698 bytes, 433.8kbps [28.5kbps] 1.47GB, 375kbps (-59kbps)
   -5   [-nts  6 -snr 19] 37,934,855 bytes, 408.1kbps [25.7kbps] 1.38GB, 352kbps (-56kbps)
   -6   [-nts  9 -snr 18] 35,826,396 bytes, 385.4kbps [22.7kbps] 1.31GB, 333kbps (-52kbps)
   -7   [-nts 12 -snr 17] 33,950,736 bytes, 365.2kbps [20.2kbps] 1.25GB, 318kbps (-47kbps)
Title: lossyWAV Development
Post by: halb27 on 2008-02-27 19:27:22
It was your target so far to keep the -snr value constant. So I quickly checked my productive collection, which I re-encoded recently using -3, and thanks to the lossy.flac embedded meta-information the -snr value is -snr 21, so you kept this value for -3. Fine.
The fact that you increase the -snr value for -2 and -1 is consistent with the increasing defensiveness of these settings, but as -snr mainly affects the quality of the lower frequency range, which is already covered particularly well by the values we had so far, I personally would prefer a higher -nts value when it is about sacrificing a little bit of bitrate. No big thing to me however.

As for the lower bitrate settings: not my world, just a suggestion:
in case it turns out that too much quality is sacrificed, an alternative is not to lower -snr that much but instead to use a larger spreading length for the highest and - to a minor degree - the second highest frequency zone.
When it's about sacrificing quality I think it's perceptually least offensive to do it in the very high frequency range. With -nts 12 -snr 17 I'm afraid there's a fair chance of getting modest quality in the frequency range of the fundamentals, where it will be more disturbing.
Just a suggestion in case this should happen.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 20:26:44
It was your target so far to keep the -snr value constant. So I quickly checked my productive collection, which I re-encoded recently using -3, and thanks to the lossy.flac embedded meta-information the -snr value is -snr 21, so you kept this value for -3. Fine.
The fact that you increase the -snr value for -2 and -1 is consistent with the increasing defensiveness of these settings, but as -snr mainly affects the quality of the lower frequency range, which is already covered particularly well by the values we had so far, I personally would prefer a higher -nts value when it is about sacrificing a little bit of bitrate. No big thing to me however.

As for the lower bitrate settings: not my world, just a suggestion:
in case it turns out that too much quality is sacrificed, an alternative is not to lower -snr that much but instead to use a larger spreading length for the highest and - to a minor degree - the second highest frequency zone.
When it's about sacrificing quality I think it's perceptually least offensive to do it in the very high frequency range. With -nts 12 -snr 17 I'm afraid there's a fair chance of getting modest quality in the frequency range of the fundamentals, where it will be more disturbing.
Just a suggestion in case this should happen.
I'll see what effect keeping the -snr constant has on the lower quality presets. Table above amended to include results of (ongoing) 10 Album Test Set processing.
Title: lossyWAV Development
Post by: halb27 on 2008-02-27 20:46:53
... I just don't know where lossywav quality drops sharply (wavpack falls over below 235 k ) ...

From former experiments I guess that's slightly above 300 kbps. In this bitrate range expectations are of course higher than at wavPack lossy's 235 kbps edge. So I think the practical edge - talking only of bitrate - is pretty much where Nick.C has it now with his least demanding quality setting.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-27 22:02:02
... I just don't know where lossywav quality drops sharply (wavpack falls over below 235 k ) ...
From former experiments I guess that's slightly above 300 kbps. In this bitrate range expectations are of course higher than at wavPack lossy's 235 kbps edge. So I think the practical edge - talking only of bitrate - is pretty much where Nick.C has it now with his least demanding quality setting.
I've been listening to some of the tracks from my 10 album test set (processed using v0.7.9, -7) and I have to say that I am happy with the quality of the output. I haven't been especially listening out for problems and have been able to get on with other things while the music is playing in the background.

lossyWAV beta v0.7.9 attached to post #1 in this thread.

[edit] forgot to include quality setting..... [/edit]
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-02-28 01:17:53
I'm sorry if I don't understand the purpose here, but what's the point of creating a "lossy" flac, without any characteristic lossy modeling techniques, at bitrates around 320kbps?

Can anyone (and I mean anyone) possibly ABX the difference in an MP3 at 320kbps?  Well, from what I've seen, no, you can't.  However, it does seem like lossy flacs at this bitrate are easily ABX-able. 

What gives?  If lossy flac is inferior to MP3 at the same bitrate, then...?  Sorry for my ignorance in this regard.
Title: lossyWAV Development
Post by: shadowking on 2008-02-28 02:00:15
MP3 is ABXable on certain signals even at 320k - artificial, impulse-heavy stuff. You are right though that there is no point in 'archiving' at a less than 99.999% transparent bitrate without some correction files. There is a point in creating 'medium' settings, as they may be more than enough for a certain listener, plus they can use a correction file mechanism to fully restore the original.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-28 08:09:14
I'm sorry if I don't understand the purpose here, but what's the point of creating a "lossy" flac, without any characteristic lossy modeling techniques, at bitrates around 320kbps?

Can anyone (and I mean anyone) possibly ABX the difference in an MP3 at 320kbps?  Well, from what I've seen, no, you can't.  However, it does seem like lossy flacs at this bitrate are easily ABX-able. 

What gives?  If lossy flac is inferior to MP3 at the same bitrate, then...?  Sorry for my ignorance in this regard.
The purpose of introducing -4 to -7 is to accommodate the requests made by Shadowking, Mitch 1 2 and GeSomeone for lower bitrate presets. Yes, they *may* not be transparent, but they will certainly extend the battery life on your DAP of choice, as lower bitrates = less battery drain in reading files, and FLAC is already a low-power decoder. A recent test at anythingbutipod (http://www.anythingbutipod.com/forum/showthread.php?t=24537) indicates that even at about 380kbps, lossyFLAC will get more battery life than MP3 on one DAP using RockBox. lossyWAV is not competing with MP3 - however, it is satisfying to think that the output from -7 is "listenable".

-6 and -7 may not pass muster in terms of quality - it remains to be seen from ABX tests by people with good ears. Unfortunately, lossyWAV has only had a small core of ABX testers throughout its development (to those who have taken part, I am extremely grateful!), so settings validation has been a limited exercise.

@halb27 - I tried my 10 album test set at -7 -snr 21 and I got 1.37GB, 350kbps - so -snr 21 is kicking in a lot more than -snr 17. Sounds like a case for some iteration....
Title: lossyWAV Development
Post by: halb27 on 2008-02-28 13:57:38
@halb27 - I tried my 10 album test set at -7 -snr 21 and I got 1.37GB, 350kbps - so -snr 21 is kicking in a lot more than -snr 17. Sounds like a case for some iteration....

It's clear that -snr 21 leads to a higher bitrate, though I would not have expected to arrive at 350 kbps with -nts 12 -snr 21. Does your current version use the spreading of 22224 for the 64 sample FFT with these new quality settings too? This could be an explanation.
Anyway I'll try your new -7 and -6 settings tonight with high quality regular music ('cause that's what we're targeting with these settings - suboptimal behaviour on specific problem samples doesn't count for much here).
Your bitrate targets are fine IMO, and maybe quality is already adequate with your parameters. Your own listening experience sounds like it.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-28 20:11:33
@halb27 - I tried my 10 album test set at -7 -snr 21 and I got 1.37GB, 350kbps - so -snr 21 is kicking in a lot more than -snr 17. Sounds like a case for some iteration....
It's clear that -snr 21 leads to a higher bitrate, though I would not have expected to arrive at 350 kbps with -nts 12 -snr 21. Does your current version use the spreading of 22224 for the 64 sample FFT with these new quality settings too? This could be an explanation.
Anyway I'll try your new -7 and -6 settings tonight with high quality regular music ('cause that's what we're targeting with these settings - suboptimal behaviour on specific problem samples doesn't count for much here).
Your bitrate targets are fine IMO, and maybe quality is already adequate with your parameters. Your own listening experience sounds like it.
The current version does indeed have 22224 as the 64 sample FFT spreading - all of the presets from 3 to 7 have the same spreading string.

The more I listen to -7 (I'm using -7a which uses the same FFT's as -2, i.e. 64, 256 and 1024 samples), the more I am content to leave this as the least demanding preset.

I hope that your listening tests go well (and thanks for the testing!).
Title: lossyWAV Development
Post by: halb27 on 2008-02-28 20:44:13
I just finished listening very carefully to 8 full high quality tracks of various genres from my collection which I know pretty well from frequently listening to them.
I used -7, and I did it in foobar ABX mode so I could easily switch to the corresponding spot in the original whenever I thought the encoding wasn't totally fine. Which happened very, very often - not because of a real issue, but simply because I was very sceptical towards a lossyWAV 320 kbps encoding.

In fact I was totally content with all these encodings. Keep in mind though this wasn't an abx test, but it was a test which assured me that your -7 setting is expected to let me fully enjoy music.
Forget about my remarks concerning -snr and spreading in the HF region. Guess you were directly on the right track. Congratulations.

As shadowking pointed out, these relatively low bitrate settings are especially welcome with your correction file approach.
I personally will stick with -3, but lesson learnt is that I really don't have to worry about -3's quality which I admit I did sometimes (you certainly remember my concern about overlapping).

Wonderful work, Nick.
Title: lossyWAV Development
Post by: Nick.C on 2008-02-28 20:52:45
I just finished listening very carefully to 8 full high quality tracks of various genres from my collection which I know pretty well from frequently listening to them.
I used -7, and I did it in foobar ABX mode so I could easily switch to the corresponding spot in the original whenever I thought the encoding wasn't totally fine. Which happened very, very often - not because of a real issue, but simply because I was very sceptical towards a lossyWAV 320 kbps encoding.

In fact I was totally content with all these encodings. Keep in mind though this wasn't an abx test, but it was a test which assured me that your -7 setting is expected to let me fully enjoy music.
Forget about my remarks concerning -snr and spreading in the HF region. Guess you were directly on the right track. Congratulations.

As shadowking pointed out, these relatively low bitrate settings are especially welcome with your correction file approach.
I personally will stick with -3, but lesson learnt is that I really don't have to worry about -3's quality which I admit I did sometimes (you certainly remember my concern about overlapping).

Wonderful work, Nick.
Thank you - I'm really pleased that -7 has provided an acceptable compromise between quality and bitrate.

I think that the lack of problems at this bitrate is largely due to the revised maximum_bits_to_remove process which is re-calculated on a codec_block by codec_block basis - now maximum_bits_to_remove actually takes into account the RMS value of the samples which are going to have their bits removed.

I have been trying hard to come up with a preset which will achieve as low a bitrate as possible while at the same time not introducing glaring artifacts - for my iPAQ-as-a-DAP-with-large-CF-card solution.

I'm in the middle of a large transcode at the moment: 1374 tracks (of 3556), 13.2GB, 317kbps, 99h41m14s duration. Listening to favourites as I go I am continually pleasantly surprised with the outcome.
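For anyone curious how an RMS-aware cap on bits_to_remove might look, here is a rough sketch in Python. The relation between block RMS and removable bits is my own simplification (roughly 6 dB of added noise per removed bit), not lossyWAV's actual formula:

```python
import math

def maximum_bits_to_remove(samples, static_limit):
    """Cap bits-to-remove using the RMS level of the codec_block.

    A quiet block has a low RMS, so fewer bits may be zeroed before the
    added quantisation noise becomes audible. 'static_limit' is the cap
    that would otherwise apply. Illustrative approximation only.
    """
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms < 1:               # effectively silent block: remove nothing
        return 0
    # each removed bit adds ~6 dB of noise; keep it well below the signal RMS
    rms_bits = int(math.log2(rms))
    return max(0, min(static_limit, rms_bits))
```

A loud block is limited only by the static cap, while a near-silent block gets no bits removed at all, which matches the behaviour described for quiet passages later in the thread.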
Title: lossyWAV Development
Post by: halb27 on 2008-02-28 21:11:36
... I think that the lack of problems at this bitrate is largely due to the revised maximum_bits_to_remove process ...

Makes me want to reencode my collection (I used 0.7.4 for my last encoding) - no issue with my new hardware finally working.
I remember a recent remark of yours towards caution (I took it as that) for the higher quality presets which I contributed to your current work with -4 and less. Now I can't find it anymore.

Just to make sure: is it safe to use 0.7.9 for a -3 encoding?

ADDED:
OOPs, I found your remark:
[edit] forgot to include quality setting..... [/edit]
What does that mean?
Title: lossyWAV Development
Post by: Nick.C on 2008-02-28 21:25:10
... I think that the lack of problems at this bitrate is largely due to the revised maximum_bits_to_remove process ...
Makes me want to reencode my collection (I used 0.7.4 for my last encoding) - no issue with my new hardware finally working.
I remember a recent remark of yours towards caution (I took it as that) for the higher quality presets which I contributed to your current work with -4 and less. Now I can't find it anymore.

Just to make sure: is it safe to use 0.7.9 for a -3 encoding?

ADDED:
OOPs, I found your remark:
[edit] forgot to include quality setting..... [/edit]
What does that mean?
When I was talking about being happy with the output, I omitted to include the quality preset that I transcoded at. 

I am absolutely happy with v0.7.9 for transcoding, and if no adverse reports come in in the next few days then it will be v0.8.0 RC3!

The -merge parameter works and will recombine the .lossy.wav and .lwcdf.wav files if they are in the same directory (specify the .lossy.wav and -merge in the command line and it will find the .lwcdf.wav file and output a .wav file with the added extension stripped off). I imagine that this will be most easily used on whole album .wav files as it would probably be a pain to do it for lots of individual tracks.
Title: lossyWAV Development
Post by: halb27 on 2008-02-28 21:38:42
Wonderful. A great step towards the final version.
Nonetheless it would be marvellous if we could get some more listener feedback.

I just decided due to these good results to try to abx -4 and maybe lower on my usual problem samples, probably this weekend. Maybe I'll change my mind and I'll use a setting like this for my next encoding.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-02-28 23:08:47
Hmm, I see.  So even at -7, it's not easily ABXable, and it's great on the battery.  I also use my Pocket PC as a DAP, so this is welcome news to me too.

One quick question: What are the a, b, c modes for each preset (you mention they correspond to "extra FFT analyses," but what does that mean from a quality/filesize standpoint)?  That is, how is 7 different from 7a different from 7c?

[edit]Hey, Pocket PCs support WavPack lossy, don't they?  How does this LossyFLAC compare to low-complexity WavPack lossy @320kbps?
[edit2] Looks like there's a 1kbps increasing difference in bitrate moving from -7 to -7a to -7c.  Also, lossyWAV takes longer as you go down.  That's the "extra analysis" for ya.  But why the extra bitrate to go with it? 
Title: lossyWAV Development
Post by: Nick.C on 2008-02-28 23:17:39
Hmm, I see.  So even at -7, it's not easily ABXable, and it's great on the battery.  I also use my Pocket PC as a DAP, so this is welcome news to me too.

One quick question: What are the a, b, c modes for each preset (you mention they correspond to "extra FFT analyses," but what does that mean from a quality/filesize standpoint)?  That is, how is 7 different from 7a different from 7c?
Quality presets -1, -2 & -3 use 4, 3 and 2 FFT analyses respectively in processing the codec_blocks (-1 = 64, 256, 512 & 1024 sample FFT's; -2 = 64, 256 & 1024 sample FFT's; -3 = 64 & 1024 sample FFT's).

What "a" does is move from the number of FFT analyses used for that quality preset to the adjacent "better" preset, i.e. -3a = same FFT analyses as -2; -3b = same FFT analyses as -1; -3c = 64, 128, 256, 512 & 1024 sample FFT's.

Exception: -3 to -7 use the same FFT analyses, so -7a = same FFT analyses as -2, etc.

Adding FFT analyses is more likely to spot any quiet spots missed by the FFT's already used, and will generally slightly increase the bitrate (see the comparison on page 35 of the thread).
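The preset/modifier scheme above can be summarised as a small lookup. This is only a restatement of the post for clarity: the 128-sample FFT in the "c" set is an assumption, and the helper name is made up for illustration:

```python
# FFT analysis lengths (in samples) per quality preset, per the post above.
FFT_LENGTHS = {
    '-1': [64, 256, 512, 1024],
    '-2': [64, 256, 1024],
    '-3': [64, 1024],          # presets -3 to -7 all share this set
}

def analyses_for(preset, modifier=''):
    """'a'/'b' step up to the next 'better' preset's FFT set;
    'c' is assumed to use all five lengths including a 128-sample FFT."""
    if preset in ('-4', '-5', '-6', '-7'):
        preset = '-3'                      # same FFT analyses as -3
    if modifier == 'c':
        return [64, 128, 256, 512, 1024]
    order = ['-3', '-2', '-1']
    step = {'': 0, 'a': 1, 'b': 2}[modifier]
    idx = min(order.index(preset) + step, 2)
    return FFT_LENGTHS[order[idx]]
```

So -7a resolves to -2's analyses and -3b to -1's, matching the exception noted above.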
Title: lossyWAV Development
Post by: GeSomeone on 2008-02-29 13:41:51
I'm sorry if I don't understand the purpose here, but what's the point of creating a "lossy" flac, without any characteristic lossy modeling techniques, at bitrates around 320kbps?

Perhaps a bit too late, but in this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=55522) you find the start of this experiment. This is a kind of "development" thread with a real-life implementation of that idea.
Because of the lack of psychoacoustic modeling it might not work reliably at bit rates as low as 320k; however, bit rates depend quite a bit on the material to be encoded.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-02-29 21:18:15

I'm sorry if I don't understand the purpose here, but what's the point of creating a "lossy" flac, without any characteristic lossy modeling techniques, at bitrates around 320kbps?

Perhaps a bit too late, but in this thread (http://www.hydrogenaudio.org/forums/index.php?showtopic=55522) you find the start of this experiment. This is a kind of "development" thread with a real-life implementation of that idea.
Because of the lack of psychoacoustic modeling it might not work reliably at bit rates as low as 320k; however, bit rates depend quite a bit on the material to be encoded.

Ah, very helpful thread!  It should probably even be linked to on the first post, in my opinion.

No psychoacoustic modeling, I see.  Does wavpack implement such a model at ~235kbps?  If not, perhaps the two are in the same boat after all.
Title: lossyWAV Development
Post by: shadowking on 2008-02-29 23:50:54
Wavpack now has a basic psychoacoustic mechanism in --dns option as well as smart mid-side stereo through -x. It will shift noise up or down depending on the signal. When using -S0 noise falls flat like lossywav.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-03-01 04:21:25
Wavpack now has a basic psychoacoustic mechanism in --dns option. It will shift noise up or down depending on the signal. When using -S0 noise falls flat like lossywav.


I assume implementation of this dynamic noise floor-like algorithm is comparatively difficult (i.e. likely cannot be ported to lossyWAV)?  Assuming it can be ported, would it provide as substantial a quality gain in the lower (<320kbps) bitrate ranges?  This is quite intriguing.
Title: lossyWAV Development
Post by: amors on 2008-03-01 09:10:38
Is it possible to work with foobar with parameters "correction" and "merge"?
Title: lossyWAV Development
Post by: Nick.C on 2008-03-01 11:38:32
Is it possible to work with foobar with parameters "correction" and "merge"?
I haven't yet got my head around how to automate the -merge process. The -correction parameter will be able to be used fairly simply with foobar, the -merge parameter will take more work - I'll try to start modifying the batch file tonight.
Title: lossyWAV Development
Post by: halb27 on 2008-03-01 12:50:11
Wavpack now has a basic psychoacoustic mechanism in --dns option as well as smart mid-side stereo through -x. It will shift noise up or down depending on the signal. When using -S0 noise falls flat like lossywav.

lossyWAV has no specific noise shifting mechanism but a special mechanism which keeps noise especially small in the low to medium frequency range. In a sense the effect is similar.
Title: lossyWAV Development
Post by: halb27 on 2008-03-02 22:20:57
I finished my abx test.
I used Atemlied, badvilbel, bibilolo, bruhns, dither_noise_test, eig, fiocco, furious, harp40_1, herding_calls, keys_1644ds, Livin_In_The_Future, S37_OTHERS_MartenotWaves_A, triangle-2_1644ds, trumpet, Under The Boardwalk, Blackbird/Yesterday.

I made up my mind to make it easier for a start and did not use -4 but -7.
The result was: in a strict sense I couldn't abx any of these samples.
For bibilolo (sec. 6.7-9.3) however I got to 6/7 and finally 8/10, for bruhns (sec. 4.6-7.8) I got 7/8 and finally 8/10. My feeling was that 'Livin' in the Future' (sec. 23.2-25.6) also isn't totally correct but failed miserably to abx it.
But even in these cases where I think someone with better ears can abx them fine the differences to the original are very subtle to me.

I switched to -6 and with this I could not get even suspicious results for bibilolo and bruhns. This time I improved hearing the difference for 'Livin' in the Future':
7/10. The difference is hard to describe: something like the singing isn't done with exactly the same amount of fun as in the original.

Finally using -5 everything was alright to me even with 'Livin' in the Future'.

So even with these demanding samples and a lot of listening effort at least with what I can give -7 provided an excellent result.
Nick, you provided -4 to -7 for the sake of lower bitrate while accepting small deviations from the original. It's a bit too early to say so, but in case no other experience comes up I think there's no need for this differentiation. -7 is it. In fact it's so good that IMO it can become a -3 (or a -4, and we can let -5 or something in between -5 and -4 be the new -3). With these great results I think we can also lower the -nts demands down to -nts 0 for -2 and also a bit for -1.

Looks like your new RMS orientation has done a great job.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-03-02 23:26:31
Now that is great!

Time to start work on -8/9/10 then?    I guess the best thing to do is keep going lower until the results  become easily abx-able.  That was your intention with -7 in the first place, right?

Keep up the good work!

I finished my abx test.
I used Atemlied, badvilbel, bibilolo, bruhns, dither_noise_test, eig, fiocco, furious, harp40_1, herding_calls, keys_1644ds, Livin_In_The_Future, S37_OTHERS_MartenotWaves_A, triangle-2_1644ds, trumpet, Under The Boardwalk, Blackbird/Yesterday.

I made up my mind to make it easier for a start and did not use -4 but -7.
The result was: in a strict sense I couldn't abx any of these samples.
For bibilolo (sec. 6.7-9.3) however I got to 6/7 and finally 8/10, for bruhns (sec. 4.6-7.8) I got 7/8 and finally 8/10. My feeling was that 'Livin' in the Future' (sec. 23.2-25.6) also isn't totally correct but failed miserably to abx it.
But even in these cases where I think someone with better ears can abx them fine the differences to the original are very subtle to me.

I switched to -6 and with this I could not get even suspicious results for bibilolo and bruhns. This time I improved hearing the difference for 'Livin' in the Future':
7/10. The difference is hard to describe: something like the singing isn't done with exactly the same amount of fun as in the original.

Finally using -5 everything was alright to me even with 'Livin' in the Future'.

So even with these demanding samples and a lot of listening effort at least with what I can give -7 provided an excellent result.
Nick, you provided -4 to -7 for the sake of lower bitrate while accepting small deviations from the original. It's a bit too early to say so, but in case no other experience comes up I think there's no need for this differentiation. -7 is it. In fact it's so good that IMO it can become a -3 (or a -4, and we can let -5 or something in between -5 and -4 be the new -3). With these great results I think we can also lower the -nts demands down to -nts 0 for -2 and also a bit for -1.

Looks like your new RMS orientation has done a great job.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-03 08:34:31
Now that is great!

Time to start work on -8/9/10 then?    I guess the best thing to do is keep going lower until the results  become easily abx-able.  That was your intention with -7 in the first place, right?
Not exactly - but probably a good place to start...

I have revised -1 to -7 as seen in the table below and have processed my 53 sample set:
Code: [Select]
   Preset   [Equiv. Settings]    Total Size      Bitrate  [Delta.BR]     10 Album Test Set
==========================================================================================
    FLAC    [---------------] 72,652,785 bytes, 781.6kbps [--------] 3.35GB, 854kbps (-------)
  v0.7.9 -1 [-nts -4 -snr 25] 52,138,258 bytes, 560.9kbps [--------] 1.94GB, 496kbps (-65kbps)
  v0.7.9 -2 [-nts -2 -snr 23] 48,177,581 bytes, 518.3kbps [42.6kbps] 1.78GB, 453kbps (-65kbps)
  v0.7.9 -3 [-nts  0 -snr 21] 42,976,155 bytes, 462.3kbps [56.0kbps] 1.58GB, 403kbps (-59kbps)
  v0.7.9 -4 [-nts  3 -snr 20] 40,324,698 bytes, 433.8kbps [28.5kbps] 1.47GB, 375kbps (-59kbps)
  v0.7.9 -5 [-nts  6 -snr 19] 37,934,855 bytes, 408.1kbps [25.7kbps] 1.38GB, 352kbps (-56kbps)
  v0.7.9 -6 [-nts  9 -snr 18] 35,826,396 bytes, 385.4kbps [22.7kbps] 1.31GB, 333kbps (-52kbps)
  v0.7.9 -7 [-nts 12 -snr 17] 33,950,736 bytes, 365.2kbps [20.2kbps] 1.25GB, 318kbps (-47kbps)
==========================================================================================
  v0.8.0 -1 [-nts -3 -snr 24] 51,077,948 bytes, 549.5kbps [--------]
  v0.8.0 -2 [-nts  0 -snr 22] 46,198,740 bytes, 497.0kbps [52.5kbps]
  v0.8.0 -3 [-nts  3 -snr 20] 40,331,901 bytes, 433.9kbps [63.9kbps]
  v0.8.0 -4 [-nts  6 -snr 19] 37,943,564 bytes, 408.2kbps [25.7kbps]
  v0.8.0 -5 [-nts  9 -snr 18] 35,840,504 bytes, 385.6kbps [22.6kbps]
  v0.8.0 -6 [-nts 12 -snr 17] 33,969,718 bytes, 365.4kbps [20.2kbps]
  v0.8.0 -7 [-nts 15 -snr 16] 32,360,935 bytes, 348.1kbps [17.3kbps]
I'm just about to listen to the new -7 to see just how awful it is.

lossyWAV beta v0.8.0 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-03-03 09:01:49
Your new v0.8.0 settings are very attractive to me.
A well-spaced differentiation in quality parameters IMO, and everybody's needs should be satisfied by one of these settings.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-03 09:03:41
Your new v0.8.0 settings are very attractive to me.
A well-spaced differentiation in quality parameters IMO, and everybody's needs should be satisfied by one of these settings.
Casual listening to v0.8.0 -7 is not revealing any glaring problems, so I'm happy!
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-03 12:04:18
I take back my previous concerns. With Skew (is it fixed internally at 36?) and SNR, it's much harder to make the quality fall off a cliff, at least for samples where the lowest bins are at lower (more audible) frequencies. I'm guessing any "problems" will be for samples where the lowest bins are at higher frequencies (typically less audible).

I'm really impressed with the way all this tuning has come together - well done Nick.C, halb27, and other listeners.

You do realise that you've engineered a kind of crude psychoacoustic model?

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-03 12:33:01
I take back my previous concerns. With Skew (is it fixed internally at 36?) and SNR, it's much harder to make the quality fall off a cliff, at least for samples where the lowest bins are at lower (more audible) frequencies. I'm guessing any "problems" will be for samples where the lowest bins are at higher frequencies (typically less audible).

I'm really impressed with the way all this tuning has come together - well done Nick.C, halb27, and other listeners.

You do realise that you've engineered a kind of crude psychoacoustic model?

Cheers,
David.
Oops - that wasn't what was meant to happen!!  It does seem to work though. Skew is indeed fixed at 36dB.

I think the final element which has allowed the bitrate to be reduced to the level that it has at v0.8.0 -7 is the addition of the variable maximum_bits_to_remove.

Very happy with the results - will move to v0.8.1 RC3 after a couple of days delay for problem reports.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-03-04 00:37:16
I take back my previous concerns. With
It's much harder to make the quality fall off a cliff...
You do realise that you've engineered a kind of crude psychoacoustic model?


I wonder how it will sound at 200kbps.  Has there been any experimentation on that low of a bitrate?  I'm fairly certain it would be inferior to the average mp3, but I'm starting to get curious as to just how much the bitrate can be lowered... 

So here's my suggestion.  Why not go all the way to the bottom of the bitrate barrel, and tune your way up?  That's what Aoyumi did/does with Vorbis, which gave it (literally) the best lossy quality in the world.  Apparently, you can scale up the changes you make in the lower bitrates to the higher ones, and all bitrates would end up with the benefit. 

The point is, it's much easier to catch and tune for artifacts at low bitrates.  Once tuned, though, the tuning would apply to practically all bitrates, making all quality levels better...see what I'm saying?  There's no way I can abx 350kbps, but if you sent me down to 200, I could, and we can "tune things up." 

[edit]On a different note, on many files, the difference between -7c and -6c is under 7kbps.  This doesn't seem like what was intended...
Title: lossyWAV Development
Post by: Nick.C on 2008-03-04 08:30:30
I take back my previous concerns. With
It's much harder to make the quality fall off a cliff...
You do realise that you've engineered a kind of crude psychoacoustic model?
I wonder how it will sound at 200kbps.  Has there been any experimentation on that low of a bitrate?  I'm fairly certain it would be inferior to the average mp3, but I'm starting to get curious as to just how much the bitrate can be lowered... 

So here's my suggestion.  Why not go all the way to the bottom of the bitrate barrel, and tune your way up?  That's what Aoyumi did/does with Vorbis, which gave it (literally) the best lossy quality in the world.  Apparently, you can scale up the changes you make in the lower bitrates to the higher ones, and all bitrates would end up with the benefit. 

The point is, it's much easier to catch and tune for artifacts at low bitrates.  Once tuned, though, the tuning would apply to practically all bitrates, making all quality levels better...see what I'm saying?  There's no way I can abx 350kbps, but if you sent me down to 200, I could, and we can "tune things up." 

[edit]On a different note, on many files, the difference between -7c and -6c is under 7kbps.  This doesn't seem like what was intended...
I do not really want to try to go that low.... Some of the albums I've processed using v0.8.0 -7 are coming in at about 280kbps - with no glaring artifacts. I think that the main objectives of the development process have been met (or exceeded) and I am content with the current -7.

Overall, as the encoded processed file will carry the file extension of the encoder, I want to make sure that the quality of any processed output will not negatively skew public opinion against the lossless encoder.

I too am interested in "how low can we go?" - so I'll post beta v0.8.1 with a revised -nts maximum value.

On the -7c / -6c bitrate delta, I think that that means that we are approaching a limit imposed by the combination of the parameters used to maintain quality and therefore it is working perfectly. Always remember, lossyWAV is pure VBR.

lossyWAV beta v0.8.1 attached to post #1 in this thread.

From a test using my 53 problem sample set:

Code: [Select]
|-----|-----------|-----------|-----------|
| SNR |  NTS=18   |  NTS=21   |  NTS=24   |
|-----|-----------|-----------|-----------|
|   6 | 305.8kbps | 295.2kbps | 287.8kbps |
|   7 | 307.3kbps | 297.1kbps | 289.9kbps |
|   8 | 309.2kbps | 299.2kbps | 292.3kbps |
|   9 | 311.2kbps | 301.6kbps | 294.9kbps |
|  10 | 313.6kbps | 304.2kbps | 297.8kbps |
|  11 | 316.3kbps | 307.3kbps | 301.1kbps |
|  12 | 319.7kbps | 311.1kbps | 305.2kbps |
|  13 | 323.8kbps | 315.6kbps | 310.1kbps |
|  14 | 328.3kbps | 320.6kbps | 315.4kbps |
|  15 | 333.2kbps | 326.0kbps | 321.1kbps |
|-----|-----------|-----------|-----------|
From which, -snr 15 -nts 18 and -snr 14 -nts 21 might be reasonable. I listened to -snr 6 -nts 24 and it was awful and -snr 9 -nts 24 wasn't much better.... I would consider the lower limit for -snr to be 12 and the upper limit for -nts to be 21.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-04 12:39:48
Don't forget Sheep that lossyWAV can only add spectrally flat noise. If you push it far enough, you'll just end up with something that's a very complex way of delivering a 5-bit LPCM file!

Tuning at a point where you can hear the noise, and then cranking the bitrate up, does have merit. However, it makes more sense when the noise is shaped to match the music. lossyWAV doesn't do that. It still makes some sense, however.

Cheers,
David.
Title: lossyWAV Development
Post by: shadowking on 2008-03-04 13:26:56
I don't think we should go lower than -7 at this stage. My guess is that -7 and maybe higher settings can be abxed on a quiet passage with the volume cranked right up. It's not normal listening but it's something to consider. Quality will collapse below 240k or somewhere near. With wavpack and dualstream it's possible to get good output @ 235k, especially on louder music, but audible hiss/noise in quiet passages will be there and not hard to hear on some critical samples. I don't know how lossywav will sound with a quality collapse - it could be spurts of offensive noise rather than just hiss. 280k can yield mostly transparent results I think, but 235k is pushing it to the limit and lossywav doesn't need a bad rep. There are better solutions at < 250k.
Title: lossyWAV Development
Post by: halb27 on 2008-03-04 13:36:49
I don't think we should go lower than -7 at this stage. ...

Exactly what I am thinking. We've reached an average bitrate of ~310 kbps with very good quality, and quality drops more than bitrate when trying to achieve significantly more - at least with the current techniques.

Nick, I think it was me who made you stop from further investigating the noise shaping approach. But that was in another situation. I still wouldn't like a development with a weak basis when it's up to the -3 or -2 quality region, especially as this approach isn't intrinsically safe - other than using -skew and -snr or the RMS oriented max_to_remove_bits which only make the basic approach more defensive. But now things have changed and there's interest in going rather low in bitrate while allowing the utmost quality to be missed a bit. Moreover I think it's safe to say the techniques used so far have matured. In this situation I'd like to encourage you to continue with what you once started in case you are interested.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-04 13:44:16
I don't think we should go lower than -7 at this stage. My guess is that -7 and maybe higher settings can be abxed on a quiet passage with the volume cranked right up. It's not normal listening but it's something to consider. Quality will collapse below 240k or somewhere near. With wavpack and dualstream it's possible to get good output @ 235k, especially on louder music, but audible hiss/noise in quiet passages will be there and not hard to hear on some critical samples. 280k can yield mostly transparent results I think, but 235k is pushing it to the limit and lossywav doesn't need a bad rep. There are better solutions at < 250k.
I hear what you are saying - especially about not needing a bad reputation.....

The hiss on quiet passages may already be mitigated by the variable maximum_bits_to_remove which takes into account the RMS value of the codec_block being processed.

Off at a tangent, at the moment there are 3 spreading-function strings for -1, -2 and -3 (-4 to -7 being copies of -3). As the spreading-function string from -3 has done so well for -3 to -7, is there any merit in making all the spreading function strings the same as -3?

If this happened, then I could envisage a modification where quality could be specified between 0 and 1 where 0 = -7 and 1 = -1, using say 3 decimal points resolution, with 0.5 equating to the current -3.

Also, would it be beneficial to shift to 3 FFT analyses for quality presets -1? Possibly if quality<0.5 then FFT Analyses = 2, if quality>=0.5 then FFT Analyses=3.

Using the -3 spreading function, the revised -1 would produce 504.1kbps for my 53 problem sample set using the original 4 FFT analyses (501.3kbps with 3 FFT analyses) and the revised -2 would produce 468.2kbps using 3 FFT analyses.
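The envisaged 0-to-1 quality scale could be sketched as a simple interpolation between the v0.8.0 -7 and -1 endpoints. Purely illustrative: the actual spacing Nick has in mind (with 0.5 equating to the current -3) need not be linear like this, and the function name is made up:

```python
def quality_to_settings(q):
    """Map a continuous quality q in [0, 1] (0 = current -7, 1 = current -1)
    to interpolated -nts / -snr values, linearly between the v0.8.0
    preset endpoints: -7 = [-nts 15 -snr 16], -1 = [-nts -3 -snr 24]."""
    if not 0.0 <= q <= 1.0:
        raise ValueError('quality must be between 0 and 1')
    nts = 15.0 - 18.0 * q        # -nts: 15 at q=0 down to -3 at q=1
    snr = 16.0 + 8.0 * q         # -snr: 16 at q=0 up to 24 at q=1
    fft_analyses = 3 if q >= 0.5 else 2   # per the suggestion above
    return round(nts, 3), round(snr, 3), fft_analyses
```

Three decimal places of resolution, as suggested, would then give 1001 distinct quality steps.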
Title: lossyWAV Development
Post by: skamp on 2008-03-04 13:44:33
The way I see it, there are basically three ranges of bitrates in mainstream music: 64-320 kbps (the upper limit being that of MP3 CBR); 600-1000 kbps (lossless codecs); lossy codecs such as WavPack Hybrid, OptimFROG DualStream and lossyWAV would fill the gap in-between quite nicely, IMO. I don't see much point in competing in two fields where there's already quite a lot of competition.

At 320 kbps, it doesn't take more space than the highest quality MP3's that some people swear by, so if it's transparent and more suitable for transcoding than psycho-acoustic codecs, I'm happy with it.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-04 13:50:19
I don't think we should go lower than -7 at this stage. ...
Exactly what I am thinking. We've reached an average bitrate of ~310 kbps with very good quality, and quality drops more than bitrate when trying to achieve significantly more - at least with the current techniques.

Nick, I think it was me who made you stop from further investigating the noise shaping approach. But that was in another situation. I still wouldn't like a development with a weak basis when it's up to the -3 or -2 quality region, especially as this approach isn't intrinsically safe - other than using -skew and -snr or the RMS oriented max_to_remove_bits which only make the basic approach more defensive. But now things have changed and there's interest in going rather low in bitrate while allowing the utmost quality to be missed a bit. Moreover I think it's safe to say the techniques used so far have matured. In this situation I'd like to encourage you to continue with what you once started in case you are interested.
My noise shaping attempt was in retrospect agricultural to say the least, including a bit of guesswork - it was quite rightly consigned to the recycler. I would really like to be able to understand how noise shaping works and, more importantly, how to implement it in this context - however, I haven't yet found any sources which are understandable to me.

To use noise shaping which relates to the music may be an infringement of the patents David mentioned some time ago however.
Title: lossyWAV Development
Post by: GeSomeone on 2008-03-04 14:46:12
My guess is that -7 and maybe higher settings can be abxed on a quiet passage with the volume cranked right up. [..] Quality will collapse below 240k or somewhere near.
[..] 280 k can yield mostly transparent results I think, but 235k is pushing it to the limit and lossywav doesn't need a bad rep.

Although this could be true, there is a bit of guessing involved.
2 points to keep in mind
- (as 2Bdecided keeps telling us) the bitrates are not fixed, so the bit rate result for a loud track can be much different from that of a not-so-loud track (280k may be OK for one track while another might need 380k)
- lossyWav does a good job in avoiding problems at quiet passages.

I agree there is no sense in having an awful sounding pre-set at the achieved bit rates. It seems that 0.8.0b hit a fairly good range of workable settings.
Title: lossyWAV Development
Post by: halb27 on 2008-03-04 15:58:08
... As the spreading-function string from -3 has done so well for -3 to -7, is there any merit in making all the spreading function strings the same as -3? ...

I see sense in having the spreading a little bit more demanding with -2 and especially -1 because these settings are meant to provide a certain security margin. I wouldn't put this only into the -nts value.
IMO there's no need for a change, but instead of changing the spreading I'd rather use 3 analyses instead of 4 with -1 and maybe just 2 with -2. This would speed things up, and I don't think that many analyses are really necessary.

I personally don't like a continuous quality scale but prefer it the way it is. Discrete values make me feel better as the quality details are more transparent.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-04 17:11:52
My noise shaping attempt was in retrospect agricultural to say the least, including a bit of guesswork - it was quite rightly consigned to the recycler. I would really like to be able to understand how noise shaping works and, more importantly, how to implement it in this context - however, I haven't yet found any sources which are understandable to me.
Do you want me to dig out my fixed noise shaping version? I think it worked properly. It was a long time ago!

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-04 17:54:23
My noise shaping attempt was in retrospect agricultural to say the least, including a bit of guesswork - it was quite rightly consigned to the recycler. I would really like to be able to understand how noise shaping works and, more importantly, how to implement it in this context - however, I haven't yet found any sources which are understandable to me.
Do you want me to dig out my fixed noise shaping version? I think it worked properly. It was a long time ago!

Cheers,
David.
That would be wonderful - I can understand your code.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-05 16:43:53
Nick,

Here it is. Hope it's some use to you. I'm sure SebG could explain noise shaping pretty well.

No claims that this is correct, but it seems to work. It's "optimised" for debugging, not reading or running!

NOTE: This is only provided to demonstrate fixed noise shaping. Don't use it to encode anything - it's a hack of two old versions and the rest of the code probably doesn't work properly.

Note too that I don't think it handles zero bits to remove properly. Without dither, it's easy to get limit cycles in this instance.

You'll have to figure out how much noise shaping "buys" you - obviously it depends on the input signal, which is why I didn't use fixed noise shaping - but it's probably useful if you're aiming for lower bitrates.

Cheers,
David.
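For readers trying to follow the noise-shaping discussion, here is a generic first-order error-feedback quantiser of the kind being talked about - a textbook sketch, not David's code. The error made when rounding one sample is fed back into the next, so the quantisation noise is no longer spectrally flat but pushed toward higher frequencies; the zero-bits case is passed through untouched, sidestepping the limit-cycle issue mentioned above:

```python
def truncate_with_noise_shaping(samples, bits_to_remove):
    """First-order error feedback: quantise each (integer) sample to a
    coarser step, subtracting the previous sample's quantisation error
    before rounding. With bits_to_remove == 0, samples pass through."""
    if bits_to_remove == 0:
        return list(samples)
    step = 1 << bits_to_remove       # new quantisation step size
    error = 0
    out = []
    for s in samples:
        target = s - error                         # feed back last error
        q = (target + step // 2) // step * step    # round to nearest step
        error = q - target                         # error shaped into next sample
        out.append(q)
    return out
```

Note this is undithered, so on constant low-level input the error feedback simply spreads the rounding error across successive samples rather than decorrelating it - which is exactly why dither enters the discussion.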
Title: lossyWAV Development
Post by: Nick.C on 2008-03-05 17:09:33
Nick,

Here it is. Hope it's some use to you. I'm sure SebG could explain noise shaping pretty well.

No claims that this is correct, but it seems to work. It's "optimised" for debugging, not reading or running!

NOTE: This is only provided to demonstrate fixed noise shaping. Don't use it to encode anything - it's a hack of two old versions and the rest of the code probably doesn't work properly.

Note too that I don't think it handles zero bits to remove properly. Without dither, it's easy to get limit cycles in this instance.

You'll have to figure out how much noise shaping "buys" you - obviously it depends on the input signal, which is why I didn't use fixed noise shaping - but it's probably useful if you're aiming for lower bitrates.

Cheers,
David.
Thanks very much David, I'll try to get my teeth into it tonight....
Title: lossyWAV Development
Post by: carpman on 2008-03-05 17:34:58
Hi all,

Does anyone know if there are issues using lFLCDrop with the latest version of lossyWAV? The reason I ask is due to all these new settings. In lFLCDrop, AFAIK, there's just the old 1, 2, 3 and I was wondering if those command switches are still relevant, with -3c and -7a et al?

Thanks.
C.
Title: lossyWAV Development
Post by: jesseg on 2008-03-05 19:17:12
I'm 2 inches from releasing an updated version of the batch file and the front end.  So far, the changelog looks like this, but it's not guaranteed final yet.

Code: [Select]
lFLCDrop Change Log:
v1.2.0.5
- presets updated to -1 through -7
- all presets create correction files, except custom

lFLC.bat Change Log:
v1.0.0.7
- added automatic functionality for the -merge option
- new variable in custom preset to enable/disable automatic merging
- custom preset defaults match normal -2 preset functionality


I'm just dealing with a possible bug (or screw up on my part) for the automatic -merge function, and then merging that code into the custom preset section, and it should be fully updated and synced with current lossyWAV "goings-on".

Re: the automatic merge function: if the FLAC file to decode has custom metadata, it will check the decoded WAV file for the lossyWAV "tag".  If it's a lossyWAV, it will see if a .lwcdf.flac exists and decode it to .lwcdf.wav; if no .lwcdf.flac exists, it will check for a .lwcdf.wav, or else exit.  In the first two cases, where a lossyWAV correction file exists, it will ultimately run the -merge option and delete the two lossyWAV files. (the .wav files, not the source .flac files)

[edit] Yep, already thought of a needed change to the changelog...  to add a custom preset variable to toggle the deleting of the pre-merged .lossy.wav files.  And I also realized that I'm not handling the encoding of a .lwcdf.wav file to a .lwcdf.flac file (if it exists) when encoding an already lossy .wav file.  Wowzahs.  Now I'm more than 2 inches away.  [/edit]
Title: lossyWAV Development
Post by: carpman on 2008-03-05 20:36:48
now i'm more than 2 inches away


Thanks jesseg for the update. Regardless of how many inches, I shall wait for your new release.

Good luck with it.

C.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-05 21:09:48
Re: the automatic merge function: if the FLAC file to decode has custom metadata, it will check the decoded WAV file for the lossyWAV "tag".  If it's a lossyWAV, it will see if a .lwcdf.flac exists and decode it to .lwcdf.wav; if no .lwcdf.flac exists, it will check for a .lwcdf.wav, or else exit.  In the first two cases, where a lossyWAV correction file exists, it will ultimately run the -merge option and delete the two lossyWAV files. (the .wav files, not the source .flac files)
Thanks for the PM: -merge function now appears to be working (if both files are in the same place.....).

I've had a play with the method David supplied for noise shaping and early though it is, I'd like to get a second opinion from better ears.

Static noise shaping has been employed and is not optional (at this time - I'll make it optional later, v0.8.1 is still available). I have listened to -7 -nts 30 -snr 12 and it's "acceptable" but I have limited allowable volume (kids in bed) and would like more ears to listen in. For my 53 problem sample set it produces 301.8kbps(!).

lossyWAV beta v0.8.2 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-03-06 15:02:04
Interesting.  You might be aware of this, but the first post says 0.8.3 is actually available already (unannounced!  ), but no download links other than 0.6.7rc2 are available! 

Maybe you're just in the process of updating? 
Title: lossyWAV Development
Post by: Nick.C on 2008-03-06 15:35:55
Interesting.  You might be aware of this, but the first post says 0.8.3 is actually available already (unannounced!  ), but no download links other than 0.6.7rc2 are available! 

Maybe you're just in the process of updating? 
  Oops - what a mistake.

lossyWAV beta v0.8.3 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-03-06 16:29:10
I've noticed that turning on shaping (originally just -7c) makes the resulting FLAC about 1.5% larger.  Is this intentional?  Or are tweaked -snr and -nts options a must with shaping?

I tried -7 -nts 30 -snr 12 -shaping, but quality was very scratchy (read: added noise) on the piano sample I tested with.  In terms of artifacts, -snr 12 -nts 21 without shaping actually produced the better result on this sample, at roughly the same bitrate. 

Maybe I got a b0rked build?  I guess I can upload the sample a bit later.  Cheers!
Title: lossyWAV Development
Post by: SebastianG on 2008-03-06 18:23:18
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149]; which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But this isn't actually the noise shaping filter in this case. 1-z^-1*H(z) is. It's common and popular to write the transfer function of noise shaping filters as 1-z^-1*H(z). So, in case you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter you need to use freqz([1 -b]) for the FIR cases. Since you're removing the leading coefficient and inverting signs you just need to skip this part for the "book filter".

You'll see that the response of the filter isn't that bad after all. Its deviation from the one I was suggesting is within +/-5 dB at nearly all frequencies.
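SebastianG's freqz([1 -b]) point can be checked numerically. The following is an editor's illustrative sketch in plain Python (not part of lossyWAV, and standing in for MATLAB's freqz), evaluating the magnitude response of the actual noise shaper G(z) = 1 - z^-1*H(z) built from the "book" coefficients:

```python
import cmath
import math

# "Book" coefficients for H(z); the actual noise shaper is
# G(z) = 1 - z^-1 * H(z), i.e. FIR coefficients [1, -b].
b = [2.033, -2.165, 1.959, -1.590, 0.6149]
g = [1.0] + [-c for c in b]

def mag(coeffs, w):
    """Magnitude of an FIR filter sum(coeffs[k] * z^-k) at z = e^(jw)."""
    return abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs)))

dc = mag(g, 0.0)           # ~0.15: quantization noise attenuated at DC
nyquist = mag(g, math.pi)  # ~9.36: noise pushed up towards Nyquist
```

The response is well below unity at low frequencies and well above it near Nyquist, which is the expected shape for a noise shaper.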

Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
Code: [Select]
1 -1.1474 z^-1 +0.5383 z^-2 -0.3520 z^-3 +0.3475 z^-4
-----------------------------------------------------  =
1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4


            2.2061 -0.4707 z^-1 -0.2534 z^-2 -0.6213 z^-3
1 - z^-1 -----------------------------------------------------
         1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4

The new numerator is simply a-b with the leading zero removed (polynomial division + factoring out -z^-1). This form has its advantages when it comes to implementing noise shaping. The following image is a "DSP circuit picture" explaining how noise shaping can be done:
(http://img441.imageshack.us/img441/292/noiseshaper2de9.th.png) (http://img441.imageshack.us/my.php?image=noiseshaper2de9.png)
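The claim that the new numerator is simply a - b can be verified in a few lines; this is an illustrative check, with the coefficient values copied from the two fractions above:

```python
# Verify that the rewritten numerator equals a - b with the leading
# zero removed (coefficients from the transfer function above).
a = [1.0, 1.0587, 0.0676, -0.6054, -0.2738]  # denominator coefficients
b = [1.0, -1.1474, 0.5383, -0.3520, 0.3475]  # original numerator coefficients
diff = [round(ai - bi, 4) for ai, bi in zip(a, b)]
# Leading term cancels; the rest matches 2.2061, -0.4707, -0.2534, -0.6213.
```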


Still, I think the use of fixed shaping for this purpose is very limited. You could do much better with some easy signal adaptive filters like H(z)/A(z) where H(z) is some fixed filter and 1/A(z) is the LPC synthesis filter for the current frame or something like that.


Cheers,
SG
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-06 18:55:06
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149]; which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But this isn't actually the noise shaping filter in this case. 1-z^-1*H(z) is. It's common and popular to write the transfer function of noise shaping filters as 1-z^-1*H(z). So, in case you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter you need to use freqz([1 -b]) for the FIR cases. Since you're removing the leading coefficient and inverting signs you just need to skip this part for the "book filter".
Thank you. I didn't know that's what was being quoted. It's amazing it works as well as it does!

I don't need that step for any of the filters then (just drop the leading ones before typing them in!). As I said, the code is hacked from another version, which does need that process.

Quote
Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
I'm laid up in bed with a cold. I'm not even going to try to follow this now!

Quote
The following image is a "DSP circuit picture" explaining how noise shaping can be done:
(http://img441.imageshack.us/my.php?image=noiseshaper2de9.png)

Cheers,
David.
Title: lossyWAV Development
Post by: SebastianG on 2008-03-06 19:21:14
Quote
Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
I'm laid up in bed with a cold. I'm not even going to try to follow this now!
Quote
The following image is a "DSP circuit picture" explaining how noise shaping can be done:
(http://img441.imageshack.us/my.php?image=noiseshaper2de9.png)

Yeah, that rings a bell. But I can't remember agreeing on whether the patent really applies or not.

Cheers,
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-03-06 20:27:06
Hi, 2B!

I just skimmed through LossyFLAC.m and noticed that there's a misunderstanding regarding filter coefficients. The filter coefficients from "the book" are b=[2.033 -2.165 1.959 -1.590 0.6149]; which corresponds to H(z)=2.033-2.165*z^-1...+0.6149*z^-4. But this isn't actually the noise shaping filter in this case. 1-z^-1*H(z) is. It's common and popular to write the transfer function of noise shaping filters as 1-z^-1*H(z). So, in case you have the filter coefficients for H(z) and want to plot the frequency response of the actual noise shaping filter you need to use freqz([1 -b]) for the FIR cases. Since you're removing the leading coefficient and inverting signs you just need to skip this part for the "book filter".

You'll see that the response of the filter isn't that bad after all. Its deviation from the one I was suggesting is within +/-5 dB at nearly all frequencies.

Just to confuse you a bit more I'm rewriting the transfer function's expression of the filter I was suggesting:
Code: [Select]
1 -1.1474 z^-1 +0.5383 z^-2 -0.3520 z^-3 +0.3475 z^-4
-----------------------------------------------------  =
1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4


            2.2061 -0.4707 z^-1 -0.2534 z^-2 -0.6213 z^-3
1 - z^-1 -----------------------------------------------------
         1 +1.0587 z^-1 +0.0676 z^-2 -0.6054 z^-3 -0.2738 z^-4
The new numerator is simply a-b with the leading zero removed (polynomial division + factoring out -z^-1). This form has its advantages when it comes to implementing noise shaping. The following image is a "DSP circuit picture" explaining how noise shaping can be done: (http://img441.imageshack.us/img441/292/noiseshaper2de9.th.png) (http://img441.imageshack.us/my.php?image=noiseshaper2de9.png)

Still, I think the use of fixed shaping for this purpose is very limited. You could do much better with some easy signal adaptive filters like H(z)/A(z) where H(z) is some fixed filter and 1/A(z) is the LPC synthesis filter for the current frame or something like that.


Cheers,
SG
Okay, I admit to being a bit baffled at the moment....

I looked up noise shaping in Wikipedia and found
Code: [Select]
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).

Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.

I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.
Title: lossyWAV Development
Post by: botface on 2008-03-06 21:43:20
Hi,
  I'm completely new to the forum - this is my first post so please excuse me if I'm posting in the wrong place or whatever

I'm fascinated and quite excited by lossyWAV and have done some testing with v0.6.7_RC2. I have not found any obvious problems using 16/44 input files but I can't get it to run with 24-bit files. I've read the item that says that most testing has been done with 16/44 but the wiki says that it can handle up to 32/48 and I'm very keen to use it on 24-bit files as all of the lossless codecs give relatively poor results in terms of file size with 24-bit files.

I have tried files generated by Adobe Audition in "24 bit packed int (type 1 - 24 bit)". This causes lossyWAV to error instantly with the message "FMT chunk wrong size".

I have also tried files in "24 bit packed int (type 1 - 20 bit)". lossyWAV manages to open the file, recognises the format as 48.00kHz; 2ch.; 20 bit but it then fails with a Windows error message "lossyWAV.exe has encountered a problem and needs to close.  We are sorry for the inconvenience."

As I said, I'm not sure if I'm posting in the right place but I would like very much to help with testing if you think it might be useful. I should point out though that I am not very technical, I'm just a music lover. In fact this is the first time I've run anything using cmd - I've always relied on GUI front ends

Botface
Title: lossyWAV Development
Post by: Nick.C on 2008-03-06 22:03:08
Hi,
  I'm completely new to the forum - this is my first post so please excuse me if I'm posting in the wrong place or whatever

I'm fascinated and quite excited by lossyWAV and have done some testing with v0.6.7_RC2. I have not found any obvious problems using 16/44 input files but I can't get it to run with 24-bit files. I've read the item that says that most testing has been done with 16/44 but the wiki says that it can handle up to 32/48 and I'm very keen to use it on 24-bit files as all of the lossless codecs give relatively poor results in terms of file size with 24-bit files.

I have tried files generated by Adobe Audition in "24 bit packed int (type 1 - 24 bit)". This causes lossyWAV to error instantly with the message "FMT chunk wrong size".

I have also tried files in "24 bit packed int (type 1 - 20 bit)". lossyWAV manages to open the file, recognises the format as 48.00kHz; 2ch.; 20 bit but it then fails with a Windows error message "lossyWAV.exe has encountered a problem and needs to close.  We are sorry for the inconvenience."

As I said, I'm not sure if I'm posting in the right place but I would like very much to help with testing if you think it might be useful. I should point out though that I am not very technical, I'm just a music lover. In fact this is the first time I've run anything using cmd - I've always relied on GUI front ends

Botface
Hi there,

lossyWAV will only work with PCM integer values (4 to 32 bit as in the wiki article, *not* 32-bit float). These are packed out to the nearest byte and stored. I am unsure what type of audio data your values apply to. [edit] If the FMT chunk is the wrong size (i.e. not integer values) then lossyWAV will exit. [/edit]

Sorry not to be much help.

Nick.

[edit] ps. Please could you post a sample (<=30 seconds in length) for me to test with? It would be very much appreciated. [/edit]
Title: lossyWAV Development
Post by: SebastianG on 2008-03-06 23:29:01
Okay, I admit to being a bit baffled at the moment....

It's probably the z-domain (http://en.wikipedia.org/wiki/Z-transform) thingy. It takes a while to wrap one's head around it.

I looked up noise shaping in Wikipedia and found
Code: [Select]
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

The 1st problem with this Wikipedia article is that it's not really obvious what E is. Is it the unfiltered or the filtered noise? Btw, output - input isn't the quantization error; it's the already-filtered error. So, in your case you'll get an all-pole filter, which is a totally different beast than a FIR filter where the actual quantization errors are used. The difference is subtle: note that in the picture I made, I pick up the signal after the feedback and right before dither and quantization noise are added, to compute the "quantization error" (unfiltered noise).
The 2nd problem with this Wikipedia article is that it doesn't say anything about FIR or IIR filters, whether and/or how they can be used, and what type of filter is actually described there.

That said: Regardless of what E is, their noise shaping formula is equivalent to the structure I drew where H(z) either corresponds to an all-pole-IIR or a FIR filter.
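Taking E as the unfiltered quantization error, per the caveat above, the FIR case of that formula can be sketched as follows. This is an editor's illustrative example, not lossyWAV code; the function name, the integer quantization step and the first-order test coefficient are placeholders:

```python
def fir_noise_shape(x, coeffs):
    """FIR error-feedback noise shaping: feed back past *unfiltered*
    quantization errors, weighted by coeffs, before quantizing."""
    errs = [0.0] * len(coeffs)  # past quantization errors, newest first
    y = []
    for s in x:
        wanted = s + sum(c * e for c, e in zip(coeffs, errs))
        q = round(wanted)                # quantize to the integer grid
        errs = [wanted - q] + errs[:-1]  # error is wanted - q, NOT s - q
        y.append(q)
    return y

# First-order example: plain error feedback pushes noise to high frequencies
# while the local average of the output tracks the input.
shaped = fir_noise_shape([0.4] * 10, [1.0])
```

Note that the fed-back error is `wanted - q` (the error of the quantizer itself), which keeps the loop a pure FIR shaper; feeding back `s - q` instead is exactly the all-pole variant described above.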

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).

By "crashing" I guess you mean the filter went unstable. You probably used the coefficients in a wrong way. It might be a sign problem (sign of E is wrong) or you got the wrong E (filtered noise instead of unfiltered noise).

Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.
I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.

If you omit dither you can't guarantee the quantization error to be white/uncorrelated. The noise shaping stuff still works but you may get unexpected results because the filter is supposed to be applied on white/uncorrelated noise. So, that's why at least rectangular dithering should be used.

More explanations and pseudo code following...
Code: [Select]
You might have missed some information regarding the picture I drew
X      : input signal
Y      : output signal ( = input + filtered noise )
E      : dither & quantization noise (unfiltered white noise please)
+      : is obviously mixing two signals. Note it can also be used
         for subtraction (source line(s) marked with a minus)
         Also, quantization is modelled as mixing the signal with
         errors.
[z^-1] : This is a simple filter: A delay of one sample
[H(z)] : This is any filter you like to use

So, suppose you have some given filter coefficients for H(z):
b[] = {b[0],b[1],b[2],...,b[n]}; // array, indexed starting at 0
a[] = {  1 ,a[1],a[2],...,a[m]}; // array, we don't need a[0]
The index actually corresponds to the power of 1/z for the z-domain
interpretation, 'b' holds the numerator coefficients and 'a' holds
the denominator coefficients.

x[k] and y[k] are the input and output samples.

We also need some filter memory with exactly max(n+1,m) samples. Let's
write fifo[0] for the last sample we added to the fifo, fifo[1] was
the last sample in the previous loop and so on...

Then, the inner loop over 'k' would look like this:
{
   wanted_temp = x[k] + fifo[0] * b[0]
                      + fifo[1] * b[1]
                      + ..............
                      + fifo[n] * b[n];
   y[k] = quantize( wanted_temp + dither );
   qerror_temp = wanted_temp - y[k];
   new_fifo_sample = qerror_temp - fifo[0] * a[1]
                                 - fifo[1] * a[2]
                                 - ..............
                                 - fifo[m-1] * a[m];
   fifo_add( fifo, new_fifo_sample );
   // Now: fifo[0] == new_fifo_sample
}

For implementing H(z) I used the direct form II (http://ccrma.stanford.edu/~jos/filters/Direct_Form_II.html) structure where the delay-line is shared among the recursive and non-recursive filter parts.

The 4th order filter I was suggesting for 24->16 bit word length reduction @ 44 kHz sampling frequency leads to the following coefficients for H(z):
b[0..3] = { 2.2061 , -0.4707 , -0.2534 , -0.6213 };
a[1..4] = { 1.0587 , 0.0676 , -0.6054 , -0.2738 };
Again: H(z) is NOT the transfer function of the noise shaper, it is G(z) = 1 - z^-1 * H(z).
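For concreteness, the pseudocode above can be translated into runnable Python roughly like this. It is an editor's sketch, not lossyWAV's actual implementation; the rectangular dither source, seeding, and the quantization step are illustrative:

```python
import random

# Coefficients for H(z) from the 4th-order 24->16 bit filter quoted above.
b = [2.2061, -0.4707, -0.2534, -0.6213]  # numerator b[0..n]
a = [1.0587, 0.0676, -0.6054, -0.2738]   # denominator a[1..m] (a[0] = 1)

def noise_shape(x, step=1.0, seed=0):
    """Quantize x to multiples of `step`, shaping the error by
    G(z) = 1 - z^-1 * H(z) with a shared direct form II delay line."""
    rng = random.Random(seed)             # deterministic for testing
    fifo = [0.0] * max(len(b), len(a))    # filter memory, newest first
    y = []
    for s in x:
        wanted = s + sum(f * bi for f, bi in zip(fifo, b))
        dither = (rng.random() - 0.5) * step  # rectangular dither
        q = round((wanted + dither) / step) * step
        y.append(q)
        qerr = wanted - q                     # unfiltered quantization error
        new = qerr - sum(f * ai for f, ai in zip(fifo, a))
        fifo = [new] + fifo[:-1]              # now fifo[0] == new sample
    return y
```

The delay line holds neither the input nor the output but the internal direct form II state, which is what lets one fifo serve both the recursive (a) and non-recursive (b) halves of H(z).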

Note: This post comes with no warranty and might contain errors.

Cheers,
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-03-07 07:54:39
Okay, I admit to being a bit baffled at the moment....
It's probably the z-domain (http://en.wikipedia.org/wiki/Z-transform) thingy. It takes a while to wrap one's head around it.
I looked up noise shaping in Wikipedia and found
Code: [Select]
y(n) = x(n)+A.E(x(n-1))+B.E(x(n-2))+C.E(x(n-3))+D.E(x(n-4))+
E.E(x(n-5))+F.E(x(n-6))+G.E(x(n-7))+H.E(x(n-8))+I.E(x(n-9))
I also found some code which was using a filter with 9 coefficients so I implemented the noise shaping in lossyWAV like that, i.e. output = input - coeff[0..8] x quantization_error[0..-8], where quantization_error = output - input.

The 1st problem with this Wikipedia article is that it's not really obvious what E is. Is it the unfiltered or the filtered noise? Btw, output - input isn't the quantization error; it's the already-filtered error. So, in your case you'll get an all-pole filter, which is a totally different beast than a FIR filter where the actual quantization errors are used. The difference is subtle: note that in the picture I made, I pick up the signal after the feedback and right before dither and quantization noise are added, to compute the "quantization error" (unfiltered noise).
The 2nd problem with this Wikipedia article is that it doesn't say anything about FIR or IIR filters, whether and/or how they can be used, and what type of filter is actually described there.

That said: Regardless of what E is, their noise shaping formula is equivalent to the structure I drew where H(z) either corresponds to an all-pole-IIR or a FIR filter.

Initially, this kept crashing until I divided all coefficients by coeff[0] and then disregarded coeff[0] (per David's code).
By "crashing" I guess you mean the filter went unstable. You probably used the coefficients in a wrong way. It might be a sign problem (sign of E is wrong) or you got the wrong E (filtered noise instead of unfiltered noise).
Looking again at the Wikipedia article, it appears that I have omitted to include dither in the calculation.
I still feel as if I'm groping in the dark here and would gratefully accept any advice, pointers, etc.
If you omit dither you can't guarantee the quantization error to be white/uncorrelated. The noise shaping stuff still works but you may get unexpected results because the filter is supposed to be applied on white/uncorrelated noise. So, that's why at least rectangular dithering should be used.

More explanations and pseudo code following...
Code: [Select]
You might have missed some information regarding the picture I drew
X      : input signal
Y      : output signal ( = input + filtered noise )
E      : dither & quantization noise (unfiltered white noise please)
+      : is obviously mixing two signals. Note it can also be used
         for subtraction (source line(s) marked with a minus)
         Also, quantization is modelled as mixing the signal with
         errors.
[z^-1] : This is a simple filter: A delay of one sample
[H(z)] : This is any filter you like to use

So, suppose you have some given filter coefficients for H(z):
b[] = {b[0],b[1],b[2],...,b[n]}; // array, indexed starting at 0
a[] = {  1 ,a[1],a[2],...,a[m]}; // array, we don't need a[0]
The index actually corresponds to the power of 1/z for the z-domain
interpretation, 'b' holds the numerator coefficients and 'a' holds
the denominator coefficients.

x[k] and y[k] are the input and output samples.

We also need some filter memory with exactly max(n+1,m) samples. Let's
write fifo[0] for the last sample we added to the fifo, fifo[1] was
the last sample in the previous loop and so on...

Then, the inner loop over 'k' would look like this:
{
   wanted_temp = x[k] + fifo[0] * b[0]
                      + fifo[1] * b[1]
                      + ..............
                      + fifo[n] * b[n];
   y[k] = quantize( wanted_temp + dither );
   qerror_temp = wanted_temp - y[k];
   new_fifo_sample = qerror_temp - fifo[0] * a[1]
                                 - fifo[1] * a[2]
                                 - ..............
                                 - fifo[m-1] * a[m];
   fifo_add( fifo, new_fifo_sample );
   // Now: fifo[0] == new_fifo_sample
}
For implementing H(z) I used direct form II (http://ccrma.stanford.edu/~jos/filters/Direct_Form_II.html) for arbitrary IIR filters.

The 4th order filter I was suggesting for 24->16 bit word length reduction @ 44 kHz sampling frequency leads to the following coefficients for H(z):
b[0..3] = { 2.2061 , -0.4707 , -0.2534 , -0.6213 };
a[1..4] = { 1.0587 , 0.0676 , -0.6054 , -0.2738 };
Again: H(z) is NOT the transfer function of the noise shaper, it is G(z) = 1 - z^-1 * H(z).

Note: This post comes with no warranty and might contain errors.

Cheers,
SG
Huge thanks, Sebastian - It will take me some time to get my head round it but I will endeavour to implement it when I get back from a few days away....
Title: lossyWAV Development
Post by: botface on 2008-03-08 16:17:37
Hi,
  I'm completely new to the forum - this is my first post so please excuse me if I'm posting in the wrong place or whatever

I'm fascinated and quite excited by lossyWAV and have done some testing with v0.6.7_RC2. I have not found any obvious problems using 16/44 input files but I can't get it to run with 24-bit files. I've read the item that says that most testing has been done with 16/44 but the wiki says that it can handle up to 32/48 and I'm very keen to use it on 24-bit files as all of the lossless codecs give relatively poor results in terms of file size with 24-bit files.

I have tried files generated by Adobe Audition in "24 bit packed int (type 1 - 24 bit)". This causes lossyWAV to error instantly with the message "FMT chunk wrong size".

I have also tried files in "24 bit packed int (type 1 - 20 bit)". lossyWAV manages to open the file, recognises the format as 48.00kHz; 2ch.; 20 bit but it then fails with a Windows error message "lossyWAV.exe has encountered a problem and needs to close.  We are sorry for the inconvenience."

As I said, I'm not sure if I'm posting in the right place but I would like very much to help with testing if you think it might be useful. I should point out though that I am not very technical, I'm just a music lover. In fact this is the first time I've run anything using cmd - I've always relied on GUI front ends

Botface
Hi there,

lossyWAV will only work with PCM integer values (4 to 32 bit as in the wiki article, *not* 32-bit float). These are packed out to the nearest byte and stored. I am unsure what type of audio data your values apply to. [edit] If the FMT chunk is the wrong size (i.e. not integer values) then lossyWAV will exit. [/edit]

Sorry not to be much help.

Nick.

[edit] ps. Please could you post a sample (<=30 seconds in length) for me to test with? It would be very much appreciated. [/edit]

Nick,
      I've tried to send you a test file a couple of times but my posts just don't seem to be there. I'm assuming the file was too large as the attach procedure took ages. So, here's another, smaller file. It was recorded from vinyl at 32/48 and saved as "24 bit packed int (type 1 - 24 bit)".

Let me know if you need anything else

botface
Title: lossyWAV Development
Post by: jesseg on 2008-03-09 09:05:59
Code: [Select]
------------------------------------------------------------------------------
lFLCDrop v1.2.0.5
lFLC.bat for lFLCDrop v1.0.0.7
------------------------------------------------------------------------------

lFLCDrop Change Log:
v1.2.0.5
- presets updated to -1 through -7
- all presets always create correction files, except custom
- "Delete Source Files" option removed

lFLC.bat Change Log:
v1.0.0.7
- added a new set of variables for decoding
- added automatic functionality for the -merge option
- added support for auto-merging legacy lossyWAVs with proper naming convention
- added automatic encoding of .lwcdf.wav while encoding an already lossy .wav
- custom preset defaults now match -2 (default) frontend preset functionality


Let me know if you encounter any bugs.  The batch file is just getting to the level of complexity (10.4KB!) where there may be combinations of logic in the code that I just haven't thought to test fully.  But it should all be working without bugs, and there's error checking built into everything, so the main risk is that the logic ends up doing something that isn't what you'd expect.

After the command-line options (if any) for noise shaping settle down, I'll do a release to support those additions, and I'll include documentation on which command-line options to send to lFLC.bat for encoding & decoding from other software or batch files.  That way people can implement things like tagging through batch files in EAC, call lFLC.bat for the dirty work of encoding, and then tag afterwards.  Feel free to use the methods in lFLC.bat for creating your own. 


Enjoy
Title: lossyWAV Development
Post by: carpman on 2008-03-09 17:52:49
@jesseg --- Thanks for the update!

Just been running:

lFLCDrop v1.2.0.5
lFLC.bat for lFLCDrop v1.0.0.7

The first thing I noticed was that after encoding a single WAV with a correction file, the DOS window prompts "press any key to continue" - is that supposed to happen? I'd prefer it just encoded the whole batch without pausing.

Also, is there any way you could create an option whereby the user specifies a directory (browse/create directory) for the correction files.

e.g.

Source Folder/ [inputs] *.wav  ,  [outputs] *.lossy.flac
Source Folder/Corrections Files/ [outputs] *.lwcdf.flac

C.
Title: lossyWAV Development
Post by: jesseg on 2008-03-09 19:14:07
I re-zipped the directory and re-uploaded it.  Somehow I had removed that pause at the last second, but forgot to re-zip it before uploading.  My bad, thanks for catching it.    It was in the exit, so it would have happened no matter what you were trying to do.  Oops. 

[edit]
And re: a sub-folder-of-the-current-folder option - it could be added, but it would only be controllable through lFLC.bat, not through the frontend - unless I make my own frontend.  And if I do, or anyone else does, I can imagine that it's not going to rely on a batch file at all.
[/edit]
Title: lossyWAV Development
Post by: halb27 on 2008-03-09 20:04:51
... So, here's another, smaller file. ... [Budapest_10_secs.wav]

I have no problem at all with your file.
First I renamed your file to a.wav, called 'lossywav a.wav' from cmd.exe, and got a wonderful 24 bit 48 kHz a.lossy.wav file.
Then I used my standard lossyFLAC bat file with foobar on your Budapest_10_secs.wav, and this too yielded a perfect lossy.flac result.
Did you try plain 'lossywav a.wav'?
Title: lossyWAV Development
Post by: botface on 2008-03-11 09:35:36

... So, here's another, smaller file. ... [Budapest_10_secs.wav]

I have no problem at all with your file.
First I renamed your file to a.wav, called 'lossywav a.wav' from cmd.exe, and got a wonderful 24 bit 48 kHz a.lossy.wav file.
Then I used my standard lossyFLAC bat file with foobar on your Budapest_10_secs.wav, and this too yielded a perfect lossy.flac result.
Did you try plain 'lossywav a.wav'?

Funnily enough, I am now able to process the file without problems too. I also have no problems with the latest beta. I've also successfully processed a 24/88.2 file. I can't imagine what went wrong the first time.

Thanks for trying it anyway
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-11 10:40:05
I haven't checked how noise shaping is implemented in lossyFLAC.m. So, if you say your implementation is equivalent to what's shown in the picture and you are requiring the coefficients for H(z), then you need to use the numerator's and denominator's coefficients of the rewritten transfer function, because removing the leading one doesn't do it for IIR filters...
I think I cheated - can you take a look and tell me if it works or not?

Also, I think several of us would really appreciate it if you could spend some time writing a good page on noise shaping for the HA wiki. (If you don't have the time, just ignore our questions on here!).

I couldn't find a single decent reference to IIR filters in noise shaping, hence my guess at how to do it.
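For the record, my guess amounts to a generic error-feedback quantizer, sketched below in Python (just my reading of the structure, not lossyWAV's actual code, and `step` is a made-up quantisation step). With an IIR noise transfer function N(z) = B(z)/A(z), the feedback filter works out to (B(z) - A(z))/A(z) - which is why both sets of coefficients matter, and why simply removing the leading one only works in the FIR case (A(z) = 1):

```python
import numpy as np

def noise_shape_quantize(x, b, a, step=1.0):
    """Error-feedback quantizer with noise transfer function
    N(z) = B(z)/A(z), where b = [1, b1, ...] and a = [1, a1, ...].
    The quantization error is fed back through (B(z)-A(z))/A(z),
    so the total added noise becomes e * B(z)/A(z)."""
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    n = max(len(b), len(a))
    b = np.pad(b, (0, n - len(b)))      # align coefficient lengths
    a = np.pad(a, (0, n - len(a)))
    e_hist = np.zeros(n - 1)            # past quantization errors
    w_hist = np.zeros(n - 1)            # past feedback outputs (IIR part)
    y = np.empty(len(x))
    for i, s in enumerate(x):
        # w[n] = sum_k (b_k - a_k)*e[n-k] - sum_k a_k*w[n-k],  k >= 1
        w = np.dot(b[1:] - a[1:], e_hist) - np.dot(a[1:], w_hist)
        target = s + w
        y[i] = np.round(target / step) * step   # the quantizer itself
        e = y[i] - target                       # raw quantization error
        e_hist = np.concatenate(([e], e_hist))[:n - 1]
        w_hist = np.concatenate(([w], w_hist))[:n - 1]
    return y
```

With b = a = [1] this is plain rounding; with b = [1, 1], a = [1] it is classic first-order FIR shaping.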

The other problem is that the explanations that exist are often written for mathematicians. I suppose engineers and programmers should be able to understand such explanations, but I usually find them lacking. On the one hand, I want to understand at a high level what's happening, and on the other hand I want to understand bit-by-bit what's happening. Many explanations walk a fine line down the middle, leaving both of these unclear to me.

Cheers,
David.
P.S. It wasn't a cold - it was/is a chest infection. Still laid up. 
Title: lossyWAV Development
Post by: SebastianG on 2008-03-12 15:43:49
Also, I think several of us would really appreciate it if you could spend some time writing a good page on noise shaping for the HA wiki.

I'm on it.

edit: I finished the article. Still waiting for wiki write access, though.

P.S. It wasn't a cold - it was/is a chest infection. Still laid up. 

Ouch! Hope you get well soon!

Cheers,
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-03-14 09:09:56
Right, I'm back from a few days away.....

I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it. I'm posting an Excel fragment which will show how I've implemented it, for comment / criticism.....

As well as that, I've been re-thinking the spreading function and have realised that the current method takes certain values into account more often than it should, because of the relationship between bin width and the frequency bands used (at short FFT lengths, some bands share the same start bin). So, I'm in the process of rewriting the spread function and will include it as a -newspread parameter to allow back-to-back comparison.

I'm going to try to puzzle my way through the input / output directory problem as well.

[edit] Having transcribed the function to Excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]

[edit2] David, I take it that I should re-calculate the reference_threshold values with noise shaping activated to get the full benefit? [/edit2]
Title: lossyWAV Development
Post by: Nick.C on 2008-03-14 13:16:13
lossyWAV beta v0.8.4 attached to the first post in this thread.
Table of processed bitrates, for my 53 problem sample set, using lossyWAV v0.8.4 with and without -shaping & -newspread.
Code: [Select]
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| -shaping | -newspread |    -1     |    -2     |    -3     |    -4     |    -5     |    -6     |    -7     |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    n     |     n      | 543.5kbps | 494.6kbps | 433.9kbps | 408.2kbps | 385.6kbps | 365.4kbps | 348.1kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    y     |     n      | 560.1kbps | 518.3kbps | 466.8kbps | 445.8kbps | 427.5kbps | 411.9kbps | 399.2kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    n     |     y      | 568.0kbps | 533.9kbps | 462.9kbps | 442.6kbps | 400.9kbps | 383.8kbps | 352.7kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|    y     |     y      | 581.4kbps | 552.0kbps | 491.4kbps | 475.0kbps | 441.7kbps | 428.4kbps | 403.8kbps |
|----------|------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
Title: lossyWAV Development
Post by: SebastianG on 2008-03-14 15:29:23
I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it.
[...]
[edit] Having transcribed the function to excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]

Still, the noticeable hiss can be explained. The Fletcher-Munson equal-loudness curves have different shapes at different levels. The ATH-derived noise shaping filter is only a special case for low noise levels. So, at higher noise levels the noise shaping filter might expose the high-frequency part of the noise noticeably, which is why I think the use of this kind of fixed filter for lossyWAV is rather limited.

Cheers!
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-03-14 21:54:17
I've tried to implement the method SebastianG so kindly posted and I'm getting unexpected (even more hiss) results when I use it.
[...]
[edit] Having transcribed the function to excel, I seem to have identified and corrected an error in my implementation of the noise shaping function. I'm listening to -7 -nts 36 -snr 0 -shaping at the moment and it's not bad at all (for DAP purposes)........ [/edit]
Still, the noticeable hiss can be explained. The Fletcher-Munson equal-loudness curves have different shapes at different levels. The ATH-derived noise shaping filter is only a special case for low noise levels. So, at higher noise levels the noise shaping filter might expose the high-frequency part of the noise noticeably, which is why I think the use of this kind of fixed filter for lossyWAV is rather limited.

Cheers!
SG
The hiss I experienced in the previous build was *very* pronounced, in beta v0.8.4 I can't hear anything wrong with the output at all when using -shaping.

The only minor disappointment when using -shaping is, as David said previously, the bitrate increases quite dramatically.

I transcoded my Mike Oldfield collection (261 tracks) this evening using -7a -nts 30 -snr 6 -shaping and got an average bitrate of 340kbps. I've listened to several of the tracks and am very pleased with the results.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-15 17:22:35
Thinking about the added bitrate due to noise shaping, are there some 2 or 3 coefficient filters which might be useful as a compromise between the quality and high bitrate of SebastianG's filters and no noise shaping but lower bitrate?

[edit] I'm not going to go any further with -7a -nts 30 -snr 6 -shaping and have reverted to -7a -shaping. I've converted 1643 tracks and the average bitrate is 374kbps.

There may be some merit in revising the skewing value when noise shaping is enabled (or even when -newspread is enabled) - however, this would take a bit of work from those who have ABXed during settings testing (and would require the -spf parameter to make a re-appearance). [/edit]
Title: lossyWAV Development
Post by: SebastianG on 2008-03-15 21:10:41
Thinking about the added bitrate due to noise shaping, are there some 2 or 3 coefficient filters which might be useful as a compromise between the quality and high bitrate of SebastianG's filters and no noise shaping but lower bitrate?


There's a simple way of softening minimum phase filters:
Take the coefficients from the noise transfer function (N)
[1 b1 b2 b3 b4 ... ] (numerator)
[1 a1 a2 a3 a4 ... ] (denominator)
and create a new set of coefficients like this:
[1 b1*s b2*s^2 b3*s^3 b4*s^4 ... ] (numerator)
[1 a1*s a2*s^2 a3*s^3 a4*s^4 ... ] (denominator)
where s=1 leads to the original filter and s=0 to N(z)=1 which is no noise shaping at all.
All values in between are also fine.
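In code the softening is a one-liner (Python sketch; apply it to the numerator and denominator coefficient lists separately):

```python
def soften(coeffs, s):
    """Scale [1, c1, c2, c3, ...] to [1, c1*s, c2*s**2, c3*s**3, ...].
    This substitutes z -> z/s, pulling every pole and zero towards the
    origin by the factor s, which flattens the filter's response."""
    return [c * s ** k for k, c in enumerate(coeffs)]
```

s=1 reproduces the original filter; s=0 gives N(z)=1, i.e. no shaping at all.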

However, you should seriously think about adaptive filters at this stage. Maybe 2B can shed some more light on the alleged danger of patent infringement. I hardly think this is an issue. Adaptive spectral noise shaping isn't big news. Pretty much every speech codec does it including Speex, by the way.

You're already very close to it: You're doing spectral analysis, psychoacoustic modelling and have a working noise shaper implementation. The only thing that's missing now is code to compute the filters. Jean-Marc Valin (jmspeex) and Monty wrote a paper (http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf) about how Speex can benefit from Vorbis' psychoacoustic model. The same thing applies to LossyWAV. I don't remember how Monty and Jean-Marc did it, but I guess it's something like computing the autocorrelation of the optimal noise shaping filter's impulse response via iFFT and feeding the result to the Levinson-Durbin algorithm, which would give you all the denominator's coefficients (a1, a2, ...) for an all-pole noise shaping filter (b1=b2=...=0).
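From memory, that recipe would look something like this (a Python guess, not a claim about what the paper actually does - the noise power spectrum you feed in would come from the psychoacoustic model):

```python
import numpy as np

def all_pole_shaper(noise_power_spectrum, order):
    """Autocorrelation via inverse FFT (Wiener-Khinchin), then the
    Levinson-Durbin recursion to solve the normal equations, giving
    the denominator [1, a1, ..., ap] of an all-pole shaping filter."""
    r = np.fft.irfft(noise_power_spectrum)   # autocorrelation sequence
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                        # reflection coefficient
        a[1:i + 1] += k * a[i - 1::-1]
        err *= 1.0 - k * k                    # remaining prediction error
    return a
```

A flat noise spectrum should come back as no shaping at all (a = [1, 0, ..., 0]), which is a handy sanity check.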

Cheers!
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-03-15 21:18:34
Thinking about the added bitrate due to noise shaping, are there some 2 or 3 coefficient filters which might be useful as a compromise between the quality and high bitrate of SebastianG's filters and no noise shaping but lower bitrate?
There's a simple way of softening minimum phase filters:
Take the coefficients from the noise transfer function (N)
[1 b1 b2 b3 b4 ... ] (numerator)
[1 a1 a2 a3 a4 ... ] (denominator)
and create a new set of coefficients like this:
[1 b1*s b2*s^2 b3*s^3 b4*s^4 ... ] (numerator)
[1 a1*s a2*s^2 a3*s^3 a4*s^4 ... ] (denominator)
where s=1 leads to the original filter and s=0 to N(z)=1 which is no noise shaping at all.
All values in between are also fine.

However, you should seriously think about adaptive filters at this stage. Maybe 2B can shed some more light on the alleged danger of patent infringement. I hardly think this is an issue. Adaptive spectral noise shaping isn't big news. Pretty much every speech codec does it including Speex, by the way.

You're already very close to it: You're doing spectral analysis, psychoacoustic modelling and have a working noise shaper implementation. The only thing that's missing now is code to compute the filters. Jean-Marc Valin (jmspeex) and Monty wrote a paper (http://people.xiph.org/~jm/papers/aes120_speex_vorbis.pdf) about how Speex can benefit from Vorbis' psychoacoustic model. The same thing applies to LossyWAV. I don't remember how Monty and Jean-Marc did it, but I guess it's something like computing the autocorrelation of the optimal noise shaping filter's impulse response via iFFT and feeding it to the Levinson-Durbin algorithm, which would give you all the denominator's coefficients (a1, a2, ...) for an all-pole noise shaping filter (b1=b2=...=0).

Cheers!
SG
Thanks for the pointer - I'll have a play with it, maybe even allow -shaping to take a supplementary value in the range 0..1 as you said above.

From memory, David was very reluctant to publish his code which included adaptive filtering. Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?

As an aside (and I know that looking at the spectrum in foobar is not any way to evaluate anything....) I looked at the spectral output for a lossyWAV correction file (replaygained +45dB or so) and almost all of the signal was in the high end of the spectrum - so it "looks" like my implementation of your noise shaping filter works!
Title: lossyWAV Development
Post by: SebastianG on 2008-03-15 21:54:53
Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?


No. The iFFT+Levinson-Durbin approach should be quite fast. But the resulting filters aren't the best ones, which is why I'm currently trying to understand how this can be combined with frequency warping. I have a stack of papers about this on my desk waiting to be read. ;-)


so it "looks" like my implementation of your noise shaping filter works!

Cool!

Cheers!
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-03-15 22:14:30
Another consideration is that each codec block is only 512 samples long (per channel) - would this not require fairly heavy processing input to calculate the optimal noise shaping filter?
No. The iFFT+LevinsonDurbin approach should be quite fast. But the resulting filters aren't the best ones which is why I'm currently trying to understand how this can be combined with frequency warping. I have a stack of papers about this on my desk waiting to be read by me. ;-)
so it "looks" like my implementation of your noise shaping filter works!
Cool!

Cheers!
SG
I've added the supplementary parameter to -shaping in the range 0..1 and at 0.5 the added bitrate due to noise shaping is significantly reduced. I'll do a bit more testing with a view to posting v0.8.5 tomorrow.

[edit] Most of your post flew right over my head.... However, whatever can be added to lossyWAV to improve the quality of the output is well worth the effort - many thanks again! [/edit]

[edit2] 3556 tracks processed using -7a -shaping 1.000, 372kbps average bitrate...... [/edit2]
Title: lossyWAV Development
Post by: carpman on 2008-03-17 01:41:02
Hi,

Something struck me about dB level and lossy.wav performance which may well have implications as to how to get the best out of LossyWAV.

To date I've used MP3Gain (for MP3) and WavGain (for WAV prior to lossless encoding), as I've wanted my files to play at same level regardless of the player (I use foobar, so I could get foobar to do this - but I'm not always listening to my files on my system). Anyway, the results of my very small test suggest that for lossy.FLAC files encoding the original WAV versus the WavGained WAV would be a good idea:

Using:
lFLCDrop.v1.2.0.5.lFLC.bat.v1.0.0.7
lossyWAV beta v0.7.9
FLAC 1.2.1

Test:
2 copies of the same file (original.wav and wavgained.wav), the only difference being that the latter has been through Wav Gain and is 4.55 dB lower in volume.

SETTINGS: lossy.wav -3, FLAC -5

original.wav (93.55 dB)  [FLAC-5 = 690kbps, lossy.FLAC = 475kbps]
<edited to make sense>
lossy.FLAC is 31% smaller than the FLAC. </edit>

wavgained.wav (89.00 dB)  [FLAC-5 = 626kbps, lossy.FLAC = 477kbps]
<edit>lossy.FLAC is 24% smaller than the FLAC.</edit>

So this tells me that if I use the original and use foobar to look after the replay gain function, my lossy.FLAC collection would be approx 2/3 of the size of my FLAC collection.

But if I WavGain my files prior to encoding (which I had been doing) my lossy.FLAC collection would be approx 3/4 of the size of my FLAC collection.

That's a substantial difference.

If this is not news and everyone already knows this then fine, but it will be useful later to make clear to users that this is the case, as obviously it has implications regarding which method of replay gain one goes for.

C.
Title: lossyWAV Development
Post by: jesseg on 2008-03-17 04:24:56
I see what you're saying, and I think what you did was either switch around the wording or the numbers you used.  I think you switched around the wording, because it would make sense that the WavGained version would be compressed more easily, since the original least significant bits are removed and re-quantized into a new least significant bit - thereby leaving fewer most significant bits actually in use and letting FLAC compress it more efficiently.

The problem with doing this is two fold, in my opinion:
1 - with WavGain, you're losing least significant bits that might not necessarily be insignificant.
2 - WavGain doesn't have a correction file.  There's no way to ever get a 1:1 copy of the original again.  If you're using normal FLAC, and the ReplayGain built into it, you can still enjoy the benefits of the volume equalization, and still be able to generate a 1:1 copy of the original source file at any time.  And if ReplayGain code ever becomes more transparent in audio quality, or if a more accurate ReplayGain algorithm is ever designed, you can enjoy the benefits of that by re-running ReplayGain over your libraries.

Also I noticed that the WavGained source file was slightly larger after lossyWAV+FLAC.  I wonder if someone else knows why that might be.  My initial thought was that lossyWAV treated the quantization noise introduced by WavGain as a slightly quieter noise floor than in the original?
Title: lossyWAV Development
Post by: carpman on 2008-03-17 06:52:36
I see what you're saying, and I think what you did was either switch around the wording or the numbers you used.  I think you switched around the wording, because it would make sense that the WavGained version would be compressed more easily, since the original least significant bits are removed and re-quantized into a new least significant bit - thereby leaving fewer most significant bits actually in use and letting FLAC compress it more efficiently.

The edit was simply that in the original I said the "lossy.FLAC is 69% smaller than the FLAC" when what I'd meant was that its size was 69% of the FLAC. Instead I kept the "smaller" and made it 31%. That's all the edit was.

Also I noticed that the WavGained source file was slightly larger after lossWAV+FLAC.

Yeah, I thought that was strange. 

C.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-17 17:10:56
I've been testing beta v0.8.5 using the -shaping <n> parameter (0 to 1 in 0.05 steps) to process my 53 problem sample set and here are the results:
Code: [Select]
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Shaping |    -1     |    -2     |    -3     |    -4     |    -5     |    -6     |    -7     |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|  0.000  | 543.5kbps | 494.6kbps | 433.9kbps | 408.2kbps | 385.6kbps | 365.4kbps | 348.1kbps |
|  0.050  | 543.7kbps | 495.0kbps | 434.5kbps | 408.9kbps | 386.5kbps | 366.7kbps | 349.7kbps |
|  0.100  | 543.9kbps | 495.3kbps | 435.0kbps | 409.6kbps | 387.3kbps | 367.6kbps | 350.9kbps |
|  0.150  | 544.2kbps | 495.7kbps | 435.6kbps | 410.4kbps | 388.3kbps | 368.8kbps | 352.2kbps |
|  0.200  | 544.4kbps | 496.2kbps | 436.3kbps | 411.3kbps | 389.3kbps | 370.1kbps | 353.7kbps |
|  0.250  | 544.8kbps | 496.8kbps | 437.2kbps | 412.3kbps | 390.6kbps | 371.6kbps | 355.5kbps |
|  0.300  | 545.2kbps | 497.4kbps | 438.2kbps | 413.5kbps | 392.1kbps | 373.4kbps | 357.6kbps |
|  0.350  | 545.7kbps | 498.1kbps | 439.2kbps | 414.8kbps | 393.6kbps | 375.2kbps | 359.8kbps |
|  0.400  | 546.2kbps | 498.9kbps | 440.4kbps | 416.2kbps | 395.4kbps | 377.3kbps | 362.2kbps |
|  0.450  | 546.7kbps | 499.8kbps | 441.7kbps | 417.8kbps | 397.3kbps | 379.6kbps | 364.8kbps |
|  0.500  | 547.5kbps | 500.9kbps | 443.3kbps | 419.7kbps | 399.5kbps | 382.4kbps | 368.3kbps |
|  0.550  | 548.2kbps | 502.0kbps | 444.9kbps | 421.5kbps | 401.7kbps | 385.0kbps | 371.3kbps |
|  0.600  | 549.1kbps | 503.3kbps | 446.6kbps | 423.5kbps | 403.9kbps | 387.4kbps | 374.1kbps |
|  0.650  | 550.1kbps | 504.7kbps | 448.6kbps | 425.7kbps | 406.2kbps | 389.9kbps | 376.7kbps |
|  0.700  | 551.1kbps | 506.2kbps | 450.7kbps | 428.0kbps | 408.7kbps | 392.3kbps | 379.0kbps |
|  0.750  | 552.3kbps | 507.8kbps | 452.9kbps | 430.4kbps | 411.3kbps | 395.0kbps | 381.6kbps |
|  0.800  | 553.5kbps | 509.6kbps | 455.2kbps | 432.9kbps | 413.9kbps | 397.8kbps | 384.6kbps |
|  0.850  | 554.9kbps | 511.4kbps | 457.7kbps | 435.6kbps | 416.7kbps | 400.7kbps | 387.7kbps |
|  0.900  | 556.5kbps | 513.5kbps | 460.4kbps | 438.6kbps | 419.9kbps | 404.1kbps | 391.3kbps |
|  0.950  | 558.2kbps | 515.8kbps | 463.4kbps | 442.0kbps | 423.5kbps | 407.8kbps | 395.2kbps |
|  1.000  | 560.1kbps | 518.3kbps | 466.8kbps | 445.8kbps | 427.5kbps | 411.9kbps | 399.2kbps |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
It is interesting that adding shaping has less of an effect at the higher bitrate end of the quality spectrum than at the lower end.

Please disregard the -newspread parameter as added to beta v0.8.4, it had a bug in it which, when rectified, produces the same results as the existing method (although that in itself was a bit of a surprise....). However, the means by which it arrives at the result is likely to be quicker once optimised in IA-32/x87, so I'll replace the existing code in due course.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-17 22:11:19
lossyWAV beta v0.8.5 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-18 11:04:33
Patent:
http://patft.uspto.gov/netacgi/nph-Parser?...RS=PN/5,204,677 (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5,204,677.PN.&OS=PN/5,204,677&RS=PN/5,204,677)

Post:
http://www.hydrogenaudio.org/forums/index....st&p=512342 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=512342)

I haven't reviewed it properly.

carpman, yes ReplayGain before encoding is a nice efficiency boost. Only apply negative gains. Put it within lossyWAV itself to avoid (extra) dithering.

Cheers,
David.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-03-18 17:04:59
All of this brings up 2 questions:
1.  How is the quality of -7 -shaping 1.000 compared to -4 -shaping 0 at similar bitrate (or to -5 -shaping 0.5, etc)? 
2.  How can we use shaping to get at lower bitrates than -7 -shaping 0?  Are tweaking snr and such necessary, or will there be a -8 or -9 added in the future, or something else?
Title: lossyWAV Development
Post by: SebastianG on 2008-03-18 17:40:28
1.  How is the quality of -7 -shaping 1.000 compared to -4 -shaping 0 at similar bitrate (or to -5 -shaping 0.5, etc)?

That's a good question.

2.  How can we use shaping to get at lower bitrates than -7 -shaping 0?

The answer is much simpler than its implementation: Adaptive noise shaping driven by a good psychoacoustic model.

The increased bitrate with shaping activated is (currently) due to quantization noise being added in a spectral area where there usually is no music. This leads to a worse prediction gain. The method I was sketching in one of the first posts does the opposite: it adds quantization noise "under" the signal, exploiting the masking effect. This approach won't decrease the predictability too much, since the spectral shape of the signal is preserved.

Cheers,
SG
Title: lossyWAV Development
Post by: carpman on 2008-03-19 02:24:50
1. Strange results (at least to a non-technical person)

I've been running a few tests with WavGain & lossyWAV, taking into account what jesseg and 2Bdecided have said.

I had thought that "1.wav" encoded via FLAC Drop then decoded to WAV would give me "1.lossy.wav" (i.e. the result of the lossyWAV processor. (for why I was doing this -- see 2 below)

But when I encoded 1.lossy.wav (without any other processing) back to FLAC using foobar and latest flac.exe (1.2.1) at -5, the file was much larger (522kbps) than the FLAC Dropped 1.lossy.flac (475kbps). 

Can someone explain why?

Additionally, I copied the 1.lossy.wav and WavGained it and then encoded that to FLAC using foobar and flac.exe (1.2.1), and the file was much, much larger (628kbps). I'm assuming this is to do with WavGain undoing some of the work done by lossyWAV?

2. Request for help:

I was trying to get foobar to convert to lossy.wav (rather than direct to lossy.flac) but kept getting a can't flush file error.

Code: [Select]
Error flushing file (Object not found) : file://C:\Documents and Settings\[...my edit...]\test1.lossy.wav


I'd set it up as per wiki. The only difference was the batch file, which I edited to leave out the FLAC encoding (though I didn't really know what I was doing  ).

Code: [Select]
@echo off
D:\lossywav\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet


Can someone tell me where I've gone wrong.

Thanks

C.
Title: lossyWAV Development
Post by: jesseg on 2008-03-19 03:02:48
But when I encoded 1.lossy.wav (without any other processing) back to FLAC using foobar and latest flac.exe (1.2.1) at -5, the file was much larger (522kbps) than the FLAC Dropped 1.lossy.flac (475kbps).

Can someone explain why?
Because you're not using the -b 512 option in your FLAC.exe command to force FLAC to use a 512 sample block size for compression.  That's my best guess.


Additionally, I copied the 1.lossy.wav and WavGained it and then encoded that to FLAC using foobar and flac.exe (1.2.1), and the file was much, much larger (628kbps). I'm assuming this is to do with WavGain undoing some of the work done by lossyWAV?
Yes.  When it re-quantizes the sample values, I'm pretty sure it doesn't care at all about creating new LSBs all the way to the bit-floor.  If it did, it would be compromising its own quality.


Code: [Select]
@echo off
D:\lossywav\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet


Can someone tell me where I've gone wrong.
When you use a full path, you have to use the full filename, like so:
Code: [Select]
D:\lossywav\lossyWAV.exe %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet


[edit]
but now after looking at the error, I'm not sure that's the problem either.
[/edit]
Title: lossyWAV Development
Post by: Nick.C on 2008-03-19 08:15:09
Patent:
http://patft.uspto.gov/netacgi/nph-Parser?...RS=PN/5,204,677 (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5,204,677.PN.&OS=PN/5,204,677&RS=PN/5,204,677)

Post:
http://www.hydrogenaudio.org/forums/index....st&p=512342 (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=55522&view=findpost&p=512342)

I haven't reviewed it properly.

carpman, yes ReplayGain before encoding is a nice efficiency boost. Only apply negative gains. Put it within lossyWAV itself to avoid (extra) dithering.

Cheers,
David.
I'll need to read up on ReplayGain to see how I would implement it in lossyWAV. However, I can't see any simple way of linking tracks together into albums, so I think that the lossyWAV ReplayGain implementation would only calculate / use track gain (which, if you processed a whole album as one file [as I do], would in effect be album gain).

I've been optimising the code again and the FFT unit is now about 95% IA-32/x87 and is a little bit faster / smaller.

Awaiting feedback on -shaping : how does it sound? Is it worth the extra bitrate? Does anyone have any ideas for alternate filters?

Additionally, I copied the 1.lossy.wav and WavGained it and then encoded that to FLAC using foobar and flac.exe (1.2.1), and the file was much, much larger (628kbps). I'm assuming this is to do with WavGain undoing some of the work done by lossyWAV?
Yes.  When it re-quantizes the sample values, I'm pretty sure it doesn't care at all about creating new LSBs all the way to the bit-floor.  If it did, it would be compromising its own quality.
It certainly will - WavGaining a lossy.wav file will almost certainly destroy all the carefully zeroed LSBs....
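A quick illustration of why (Python, made-up numbers - the 4-bit zeroing is just for show, the real bit count is adaptive):

```python
sample = 23457
zeroed = sample & ~15         # lossyWAV-style: bottom 4 bits forced to zero
gained = round(zeroed * 0.7)  # WavGain-style scaling re-quantizes the sample
# zeroed has its low bits clear (which is what FLAC exploits);
# gained does not - the gain change has refilled them with new values
```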
Code: [Select]
@echo off
D:\lossywav\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet
Can someone tell me where I've gone wrong.
When you use a full path, you have to use the full filename, like so:
Code: [Select]
D:\lossywav\lossyWAV.exe %1 %3 %4 %5 %6 %7 %8 %9 -below -nowarn -quiet


[edit]
but now after looking at the error, i'm not sure that's the problem either.
[/edit]
I think that the problem is that %1 (%s from foobar) = "temp"+32 (or so) random hex characters+".wav" and %d is the expected output filename. If you don't rename %1 to %2 (%d from foobar) then foobar will not find the file named by %d and will give you this error.... So, I think adding:
Code: [Select]
ren %~n1.lossy.wav %2
inserted as the last line might fix your problem.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-19 10:44:10
You can't run lossyWAV and then WaveGain, as Nick has explained.

Nick, what you need is to run WaveGain, grab the album gain value, and then use that to scale the data within lossyWAV in floating point or the 24-bit domain, but still only outputting 16-bits. I would heretically suggest not using dither at the output, but people can if they want. I put an option into the MATLAB version "always dither LSB" to ensure that even if you didn't remove any bits, the output was still dithered - that's the correct way of applying a gain change, but I would disable it by default.
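Something along these lines, sketched in Python (the function name and details are illustrative, not my MATLAB code: gain applied in the floating-point domain, output re-quantized to 16 bits, TPDF dither optional and off by default):

```python
import numpy as np

def apply_gain_16bit(samples, gain_db, dither=False, rng=None):
    """Scale 16-bit samples in floating point, then re-quantize to
    16 bits, optionally adding +/-1 LSB TPDF dither at the output."""
    g = 10.0 ** (gain_db / 20.0)            # only negative gains intended
    x = np.asarray(samples, dtype=np.float64) * g
    if dither:                               # TPDF = sum of two uniforms
        rng = rng or np.random.default_rng()
        x += rng.uniform(-0.5, 0.5, len(x)) + rng.uniform(-0.5, 0.5, len(x))
    return np.clip(np.round(x), -32768, 32767).astype(np.int16)
```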

There was some code or script that allowed you to run WaveGain, grab the scaling value, and use it when encoding with lame. It was in this thread but the links have died...
http://www.hydrogenaudio.org/forums/index....0637&st=150 (http://www.hydrogenaudio.org/forums/index.php?showtopic=10637&st=150)
...you need something like that to pass the value to lossyWAV - I don't think you should try to integrate ReplayGain itself into lossyWAV because just having track mode available is kind of useless.

Cheers,
David.
Title: lossyWAV Development
Post by: spies on 2008-03-19 16:41:27
Nick, what you need is to run WaveGain, grab the album gain value, and then use that to scale the data within lossyWAV in floating point or the 24-bit domain, but still only outputting 16-bits.

I would absolutely love to see this functionality added to lossyWAV!  I envision it being an input option just like the --scale option in lame.

Depending on how this is implemented within the lossyWAV code, would it be possible to undo the gain processing with the correction file?  That would be totally awesome!

All my flacs already have replaygain value tags.  So that could be another source of the gain value for lossyWAV.  I suppose that functionality would be part of the flac to lossyWAV to flac script i.e. lFLCDrop.
Title: lossyWAV Development
Post by: jesseg on 2008-03-19 18:26:14
Yep, I can for sure add track ReplayGain to the flac commands, and the ability to turn that on/off in the custom section.  I don't see why ReplayGain should be implemented with lossyWAV any other way than in the final codec used, unless the final codec doesn't support it.  (And in that case, should you even be using that codec if you like ReplayGain?)

I don't really see a way to have a correction file to undo the WavGain "distortion" without it being a huge file after lossless compression.  As it is now, lossyFLAC + lwcdfFLAC is still smaller than a plain FLAC, especially using lossyWAV -1 preset.

Title: lossyWAV Development
Post by: Nick.C on 2008-03-19 21:11:12
Nick, what you need is to run WaveGain, grab the album gain value, and then use that to scale the data within lossyWAV in floating point or the 24-bit domain, but still only outputting 16-bits.
I would absolutely love to see this functionality added to lossyWAV!  I envision it being an input option just like the --scale option in lame.

Depending on how this is implemented within the lossyWAV code, would it be possible to undo the gain processing with the correction file?  That would be totally awesome!

All my flacs already have replaygain value tags.  So that could be another source of the gain value for lossyWAV.  I suppose that functionality would be part of the flac to lossyWAV to flac script i.e. lFLCDrop.
Your wish is my command....

-scale <n> parameter implemented which takes a value in the range 0 to 1 and scales the input WAV data by that amount. -scale is compatible with -correction and -merge will combine both files to re-create the lossless master.

*WARNING* filesizes may get large.... when I ran a test with -scale 0.5 -correction using my 53 problem sample set, I got a combined filesize for 53 lossy.flac and 53 lwcdf.flac files of 93.1MB, compared to 69.3MB for the 53 lossless originals. Interestingly, the lwcdf.flac file is not too bad to listen to...
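For anyone wondering how -scale, -correction and -merge can still be lossless together, the arithmetic is essentially this (illustrative Python with made-up parameter names; the real bit removal is adaptive, not a fixed count):

```python
import numpy as np

def lossy_and_correction(samples, scale_num=1, scale_den=2, zero_bits=4):
    """The lossy stream is the scaled, bit-reduced signal; the
    correction stream is the exact integer residual against the
    original, so adding the two back together is lossless."""
    x = np.asarray(samples, dtype=np.int64)
    scaled = (x * scale_num) // scale_den      # -scale (here: halve)
    lossy = scaled & ~((1 << zero_bits) - 1)   # zero the low bits
    correction = x - lossy                     # -correction residual
    return lossy, correction

def merge(lossy, correction):                  # -merge
    return lossy + correction
```

The residual carries the un-scaling as well as the removed bits, which is why the correction files grow so much when -scale is used.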

lossyWAV beta v0.8.6 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: spies on 2008-03-20 01:02:28
Your wish is my command....

-scale <n> parameter implemented which takes a value in the range 0 to 1 and scales the input WAV data by that amount. -scale is compatible with -correction and -merge will combine both files to re-create the lossless master.

*WARNING* filesizes may get large.... when I had a test with -scale 0.5 -correction using my 53 problem sample set, I got a combined filesize for 53 lossy.flac and 53 lwcdf.flac files of 93.1MB, compared to 69.3MB for the 53 lossless originals. Interestingly, the lwcdf.flac file is not too bad to listen to...

lossyWAV beta v0.8.6 attached to post #1 in this thread.

Wow Nick, that was fast, thanks!  Can't wait to try it out when I get home.

Quick question though, when using the -scale option are you dithering at the output?  I think I would prefer it not to be dithered or at least make it an option of the -scale command as suggested by David.

I would not be surprised to find that the change from 69.3MB to 93.1MB would be caused by scaling and dither, but would be quite surprised if it was caused by just scaling alone.  I wonder what the result would be with ReplayGain scale values as opposed to using a scale factor of 0.5 for your test suite.
Title: lossyWAV Development
Post by: carpman on 2008-03-20 05:16:24
Thanks to everyone for their answers and patience - on the WavGain issue. That's cleared up some confusion on my part.

@ Nick, thanks, haven't had time yet, but I'll give that a go  " -- ren %~n1.lossy.wav %2"

While lossyWAV is on the Replay Gain issue:

I use replay gain on a track by track basis. I tried jesseg's suggestion of using FLAC's internal replay gain, the problem I had was that I needed to switch on replay gain processing in Foobar for it to have the desired effect. Foobar automatically adjusted all my (non-tagged-replay-gained) files which are less or greater than 89dB (because I set them that way).

It's very possible I'm missing something obvious here, but just not obvious to me.

What I would like from lossyWAV is either:
a) a way to set the value at the lossy.wav stage (i.e. 1.2 dB below 89dB - as is possible with WavGain) --- and it looks like Nick may have already achieved this, or
b) be able to manually alter the replay gain value in FLAC's vorbis comments? (is this already possible?)

Slightly OT (not LossyWAV specific):
Is it possible to scan all the files in my collection with foobar and then manually adjust the track replay gain value to 0.0000, so these files aren't affected by the gain processing? If I do this then I will be able to use the FLAC Tag replay gain method and keep the lossy.flacs at their original volume and thus get the most out of lossyWAV.  ---- I think that makes sense.

Thanks
C.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-20 08:48:59
Wow Nick, that was fast, thanks!  Can't wait to try it out when I get home.

Quick question though, when using the -scale option are you dithering at the output?  I think I would prefer it not to be dithered or at least make it an option of the -scale command as suggested by David.

I would not be surprised to find that the change from 69.3MB to 93.1MB would be caused by scaling and dither, but would be quite surprised if it was caused by just scaling alone.  I wonder what the result would be with ReplayGain scale values as opposed to using a scale factor of 0.5 for your test suite.
No dithering is employed at all (yet) in lossyWAV. All WAV data related calculations (apart from the final difference for correction values) are performed using 64-bit real values; the only rounding used is for bit-reduction. Using -7 -shaping 1.0 -scale 0.5, the lossy & lwcdf files totalled 98.5MB - it's almost not worth using correction at all; better to keep a lossless FLAC copy and transcode to lossy.FLAC for an additional lossy copy.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-03-20 11:10:27
If the correction file is just the difference between the original, and the scaled lossy version, then it'll be huge. You shouldn't do it that way. If you think about it (or even if you don't!), you are storing the entire data twice, since the correction file is also a scaled lossy version!

What you should try is this:

First, you must store the scale somewhere. Don't lose it. It's vital. ("keep foreign metadata" in correction file?)

Then...

lossy = original * scale + quantisation noise

correction = original - (lossy / scale)

merged = (lossy / scale) + correction
            = (lossy / scale) + original - (lossy / scale)
            = original!

It's more complicated, and you're going to have to check there are no differential rounding errors (i.e. lossy/scale gives the same result each time, whatever that happens to be), but it's far more efficient. You won't double the file size this way.
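A minimal numeric sketch of this scheme (Python here for brevity; lossyWAV itself is Pascal, and `quantise` is a hypothetical stand-in for its bit-reduction step):

```python
# Numeric sketch of the scheme above. 'quantise' stands in for lossyWAV's
# bit reduction; the exact rounding doesn't matter so long as lossy/scale
# is computed identically at encode and merge time (and the scale is stored).
def quantise(value, step):
    return round(value / step) * step

scale = 0.5
original = [1000.0, -2000.0, 3000.0, -4000.0]

# Encode: lossy = original * scale + quantisation noise
lossy = [quantise(s * scale, step=64) for s in original]

# The correction stores original minus the *rescaled* lossy signal;
# near-random noise, not a second scaled copy of the music.
correction = [o - l / scale for o, l in zip(original, lossy)]

# Merge: (lossy / scale) + correction = original, exactly.
merged = [l / scale + c for l, c in zip(lossy, correction)]
assert merged == original
```

The key point is that the merge side divides by the same stored scale before adding the correction, so the quantisation noise cancels exactly as in the algebra above.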



I think you need to separate "ReplayGain applied to make lossyWAV more efficient", and "ReplayGain applied for whatever people use ReplayGain for" (!). I'm not sure. It depends how people use it.

I would use the AlbumGain (negative ones only) with lossyWAV. If people are already using ReplayGain anyway, I would pass through all the ReplayGain data (with appropriate adjustment, because the original is now quieter) to include in the final FLAC. If people aren't using ReplayGain normally, then there's nothing to pass through - just apply the negative album gains and leave it at that.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-20 11:48:34
I've just realised that in my haste to implement -scale, I have omitted to correctly scale codec_blocks which are not having any bits removed whatsoever. Big problem.

I am working on implementing David's recent scale-then-only-store-the-scaled-difference method and will post v0.8.7 asap.

!!Please do not use -scale in beta v0.8.6!!
Title: lossyWAV Development
Post by: Nick.C on 2008-03-20 22:16:02
If the correction file is just the difference between the original, and the scaled lossy version, then it'll be huge. You shouldn't do it that way. If you think about it (or even if you don't!), you are storing the entire data twice, since the correction file is also a scaled lossy version!

What you should try is this:

First, you must store the scale somewhere. Don't lose it. It's vital. ("keep foreign metadata" in correction file?)

Then...

lossy = original * scale + quantisation noise

correction = original - (lossy / scale)

merged = (lossy / scale) + correction
            = (lossy / scale) + original - (lossy / scale)
            = original!

It's more complicated, and you're going to have to check there are no differential rounding errors (i.e. lossy/scale gives the same result each time, whatever that happens to be), but it's far more efficient. You won't double the file size this way.
Ouch my poor head..... I'm getting *close* to the right answer as you detailed above - however I need to get to grips with x87 rounding (again)......
Title: lossyWAV Development
Post by: Nick.C on 2008-03-21 12:45:59
lossyWAV beta v0.8.7 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-25 20:44:33
I've been focussing on speed-ups and have been working on the FFT unit in particular. I tripped over a method of calculating a real FFT of length 2N using a complex FFT of length N. As the FFT in use in lossyWAV is complex, it seems attractive to use this method. However, I am having trouble "untangling" the results to form the result of the real analysis from the complex analysis. I'll keep working on it as I think it will speed up the processing by about 25% overall.
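For reference, the standard "untangling" step for this trick looks like the following (a Python sketch with a naive radix-2 FFT; not lossyWAV's actual Pascal code):

```python
import cmath

def fft(z):
    """Naive recursive radix-2 complex FFT (length must be a power of two)."""
    n = len(z)
    if n == 1:
        return list(z)
    even, odd = fft(z[0::2]), fft(z[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

def real_fft_via_complex(x):
    """FFT of a real sequence of length 2N using one complex FFT of length N.

    Returns bins 0..N (DC through Nyquist); the remaining bins follow by
    conjugate symmetry X[2N-k] = conj(X[k])."""
    n = len(x) // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(n)]
    Z = fft(z)
    X = []
    for k in range(n):
        Zr = Z[(-k) % n].conjugate()               # Z[N-k], with wraparound
        Fe = 0.5 * (Z[k] + Zr)                     # spectrum of even samples
        Fo = -0.5j * (Z[k] - Zr)                   # spectrum of odd samples
        X.append(Fe + cmath.exp(-1j * cmath.pi * k / n) * Fo)
    X.append(complex(Z[0].real - Z[0].imag, 0.0))  # Nyquist bin
    return X

# Sanity check against a direct DFT on a short real signal.
def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
assert all(abs(a - b) < 1e-9
           for a, b in zip(real_fft_via_complex(x), dft(x)[:5]))
```

The saving comes from doing one length-N complex FFT instead of a length-2N one; the untangling itself is only O(N).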
Title: lossyWAV Development
Post by: Nick.C on 2008-03-27 18:54:51
I've finally managed to crack the problem I was having in implementing the 2N real FFT in an N Complex fft speedup - the improvement is between 20% and 25%.

I have however found a problem with the correction / merge / scale combination for 24 bit files - this will be investigated and beta v0.8.8 will be posted.

I played about with an 8 bit WAV file and I am going to remove 8 bit processing as removing any bits from an 8 bit WAV will most probably produce foul results.....
Title: lossyWAV Development
Post by: Nick.C on 2008-03-27 21:05:33
lossyWAV beta v0.8.8 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2008-03-28 03:57:56
In my own tests, lossyWAV 0.8.8 is significantly faster than version 0.8.7.

Would it make sense to drop all of the extra FFT analysis options (-Xa/b/c), in favour of an "-exhaustive" (or "-e") parameter, which would perform analysis passes using all suitable FFT sizes?
Title: lossyWAV Development
Post by: Nick.C on 2008-03-28 08:14:34
In my own tests, lossyWAV 0.8.8 is significantly faster than version 0.8.7.
 
Would it make sense to drop all of the extra FFT analysis options (-Xa/b/c), in favour of an "-exhaustive" (or "-e") parameter, which would perform analysis passes using all suitable FFT sizes?
I ran some tests (process my 53 problem sample set (125.1MB) 5 times, discard the highest and lowest times and take the average of the remaining 3) on a 2.0GHz Core2Duo (single instance, nothing else running.....) and got the following for v0.8.8:
Code: [Select]
|======|==================|==================|
|  QS  | Time/Rate v0.8.8 | Time/Rate v0.8.7 |
|======|==================|==================|
|  -7  | 15.71s / 47.34x  | 20.47s / 36.33x  |
|  -7a | 21.29s / 34.93x  | 27.84s / 26.71x  |
|  -7b | 27.13s / 27.41x  | 35.97s / 20.67x  |
|  -7c | 32.44s / 22.92x  | 43.44s / 17.12x  |
|======|==================|==================|
So, I *think* that I would rather leave the 3 options in place as the extra analyses still have a major effect on the processing time / rate. All tests were carried out with the input files cached in memory to ignore read latency.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 14:22:55
Now that the machinery has changed quite a bit I tried to abx my problem sample set:
Atemlied, badvilbel, bibilolo, Blackbird/Yesterday, bruhns, dither_noise_test, eig, fiocco, furious, harp40_1, herding_calls, keys_1644ds, Livin_In_The_Future, S37_OTHERS_MartenotWaves_A, triangle-2_1644ds, trumpet, Under The Boardwalk.

My personal transparency level is where quality level -4 is now. So I tried to abx quality level -4, and I can only say I can't abx any problem. The only thing mentionable is very weak suspicion that Atemlied is not totally transparent (at the very moment when the 'music' starts [sec. 0.0-1.6]), but my abx result doesn't back this up at all.

So everything is also great with the changed machinery, which brought a surprisingly large speed increase (from memory - didn't do a real comparison).
Because of the good speed I also encoded my regular track set and my problem sample set to learn about the average bitrate for the various quality levels:

quality    regular/problem set [kbps]
       -1              467/561
       -2              418/518
       -3              372/472
       -4              346/447
       -5              325/421   
       -6              306/397
       -7              291/375

These are very good properties IMO.
Even for a purist who wants to keep the basic principle and uses -2, the average bitrate for regular music is only slightly above 400 kbps. Average bitrate for the problems is 100 kbps higher on average, which IMO is more than enough of a safety margin (with earlier and less sophisticated lossyWAV versions ~470 kbps on average for my problem set was necessary to make them transparent).
-3 and -4 are the perfect solutions IMO for the non-purists struggling for transparency.
From -5 up there is an increasing risk of arriving at non-transparent results, but judging from practical listening quality is still very good (just tried [no abxing] some samples from my regular track set using -6 and was very content).

Wonderful work, Nick. Congratulations.
Title: lossyWAV Development
Post by: shadowking on 2008-03-29 14:35:34
Thanks for the tests halb27. Good to hear we have a strong 300..350k range. Bitrates are also looking good with -2 giving archive quality at half the normal bitrate of lossless.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 16:34:49
... As an aside (and I know that looking at the spectrum in foobar is not any way to evaluate anything....) I looked at the spectral output for a lossyWAV correction file (replaygained +45dB or so) and almost all of the signal was in the high end of the spectrum - so it "looks" like my implementation of your noise shaping filter works!

Finally I tried -shaping too and also looked at the correction file's spectrum. Noise is less audible than without shaping, so it works well. Noise gathers mainly in the highest spf frequency zone and above (12.4+ kHz). Because of this, bitrate is often expected to be higher than without shaping in order to arrive at the same S/N ratio in the highest frequency zone.

Some proposals on the bitrate bloat issue:

a) on a per block basis decide which shaping yields the higher number of bits to remove:
shaping 0 or shaping 1 (or shaping x, y, z, ....).
I came to this idea because shaping 1 does not always yield a bitrate bloat. For dither_noise_test shaping 0 yields 705 kbps, whereas shaping 1 yields 295 kbps (when used together with -7), and I couldn't hear a problem.
Sure this means at least doubling the encoding time. Moreover maybe the changing of the noise shaping is audible (but the same argument applies as towards the sudden noise increase and decrease when the anti-clipping strategy goes to work: as long as the noise is hidden we shouldn't mind).
Maybe an autoshaping strategy like this is most adequate: start for the first block with a low shaping value like shaping = 0.2.
For any current block: try the shaping used for the last block, as well as two shaping values that add or subtract a certain delta from the last shaping value. Always use whichever of the three possible values maximizes the number of bits to remove.
This is the basic principle. In order to save some work, checking for changing the shaping value is not necessarily done on every block or in both directions. For instance, with the current block check only whether an increase of the shaping value is useful; on the next block check only whether decreasing it is useful, and so on alternately. Maybe the frequency of the changes can trigger pauses in the shaping checking: when changes are rare it's not useful to do the checking on every block.

b) Maybe a more direct approach is more efficient: decide on the spectrum of the signal which shaping to use. When there's a lot of HF in the music, putting the noise into the HF region should be a good thing. Not quite so when there's no HF present in the music to hide the noise.

c) Maybe giving the potential to shift the noise also towards low frequencies instead of only shifting up may be useful with the approaches of a) or b).

d) Think of quality level -7 as targeting rather low bitrate lovers who accept some compromise. So for -7 soften the accuracy requirements for the highest spf frequency zone. For instance don't check this zone at all other than for the 64 sample FFT analyses (so far this shouldn't seriously hurt). If necessary: use a spreading of 5 or even 6 instead of 4 for the highest frequency zone in the 64 sample FFT analyses (this hurts - a bit for a spreading of 5, more so when going higher).
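Proposal a) could be sketched like this (Python; `bits_removed_at` is a hypothetical callback standing in for a full encoding trial of the block):

```python
# Per-block shaping search as proposed: keep the previous block's shaping
# and, on alternating blocks, probe one delta up or one delta down,
# keeping whichever candidate removes the most bits.
def choose_shaping(prev_shaping, block_index, bits_removed_at, delta=0.25):
    direction = delta if block_index % 2 == 0 else -delta
    probe = min(1.0, max(0.0, prev_shaping + direction))
    return max((prev_shaping, probe), key=bits_removed_at)

# With a toy metric where more shaping always removes more bits,
# even-numbered blocks step the shaping upwards:
assert choose_shaping(0.5, 0, lambda s: s) == 0.75
assert choose_shaping(0.5, 1, lambda s: s) == 0.5   # downward probe loses
```

This keeps the cost per block to at most one extra trial rather than a full search over all shaping values.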
Title: lossyWAV Development
Post by: Nick.C on 2008-03-29 17:12:47
Some proposals on the bitrate bloat issue:

a) on a per block basis decide which shaping yields the higher number of bits to remove:
shaping 0 or shaping 1 (or shaping x, y, z, ....).
I came to this idea because shaping 1 does not always yield a bitrate bloat. For dither_noise_test shaping 0 yields 705 kbps, whereas shaping 1 yields 295 kbps (when used together with -7), and I couldn't hear a problem.
Sure this means at least doubling the encoding time. Moreover maybe the changing of the noise shaping is audible (but the same argument applies as towards the sudden noise increase and decrease when the anti-clipping strategy goes to work: as long as the noise is hidden we shouldn't mind).
Maybe an autoshaping strategy like this is most adequate: start for the first block with a low shaping value like shaping = 0.2.
For any current block: try the shaping used for the last block, as well as two shaping values that add or subtract a certain delta from the last shaping value. Always use whichever of the three possible values maximizes the number of bits to remove.
This is the basic principle. In order to save some work, checking for changing the shaping value is not necessarily done on every block or in both directions. For instance, with the current block check only whether an increase of the shaping value is useful; on the next block check only whether decreasing it is useful, and so on alternately. Maybe the frequency of the changes can trigger pauses in the shaping checking: when changes are rare it's not useful to do the checking on every block.

b) Maybe a more direct approach is more efficient: decide on the spectrum of the signal which shaping to use. When there's a lot of HF in the music, putting the noise into the HF region should be a good thing. Not quite so when there's no HF present in the music to hide the noise.

c) Maybe giving the potential to shift the noise also towards low frequencies instead of only shifting up may be useful with the approaches of a) or b).

d) Think of quality level -7 as targeting rather low bitrate lovers who accept some compromise. So for -7 soften the accuracy requirements for the highest spf frequency zone. For instance don't check this zone at all other than for the 64 sample FFT analyses (so far this shouldn't seriously hurt). If necessary: use a spreading of 5 or even 6 instead of 4 for the highest frequency zone in the 64 sample FFT analyses (this hurts - a bit for a spreading of 5, more so when going higher).
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. I wonder if a variable shaping which is 1.0 at 8 or more bits to remove and 0.0 at 0 bits to remove, i.e. a resolution of 0.125 shaping per bit-to-remove, might be effective?

I'll try it out this evening and if it works at all, I'll post a new beta, probably with a parameter "-autoshape" which will be compatible with -shaping <n> in the sense that if autoshape says -shaping 0.125, but -shaping has been specified as 0.5 then shaping will be in the range 0.5 to 1, treating the -shaping value as a minimum value.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 17:22:51
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. ...

I see, and now that you write it I remember SebastianG's remark. If this is so, maybe shifting noise downwards in a controlled way makes things easier for the predictor. It's often helpful with WavPack lossy at rather low bitrates.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-29 17:33:19
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. ...
I see, and now that you write it I remember SebastianG's remark. If this is so, maybe shifting noise downwards in a controlled way makes things easier for the predictor. It's often helpful with WavPack lossy at rather low bitrates.
That would take a new noise shaping function. SebastianG very kindly donated his 44.1kHz and 48kHz functions to get noise shaping working in lossyWAV, but I have no idea how to derive a new one which ideally would push noise above 20kHz and below, say, 10Hz.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 17:42:18
That would take a new noise shaping function. SebastianG very kindly donated his 44.1kHz and 48kHz functions to get noise shaping working in lossyWAV, but I have no idea how to derive a new one which ideally would push noise above 20kHz and below, say, 10Hz.
I see, but maybe some day you will run upon such a function.

Another idea for the low bitrate lovers:
What about lowpassing to 17 or so kHz before letting the lossless codec do its job. Guess that brings the bitrate bloat down a bit. I'll try that using sox.

Good Lord: BS of course as this destroys the work of lossyWAV.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 18:50:53
Quite a pity, but interesting:

While with low bitrate settings the bitrate bloat is very remarkable when using -shaping 1, both in an absolute and, more so, in a relative sense (+25% for -7 with my regular full length track set), the absolute and relative difference is lower for the high quality settings (+12% for -2, +9% for -1).
Listening to the correction file of a -1 encoding the noise is so much less audible with -shaping 1 that it may be desirable to use -shaping 1 especially with quality -1.
Title: lossyWAV Development
Post by: Bourne on 2008-03-29 19:18:00
-
Title: lossyWAV Development
Post by: Nick.C on 2008-03-29 20:12:11
I have a question for Nick C.

The resulting processed WAV file is smaller than the original ?
Eg. I could burn an entire lossyWAV album in WAV/PCM format without taking the whole space it would with the standard WAV?
I'm afraid not - all lossyWAV does is zero LSBs in each sample as required. It does not change the bitdepth of the sample and therefore does not change the size of the output file, other than to add a 'fact' chunk with the lossyWAV processing information near the beginning of the WAV file.
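As an illustration, zeroing the bottom b bits of a sample amounts to rounding it to a multiple of 2^b while keeping its original bit depth (a Python sketch; lossyWAV's exact rounding behaviour may differ):

```python
# Round a sample to the nearest multiple of 2**bits_to_remove, leaving
# that many least-significant bits zero. The sample stays at its original
# bit depth, so the WAV file itself does not shrink; the lossless codec
# downstream exploits the extra redundancy instead.
def reduce_bits(sample, bits_to_remove):
    step = 1 << bits_to_remove
    return ((sample + step // 2) // step) * step

assert reduce_bits(1000, 4) == 1008   # nearest multiple of 16
assert reduce_bits(1000, 0) == 1000   # no bits removed
assert reduce_bits(1008, 4) % 16 == 0
```

This is why a processed WAV is the same size as the original, yet compresses much better once FLAC (or any lossless codec) gets hold of it.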
Title: lossyWAV Development
Post by: Bourne on 2008-03-29 20:33:54
-
Title: lossyWAV Development
Post by: Nick.C on 2008-03-29 20:39:21
so it's the lossless codec that takes advantage over the processed WAV...
Exactly right, David mentioned this in the first post in his original thread.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-29 21:14:05
lossyWAV beta v0.8.9 attached to post #1 in this thread.

Thanks for the tests halb27. Good to hear we have a strong 300..350k range. Bitrates are also looking good with -2 giving archive quality at half the normal bitrate of lossless.
I remember your comments in the first page of the thread and I am glad that we are getting close to your desired 340kbps...... 

Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 22:38:03
... Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.

Sounds promising. I'm curious what it looks like with regular music.
Title: lossyWAV Development
Post by: Nick.C on 2008-03-29 22:58:43
... Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.
Sounds promising. I'm curious what it looks like with regular music.
I've had a thought - the current implementation of -autoshape is analogous to the way maximum-bits-to-remove worked prior to the use of the RMS value of the codec-block (expressed in bits) to determine the variable maximum-bits-to-remove. I am working on a variant which will take the RMS value of the codec-block into account at the same time - should be posted tomorrow.
Title: lossyWAV Development
Post by: halb27 on 2008-03-29 23:24:04
Anyway it looks very promising already:

-7 -autoshape => 325/385 kbps (my regular/problem set)
-4 -autoshape => 369/450 kbps (my regular/problem set)
-2 -autoshape => 432/519 kbps (my regular/problem set)

Again, penalty is lower the higher the quality setting.

For a fair comparison I encoded 'Livin_In_The_Future' with -5 as well as -7 -autoshape.
Looking at the spectrum, noise behavior is better with -7 -autoshape up to ~9 kHz.
This is a valuable extension to the effect of the skewing machinery which keeps noise especially low up to ~ 3 kHz.
So the entire range of the fundamentals is kept within pretty low noise this way.
Of course this isn't a judgement about audible quality in the end.
I listened to the correction files of -5 and -7 -autoshape and, as expected, the coloured -autoshape noise of the -7 encoding is less audible than the white noise of -5 though it's higher in amplitude.
Again this doesn't really tell about quality for quality levels using a positive -nts value.
With -nts 0 or negative however I think the quality control mechanism makes sure everything is fine (assuming the control mechanism is really working, and we don't have a reason to doubt that).
Title: lossyWAV Development
Post by: Dynamic on 2008-04-01 05:17:56
As it is now, lossyFLAC + lwcdfFLAC is still smaller than a plain FLAC, especially using lossyWAV -1 preset.


Sorry to revive a 2-week-old quote, but I've been away from the forums for quite some time. The above comment surprised me.

I'd be surprised if the lossless combination of (lossyFLAC plus correction files) would on average come out to occupy less disk space than plain FLAC for the same music. I say this because, for example, the combination (Wavpack hybrid plus correction file) is usually a little larger than plain Wavpack lossless, and this seems understandable because you're sacrificing some efficiency to enable you to split the total information content into a playable but smaller file plus a near-random noise correction file.

If it were generally true, then you've just found a way to improve the lossless compression ratio of FLAC - an unlikely result (certainly in comparison to optimal lossless FLAC settings and block lengths), but potentially valuable if true.

As an aside, I'm loving your work, everyone. This project looks rather exciting.

I would be fascinated to probe the boundaries of how good lossyWAV is as a transcoding source for conventional lossy encoders - i.e. to store my music on my PC in, say, lossyFLAC or lossyWV format, then transcode for portable devices on demand (I've already considered applying RG Album Gain before using lossless compression, accepting a theoretical but inaudible SNR degradation on overly-loud albums in exchange for reducing the bitrate).

As an approach to verifying transcoding robustness, I wonder about choosing, say, known LAME problem samples and encoding those from original WAV versus lossyWAV sources. If the artifact behaviour stays broadly similar, is that a valid reassurance that lossyWAV at setting X makes a robust source for transcoding so long as no other problems have been found with the problem samples that affect Wavpack Lossy, such as atemlied? Even for problems that are fixed in newer LAME versions, I guess one could use an older version of LAME and a newer one to check that the original artifact and the fixed version are substantially unchanged when using lossyWAV.

This approach might then help to guide the choice of quality setting for those who desire a transcoding source. I presume lossyWAV with quality -2, for example, would even work well as source material for transcoding down in quality into lossyWAV quality -7 for the PDA-DAP low battery drain approach because the changes made should barely affect the measured noise floor compared to the original WAV, and you're going much more aggressive anyway on that second pass to quality -7.

Best regards,
Dynamic
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-01 18:23:30
Great work Nick.

Re: the bitrate bloat due to "shaping" - surely the whole point of shaping the noise is so that you can add more of it while maintaining the same level in the audible band? So when you enable shaping, you should also add a threshold shift. (Sorry if you're doing this already!).

Re: lower bitrate than normal FLAC: lossyFLAC uses a different default block size. Sometimes the output is smaller than the default FLAC blocksize, sometimes it's larger - maybe this is what's happening? Overall, what you say is correct: lossy+correction should be larger than lossless, but it would be nice if it wasn't!

Re: transcodability: I tested with mp3 problem samples very early on. It would be worth re-testing with the current version. I found you can't be too aggressive if you want to avoid any audible difference (e.g. -1!), but if "different but not worse" is good enough, you can use normal settings (which at the time was -2!). That horrible "trumpet" sample was quite revealing.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-01 21:09:05
Great work Nick.

Re: the bitrate bloat due to "shaping" - surely the whole point of shaping the noise is so that you can add more of it while maintaining the same level in the audible band? So when you enable shaping, you should also add a threshold shift. (Sorry if you're doing this already!).

Re: lower bitrate than normal FLAC: lossyFLAC uses a different default block size. Sometimes the output is smaller than the default FLAC blocksize, sometimes it's larger - maybe this is what's happening? Overall, what you say is correct: lossy+correction should be larger than lossless, but it would be nice if it wasn't!

Re: transcodability: I tested with mp3 problem samples very early on. It would be worth re-testing with the current version. I found you can't be too aggressive if you want to avoid any audible difference (e.g. -1!), but if "different but not worse" is good enough, you can use normal settings (which at the time was -2!). That horrible "trumpet" sample was quite revealing.

Cheers,
David.
Thanks David, I'm glad you like it!

Re: Bitrate Bloat - I'm not doing that at present as my ears are not really sensitive enough, but if anyone wants to do some testing, I would suggest something along the lines of the following, taking into account the relationship between -snr and -nts:

Code: [Select]
  quality_noise_threshold_shifts    : array[1..Quality_Levels] of Integer = (-3,-0,3,6,9,12,15);

  quality_signal_to_noise_ratio     : array[1..Quality_Levels] of Integer = (24,22,20,19,18,17,16);


So, if I was going to go further, I would initially add 3 to -nts for every 1 taken from -snr, i.e. -8 = -nts 18 -snr 15; -9 = -nts 21 -snr 14; etc.

I tried "-nts 30 -snr 11 -autoshape" and with my problem set it doesn't sound particularly bad - probably a starting point.
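That extrapolation rule can be sketched as follows (`extended_preset` is a hypothetical helper; the base values at -7 come from the arrays above):

```python
# Sketch of the extrapolation rule: starting from the -7 values
# (-nts 15, -snr 16), add 3 to -nts and subtract 1 from -snr per
# extra quality step beyond -7.
def extended_preset(quality):
    steps = quality - 7
    return {'nts': 15 + 3 * steps, 'snr': 16 - steps}

assert extended_preset(8) == {'nts': 18, 'snr': 15}
assert extended_preset(9) == {'nts': 21, 'snr': 14}
```

The "-nts 30 -snr 11" trial above sits on the same line, five steps beyond -7.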

Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way, although if the 'fact' chunk is removed, the program will not be able to tell the difference.

I found a small bug in the noise shaping code and also some quite nice speedups (approx 7% to 10%), so:

lossyWAV beta v0.9.0 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-01 21:20:27
I have run some new tests (processing my 53 problem sample set (125.1MB)) on a 2.0GHz Core2Duo (single instance, nothing else running.....) and got the following for beta v0.9.0:
Code: [Select]
|======|==================|==================|
|  QS  | Time/Rate v0.8.8 | Time/Rate v0.9.0 |
|======|==================|==================|
|  -7  | 15.71s / 47.34x  | 14.34s / 51.86x  |
|  -7a | 21.29s / 34.93x  | 19.30s / 38.54x  |
|  -7b | 27.13s / 27.41x  | 24.56s / 30.27x  |
|  -7c | 32.44s / 22.92x  | 29.47s / 25.23x  |
|======|==================|==================|
All tests were carried out with the input files cached in memory to ignore read latency.
Title: lossyWAV Development
Post by: halb27 on 2008-04-01 22:22:27
Thank you, Nick.
I'd like to do some abxing using -autoshape, but because abxing isn't so much fun I've been waiting for your version which takes the RMS value into account.
Does v0.9.0 contain this feature?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-01 22:31:43
Thank you, Nick.
I'd like to do some abxing using -autoshape, but because abxing isn't so much fun I've been waiting for your version which takes the RMS value into account.
Does v0.9.0 contain this feature?
I tried to implement the -autoshape taking into account RMS value and the bitrate went through the roof. I may re-visit it, but I think the -autoshape in v0.9.0 is fairly robust, 0% shaping at 0 bits-to-remove and 100% shaping at (bits-per-sample - 3) bits-to-remove. Using -7 -autoshape -snr 11 -nts 30, my 53 problem sample set ends up at 327.3kbps, and the quality is not too bad - a starting point as I said above.
Title: lossyWAV Development
Post by: halb27 on 2008-04-01 22:39:17
OK, I'll try to abx my usual problem samples with v0.9.0 -7 -autoshape. I'll also search for other tracks looking for hiss or other HF problems.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-01 22:40:23
OK, I'll try to abx my usual problem samples with v0.9.0 -7 -autoshape. I'll also search for other tracks looking for hiss or other HF problems.
Many thanks!
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-02 10:44:29
Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way
That's a good thing (except for testing) - I meant feeding the output of lossyWAV to an mp3 encoder. I think that's what Dynamic was talking about.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-02 12:07:50
Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way
That's a good thing (except for testing) - I meant feeding the output of lossyWAV to an mp3 encoder. I think that's what Dynamic was talking about.

Cheers,
David.
Yes, I see what he meant now. However, it reminds me of a quote on anythingbutipod where someone transcoded from lossyWAV >> OGG and the filesize increased by about 1MB compared to lossless >> OGG (as a %age I have no idea....)

As an aside, -7 -autoshape -snr 9 -nts 36 is nearly palatable on my iPAQ and comes in at 321.1kbps for my 53 problem sample set.

[edit] Iterating, I found that -7 -autoshape -snr 14.35 -nts 19.95 is very close in bitrate to vanilla -7. [/edit]
Title: lossyWAV Development
Post by: user on 2008-04-02 13:15:42
Once there was a good rule,
Thou should not transcode

lossy-wav is made already for portable usage, compatibility with portable devices flac supporting, but to shrink the size of true Lossless music. No reason to go 2nd time lossy, ie. transcode from lossy->lossy. if somebody wants mp3/mpc/ogg/aac as small sized thingie for portable usage, then go directly from Lossless source to small-lossy (mp3/mpc/ogg/aac).
There are already enough programs and scripts to encode in 1 single step to various formats/sizes/bitrates, like mareo.exe.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-02 13:45:12
I fundamentally disagree with you user.

Of course you should not aim to transcode, but sometimes it is inevitable, and sometimes it is not worth worrying about.

Of course we should all keep our lossless files and use them everywhere, but sometimes we can't, and sometimes it is not worth worrying about (for some people).


For example...

Modern loud CDs regularly hit 1000kbps+ with lossless codecs.

What's happening is that the mathematical 96dB range is being perfectly preserved, even though the actual dynamic range is about 6dB.

I believe it is pointless keeping the lossless version. It's a "perfect" copy of a mediocre original.

Thankfully, lossyWAV, used less aggressively, allows you to make a near-lossless version.

If I can have something which is half the bitrate (or less), sounds identical, and transcodes identically, then I have no need to keep the lossless version.

To me, this is an argument for dumping the lossless original. As it says on the Monkey's Audio website: lossless is for anal retentives. I'm not one, so if all rational reasons for lossless are removed, I won't use it. YMMV.


Why not create mp3s or whatever from the lossless original?

1. "Why not?" Well, Why? Really, if there's no difference, why? It's OCD-like behaviour.

2. I might not know the lossy format I will need in the future. Shall I create mp3, ogg, AAC, HE-AAC etc?

3. If I'm a radio station, it's the broadcast (FM, mp2, mp3, WMA, whatever) that's the "transcode" - I can hardly avoid that or make it at the same time as I rip the CD.


So, for me, the "transcodability" of the less aggressive lossyWAV modes is very important.

I could give you more examples... "sensible" preservation of 24/96 files; "sensible" preservation of GBs of "working files" from audio sessions which will probably never be used again, but won't be any use at all if converted to mp3; etc etc etc.

If lossyWAV, in its more gentle modes, is "safe".

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-04-02 21:42:40
I just finished my abx test.

First I followed your suggestion and used -7 -autoshape -nts 20 -snr 14.
This setting yields 309 kbps for my regular set (quite a bit higher than plain -7) and 355 kbps for my problem set (a bit low for problem samples).
Hiss is pretty audible with bruhns (for instance sec. 9.3-10.2), but it's audible also with bibilolo (sec. 4.3-5.5) and badvilbel (sec. 5.9-7.2). There's also a slight inaccuracy with Atemlied (sec. 9.3-10.1) which is best audible at moderately high listening volume.
I didn't test a lot more samples then those mentioned because to me this is not adequate quality for an average of 309 kbps. The hiss (and the inaccuracy) isn't really annoying though, and I listened to some regular music (carefully but without abxing), and was content with it. Anyway looking at codecs like vorbis (I just tested the new Aoyumi version, and quality is great even at -q4 [~130 kbps] I personally don't like my abx result at a bitrate of ~310 kbps.

I redid the test using plain -7 autoshape which yields 325/384 kbps for my regular/problem test set.
bruhns 9.3-10.2 is better now to me, though still quite audible. Same goes for bibilolo.
I didn't test more samples as I personally am not content with this as well.

-6 -autoshape yields 337/404 kbps.
bibilolo is ok now, but when abxing bruhns I found added hiss already at sec. 2.3-4.4.

I skipped -5 and went directly to -4 -autoshape as from my last test I know plain -4 is transparent for me with the samples tested. -4 -autoshape yields 369/450 kbps. With this average bitrate for the problem set chances are good that everything is alright now.
bruhns at sec. 9.3-10.2 however still isn't perfect though quite acceptable.

Summing it up to me this isn't a good result for using -autoshape though it's a matter of taste whether or not one is willing to accept added hiss which seems to be the major issue when using -autoshape with low bitrate settings.

Maybe a variant of autoshape is more successful: make the frequency up shift of noise also depend on the degree to which there's energy in the input signal's 2 highest frequency zones (that is the range from ~8.2 kHz up). As bruhns is a pretty low volume sample maybe being more conservative at low volume is helpful too. What do you think, Nick?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-02 21:57:03
I just finished my abx test.

First I followed your suggestion and used -7 -autoshape -nts 20 -snr 14.
This setting yields 309 kbps for my regular set (quite a bit higher than plain -7) and 355 kbps for my problem set (a bit low for problem samples).
Hiss is pretty audible with bruhns (for instance sec. 9.3-10.2), but it's audible also with bibilolo (sec. 4.3-5.5) and badvilbel (sec. 5.9-7.2). There's also a slight inaccuracy with Atemlied (sec. 9.3-10.1) which is best audible at moderately high listening volume.
I didn't test a lot more samples then those mentioned because to me this is not adequate quality for an average of 309 kbps. The hiss (and the inaccuracy) isn't really annoying though, and I listened to some regular music (carefully but without abxing), and was content with it. Anyway looking at codecs like vorbis (I just tested the new Aoyumi version, and quality is great even at -q4 [~130 kbps] I personally don't like my abx result at a bitrate of ~310 kbps.

I redid the test using plain -7 autoshape which yields 325/384 kbps for my regular/problem test set.
bruhns 9.3-10.2 is better now to me, though still quite audible. Same goes for bibilolo.
I didn't test more samples as I personally am not content with this as well.

-6 -autoshape yields 337/404 kbps.
bibilolo is ok now, but when abxing bruhns I found added hiss already at sec. 2.3-4.4.

I skipped -5 and went directly to -4 -autoshape as from my last test I know plain -4 is transparent for me with the samples tested. -4 -autoshape yields 369/450 kbps. With this average bitrate for the problem set chances are good that everything is alright now.
bruhns at sec. 9.3-10.2 however still isn't perfect though quite acceptable.

Summing it up to me this isn't a good result for using -autoshape though it's a matter of taste whether or not one is willing to accept added hiss which seems to be the major issue when using -autoshape with low bitrate settings.

Maybe a variant of autoshape is more succesfull: make the frequency up shift of noise also depend on the degree to which there's energy in the input signal's 2 highest frequency zones (that is the range from ~8.2 kHz up). Do you like to try that, Nick?
Ok, I will try again to implement the RMS variability approach that I was trying (but didn't release).

Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change.

Again, the noise shaping function itself is totally fixed, all the autoshape function is vary how much to apply. It doesn't change the frequency to which the noise is shifted. Think of it as -shaping 0 = pure white noise; -shaping 1 = fully shaped noise; -shaping <n> = something in between.

I'll get back to the "drawing board" with the autoshape function and post v0.9.1 soon.

As an aside, I have found that TCPMP for my iPAQ plays FLAC *much* better (better = less cpu usage and more accurate output) than GSPlayer / gspflac.dll. In particular dithernoisetest would exhibit some harmonics using GSPlayer which don't exist in TCPMP. TCPMP v0.72 RC1 is still available, google is your friend....
Title: lossyWAV Development
Post by: halb27 on 2008-04-02 22:02:51
... Again, the noise shaping function itself is totally fixed, all the autoshape function is vary how much to apply. It doesn't change the frequency to which the noise is shifted. Think of it as -shaping 0 = pure white noise; -shaping 1 = fully shaped noise; -shaping <n> = something in between. ...

That's clear. I was trying to bring another thing into focus: masking effects. If there's a lot of HF energy in the input signal, your shaping factor can be close to 1, and if there's no or little HF energy there the shaping factor is better close to 0. To be considered as well as the amplitude considerations.
Title: lossyWAV Development
Post by: halb27 on 2008-04-02 22:15:34
... Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change. ....

With bits-to-remove=1 and n=2 this yields a shaping factor of ~0.148 > 1/13~0.077. So this would make the noise shaping more agressive (in case I understand this correctly).
Title: lossyWAV Development
Post by: Nick.C on 2008-04-02 22:27:15
... Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change. ....
With bits-to-remove=1 and n=2 this yields a shaping factor of ~0.148 > 1/13~0.77. So this would make the noise shaping more agressive (in case I understand this correctly).
Yes, it will make it more aggressive. Using the revised -autoshape -7, my 53 problem sample set yields 378.77kbps. With the v0.9.0 -autoshape -7, 366.21kbps.

lossyWAV beta v0.9.1 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-04-02 22:48:02
... Another approach would be to make the variability of the shaping non-linear with respect to bits-to-remove. At present it increases at 1/13 per bit to remove, i.e. 0=0; 1=1/13; 2=2/13; etc; 12=12/13; 13=13/13. If I was to change this from linear to some power, say for example shaping_factor = 1-((13-bits-to-remove)/13)^n then things may change. ....

With bits-to-remove=1 and n=2 this yields a shaping factor of ~0.148 > 1/13~0.77. So this would make the noise shaping more agressive (in case I understand this correctly).
Yes, it will make it more aggressive. Using the revised autoshape, my 53 problem sample set yields 378.77kbps. With the v0.9.0 autoshape, 366.21kbps.

I think in the opposite direction: using for instance something like

bits-to-remove      shaping factor
        0                          0
        1                          0
        2                          0.1
        3                          0.15
        4                          0.25
        5                          0.4
        6                          0.55
        7                          0.7
        8                          0.8
        9                          0.85
       10                           0.9
       11                          0.95
    >=12                        1.0

so that as a tendency with low-volume spots shaping is small.
Should be positive for the added hiss of low-volume spots. Reduces the bitrate bloat as well.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-02 22:55:29
I thought that the problem of hiss you were encountering would be white noise, i.e. shaping too low?

Is full -shaping 1.0 better than -autoshape for the problem samples you identified (bruhns, bibilolo, badvilbel)?

And finally (as if you had nothing else better to do  ) is the v0.9.1 -autoshape any better (if full shaping is better than autoshape v0.9.0)?
Title: lossyWAV Development
Post by: singaiya on 2008-04-03 00:35:07
David, I totally agree with your post. Well said, as usual. I'm excited for lossyWAV precisely because of it's potential as a transcodable source. I'll do some listening tests in this area when I can find the time. This sums it up for me:

If I can have something which is half the bitrate (or less), sounds identical, and transcodes identically, then I have no need to keep the lossless version.


Not to mention that there isn't a problem sample found yet (at more defensive settings).
Title: lossyWAV Development
Post by: Dynamic on 2008-04-03 01:12:32
1. "Why not?" Well, Why? Really, if there's no difference, why? It's OCD-like behaviour.

2. I might not know the lossy format I will need in the future. Shall I create mp3, ogg, AAC, HE-AAC etc?

3. If I'm a radio station, it's the broadcast (FM, mp2, mp3, WMA, whatever) that's the "transcode" - I can hardly avoid that or make it at the same time as I rip the CD.


So, for me, the "transcodability" of the less aggressive lossyWAV modes is very important.


I'm in agreement. I'd like to use a safe & robust setting in lossyWAV (-1 or -2 perhaps) just as I'd happily pre-process my rips with Album Gain and simple dither (using wavgain or foobar2000) before losslessly compressing them. I'd treat either as an excellent quality source to keep on my hard drive, which I could tag properly and then robustly encode to conventional lossy formats as I need it. Such an archive occupies far less space than straight lossless in the case of those many modern dynamically ultra-compressed albums.

I acquire new playback devices from time to time, and may desire different formats to suit the storage capacity / battery life / format compatibility / gapless support available with each. (Pragmatism frequently leads me to stick to LAME VBR MP3s, however). Also, on occassions, I actually need to have a degree of dynamic compression (foo_vlevel) for soft background music from highly dynamic sources that would get lost entirely in places if I didn't use some volume levelling. This requires processing before encoding (unless we get frame-by-frame volume levelling in mp3gain style).

I wouldn't normally want to transcode lossyWAV -2 to lossyWAV -7, for example, but perhaps if I wanted low battery-drain and fairly good quality on the right device, I'd be willing to do so, pragmatically, (and I'd be tempted to name the file as .transcoded.lossy.flac or with a .lossy7t.flac extension or some such, just in case it should ever find its way back onto my PC).

It seems that I'm part of a minority in being willing to use 'safe' lossyFLAC or lossyWV in place of true lossless as my main PC storage and for generating lossy files pretty-much on the fly, for whatever external device I wish.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-04-03 02:57:39
Is there currently any plan/work on dynamic noise shaping?  That is, despite some obscure and ancient "patent" (which may/may not apply depending on license/age/obscurity/location/algo differences--and on that topic, imho, any software patent shouldn't last more than 5 years, much less 15)... 

Currently, (yes I use tcpmp 0.72rc1 or the 0.8x builds floating around  ), 320kbps is my max, so I'm looking forward to just the right combination of settings (i.e. -7 with shaping, snr, nts) to produce such a file at that bitrate.  320kbps with noise shaping, whew.  This is really heating up! 
Title: lossyWAV Development
Post by: halb27 on 2008-04-03 06:48:02
... It seems that I'm part of a minority in being willing to use 'safe' lossyFLAC or lossyWV in place of true lossless as my main PC storage and for generating lossy files pretty-much on the fly, for whatever external device I wish.

I also think like that. I'd love to have just 1 collection (not a lossless and a lossy one), and -1 or -2 and a good additional noise shaping is a very promising way to go.
Title: lossyWAV Development
Post by: halb27 on 2008-04-03 07:08:38
I thought that the problem of hiss you were encountering would be white noise, i.e. shaping too low?

Is full -shaping 1.0 better than -autoshape for the problem samples you identified (bruhns, bibilolo, badvilbel)?

And finally (as if you had nothing else better to do  ) is the v0.9.1 -autoshape any better (if full shaping is better than autoshape v0.9.0)?

I'll try your proposals this weekend.
My considerations arise from my WavPack lossy experience. Before David Bryant introduced dynamic noise shaping I preferred to shift noise upwards. This eliminated ugly distortions with samples like keys, but it introduced the risk of audible hiss. This risk was very real when using settings in the 300 to 350 kbps range, especially with high values for the shift.
I think our situation is similar, and I think a strong shifting up should only be done when there's a high chance that the added hiss is masked. I think this is especially so as with the very aggressive settings quality control of our machinery has a weak basis in general and a very weak basis above ~3 kHz (though it works fine to an astonishing extent).
As for controlling the masking of HF hiss I can imagine a crude approach is sufficient.
The very first approach can be: use a shaping factor of 1 for very loud music, and a shaping factor of 0 for quiet music, but do it defensively meaning: with music of mediocre loundness use a moderate shaping factor closer to 0 than to 1.
Bits to remove is a rough measure for the loudness of the music. So the noise shaping factor can be computed by something like:

noise shaping factor   = 0     for bits-to-remove <= 5
                                   = 1     for bits-to-remove >=12
                                   = (bits-to-remove - 5)^2/49     for bits-to-remove between 5 and 12

With this very crude approach of controlling the masking I think it's best to be very conservative.

A better hiss masking control (which needn't be that defensive) could be not to take into account the loudness of the music (or the number of bits to remove), but the HF energy of the input signal, something like the sum of all the bins in the 2 highest frequency zones of the FFT analyses (~8.2+ kHz) for all the 64 sample FFTs which make up for an entire 512 sample block.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-03 07:54:49
I'll try your proposals this weekend.
My considerations arise from my WavPack lossy experience. Before David Bryant introduced dynamic noise shaping I preferred to shift noise upwards. This eliminated ugly distortions with samples like keys, but it introduced the risk of audible hiss. This risk was very real when using settings in the 300 to 350 kbps range, especially with high values for the shift.
I think our situation is similar, and I think a strong shifting up should only be done when there's a high chance that the added hiss is masked. I think this is especially so as with the very aggressive settings quality control of our machinery has a weak basis in general and a very weak basis above ~3 kHz (though it works fine to an astonishing extent).
As for controlling the masking of HF hiss I can imagine a crude approach is sufficient.
The very first approach can be: use a shaping factor of 1 for very loud music, and a shaping factor of 0 for quiet music, but do it defensively meaning: with music of mediocre loundness use a moderate shaping factor closer to 0 than to 1.
Bits to remove is a rough measure for the loudness of the music. So the noise shaping factor can be computed by something like:

noise shaping factor   = 0     for bits-to-remove <= 5
                                   = 1     for bits-to-remove >=12
                                   = (bits-to-remove - 5)^2/49     for bits-to-remove between 5 and 12

With this very crude approach of controlling the masking I think it's best to be very conservative.

A better hiss masking control (which needn't be that defensive) could be not to take into account the loudness of the music (or the number of bits to remove), but the HF energy of the input signal, something like the sum of all the bins in the 2 highest frequency zones of the FFT analyses (~8.2+ kHz) for all the 64 sample FFTs which make up for an entire 512 sample block.
So, instead of just calculating the minimum / average of each FFT output for the whole range, 20Hz > 16kHz, I could calculate a minimum / average for each different portion of the spreading frequency list. In this way, the relative outputs in each sub-range could be compared and if the high frequency range was low then apply less shaping as you have already suggested.

[edit] As an aside, I thought that we were getting close to the "end" with respect to v1.0.0, so the release numbers have been climbing rapidly. As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0.... [/edit]

[edit2] If I was going to really push the processing-time-per-codec-block requirement, I could also carry out a 512 sample FFT on the correction data, i.e. quantization noise, and see where the quantization noise has actually gone.... [/edit2]
Title: lossyWAV Development
Post by: halb27 on 2008-04-03 08:50:37
So, instead of just calculating the minimum / average of each FFT output for the whole range, 20Hz > 16kHz, I could calculate a minimum / average for each different portion of the spreading frequency list. In this way, the relative outputs in each sub-range could be compared and if the high frequency range was low then apply less shaping as you have already suggested. ...

I do not understand the minimum / average approach.

I think what I have in mind is something else: compute the input signal's HF energy of a block as the sum of all the bins in the 2 highest frequency zones (~8.2+ kHz) of all the 64 sample FFTs which cover the block.
Compare this HF energy to predefined energy levels which tell about the noise shaping factor.
For the predefined energy levels:
Look at the HF energy (computed the same way) of the bibilolo start (at the seconds I mentioned in my last test report) and use a noise shaping factor of 0 for this energy level.
On the other end take loud music with a high amount of HF (for instance 'Living in the future'), and use the computed energy level as a measure for using a noise shaping of 1.
In production when energy level is between these two extreme forms, use a quadratic function of the form (HF-a)^2/b for interpolation to get the noise shaping value.
Title: lossyWAV Development
Post by: SebastianG on 2008-04-03 09:27:28
[edit2] If I was going to really push the processing-time-per-codec-block requirement, I could also carry out a 512 sample FFT on the correction data, i.e. quantization noise, and see where the quantization noise has actually gone.... [/edit2]

Why?
Assuming the unfiltered quantization noise to act like a memoryless source of random numbers with rectangular probability density the noise power you'll get after shaping can directly be computed with the help of the noise transfer function N. For a frequency f in radians f=Hz*(2pi/fs) where fs=sampling_frequency_in_Hz set z=cos(f)+i*sin(f) and compute |N(z)*2^{bits2remove}| which is proportional to the the amplitude spectral density of the filtered noise.

Usually a psychoacoustic codec determines the amount of tolerable noise in specific time/frequency regions. Seeing "spreading function" popping up here I assume you're actually doing that computation. For a "codec block" the result of this computation would be a curve describing the spectral power density of the tolerable noise. Then you could try to find the parameters 's' and 'b' for the curve |N(z*s)*2^{b}| so it's still under the tolerable noise curve but maximizes b -- the number of bits to remove. 's' here is the shaping strengh parameter.

my 2 cents on optimizing the number of bits to remove and the shaping strenth,
SG
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-03 10:43:58
Also, on occassions, I actually need to have a degree of dynamic compression (foo_vlevel) for soft background music from highly dynamic sources that would get lost entirely in places if I didn't use some volume levelling. This requires processing before encoding (unless we get frame-by-frame volume levelling in mp3gain style).
OT: That's possible, but no one has implemented it. It wouldn't be as good or flexible as a separate DRC, but it would often be better than transcoding an mp3 to another mp3.


[edit] As an aside, I thought that we were getting close to the "end" with respect to v1.0.0, so the release numbers have been climbing rapidly. As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0.... [/edit]
IMO (though others may disagree strongly) your first "stable" release should be without noise shaping.

Also IMO (and again, others may disagree) the more you base your noise shaping on the input signal, the closer you get to that Sony patent.

If fixed noise shaping doesn't buy you much, you should definitely get a stable and (as far as I know) patent-free release out there before playing with noise shaping any more.

If nothing else, having a "stable" release is going to get you a lot more testers! (I would hope!).

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-04-03 11:36:51
IMO (though others may disagree strongly) your first "stable" release should be without noise shaping.
...
If nothing else, having a "stable" release is going to get you a lot more testers! (I would hope!). ...

As for the current state I also don't see a big advantage of noise shaping.
But we're already talking about things which are very promising cause with loud and HF rich music (aka most of pop/rock music) noise shifting can really lead to very high quality at a rather moderate bitrate. Guess this is an attractive feature for many users. Sure we're moving here in the world of psychoacoustics but to a lot slighter degree than transform codecs do it.

And look at the initial purpose we're struggling at with -2 and -1 where we don't rely on any kind of psy model to assure quality. With a good noise shaping we get a near noiseless frequency range of the fundamentals and a controlled quality in the HF region. To me this is very attractive and we should rather wait a bit yet until final release.

Of course there shouldn't be any patent problems. But is there really an issue with the proposals done so far?
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-03 13:05:05
As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0....

How about 0.10.1    0.11.1 etc. 
Although lossyWav can be considered beta, the noiseshaping functions might be considered alpha (debatable).
May I asked what will be the result when the "optimum" noise shaping is found,  better quality at the same bit rate or lower bit rate at the same quality?    anything else seems not useful.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-03 13:19:22
As we are in (yet another!) potentially fairly fast transitionary period, I will be appending b > z to the beta releases to give me more "time" before v1.0.0....
How about 0.10.1    0.11.1 etc. 
Although lossyWav can be considered beta, the noiseshaping functions might be considered alpha (debatable).
May I asked what will be the result when the "optimum" noise shaping is found,  better quality at the same bit rate or lower bit rate at the same quality?    anything else seems not useful.
Yes, I can use 0.10.0, etc. - so I will.

Noise shaping makes the processed data less predictable for the lossless codec, thus increasing bitrate. However, its use can allow more aggressive settings to be used before the results are noise shaped.

David: I'm inclined to agree with you - v1.0.0 should be issued with noise shaping code removed.

Horst: from beta v1.0.1, I would expect to improve noise shaping and its application in lossyWAV.

Sebastian: Your understanding of applied mathematics far exceeds mine - I'm not sure what you're getting at.
Title: lossyWAV Development
Post by: halb27 on 2008-04-03 13:39:59
...
David: I'm inclined to agree with you - v1.0.0 should be issued with noise shaping code removed.

Horst: from beta v1.0.1, I would expect to improve noise shaping and its application in lossyWAV.
...

Sounds like a promising road map.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-03 14:16:35
Sebastian: Your understanding of applied mathematics far exceeds mine - I'm not sure what you're getting at.
Simplistically, that the quantisation noise has gone exactly where you've put it in a predictable way - you don't need to check - unless something is broken.

Of course, it's easy to break something, and useful to have that checking code in there for debugging. It also saves having to think the theory through!

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-04-06 10:28:44
I thought that the problem of hiss you were encountering would be white noise, i.e. shaping too low?

Is full -shaping 1.0 better than -autoshape for the problem samples you identified (bruhns, bibilolo, badvilbel)?

And finally (as if you had nothing else better to do  ) is the v0.9.1 -autoshape any better (if full shaping is better than autoshape v0.9.0)?

I tried v0.9.0 -autoshape, v0.9.0 -shaping 1, v0.9.1 -autoshape on bruhns.
To me the v0.9.0 -autoshape and v0.9.1 -autoshape quality is identical (both from subjective impression as well as the abx results: sec. 9.3-10.2: 7/10 with both versions, sec. 2.3-4.4: 10/10 with v0.9.0 and 9/10 with v0.9.1).
Judgement about -shaping 1 isn't so easy. After listening to -autoshape I had problems identifying the problem with -shaping 1 and scored badly. Once I was used to it again, however, I could recognize it, and with some trials the problem was even more pronounced to me than with the -autoshape versions. I guess my hearing abilities are very much at their limits with these very high frequency problems, but from time to time they become apparent even to me.

I also wanted to try bibilolo sec. 4.3-5.5 but didn't succeed at all in ABXing it today (fatigue? bad constitution?)
Title: lossyWAV Development
Post by: Kwevej on 2008-04-06 11:31:12
Why lossyWAV ?

I think that Musepack can do it better...
Title: lossyWAV Development
Post by: Mitch 1 2 on 2008-04-06 13:01:39
Kwevej, read the lossyWAV FAQ (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV#Frequently_asked_questions).
Title: lossyWAV Development
Post by: halb27 on 2008-04-06 13:08:59
Why lossyWAV ?

I think that Musepack can do it better...

Musepack, as well as Vorbis, AAC and MP3, does it better at bitrates in the 200 kbps area (very roughly speaking). Most people don't need anything else (apart from maybe lossless archiving).

The special thing about lossyWAV is that, when used as originally intended (quality level -2 or -1), we get extremely high quality.
There is no guarantee that things can't go wrong (after all, it's lossy), but according to experience there's nothing to be afraid of. Codecs like MP3 and the others mentioned above have a complicated signal path which changes the original technical description of the music enormously, and there is a lot of heuristic decision making. As a result there's always a risk of artefacts and inaccuracies in the changed technical description, though we all know codecs like Vorbis do a great job at pretty low bitrates, and it's very rare that music isn't transparent at a bitrate of ~200 kbps (usually that's overkill already).
LossyWAV, by contrast, doesn't change the structure of the technical description of the music at all: it keeps the usual 16-bit PCM description of the wave samples and only reduces the accuracy of the samples (by rounding, and thus zeroing, those least significant bits which it thinks it can safely discard). The 16 bits of the PCM format are needed to accurately describe the full dynamics of loud as well as quiet music. For quiet music we can't save many bits, because quiet music is described with rather few bits (many of the most significant bits are zero), most of which we need for an accurate description; but for loud music we can, as the full 16-bit accuracy usually isn't needed there.
The downside is that with this approach we can't match the efficiency of Musepack etc. Using lossyWAV the secure way yields a bitrate of ~420 kbps for the -2 setting (or ~470 kbps for the -1 setting, which is considered overkill, just for the very cautious or those who like to use extreme-quality lossyWAV as a replacement for lossless archiving).

There is a wish to go lower in bitrate, and thanks to additional internal quality-assurance mechanisms we can do so without taking great risks. This, however, is no longer backed by the initial idea. To me the -4 quality setting is still transparent and yields a bitrate of ~350 kbps. Even when going lower, as with -6 (~310 kbps), the rare and subtle inaccuracies are easily acceptable to me.
Noise shaping is the current theme of further development, and maybe this way it's possible to improve transparency in the 300-350 kbps range and/or achieve extremely high (though not transparent) quality a bit below 300 kbps.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2008-04-10 10:31:23
Nick, is there any chance of seeing piping support (both in and out), before lossyWAV 1.0?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-11 13:07:36
Nick, is there any chance of seeing piping support (both in and out), before lossyWAV 1.0?
I would need to know where to start, i.e. how *does* foobar2000 pipe WAV data into a program, and how would I then pipe that out to FLAC / WavPack / TAK / etc. to produce the encoded processed output?

I think it may be better to park piping beside noise shaping and attempt to include it in v1.1.0 rather than to delay the work up to release of v1.0.0 any further.

I have been working on tidying up the code and have shaved another second off the processing time for my 53 problem sample set at preset -7:
Code: [Select]
|======|==================|==================|
|  QS  | Time/Rate v0.9.2 | Time/Rate v0.9.0 |
|======|==================|==================|
|  -7  | 13.14s / 56.60x  | 14.34s / 51.86x  |
|  -7a | 18.26s / 40.73x  | 19.30s / 38.54x  |
|  -7b | 24.25s / 30.66x  | 24.56s / 30.27x  |
|  -7c | 28.62s / 25.98x  | 29.47s / 25.23x  |
|======|==================|==================|
All tests were carried out with the input files cached in memory to ignore read latency.
Title: lossyWAV Development
Post by: Mitch 1 2 on 2008-04-11 13:47:05
Unless I were to know where to start, i.e. how *does* foobar2000 pipe WAV data into a program, then how do I pipe that out to FLAC / WavPack / TaK / etc to produce the encoded processed output?
Now you're getting ahead of yourself.  I was simply asking about stdin/stdout support.
To support foobar2000, however, I suppose you could launch (with parameters) an external encoder, e.g. "lossyWAV.exe - -o -enc flac.exe -f -b 512 -e -o %d -".
Title: lossyWAV Development
Post by: carpman on 2008-04-12 01:06:20
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAVs intended use), but I was surprised that when I ran it through LossyWav (-2) the resultant lossy.flac file was larger (747kbps) than the lossless FLAC file (737 kbps), both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
Title: lossyWAV Development
Post by: jesseg on 2008-04-12 02:36:35
Not really odd.  The noise floor above your filter was very, very low, so there are few if any bits for lossyWAV to remove; combined with the difference in block size used in FLAC (assuming that you didn't force them both to 512), that makes sense.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-12 09:47:10
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAVs intended use), but I was surprised that when I ran it through LossyWav (-2) the resultant lossy.flac file was larger (747kbps) than the lossless FLAC file (737 kbps), both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
The upper limit for lossyWAV is 16kHz - so you had a 2kHz zone which would not allow any bits to be removed.
Title: lossyWAV Development
Post by: carpman on 2008-04-12 12:37:00
Nick, jesseg

Thanks for your replies and patience.

C.
Title: lossyWAV Development
Post by: Kwevej on 2008-04-12 15:51:42
Kwevej, read the lossyWAV FAQ (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV#Frequently_asked_questions).



Someone will send me a FLAC. How would I recognize that it is really lossless?
I don't like the "Lossy Lossless" idea.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-12 16:41:43
Kwevej, read the lossyWAV FAQ (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV#Frequently_asked_questions).


Someone will send me a FLAC. How would I recognize that it is really lossless?
I don't like the "Lossy Lossless" idea.
If you rip your own FLAC files, you will always know whether they are lossless or not.

Also, how do you know that the FLAC file you have has not been created from a decoded MP3 file?
Title: lossyWAV Development
Post by: jesseg on 2008-04-13 01:54:02
Good point Nick.

Re: knowing if a FLAC is a lossyFLAC...  if --keep-foreign-metadata was used on the FLAC command line, then you will be able to tell whether the source file was a lossyWAV or not (but this says nothing about the pre-lossyWAV source).  It's a little roundabout, because you would (as far as I know) have to decode it to WAV and then use the lossyWAV -check method.  But if there's a way to view metadata in a FLAC file directly, then that would be easier.

And as Nick said, if a FLAC is encoded from an mp3, you have no way at all of knowing without a doubt through a purely technological means, unless the decoder saves the information as meta-data in the WAV or passes it through to a FLAC tag via a transcoder.  And again, FLAC would have to be set to save the meta-data.  Otherwise, it's all judgmental (and inaccurate) at that point, when you're trying to spot coding/decoding artifacts and decide what is what.
Title: lossyWAV Development
Post by: botface on 2008-04-13 20:42:16
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAVs intended use), but I was surprised that when I ran it through LossyWav (-2) the resultant lossy.flac file was larger (747kbps) than the lossless FLAC file (737 kbps), both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
The upper limit for lossyWAV is 16kHz - so you had a 2kHz zone which would not allow any bits to be removed.

Nick, would you mind expanding on "The upper limit for lossyWAV is 16kHz"?
I assumed you meant that it was the HF cut-off point, so that anything above that frequency would be "ignored" and hence missing from the output. However, having done a brief test on a piece recorded from FM radio, the frequency plots from the original WAV file and the lossyWAV file look the same, with no reduction in >16k levels. Especially noticeable is that the 19kHz pilot tone is still there, not reduced in level.

Sorry if this is a dumb question
Title: lossyWAV Development
Post by: Nick.C on 2008-04-13 20:54:00
Hi all,

I was messing around with an audio recording from a YouTube video. I got rid of everything > 14kHz. So this isn't your high quality CD audio here (i.e. perhaps not lossyWAVs intended use), but I was surprised that when I ran it through LossyWav (-2) the resultant lossy.flac file was larger (747kbps) than the lossless FLAC file (737 kbps), both were encoded with the latest FLAC using -5.

Is that odd? Seems odd to me.

C.
The upper limit for lossyWAV is 16kHz - so you had a 2kHz zone which would not allow any bits to be removed.
Nick, would you mind expanding on "The upper limit for lossyWAV is 16kHz"?
I assumed you meant that it was the HF cut-off point, so that anything above that frequency would be "ignored" and hence missing from the output. However, having done a brief test on a piece recorded from FM radio, the frequency plots from the original WAV file and the lossyWAV file look the same, with no reduction in >16k levels. Especially noticeable is that the 19kHz pilot tone is still there, not reduced in level.

Sorry if this is a dumb question
Not a dumb question at all - I am guilty of giving a truncated explanation of what I should have elaborated on.....

When the FFT analyses are carried out on each codec_block in lossyWAV, the results between 20Hz and 16kHz are taken into account when determining bits_to_remove for that FFT (and ultimately that codec_block). The only process applied to the actual audio data is the remove_bits routine, i.e. revised_sample:=round(original_sample / (2^bits_to_remove))*(2^bits_to_remove), which sets the lowest bits_to_remove LSBs to zero. No frequencies are intentionally removed from the output samples.
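The remove_bits formula can be sketched in Python (an illustrative sketch only, not lossyWAV's actual Pascal code; note that Delphi's round() typically rounds half-to-even, which this sketch approximates with round-half-away-from-zero):

```python
def remove_bits(sample: int, bits_to_remove: int) -> int:
    """Quantise a PCM sample so its lowest bits_to_remove bits become zero."""
    step = 1 << bits_to_remove            # 2 ** bits_to_remove
    # Round half away from zero, then rescale; the result is always a
    # multiple of step, i.e. the bits_to_remove LSBs are cleared.
    q = (abs(sample) + step // 2) // step
    return (q if sample >= 0 else -q) * step

# Coarser samples (larger bits_to_remove) are what lets a lossless codec
# such as FLAC, WavPack or TAK store the block with fewer bits per sample.
```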

Anyway, as no problems were reported for beta v0.9.1, lossyWAV v0.9.2 RC3 is attached to post #1 in this thread.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-14 12:10:31
Someone will send me a FLAC. How would I recognize, that it is really lossless?
I don't like the "Lossy Lossless" idea.
If it came from an mp3 file, you can often spot this in the spectrogram.

If it came from a lossyWAV file, you can count the number of "wasted bits" in the FLAC file, or the number of (512-sample) blocks of LSBs set to zero (where some MSBs are non-zero) in the decoded .wav. If you find several of either, it's probably from lossyWAV (or some even rarer perversion of the audio).
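David's wasted-bits check can be sketched as a Python heuristic (illustrative only — the function names, thresholds and block handling here are mine, not a tool from this thread):

```python
def wasted_bits(block):
    """Count trailing zero bits common to all samples in a block.

    Returns 0 for an all-zero block (which tells us nothing either way)."""
    acc = 0
    for s in block:
        acc |= abs(s)
    if acc == 0:
        return 0
    # Position of the lowest set bit = number of shared trailing zeros.
    return (acc & -acc).bit_length() - 1

def looks_like_lossywav(samples, block_size=512, min_blocks=3):
    """Heuristic per the post: several 512-sample blocks with zeroed LSBs
    but non-zero MSBs suggest lossyWAV (or some rarer perversion)."""
    hits = 0
    for i in range(0, len(samples) - block_size + 1, block_size):
        block = samples[i:i + block_size]
        # min_blocks, the >=2 wasted-bit floor and the >256 MSB check are
        # arbitrary illustration thresholds, not values from the thread.
        if wasted_bits(block) >= 2 and any(abs(s) > 256 for s in block):
            hits += 1
    return hits >= min_blocks
```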


If you add noise to either of the above, these methods won't work.

Cheers,
David.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-17 12:43:51
Thanks to unfortunateson for the 96kHz sample FLAC file - when I tried to process the contained WAV file, lossyWAV crashed. It turned out to be a divide-by-zero error in the preparation of the skewing factors.

This led me to re-assess the skewing factor preparation and I quickly found a simple fix (which also improved the methodology) - however, the fix reduces the bitrate of all the quality presets by around 20kbps.

I had already made an unrelated change to the spreading function which increased the bitrate for -3 to -7 by between 2kbps and 4kbps.

However, the amendment to the skewing function preparation has reduced the difference in bitrate between the 3 existing spreading functions.

So, I have amended the skewing function preparation and there is now only one spreading function (that for quality preset -1).

lossyWAV beta v0.9.3 attached to post #1 in this thread.

As this beta has changed some longstanding "constants" of the method, I would be extremely grateful if some of our more acutely-eared members could ABX some of the more problematic samples and post feedback.

I am fairly sure that quality *should* not have suffered, but my ears are not good enough to perform the critical evaluation required.

Thanks,

Nick.

I have processed my 53 problem sample set using beta v0.9.3 and the change in spreading function has changed the variation in bitrate somewhat:

Code: [Select]
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
|   Version   |lossyWAV -1|lossyWAV -2|lossyWAV -3|lossyWAV -4|lossyWAV -5|lossyWAV -6|lossyWAV -7|
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| v0.9.2 RC3  |  543kbps  |  494kbps  |  433kbps  |  408kbps  |  385kbps  |  365kbps  |  348kbps  |  
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| beta v0.9.3 |  505kbps  |  467kbps  |  435kbps  |  406kbps  |  381kbps  |  357kbps  |  337kbps  |
|-------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
From this, I feel that maybe -1 should become (even) more conservative in its -snr and -nts settings (currently 24 and -3 respectively). However, the change in bitrate between quality presets is now significantly more linear than it has ever been.

Also, as all quality presets now use the same spreading function I could (nearly) implement a fractional quality preset (like oggenc) between -1.000 and -7.000.
Title: lossyWAV Development
Post by: halb27 on 2008-04-18 10:22:16
... I had already made an unrelated change to the spreading function which increased the bitrate for -3 to -7 by between 2kbps and 4kbps. ...

Very welcome IMO, as this important frequency range didn't get much of the skewing effect so far. But why don't you do it for -2 and -1 as well, especially as bitrate seems to have dropped with your changes?
As far as I can see 2Bdecided's basic principle is still totally taken care of when using -2 and -1, and that's the important thing.
I wouldn't care much about the bitrate drop. Maybe the lowest quality settings aren't acceptable any more, but IMO a quality scale from -1 to, say, -5 is sufficient.

I will do my usual tests as soon as possible, but my father has died so at the moment I will only seldom look up HA.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-18 10:39:17
there is now only one spreading function (that for quality preset -1).

Does that mean a performance hit? (I remember that -1 uses the most CPU but I'm not sure if that was partly because of the spreading).
Title: lossyWAV Development
Post by: Nick.C on 2008-04-18 10:57:04
there is now only one spreading function (that for quality preset -1).
Does that mean a performance hit? (I remember that -1 uses the most CPU but I'm not sure if that was partly because of the spreading).
Not at all - the spreading is actually quicker for -1, as fewer bins are averaged across the frequency ranges. The performance hit for -1 is related to the extra 256 sample FFTs which are calculated.

However, I have implemented the floating point quality presets from -1.0000 to -7.0000 and am considering removing the 256 sample FFT from the -1 quality preset (it can still be added using -1a to -1.9999a) which would make all quality presets default to 2 FFT analysis lengths (64 sample and 1024 sample) with 128, 256 and 512 sample FFTs remaining optional.

I will post beta v0.9.4 later today with the FP quality presets enabled and -1 defaulting to 2 FFT analysis lengths.
Title: lossyWAV Development
Post by: SokilOff on 2008-04-18 10:57:32
There is a wish for going lower in bitrate, and due to additional internal quality assuring mechanisms we can do so without going very risky. This however isn't backed up by the initial idea anymore. To me the -4 quality setting still is transparent and yields a bitrate of ~350 kbps. Even when going lower like with -6 (~310 kbps) the rare and subtle inaccuracies are easily acceptable to me.


Thanks for the detailed explanations. But is there any difference between lossyWAV and, e.g., WavPack lossy? At high bitrates (> 320-350 kbps) WavPack lossy seems to sound transparent too. Are there any advantages for lossyWAV over WavPack in lossy mode?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-18 11:43:49
There is a wish for going lower in bitrate, and due to additional internal quality assuring mechanisms we can do so without going very risky. This however isn't backed up by the initial idea anymore. To me the -4 quality setting still is transparent and yields a bitrate of ~350 kbps. Even when going lower like with -6 (~310 kbps) the rare and subtle inaccuracies are easily acceptable to me.
Thanks for the detailed explanations. But is there any difference between lossyWAV and, e.g., WavPack lossy? At high bitrates (> 320-350 kbps) WavPack lossy seems to sound transparent too. Are there any advantages for lossyWAV over WavPack in lossy mode?
Only really that lossyWAV is compatible with a number of lossless codecs which make use of the wasted-bits approach, so it is in one sense codec-independent. I don't know if anyone has carried out any comparisons between WavPack lossy and lossyWAV output - it would be interesting though....
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-18 11:58:13
IIRC the stable release of WavPack supports CBR only in its lossy mode. I can't remember if there's VBR in beta test.

lossyWAV is pure VBR, and does not support CBR.

Cheers,
David.
Title: lossyWAV Development
Post by: halb27 on 2008-04-18 12:11:18


There is a wish for going lower in bitrate, and due to additional internal quality assuring mechanisms we can do so without going very risky. This however isn't backed up by the initial idea anymore. To me the -4 quality setting still is transparent and yields a bitrate of ~350 kbps. Even when going lower like with -6 (~310 kbps) the rare and subtle inaccuracies are easily acceptable to me.


Thanks for detailed explanations. But is there any difference between lossyWAV and f.i. WavPack lossy ? At high bitrates (> 320-350 kbps) WavPack lossy seems to sound transparent too. Is there any advantages for lossyWAV over WavPack in lossy mode ?

lossyWAV has quality control, whereas WavPack lossy hasn't. With WavPack lossy you give a target bitrate which is internally converted to an accuracy demand for the predictor error. This is not directly related to overall accuracy (because when the predictor is seriously wrong, it takes a high degree of accuracy in the predictor error to reach a good overall accuracy). There's more to it which takes special problems into account, but roughly speaking it's like that.
The disadvantage of lossyWAV compared to WavPack lossy is that a small blocksize of 512 samples is necessary for the lossless part to make good use of the varying bits-to-remove. This, however, makes the lossless codec less efficient. Moreover, David Bryant has implemented effective noise shaping in WavPack lossy. Noise shaping in lossyWAV is work in progress and at the moment is problematic, as it blows up the bitrate of the lossless codec because of added high-frequency hiss at rather high volume.

So at the moment I think that with an average bitrate above ~400 kbps lossyWAV is to be preferred (using -2 or -1) because of the better accuracy control.
At a bitrate of roughly 350 kbps I think both codecs' quality is comparable. Both are expected to be transparent with few exceptions. Maybe the exceptions are a bit fewer with lossyWAV, but that's speculation, and I think we can be very content with both codecs.
At a bitrate below ~300 kbps I think WavPack lossy is preferable because of its more efficient coding, which becomes more and more important the lower we go in bitrate.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-18 20:54:59
Right, I've reassessed the quality presets in light of:

a) discovering the bug in the preparation of the skewing parameters;
b) changing the default number of FFT analyses to 2 lengths for all quality presets;
c) a desire to tighten up the spreading function (same for all quality presets);
d) introducing the floating point quality presets;
e) a desire to increase the quality of the highest quality preset.

So, now the permitted range for quality preset is -0.0000 to -7.0000 (resolution 0.0001).

Corresponding internal settings:
Code: [Select]
  spreading_function_string         : string[29]='22223-22224-12235-12246-12357';
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-12,-8,-4,0,4,8,12,16);
  quality_signal_to_noise_ratio     : array[0..Quality_Presets] of Integer = (30,27,24,21,20,19,18,17);
  quality_clips_per_channel         : array[0..Quality_Presets] of Integer = (0,0,0,1,2,3,3,3);
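Purely as a hypothetical illustration of how fractional presets between these integer columns might be derived (the thread only says fractional presets are possible; this linear-interpolation scheme and the function name are mine, not lossyWAV's actual method):

```python
# -nts and -snr columns for integer presets -0 .. -7, copied from the
# settings listed above.
NTS = [-12, -8, -4, 0, 4, 8, 12, 16]
SNR = [30, 27, 24, 21, 20, 19, 18, 17]

def preset(q: float):
    """Map a fractional quality value in 0.0 .. 7.0 to (-nts, -snr)."""
    if not 0.0 <= q <= 7.0:
        raise ValueError("quality preset out of range")
    lo = int(q)                 # lower integer preset
    hi = min(lo + 1, 7)         # upper integer preset (clamped at -7)
    f = q - lo                  # fractional position between them
    lerp = lambda a, b: a + (b - a) * f
    return lerp(NTS[lo], NTS[hi]), lerp(SNR[lo], SNR[hi])
```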

Resultant bitrates (for my 53 problem sample set):

-0.0: 610kbps; -0.5: 589kbps; -1.0: 566kbps; -1.5: 543kbps; -2.0: 520kbps; -2.5: 496kbps; -3.0: 472kbps; -3.5: 451kbps;
-4.0: 431kbps; -4.5: 412kbps; -5.0: 395kbps; -5.5: 379kbps; -6.0: 364kbps; -6.5: 350kbps; -7.0: 338kbps.

lossyWAV beta v0.9.4 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-04-18 22:34:36
I encoded my regular sample set (which is a bit typical of what at least I encode usually) with 0.9.4 -3 and got an average bitrate of 417 kbps.
As the new -3 roughly corresponds to the old -2 this is a good result IMO.
I also see sense in shifting the meaning of the quality levels a bit in favor of the top quality edge.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-18 22:39:35
I encoded my regular sample set (which is a bit typical of what at least I encode usually) with 0.9.4 -3 and got an average bitrate of 417 kbps.
As the new -3 roughly corresponds to the old -2 this is a good result IMO.
I also see sense in shifting the meaning of the quality levels a bit in favor of the top quality edge.
Excellent!

I've processed my 10 album test set at -1: 520kbps and -7: 300kbps. Over the weekend, I'll process the other integer settings for comparison. I'm listening to Mike Oldfield's Tr3s Lunas at -7 (297kbps) and it's adequate for DAP use (at least).
Title: lossyWAV Development
Post by: Nick.C on 2008-04-19 14:26:53
....I'll process the other integer settings for comparison.
I've finished processing the other integer presets:

FLAC: 854kbps; -0: 573kbps; -1: 520kbps; -2: 467kbps; -3: 417kbps; -4: 376kbps; -5: 343kbps; -6: 318kbps; -7: 300kbps.

-0 results in 2.25GB from 3.35GB original (67.1%); -7 results in 1.17GB from 3.35GB original (35.1%).
Title: lossyWAV Development
Post by: halb27 on 2008-04-19 15:25:21
I just did this as well, for my regular sample set as well as my problem sample set, and the results for the regular music are in places extremely close to yours, Nick:

      regular music   problem samples
-0    566 kbps        633 kbps
-1    518 kbps        596 kbps
-2    467 kbps        554 kbps
-3    417 kbps        509 kbps
-4    372 kbps        467 kbps
-5    335 kbps        427 kbps
-6    305 kbps        392 kbps
-7    281 kbps        360 kbps

This looks nice IMO.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-19 21:18:48
I've finished my FLAC > lossyFLAC transcode at -5a: 298 discs, 3838 tracks, 11d12:56:19, 102GB > 39.5GB, 886kbps > 340kbps.
Title: lossyWAV Development
Post by: botface on 2008-04-20 12:45:30
I know that nobody is soliciting opinions but in case anybody's interested .....

I think it's absolutely right to concentrate on getting the original concept implemented for an initial release and adding noise shaping and anything else in a subsequent release. I also think that giving preference to the higher quality settings is a good idea, but then I am more interested in quality than file sizes. For what it's worth I think LossyWAV is now looking like a much more professional, mature product.

I have converted a number of files using 0.9.4. They were a mixture of 16/44.1 and 24/48. All were sourced from vinyl. Not being very comfortable with the command line, I used IFLCDrop, which only has "1" as the highest quality option available. Using "1", the 16/44.1 files ended up with bitrates ranging from 505kbps to 525kbps. The 24/48 files had a range of 535kbps to 555kbps - as an aside, I guess this gives credence to the argument that using 24 bits for vinyl is overkill, as lossyWAV obviously doesn't think most of the extra bits are storing anything worth keeping. To me these are ideal bitrates, as they nicely fill the gap between existing lossy and lossless.

I am in the process of archiving my vinyl collection, and while I will continue to use lossless for records that are very important to me, I will definitely use lossyWAV for those that are just "nice to have" (especially if someone comes up with a nice, friendly GUI front end).

If there is any interest in developing lossyWAV for use with higher bit depths/sample rates, I'd be very happy to help with testing - bearing in mind that I'm not very technical, just a music lover - as I'd like to be able to contribute in some way
Title: lossyWAV Development
Post by: Nick.C on 2008-04-20 17:13:14
I know that nobody is soliciting opinions but in case anybody's interested .....

I think it's absolutely right to concentrate on getting the original concept implemented for an initial release and adding noise shaping and anything else in a subsequent release. I also think that giving preference to the higher quality settings is a good idea, but then I am more interested in quality than file sizes. For what it's worth I think LossyWAV is now looking like a much more professional, mature product.

I have converted a number of files using 0.9.4. They were a mixture of 16/44.1 and 24/48. All were sourced from vinyl. Not being very comfortable with the command line, I used IFLCDrop, which only has "1" as the highest quality option available. Using "1", the 16/44.1 files ended up with bitrates ranging from 505kbps to 525kbps. The 24/48 files had a range of 535kbps to 555kbps - as an aside, I guess this gives credence to the argument that using 24 bits for vinyl is overkill, as lossyWAV obviously doesn't think most of the extra bits are storing anything worth keeping. To me these are ideal bitrates, as they nicely fill the gap between existing lossy and lossless.

I am in the process of archiving my vinyl collection, and while I will continue to use lossless for records that are very important to me, I will definitely use lossyWAV for those that are just "nice to have" (especially if someone comes up with a nice, friendly GUI front end).

If there is any interest in developing lossyWAV for use with higher bit depths/sample rates, I'd be very happy to help with testing - bearing in mind that I'm not very technical, just a music lover - as I'd like to be able to contribute in some way
We're always looking for opinions / advice / (constructive) criticism, so your comments are very welcome.

As it is, lossyWAV should handle up to 32-bit integer samples, and there is no limit on sample rate. Oh, and it will handle up to 8 channels, though this is an arbitrary limit and could be changed (although I can't imagine a scenario that would require it).

Feedback on the quality of the output at higher sample rates (>=96kHz) and higher bit depths (24-bit or 32-bit) would be very gratefully received!
Title: lossyWAV Development
Post by: Nick.C on 2008-04-20 20:01:39
Taking into account what botface was saying about the higher quality presets, I've had (yet) another think about the basis for the presets. If the -nts and -snr values were changed from:
Code: [Select]
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-12,-8,-4,0,4,8,12,16);
  quality_signal_to_noise_ratio     : array[0..Quality_Presets] of Integer = (30,27,24,21,20,19,18,17);
to
Code: [Select]
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-24,-16,-8,0,4,8,12,16);
  quality_signal_to_noise_ratio     : array[0..Quality_Presets] of Integer = (39,33,27,21,20,19,18,17);

Then the resultant bitrates (for my 53 problem sample set) would change from:

-0.0: 610kbps; -0.5: 589kbps; -1.0: 566kbps; -1.5: 543kbps; -2.0: 520kbps; -2.5: 496kbps; -3.0: 472kbps;

to

-0.0: 717kbps; -0.5: 686kbps; -1.0: 651kbps; -1.5: 611kbps; -2.0: 555kbps; -2.5: 519kbps; -3.0: 472kbps;

with the remaining presets:

-3.5: 451kbps; -4.0: 431kbps; -4.5: 412kbps; -5.0: 395kbps; -5.5: 379kbps; -6.0: 364kbps; -6.5: 350kbps; -7.0: 338kbps

staying the same.

[edit] Oh, I forgot to say: there's an undocumented parameter, currently called -lowpass (the name should maybe be changed - suggestions please) which changes the upper limit of the frequency range used in the spreading-function / minimum value / average value calculations. Permitted values are in the range 13.5kHz to 20.05kHz. [/edit]
Title: lossyWAV Development
Post by: halb27 on 2008-04-20 21:23:05
I prefer the old settings, at least as far as -nts is concerned.
Because of the reduced efficiency of the lossless codec due to the small blocksize, when targeting a lossyFLAC average bitrate of just 380 kbps it happens rather often with non-complex or quiet music that a totally lossless codec yields a lower bitrate than lossyFLAC (that's why I used lossless WavPack instead of lossyFLAC to encode many tracks from my collection which contain classical music or music with few instruments). The higher we choose the bitrate, the more often this happens.
2Bdecided's basic principle is fulfilled with -nts 0, so for additional security -nts -4/-8/-12 should be sufficient IMO even for the most cautious people.

I also like the 16 kHz limit for doing the bits-to-remove calculations. Going lower would reduce the risk that bits-to-remove is 0 for the only reason that there's no (or only low volume) HF content around 16 kHz. But it increases the risk of not taking audible HF noise into account. When going higher it's the other way around.
IMO the best compromise is very much what you are doing right now.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-20 22:13:44
I prefer the old settings, at least as far as -nts is concerned.
Because of the reduced efficiency of the lossless codec at the small blocksize, it happens rather often with non-complex or quiet music, when targeting a lossyFLAC average bitrate of just 380 kbps, that a purely lossless encode yields a lower bitrate than lossyFLAC (that's why I used lossless wavPack instead of lossyFLAC to encode many tracks from my collection which contain classical music or music with few instruments). The higher we choose the bitrate, the more often this happens.
2Bdecided's basic principle is fulfilled with -nts 0, so as an additional security margin -nts -4/-8/-12 should be sufficient IMO even for the most cautious people.

I also like the 16 kHz limit for doing the bits-to-remove calculations. Going lower would reduce the risk that bits-to-remove is 0 for the only reason that there's no (or only low volume) HF content around 16 kHz. But it increases the risk of not taking audible HF noise into account. When going higher it's the other way around.
IMO the best compromise is very much what you are doing right now.
Okay - points taken - reassuring, really.

To satisfy my curiosity, I ran my 53 problem sample set and 10 album test set at what would be quality presets -8, -9 and -10, using the same increments as the existing -3 to -7, i.e. -nts=20,24,28; -snr=16,15,14 respectively, and got:

53 problem sample set: (-7: 338kbps) -8: 318kbps; -9: 303kbps; -10: 292kbps;

10 album test set: (-7: 300kbps) -8: 286kbps; -9: 278kbps; -10: 273kbps;
Title: lossyWAV Development
Post by: halb27 on 2008-04-20 22:28:13
From just careful listening (especially to the problem samples): is -8/-9/-10 acceptable with respect to the still high quality everybody may expect from a >=270 kbps encoding?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-20 22:40:14
From just careful listening (especially to the problem samples): is -8/-9/-10 acceptable with respect to the still high quality everybody may expect from a >=270 kbps encoding?
I think that tomorrow night I'll start a lossyFLAC -10 transcode of my 102GB collection (should take about 6½ hours or so) and I'll listen to it over a couple of days until:

a) I am content that it satisfies the requirement;

or

b) I am not content, therefore transcode to -9 (then -8 if necessary) and repeat the longer term listening test.

Or, maybe I should leave the quality presets alone and use -snr and -nts to "roll-my-own" without pushing lossyWAV too far....?
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-21 10:47:44
I prefer the old settings, at least as far as -nts is concerned.
Because of the reduced efficiency of the lossless codec at the small blocksize, it happens rather often with non-complex or quiet music, when targeting a lossyFLAC average bitrate of just 380 kbps, that a purely lossless encode yields a lower bitrate than lossyFLAC.
Do you think it would be worth checking for this automatically?

If found you could increase the blocksize specified to the lossless encoder when encoding the lossyWAV output (e.g. 1024, 2048 etc), and if that didn't help you could just keep the lossless version?

The latter would be really easy if you were transcoding from FLAC (or whatever) lossless anyway (if the lossyWAV program can find the files). Otherwise, I can't see any way of doing the checking other than encoding the file to lossless to see how big it comes out - which is going to double or triple the time spent on lossless encoding! Thoughts?
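One low-tech way to automate the check described here, sketched in Python (the function name is hypothetical, and it assumes both encodes already exist on disk): compare the on-disk sizes and keep whichever file is smaller, preferring the true-lossless file on a tie.

```python
import os

def pick_smaller(lossy_path, lossless_path):
    """Return the path of the smaller encode; on a tie keep the
    lossless one, since it is bit-exact anyway."""
    if os.path.getsize(lossy_path) < os.path.getsize(lossless_path):
        return lossy_path
    return lossless_path
```

This sidesteps the double-encoding cost only in the transcoding-from-lossless case, where a lossless copy already exists.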

Cheers,
David.
Title: lossyWAV Development
Post by: botface on 2008-04-21 17:06:08
Taking into account what botface was saying about the higher quality presets, I've had (yet) another think about the basis for the presets. If the -nts and -snr values were changed from:
Code: [Select]
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-12,-8,-4,0,4,8,12,16);
  quality_signal_to_noise_ratio     : array[0..Quality_Presets] of Integer = (30,27,24,21,20,19,18,17);
to
Code: [Select]
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-24,-16,-8,0,4,8,12,16);
  quality_signal_to_noise_ratio     : array[0..Quality_Presets] of Integer = (39,33,27,21,20,19,18,17);

Then the resultant bitrates (for my 53 problem sample set) would change from:

-0.0: 610kbps; -0.5: 589kbps; -1.0: 566kbps; -1.5: 543kbps; -2.0: 520kbps; -2.5: 496kbps; -3.0: 472kbps;

to

-0.0: 717kbps; -0.5: 686kbps; -1.0: 651kbps; -1.5: 611kbps; -2.0: 555kbps; -2.5: 519kbps; -3.0: 472kbps;

with the remaining presets:

-3.5: 451kbps; -4.0: 431kbps; -4.5: 412kbps; -5.0: 395kbps; -5.5: 379kbps; -6.0: 364kbps; -6.5: 350kbps; -7.0: 338kbps

staying the same.
[edit] Oh, I forgot to say: there's an undocumented parameter, currently called -lowpass (the name should maybe be changed - suggestions please) which changes the upper limit of the frequency range used in the spreading-function / minimum value / average value calculations. Permitted values are in the range 13.5kHz to 20.05kHz. [/edit]

Nick,
      It looks like changing the default -snr and -nts as per your thought will result in bitrates not much lower than lossless, which seems a bit pointless. Anyway, a very cautious person could use higher settings if they want to as it stands.

Re the undocumented feature, a snappy name doesn't spring to mind, but given my interest in higher sample rates could it be useful there? Maybe you could even have different defaults and allowable ranges based on the sample rate of the source file.

Speaking of higher sample rates, I volunteered to do some testing yesterday. It turns out that I can't produce 32-bit integer, and since we know that lossyWAV works quite happily with 24-bit, is there anything worthwhile I could test? I'd be more than willing to do some testing on 24/48, 24/64, 24/88.2 and 24/96 if you think it might tell you something you don't already know. If you do want me to do some testing, could you suggest the kind of music that's most likely to reveal problems? I'd assume acoustic instruments would be good, especially those that can sustain notes over a long period, e.g. violin, whistle, pipes etc.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-21 21:12:42
If the -nts and -snr values were changed from:
Code: [Select]
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-12,-8,-4,0,4,8,12,16);
to
Code: [Select]
  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Integer = (-24,-16,-8,0,4,8,12,16);

Funny, I would rather suggest (-6,-4,-2,0,2,4,6,8). Usually there is a sweet spot (the lowest bit rate with no audible artifacts); if you move far away from it, at least theoretically you become either inefficient or get audible differences.
But as your testing indicates that the "low" quality is still acceptable, I can see how a gliding scale from lossless to lossy would fill all the bit rate gaps at the same time.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-21 21:28:41
From just careful listening (especially to the problem samples): is -8/-9/-10 acceptable with respect to the still high quality everybody may expect from a >=270 kbps encoding?
I managed to listen to my 53 problem sample set at -10 and -9 today - definitely too much hiss at -10 (only for certain samples, however) and it is still noticeable (and annoying, although I'm deliberately listening out for it) at -9. I will try to listen to -8 tomorrow.

@Botface: I don't really know which samples / tracks will be best for evaluation purposes - although I would guess that as Gurubooleez has a library of 150 instrumental samples, instrumental will be useful - I think harpsichord is a difficult type to retain transparency for.

@GeSomeone: I'm going to stick with -12..16 (step 4) for the -nts values. The change from integer to float for the quality preset selection has made selection of intermediate points much simpler than manual -snr and -nts selection.
Title: lossyWAV Development
Post by: halb27 on 2008-04-22 08:37:21
I prefer the old settings, at least as far as -nts is concerned.
Because of the reduced efficiency of the lossless codec at the small blocksize, it happens rather often with non-complex or quiet music, when targeting a lossyFLAC average bitrate of just 380 kbps, that a purely lossless encode yields a lower bitrate than lossyFLAC.
Do you think it would be worth checking for this automatically?

If found you could increase the blocksize specified to the lossless encoder when encoding the lossyWAV output (e.g. 1024, 2048 etc), and if that didn't help you could just keep the lossless version?

The latter would be really easy if you were transcoding from FLAC (or whatever) lossless anyway (if the lossyWAV program can find the files). Otherwise, I can't see any way of doing the checking other than encoding the file to lossless to see how big it comes out - which is going to double or triple the time spent on lossless encoding! Thoughts?

Cheers,
David.

I noticed this kind of "issue" when I encoded my entire collection a few months ago. The lossless version in my archive was encoded with TAK then and I just compared the lossyFLAC file size with that of the TAK files.
I then tried lossless FLAC but noticed that in these cases with quiet and/or non-complex music where TAK was very efficient the lossless FLAC -8 efficiency was clearly inferior. So in these cases lossyFLAC suffers not only from the small blocksize but also from the relatively low performance of FLAC. So I used lossless wavPack which I can play on my DAP and which gives efficient results in these cases.

I'm afraid there will be no automatic solution, but IMO it's not really a problem. Most people are expected to transcode from a lossless codec, so a lossyFLAC file size increase can be easily seen. Moreover, an easy and very straightforward approach can be used, an approach like "don't care if the lossyFLAC bitrate goes up, because it happens rarely" (very adequate for people who usually listen to rock/pop or similar music) or "for solo instrument music and non-complex classical music avoid using lossyFLAC and go straight to an appropriate lossless codec" (adequate for lovers of solo instrument music or similar).

After all these cases say "a lossless codec is very efficient here" more than "lossyFLAC is bad here".
Title: lossyWAV Development
Post by: halb27 on 2008-04-22 09:03:21
From just careful listening (especially to the problem samples): is -8/-9/-10 acceptable with respect to the still high quality everybody may expect from a >=270 kbps encoding?
I managed to listen to my 53 problem sample set at -10 and -9 today - definitely too much hiss at -10 (only for certain samples, however) and it is still noticeable (and annoying, although I'm deliberately listening out for it) at -9. I will try to listen to -8 tomorrow.

@Botface: I don't really know which samples / tracks will be best for evaluation purposes - although I would guess that as Gurubooleez has a library of 150 instrumental samples, instrumental will be useful - I think harpsichord is a difficult type to retain transparency for.

@GeSomeone: I'm going to stick with -12..16 (step 4) for the -nts values. The change from integer to float for the quality preset selection has made selection of intermediate points much simpler than manual -snr and -nts selection.

There was a time when I thought -nts is what should be used for a more refined quality setting, and -1/-2/-3 is what we should have as the primary quality setting.

But now that we have arrived at these many primary quality settings I think this is the better way to go, but in this case we should concentrate on the primary quality setting and hide details like -nts from the user in the final version (or divide them off clearly into the advanced options).

From just bitrate/expected quality I like your -0 to -7 settings of 0.9.4 very much (or maybe an additional -8 in case that's useful).
It's a clear thing (especially when omitting the a and b appendix, which IMO should go into the advanced options as well [as something like -analyses n]).
IMO the default quality should be -3 (in 0.9.4 scale) and we should clearly state that -0 and -1 are useful only as an extremely high quality lossy substitution for a lossless encoding.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-22 12:24:05
There was a time when I thought -nts is what should be used for a more refined quality setting, and -1/-2/-3 is what we should have as the primary quality setting.

But now that we have arrived at these many primary quality settings I think this is the better way to go, but in this case we should concentrate on the primary quality setting and hide details like -nts from the user in the final version (or divide them off clearly into the advanced options).

From just bitrate/expected quality I like your -0 to -7 settings of 0.9.4 very much (or maybe an additional -8 in case that's useful).
It's a clear thing (especially when omitting the a and b appendix, which IMO should go into the advanced options as well [as something like -analyses n]).
IMO the default quality should be -3 (in 0.9.4 scale) and we should clearly state that -0 and -1 are useful only as an extremely high quality lossy substitution for a lossless encoding.
I will move -nts and -snr to the advanced settings section and implement another advanced setting -analyses <n> per your suggestion, while at the same time removing the a,b or c suffix to the quality preset parameter.

I will also add a -8 parameter for this stage of testing (-nts=20; -snr=16).

lossyWAV beta v0.9.5 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: jesseg on 2008-04-22 20:39:57
It's really amazing to see lossyWAV get to where it's at now, and thanks to everyone helping out with this, from the original idea to just the "fans".  I really like where it's come to by this point, and thanks to Nick especially of course, and for almost always knowing which "whims" were good to explore, and which ones to hold off on.  Without that wisdom, we would still be back before v0.50  =)

I agree with the advanced settings in the help, and it might be a good thing even to leave the advanced stuff completely out of the normal help which comes up with, for instance:
Code: [Select]
lossyWAV
lossyWAV -help
and go with something like LAME, and have a
Code: [Select]
lossyWAV -longhelp
for the advanced stuff; that way only one line is needed to mention that there's more help if you use that parameter, but it's not crucial that the information is seen for "normal" and "safe" use of lossyWAV.
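A minimal sketch of that two-tier help scheme (illustrative Python, not the actual Delphi code; the option text is made up):

```python
import sys

BASIC = [
    "lossyWAV <input.wav> [options]",
    "  -q <n>     quality preset",
]
ADVANCED = [
    "  -nts <n>   noise threshold shift (advanced)",
    "  -snr <n>   minimum signal-to-noise ratio (advanced)",
]

def help_text(longhelp=False):
    """Short help by default; -longhelp appends the advanced options."""
    return "\n".join(BASIC + (ADVANCED if longhelp else []))

if __name__ == "__main__":
    print(help_text(longhelp="-longhelp" in sys.argv[1:]))
```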

It also would be nice to get some graphics gurus in here, to do up a right proper logo for this.  Perhaps some submissions from the likes of deviant art or the like?  I've never used those types of sites much, but there should be a few things out there that would support that kind of thing.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-22 21:01:29
It's really amazing to see lossyWAV get to where it's at now, and thanks to everyone helping out with this, from the original idea to just the "fans".  I really like where it's come to by this point, and thanks to Nick especially of course, and for almost always knowing which "whims" were good to explore, and which ones to hold off on.  Without that wisdom, we would still be back before v0.50  =)

I agree with the advanced settings in the help, and it might be a good thing even to leave the advanced stuff completely out of the normal help which comes up with, for instance:
Code: [Select]
lossyWAV
lossyWAV -help
and go with something like LAME, and have a
Code: [Select]
lossyWAV -longhelp
for the advanced stuff; that way only one line is needed to mention that there's more help if you use that parameter, but it's not crucial that the information is seen for "normal" and "safe" use of lossyWAV.

It also would be nice to get some graphics gurus in here, to do up a right proper logo for this.  Perhaps some submissions from the likes of deviant art or the like?  I've never used those types of sites much, but there should be a few things out there that would support that kind of thing.
Where lossyWAV is now is a result of input, feedback and constructive criticism - without any of these, it would not be the utility it is. Thanks are due to all who have bothered to contribute in any way.

I really like the -help and -longhelp parameter suggestions (or maybe combine -help -detail to get the more detailed help). I'll get busy writing the explanatory text to go behind them and will "hide" the advanced options so that they are not shown when just "lossyWAV" is used as a command line.

I haven't had any success integrating an icon into the Delphi binary - maybe I'm missing something, maybe it's a feature that is missing from the free version of Turbo Delphi.
Title: lossyWAV Development
Post by: jesseg on 2008-04-22 21:16:35
As long as the executable has a resource section (yours do already, for version info) then you can use one of several resource editors to import a new .ico into the executable, and they should automatically create an icon group resource too.

Here's two of my favs, both are free:
http://www.wilsonc.demon.co.uk/d10resourceeditor.htm (http://www.wilsonc.demon.co.uk/d10resourceeditor.htm)
http://www.angusj.com/resourcehacker/ (http://www.angusj.com/resourcehacker/)

They are both useful because each of them has its own quirks. I recommend using XN whenever you can, though. Also useful is LordPE or PEiD, for rebuilding your executables. That also cuts down on the file size slightly.

And of course there's exe compressors. UPX isn't bad (I use a custom-made "uber-brute-force" version which packs each half-KB section with multiple types, then goes back into previous sections, to figure out what lengths and what compression types will give maximum compression; and yes, it's VERY slow, especially with executables over 1MB).

For an executable under 100KB I would recommend FSG, but it is picked up as a false alarm by three unpopular AV scanners (because their developers are too lazy to add FSG unpacking support to their AV, I guess, even though source code for doing so is publicly available in C and assembly).
http://programmerstools.org/node/50 (http://programmerstools.org/node/50)

But yeah, without packing I'm usually able to get lossyWAV down to around 80KB... with the lossyWAV icon i made, which has all 4 sizes.  If you stripped it down to just 32x32 size, it could get even smaller.
Title: lossyWAV Development
Post by: jesseg on 2008-04-22 21:30:06
Here's an example of one with the icon, and rebuilt.... as well as FSG and my UPX uber-brute.  My UPX actually beats FSG on this one, so I no longer suggest using FSG, but UPX, if you pack it at all.  See attached.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-22 21:34:55
As long as the executable has a resource section (yours do already, for version info) then you can use one of several resource editors to import a new .ico into the executable, and they should automatically create an icon group resource too.

Here's two of my favs, both are free:
http://www.wilsonc.demon.co.uk/d10resourceeditor.htm (http://www.wilsonc.demon.co.uk/d10resourceeditor.htm)
http://www.angusj.com/resourcehacker/ (http://www.angusj.com/resourcehacker/)

They are both useful because each of them has its own quirks. I recommend using XN whenever you can, though. Also useful is LordPE or PEiD, for rebuilding your executables. That also cuts down on the file size slightly.
Hehehe.... What fun, adding an icon at this late stage! I downloaded XN Resource Editor and have had a preliminary play about - it's *really* easy. I still like your fading vertical bit cascade with zeroed lsb's at the bottom as a background - not too sure about the font though.
Title: lossyWAV Development
Post by: botface on 2008-04-23 10:09:52
With regard to help/advanced help/usability etc I think it makes sense to keep standard options to a minimum. Ideally just a quality level needing to be specified. Not only does that keep things simple, it also means that non-techies like me only need to worry about the end result they're looking for and can rest assured that sensible settings have been used by default for all important parameters. Personally, I'd like to see quality settings going from lowest to highest (1 - 9) rather than the other way round, but that's only a personal feeling.

On help specifically, I'd like to see some mention of how the various values of any setting are going to affect the end result. EG :
"-nts <n>      set noise_threshold_shift to n dB (-48.0dB<=n<=+36.0dB);
              (-ve values reduce bits to remove, +ve values increase)."
is fine as far as it goes, but it might be more helpful if it went on to say something like "So, the higher the negative value used, the lower the likelihood of noise being introduced, but at the expense of a higher file size after processing through a lossless codec, whereas the higher the positive value ......... etc". Maybe you feel that's over the top for help; if so, maybe it could go in the wiki. By the way, the wiki is a bit out of date at the moment.
Title: lossyWAV Development
Post by: halb27 on 2008-04-23 11:26:45
I'm also very fond of simplicity and clarity.
I support the suggestion that the advanced options should only be mentioned within something like a longhelp.

Thinking about the advanced options IMO only -nts and -analyses should be used.

It should be mentioned that there is no need to explicitly use -nts or -analyses as these are taken care of by the quality levels.

My suggestion for the -nts <n> description:

-nts 0 yields transparency according to experience.
A negative <n> adds a security margin (and increases file size) which is supposed to be overkill but maybe welcome when lossyWAV is used as a substitute for lossless archiving or similar applications with an extremely high quality demand.
A positive <n> yields a smaller filesize but adds the risk of audible deviations from the original. Due to additional internal precautions however a small <n> like
-nts 4 is expected not to harm transparency.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-23 15:50:17
Personally, I'd like to see quality settings going from lowest to highest (1 - 9) rather than the other way round
I'd just logged on to suggest exactly this!

Now it's grown from -1 -2 -3 to a non-integer 0-10 scale, I think it might make sense to tie it into a scale that people already understand. The obvious one for me is the one Vorbis uses - Q5 is transparent, lower might not be, higher is overkill / safety margin. Others may have other suggestions.

I apologise for not suggesting this sooner!

Cheers,
David.
Title: lossyWAV Development
Post by: Raiden on 2008-04-23 16:47:06
non-integer 0-10 scale

Lame uses that scale, too...
Title: lossyWAV Development
Post by: Dynamic on 2008-04-23 18:26:41
Personally, I'd like to see quality settings going from lowest to highest (1 - 9) rather than the other way round
I'd just logged on to suggest exactly this!

Now it's grown from -1 -2 -3 to a non-integer 0-10 scale, I think it might make sense to tie it into a scale that people already understand. The obvious one for me is the one Vorbis uses - Q5 is transparent, lower might not be, higher is overkill / safety margin. Others may have other suggestions.


Yes indeed, if you use "q" or "Q" for quality, this seems eminently sensible as -q 5.0 is also the "standard" transparent setting for Musepack, and increasing quality should correspond to an increasing number.

Conversely, LAME VBR (because MP3 isn't necessarily a VBR format) uses -V (not -Q) and here, 0 is the highest quality and bitrate while 9 is the lowest, so people most familiar with LAME might not get the expected behaviour. This discrepancy has always been true of different JPEG image apps, some using discrete settings, some using a "Quality" (0 is worst quality) scale and some using a "Compression" scale (0 is best quality), none of which seemed to correspond very closely to the scale in different apps.

Your original scale for the betas released to date corresponds to the degree of "loss" or "compression" allowed, and oddly enough, with 2 or 3 being equivalent to the "transparent" standard, it corresponds rather closely to LAME's current VBR scale.

Regardless of what you choose, I'd suggest that if you're calling it "quality" it should be a "0 is worse quality than 9" type of scale, and if you're calling it "loss" or "compression" it should be a "0 is better quality than 9" type of scale. Given that "constant quality" is what VBR is all about, my vote is for calling the scale quality and reversing from where you are now.

Happily, if "-q 5.00" in future corresponds to the current -2 or -3 transparent setting, then "-q 0.00" or "-q 1.00" would pretty-much correspond to the current -7 or -8 (one of which you'll decide is the lowest acceptable quality for low-battery portable use).
Title: lossyWAV Development
Post by: halb27 on 2008-04-23 19:09:45
Though I personally don't care much about whether the quality scale goes up or down I like this idea of having a quality scale analogous to that of Vorbis as Dynamic describes it:

-q 0 = -8
-q 1 = -7
-q 2 = -6
-q 3 = -5
-q 4 = -4
-q 5 = -3
-q 6 = -2
-q 7 = -1
-q 8 = -0

This way I think everybody familiar with Vorbis gets an immediate and intuitive feeling for the meaning of the quality setting.

I was thinking about the advanced options again, and with these differentiated quality scales I think we should drop -nts from the user interface for the final version. IMO just -analyses should make it into the advanced options.
Title: lossyWAV Development
Post by: Dynamic on 2008-04-23 19:44:04
On a side note, presumably the same quality scale or loss scale as you decide upon for the release version of lossyWAV could be used in any future dedicated hybrid lossy encoder based on the same kind of analysis as lossyWAV (if anybody considers it worth developing - see last paragraph).

For example, if Wavpack or FLAC had a "constant-quality" or "VBR" lossy mode based on the same type of analysis as lossyWAV, then instead of using it to zero the LSBs over a whole block it could be used to define the maximum allowable prediction residual error that should remain in the audio. That could be done by defining the bit-depth of the residuals that get stored (probably the easiest method) or defining the maximum allowable error in the residuals and choosing those residuals in some other way. (The limited bit depth of the predictor, or metadata within the file header or block header might be an indicator of lossy processing, but I guess it's much harder to spot than zeroed LSBs when decoded to PCM, but that's always going to be a problem with other types of lossy, such as MP2, MP3, AAC, Vorbis, ADPCM and the like).

Actually, unless I'm missing something, I guess an encoder like that could retain long block lengths for predictor efficiency but could still use the shorter overlapping FFT analysis windows like lossyWAV to define the allowable uncoded prediction residual error at various times within the block. It might even be possible to continuously (smoothly) vary the allowable error from sample to sample within the block to follow the profile of permissible noise given by the FFT windows that overlap on that sample according to some interpolation or in proportion to the value of each lapping window function centred around each FFT window's centre sample.
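As a rough illustration of the per-sample interpolation idea (a sketch only; the function and its inputs are hypothetical): given allowable-noise estimates at the centres of the overlapping FFT windows, each sample's allowable error can be linearly interpolated between the two nearest centres.

```python
def allowable_error(sample_idx, centres, floors):
    """Linearly interpolate a per-sample allowable error from estimates
    ('floors') given at the sample indices of overlapping FFT window
    centres ('centres', sorted ascending)."""
    if sample_idx <= centres[0]:
        return floors[0]
    if sample_idx >= centres[-1]:
        return floors[-1]
    for i in range(len(centres) - 1):
        lo, hi = centres[i], centres[i + 1]
        if lo <= sample_idx <= hi:
            t = (sample_idx - lo) / (hi - lo)
            return floors[i] * (1 - t) + floors[i + 1] * t
```

A real implementation would more likely weight by the lapped analysis window shapes, as suggested above, rather than interpolate linearly.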

Obviously, the predictor's value is dependent on the previous samples, which are now different thanks to the permitted error, and this may worsen the prediction slightly (but this hasn't stopped Wavpack lossy from creating remarkable bitrate reductions with remarkable quality).

Both approaches based on this analysis method hold out great hope for transparent or high quality lossy audio with fairly modest bitrate and relatively low decoding complexity and a closely equivalent quality scale that could show any bitrate savings between methods quite accurately. A correction file for restoring true lossless is compatible with either method (unless you get into serious noise shaping and it becomes too large to use).

LossyWAV certainly delivers most of the possible gain in compression and it is compatible with a number of well-supported lossless codecs completely unchanged (with the option of converting from, say WavPack to FLAC according to support on the playback target without further audio loss), so the possible efficiency advantage of the second approach may be a case of diminishing returns and reduced flexibility regarding re-encoding to another codec. It would be even worse in terms of waiting for decoder support if such an implementation were no longer compatible with existing FLAC or Wavpack decoders, especially those embedded into playback devices.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-23 20:09:20
So many posts to take in - with so many valid observations / ideas / comments / etc....

How about:

-0..-8 > -q 10 .. -q 0?

i.e. -0 > -q 10.0; -1 > -q 8.3333; -2 > -q 6.6667; -3 > -q 5.0; -4 > -q 4.0; -5 > -q 3.0; -6 > -q 2.0; -7 > -q 1.0; -8 > -q 0.0?
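That mapping is piecewise linear: the old presets -0..-3 are spread over q 10..5, and -3..-8 then map one-for-one onto q 5..0. A sketch (hypothetical function name):

```python
def old_preset_to_q(n):
    """Map an old -<n> quality preset (0 = best) to the proposed
    -q scale (10 = best), matching the figures quoted above."""
    if n <= 3:
        return 10.0 - n * (5.0 / 3.0)   # -0..-3  ->  q10..q5
    return 8.0 - n                      # -4..-8  ->  q4..q0
```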

-snr and -nts could be removed from the user interface in v1.0.0, along with -noclips (perhaps).
Title: lossyWAV Development
Post by: halb27 on 2008-04-23 20:55:50
That's fine, IMO, and, yes, I forgot about -noclips: I'd like to see -noclips in the advanced options.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-23 21:39:28
That's fine, IMO, and, yes, I forgot about -noclips: I'd like to see -noclips in the advanced options.
I'll get to work on beta v0.9.6 tomorrow (I've been installing another RAID card in my server and moving drives about this evening....).

The focus for beta v0.9.6 will be to implement the -q <n> parameter and remove the -<n> parameter, to significantly simplify the basic settings and to introduce the -help and -help -detail parameters / combination to give basic help (beyond that given by running lossyWAV with no parameters) and advanced help (with the advanced settings added).
Title: lossyWAV Development
Post by: robert on 2008-04-23 22:13:14
Conversely, LAME VBR (because MP3 isn't necessarily a VBR format) uses -V (not -Q) and here, 0 is the highest quality and bitrate while 9 is the lowest, so people most familiar with LAME might not get the expected behaviour. This discrepancy has always been true of different JPEG image apps, some using discrete settings, some using a "Quality" (0 is worst quality) scale and some using a "Compression" scale (0 is best quality), none of which seemed to correspond very closely to the scale in different apps.

Your original scale for the betas released to date corresponds to the degree of "loss" or "compression" allowed, and oddly enough, with 2 or 3 being equivalent to the "transparent" standard, it corresponds rather closely to LAME's current VBR scale.
It didn't surprise me to see lossyWAV doing it the same way as we did with LAME. We share the idea that we have to add more noise to get smaller files or a higher compression ratio. And the question is how much you would like to distort your input signal.

Quote
Regardless of what you choose, I'd suggest that if you're calling it "quality" it should be a "0 is worse quality than 9" type of scale, and if you're calling it "loss" or "compression" it should be a "0 is better quality than 9" type of scale.
So, you would prefer to fly 9th class over 1st class? In school I would prefer to get the grade 1 (best) over 6 (worst). I don't think higher quality is naturally associated with higher numbers; it depends on your social context.

Quote
Given that "constant quality" is what VBR is all about, my vote is for calling the scale quality and reversing from where you are now.
Well, by choosing any switch, the user can only degrade quality by increasing the number of bits to remove. Or did I miss a quality enhancement switch?

Don't get me wrong, I'm fine with whatever quality/compression scheme Nick wants lossyWav to have.
Title: lossyWAV Development
Post by: halb27 on 2008-04-24 12:36:07
Instead of using Vorbis' -q n quality scale we could use Lame's -V n quality scale of course.
It's all a matter of taste.
I personally prefer the Vorbis analogy, not because of the scale direction, which doesn't matter to me at all, but because I have a positive feeling towards the correspondence of -3 with -q5 and the corresponding consequences for the other quality settings. Such a -q5 can be considered transparent with a probability extremely close to 1, and from -q6 on there is an ever increasing safety margin, with a large safety margin range to choose from.
With the Lame analogy I see problems. Which -V setting should correspond to -3? It would have to be -V3 or worse quality-wise in order to have our -2 to -0 correspond with higher -V settings.
I feel more comfortable having -3 correspond with Vorbis -q5 than with Lame -V3.
Moreover, because of lossyWAV's properties I think it's good to place some emphasis on the extremely high quality settings (useful for high quality lossy archiving, for instance).
With Nick's last proposal we have a lot of -q levels which deal with this high end demand (while we still have a lot of -q settings dealing with the lower end).
With the Lame -V levels it wouldn't be like that (or only if we let lossyWAV -3 correspond with something like -V5 which I think isn't very adequate).
Title: lossyWAV Development
Post by: robert on 2008-04-24 13:02:23
I don't see the point, why should any lossyWav setting match any Vorbis or LAME setting? If you say lossyWav -3 matches Vorbis -q5, why should I use lossyWav at all? If both are equal in quality, I would choose the smaller files. LossyWav wanted to fill a gap between lossless and other lossy encodings, if you want to pick up the vorbis scale, shouldn't it be from -q5(?) to -q20 then?? And no, I'm not proposing to choose a LAME alike scale, it's just, the lossyWav original settings made sense to me and I don't see any need to change that. Just my two cents.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-24 13:43:07
-snr and -nts could be removed from the user interface in v1.0.0, along with -noclips (perhaps).

I'd like to keep -nts (in the advanced category of course); it is the most meaningful parameter to tweak (apart from -q n).
I'm neutral on the -quality vs. -n scale, but if you want to change it, now is the time (before the first "final/stable"). It would be nice to know the settings of the integer values of whatever scale is chosen (nts, snr).

BTW, does the default still correspond to -2? I suggest moving the default to -3 or -4 (of the old scale).
Title: lossyWAV Development
Post by: carpman on 2008-04-24 13:54:46
I don't see the point, why should any lossyWav setting match any Vorbis or LAME setting? ... LossyWav wanted to fill a gap between lossless and other lossy encodings, if you want to pick up the vorbis scale, shouldn't it be from -q5(?) to -q20 then?? .....  it's just, the lossyWav original settings made sense to me and I don't see any need to change that. Just my two cents.


Agree entirely. That's my two cents.

C.
Title: lossyWAV Development
Post by: halb27 on 2008-04-24 13:58:26
It's pure emotion, no real reason. lossyWAV -3 quality isn't the same as Vorbis -q5's of course.
To me - maybe only to me - it's like this:
If I used Vorbis and strove for transparency in a robust way while trying not to waste file size, I'd be fine with -q5 (of course a matter of taste; -q4 or -q6 are candidates for an appropriate setting as well). If I wanted an additional safety margin (maybe for archiving purposes) I'd better use -q6 or higher.
With the Lame -V settings as an alternative there's simply not so much room for various high end settings (also a matter of taste).

Sure, the analogies, whether with Vorbis or Lame, have their drawbacks, as they may suggest that we get a quality at ~400 kbps that we could have at ~200 kbps or below using Vorbis or Lame.

Despite this my personal preference is still with the Vorbis-like scale, but the many words I've used to try to make that understood are misleading: after all, I don't care much about it. I'm also happy with the original lossyWAV scale.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-24 13:58:47
-snr and -nts could be removed from the user interface in v1.0.0, along with -noclips (perhaps).
I'd like to keep -nts (in the advanced category of course); it is the most meaningful parameter to tweak (apart from -q n).
I'm neutral on the -quality vs. -n scale, but if you want to change it, now is the time (before the first "final/stable"). It would be nice to know the settings of the integer values of whatever scale is chosen (nts, snr).
I could be persuaded to leave -nts in the advanced options....

[edit] Throughout the development of lossyWAV, -1, -2 and -3 have always been called quality presets. Yes, I agree that 1st class is better than 2nd class, but where does 0th class fit in (it doesn't exist in normal speech)? So, I've gone for a quality-increases-with-value-of-numerical-preset approach, on a scale of 0 to 10. Having moved from -1, -2 and -3 to -1 to -7, it seems a logical progression to allow 100,000 quality preset options between -q 0.0 and -q 10.0 with a 0.0001 resolution, rather than the original 3. This will allow the user to choose a personal transparency level much more easily than if they had to juggle -nts and -snr manually. Maybe some explanation will need to be added to the wiki with comparisons with previous preset bitrates. [/edit]

I've implemented the -q 0 to 10 quality preset selection and have had a thought. Up until now, the maximum bits-to-remove has been limited to (rms-value-of-all-samples-in-codec-block - 3). I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0. This would increase the output of my 53 problem sample set from 611kbps to 616kbps at -q 10 (-nts -12 -snr 30) and from 472kbps to 482kbps at -q 5 (-nts 0 -snr 21).
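The proposed limit can be sketched as follows. This is an illustrative Python sketch, not the actual lossyWAV (Pascal) code; the function name and the representation of the codec block's RMS value in bits are assumptions for illustration.

```python
import math

def max_bits_to_remove(rms_bits: float, quality: float) -> int:
    """Sketch of the proposed per-codec-block limit on bits-to-remove.

    rms_bits: RMS value of the codec block, expressed in bits.
    quality:  the -q preset value, 0.0 .. 10.0.

    The existing limit keeps 3 bits below the RMS value; the proposal
    raises that floor with quality: 3 + quality / 4, i.e. keep 5.5
    bits at -q 10 rather than 3.0.
    """
    min_bits_to_keep = 3.0 + quality / 4.0
    return max(0, math.floor(rms_bits - min_bits_to_keep))
```

So a block whose RMS value corresponds to 12 bits could lose up to 9 bits at -q 0 but only 6 at -q 10, making quiet passages progressively more conservative at the higher quality presets.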
Title: lossyWAV Development
Post by: halb27 on 2008-04-24 14:05:30
I've implemented the -q 0 to 10 quality preset selection and have had a thought. Up until now, the maximum bits-to-remove has been limited to (rms-value-of-all-samples-in-codec-block - 3). I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0. This would increase the output of my 53 problem sample set from 611kbps to 616kbps at -q 10 (-nts -12 -snr 30) and from 472kbps to 482kbps at -q 5 (-nts 0 -snr 21).

I like the idea.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-24 14:15:25
I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0.

If I understand correctly, this would only matter for the really quiet parts (or tracks).
Your proposal would increase the bit rates overall (for tracks with quiet passages); how about evening that out by lowering the first constant to (e.g.) 2?  -q 0 => 2+0, -q 5 => 2+1.25, -q 10 => 2+2.5?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-24 14:22:33
I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0.
If I understand correctly, this would only matter for the really quiet parts (or tracks).
Your proposal would increase the bit rates overall (for tracks with quiet passages); how about evening that out by lowering the first constant to (e.g.) 2?  -q 0 => 2+0, -q 5 => 2+1.25, -q 10 => 2+2.5?
I'm trying it, but - all of the recent ABX'ing has been done with a minimum of 3 bits kept - so I am reluctant to change the lower limit....

[edit] At -q 0: 2.0 = 306kbps; 2.5 = 310kbps; 2.75 = 313kbps; 3.0 (existing) = 318kbps.

Maybe this could be an advanced option instead, i.e. -minbits <n> would allow the user to add n bits to the minimum_bits_to_keep value at -q 10, n/2 at -q 5, 0 at -q 0, etc. [/edit]
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-24 14:43:36
all of the recent ABX'ing has been done with a minimum of 3 bits kept - so I am reluctant to change the lower limit....

In my example only -q 0-3 would get slightly lower minimum bits to keep; -q 4 and up would still have at least 3. It was just an idea: introduce more variability, but try not to bloat the "default" bit rate without an ABXable reason. (You can also change the constant to 2.25 or 2.5 if you think it is necessary.)
Title: lossyWAV Development
Post by: Nick.C on 2008-04-24 14:46:57
all of the recent ABX'ing has been done with a minimum of 3 bits kept - so I am reluctant to change the lower limit....
In my example only -q 0-3 would get slightly lower minimum bits to keep; -q 4 and up would still have at least 3. It was just an idea: introduce more variability, but try not to bloat the "default" bit rate without an ABXable reason. (You can also change the constant to 2.25 or 2.5 if you think it is necessary.)
Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?
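A minimal sketch of that suggestion, assuming a hypothetical -minbits <n> switch (the option name and the helper function are illustrative; the 0..8 clamping range and default of 3 come from the post above):

```python
from typing import Optional

def min_bits_to_keep(requested: Optional[float] = None,
                     default: float = 3.0) -> float:
    """Hypothetical handling of a user-selectable minimum-bits-to-keep:
    clamp any user-supplied value to the 0..8 range, and fall back to
    the current default of 3 when no value is given."""
    if requested is None:
        return default
    return min(8.0, max(0.0, float(requested)))
```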
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-24 14:56:47
I think it does make sense to try to go with something like an existing quality scale, rather than inventing yet another one.

I was going to suggest keeping nts, but I thought that changes from the original algorithm meant that the effects of wild changes to nts were bounded by other parameters - so really, if you want a given effect, you either change all those parameters, or use a quality pre-set that does it for you. So I don't think nts has to stay in a stable release, if the quality pre-sets are well tested.

Cheers,
David.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-24 15:07:46
Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?

Alright with me, but that is not the same idea, i.e. non-scaling and an extra advanced setting.
Title: lossyWAV Development
Post by: halb27 on 2008-04-24 17:53:31
Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?

That's ok for me, too.

Now that the encoder has changed a bit I'd like to do another listening test. Because listening tests aren't so much fun I'd like to do this at a time where the encoder is not expected to change again before the final release.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-24 20:57:46
Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?
That's ok for me, too.

Now that the encoder has changed a bit I'd like to do another listening test. Because listening tests aren't so much fun I'd like to do this at a time where the encoder is not expected to change again before the final release.
lossyWAV beta v0.9.6 attached to post #1 in this thread.
Title: lossyWAV Development
Post by: [JAZ] on 2008-04-25 18:43:01
Guys, good work

I've been following this thread since its start, (tested it around 0.4 or so) and just thought to test it again.

I took a wav of a piece of a song, encoded it at q0, q5 and q10, and I really don't hear anything wrong at q0 (listening with headphones, volume near the top). Of course, this is not a direct ABX, but if I can't hear what to abx...

This song is noisy by design (reverb, distorted synths), so probably not the best one for hearing lossyWAV artifacts, but a proof of its usefulness.

The bottom line:

Flac -5 : 1017kbps
lossywav -q 10 : 667kbps (561kbps with -b 512)
lossywav -q 5 : 499kbps (402kbps with -b 512)
lossywav -q 0 : 329kbps (289kbps with -b 512)

bottom line 2:
Just by curiosity, i encoded all them with lame 3.97 with -V 5 --vbr-new.
Original and -q 10 encode at the same bitrate, 144 kbps, while -q 0 encoded at 141 kbps.

[Edit: oops!!! I forgot the "-b 512" for flac.]
Title: lossyWAV Development
Post by: Nick.C on 2008-04-25 21:00:14
Guys, good work

I've been following this thread since its start, (tested it around 0.4 or so) and just thought to test it again.

I took a wav of a piece of a song, encoded it at q0, q5 and q10, and I really don't hear anything wrong at q0 (listening with headphones, volume near the top). Of course, this is not a direct ABX, but if I can't hear what to abx...

This song is noisy by design (reverb, distorted synths), so probably not the best one for hearing lossyWAV artifacts, but a proof of its usefulness.

The bottom line:

Flac -5 : 1017kbps
lossywav -q 10 : 667kbps (561kbps with -b 512)
lossywav -q 5 : 499kbps (402kbps with -b 512)
lossywav -q 0 : 329kbps (289kbps with -b 512)

bottom line 2:
Just by curiosity, i encoded all them with lame 3.97 with -V 5 --vbr-new.
Original and -q 10 encode at the same bitrate, 144 kbps, while -q 0 encoded at 141 kbps.

[Edit: oops!!! I forgot the "-b 512" for flac.]
Essentially, you're listening out for hiss as lossyWAV adds full spectrum noise when it removes bits from the samples.
Title: lossyWAV Development
Post by: M on 2008-04-26 03:42:42
Essentially, you're listening out for hiss as lossyWAV adds full spectrum noise when it removes bits from the samples.

Nick, without slogging back through the previous 45 pages (I've read them all at one time or another, but not all tonight!) is there anything else specific we should be listening for at this point?

A little hiss isn't necessarily a bad thing. Analog tape is filled with it... and a vinyl groove can reproduce it nicely. If it was there in the beginning, and is too aggressively removed - as is all-too-often the case on modern reissues of classic material - the sound can be worse (which leads folks to spend all sorts of time tracking down earlier, un-remastered versions!).

    - M.
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 06:17:43
Apart from hiss there is a chance, with the lower quality settings, that the high frequency region sounds a tiny bit as if it has changed in pitch.
Last night I started my listening test, and with 00000_00595ms it's exactly like this when using -q 3.
(Not too much of a surprise though. IIRC AlexB provided this sample and found this very issue at a higher bitrate with a lossyWAV version several months ago).
The 'problem' at -q 3 is very subtle though, as are the samples with added hiss.

In theory other problems may exist (any kind of distortion), especially with the very low quality settings, but we don't have any such experience so far.
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 11:12:45
I just finished my listening test with my usual problem samples Atemlied, badvilbel, bibilolo, 00000_00595ms, Blackbird/Yesterday, bruhns, dither_noise_test, eig, fiocco, furious, harp40_1, herding_calls, keys_1644ds, Livin_In_The_Future, S37_OTHERS_MartenotWaves_A, triangle-2_1644ds, trumpet, Under The Boardwalk.

I used -q 3 because this is a slightly lower quality setting than what was my transparency setting with the version I used with my last listening test.

My first 3 samples were 00000_00595ms, Atem-lied, and badvilbel. With 00000_00595ms I could hear the apparently changed pitch again and abxed it 7/10. With badvilbel I could hear added hiss; I arrived at 5/5 when abxing but missed afterwards. Similar results for Atem-lied.
I do my listening tests not only to find out about the strengths and weaknesses of lossyWAV in general but especially in order to find out which setting I should use with my real collection. With regard to this I am not content with -q 3, though I have to admit that my abx results aren't clear enough to be a good basis for a decision. But I don't want to be so scientific: for my personal demands -q 3 isn't safe enough. Don't get me wrong: the deviations from the original are very subtle (to me, and my abx results show this).

So I tried -q 4, and this time I tried all my samples. Usually everything is fine, but I could abx the added hiss of badvilbel 8/10, and, as a surprise, triangle-2_1644ds 8/10. The problem with triangle is hard to describe: no hiss, no change in pitch, just some kind of very subtle distortion. I have the suspicion that S37_OTHERS_MartenotWaves_A (added hiss) and Under The Boardwalk (change in perceived pitch) aren't perfect either, but after a good start of 4/4 I missed badly.

I continued with -q 5 for these 4 samples. S37_OTHERS_MartenotWaves_A was okay now, but I abxed badvilbel and Under The Boardwalk 7/10. With triangle I arrived at 4/4 but missed later.

I am not as content with the current version as I was before.
While I think the quality is very acceptable at a quality level like -q 3 (only subtle issues) it's not like this for -q 5, at least not for me.

Do we have a regression? I'm afraid we have. I know listening tests in different situations aren't exactly comparable (at least my hearing abilities aren't always the same), and maybe the triangle problem existed before and I just didn't hear it.
But because I did several listening tests before which were more satisfying I'm afraid there is a regression.
The thing that changed recently as to my best knowledge was that the skewing was relaxed and the accuracy demands especially at the high frequency edge were strengthened (because of the general use of -1's spreading function). Maybe the high skewing was a good mechanism to take good care of the higher quality demands of problematic samples.
I'll try to investigate a bit in this direction.
Title: lossyWAV Development
Post by: collector on 2008-04-26 12:39:54
I am not as content with the current version as I was before.
While I think the quality is very acceptable at a quality level like -q 3 (only subtle issues) it's not like this for -q 5, at least not for me.

Strange. According to the helpfile q10 is highest quality and q0 is lowest bitrate, so most of the time chances are that q5 is better than q3 ?

Eleven steps in options are way too many for me at the moment. So when I aim for space saving I don't use parameters at all; with version 0.9.4 that equals -0. A lossy image.flac came out at 210 MB instead of 335 MB. Nice.
I noticed the progress counter count up to 256 MB, then down to 0, and then up again to the end, which was 551 MB for that disc image. Savings: 125 MB.

(I was trying to do so via mareo but somehow that failed. Will give it a try later. EAC > mareo > wav > lossywav > flac )
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 13:25:47

I am not as content with the current version as I was before.
While I think the quality is very acceptable at a quality level like -q 3 (only subtle issues) it's not like this for -q 5, at least not for me.

Strange. According to the helpfile q10 is highest quality and q0 is lowest bitrate, so most of the time chances are that q5 is better than q3 ? ...

Sure. What I tried to say is: with -q 3 (expected bitrate: ~335 kbps on average) my quality demands are such that I can accept the subtle deviations from the original. With -q 5 (expected bitrate: ~420 kbps on average) I personally don't, though -q 5's quality is surely better than that of -q 3. With -q 5 I expect full transparency.
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 14:36:51
I tried again v0.8.8 which some time ago I found to be transparent at -4 with my samples.
Now I can hear the problems with badvilbel and Under the Boardwalk at -4. I didn't hear a problem with triangle, but maybe I'm just less sensitive to this problem right now than I was this morning. After all it's a very subtle issue.
Going -2 I could still hear an increased hiss with badvilbel, but heard no problems with triangle and Under the Boardwalk.

So I think the main explanation is that right now I seem to be more sensitive towards the problems than I was before. It's not clear however whether apart from that there's a real quality advantage of v0.8.8 over v0.9.6.

More experience is very welcome.

P.S.: We shouldn't care too much about badvilbel. There's noise in the original, and a subtly added hiss onto it doesn't change a lot.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-26 17:26:55
.. with -q 3 (expected bitrate: ~335 kbps on average) my quality demands are such that I can accept the subtle deviations from the original. [..] With -q 5 I expect full transparency.

Thanks for your testing time and time again. From 0.9.6 we have a completely new quality scale, maybe your ideal setting has shifted to perhaps -q 5.6472  .
Is there a way to see what (internal) settings are applied? nts, snr, bits_to_keep and such? To find a possible regression, the first thing would be to compare the parameter settings of the new version with those of a previous one.
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 17:40:25
Because of my listening test results for triangle I also looked at it technically using -detail with v0.8.8 and v0.9.6.
Though the results aren't totally comparable, it's hard to believe that there has been a regression with v0.9.6.
I guess the differences heard are due to my different sensitivity this morning and this afternoon.
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 18:15:28
>> maybe your ideal setting has shifted to perhaps -q 5.6472 <<

Yes, obviously my ideal setting has changed, and for the most part I think it's due to my hearing actually being better than it was when I did the earlier listening tests.

I just listened to the 4 critical samples using -q 6 and everything's fine.
So I will use -q 6 in the future - I don't have to care about a bitrate like 450 kbps.

But I suggest we change the default quality setting to -q 6 because we always wanted to have a transparent default setting.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-26 20:32:44
>> maybe your ideal setting has shifted to perhaps -q 5.6472 <<

Yes, obviously my ideal setting has changed, and for the most part I think it's due to my hearing actually being better than it was when I did the earlier listening tests.

I just listened to the 4 critical samples using -q 6 and everything's fine.
So I will use -q 6 in the future - I don't have to care about a bitrate like 450 kbps.

But I suggest we change the default quality setting to -q 6 because we always wanted to have a transparent default setting.
Many thanks (yet again) for your efforts in listening to processed samples. Would it be better to:

a) Move current -q 6 to -q 5 stretching the higher presets and squeezing the lower presets;

or

b) Move all presets down one (adding a new -q 10 and -q 0 falls off the bottom).

Everything was going too smoothly...
Title: lossyWAV Development
Post by: halb27 on 2008-04-26 21:33:52
I personally don't care much about it as long as the default is what is now -q 6.

Whether to drop the current -q 0 or not depends on its usability with respect to what its users expect.
It would be kind if one or another potential user of a low -q setting could share their opinion.
Title: lossyWAV Development
Post by: gasmann on 2008-04-26 21:56:00
Many thanks (yet again) for your efforts in listening to processed samples. Would it be better to:

a) Move current -q 6 to -q 5 stretching the higher presets and squeezing the lower presets;

or

b) Move all presets down one (adding a new -q 10 and -q 0 falls off the bottom).

Everything was going too smoothly...


Oh, please, don't do that!  I do use -q 0, really! Please don't drop it!

But hey, you could do as vorbis does, adding something like -q -1
Title: lossyWAV Development
Post by: Nick.C on 2008-04-26 21:58:39
Many thanks (yet again) for your efforts in listening to processed samples. Would it be better to:

a) Move current -q 6 to -q 5 stretching the higher presets and squeezing the lower presets;

or

b) Move all presets down one (adding a new -q 10 and -q 0 falls off the bottom).

Everything was going too smoothly...
Oh, please, don't do that!  I do use -q 0, really! Please don't drop it!

But hey, you could do as vorbis does, adding something like -q -1
I'll squash and squeeze rather than remove the current -q 0.
Title: lossyWAV Development
Post by: gasmann on 2008-04-26 22:15:55
ok, thank you! I would have been fine with q -1, too... As long as I can continue using it, it's alright

halb27 said users should share their opinion... well, I regard myself as a user. It's not transparent to me (I could easily abx a song 7/7), but I like the fact that quality is much more stable than that of, say, mp3. I didn't find any serious problems on particular "problem samples". And the noise that is introduced is much less annoying than mp3 artifacts. Of course, at this bitrate mp3 generally does a better job, but I always have to fear there is a problem sample.

However, I don't use lossyWAV for archiving; that'll always have to be truly lossless, pardon. I use these low-bitrate FLACs for listening only.
Title: lossyWAV Development
Post by: jesseg on 2008-04-27 04:33:02
I have two suggestions.

1.
-verbose
speaks for itself.

2.
To add to the lossyWAV metadata that gets saved in the wav files: the settings string and the version number of lossyWAV that were used to process the wav.
Title: lossyWAV Development
Post by: halb27 on 2008-04-27 06:57:20
... but I like the fact that quality is much more stable than that of say mp3. I didn't find any serious problems on particular "problem samples". And this noise that is introduced is much less annoying than mp3 artifacts. ...

Though I'm striving for transparency your description made me curious about the behavior of -q 0.
First I encoded my regular track set, which I use for getting an idea of the average bitrate to expect from a particular setting. The result was 263 kbps, which is very low compared to what we had before with the lowest settings. I was all the more surprised to find I was pleased when listening to the encoded tracks. Quality is very good to me! This made me dare to try my problem samples with it. More surprise: abxing isn't very hard with most of the problems, of course, but with the exception of eig and furious the deviations from the original are not obvious at all and not at all annoying. Going -q 1, BTW (average bitrate: 281 kbps with my regular track set), made even furious not annoying to me and eig acceptable.

I didn't care much about the very low quality settings before, but, Nick, with your recent changes with the encoder I think you've succeeded in giving lossyWAV an extremely broad useful quality/bitrate range!
Thanks a lot.
Title: lossyWAV Development
Post by: botface on 2008-04-27 11:03:34

... but I like the fact that quality is much more stable than that of say mp3. I didn't find any serious problems on particular "problem samples". And this noise that is introduced is much less annoying than mp3 artifacts. ...

Though I'm striving for transparency your description made me curious about the behavior of -q 0.
First I encoded my regular track set, which I use for getting an idea of the average bitrate to expect from a particular setting. The result was 263 kbps, which is very low compared to what we had before with the lowest settings. I was all the more surprised to find I was pleased when listening to the encoded tracks. Quality is very good to me! This made me dare to try my problem samples with it. More surprise: abxing isn't very hard with most of the problems, of course, but with the exception of eig and furious the deviations from the original are not obvious at all and not at all annoying. Going -q 1, BTW (average bitrate: 281 kbps with my regular track set), made even furious not annoying to me and eig acceptable.

I didn't care much about the very low quality settings before, but, Nick, with your recent changes with the encoder I think you've succeeded in giving lossyWAV an extremely broad useful quality/bitrate range!
Thanks a lot.

I'll second that. I've been doing some testing with higher bit depths/sample rates (24/64, 24/88.2, 24/96). This was primarily to see if I could perceive any advantage to using them. As a starting point I decided to rip some tracks at 16/44.1 and encode them at the lowest quality setting so that I could get a good idea of the type of degradation I was looking for. Very much to my surprise I could hardly hear a difference - just a very slight increase in "hiss", but at such a low level that I wouldn't have noticed it if I wasn't listening for it. I was expecting something like the hiss levels you used to get with cassette or a weak FM station.

On the "-q" settings. I don't have any particular axe to grind and don't want to muddy the waters, but do we really need 10 quality levels? From lowest to highest we have a final bit rate of something like 250kbps to 550kbps, so each change in -q setting gives only a very slight change in the result. Before LossyWAV came along I used Wavpack Lossy. I used the "-b" setting to set bits per sample rather than a specific bit rate. Using a BPS range of 3 to 6 gives pretty much the same final range in kbps as LossyWAV's 0 to 10, and I never found it inconvenient, especially since, like LossyWAV, it's possible to specify a decimal number, e.g. 4.6, 3.8, etc., to get the result you want.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-27 13:45:25
I just listened to the 4 critical samples using -q 6 and everything's fine.
Would it be better to:

a) Move current -q 6 to -q 5 stretching the higher presets and squeezing the lower presets;

or

b) Move all presets down one (adding a new -q 10 and -q 0 falls off the bottom).

Or (too obvious?) just change the default to -q 6 for now?

I'd like to ask Halb27 if he's willing to do an ABX of (current) -q 5 vs. -q 6 for those 4 problem samples. That would help rule out differences in hearing sensitivity.
Title: lossyWAV Development
Post by: collector on 2008-04-27 14:19:24
On the "-q" settings. I don't have any particular axe to grind and don't want to muddy the waters but do we really need 10 quality levels? From lowest to highest we have a final bit rate of something like 250kbps to 550kbps so each change in -q setting gives only a very slight change in the result.

The more settings (11), the fewer steps I test. On my slow computer I skip the best and worst, so I test with 9, 7, 5 and 3 until I detect problems. Default is sufficient (whether 5 or 6).
Title: lossyWAV Development
Post by: Nick.C on 2008-04-27 16:42:52
... but I like the fact that quality is much more stable than that of say mp3. I didn't find any serious problems on particular "problem samples". And this noise that is introduced is much less annoying than mp3 artifacts. ...
Though I'm striving for transparency your description made me curious about the behavior of -q 0.
First I encoded my regular track set, which I use for getting an idea of the average bitrate to expect from a particular setting. The result was 263 kbps, which is very low compared to what we had before with the lowest settings. I was all the more surprised to find I was pleased when listening to the encoded tracks. Quality is very good to me! This made me dare to try my problem samples with it. More surprise: abxing isn't very hard with most of the problems, of course, but with the exception of eig and furious the deviations from the original are not obvious at all and not at all annoying. Going -q 1, BTW (average bitrate: 281 kbps with my regular track set), made even furious not annoying to me and eig acceptable.

I didn't care much about the very low quality settings before, but, Nick, with your recent changes with the encoder I think you've succeeded in giving lossyWAV an extremely broad useful quality/bitrate range!
Thanks a lot.
I think that the -snr parameter has a lot to do with some of these problem samples.

I would propose something like

quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (18,18.87,19.81,20.8,21.86,23,24.21,25.51,26.91,28.4,30);

instead of

quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (16,17,18,19,20,21,22.8,24.6,26.4,28.2,30);
Title: lossyWAV Development
Post by: halb27 on 2008-04-27 19:04:08
... quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (18,18.87,19.81,20.8,21.86,23,24.21,25.51,26.91,28.4,30); ...

As this makes things more defensive: go ahead.
But why these strange steps like 18.87?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-27 19:16:44
... quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (18,18.87,19.81,20.8,21.86,23,24.21,25.51,26.91,28.4,30); ...
As this makes things more defensive: go ahead.
But why these strange steps like 18.87?
I was looking for a smooth curve, so I worked out the power required to make 18 translate to 30 in 10 steps (i.e. snr:=power(snr[i-1],z)).
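That recurrence can be sketched numerically; a minimal illustration in Python (the Pascal snr:=power(snr[i-1],z) is reproduced as-is, with z derived from the 18-to-30 endpoints):

```python
import math

# Reconstruct the proposed -snr preset curve: 11 values from 18 to 30.
# Each value is the previous one raised to a constant power z, so the
# curve is a straight line in log-log space and z follows from the
# endpoints: ln(snr[10]) = z**10 * ln(snr[0]).
z = (math.log(30) / math.log(18)) ** (1 / 10)

snr = [18.0]
for _ in range(10):
    snr.append(snr[-1] ** z)

# Rounded, this closely matches the proposed preset array
# (18, 18.87, 19.81, 20.8, 21.86, 23, 24.21, 25.51, 26.91, 28.4, 30).
print([round(v, 2) for v in snr])
```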
Title: lossyWAV Development
Post by: halb27 on 2008-04-27 19:37:09
... quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (18,18.87,19.81,20.8,21.86,23,24.21,25.51,26.91,28.4,30); ...
As this makes things more defensive: go ahead.
But why these strange steps like 18.87?
I was looking for a smooth curve, so I worked out the power required to make 18 translate to 30 in 10 steps (i.e. snr:=power(snr[i-1],z)).

I see. And it's all internal anyway.

... I'd like to ask Halb27 if he's willing to do an ABX of (current) -q 5 vs.  -q 6 for those 4 problem samples. ...

OK. Luckily it's just 3 samples, because MartenotWaves was alright with -q 5 (triangle as well, in the sense that I couldn't abx it, but there is a suspicion that it isn't perfect, as I started with 4/4).

I could not abx badvilbel and triangle -q 5 vs. -q 6.
My result for Under the Boardwalk was 7/10 (the same as -q 5 vs. original).

Obviously this isn't a big issue for me with -q 5, but I'd like to play it safe when using lossyWAV.
Apart from that, my 58-year-old ears are now a bit trained on these samples, but there are certainly ears out there that perform a lot better.
Hopefully we'll get a lot more listening feedback.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-27 21:15:08
Using the proposed revision to the -snr parameter, the following bitrates were achieved when I processed my 53 problem sample set:

Code: [Select]
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
|  lossyWAV   |  -q 0   |  -q 1   |  -q 2   |  -q 3   |  -q 4   |  -q 5   |  -q 6   |  -q 7   |  -q 8   |  -q 9   |  -q 10  |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| beta v0.9.6 | 318kbps | 338kbps | 364kbps | 394kbps | 431kbps | 472kbps | 500kbps | 529kbps | 557kbps | 584kbps | 611kbps |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| variant #1  | 327kbps | 346kbps | 370kbps | 400kbps | 435kbps | 475kbps | 502kbps | 530kbps | 557kbps | 584kbps | 611kbps |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|
| variant #2  | 327kbps | 348kbps | 373kbps | 403kbps | 438kbps | 477kbps | 504kbps | 531kbps | 558kbps | 585kbps | 611kbps |
|-------------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|---------|


However, looking at that, not enough is done around the -q 5 mark, so I'm going to try:

variant #2: quality_signal_to_noise_ratios : array[0..Quality_Presets] of Double = (18.0,19.2,20.4,21.6,22.8,24.0,25.2,26.4,27.6,28.8,30.0);

instead of

variant #1: quality_signal_to_noise_ratios : array[0..Quality_Presets] of Double = (18.0,18.9,19.8,20.8,21.9,23.0,24.2,25.5,26.9,28.4,30.0);

instead of

quality_signal_to_noise_ratios : array[0..Quality_Presets] of Double = (16.0,17.0,18.0,19.0,20.0,21.0,22.8,24.6,26.4,28.2,30.0);

I am a bit happier with the spread of the bitrate outputs from the various quality presets. I'll have a think and probably post beta v0.9.7 tomorrow.
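For reference, variant #2 above is simply an arithmetic ramp from 18.0 to 30.0 in ten equal steps of 1.2 (a one-liner for illustration, in Python rather than the Pascal of the source):

```python
# Variant #2: linear spacing, 18.0 .. 30.0 in steps of (30 - 18) / 10 = 1.2.
variant_2 = [round(18.0 + 1.2 * i, 1) for i in range(11)]
print(variant_2)
# → [18.0, 19.2, 20.4, 21.6, 22.8, 24.0, 25.2, 26.4, 27.6, 28.8, 30.0]
```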
Title: lossyWAV Development
Post by: halb27 on 2008-04-28 08:47:48
IMO the more defensive -snr values are welcome especially for the low bitrate settings. It's not so important for -q 5+ IMO.

I wonder about something else. Do we have a specific problem with impulses? (eig - a very serious mp3 pre-echo sample - shows the worst performance at -q 0, and it's especially bad around the impulses; Under the Boardwalk seems to have a small problem at -q 5, and the slightly changed pitch I perceive is with drums. I also remember that AlexB's very first lossyWAV -3 listening experience led to a perceived change of pitch, and his sample is full of percussion.)
Maybe impulses should be given special attention. Trying to improve this is possible without hard listening tests, by striving for good eig performance at -q 0. Maybe a special impulse detection could help, which automatically and drastically lowers the number of bits to remove?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-28 12:07:31
How to search for an impulse though.....?

Might one approach be to split the codec-block into 8/15 (16/31?) 50% overlapping chunks and take RMS values of the samples in each chunk, then look at the relative magnitudes of the per-chunk-RMS-results to try to spot an impulse?

Or, perform 16 (or 32) sample FFT's (8/15 or 4/7, 50% overlapping) and look at the maximum bin result in each?

Or, just use the maximum bin (skewed / unskewed?) result from the 9 x 64 sample FFT's already calculated per channel per codec-block and try to spot the high value?
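The first of those three ideas can be sketched quickly; this is a rough illustration only (chunk length, hop, and threshold are invented here, not lossyWAV's actual values):

```python
def rms(samples):
    """Root-mean-square of a sequence of sample values."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def looks_impulsive(block, chunk_len=64, hop=32, ratio_threshold=4.0):
    """Split a codec-block into 50%-overlapping chunks, take per-chunk
    RMS values, and flag the block when one chunk's RMS dwarfs the
    median - a crude marker of an isolated transient."""
    chunks = [block[i:i + chunk_len]
              for i in range(0, len(block) - chunk_len + 1, hop)]
    values = sorted(rms(c) for c in chunks)
    median = values[len(values) // 2]
    return median > 0 and max(values) / median > ratio_threshold

# A 512-sample block of quiet signal with a single loud click is flagged;
# the same block without the click is not.
quiet = [0.01] * 512
clicky = quiet[:256] + [0.9, -0.9, 0.9, -0.9] + quiet[260:]
print(looks_impulsive(clicky), looks_impulsive(quiet))
# → True False
```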
Title: lossyWAV Development
Post by: halb27 on 2008-04-28 13:40:43
How to search for an impulse though.....?

Might one approach be to split the codec-block into 8/15 (16/31?) 50% overlapping chunks and take RMS values of the samples in each chunk, then look at the relative magnitudes of the per-chunk-RMS-results to try to spot an impulse?

Or, perform 16 (or 32) sample FFT's (8/15 or 4/7, 50% overlapping) and look at the maximum bin result in each?

Or, just use the maximum bin (skewed / unskewed?) result from the 9 x 64 sample FFT's already calculated per channel per codec-block and try to spot the high value?

I have no idea what's best. All of your proposals make sense to me.
Title: lossyWAV Development
Post by: GeSomeone on 2008-04-28 16:09:32
Do we have a specific problem with impulses? (eig - a very serious mp3 pre-echo problem - shows the worst performance at -q 0, [..] Under the Boardwalk seems to have a small problem at -q 5, [..]).
Maybe impulses should be taken special care of.

How to search for an impulse though.....?

Just trying to think along in finding an approach for this (just a bunch of questions to consider, I'm afraid)
First of all: is this new or more severe than in previous versions? (if that's true .. what was changed).
Can the transients be caught with one of the existing mechanisms? E.g. does it get better when raising -nts (I know it's hidden from the interface right now)? The -nts value distribution (over the -q's) has been changed lately; is it working properly?
Less likely, does adding FFT's help?

Could it be -snr could help this too? (try with a high quality_signal_to_noise_ratio).

I too suspect that sounds like drums with hi-hats sometimes don't sound as "crisp" at settings below -q 5. But you can't take my word for it, as I'm terrible at ABX; after 2x I usually hear no difference anymore.
Title: lossyWAV Development
Post by: halb27 on 2008-04-28 20:59:06
How to search for an impulse though.....?

I've looked at eig and Under The Boardwalk using a wav editor.
Maybe a very simple procedure does it: watch the difference of the value of two consecutive samples. If the absolute value of the difference is larger than a certain threshold: reduce the number of bits to remove depending on the size of the difference.
Make the threshold, and the dependence of the number of bits to remove on the sample difference, more demanding for the higher -q settings.
Here's the critical beginning of eig (http://home.arcor.de/horstalb/Xfer2/eig_essence.flac) in case you haven't got eig, Nick, if you want to play with it.
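A minimal sketch of the procedure described above, for illustration only - the threshold and the bit-reduction mapping are hypothetical placeholders on a 16-bit sample scale, not tuned lossyWAV values:

```python
def max_sample_step(channel):
    """Largest absolute difference between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(channel, channel[1:]))

def adjust_bits_to_remove(bits_to_remove, channel, threshold=8192, step=4096):
    """If a codec-block channel contains a big sample-to-sample jump,
    reduce the number of bits to remove in proportion to the jump size.
    For higher -q settings the caller would pass a lower threshold and
    a smaller step, making the check more demanding."""
    jump = max_sample_step(channel)
    if jump > threshold:
        penalty = 1 + (jump - threshold) // step
        bits_to_remove = max(0, bits_to_remove - penalty)
    return bits_to_remove

print(adjust_bits_to_remove(5, [0, 100, 200, 150]))  # gentle block
print(adjust_bits_to_remove(5, [0, 20000, 0, 100]))  # impulsive block
# → 5, then 2
```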
Title: lossyWAV Development
Post by: Nick.C on 2008-04-28 21:03:50
How to search for an impulse though.....?
I've looked at eig and Under The Boardwalk using a wav editor.
Maybe a very simple procedure does it: watch the difference of the value of two consecutive samples. If the absolute value of the difference is larger than a certain threshold: reduce the number of bits to remove depending on the size of the difference.
Make the threshold, and the dependence of the number of bits to remove on the sample difference, more demanding for the higher -q settings.
Here's the critical beginning of eig (http://home.arcor.de/horstalb/Xfer2/eig_essence.flac) in case you haven't got eig, Nick, if you want to play with it.
Many thanks for the insight - I'll get coding to implement a "net" to find the maximum absolute difference between samples for each channel in a codec-block.

I have already implemented a search for the bin with the maximum value, the simple average value and the minimum value for each FFT analysis.

Thanks for something else to chew on!
Title: lossyWAV Development
Post by: halb27 on 2008-04-28 21:20:17
Just a remark, Nick, as you like so much to use your 53 sample set:
For judging the negative impact this impulse-defensive idea will have on bitrate, please use a set of regular music to get an impression of the consequences. With problem samples it's welcome that the bitrate goes up; with regular music it's not. My set of just 12 full-length tracks of regular music encodes quickly, and the bitrate results have always been close to those of your more advanced multi-album test. So I suggest you use just a selection of a couple of full-length tracks. Just take a bit of care that the musical content of the selected tracks isn't too similar.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-04-29 15:19:09
One way to throw more bits at impulses is to use a shorter FFT, e.g. 32.

I'm not saying you should, only that it could be worth trying. It'll "see" the space around impulses more, which may be a good or bad thing overall.

Cheers,
David.
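For context on David's suggestion: halving the FFT length trades frequency resolution for time resolution. Assuming 44.1 kHz CD audio (the thread's samples), the numbers work out as:

```python
fs = 44100  # sample rate in Hz (CD audio)

# Window duration governs how precisely an impulse is localised in time;
# bin width governs how finely the spectrum is resolved.
for n in (64, 32):
    window_ms = n / fs * 1000
    bin_hz = fs / n
    print(f"{n}-sample FFT: {window_ms:.2f} ms window, {bin_hz:.1f} Hz per bin")
# → 64-sample FFT: 1.45 ms window, 689.1 Hz per bin
# → 32-sample FFT: 0.73 ms window, 1378.1 Hz per bin
```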
Title: lossyWAV Development
Post by: Nick.C on 2008-04-29 20:04:40
One way to throw more bits at impulses is to use a shorter FFT, e.g. 32.

I'm not saying you should, only that it could be worth trying. It'll "see" the space around impulses more, which may be a good or bad thing overall.

Cheers,
David.
Many thanks, David, for the advice - it was also by far the simplest to implement, at the expense of additional processing time.

lossyWAV beta v0.9.7 attached to post #1 in this thread.

[edit] Processed 53 sample problem set bitrates: (10 album test set to follow).

Code: [Select]
|----------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|      lossyWAV        | -q 0  | -q 1  | -q 2  | -q 3  | -q 4  | -q 5  | -q 6  | -q 7  | -q 8  | -q 9  | -q 10 |
|----------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.6          |318kbps|338kbps|364kbps|394kbps|431kbps|472kbps|500kbps|529kbps|557kbps|584kbps|611kbps|
|----------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.7          |327kbps|346kbps|370kbps|400kbps|435kbps|475kbps|502kbps|530kbps|557kbps|584kbps|611kbps|
|----------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.7 -impulse |342kbps|360kbps|383kbps|412kbps|446kbps|485kbps|513kbps|540kbps|567kbps|594kbps|619kbps|
|----------------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
Title: lossyWAV Development
Post by: lvqcl on 2008-04-29 20:49:00
Quote
E:\Utils\LossyWAV>lossyWAV.exe test.wav -analyses
%lossyWAV Error% : No analyses value given.

E:\Utils\LossyWAV>lossyWAV.exe test.wav -analyses 2
lossyWAV beta v0.9.7, Copyright © 2007,2008 Nick Currie.
lossyWAV is issued with NO WARRANTY WHATSOEVER and is free software.

%lossyWAV Error% : Incorrect option: "-analyses"


It seems something was broken in the 0.9.6 -> 0.9.7 change
Title: lossyWAV Development
Post by: Nick.C on 2008-04-29 21:10:17
Quote
E:\Utils\LossyWAV>lossyWAV.exe test.wav -analyses
%lossyWAV Error% : No analyses value given.

E:\Utils\LossyWAV>lossyWAV.exe test.wav -analyses 2
lossyWAV beta v0.9.7, Copyright © 2007,2008 Nick Currie.
lossyWAV is issued with NO WARRANTY WHATSOEVER and is free software.

%lossyWAV Error% : Incorrect option: "-analyses"


It seems something was broken in the 0.9.6 -> 0.9.7 change
Thanks for that, revised version v0.9.7 going up now....

Bitrates for 10 album test set:

beta v0.9.6 : -q 10: 573kbps; -q 5: 417kbps; -q 0: 286kbps
beta v0.9.7 : -q 10 -impulse: 580kbps; -q 5 -impulse: 429kbps; -q 0 -impulse: 310kbps
Title: lossyWAV Development
Post by: ckjnigel on 2008-04-29 21:48:15
I've spent nearly an hour trying to use the batch file that's in the wiki for foobar, but it just won't go no matter what edits I make.
Could someone post a copy of a flossy bat that works? -- preferably using just a c: drive.
I did get the latest beta to work from the command line and my initial impression is very favorable -- "Yours Is No Disgrace" 29.0Mb vs. 65.2Mb is a worthwhile saving.
FWIW, there might be others like me who hadn't realized that this is a pre-processor for FLAC rather than a codec that creates altered WAV files. 
An obvious application for me would be converting language instruction CDs to greatly reduced semi-lossless. There'd likely be time savings using this rather than downsampling to 32000 and converting to mono in CoolEdit before converting to FLAC. An additional step has been amplifying by 6 dB, but perhaps there's a foobar plugin for that?
Anyway, thanks much for your efforts.  The original idea was a cunning and clever one, but there's obviously been plenty of perspiration since...
Title: lossyWAV Development
Post by: Nick.C on 2008-04-29 21:52:26
I've spent nearly an hour trying to use the batch file that's in the wiki for foobar, but it just won't go no matter what edits I make.
Could someone post a copy of a flossy bat that works? -- preferably using just a c: drive.
I did get the latest beta to work from the command line and my initial impression is very favorable -- "Yours Is No Disgrace" 29.0Mb vs. 65.2Mb is a worthwhile saving.
FWIW, there might be others like me who hadn't realized that this is a pre-processor for FLAC rather than a codec that creates altered WAV files. 
An obvious application for me would be converting language instruction CDs to greatly reduced semi-lossless. There'd likely be time savings using this rather than downsampling to 32000 and converting to mono in CoolEdit before converting to FLAC. An additional step has been amplifying by 6 dB, but perhaps there's a foobar plugin for that?
Anyway, thanks much for your efforts.  The original idea was a cunning and clever one, but there's obviously been plenty of perspiration since...
Is the batch file on a path with spaces in it? I found this to be an elusive problem to solve. That is why my batch file is in a simple <drive>:\BIN\ directory, as are flac.exe and lossyWAV.exe - also, ensure that the batch file references the correct locations of the two relevant .exe files.

[edit2] Oh, and it's a lossy pre-processor which produces modified WAV files. It works with other codecs apart from FLAC, although I use FLAC by preference as it is compatible with TCPMP v0.81 on my iPAQ. [/edit2]

[edit] For example: flossy.bat
Code: [Select]
@echo off
c:\data_nic\bin\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -low -nowarn -quiet
c:\data_nic\bin\flac.exe -5 -f -b 512 "%~N1.lossy.wav" -o"%~N2.flac"
del "%~N1.lossy.wav"
with the batch file, lossyWAV.exe and flac.exe in the same directory, i.e. C:\DATA_NIC\BIN\ and called from foobar2000 with:

Encoder: cmd.exe

Extension: lossy.flac (NOT .lossy.flac!!!)

Parameters: /d /c c:\data_nic\bin\flossy.bat %s %d <insert your parameters here>

example parameters could be: -q 4 -impulse [/edit]
Title: lossyWAV Development
Post by: halb27 on 2008-04-29 22:06:04
Thank you Nick, for your new version.

I'm too tired now for abxing higher quality settings, but gave it a try using -q 0 -impulse for eig.
Yes, there's an abxable improvement with eig.

Bitrate increase for regular music isn't very remarkable: my regular music track set went up from 417 kbps (v0.9.6 -q 5) to 427 kbps (v0.9.7 -q 5 -impulse).

Hope I can do more listening tests tomorrow.

What -spf values are you using for the 32 sample FFTs, Nick?
Title: lossyWAV Development
Post by: Nick.C on 2008-04-29 22:10:51
Thank you Nick, for your new version.

I'm too tired now for abxing higher quality settings, but gave it a try using -q 0 -impulse for eig.
Yes, there's an abxable improvement with eig.

Bitrate increase for regular music isn't very remarkable: my regular music track set went up from 417 kbps (v0.9.6 -q 5) to 427 kbps (v0.9.7 -q 5 -impulse).

Hope I can do more listening tests tomorrow.

What -spf values are you using for the 32 sample FFTs, Nick?
I iterated a few times until I just used 22223 (the same as for the 64 sample FFT) as increasing the 2's results in a higher bitrate(!). 22222 was also higher in bitrate - 22223 seems to be a sweet spot (some averaging, but not too much).
Title: lossyWAV Development
Post by: halb27 on 2008-04-29 22:19:08
I iterated a few times until I just used 22223 (the same as for the 64 sample FFT) as increasing the 2's results in a higher bitrate(!). 22222 was also higher in bitrate - 22223 seems to be a sweet spot (some averaging, but not too much).

Thank you, Nick.
Do you mind making -spf temporarily available to the user again? (I'm only interested in playing around with the -spf setting for the 32 samples FFT, I'm just curious about the quality of 22222).
Title: lossyWAV Development
Post by: Nick.C on 2008-04-29 22:20:32
I iterated a few times until I just used 22223 (the same as for the 64 sample FFT) as increasing the 2's results in a higher bitrate(!). 22222 was also higher in bitrate - 22223 seems to be a sweet spot (some averaging, but not too much).
Thank you, Nick.
Do you mind making -spf temporarily available to the user again? (I'm only interested in playing around with the -spf setting for the 32 samples FFT, I'm just curious about the quality of 22222).
I'll post a beta v0.9.7b shortly.
Title: lossyWAV Development
Post by: jesseg on 2008-04-30 05:37:37
Is the batch file on a path with spaces in it? I found this to be an elusive problem to solve.


I can send you a modified version that handles that, and unicode...  but yeah, some things are best left simple.  Let me know if I could help when I can.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-30 07:40:32
Is the batch file on a path with spaces in it? I found this to be an elusive problem to solve.
I can send you a modified version that handles that, and unicode...  but yeah, some things are best left simple.  Let me know if I could help when I can.
The problem seems to be with either foobar2000 or cmd.exe (I never did determine which). As soon as I removed spaces from the path to the batch file everything began to work.

Unicode handling in which sense?
Title: lossyWAV Development
Post by: ckjnigel on 2008-04-30 10:27:31
Is the batch file on a path with spaces in it? I found this to be an elusive problem to solve. That is why my batch file is in a simple <drive>:\BIN\ directory, as are flac.exe and lossyWAV.exe - also, ensure that the batch file references the correct locations of the two relevant .exe files.

[edit2] Oh, and it's a lossy pre-processor which produces modified WAV files. It works with other codecs apart from FLAC, although I use FLAC by preference as it is compatible with TCPMP v0.81 on my iPAQ. [/edit2]

[edit] For example: flossy.bat
Code: [Select]
@echo off
c:\data_nic\bin\lossyWAV %1 %3 %4 %5 %6 %7 %8 %9 -low -nowarn -quiet
c:\data_nic\bin\flac.exe -5 -f -b 512 "%~N1.lossy.wav" -o"%~N2.flac"
del "%~N1.lossy.wav"
with the batch file, lossyWAV.exe and flac.exe in the same directory, i.e. C:\DATA_NIC\BIN\ and called from foobar2000 with:

Encoder: cmd.exe

Extension: lossy.flac (NOT .lossy.flac!!!)

Parameters: /d /c c:\data_nic\bin\flossy.bat %s %d <insert your parameters here>

example parameters could be: -q 4 -impulse [/edit]


That's got it working!   
Thanks for the quick reply!
Batch files make me nostalgic for DOS 3.3  -- NOT!
Before I waste lots of time, assure me that there'd be no benefit taking a lossy.flac and converting it into MP3, AAC (HE, LC), OGG or some other lossy. (That's as an alternative to using a lower quality in the native encoder, thus relying on the inbuilt psychoacoustic tunings.)  I just started thinking about creating Nero LC-AAC files from your semi-lossies as an alternative to HE-AAC for my Sony-Ericsson musicphone...
Garf claims HE-AAC isn't battery-thirsty (though it is CPU hungry), but I have doubts.
Title: lossyWAV Development
Post by: halb27 on 2008-04-30 12:07:32
Before I waste lots of time, assure me that there'd be no benefit taking a lossy.flac and converting it into MP3, AAC (HE, LC), OGG or some other lossy. ...

When targeting mp3, aac, ogg, etc., it's always best to encode from the original or a lossless source.
If you use a high quality setting of lossyWAV (for instance for archiving, instead of using a lossless archive) and convert from this to mp3, it is expected, however, that the quality loss due to this transcoding is insignificant.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-30 12:12:40
That's got it working!   
Thanks for the quick reply!
Batch files make me nostalgic for DOS 3.3  -- NOT!
Before I waste lots of time, assure me that there'd be no benefit taking a lossy.flac and converting it into MP3, AAC (HE, LC), OGG or some other lossy. (That's as an alternative to using a lower quality in the native encoder, thus relying on the inbuilt psychoacoustic tunings.)  I just started thinking about creating Nero LC-AAC files from your semi-lossies as an alternative to HE-AAC for my Sony-Ericsson musicphone...
Garf claims HE-AAC isn't battery-thirsty (though it is CPU hungry), but I have doubts.
Glad to be of service.

The added compression in certain lossless codecs is only due to the exploitation of the wasted-bits mechanism. Transcoding from lossyWAV-processed lossless files has not (to my knowledge) been well explored yet.

@halb27: I've been looking at making spreading functions for all of the FFT lengths more conservative. I'm trying: 22222-22222-22223-12233-12234-12234 and although the bitrate goes up a bit, it may be attractive.

[edit] 53 problem sample bitrates beta v0.9.8:
Code: [Select]
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| lossyWAV | -q 10 | -q 9  | -q 8  | -q 7  | -q 6  | -q 5  | -q 4  | -q 3  | -q 2  | -q 1  | -q 0  |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| v0.9.8   |635kbps|609kbps|583kbps|556kbps|528kbps|500kbps|457kbps|419kbps|386kbps|358kbps|336kbps|
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| v0.9.8 i |644kbps|619kbps|594kbps|567kbps|539kbps|512kbps|469kbps|431kbps|399kbps|372kbps|351kbps|
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
Looking at the bitrates however, it may be that this is too conservative. Advice / comment / opinion will be very well received.... [/edit]

[edit2]Trying 22222-22223-22223-12234-12234-12235, I get:
Code: [Select]
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| lossyWAV | -q 10 | -q 9  | -q 8  | -q 7  | -q 6  | -q 5  | -q 4  | -q 3  | -q 2  | -q 1  | -q 0  |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| v0.9.8   |622kbps|596kbps|569kbps|541kbps|514kbps|487kbps|445kbps|408kbps|377kbps|351kbps|331kbps|
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| v0.9.8 i |635kbps|611kbps|585kbps|557kbps|530kbps|503kbps|462kbps|425kbps|394kbps|369kbps|349kbps|
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
[/edit2]
Title: lossyWAV Development
Post by: halb27 on 2008-04-30 15:55:30
Hallo Nick,

Bitrate increase of problem samples is welcome.
I always wonder in the first place what's the bitrate increase of regular music.
My personal opinion is that we should be very defensive towards the HF region with the short FFTs first of all, and this applies especially to the 32 sample FFTs, in case we end up considering them useful.
Maybe it's a good strategy to keep the -spf setting for the standard analysis, but with a user-supplied -analyses not just add one or more analyses, but also use a more defensive -spf setting for the added analyses.

Thank you for v0.9.7b. I'm curious about the bitrate with regular music and the quality of this version.
Title: lossyWAV Development
Post by: Nick.C on 2008-04-30 15:57:40
Hallo Nick,

Bitrate increase of problem samples is welcome.
I always wonder in the first place what's the bitrate increase of regular music.
My personal opinion is that we should be very defensive towards the HF region with the short FFTs only.

Thank you for v0.9.7b. I'm curious about the bitrate with regular music and the quality of this version.
We've got visitors until Monday, but I'll try to get some of my 10 album test set results out tonight / tomorrow.
Title: lossyWAV Development
Post by: halb27 on 2008-04-30 16:04:23
We've got visitors until Monday, but I'll try to get some of my 10 album test set results out tonight / tomorrow.

Take your time, don't hurry.
Title: lossyWAV Development
Post by: halb27 on 2008-04-30 21:18:23
I did some listening tests using v0.9.7b -impulse -spf 22222-22223-22224-12235-12246-12357.
At -q 0 I could abx the difference against v0.9.7 with eig, but I wouldn't call it a significant improvement.
At -q 1 I couldn't, quality is high in both cases.

Bitrate for my regular track set goes up from 427 kbps to 442 kbps at -q 5 which I agree is way too high and could only be justified by a remarkable quality improvement.

I tested those samples from my last listening test which weren't totally correct.
I used v0.9.7 -q 3 -impulse.
Everything was fine to me with the exception of triangle-2_1644ds, though my abxing wasn't very good (7/10). Comparing this with v0.9.7b -q 3 -impulse -spf 22222-22223-22224-12235-12246-12357 I found that the encoding is identical to that of v0.9.7 -q 3 -impulse.

Judging from the results available so far I do think -impulse has a favorable effect. But it's fine the way you do it with v0.9.7, and the 22222 approach isn't efficient just as you said (I was just curious).
Guess I'll do some more listening tests tomorrow in order to confirm the positive effect of -impulse (hoping that everything will be alright now at -q 5 or even -q 4).
Title: lossyWAV Development
Post by: jesseg on 2008-05-01 01:01:41
The problem seems to be with either foobar2000 or cmd.exe (I never did determine which). As soon as I removed spaces from the path to the batch file everything began to work.

Unicode handling in which sense?


If there are spaces in a static path in a batch file, it's easy enough to deal with by wrapping double-quotes around the path\executable part of the command. Such as:
Code: [Select]
@echo off
"c:\OMG S P A C E S\bin\lossyWAV.exe" %1 %3 %4 %5 %6 %7 %8 %9 -low -nowarn -quiet
"c:\OMG S P A C E S\bin\flac.exe" -5 -f -b 512 "%~N1.lossy.wav" -o "%~N2.flac"
del "%~N1.lossy.wav"


But to handle that with a variable as a path, it's a bit more complicated but fine once you know how to do left() and right() type functions with variables.

As a nice side effect of "massaging" the variables that way, you can also handle any other chars in the path, as well as in other variables such as those you might use for tagging. Handling Unicode in tags is mainly what I think it's most useful for.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-01 07:54:45
If there are spaces in a static path in a batch file, it's easy enough to deal with by wrapping double-quotes around the path\executable part of the command. Such as:
Code: [Select]
@echo off
"c:\OMG S P A C E S\bin\lossyWAV.exe" %1 %3 %4 %5 %6 %7 %8 %9 -low -nowarn -quiet
"c:\OMG S P A C E S\bin\flac.exe" -5 -f -b 512 "%~N1.lossy.wav" -o "%~N2.flac"
del "%~N1.lossy.wav"
But to handle that with a variable as a path, it's a bit more complicated but fine once you know how to do left() and right() type functions with variables.

As a nice side effect of "massaging" the variables that way, you can also handle any other chars in the path, as well as in other variables such as those you might use for tagging. Handling Unicode in tags is mainly what I think it's most useful for.
I have previously used spaces in the path-to-exe-files in the batch file, surrounded by double quotes as you mentioned - I just don't from preference as I keep the batch file in the same directory as the two exe files, but if there are spaces in the path to the batch file itself there seems to be a problem between foobar2000 and cmd.exe.
Title: lossyWAV Development
Post by: collector on 2008-05-01 11:13:58
You could have a look at Speek's batchencoder at Speek's Frontends (http://members.home.nl/w.speek)
It's one of his fabulous front ends, and it works great for me.
Title: lossyWAV Development
Post by: halb27 on 2008-05-01 12:17:34
I've computed the bitrates [kbps] for my full length regular track set using various settings:

           v0.9.6    v0.9.7    v0.9.7 -impulse    v0.9.7b -impulse -spf 22222-....
-q 0         263       272           290                     297
-q 1         281       288           304                     313
-q 2         305       311           325                     334
-q 3         335       339           351                     363
-q 4         372       375           385                     399
-q 5         417       418           427                     442
-q 6         447       448           456                     471
-q 7         477       477           485                     500

I've done the same thing with my short-length problem tracks set, but it's not worthwhile giving the results here: the relations are more or less the same, just at a higher bitrate level.
I had hoped bitrate would increase more for the problem samples than for regular music, as I was used to this behavior back when we had stronger skewing at the low frequency edge.

I did a lot of listening tests in order to arrive at conclusions what IMO should be the way to go.
Starting from the conclusions (not the history of my tests):

a) At the low bitrate edge we shouldn't use the 32 sample FFTs IMO.
I listened to furious, eig, and castanets, comparing the results of v0.9.7b -q 0 -impulse -spf 22222-.... with those of v0.9.7 -q 1, as well as  v0.9.7b -q 1 -impulse -spf 22222-.... with those of v0.9.7 -q 2.
As the result IMO it's better to use the next quality level instead of using -impulse -spf 22222-..... with both requirung roughly the same bitrate. The quality result was pretty clear for furious, there was no real difference for castanets, and just eig -q 0 -impulse -spf 22222-..... was is a tiny bit better to me than -q 1, and when doing the same comparison with quality levels advanced by 1 the preference for the 32 FFT is gone for eig too.
So using -impulse -spf 22222-.... is advantegous in rare cases whereas using the next quality level without the 32 sample FFTs brings a more general improvement.
Mooreover you've arrived at very good quality, Nick, way below 300 kbps, and if we used the 32 sample FFTs we would be at ~300 kbps which I guess isn't too attractive for the low bitrate users.

b) This morning with fresh ears I listened to all my potentially problematic samples again using v0.9.7 -q 4 -impulse. Now I could hear problems again with badvilbel (9/10), castanets (7/10), triangle (7/10).
Trying v0.9.7b -q 4 -impulse -spf 22222-.... made badvilbel and castanets non-abxable to me. With triangle I found out none of the 2 variants used with the 32 samples FFT produced a different file than the one when the 32 samples FFT isn't used at all.
I did the same thing with -q 5 instead of -q 4. Now badvilbel and castanets are okay to me using just -impulse (no -spf 22222-...). With triangle it's the same thing however as with -q 4: 7/10.
So the triangle problem can't be tackled by the 32 sample FFT, some other problems can but it looks like it's necessary to use a 22222 spreading.

As for the new -snr values when looking at the bitrate table they seem not to have a vital effect from -q 5 upwards.
Can you please further increase the -snr values from -q 5 up (maybe starting with a more gentle increase already at -q 3 or -q 4), Nick? I wonder if the problems can be tackled by this. When looking at this as an alternative to the 32 sample FFT accepting say half of the bitrate increase of the 32 sample FFts there's room for  a very serious increase of the -snr value.
Title: lossyWAV Development
Post by: botface on 2008-05-01 13:17:40
You could have look at Speek's batchencoder at Speek's Frontends (http://members.home.nl/w.speek)
It's one of his fabulous front ends, and it works great for me.

I too love his frontends particularly Multi frontend that I use all the time. I've been trying to get Batchenc to work with LossyWAV but I've obviously got something wrong. I wonder if you can help

I have batchenc and LossyWAV in the same directory and the output directory in Batchenc set to the same as the input directory but when I run it Batchenc places the output files in its own directory (IE the one that contains Batchenc and LossyWAV). The only way I can get it to put the output files where I want them is to use the -o parameter in LossyWAV. Can you help me with that?

Thanks
Title: lossyWAV Development
Post by: Nick.C on 2008-05-01 14:11:54
I did a lot of listening tests in order to arrive at conclusions what IMO should be the way to go.
From the consequences (not the history of my tests):

a) At the low bitrate edge we shouldn't use the 32 sample FFTs IMO.
I listened to furious, eig, and castanets, comparing the results of v0.9.7b -q 0 -impulse -spf 22222-.... with those of v0.9.7 -q 1, as well as  v0.9.7b -q 1 -impulse -spf 22222-.... with those of v0.9.7 -q 2.
As the result IMO it's better to use the next quality level instead of using -impulse -spf 22222-..... with both requirung roughly the same bitrate. The quality result was pretty clear for furious, there was no real difference for castanets, and just eig -q 0 -impulse -spf 22222-..... was is a tiny bit better to me than -q 1, and when doing the same comparison with quality levels advanced by 1 the preference for the 32 FFT is gone for eig too.
So using -impulse -spf 22222-.... is advantegous in rare cases whereas using the next quality level without the 32 sample FFTs brings a more general improvement.
Mooreover you've arrived at very good quality, Nick, way below 300 kbps, and if we used the 32 sample FFTs we would be at ~300 kbps which I guess isn't too attractive for the low bitrate users.

b) This morning with fresh ears I listened to all my potentially problematic samples again using v0.9.7 -q 4 -impulse. Now I could hear problems again with badvilbel (9/10), castanets (7/10), triangle (7/10).
Trying v0.9.7b -q 4 -impulse -spf 22222-.... made badvilbel and castanets non-abxable to me. With triangle I found out none of the 2 variants used with the 32 samples FFT produced a different file than the one when the 32 samples FFT isn't used at all.
I did the same thing with -q 5 instead of -q 4. Now badvilbel and castanets are okay to me using just -impulse (no -spf 22222-...). With triangle it's the same thing however as with -q 4: 7/10.
So the triangle problem can't be tackled by the 32 sample FFT, some other problems can but it looks like it's necessary to use a 22222 spreading.

As for the new -snr values when looking at the bitrate table they seem not to have a vital effect from -q 5 upwards.
Can you please further increase the -snr values from -q 5 up (maybe starting with a more gentle increase already at -q 3 or -q 4), Nick? I wonder if the problems can be tackled by this. When looking at this as an alternative to the 32 sample FFT accepting say half of the bitrate increase of the 32 sample FFts there's room for  a very serious increase of the -snr value.
How do you envisage the -snr values changing? At v0.9.7 the quality related presets are:

Code: [Select]
  spreading_function_string         : string[precalc_analyses*(spread_zones+2)-1]='22222-22223-22224-12235-12246-12357';

  quality_noise_threshold_shifts    : array[0..Quality_Presets] of Double  = (20,16,12,8,4,0,-2.4,-4.8,-7.2,-9.6,-12);

v0.9.6:  quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (16,17,18,19,20,21,22.8,24.6,26.4,28.2,30); //v0.9.6

v0.9.7:  quality_signal_to_noise_ratios    : array[0..Quality_Presets] of Double  = (18,18.87,19.81,20.8,21.86,23,24.21,25.51,26.91,28.4,30); //variant #1

  quality_clips_per_channel         : array[0..Quality_Presets] of Integer = (3,3,3,3,2,1,0,0,0,0,0);


Firstly, I propose that the 32 sample FFT spreading function will be 22222 rather than 22223 as if it is used at all, it should be at its most(?) effective.

Secondly, I propose that -snr be (18,19,20,21,22,25,28,31,34,37,40).

Bitrates for 53 problem sample:
Code: [Select]
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|   lossyWAV    | -q 0  | -q 1  | -q 2  | -q 3  | -q 4  | -q 5  | -q 6  | -q 7  | -q 8  | -q 9  | -q 10 |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.6   |318kbps|338kbps|364kbps|394kbps|431kbps|472kbps|500kbps|529kbps|557kbps|584kbps|611kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.7   |327kbps|346kbps|370kbps|400kbps|435kbps|475kbps|502kbps|530kbps|557kbps|584kbps|611kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.7 i |342kbps|360kbps|383kbps|412kbps|446kbps|485kbps|513kbps|540kbps|567kbps|594kbps|619kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8   |327kbps|347kbps|371kbps|400kbps|435kbps|479kbps|511kbps|543kbps|574kbps|604kbps|632kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8 i |346kbps|365kbps|390kbps|419kbps|454kbps|500kbps|533kbps|564kbps|595kbps|624kbps|653kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
Title: lossyWAV Development
Post by: halb27 on 2008-05-01 15:17:21
... Firstly, I propose that the 32 sample FFT spreading function will be 22222 rather than 22223 as if it is used at all, it should be at its most(?) effective. ...


This makes sense to me.

...Secondly, I propose that -snr be (18,19,20,21,22,25,28,31,34,37,40). ...


If this yields the bitrate of 0.9.8 in your table this is reasonable for me too. The details depend on what we will do with the 32 samples FFT, cause I think we can't add several defensive methods or make the existing ones more defensive without sacrificing too much bitrate while lettng -q 5 be the setting with -nts 0.

At the moment I'd like to play around a bit trying to bring the triangle issue (though it's a subtle one) down with -q 4 or at least -q 5, and I'd like to try this first with a higher -snr value.
For this purpose it would be kind if you could also make the -snr option temporarily available for the user, Nick.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-01 17:07:55
If this yields the bitrate of 0.9.8 in your table this is reasonable for me too. The details depend on what we will do with the 32 samples FFT, cause I think we can't add several defensive methods or make the existing ones more defensive without sacrificing too much bitrate while lettng -q 5 be the setting with -nts 0.

At the moment I'd like to play around a bit trying to bring the triangle issue (though it's a subtle one) down with -q 4 or at least -q 5, and I'd like to try this first with a higher -snr value.
For this purpose it would be kind if you could also make the -snr option temporarily available for the user, Nick.
Yes, those quality preset parameters refer to v0.9.8 and correspond to the achieved bitrates.

lossyWAV beta v0.9.8 attached to post #1 in this thread. (-nts and -snr parameters temorarily re-enabled).
Title: lossyWAV Development
Post by: halb27 on 2008-05-01 22:17:03
Thank you for v0.9.8, Nick.

The bitrate table for my full length regular track set up to -q 7:

-q 0   272 kbps
-q 1   289 kbps
-q 2   311 kbps
-q 3   340 kbps
-q 4   376 kbps
-q 5   422 kbps
-q 6   454 kbps
-q 7   486 kbps

So from -q 5 up the increased -snr do have a better effect compared to the values of 0.9.7.
Up to -q 4 however the effect is more or less negligible.

If we do want to have the 32 samples FFT when using the higher quality settings (and I think we should - hopefully we've got rid of the perception of a changed pitch by this, and maybe a 64 samples FFT was really too long as the short FFT at least with very high quality demands in mind). Even if a 32 sample FFT isn't necessary for very high quality it's one more weapon in the set of additional defensive tools).
The question is at what quality level to start with the 32 samples FFT. I think we definitely should have it with our default -nts 0 setting of -q 5. -q 5 -impulse takes an average bitrate of 446 kbps with my regular test set. So for the quality settings lower than -q 5 we have to redefine parameters a bit in order to get a roughly equally spaced quality and bitrate scale.

I've played around a lot with the parameters, and listened to especially the low bitrate settings.
What I've found, Nick, doesn't correspond totally with your idea of having all the quality parameter increase when going from one quality level to the next. Technically however I think there isn't any problem: The various quality parameters are defined for the integer quality levels. For a non-integer quality level you can take the quality parameters of the next lower integer quality level with the exception of -nts for which you can do a linear interpolation. As a rough description. For more details see below.

My suggestion for the quality parameters of integer quality levels:

-q 0: v0.9.8 -q 0 (average bitrate for my full length regular track set: 272 kbps).

For the next quality level IMO the best quality increase per kbps is by increasing -snr significantly. I've listened to -q 1 with various -snr setting, and going directly to -snr 22 brings an astonishing good quality to eig and furious which otherwise do suffer a bit from these low bitrate settings (though other than with these samples quality is remarkably high).

-q 1: v0.9.8 -q 1 -snr 22 (average bitrate for my full length regular track set: 307 kbps).

Once at -snr 22 increasing -snr further isn't so important. What should be more in focus is decreasing the -nts value significantly from the rather high value of -q 1's
-nts 16:

-q 2: v0.9.8 -q 2 -snr 22 -nts 9 (average bitrate for my full length regular track set: 337 kbps).

For -q 3 -nts should get lower one more time, but as we've arrived already at a modest -nts value I think we should use -impulse with -q 3 (if not we can lower -nts a bit more):

-q 3: v0.9.8 -q 3 -snr 22 -nts 6 -impulse (average bitrate for my full length regular track set: 382 kbps).

-q 4 should lower the -nts value one more time:

-q 4: v0.9.8 -q 4 -nts 3 -impulse (average bitrate for my full length regular track set: 409 kbps).

-q 5: v0.9.8 -q 5 -impulse (average bitrate for my full length regular track set: 446 kbps).

-q 6: v0.9.8 -q 6 -impulse (average bitrate for my full length regular track set: 479 kbps).

and so on.

Thus -nts goes smoothly from 9 (-q 2) to 6 (-q 3) to 3 (-q 4) to 0 (-q 5).

I've tried triangle with this setting of -q 4, and I can't abx it.
I've also looked at the effect -snr has for triangle. Unfortunately -snr doesn't have an influence on bits-to-remove (only with extreme values for -snr), just as is the case with -impulse. A lower -nts value is the only thing that helps.

This is my suggestion.
Compared to what we have right now IMO this means an improved quality for -q 1 and -q 2 while having bitrate still pretty low. As for -q 3+ we have an improved quality for the high end demand though at the cost of a higher bitrate. Moreover I think there is more 'meaning' in the quality steps up to -q 5.

A remark on the continuous quality scale as only the basic approach as I think of it is described above.
For the -q 0...1  range I think there should be linear interpolation (or any other continuous variation) with -snr as well as -nts.
With the -impulse parameter we can't have a smooth transition, and starting -impulse with -q 3.0 necessarily makes a larger jump in bitrate when going from -q 2 to -q 3 (more dramatically when going from -q 2.9 to -q 3.0).
Looking at the quality level analogy of Vorbis it's a bit similar to the situation where Vorbis starts using a different stereo handling at certain quality levels.
Title: lossyWAV Development
Post by: The Sheep of DEATH on 2008-05-01 23:32:41
i use a custom made "uber-brute-force" version which packs each half KB section with multiple types, and then going into previous sections, to figure out what lengths and what compression types will give maximum compression, and yes it's VERY slow, especially with executables over 1MB


Where can one go about finding this version?  Googling uber brute force upx leads to nothing but this thread.
Title: lossyWAV Development
Post by: collector on 2008-05-02 10:31:54
I've been trying to get Batchenc to work with LossyWAV but I've obviously got something wrong. I wonder if you can help
I have batchenc and LossyWAV in the same directory and the output directory in Batchenc set to the same as the input directory but when I run it Batchenc places the output files in its own directory (IE the one that contains Batchenc and LossyWAV). The only way I can get it to put the output files where I want them is to use the -o parameter in LossyWAV. Can you help me with that?

I had that too. But if it isn't a problem to use the -o parameter do so. The only thing is you have to change (fill in) the output directory name in the command line.
My command line for testing with lossyWav is "lossywav.exe <infile> -q 3 -force -o E:\iPOD". I admit it's more convenient when the programm uses <<output directory same as input directory >>. I'll experiment with that

[edit] A normal command line can be "FOR %%X IN ("*".WAV) DO LOSSYWAV "%%X" -q 3", without the outer quotes of course. [/edit]
Title: lossyWAV Development
Post by: botface on 2008-05-02 13:54:11

I've been trying to get Batchenc to work with LossyWAV but I've obviously got something wrong. I wonder if you can help
I have batchenc and LossyWAV in the same directory and the output directory in Batchenc set to the same as the input directory but when I run it Batchenc places the output files in its own directory (IE the one that contains Batchenc and LossyWAV). The only way I can get it to put the output files where I want them is to use the -o parameter in LossyWAV. Can you help me with that?

I had that too. But if it isn't a problem to use the -o parameter do so. The only thing is you have to change (fill in) the output directory name in the command line.
My command line for testing with lossyWav is "lossywav.exe <infile> -q 3 -force -o E:\iPOD". I admit it's more convenient when the programm uses <<output directory same as input directory >>. I'll experiment with that

[edit] A normal command line can be "FOR %%X IN ("*".WAV) DO LOSSYWAV "%%X" -q 3", without the outer quotes of course. [/edit]

Thanks for responding. My command line is pretty much like yours. While it's not that much hassle having to specify the output directory I often find myself wanting to convert several albums at once, each one being in a separate input directory and wanting to keep them separate. I assumed I was doing something wrong because Multi Frontend, which is from the same developer and which I also use, will quite happily deal with a bunch of files from several different directories and put them back in the directories they originated from. Shame it can't handle LossyWAV .
Title: lossyWAV Development
Post by: collector on 2008-05-02 15:19:03
Thanks for responding. My command line is pretty much like yours. While it's not that much hassle having to specify the output directory I often find myself wanting to convert several albums at once, each one being in a separate input directory and wanting to keep them separate. I assumed I was doing something wrong
Shame it can't handle LossyWAV .

I am fond of those gadgets, batchfiles and front ends too. This afternoon I wrote myself a little batchfile that preserves the long filenames. For recursive actions I use the program 'sweep' together with it. It works and you can PM me if you're interested.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-02 21:04:25
My suggestion for the quality parameters of integer quality levels:

-q 0: v0.9.8 -q 0 (average bitrate for my full length regular track set: 272 kbps).

-q 1: v0.9.8 -q 1 -snr 22 (average bitrate for my full length regular track set: 307 kbps).

-q 2: v0.9.8 -q 2 -snr 22 -nts 9 (average bitrate for my full length regular track set: 337 kbps).

-q 3: v0.9.8 -q 3 -snr 22 -nts 6 -impulse (average bitrate for my full length regular track set: 382 kbps).

-q 4: v0.9.8 -q 4 -nts 3 -impulse (average bitrate for my full length regular track set: 409 kbps).

-q 5: v0.9.8 -q 5 -impulse (average bitrate for my full length regular track set: 446 kbps).

-q 6: v0.9.8 -q 6 -impulse (average bitrate for my full length regular track set: 479 kbps).

...I've tried triangle with this setting of -q 4, and I can't abx it....
I've implemented these changes into beta v0.9.8b and the bitrates for my 53 problem sample set are as follows:
Code: [Select]
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|   lossyWAV    | -q 0  | -q 1  | -q 2  | -q 3  | -q 4  | -q 5  | -q 6  | -q 7  | -q 8  | -q 9  | -q 10 |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.6   |318kbps|338kbps|364kbps|394kbps|431kbps|472kbps|500kbps|529kbps|557kbps|584kbps|611kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.7   |327kbps|346kbps|370kbps|400kbps|435kbps|475kbps|502kbps|530kbps|557kbps|584kbps|611kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8   |327kbps|347kbps|371kbps|400kbps|435kbps|479kbps|511kbps|543kbps|574kbps|604kbps|632kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8b  |327kbps|364kbps|398kbps|438kbps|463kbps|500kbps|533kbps|564kbps|595kbps|624kbps|653kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
Looking at the rate of change of bitrate, I might be tempted to increase the -nts on -3 slightly to "smooth out" a peak in the rate of change of bitrate, however these bitrates look fairly interesting....

lossyWAV beta v0.9.8b attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-05-02 21:29:46
Thank you, Nick.

It would be very nice if we could get some more listening feedback no matter at which quality scale.
Title: lossyWAV Development
Post by: jesseg on 2008-05-02 23:32:01
Where can one go about finding this version?  Googling uber brute force upx leads to nothing but this thread.


http://upxshell.sourceforge.net/ (http://upxshell.sourceforge.net/)

and yeah, I guess it's actually "Ultra Brute", but really "Uber" is just as meaningless.  Anyways, that's the app.
Title: lossyWAV Development
Post by: halb27 on 2008-05-04 09:34:25
It's not important, but I suggest a minor defensive change of -snr 22 to -snr 23.5 for -q 2 to -q 4.

My first motivation was a mere optical one:
With this the average bitrate for my regular tracks test set is 272 - 307 - 344 - 388 - 413 - 446 for -q 0 ... -q 5, which IMO is a tiny bit more equally spaced especially when going from -q 2 to -q 3 than 272 - 307 - 337 - 382 - 409 - 446.
Moreover we have a smooth -snr transition 22 - 23.5 - 25.

But I also think it's a good thing to have especially -q 2 more defensive while hardly sacrificing bitrate.
Sure between -q 1 and -q 2 -snr should vary as well as -nts. I think this makes an in-between quality level like -q 1.5 more attractive too.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-04 13:56:50
It's not important, but I suggest a minor defensive change of -snr 22 to -snr 23.5 for -q 2 to -q 4.

My first motivation was a mere optical one:
With this the average bitrate for my regular tracks test set is 272 - 307 - 344 - 388 - 413 - 446 for -q 0 ... -q 5, which IMO is a tiny bit more equally spaced especially when going from -q 2 to -q 3 than 272 - 307 - 337 - 382 - 409 - 446.
Moreover we have a smooth -snr transition 22 - 23.5 - 25.

But I also think it's a good thing to have especially -q 2 more defensive while hardly sacrificing bitrate.
Sure between -q 1 and -q 2 -snr should vary as well as -nts. I think this makes an in-between quality level like -q 1.5 more attractive too.
lossyWAV beta v0.9.8c attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-05-04 17:12:18
Thank you, Nick.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-04 20:17:39
Thank you, Nick.
I'm half tempted to change (18,22,23.5,23.5,23.5,25,28,31,34,37,40) to (18,22,22.75,23.5,24.25,25,28,31,34,37,40) as it appeals more to my linear tendencies....
Title: lossyWAV Development
Post by: halb27 on 2008-05-04 21:24:42
Thank you, Nick.
I'm half tempted to change (18,22,23.5,23.5,23.5,25,28,31,34,37,40) to (18,22,22.75,23.5,24.25,25,28,31,34,37,40) as it appeals more to my linear tendencies....

This makes sense to me too.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-04 21:53:34
Thank you, Nick.
I'm half tempted to change (18,22,23.5,23.5,23.5,25,28,31,34,37,40) to (18,22,22.75,23.5,24.25,25,28,31,34,37,40) as it appeals more to my linear tendencies....
This makes sense to me too.
Comparing the two:
Code: [Select]
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|   lossyWAV    | -q 0  | -q 1  | -q 2  | -q 3  | -q 4  | -q 5  | -q 6  | -q 7  | -q 8  | -q 9  | -q 10 |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8b  |327kbps|364kbps|398kbps|438kbps|463kbps|500kbps|533kbps|564kbps|595kbps|624kbps|653kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8c  |327kbps|364kbps|406kbps|445kbps|468kbps|500kbps|533kbps|564kbps|595kbps|624kbps|653kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8d  |327kbps|364kbps|402kbps|445kbps|471kbps|500kbps|533kbps|564kbps|595kbps|624kbps|653kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
Title: lossyWAV Development
Post by: halb27 on 2008-05-04 21:57:36
It was clear that the bitrate difference between -q 2 and -q 3 would increase, but it's more or less negligible.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-04 22:03:23
It was clear that the bitrate difference between -q 2 and -q 3 would increase, but it's more or less negligible.
Looking at the rate of change of bitrate, I'm more inclined to leave -snr preset values as per v0.9.8c as it gives a more even spread of bitrate.
Title: lossyWAV Development
Post by: halb27 on 2008-05-04 22:36:20
It was clear that the bitrate difference between -q 2 and -q 3 would increase, but it's more or less negligible.
Looking at the rate of change of bitrate, I'm more inclined to leave -snr preset values as per v0.9.8c as it gives a more even spread of bitrate.

Practically speaking I prefer the way it's done with 0.9.8c a little bit too.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-05 13:32:08
It was clear that the bitrate difference between -q 2 and -q 3 would increase, but it's more or less negligible.
Looking at the rate of change of bitrate, I'm more inclined to leave -snr preset values as per v0.9.8c as it gives a more even spread of bitrate.
Practically speaking I prefer the way it's done with 0.9.8c a little bit too.
I've re-visited the -spf function for higher frequencies at longer FFT lengths.

was: spreading_function_string        : string[precalc_analyses*(spread_zones+2)-1]='22222-22223-22224-12235-12246-12357';
now:  spreading_function_string        : string[precalc_analyses*(spread_zones+2)-1]='22222-22223-22224-12234-12245-12356';

this gives the following for my 53 problem sample set:
Code: [Select]
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|   lossyWAV    | -q 0  | -q 1  | -q 2  | -q 3  | -q 4  | -q 5  | -q 6  | -q 7  | -q 8  | -q 9  | -q 10 |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8c  |327kbps|364kbps|406kbps|445kbps|468kbps|500kbps|533kbps|564kbps|595kbps|624kbps|653kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8d2 |328kbps|365kbps|407kbps|446kbps|470kbps|501kbps|534kbps|565kbps|596kbps|626kbps|654kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|

10 Album Test Set:
Code: [Select]
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
|   lossyWAV    | -q 0  | -q 1  | -q 2  | -q 3  | -q 4  | -q 5  | -q 6  | -q 7  | -q 8  | -q 9  | -q 10 |
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| beta v0.9.8d2 |298kbps|???kbps|???kbps|???kbps|???kbps|463kbps|???kbps|???kbps|???kbps|???kbps|639kbps|
|---------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|


Attached is a detailed analysis of bitrate vs quality preset for each sample in my 53 problem sample set.
Title: lossyWAV Development
Post by: halb27 on 2008-05-05 14:19:47
...
was: spreading_function_string        : string[precalc_analyses*(spread_zones+2)-1]='22222-22223-22224-12235-12246-12357';
now:  spreading_function_string        : string[precalc_analyses*(spread_zones+2)-1]='22222-22223-22224-12234-12245-12356';

this gives the following for my 53 problem sample set:

As always I'm more interested in the bitrate increase for regular music (or in the relation bitrate increase of regular vs. problem tracks with a warm welcome to settings where bitrate increase is higher for the problem tracks).
Anyway, in this case it looks like difference is next to nothing in either case. This slightly more defensive setting is more or less for free.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-07 13:19:37
As always I'm more interested in the bitrate increase for regular music (or in the relation bitrate increase of regular vs. problem tracks with a warm welcome to settings where bitrate increase is higher for the problem tracks).
Anyway, in this case it looks like difference is next to nothing in either case. This slightly more defensive setting is more or less for free.
lossyWAV beta v0.9.8d attached to post #1 in this thread.
Title: lossyWAV Development
Post by: halb27 on 2008-05-07 15:43:34
Thank you, Nick.
As you tidied up the code a bit: does that mean it is necessary to do another listening test? Or were it just very minor changes?

Other than that I don't know how the community feels about it, but I'd welcome to go final now.
It doesn't look that we'll have essential changes any more, and it also looks as if there isn't a real need for changes, covering now the bitrate range from ~270 kbps up to 500+ kbps in a useful way.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-07 16:08:20
Thank you, Nick.
As you tidied up the code a bit: does that mean it is necessary to do another listening test? Or were it just very minor changes?

Other than that I don't know how the community feels about it, but I'd welcome to go final now.
It doesn't look that we'll have essential changes any more, and it also looks as if there isn't a real need for changes, covering now the bitrate range from ~270 kbps up to 500+ kbps in a useful way.
No further changes to the mechanics of the method, only slight speed-up changes.

My only area of concern is with the -help and -longhelp (more the -longhelp) with respect to level of detail required to allow the user to make informed decisions....

The changes to the -spf parameters will not necessitate any more listening tests, I think the circa 1kbps increase in bitrate across the range of quality presets can only improve quality.

If no dissenting voices are forthcoming with respect to quality issues or required modifications to the help pages, v1.0.0 will be released on 12th May 2008 at 20:31 (precisely..... ).
Title: lossyWAV Development
Post by: jesseg on 2008-05-07 16:57:12
I can't wait to see that as certified news on the front page.  Awesome.

What about a logo?  If the icon I made is used in the v1.0 binaries, then the logo could probably be delayed indefinitely, at least until there is a website/sf.net page for lossyWAV.
Title: lossyWAV Development
Post by: lvqcl on 2008-05-07 18:17:32
Some players (fb2k, WV winamp plugin) can use correction files for WavPack to play .WV+.WVC files losslessly. It seems no player can use *.lwcdf.XYZ with *.lossy.XYZ (say, .lossy.flac and .lwcdf.flac). It is of no importance for me, but...
Title: lossyWAV Development
Post by: halb27 on 2008-05-07 18:31:53
... v1.0.0 will be released on 12th May 2008 at 20:31 (precisely..... ).

Great.
I've hesitated for so long reencoding my entire collection, but I'll do it with v1.0.0.
Guess I'll say goodbye to lossless archiving, will use a unique final VHQ lossyFLAC version instead for all the tracks in my collection, and will never touch them again.
(Life will be easier then as I hate having to manually overwrite the automatically generated replay gain values for quite a series of tracks after reencoding).
Title: lossyWAV Development
Post by: Nick.C on 2008-05-07 19:14:06
I've hesitated for so long reencoding my entire collection, but I'll do it with v1.0.0.
Guess I'll say goodbye to lossless archiving, use a single final VHQ lossyFLAC version instead for all the tracks in my collection, and never touch them again.
(Life will be easier then, as I hate having to manually overwrite the automatically generated ReplayGain values for quite a series of tracks after reencoding.)
I've probably transcoded that portion (about 60%) of my collection which I have ripped to FLAC on my server about 15 to 20 times as lossyWAV has been in development....

I'll still be keeping the FLAC though.... (!)
Title: lossyWAV Development
Post by: botface on 2008-05-08 09:30:07
My only area of concern is with the -help and -longhelp (more the -longhelp) with respect to level of detail required to allow the user to make informed decisions....

The changes to the -spf parameters will not necessitate any more listening tests; I think the circa 1 kbps increase in bitrate across the range of quality presets can only improve quality.

If no dissenting voices are forthcoming with respect to quality issues or required modifications to the help pages, v1.0.0 will be released on 12th May 2008 at 20:31 (precisely..... ).

Great news and well done everybody who's been involved especially Nick.C & halb27 of course.

I think you're right that the help/longhelp need to be good and, well, helpful. Also the wiki needs to be up to date and accurate.

I'm sure that people will want to rip to lossy.wav or even straight to lossy.flac, lossy.wv or whatever, but I guess that's something for others to pick up on once it's gone live.

Once again, well done. It's a great achievement
Title: lossyWAV Development
Post by: halb27 on 2008-05-08 11:41:19
...My only area of concern is with the -help and -longhelp (more the -longhelp) with respect to level of detail required to allow the user to make informed decisions....

I was thinking about this area.
As a result I suggest we don't use the kind of advanced options we had so far.
We have arrived at these finely differentiated -q levels (already too many levels for some users), and with that I don't see any sense in using even the most basic advanced option, -nts. Why not just use a corresponding -q level? With -snr it's the same thing. We have a pretty defensive -snr setting at each quality level, but on the other hand not a lot of bitrate can be saved by going less defensive. IMO it's pretty balanced. Keeping -nts and -snr away from the user has the advantage that there's no need to describe them, which I guess is a difficult job.
The only useful option for the user IMO is the -clips option, though I agree it's pretty personal and not really important either. If we do expose clipping as a user option, I suggest we just use something like -noclips, which makes sure that no clipping occurs (as is done at the high -q levels). This can easily be described. IMO it can be one of the standard options.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-08 12:09:06
I was thinking about this area.
As a result I suggest we don't use the kind of advanced options we had so far.
We have arrived at these finely differentiated -q levels (already too many levels for some users), and with that I don't see any sense in using even the most basic advanced option, -nts. Why not just use a corresponding -q level? With -snr it's the same thing. We have a pretty defensive -snr setting at each quality level, but on the other hand not a lot of bitrate can be saved by going less defensive. IMO it's pretty balanced. Keeping -nts and -snr away from the user has the advantage that there's no need to describe them, which I guess is a difficult job.
The only useful option for the user IMO is the -clips option, though I agree it's pretty personal and not really important either. If we do expose clipping as a user option, I suggest we just use something like -noclips, which makes sure that no clipping occurs (as is done at the high -q levels). This can easily be described. IMO it can be one of the standard options.
I could always change the -q 0 to 10 to -q 0 to 1.0 with a default of 0.5.... Actually, the more I think about that, the more I like it - and it's different from lame, ogg vorbis, flac, etc.
Title: lossyWAV Development
Post by: carpman on 2008-05-08 12:19:08
I could always change the -q 0 to 10 to -q 0 to 1.0 with a default of 0.5.... Actually, the more I think about that, the more I like it - and it's different from lame, ogg vorbis, flac, etc.

I like this idea (and also halb27's re. keeping options simple)

By the way, looking forward to world lossyWAV day on 12th May. How does one celebrate this? By listening to 24 hours of music?

C.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-08 13:26:35
I could always change the -q 0 to 10 to -q 0 to 1.0 with a default of 0.5.... Actually, the more I think about that, the more I like it - and it's different from lame, ogg vorbis, flac, etc.
I like this idea (and also halb27's re. keeping options simple)

By the way, looking forward to world lossyWAV day on 12th May. How does one celebrate this? By listening to 24 hours of music?

C.
Code: [Select]
Procedure Celebrate;
Begin
  Repeat
    Success:=Drink_Beer and not Spill_Beer;
  Until Success=False;
  Goto Bed;
End;
Title: lossyWAV Development
Post by: halb27 on 2008-05-08 13:32:33
I could always change the -q 0 to 10 to -q 0 to 1.0 with a default of 0.5.... Actually, the more I think about that, the more I like it - and it's different from lame, ogg vorbis, flac, etc.

Hmmm, I'm afraid something like -q 1.0 is emotionally associated with 'full (100%) quality', -q 0.6 seems well below that in quality (though in reality it's overkill quality), and -q 0.15 seems pretty bad (though in reality it's excellent). The problem is the association with a percentage quality scale.
But maybe it's just me who looks at it this way. In a sense we have this problem with the -q 0 ... 10 scale as well, but because of the lack of association with a percentage scale the problem is less severe IMO.
BTW your suggestion is similar to the Nero AAC quality scale - however, I don't see a problem in having a similar scale to another encoder as long as the emotional quality associations correspond, at least in the mid range of the quality scale.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-08 13:43:41
Hmmm, I'm afraid something like -q 1.0 is emotionally associated with 'full (100%) quality', -q 0.6 seems well below that in quality (though in reality it's overkill quality), and -q 0.15 seems pretty bad (though in reality it's excellent). The problem is the association with a percentage quality scale.
But maybe it's just me who looks at it this way. In a sense we have this problem with the -q 0 ... 10 scale as well, but because of the lack of association with a percentage scale the problem is less severe IMO.
BTW your suggestion is similar to the Nero AAC quality scale - however, I don't see a problem in having a similar scale to another encoder as long as the emotional quality associations correspond, at least in the mid range of the quality scale.
I take your point - 0.0 to 1.0 does indeed have an immediate association with 0% to 100%. I'll leave the -q scale as is.
Title: lossyWAV Development
Post by: collector on 2008-05-08 18:12:34
I take your point - 0.0 to 1.0 does indeed have an immediate association with 0% to 100%. I'll leave the -q scale as is.

Please do. Can I ask for a progress bar (or percentage) and an option to create a log file? I can't catch/save that informative 'average' line. When processing my disc images it's nice to see the progress counting up and down and up again, but I haven't a clue where it's going. Anyway, many thanks for the program.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-08 18:56:15
I take your point - 0.0 to 1.0 does indeed have an immediate association with 0% to 100%. I'll leave the -q scale as is.
Please do. Can I ask for a progress bar (or percentage) and an option to create a log file? I can't catch/save that informative 'average' line. When processing my disc images it's nice to see the progress counting up and down and up again, but I haven't a clue where it's going. Anyway, many thanks for the program.
A progress bar in what sense and to what purpose? I suppose I could append a line to a log file with a "-l <filepath\filename>" type parameter, but what would you want it to contain? When running from the command line there is a progress output of sorts which doesn't show the %age complete, but rather the amount of data processed (duration of WAV file processed) and the number of bits removed so far.

A bit more detail as to what you would want to see would be nice.
Title: lossyWAV Development
Post by: halb27 on 2008-05-08 19:33:27
... Can I ask for a progress bar ...

Everybody has their own way of doing the encoding in practice. If you give foobar2000 a try as a GUI for encoding you get a nice progress bar (and a good GUI). You just have to configure lossyWAV once for foobar2000 usage, which isn't hard if you have a batch file for doing the lossyFLAC stuff (or whatever way you use lossyWAV).
Title: lossyWAV Development
Post by: halb27 on 2008-05-08 20:06:14
I was asked for where to download the samples I usually try as potential problem samples for lossyWAV.

I've uploaded them for a limited time on my webspace. They can be downloaded from here (http://home.arcor.de/horstalb/lossyWAVpotentialProblems/).

Most of the samples don't show any problems (to me), even at a low quality setting like -q 1.5.
Even those samples that aren't perfect have excellent quality to me at -q 1.5.
I can, however, abx a few of the samples at a mid quality level like -q 4.
With the triangle sample I don't really know what to think. In a strict sense I can't abx it because my results are too poor. Sometimes I can't hear a problem at all (at -q 4). But there are times when I can repeatedly get results like 4/4 before I start producing more and more errors.

It would be nice to learn about other people's experience.
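How much weight a run like 4/4 carries follows directly from the binomial distribution; as an illustration (a Python sketch, not part of lossyWAV, and the function name is made up):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` ABX trials
    by guessing alone (one-sided binomial test, p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(4, 4))  # 0.0625 - a 1-in-16 fluke is quite possible
print(abx_p_value(5, 5))  # 0.03125
print(abx_p_value(7, 8))  # 0.03515625
```

By the usual p < 0.05 criterion an isolated 4/4 is suggestive but not conclusive, which matches the hesitation above; a longer run such as 7/8 would settle it.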
Title: lossyWAV Development
Post by: collector on 2008-05-10 09:22:11
Everybody has its own way of how to practically do the encoding.

I agree with that, but a percentage figure counting up while processing (like flac shows) is more informative than counting up and down and up again to 256 MB.
Foobar won't work on my old and slower PC, and is rather overkill just for this information.
Well, forget about the progress bar; it's still a great program. And Mareo and the batch files do their tasks fine.
Title: lossyWAV Development
Post by: collector on 2008-05-10 09:40:41
A progress bar in what sense and to what purpose? I suppose I could append a line to a log file with a "-l <filepath\filename>" type parameter but what would you want it to contain? When running from the command line there is a progress output of sorts which doesn't include the %age complete, rather amount of data processed (duration of WAV file processed) and number of bits removed so far.
A bit more detail as to what you would want to see would be nice.

A progress %age would be more convenient to me while working with DOS screen output and batch files than the up- and down-counting to just 256 MB. When processing a disc image of, say, 771 MB, it's useless to watch the counting go up and down without knowing where it ends. A percentage, like flac shows, is fine.
And while processing I see an 'average' line with info that I would like to save. I'm a lists-and-reports man. Especially in the test periods it's nice to see what lossyWAV did with my files. I know most people use XP and foobar, but I'm on Win98 on a slower and older computer, which is sufficient for playing my music and surfing.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-10 10:57:22
A progress bar in what sense and to what purpose? I suppose I could append a line to a log file with a "-l <filepath\filename>" type parameter but what would you want it to contain? When running from the command line there is a progress output of sorts which doesn't include the %age complete, rather amount of data processed (duration of WAV file processed) and number of bits removed so far.
A bit more detail as to what you would want to see would be nice.
A progress %age would be more convenient to me while working with DOS screen output and batch files than the up- and down-counting to just 256 MB. When processing a disc image of, say, 771 MB, it's useless to watch the counting go up and down without knowing where it ends. A percentage, like flac shows, is fine.
And while processing I see an 'average' line with info that I would like to save. I'm a lists-and-reports man. Especially in the test periods it's nice to see what lossyWAV did with my files. I know most people use XP and foobar, but I'm on Win98 on a slower and older computer, which is sufficient for playing my music and surfing.
Thanks for the additional information - I didn't realise that the size was overflowing - that will be fixed. As I know the size of the WAV file, it is trivial to put in the %age complete. I'll see what I can do to help with the final stats line that you want to save....
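For illustration, the %age-complete output only needs the data size known up front plus a counter wide enough not to wrap on a 771 MB image. A sketch (Python here rather than the actual Delphi code, with illustrative names):

```python
def progress_percent(bytes_done: int, bytes_total: int) -> str:
    """Percent-complete string with two decimals. Doing the arithmetic
    in wide (here arbitrary-precision) integers avoids the wrap-around
    a narrow byte counter shows on images larger than its range."""
    if bytes_total <= 0:
        return "0.00%"
    return f"{min(100.0, 100.0 * bytes_done / bytes_total):.2f}%"

# 9 MB into a 400 MB image, as in the v1.0.0 sample output below:
print(progress_percent(9 * 2**20, 400 * 2**20))  # 2.25%
```

The fix in the Delphi source is presumably just widening the byte counter (e.g. to Int64); the formatting is the same either way.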
Title: lossyWAV Development
Post by: jesseg on 2008-05-10 15:23:30
The only thing I could see a percentage being useful for is if someone wants to send that information back into a progress bar in a front-end.  But then...  it's probably just as easy for the person coding the front-end to convert the file size and file completed numbers to percentage... as it would be for Nick.C to do it on his end.

And yeah, either way it's only going to show the percentage for each file, NOT a batch job, cos lossyWAV doesn't know about your batch job.

Although it's much easier to do the same in C or ASM.  I think OggdropXPd does this already, doesn't it?
Title: lossyWAV Development
Post by: Nick.C on 2008-05-11 08:40:32
A progress %age would be more convenient to me while working with dos-screen-output and batchfiles than the up- and downcounting to just 256 MB. When processing a disc image of say 771 MB, it's useless to see the up and downcounting not knowing where it ends. The percentage like flac does is fine.
And while processing I see an 'average' line with info that I would like to save. I'm a list and reports man. Especially in the testperiods it's nice to see what lossywav did with my files. I know most people us xp and foobar, but I'm with win98 on a slower and older computer, which is sufficient for playing my music and surfing.
Okay, I've modified the MB counter and it shouldn't overflow - I'll test a large WAV file tonight.
A %age counter (x.xx%) has been added which should be enough resolution to see progress on an older computer / large file.
The "total bits removed / total codec blocks" detail has been removed as well.
I was toying with a "time already taken / total time to process completion" predictor - any use to anyone?
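The mooted time predictor is a one-line extrapolation from the counters that already drive the progress line; a hedged sketch (Python, illustrative names; the figures echo the v1.0.0 sample output further down the thread):

```python
def eta_seconds(elapsed_s: float, bytes_done: int, bytes_total: int) -> float:
    """Extrapolate total processing time from the rate so far - the
    'time taken / predicted total' pair mooted above. Names are
    illustrative; the real counters live in the Delphi progress code."""
    if bytes_done <= 0:
        return float("inf")  # nothing processed yet: no prediction possible
    return elapsed_s * bytes_total / bytes_done

# 5.38 s to process 9 MB of a ~400 MB file predicts ~239 s in total,
# in the spirit of the '5.38s/239.56s' progress line:
print(round(eta_seconds(5.38, 9, 400), 2))  # 239.11
```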
Title: lossyWAV Development
Post by: collector on 2008-05-11 10:43:27
Okay, I've modified the MB counter and it shouldn't overflow - I'll test a large WAV file tonight.
A %age counter (x.xx%) has been added which should be enough resolution to see progress on an older computer / large file.

Great.  No need for the decimals, but OK. It sounds good.
Quote
I was toying with a "time already taken / total time to process completion" predictor - any use to anyone?
Yep. To me. ETA is always nice to know for impatient people with slow machines. (You'll understand I won't use the -below, -low options..)
Title: lossyWAV Development
Post by: Nick.C on 2008-05-12 11:05:19
I think issuing lossyWAV under the GNU GPL and Copyleft will be better in the long run for the project, so that is my plan. Does the following look acceptable? I will be including a url to the current GNU GPL in the help as well as a copy of GPL.txt in the .zip distribution file, this is somewhat of a work in progress as I have never publicly released software before.

How do these look:
Code: [Select]
lossyWAV v1.0.0, Copyright (C) 2007,2008 Nick Currie. Copyleft.
Issued as free software; License: GNU GPL; Issued with NO WARRANTY WHATSOEVER.

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q <n>        quality preset (10 = highest quality, 0 = lowest bitrate;
              -q 5 is generally accepted to be transparent) default = -q 5.

Standard Options:

-check        check if WAV file has already been processed; default=off.
              errorlevel=16 if already processed, 0 if not.
-correction   write correction file while processing WAV file; default=off.
-force        forcibly over-write output file if it exists; default=off.
-help         display help.
-longhelp     display extended help.
-merge        merge existing lossy.wav and lwcdf.wav files.
-noclips      set allowable number of clips per channel per codec block to 0;
              default: -q 0 to 3 = 3; -q 4 = 2; -q 5 = 1; -q 6 to 10 = 0.
-o <folder>   destination folder for the output file(s).

Special thanks:

David Robinson for the method itself and motivation to implement it.
Don Cross for the original Pascal source for the FFT algorithm used.
Horst Albrecht for valuable tuning input and feedback.


Code: [Select]
lossyWAV v1.0.0, Copyright (C) 2007,2008 Nick Currie. Copyleft.
Issued as free software; License: GNU GPL; Issued with NO WARRANTY WHATSOEVER.

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q <n>        quality preset (10 = highest quality, 0 = lowest bitrate;
              -q 5 is generally accepted to be transparent) default = -q 5.

Standard Options:

-check        check if WAV file has already been processed; default=off.
              errorlevel=16 if already processed, 0 if not.
-correction   write correction file while processing WAV file; default=off.
-force        forcibly over-write output file if it exists; default=off.
-help         display help.
-longhelp     display extended help.
-merge        merge existing lossy.wav and lwcdf.wav files.
-noclips      set allowable number of clips per channel per codec block to 0;
              default: -q 0 to 3 = 3; -q 4 = 2; -q 5 = 1; -q 6 to 10 = 0.
-o <folder>   destination folder for the output file(s).

Advanced Options:

-analyses <n> select number of FFT analysis lengths to use; (2<=n<=5);
              default=2, i.e. 64 sample and 1024 sample FFT analyses;
              (3=2+128 sample FFT; 4=3+256 sample FFT; 5=4+512 sample FFT).
-fft32        enable 32 sample FFT for improved impulse detection;
              defaults: -q 0 to 2 = off; -q 3 to 10 = on.
-minbits <n>  select minimum bits to keep (0.0<=n<=8.0, resolution = 0.01);
              default = (2.9,2.95,3,3.125,3.25,3.375,3.5,3.625,3.75,3.875,4)
-scale <n>    scaling factor from WaveGain / etc; default = 1.000000; n<>0!

System Options:

-detail       enable detailed output mode
-nowarn       suppress lossyWAV warnings.
-quiet        significantly reduce screen output.

-below        set process priority to below normal.
-low          set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it.
Don Cross for the original Pascal source for the FFT algorithm used.
Horst Albrecht for valuable tuning input and feedback.


Code: [Select]
C:\Data_NIC\_WAV\WAV\tmp>lossywav ..\_swavyy\"Jean Michel Jarre - [1976] Oxygene.wav" -q 0 -force
lossyWAV v1.0.0, Copyright (C) 2007,2008 Nick Currie. Copyleft.
Issued as free software; License: GNU GPL; Issued with NO WARRANTY WHATSOEVER.
Processing : Jean Michel Jarre - [1976] Oxygene.wav
Format     : 44.10kHz; 2 ch.; 16 bit.
Progress   : 2.25%, 9.00MB, 4.7743 bits; 9.94x; 5.38s/239.56s;

Code: [Select]
C:\Data_NIC\_WAV\WAV\tmp>lossywav ..\_swavyy\"Jean Michel Jarre - [1976] Oxygene.wav" -q 0 -force
lossyWAV v1.0.0, Copyright (C) 2007,2008 Nick Currie. Copyleft.
Issued as free software; License: GNU GPL; Issued with NO WARRANTY WHATSOEVER.
Processing : Jean Michel Jarre - [1976] Oxygene.wav
Format     : 44.10kHz; 2 ch.; 16 bit.
Average    : 400.56MB; 6.3596 bits; 24.46x; 97.34s;
%lossyWAV Warning% : 6 sample(s) clipped to limiting amplitude.

Code: [Select]
C:\Data_NIC\_WAV\WAV\tmp>lossywav ..\_swavyy\"Jean Michel Jarre - [1976] Oxygene.wav" -q 0 -force -quiet -nowarn
Jean Michel Jarre - [1976] Oxygene.wav; 400.56MB; 6.3596; 23.24x; 102.47s; C:6;
Title: lossyWAV Development
Post by: Mitch 1 2 on 2008-05-12 13:20:05
It looks alright to me.
Title: lossyWAV Development
Post by: Nick.C on 2008-05-12 13:30:44
It looks alright to me.
Thanks, Mitch.

Many thanks to Robert for the pointer to the GNU Coding Standards documentation. Does the following seem to more closely comply?
Code: [Select]
lossyWAV 1.0.0, Copyright (C) 2007,2008 Nick Currie. Copyleft.
Issued as free software; License: GNU GPL; Issued with NO WARRANTY WHATSOEVER.

Usage   : lossyWAV <input wav file> <options>

Example : lossyWAV musicfile.wav

Quality Options:

-q, --quality <n>   quality preset (10=highest quality, 0=lowest bitrate;
                    -q 5 is generally accepted to be transparent)
                    default=-q 5.

Standard Options:

-c, --check         check if WAV file has already been processed; default=off.
                    errorlevel=16 if already processed, 0 if not.
-C, --correction    write correction file for processed WAV file; default=off.
-f, --force         forcibly over-write output file if it exists; default=off.
-h, --help          display help.
-L, --longhelp      display extended help.
-M, --merge         merge existing lossy.wav and lwcdf.wav files.
-N, --noclips       set allowable number of clips / channel / codec block to 0;
                    default=3,3,3,3,2,1,0,0,0,0,0 (-q 0 to -q 10)
-o, --outdir <dir>  destination directory for the output file(s).
-v, --version       display the lossyWAV version number.

Advanced Options:

-a, --analyses <n>  select number of FFT analysis lengths to use; (2<=n<=5);
                    default=2, i.e. 64 sample and 1024 sample FFT analyses;
                    (3=+128 sample FFT; 4=+256 sample FFT; 5=+512 sample FFT).
-F, --fft32         enable 32 sample FFT for improved impulse detection;
                    defaults: -q 0 to 2=off; -q 3 to 10=on.
-m, --minbits <n>   select minimum bits to keep (0.00<=n<=8.00);
                    default=2.9,2.95,3,3.125,3.25,3.375,3.5,3.625,3.75,3.875,4.
-s, --scale <n>     scaling factor from WaveGain, etc; default=1.000000; n<>0!

System Options:

-d, --detail        enable detailed output mode
-n, --nowarnings    suppress lossyWAV warnings.
-Q, --quiet         significantly reduce screen output.

-b, --below         set process priority to below normal.
-l, --low           set process priority to low.

Special thanks:

David Robinson for the method itself and motivation to implement it.
Don Cross for the original Pascal source for the FFT algorithm used.
Horst Albrecht for valuable tuning input and feedback.
Title: lossyWAV Development
Post by: botface on 2008-05-12 15:12:43
Looks good to me
Title: lossyWAV Development
Post by: GeSomeone on 2008-05-12 17:12:52
lossyWAV 1.0.0 (http://www.hydrogenaudio.org/forums/index.php?showtopic=63225)
 
Title: lossyWAV Development
Post by: gasmann on 2008-05-12 20:09:51
Is stdin for encoding technically impossible? It would be great if one could live without these temporary files 
Everything else is soo great! Thanks a lot!
Title: lossyWAV Development
Post by: Nick.C on 2008-05-12 20:14:08
Is stdin for encoding technically impossible? It would be great if one could live without these temporary files 
Everything else is soo great! Thanks a lot!
I don't think so, I've not tried it in Delphi.... I will certainly add it to the list for 1.1.0, along with noise shaping.
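For what it's worth, nothing in the WAV layout itself should block stdin: the header and chunks can be read strictly sequentially, and since lossyWAV's output is the same size as its input, the output header is known before any audio is processed. A sketch of the stream-agnostic part (Python, not the Delphi source; `read_wav_header` is a made-up name):

```python
import struct

def read_wav_header(stream):
    """Parse the fixed 12-byte RIFF/WAVE preamble from any binary stream.
    sys.stdin.buffer works as well as an open file here, because neither
    this nor block-by-block processing ever needs to seek backwards."""
    riff, riff_size, wave = struct.unpack("<4sI4s", stream.read(12))
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    return riff_size
```

The chunk scan after the preamble is equally sequential, so a stdin mode looks feasible in principle; whether Delphi's console I/O makes binary stdin awkward is a separate question.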
Title: lossyWAV Development
Post by: Nick.C on 2008-05-12 20:58:28
lossyWAV 1.0.0 released. (http://www.hydrogenaudio.org/forums/index.php?act=ST&f=32&t=63225)
Title: lossyWAV Development
Post by: gasmann on 2008-05-13 08:42:10
well, since it's GPL'ed now, can you please make the source code available to me? I want to look into an stdin 'hack' 

thanks
Title: lossyWAV Development
Post by: Nick.C on 2008-05-13 08:45:16
well, since it's gpl'ed now, can you please make the sourcecode available to me? I want to look into an stdin 'hack' 

thanks
Yes - I will post the code in post #1 of this thread at lunchtime - I haven't yet added all of the GNU GPL / Copyleft related notices to all of the units. Currently amending the WAV chunk handling to deal properly with unrecognised chunks.

NB: Copyleft means you will have to share the hack  (i.e. save me the trouble of working it out for myself....  ).
Title: lossyWAV Development
Post by: gasmann on 2008-05-13 11:48:28
Of course I will do that 
But I can't promise anything... I'm still quite a beginner (but I own Delphi 6 because we need it at school), so please don't depend on it too much.
Title: lossyWAV Development
Post by: GeSomeone on 2008-05-13 13:08:36
It would be an excellent time to start a new thread when post-1.0 development begins 
Title: lossyWAV Development
Post by: Nick.C on 2008-05-13 13:20:34
Of course I will do that 
But I can't promise anything... I'm still quite a beginner (but I own Delphi 6 because we need it at school), so please don't depend on it too much.
I have found an "issue" with the --merge parameter for 24-bit files. I will post source as soon as this is amended. Thanks for your patience.

Nick.

@gesomeone: When I have "fixed" 1.0.0 with respect to the WAV chunk issue and the --merge issue, I will indeed start a new thread .
Title: lossyWAV Development
Post by: SebastianG on 2008-05-13 15:35:49
This post is slightly off-topic.

[...] GNU Coding Standards documentation.

I took the chance to check it (http://www.gnu.org/prep/standards/html_node/Source-Language.html#Source-Language) again. Granted, it sounds obvious that C is probably the "most used" programming language in the open source community and that you can count on more people willing to help with your C-program simply because there're more programmers capable of doing so. But I think it's not appropriate to apply the same policies to non-GNU software like lossyWAV. It would be "behind the times" to still advocate C over C++ for example. C++ became an ISO standard 10 years ago! There're plenty of good reasons why people should try to move on. RAII (http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization) is just one example.

Sorry, but I just felt compelled to say this ...

edit: I think the previous comment needs some clarification. I realize that nobody has advocated the use of a specific language for lossyWAV, and I don't intend to do so either. It just expresses my personal opinion regarding the pros and cons of C and C++ in general.


Cheers,
SG
Title: lossyWAV Development
Post by: Nick.C on 2008-05-13 22:58:40
lossyWAV 1.0.0b source released, see post #1 in this thread.
Title: lossyWAV Development
Post by: rbrito on 2008-09-16 23:42:43
I think issuing lossyWAV under the GNU GPL and Copyleft will be better in the long run for the project, so that is my plan.


This was a nice and welcome choice, I think. BTW, what about putting the code in a public repository, so that you can manage the changes? If you want some help with repositories, I think that I can help. This would greatly ease the development and release of the files and also allow the project to gain momentum.

Perhaps a project hosted on SourceForge or BerliOS, with a Subversion repository, would be nice, as it would prevent losing the history of the files and directories. I see that you are mostly using it under Windows and don't know C.

I want to use lossyWAV on some of my bootlegs, since I am under severe space constraints. I want this so badly, but:

I think that it would be a good learning exercise for both of us to maintain a version that can be compiled on many Linux distributions (and this means Ubuntu, Gentoo etc. users) and also for people using Mac OS X.

I can start translating some of the .pas files into C (some of them seem quite trivial, while others I don't know). I would welcome help from other people so that we can get this released as fast as possible.

Well, I think that I can contribute to this project in some manner and also learn quite a bit of things and have a way to "give back".

Regards, Rogério Brito.
Title: lossyWAV Development
Post by: rbrito on 2008-09-16 23:55:49

I think issuing lossyWAV under the GNU GPL and Copyleft will be better in the long run for the project, so that is my plan.


This was a nice and welcome choice, I think. BTW, what about putting the code in a public repository, so that you can manage the changes? If you want some help with repositories, I think that I can help. This would greatly ease the development and release of the files and also allow the project to gain momentum.
(...)


I can create a SourceForge project, if you wish. I propose it be called lossywav, uncapitalized, since I don't know if SF handles mixed-case names well. I'm really willing to do things to get some bitrate reduction on my files.

Which version would be checked in as a working/development copy? Oh, and you can use Subversion on Windows perfectly well, as there are clients for it; TortoiseSVN seems to be a good one.


Thanks, Rogério Brito.
Title: lossyWAV Development
Post by: Nick.C on 2008-09-17 08:00:58
Thanks for your enthusiasm and offer of transcoding skills - v1.1.0b would be the best to start from. There is already a SourceForge project created; it is just empty.

I can revert the IA-32 / x87 assembler to Delphi (though don't hold your breath - I've modified the implementation since it was in Delphi, so it will be a transcode back to Delphi rather than a copy and paste). Also, you may be able to replace the nFFTdouble unit with an existing library.

If you wish to take over the management of the sourceforge project, let me know and it's yours....
Title: lossyWAV Development
Post by: rbrito on 2008-09-17 08:30:16
Thanks for your enthusiasm and offer of transcoding skills - v1.1.0b would be the best to start from. There is already a sourceforge project created, it is just empty.


Hi, Nick. I just checked the page and I think that we can cooperate on that. My id on sourceforge is just "rbrito". I'd be glad to be of help (and to accept patches from other people).

Quote
I can revert the IA-32 / x87 assembler to Delphi (though don't hold your breath - I've modified the implementation since it was Delphi, so it will be a transcode back to Delphi rather than a copy and paste).


Well, any way that you wish. And I hope that you don't mind me asking some pascal questions once in a while...

Quote
Also, you may be able to replace the nFFTdouble unit with an existing library.


It seems that we have a lot of choices regarding FFT... I just checked the repository on Debian and came up with this:

Code: [Select]
fftw-dev - library for computing Fast Fourier Transforms
fftw-docs - documentation for fftw
fftw2 - library for computing Fast Fourier Transforms
libfftw3-3 - library for computing Fast Fourier Transforms
libfftw3-dev - library for computing Fast Fourier Transforms
libfftw3-doc - documentation for fftw version 3
libgsl0ldbl - GNU Scientific Library (GSL) -- library package
mffm-fftw-dev - A C++ wrapper for the fftw.org C library (version 3)
mffm-fftw1c2 - A C++ wrapper for the fftw.org C library (version 3)
sfftw-dev - library for computing Fast Fourier Transforms
sfftw2 - library for computing Fast Fourier Transforms
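Any of these supplies the one operation lossyWAV needs from its nFFTdouble unit: a forward transform of a short analysis window (32 to 1024 samples, per the -analyses help text). As a library-neutral illustration (Python, not tied to any of the packages listed; a real port would call FFTW or similar, since the naive transform below is O(n²)):

```python
import cmath, math

def dft(samples):
    """Naive O(n^2) DFT - the same forward transform the libraries above
    provide (much faster); shown only to illustrate what a replacement
    for the nFFTdouble unit has to compute."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(samples))
            for k in range(n)]

# A 64-sample analysis window holding a pure tone at bin 3:
block = [math.sin(2 * math.pi * 3 * i / 64) for i in range(64)]
mags = [abs(c) for c in dft(block)][:33]  # real input: keep bins 0..N/2
print(max(range(len(mags)), key=mags.__getitem__))  # 3 - the tone's bin
```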


Quote
If you wish to take over the management of the sourceforge project, let me know and it's yours....


I guess that I don't want to have the entire management, but having it shared... It can be a good learning exercise for both of us, I think.


Thank you for your very warm reply.

Rogério Brito.


Quote

Also, you may be able to replace the nFFTdouble unit with an existing library.


It seems that we have a lot of choices regarding FFT... I just checked the repository on Debian and came up with this:
(...)


Oh, and then, there is Daniel J. Bernstein (http://cr.yp.to/djb.html)'s fft programs (http://cr.yp.to/djbfft.html) and all his programs are in the public domain, which means that we can incorporate them into lossywav without worrying about the license.

BTW, nice that you chose GPLv3 as a license.


Regards, Rogério Brito.
Title: lossyWAV Development
Post by: 2Bdecided on 2008-09-17 11:19:23
Nick,

I know little and understand less, but isn't LGPL "better" for things like codecs that people might want to link into their own software without worrying about licensing issues?

I'm recalling a recent discussion on HA with spoon (dBpoweramp) where he was taking some stick for linking GPL software.

As I say, I don't know what I'm talking about, but I thought I'd mention it anyway!

Cheers,
David.
Title: lossyWAV Development
Post by: jido on 2008-09-17 12:20:13
I don't see much point in converting to C or C++ since Pascal is portable too.

If there are Delphi-specific bits, Lazarus (http://www.lazarus.freepascal.org/modules.php?op=modload&name=StaticPage&file=index&sURL=about) may provide the solution.

To my understanding, if lossyWAV remains an independent program, then GPL is appropriate. If it becomes a library, it would be good to look into LGPL.
Title: lossyWAV Development
Post by: rbrito on 2008-09-17 20:56:30
I don't see much point in converting to C or C++ since Pascal is portable too.


Perhaps you didn't notice that the source contains a good deal of ia32 assembly? This makes the program unportable.


Regards, Rogério Brito.
Title: lossyWAV Development
Post by: Nick.C on 2008-09-17 21:04:30
Perhaps you didn't notice that the source contains a good deal of ia32 assembly?
I've started work on reducing the amount of assembly. I will release source of 1.1.1e when I have got further with this exercise.

Two routines converted back so far - shift_codec_blocks and fill_fft_input. Approximately a 20% speed penalty though....

[edit] This reduction process is totally reversible by means of a "USEPASONLY" compiler directive - if not defined, all converted routines revert to assembly language. [/edit]
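For readers building from source, the toggle described above amounts to a conditional define at compile time. A minimal sketch (the project file name lossyWAV.dpr is an assumption; only the USEPASONLY define comes from the post):

```shell
# Default build: the converted routines revert to the x87 assembler versions.
fpc lossyWAV.dpr

# Build with the pure-Pascal implementations instead.
# -d<name> sets a conditional define in Free Pascal; Delphi's command-line
# compiler (dcc32) uses -D<name> for the same purpose.
fpc -dUSEPASONLY lossyWAV.dpr
```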
Title: lossyWAV Development
Post by: jido on 2008-09-17 21:19:35

I don't see much point in converting to C or C++ since Pascal is portable too.


Perhaps you didn't notice that the source contains a good deal of ia32 assembly? This makes the program unportable.


Regards, Rogério Brito.

It may not be portable to other architectures, but it is certainly portable to Linux and Mac OS X. Free Pascal supports assembly parts.
Title: lossyWAV Development
Post by: SebastianG on 2008-09-18 09:32:00
I don't see much point in converting to C or C++ since Pascal is portable too.

It depends on what kind of Pascal you mean and what "portable" means to you. There's an ISO standard for Pascal, but it's pretty old and purely procedural (no sign of OO). But yeah, the GNU compiler collection includes a Pascal compiler that supposedly supports Borland's OO-Pascal/Delphi flavour. The choice of programming language is also a political/pragmatic question. There are lots of useful C libraries out there. Also, if you intend to go open source, the number of people who can help you fix bugs / add features is of course linked to the programming language. For example: I don't like to touch/write any Pascal code. Attracting developers is more of a problem if you stick with Pascal, I suppose. Note: I'm not suggesting a conversion to C or C++. It's not my place to say.

Perhaps you didn't notice that the source contains a good deal of ia32 assembly? This makes the program unportable.

I'd think so too. It smells a bit like premature optimization, to be honest.

May not be portable to other architectures but certainly portable to Linux and MacOS X. Freepascal supports assembly parts.

It does. But AFAIK it only supports the AT&T notation and Borland used to use Intel notation. So you'd have to rewrite your ASM parts unless there's a tool around for doing that conversion. I might be wrong, though.

Cheers,
SG
Title: lossyWAV Development
Post by: rbrito on 2008-09-19 01:08:58

I don't see much point in converting to C or C++ since Pascal is portable too.

It depends on what kind of Pascal you mean and what "portable" means to you. There's an ISO standard for Pascal, but it's pretty old and purely procedural (no sign of OO). But yeah, the GNU compiler collection includes a Pascal compiler that supposedly supports Borland's OO-Pascal/Delphi flavour. The choice of programming language is also a political/pragmatic question. There are lots of useful C libraries out there. Also, if you intend to go open source, the number of people who can help you fix bugs / add features is of course linked to the programming language. For example: I don't like to touch/write any Pascal code. Attracting developers is more of a problem if you stick with Pascal, I suppose. Note: I'm not suggesting a conversion to C or C++. It's not my place to say.


This is one important thing that you raise here.

Quote

Perhaps you didn't notice that the source contains a good deal of ia32 assembly? This makes the program unportable.

I'd think so too. It smells a bit like premature optimization, to be honest.


Agreed 100%. And doing the optimization should be the task of the compiler. Including inline assembly hampers the compiler's ability to allocate registers at will, which in theory means it could generate poorer code around it.

Quote

May not be portable to other architectures but certainly portable to Linux and MacOS X. Freepascal supports assembly parts.

It does. But AFAIK it only supports the AT&T notation and Borland used to use Intel notation. So you'd have to rewrite your ASM parts unless there's a tool around for doing that conversion. I might be wrong, though.


Not only that, but as I recall from my freshman years (please, don't ask when, it was way, way more than a decade ago  ), Pascal used a calling convention that passed arguments on the stack in exactly the opposite order from C. I'm sure that I will be corrected here if I'm mistaken.

Besides that, I don't see why a program like this has to be tied to ia32. For instance, I would like it to be useful on amd64, powerpc, ia32, mips, m68k, etc. This program is not platform specific. It just does some computations. Nothing more, nothing less.


Regards, Rogério Brito.
Title: lossyWAV Development
Post by: rbrito on 2008-09-19 05:07:03
Perhaps you didn't notice that the source contains a good deal of ia32 assembly?
I've started work on reducing the amount of assembly. I will release source of 1.1.1e when I have got further with this exercise.

Two routines converted back so far - shift_codec_blocks and fill_fft_input. Approximately a 20% speed penalty though....

[edit] This reduction process is totally reversible by means of a "USEPASONLY" compiler directive - if not defined, all converted routines revert to assembly language. [/edit]


Hi, Nick.

Well, that's actually a good thing. Can you add me as a project admin there? I can start with something that's not a moving target, then. And getting it under a version control system will ease your "Release Management" way of doing things.

Regards, Rogério Brito.
Title: lossyWAV Development
Post by: rbrito on 2008-09-19 05:22:42
If you wish to take over the management of the sourceforge project, let me know and it's yours....


Hi again, Nick.

Just if you want to see a little bit of my work on another project on SourceForge you can have a look at this log of a project (http://ppc-evtd.svn.sourceforge.net/viewvc/ppc-evtd/avr-evtd/?sortby=date&view=log). As you can see, I'm mostly the only committer, despite the fact that the project was opened a long time ago.

Regarding me being the sole admin of the lossyWAV project, I would prefer that we both collaborate as admins, as I'm not as experienced with digital sound processing as you seem to be. But porting to other architectures will also bring us some issues, like arches being "big-endian" or "little-endian" (for instance).

I can deal a bit with the "Release Management" side of things and having things under SVN would help you keep the history of your modifications.


Regards, Rogério Brito.
Title: lossyWAV Development
Post by: gottkaiser on 2009-01-06 14:03:48
@Nick.C
Any news about the development of lossyWAV?
Title: lossyWAV Development
Post by: Nick.C on 2009-01-06 14:38:19
The latest development thread (1.2.0) is here (http://www.hydrogenaudio.org/forums/index.php?showtopic=65499).
Title: lossyWAV Development
Post by: gottkaiser on 2009-01-07 18:23:30
Didn't find that. Thanks Nick.

Could somebody tell me the correct code to transcode to lossyWAV with a correction file?

At the moment I'm using foobar2000 with the following command lines:
Code: [Select]
Encoder: c:\windows\system32\cmd.exe
Extension: lossy.tak
Parameters: /d /c C:\Programme\"Audio Tools"\foobar2000\lossyWAV.exe - --standard --silent --stdout|C:\Programme\"Audio Tools"\foobar2000\Takc.exe -e -p4 -fsl512 -ihs - %d
Format is: lossless or hybrid
Highest BPS mode supported: 24

What do I have to add to get the correction file? It should have the same file name, just with the ".lwcdf.tak" extension. I know it has to do with the "-C" option, but I can't get it to work.

Thanks in advance.
Title: lossyWAV Development
Post by: halb27 on 2009-02-07 08:55:02
Here comes lossyWAV v1.1.0 for testing:

[attachment=4871:lossyWAV.zip]
Title: lossyWAV Development
Post by: Dynamic on 2009-02-10 17:46:38
Could somebody tell me the correct code to transcode to lossyWAV with a correction file?

At the moment I'm using foobar2000 with the following command lines:


Sorry for the lateness of my reply.

I think correction files are incompatible with piping the output of lossyWAV directly to the standard input of an encoder (such as TAK). That's because there are then two output files from lossyWAV, and there's no way on the command line to create two instances of TAK to accept two streams of standard input from lossyWAV's two outputs.

The way to do it would be via the command line or batch files, and I think you'll need an uncompressed PCM WAV file as input to lossyWAV to generate the lossy and correction files, which you can then compress using takc when complete. You may wish to copy tags across as well, using another command-line tool or by means of a separate CUE sheet.

Personally, correction files aren't of great interest to me, so I have no specific suggestions.
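A rough sketch of that batch approach (hedged: the -C correction switch and the output file names input.lossy.wav / input.lwcdf.wav are assumptions based on this thread, not verified against lossyWAV's documentation):

```shell
# 1. Decode the lossless source to an uncompressed PCM WAV file.
flac -d input.flac -o input.wav

# 2. Process with lossyWAV; -C is assumed to write the correction file
#    alongside the processed output.
lossyWAV input.wav --standard -C

# 3. Compress both outputs with TAK (settings taken from the post above).
Takc -e -p4 -fsl512 -ihs input.lossy.wav
Takc -e -p4 -fsl512 -ihs input.lwcdf.wav
```

Tags and CUE sheets would still need to be carried across separately, as noted above.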
Title: lossyWAV Development
Post by: Dynamic on 2009-02-10 19:06:04
You may wish to copy tags across as well using another commandline tool or by means of a separate CUE sheet.

Personally, correction files aren't of great interest to me so I have no specific suggestions.


According to the Wiki (http://wiki.hydrogenaudio.org/index.php?title=CueTools), CUEtools can compress any lossless file using lossyWAV + correction file.
Title: lossyWAV Development
Post by: Nick.C on 2009-02-10 19:23:45
.... and I think you'll need an uncompressed PCM WAV file as input to lossyWAV to generate wav and correction files which you can then compress....
You can pipe FLAC decompressed output into lossyWAV and create processed and correction output files (just not with --stdout!).