Topic: lossyWAV Development (Read 574058 times)

lossyWAV Development

Reply #975
In my own tests, lossyWAV 0.8.8 is significantly faster than version 0.8.7.
 
Would it make sense to drop all of the extra FFT analysis options (-Xa/b/c), in favour of an "-exhaustive" (or "-e") parameter, which would perform analysis passes using all suitable FFT sizes?
I ran some tests (processing my 53 problem sample set (125.1MB) 5 times, discarding the highest and lowest times and averaging the remaining 3) on a 2.0GHz Core2Duo (single instance, nothing else running.....) and got the following for v0.8.8 against v0.8.7:
Code:
|======|==================|==================|
|  QS  | Time/Rate v0.8.8 | Time/Rate v0.8.7 |
|======|==================|==================|
|  -7  | 15.71s / 47.34x  | 20.47s / 36.33x  |
|  -7a | 21.29s / 34.93x  | 27.84s / 26.71x  |
|  -7b | 27.13s / 27.41x  | 35.97s / 20.67x  |
|  -7c | 32.44s / 22.92x  | 43.44s / 17.12x  |
|======|==================|==================|
So, I *think* that I would rather leave the 3 options in place as the extra analyses still have a major effect on the processing time / rate. All tests were carried out with the input files cached in memory to ignore read latency.
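The two claims in this post (v0.8.8 is faster than v0.8.7, and the extra analyses still cost a lot of time) can be read straight off the table. A minimal sketch of the arithmetic, using only the timings quoted above; the function names are illustrative and not part of lossyWAV:

```python
# Timings copied from the table above: (seconds for v0.8.8, seconds for v0.8.7).
timings = {
    "-7":  (15.71, 20.47),
    "-7a": (21.29, 27.84),
    "-7b": (27.13, 35.97),
    "-7c": (32.44, 43.44),
}


def speedup(new_seconds, old_seconds):
    """How many times faster the new version is than the old one."""
    return old_seconds / new_seconds


def extra_cost(setting, baseline="-7"):
    """Extra v0.8.8 processing time of a setting relative to plain -7."""
    return timings[setting][0] / timings[baseline][0] - 1.0


for setting, (new, old) in timings.items():
    print(f"{setting:4s} speedup {speedup(new, old):.2f}x, "
          f"cost vs -7: {extra_cost(setting):+.0%}")
```

On these figures every setting gains roughly 1.3x from the new version, while -7c still takes about twice as long as plain -7, which is the reason given for keeping the three options.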
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #976
Now that the machinery has changed quite a bit I tried to abx my problem sample set:
Atemlied, badvilbel, bibilolo, Blackbird/Yesterday, bruhns, dither_noise_test, eig, fiocco, furious, harp40_1, herding_calls, keys_1644ds, Livin_In_The_Future, S37_OTHERS_MartenotWaves_A, triangle-2_1644ds, trumpet, Under The Boardwalk.

My personal transparency level is where quality level -4 is now. So I tried to abx quality level -4, and I can only say that I can't abx any problem. The only thing worth mentioning is a very weak suspicion that Atemlied is not totally transparent (at the very moment when the 'music' starts [sec. 0.0-1.6]), but my abx result doesn't back this up at all.

So everything is great also with the changed machinery which brought a surprisingly high speed increase (from memory - didn't do a real comparison).
Because of the good speed I also encoded my regular track and my problem sample set to learn about average bitrate for the various quality levels:

quality    regular / problem set [kbps]
  -1           467 / 561
  -2           418 / 518
  -3           372 / 472
  -4           346 / 447
  -5           325 / 421
  -6           306 / 397
  -7           291 / 375

These are very good properties IMO.
Even for a purist who wants to keep the basic principle and uses -2, the average bitrate for regular music is only slightly above 400 kbps. The average bitrate for the problem samples is 100 kbps higher, which IMO is more than enough of a safety margin (with earlier and less sophisticated lossyWAV versions, ~470 kbps on average was necessary to make my problem set transparent).
-3 and -4 are the perfect solutions IMO for the non-purists striving for transparency.
From -5 up there is an increasing risk of arriving at non-transparent results, but judging from practical listening, quality is still very good (I just tried [no abxing] some samples from my regular track set using -6 and was very content).

Wonderful work, Nick. Congratulations.
lame3995o -Q1.7 --lowpass 17

Reply #977
Thanks for the tests halb27. Good to hear we have a strong 300..350k range. Bitrates are also looking good with -2 giving archive quality at half the normal bitrate of lossless.

Reply #978
... As an aside (and I know that looking at the spectrum in foobar is not any way to evaluate anything....) I looked at the spectral output for a lossyWAV correction file (replaygained +45dB or so) and almost all of the signal was in the high end of the spectrum - so it "looks" like my implementation of your noise shaping filter works!

Finally I tried -shaping too and also looked at the correction file's spectrum. Noise is less audible than without shaping, so it works well. Noise gathers mainly in the highest spf frequency zone and above (12.4+ kHz). Because of this, the bitrate will often be higher than without shaping in order to arrive at the same S/N ratio in the highest frequency zone.

Some proposals on the bitrate bloat issue:

a) on a per block basis decide which shaping yields the higher number of bits to remove:
shaping 0 or shaping 1 (or shaping x, y, z, ....).
I came to this idea because shaping 1 does not always yield a bitrate bloat. For dither_noise_test shaping 0 yields 705 kbps, whereas shaping 1 yields 295 kbps (when used together with -7), and I couldn't hear a problem.
Sure this means at least doubling the encoding time. Moreover maybe the changing of the noise shaping is audible (but the same argument applies as towards the sudden noise increase and decrease when the anti-clipping strategy goes to work: as long as the noise is hidden we shouldn't mind).
Maybe an autoshaping strategy like this is most adequate: start for the first block with a low shaping value like shaping = 0.2.
For any current block: try the shaping used for the last block, as well as two further shaping values obtained by adding a certain delta to, and subtracting it from, the last shaping value. Of the three candidate values, always use the one which maximizes the number of bits to remove.
This is the basic principle. In order to save some work, the check for changing the shaping value need not be done on every block or in both directions. For instance, on the current block check only whether an increase of the shaping value is useful. On the next block the check goes in the opposite direction: check only whether decreasing the shaping value is useful, and so on, alternating between a possible increase and a possible decrease. Things like that. Maybe the frequency of the changes can trigger pauses in the shaping checks: when changes are rare it's not useful to do the checking on every block.

b) Maybe a more direct approach is more efficient: decide on the spectrum of the signal which shaping to use. When there's a lot of HF in the music, putting the noise into the HF region should be a good thing. Not quite so when there's no HF present in the music to hide the noise.

c) Maybe giving the potential to shift the noise also towards low frequencies instead of only shifting up may be useful with the approaches of a) or b).

d) Think of quality level -7 as targeting lovers of rather low bitrates who accept some compromise. So for -7, soften the accuracy requirements for the highest spf frequency zone. For instance, don't check this zone at all other than in the 64 sample FFT analyses (so far this shouldn't seriously hurt). If necessary, use a spreading of 5 or even 6 instead of 4 for the highest frequency zone in the 64 sample FFT analyses (this hurts: a bit for a spreading of 5, more so when going higher).
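Proposal a) amounts to a per-block hill climb over the shaping value. A minimal sketch of that idea follows; it is not lossyWAV code. The callback `bits_to_remove(block, shaping)` is hypothetical (it stands in for whatever the encoder would compute for a block), and the step `delta=0.1` is an assumed value, since the post only says "a certain delta". The cheaper alternating-direction variant is left out.

```python
def choose_shaping(previous, delta, bits_to_remove):
    """Pick the shaping value for the current block.

    previous       -- shaping value used for the last block (0.0 .. 1.0)
    delta          -- step tried above and below `previous` (assumed value)
    bits_to_remove -- hypothetical callback: shaping -> number of bits that
                      could be removed from the current block at that shaping
    """
    candidates = [previous,
                  min(1.0, previous + delta),
                  max(0.0, previous - delta)]
    # max() keeps the first candidate on ties, so the shaping only changes
    # when a neighbouring value removes strictly more bits.
    return max(candidates, key=bits_to_remove)


def shape_blocks(blocks, bits_to_remove, start=0.2, delta=0.1):
    """Yield one shaping value per block, starting from a low value."""
    shaping = None
    for block in blocks:
        if shaping is None:
            shaping = start  # first block: use the low starting value as-is
        else:
            shaping = choose_shaping(shaping, delta,
                                     lambda s: bits_to_remove(block, s))
        yield shaping
```

Each block costs up to three evaluations of `bits_to_remove`, which matches the post's warning that the approach at least doubles encoding time.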
lame3995o -Q1.7 --lowpass 17

Reply #979
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. I wonder if a variable shaping which is 1.0 at 8 or more bits to remove and 0.0 at 0 bits to remove, i.e. a resolution of 0.125 shaping per bit-to-remove, might be effective?

I'll try it out this evening and if it works at all, I'll post a new beta, probably with a parameter "-autoshape" which will be compatible with -shaping <n> in the sense that if autoshape says -shaping 0.125, but -shaping has been specified as 0.5 then shaping will be in the range 0.5 to 1, treating the -shaping value as a minimum value.
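The proposed mapping is simple enough to write down. A minimal sketch, assuming the ramp is linear (0.125 shaping per bit-to-remove, as stated) and that "treating the -shaping value as a minimum" means a plain lower clamp; the function and parameter names are illustrative, not taken from the lossyWAV source:

```python
def autoshape(bits_to_remove, min_shaping=0.0, step=0.125):
    """Shaping factor for a block under the proposed -autoshape rule.

    bits_to_remove -- bits lossyWAV decided to remove from this block
    min_shaping    -- value given with -shaping <n>, used as a floor
                      (assumption: a simple clamp, not a rescaled ramp)
    step           -- shaping added per bit-to-remove (0.125 in the post)
    """
    ramp = min(1.0, max(0.0, bits_to_remove * step))
    return max(min_shaping, ramp)
```

With `min_shaping=0.5`, a block with 1 bit to remove gets 0.5 rather than 0.125, and the result stays in the range 0.5 to 1 as described.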
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #980
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. ...

I see, and now that you write it I remember SebastianG's remark. If this is so: maybe shifting noise downwards in a controlled way makes things easier for the predictor. It's often helpful with wavPack lossy when using rather low bitrate.
lame3995o -Q1.7 --lowpass 17

Reply #981
Unfortunately, the -shaping parameter barely changes the bits-to-remove as calculated by lossyWAV, however as SebastianG said earlier it makes the predictors in the lossless codec work less efficiently. ...
I see, and now that you write it I remember SebastianG's remark. If this is so: maybe shifting noise downwards in a controlled way makes things easier for the predictor. It's often helpful with wavPack lossy when using rather low bitrate.
That would take a new noise shaping function. SebastianG very kindly donated his 44.1kHz and 48kHz functions to get noise shaping working in lossyWAV, but I have no idea how to derive a new one which ideally would push noise above 20kHz and below, say, 10Hz.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #982
That would take a new noise shaping function. SebastianG very kindly donated his 44.1kHz and 48kHz functions to get noise shaping working in lossyWAV, but I have no idea how to derive a new one which ideally would push noise above 20kHz and below, say, 10Hz.
I see, but maybe some day you will run upon such a function.

Another idea for the low bitrate lovers:
What about lowpassing to 17 kHz or so before letting the lossless codec do its job? I guess that brings the bitrate bloat down a bit. I'll try that using sox.

Good Lord: BS of course as this destroys the work of lossyWAV.
lame3995o -Q1.7 --lowpass 17

Reply #983
Quite a pity, but interesting:

While with low bitrate settings the bitrate bloat is very remarkable when using -shaping 1, both in an absolute and, more so, in a relative sense (+25% for -7 with my regular full length track set), the absolute and relative difference is lower for the high quality settings (+12% for -2, +9% for -1).
Listening to the correction file of a -1 encoding the noise is so much less audible with -shaping 1 that it may be desirable to use -shaping 1 especially with quality -1.
lame3995o -Q1.7 --lowpass 17

Reply #984
-

Reply #985
I have a question for Nick C.

Is the resulting processed WAV file smaller than the original?
E.g. could I burn an entire lossyWAV album in WAV/PCM format without it taking up the whole space that the standard WAV would?
I'm afraid not - all lossyWAV does is to zero lsb's in each sample as required. It does not change the bitdepth of the sample and therefore does not change the size of the output file, other than to add a 'fact' chunk with the lossyWAV processing information near the beginning of the WAV file.
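To illustrate why the file size stays the same, here is a minimal sketch that clears the lowest bits of 16-bit samples. It uses plain truncation as a stand-in and is not how lossyWAV actually chooses or rounds the bits; `zero_lsbs` is a hypothetical helper, and the input is assumed to be raw 16-bit little-endian PCM of even length.

```python
import struct


def zero_lsbs(pcm_bytes, bits):
    """Clear the lowest `bits` bits of each 16-bit little-endian sample."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes)
    mask = ~((1 << bits) - 1)  # e.g. bits=3 -> ...11111000
    return struct.pack(f"<{count}h", *(s & mask for s in samples))


original = struct.pack("<4h", 1000, -1000, 12345, -7)
processed = zero_lsbs(original, 3)

# Every sample still occupies 16 bits, so the byte count is unchanged.
print(len(original), len(processed))
```

The saving only appears once a lossless codec that can exploit the zeroed low bits compresses the processed file.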
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #986
-

Reply #987
so it's the lossless codec that takes advantage over the processed WAV...
Exactly right, David mentioned this in the first post in his original thread.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #988
lossyWAV beta v0.8.9 attached to post #1 in this thread.

Thanks for the tests halb27. Good to hear we have a strong 300..350k range. Bitrates are also looking good with -2 giving archive quality at half the normal bitrate of lossless.
I remember your comments in the first page of the thread and I am glad that we are getting close to your desired 340kbps...... 

Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

 

Reply #989
... Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.

Sounds promising. I'm curious what it looks like with regular music.
lame3995o -Q1.7 --lowpass 17

Reply #990
... Using the -7 -autoshape with my 53 sample set I get 366.8kbps compared to 348.1kbps for -7 and 399.2kbps for -7 -shaping 1.0.
Sounds promising. I'm curious what it looks like with regular music.
I've had a thought - the current implementation of -autoshape is analogous to how maximum-bits-to-remove worked prior to the use of the RMS value of the codec-block (expressed in bits) to determine the variable maximum-bits-to-remove. I am working on a variant which will take into account the RMS value of the codec-block at the same time - should be posted tomorrow.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #991
Anyway it looks very promising already:

-7 -autoshape => 325/385 kbps (my regular/problem set)
-4 -autoshape => 369/450 kbps (my regular/problem set)
-2 -autoshape => 432/519 kbps (my regular/problem set)

Again, penalty is lower the higher the quality setting.
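The penalty can be put into numbers by comparing these figures with the plain bitrates listed in Reply #976. A minimal sketch of that comparison; it assumes the two sets of figures are directly comparable, although they may have been produced with different beta versions:

```python
# (regular, problem set) average bitrates in kbps, copied from the thread.
plain = {"-2": (418, 518), "-4": (346, 447), "-7": (291, 375)}
autoshaped = {"-2": (432, 519), "-4": (369, 450), "-7": (325, 385)}


def penalty(with_autoshape, without):
    """Relative bitrate increase caused by -autoshape."""
    return with_autoshape / without - 1.0


for quality in ("-2", "-4", "-7"):
    regular = penalty(autoshaped[quality][0], plain[quality][0])
    problem = penalty(autoshaped[quality][1], plain[quality][1])
    print(f"{quality}: regular {regular:+.1%}, problem set {problem:+.1%}")
```

For the regular set the increase is about 3% at -2, 7% at -4 and 12% at -7; for the problem set it stays below 3% at all three levels.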

For a fair comparison I encoded 'Livin_In_The_Future' with -5 as well as -7 -autoshape.
Looking at the spectrum, noise behavior is better with -7 -autoshape up to ~ 9 kHz.
This is a valuable extension to the effect of the skewing machinery which keeps noise especially low up to ~ 3 kHz.
So the entire range of the fundamentals is kept within pretty low noise this way.
Of course this isn't a judgement about audible quality in the end.
I listened to the correction files of -5 and -7 -autoshape and as expected the coloured -autoshape noise of the -7 encoding is less audible than the white noise of -5, though it's higher in amplitude.
Again this doesn't really tell about quality for quality levels using a positive -nts value.
With -nts 0 or negative however I think the quality control mechanism makes sure everything is fine (assuming the control mechanism is really working, and we don't have a reason to doubt that).
lame3995o -Q1.7 --lowpass 17

Reply #992
As it is now, lossyFLAC + lwcdfFLAC is still smaller than a plain FLAC, especially using lossyWAV -1 preset.


Sorry to revive a 2-week-old quote, but I've been away from the forums for quite some time. The above comment surprised me.

I'd be surprised if the lossless combination of (lossyFLAC plus correction files) would on average come out to occupy less disk space than plain FLAC for the same music. I say this because, for example, the combination (Wavpack hybrid plus correction file) is usually a little larger than plain Wavpack lossless, and this seems understandable because you're sacrificing some efficiency to enable you to split the total information content into a playable but smaller file plus a near-random noise correction file.

If it were generally true, then you've just found a way to improve the lossless compression ratio of FLAC - an unlikely result (certainly in comparison to optimal lossless FLAC settings and block lengths), but potentially valuable if true.

As an aside, I'm loving your work, everyone. This project looks rather exciting.

I would be fascinated to probe the boundaries of how good lossyWAV is as a transcoding source for conventional lossy encoders - i.e. to store my music on my PC in, say, lossyFLAC or lossyWV format, then transcode for portable devices on demand (I've already considered applying RG Album Gain before using lossless compression, accepting a theoretical but inaudible SNR degradation on overly-loud albums in exchange for reducing the bitrate).

As an approach to verifying transcoding robustness, I wonder about choosing, say, known LAME problem samples and encoding those from original WAV versus lossyWAV sources. If the artifact behaviour stays broadly similar, is that a valid reassurance that lossyWAV at setting X makes a robust source for transcoding so long as no other problems have been found with the problem samples that affect Wavpack Lossy, such as atemlied? Even for problems that are fixed in newer LAME versions, I guess one could use an older version of LAME and a newer one to check that the original artifact and the fixed version are substantially unchanged when using lossyWAV.

This approach might then help to guide the choice of quality setting for those who desire a transcoding source. I presume lossyWAV with quality -2, for example, would even work well as source material for transcoding down in quality into lossyWAV quality -7 for the PDA-DAP low battery drain approach because the changes made should barely affect the measured noise floor compared to the original WAV, and you're going much more aggressive anyway on that second pass to quality -7.

Best regards,
Dynamic
Dynamic – the artist formerly known as DickD

Reply #993
Great work Nick.

Re: the bitrate bloat due to "shaping" - surely the whole point of shaping the noise is so that you can add more of it while maintaining the same level in the audible band? So when you enable shaping, you should also add a threshold shift. (Sorry if you're doing this already!).

Re: lower bitrate than normal FLAC: lossyFLAC uses a different default block size. Sometimes the output is smaller than the default FLAC blocksize, sometimes it's larger - maybe this is what's happening? Overall, what you say is correct: lossy+correction should be larger than lossless, but it would be nice if it wasn't!

Re: transcodability: I tested with mp3 problem samples very early on. It would be worth re-testing with the current version. I found you can't be too aggressive if you want to avoid any audible difference (e.g. -1!), but if "different but not worse" is good enough, you can use normal settings (which at the time was -2!). That horrible "trumpet" sample was quite revealing.

Cheers,
David.

Reply #994
Thanks David, I'm glad you like it!

Re: Bitrate Bloat - I'm not doing that at present as my ears are not really sensitive enough, but if anyone wants to do some testing, I would suggest something along the lines of the following, taking into account the relationship between -snr and -nts:

Code:
  quality_noise_threshold_shifts    : array[1..Quality_Levels] of Integer = (-3,-0,3,6,9,12,15);

  quality_signal_to_noise_ratio     : array[1..Quality_Levels] of Integer = (24,22,20,19,18,17,16);


So, if I was going to go further, I would initially add 3 to -nts for every 1 taken from -snr, i.e. -8 = -nts 18 -snr 15; -9 = -nts 21 -snr 14; etc.

I tried "-nts 30 -snr 11 -autoshape" and with my problem set it doesn't sound particularly bad - probably a starting point.
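The extrapolation rule can be sketched as follows. The two tuples are copied from the arrays above; `settings_for` is an illustrative helper, not a lossyWAV function, and quality levels beyond 7 are hypothetical.

```python
# Per-quality-level defaults, copied from the arrays quoted above.
NTS = (-3, 0, 3, 6, 9, 12, 15)
SNR = (24, 22, 20, 19, 18, 17, 16)


def settings_for(level):
    """Return (nts, snr) for quality level 1..7, or the extrapolation beyond 7.

    Beyond level 7 the rule from the post is applied:
    add 3 to -nts for every 1 taken from -snr.
    """
    if level < 1:
        raise ValueError("quality levels start at 1")
    if level <= len(NTS):
        return NTS[level - 1], SNR[level - 1]
    extra = level - len(NTS)
    return NTS[-1] + 3 * extra, SNR[-1] - extra


for level in (7, 8, 9, 12):
    print(level, settings_for(level))
```

Under this rule the tested "-nts 30 -snr 11" combination corresponds to a hypothetical level -12, five steps beyond -7.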

Re: Transcodability - lossyWAV does not allow re-processing of an already processed file. I would prefer to keep it that way, although if the 'fact' chunk is removed, the program will not be able to tell the difference.

I found a small bug in the noise shaping code and also some quite nice speedups (approx 7% to 10%), so:

lossyWAV beta v0.9.0 attached to post #1 in this thread.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #995
I have run some new tests (processing my 53 problem sample set (125.1MB)) on a 2.0GHz Core2Duo (single instance, nothing else running.....) and got the following for beta v0.9.0:
Code:
|======|==================|==================|
|  QS  | Time/Rate v0.8.8 | Time/Rate v0.9.0 |
|======|==================|==================|
|  -7  | 15.71s / 47.34x  | 14.34s / 51.86x  |
|  -7a | 21.29s / 34.93x  | 19.30s / 38.54x  |
|  -7b | 27.13s / 27.41x  | 24.56s / 30.27x  |
|  -7c | 32.44s / 22.92x  | 29.47s / 25.23x  |
|======|==================|==================|
All tests were carried out with the input files cached in memory to ignore read latency.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #996
Thank you, Nick.
I'd like to do some abxing using -autoshape, but because abxing isn't so much fun I've been waiting for your version which takes the RMS value into account.
Does v0.9.0 contain this feature?
lame3995o -Q1.7 --lowpass 17

Reply #997
I tried to implement the -autoshape taking into account RMS value and the bitrate went through the roof. I may re-visit it, but I think the -autoshape in v0.9.0 is fairly robust, 0% shaping at 0 bits-to-remove and 100% shaping at (bits-per-sample - 3) bits-to-remove. Using -7 -autoshape -snr 11 -nts 30, my 53 problem sample set ends up at 327.3kbps, and the quality is not too bad - a starting point as I said above.
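The v0.9.0 behaviour described here fixes only the two end points of the mapping. A minimal sketch, assuming the ramp between them is linear (the post does not say); the function name is illustrative:

```python
def autoshape_v090(bits_to_remove, bits_per_sample=16):
    """Shaping fraction: 0.0 at 0 bits-to-remove, 1.0 at bits_per_sample - 3.

    The linear interpolation between the two end points is an assumption.
    """
    full_shaping_at = bits_per_sample - 3
    return min(1.0, max(0.0, bits_to_remove / full_shaping_at))
```

For 16-bit audio this reaches full shaping at 13 bits-to-remove, so it ramps up more slowly than the earlier proposal of 0.125 shaping per bit.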
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

Reply #998
OK, I'll try to abx my usual problem samples with v0.9.0 -7 -autoshape. I'll also search for other tracks looking for hiss or other HF problems.
lame3995o -Q1.7 --lowpass 17

Reply #999
OK, I'll try to abx my usual problem samples with v0.9.0 -7 -autoshape. I'll also search for other tracks looking for hiss or other HF problems.
Many thanks!
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)