Topic: lossyWAV Development

lossyWAV Development

Reply #550
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

a) We have a system of lossyWAV + a lossless codec which together make up a lossy codec.
So you do lossy encoding when using lossyWAV, and the good and bad of the procedure must be measured against that of other lossy codecs - which is of course a very subjective thing when comparing lossy codecs of very good quality.
b) AFAIK nobody has ever experienced an artifact even with our lowest quality mode -3. Quality was extremely good from the very start. You're welcome to do some listening tests and report back.
This doesn't seem like the sort of algorithm that lends itself to tuning.

??? For a very long period we had great quality, but at a bitrate of ~500 kbps on average. Since then we've investigated and optimized David Bryant's idea of averaging the FFT outcome according to the length of the critical bands, and we now vary this depending on FFT length. We've optimized the -skew parameter, where a rather high -skew value does an extremely good job of telling apart the spots in the music which have to be handled defensively from those which don't. We've introduced the -snr parameter, which complements the differentiation work of -skew. We've found a solution to the theoretical clipping issue. And we've improved the way the FFT analyses cover the lossyWAV blocks, for safety's sake.
So we ended up with an average bitrate of ~350 kbps for -3 with not the least quality issue known. -2 and -1 IMO provide suitably more defensive internals to make them attractive to the cautious-minded of various kinds.
As a consequence, IMO the only really useful option apart from the quality parameter is -nts. Personally, however, I wouldn't mind if the advanced options were kept even in the final release, as long as they are clearly marked as such (maybe hidden from the command-line help, but documented in the external documentation).
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #551
Forgive me for asking a fundamental (and admittedly critical) question; I'm very late to this particular party. Before I start, I must say this idea (and all the work that has gone into it) is incredible, and I would not hesitate to use it once the kinks are ironed out. From the original post by 2BDecided:
Quote
This isn't about psychoacoustics. What you can or can't hear doesn't come into it. Instead, you perform a spectrum analysis of the signal, note what the lowest spectrum level is, and throw away everything below it. (If this seems a little harsh, you can throw in an offset to this calculation, e.g. -6dB to make it more careful, or +6dB to make it more aggressive!).
I don't see a use for a quasi-lossy bitrate reduction of a lossless format, if the reduction is known to produce artifacts in reasonable configurations. If people are able to ABX this in a wide variety of different modes, that doesn't give me much confidence in using lossyWAV at all, no matter what the settings. If I can deal with a probabilistic chance of artifact audibility, why not stay with lossy?

This doesn't seem like the sort of algorithm that lends itself to tuning. If the technique is independent of psychoacoustics, then the only advanced setting that ought to exist is -skew.

Is that too harsh? Perhaps I'm being overly critical on beta code?
The beta nature only really reflects the status of the code with respect to bug reports which will (probably) come in. This method / pre-processor was initially intended to allow the benefits of a lossy codec to be "wrapped" in a lossless codec. The method is David's; Halb27 and I have only implemented it in Delphi and added a few tweaks along the way.

At various points along the way, people have assisted with setting determination through personal ABX'ing of particularly problematic samples (Big thanks to Halb27, Shadowking, Wombat & Gurubooleez). Valued input has been made by 2Bdecided, Bryant, TBeck, Mitch 1 2, Josef Pohm, SebastianG, user, collector, Dynamic, GeSomeone, Robert, verbajim, [JAZ], BGonz808, M & Jesseq.

At the present time I don't think that the method is "known" to produce any artifacts with default settings (however, if anyone can tell me differently, I would very much appreciate the particular sample, to try to iron it out).

Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.

From a purely personal perspective, I have found the drive to develop it through feedback from those who have made comments along the way, and from a desire to use lossyFLAC on my iPAQ (GSPlayer v2.25 & GSPFlac.DLL).

In keeping with David's wishes, the only command line options in the final revision will be quality levels -1, -2 & -3 and the -nts parameter (unless, as Halb27 has indicated, we leave the advanced options in the code but don't "advertise" them outside of the accompanying PDF / TXT file).

Why don't you give it a try? It's certainly robust enough to handle a Foobar2000 transcode of about 1500 files without falling over (the largest of which was circa 60 minutes).
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #552
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to be testing very high quality lossy audio, or the expertise to offer technical advice or cause debate.

This gives me an opportunity to thank you all though for the work that you have put in.  I think this is an extremely exciting development.

I think Axon's question was well worth asking:  much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature.  Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while.  It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'.  I am highly unlikely to find an issue.  What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'.
I'm on a horse.

lossyWAV Development

Reply #553
Yes there have been very few individuals involved in ABX'ing / settings development, but I take it that that just means that this is a niche program only wanted by a few people.
Personally, I have been following this thread avidly from the start, but I lack the ears to be testing very high quality lossy audio, or the expertise to offer technical advice or cause debate.

This gives me an opportunity to thank you all though for the work that you have put in.  I think this is an extremely exciting development.

I think Axon's question was well worth asking:  much of the discussion in this thread is - to complete laymen like myself - of a complex technical nature.  Given that lossyWAV sits somewhere between high quality 'psychoacoustic' lossy and lossless quality, it is necessary to explain to the general masses what users can expect from this process.

Personally I have been considering a Wavpack lossy backup of my music for a while.  It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'.  I am highly unlikely to find an issue.  What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'.
Thanks are always appreciated.

I totally agree that the question is valid and requires an answer. Technically, I am not really the person to answer it, just the programmer.

Also, I will be using my lossyFLAC collection in tandem with my FLAC collection rather than replacing the latter with the former; essentially, lossyFLAC is my lossy transcode.

Until more ears have validated the current quality level settings, we're not going to be in a position to reassure new users of the quality of the output.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #554
Personally I have been considering a Wavpack lossy backup of my music for a while.  It is possible that using lossyWAV as a pre-processor may be more suited to my needs (or whims).

Also, I cannot simply 'give it a try'.  I am highly unlikely to find an issue.  What I need to know is that people with excellent ears and technical knowledge can assure me that this process will create a near-perfect archive from which I can safely transcode to lossy for use on my DAP, or car stereo.

After re-reading my post I think I've just realised why we're called 'users'.

The more I'm into audio compression, the more I think it's up to personal decisions (and personal a priori preferences) what codec and settings to use. Objective findings always have a limited scope.
My personal key event was the 128 kbps listening test of Lame 3.97, where Lame came out more or less on par with codecs like Vorbis. I have no doubt this test was done with great care, but I personally would never use 3.97 at a bitrate of 128 kbps (due to the 'sandpaper' noise and similar problems). Luckily 3.98 has overcome these problems, and is still improving.

So it's true that more listening experience, especially from well-respected ears, is most welcome, but IMO it's not a sine qua non. Technical knowledge can't assure transparency anyway.

So in the end what counts IMO is that all experience so far says everything is fine (finally we do have public experience, though we'd like to get more). And of course any potential user must like the idea of being close to lossless (from the technical view of the overall procedure, which is not necessarily related to quality), and must not care about a bitrate of 350 kbps or higher. Otherwise he wouldn't use it.

As you have considered using WavPack lossy, you don't care about extremely high bitrates, and you like the idea of the clean signal path that comes with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used WavPack lossy. We can expect WavPack lossy high mode at 400 kbps using dynamic noise shaping to give transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without real quality control so far. With lossyWAV the situation is the same (hopefully even better, due to the existing quality control, which can be said to have proved effective).

The main problem with very high quality codecs is: while it's easy to prove a codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end, once very high quality is assured at least in a basic sense, the most adequate attitude IMO is: don't worry as long as no counterexamples are given.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #555
Thank you both for your responses.

Technical knowledge can't assure transparency anyway.
If it's technical knowledge of a lossless operation then it can.

The techniques that are being used in lossyWAV are complete gibberish to me.  In my limited understanding though, what was originally proposed was the removal of near-useless bits from the WAVE, to make more efficient use of basic compression routines within the encoders (e.g.: FLAC's wasted_bits).  You speak below of "a clean signal path": this is really what I am discussing.  If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision-making process was simpler and therefore less prone to erratic behaviour.  I don't think I'm making myself clear.
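
As I understand it, the wasted-bits idea boils down to something like this sketch (Python for illustration, not FLAC's actual code, and the sample values are made up):

Code:
def common_wasted_bits(samples):
    # Count the trailing zero bits shared by every sample in a block.
    # FLAC's wasted_bits feature stores such a block with the samples
    # right-shifted, so each sample costs that many fewer bits.
    acc = 0
    for s in samples:
        acc |= s
    if acc == 0:
        return 0  # all-zero block; FLAC encodes this as a constant subframe
    wasted = 0
    while acc & 1 == 0:
        acc >>= 1
        wasted += 1
    return wasted

# Zeroing the 5 least significant bits of every 16-bit sample guarantees
# at least 5 wasted bits for the lossless encoder to exploit:
block = [s & ~31 for s in (-23041, 12288, 7008, -416)]
print(common_wasted_bits(block))  # -> 5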

As you have considered using WavPack lossy, you don't care about extremely high bitrates, and you like the idea of the clean signal path that comes with going a near-lossless way, because otherwise you would use very high quality Vorbis or similar. Using lossyWAV you're more or less in the same situation as if you used WavPack lossy. We can expect WavPack lossy high mode at 400 kbps using dynamic noise shaping to give transparent results in nearly any situation and non-annoying results even on the hardest stuff, and all this without real quality control so far. With lossyWAV the situation is the same (hopefully even better, due to the existing quality control, which can be said to have proved effective).
Exactly.

The main problem with very high quality codecs is: while it's easy to prove a codec has an issue by giving a sample, it's impossible to prove a codec is transparent in a universal sense. So in the end, once very high quality is assured at least in a basic sense, the most adequate attitude IMO is: don't worry as long as no counterexamples are given.
Agreed.  And, of course, such claims will be taken with a pinch of salt until a lot of testing has been undertaken.  And, of course, testing high quality encodes is not easy.
I'm on a horse.

lossyWAV Development

Reply #556
If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision-making process was simpler and therefore less prone to erratic behaviour.  I don't think I'm making myself clear.
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #557
As I have an implicit knowledge of the workings of the 3 main procedures involved in the process (having transcoded them from Matlab > Delphi > IA-32 Assembler) I will work on a process flow explanation.
I would be very interested to read a non-technical explanation of the processes involved; however I feel awful for increasing your workload.

Please only do so if you believe that it will help other users to make the decision also.

Thanks again.
I'm on a horse.

lossyWAV Development

Reply #558
... If someone with a technical knowledge of the algorithms used can assure users that the resulting signal has merely had some negligible information removed with no further processing, then that to me would suggest that there was less room for a bug in the algorithm, or that the decision-making process was simpler and therefore less prone to erratic behaviour.  ...

Yes, that's what makes the procedure attractive to me too, though I'm afraid we won't get that kind of security from the mere process itself.
I can try to describe the procedure from my understanding which isn't perfect at all:

As you write, the basic idea is to form (now) 512 sample blocks and decide for each block how many of the least significant bits not to use (set to 0). Lossless codecs like FLAC can make use of the reduced number of bits per sample in these blocks, and in order to be effective the block size of the lossless codec should be identical to the lossyWAV block size (or an integer multiple of it, in case the lossless codec works more efficiently overall with longer blocks). FLAC works fine with a blocksize of 512.

The usual 16 bit accuracy of wave samples is necessary mainly to give good accuracy to low volume spots in the music and to allow for a good dynamic range. At moderate to low volume spots far fewer than 16 bits are used for signal representation (that's why lossless codecs yield a good compression ratio in these cases), and usually not even the entire 16 bits are needed at high volume spots. Roughly speaking, a certain number of rather high value bits are needed for loud spots (while the lower value bits can be zero), and a certain number of low value bits are needed for quieter music (while the high value bits are zero). That's the main background of the method: we care about the louder spots and reduce the accuracy of representation there.
Dropping a certain number of least significant bits means adding noise to the original. This added noise is not necessarily perceived as the kind of analog noise/hiss known from, for instance, tape recordings.

So the main thing is to decide how many least significant bits to drop. From a bird's eye view, the frequency spectrum of the 512 sample block is calculated and the frequency region with the lowest energy is searched for. The idea is to preserve this energy, that is, not let it get drowned in the added noise, and this is done by keeping sample accuracy high enough: the minimum energy level is looked up in a table that tells how many bits can be removed depending on energy level and frequency. The table was found a priori by examining white noise behavior with respect to our purposes.

The real process is a bit more complicated, letting several FFTs do the frequency spectrum analysis according to what they're best at: short FFTs respond to quickly changing signals but have a very restricted resolution at low to medium frequencies, while long FFTs give good frequency resolution but don't respond very quickly. Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.
Moreover, in order not to have to keep accuracy high due to pure chance, a certain averaging is done over the outcome of the FFT analyses. A lot of tuning has been done on this in order to achieve good quality at a relatively low bitrate.
A huge sensitivity bias is given to the low to medium frequency range by using the -skew and -snr options. This is done in analogy to the fact that the usual transform codecs give priority to the accurate representation of low to medium frequencies. The improvement in quality control from using -skew is so strong that we have decided that a noise threshold of +6 is sufficient for -3 (in the a priori theory -nts should be 0).
For -2 we also default to the slightly positive -nts 2, and only with -1 do we use a defensive -2. Other than that, the different quality levels differ for the most part in how they do the FFT analyses. With -3 we use 2 different FFT lengths for each block, -2 uses 3 different FFT lengths, and it's a total of 4 FFT lengths for -1. Moreover, the averaging of the FFT results is done in an increasingly defensive way going from -3 to -1.

After having decided how many least significant bits to remove (set to 0), the samples of the lossyWAV block are rounded to the corresponding values. This rounding can lead to clipping, but we have found a solution to avoid it (by simply dropping fewer bits in the block until no clipping occurs).
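
If it helps, here is a minimal Python sketch of that final rounding step, as I read the description above (the real code is Delphi / assembler, and the tie-breaking of exact halves may differ):

Code:
def remove_bits(block, bits_to_remove, lo=-32768, hi=32767):
    # Zero the chosen number of least significant bits by rounding each
    # sample to the nearest multiple of 2^bits_to_remove.
    while bits_to_remove > 0:
        step = 1 << bits_to_remove
        rounded = [int(round(s / step)) * step for s in block]
        if all(lo <= r <= hi for r in rounded):
            return rounded, bits_to_remove
        bits_to_remove -= 1  # rounding would clip: drop fewer bits and retry
    return list(block), 0

print(remove_bits([100, -50, 220, 180], 4))
# ([96, -48, 224, 176], 4): the low 4 bits of every sample are now zero
print(remove_bits([32767, 32000, -32768, 30000], 4)[1])
# 0: rounding near full scale would clip, so no bits are removed here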

Hope that helps.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #559
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
lossyFLAC (lossyWAV -q 0; FLAC -b 512 -e)

lossyWAV Development

Reply #560
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

Wonderful idea, good job.
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #561
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case: the original method overlaps the ends of the codec_block by half an fft_length and overlaps FFTs by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps FFTs by 5/8 of an fft_length.
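
For anyone following along, the difference is just the step between successive FFT windows. A quick sketch of my reading of the above (the real code's exact boundary handling may differ):

Code:
def fft_window_starts(block_len, fft_len, overlap):
    # Start offsets of successive FFT windows, relative to the start of
    # the codec_block. Both schemes overlap the block ends by half an
    # fft_length; overlapping consecutive windows by `overlap` of an
    # fft_length gives a step of (1 - overlap) * fft_len.
    step = int(fft_len * (1 - overlap))
    starts, pos = [], -fft_len // 2
    while pos < block_len + fft_len // 2:
        starts.append(pos)
        pos += step
    return starts

print(fft_window_starts(512, 1024, 1/2))  # original: step 512 -> [-512, 0, 512]
print(fft_window_starts(512, 64, 5/8))    # -overlap: step 24 rather than 32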
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #562
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes.  Thank you for your time.  I'm slowly getting there.

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering about the differences between -1, -2 and -3.  Is -3 thought to be transparent in all known situations now?  The obvious next question being: so why bother with -1 and -2?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk.  Apologies if the description of these presets has been discussed recently elsewhere.
I'm on a horse.

lossyWAV Development

Reply #563
I can try to describe the procedure from my understanding which isn't perfect at all:
...
Hope that helps.
Yes.  Thank you for your time.  I'm slowly getting there.

I'm not sure if you can answer this, and it may be better left for the documentation, but I am left wondering about the differences between -1, -2 and -3.  Is -3 thought to be transparent in all known situations now?  The obvious next question being: so why bother with -1 and -2?

I guess the same could be said with LAME -V0 and 320kbps CBR, but I'm expecting lossyWAV to have less of a grey area.

Personally, I'd like to hope that -2 (as default) is 'considered transparent until a problem sample can be found', -3 is overkill for the more paranoid amongst us, and -1 introduces a slight amount of risk.  Apologies if the description of these presets has been discussed recently elsewhere.
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = may (although not yet proven to) introduce a slight amount of risk.

The reason for -1 is that you may want to do other things with the output of lossyWAV; -2 is considered to be a very robust intermediate between -1 and -3; -3 is the "I want a lower bitrate and 'acceptable' (rather than transparent) output" setting, which at the moment is better than its target.

My view of the process:

Read WAV header from input file;
Write WAV header to output file;

Create reference_threshold tables for each fft_length for each bits_to_remove (1 to 32) - not required, as precalculated data is used to re-create the surface for each window / dither combination (yes, it changes with both...) - this calculates the mean fft output from the analysis of the difference between the random noise signal and its bit_removed compatriot;

Create threshold_indices from selected reference_threshold table (window / dither combo) - basically, determine how many bits_to_remove for a given minimum dB value;

Read WAV data in a codec_block_size chunk (all channels at once) and for each channel:

Carry out FFT analyses (3 for 1024 sample fft on 512 codec_block_size up to 33 for a 64 sample fft on 512 codec_block_size) on each channel of the codec_block, for each fft_analysis:

  Calculate magnitudes of FFT output (from complex number);

  Skew magnitudes by the relevant amount (currently -36dB at 20Hz rising to 0dB at 3545Hz, following a 1-sin(angle) curve, where angle is derived from the proportion (log(this_bin_frequency)-log(min_bin_frequency))/(log(max_bin_frequency)-log(min_bin_frequency)));

  Spread skewed magnitudes using the relevant spreading function (e.g. 23358-...... means average 2 bins in the first zone, 3 in the second and third zones, 5 in the fourth zone and 8 in the fifth zone), retaining the minimum value and the average value of the skewed results;

  minimum_threshold=floor(min(minimum_skewed_result+nts,average_skewed_result-snr)) - see the sketch after this list;

  Look up Threshold_Index table for the relevant fft_length to determine bits to remove for that particular fft_analysis;

When all fft_analyses for a particular codec_block are complete, determine the minimum bits_to_remove value and use that to:

Remove_bits: for each sample in each channel of the codec_block, bit_removed_sample := round(sample / (2^bits_to_remove)) * (2^bits_to_remove). If in the remove_bits process a sample falls outwith the upper or lower bound, then decrease bits_to_remove and start the remove_bits process again.

Write processed codec_block and repeat;

Close files and exit.
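
And a condensed Python sketch of the skew / spread / threshold step above (the zone layout and the nts / snr values are placeholders; only the skew curve and the minimum_threshold line follow the list, and I have assumed the 1-sin(angle) curve sweeps angles from 0 to 90 degrees):

Code:
import math

def skew_db(f, f_min=20.0, f_max=3545.0, max_skew=-36.0):
    # -36dB at 20Hz rising to 0dB at 3545Hz along a 1-sin(angle) curve,
    # where the log-frequency proportion is scaled to an angle of 0..pi/2.
    if f <= f_min:
        return max_skew
    if f >= f_max:
        return 0.0
    t = (math.log(f) - math.log(f_min)) / (math.log(f_max) - math.log(f_min))
    return max_skew * (1.0 - math.sin(t * math.pi / 2))

def minimum_threshold(skewed_db, zones, nts, snr):
    # Spread the skewed magnitudes: average them in runs whose width depends
    # on the zone (2 bins per average in the first zone, 3 in the next, ...),
    # then apply: floor(min(minimum + nts, average - snr)).
    spread = []
    for lo, hi, width in zones:
        for i in range(lo, hi, width):
            chunk = skewed_db[i:i + width]
            spread.append(sum(chunk) / len(chunk))
    return math.floor(min(min(spread) + nts, sum(spread) / len(spread) - snr))

# Illustrative zone layout: 2-bin averages over bins 0..7, then 3-bin
# averages over bins 8..31; the nts / snr values are example numbers only.
mags = [skew_db(20 * 1.2 ** i) - 60.0 for i in range(32)]
print(minimum_threshold(mags, zones=[(0, 8, 2), (8, 32, 3)], nts=6, snr=21))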
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #564
Exactly those last descriptions, but in reverse order: -1 = overkill; -2 = what you said; -3 = may (although not yet proven to) introduce a slight amount of risk.
Excellent news.  I will have to spend some time reading your explanation as, on a quick skim, it still seems quite technical to me.  Perhaps, as I try to comprehend it myself, I can suggest a n00b translation of your technical explanation that may help to produce the final documentation?

Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus, and am astounded by the savings, using the default settings.

Code:
File    FLAC    lossyWAV+FLAC    (kbps)
=======================================
00      1054    376
01       728    366
02       765    390
03      1013    413
04       883    425
05       860    469
06      1084    455
07       981    419
08      1052    399
09       873    393
10      1026    511
11       853    367
12       834    422
13      1016    435
14       954    403
15       867    390
16      1068    397
17       861    376
18       787    442
19       909    394
20      1142    400
21       760    384
22      1022    410
23      1030    394
24       917    433
25       914    384
26       810    401
27       878    354
28      1040    449
29       912    442
30       895    419
31       913    411
32      1010    402
33      1018    397
34       831    429
35       939    410
36      1038    402
37      1084    439
38       825    381
39       999    413
40      1007    408
41      1037    505
42      1054    408
43       897    418
44       839    364
45       924    425
46       898    431
47       890    398
48      1014    414
49       999    412
Bloody good work gentlemen!

I am under the impression that I can also use TAK and WavPack already.  I need to do some more reading to see what, if anything, I need to do to test these also.
I'm on a horse.

lossyWAV Development

Reply #565
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case: the original method overlaps the ends of the codec_block by half an fft_length and overlaps FFTs by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps FFTs by 5/8 of an fft_length.

Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before, and it takes the -overlap option to do the new overlapping (which we discussed something like 8 pages back)?
lame3995o -Q1.7 --lowpass 17

lossyWAV Development

Reply #566
.....Nick.C has done a good job in letting the FFTs cover the lossyWAV blocks very accurately - more than was done originally.....
I don't think that that is the case: the original method overlaps the ends of the codec_block by half an fft_length and overlaps FFTs by half an fft_length. The -overlap parameter overlaps the ends by half an fft_length and overlaps FFTs by 5/8 of an fft_length.
Oops, I thought the new overlapping was done throughout. So without the -overlap option FFT overlapping is done as before, and it takes the -overlap option to do the new overlapping (which we discussed something like 8 pages back)?
Yes, exactly - the new 5/8th fft_length overlapping system hasn't totally "sold" me on making it the default, but it is still a selectable option.

@Synthetic Soul -  Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?

@Axon - Thanks for stimulating a very interesting series of posts!
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #567
Anyway, the reason I came to post again: WOW!

I have tested lossyWAV previously, but - given the frequency of releases - have really been waiting for it to get to beta before testing fully.

I have just used it and FLAC on my TAK corpus, and am astounded by the savings, using the default settings.
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
lossyFLAC (lossyWAV -q 0; FLAC -b 512 -e)

lossyWAV Development

Reply #568
@Synthetic Soul - Glad you like it sir - now, does it bear listening to? Oh, and which quality level was that?
I've been casually listening to the files while testing, and of course can hear no discernible difference.  Default settings for both lossyWAV (-2) and FLAC (-5).

I will soon be posting results for WavPack and TAK defaults also.

@Axon - Thanks for stimulating a very interesting series of posts!
Indeed.  I've not felt it was the time to get involved before now, but I think it's now time for us more casual testers to show our interest.
I'm on a horse.

lossyWAV Development

Reply #569
OK, here are my results for FLAC, WavPack and TAK on default settings:

Code:
Encoder          |  Command
===================================================================
FLAC 1.2.1       |  flac -b 512 <source>
WavPack 4.42a2   |  wavpack --merge-blocks --blocksize=512 <source>
TAK 1.0.2 Final  |  takc -e -fsl512 <source>

Code:
================================================================
File  |  FLAC   Lossy  |  WavPack Lossy |  TAK    Lossy   (kbps)
================================================================
00    |  1054   376    |  1048   367    |  1034   360
01    |   728   366    |   728   374    |   708   359
02    |   765   390    |   766   395    |   742   378
03    |  1013   413    |  1013   421    |   997   406
04    |   883   425    |   880   421    |   867   413
05    |   860   469    |   858   491    |   798   445
06    |  1084   455    |  1077   458    |  1071   447
07    |   981   419    |   976   418    |   955   410
08    |  1052   399    |  1046   395    |  1040   391
09    |   873   393    |   871   401    |   823   372
10    |  1026   511    |  1029   524    |  1011   504
11    |   853   367    |   853   374    |   827   355
12    |   834   422    |   832   429    |   811   414
13    |  1016   435    |  1010   435    |  1000   425
14    |   954   403    |   948   402    |   927   396
15    |   867   390    |   864   397    |   841   380
16    |  1068   397    |  1066   400    |  1059   393
17    |   861   376    |   860   382    |   829   365
18    |   787   442    |   783   440    |   774   431
19    |   909   394    |   907   393    |   879   382
20    |  1142   400    |  1140   396    |  1130   394
21    |   760   384    |   767   390    |   740   370
22    |  1022   410    |  1014   408    |  1004   400
23    |  1030   394    |  1025   391    |  1022   385
24    |   917   433    |   913   444    |   888   423
25    |   914   384    |   910   381    |   884   371
26    |   810   401    |   811   404    |   784   383
27    |   878   354    |   871   366    |   855   346
28    |  1040   449    |  1033   459    |  1019   443
29    |   912   442    |   911   444    |   877   421
30    |   895   419    |   889   431    |   843   403
31    |   913   411    |   914   415    |   874   389
32    |  1010   402    |  1003   401    |   992   393
33    |  1018   397    |  1009   398    |   994   387
34    |   831   429    |   859   457    |   793   411
35    |   939   410    |   940   417    |   908   395
36    |  1038   402    |  1032   399    |  1027   393
37    |  1084   439    |  1088   453    |  1071   430
38    |   825   381    |   829   392    |   796   367
39    |   999   413    |   993   408    |   986   399
40    |  1007   408    |   999   405    |   990   398
41    |  1037   505    |  1029   516    |  1012   497
42    |  1054   408    |  1046   403    |  1035   395
43    |   897   418    |   901   426    |   882   408
44    |   839   364    |   830   377    |   798   354
45    |   924   425    |   920   425    |   909   414
46    |   898   431    |   899   435    |   881   426
47    |   890   398    |   882   393    |   875   384
48    |  1014   414    |  1006   412    |   997   401
49    |   999   412    |   992   409    |   984   400
================================================================
Avg   |   940   412    |   937   415    |   917   400
I'm on a horse.

lossyWAV Development

Reply #570
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.

As your documentation reports which codecs support lossyWAV and which don't, the following is my experience with the missing ones.

MP4ALS and LPAC support lossyWAV very, very well.

SHN should, but I didn't bother to actually check.

On the other hand, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't show any ability to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my OPINION, was OFR designed to work with such a small frame size).

lossyWAV Development

Reply #571
You ain't seen nothin' yet! You should try using lossyWAV -3 with FLAC -8 -b 512.
Using -8 does little for my corpus by the looks of it.  I've only tested the first 25 files so far, but it only takes the average bitrate from 933 to 930 for those files.

In fact, using lossyFLAC and encoding with -5 yields, on average, a file 43.90% the size of the standard FLAC, while with -8 it is still 43.93% the size.

Edit: Sorry, in my haste to test I have forgotten that I'm still using lossyWAV files processed using -2.  Perhaps with -3 there is a more drastic improvement.
I'm on a horse.

lossyWAV Development

Reply #572
I've started a new wiki article here. The article is incomplete and probably inaccurate. It is also in need of a "technical details" section, possibly along the lines of what you posted above.
As your documentation reports which codecs support lossyWAV and which don't, the following is my experience with the missing ones.

MP4ALS and LPAC support lossyWAV very, very well.

SHN should, but I didn't bother to actually check.

On the other hand, unless I made some kind of mistake, in my tests APE, LA and ALAC didn't show any ability to support wasted bits detection at all! OFR supports wasted bits, but I can't see a way for it to use a 512-sample frame size (nor, in my OPINION, was OFR designed to work with such a small frame size).
As long as the target codec can work on a multiple of the lossyWAV codec_block_size it should be fine - alternatively, use -cbs xxx to set the lossyWAV codec_block_size to match the target codec's, or I could get off my behind and implement a -ofr parameter to specify codec-specific settings (as for WMALSL).

We may be early beta, but if anyone has any ideas as to improvements / additions / changes they might like to see, then let me know - you can PM me or e-mail me from here if you don't want to post publicly.

I am gratified to see that the code is quite robust as the error reports have dwindled.... <avalanche!>

Mitch 1 2 is doing a great job with the wiki article; I should get round to my bit of it.
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)

lossyWAV Development

Reply #573
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be proved solely from one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined from the literature, prior to any listening tests.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it relies only upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be provable as transparent prior to any listening tests, based entirely on signal processing principles and only minimal psychoacoustics (masking the quantization noise with the background noise). But instead, the settings seem to be based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being formally verifiable? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm could be shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and if the algorithm could be shown to be immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?

lossyWAV Development

Reply #574
Thanks for the excellent responses.

I think I may not have stated my concerns accurately or completely in my first post. I was certainly wrong to assume that artifacts have been found in recent -1 and -2 tests. But my beef isn't quite with the existence of artifacts, or that the bit reduction process is necessarily obscure (although halb's and Nick's posts did a lot to explain them). It's that the entire design process of the algorithm seems obscure, and clarifying it (and potentially formalising it) would go a long way to help explain to users exactly what this is good for.

2BDecided's original post seemed to imply that the transparency of bit reduction can be proved solely from one psychoacoustic principle: spectral masking below a noise floor. This appears to be one of the more fundamental results of psychoacoustics, and fairly hard limits on audibility can be determined from the literature, prior to any listening tests.

This is the biggest advantage lossyWAV may have compared to other lossy formats. Most lossy encoders exploit multiple psychoacoustic effects to reduce bitrate while maintaining transparency. If one effect is exploited too aggressively, out of several effects being exploited in parallel, transparency is lost and an artifact is audible. But lossyWAV, if it relies only upon spectral masking, has only one point of failure, and one that is very well understood. The quantization distortion should not induce artifacts under any other psychoacoustic effect. That's an incredibly strong selling point to convince people to use lossyWAV for many, many applications.

But the sheer number of tunings that have occurred in the final product (regardless of whether or not they are eventually made available to the user) made me question how ironclad this advantage really is. It seems to me that the algorithm should be provable as transparent prior to any listening tests, based entirely on signal processing principles and only minimal psychoacoustics (masking the quantization noise with the background noise). But instead, the settings seem to be based primarily on listening tests. Those are a correct testing method for lossy codecs, but for an encoder this agonizingly close to being formally verifiable? The tunings have the slight air of a sausage factory behind them. The end result is tasty, but the means to the end are rather unsavory.

Perhaps lossyWAV has simply evolved to use slightly more psychoacoustic phenomena than a simple theory of spectral masking. That appears to be the justification for -skew and the spreading functions. Certainly, a tight argument can be made for taking into account the width of the critical bands to adjust the sensitivity of low/high frequencies. But it still seems like the other options are pretty much pulled out of a hat.

What would be ideal is if each step of the algorithm could be shown to follow logically from critical band masking theory, or from a small finite set of psychoacoustic effects, and if the algorithm could be shown to be immune to artifacts from other effects.

Perhaps I'm talking out of line by asserting that an algorithm like this can be formally verified?
Certainly not talking out of line, but beyond my limited knowledge; as I said, I'm just the programmer. The -skew and -spread (and -snr, I suppose) functions and settings have certainly been arrived at heuristically. I've worked up beta v0.5.2 (attached; since superseded) to allow the original concept settings to be implemented using a -0 parameter (as closely as possible, due to slight changes in the conv / spread combined function). Use -0 -clipping to emulate the original method settings, or -0 -fft 10101 -clipping to emulate the three-analysis version. -nts is the only other parameter available to you under the original method.

As to the number of tunings: -fft, -nts, -snr, -skew and -spread are the only tunings used in the 3 default quality settings; others such as -clipping, -dither, -overlap, -window and -allowable are all defaulted to off.

I must stress that, looking at the file sizes of the output of vanilla -0, I am fairly certain that artifacts will show in Atem_lied at the very least.

***** -0 is not a permanent quality setting, merely a response to a request. *****
lossyWAV -q X -a 4 -s h -A --feedback 2 --limit 15848 --scale 0.5 | FLAC -5 -e -p -b 512 -P=4096 -S- (having set foobar to output 24-bit PCM; scaling by 0.5 gives the ANS headroom to work)