lossyWAV Development

Reply #1075 – 2008-04-23 10:09:52

With regard to help/advanced help/usability etc., I think it makes sense to keep standard options to a minimum, ideally just a quality level to be specified. Not only does that keep things simple, it also means that non-techies like me only need to worry about the end result they're looking for and can rest assured that sensible settings have been used by default for all important parameters. Personally, I'd like to see quality settings going from lowest to highest (1 - 9) rather than the other way round, but that's only a personal feeling.

On help specifically, I'd like to see some mention of how the various values of any setting are going to affect the end result. E.g.:
"-nts <n> set noise_threshold_shift to n dB (-48.0dB<=n<=+36.0dB);
(-ve values reduce bits to remove, +ve values increase)."
is fine as far as it goes, but it might be more helpful if it went on to say something like "So, the larger the negative value used, the lower the likelihood of noise being introduced, but at the expense of a larger file size after processing through a lossless codec, whereas the higher the positive value ......... etc". Maybe you feel that's over the top for help; if so, maybe it could go in the wiki. By the way, that's a bit out of date at the moment.

Reply #1076 – 2008-04-23 11:26:45

I'm also very fond of simplicity and clarity.
I support the suggestion that the advanced options should only be mentioned within something like a longhelp.

Thinking about the advanced options IMO only -nts and -analyses should be used.

It should be mentioned that there is no need to explicitly use -nts or -analyses as these are taken care of by the quality levels.

My suggestion for the -nts <n> description:

-nts 0 yields transparency according to experience.
A negative <n> adds a security margin (and increases file size), which is supposed to be overkill but may be welcome when lossyWAV is used as a substitute for lossless archiving or similar applications with an extremely high quality demand.
A positive <n> yields a smaller filesize but adds the risk of audible deviations from the original. Due to additional internal precautions however a small <n> like
-nts 4 is expected not to harm transparency.

Reply #1077 – 2008-04-23 15:50:17

Quote from: botface on 2008-04-23 10:09:52

Personally, I'd like to see quality settings going from lowest to highest (1 - 9) rather than the other way round

I'd just logged on to suggest exactly this!

Now it's grown from -1 -2 -3 to a non-integer 0-10 scale, I think it might make sense to tie it into a scale that people already understand. The obvious one for me is the one Vorbis uses - Q5 is transparent, lower might not be, higher is overkill / safety margin. Others may have other suggestions.

I apologise for not suggesting this sooner!

Cheers,
David.

Reply #1078 – 2008-04-23 16:47:06

Quote from: 2Bdecided on 2008-04-23 15:50:17

non-integer 0-10 scale

Lame uses that scale, too...

Reply #1079 – 2008-04-23 18:26:41

Quote from: 2Bdecided on 2008-04-23 15:50:17

Quote from: botface on 2008-04-23 10:09:52
Personally, I'd like to see quality settings going from lowest to highest (1 - 9) rather than the other way round
I'd just logged on to suggest exactly this!

Now it's grown from -1 -2 -3 to a non-integer 0-10 scale, I think it might make sense to tie it into a scale that people already understand. The obvious one for me is the one Vorbis uses - Q5 is transparent, lower might not be, higher is overkill / safety margin. Others may have other suggestions.

Yes indeed, if you use "q" or "Q" for quality, this seems eminently sensible as -q 5.0 is also the "standard" transparent setting for Musepack, and increasing quality should correspond to an increasing number.

Conversely, LAME VBR (because MP3 isn't necessarily a VBR format) uses -V (not -Q) and here, 0 is the highest quality and bitrate while 9 is the lowest, so people most familiar with LAME might not get the expected behaviour. This discrepancy has always been true of different JPEG image apps, some using discrete settings, some using a "Quality" (0 is worst quality) scale and some using a "Compression" scale (0 is best quality), none of which seemed to correspond very closely to the scale in different apps.

Your original scale for the betas released to date corresponds to the degree of "loss" or "compression" allowed, and oddly enough, with 2 or 3 being equivalent to the "transparent" standard, it corresponds rather closely to LAME's current VBR scale.

Regardless of what you choose, I'd suggest that if you're calling it "quality" it should be a "0 is worse quality than 9" type of scale, and if you're calling it "loss" or "compression" it should be a "0 is better quality than 9" type of scale. Given that "constant quality" is what VBR is all about, my vote is for calling the scale quality and reversing from where you are now.

Happily, if "-q 5.00" in future corresponds to the current -2 or -3 transparent setting, then "-q 0.00" or "-q 1.00" would pretty-much correspond to the current -7 or -8 (one of which you'll decide is the lowest acceptable quality for low-battery portable use).

Reply #1080 – 2008-04-23 19:09:45

Though I personally don't care much about whether the quality scale goes up or down I like this idea of having a quality scale analogous to that of Vorbis as Dynamic describes it:

-q 0 = -8
-q 1 = -7
-q 2 = -6
-q 3 = -5
-q 4 = -4
-q 5 = -3
-q 6 = -2
-q 7 = -1
-q 8 = -0
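
Read as a formula, the table above is simply q = 8 - n for an old preset -n. A minimal Python sketch of that reading follows; the function name is hypothetical and nothing here is taken from the lossyWAV source.

```python
def old_preset_to_vorbis_like_q(old_preset: int) -> int:
    """Map an old lossyWAV preset number n (as in -n) to the proposed -q value.

    Hypothetical helper: it only restates the table in this post (q = 8 - n).
    """
    if not 0 <= old_preset <= 8:
        raise ValueError("old presets in this proposal run from -0 to -8")
    return 8 - old_preset


if __name__ == "__main__":
    # Reprint the table from the post.
    for q in range(9):
        n = 8 - q
        assert old_preset_to_vorbis_like_q(n) == q
        print(f"-q {q} = -{n}")
```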

This way I think everybody familiar with vorbis gets an immediate and intuitive feeling about the meaning of the quality setting.

I was thinking about the advanced options again, and with these differentiated quality scales I think we should drop -nts from the user interface for the final version. IMO just -analyses should make it into the advanced options.

Reply #1081 – 2008-04-23 19:44:04

On a side note, presumably the same quality scale or loss scale as you decide upon for the release version of lossyWAV could be used in any future dedicated hybrid lossy encoder based on the same kind of analysis as lossyWAV (if anybody considers it worth developing - see last paragraph).

For example, if Wavpack or FLAC had a "constant-quality" or "VBR" lossy mode based on the same type of analysis as lossyWAV, then instead of using it to zero the LSBs over a whole block it could be used to define the maximum allowable prediction residual error that should remain in the audio. That could be done by defining the bit-depth of the residuals that get stored (probably the easiest method) or defining the maximum allowable error in the residuals and choosing those residuals in some other way. (The limited bit depth of the predictor, or metadata within the file header or block header might be an indicator of lossy processing, but I guess it's much harder to spot than zeroed LSBs when decoded to PCM, but that's always going to be a problem with other types of lossy, such as MP2, MP3, AAC, Vorbis, ADPCM and the like).

Actually, unless I'm missing something, I guess an encoder like that could retain long block lengths for predictor efficiency but could still use the shorter overlapping FFT analysis windows like lossyWAV to define the allowable uncoded prediction residual error at various times within the block. It might even be possible to continuously (smoothly) vary the allowable error from sample to sample within the block to follow the profile of permissible noise given by the FFT windows that overlap on that sample according to some interpolation or in proportion to the value of each lapping window function centred around each FFT window's centre sample.

Obviously, the predictor's value is dependent on the previous samples, which are now different thanks to the permitted error, and this may worsen the prediction slightly (but this hasn't stopped Wavpack lossy from creating remarkable bitrate reductions with remarkable quality).

Both approaches based on this analysis method hold out great hope for transparent or high quality lossy audio with fairly modest bitrate and relatively low decoding complexity and a closely equivalent quality scale that could show any bitrate savings between methods quite accurately. A correction file for restoring true lossless is compatible with either method (unless you get into serious noise shaping and it becomes too large to use).

LossyWAV certainly delivers most of the possible gain in compression and it is compatible with a number of well-supported lossless codecs completely unchanged (with the option of converting from, say WavPack to FLAC according to support on the playback target without further audio loss), so the possible efficiency advantage of the second approach may be a case of diminishing returns and reduced flexibility regarding re-encoding to another codec. It would be even worse in terms of waiting for decoder support if such an implementation were no longer compatible with existing FLAC or Wavpack decoders, especially those embedded into playback devices.

Reply #1082 – 2008-04-23 20:09:20

So many posts to take in - with so many valid observations / ideas / comments / etc....

How about:

-0..-8 > -q 10 .. -q 0?

i.e. -0 > -q 10.0; -1 > -q 8.3333; -2 > -q 6.6667; -3 > -q 5.0; -4 > -q 4.0; -5 > -q 3.0; -6 > -q 2.0; -7 > -q 1.0; -8 > -q 0.0?
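
The nine points listed above fit two straight-line segments: steps of 10/6 (about 1.6667) from -0 to -3, then steps of 1.0 from -3 to -8. The Python sketch below encodes that observation; the function name is hypothetical, and only the integer presets are taken from the post.

```python
def old_preset_to_q(old_preset: int) -> float:
    """Map an old preset n (as in -n) to the proposed -q value.

    Hypothetical helper reproducing the nine points listed in the post:
    -0..-3 span -q 10.0 down to -q 5.0, and -3..-8 span -q 5.0 down to -q 0.0.
    """
    if not 0 <= old_preset <= 8:
        raise ValueError("old presets in this proposal run from -0 to -8")
    if old_preset <= 3:
        # High-quality end: 5 units of q spread over 3 old presets.
        return 10.0 - old_preset * 5.0 / 3.0
    # Lower-quality end: one unit of q per old preset.
    return 8.0 - old_preset


if __name__ == "__main__":
    for n in range(9):
        print(f"-{n} > -q {old_preset_to_q(n):.4f}")
```

The steeper segment at the top is what gives the scale its extra resolution among the high-quality presets.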

-snr and -nts could be removed from the user interface in v1.0.0, along with -noclips (perhaps).

Reply #1083 – 2008-04-23 20:55:50

That's fine, IMO, and, yes, I forgot about -noclips: I'd like to see -noclips in the advanced options.

Reply #1084 – 2008-04-23 21:39:28

Quote from: halb27 on 2008-04-23 20:55:50

That's fine, IMO, and, yes, I forgot about -noclips: I'd like to see -noclips in the advanced options.

I'll get to work on beta v0.9.6 tomorrow (I've been installing another RAID card in my server and moving drives about this evening....).

The focus for beta v0.9.6 will be to implement the -q <n> parameter and remove the -<n> parameter, to significantly simplify the basic settings and to introduce the -help and -help -detail parameters / combination to give basic help (beyond that given by running lossyWAV with no parameters) and advanced help (with the advanced settings added).

Reply #1085 – 2008-04-23 22:13:14

Quote from: Dynamic on 2008-04-23 18:26:41

Conversely, LAME VBR (because MP3 isn't necessarily a VBR format) uses -V (not -Q) and here, 0 is the highest quality and bitrate while 9 is the lowest, so people most familiar with LAME might not get the expected behaviour. This discrepancy has always been true of different JPEG image apps, some using discrete settings, some using a "Quality" (0 is worst quality) scale and some using a "Compression" scale (0 is best quality), none of which seemed to correspond very closely to the scale in different apps.

Your original scale for the betas released to date corresponds to the degree of "loss" or "compression" allowed, and oddly enough, with 2 or 3 being equivalent to the "transparent" standard, it corresponds rather closely to LAME's current VBR scale.

It didn't surprise me to see lossyWav doing it the same way as we did with LAME. We share the idea that we have to add more noise to get smaller files or a higher compression ratio. And the question is how much you would like to distort your input signal.

Quote

Regardless of what you choose, I'd suggest that if you're calling it "quality" it should be a "0 is worse quality than 9" type of scale, and if you're calling it "loss" or "compression" it should be a "0 is better quality than 9" type of scale.

So, you would prefer to fly 9th class over 1st class? In school I would prefer to get a grade of 1 (best) over 6 (worst). I don't think higher quality is naturally associated with higher numbers; it depends on your social context.

Quote

Given that "constant quality" is what VBR is all about, my vote is for calling the scale quality and reversing from where you are now.

Well, by choosing any switch, the user can only degrade quality by increasing the number of bits to remove. Or did I miss a quality enhancement switch?

Don't get me wrong, I'm fine with whatever quality/compression scheme Nick wants lossyWav to have.

Reply #1086 – 2008-04-24 12:36:07

Instead of using Vorbis' -q n quality scale we could use Lame's -V n quality scale of course.
It's all a matter of taste.
I personally prefer the Vorbis analogy, not because of the scale direction which doesn't matter to me at all, but because I have a positive feeling towards the correspondence of -3 with -q5 and the corresponding consequences for the other quality settings. Such a -q5 can be considered transparent with a probability extremely close to 1, and from -q6 on there is an ever increasing security margin with a large security margin range to choose from.
With the Lame analogy I see problems. Which -V setting should correspond to -3? It would have to be -V3 or worse qualitywise in order to have our -2 to -0 correspond with higher -V settings.
I feel more comfortable having -3 correspond with Vorbis -q5 than with Lame -V3.
Moreover, because of lossyWAV's properties I think it's good to put some emphasis on the extremely high quality settings (useful for high quality lossy archiving, for instance).
With Nick's last proposal we have a lot of -q levels which deal with this high end demand (while we still have a lot of -q settings dealing with the lower end).
With the Lame -V levels it wouldn't be like that (or only if we let lossyWAV -3 correspond with something like -V5 which I think isn't very adequate).

Reply #1087 – 2008-04-24 13:02:23

I don't see the point: why should any lossyWav setting match any Vorbis or LAME setting? If you say lossyWav -3 matches Vorbis -q5, why should I use lossyWav at all? If both are equal in quality, I would choose the smaller files. LossyWav wanted to fill a gap between lossless and other lossy encodings; if you want to pick up the Vorbis scale, shouldn't it be from -q5(?) to -q20 then?? And no, I'm not proposing to choose a LAME-like scale, it's just that the original lossyWav settings made sense to me and I don't see any need to change that. Just my two cents.

Reply #1088 – 2008-04-24 13:43:07

Quote from: Nick.C on 2008-04-23 20:09:20

-snr and -nts could be removed from the user interface in v1.0.0, along with -noclips (perhaps).

I'd like to keep -nts (in the advanced category, of course); it is the most meaningful parameter to tweak (apart from the -q n).
I'm neutral on the -quality vs. -n scale but if you want to change it, now is the time (before the first "final/stable"). It would be nice to know the settings of the integer values of whatever scale is chosen (nmt snr)

BTW, does the default still correspond to -2? I suggest moving the default to -3 or -4 (of the old scale).

Reply #1089 – 2008-04-24 13:54:46

Quote from: robert on 2008-04-24 13:02:23

I don't see the point, why should any lossyWav setting match any Vorbis or LAME setting? ... LossyWav wanted to fill a gap between lossless and other lossy encodings, if you want to pick up the vorbis scale, shouldn't it be from -q5(?) to -q20 then?? ..... it's just, the lossyWav original settings made sense to me and I don't see any need to change that. Just my two cents.

Agree entirely. That's my two cents.

C.

Reply #1090 – 2008-04-24 13:58:26

It's pure emotion, no real reason. lossyWAV -3 quality isn't the same as Vorbis -q5's of course.
To me - maybe only to me - it's like this:
If I'd use Vorbis and struggle for transparency in a robust way while trying not to waste file size I'm fine with -q5 (of course a matter of taste and -q4 or -q6 are candidates for an appropriate setting as well). If I'd want an additional safety margin (maybe for archiving purposes) I'd better use -q6 or higher.
With the Lame -V settings as an alternative there's simply not so much room for various high end settings (also a matter of taste).

Sure the analogies no matter whether it's about Vorbis or Lame have their drawbacks as they may suggest that we get a quality at ~400 kbps that we can have at ~200 kbps or below using Vorbis or Lame.

Despite this, my personal preference is still with the Vorbis-like scale, but the many words I've used to try to make that understood are misleading: after all, I don't care much about it. I'm also happy with the original lossyWAV scale.

Reply #1091 – 2008-04-24 13:58:47

Quote from: GeSomeone on 2008-04-24 13:43:07

Quote from: Nick.C on 2008-04-23 20:09:20
-snr and -nts could be removed from the user interface in v1.0.0, along with -noclips (perhaps).
I'd like to keep -nts (in the advanced category, of course); it is the most meaningful parameter to tweak (apart from the -q n).
I'm neutral on the -quality vs. -n scale but if you want to change it, now is the time (before the first "final/stable"). It would be nice to know the settings of the integer values of whatever scale is chosen (nmt snr)

I could be persuaded to leave -nts in the advanced options....

[edit] Throughout the development of lossyWAV, -1, -2 and -3 have always been called quality presets. Yes, I agree that 1st class is better than 2nd class, but where does 0th class fit in (as it doesn't exist in normal speech)? So, I've gone for a quality-increases-with-value-of-numerical-preset approach, on a scale of 0 to 10. Moving from -1, -2 and -3 to -1 to -7 it seems a logical progression to allow 100,000 quality preset options between -q 0.0 and -q 10.0 with a 0.0001 resolution rather than the original 3. This will allow the user to choose a personal transparency level much more easily than if they had to juggle -nts and -snr manually. Maybe some explanation will need to be added to the wiki with comparisons with previous preset bitrates. [/edit]

I've implemented the -q 0 to 10 quality preset selection and have had a thought. Up until now, the maximum bits-to-remove has been limited to (rms-value-of-all-samples-in-codec-block - 3). I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0. This would increase the output of my 53 problem sample set from 611kbps to 616kbps at -q 10 (-nts -12 -snr 30) and from 472kbps to 482kbps at -q 5 (-nts 0 -snr 21).
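
The proposed limit can be written out as a short Python sketch. It rests on assumptions: `rms_bits` stands for the block RMS value expressed in bits, all names are hypothetical, and clamping at zero is added only so the sketch never returns a negative count. It is not taken from the lossyWAV source.

```python
def min_bits_to_keep(q: float, base: float = 3.0) -> float:
    """Bits held back from removal: the existing 3 plus q / 4 (the proposal)."""
    if not 0.0 <= q <= 10.0:
        raise ValueError("-q runs from 0 to 10")
    return base + q / 4.0


def max_bits_to_remove(rms_bits: float, q: float) -> float:
    """Upper limit on bits-to-remove for one codec block.

    rms_bits is assumed to be the RMS of the block's samples in bits.
    The clamp to zero is an assumption of this sketch.
    """
    return max(0.0, rms_bits - min_bits_to_keep(q))


if __name__ == "__main__":
    for q in (0, 5, 10):
        print(f"-q {q}: subtract {min_bits_to_keep(q)} instead of 3.0")
```

At -q 10 this subtracts 5.5, matching the figure in the post; at -q 0 it reduces to the existing limit of 3.0.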

Reply #1092 – 2008-04-24 14:05:30

Quote from: Nick.C on 2008-04-24 13:58:47

I've implemented the -q 0 to 10 quality preset selection and have had a thought. Up until now, the maximum bits-to-remove has been limited to (rms-value-of-all-samples-in-codec-block - 3). I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0. This would increase the output of my 53 problem sample set from 611kbps to 616kbps at -q 10 (-nts -12 -snr 30) and from 472kbps to 482kbps at -q 5 (-nts 0 -snr 21).

I like the idea.

Reply #1093 – 2008-04-24 14:15:25

Quote from: Nick.C on 2008-04-24 13:58:47

I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0.

If I understand correctly, this would only matter for the really quiet parts (or tracks).
Your proposal would increase the bit rates overall (for tracks with quiet passages); how about evening that out by lowering the first constant to (e.g.) 2? -q 0 => 2+0, -q 5 => 2+1.25, -q 10 => 2+2.5.

Reply #1094 – 2008-04-24 14:22:33

Quote from: GeSomeone on 2008-04-24 14:15:25

Quote from: Nick.C on 2008-04-24 13:58:47
I am considering introducing a mechanism which would change the 3 by adding the quality-preset value divided by 4, i.e. at -q 10 subtract 5.5 rather than 3.0.
If I understand correctly, this would only matter for the really quiet parts (or tracks).
Your proposal would increase the bit rates overall (for tracks with quiet passages); how about evening that out by lowering the first constant to (e.g.) 2? -q 0 => 2+0, -q 5 => 2+1.25, -q 10 => 2+2.5.

I'm trying it, but - all of the recent ABX'ing has been done with a minimum of 3 bits kept - so I am reluctant to change the lower limit....

[edit] At -q 0: 2.0 = 306kbps; 2.5 = 310kbps; 2.75 = 313kbps; 3.0 (existing) = 318kbps.

Maybe this could be an advanced option instead, i.e. -minbits <n> would allow the user to add n bits to the minimum_bits_to_keep value at -q 10, n/2 at -q 5, 0 at -q 0, etc. [/edit]
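
A minimal Python sketch of how such a -minbits <n> option might scale, assuming the extra bits grow linearly with -q (n at -q 10, n/2 at -q 5, 0 at -q 0) on top of the existing minimum of 3. The function and parameter names are hypothetical; the option is only a proposal at this point in the thread.

```python
def min_bits_with_minbits_option(q: float, n: float, base: float = 3.0) -> float:
    """Minimum bits to keep when a hypothetical -minbits n is given.

    Linear scaling is an assumption drawn from the three points in the post:
    +n at -q 10, +n/2 at -q 5, +0 at -q 0.
    """
    if not 0.0 <= q <= 10.0:
        raise ValueError("-q runs from 0 to 10")
    return base + n * q / 10.0


if __name__ == "__main__":
    n = 2.0
    for q in (0, 5, 10):
        print(f"-q {q} -minbits {n}: keep at least {min_bits_with_minbits_option(q, n)} bits")
```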

Reply #1095 – 2008-04-24 14:43:36

Quote from: Nick.C on 2008-04-24 14:22:33

all of the recent ABX'ing has been done with a minimum of 3 bits kept - so I am reluctant to change the lower limit....

In my example only -q 0-3 would get just slightly lower minimum bits to keep; -q 4 and up would still have at least 3. It was just an idea: introduce more variability, but try not to bloat the "default" bit rate without an ABXable reason. (You can also change the constant to 2.25 or 2.5 if you think it is necessary.)
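
The claim about -q 4 can be checked with a few lines of Python, assuming the minimum bits to keep is the chosen constant plus q / 4 as discussed earlier in the thread. All names below are hypothetical.

```python
def min_bits_to_keep(q: float, base: float) -> float:
    """Minimum bits to keep for a given -q, with an adjustable first constant."""
    return base + q / 4.0


def first_q_reaching(target: float, base: float):
    """Smallest integer -q (0..10) whose minimum reaches target, or None."""
    for q in range(0, 11):
        if min_bits_to_keep(q, base) >= target:
            return q
    return None


if __name__ == "__main__":
    for base in (2.0, 2.25, 2.5):
        q = first_q_reaching(3.0, base)
        print(f"constant {base}: 3 bits are kept from -q {q} upwards")
```

With a constant of 2, integer settings -q 0 to -q 3 keep between 2.0 and 2.75 bits, and -q 4 is the first to reach the currently tested minimum of 3.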

Reply #1096 – 2008-04-24 14:46:57

Quote from: GeSomeone on 2008-04-24 14:43:36

Quote from: Nick.C on 2008-04-24 14:22:33
all of the recent ABX'ing has been done with a minimum of 3 bits kept - so I am reluctant to change the lower limit....
In my example only -q 0-3 would get just slightly lower minimum bits to keep; -q 4 and up would still have at least 3. It was just an idea: introduce more variability, but try not to bloat the "default" bit rate without an ABXable reason. (You can also change the constant to 2.25 or 2.5 if you think it is necessary.)

Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?

Reply #1097 – 2008-04-24 14:56:47

I think it does make sense to try to go with something like an existing quality scale, rather than inventing yet another one.

I was going to suggest keeping nts, but I thought that changes from the original algorithm meant that the effects of wild changes to nts were bounded by other parameters - so really, if you want a given effect, you either change all those parameters, or use a quality pre-set that does it for you. So I don't think nts has to stay in a stable release, if the quality pre-sets are well tested.

Cheers,
David.

Reply #1098 – 2008-04-24 15:07:46

Quote from: Nick.C on 2008-04-24 14:46:57

Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?

Alright with me, but that is not the same idea, i.e. it is non-scaling and adds an extra advanced setting.

Reply #1099 – 2008-04-24 17:53:31

Quote from: Nick.C on 2008-04-24 14:46:57

Or, just allow the user to select a minimum-bits-to-keep between 0 and 8(?), defaulting to 3 for no user input?

That's ok for me, too.

Now that the encoder has changed a bit I'd like to do another listening test. Because listening tests aren't much fun, I'd like to do this at a time when the encoder is not expected to change again before the final release.
