Topic: An idea of audio encode algorithm, based on maximum allowed volume of


An idea of audio encode algorithm, based on maximum allowed volume of

Reply #27
Well, since Wavpack lossy doesn't just discard data at random to achieve data reduction, I will go ahead and say it does. But that depends on your definition of what makes a psycho-acoustic model. Extrapolating my line of thought would of course also mean that LPCM 44.1kHz 16bit has an implied psycho-acoustic model dictated by human hearing (bandwidth and SNR). Maybe that is taking it a bit far, but at what point does "exploiting the limits of human hearing" become a psycho-acoustic model?
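(As a rough back-of-the-envelope illustration of those two limits - nothing more than textbook numbers, sketched in Python:)

Code:
# 44.1 kHz sampling gives a 22.05 kHz Nyquist limit, just above the ~20 kHz
# upper limit usually quoted for human hearing, and 16 bits give a
# full-scale sine SNR of roughly 6.02*n + 1.76 dB.
print(44100 / 2)            # -> 22050.0 Hz
print(6.02 * 16 + 1.76)     # -> ~98.1 dB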

Anyway, I added that line in my original post to acknowledge the fact that the OP (as far as I have understood) isn't just trying to create ~8-bit audio, but rather imposing a bound on the maximum allowed error after Wavpack's psycho-acoustic lossy treatment. So the actual error might be a lot less. I believe the OP may have been misunderstood on this account, which he could then use to disregard some of the other arguments; I did not want to present this attack vector.


An idea of audio encode algorithm, based on maximum allowed volume of

Reply #29
In that case, maybe I need to revise my definition of what constitutes a "psycho-acoustic model". Does it need to include all 5 factors listed in the HA-Wiki?

To stick with Wavpack, the manual makes no claim of Wavpack lossy using temporal masking or simultaneous masking (and no claim to the contrary as far as I can tell). But whatever Wavpack lossy is doing seems to work pretty well, so it must be doing something right with regard to exploiting the limits of our hearing. If the principles behind its processing should not be called psycho-acoustic, what then should they be called?

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #30
Can you play the correction file to a Wavpack lossy file? If it just sounds like noise then I suppose it is not using psychoacoustics. Otherwise you would hear parts of the music that were omitted because they would be masked anyway.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #31
I tried the old inversion trick on a drum & bass song and you can hear percussive elements; in my example you can clearly make out the snare hits. There seem to be no tonal elements.

On an a-capella song you hear broadband noise with a similar amplitude response as the original. "s" sounds in the original produce short bursts of high pitched noise in the difference file.
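For anyone who wants to repeat the experiment, here is a rough Python sketch of the inversion trick. The file names are placeholders, and it assumes you have already decoded the lossy file back to WAV, that the soundfile package is installed, and that the two files are sample-aligned:

Code:
import numpy as np
import soundfile as sf

# decode the lossy file to WAV first (e.g. with wvunpack for WavPack)
orig, fs = sf.read("original.wav", dtype="float64")
lossy, fs2 = sf.read("lossy_decoded.wav", dtype="float64")
assert fs == fs2 and orig.shape == lossy.shape

diff = orig - lossy                    # original + inverted lossy copy
peak = np.max(np.abs(diff))
rms = np.sqrt(np.mean(diff ** 2))
print(f"difference peak: {20 * np.log10(peak + 1e-12):.1f} dBFS")
print(f"difference RMS:  {20 * np.log10(rms + 1e-12):.1f} dBFS")

sf.write("difference.wav", diff, fs)   # listen to what the encoder discarded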

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #32
Premises:
(1) If a residual signal created by mixing the uncompressed signal and the complement of its compressed counterpart (= difference signal) has elements that are audibly correlated to the musically identifiable content from the former, this indicates that psychoacoustics were involved in the process of compression.
(2) If such a signal does not have elements that are audibly correlated to the uncompressed signal, this indicates the absence of psychoacoustics.
Neither is a valid rule.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #33
For the record, I'm not in any position to define what constitutes a "proper" psychoacoustic model.  I've seen WavPack Lossy being classified as not having a psychoacoustic model and as having a weak psychoacoustic model.  I'm pretty sure I've seen the same done for LossyWAV.  It seems easy enough for someone like myself to simply hedge my bets and say it has a weak psychoacoustic model.

At any rate, I hope we can agree that what was being offered by the OP can't even be classified as having a weak psychoacoustic model.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #34
At any rate, I hope we can agree that what was being offered by the OP can't even be classified as having a weak psychoacoustic model.

Actually this seems to be his aim in proposing a somewhat "pure mathematical" method: precisely to get rid of psychoacoustic models altogether because, in his own words...
Quote
... psychoacoustic models do not guarantee you anything. They give you only approximate results and sometimes fail.
... I live by long distance.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #35
The point is in reducing file size without any audible loss of quality on all inputs possible with 100% guarantee.
I think that's called FLAC.

Cheers,
David.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #36
Lossless is the only true guarantee.

LossyWAV's original approach was to use pure mathematics to measure the noise floor present in the actual music and allow a little less noise than its minimum value with no other special tricks.

The OP seems to want some amount of reduced accuracy calculated based on the signal strength to ensure it's always safe, and may or may not have realised that this is mathematically the same as saying noise is added.

I think lossyWAV --maxclips 0 (i.e. lossyWAV standard) hasn't even shown non-transparency on extreme full-scale test signals, so lossyFLAC, lossyWV, lossyTAK etc are all viable.

In real music, the --maxclips 0 can be omitted with no reported problem samples. (The artificial test sample that generated the clipping problem was about 10 dB louder to the ear than today's maximally loud albums - or --maxclips 0 can be included at the expense of a minor bitrate increase).

The first versions (or, if you turn off all noise shaping in later versions using --shaping 0) make the bare minimum psychoacoustic assumption that one kind of white noise is indistinguishable from another, and keep the added noise below the minimum noise measured in the signal's audible spectrum without applying any filtering or frequency shaping to that noise.  (This is the flavour of lossyWAV support included in CUETools and CUERipper, and I've had no problems using it as a source for transcoding into conventional lossy)
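To make that idea concrete, here's a toy sketch in Python of the principle only - this is emphatically not lossyWAV's actual analysis (lossyWAV uses several FFT lengths, spreading and other safeguards), and the block length, margin and noise formula here are my own arbitrary choices:

Code:
import numpy as np

FS = 44100      # sample rate, Hz
BLOCK = 512     # analysis block length, samples (arbitrary for this sketch)

def bits_to_remove(block, margin_db=6.0):
    """Estimate how many LSBs of a 16-bit block can be rounded away while
    keeping the added quantisation noise below the quietest level already
    present in the audible part of the block's spectrum."""
    win = np.hanning(len(block))
    spec = np.abs(np.fft.rfft(block * win)) / (win.sum() / 2)  # full scale ~ 1.0
    freqs = np.fft.rfftfreq(len(block), 1.0 / FS)
    band = spec[(freqs >= 20) & (freqs <= 16000)]
    floor_db = 20 * np.log10(band.min() + 1e-12)               # quietest audible bin
    budget_db = floor_db - margin_db
    # keeping b = 16 - k bits leaves quantisation noise at roughly
    # (-6.02*b - 4.77) dBFS; pick the largest k that stays below the budget
    k = int(np.floor(16 + (budget_db + 4.77) / 6.02))
    return max(0, min(15, k))

def round_off_bits(x_int16, k):
    """Round samples so their bottom k bits are zero."""
    step = 1 << k
    y = (x_int16.astype(np.int32) + step // 2) // step * step
    return np.clip(y, -32768, 32767).astype(np.int16)

# toy usage on a block of synthetic low-level noise
rng = np.random.default_rng(0)
x = (rng.normal(0, 0.02, BLOCK) * 32767).astype(np.int16)
k = bits_to_remove(x / 32768.0)
y = round_off_bits(x, k)
print(k)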

Really, I think this amounts to what the OP asked about, and is slightly better: it amounts to effectively remastering at the minimum required bit-depth, dynamically changing it throughout the music. Lossless encoders like FLAC, TAK, WavPack and WMA Lossless can take advantage of the reduced bit-depth to save bitrate (but ALAC, most notably, cannot).
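As a sketch of how that saving actually happens (this is the general "wasted bits" idea, not any particular codec's source code): the encoder checks how many low bits are zero in every sample of a block, shifts them out, codes the narrower residual, and records the shift in the block header.

Code:
import numpy as np

def wasted_bits(block_int):
    """Number of least-significant bits that are zero in every sample of the
    block; an encoder can right-shift them out and code at a reduced width."""
    acc = 0
    for s in block_int:
        acc |= int(s)
    if acc == 0:
        return 0          # all-silent block: nothing meaningful to report
    k = 0
    while acc & 1 == 0:
        acc >>= 1
        k += 1
    return k

# e.g. a block whose samples were all rounded to multiples of 16 (4 LSBs zeroed)
block = np.array([-32768, -1040, 48, 1616, 32752], dtype=np.int32)
print(wasted_bits(block))  # -> 4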

Thanks again to 2Bdecided for the idea of lossyFLAC and NickC for implementing lossyWAV from it (and all the others who helped).

The improvements to lossyWAV up to v1.3 (adaptive shaping of the added noise to match the signal spectrum) seem to have been very safe and conservative, and seem to have actually hidden the noise better with more margin of safety, but that does amount to extending the bare minimum psychoacoustic model to something slightly more advanced (still safe, sound assumptions).
Dynamic – the artist formerly known as DickD

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #37
And it makes perfect sense: if you didn't consider the input level, a quiet signal would sound worse after your coding than a loud but otherwise identical signal.

Quote
First, if I understand you correctly: turn the volume control to maximum and you will hear the noise... but nobody listens to music at such a volume.

You don't have to turn up the volume control. Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.
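A quick toy illustration of why a fixed error ceiling isn't enough (crude rounding, no dither, so the exact figures differ a little from the ~45 dB above - the point is only that the error stays put while the music fades):

Code:
import numpy as np

fs = 44100
t = np.arange(fs) / fs

def error_db(signal):
    """Requantise a float signal in [-1, 1) to 8 bits/sample (no dither) and
    report the error level relative to full scale and to the signal itself."""
    q = np.round(signal * 127) / 127
    err_rms = np.sqrt(np.mean((signal - q) ** 2))
    sig_rms = np.sqrt(np.mean(signal ** 2))
    return 20 * np.log10(err_rms), 20 * np.log10(err_rms / sig_rms)

loud = 0.99 * np.sin(2 * np.pi * 440 * t)    # near full-scale tone
quiet = loud * 10 ** (-40 / 20)              # the same tone during a fade-out

for name, x in (("loud", loud), ("quiet", quiet)):
    e_fs, e_rel = error_db(x)
    print(f"{name}: error ~{e_fs:.0f} dBFS, i.e. {-e_rel:.0f} dB below the signal")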

Quote
Also, I've made a test: I encoded one sample into WavPack at 192 kbps (the lowest possible), and the track peak of the difference file was 0.077026. Then I decreased the volume by 40 dB, encoded again at 192 kbps, and you think the track peak of the difference file was about the same 0.077026? No, it was 0.000854. Encoders know about such tricks, so we are safe here.

Yes, we are safe, because - as I wrote - encoders work with something similar to an instantaneous (segmental) SNR! This is supported by what Gecko found: he "... tried the old inversion trick on a drum & bass song and you can hear percussive elements". (Edit: Since that's the case I'd join greynol and consider this a weak psychoacoustic model).
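(Converting the two quoted peaks to dB makes the point obvious - simple arithmetic, using only the numbers quoted above:)

Code:
import math

for peak in (0.077026, 0.000854):
    print(f"{peak} -> {20 * math.log10(peak):.1f} dBFS")
# -> about -22.3 dBFS and -61.4 dBFS: the error dropped by ~39 dB when the
#    input was attenuated by 40 dB, i.e. the encoder scales its noise with
#    the short-term signal level rather than with full scale.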

Quote from: 2Bdecided
Quote
The point is in reducing file size without any audible loss of quality on all inputs possible with 100% guarantee.

I think that's called FLAC.

Or 512-kbps AAC or Opus. I cannot think of any signal which would not be coded transparently at that bitrate. Because I don't know of any signal which is not transparent at e.g. Winamp AAC VBR 6 at half that bitrate on average. (Edit: I'm talking about stereo here of course).

Chris
If I don't reply to your reply, it means I agree with you.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #38
The point is in reducing file size without any audible loss of quality on all inputs possible with 100% guarantee.
I think that's called FLAC.

Lossless is the only true guarantee.

The popularity of lossless is based mostly on the placebo effect. The file size of lossless does not match the real quality it has.
Quote
I think lossyWAV --maxclips 0 (i.e. lossyWAV standard) hasn't even shown non-transparency on extreme full-scale test signals, so lossyFLAC, lossyWV, lossyTAK etc are all viable.
In real music, the --maxclips 0 can be omitted with no reported problem samples. (The artificial test sample that generated the clipping problem was about 10 dB louder to the ear than today's maximally loud albums - or --maxclips 0 can be included at the expense of a minor bitrate increase).

The difference I hear in the samples I've posted is not about clipping. It is simply noise added by lossyWAV, which is sometimes too loud. Also, I do not see how --maxclips 0 changes the situation. The weakest point of lossyWAV is when there is quiet noise plus some simple single-tone signal, which can be a single note of some musical instrument.
Quote
The first versions (or, if you turn off all noise shaping in later versions using --shaping 0) make the bare minimum psychoacoustic assumption that one kind of white noise is indistinguishable from another, and keep the added noise below the minimum noise measured in the signal's audible spectrum without applying any filtering or frequency shaping to that noise.  (This is the flavour of lossyWAV support included in CUETools and CUERipper, and I've had no problems using it as a source for transcoding into conventional lossy)

Yes, I checked --shaping 0 and --shaping 1 also, and it seems they are not audible on the "standard" preset (they are audible on "economic"). So it seems that the adaptive noise shaping used in lossyWAV by default is not the best choice for the quality presets "standard" and higher. Though with ANS, some samples show no audible difference even on "extraportable", where --shaping 0 and --shaping 1 are clearly audible.
Quote
The improvements to lossyWAV up to v1.3 (adaptive shaping of the added noise to match the signal spectrum) seem to have been very safe and conservative and seem to have actually hidden the noise better with more margin of safety.

I also think so, but sometimes ANS puts too much noise where there is not enough room for it and it becomes clearly audible, so I think that is a possible direction for improving lossyWAV.
Quote
Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.

If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible. What I'm talking about is not just a simple technique like converting to 8 bits. As Gecko wrote:
Quote
Anyway, I added that line in my original post to acknowledge the fact that the OP (as far as I have understood) isn't just trying to create ~8-bit audio, but rather imposing a bound on the maximum allowed error after Wavpack's psycho-acoustic lossy treatment.

But I have found that -45 dB is audible for WavPack, so a simple restriction on the maximum volume of the error signal will not be efficient enough. Anyway, what I want is some VBR, quality-oriented mode for WavPack, and then it will be clearer how good it is.
Quote
Or 512-kbps AAC or Opus. I cannot think of any signal which would not be coded transparently at that bitrate. Because I don't know of any signal which is not transparent at e.g. Winamp AAC VBR 6 at half that bitrate on average. (Edit: I'm talking about stereo here of course).

It depends on what to call "transparent". Vorbis is audible at 619 kbps (check the "FighterBeatLoop" sample in the Uploads section), but it sounds good at much lower bitrates.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #39
Quote
Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.

Quote
If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible.


The audio quality of this approach will already be poor.  If you're also going to use EQ, you should absolutely apply it BEFORE you encode.  Otherwise you will need to tolerate much lower quality or else even higher bitrates. 


An idea of audio encode algorithm, based on maximum allowed volume of

Reply #40
Hey everyone, I just had this great idea that should revolutionize rockets and space travel. My idea is to have a rocket with two big tanks: one filled with coal and the other one filled with compressed air. Then you just mix the two and set it on fire. This should be much more effective than existing rockets!

More seriously softrunner, you have to realize that what you're proposing is just as outrageous as my statement above. In both cases, it's a bad approximation of fundamentals that have been known for decades (at least since the 60s in the case of audio). I would strongly suggest you learn more about fundamentals of audio and perception rather than argue with people who are trying to point out all the flaws in your reasoning.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #41
But I have found that -45 dB is audible for WavPack, so a simple restriction on the maximum volume of the error signal will not be efficient enough.

That's what we have been trying to tell you...

You seem to be looking for some form of holy grail of lossy audio encoding: great compression, zero artifacts, super simple algorithm. Many smart people have spent a lot of time and effort to give us good compression and few artifacts. But the algorithms involved usually aren't very simple.

If you can do better, fantastic! But so far, your approach hasn't been very convincing.

Maybe you really should be looking into lossyWAV. I believe the original Matlab code is still around here somewhere... Ah, yes: http://www.hydrogenaudio.org/forums/index....showtopic=55522

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #42
Quote
Or 512-kbps AAC or Opus. I cannot think of any signal which would not be coded transparently at that bitrate. Because I don't know of any signal which is not transparent at e.g. Winamp AAC VBR 6 at half that bitrate on average. (Edit: I'm talking about stereo here of course).
It depends on what to call "transparent". Vorbis is audible at 619 kbps (check the "FighterBeatLoop" sample in the Uploads section), but it sounds good at much lower bitrates.
Zoom:
Quote
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that. At least develop a consistent narrative before you try to make everyone implement it at your behest.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #43
I checked --shaping 0 and --shaping 1 also, and it seems they are not audible on "standard" preset
Are you saying that lossyWAV standard without noise shaping is transparent?

I hope so. That is supposed to be the point, though I wouldn't stake my reputation (never mind my life) on it.

If I've understood you correctly, I think it's the closest thing you're going to get to your goal.

Cheers,
David.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #44
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is...
Why is it absurd? The HA mantra is that CD quality audio is transparent WRT a stereo source. There are caveats (most importantly, gain riding may break it; lousy implementations may break it), but here at least, it's not a controversial statement.

If altering a single bit in a CD quality audio signal renders it non-transparent (i.e. that single bit change is audible) to some person under some reasonable listening circumstances, then miraculously CD quality really does define transparency, and nothing less counts. However, I suspect that's not the case. With a bound set of use cases, you can probably create something that's more efficient than lossless coding of CD quality audio while remaining transparent to the stereo source. If you feed lossyWAV with 24 bits, for example, it'll cope with the gain riding that CD quality audio will not, while (I expect!) remaining transparent, and at a lower bitrate.

Quote
since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that.
It seems to me that the more "clever" you try to be in designing a codec, the lower the bitrate you can achieve for transparency "most" of the time and the fewer problem samples you will have, but the greater the percentage bitrate increase required to deal with those few problem samples you have left. I am talking anecdotally - I have no evidence or justification for this.



However, you are absolutely correct to call out softrunner on their use of "transparent", because until you define what you mean, the rest of the discussion is pointless. e.g. is it transparent even if you post-process with...
1) EQ. If so, how much?
2) DRC. If so, how much?
3) stereo processing. If so, what? Logic7? Vocal Cut? etc
4) phasing and flanging? Other DSP effects and production techniques?
5) adding an inverted copy of the original signal and amplifying the result by 100dB?

Only lossless is transparent with post-process number 5. I suspect 4 can be almost as tricky, and 3 is quite tricky. 1 and 2 can be accounted for and bounded.
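(To put a number on post-process 5 - just arithmetic, assuming a residual sitting right at the 16-bit noise floor:)

Code:
# a residual at roughly the 16-bit floor, boosted by 100 dB of gain
residual_dbfs = -96.3
print(residual_dbfs + 100)   # -> +3.7 dB re full scale: trivially audible
# only a bit-exact (lossless) copy has a residual of exactly zero,
# which stays silent no matter how much gain is applied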

Good luck softrunner.

Cheers,
David.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #45
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is...
Why is it absurd? The HA mantra is that CD quality audio is transparent WRT a stereo source. There are caveats (most importantly, gain riding may break it; lousy implementations may break it), but here at least, it's not a controversial statement.
I wasn’t talking about CDDA: I was talking about lossy encoders. I thought that would be obvious as the latter is the subject of the thread, but I suppose I appreciate your frequent efforts to remove all possible ambiguity from my posts.

My conclusion is that a lossy codec/setting that will always be transparent is unlikely enough even when said transparency need only apply to one person, never mind entire swathes of the population. I also have to side with all the other developers, who would probably have implemented the various things that are being suggested if they were worthwhile or functioned the way that softrunner imagines.

At the end of the day, if one is going to worry so much about transparency, especially when the lack thereof is probably going to be highly rare with existing codecs at modestly high settings, the answer is to use a lossless codec. I just don’t see the point in supposing about a guaranteed perfect lossy codec when no such thing can exist by definition as something created by imperfect organisms. And if I carry on typing, I’m going to get way OT with idle armchair philosophy, so I’d better stop!

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #46
Sorry db1989, I'm not trying to personally attack you! I was even a little worried it would look like that when I replied to yet another of your posts. It's just coincidence that several of your recent posts have made me want to jump in and explore that particular topic in more depth. Case in point.

Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.

(* = as envisaged; CD can do better in practice these days with good oversampling and good noise shaping.)

In terms of preserving, say, an analogue tape or vinyl record, we say that CD really is transparent. The frequency limit is beyond all but the best hearing, and the noise floor is below that of the source. Note that this is not the same as saying it preserves even the noise level perfectly, because adding a lower level noise (-90dB) will raise the noise floor (typ. -60dB) a little - it's just that the change is inaudible (a fraction of a dB). That's psychoacoustics again.
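(The arithmetic behind "a fraction of a dB", adding the two uncorrelated noise powers:)

Code:
import math

# a -90 dB noise added to a source noise floor at -60 dB
combined = 10 * math.log10(10 ** (-60 / 10) + 10 ** (-90 / 10))
print(f"combined floor: {combined:.4f} dB")       # -> about -59.9957 dB
print(f"rise:           {combined + 60:.4f} dB")  # -> about 0.004 dB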


There are things you can do to a CD quality signal that make barely any more psychoacoustic assumptions than any of the above. They are lossy (referenced to the CD quality signal), but so "safe" that some people are happy saying they are transparent. I call them near-lossless, but there might be a better name.

The point I think I'm trying to make is that, apart from mathematically perfect coding of a set of numbers (which is lossless),  it's all shades of grey (though hopefully not 50 of them), rather than black and white. Is it transparent? Is it lossless? Does it use a psychoacoustic model? I am usually perfectly happy with the general definitions we use of these words. However, when you get very very picky, or start to talk about absolute transparency of lossy codecs, or start to talk about the "psychoacoustic" model of near-lossless codecs, I think you have to be really careful.


I agree that anyone who is worried about these things should use a lossless format for the audio ripped from their CDs.

Cheers,
David.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #47
Sorry db1989, I'm not trying to personally attack you! I was even a little worried it would look like that when I replied to yet another of your posts. It's just coincidence that several of your recent posts have made me want to jump in and explore that particular topic in more depth. Case in point.
Oh no, not at all! I’ve never been offended, and I do appreciate clarifications and elaborations from someone who obviously has more experience in the relevant fields as well as a more creative outlook. Mentioning the fact that it’s happened more than once was just a little joke, somewhat poking fun at myself, and certainly not alleging anything against you.

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #48

An idea of audio encode algorithm, based on maximum allowed volume of

Reply #49
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.
Kind of reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event, and that the vibration of air molecules is a sampled and quantized lossy reproduction of the actual instrument's vibrations...
But, if a tree falls in the forest when nobody's there, does it make any noise?

As I see it, sound - or better, music - production is all about psychoacoustics, and the reference model is always our hearing system, so arguing about the bandwidth and SNR limits of the CD format (*) means wanting to (re)produce something that not only nobody could realistically hear, but that wasn't even in the composer's or player's or instrument builder's mind in the first place!

(*) as an end user's format at least, just to take into account the meaningful reasons in favour of 24/96 - the highest technically feasible format at the moment - for the recording, mixing, mastering stages etc...
... I live by long distance.