HydrogenAudio

Lossy Audio Compression => Other Lossy Codecs => Topic started by: softrunner on 2013-03-05 23:11:18

Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: softrunner on 2013-03-05 23:11:18
Full topic title: "An idea of audio encode algorithm, based on maximum allowed volume of signals difference"

Recently I discovered that the difference between the source audio and the encoded audio can easily be obtained by inverting the source and mixing it with the encoded version. That gave me an idea for an encoding algorithm: simply keep the difference signal at or below a level defined by the user. Audio quality is then measured by the volume of the difference between the two signals, and this difference is nothing but the distortion produced by the encoder.
The whole algorithm looks like this (a rough code sketch follows the list):
1. Take the maximum allowed volume of the difference signal from the user.
2. Make a copy of the source audio and invert it.
3. Split both the source and the inverted audio into frames of the same size.
4. Encode the first frame of the source audio, mix the result with the first frame of the inverted audio, and measure the volume of the resulting difference.
5. If the volume of the difference is higher than the user allows, add some bitrate and repeat from step 4.
6. If the volume of the difference is within the allowed level, add the encoded frame to the final output.
7. Repeat steps 4-6 for the second, third, and following frames until the end of the source file.
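
In rough Python, the loop might look like the sketch below. encode_frame()/decode_frame() here are only tiny stand-ins for a real lossy codec (they just quantize more coarsely at lower "bitrates"), so this is an illustration of the idea, not a real API:

Code:
import numpy as np

def encode_frame(frame, bitrate):
    # stand-in for a real lossy encoder: coarser quantization at lower bitrates
    step = 2.0 ** -(bitrate / 64.0)
    return np.round(frame / step) * step

def decode_frame(packet):
    # the stand-in "packet" already holds decoded samples
    return packet

def difference_level_db(original, decoded):
    # invert the original, mix it with the decoded frame, measure the peak of the result
    diff = decoded - original
    return 20.0 * np.log10(np.max(np.abs(diff)) + 1e-12)

def constrained_encode(frames, max_diff_db, start_bitrate=128, step=32, max_bitrate=768):
    output = []
    for frame in frames:
        bitrate = start_bitrate
        while True:
            packet = encode_frame(frame, bitrate)
            if difference_level_db(frame, decode_frame(packet)) <= max_diff_db:
                output.append(packet)      # difference is quiet enough, keep this frame
                break
            if bitrate >= max_bitrate:
                output.append(packet)      # safety stop at the bitrate ceiling
                break
            bitrate += step                # add some bitrate and try again
    return output

# example: ten 1-second frames of a test tone, difference kept at or below -45 dBFS
fs = 44100
frames = np.array_split(0.5 * np.sin(2 * np.pi * 440 * np.arange(10 * fs) / fs), 10)
encoded = constrained_encode(frames, max_diff_db=-45.0)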

Of course, this algorithm is much slower than a direct encode, but it certainly should not be slower than video encoding (and people are willing to wait many hours while their videos are encoded).

I tried to reproduce this algorithm manually in a test using WavPack hybrid mode as the encoder (the source audio sample was split into 11 parts of 1 second each), and it showed that 23.4 % of the space/bitrate could be saved. Another important point is that the user is guaranteed never to get distortion louder than he expects, so he can safely encode many files at once without inspecting the content. The user is freed both from unnecessary waste of bitrate and from uncontrolled distortion.

The only thing needed now is for some audio developers to take an interest in this idea and implement it as a program.

The whole set of files of the WavPack test I've made is here (http://www.mediafire.com/?40928vrkx6wmsbz).
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: saratoga on 2013-03-05 23:21:34
Quote
That gave me an idea for an encoding algorithm: simply keep the difference signal at or below a level defined by the user. Audio quality is then measured by the volume of the difference between the two signals, and this difference is nothing but the distortion produced by the encoder.


The problem with this approach is that audibility has very little to do with the absolute volume, due to masking.  So an error at -20 dBFS might be inaudible if it's masked, and very audible at -40 dBFS if it's not masked.  So what codecs usually do when they get to your step 5 is compute masking thresholds and adjust the error based on how audible it will be.  In this case you will likely find that huge error signals are often highly tolerable, while small error signals often are not.
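
One rough way to write down what that psychoacoustic step does (this is only the general idea, not how any particular codec implements it) is a per-band noise-to-mask ratio instead of one overall error level:

$$\mathrm{NMR}_b = 10 \log_{10} \frac{E_b^{\mathrm{noise}}}{M_b}$$

where $E_b^{\mathrm{noise}}$ is the error energy in frequency band $b$ and $M_b$ is the masking threshold the model derives from the source signal in that band; the encoder then tries to keep $\mathrm{NMR}_b \le 0$ dB in every band rather than bounding a single wideband difference level.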

Quote
Of course, this algorithm is much slower than a direct encode


It doesn't have to be.  You can make this extremely fast by changing the quantization decisions made in your encoder to reflect the absolute error directly, thus no iterative encoding will be needed.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: softrunner on 2013-03-05 23:36:34
The problem with this approach is that audibility has very little to do with the absolute volume, due to masking.  So an error at -20 dBFS might be inaudible if it's masked, and very audible at -40 dBFS if it's not masked.  So what codecs usually do when they get to your step 5 is compute masking thresholds and adjust the error based on how audible it will be.  In this case you will likely find that huge error signals are often highly tolerable, while small error signals often are not.

That is a matter of further improvement, and it already amounts to implementing some psychoacoustic model. I am more interested in the idea as I described it, as a substitute for lossless, which in my opinion is excessive. I have made many tests of lossy encoders and realized that they cannot satisfy me in this role. WavPack hybrid is much better, but it is not flexible, and you never know which distortions you will get in the output.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: saratoga on 2013-03-05 23:58:25
I have made many tests of lossy encoders and realized that they cannot satisfy me in this role. WavPack hybrid is much better, but it is not flexible, and you never know which distortions you will get in the output.


What I meant is that looking at the absolute difference really doesn't tell you anything useful, so while you know what "distortion" is present (or rather the error signal power), you don't really have any idea what it actually sounds like or how good the quality is.  The reason codecs do things differently is that the simple approach you're thinking of doesn't actually work.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-06 01:22:14
None of the lossy codecs commonly discussed on this forum will benefit from this type of analysis.  As has been discussed many times over, it is completely wrong-headed and useless.

However (and IIRC), WavPack Lossy does not use a psychoacoustic model, so this might loosely apply.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: DVDdoug on 2013-03-06 20:01:20
softrunner,

If you want to demonstrate to yourself how "the sound of the difference" (subtraction) is NOT the same as the "difference in the sound", here are a couple of simple experiments -

Delay one sound by a few milliseconds.    This will make no difference in the sound, but when you subtract you will get a huge difference file that's about as loud as either original file with a weird-sounding comb filter effect.

Invert the copy and subtract.  Again, there is no difference in sound between the two files.  But of course, when you subtract a negative it's the same as adding a positive, and you will get a difference file that's twice as loud as either original (and probably clipped).
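
A quick numpy sketch of both experiments, with a synthetic tone standing in for music (the exact numbers will differ for real recordings, but the effect is the same):

Code:
import numpy as np

fs = 44100
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t)  # stand-in for "music"

def peak_db(v):
    return 20.0 * np.log10(np.max(np.abs(v)) + 1e-12)

print("peak of the original:                   %.1f dBFS" % peak_db(x))

# experiment 1: delay a copy by 3 ms, then subtract -> loud, comb-filtered difference
d = int(0.003 * fs)
delayed = np.concatenate([np.zeros(d), x[:-d]])
print("peak of (original - 3 ms delayed copy): %.1f dBFS" % peak_db(x - delayed))

# experiment 2: invert the copy, then subtract -> difference is twice the original
print("peak of (original - inverted copy):     %.1f dBFS" % peak_db(x - (-x)))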
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: C.R.Helmrich on 2013-03-06 20:21:09
simply keep the difference signal at or below a level defined by the user. Audio quality is then measured by the volume of the difference between the two signals, and this difference is nothing but the distortion produced by the encoder.

The only thing needed now is for some audio developers to take an interest in this idea and implement it as a program.

If I understand you correctly, audio developers implemented this idea four decades ago. The simplest case: take some high-word-length audio (e.g. a CD rip) and convert it to e.g. 8-bit PCM. Your difference signal will always be at the same level, depending on the target word length. Slightly more elaborate cases: A-Law (http://en.wikipedia.org/wiki/A-law_algorithm) or µ-Law (http://en.wikipedia.org/wiki/%CE%9C-law_algorithm). There your maximum allowed volume of the difference signal is also known.
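
For reference, the µ-law companding curve mentioned above is (with µ = 255 in the usual 8-bit telephony case):

$$F(x) = \operatorname{sgn}(x)\,\frac{\ln\left(1 + \mu\,|x|\right)}{\ln\left(1 + \mu\right)}, \qquad -1 \le x \le 1$$

Because the curve is roughly logarithmic, the quantization error scales with the signal level, so quiet and loud passages end up with a roughly constant signal-to-error ratio.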

Chris
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: softrunner on 2013-03-07 15:59:12
The problem with this approach is that audibility has very little to do with the absolute volume due to masking.

But who can guarantee that this masking will work and that the difference will not be audible on any input signal? The whole idea is not about audibility; it is about using the minimum bitrate for maximum mathematical closeness of the output audio to the input audio, just pure calculation, which seems to be the only guarantee here.

However (and IIRC), WavPack Lossy does not use a psychoacoustic model, so this might loosely apply.

At least, we should give it a try...

Delay one sound by a few milliseconds.    This will make no difference in the sound, but when you subtract you will get a huge difference file

If an encoder shifts the audio along the timeline, then the idea in this topic simply is not applicable to it.

If I understand you correctly, audio developers implemented this idea four decades ago.

Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-07 16:14:51
Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.
That's because (as others have said!) to guarantee this you do not need anything clever at all. You just need to reduce the bitdepth of the audio signal by an amount equivalent to the difference (=noise) you're willing to accept. You'll get 6dB more noise per extra bit dropped. Lower bitdepth = lower bitrate when losslessly encoded. So, use any audio editor that allows you to change the bitdepth, then use almost any lossless codec on the result = job done.
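
A rough numpy illustration of the 6dB-per-bit rule, using TPDF dither and a test tone in place of music (real editors and codecs handle the details differently):

Code:
import numpy as np

def requantize(x, bits):
    # reduce a float signal in [-1, 1] to `bits` bits, with TPDF dither
    step = 2.0 ** (1 - bits)
    dither = (np.random.uniform(-0.5, 0.5, x.shape) +
              np.random.uniform(-0.5, 0.5, x.shape)) * step
    return np.clip(np.round((x + dither) / step) * step, -1.0, 1.0)

fs = 44100
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # 1-second test tone

for bits in (16, 12, 8):
    err = requantize(x, bits) - x
    rms_db = 20.0 * np.log10(np.sqrt(np.mean(err ** 2)))
    print("%2d bits -> difference (noise) level: %6.1f dBFS" % (bits, rms_db))
# expect roughly 6 dB more noise for every bit dropped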

For a smarter way of doing it, take a look at lossyWAV. I think you can bound how many bits it removes.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-07 17:05:39
LossyWAV is commonly discussed here and I lamented not including it shortly after posting.

But who can guarantee that this masking will work and that the difference will not be audible on any input signal? The whole idea is not about audibility; it is about using the minimum bitrate for maximum mathematical closeness of the output audio to the input audio, just pure calculation, which seems to be the only guarantee here.

Who can guarantee that this "maximum mathematical closeness" will work, especially when it makes no attempt to consider how the human auditory system functions?  Also, please don't insult our intelligence by suggesting that we must try all possible input signals before rejecting the assertion that this idea will do better than already established practice built upon well established knowledge when you have not even offered any evidence supporting your concept.

If this isn't about audibility then I completely fail to see the point.  Audio quality is one of the primary determinants in gauging the performance of a lossy encoder.  Other worthwhile determinants will focus on performance/ease of coding/decoding related issues.  Perhaps someone can make a case as to why "maximum mathematical closeness" affects either of these groups or if it may fall into a new and equally important group.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-07 17:51:14
Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.
Spoiler: If it isn’t used, there’s probably a reason that it isn’t used.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: ExUser on 2013-03-07 19:20:29
Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.


There are a lot of people hating on this concept. We do, actually, see this used in practice, it's just simpler than you'd think. You can change the maximum allowed error level of audio simply by altering the number of bits allocated per sample in an uncompressed context. With an appropriate codec, you can use fractional numbers of bits-per-sample. Then you can compress it down losslessly for a further reduction in file size.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: saratoga on 2013-03-07 19:25:49
LossyWAV is commonly discussed here and I lamented not including it shortly after posting.


I thought about posting that, but his test samples posted above are 160-256 kbps, so I think he's interested in highly compressed audio, whereas lossyWAV is going to be about 2x that bitrate for good results.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-07 19:29:25
@Canar:

Please show me a lossy algorithm with no psychoacoustic model that beats one with a psychoacoustic model where the metric is how low you can go in average bitrate and achieve transparency or near transparency for non-contrived test samples.

As I see it, the issue at hand is that the level of a difference signal is being ranked over audibility.

EDIT: Saratoga beat me to demonstrating that bitrate is an important factor.  To add, WavPack Lossy isn't exactly regarded as being competitive in the sub-256 kbit range either.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: ExUser on 2013-03-07 19:32:49
Please show me a lossy algorithm with no psychoacoustic model that beats one with a psychoacoustic model where the metric is how low you can go in average bitrate and achieve transparency or near transparency for non-contrived test samples.
Oh, I agree that it'll never be competitive. I'm just saying that if you look at it the right way, we kind of already have such a thing.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Nessuno on 2013-03-07 19:54:10
But who can guarantee that this masking will work and that the difference will not be audible on any input signal? The whole idea is not about audibility; it is about using the minimum bitrate for maximum mathematical closeness of the output audio to the input audio, just pure calculation, which seems to be the only guarantee here.

Your theory is mathematically wrong: the human auditory system is a nonlinear system, so the subtraction of two input signals (lossless - computed error) doesn't work the way you expect.
We use psychoacoustic models precisely because we don't have an exact mathematical description of the auditory system; otherwise lossy compression would be deterministic and "just pure calculation" (well, more or less... anyway, still more complex than sums and subtractions).
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: C.R.Helmrich on 2013-03-07 21:51:11
Indeed. Softrunner, if you want mathematical closeness (whatever that means), you should at least consider the following clarification of your objective: try to keep the difference signal at the same level (or less) relative to the instantaneous level of the input signal. In other words, you should try to keep the instantaneous signal-to-noise ratio (SNR) at the same level (or higher).
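
Written as a formula, that objective is roughly the segmental SNR over frames $k = 1 \dots K$:

$$\mathrm{SNR}_{\mathrm{seg}} = \frac{1}{K} \sum_{k=1}^{K} 10 \log_{10} \frac{\sum_{n} x_k[n]^2}{\sum_{n} \left(x_k[n] - \hat{x}_k[n]\right)^2}$$

where $x_k$ is the $k$-th frame of the input and $\hat{x}_k$ the corresponding decoded frame; your constraint would then be a minimum value for each frame's term, rather than one overall difference level.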

That's already quite close to what modern audio encoders do, by the way. And it makes perfect sense: if you didn't consider the input level, a quiet signal would sound worse after coding than a loud but otherwise identical signal.

Chris
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: softrunner on 2013-03-09 02:09:46
You just need to reduce the bitdepth of the audio signal by an amount equivalent to the difference (=noise) you're willing to accept. You'll get 6dB more noise per extra bit dropped. Lower bitdepth = lower bitrate when losslessly encoded. So, use any audio editor that allows you to change the bitdepth, then use almost any lossless codec on the result = job done.

And I have to do all this manually for every lossless file I have? Actually, that's not what I'm after. And I think this will not be efficient enough.
Quote
For a smarter way of doing it, take a look at lossyWAV.

I know about lossyWAV. First, it does not accept a maximum allowed volume of the error signal as an input parameter; the volume of the distortion depends on the material being processed. Also, check this (http://www.hydrogenaudio.org/forums/index.php?showtopic=99755) sample. On it lossyWAV produces distortion that is audible even at the "extreme" preset (235 kbps; in FLAC this sample uses 301 kbps), so we have to admit that its bit-reduction and masking techniques do not work properly on all kinds of audio material. And on this sample WavPack gives a perfect result even at 96 kbps; its error signal is extremely quiet. So the only thing needed is to guide WavPack, to teach it where to use more bitrate and where to reduce it.

Who can guarantee that this "maximum mathematical closeness" will work

Definitely it will work. If you do not allow distortion above a certain volume, it will never appear there. Very simple logic, which simply works. And I do not claim that this is the final destination. I accept that it is possible to let the encoder be more aggressive in certain circumstances, but that is a question of separate research for each encoder. First, the very simple approach should be implemented.
Quote
Also, please don't insult our intelligence by suggesting that we must try all possible input signals before rejecting the assertion that this idea will do better than already established practice built upon well established knowledge when you have not even offered any evidence supporting your concept.

I do not know exactly how well it will work, but I want to try it, because established practice does not work well enough. All we do is play blindly with bitrates, believing that we get some quality, and when we find one more killer sample, we realize that it was just a belief.
Quote
If this isn't about audibility then I completely fail to see the point.

The point is to reduce file size without any audible loss of quality on all possible inputs, with a 100% guarantee. That means there would be no more killer samples at all. Every user would use his own level of allowed distortion, depending on the sensitivity of his ears, and he would know exactly what he gets.

You can change the maximum allowed error level of audio simply by altering the number of bits allocated per sample in an uncompressed context. With an appropriate codec, you can use fractional numbers of bits-per-sample. Then you can compress it down losslessly for a further reduction in file size.

All this is far from real practice, and I'm not against existing methods; on the contrary, I am for using them, but while looking at the results they give.

so I think he's interested in highly compressed audio, whereas lossyWAV is going to be about 2x that bitrate for good results.

No, the encoder can use as much bitrate as it needs for the maximum allowed difference signal. As a substitute for lossless I would accept a difference of approximately -45 dB or lower, if it were efficient enough.

We use psychoacoustic models precisely because we don't have an exact mathematical description of the auditory system; otherwise lossy compression would be deterministic and "just pure calculation" (well, more or less... anyway, still more complex than sums and subtractions).

One more time: these psychoacoustic models do not guarantee you anything. They give you only approximate results and sometimes fail.

And it makes perfect sense: if you didn't consider the input level, a quiet signal would sound worse after coding than a loud but otherwise identical signal.

First, if I understand you correctly: turn the volume control up to maximum and you will hear the noise... but nobody listens to music at such a volume. Also, I made a test: I encoded one sample with WavPack at 192 kbps (the lowest possible), and the track peak of the difference file was 0.077026. Then I decreased the volume by 40 dB and encoded again at 192 kbps, and do you think the track peak of the difference file was about the same 0.077026? No, it was 0.000854. Encoders know about such tricks, so we are safe here.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: saratoga on 2013-03-09 03:00:23
so I think he's interested in highly compressed audio, whereas lossyWAV is going to be about 2x that bitrate for good results.

No, the encoder can use as much bitrate as it needs for the maximum allowed difference signal. As a substitute for lossless I would accept a difference of approximately -45 dB or lower, if it were efficient enough.


Like 2Bdecided suggested, -45 dB error corresponds to 8 bit PCM.  You can do this just by peak normalizing your tracks, converting the files to 8 bit and then compressing with flac.  No need for anything new.

Results will not be great though.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-09 07:31:21
The point is to reduce file size without any audible loss of quality on all possible inputs, with a 100% guarantee. That means there would be no more killer samples at all. Every user would use his own level of allowed distortion, depending on the sensitivity of his ears, and he would know exactly what he gets.

Quite funny how you can say this so casually.  Anyway, good luck with that.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Nessuno on 2013-03-09 09:15:52
softrunner, you evidently lack the theoretical basis to cope with many aspects of this argument, but you don't want to listen to what the people here with more knowledge and experience than yours are trying to explain to you, because you are in love with your "simple and revolutionary new theory".

It's quite a common pattern, so best wishes for your idea and its implementation, but I think this discussion has become a non sequitur.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-09 10:53:19
In support of Nessuno’s conclusions, as well as the juicy number quoted by greynol, we have this:
I do not know exactly how well it will work, but I want to try it, because established practice does not work well enough. All we do is play blindly with bitrates, believing that we get some quality, and when we find one more killer sample, we realize that it was just a belief.
Yeah. OK.

I don’t feel like trying to respond methodically to your, erm, points. What I will say is that (1) a one-size-fits-all approach is not going to work, regardless of how nice and easy it might sound and how much you like it for that reason*, and (2) a uniform level of noise throughout one stream does not necessarily mean a uniform level of non-audibility of the same noise.

Again, if you’re wondering why this hasn’t been done despite apparently being so simple, you need to consider the very real possibility that it hasn’t been done because it’s too simple.

* And this sentiment takes us back to your previous ideas about VBR encoding, wherein you were also effectively demanding that people create an encoder that can guarantee transparency to everyone at a single setting. That wasn’t viable, either.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-09 11:06:20
On a very basic level, lossy encoders have a mechanism for
1) introducing distortion
2) evaluating the audibility of said distortion (the psycho-acoustic model)
3) storing the distorted data

These three need to work hand-in-hand to make an efficient encoder.

If 2) deviates from the actual way we humans hear, 1) will in some cases add too much distortion, resulting in audible artifacts, and in other cases not add enough distortion, so that 3) will waste many bits.

The "maximum allowed volume of error signal" is a crude representation of human hearing and will thus feature the aforementioned problems, even if it is just used to augment Wavpack's psycho-acoustic model.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-09 16:46:31
So WavPack does have a psychoacoustic model?
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-09 16:55:18
* And this sentiment takes us back to your previous ideas about VBR encoding, wherein you were also effectively demanding that people create an encoder that can guarantee transparency to everyone at a single setting. That wasn’t viable, either.

Yep, I remember that same line of nonsense.

I think he should focus his efforts in enlisting our help to solve problems concerning energy and the environment.  Surely a Nobel prize is just in reach.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-10 16:10:45
So WavPack does have a psychoacoustic model?

Are you implying it doesn't?
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-10 16:50:36
If you know then say.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-10 18:16:22
Well, since Wavpack lossy doesn't just discard data at random to achieve data reduction, I will go ahead and say it does. But that depends on your definition of what makes a psycho-acoustic model. Extrapolating my line of thought would of course also mean that LPCM 44.1kHz 16bit has an implied psycho-acoustic model dictated by human hearing (bandwidth and SNR). Maybe that is taking it a bit far, but at what point does "exploiting the limits of human hearing" become a psycho-acoustic model?

Anyway, I added that line in my original post to acknowledge the fact that the OP (as far as I have understood) isn't just trying to create ~8-bit audio, but rather imposing a bound on the maximum allowed error after Wavpack's psycho-acoustic lossy treatment. So the actual error might be a lot less. I believe the OP may have been misunderstood on this account which he could use to disregard some of the other arguments. I did not want to present this attack vector.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-10 18:31:38
Sorry, but that really doesn't cut it.

Could someone with a bit more knowledge on the matter clarify?
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-11 17:49:40
In that case, maybe I need to revise my definition of what constitutes a "psycho-acoustic model". Does it need to include all 5 factors listed in the HA-Wiki?

To stick with Wavpack, the manual makes no claim of Wavpack lossy using temporal masking or simultaneous masking (and no claim to the contrary as far as I can tell). But whatever Wavpack lossy is doing seems to work pretty well, so it must be doing something right with regard to exploiting the limits of our hearing. If the principles behind its processing should not be called psycho-acoustic, what then should they be called?
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: pdq on 2013-03-11 18:25:39
Can you play the correction file to a Wavpack lossy file? If it just sounds like noise then I suppose it is not using psychoacoustics. Otherwise you would hear parts of the music that were omitted because they would be masked anyway.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-11 19:02:46
I tried the old inversion trick on a drum & bass song and you can hear percussive elements; in my example you can clearly make out the snare hits. There seem to be no tonal elements.

On an a-capella song you hear broadband noise with a similar amplitude response as the original. "s" sounds in the original produce short bursts of high pitched noise in the difference file.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-11 19:21:03
Premises:
(1) If a residual signal created by mixing the uncompressed signal and the complement of its compressed counterpart (= difference signal) has elements that are audibly correlated to the musically identifiable content from the former, this indicates that psychoacoustics were involved in the process of compression.
(2) If such a signal does not have elements that are audibly correlated to the uncompressed signal, this indicates the absence of psychoacoustics.
Neither is a valid rule.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-11 19:56:43
For the record, I'm not in any position to define what constitutes a "proper" psychoacoustic model.  I've seen WavPack Lossy classified as not having a psychoacoustic model and as having a weak psychoacoustic model.  I'm pretty sure I've seen the same done for LossyWAV.  It seems easy enough for someone like me to simply hedge my bets and say it has a weak psychoacoustic model.

At any rate, I hope we can agree that what was being offered by the OP can't even be classified as having a weak psychoacoustic model.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Nessuno on 2013-03-11 20:57:50
At any rate, I hope we can agree that what was being offered by the OP can't even be classified as having a weak psychoacoustic model.

Actually this seems to be his aim in proposing a somewhat "pure mathematical" method, exactly to get rid of psychoacoustic models altogether because in his own words...
Quote
... psychoacoustic models do not guarantee you anything. They give you only approximate results and sometimes fail.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-12 09:47:58
The point is to reduce file size without any audible loss of quality on all possible inputs, with a 100% guarantee.
I think that's called FLAC.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Dynamic on 2013-03-12 11:42:55
Lossless is the only true guarantee.

LossyWAV's original approach was to use pure mathematics to measure the noise floor present in the actual music and allow a little less noise than its minimum value with no other special tricks.

The OP seems to want some amount of reduced accuracy calculated based on the signal strength to ensure it's always safe, and may or may not have realised that this is mathematically the same as saying noise is added.

I think lossyWAV --maxclips 0 (i.e. lossyWAV (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV) standard) hasn't even shown non-transparency on extreme full-scale test signals, so lossyFLAC, lossyWV, lossyTAK etc are all viable.

In real music, the --maxclips 0 can be omitted with no reported problem samples. (The artificial test sample that generated the clipping problem was about 10 dB louder to the ear than today's maximally loud albums - or --maxclips 0 can be included at the expense of a minor bitrate increase).

The first versions (or later versions if you turn off all noise shaping using --shaping 0) make the bare minimum psychoacoustic assumption that one kind of white noise is indistinguishable from another, and keep the added noise below the minimum noise measured in the signal's audible spectrum without applying any kind of filtering or frequency-shaped noise.  (This is the flavour of lossyWAV support included in CUETools and CUERipper, and I've had no problems using it as a source for transcoding into conventional lossy.)

Really, I think this amounts to what the OP asked about, and is slightly better. It amounts to effectively remastering at the minimum required bit-depth, dynamically changing it throughout the music. Lossless encoders like FLAC, TAK, WavPack and WMA Lossless can take advantage of the reduced bit-depth to save bitrate (but ALAC, most notably, cannot).

Thanks again to 2Bdecided for the idea of lossyFLAC and NickC for implementing lossyWAV from it (and all the others who helped).

The improvements to lossyWAV up to v1.3 (adaptive shaping of the added noise to match the signal spectrum) seem to have been very safe and conservative and seem to have actually hidden the noise better with more margin of safety, but that does amount to extending the bare minimum psychoacoustic model to slightly more advanced (still safe, sound assumptions).
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: C.R.Helmrich on 2013-03-12 20:46:01
And it makes perfect sense: if you didn't consider the input level, a quiet signal would sound worse after coding than a loud but otherwise identical signal.

First, if I understand you correctly: turn the volume control up to maximum and you will hear the noise... but nobody listens to music at such a volume.

You don't have to turn up the volume control. Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.

Quote
Also, I made a test: I encoded one sample with WavPack at 192 kbps (the lowest possible), and the track peak of the difference file was 0.077026. Then I decreased the volume by 40 dB and encoded again at 192 kbps, and do you think the track peak of the difference file was about the same 0.077026? No, it was 0.000854. Encoders know about such tricks, so we are safe here.

Yes, we are safe because - like I wrote - encoders work with something similar to instantaneous (segmental) SNR! This is supported by what Gecko found (http://www.hydrogenaudio.org/forums/index.php?s=&showtopic=99787&view=findpost&p=827002): he "... tried the old inversion trick on a drum & bass song and you can hear percussive elements". (Edit: Since that's the case I'd join greynol and consider this a weak psychoacoustic model.)

Quote from: 2Bdecided
Quote
The point is to reduce file size without any audible loss of quality on all possible inputs, with a 100% guarantee.

I think that's called FLAC.

Or 512-kbps AAC or Opus. I cannot think of any signal which would not be coded transparently at that bitrate. Because I don't know of any signal which is not transparent at e.g. Winamp AAC VBR 6 at half that bitrate on average. (Edit: I'm talking about stereo here of course).

Chris
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: softrunner on 2013-03-22 02:16:12
The point is to reduce file size without any audible loss of quality on all possible inputs, with a 100% guarantee.
I think that's called FLAC.

Lossless is the only true guarantee.

The popularity of lossless is based mostly on the placebo effect. The file size of lossless does not match the real quality it delivers.
Quote
I think lossyWAV --maxclips 0 (i.e. lossyWAV (http://wiki.hydrogenaudio.org/index.php?title=LossyWAV) standard) hasn't even shown non-transparency on extreme full-scale test signals, so lossyFLAC, lossyWV, lossyTAK etc are all viable.
In real music, the --maxclips 0 can be omitted with no reported problem samples. (The artificial test sample that generated the clipping problem was about 10 dB louder to the ear than today's maximally loud albums - or --maxclips 0 can be included at the expense of a minor bitrate increase).

The difference I hear in the samples I've posted is not about clipping. It is simply noise added by lossyWAV, which is sometimes too loud. Also, I do not see how --maxclips 0 changes the situation. The weakest point of lossyWAV is when there is quiet noise plus some simple single-tone signal, which can be a single note of some musical instrument.
Quote
The first versions (or later versions if you turn off all noise shaping using --shaping 0) make the bare minimum psychoacoustic assumption that one kind of white noise is indistinguishable from another, and keep the added noise below the minimum noise measured in the signal's audible spectrum without applying any kind of filtering or frequency-shaped noise.  (This is the flavour of lossyWAV support included in CUETools and CUERipper, and I've had no problems using it as a source for transcoding into conventional lossy.)

Yes, I also checked --shaping 0 and --shaping 1, and it seems they are not audible at the "standard" preset (they are audible at "economic"). So it seems that adaptive noise shaping, used in lossyWAV by default, is not the best choice for the quality presets "standard" and higher. Though with ANS some samples are not audible at "extraportable", where "shaping 0" and "shaping 1" are clearly audible.
Quote
The improvements to lossyWAV up to v1.3 (adaptive shaping of the added noise to match the signal spectrum) seem to have been very safe and conservative and seem to have actually hidden the noise better with more margin of safety.

I also think so, but sometimes ANS puts too much noise where there is not enough space for it, and it becomes clearly audible, so I think this is a possible direction for improvement of lossyWAV.
Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.

If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible. What I'm talking about is not such a simple technique as converting to 8 bits. As Gecko wrote:
Quote
Anyway, I added that line in my original post to acknowledge the fact that the OP (as far as I have understood) isn't just trying to create ~8-bit audio, but rather imposing a bound on the maximum allowed error after Wavpack's psycho-acoustic lossy treatment.

But I have found that -45 dB is audible for WavPack, so a simple restriction on the maximum volume of the error signal will not be efficient enough. Anyway, what I want is some VBR quality-oriented mode of WavPack, and then it will be clearer how good it is.
Quote
Or 512-kbps AAC or Opus. I cannot think of any signal which would not be coded transparently at that bitrate. Because I don't know of any signal which is not transparent at e.g. Winamp AAC VBR 6 at half that bitrate on average. (Edit: I'm talking about stereo here of course).

It depends on what to call "transparent". Vorbis is audible at 619 kbps (check the "FighterBeatLoop" sample in the Uploads section), but it sounds good at much lower bitrates.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: saratoga on 2013-03-22 02:24:41
Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.


If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible.


The audio quality of this approach will already be poor.  If you're also going to use EQ, you should absolutely apply it BEFORE you encode.  Otherwise you will need to tolerate much lower quality or else even higher bitrates. 

Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: jmvalin on 2013-03-22 06:41:28
Hey everyone, I just had this great idea that should revolutionize rockets and space travel. My idea is to have a rocket with two big tanks: one filled with coal and the other one filled with compressed air. Then you just mix the two and set it on fire. This should be much more effective than existing rockets!

More seriously, softrunner, you have to realize that what you're proposing is just as outrageous as my statement above. In both cases, it's a bad approximation of fundamentals that have been known for decades (at least since the 60s in the case of audio). I would strongly suggest you learn more about the fundamentals of audio and perception rather than argue with people who are trying to point out all the flaws in your reasoning.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-22 07:48:48
But I have found that -45 dB is audible for WavPack, so a simple restriction on the maximum volume of the error signal will not be efficient enough.

That's what we have been trying to tell you...

You seem to be looking for some form of holy grail of lossy audio encoding: great compression, zero artifacts, super simple algorithm. Many smart people have spent a lot of time and effort to give us good compression and few artifacts. But the algorithms involved usually aren't very simple.

If you can do better, fantastic! But so far, your approach hasn't been very convincing.

Maybe you really should be looking into lossyWAV. I believe the original Matlab code is still around here somewhere... Ah, yes: http://www.hydrogenaudio.org/forums/index....showtopic=55522 (http://www.hydrogenaudio.org/forums/index.php?showtopic=55522)
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-22 10:51:54
Quote
Or 512-kbps AAC or Opus. I cannot think of any signal which would not be coded transparently at that bitrate. Because I don't know of any signal which is not transparent at e.g. Winamp AAC VBR 6 at half that bitrate on average. (Edit: I'm talking about stereo here of course).
It depends on what to call "transparent". Vorbis is audible at 619 kbps (check the "FighterBeatLoop" sample in the Uploads section), but it sounds good at much lower bitrates.
Zoom:
Quote
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that. At least develop a consistent narrative before you try to make everyone implement it at your behest.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-22 13:43:31
I also checked --shaping 0 and --shaping 1, and it seems they are not audible at the "standard" preset
Are you saying that lossyWAV standard without noise shaping is transparent?

I hope so. That is supposed to be the point, though I wouldn't stake my reputation (never mind my life) on it.

If I've understood you correctly, I think it's the closest thing you're going to get to your goal.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-22 13:57:29
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is...
Why is it absurd? The HA mantra is that CD quality audio is transparent WRT a stereo source. There are caveats (most importantly, gain riding may break it; lousy implementations may break it), but here at least, it's not a controversial statement.

If altering a single bit in a CD quality audio signal renders it non-transparent (i.e. that single bit change is audible) to some person under some reasonable listening circumstances, then miraculously CD quality really does define transparency, and nothing less counts. However, I suspect that's not the case. With a bound set of use cases, you can probably create something that's more efficient than lossless coding of CD quality audio while remaining transparent to the stereo source. If you feed lossyWAV with 24-bits, for example, it'll cope with the gain riding that CD quality audio will not, while (I expect!  ) remaining transparent, and at a lower bitrate.

Quote
since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that.
It seems to me that the more "clever" you try to be in designing a codec, the lower the bitrate you can achieve for transparency "most" of the time and the fewer problem samples you will have, but the greater the percentage bitrate increase required to deal with those few problem samples you have left. I am talking anecdotally - I have no evidence or justification for this.



However, you are absolutely correct to call out softrunner on their use of "transparent", because until you define what you mean, the rest of the discussion is pointless. e.g. is it transparent even if you post-process with...
1) EQ. If so, how much?
2) DRC. If so, how much?
3) stereo processing. If so, what? Logic7? Vocal Cut? etc
4) phasing and flanging? Other DSP effects and production techniques?
5) adding an inverted copy of the original signal and amplifying the result by 100dB?

Only lossless is transparent with post-process number 5. I suspect 4 can be almost as tricky, and 3 is quite tricky. 1 and 2 can be accounted for and bounded.

Good luck softrunner.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-22 14:20:45
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is...
Why is it absurd? The HA mantra is that CD quality audio is transparent WRT a stereo source. There are caveats (most importantly, gain riding may break it; lousy implementations may break it), but here at least, it's not a controversial statement.
I wasn’t talking about CDDA: I was talking about lossy encoders. I thought that would be obvious as the latter is the subject of the thread, but I suppose I appreciate your frequent efforts to remove all possible ambiguity from my posts.

My conclusion is that a lossy codec/setting that will always be transparent is unlikely enough when said transparency applies only to one person, never mind entire swathes of the population. I also have to side with all the other developers who would probably have implemented the various things that are being suggested if they were worthwhile or functioned the way that softrunner imagines.

At the end of the day, if one is going to worry so much about transparency, especially when the lack thereof is probably going to be highly rare with existing codecs at modestly high settings, the answer is to use a lossless codec. I just don’t see the point in supposing about a guaranteed perfect lossy codec when no such thing can exist by definition as something created by imperfect organisms. And if I carry on typing, I’m going to get way OT with idle armchair philosophy, so I’d better stop!
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-22 15:37:58
Sorry db1989, I'm not trying to personally attack you! I was even a little worried it would look like that when I replied to yet another of your posts. It's just coincidence that several of your recent posts have made me want to jump in and explore that particular topic in more depth. Case in point.

Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.

(* = as envisaged; CD can do better in practice these days with good oversampling and good noise shaping.)

In terms of preserving, say, an analogue tape or vinyl record, we say that CD really is transparent. The frequency limit is beyond all but the best hearing, and the noise floor is below that of the source. Note that this is not the same as saying it preserves even the noise level perfectly, because adding a lower level noise (-90dB) will raise the noise floor (typ. -60dB) a little - it's just that the change is inaudible (a fraction of a dB). That's psychoacoustics again.
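
(The arithmetic for that last point: adding uncorrelated noise at -90 dB to a noise floor at -60 dB gives $10 \log_{10}\left(10^{-60/10} + 10^{-90/10}\right) \approx -59.996$ dB, i.e. the floor rises by only about 0.004 dB.)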


There are things you can do to a CD quality signal that make barely any more psychoacoustic assumptions than any of the above. They are lossy (referenced to the CD quality signal), but so "safe" that some people are happy saying they are transparent. I call them near-lossless, but there might be a better name.

The point I think I'm trying to make is that, apart from mathematically perfect coding of a set of numbers (which is lossless),  it's all shades of grey (though hopefully not 50 of them), rather than black and white. Is it transparent? Is it lossless? Does it use a psychoacoustic model? I am usually perfectly happy with the general definitions we use of these words. However, when you get very very picky, or start to talk about absolute transparency of lossy codecs, or start to talk about the "psychoacoustic" model of near-lossless codecs, I think you have to be really careful.


I agree that anyone who is worried about these things should use a lossless format for the audio ripped from their CDs.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-22 17:24:42
Sorry db1989, I'm not trying to personally attack you! I was even a little worried it would look like that when I replied to yet another of your posts. It's just coincidence that several of your recent posts have made me want to jump in and explore that particular topic in more depth. Case in point.
Oh no, not at all! I’ve never been offended, and I do appreciate clarifications and elaborations from someone who obviously has more experience in the relevant fields as well as a more creative outlook. Mentioning the fact that it’s happened more than once was just a little joke, somewhat poking fun at myself, and certainly not alleging anything against you.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Nessuno on 2013-03-23 10:00:42
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.
Kind of reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event, and that the vibration of air molecules is a sampled and quantized lossy reproduction of the actual instruments' vibrations...
But, if a tree falls in the forest when nobody's there, does it make any noise?

As I see it, sound, or better, music production is all about psychoacoustics, and the reference model is always our hearing system, so arguing about the bandwidth and SNR limits of the CD format (*) means wanting to (re)produce something that not only nobody could realistically hear, but that wasn't even in the composer's or player's or instrument builder's mind in the first place!

(*) as an end user's format, at least; just to take into account the meaningful reasons in favour of 24/96, the highest technically feasible format at the moment, for the recording, mixing, mastering stages, etc...
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: softrunner on 2013-03-25 02:10:57
Convert some CD audio to 8 bits/sample; that gives you the ~45 dB difference level you want. You'll find it's not enough for many music files, especially ones with long fade-outs.

If you do it without dithering, the noise will be about -1.4 dB, and if you use dithering, yes, it will be about -45 dB, but it will all be in the high frequencies, so using an equalizer will make it easily audible.

The audio quality of this approach will already be poor.  If you're also going to use EQ, you should absolutely apply it BEFORE you encode.  Otherwise you will need to tolerate much lower quality or else even higher bitrates.

Yes, but I did not say that -45 dB given by this approach and -45 dB given by WavPack hybrid are of the same quality. Definitely they are not. I wrote only about WavPack, which I had tested.

You seem to be looking for some form of holy grail of lossy audio encoding: great compression, zero artifacts, super simple algorithm. Many smart people have spent a lot of time and effort to give us good compression and few artifacts. But the algorithms involved usually aren't very simple.

No, the idea is to run already existing encoders many times (increasing the bitrate) until they give a proper result. And the decision about how proper the result is should be made by a computer program at runtime. Of course, every encoder has its own properties, so the way the result is evaluated should take these properties into account.

Quote
It depends on what to call "transparent".
The irony is strong with this one. How do you define “transparent”, then? To me, it seems as though your ideal definition is transparency for everyone all the time. Setting aside how patently absurd that idea is since transparency specifically refers to specific combinations of listener and material, your pointing out how a codec that is usually transparent at much more sensible bitrates fails to be transparent at a very high bitrate with one particular sample does not support your argument: it’s actually undercutting it. There will always be exceptions to transparency, at least for certain people and certain signals, and none of your nice-sounding-in-novice-theory-but-baseless-in-practice ideas are likely to change that. At least develop a consistent narrative before you try to make everyone implement it at your behest.

I do not use the word "transparent" at all. I prefer audible/inaudible instead. Yes, my approach is that the difference should be inaudible to all humans (not dogs, cats, snakes, etc.). We are humans, so there are limits to our perception. If you do not hear the difference, it does not mean that it is not there, and if the difference is there, it does not mean that you (or any human) can hear it. Listening to audio is an objective thing. Usually people do not hear the difference because they are not attentive, patient, etc. enough. They actually can do it, but a quiet mind is needed first.
In my opinion, for an encoder there should not be any exceptions among input audio (when you try to substitute for lossless). Otherwise, use Opus at 208 kbps and be happy. It gives high quality for all types of music.

Are you saying that lossyWAV standard without noise shaping is transparent?

I cannot say for sure. At a 32 kHz sample rate it is audible; at 44 kHz and higher it is probably not, but deeper tests are needed. (With adaptive noise shaping, 44 kHz is audible.)
Quote
If I've understood you correctly, I think it's the closest thing you're going to get to your goal.

Yes, as far as I have tested it, it can be safely used instead of lossless.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: lvqcl on 2013-03-25 15:12:54
And the decision about how proper the result is should be made by a computer program at runtime.

So... where can I download this program?
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: probedb on 2013-03-25 15:44:46
No, the idea is to run already existing encoders many times (increasing the bitrate) until they give a proper result. And the decision about how proper the result is should be made by a computer program at runtime. Of course, every encoder has its own properties, so the way the result is evaluated should take these properties into account.

Surely this could leave said program running indefinitely due to never matching the criteria? How do you define 'proper' for every possible type of audio?

I do not use the word "transparent" at all. I prefer audible/inaudible instead. Yes, my approach is that the difference should be inaudible for all humans (not dogs, cats, snakes etc.).

So you don't think that when someone says something is 'transparent' to them, they mean that all artifacts are 'inaudible' to them? I don't think you understand what transparency is.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Gecko on 2013-03-25 17:22:04
No, the idea is to run already existing encoders many times (increasing the bitrate) until they give a proper result. And the decision of how proper the result is should be made by a computer program at runtime.

But how does the external program determine whether the result is "proper"? Simple approaches will not be both near-transparent and efficient. You need something more complex to achieve both simultaneously. Neither simple nor complex approaches can guarantee transparency, as long as you are not exactly reproducing the input signal.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: bryant on 2013-03-28 03:59:54
I tried the old inversion trick on a drum & bass song and you can hear percussive elements; in my example you can clearly make out the snare hits. There seem to be no tonal elements.

On an a-capella song you hear broadband noise with a similar amplitude response as the original. "s" sounds in the original produce short bursts of high pitched noise in the difference file.

Sorry that I wasn't following this thread more closely and so missed this. What you are hearing is the dynamic noise shaping feature, which measures the high-frequency content of the source and tilts the spectral balance of the quantization noise (generated by the lossy mode) up or down in an attempt to make it more likely to be masked by the source audio. This gave a nice improvement on some samples where high-frequency transients would sometimes result in nasty bursts of low-frequency noise. It's very much like the adaptive noise shaping of LossyWAV, but simpler.
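
In very rough pseudocode the idea is something like this (heavily simplified, nothing like the actual WavPack source, and the tilt heuristic here is made up purely for illustration):

Code:
import numpy as np

def choose_tilt(frame):
    # Crude measure of high-frequency content: energy of the first
    # difference relative to the signal energy. Lots of HF in the source
    # gives a positive tilt, so the shaped noise is more likely masked.
    hf = np.mean(np.diff(frame) ** 2) / (np.mean(frame ** 2) + 1e-12)
    return float(np.clip(hf - 1.0, -0.9, 0.9))

def shaped_quantize(frame, step, tilt):
    # First-order error-feedback quantizer. The noise transfer function
    # is 1 - tilt*z^-1: tilt > 0 pushes the quantization noise toward
    # high frequencies, tilt < 0 toward low frequencies.
    out = np.empty_like(frame)
    e = 0.0
    for n in range(len(frame)):
        v = frame[n] - tilt * e
        out[n] = np.round(v / step) * step
        e = out[n] - v
    return out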

On the broader question, there are several operations and parameters in the lossy mode that I have added over the years (with lots of help from people with much better hearing than mine) to improve the transparency of the lossy mode, and they're all based on psychoacoustic principles, but that doesn't make it a psychoacoustic codec, IMO, because it doesn't implement any hearing model and it has no VBR mode wherein the bitrate is altered according to some estimate of perceptual quality.

In any event, it's not a purely mathematical operation like ADPCM either, so saying that it has a weak psychoacoustic model certainly would not bother me. 
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: greynol on 2013-03-28 06:20:15
Thanks for chiming in, David!

Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-28 09:34:21
Let's think outside the box: of course CD audio is lossy. It's lossy compared to the stereo source signal. You could even say that it has a very simple static psychoacoustic and listening room model: it assumes we cannot hear anything above 20kHz*, and that noise 90dB* down from peak signal level will typically be inaudible. Those assumptions aren't entirely true 100% of the time, but they'll do for almost any music listening you can imagine.
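
(A quick back-of-envelope check on the second of those numbers, using only the standard textbook formula: a full-scale sine against quantization noise gives SNR ≈ 6.02·N + 1.76 dB, so 16 bits is about 98 dB; with TPDF dither the broadband noise floor sits roughly 4.8 dB higher, which is where "90-odd dB down" figures come from.)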

Kind of reductio ad infinitum: if we go along with this line of reasoning, we should say that what our hearing system transmits to our brain is a lossy version of the real sound event
...which it absolutely is...
Quote
and the vibration of air molecules is a sampled and quantized lossy reproduction of the actual instrument's vibrations...
But, if a tree falls in the forest when nobody's there, does it make any noise?
I don't want to go that far.  I only care about what we can hear. Which is exactly what you said...

Quote
As I see it, sound, or better, music production is all about psychoacoustics, and the reference model is always our hearing system, so arguing about the bandwidth and SNR limits of the CD format (*) means wanting to (re)produce something that not only nobody could realistically hear, but that wasn't even in the composer's or player's or instrument builder's mind in the first place!
My point was that, in a really esoteric discussion like this one, we have to be 100% clear what we mean by transparent, and what we mean by psychoacoustic model. It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous.

If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-28 12:41:12
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-28 16:30:55
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.
Ah, you see, I knew that would be the response, and fair enough. However, nowhere in the definition of transparency that I would expect to use does it say "actual music signals" - the idea is that, for some signal, at some bitrate, for some listener, using some equipment, in some usage scenario, the codec is transparent (i.e. indistinguishable from the original in a double-blind test) - and through time and generalised testing, the probability emerges that the codec is transparent for most signals/listeners/equipment/scenarios at a given bitrate. The concept of complete transparency (all signals/listeners/equipment/scenarios) is quite unattainable IMO - except for a mathematically lossless transformation. Yet the OP seems to want complete transparency, and thinks a computer programme is going to be able to judge when this is achieved.

HA is a fun place to make the argument about it not being actual music, and therefore unimportant. HA was born when mp3 couldn't encode certain artificial kinds of music very well at all. The inventors of mp3 probably didn't think it was actual music, and didn't know or care that strings of synthetic impulses (for example) were handled very badly by mp3.

Cheers,
David.

P.S. apparently the ability to hear 23-24kHz, at levels of around 80-100dB SPL, is quite widespread in younger people (e.g. under 25). Normal hi-fi can't reproduce it, and I can't imagine why anyone would want to listen to it - but then, some people would never want to listen to undial.wav, or Aphex Twin, or Merzbow, or...
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-28 16:38:07
Ah, you see, I knew that would be the response, and fair enough. However, nowhere in the definition of transparency that I would expect to use does it say "actual music signals" - the idea is that, for some signal, at some bitrate, for some listener, using some equipment, in some usage scenario, the codec is transparent […] The concept of complete transparency (all signals/listeners/equipment/scenarios) is quite unattainable IMO - except for a mathematically lossless transformation. Yet the OP seems to want complete transparency, and thinks a computer programme is going to be able to judge when this is achieved.
Good points. I was just defending CDDA as a musical medium: you obviously aren’t denying its suitability, but it’s always possible that someone might take that quote the wrong way.

Quote
HA is a fun place to make the argument about it not being actual music, and therefore unimportant. HA was born when mp3 couldn't encode certain artificial kinds of music very well at all. The inventors of mp3 probably didn't think it was actual music, and didn't know or care that strings of synthetic impulses (for example) were handled very badly by mp3.
Heh, also a good point. Again, I’m definitely not denying the utility of synthetic signals in increasing the technical quality of an encoder, possibly in a way that carries over to more commonly audible material. It just helps to avoid placing too much emphasis on pure tones when, as I said, the ability to hear them need not reflect the actual lowpasses someone can discern in real material comprising complex waveforms; also, pure tones aren’t likely to give encoders much trouble in comparison to synthetically concocted complex tones.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: DonP on 2013-03-28 16:45:48
If you can hear 23kHz tones (some people can), CD isn't transparent, by any accurate definition of that word.
Such people are very rare, at least as adults. In any case, being able to hear a tone does not necessarily predict what one can and cannot hear in actual music where, by definition, multiple tones are composited. As usual, a DBT is the only way to assess transparency or the lack thereof in such a case, and I suspect even people who can hear beyond 20 kHz in pure tones might not have such luck with actual musical signals.


First, music is not by definition multiple tones at once.  It could be a single-line melody, or even just rhythm on a single note.  Or a CD could contain non-musical audio.

Second, depending on masking to make it transparent takes it into the realm of lossy, which was the original point of this sub-topic.

Third, why limit the domain to people old enough to have presumably reduced hearing?
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: db1989 on 2013-03-28 16:50:19
First, music is not by definition multiple tones at once.  It could be a single line melody, or even just rhythm on a single note.  Or a CD could have non musical audio.
OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves.

Quote
Third, why limit the domain to people old enough to have presumably reduced hearing?
I have no desire to do this.

I was making some general and simplistic points about the ability to hear pure tones at a given frequency vs. that frequency’s relevance in common types of material containing multiple harmonics. I’m not trying to reshape how people develop codecs or claim that I know better. Developers are obviously free to test and process in whichever ways and on whichever types of material, ‘realistic’ or not, they choose. softrunner in particular might need some radical new methodologies to get this project off the ground…
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: jmvalin on 2013-03-28 18:42:38
It's a failure to understand these two things properly that lets the OP make some statements that many reading here will judge to be ridiculous.


No, what made the OP ridiculous is that it proposes a "new" idea that not only ignores all the advances that have been made in the past 40 years, but even shows a misunderstanding of what was known 40 years ago. Hint: G.711 (mu-law/A-law) is 40 years old, and even at that time it was known that the noise energy had to be modulated with the signal amplitude and that constant-level noise is a dumb idea.
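
The principle is trivial to demonstrate. A toy version of mu-law companding (using the continuous formula only; real G.711 uses a segmented piecewise-linear approximation of it) makes the quantization error rise and fall with the signal instead of sitting at a constant level:

Code:
import numpy as np

MU = 255.0

def mulaw_compress(x):                  # x scaled to -1..1
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mulaw_expand(y):
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def mulaw_codec(x, bits=8):
    # Compress, quantize uniformly in the companded domain, expand.
    levels = 2 ** bits
    y = mulaw_compress(x)
    q = np.round((y + 1) / 2 * (levels - 1)) / (levels - 1) * 2 - 1
    return mulaw_expand(q)

t = np.arange(8000) / 8000.0
for amp in (1.0, 0.1, 0.01):
    x = amp * np.sin(2 * np.pi * 440 * t)
    err = mulaw_codec(x) - x
    print(amp, np.sqrt(np.mean(err ** 2)))  # error scales roughly with amplitude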
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-28 20:19:23
I think he implied a noise floor relative to peak level, or something. That would be NICAM.

Cheers,
David.
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: jmvalin on 2013-03-28 20:49:25
I think he implied a noise floor relative to peak level, or something. That would be NICAM.


Well, the OP mentions working in chunks of 1 second, which would be pretty useless for setting a relative floor. So far less advanced than mu-law (1972) and NICAM (which according to Wikipedia is from 1964). I guess that makes the idea worse than 50-year-old technology. But that's OK, I've got a much better idea involving wax and needles 
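
For the record, the block-companding idea behind that kind of system is roughly this (a simplified sketch, nothing like the real NICAM bitstream): the scale factor follows the peak of a block of a few dozen samples, so the noise floor tracks the signal on a millisecond scale rather than a one-second scale.

Code:
import numpy as np

def block_companded(x, block=32, bits=10):
    # Quantize each short block with a step size scaled to the block peak,
    # so the quantization noise floor follows the local signal level.
    out = np.empty_like(x)
    levels = 2 ** (bits - 1)
    for i in range(0, len(x), block):
        frame = x[i:i + block]
        peak = np.max(np.abs(frame)) + 1e-12
        step = peak / levels
        out[i:i + block] = np.round(frame / step) * step
    return out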
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: Nessuno on 2013-03-28 21:29:06
First, music is not by definition multiple tones at once.  It could be a single line melody, or even just rhythm on a single note.  Or a CD could have non musical audio.
OK, then allow me to clarify what I hoped was clear but perhaps was badly worded: by multiple tones, I meant timbres more complex than single sinewaves.

I'm definitely with you on your idea of music, but to be honest, after Cage's 4'33'' I won't be that surprised if someone comes out with, say, "6279000", a composition made of a single 23 kHz tone.

And maybe someone else will rush to buy it on HD format...
Title: An idea of audio encode algorithm, based on maximum allowed volume of
Post by: 2Bdecided on 2013-03-29 11:16:03
I've got a much better idea involving wax and needles 
Oh, get with the times  - wax, needles, and an iPhone...
http://www.youtube.com/watch?v=ik8sJds4hV8 (http://www.youtube.com/watch?v=ik8sJds4hV8)


Cheers,
David.