
Topic: Psy models: will they ever be able to replace listening tests?

Psy models: will they ever be able to replace listening tests?

Reply #25
How are you going to make such a psymodel without doing listening tests first? (That was one of the original points of muaddib!)
I've never doubted that designing a psy model involves lots and lots of listening tests. But the advantage would be that once a model has been defined, it can do its job very fast and with consistent results.
So, in my opinion, mechanical evaluation may someday be viable for verifying transparency, although I see nothing (public) that is satisfactory for stereo or spatial hearing yet (but I'm sure it is possible).
I would expect mechanical evaluation to be useful mainly for transparency verification (just like ABX/ABC-HR). Audio codecs are only one target of interest. What about the audible effects of op-amps, electrical components, cables, AD/DA converters etc.? It would be interesting to know whether the time and budget spent on R&D for a product have any effect on sonic performance at all.
There will probably be a need for several psy models: one for, say, xx% of the average population, and a "golden-ear" model representing the world's best listener (no one can hear better). As soon as a reliable listening test reveals new threshold data, the psy model could be updated. An audio device designer could check his modifications instantaneously with a psy model, which is much more practical than arranging a golden-ear listening test. I can also imagine that mechanical evaluation could give an estimate of how far below audibility something is.
In short: IF an accurate psy-model can be made, I can see mostly advantages over listening tests.

Psy models: will they ever be able to replace listening tests?

Reply #26
So, in my opinion, mechanical evaluation may someday be viable for verifying transparency, although I see nothing (public) that is satisfactory for stereo or spatial hearing yet (but I'm sure it is possible).

As far as I am aware, this is again the problem of not having enough data from listening tests. For most people stereo problems are unimportant and very often impossible even to detect, yet there are rare listeners who dislike even the smallest change in the stereo image.

It's not only a spatial/soundstage problem; there is also multichannel masking/unmasking.

Psy models: will they ever be able to replace listening tests?

Reply #27

I see a philosophical dilemma when evaluating lossy compression using auditory models.

The lossy codecs themselves use models of the human auditory system. If we assume that a "state of the art" encoder contains a "state of the art" auditory model, allocating bits as well as current knowledge allows, then there is no "even more state of the art" model left to evaluate the codec with, is there?

Well, that's where I figure the 'ear' comes in, really... maybe not everyone's, but Guru will most likely be around for a while longer ;-)

Are there believed to be upper bounds on lossy compression efficiency? I mean, if you put together a team of DSP and audio people to manually encode a song with unlimited resources, how much better than a general codec could they do (excluding trivial solutions such as building a decoder that essentially contains the single song to be played already in memory)?


There are, yes... ultimately, that bound will be reached once the psymodel is a 1:1 model of the human auditory system, but even with the current models out there I think we're pretty close to it.
Remember, compression (for audio) means removing superfluous information (the 'unheard' stuff that's taken away by the encoder), and possibly looking for signal repetition.
As the latter (the repeating patterns) is very hard to find, and since you can't really do what they do in video, where you can basically save a reference frame plus a diff for more efficient storage of similar frames, because complex waves are such a pain (IIRC from my complex analysis course ;-)), you're looking at a limit that depends on the condition of the human ear.
And while it can be argued that a large percentage of today's youth is happily trying to become deaf before turning 40, I'm not sure that's something you can depend on them succeeding at.
(That's also more or less why lossless codecs are having trouble becoming more efficient, by the way.)


PS: IANAD

A trivial bound:
For an instrumental, electronic song, the information that is put into it never exceeds that needed to describe:
A) A set of note-on/off commands, velocity, etc. (MIDI)
B) A set of short samples triggered by those commands
C) A set of synthesizer "patches" describing synthetic instrument programming
D) A set of mixing and effects settings

If the encoder can assume that the decoder contains a complete virtual recording studio, then those songs can probably be compressed (losslessly or near-losslessly) very efficiently: a lot better than the General MIDI attempt of the mid-90s, and probably more versatile than the synthetic-audio parts of MPEG-4?


Before dismissing this, perhaps one should consider just how much of the musical content in a typical radio song is produced in exactly this way.
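
To get a feel for the scale of this bound, here is a rough back-of-envelope sketch (all numbers are assumed for illustration, not measured):

Code:
# Rough comparison (hypothetical numbers): steady-state MIDI event rate
# for a dense arrangement vs. CD-quality PCM.
notes_per_sec = 30            # assumed dense polyphony
bytes_per_note = 6            # note-on + note-off, ~3 bytes each
midi_bps = notes_per_sec * bytes_per_note * 8    # ~1.4 kbit/s of event data
pcm_bps = 44100 * 2 * 16                         # 1,411,200 bit/s for CD PCM
print(midi_bps, pcm_bps, pcm_bps // midi_bps)    # ratio on the order of 1000x

Samples and patches would have to be shipped once, of course, but the steady-state event rate is orders of magnitude below the PCM rate.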

-k

Psy models: will they ever be able to replace listening tests?

Reply #28
A trivial bound:
For an instrumental, electronic song, the information that is put into it never exceeds that needed to describe:
A) A set of note-on/off commands, velocity, etc. (MIDI)
B) A set of short samples triggered by those commands
C) A set of synthesizer "patches" describing synthetic instrument programming
D) A set of mixing and effects settings

If the encoder can assume that the decoder contains a complete virtual recording studio, then those songs can probably be compressed (losslessly or near-losslessly) very efficiently: a lot better than the General MIDI attempt of the mid-90s, and probably more versatile than the synthetic-audio parts of MPEG-4?


Before dismissing this, perhaps one should consider just how much of the musical content in a typical radio song is produced in exactly this way.

-k


Unfortunately, that's not a very useful bound, given the addition of things like reverberation, human operation of controls, singing, etc.

An older attempt at this was Johnston, J. D., "Estimation of perceptual entropy using noise masking criteria," ICASSP '88 Record, 1988, pp. 2524-2527, and such work ought to be at least as feasible today. I am aware of newer measurements, mentioned only peripherally in publication, that put pure transparency at about 1.1 bits/sample for some complex material.
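
For scale: if that figure is taken per channel at a 44.1 kHz sampling rate (an assumption; the post does not specify), then 1.1 bits/sample × 44,100 samples/s × 2 channels ≈ 97 kbit/s for transparent stereo.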

Before suggesting that all is MIDI, one must recall not only preprocessing but also randomness in synthesizers. Not all random, uniform, flat streams necessarily sound the same; you have to consider both short-term and long-term statistics, something that many random number generators fail at. When you use something like that for your cymbals, how much of the ***audible information*** is random number generator state? You might be surprised.
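
A small sketch of that point (not from the original post; a Python illustration under assumed parameters): two streams containing exactly the same sample values, and hence identical "flat" amplitude statistics, can still have radically different spectra and therefore sound completely different.

Code:
import numpy as np

rng = np.random.default_rng(0)
white = rng.uniform(-1.0, 1.0, 44100)   # 1 s of uniform white noise at 44.1 kHz

# Lowpass the noise, then rank-match: reorder the original white samples
# according to the rank order of the filtered signal.  Both streams now
# contain exactly the same values, but one is spectrally dark.
lowpassed = np.convolve(white, np.ones(64) / 64, mode="same")
dark = np.empty_like(white)
dark[np.argsort(lowpassed)] = np.sort(white)

assert np.allclose(np.sort(white), np.sort(dark))  # identical amplitude statistics

def hf_fraction(x):
    """Fraction of spectral energy above ~1 kHz (1 Hz bins for a 1 s signal)."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    return spec[1000:].sum() / spec.sum()

print(hf_fraction(white), hf_fraction(dark))       # white keeps far more HF energy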

It's even worse with real instruments, almost all of which, while they maintain things like pitch splendidly, exhibit pitch jitter due to basic physics that can be heard in at least some cases.  Your entropy estimates have to capture all the random elements that are audible. Measuring the MIDI rate isn't necessarily going to do that.
-----
J. D. (jj) Johnston

Psy models: will they ever be able to replace listening tests?

Reply #29
Well, we have nearly 42,000 members here on HAF. I propose that we post sample files representing any two given audio scenarios for folks to audition, stating that answers are acceptable only if presented with detailed ABX results, and that we base our findings on those results.

Scenario: we post two files, for instance one 400 kbit/s Ogg Vorbis file (what the Vorbis developers refer to as '10' on the quality scale), decoded to .WAV, and one pure .WAV file. We title the files appropriately so folks can sit down and familiarize themselves with the correctly labeled files. We also post two more files with uninformative names such as Orange.wav and Yellow.wav (the same two files as above, but with no way for folks to tell which is which).

We ask participants to ABX the two 'color' files and send us their detailed ABX result print-outs, and we base our findings on these results. With 42,000 members, we are bound to get at least a thousand replies. From those thousand, we should be able to extract solid data, no?
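
As an aside on what "solid data" would mean here, a minimal sketch of the usual analysis (assuming each trial is an independent guess with p = 0.5 under the null hypothesis):

Code:
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of getting >= `correct` right by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(abx_p_value(12, 16))   # ~0.038, commonly taken as significant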

Andrew D.
www.cdnav.com


Psy models: will they ever be able to replace listening tests?

Reply #30
We ask participants to ABX the two 'color' files and send us their detailed ABX result print-outs, and we base our findings on these results. With 42,000 members, we are bound to get at least a thousand replies. From those thousand, we should be able to extract solid data, no?


How do you prevent bad apples from determining which 'colour' file is which?
Creature of habit.

Psy models: will they ever be able to replace listening tests?

Reply #31
Oddly enough (from what I have seen thus far), if one takes any file, say a 128 kbit/s MP3, and converts it into .WAV format, the file size increases to be exactly the same as a 'standard' .WAV file. (Can someone please explain why this is? It surprised the hell outta me.)

Unless the 'bad apples' use some kind of hex file-editor utility or the like to discern what the file originally was, all should be A-OK. However, you are correct in assuming that there will always be numbskulls who wish to mess with people's minds... I guess one would have to allow a margin of error or some such to discard the oddballs and buttheads... I dunno...

Andrew D.


Psy models: will they ever be able to replace listening tests?

Reply #32
Oddly enough (from what I have seen thus far), if one takes any file, say a 128 kbit/s MP3, and converts it into .WAV format, the file size increases to be exactly the same as a 'standard' .WAV file. (Can someone please explain why this is? It surprised the hell outta me.)

Unless the 'bad apples' use some kind of hex file-editor utility or the like to discern what the file originally was, all should be A-OK. However, you are correct in assuming that there will always be numbskulls who wish to mess with people's minds... I guess one would have to allow a margin of error or some such to discard the oddballs and buttheads... I dunno...

Andrew D.


The .WAV file is always the same size because the duration is the same and the data rate is fixed: 176,400 bytes per second, even if the audio is silence.
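To spell out where that figure comes from (for CD-format PCM): 44,100 samples/s × 2 channels × 2 bytes/sample = 176,400 bytes/s, so a .WAV file's size depends only on its length; decoding an MP3 back to .WAV restores the full PCM rate, not the MP3's compressed size.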
Paul

"Reality is merely an illusion, albeit a very persistent one." Albert Einstein

Psy models: will they ever be able to replace listening tests?

Reply #33

We ask participants to ABX the two 'color' files and send us their detailed ABX result print-outs, and we base our findings on these results. With 42,000 members, we are bound to get at least a thousand replies. From those thousand, we should be able to extract solid data, no?


How do you prevent bad apples from determining which 'colour' file is which?


Put in some controls, and see what happens when people answer them.
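
For instance (a hypothetical screening pass, not jj's actual procedure): mix in control pairs with a known, plainly audible difference, and only count the answers of respondents who get the controls right.

Code:
def credible(r: dict) -> bool:
    """Keep respondents who got (nearly) all obvious control pairs right."""
    return r["controls_correct"] >= r["controls_total"] - 1

respondents = [
    {"name": "A", "controls_correct": 8, "controls_total": 8, "abx_correct": 13, "abx_total": 16},
    {"name": "B", "controls_correct": 3, "controls_total": 8, "abx_correct": 16, "abx_total": 16},
]
print([r["name"] for r in respondents if credible(r)])   # only 'A' survives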
-----
J. D. (jj) Johnston

Psy models: will they ever be able to replace listening tests?

Reply #34
A trivial bound:
For an instrumental, electronic song, the information that is put into it never exceeds that needed to describe:
A) A set of note-on/off commands, velocity, etc. (MIDI)
B) A set of short samples triggered by those commands
C) A set of synthesizer "patches" describing synthetic instrument programming
D) A set of mixing and effects settings

If the encoder can assume that the decoder contains a complete virtual recording studio, then those songs can probably be compressed (losslessly or near-losslessly) very efficiently: a lot better than the General MIDI attempt of the mid-90s, and probably more versatile than the synthetic-audio parts of MPEG-4?


Before dismissing this, perhaps one should consider just how much of the musical content in a typical radio song is produced in exactly this way.

-k

Unfortunately, that's not a very useful bound, given the addition of things like reverberation, human operation of controls, singing, etc.

I did state that my bound considers only instrumental, electronically generated music, but most pop music contains enough energy produced in such a fashion that at least those components could be recreated this way (simplifying the compression of natural instruments/voice), if only the separate tracks were available pre-mix, pre-rendering, and a suitable codec existed. Quite possibly, neither is practically feasible.

Also, I did include "parametric post-processing" in my argument, just as General MIDI allows (if I recall correctly) a simple per-instrument reverb/chorus amount. One could hypothetically replicate the sound engineer's settings on his Lexicon 480L by recording the sound dry and attaching metadata that either describes the preset used or supplies an impulse response to be used for convolution.
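
A sketch of that last idea (file names are hypothetical; mono 16-bit files assumed): the decoder applies the engineer's reverb by convolving the dry signal with the transmitted impulse response.

Code:
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

rate, dry = wavfile.read("dry_mix.wav")   # hypothetical dry recording (mono)
_, ir = wavfile.read("hall_ir.wav")       # hypothetical impulse response

wet = fftconvolve(dry.astype(np.float64), ir.astype(np.float64))
wet /= np.abs(wet).max()                  # normalise to avoid clipping
wavfile.write("wet_mix.wav", rate, (wet * 32767).astype(np.int16))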
Quote
An older attempt at this was Johnston, J. D., "Estimation of perceptual entropy using noise masking criteria," ICASSP '88 Record, 1988, pp. 2524-2527, and such work ought to be at least as feasible today. I am aware of newer measurements, mentioned only peripherally in publication, that put pure transparency at about 1.1 bits/sample for some complex material.

Before suggesting that all is MIDI, one must recall not only preprocessing but also randomness in synthesizers. Not all random, uniform, flat streams necessarily sound the same; you have to consider both short-term and long-term statistics, something that many random number generators fail at. When you use something like that for your cymbals, how much of the ***audible information*** is random number generator state? You might be surprised.

I don't understand what you are saying. Are you saying that if I input the exact same MIDI sequence into my Waldorf microQ all-digital synth twice, it will sound audibly different?

If I produce a song using a MIDI sequencer, monitoring through a set of plugins for effects etc., are you saying that a decoder ten years from now, containing algorithms precisely replicating my monitoring equipment, cannot duplicate the sound that I am hearing?

I would argue that the "random" and "quasi-random" elements explicitly included in synthesizers for sound generation very rarely depend on the specific outcome of the dice thrown. If a sound is mixed with "white noise", ten different realisations of the same sound would to all intents and purposes sound the same, even though a truly random noise source would be radically different on a sample-by-sample basis.

Quote
It's even worse with real instruments, almost all of which, while they maintain things like pitch splendidly, exhibit pitch jitter due to basic physics that can be heard in at least some cases.  Your entropy estimates have to capture all the random elements that are audible. Measuring the MIDI rate isn't necessarily going to do that.

What do you mean by pitch jitter?

Of course, I do not mean that something like an acoustic violin equipped with a MIDI sensor generating note-on, note-off, velocity and pitch/pitch-bend information captures the true information of that instrument. A MIDI-equipped piano would probably come a lot closer, but not all the way.

-k

Psy models: will they ever be able to replace listening tests?

Reply #35
Has anyone in the lossy encoding area done any work on making use of musical genre? It seems to me that at least encoding-speed and perhaps even accuracy improvements could be made if the lossy encoder were given such metadata prior to the encode.

This feeds right into psy models replacing listening tests, since it seems (to me at least) a big problem to make an encoder that works well across all genres.

Psy models: will they ever be able to replace listening tests?

Reply #36
Has anyone in the lossy encoding area done any work on making use of musical genre? It seems to me that at least encoding-speed and perhaps even accuracy improvements could be made if the lossy encoder were given such metadata prior to the encode.

This feeds right into psy models replacing listening tests, since it seems (to me at least) a big problem to make an encoder that works well across all genres.


I don't see any point in knowing the genre, let alone how it could make the encoder faster or better. If you have any ideas there, I'd like to hear them. What could the metadata tell the encoder that it couldn't figure out for itself?

If I'd like to know any metadata for my psymodel, then it would be a profile of the listener, or the playback level that will be used.

Psy models: will they ever be able to replace listening tests?

Reply #37
If I'd like to know any metadata for my psymodel, then it would be a profile of the listener, or the playback level that will be used.


The background noise level in the playback situation would be nice to know, too.

Psy models: will they ever be able to replace listening tests?

Reply #38
The background noise level in the playback situation would be nice to know, too.

It is usually assumed to be near zero, and it is best to try to achieve this in subjective listening tests.

Psy models: will they ever be able to replace listening tests?

Reply #39
The background noise level in the playback situation would be nice to know, too.

It is usually assumed to be near zero, and it is best to try to achieve this in subjective listening tests.

I guess the point was that if the actual frequency-dependent playback noise floor could be incorporated into the psy model at encode time, then the priorities of an encoder might change.

The problem is that most realistic usage scenarios for lossy audio assume flexible use and a variable noise floor.
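
A minimal sketch of how that could look (assumed band structure and numbers, not any real codec's): per band, coding noise only has to stay below whichever is higher, the masking threshold or the playback environment's noise floor.

Code:
import numpy as np

masking_db = np.array([12.0, 18.0, 9.0, 25.0])     # per-band masking threshold (dB SPL)
car_floor_db = np.array([45.0, 30.0, 20.0, 15.0])  # measured in-car noise floor (dB SPL)

# Bands drowned out by the environment can be coded far more coarsely.
effective_db = np.maximum(masking_db, car_floor_db)
print(effective_db)                                 # [45. 30. 20. 25.]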

-k

Psy models: will they ever be able to replace listening tests?

Reply #40

The background noise level in the playback situation would be nice to know, too.

It is usually assumed to be near zero, and it is best to try to achieve this in subjective listening tests.

I guess the point was that if the actual frequency-dependent playback noise floor could be incorporated into the psy model at encode time, then the priorities of an encoder might change.

The problem is that most realistic usage scenarios for lossy audio assume flexible use and a variable noise floor.

-k



One of the important principles of building any psychoacoustic model, in my opinion, is the idea that "you have no idea what the user will do with their volume control". This is effectively a dual of what you just said.
-----
J. D. (jj) Johnston

Psy models: will they ever be able to replace listening tests?

Reply #41
As for predicting the playback volume, designing for the worst-case scenario is, as I see it, the best solution. It is, however, very hard to achieve.

Psy models: will they ever be able to replace listening tests?

Reply #42

The background noise level in the playback situation would be nice to know, too.

It is usually assumed to be near zero, and it is best to try to achieve this in subjective listening tests.

I guess the point was that if the actual frequency-dependent playback noise floor could be incorporated into the psy model at encode time, then the priorities of an encoder might change.


Your guess is correct, and anyway, it was more of a tongue-in-cheek comment than a serious suggestion. I understand that since we don't know the actual levels, we must assume the worst-case scenario in the psymodel: high listening volume and low or non-existent background noise.

But come to think of it, there might be a few cases where the background noise could be made available to the encoder. Consider, for example, a situation where you encode some files and listen to them only while driving your car: if you could record the noise profile inside the car at typical speed and give this information to the encoder, it could compress the data much more efficiently than when the worst case (no noise) is assumed. Of course, you would probably be very annoyed if you stopped the car and continued to listen to the music. I don't know; maybe this would be worth considering if storage cost a hundred times what it does now. But luckily, storage is cheap, and we can happily encode our music with enough bits to make it sound transparent in every situation.


Psy models: will they ever be able to replace listening tests?

Reply #43
Will any model ever replace people's senses in anything that can be subjective? I doubt it.

I think it will, but it will most likely not be the psychoacoustic models or waveform analyzers we know today; rather, some completely new technology we'll see in the coming years.

Until then, the only viable way to objectively test the results of audio encoding is to double-blindly compare encoded samples with a lossless reference over enough trials to get a meaningful result.

Also, I don't think it would ever be useful to test something which is purely subjective. Thankfully, digital media encoding has an aspect of subjectivity (individually preferred sound quality) and an aspect of objectivity (perceptual variance from the reference within a test group). The latter is what we may one day be able to measure in an automated way, but alas, to the best of my knowledge we cannot as yet.