Topic: Earguy's improved Digital Ear


Reply #100
Maybe VL and Garf et al. are using different Ogg Vorbis encoders? Often a different compiler switch or another compiler (GCC, MSC, ICC) is enough to generate different _sounding_ files.

In this area C is not very stable. Mathematicians still prefer Fortran over C. One reason is C's poor portability, relative to Fortran, from the viewpoint of numerical predictability.

Especially rounding FP to int is sloooow and a mess in C.

--
Frank Klemm


Reply #101
Quote
Originally posted by Frank Klemm
Maybe VL and Garf et al. are using different Ogg Vorbis encoders? Often a different compiler switch or another compiler (GCC, MSC, ICC) is enough to generate different _sounding_ files.


Seems unlikely to me they used anything besides the official build.

Moreover, it seems even less likely that the difference would be bad enough for one file to sound fine while the VL result indicates large distortions.

Quote
Especially rounding FP to int is sloooow and a mess in C.


Vorbis works around the worst problems by controlling the rounding mode itself.

--
GCP


Reply #102
VL outputs a results.txt file that includes 251 pairs of numbers.  The first number of the pair is the total difference VL heard at that frequency over the entire music clip.  The second number of the pair is the total count of samples that VL heard a difference at that frequency over the entire music clip.  The idea was to look for bad spots regardless of the music clip length.
The graphs are then the 251 total difference values divided by the corresponding 251 sample counts.  I then take these 251 graph points and simply average them to get an overall average difference value (although I feel the graph is worth more than just one number).

VL does no normalization at all, so if the test wav is louder or quieter than the original wav, then VL will report this as a difference.  Perhaps this is what happened for the group tests (ff123 mentioned that each of the wav files were normalized, which could have added bias to the results).  At the core of VL, it compares a computed "specific loudness" value at each frequency for both the original wav and test wav, so biasing this by having one wav louder than the other will make VL report a difference.  As a test I changed VL to use an original wav and the original wav *0.90 for the test wav (taking each original wav sample and multiplying it by 0.90 or 90%).  Needless to say, VL thought it sounded terrible (akin to the two samples for Vorbis).  So with VL any lossy compression technique that doesn't reproduce a wav file with the same volume, already has a disadvantage.

The frustrating thing about VL to me is that it just says it heard a difference, but VL can't say what it heard (was it volume changes, pre-echo, chirping, drop-outs, etc...?)    I had VL listen to the training samples and sure enough, VL definitely heard a difference for all of them, but VL can't describe what it heard differently.  It is like asking someone "Do you know what time it is?" and they answer "Yep".

Because of this, having VL listen for pre-echo is kind of futile.  Unless we can create an original wav and a test wav where the only difference is pre-echo, we don't know whether the difference VL heard was indeed the artifact we were testing for. I doubt a test like this could be set up to everyone's liking (but I could be wrong).


Reply #103
Quote
(ff123 mentioned that each of the wav files were normalized, which could have added bias to the results).


What I meant was that I checked to make sure that each file had exactly the same volume as the original.  Some of the presets in lame now use --scale, so that's one thing to watch out for.

ff123


Reply #104
Quote
Originally posted by EarGuy
Because of this, having VL listen for pre-echo is kind of futile.  Unless we can create an original wav and a test wav where the only difference is pre-echo, then we don't know if the difference
Ok, well here are 2 totally artificial test clips, which should create only pre-echo artifacts.
http://sivut.koti.soon.fi/julaak/blips.flac
http://sivut.koti.soon.fi/julaak/Short_Block_Test.flac


But still, most of the tracks I listed previously should sound just fine at higher bitrates, apart from creating pre-echo:
http://www.hydrogenaudio.org/forums/showth...d=9799#post9799

Quote
ff123 wrote
What I meant was that I checked to make sure that each file had exactly the same volume as the original. Some of the presets in lame now use --scale, so that's one thing to watch out for.
Of the already-tested presets, --alt-preset 192 uses --scale.  I think it should be tested with --scale 0 added to the command line. Also --r3mix uses --scale, so --scale 0 should be added there as well when it's tested.
Juha Laaksonheimo


Reply #105
EarGuy,

First: please don't take anything I'm about to say as criticism - these are meant as helpful suggestions. Second: please don't take anything I'm about to say as suggesting that I could do better. If I could, I would have done - but I can't, so I haven't! Finally, I apologise if I'm teaching my grandmother to suck eggs. Right then...

The ear model you're using is good, but to make it assess audio quality correctly you've got to be more careful with the inputs and outputs.

For the inputs, I assume the original and coded versions are time-aligned, but it's good to remove any other linear changes before sending the signal(s) through the ear. If the volume has been changed, fix it. If there's a subtle (but constant) change in frequency response, fix it. If there's a fixed low-pass, note this, and try applying the same low-pass to the original, so you can gauge the effects of coding artefacts and of simple low-passing separately. This last point applies more to low-quality audio than to what you're dealing with here, but the other two are possibly relevant. Obviously you have to account for any "fixes" you apply in the final judgement of quality: if the volume needed restoring, it's not a problem; if the frequency response needed restoring, then the codec has problems.

I'm familiar with the kind of time-varying frequency dependent error surface that the ear outputs, and my opinion is that accurately assessing the significance of the features on this error surface is just as important (and just as difficult) as generating it in the first place. In other words, when you've generated this surface, you're only half way there. The outputs you are currently generating (251 pairs of numbers) are probably not optimum for subsequent processing. But as is painfully obvious, the time-varying error surface is just too complicated to use on its own.

I've read about (but not tried) the techniques used in the PEAQ basic and advanced models. I assume you have too. Without access to a large training database, the neural network is probably a non-starter, but maybe some training could be accomplished using the MPEG test data that is readily available, and by re-creating the tests of r3mix and ff123 - the test samples and results are readily available.

If you don't want to use the PEAQ techniques, I achieved surprisingly good results using the work of Mike Hollier. He defines a quantity called "Error Entropy": basically, a large-amplitude error lumped together in one place is much worse than an equivalent error volume spread throughout the extract. This makes sense: a single large drop-out is more annoying than a constant but barely audible distortion. The ear hears them both, and at the correct loudness - but by summing the total error in each band you make them equivalent.

You can read more in:

Hollier, M. P.; Hawksford, M. O. J.; and Guard, D. R. (1995).
Algorithms for Assessing the Subjectivity of Perceptually Weighted Audible Errors.
Journal of the Audio Engineering Society, vol. 43, Dec., pp. 1041-1045.
Hollier, M. P.; and Cosier, G. (1996).
Assessing human perception.
BT Technology Journal, vol. 14, Jan., pp. 206-215.

When it comes to transforming the outputs of the ear into a single indication of perceived audio quality, I'm not suggesting this approach is the best - I think the PEAQ approach is probably much better. However, Mike Hollier's approach is easy to code, so it is probably worth a try. If it doesn't help, you won't have wasted much time.


Finally, I can't find the reference, but the AES conference paper on PEAQ from 98 or 99 talks at length about the difference between a good ear model, and a good audio quality assessment system. They try to show that the former is only a small component of the latter - there are commercial reasons for them to make this point, but it is largely true!

I hope this information is of some use to you - and I apologise again if I'm telling you things that you already know. Please don't vanish again if you do decide to make any improvements - the results you're giving now are both interesting and useful.

All the best,
David.
http://www.David.Robinson.org/
P.S. every time I have written "the ear" in this message, I am referring to your digital ear model, not human ears!


Reply #106
Still, I think it would be nice to see what kind of results the latest Gogo Nocoda, with a decent VBR setting, would get in this kind of software analysis.


Reply #107
Is there any hope VL will be released to the public?

(It has already been asked in this thread, but I found no answers)

Regards;

Roberto.


Reply #108
Is earguy reading this? Can someone email him?

Cheers,
David.


Reply #109
Quote
Originally posted by 2Bdecided
Is earguy reading this? Can someone email him?

Cheers,
David.
Yes, I think he's reading. He visited here yesterday.
Juha Laaksonheimo


Reply #110
My personal web page server was changed.  Here is a new link to the VL graphs.
Virtual Listener Graphs

As far as making VL public, I have Ivan Dimkovic trying VL privately, to see if it is useful for the work he is doing.  VL was written for speed, not for user friendliness, so it can be tricky to setup a listening job.  Also, VL currently requires a PIII or P4 and Windows 98/2K/XP in order to run (due to using SSE instructions). 

I have reservations about making VL public, mostly because I don't want to support it.  I have moved on to other projects, since VL helped me decide that the only way to archive my CD collection is by using lossless compression.

Hopefully, Ivan can post his experiences of using VL and this might encourage someone to write their own implementation of Frank Baumgarte's ear model (if it is deemed useful).  Everything needed to implement the model is in his dissertation.


Reply #111
Quote
Originally posted by EarGuy
I have reservations about making VL public, mostly because I don't want to support it.  I have moved on to other projects, since VL helped me decide that the only way to archive my CD collection is by using lossless compression.

Hopefully, Ivan can post his experiences of using VL and this might encourage someone to write their own implementation of Frank Baumgarte's ear model (if it is deemed useful).  Everything needed to implement the model is in his dissertation.
Can't you make it open source? I think it's very unlikely that someone else will write another implementation of Baumgarte's ear model any time soon.
But someone could very well tweak VL a bit, or implement better error analysis like 2Bdecided suggested.

I don't think VL was needed to reach the conclusion that if you want totally perfect quality, absolutely always, you need lossless compression...
Juha Laaksonheimo


Reply #112
I have stopped doing tests.  It was brought to my attention, and rightly so, that interpreting the raw data that VL generates is as subjective as the listening tests themselves.  How one condenses the raw VL data can totally change the ranking of the listening tests.  There have been many papers written on this problem, with different solutions proposed.  Which interpretation you want to believe is subjective, and the purpose of VL was to be fair, unbiased, and deterministic.  Frank Baumgarte himself told me that the ear model was only intended to be accurate close to the threshold of hearing.  He found the idea of using the model to rank different encoders/settings intriguing, but was concerned about the results because the model wasn't built to be used quite this way.

I have since decided to only use lossless compression and have moved on to other projects of interest to me (such as improving lossless compression).

I have been asked to make the VL source code open source.  I'm afraid the source is in no way readable by anyone who isn't familiar with the different versions along the way.  I didn't comment very much, since I originally never planned to make VL public.  I'm sorry if this isn't what you all wanted to hear.
:'(


Reply #113
Earguy, you should at least provide the binaries..
In my opinion there's a lot that can be done to tweak the interpretation of the raw data results VL outputs.

You don't even have to host the binaries, I'm sure there are plenty of people willing to host it. Of course it would be great to have the source too. And of course you don't have to include your personal information with the release..

This is just too valuable to let go like this..
Juha Laaksonheimo


Reply #114
Quote
Originally posted by EarGuy
Also, VL currently requires a PIII or P4 and Windows 98/2K/XP in order to run (due to using SSE instructions). 


Athlon XP supports SSE too.

If your problem with VL is that you don't want to support it,
don't include your email in the distribution

As for the code being unreadable, sounds familiar, but not really a big issue.

Really, if these are the only reasons not to opensource it, please give it another think...

--
GCP


Reply #115
Quote
Originally posted by JohnV
I'm sure there are plenty of people willing to host it.


They would be very welcome at my page.

Isn't there some kind of licensing issue with VL (like the one with Eaqual)? That would be yet another reason to host it here in Brazil.

Regards;

Roberto.

Edit: if there's any interest, you can contact me at rjamorim at yahoo dot com.


Reply #116
VL doesn't output the raw data.  The reason is that for one second of listening, VL internally generates 88200*251*4 bytes, or about 90 MB, of data.  So the interpretation of the data almost has to be done inside VL instead of outside VL.

There are no licensing issues that I'm aware of.

Instead of reverse engineering my archaic code, it would be best for a developer to read the dissertation and create the model from scratch.  The only things I did were to change the convolution to an FFT convolution for a 15:1 speedup, and to utilize SSE instructions for an additional 4:1 speedup.  Really, read through the appendix sections of the dissertation; the model is spelled out pretty clearly and is not very complicated.  The complexity is in trying to improve the processing speed.  I had the first prototype written in a week; the speed optimization is what took a year.  Unfortunately, the speed optimization is what has changed the code to be almost unreadable (even by me at this point).


Reply #117
So... hummm... that means you won't release even the binaries?

Pretty please?


Reply #118
Quote
Originally posted by EarGuy
The complexity is in trying to improve the processing speed.  I had the first prototype written in a week, the speed optimization is what took a year.  Unfortunately, the speed optimization is what has changed the code to be almost unreadable (even by me at this point).
But is the data analysis also so completely unreadable? I would hope that someone would make his own implementation of Baumgarte's ear model, but realistically it's very unlikely to happen in the near future. It could take many, many years.
Then again, someone could definitely tweak the data analysis if the source is available. I'd guess that's a much easier thing than starting all over again. Please provide the source.. You don't have to support it in any way. You don't have to host it. You can totally forget it if you want. But please provide it..
Juha Laaksonheimo