Statistical Methods for Listening Tests(splitted R3mix VBR s

Reply #75 – 2001-11-09 21:35:21

tangent,

Thanks. I'll get that into version 0.4. Version 0.3 is at:

http://ff123.net/bootstrap/bootstrap03.zip

Here are some improvements I'd like to schedule for version 0.4:

1. An improved rerandomization (permutation) algorithm, which will be much faster. Also, since step-down is not strictly valid with permutation resampling, I will revert to single-step for this.

2. Bootstrap step-down, for improved power.

3. Resampling to arrive at unadjusted p-values. This isn't really needed, because I don't care much about the unadjusted p-values, but it would be nice to have in place of the current unadjusted p-values, which are calculated from a normal model.
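For item 3, a resampled unadjusted p-value for one pair of codecs could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the bootstrap program: the function names `paired_t` and `bootstrap_pvalue` are hypothetical, the ratings are assumed to be paired by listener, and the null hypothesis is imposed by centring the paired differences before resampling.

```python
import random
import statistics


def paired_t(diffs):
    """One-sample t statistic for the mean paired difference."""
    n = len(diffs)
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0.0:
        # Degenerate resample (all differences equal): avoid dividing by zero.
        return 0.0 if mean == 0.0 else float("inf")
    return mean / (sd / n ** 0.5)


def bootstrap_pvalue(a, b, trials=10000, seed=0):
    """Two-sided bootstrap p-value for codec a vs. codec b (paired by listener)."""
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    t_obs = abs(paired_t(diffs))
    # Centre the differences so the resampling distribution obeys the null.
    mean = statistics.mean(diffs)
    centred = [d - mean for d in diffs]
    hits = 0
    for _ in range(trials):
        # Resample listeners with replacement.
        resample = [rng.choice(centred) for _ in diffs]
        if abs(paired_t(resample)) >= t_obs:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Made-up ratings for illustration only (not the AQ1 data).
    codec_a = [4.5, 4.0, 4.8, 4.2, 4.6, 4.1, 4.9, 4.4]
    codec_b = [3.0, 2.5, 3.6, 2.9, 3.1, 2.4, 3.3, 3.0]
    print(bootstrap_pvalue(codec_a, codec_b, trials=2000))
```

A blocked ANOVA pools the error variance over all codecs, so a version meant to track its p-values closely would compute the statistic from the whole listener-by-codec table instead of one pair at a time.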

ff123

Reply #76 – 2001-11-11 05:02:13

Interestingly, by using my bootstrap resampling technique, I can arrive at resampled, unadjusted p-values that are almost identical to those from the blocked ANOVA model. That's good, because it means the resampling is doing exactly what I want.

For comparison, here is the blocked ANOVA p-value table for the AQ1 data:

Code: [Select]

         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.174    0.075    0.010*   0.003*   0.001*   0.000*   0.000*
dm-std            0.673    0.218    0.113    0.056    0.026*   0.000*
dm-xtrm                    0.418    0.243    0.136    0.070    0.000*
dm-ins                              0.721    0.496    0.315    0.006*
cbr256                                       0.746    0.517    0.015*
abr224                                                0.746    0.036*
r3mix                                                          0.075

and here is the tweaked bootstrap resampled version of the same thing with 100,000 trials (p-values are not adjusted for multiplicity):

Code: [Select]

         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.160    0.068    0.009*   0.003*   0.001*   0.000*   0.000*
dm-std      -     0.663    0.206    0.104    0.051    0.023*   0.000*
dm-xtrm     -        -     0.398    0.229    0.125    0.064    0.000*
dm-ins      -        -        -     0.712    0.480    0.299    0.005*
cbr256      -        -        -        -     0.737    0.503    0.014*
abr224      -        -        -        -        -     0.737    0.032*
r3mix       -        -        -        -        -        -     0.067

Notice a similarity?

BTW, this also means that the blocked ANOVA using a protected Fisher's LSD does not control experiment-wise error!
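To get a feel for why unadjusted per-comparison p-values say little about experiment-wise error, here is a small arithmetic sketch. It assumes the pairwise tests are independent and each run at alpha = 0.05, which is not true of these data (the comparisons share codecs and listeners), and it does not model the F-test gate of a protected LSD. The function name `familywise_error` is made up for illustration.

```python
from math import comb


def familywise_error(codecs, alpha=0.05):
    """Chance of at least one false positive among all pairwise tests,
    under the (unrealistic) assumption that the tests are independent."""
    comparisons = comb(codecs, 2)
    return comparisons, 1.0 - (1.0 - alpha) ** comparisons


if __name__ == "__main__":
    n_tests, fwer = familywise_error(8)
    print(n_tests, round(fwer, 3))
```

With 8 codecs there are 28 pairwise comparisons, so under these assumptions the chance of at least one spurious "significant" pair is far above the nominal 5%.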

ff123

Reply #77 – 2001-11-12 02:09:50

Version 0.4 is complete and up at:

http://ff123.net/bootstrap/bootstrap04.zip

It implements bootstrap free step-down p-value adjustment. Just type:

bootstrap aq1.txt

This will run 10,000 bootstrap trials of the AQ1 data, and the results will be:

Code: [Select]

BOOTSTRAP version 0.4, Nov 10, 2001
Input file : aq1.txt
Read 8 treatments, 42 samples

                            Unadjusted p-values
         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.174    0.075    0.010*   0.003*   0.001*   0.000*   0.000*
dm-std     -      0.673    0.218    0.113    0.056    0.026*   0.000*
dm-xtrm    -        -      0.418    0.243    0.136    0.070    0.000*
dm-ins     -        -        -      0.721    0.496    0.315    0.006*
cbr256     -        -        -        -      0.746    0.517    0.015*
abr224     -        -        -        -        -      0.746    0.036*
r3mix      -        -        -        -        -        -      0.075

Each '.' is 1,000 resamples.  Each '+' is 10,000 resamples
.........+

                             Adjusted p-values
         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.684    0.442    0.124    0.059    0.024*   0.010*   0.000*
dm-std     -      0.980    0.719    0.545    0.395    0.245    0.002*
dm-xtrm    -        -      0.906    0.739    0.603    0.450    0.010*
dm-ins     -        -        -      0.966    0.935    0.829    0.085
cbr256     -        -        -        -      0.738    0.935    0.174
abr224     -        -        -        -        -      0.919    0.303
r3mix      -        -        -        -        -        -      0.470
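For readers wondering what a free step-down adjustment does with the resampled statistics, here is a minimal sketch in the style of the Westfall/Young max-statistic procedure. It is not the program's actual code: the function name `free_step_down` and its inputs are assumptions, and the null-distribution statistics are taken as given (one list of test statistics per bootstrap trial).

```python
def free_step_down(t_obs, t_null):
    """Free step-down adjusted p-values from successive maxima.

    t_obs  : observed test statistics, one per pairwise comparison
    t_null : list of resampled statistic vectors generated under the null
    """
    m = len(t_obs)
    # Order hypotheses from most to least significant.
    order = sorted(range(m), key=lambda i: abs(t_obs[i]), reverse=True)
    counts = [0] * m
    for rep in t_null:
        running_max = 0.0
        # Walk from the least significant hypothesis upward, so each one is
        # compared against the max over itself and all less significant ones.
        for rank in range(m - 1, -1, -1):
            i = order[rank]
            running_max = max(running_max, abs(rep[i]))
            if running_max >= abs(t_obs[i]):
                counts[rank] += 1
    adjusted = [0.0] * m
    previous = 0.0
    for rank, i in enumerate(order):
        # Enforce monotonicity: a less significant comparison can never
        # get a smaller adjusted p-value than a more significant one.
        previous = max(previous, counts[rank] / len(t_null))
        adjusted[i] = previous
    return adjusted


if __name__ == "__main__":
    observed = [3.0, 1.0, 2.0]
    null_trials = [
        [0.5, 0.2, 2.5],
        [1.5, 1.1, 0.3],
        [3.5, 0.4, 0.2],
        [0.1, 1.2, 0.3],
    ]
    print(free_step_down(observed, null_trials))
```

A single-step adjustment would compare every observed statistic against the maximum over all comparisons in each trial; stepping down shrinks that set for the less significant comparisons, which is where the extra power comes from.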

Sorry I didn't include your code, tangent, but I really only sort in one or two places and it wasn't going to save a lot of time to implement your quicksort. Plus I changed the sort routine a little to make it able to sort either from min to max or max to min.

ff123

Edit: It doesn't actually take too long to run a million trials now, so I did:

Code: [Select]

                             Adjusted p-values
         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.690    0.452    0.130    0.061    0.027*   0.011*   0.000*
dm-std     -      0.982    0.726    0.554    0.404    0.248    0.003*
dm-xtrm    -        -      0.906    0.747    0.608    0.456    0.012*
dm-ins     -        -        -      0.968    0.936    0.831    0.088
cbr256     -        -        -        -      0.742    0.934    0.180
abr224     -        -        -        -        -      0.922    0.311
r3mix      -        -        -        -        -        -      0.477

Reply #78 – 2001-11-12 21:54:58

Well, Westfall's paper, "Multiple Testing of General Contrasts Using Logical Constraints and Correlations" (1997) is damn near impenetrable to me. I think a big part of the problem is that I'm not familiar with the mathematical notation. However, if I stare at it long enough, I may start to catch on. Improving on the free step-down bootstrap is a tempting carrot.

Just as an amusing anecdote, I called the SAS Institute to find out how much they charge for their software. It's about $2600 for the required base package for the first year (about $1300 for a yearly renewal), plus $1100 for the optional subpackage, which I presume runs the types of tests I would be interested in (about half that for the yearly renewal).

That's a little out of my reach :-)

While I'm trying to decipher Westfall's paper, I will probably implement the rest of the rerandomization code (so that it can shuffle the whole pool of values) in order to verify that it gives the same results as the example in the 1993 Westfall/Young book. Also, I will probably add an option to convert the raw data into ranked data.
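The rank conversion mentioned above might look something like this. It is a sketch under assumptions: ratings are ranked within each listener (one row per listener, one column per codec), tied ratings share the average of the ranks they span, and the names `midranks` and `rank_rows` are invented for the example.

```python
def midranks(values):
    """Ranks 1..n for one listener's ratings; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average
        pos = end + 1
    return ranks


def rank_rows(table):
    """Convert a listener-by-codec table of raw ratings into ranks."""
    return [midranks(row) for row in table]


if __name__ == "__main__":
    # Made-up ratings for illustration only.
    print(rank_rows([[4.0, 2.5, 4.0, 1.0], [3.0, 3.5, 5.0, 2.0]]))
```

Ranking within each listener throws away the size of the rating differences but keeps their order, which makes the analysis less sensitive to listeners who use the rating scale differently.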

Then I'll probably integrate it into the current web-based analysis tool. However, I don't want to load down my server's CPU and get myself into trouble, so I'll probably limit that to 1000 trials.

ff123

Reply #79 – 2001-11-17 21:25:21

Yay!

After a couple days of frustration, I finally found the bug in the blocked bootstrap which was causing it to lose power. Now, with free stepdown, for my dogies.wav data, adjusted p-values are (100,000 trials):

Code: [Select]

         AAC      OGG      LAME     WMA      XING
MPC      0.943    0.006*   0.002*   0.001*   0.000*
AAC        -      0.015*   0.005*   0.004*   0.000*
OGG        -        -      0.943    0.933    0.002*
LAME       -        -        -      0.943    0.007*
WMA        -        -        -        -      0.008*

And AQ1 results are (100,000 trials):

Code: [Select]

         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.737    0.524    0.130    0.053    0.019*   0.006*   0.000*
dm-std     -      0.986    0.777    0.612    0.452    0.271    0.001*
dm-xtrm    -        -      0.929    0.792    0.664    0.509    0.006*
dm-ins     -        -        -      0.986    0.947    0.861    0.081
cbr256     -        -        -        -      0.986    0.947    0.186
abr224     -        -        -        -        -      0.986    0.339
r3mix      -        -        -        -        -        -      0.509

And I now have a pretty good idea of how to implement restricted step-down as well. So I'll release version 0.5 (bug fix) shortly.

ff123

Reply #80 – 2001-11-17 21:55:43

ff123,

What is the correct interpretation of the last chart you posted, namely:

Code:

         dm-std   dm-xtrm  dm-ins   cbr256   abr224   r3mix    cbr192
mpc      0.737    0.524    0.130    0.053    0.019*   0.006*   0.000*
dm-std     -      0.986    0.777    0.612    0.452    0.271    0.001*
dm-xtrm    -        -      0.929    0.792    0.664    0.509    0.006*
dm-ins     -        -        -      0.986    0.947    0.861    0.081
cbr256     -        -        -        -      0.986    0.947    0.186
abr224     -        -        -        -        -      0.986    0.339
r3mix      -        -        -        -        -        -      0.509

Has it finally been proven that dm-preset standard beat --r3mix in that test?

Curious....

Reply #81 – 2001-11-17 22:29:12

No. It will probably never be shown that dm-std beats r3mix with 95% confidence in this data set, even after I implement restricted step-down, which will be more powerful than free step-down (but we'll see). The problem is that there were just too many samples. Each additional sample adds more statistical noise.

On the other hand, you can say that dm-std is better than cbr192, while you can't say the same thing with r3mix.

ff123

Reply #82 – 2001-11-18 08:19:32

Version 0.5 is up, as well as a page for it at:

http://ff123.net/bootstrap/

I have also placed it under LGPL.

ff123

Reply #83 – 2001-11-22 04:18:10

OK, with the help of Matlab, I believe I finally understand how to find the subsets required to perform restricted step-down. The reason it took me so long is that there is apparently a typo in the critical formula!

But after looking at the SAS/IML code which Westfall generated, I was able to sort it out.

So now I just need to generate a few routines to do stuff like multiply matrices and take the inverse of matrices (or snag such routines off the web).

I'm getting close now, I can see the finish line.

ff123

Reply #84 – 2001-12-03 01:36:54

It's becoming clear that finding the subsets for the restricted step-down is going to dominate the computing time. We're talking an extremely long time to calculate the subsets for 8 codecs. I haven't even finished yet, but the current computations for 6 codecs already take 11 seconds, and that's likely to grow further. Scaling to 8 codecs, finding all the subsets would take 25 hours! It's the difference between 2^(15-2) and 2^(28-2), where 15 and 28 are the numbers of pairwise comparisons among 6 and 8 codecs.

It can be done, of course, but that's not the sort of thing I'd like to do every day. 7 codecs is about the practical max for this sort of analysis on my 800MHz Celeron, clocking in at 12 minutes, given the current time estimates.
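The time estimates above follow from simple arithmetic, which can be checked with a short sketch. It assumes, as the figures in this post imply, that the subset search time doubles with every additional pairwise comparison and that 6 codecs take 11 seconds; the function name `subset_search_seconds` is just a label for the example.

```python
from math import comb


def subset_search_seconds(codecs, base_codecs=6, base_seconds=11.0):
    """Extrapolated search time, assuming it scales as 2^(number of pairs)."""
    pairs = comb(codecs, 2)            # pairwise comparisons among the codecs
    base_pairs = comb(base_codecs, 2)  # 15 pairs for the 6-codec baseline
    return base_seconds * 2 ** (pairs - base_pairs)


if __name__ == "__main__":
    for n in (6, 7, 8):
        seconds = subset_search_seconds(n)
        print(n, comb(n, 2), seconds, round(seconds / 60, 1), round(seconds / 3600, 1))
```

For 7 codecs (21 pairs) this gives 704 seconds, or about 12 minutes, and for 8 codecs (28 pairs) 90,112 seconds, or about 25 hours, matching the numbers quoted.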

I'm probably about a week away from finishing up the code.

ff123

Reply #85 – 2001-12-03 15:52:47

I guess it's time to write a distributed version of bootstrap.... call it bootstrap@home

Reply #86 – 2001-12-04 08:00:55

I hit yet another roadblock: singular or near-singular matrices. A couple of test cases I tried in Matlab give ambiguous results because of the matrix inversion involved. I have tried a request for help in sci.stat.math with no reply, so I have finally written to Dr. Westfall himself in search of assistance. I hope he is amenable to spending time on a person with a hobby.

ff123

Reply #87 – 2001-12-04 16:04:11

Dr. Westfall replied. I should be using the generalized inverse, which is what superscript "-" means (superscript "-1" means the normal inverse). So his formula didn't contain a typo. I just didn't know enough to know what it meant! Doh.
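To make the distinction concrete: a generalized inverse G of a matrix A only has to satisfy A G A = A, so it exists even when A is singular and the ordinary inverse does not. The sketch below checks that property in plain Python. The example matrix and the candidate inverses are chosen by hand for illustration and are not taken from Westfall's paper; the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def is_generalized_inverse(a, g, tol=1e-12):
    """True if g satisfies the defining property a * g * a == a."""
    aga = matmul(matmul(a, g), a)
    return all(
        abs(aga[i][j] - a[i][j]) <= tol
        for i in range(len(a))
        for j in range(len(a[0]))
    )


if __name__ == "__main__":
    singular = [[1.0, 1.0], [1.0, 1.0]]      # determinant is 0, no ordinary inverse
    candidate_1 = [[0.25, 0.25], [0.25, 0.25]]
    candidate_2 = [[1.0, 0.0], [0.0, 0.0]]
    print(is_generalized_inverse(singular, candidate_1))
    print(is_generalized_inverse(singular, candidate_2))
```

Both candidates pass the check, which shows that a generalized inverse is generally not unique. In practice a library routine such as NumPy's `numpy.linalg.pinv` computes one particular choice, the Moore-Penrose pseudoinverse.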

He also informed me that some German scientists are coding the algorithm (finding subsets for restricted step-down, I gather) into R.

ff123