Statistics question

2005-10-08 11:19:39

I have two sets of data:

A:
20733 35555
41604 00402
56614 14202
11001 07171
60201 10047

B:
15151 00082
06006 80202
26110 04800
40731 66438
05770 04410
40050 00621
13115 00161
00106 31866
51140 11404
30007 77157

61474 74421
72300 07482
00802 04000
00551 84223
07132 10067
30600 13070
51065 01109
90022 06100
00041 10774
70613 20840

77100 07205
00064 03570
30735 10056
68532 857

The probability of a number from set A being non-zero is 74% (37 of 50). For set B it is approximately 66.80% (159 of 238), so the rate decreased from A to B.

I wanted it to increase.

Assuming that set A is a sampled selection from a larger non-random population, and set B is a sampled selection from another larger non-random population, how big is the chance that this is due to a bug?

Reply #1 – 2005-10-08 12:14:48

Is there anything known about the distributions or nature of the populations?

Reply #2 – 2005-10-08 13:55:12

Ideally, population B should have fewer zeros than population A does. Well, that's what I expected.

Currently, I can only collect data from population B. It'd take me till next Saturday (maybe earlier, if I'm really lucky) to collect data from population A.

Reply #3 – 2005-10-13 11:06:17

Did you understand my question?

Reply #4 – 2005-10-13 11:31:28

I thought I did, but now that you mention it, I'm not sure.

What happens if you assume that the populations are roughly Gaussian? With the "test" becoming less strict from population A to population B.

Reply #5 – 2005-10-13 13:15:50

let's sum up to see if I get it right:
you have two non-random populations with sample sizes n1 < n2 (the sample from population A is smaller than the sample from population B). you assume that both are bell curved (Gaussian, i.e. normally distributed), which is the standard first approach.
your thesis is that population B has fewer zeros (in respect to what?) than population A.

according to your data, this is not the case, so for now the data do not support your thesis.
of course this is flawed as you have far less data from A.
as you collect more data from any population (randomly!) you get nearer to the true value.
did you collect data randomly (from your two populations)? if not then your data is worth nothing.
on which basis did you separate the two populations, i.e. what are your variables?

edit:

Quote

With the "test" becoming less strict from population A to population B.

what do you mean by this? did you change methods of collecting data? that would be no good...

Reply #6 – 2005-10-13 15:51:02

OK. There's this game I play, in which you can acquire skill points when you gain an experience level. If you accumulate a certain required number of skill points, you can gain/strengthen a skill.

After a lot of hard work, I saved up enough skill points to advance in magical music.

If I succeed in playing magical music, I record the effect, in numbers. If I fail, I record 0. All samples are sequential, with nothing left out.

Set A was sampled when I was rank 3, the third highest rank when it comes to magical music. Set B was sampled right after I became rank 2. Now that my rank is higher, I expect my success rate to be higher.

I can't go back to rank 3 till Saturday, unless I'm lucky.

Reply #7 – 2005-10-13 16:27:22

Code:

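#include <stdio.h>
#include <stdlib.h>

/* Assumed true success rates, in percent. */
#define R2 66.80
#define R3 61.80
#define LOOP 100      /* number of simulated experiments */
#define R2_TRIALS 80  /* attempts per experiment at rank 2 */
#define R3_TRIALS 50  /* attempts per experiment at rank 3 */

float pPlay_trial(float pRate, int nCycle);

int main(void)
{
    int i;
    int nAnomaly = 0;

    for (i = 0; i < LOOP; i++)
    {
        float pR2 = pPlay_trial(R2, R2_TRIALS);
        float pR3 = pPlay_trial(R3, R3_TRIALS);

        /* Anomaly: rank 2 looks no better than rank 3. */
        if (pR2 <= pR3)
            nAnomaly++;
    }

    /* Fraction of experiments in which the anomaly occurred. */
    printf("%f\n", ((float)nAnomaly) / LOOP);

    return 0;
}

/* Simulate nCycle independent attempts and return the observed
   success rate in percent. */
float pPlay_trial(float pRate, int nCycle)
{
    int i;
    int nSuccess = 0;

    for (i = 0; i < nCycle; i++)
    {
        /* rand() % 100 is an integer in 0..99, so a rate of 66.80
           behaves as 67% and 61.80 behaves as 62%. */
        int r = rand() % 100;
        if (r <= pRate)
            nSuccess++;
    }
    return ((float)nSuccess) * 100 / nCycle;
}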

With R3 defined as 61.80 I get 0.280000. With R3 defined as 56.80 I get 0.090000.

Reply #8 – 2005-10-13 16:52:14

Quote

OK. There's this game I play, in which you can acquire skill points when you gain an experience level. If you accumulate a certain required number of skill points, you can gain/strengthen a skill.

After a lot of hard work, I saved up enough skill points to advance in magical music.

If I succeed in playing magical music, I record the effect, in numbers. If I fail, I record 0. All samples are sequential, with nothing left out.

Set A was sampled when I was rank 3, the third highest rank when it comes to magical music. Set B was sampled right after I became rank 2. Now that my rank is higher, I expect my success rate to be higher.

so you are talking about a classic RPG here

A = 13 zeros / 50
B = 79 zeros / 238
so far so bad. the thing is that you have too little data at rank 3 (A) to make a direct valid comparison between the two sets.

also, is the outcome of using the magical music skill determined by that skill alone, or are there other factors (as often in RPGs) that add to the rate of success?

Reply #9 – 2005-10-13 18:07:39

Success rate for crafts goes up on Mondays, but I don't think there's anything which affects music.

Um, oops. That code was written last week, when I assumed everything was random.

Reply #10 – 2005-10-13 20:01:37

Quote

I thought I did, but now that you mention it, I'm not sure.

What happens if you assume that the populations are roughly Gaussian? With the "test" becoming less strict from population A to population B.

If you know the populations are Gaussian (normally distributed), you can use statistics with more distinguishing power than if you are working with a black box. That's why I was asking.

Reply #11 – 2005-10-15 09:25:54

Here are some more samples from rank 3:

40030 20142
20301 76110
61470 65070
30740 06004
52650 05401
05040 77161
15507 03100
11146 00115
10720 40114
06001 71002

Success rate is 65% (65 of 100). Pooled with the previous rank 3 samples (102 of 150 in total), the success rate is 68%. It's still higher than 66.80%, but close enough to convince me I was just lucky when I got 74%.
