Neither method deals with the problem of undisclosed results. Someone could easily run five sets of 16 trials, get relatively normal results in four of them, and publish only the fifth, which was unusual.

Or, assuming "way more good faith", you still end up with the same problem:

Four people get the normal result, which in their honest opinion is nothing to write forum posts about. Every now and then, purely by chance, someone gets a strange score and posts about it. And a certain fraction of those will, again by chance, get it twice.
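To put rough numbers on the two scenarios above, here is a small sketch. The thread does not say what counts as "strange" or what the chance rate is, so the 50% guessing rate, the cutoff of 12 or more correct out of 16, and the pool of 1000 testers are all my own assumptions for illustration, not anything the OP reported.

```python
from math import comb


def tail_prob(n, k_min, p=0.5):
    """Exact P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Assumptions (not from the thread): pure guessing at p = 0.5,
# and "strange" meaning at least 12 correct out of 16.
p_strange = tail_prob(16, 12)

# Scenario 1: one person runs five sets and shows only the unusual one.
p_any_of_five = 1 - (1 - p_strange) ** 5

# Scenario 2: many honest people each run one set; only the strange
# scores get posted, and some of those people repeat the test.
testers = 1000  # hypothetical number of people trying the test
expected_once = testers * p_strange
expected_twice = testers * p_strange**2

print(f"P(strange score in a single set)    = {p_strange:.4f}")
print(f"P(at least one strange set in five) = {p_any_of_five:.4f}")
print(f"Expected strange scores per {testers}  = {expected_once:.1f}")
print(f"Expected to get it twice per {testers} = {expected_twice:.1f}")
```

Under these assumptions a single set looks strange roughly 4% of the time, so someone who runs five sets has something to show roughly 18% of the time, and out of 1000 honest guessers you would expect a few dozen strange scores and one or two people who see it twice in a row. A different cutoff changes the numbers but not the shape of the problem.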

By the way, on the Bayesian approach:

It does not seem controversial to advise the OP to try the test again, in order to find out whether we can write it off as chance.

But that advice is a (rough!) stopping rule.