Hydrogenaudio Forums

Hydrogenaudio Forum => Listening Tests => Topic started by: kinnerful on 25 June, 2012, 12:28:14 PM

Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: kinnerful on 25 June, 2012, 12:28:14 PM
http://lifehacker.com/5920793/the-great-mp3-bitrate-experiment
http://www.codinghorror.com/blog/2012/06/the-great-mp3-bitrate-experiment.html

I guess lifehacker has a larger audience than hydrogenaudio... could be interesting
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: JJZolx on 25 June, 2012, 12:40:29 PM
Anyone who would pay a kid $1 per CD to rip their music collection and then encode it in a lossy format to save a little hard drive space can't be very bright. Ten or fifteen years ago ... maybe it made some sense, but certainly not in today's world of disk space for under $1/GB.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: db1989 on 25 June, 2012, 12:57:45 PM
Quote
The point of this exercise is absolutely not piracy; I have no interest in keeping both digital and physical copies of the media I paid for the privilege of [s]owning[/s] temporarily licensing.
I totally agree about CDs being redundant after ripping, but I’m always wary that some might class this as piracy. Setting aside complicated and often retroactive analyses of copyright and fair-use regulations, this is a nice compromise:
Quote
I'll donate all the ripped CDs to some charity or library

…or…
Quote
and if I can't pull that off, I'll just destroy them outright. Stupid atoms!
Hahaha.

Back to the main topic, I’m very interested in the results of this – though I imagine they won’t be surprising to people at Hydrogenaudio! In any case, anything that can publicise the effectiveness of perceptual encoding, and possibly debunk a good few myths, to a large readership is very welcome.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: ExUser on 27 June, 2012, 03:07:52 PM
http://www.codinghorror.com/blog/2012/06/concluding-the-great-mp3-bitrate-experiment.html

Quote
Running T-Test and Analysis of Variance (it's in the spreadsheet) on the non-insane results, I can confirm that the 128kbps CBR sample is lower quality with an extremely high degree of statistical confidence. Beyond that, as you'd expect, nobody can hear the difference between a 320kbps CBR audio file and the CD. And the 192kbps VBR results have a barely statistically significant difference versus the raw CD audio at the 95% confidence level. I'm talking absolutely wafer thin here.


Seems pretty well-done. Thanks to Zao for the link.
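For anyone curious what that paired t-test boils down to, here's a minimal stdlib-Python sketch. The ratings below are made up for illustration; the real numbers are in his spreadsheet.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic: mean of per-listener differences
    divided by its standard error."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))

# Hypothetical 1-5 quality ratings from five listeners
cd_ratings = [5, 4, 5, 4, 5]
mp3_128    = [3, 3, 4, 2, 3]
t = paired_t(cd_ratings, mp3_128)  # large t -> confident difference
```

A large |t| relative to the t distribution with n−1 degrees of freedom is what "extremely high degree of statistical confidence" means here.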
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: greynol on 27 June, 2012, 03:24:18 PM
Quote
Beyond that, as you'd expect, nobody can hear the difference between a 320kbps CBR audio file and the CD.

I wouldn't necessarily have such an expectation considering there are members here who can (or at least claim to be able to) regularly ABX 320 CBR against lossless on normal music samples.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: halb27 on 28 June, 2012, 04:22:52 AM
For a mere bitrate comparison it's a pity that VBR 128 kbps wasn't tested.
Current Lame CBR 128 seems to be suboptimal as was found in 3.98's time (-V5 -b128 -B128 being better than CBR 128 on the sample examined then). Lame 3.99 development did not improve upon CBR behavior AFAIK.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: 2Bdecided on 28 June, 2012, 05:57:19 AM
Quote
Lately I've been trying to rid my life of as many physical artifacts as possible.
This is clearly a mental disease, and should be recognised as such. It's the opposite of hoarding (which is what I have!). Up to a point, both are rational responses to some facets of life - the former to the lack of space in modern housing, or the number of times people move these days with the associated hassle of having lots of "stuff" to pack, unpack, arrange, etc; the latter to the transitory nature of parts of life, and the realisation that some things which you thought would always be available, won't be, unless you keep them yourself. Both attitudes to life become a problem (almost a mental health issue) if they start to take over your life or impede more important parts of your life.

Quote
Ripping to uncompressed audio is a non-starter. I don't care how much of an ultra audio quality nerd you are, spending 7× or 5× the bandwidth and storage for completely inaudible "quality" improvements is a dagger directly in the heart of this efficiency-loving nerd, at least.
If you're choosing to keep your own audio files (which itself could be considered eccentric by some in the age of Spotify etc), it's easy to rationalise the need to keep something that will transition to whatever formats/devices arrive in the future - especially when the cost is so low. Hence it's easy to justify FLAC. If you have shorter term goals, it probably doesn't matter.

Encoding to mp3 today is like recording to a decent cassette tape a couple of decades ago. It's a pretty good substitute for the original CD or vinyl - but fast forward 20 years and you'll probably wish you had the original CD or vinyl to make a pristine transfer. FLAC is that pristine transfer.


I heard a DJ last week who should have used lossless. He was DJing for a kids dancing competition. His CD player failed to read one of the kid's CDs. No problem - he had the same track on his laptop. Problem was, the kid was using the version without vocals (for reasons that will become apparent), and he only had the vocal version. Ah, no problem again - the vocal cut feature in the software would sort that. If only it hadn't been an mp3. Vocal cut only works (sometimes) on the highest quality mp3s, and this one wasn't. The vocal bled through as horrible mp3 artefacts. It was so bad that he gave up and switched the vocal cut off. Just at the point where the lyrics said something like "...and you're no fucking use to me..." - as the five year old girl continued through her dancing routine. I doubt they'll be using that DJ again.


Fascinating comparison though. It's always surprising how good even low-ish bitrate mp3 sounds. Better than 99% of digital radio and TV in most of the world. And of course I use it every day.

Cheers,
David.

P.S. It's interesting that some of the comments suggest this track is a bad choice for a codec test because it's old. While it's hardly a "codec killer" for the current lame mp3 implementation, the stereo effects, tape noise, soft transients, and synths are all things that mp3 encoders have choked on in the past. Plus the dynamic range of parts of this track make a welcome change from modern pop which is trashed by dynamic range compression whether you use lossy compression or not. The fact that some of the effects on the raw track sound a bit like codec artefacts doesn't help in a non-referenced comparison like this, but for typical codec testing that's often the kind of thing that makes the codec misbehave. So, historically at least, this is a pretty good mp3 test track.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: lvqcl on 28 June, 2012, 06:32:12 AM
Current Lame CBR 128 seems to be suboptimal as was found in 3.98's time (-V5 -b128 -B128 being better than CBR 128 on the sample examined then). Lame 3.99 development did not improve upon CBR behavior AFAIK.

Changelog for 3.99 beta 0:
Quote
All encoding modes use the PSY model from new VBR code, addresses Bugtracker item [ 3187397 ] Strange compression behavior

However, I'm under the impression that it sometimes does more harm than good - at least for low bitrates.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: krabapple on 28 June, 2012, 03:21:07 PM
P.S. It's interesting that some of the comments suggest this track is a bad choice for a codec test because it's old. While it's hardly a "codec killer" for the current lame mp3 implementation, the stereo effects, tape noise, soft transients, and synths are all things that mp3 encoders have choked on in the past. Plus the dynamic range of parts of this track make a welcome change from modern pop which is trashed by dynamic range compression whether you use lossy compression or not. The fact that some of the effects on the raw track sound a bit like codec artefacts doesn't help in a non-referenced comparison like this, but for typical codec testing that's often the kind of thing that makes the codec misbehave. So, historically at least, this is a pretty good mp3 test track.



Would you mind posting this over there? The ignorance and snobbery in the comments there really beg for a response. When people insist that only an orchestral or symphonic work will do as a 'real' test of a codec, I can't help recalling that some famous 'codec killers' consisted of solo harpsichord, castanets, or entirely synthetic club music.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: mjb2006 on 28 June, 2012, 04:24:49 PM
When people insist that only an orchestral or symphonic work will do as a 'real' test of a codec, I can't help recalling that some famous  'codec killers' consisted of solo harpsichord , castanets, or entirely synthetic club music.

Well yes, only testing with one specific type of music like orchestral/symphonic is bogus unless the goal is to test the codec's quality in regard to that type of music alone, rather than its quality in general. But likewise, only using killer samples to evaluate a codec's general quality is inappropriate, and only relevant to the extent that 1. someone is sensitive to pre-echo (or whatever) and 2. their collection has moments of solo castanets/harpsichord/etc.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: greynol on 28 June, 2012, 04:30:13 PM
only using killer samples to evaluate a codec's general quality is inappropriate

+1 with a bullet!
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: ExUser on 28 June, 2012, 05:56:27 PM
+1 with a bullet!

?
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: greynol on 28 June, 2012, 05:58:33 PM
One of these:
(http://www.riflebarrels.com/images/50brass.JPG)
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: JJZolx on 28 June, 2012, 06:17:32 PM
Was this experiment done using ABX?
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: greynol on 28 June, 2012, 06:27:52 PM
You mean did I actually hear eig, trumpet, herding calls or whatever other single sample some "expert" uses to suggest which codec and bitrate I should use for my entire library cry out as it was put to death?
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: db1989 on 28 June, 2012, 06:38:35 PM
It’s more likely that JJZolx is asking about the titular experiment, not questioning your scepticism about exceptional samples being extrapolated as representative of entire libraries. In which case, reading the page is predictably instructive, but for convenience:
Quote
Behold The Great MP3 Bitrate Experiment!

As proposed on our very own Audio and Video Production Stack Exchange, we're going to do a blind test of the same 2 minute excerpt of a particular rock audio track at a few different bitrates, ranging from 128kbps CBR MP3 all the way up to raw uncompressed CD audio. Each sample was encoded (if necessary), then exported to WAV so they all have the same file size. Can you tell the difference between any of these audio samples using just your ears?

1. Listen to each two minute audio sample
[links]

2. Rate each sample for encoding quality
Once you've given each audio sample a listen – with only your ears please, not analysis software – fill out this brief form and rate each audio sample from 1 to 5 on encoding quality, where one represents worst and five represents flawless.
So, no: ABX was not used, but the test was blind nonetheless.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: JJZolx on 28 June, 2012, 06:39:50 PM
Was that in response to my question? I don't get it. I'm asking how the public test was performed by the participants.

Edit: yeah, signals crossed
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: greynol on 28 June, 2012, 06:43:15 PM
Seriousness: +1 (no bullet this time)
Playful banter: 0
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: halb27 on 28 June, 2012, 06:52:37 PM
In one of the comments the author noted that many of the listeners assigned each of the numbers 1 to 5 exactly once across the contenders. This suggests that many participants did a ranking instead of a quality judgement. Having looked at the spreadsheet of the listening results, I get the same impression.
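That pattern is easy to screen for: a submission that uses each score exactly once looks like a ranking rather than a rating. A quick Python sketch (the example submissions are hypothetical):

```python
def looks_like_ranking(ratings):
    """True if a listener used each score 1-5 exactly once,
    i.e. probably ranked the five samples instead of rating them."""
    return sorted(ratings) == [1, 2, 3, 4, 5]

assert looks_like_ranking([2, 4, 1, 5, 3])      # a ranking in disguise
assert not looks_like_ranking([5, 4, 5, 4, 3])  # a plausible genuine rating
```

Of course this can't distinguish a ranker from a rater whose honest ratings just happen to be a permutation of 1-5, which is part of the problem.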
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: krabapple on 28 June, 2012, 08:34:58 PM
When people insist that only an orchestral or symphonic work will do as a 'real' test of a codec, I can't help recalling that some famous  'codec killers' consisted of solo harpsichord , castanets, or entirely synthetic club music.

Well yes, only testing with one specific type of music like orchestral/symphonic is bogus unless the goal is to test the codec's quality in regard to that type of music alone, rather than its quality in general. But likewise, only using killer samples to evaluate a codec's general quality is inappropriate, and only relevant to the extent that 1. someone is sensitive to pre-echo (or whatever) and 2. their collection has moments of solo castanets/harpsichord/etc.


Some of the audio snobs on that thread assume that symphonic music must be the hardest music to encode...and it's a common assumption.  That's what I was addressing.

Beyond that, I'm not following the point of your reply, nor greynol's thumbs-up. I would expect a codec that had gone through iterative improvement, involving (but not restricted to) serial challenge from different 'killer' samples with different artifacts and coming from different genres, to perform generally better, subjectively as well as objectively, than a codec that had not been put to such tests. The audio snobs, on the other hand, would seem to hold that codecs should be tuned to symphonic music preferably, for best results generally.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: db1989 on 28 June, 2012, 08:53:08 PM
As I interpret their posts, mjb2006 and greynol want exactly what you do, i.e. to discourage conclusions (proclamations?) based upon single or small groups of killer sample(s) or specific genre(s), as their specificity makes for narrow applicability to general use. I don’t think you folks actually have anything to disagree about!
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: greynol on 28 June, 2012, 09:27:09 PM
We already know the pointlessness of catering to audiophiles since they are never satisfied (which is practically the working definition of an audiophile).  This is without considering their rampant lack of objectivity.

My general concerns that people often draw sweeping generalizations from a small data set are once again front and center here.  Forget about killer samples; lossy codecs are not universally transparent across all music to all people.  It's foolish to think a test like this proves otherwise, not that I believe people here think that.

With regards to what genres are most difficult to encode, I think we've discussed this before.  IIRC when looking at average bitrate at any given VBR quality level on a per genre basis, metal requires far more data than classical.  In terms of samples and positive ABX results submitted to the forum indicating a lack of transparency at high bitrates, again it is usually metal and rarely classical music.

That said, I still applaud any effort that attempts to gather objective data, especially when it places an emphasis on the importance that the data actually be objective.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: 2Bdecided on 29 June, 2012, 05:45:11 AM
Would you mind posting this over there?
Good grief, no...
http://english.stackexchange.com/questions/66460/origin-of-do-not-argue-with-idiots

...though my reason for resisting is that life can be frustrating enough without going looking for more of it.

Quote
The ignorance and snobbery in the comments there really begs for a response.  When people insist that only an orchestral or symphonic work will do as a 'real' test of a codec, I can't help recalling that some famous  'codec killers' consisted of solo harpsichord , castanets, or entirely synthetic club music.
As well as the experience here, there are AES conference papers that say the same. A quick search couldn't find them though. It's quite telling that the original EBU SQAM CD included so many orchestral instruments and some clips of classical music, while more recent official codec tests don't, because codecs cope with almost all of these just fine (the specific solo harpsichord recording on that CD being the notable exception for a long time).

Cheers,
David.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: Arnold B. Krueger on 29 June, 2012, 09:28:28 AM
http://lifehacker.com/5920793/the-great-mp3-bitrate-experiment
http://www.codinghorror.com/blog/2012/06/the-great-mp3-bitrate-experiment.html

I guess lifehacker has a larger audience than hydrogenaudio... could be interesting


Ironically, the source music for this alleged test is "We Built This City On Rock And Roll" by Jefferson Starship first released August 1, 1985. If memory serves this  recording is pretty highly processed, even to the point where its original CD track had some MP3-like artifacts.

So without getting involved with Blender Magazine's nomination and VH1's seconding of it as "One of the worst songs ever released", it seems like one of the worst tracks that could ever be chosen for the purpose.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: nevermind on 02 July, 2012, 11:21:01 PM
Sorry to resurrect this thread (without waiting at least several years), but I was having a look at the Excel data from this test, thinking that maybe it would be more interesting if I removed some of the people who rated the lowest sample higher than the CD, and I think I have found something unusual. It seems that if you look only at people who rated the 128 MP3 > CD, there is a trend for them to rate in reverse order, i.e. 128 > 160 > 192 > 320 > CD. I think this might be interesting because this trend seems to occur in ABX tests from time to time.

Maybe someone can calculate whether this is statistically significant; it is sort of getting over my head.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: 2Bdecided on 03 July, 2012, 06:52:18 AM
Like someone else said, I think some testers got a bit confused at how they were supposed to be rating these things.

Though I find it amazing that anyone could perfectly rate them (except by accident or cheating). I guess I just have cloth ears!

Cheers,
David.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: splice on 03 July, 2012, 10:04:03 AM
One of these:
(http://www.riflebarrels.com/images/50brass.JPG)


Tch.  Kids these days... obviously never read Billboard.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: mzil on 03 July, 2012, 10:40:33 AM
maybe it would be more interesting if I removed some of the people who rated the lowest sample higher than the CD, and I think I have found something unusual....


See: 1.1 "Discarding unfavorable data" (http://en.wikipedia.org/wiki/Misuse_of_statistics)
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: db1989 on 03 July, 2012, 10:47:38 AM
There is a fundamental difference between data that are unfavourable and data that do not meet the requirements of the test.

Many users submitted their data under the incorrect assumption that the scale of 1–5 was a rank of their preference for each individual sample, with each value being useable only once. In actuality, the scale was supposed to be used as their rating of perceived quality for each sample, with no limit to the number of occurrences.

So, I don’t think your reference is relevant.

Whether or not it’s possible to confidently identify the data that do not meet the actual specification, to discard them, and to retain sufficient numbers to draw a useful conclusion is another question entirely.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: mzil on 03 July, 2012, 01:26:21 PM
You shouldn't cherry pick raw data under any circumstances in a properly unbiased, double-blind test. It makes the test suspect, regardless of the test conductor's intentions, good or evil. If there was poor wording or a misunderstanding in the instructions, then one needs to conduct a fundamentally new test, not discard raw data one "believes" to be compromised.

[In different circumstances I'd accept using one test in an attempt to find certain "gifted" test subjects, who are then retested, however. This could be used, for instance, to find "golden-eared" listeners.]
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: ExUser on 03 July, 2012, 01:47:07 PM
Disregarding all listeners who rated WAV as less than 5 gives us this chart (based on his Excel file, can't be bothered to make it more "accurate")

(http://fb2k.net/ha/ch-smart.png)

And our original:

(http://fb2k.net/ha/ch-orig.png)

Note how that removes much of the preference for the first option, and brings all the other options roughly in line with what we would expect.
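For illustration, the filtering step amounts to something like this. The rows and column order below are made up, not taken from the actual spreadsheet:

```python
# Hypothetical submissions, one rating per sample; assumed column order:
# (128 CBR, 160 VBR, 192 VBR, 320 CBR, WAV)
rows = [
    (2, 3, 4, 4, 5),
    (3, 4, 4, 5, 5),
    (5, 4, 3, 2, 1),   # dropped: rated the uncompressed WAV below 5
]

# Keep only listeners who gave the WAV reference a perfect score
kept = [r for r in rows if r[-1] == 5]

# Per-sample mean rating over the retained listeners
means = [sum(col) / len(kept) for col in zip(*kept)]
```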
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: db1989 on 03 July, 2012, 01:51:46 PM
Disclaimer: meandering musings

You can't cherry pick raw data under any circumstances. It makes the test invalid regardless of your intentions, good or evil. If there was poor wording or a misunderstanding in the instructions, then you need to conduct a fundamentally new test, not discard raw data you "believe" to be compromised.

I don’t disagree in principle. Hail science! I was just pointing out that, however scientifically tenuous it might be, excluding data because they were submitted in the wrong format is not exactly equivalent to excluding data because they aren’t conducive to someone’s ulterior motive(s). At the very least, it’s not equivalent ethically: one is done in an effort to improve the reliability of a conclusion, whereas the other is done merely out of cynical self-interest.

Scientific ethics aside (just for a moment! ), is such filtering of incorrectly calibrated data even likely to be possible in any real-life study with any probability of preserving its objective reliability? I lack the experience to answer either way, and I suspect that it’s better avoided anyway due to the same concerns that you’ve raised – but in this case, I don’t think it’s very likely that one could do it. That was what I meant by my closing sentence, although I should have given it more consideration.

Of course, as you implied, this question should never arise: collection of data should be designed so as to preclude any of them being ‘incorrect’ or ambiguous. In this specific case, the take-home message is that instructions must be clear and unambiguous, so that respondents can provide useful data. It’s a shame how this test is somewhat marred by its shortcomings in that area and, as I said, how this confounding factor can’t be removed post hoc.

Disregarding all listeners who rated WAV as less than 5
Since you’ve just reminded me of something I wondered about earlier: how about disregarding all respondents whose data sets included each number only once? Or am I getting desperate here?
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: ExUser on 03 July, 2012, 02:11:13 PM
Since you’ve just reminded me of something I wondered about earlier: how about disregarding all respondents whose data sets included each number only once? Or am I getting desperate here?
(http://fb2k.net/ha/ch-seq.png)
I also excluded all data sets consisting of one number for all entries.

Combining our approaches (restricted to WAV=5, only entries with duplicates) does not provide good results either: [4.11, 3.79, 5, 3.79, 3.52]
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: Porcus on 03 July, 2012, 03:28:27 PM
You shouldn't cherry pick raw data under any circumstances in a properly unbiased, double blind test. It makes the test suspect,  regardless of the test conductor's intentions, good or evil. If there was poor wording or a misunderstanding in the instructions, then one needs to conduct a fundamentally new test, not discard raw data one "believes" to be compromised.


Well ... opinions certainly differ on that one. As far as I know, there is no universally agreed-upon treatment of outliers.

However, if the null is random ranking, then various statistical models could cope with those who rank the other way around. You could formulate the alternative hypothesis to be H1: after possibly switching order of rankings, they are still more concordant with bitrate than what is consistent with the null. But if you start looking at data, you are mining, and that is not without issues either.
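One concrete way to measure that concordance is a rank correlation such as Kendall's tau, taking |tau| so that the "backwards" rankers count as concordant too. A stdlib sketch with hypothetical listeners:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall rank correlation for short lists (no ties assumed)."""
    conc = disc = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    return (conc - disc) / (conc + disc)

bitrates = [128, 160, 192, 320, 1411]   # kbps; 1411 = raw CD audio
forward  = [1, 2, 3, 4, 5]              # ratings that track bitrate
backward = [5, 4, 3, 2, 1]              # the reverse-order listeners
# Under the H1 above, |tau| treats both groups as equally concordant
```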


Now for designing a new test, you are of course free to look at your old data with any creativity you can imagine. You are essentially looking for any pattern that could be tested.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: mzil on 03 July, 2012, 04:24:47 PM
http://news.change.org/stories/cherry-picking-vs-the-scientific-method

Quote
As far as I know, there is no universally agreed-upon treatment of outliers
You count them.

As always, if you discover there is a flaw in the test design, then you chuck ALL the data in the trash bin and design a new test. You don't go back and cherry-pick out (keep) only the data you feel, with your "completely objective and unbiased view", is "legit".
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: db1989 on 03 July, 2012, 04:30:59 PM
Shall I just repeat what I’ve already said about your allegation that the exclusion of incorrectly formatted data – which was not done by the actual researcher, it must be emphasised – is equivalent to cynical cherry-picking in favour of an ulterior motive? Or are you the only one who gets to repeat yourself?

I don’t disagree in principle that one should always endeavour to solve problems at the earliest/proper point, i.e. the experimental design in this case. I was just musing hypothetically. That last word is important, since it’s me who’s twittering away to myself here, rather than the researcher having actually done this or anything like it! Looking back, I do not agree with the filtering suggested by nevermind, which began all of this, but again: that’s different from asking whether one can filter data that were not formatted correctly. Which, again, isn’t something I think can be done reliably – but it was just a hypothetical question about the possibility of putting a Band-Aid on a less than optimally designed test, not prodding something in a direction according to self-interest.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: mzil on 03 July, 2012, 04:33:20 PM
I think there is a belief here that "as long as the motives are pure, and unmotivated by desired outcome", then cherry picking is "OK". I don't feel that way. There could be things which are unforeseen by all of us.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: ExUser on 03 July, 2012, 04:39:35 PM
Data are data. If there has been some kind of procedural error and it's not feasible to re-run the experiment, it's entirely legit to restrict your data down to the valid subset, if there is some easy way to do so. If, due to some error, only 10% of your data are actually valid, and you can identify that 10% post hoc, there is no reason not to analyze that 10%. It might redeem the entire experiment.

Ideally, yes, you re-run the experiment and try to ensure that 100% of your data are valid. This is not always feasible, nor should it be absolutely required.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: saratoga on 03 July, 2012, 05:05:09 PM
As always, if you discover there is a flaw in the test design, then you chuck ALL the data in the trash bin and design a new test.


While I understand your motivation, this is basically just your unsupportable opinion.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: ExUser on 03 July, 2012, 05:23:25 PM
basically just your unsupportable opinion
Careful, we've stepped below science into its superstructure: philosophy of science. Here there be dragons: terrible things that could render all the lovely objectivity around here into little more than "unsupportable opinion"...
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: mzil on 03 July, 2012, 06:18:17 PM
If there has been some kind of procedural error and it's not feasible to re-run the experiment, it's entirely legit to restrict your data down to the valid subset, if there is some easy way to do so.
...

Huh? I suspect you don't really mean this, unless I am just completely mis-reading it. The feasibility or ease of re-running a test doesn't make a difference as to the legitimacy of the original test.

To paraphrase what you have written, one could say "If it is difficult to re-run a test, then we should accept at least the subset of the data that we believe wasn't compromised, due to the known error", [as long as we still have a large enough sample left over to make the results statistically significant, I guess]. "If it is easy to re-run the test, however, then the original data is suspect, should be ignored, and we should do the re-test."

The difficulty in re-running a test, but this time without the design flaw, doesn't change whether the original test data is legit or not. It either is or it isn't, regardless of the time needed/ease/difficulty in conducting a new test without the design flaw. Right?
---

"Cherry picking" is a type of confirmation bias, more accurately called a "fallacy of suppressed evidence", and may very well be unconscious in nature, despite its sinister-sounding name. I wasn't, however, trying to speak poorly of anyone here or question their motives, but I seem to be alone here in thinking that claims of "pure and unbiased" motivation, which of course all scientists think apply to them, don't suddenly make cherry picking "acceptable". Everyone thinks their selection process is "sound, pure, and motivated only by the unbiased pursuit of truth".

As it says here (http://www.iep.utm.edu/fallacy/#SuppressedEvidence), one's motivation may indeed be pure and honest, but the fallacy name, even if not a very good name, still applies:

"If the relevant information is not intentionally suppressed but rather inadvertently overlooked, the fallacy of suppressed evidence also is said to occur, although the fallacy’s name is misleading in this case. The fallacy is also called the Fallacy of Incomplete Evidence and Cherry-Picking the Evidence."

I unfortunately don't have any more time on my hands to devote to this, so I'm outta here.
Happy July 4th everyone!
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: saratoga on 03 July, 2012, 07:19:52 PM
"Cherry picking" is a type of confirmation bias, more accurately called a "fallacy of suppressed evidence", and may very well be unconscious in nature, despite its sinister-sounding name. I wasn't, however, trying to speak poorly of anyone here or question their motives, but I seem to be alone here in thinking that claims of "pure and unbiased" motivation, which of course all scientists think apply to them, don't suddenly make cherry picking "acceptable". Everyone thinks their selection process is "sound, pure, and motivated only by the unbiased pursuit of truth".


I don't think anyone is saying that cherry picking doesn't exist. I think the point is that your remarks about cherry picking are not really relevant in this particular instance.

basically just your unsupportable opinion
Careful, we've stepped below science into its superstructure: philosophy of science.


Which is why it is incorrect to make universal assertions about how things must be done.
Title: Jeff Atwood's "Great MP3 Bitrate Experiment"
Post by: Porcus on 03 July, 2012, 07:59:08 PM
Quote
As far as I know, there is no universally agreed-upon treatment of outliers
You count them.

Before or after you have defined them?


As always, if you discover there is flaw it the test design, then you chuck ALL the data in the trash bin and re-design a new test.

Well go tell that to a paleontologist 


No, seriously: have a look at http://en.wikipedia.org/wiki/Meta-analysis#Disadvantages_and_weaknesses .