Monday, 2 September 2013

Survivorship Bias in Science

Let’s imagine for a moment that uncertain job prospects and too much caffeine push me over the edge and I gather up every monkey in the world and shut them in a room with a bunch of computers. Sometime later, I return to a lot of flung poo and, among all the random strings of letters typed by the unfortunate (and now cannibalistic) monkeys, I discover that one capuchin has typed the sentence: “HELLO KAT”.

This is a version of the Infinite Monkey Theorem, which basically states that a monkey hammering on a keyboard for an infinite amount of time will eventually type out the complete works of William Shakespeare. It’s all about probabilities.

Give a million monkeys ten years, and the probability that one of them will type ‘HELLO KAT’ entirely by chance is roughly 1 in 2. About the same as guessing the outcome of a coin toss*. Throw in all the other 9-character sentences that can be made from the letters on a keyboard, and the likelihood that NONE of the monkeys types something meaningful by chance is practically zero.

But what happens if I now take that one, single monkey, and I publish a paper saying that I have found the world’s first literate capuchin? Disregarding all the random sentences typed by all the other monkeys, I proclaim that there was only a 1 in 1.8 × 10^7 chance that my monkey could have randomly typed ‘HELLO KAT’. Those odds are so slim that surely this particular monkey must have intentionally hit those particular keys?

This is an example of Survivorship Bias, in which focussing only on the successes while ignoring the failures can lead you to draw incorrect conclusions.

The same thing happens when it comes to careers. I can’t count the number of times I’ve listened to a leading scientist explain how they made it to the top using their formula of:

(Being smart) x (Choosing the right field) (Hard Work) + (Networking) = Success

The thing is, this doesn’t take into account all the people who are plugging the exact same numbers into the exact same formula and coming up with entirely different results. When something is heavily dependent on chance and luck, you can’t draw conclusions based only on the survivors; you need to check the graveyard too. The road to permanent scientific positions is littered with the tombstones of postdocs who have fallen along the way, and I can’t believe I didn’t notice them until the point at which I was down on my hands and knees, scrabbling around in the dirt.

The stupid thing is that others did try to warn me when I started out, but I didn’t want to listen. Looking back, I wish I hadn’t been so quick to disregard the experiences of older scientists who found themselves in the position I am in now. It was all too easy to presume that they’d done something wrong; that they hadn’t tried hard enough or they simply weren’t very good at science. Understanding the role that luck plays in a scientific career wouldn’t have stopped me from becoming a scientist, but it might have made me less of a dick.

Now that I am picking myself back up and heading off for pastures new, I am experiencing yet another example of Survivorship Bias. People who know that I write science fiction novels in my spare time keep sending me articles about self-publishing success stories. “Why are you trying to find a traditional publisher when E. L. James self-published Fifty Shades of Grey, and look at her now!” What they don’t realise is that, for every wannabe author who becomes famous from self-publishing, there are hundreds of thousands who fail miserably.

When it is a scientist who tries to tell me how to be successful as a writer, I ask them if they would self-publish a scientific paper that had been rejected by a few dozen journals. It’s not the same, they say, science isn’t subjective like writing. You either do it right or wrong. Then I sit back and wait for them to do everything right and find that it still isn’t quite enough.

*Let’s just say there are 50 keys on my keyboard. So the probability of that monkey hitting the first ‘H’ is 1/50. The probability that the ‘H’ will be followed by ‘E’ is (1/50) × (1/50) and so on. I worked it out, and the overall probability is 1 in 1.9 × 10^15, which in the grand scheme of things is extremely close to zero. But let’s say that a monkey can type at a speed of 200 characters a minute and it manages to type around 100 million strings of 9 characters over one year. If we work out the probability that the monkey will type ‘HELLO KAT’ at some point over the year, it works out at 1 in 1.8 × 10^7, which is still very, very unlikely. But what if we give a million monkeys ten years? Now the probability that one will type ‘HELLO KAT’ entirely by chance is up to 1 in 2.3. Entirely doable.
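
If you want to check the monkey maths yourself, here is a rough sketch in Python. The 50 keys and 200 characters a minute come from the footnote; the names (`p_single`, `attempts_per_year`, `p_at_least_one`) are made up for illustration, and I am assuming that every keystroke starts a fresh 9-character attempt and that attempts are independent. The results should land close to the figures quoted above, though the exact decimals shift with how you round and count.

```python
import math

KEYS = 50                # keys on the keyboard (the footnote's assumption)
TARGET = "HELLO KAT"     # 9 characters, counting the space
CHARS_PER_MINUTE = 200   # typing speed assumed in the footnote

# Chance that one particular 9-character string is exactly the target.
p_single = (1 / KEYS) ** len(TARGET)

# Assumption: each keystroke begins a new 9-character window, so one
# year of non-stop typing gives roughly one attempt per keystroke.
attempts_per_year = CHARS_PER_MINUTE * 60 * 24 * 365


def p_at_least_one(attempts):
    """Probability of at least one hit in `attempts` independent tries.

    This is 1 - (1 - p)^n, written with log1p/expm1 so that the tiny
    value of p is not lost to floating-point rounding.
    """
    return -math.expm1(attempts * math.log1p(-p_single))


one_monkey_one_year = p_at_least_one(attempts_per_year)
million_monkeys_ten_years = p_at_least_one(attempts_per_year * 10 * 1_000_000)

print(f"single string:             1 in {1 / p_single:.2e}")
print(f"one monkey, one year:      1 in {1 / one_monkey_one_year:.2e}")
print(f"million monkeys, 10 years: 1 in {1 / million_monkeys_ten_years:.1f}")
```

The point of the last line is the survivorship trap: the odds for the whole troop are near a coin toss, while the odds quoted for the one "literate" monkey look astronomically small.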
