From me to my Valentine: A chocolate box of statistical paradoxes
I don’t often write about my subject on OurWarwick, but today I can’t help myself: this blog will be all about Stats. In particular, I’m talking about statistical paradoxes with a Valentine’s Day twist, so crack open some bubbles, get the ready meal in the oven and show your statistical side some love.
It’s Valentine’s Day, so I’m obviously going to turn this day of romance into an example for Stats. Suppose you and your significant other are trying to choose between a romantic, candlelit dinner at Pizza Hut or Burger King (your other half has really pushed the boat out). You really want to go to Pizza Hut but they want Burger King, so of course, as a statistician, you do the only sensible thing and ask 57 friends for their opinion. You ask 40 friends if they like Burger King and 17 friends if they like Pizza Hut, and break down your results by whether each friend is older or younger than you.
Triumphantly, you declare that both your older friends and your younger friends like Pizza Hut more than Burger King. Devastatingly, however, you also notice that when you combine the two age groups, people overall seem to like Burger King more. How is this possible?
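This is known as Simpson’s paradox, and it’s all to do with sample sizes. You asked far more of your younger friends whether they liked Pizza Hut, but far more of your older friends whether they liked Burger King. Each combined figure is a weighted average of the two age groups, weighted by how many friends you asked in each group. So if one age group likes both restaurants more than the other does, whichever restaurant you mostly asked that group about gets a boost in the combined figures, even though it loses within each group.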
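To see the paradox with actual numbers, here is a small Python sketch. The counts are made up for illustration (the original survey results aren’t reproduced here); they only keep the totals of 40 friends asked about Burger King and 17 asked about Pizza Hut, and the names `survey`, `group_rates` and `overall_rate` are hypothetical.

```python
from fractions import Fraction

# Hypothetical counts, NOT real survey data: (liked, asked) for each age group.
survey = {
    "Burger King": {"older": (24, 30), "younger": (2, 10)},
    "Pizza Hut": {"older": (3, 3), "younger": (4, 14)},
}

def rate(liked, asked):
    """Exact proportion of friends asked who said they like the restaurant."""
    return Fraction(liked, asked)

def group_rates(restaurant):
    """Proportion who like the restaurant, separately for each age group."""
    return {group: rate(*counts) for group, counts in survey[restaurant].items()}

def overall_rate(restaurant):
    """Proportion who like the restaurant once the age groups are pooled."""
    liked = sum(l for l, _ in survey[restaurant].values())
    asked = sum(a for _, a in survey[restaurant].values())
    return rate(liked, asked)

for restaurant in survey:
    by_group = {g: f"{float(r):.0%}" for g, r in group_rates(restaurant).items()}
    print(restaurant, by_group, f"overall {float(overall_rate(restaurant)):.0%}")
```

With these numbers Pizza Hut wins in both age groups (100% vs 80% for older friends, about 29% vs 20% for younger ones), yet Burger King wins overall (65% vs about 41%), because most of the Burger King answers came from the keener older friends. Using `Fraction` keeps every comparison exact.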
Correlation does not equal causation
The date is going really well; in fact, all of a sudden your partner gets down on one knee and brings out a shiny diamond ring. The other dinner guests go wild. Getting engaged in a Burger King isn’t how you imagined it, but you’ll take it. You start thinking about marriage in general: how many people are getting married nowadays, and what your chance of divorce will be. You do some research, and it looks like you can predict the divorce rate from the per-capita consumption of margarine. That’s right: maybe if you cut down on your margarine intake you can increase the longevity of your marriage, and I have the graph to prove it.
Of course, unless you’re maliciously feeding your significant other copious amounts of margarine, there is no plausible way margarine could cause divorce. You can’t deny that these two datasets are correlated, but that doesn’t mean there is any causation behind it: the link between them is pure coincidence. There are so many datasets in the world that you can always find baseless correlations, and some great examples are shown on this website.
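One way to convince yourself that chance alone produces convincing-looking correlations is to generate lots of unrelated random series and keep whichever one best matches a target. The Python sketch below does just that. Everything in it is random numbers standing in for real data (these are not actual margarine or divorce figures), and the choices of 10 yearly values, 1,000 candidate series and the seed are arbitrary assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

rng = random.Random(2024)  # fixed seed so the run is repeatable
years = 10

# Random stand-in for a real series such as a yearly divorce rate
target = [rng.gauss(0, 1) for _ in range(years)]

# 1,000 made-up "datasets" generated independently of the target
candidates = [[rng.gauss(0, 1) for _ in range(years)] for _ in range(1000)]

# Keep the candidate whose correlation with the target is strongest
best = max(candidates, key=lambda c: abs(pearson(target, c)))
best_r = pearson(target, best)
print(f"strongest correlation found purely by chance: r = {best_r:.2f}")
```

None of the candidate series has anything to do with the target, so any strong correlation the search turns up is coincidence; with enough datasets to sift through, finding one is all but guaranteed.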
The Birthday Paradox
The dinner date is over and it’s time to start planning the wedding you have coming up. I imagine most people would want a special day to celebrate, and you don’t want to share your anniversary with another member of your family. Between you and your partner you know 23 married couples (counting yourselves). What’s the likelihood that at least two couples share the same wedding date?
The chance that your wedding date matches that of one particular couple, say Uncle Harry and Aunty Sally, is low: 1/365, assuming every date of the year is equally likely and ignoring leap years.
However, how many pairs of couples could possibly share an anniversary if there are 23 of you? If you’re familiar with nCr notation, there are 23C2 = (23 × 22)/2 = 253 possible pairs of couples to consider. The calculations are a little intimidating, so I’ll leave you to find out more here. Let’s just cut to the answer: if you know 23 couples, the chance that at least two of them share the same anniversary is just over a half (about 50.7%).
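For anyone curious about the intimidating calculation, a standard approach is to work out the chance that all 23 wedding dates are different and subtract it from 1. Here is a minimal Python sketch, assuming every one of 365 dates is equally likely and ignoring leap years; the function name `p_shared_date` is just an illustrative choice.

```python
from math import comb, prod

def p_shared_date(couples, days=365):
    """Chance that at least two of `couples` share a wedding date,
    assuming all `days` dates are equally likely (no leap years)."""
    # Chance that every couple has a different date:
    # (365/365) * (364/365) * ... * ((365 - couples + 1)/365)
    p_all_different = prod((days - k) / days for k in range(couples))
    # "At least one shared date" is the complement of "all different"
    return 1 - p_all_different

print(comb(23, 2))                  # 253 pairs of couples to compare
print(round(p_shared_date(23), 4))  # 0.5073
print(round(p_shared_date(22), 4))  # just under a half
```

Running it for 22 couples gives a probability just below a half, so 23 is the smallest group for which a shared anniversary becomes more likely than not.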
You’d better check your diary a little more carefully if you’re desperate to have a unique anniversary.