I was doing a little light reading on my Saturday night in my Oxford Dictionary of Statistics by Graham Upton and Ian Cook and came across this definition:
nonsense correlation: A term used to describe a situation where two variables (X and Y, say) are correlated without being causally related to one another. The usual explanation is that they are both related to a third variable, Z. Often the third variable is time. For example, if we compare the price of a detached house in Edinburgh in 1920, 1930, ... with the size of the population of India at those dates, a 'significant' positive correlation will be found, since both variables have increased markedly with time. The first comprehensive study of nonsense correlation was undertaken in 1926 by Yule, who considered the apparent connection between the fall in Church of England marriages and the concurrent increase in life expectancy. See also GOOSEBERRY BUSHES; RUM CONSUMPTION
I had no idea there was an actual statistical term for this - one that I could look up in a real statistics dictionary written by a professor at the University of Essex, no less. Or that there might be funded comprehensive studies for such things.
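The dictionary's point about time being the hidden third variable is easy to try out. Here is a minimal sketch in Python, assuming two completely made-up series (the numbers are invented for illustration, not real Edinburgh house prices or Indian population figures) that each rise with the year plus independent noise. The helper names `pearson` and `detrend` are my own, not from any library. The raw correlation comes out high, and it should largely vanish once the linear time trend is removed from both series.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def detrend(ts, ys):
    """Residuals of ys after removing a least-squares straight line in ts."""
    n = len(ts)
    mt = sum(ts) / n
    my = sum(ys) / n
    slope = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum(
        (t - mt) ** 2 for t in ts
    )
    return [y - (my + slope * (t - mt)) for t, y in zip(ts, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
years = list(range(1920, 2020))

# Hypothetical series: both grow with time, with independent noise.
house_price = [500 + 40 * (t - 1920) + rng.gauss(0, 300) for t in years]
population = [250 + 8 * (t - 1920) + rng.gauss(0, 50) for t in years]

# X and Y look strongly related because both follow Z = time.
raw_r = pearson(house_price, population)

# With the time trend removed, only the independent noise is left.
resid_r = pearson(detrend(years, house_price), detrend(years, population))

print(f"raw correlation:       {raw_r:.3f}")
print(f"detrended correlation: {resid_r:.3f}")
```

Removing a straight-line trend is only one way to control for time, and it works here because the fake data were built with linear trends. Real series would likely need more care.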
My favorite example of the confusion between correlation and causation is the Pirate Effect: there are fewer pirates on the oceans, global average temperatures are increasing, and therefore global warming is caused by a lack of pirates.
And what about the studies that suggest lack of dental health is a causal factor in heart disease? Did they consider that Z = the individual's attention to general health maintenance?
And here's a professor who keeps a list of headlines that suggest causal relationships when the research was only correlational, to teach his students that correlation != causation.
But what I really want to know is how do I get funding for a nonsense correlation study? I could come up with an endless supply of possibilities.
I think my first study shall compare the increase in the number of colors in the Oracle installation screens with the parallel increase in the number of poorly configured databases. Clearly, multiple colors combined with pictures must affect the installer's ability to make appropriate database configuration choices. Perhaps we could isolate the specific colors that cause this issue, remove them from the tool and improve the condition of databases everywhere.
Or maybe not ...
(Sorry - I get cranky when I read statistics on Saturday night. Think I'll read about rum consumption next and start a self-funded study of nonsense correlation.)