Assumptions Part 2: Homogeneity of Variance/Homoscedasticity

My last blog was about the assumption of normality, and this one continues the theme by looking at homogeneity of variance (or homoscedasticity to give it its even more tongue-twisting name). Just to remind you, I’m writing about assumptions because this paper showed (sort of) that recent postgraduate researchers don’t seem to check them. Also, as I mentioned before, I get asked about assumptions a lot. Before I get hauled up before a court for self-plagiarism I will be up front and say that this is an edited extract from the new edition of my Discovering Statistics book. If making edited extracts of my book available for free makes me a bad and nefarious person then so be it.

Assumptions: A reminder

Now, I’m even going to self-plagiarize my last blog to remind you that most of the models we fit to data sets are based on the general linear model (GLM). This fact means that any assumption that applies to the GLM (i.e., regression) applies to virtually everything else. You don’t really need to memorize a list of different assumptions for different tests: if it’s a GLM (e.g., ANOVA, regression, etc.) then you need to think about the assumptions of regression. The most important ones are:

  • Linearity
  • Normality (of residuals) 
  • Homoscedasticity (aka homogeneity of variance) 
  • Independence of errors. 

What Does Homoscedasticity Affect?

Like normality, if you’re thinking about homoscedasticity, then you need to think about three things:

  1. Parameter estimates: That could be an estimate of the mean, or a b in regression (and a b in regression can represent differences between means). If we assume equality of variance then the estimates we get using the method of least squares will be optimal.
  2. Confidence intervals: whenever you have a parameter, you usually want to compute a confidence interval (CI) because it’ll give you some idea of what the population value of the parameter is. 
  3. Significance tests: we often test parameters against a null value (usually we’re testing whether b is different from 0). For this process to work, we assume that the parameter estimates have a normal distribution. 

When Does The Assumption Matter?

With reference to the three things above, let’s look at the effect of heterogeneity of variance/heteroscedasticity:

  1. Parameter estimates: If variances for the outcome variable differ along the predictor variable then the estimates of the parameters within the model will not be optimal. The method of least squares (known as ordinary least squares, OLS), which we normally use, will produce ‘unbiased’ estimates of parameters even when homogeneity of variance can’t be assumed, but better estimates can be achieved using different methods, for example, by using weighted least squares (WLS) in which each case is weighted by a function of its variance. Therefore, if all you care about is estimating the parameters of the model in your sample then you don’t need to worry about homogeneity of variance in most cases: the method of least squares will produce unbiased estimates (Hayes & Cai, 2007). However, if you want even better estimates, then use weighted least squares regression to estimate the parameters.
  2. Confidence intervals: unequal variances/heteroscedasticity creates a bias and inconsistency in the estimate of the standard error associated with the parameter estimates in your model (Hayes & Cai, 2007). As such, your confidence intervals and significance tests for the parameter estimates will be biased, because they are computed using the standard error. Confidence intervals can be ‘extremely inaccurate’ when homogeneity of variance/homoscedasticity cannot be assumed (Wilcox, 2010). 
  3. Significance tests: same as above. 
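To make the OLS-versus-WLS distinction concrete, here is a minimal Python sketch on simulated, entirely hypothetical data. Note one big simplification: the variance weights are known exactly here only because we built them into the simulation; in real data they would have to be estimated.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate heteroscedastic data: the spread of y grows with x (hypothetical).
n = 500
x = rng.uniform(0, 10, n)
error_sd = 0.2 + 0.3 * x           # error SD increases along the predictor
y = 2.0 + 0.5 * x + rng.normal(0, error_sd)

X = np.column_stack([np.ones(n), x])

# Ordinary least squares: solve (X'X) b = X'y.
# Unbiased even under heteroscedasticity, but not the most precise estimator.
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Weighted least squares: weight each case by the inverse of its error variance,
# so noisy cases count for less. Solve (X'WX) b = X'Wy.
w = 1.0 / error_sd ** 2
b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print(b_ols, b_wls)  # both should land near the true values (2, 0.5)
```

Both estimators recover the true intercept and slope on average; the WLS estimates simply have smaller sampling variance when heteroscedasticity is present, which is the Hayes and Cai (2007) point in miniature.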


If all you want to do is estimate the parameters of your model then homoscedasticity doesn’t really matter: if you have heteroscedasticity then using weighted least squares to estimate the parameters will give you better estimates, but the estimates from ordinary least squares will be ‘unbiased’ (although not as good as WLS). 
If you’re interested in confidence intervals around the parameter estimates (bs), or in significance tests of the parameter estimates, then homoscedasticity does matter. However, many tests have variants to cope with these situations; for example, Welch’s version of the t-test, the Brown-Forsythe and Welch adjustments in ANOVA, and numerous robust variants described by Wilcox (2010) and explained, for R, in my book (Field, Miles, & Field, 2012).
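As a quick illustration of one of those variants, here is Welch’s t-test next to the ordinary Student t-test in Python (the data are simulated and hypothetical; the group sizes and variances are deliberately unequal to make the point):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two hypothetical groups with very different variances and sizes.
group_a = rng.normal(loc=10, scale=1, size=30)
group_b = rng.normal(loc=11, scale=4, size=90)

# Student's t-test assumes homogeneity of variance...
t_student, p_student = stats.ttest_ind(group_a, group_b)

# ...Welch's t-test (equal_var=False) does not, so it is the safer choice here.
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)

print(t_student, p_student)
print(t_welch, p_welch)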


This blog is based on excerpts from the forthcoming 4th edition of ‘Discovering Statistics Using SPSS: and sex and drugs and rock ‘n’ roll’.


  • Field, A. P., Miles, J. N. V., & Field, Z. C. (2012). Discovering statistics using R: And sex and drugs and rock ‘n’ roll. London: Sage. 
  • Hayes, A. F., & Cai, L. (2007). Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39(4), 709-722. 
  • Wilcox, R. R. (2010). Fundamentals of modern statistical methods: substantially improving power and accuracy. New York: Springer.


Top 5 Statistical Faux Pas

In a recent article (Nieuwenhuis, et al., 2011, Nature Neuroscience, 14, 1105-1107), neuroscientists were shown to be statistically retarded … or something like that. Ben Goldacre wrote an article about this in the Guardian newspaper, which caused a bit of a kerfuffle amongst British psychologists because in the first published version he accidentally lumped psychologists in with neuroscientists. Us psychologists, being the sensitive souls that we are, decided that we didn’t like being called statistically retarded; we endure a lot of statistics classes during our undergraduate and postgraduate degrees, and if we learnt nothing in them then the unbelievable mental anguish will have been for nothing.
Neuroscientists may have felt much the same, but unfortunately for them Nieuwenhuis, at the request of the British Psychological Society publication, The Psychologist, declared the sample of papers that he reviewed absent of psychologists. The deafening sonic eruption of people around the UK not giving a shit could be heard in Fiji.
The main finding from the Nieuwenhuis paper was that neuroscientists often make the error of thinking that a non-significant difference is different from a significant one. Hang on, that’s confusing. Let’s say group A’s anxiety levels change significantly over time (p = .049) and group B’s do not (p = .060), then neuroscientists tend to assume that the change in anxiety in group A is different to that in group B, whereas the average psychologist would know that you need to test whether the change in group A differs from the change in group B (i.e., look for a significant interaction).
My friend Thom Baguley wrote a nice blog about it. He asked whether psychologists were entitled to feel smug about not making the Nieuwenhuis error, and politely pointed out some errors that we do tend to make. This blog inspired me to write my top 5 common mistakes that should remind scientists of every variety that we probably shouldn’t meddle with things that we don’t understand; statistics, for example.

5. Median splits

OK, I’m starting by cheating because this one is in Thom’s blog too, but scientists (psychologists especially) love nothing more than butchering perfectly good continuous variables with the rusty meat cleaver that is the median (or some other arbitrary blunt instrument). Imagine 4 children aged 2, 8, 9, and 16. You do a median split to compare ‘young’ (younger than 8.5) and ‘old’ (older than 8.5). What you’re saying here is that a 2 year old is identical to an 8 year old, a 9 year old is identical to a 16 year old, and an 8 year old is completely different in every way to a 9 year old. If that doesn’t convince you that it’s a curious practice then read DeCoster, Gallucci, & Iselin (2011) or MacCallum, Zhang, Preacher, & Rucker (2002).
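You can watch the meat cleaver destroy information in a few lines of Python. This is a simulation with made-up numbers, but the attenuation it shows is exactly the MacCallum et al. (2002) point: dichotomizing a continuous predictor shrinks its observed relationship with the outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical continuous predictor x and outcome y, genuinely related.
n = 1000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n) * 0.8

# Correlation using the full continuous predictor.
r_full = np.corrcoef(x, y)[0, 1]

# Median split: butcher x into 'low' (0) vs 'high' (1).
x_split = (x > np.median(x)).astype(float)
r_split = np.corrcoef(x_split, y)[0, 1]

print(r_full, r_split)  # the median-split correlation is noticeably smaller
```

The dichotomized correlation is systematically weaker than the one from the intact variable, so the median split costs you statistical power for no benefit at all.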

4. Confidence intervals

Using confidence intervals is a good idea – the APA statistics task force say so – except that no-one understands them. Well, behavioural neuroscientists, medics and psychologists don’t (Belia, Fidler, Williams, & Cumming, 2005; see a nice summary of the Belia paper here). I think many scientists would struggle to state correctly what a CI represents, and many textbooks (including the first edition of my own Discovering Statistics Using SPSS) give completely incorrect, but commonly reproduced, explanations of what a CI means.
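The defensible interpretation is a long-run one: if you drew sample after sample and built a 95% CI from each, about 95% of those intervals would contain the population value – it is not a 95% probability that any particular interval contains it. A minimal Python simulation (made-up population values) shows that long-run coverage in action:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population: IQ-like scores with a known mean and SD.
true_mean, sd, n, reps = 100.0, 15.0, 50, 2000
z = 1.96  # normal critical value; close enough to the t value at n = 50

hits = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - z * se, sample.mean() + z * se
    hits += (lo <= true_mean <= hi)  # did this interval capture the truth?

coverage = hits / reps
print(coverage)  # close to 0.95
```

Any single interval either contains the population mean or it doesn’t; the 95% describes the procedure across repeated sampling, not the one interval you happen to have.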

3. Assuming normally distributed data

I haven’t done it, but I reckon if you asked the average scientist what the assumptions of tests based on the normal distribution were, most would tell you that you need normally distributed data. You don’t. You typically need a normally distributed sampling distribution or normally distributed residuals/errors. The beauty of the central limit theorem is that in large samples the sampling distribution will be normal anyway, so your sample data can be shaped exactly like a blue whale giving a large African elephant a piggyback and it won’t make a blind bit of difference.
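You can watch the central limit theorem do its thing in a short Python sketch (the ‘population’ here is hypothetical and deliberately, horribly skewed):

```python
import numpy as np

rng = np.random.default_rng(3)

def skew(a):
    """Standardized third moment: roughly 0 for a symmetric distribution."""
    a = (a - a.mean()) / a.std()
    return (a ** 3).mean()

# A heavily skewed hypothetical 'population' of scores (skewness around 2).
population = rng.exponential(scale=2.0, size=100_000)

# Sampling distribution of the mean for repeated samples of n = 100.
means = np.array([rng.choice(population, 100).mean() for _ in range(5000)])

# The raw scores are nothing like normal; the means are close to symmetric.
print(skew(population), skew(means))
```

The raw scores are about as far from a bell curve as you can get, but the distribution of sample means is already nearly symmetric at n = 100 – which is why it is the sampling distribution, not the raw data, that matters.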

2. Homogeneity of variance matters

Despite people like me teaching the next generation of scientists all about how homogeneity of variance/homoscedasticity should be carefully checked, the reality is that we should probably just do robust tests or use a bootstrap anyway and free ourselves from the Iron Maiden of assumptions that perforates our innards on a daily basis. Also, in regression, heteroscedasticity doesn’t really affect anything important (according to Gelman & Hill, 2007).
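To show how little machinery the bootstrap needs, here is a percentile bootstrap CI for a difference between two means in Python. The data are simulated and hypothetical, and 5,000 resamples is an arbitrary but common choice; the point is that nowhere do we assume equal variances (or normality).

```python
import numpy as np

rng = np.random.default_rng(11)

# Two hypothetical groups with very unequal variances.
a = rng.normal(10, 1, 40)
b = rng.normal(11, 4, 40)

obs_diff = b.mean() - a.mean()

# Percentile bootstrap: resample each group with replacement many times,
# recompute the mean difference each time, and read off the middle 95%.
boot = np.array([
    rng.choice(b, b.size).mean() - rng.choice(a, a.size).mean()
    for _ in range(5000)
])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(obs_diff, (ci_low, ci_high))
```

Because the interval comes from the empirical distribution of resampled differences, the shape and spread of the data look after themselves – no homogeneity assumption needs checking in the first place.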

1. Hypothesis testing

In at number 1 as the top statistical faux pas is null hypothesis significance testing (NHST). With the honourable exceptions of physicists and a few others from the harder sciences, most scientists use NHST. Lots has been written on why this practice is a bad idea (e.g., Meehl, 1978). To sum up: (1) it stems from a sort of hideous experiment in which two quite different statistical philosophies were placed together on a work bench and joined using a staple gun; (2) a p-value is the probability of your data (or more extreme data) given that a null hypothesis that is never actually true is true, which means that you can’t really get anything useful from a p-value other than a publication in a journal; (3) it results in the kind of ridiculous situations in which people completely reject ideas because their p was .06, but lovingly embrace and copulate with other ideas because their p value was .049; (4) ps depend on sample size and consequently you find researchers who have just studied 1000 participants joyfully rubbing their crotch at a pitifully small and unsubstantive effect that, because of their large sample, has crept below the magical fast-track to publication that is p < .05; (5) no-one understands what a p-value is, not even research professors or people teaching statistics (Haller & Krauss, 2002). Physicists must literally shit their pants with laughter at this kind of behaviour.
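Point (4) takes about ten lines of Python to demonstrate. The effect size here is made up and deliberately pitiful; the only thing that changes between the two tests is the sample size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
d = 0.05  # a trivially small, hypothetical standardized effect

pvals = {}
for n in (20, 100_000):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)
    # Same tiny true effect both times; only n differs.
    pvals[n] = stats.ttest_ind(a, b).pvalue

print(pvals)  # the huge sample drags the same unsubstantive effect below .05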
Surely, the interaction oversight (or the ‘missing in interaction’ you might say) faux pas of the neuroscientists is the least of their (and our) worries.


  • Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10, 389-396.
  • DeCoster, J., Gallucci, M., & Iselin, A.-M. R. (2011). Best Practices for Using Median Splits, Artificial Categorization, and their Continuous Alternatives. Journal of Experimental Psychopathology., 2(2), 197-209.
  • Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press.
  • Haller, H., & Krauss, S. (2002). Misinterpretations of Significance: A Problem Students Share with Their Teachers? MPR-Online, 7(1), 1-20.
  • MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7(1), 19-40.
  • Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.