Assumptions are not met. Period.

October 1, 2019 | General | 4 Comments

Statistical inference is built upon layers of assumptions. My colleague, Katie Banner, is giving a presentation today on some of our joint work and she came up with a great way to coarsely describe the layers of assumptions: those we are aware of and talk about; those we are aware of, but rarely talk about; and those we are not aware of (and therefore also do not talk about!).

Those on the surface seem obvious and make their way into even introductory courses — e.g., normality, linearity, constant variance, and independence in the context of linear regression. As we go down through the layers, they are increasingly ignored and accepted as implicit parts of the process — rarely acknowledged or discussed. But perhaps “ignored” is not fair in the deeper levels — I think it is more a reflection of honest ignorance. Many (most?) people using or relying on statistical inference are simply not aware of the layers or the need to peel them back. I have alluded to some of the deeper layers in other posts — such as the often automatic reliance on models for means and even the decision to use probability as the basis for inference (and all the assumptions that accompany that huge statement). These are things I can’t stop thinking about — as I struggle to figure out how to communicate the importance and associated problems to other scientists. But today, I am staying on the surface, as there are plenty of challenges there as well.

Assumptions are not met. Stating that an assumption is “met” implies that the assumption has been checked and concluded to be true. In thousands of Statistics courses around the world, students are being taught to use this wording, and I believe it has substantial negative impacts on science and contributes to a lack of critical thinking behind inferences.

Let’s take “normality” since that is an assumption many people are aware of from intro stats classes. The validity of statistical results may rest on the assumption that the errors are normally distributed (based on the Gaussian probability distribution model). In practice, it is common to use plotting techniques (that have their own problems and limitations), such as boxplots or Normal Q-Q plots, to “check” the assumption. If things “look okay” then the students are often taught to say “the assumption of normality is met.” This is a false statement. No amount of justification could ever convince me to believe it. The errors around a mean do not arise from a normal distribution — we just hope to model them as if they do, and the validity of our inferences depends on the degree to which the “as if” approach is reasonable. The “checking” is really an assessment of how severely violated the assumption is. We know it is violated, but how severe is the violation? This is not a yes or no question, it is a question of degree of severity. The answer must be a justification — based on plots of the data, based on knowledge of the measurement and population, based on the study design, and based on knowledge of the robustness of the method to violations of the assumption. Is this easy? No. But assessing assumptions should not be an exercise in deciding whether they are met, or not. This is just another in the long list of false dichotomies that have become associated with the use of statistical methods in practice.
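To make the “degree, not yes/no” point concrete, here is a minimal Python sketch (everything in it — the data, the simulated skewed errors, the summaries — is hypothetical illustration, not a prescribed procedure). It fits a simple regression to data whose errors are deliberately non-normal, and then summarizes *how far* the residuals depart from normality rather than issuing a met/not-met verdict:

```python
# Sketch: assessing (not "checking off") the normality assumption for
# regression errors. All data simulated; summaries are one rough way to
# quantify degree of departure, not a yes/no test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate a simple linear relationship with mildly skewed errors,
# so the normality assumption is (as always) violated to some degree.
x = rng.uniform(0, 10, size=200)
errors = rng.gamma(shape=2.0, scale=1.0, size=200) - 2.0  # skewed, mean near 0
y = 1.0 + 0.5 * x + errors

# Fit by least squares and look at the residuals.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

# A normal Q-Q plot is the usual visual tool; probplot also returns the
# correlation of the Q-Q points, which serves here as one crude numeric
# summary of *how severe* the departure from normality is.
(osm, osr), (qq_slope, qq_int, qq_r) = stats.probplot(residuals, dist="norm")
print(f"Q-Q correlation: {qq_r:.3f}")  # closer to 1 = milder departure
print(f"residual skewness: {stats.skew(residuals):.2f}")
```

No single number settles the question — the point is that any honest summary of this kind feeds a judgment about severity and robustness, alongside knowledge of the measurement, population, and design.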

If you are a researcher, or a teacher of statistics, please do not treat assessment of an assumption as a yes or no question. Do not let your students ever write or mutter the phrase “the assumption of _______ is met.” I know from experience that it is possible to teach and report on research without falling into the trap of this false dichotomy.

Assumptions are not met. Period. We must discuss the severity of the violation in a non-trivial way to assess the reasonableness of the suggested model.

About Author

MD Higgs

Megan Dailey Higgs is a statistician who loves to think and write about the use of statistical inference, reasoning, and methods in scientific research - among other things. She believes we should spend more time critically thinking about the human practice of "doing science" -- and specifically the past, present, and future roles of Statistics. She has a PhD in Statistics and has worked as a tenured professor, an environmental statistician, director of an academic statistical consulting program, and now works independently on a variety of different types of projects since founding Critical Inference LLC.

4 Comments
  1. thomas marvell

Probably the most difficult assumptions to meet — and to prove that they are met — in regression analysis are the absence of simultaneity and of omitted variable bias (which are very similar problems). The researcher is unlikely to know all the causal forces that swirl around in the topic being studied.

    • MD Higgs

Thanks for the comment! That definitely counts as one of those assumptions that is rarely explicitly talked about or justified. The bigger point I was hoping to make is — it is not only difficult to meet assumptions or prove they are met — but impossible… because they are not met. I think a baby step in the right direction is getting rid of language that implies they are, or can be, actually met in practice. Letting go of this “met or not” idea tends to make people uncomfortable, but that’s okay — it should feel uncomfortable. We can, and will, choose to rely on many assumptions to be able to go forward using methods we believe help us learn about the world. There is no reason the assumption has to be “met” for it to be useful — we simply need to justify that using it is reasonable (despite it not being met). In my mind, this usually means the downsides of appealing to the assumption(s) do not appear to outweigh the potential gains. This is a judgement call and we should have to deal with the discomfort of that.

  2. Teddy Ampian

    Hi Megan, I just found your blog from Gelman et al.’s “Stat Modelling” Columbia webpage. I appreciate this post; it broadened my perspective. I wonder if we should replace or build on “assumed” terminology with “presumed”? That is to say, we can at least supply evidence to suggest the violations of our “assumed” condition are within acceptable range (thus “presumed”). Thank you for the blog post and I look forward to reading more.

    • MD Higgs

      Teddy,

      Thanks for the comment and I apologize for missing this one (for months!). Thanks for sharing the thought and suggestion — I will think more about “presumed”!
