Worried about stopping talk about statistical and practical significance


Andrew Gelman published a post last week titled Stop talking about “statistical significance and practical significance.” Combined with the previous posts he links to, it describes traps people can fall into when using logic/explanations such as “it’s statistically significant, but not practically significant.” He points out the importance of not forgetting about variation in effect sizes, conveys issues with implying larger effect size estimates are better, and says “I’m assuming here that these numbers have some interpretable scale.” These are all things I agree with — so my comment is not in disagreement with his points, but voices my concern that we should balance them with the positive aspects of talking about practical importance. My motivation probably comes out of worry about how the post might be interpreted by those who didn’t really want to deal with justifying practical significance anyway, or were struggling to even know how to do it. As I said in my comment (copied below), I may be overreacting, but I don’t think it can hurt to put my reaction out there. The post produces an image in my head of a comic strip where the researcher is first working hard to justify what effect magnitudes might be judged practically meaningful as part of their study design, then takes a break to read Andrew’s blog for statistical wisdom, and then the next frame shows them smiling with a text balloon saying “Gelman says I don’t have to worry about practical significance!”

I am not going to go deep into the issues here; I am just sharing my comment, which I tried to keep fairly short. The main message I’m hoping to get across is that understanding and actually interpreting statistical summaries relative to the scale of the chosen measurement is important to good research. A researcher should understand what differences or changes on that scale might have relevance to practice or theory before setting out. One of my biggest frustrations as a collaborative statistician has been hearing the perception that such an exercise is unnecessary because that work is believed to be the role of statistical tests. And I think I even remember feeling this way as a graduate student in another discipline before going back to school in Statistics. I believe we should have higher expectations for researchers to take this on in the design and in the interpretation of results from statistical models — after years of many letting only “statistical significance” do the judging for them. Sure, mistakes will still be made, but anything that encourages or builds expectations for doing such work is a step in the right direction in my opinion. I would love to see people arguing over how practically meaningful, or even realistic, an estimated effect (or range of effects) is — as opposed to continuing to accept an unjustified verdict based on a p-value or other default criteria (assuming people aren’t going to give up on such criteria anytime soon).

Andrew,

While I agree with your points about the potential pitfalls of “talking about statistical significance and practical significance” (as made in this and the previous blog posts you link to), I worry more about the harm in *not* acknowledging clinical/practical relevance than I do about the harm from falling into the traps you describe. We have seen what the world is like when researchers are not required to explain/interpret/justify magnitudes of effect sizes in the specific context of the problem before declaring them “significant” (or not); a world in which it is easy to bypass the challenges of interrogating the choice of measurement and how that choice connects to the statistical parameters to be estimated, in favor of handing the hard work over to “statistical significance” thresholds. I see the momentum for talking about clinical significance as a small step in building expectations for justifying practical relevance, *including* justification for why a large estimated effect is realistic and should (or should not) be trusted. I am likely overreacting here, but I worry that some people will take too much from your blog post headlines (without reading or understanding the details) and think “Gelman says we don’t have to worry about clinical relevance,” thus inadvertently giving some the perception of a free pass.

Mistakes are going to be made on either end (for large and small estimated effects), but not talking about clinical relevance isn’t going to solve that, particularly if we continue to largely rely on single studies, or even a few studies by the same research group. Encouraging discussions of practical/clinical significance can at least start to push people not to stop with “statistical significance.” I believe anything that gets researchers, or those using the outcomes of research, to think hard about and justify what magnitudes of an effect would have practical/clinical relevance is important to research. This goes both ways – not only having to justify why a small, but precisely estimated, effect should not be celebrated, but also having to justify why a large (probably uncertain) effect could be plausible in real life before celebrating (given the design, measurement, things that cannot be controlled for, etc.). For example, the reported effects of childhood interventions, such as the Perry Preschool program, on adult earnings always seemed unrealistically large to me, especially when considering the huge challenges in estimating such an effect in real life (RCT or not). Questioning large effects, as well as small ones, is another small step forward in the practical/clinical relevance discussion.

So, I agree that the discussion of statistical vs. clinical significance could be improved and that there are still holes one can fall into even when considering clinical significance, but I see an expectation for the discussion as a step in the right direction. I see the good outweighing the harm given current practices in many disciplines. In general, I think it is about creating an expectation for justification and explanation in context, and getting away from simply trusting a point estimate and/or p-value — the whole interval of values should be considered and interpreted, and then revisited and reinterpreted when results from other studies of a similar effect come out. And, even better, ranges of clinical/practical relevance (as well as those too large to be realistic) could be specified a priori.

From the Statistical Modeling, Causal Inference, and Social Science blog – December 29, 2021
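As a concrete aside (mine, not part of the comment or of Andrew’s post): here is a minimal sketch of what specifying ranges of practical relevance a priori might look like, and how an interval estimate could then be read against them. The bounds, the interval, and the function are hypothetical placeholders for illustration, not a prescription or anyone’s established method.

```python
# A minimal sketch of reading an interval estimate against pre-specified bounds.
# All names and numbers here are hypothetical, chosen only for illustration.

def interpret_interval(lower, upper, min_relevant, max_plausible):
    """Classify an interval estimate against a priori bounds.

    lower, upper:  endpoints of the interval estimate for the effect
    min_relevant:  smallest effect judged practically/clinically meaningful
    max_plausible: largest effect judged realistic for the context
    """
    if upper < min_relevant:
        return "entire interval below practical relevance"
    if lower > max_plausible:
        return "entire interval implausibly large; interrogate design and measurement"
    if lower >= min_relevant and upper <= max_plausible:
        return "interval within the pre-specified range of practical relevance"
    return "interval straddles a pre-specified bound; interpret with care"

# Hypothetical example: a 95% interval of (0.8, 2.4) on some interpretable scale,
# with relevance judged to begin at 1.0 and effects above 5.0 judged unrealistic.
print(interpret_interval(0.8, 2.4, min_relevant=1.0, max_plausible=5.0))
```

The point of the sketch is only that the judgment calls (what counts as practically meaningful, what is too large to be believable) are made explicit and up front, before any estimate is in hand.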

Thanks again to Andrew for motivating a post here after a bit of a dry spell. It’s a new year!

About Author


MD Higgs

Megan Dailey Higgs is a statistician who loves to think and write about the use of statistical inference, reasoning, and methods in scientific research - among other things. She believes we should spend more time critically thinking about the human practice of "doing science" -- and specifically the past, present, and future roles of Statistics. She has a PhD in Statistics and has worked as a tenured professor, an environmental statistician, and director of an academic statistical consulting program, and now works independently on a variety of projects through Critical Inference LLC, which she founded.
