Giving too much power to power

November 11, 2019 | General | 6 Comments

In many scientific disciplines, power analysis has become a prerequisite for grant funding. And grant funding has become a prerequisite for the survival of a scientist.

I strongly believe that effort spent in the design phase of any study is the most important part of the research process. But… I have always felt very uneasy about power analysis. I suppose my uneasiness is less about power analysis itself, and more about the extreme and automatic reliance on it, coupled with a surprising lack of accountability for justifying its use and its results. Why do presumably skeptical scientists seem to give so much power over to power analysis?

If you’re reading this and don’t really understand what I mean by “power analysis,” I’m referring to statistical power and its common use in justifying the number of subjects (or other units) for a study or experiment. “Sample size calculations” don’t have to be based on statistical power, but often are. Statistical power is directly related to the concept of Type II error rate — and I have more blog posts coming about Type I and Type II error rates (and what they might really mean, or not mean, to you). There is plenty of intro-level information out there on these concepts — just read it with skepticism. For this post, there’s one really important bit of background information needed, and I don’t think it’s controversial. I’ll state it in one long sentence with three parts. Power analysis relies on a set of assumptions; the results (the seemingly satisfying number(s) spit out from the analysis) are conditional on those assumptions; and the results are only as justified as the assumptions are justified in the context of the problem (e.g., “garbage in, garbage out”).
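To make that concrete, here is a minimal sketch of the kind of calculation I mean, assuming a generic two-sample t-test framing and Python’s statsmodels library (the tool and the numbers are placeholders for illustration, not recommendations). Every input is a choice the analyst makes, and the tidy-looking output is conditional on all of them.

```python
# A minimal sketch of a conventional power-based sample-size calculation,
# assuming a two-sample t-test framing. Every input below is an analyst's
# assumption; the output is only as justified as those inputs are.
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5  # assumed standardized difference (Cohen's d) to "detect"
alpha = 0.05       # Type I error rate (usually a discipline default)
power = 0.80       # 1 minus the Type II error rate (another default)

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power, alternative="two-sided"
)
print(f"'Required' sample size per group: {n_per_group:.1f}")  # roughly 64
```

The number looks precise and authoritative, but it carries every bit of the uncertainty baked into the inputs that produced it.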

Now, back to the question. Why is there such a tendency to over-rely on power analysis, particularly without adequate justification for the underlying assumptions? I am fully aware I can’t answer this huge question adequately in one blog post, but I would like to throw a few thoughts out there. I have to start somewhere.

The reasons for the often blind trust in power are less about Statistics and more about human nature and current scientific culture and paradigms. I keep coming back to two things that help me understand this phenomenon (and others like it). First, relying on it simplifies life for a lot of people and seems logical if you have a superficial understanding of power analysis. It serves the gatekeepers, allowing them to do their job quickly by simply checking tickets without the knowledge to assess whether they might be forged. Second, and relatedly, it provides comfort because it spits out numbers and makes an incredibly challenging study design decision seem easy, as if it has a correct answer. It provides a false sense that we have taken a challenging problem, full of uncertainty, and dramatically reduced the uncertainty associated with it. The inherent uncertainty does not disappear, but is effectively swept under a rug. As conveyed by Herbert Weisberg, we can proceed with the calculations after willfully ignoring many conceptual sticking points. If one is willing to ignore the very tenuous underlying assumptions, then power analysis appears to be a very useful construct.

But, I think there’s another layer to the willful ignorance part of the story. The term includes the word “willful,” implying the person appealing to it has enough knowledge to be aware of what they are ignoring. That is, they understand the possible problems and unresolved issues, but they willfully decide to ignore them — presumably after weighing the risks of doing so. Appealing to willful ignorance brings some sense of comfort at being able to move forward with the problem, but it should also carry a healthy dose of discomfort from an understanding of what is being ignored. If one is not aware of the underlying issues and problems, then the decision to go forward is very comforting because there is nothing to invoke the balancing discomfort. Unwillful ignorance brings far more comfort than willful ignorance — naively proceeding without the ugly knowledge of what is being swept away gives a greater sense of trust in the method and its results. This suggests yet another continuum based on depth of knowledge about a topic. In order to willfully ignore something, we have to have awareness of it. To gain awareness, we have to be open to listening to the views of people who have spent time thinking hard about a problem — which is often (hopefully!) the case with statisticians and power analysis.

For nearly 20 years, I have been having conversations with researchers about my views on healthier approaches when a power analysis is desired. I have tried many different strategies, tones, etc. But it nearly always felt like I was arguing against a tidal wave of pressure pushing researchers to do it “the usual way.” My advice as a PhD statistician usually could not compete with the culture and system the researchers were trying to swim in. I walked a fine and uncomfortable line as part of my paid job as a statistician — trying to help justify assumptions and push past default settings to critically think through the logic of each problem. I am no longer in such a job and hope I never have to be in that position again — the realization that I have escaped it still elicits an overwhelming sense of relief (it might even qualify as joy). That said, there is still plenty I want to discuss about power analysis. It is a tangible context for researchers and has the potential to be a door into the deeper conversations we need to be having about the use of Statistics in science in general.

Here’s an email exchange I had with a very successful researcher. The quotes are taken directly from the emails — I only left out a minor detail with the ellipsis to help with anonymity.

“Megan, I don’t know what to tell you. Let’s stop here and if the reviewers want a power analysis I’ll find someone else to help me.”

Me: “Just wanted to reach out again and say I am happy to work on this with you if we can collaborate to think through and justify (as best as possible) the choice of numbers going into the power analysis.”

“Thanks for reaching out, Megan, and thanks also for your kind offer. I know you disagree, but I’m going to stick with my bad science power analysis for this proposal — it’s what the NIH program officer I’ve been talking with told me to do. I will appreciate your help with a real power analysis … once we have some pilot data to inform good decision making. But thank you.”

This scenario is not at all uncommon and spans many disciplines. I share this one not to pick on the person, but because I have it in an easily quotable email. I always talked openly about it with my students, but have found it difficult to motivate and engage in productive discussion with researchers who have already had plenty of success navigating through the gate. It is a sensitive topic that is uncomfortable for many to talk about frankly. It can feel embarrassing for everyone involved. It is not productive to blame any one individual who is trying to survive and thrive in their profession under the current gatekeepers. Honest and open conversations are needed, without fear of them impacting a career.

My hope is that one day my work to educate and help people think through the foundations and underlying logic of things like power analysis will be valued more than my ability (which I refuse to use) to thoughtlessly punch unjustified numbers into an unjustified formula to appease a gatekeeper who probably isn’t aware the tickets they are punching are forged. And a statistician’s refusal to participate in forging a power analysis ticket should be professionally respected.

About Author

MD Higgs

Megan Dailey Higgs is a statistician who loves to think and write about the use of statistical inference, reasoning, and methods in scientific research - among other things. She believes we should spend more time critically thinking about the human practice of "doing science" -- and specifically the past, present, and future roles of Statistics. She has a PhD in Statistics and has worked as a tenured professor, an environmental statistician, director of an academic statistical consulting program, and now works independently on a variety of different types of projects since founding Critical Inference LLC.

6 Comments
  1. George Savva

    Thanks for a thoughtful and thought provoking post. I’m writing to give a couple of counterpoints as somebody who does still punch numbers into formulae to get sample sizes, and as somebody at the other end of the transaction, evaluating funding applications as part of funding panels.

    Dealing with the latter: in my experience, funding panels know that power calculations are rough guides at best. I believe (or rather I hope) that what is being judged is the credibility of the assumptions and whether or not the researcher is genuinely questioning the feasibility of their proposal and what their actual research objectives are. I fully agree with you that shoehorning a power calculation onto a fully worked up and costed design is an utter waste of time, and that this happens more often than it should, but I think we can appraise projects better with this element than without. In any case, I’d be interested in your thoughts as to how we should instead ask researchers to determine or justify their sample sizes. We should push funders to change if an alternative is apparent.

    Speaking as a practitioner, the need for a power calculation (for funding or ethics) is often the only reason a researcher consults a statistician at all (the other main reason being a reviewer’s comment they can’t handle), and it is valuable to keep it in place as it acts as a gateway to broader questions about design and analysis (as you say). Your post does make me worry that perhaps the people I am doing power calculations for don’t understand how vague they are and how dependent they are on the assumptions that go into them, but I hope I convey that in consultations and in broader educational efforts. On a practical level, we do stop worthless studies through being forced to fill in power calculation boxes, and it’s hard to see how else we’d be able to put the brakes on studies that are going to be too small to be useful.

    Thanks again for a great post, it has given me confidence to be more honest in future (when the pressure is on to give the sample size required), and to trust my judgement when I feel it’s inappropriate to calculate power in the conventional way.

    • MD Higgs

      Thank you so much for the thoughtful comment and for raising those important counterpoints. I hesitated to present such a cynical view as I did, but I am finally at the point where I think there is value in having such views out there. I have been struggling against this cynicism for about 20 years, yet it has only gotten worse through my experiences.

      Here are a few responses to your comment. First, I did not mean to imply that statisticians should never crunch the numbers. I think there is a lot to be gained from the exercise of going through a power analysis – if the focus is on justifying the inputs and other assumptions, rather than on the end result. And, as you point out, it is a great time for a statistician to be brought to the table and hopefully their expertise on the topic is respected, even if it differs from discipline and funding agency norms. Power analysis can be incredibly important front end work, but I don’t usually end up caring about the numbers at the end much because it’s rare the assumptions can be justified to a degree that would make me trust the numbers. I believe we should, however, refuse to do the work of crunching the numbers in situations where the researcher refuses to do the hard work of justification or is going to over-sell the results as fact without adequate justification of what they are conditional on.

      In my experience, I have rarely seen a power analysis presented in a grant proposal with enough justification of its assumptions that someone could adequately evaluate the reasonableness of the results. And that’s under the assumption that the people reviewing it have the knowledge base needed to critically assess the justifications if they were presented. I have met with many very successful “quantitative” researchers who had not previously recognized power analysis as more than just a “calculation” with an answer. I worry that this is the case with many grant reviewers.

      I understand the desire to weed out “worthless” studies early on, but in my experience the power analysis is not at all adequate to judge the worth (or worthlessness) of a future study. I think it brings a false sense of comfort that we are doing that, but I don’t see the power analysis check box as a tool for stopping worthless studies. In my cynical view, I see it as letting through those who know how to play the game (like the email I quoted in my post) and penalizing those who don’t. Perhaps we are not funding those who are willing to think outside the box and who are less likely to give in to cultural norms. No way to know, but worth considering?

      I haven’t really addressed your practical question about how we should instead ask researchers to determine or justify their sample sizes. My short answer is that we should focus more on how the decisions to rely on the chosen assumptions are justified. This implies the need to state and discuss assumptions and choices for inputs in a less superficial way than is typical. Assumptions are never “met”, so what matters is justifying the decision to base your calculations on them and how that justification translates into trust in the results. Let’s open the door to honesty about uncertainty and lack of trust in assumptions, rather than taking a salesman-like attitude to covering up the faults of a power analysis (usually for lack of looking hard for faults rather than purposeful cover-up). Admittedly, this is harder and takes more physical space in a proposal, and therefore is hard to get traction on. It also can’t be done well until the author has a deeper than usual understanding of the concept of power. For example, justify the Type I and Type II error rates chosen (rather than taking the discipline-accepted defaults) and go through a thorough process of specifying a practically meaningful difference (or analogous quantity) one would like to detect (rather than grabbing statistically “significant” “effect” sizes from previously published work). An estimated effect being judged statistically “significant” in one study does not imply it is practically meaningful (the same old arguments about the difference between practical and statistical “significance” apply). This just scratches the surface and I will try to write more about what I see as some deep conceptual issues with power analysis soon.
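      As a small illustration of that last point, here is a sketch (again assuming a generic two-sample t-test and Python’s statsmodels, with hypothetical effect sizes) comparing the sample size implied by a large “significant” estimate from a previous study to the one implied by a smaller, practically meaningful difference:

      ```python
      # A sketch of how much the "answer" depends on the assumed difference,
      # assuming a two-sample t-test framing; the effect sizes are hypothetical.
      from statsmodels.stats.power import TTestIndPower

      solver = TTestIndPower()
      for label, d in [("published 'significant' estimate", 0.8),
                       ("practically meaningful difference", 0.3)]:
          n = solver.solve_power(effect_size=d, alpha=0.05, power=0.80)
          print(f"{label} (d = {d}): about {n:.0f} per group")
      ```

      Same test, same error rates, very different studies; the difference lies entirely in which assumed effect size we choose to trust.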

      I am quite happy to hear that in your experience “funding panels know that power calculations are rough guides at best.” This makes sense to me, but I’m not convinced there’s as much skepticism about the numbers as there should be. Have most people judging the calculations thought deeply enough about the limitations of power analysis as commonly carried out, even if on some level they recognize it as a “rough guide”? I have often heard and read variations of the following: “Based on the power analysis, we will be able to detect a significant difference if it exists by using XX subjects.” The definitiveness of this statement is misleading and quite disturbing to me, even if we could agree on the definition of “significant difference.”

      I better stop there. Thank you again for taking the time to comment!

      • George Savva

        Thanks for the response. You make a lot of powerful (ha) arguments. Since power is so tied to statistical significance, I wonder if this would be a good time to write something for publication for the general science community; that is, a cautionary note about the validity and interpretation of power calculations and how they are used and abused.

        • MD Higgs

          Thanks, and agreed. I am writing away and trying to find a more fun and effective way to get people to question the foundations. Stay tuned…

  2. Martha Smith

    “I suppose my uneasiness is less about power analysis itself, and more about the extreme and automatic reliance on it, coupled with a surprising lack of accountability for justifying its use and its results.” This says it well.

    Also, I notice that you haven’t mentioned the ideas of Type S and Type M errors. These can be useful in helping people understand that power analysis may be missing some very important ways things can go wrong in a statistical analysis.

    • MD Higgs

      Thanks for the comments! I agree there is a lot more to be said to help people understand limitations of common approaches to statistical analysis (and power analysis as an example), including Type S and Type M errors. I will try to get to more of these soon — my wish list of blog posts is long and keeps getting longer each day!
