Oh, the words we use

June 16, 2020 | General | 5 Comments

On a road trip this morning, I decided to catch up on some science- and statistics-related podcasts. Maybe it was the little break I took from listening to them, but it wasn’t long before I was cringing. The implicit messages sent by statisticians and other scientists to a broader audience play a part in perpetuating and contributing to misconceptions and misinterpretations related to statistical inference… and therefore to many scientific results.

The goal of simplifying concepts of statistical inference for a general audience is an admirable one. But it does not come without mistakes and unforeseen consequences. There are many phrases that are rampant among scientists who rely heavily on statistical methods in their work, and even among statisticians. Some statisticians are working hard to counter misinterpretations and misused language, but it will take an awful lot of work to counter the subtle statements fed to the public at a high frequency. And the subtlety of the words makes it more difficult: it’s more about things implied than things explicitly wrong. We home in on things like p-values, but we rarely talk about the more subtle language used to convey statistics-related concepts within science communication. That’s what made me cringe during the podcasts.

I don’t listen to as many science- and statistics-related podcasts as I would like to — mostly because of the cringing. The feelings of frustration, or at least the sense that the scientific community is making big mistakes in communication, creep in around the edges and steal my attention — and any enjoyment along with it. It’s probably not fun to be in the car with me at those points, though I’m getting better at transferring the energy into writing instead of voicing my opinions out loud to my family in a less-than-pleasant tone. Sometimes, I just turn it off. But I realize that’s no solution — it just keeps me from noticing and learning.

Simplification for communication

In this post, I briefly call your attention to a few phrases that came up in the podcasts I listened to this morning. The goal is just to bring your attention to them, not to do a deep dive into meanings and implications. It’s important to keep in mind that they come from a place of wanting to communicate effectively to a broad audience — they do not come from any blatant desire to mislead. They are meant to make answers or explanations accessible to those with no formal background in statistical theory and foundations.

The goal is to convey complex concepts in simple terms and few words. There is nothing wrong with this goal, but simplifying can be dangerous. It means we’re presenting words that don’t quite capture the truth — the words shave off corners — but we hope they don’t shave off enough to do harm. There are very few statisticians and other scientists who escape this problem if they are trying to communicate about their work, as they should be. I know I am guilty and I’m sure I will be guilty again in the future — even when I’m thinking hard about it! There will always be phrases I haven’t thought through enough before they come out of my mouth.

Opinions will always vary about the severity of the problem and how much harm certain phrases might actually be causing. I believe the harms are bigger than we like to think they are, or at least we should start from that assumption. I think benign-sounding phrases have contributed greatly to misunderstandings of the role of Statistics in science, and will continue to do so.

Whether or Not

The phrase “whether or not” is one that comes up over and over again. It can show up with many words around it, but the meaning is generally in line with the statement: “Statistics allows us to conclude ‘whether or not’ there is a real effect.” In one of the podcasts from this morning, the statement (by a prominent statistician) described the purpose of a clinical trial as determining whether or not evidence exists to say that there is a real difference.

On the surface, the statement is simple enough and accessible, but assessing its potential harms requires trying to put ourselves in the shoes of those who don’t have the knowledge to understand the simplification. Interpretation comes from their understanding of the words in their typical use, not from their understanding of statistical inference. In this phrase, there are some big words whose meanings are pretty clear. We have the “whether or not” — implying a yes or no answer and the ability of research to distinguish between black and white. It ignores the gray area where most of the work lands and implies either “evidence exists” or “evidence does not exist”. It ignores the option of considering evidence as measured on a continuum. And we have the words “determine” and “real” – but for today I will stick with the “whether or not.”

This is a phrase where the problem is fairly straightforward. Evidence is not presented as if it exists on a continuum and is subject to many assumptions; it is presented as an oversimplified binary: “evidence” or “no evidence.” The language implies there is nothing in between. It does not matter if statisticians know there is something in between — it is about how this is translated and internalized by those who don’t realize the subtleties of that point.

We need to think more about how the message is interpreted by those who do not necessarily understand the continuum and the gray area. It suggests the outcome is simply a binary one and that the answer comes without uncertainty. In broad-audience explanations, there’s rarely any discussion of the arbitrary thresholds needed to move from continuous to dichotomous. This wording implies the ultimate goal is dichotomization and that statistics is helpful because it removes uncertainty and gets us to the black or white answer we all crave.
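
To make the dichotomization concrete, here is a small sketch with made-up numbers (mine, not from the podcast): two hypothetical studies carry nearly identical continuous evidence, yet land on opposite sides of the conventional 0.05 threshold once the “whether or not” step is applied.

```python
# Made-up numbers for illustration: two hypothetical studies with nearly
# identical continuous evidence, dichotomized at an arbitrary 0.05 threshold.
from scipy import stats

studies = {
    "Study A": {"estimate": 2.0, "se": 1.0},  # z = 2.00
    "Study B": {"estimate": 1.9, "se": 1.0},  # z = 1.90
}

for name, s in studies.items():
    z = s["estimate"] / s["se"]
    p = 2 * stats.norm.sf(abs(z))  # two-sided p-value: evidence on a continuum
    verdict = "evidence" if p < 0.05 else "no evidence"  # the 'whether or not' step
    print(f"{name}: z = {z:.2f}, p = {p:.3f} -> '{verdict}'")

# Study A: p ~ 0.046 -> 'evidence'; Study B: p ~ 0.057 -> 'no evidence',
# even though the two studies carry almost the same amount of information.
```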

The “whether or not” phrase is one I have been talking about for almost a decade, mostly with students in my classes. I have tried to remove it from all my wording — in speech and in writing, though it still can surprise me by creeping in.

Objectivity and unbiasedness

I hesitate to give this section such an enormous heading and I will not dig in deeply. But the fact that it is such a huge topic is exactly the reason we should stop using the terms as if they are simple and uncomplicated. Routinely stating that the use of statistics, or even aspects of it, magically makes things objective and unbiased is misleading. The entire process of statistical inference is not as objective as statisticians and scientists would like to believe, and definitely not as objective as commonly conveyed to broad audiences.

I am skirting around the definition of objective here, but the main point is acknowledging the many decisions and judgement calls along the way — from the design through analysis and interpretation. It is not reasonable to think the same thing would be done by two qualified and reasonable statisticians or scientists, which I think is often what people interpret “objective” to mean — something objective doesn’t have a human component, and the “answer” shouldn’t depend on the human carrying it out. It’s a story we’ve told about statistics for decades. And, it’s a dangerous story. Even simple reported statistics fall prey to this — there are decisions about how to aggregate, or not, that make a difference to the outcome. Those decisions are not objective and they are not free of human biases.
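
As a small illustration of how non-neutral even a “simple” reported statistic can be, here is a sketch with made-up counts (patterned after the well-known kidney-stone illustration of Simpson’s paradox): the choice to pool or to stratify flips which treatment looks better.

```python
# Made-up counts for illustration: (successes, trials) for two treatments,
# split across two strata. Pooling versus stratifying changes the 'answer'.
data = {
    "Treatment A": {"stratum 1": (81, 87),   "stratum 2": (192, 263)},
    "Treatment B": {"stratum 1": (234, 270), "stratum 2": (55, 80)},
}

for treatment, strata in data.items():
    pooled = sum(s for s, _ in strata.values()) / sum(n for _, n in strata.values())
    rates = {k: round(s / n, 2) for k, (s, n) in strata.items()}
    print(f"{treatment}: within-stratum rates {rates}, pooled rate {pooled:.2f}")

# Treatment A has the higher success rate within each stratum, but Treatment B
# has the higher pooled rate (0.83 vs 0.78). Neither summary is the neutral one.
```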

I think I understand what people are trying to say, and because it’s said over and over again, it sounds okay to most ears. But, what are the implications? Should we really be saying it? Are we oversimplifying? The statements may be true on some level, but most people listening to them don’t understand the conditions under which that level is true. Here are some paraphrased examples I heard in the podcasts.

  • “Use of random assignment means we have no conscious or unconscious bias.”
  • “We develop objective rules.”
  • “There’s a desire for decisions that are objectively based on evidence.”
  • “A randomized study allows for an unbiased evaluation of the two things being tested.”

Do we have an understanding of what people listening to these statements interpret “unbiased” and “objective” to mean? Are we making things sound far better and more trustworthy than they are?

Defaulting to averages

This is something I have written about before here, here, and here and I suspect there will be more in the future. I don’t have anything against averages, but I do have something against the default use of averages. By default use I mean choosing to base analyses (and then conclusions) on averages (or models for means) without ever thinking about it as a choice or a decision. It is often done without awareness of how the choice of assuming there are groups of homogeneous individuals may impact inferences. Use of averages is so accepted, and expected, that justification for the choice is not asked for.

In my experience, most people relying on them do not even think of it as a choice or an assumption to be justified. When conclusions are translated for a broad audience, the “on average” is often excluded or tagged onto the end of a sentence in a way that implies it can be ignored. It’s not easy to bring this into a conversation simply, but I hope we can stop implying that averages are the only option and that their implications need not be questioned.

For example, one of the podcasts today referred to the use of average survival time measured in months. It is often implied that there are no other options and that the average is inherently the parameter of interest. There’s no mention of what the distribution of survival times for those of interest might look like. What if survival times for about half of the participants range from 1 to 3 months and the rest survive over a year? How useful is the average?
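
To see how little the average can tell us in that scenario, here is a quick sketch with made-up survival times: half of the hypothetical participants survive 1–3 months and the rest survive more than a year.

```python
# Made-up survival times (in months) for illustration: roughly half of the
# participants survive 1-3 months and the rest survive more than a year.
import statistics

survival_months = [1, 2, 2, 2, 3, 3, 14, 16, 18, 20, 24, 30]

mean = statistics.mean(survival_months)      # 11.25 months
median = statistics.median(survival_months)  # 8.5 months
near_mean = sum(abs(t - mean) <= 3 for t in survival_months)

print(f"mean = {mean:.1f} months, median = {median} months")
print(f"participants within 3 months of the mean: {near_mean} of {len(survival_months)}")
# An 'average survival of about 11 months' describes almost no one in this group.
```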

Disclaimers or explanations about who or what the results may actually represent are rarely provided, and it is easy to assume that the estimate inserted into the statement represents some typical person, or even the individual listening. This may seem like a minor oversight, but I do not think it’s minor at all when we try to carefully consider how things are interpreted by those who have not had a reason to think through the details (and shouldn’t have to). I’m not suggesting we get rid of using the word “average” or “mean” — just that we consider it and talk about it as yet another assumption that does influence statements about results and general conclusions.

Stop pretending it isn’t hard

I see pretending as a huge part of the problem. We want so badly to communicate ideas and results in a way that is understandable that we greatly simplify and then pretend that the simplification covers it. We ignore the extent to which the corners are shaved. We don’t openly acknowledge how hard it is to simplify without accidentally misleading. There are plenty of other words and phrases we could go into — like “proven.”

We need to reel in the tendency to pretend that most aspects of statistical inference are simple enough to convey in a tweetish-length response. And we need to stop pretending that those with formal degrees in Statistics always get it right. It is not about whether the person talking or writing understands what they are trying to say; it is about carefully considering how the words may be interpreted in potentially misleading ways.

It is difficult to talk about how statistical inference works (or doesn’t) to a broad audience. If we’re trying, we will make mistakes and we will want to update things we’ve said and written in the past. I don’t think we can realize the potential harm in what we’re saying until we recognize that. It takes others pointing it out, or time spent thinking about implications. So, if we slip up or inadvertently imply something, let’s just admit it openly and be honest, even if it’s hard to talk about. The more we acknowledge the challenge as statisticians, maybe the more other scientists and writers will start saying it too, without worrying about egos. Inference and uncertainty are hard; we should try not to deny that or lose sight of it, especially when speaking from a place of expertise.

About Author

MD Higgs

Megan Dailey Higgs is a statistician who loves to think and write about the use of statistical inference, reasoning, and methods in scientific research - among other things. She believes we should spend more time critically thinking about the human practice of "doing science" -- and specifically the past, present, and future roles of Statistics. She has a PhD in Statistics and has worked as a tenured professor, an environmental statistician, director of an academic statistical consulting program, and now works independently on a variety of different types of projects since founding Critical Inference LLC.

5 Comments
  1. Martha Smith

    This hits the nail on the head. So often people want/crave/expect certainty. Here is something I wrote several years ago about the importance of accepting uncertainty: https://web.ma.utexas.edu/users/mks/statmistakes/uncertainty.html .

  2. Andrew Gelman

    Megan:

    That last point (“Stop pretending it isn’t hard”) is interesting. As teachers, we’re always told not to tell students that some task is “easy,” as this disparages their efforts. Better to say that it’s “fundamental” or something like that.

    • MD Higgs

      Interesting point. I hadn’t thought about that “easy” connection to teaching. I agree we are told about, and hopefully practice, acknowledging and validating the challenges felt by students. As usual, there are layers and I was after a deeper one. I think there is huge pressure when explaining concepts in a talk or to a broad audience (like in a podcast) to greatly simplify them, and I don’t think we openly recognize and acknowledge how hard that is to do well. Someone might feel comfortable acknowledging, and believing, that statistical concepts are hard for their students, but not realize that communicating about them in brief, simplified ways is also hard for them. I guess my “stop pretending it isn’t hard” is really an appeal for a little humility on the side of the “experts.”

      I also believe much of Statistics taught formally in the classroom is too nicely packaged and the simplifications are not adequately acknowledged. Again, we can easily acknowledge the challenges students face in grasping the simplified version of the material without ever adequately acknowledging what we’re leaving out in the simplification. The process and context of the simplification is the hard part I’m worrying about. The worst-case scenario in my mind is the old-school (I hope!) formula-based textbook for an intro course in statistical inference taught by a mathematician (maybe even a first-year graduate student with an undergrad math degree) who has no experience with design or analysis in real life and probably has not thought much at all about the process of making inferences. Students still find it hard to get the right answers to the math-like problems, but it is under a pretense of presenting statistical inference as far simpler and more straightforward than it is. It’s not hard to oversimplify and leave out messiness, but it is hard to do it in a way that doesn’t leave too much out. And, it is very hard to acknowledge and talk about what’s left out in a meaningful way.

  3. People are not plants – and some meditation – Critical Inference

    […] are often assumed to provide tools to help answer the question – which is why my job often leads me to obsessing about the phrase and its two parts: “whether or not” and “has an effect.” What do they imply […]
