More on average

September 25, 2019 | General

In all my formal education as a statistician, I have no specific memories of professors or other more experienced statisticians explicitly bringing up the potential limitations of averages. The fact that we average and model means was presented as an implicit fact of how we should do things, not as something open to scrutiny and question. You can see it in the math and proofs, but there is a serious disconnect between how we learn the theory and how we question what we do in practice to address real problems. This gap has always seemed huge to me. In fact, I moved from being a graduate student in biological sciences to a graduate student in Statistics because the gap felt so massive. It has taken me almost 20 years of circling and trying to put words to what has always felt so uncomfortable about how we teach, talk about, and attempt to use statistical inference. It still feels like a jigsaw puzzle fresh out of the box, and this blog is like quickly examining tiny pieces of the puzzle so I can start to make sense of how they fit together into a coherent story I can tell. I want to turn the years of struggle into glimpses of clarity that I can share. Which leads me to what I sat down to say: I don’t think I fully realized, or at least fully acknowledged, how prominent the puzzle piece focused on our use of averages is. I want to put this out there now while it feels so obvious.

In years of trying my best to teach foundations of statistical inference and common statistical methods, I landed at a few basic messages around this topic that can be loosely summarized as: (1) look at and understand your raw data before aggregating, (2) box plots (and such) do not count as visualizing the raw data — that’s already aggregating, and (3) we should only average things that we are convinced are inherently measuring the same thing.
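To make message (1) concrete, here is a minimal Python sketch. The numbers are invented purely for illustration and do not come from any real study; the names `clustered`, `split`, and `near_mean` are hypothetical. It shows two small data sets that share the same average even though, in one of them, not a single raw value sits anywhere near that average.

```python
from statistics import mean

# Hypothetical measurements, made up for illustration only.
clustered = [48, 49, 50, 50, 51, 52]   # values huddle around the center
split = [9, 10, 11, 89, 90, 91]        # two separate clumps, nothing in the middle


def near_mean(values, tolerance=5):
    """Count how many raw values fall within `tolerance` of their own mean."""
    center = mean(values)
    return sum(1 for v in values if abs(v - center) <= tolerance)


for name, values in [("clustered", clustered), ("split", split)]:
    print(name, "| mean:", mean(values), "| values near the mean:", near_mean(values))
```

Both lists average to 50, but only a look at the raw values reveals that the average describes every observation in the first list and none of them in the second.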

Using “things” and “thing” in the same sentence never felt pretty, but for some reason it seemed to get the point across. The problem is that I usually talked about it in the context of higher level modeling and not in the initial decision to base inferences on means in the first place. So, while I think it was helpful and better than nothing, I should have tried harder to go deeper. It was like trying to help someone make ethical decisions about how to spend a pot of money when the money was stolen to begin with. I just wasn’t hitting the discussion at the right level. Why not? The best I can come up with is that my students and their advisors had very specific expectations about what they would get out of their graduate education in Statistics, and those expectations were conditional on the assumption that means represent the magic quantity of interest. We understood that we were educating students so they could immediately participate as integral players in the current popular culture of averages. The problem (or one of them) is that it’s incredibly rare for scientists, including statisticians, to explicitly think about the conditions underlying their models, beyond “checking” higher level assumptions in a stale and automatic fashion. We have to start asking what a mean really means in a particular context and what an average might really represent for a particular set of data.

I am very aware that the reason this piece of the puzzle looks so clear and large to me right now is that I am still reading The End of Average by Todd Rose. There are so many things that resonate and help put words and context to my gut feelings of discomfort and academic frustrations. Over the last few years as a statistical consultant, I had many conversations in very different contexts with scientists about what the average calculated from the data (or the mean in a model) could reasonably represent and whether that was really what the scientist was after. These conversations were typically met with some interest, but also (at least in my perception) annoyance. Annoyance that I was holding up the process of just proceeding under the status quo. Annoyance that I was making things more difficult when I was supposed to be making things easier. The question always arises: “What else is there?”

On that note, I want to share a few quotes from Rose’s book and his discussion of averagarianism in our society. These resonated with me last night in the context of statistical inference and science. And this won’t be the last you hear from me on averages.

The primary research method of averagarianism is aggregate, then analyze: First, combine many people together and look for patterns in the group. Then, use these group patterns (such as averages and other statistics) to analyze and model individuals.(21) The science of the individual instead instructs scientists to analyze, then aggregate: First, look for pattern within each individual. Then, look for ways to combine these individual patterns into collective insight.

Pg 69, The End of Average by Todd Rose, HarperCollins, Reference 21 is given in Notes as Rose et al. Science of the Individual, pg 152-158.
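The contrast in the quote above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three "individuals" and their numbers are invented, the helper `slope` is a hypothetical name, and a least-squares slope is just one convenient stand-in for "a pattern". It is not a method taken from Rose's book.

```python
from statistics import mean

# Hypothetical repeated measurements (x, y) on three individuals, invented for illustration.
individuals = {
    "A": ([1, 2, 3], [5, 4, 3]),
    "B": ([4, 5, 6], [8, 7, 6]),
    "C": ([7, 8, 9], [11, 10, 9]),
}


def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = mean(xs), mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denominator = sum((x - mx) ** 2 for x in xs)
    return numerator / denominator


# Aggregate, then analyze: pool everyone together and fit one slope to the group.
pooled_x = [x for xs, _ in individuals.values() for x in xs]
pooled_y = [y for _, ys in individuals.values() for y in ys]
pooled_slope = slope(pooled_x, pooled_y)

# Analyze, then aggregate: fit a slope within each individual, then summarize the slopes.
individual_slopes = {name: slope(xs, ys) for name, (xs, ys) in individuals.items()}
mean_individual_slope = mean(individual_slopes.values())

print("pooled slope:", pooled_slope)
print("within-individual slopes:", individual_slopes)
print("mean of within-individual slopes:", mean_individual_slope)
```

In this made-up example the pooled slope is positive while every within-individual slope is negative, so the group pattern would be misleading if used to describe any one individual. Real data will rarely be this tidy; the sketch only shows that the order of the two steps can matter.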

The mathematics of averagarianism is known as statistics because it is the math of static values — unchanging, stable, and fixed values.

Pg 68-69, The End of Average by Todd Rose, HarperCollins

Note: The above quote is one I want to come back to later. I see his point, but it presents a limited view of the field of Statistics (if one reads it as if it is Statistics with an upper case ‘S’ — see my previous post on statistics vs Statistics). I think there are paths forward within a broader view of Statistics.

“What you are proposing is anarchy!” (16) This sentiment was perhaps the most common reaction among psychometricians and social scientists whenever Molenaar showcased the irreconcilable error at the heart of averagarianism. Nobody disputed Molenaar’s math. In truth, it’s fair to say that many of the scientists and educators whose professional lives were affected by the ergodic switch did not follow all the details of the ergodic theory. But even those who understood the math and recognized that Molenaar’s conclusions were sound still expressed the same shared concern: If you could not use averages to evaluate, model, and select individuals, well then … what could you use?

This practical retort underscores the reason that averagarianism has endured for so long and become so deeply ingrained throughout society…

Pg 66 The End of Average by Todd Rose, HarperCollins. Reference 16 is given in Notes as “Molenaar, interview, 2014.”

The above quote is consistent with my experiences. Methods based on averages are available, easy, convenient, and take little creativity — and they are expected in our scientific culture. Justification for using averages is simply not demanded — though justification for use of anything but averages is incredibly difficult to sell.

About Author

MD Higgs

Megan Dailey Higgs is a statistician who loves to think and write about the use of statistical inference, reasoning, and methods in scientific research - among other things. She believes we should spend more time critically thinking about the human practice of "doing science" -- and specifically the past, present, and future roles of Statistics. She has a PhD in Statistics and has worked as a tenured professor, an environmental statistician, director of an academic statistical consulting program, and now works independently on a variety of different types of projects since founding Critical Inference LLC.

2 Comments
  1. Martha Smith

    You might find the following items on the page https://web.ma.utexas.edu/users/mks/ProbStatGradTeach/ProbStatGradTeachHome.html of interest:

    What Do You Mean By Average
    Weighted Means and Means as Weighted Sums
    Logarithms and Means
    Lognormal Distributions
    Lognormal 2
