
I happened across this “fact sheet” a while ago – “How to Reduce the Number of Animals Used in Research by Improving Experimental Design and Statistics,” provided by the Australian and New Zealand Council for the Care of Animals in Research and Teaching (ANZCCART). According to their website, ANZCCART is an “independent body which was established to provide a focus for consideration of the scientific, ethical and social issues associated with the use of animals in research and teaching.” I’m not sure it deserves a whole post, but for some reason I started writing one about it – I think because it represents something very common that gets talked about far less than the egregious mistakes, misuses, and misinterpretations related to Statistics. It’s not so much about the details as about the attitude, or the implicit messages, regarding the role of statistical methods and results in the research process: the idea that churning out numbers with a software package is the end of the game, and that statistical analysis plays that end-game role in “confirmatory studies.”

The title and goal of the fact sheet sounded good to me, and I was just a little curious about how they wrapped up their advice into a concise “fact sheet” style report for users without much of a background in stats. Taking experimental design and statistical concepts seriously should be part of trying to reduce the number of animals needed for experiments – as should avoiding common statistical mistakes and misinterpretations. But the fact sheet leaves some things feeling complicated and others feeling way too easy – like what to do with statistical results after you have them, and how they play into the conclusions of a “confirmatory study.”

I’m just going to comment on a few things in this post, focusing on the implicit messages we can send (inadvertently?) when writing about design and statistical analysis for an audience of researchers with potentially little background in Statistics. I think we do more harm than we realize by perpetuating oversimplified beliefs about what is involved in Statistics – beyond just computing statistics.

The advice in the report is meant to apply to “confirmatory experiments,” as opposed to pilot studies or exploratory studies. This always makes me wonder about the assumed role of statistical hypothesis testing in confirmation of a scientific hypothesis. We should always wonder about what role we’re placing the statistical methods in, and particularly whether we’re giving too much of the science or decision making over to simple statistical estimations or tests.

In this fact sheet, there is an implicit assumption that statistical tests for means (likely through ANOVA) are the way to test scientific hypotheses about differences among treatments in a “confirmatory” way. I am not delving into the extensive and rich literature and debates about exploratory vs. confirmatory research in Philosophy and Statistics, but I do point out that the fact sheet presents this approach to confirmatory research as if it were a broadly accepted and agreed upon way of doing business – the disagreements, unsettled questions, and philosophical arguments aren’t mentioned or acknowledged. Maybe a “fact sheet” is just not the place or venue to bring them up – but then we at least need to consider the implications of continuing to pretend there is more consensus than there really is on the role of Statistics in doing science, even pretty clean-cut science as described here. The “fact sheet” strategy and presentation is, simply, oversimplified.

Here is the description of a confirmatory experiment:

Confirmatory experiments are used to test a formal, and preferably quite simple, hypothesis which is specified before starting the experiment. In most cases there will be a number of treatment groups and the aim will be to determine whether the treatment affects the mean, median or some other parameter of interest. In this case it is essential that the experiments give the correct result. It is this type of experiment which is discussed in more detail here.

I find the emphasis on “it is essential that the experiments give the correct result” fascinating. Of course, we would like a “correct result” – but what does that even mean? Are we talking about capturing some true difference in means with an estimate or statistical interval? Or concluding there is a non-zero difference in means when in fact there really is one (or vice-versa)? I could go on, but the point is this: if we’re in the land of statistical inference, we’re not in the land of knowing we’ll get “the correct result,” as much as we would love to be. However, I find this attitude common, and concerning. It supposes that the goal of Statistics is to rid a situation of uncertainty, rather than to provide something useful in the face of variability and uncertainty. There are many places in the document that feed this attitude or message – even if subtly.

The usual issues arise with the brief discussion of “significance level” and “power” – really, the language just perpetuates dichotomous thinking and decision making through the “false positives” and “false negatives” narrative, which I guess is consistent with the need to get a “correct” or “incorrect” result implied in other wording.

Some things I liked

Some tidbits I did like included the easy-to-digest description of identifying or defining the “experimental unit” – something that is often confusing in laboratory research with animals, and often where a conversation between a statistician and a researcher first leads. The fact sheet also directly discusses the implications of how animals are “caged” – implications that are often considered far too late in the process, rather than at the design phase.

Perhaps the part I’m happiest with is this description of “The effect size on the parameter of scientific interest” in the context of power analysis:

This is the difference between the means of the treated and control groups which are of clinical or biological significance. A very small difference would be of no interest, but the investigator would certainly want to be able to detect a large response. The effect size is the cut-off between these two extremes. A large response is easy to detect, but a small one is more difficult so needs larger groups.

It doesn’t say “use the estimate of the effect size from a pilot or other study” – it clearly says to use one that represents “clinical or biological significance,” as I’ve talked about elsewhere. Where that cutoff ends up being placed is tricky business not discussed in the fact sheet, but overall it was refreshing to see this. I wonder how, or if, it could be translated into practice based on the brief description…
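To make the “translated into practice” question a bit more concrete, here is a minimal sketch (mine, not the fact sheet’s) of the usual back-of-the-envelope calculation in Python. It takes a difference chosen for biological meaning – not a pilot-study estimate of the effect – plus a guess at the standard deviation, and returns an approximate number of animals per group. All numbers are invented for illustration.

# A rough two-group power calculation based on the standard normal approximation.
# "delta" is the smallest difference judged biologically meaningful;
# "sd" is a guess at animal-to-animal variability. Both are made-up numbers here.
import math
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test at level alpha
    z_power = norm.ppf(power)          # desired power
    return 2 * ((z_alpha + z_power) * sd / delta) ** 2

print(math.ceil(n_per_group(delta=10, sd=8)))  # about 11 animals per group

The hard part, of course, is everything feeding into delta and sd – which is exactly where the brief description leaves researchers on their own.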

A few worrisome things

Use of in-bred strains of mice is encouraged. The decrease in variability makes sense and, of course, impacts the number of experimental units in an obvious and practically relevant way. However, there is no discussion or acknowledgement of the downsides of using in-bred mice in terms of limiting the scope of inference, or external validity. This is a tradeoff that should be considered in the design, though it is a hard one because it isn’t quantified by any sample size formula.

Blocking is referenced – which I see as a good thing. But I find it interesting that a randomized block design with no replication within blocks is presented as the default option. This may be reasonable given the emphasis on minimizing the number of animals, but the untestable assumption of no block x treatment interaction that comes with it should at least be mentioned and considered. There are always trade-offs, and we need to be careful when presenting something as a default, rather than as a decision to be made and justified.

Then, a method for justifying sample sizes called the “Resource Equation method” is presented as an alternative to the “usually preferred” power analysis method. This isn’t a method I’m familiar with, so I was a little intrigued – particularly when it was described as a way to get around the parts of power analysis people find most difficult (and therefore often don’t do or take shortcuts to get around). The challenges with power analysis are described:

However, this can be difficult where more complex experimental designs are employed as can happen in more fundamental research projects. For example, if there are several different treatment groups, it can be difficult to specify the effect size (signal) that would be of scientific interest and if many characters are to be measured, it may be difficult to decide which is the most important. The power analysis also requires a reliable estimate of the standard deviation, so it cannot be used if this is not available.

And then the Resource Equation method is provided as an easier-to-use improvement:

The Resource Equation method provides a much better alternative for experiments with a quantitative outcome (i.e. using measurement data). It depends on the law of diminishing returns. Adding one more experimental unit to a small experiment will give good returns, but as the experiment gets larger the value of adding one additional unit diminishes. The resource equation is:

E = (total number of experimental units) - (number of treatment groups)

E should normally be between 10 and 20, although it can be greater than 20 if the cost of an experimental unit is low (e.g. if it is a well in a multi-well plate) or in order to ensure a balanced design with equal numbers in each group. As an example, suppose an experiment is to be set up to study the effect of four dose levels of a compound on activity in male and female mice. This is a factorial design (discussed below), and it involves eight groups (4 doses x 2 sexes). How many animals should be used in each group? According to the Resource Equation if there were, say, three mice per group, that would involve a total of 24 mice and with eight groups E=24-8 = 16. So this would be an appropriate number. Of course, these animals should be chosen to be free of disease, of uniform weight and preferably of an inbred strain.

Wow – very simple and easy. But what is the justification, beyond appealing to the law of diminishing returns? And where does the “between 10 and 20” really come from? I didn’t look into it further. But “The Resource Equation method provides a much better alternative for experiments with a quantitative outcome (i.e. using measurement data)” is a strong statement. I’m not sure why the previous paragraph states that power analysis is “usually preferred” then – as that would have to use a quantitative outcome as well. I do see why practicing researchers would greatly prefer the Resource Equation method for its simplicity and for how little justification it demands, but is that a good enough reason? It is certainly easier to disconnect number-of-animals considerations from statistical estimation, but how is that consistent with still relying on statistical estimates or tests in the end? I find this part a bit perplexing.
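For what it’s worth, the arithmetic behind the Resource Equation is easy enough to write down. Here is a tiny sketch in Python applied to their 4 doses x 2 sexes example – the loop over candidate group sizes is mine, just for illustration.

# Resource Equation as described in the fact sheet:
# E = (total experimental units) - (number of treatment groups),
# with E "normally" between 10 and 20.
n_groups = 4 * 2  # 4 dose levels x 2 sexes = 8 treatment groups

for per_group in range(2, 6):
    total = n_groups * per_group
    E = total - n_groups
    print(f"{per_group} mice/group: {total} mice total, E = {E}, in 10-20 range: {10 <= E <= 20}")

# With 3 mice per group (the fact sheet's example): 24 mice total and E = 24 - 8 = 16.

Under the simple 10-to-20 rule, only 3 mice per group lands in the window for this design (their caveats about cheap units and balance aside) – which, to me, makes the rule feel more like a convention than a justification.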

Interactions are mentioned, but the wording implies that main effects and interactions can be estimated – without making the point that meaningful interpretation of “main effects” in the presence of interactions is an issue. In my experience, even intro classes in analysis of variance don’t do a great job in presenting the reasons, except to say that tests for interactions should happen before tests for main effects.

Finally, to The Statistical Analysis…

Then, the fact sheet ends with a whole section on “The Statistical Analysis”. It has some good advice, such as:

The statistical analysis of an experiment should be planned at the time that the experiment is designed and no scientist should start an experiment unless he or she knows how the results will be analysed. To do so is asking for trouble. They may find that they do not have the tools or the know-how for the analysis so that it does not get done correctly. They may put off the analysis until they have done several similar experiments, but in this case they will be unable to adjust conditions according to results observed in the earlier experiments.

Beyond that, there’s not much substance in the section – it’s short and quite vague. It sounds like one just needs access to a reputable software package and a basic knowledge of analysis of variance. After some exploratory data analysis, “the statistical analysis should be used to assess the significance of any differences among groups.” No mention of estimation of effects or interpretation of results – just assessing “significance of any differences.” Example results are given, but not interpreted.

I’m not sure what I expected from this section – maybe less info is better than more, but this still doesn’t feel like enough. I guess what bothers me is that it’s presented as the culmination of the process – the end step. The hard parts of model checking, interpretation, etc. aren’t acknowledged. Maybe that’s meant to be included in a “basic understanding of analysis of variance,” but that’s not consistent with my experiences as a teacher or a collaborator.

How should the results be used and reported? What is the analysis capable of? What is it not capable of? What are common mistakes and misinterpretations in this context? And maybe my worry and skepticism were largely fed by the statement early in the paper that it is “essential” that the confirmatory studies they are referring to give “the correct result.” If that’s the goal, and the end step in the fact sheet is assessing significance from analysis-of-variance F-tests, what does that imply about the role of statistical inference in the science?

I spent a couple of hours in the car yesterday and got hooked again on the podcast Philosophize This! by Stephen West – not sure why I inadvertently took a break from it. Stephen does an amazing job with his presentation and crams an incredible amount of information into a small space without making me feel too claustrophobic.

One of the episodes I listened to was Martin Heidegger pt. 3 – Authenticity (Episode #102). When I’m listening to these, I of course find myself thinking about life in general, but usually I find myself thinking more specifically about how useful the ideas and language are for describing how we do science – applying ideas from the philosophy of being human to the philosophy of doing science. This one was particularly helpful in terms of adding welcome structure to my thoughts about how statistical methods and related technologies are often used in science today.

Why do we care about what we do in the process of doing science? And, why do we choose the tasks we do – that ultimately make up our scientific practice?

Here’s Stephen West’s intro to Heidegger’s The Care Structure:

Ontological beings constantly engage in tasks that we care about – the things which we care about, and the various things that dictate the things we choose to care about – many of which are entirely out of our control. This overall concept of care becomes a central focus in Heidegger’s philosophy. And the way he breaks down what a Dasein ultimately chooses to care about is commonly explained in terms of three major factors, the group of which is sometimes called the care structure. What a Dasein ultimately chooses to care about comes down to three things: its facticity, its fallenness, and its existentiality.

About minute 7:00 in Episode #102

I find it very helpful to think about these three aspects in the context of doing science – which he ultimately uses to discuss the concept of authenticity. While authenticity is a popular word now, it has always seemed vague and hard to define to me – so giving the concept some structure and better definition makes it more useful to me. In particular, it seems helpful to reflect on the degree of authenticity with which each of us practices our science, using Heidegger’s three-part care structure. First, we need just a quick intro to the three parts – facticity, fallenness, and existentiality. I use quotes from the podcast to provide these.

So, the first one is Dasein’s facticity. Heidegger would say “Look, it’s not like before you were born you found yourself on some cosmic game show where you got to pick when and where you were born, who your parents were, how tall you were, … No! What happened was one day you just kind of found yourself thrown into existence. Thrown into a particular historical context, a particular cultural context, a particular socio-economic class, a particular gender. None of these are things you explicitly chose, but all of these things drastically influence the tasks you care about enough to be constantly engaged in. This collection of things about your individual being that you had no control over … Whatever it is that you are, these facts and many others like them, individual to you, make up the facticity of your existence. And again, this facticity strongly influences what things you decide to care about.

About minute 7:58 (my transcription may contain small deviations from what was actually said)

It doesn’t take much of a leap to translate this to the context of doing science – we are practicing science in a particular historical and cultural context. Things we did not explicitly choose drastically influence the tasks we engage in to practice science. I am not saying this is a bad thing, but a thing that just is – a thing we can easily forget, but would be well served to notice and openly acknowledge more often. There seems to be a belief that we are lucky enough to be living after the development of The Scientific Method, with a lack of awareness that humans in the future will look back and criticize aspects of how we do science as silly – even though it might look perfectly reasonable to us at the moment. This is just part of being human, and part of progress. We do this all the time in our own lives! How many times have you made a decision that seemed so right at the time, only to look back later and realize it was silly – with no way to have seen the silliness at the time? I’ve diverged a bit from facticity. Let’s look at fallenness:

Fallenness is an important part of being a Dasein. And while at first we may not like to admit all the ways we’re behaving simply because someone else told us to behave that way, but make no mistake – we’re all doing it at varying levels. We’ve all, in a sense, fallen into tasks as Daseins. It’s part of our nature.

About minute 13:30 in Episode #102 (my transcription may contain small deviations from what was actually said)

So, fallenness describes doing “what other Daseins are already doing around you.” This of course goes along with facticity – but it so neatly describes what we see in the application of statistical methods in doing science. Scientists use methods, often very specific ones within disciplines, because that’s what other scientists around them are doing. That’s what other scientists are asking for in reviews or grant applications, and that’s what other scientists are teaching – because that’s what they were taught. This “fallenness in scientific practice” is worrisome, and it is made worse by a lack of awareness of, or reflection on, aspects of our scientific “facticity.”

The existentiality part of the structure describes the possibilities we have as humans – we are a particular type of being that has possibilities, different from, say, a rock or a tree. West sums up the three concepts in this way –

So, the first thing that had an effect on the tasks we decide to care about was facticity, the second thing was our fallenness, and the last piece is existentiality. ….. The first thing that has an effect is the reality you were thrown into, the second thing is what other Daseins are already doing around you, and the last thing are the possibilities you have at your disposal.

About minute 14:00 in Episode #102 (my transcription may contain small deviations from what was actually said)

Finally – West introduces Heidegger’s idea of authenticity through essentially weighting the three components of the care structure. Again, it doesn’t take much of a leap to use these ideas to frame a kind of authenticity of scientific practice. I think it’s important to add the “practice” at the end, rather than just saying “authenticity of science,” because “practice” highlights the choices regarding the tasks that essentially make up our scientific practice (which may be different from how we theoretically want to do science, or even how we tell others we’re doing science).

Now, when you consider these three parts of the care structure: facticity, fallenness, and existentiality; when you arrive at this place of realizing how they drastically affect the way you’re going to be behaving, Heidegger thinks at this point you’re left with a choice. It’s a choice of living in a certain way on a giant spectrum between what he called authenticity on one end and inauthenticity on the other. Now the sort of quintessential example of an inauthentic person is someone who really only embodies the first two parts of this care structure, their facticity and fallenness. They’re thrown into existence in a particular time and place, and they fall into tasks that other people around them tell them to do, never really considering the possibilities at their disposal about other ways to live their life, never considering the whole branch of existentiality. Now, as you can imagine, the antithesis to that – living authentically – is to radically consider the possibilities you have and to live in a way that brings about what he calls the Dasein’s own potentiality. To be deeply engaged in asking these ontological questions of being; to examine and understand your own facticity, including the cultural and historical context you were born into; to be introspective, and to realize the tasks that you’ve, you know, fallen into simply because someone else told you to do it. To be truly authentic is to fully embody the statement “being one’s own” or “living for yourself.”

As you can imagine, this is far from a dichotomy. It’s not like you’re either, you know, a mindless drone going along with whatever other people tell you to do or “Oh, I don’t just go along with what everyone else says to do, I must be authentic!” No, we all exist on different points along this spectrum of authenticity. And even if you’re someone who’s self aware enough to have corrected a few things along the way – you realized that they were just ways other people told you to act – what most people do, by and large, is they get to a point in their life where they feel like they’re living authentically enough and then they just sort of stop asking these ontological questions. They stop trying to arrive at a deeper understanding of the culture and time period they were born into. They stop actively examining their behavior, trying to identify the things that they’re doing just ’cause someone told them to do it at one point. What happens is, in practice, what most of us end up doing – we arrive at these sort of rest stops on this giant road trip of life and we end up living the rest of our lives largely inauthentically by telling ourselves a story – “Well, I’m more authentic than that person over there, so…” And the interesting thing to think about there is that this too, is part of what it is to be a Dasein. Again, Heidegger’s not writing an ethical doctrine here, he’s talking about the nature of what it is to be us. He never says that living authentically is better than living inauthentically, he’s just sort of laying it out.

Minute 14:10 in Episode #102 (my transcription may contain small deviations from what was actually said)

In my experience, many scientists get to this same “rest area” in their practices. We stop trying to arrive at a deeper understanding of how the culture and time period we were born into influence what work we do and how we do it. We keep doing what others around us are doing, even if we don’t really understand it or agree with it. We stop actively examining our behavior and stop actively trying to identify the things we’re doing just ’cause someone told us to do it (e.g., providing the p-value for a reviewer or the power analysis for a grant proposal!). I can’t help but wonder if introducing concepts with clear words like “facticity” and “fallenness” could help motivate more introspection and reform of practices at the individual level – maybe someone else has tried this already and I just haven’t heard about it yet.

P.S. I think the ideas presented here are interesting and potentially helpful – but that does not imply I endorse Heidegger’s beliefs and actions in general!

Make an ASS out of U and ME

May 10, 2021 | General | 2 Comments

In 9th grade, I had a basketball coach named Mr. Bean. His name did not match his stature, though his large voice did. I had never experienced yelling in my direction like that before – and maybe haven’t since. The words he yelled weren’t harmful – it was just his way of communicating with his players in a gym. But it was hard for me to hear the actual words over the volume. It was intimidating and jarring to my freshman self and after a week or two of practices, I really wanted to quit. This was one of those times I now thank my parents for providing enough pressure to give it more time. It did quickly get to the point that I could actually hear what he was saying and I have fond memories of him and of that season. There’s one memory in particular that continues to resurface on a regular basis in many aspects of life. Of course, I didn’t appreciate it in the moment, but I guess that’s not surprising.

I’m not sure I remember the exact context he first said it, but I’m guessing it had to do with us moving too fast and making assumptions about what our teammates were going to do before waiting just long enough to reality check our assumptions. Anyway – my memory is that he surprised us all (the first time) by saying – in that deep, loud, bellowing voice – “DO YOU KNOW WHAT ASSUMING DOES?!!!” I’m sure we all looked at him with wide, questioning eyes – having no idea what answer he was looking for. He let the question hang in the air just long enough to make us a little nervous. Then, “IT MAKES AN ASS OUT OF U and ME!!” – with a little smile. If I remember correctly, it took a second or two for it to sink in, and then it just stuck. I assume (ha) the phrase is pretty common, at least in the U.S., but I have only heard it used a handful of times since that memorable first exposure.

Fast forward about 30 years and much of the time I feel like I’m swimming in a deep bottomless pool of assumptions, just trying to keep my head above water. Sure – assumptions related to Statistics (as I’ve hit on before here, here, and here and will continue to hit on in the future), but the pool extends far beyond those. There are the bigger assumptions underlying how we do science, what we value in our culture, how we go about our days. There are the little assumptions I make each day about my kids, my friends, my dog, and then the assumptions I, myself, make about myself.

Clearly we can’t go forward in living life without making assumptions – of various types and from many different motivations. They are necessary, but they are also tricky and even dangerous. I see the trickery and danger come in mostly when we are unaware of the assumptions we are making – either purposely ignoring them or unaware out of sheer ignorance. We humans are so good at ignoring the assumptions we make, or at least pretending as if they are facts or reasonable expectations, rather than artifacts of our own minds created to make life tractable. We all make mistakes through our assumptions on a daily basis, whether we admit it to ourselves or not. And I doubt there are many who would disagree with me on that.

I find it fascinating that we seem to believe (or assume!) that we can rise above the human challenge of recognizing and acknowledging assumptions when we really want or need to – like when we turn to doing science. We often pretend as if we can just write down all of our assumptions and go forward – we include the ones that are easy to write down (particularly those related to theory and/or mathematics) and largely ignore the others. We fool ourselves with the goal of “objectivity” (or at least trying to convince others of it), rather than the goal of awareness and contemplation of assumptions – or spending real work on where we might be fooling ourselves. In the practice of Statistics, there is a general expectation to try to “check” the low-hanging-fruit assumptions, but even that is often carried out under the wrong assumption that we can conclude whether an assumption is “met” or not by looking at a little data.

But my message for this post was supposed to be simple. My 9th grade experience keeps popping into my consciousness – and there is something useful in here. It’s less of an accusation or warning and more of a simple question used to trigger reflection. How am I likely making an ass out of myself – or you – through my assumptions? I think we could benefit from having that question bouncing around in our heads more often (in all areas of our lives). Or maybe you prefer the wording “How am I likely fooling myself?” – but it doesn’t have quite the same ring to it. A little mantra to remind ourselves to try a little harder not to fool ourselves.

When something big happens in our lives to make us realize how good we are at fooling ourselves and others, it’s hard not to then see it everywhere and start questioning everything. As long as the questioning doesn’t become completely overwhelming, it seems to me to be mostly a positive thing. Which assumptions are likely to make an ass out of me or someone else? Is the risk, or benefit, of the assumption worth it anyway? Will openness and honesty upfront about assumptions at least take some of the sting out of feeling or looking like an ass later? I think so. Our inherent overconfidence or arrogance toward an assumption is ultimately what leads to the sting – not the problem with the actual assumption itself. Sure, questioning can make us feel a little crazy, but it can also be embraced as just the way life is – including science and statistics. We will continue to make asses out of ourselves and others, but maybe less often or to a lesser degree?

We all know our lives and work depend on assumptions. The key is somehow figuring out which ones will eventually make an ass out of us. And that’s not easy. But it’s worth thinking more about.

Numbers and their inevitable baggage

April 22, 2021 | General | No Comments

This is another cross-post from the Statisticians React to the News blog sponsored by the International Statistical Institute (ISI).


Numbers often get the spotlight in news stories. Sometimes they deserve the front stage as valuable information describing a situation that would be hard, if not impossible, to adequately capture without them. But they can also easily mislead. All numbers in the news come with baggage. For some numbers, the baggage is out in the open like a carry-on, but for many it is of unknown form, size, and weight – and hidden deep in the baggage compartment.

Photo by Belinda Fewings from Unsplash

Numbers by themselves, without their baggage, seem to make a story clear, or at least clearer. They suggest credibility and appear as evidence. They provide comfort by conveying that something is known about a situation. They invite us to trust the report. But their ability to provide clarity or trustworthiness to a story depends on how weighed down they are with hidden baggage.

I started writing this a few days ago to reflect on our complicated relationship with numbers, especially in the news. I opened up that day’s New York Times to find a few stories to use as examples.

This morning (April 11, 2021), one of the headlines was Young Migrants Crowd Shelters Posing Test for Biden, followed by the lead sentence: “The administration is under intensifying pressure to expand its capacity to care for as many as 35,000 unaccompanied minors, part of a wave of people crossing the border.”

The 35,000, while still not without baggage, falls on the lighter side of the baggage weights. Its luggage is mostly carried on board and displayed out in the open. The number is presented with “as many as” wording and honestly rounded, both of which imply uncertainty – as opposed to a statement like “35,102 unaccompanied minors will need care.” The 35,000 number is not without error, but it’s clearly not intended to be taken as an accurate count. Instead, it is meant to provide valuable information about the situation – to allow us to gauge the extent of the problem – in a way that descriptions based on words could not. What words get us close to the information contained in the number 35,000? “A lot”? “A huge number”? The context is vague until the number is there to provide a tangible reference point.

Numbers like the 35,000 have an important role to play in the news and often come as part of “official statistics” provided by the government of a country. But even “official statistics” have hidden baggage depending on the methods of data collection and analysis, and also on the government or administration overseeing the gathering or reporting of such information. For example, another headline in today’s paper is ‘You can’t trust anyone’: Russia’s Hidden Covid Toll Is an Open Secret, followed by the lead sentence: “The country’s official coronavirus death toll is 102,649. But at least 300,000 more people died last year during the pandemic than were reported in Russia’s most widely cited official statistics.”

We find ourselves wanting to know how much to trust a reported number, and to understand why it is being presented to us in the first place.  Is it sensationalism? Is it to provide a useful reference? To establish context? To bolster one side of an argument?  In order to judge such things, we have to consider its potential baggage – that which is visible and that which is hidden from view (whether on purpose or not).  Political baggage, while often hard to gauge the extent of, is a type of baggage people understand, and unfortunately even expect under some governments. But some baggage is complicated, hidden inadvertently, or hard to gain access to – like it’s in customs and only people with particular credentials can get access to examine all of it.  For example, when the numbers reported are estimates developed to help us generalize conclusions beyond the limits of data collected — often with the help of statistical reasoning and methods — baggage does get tricky.

The contrast in reported precision between the number 102,649 and the number 300,000 is striking. It provides a nice example of how reporting a number at an increased level of precision (to the ones place in this case) can make it seem trustworthy – as if it has less baggage. The rounding in 35,000 and 300,000 makes it clear they are being reported, not as accurate counts, but to provide a useful reference for the size of the problem. If the error could be as big as 300,000, there’s not much point in distinguishing between 102,649 and 102,650 deaths. But to be fair, the 102,649 could indeed be an accurate count of some subset of the total deaths – though that potential baggage is stowed well out of view in this case.

Sketch by Shaden Higgs

Before looking at another example, it’s worth taking a quick step back to reflect on our relationship with numbers in school. It’s not often (or at least it didn’t use to be) that education centered on numbers involved talking about their baggage. Instead, most of us were taught to accept (or automatically state) the assumptions and conditions – and to calculate a right answer based on those conditions. In Mathematics classes, numbers often played a key role as being right or wrong. There was no expectation to carefully inspect the baggage. Even in Statistics classes, the numbers turned in on an assignment or test were probably judged as right or wrong. So, when faced with a news story focused on a number, it can be hard to let go of the “right or wrong” interpretation and instead grab onto the more difficult question of “what baggage does this number carry with it?” In real life, it’s pretty safe to say the number is “wrong” – to some degree that depends on its baggage.

In a quick scan of The New York Times on April 9, 2021 (the day I started writing this), I found this article about trying to understand potential learning losses associated with the pandemic.  The article states: “A preliminary national study of 98,000 students from Policy Analysis for California Education, an independent group with ties to several large universities, found that as of late fall, second graders were 26 percent behind where they would have been, absent the pandemic, in their ability to read aloud accurately and quickly. Third graders were 33 percent behind.”

This paragraph, taken by itself, offers the rather alarming numbers of 26% and 33% behind! The journalist could have easily focused only on these and created a fairly dramatic story eliciting fear among parents, educators, and others. No statistics-based measures of uncertainty are provided, and there is no talk of baggage. The reader is implicitly asked to trust the numbers by reference to the “98,000 students” and the “independent group with ties to several large universities.” By itself, the summary does little to acknowledge the baggage being dragged along by the numbers.

However, the article then goes on to say “At least one large study found no decline in fall reading performance, and only modest losses in math.”  This follow-up description of a different conclusion from another large study is all that’s needed to make it clear that the first comes with baggage, as well as the second.  Presentation of differing conclusions is incredibly valuable.  And, it may be that both are completely reasonable summaries of their respective situations – given the different baggage they carry.  In other words, one need not be right and the other wrong. There are many questions to ask and there is no simple answer to the question of how much, if any, decline there has been in student learning – it will depend on the individuals included, how things are measured, what criteria are used, etc.  So much baggage.

Then, the article does start rummaging through parts of the baggage as it goes on to say “But testing experts caution that the true impact of the pandemic on learning could be greater than is currently visible. Many of the students most at risk academically are missing from research because they are participating irregularly in online learning, have not been tested or have dropped off public school enrollment rolls altogether. In addition, some students have been tested at home, where they could have had assistance from adults.”  The “testing experts” could go on.  And, we could add things like why “percent behind” is incredibly hard to measure and why summarizing it in terms of a single number (per subject area and grade) aggregated over many schools, or even states, may be meaningless under the weight of all the baggage.

Number baggage takes many forms and can range from small carry-ons, to fancy suitcases, to old style trunks, to shipping containers, or some combination.  While we rarely get a chance to inspect what’s really in the baggage, or even see the extent of the collection, we need to know it’s there and talk openly about it.  Hopefully, journalists and others take this into account when they report numbers.

I think the examples included in this post illustrate journalists choosing to acknowledge number baggage, even if they couldn’t go into messy details in a brief news article. The article provides several numbers as a focus, but it does not overly emphasize the most dramatic numbers. It provides enough context to openly acknowledge the existence of baggage attached to the numbers.

It is not realistic for a journalist to inspect and/or present a long list of the pieces of baggage and contents – even with unlimited space it would be a challenge – but claiming the existence of baggage is a huge first step to conveying information more openly and honestly, and avoiding sensationalism. Baggage does not keep the numbers from being useful, but it does mean we should not be fooled into trusting them as if they are the answer to one of our elementary school math problems. Ignoring or forgetting about baggage makes number-based disinformation too easy.

This piece is cross-posted on the Statisticians React to the News blog, sponsored by the International Statistical Institute of which I am currently the editor. I made a few changes from the original post that was published on Groundhog Day 2021, but not many.

****************

On this Groundhog Day, I should be focused on how many weeks are left in winter.  Instead, I still find myself in that period of reflection spurred on by the arbitrary change of number associated with our calendar.  As for many, 2020 was marked by much change in my life. I attribute much of the positive to attempts to build various mindfulness practices into my life.  As a statistician, I often find myself in a tension between caring and not caring about the scientific research on such practices.  Why the tension?

As I navigate the information out there about mindfulness, meditation, awareness, theories of consciousness, etc., I am struck again and again by how often the phrase “research shows…” is used as justification for potential benefits – often without much to back it up.  Phrases such as “evidence based,” “significant effect,” and “proven” are also tossed around as selling points.  And this is where the tension starts to build in me.  I consider myself a scientist – I value research and I care about quality of that research and the information it may provide. I am all for studying mindfulness-based interventions (in their various forms) that hold potential for positive effects on overall health for individuals.

However, using common research methodology to study potential “effects” on outcomes such as “overall wellness,” “stress,” or “happiness” feels to me like wading into a murky and complex environment to measure something with tools designed for a simpler and far less turbid environment. We all know how complex human behaviors and perceptions can be – and as consumers of news and science, it can’t hurt to remind ourselves of that complexity … often.

I want to be very clear that my message here is not anti-research, but it is a message of caution about assuming research can provide quick answers, particularly in complex health and/or social science contexts. “Research shows…” should not automatically be interpreted as “the answer is…” There are many paths we could take to examine the problems and potential implications of such statements. For example, statements are often made in reference to the conclusion from a single study – which always has limitations. For this post, I want to focus on a few issues that I think don’t get enough attention in the context of human-related research.

Early research about interventions, such as mindfulness techniques, tends to be framed under the goal of deciding “whether or not there is an effect.”  I expect the phrase to sound familiar to you, as variations show up in peer-reviewed scientific writing, as well as science journalism.  Statistical methodology and reasoning are often assumed to provide tools to help answer the question – which is why my job often leads me to obsessing about the phrase and its two parts: “whether or not” and “has an effect.” What do they imply and how are they interpreted (even subconsciously)?  I try to be aware, but still catch myself lapsing, even if momentarily, into comfortable acceptance of some of the implications!

Whether or not

Framing a research question as “whether or not” severely oversimplifies a problem. The phrase implies all or nothing — it either works for everyone or it does not work for anyone. While it often makes an attractive title or headline, such “yes or no” wording may also lead us to trust a headline (whether positive or negative) more than we would otherwise — maybe because it leaves so little room or invitation for follow up questions. It’s nice when at least the ‘how’ is included with the ‘whether or not’ – like in this Washington Post article.

From a Statistics viewpoint, much has been written about the mistakes we (including scientists) make in our tendency to dichotomize (e.g., significant or not significant, effect or no effect, assumptions met or not met, etc.). We seem to naturally steer away from having to deal with gray area and uncertainty.  Categorizing and simplifying aren’t always bad – they can be done with purpose and justification – but, we best beware of how they can take attention away from deeper questions. “Is the situation as simple as whether or not?” “What criteria are used to decide between the ‘whether’ and the ‘not’?” “Are ‘maybe’ or ‘it depends’ possible conclusions?” “How well does the instrument measure what we’re after?” “I wonder how effects might differ across individuals and why?” And so on.

Has an effect 

I find the implications hidden in the phrase “has an effect” to be more subtle and harder to convey.  In my life as a statistician, the “whether or not” started to bother me very early on, but the issues with “has an effect” were slower to show themselves — I think because it’s so deeply embedded in how statistical methodology is typically taught.  There are many questions that now immediately surface for me when I see the phrase.  The first is usually the “causal” implication of the word “effect,” but warnings are common against that mistake, so I will focus on a couple of other issues for this post.

Do results apply to me?

The first issue is perhaps more about what is not said than what is said. The phrase “an effect,” with no qualifiers, implies the treatment is expected to apply in the same way to anyone and everyone. At the very least, it encourages sweeping a huge question about the results under the rug: “Who, if anyone, might the conclusion apply to, and why?” When we are trying to decide what to take away from the information for our own lives, this is a crucial question.

Egos are strong, and I don’t always find my first reaction to be “What are the reasons this conclusion might not apply to me?” I guess you can test this on your own by honestly checking your very immediate reaction upon reading such headlines. The consequences of not asking the question certainly depend on the risks associated with assuming it does apply to you. Risks are minimal regarding meditation for most people, but that is certainly not the case for medical treatments with possible side effects, such as hormone replacement therapy for women in menopause. And the varied effects of COVID-19 across individuals have made us all painfully aware of individual differences and how hard it can be to predict or explain them.

Does describing a single effect even make sense?

There are other more subtle issues related to “an effect” that have deep connections to the use of common statistical methods. We often refer to an effect; we try to measure it using individuals, we try to estimate it with our models, and we report research as if we captured it.  This all assumes there actually is some true underlying common effect of the treatment on individuals that can be measured (or estimated from measurements on individuals). But, what if the premise of being able to learn about an effect is too oversimplified and provides a shaky foundation?  What if we miss important parts of the story because of a default mindset to pursue a common, or overall, “effect”?

For many (most?) mind-based interventions it seems pretty safe to expect that people will respond differently to a treatment depending on the prior conditions of their lives – and differences can be substantial – not reasonably chalked up to “random errors.” And things get murkier when we also acknowledge the challenges of measuring what we want to measure and designing controlled experiments on humans.

The assumption of “a common effect” is deeply embedded in current research culture – in language, models, and methods. And it is integrally connected to our addiction to using averages. I tend to put a good-sized chunk of the blame on pressure to use popular (or at least historically widespread) statistical methods – even in the context of trying to describe complex human feelings and behavior.

The statistical methods we see first depend on the common effect assumption being a reasonable one – yet we’re not really taught to grapple with the reasonableness of that assumption.  It is just presented as how it is. Before walking through a hypothetical example to make things more tangible, it’s worth asking – Why do we rely so heavily on common effects and averages? 

People are not plants

There is a strong historical connection between agriculture experiments and the history of common text-book statistical methods (e.g., t-tests). Methods that make sense for capturing effects in the context of plants may not extend well to the study of things like human behavior and feelings.  Put simply — people are not plants.  While this statement may not seem very profound, I believe it’s worth a little consideration.

There may be value in borrowing methods that work well in the context of agricultural experiments, but there is also value in reflecting on how different a group of human participants is from a collection of plants growing in a field or greenhouse. For genetically similar plants in a controlled environment, it is much easier to envision the reasonableness of assuming a common effect to something like a fertilizer.  With plants, the focus is on physical characteristics that are relatively straightforward to measure and averages often do say something meaningful about a “typical” plant.  I think we can all agree that measuring the height of a plant is far different than attempting to measure something like stress or happiness in a human.

There is a lot to say on this topic and in lieu of more words in an already long post, I leave you with two contrasting images I hope might stick in your mind more than just the phrase “people are not plants.”

First image: A few sunflower plants are growing near each other. They are facing one direction, except one plant whose flower points in a noticeably different direction.  Is this surprising to you?  Why?

Second image: A few humans, maybe even genetically similar, are sitting next to each other in the same environment. They are facing different directions and focusing on different aspects of the environment – and likely feeling and thinking different things.  Are you surprised?  Why? 

A hypothetical example

To make things a bit more tangible, let’s walk through a hypothetical example. Suppose a researcher wants to study potential changes in stress associated with 4 weeks of guided daily meditations.  The researcher chooses a popular survey instrument designed [to attempt] to measure overall stress by translating a set of survey responses into a numeric score between 0 and 100. Participants are volunteers who say they have never meditated before; they are randomly assigned to either receive the daily meditations via an app on their phone (treatment group) or a “control” group of no meditation (for ethical reasons, they will be given access to the app after the end of the study!).

All participants are given the survey before and after the 4 week period. Before the study, researchers and practitioners agreed that – for an individual – a change in score of about 10 points is considered meaningful because it represents enough change that an individual typically notices it in their lives. (Note – even this gets tricky because it assumes a 10 point change is meaningful regardless of where the person starts … but I’m dodging that rabbit hole, along with others, for the sake of a shorter long post).

Now – to the point — suppose that about half the people in the meditation group had changes in their scores close to or greater than 10 points, while the other half had changes between about -3 and 3 (consistent with changes observed among people in the “control” group). If the researcher follows data analysis norms (STAT 101 methods), they may focus on comparing the average score change in the meditation group to the average score change in the “control” group (relative to observed variability among scores – thank you Statistics).  The average for the meditation group would be around 5 after combining the scores near zero and the scores near 10 — a number that doesn’t do a good job describing the response of any individual in the study.

What does the average even represent for the meditation group? Is it a meaningful summary if it doesn’t describe a score change observed by any of the individuals in the study? Does the criterion of a 10-point change, developed for individuals, hold when thinking about the average over a group of people? What are the broader implications if readers (or worse, researchers) don’t ever look at the individual responses enough to recognize the two clusters of outcomes in the data – because the use of averages is so entrenched in our methodology and expectations?
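Here is a small numerical sketch of that hypothetical in Python (all numbers invented): two clusters of individual changes, one near 10 points and one near 0, produce a group average near 5 that describes essentially no one.

# Hypothetical meditation-group score changes: half "responders" near 10 points,
# half near 0 (indistinguishable from the changes seen in the "control" group).
import numpy as np

rng = np.random.default_rng(1)
responders = rng.normal(loc=10, scale=1.5, size=15)
non_responders = rng.normal(loc=0, scale=1.5, size=15)
changes = np.concatenate([responders, non_responders])

print(np.sort(np.round(changes, 1)))    # two visible clusters, one near 0 and one near 10
print(round(float(changes.mean()), 1))  # the average lands near 5 - describing no one well

Any plot (or even a sorted listing) of the raw changes makes the two clusters obvious; the average alone hides them.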

It’s not unrealistic to follow this hypothetical scenario further and imagine the researcher using statistical methods to conclude there “is no effect” of the meditation program, which then may show up as the headline “Research shows meditation does not lessen stress” (I will dodge another rabbit hole here, but note they should avoid confusing a lack of evidence of an effect with evidence of no effect).

The often hidden assumption that if the intervention works it should work the same way on all individuals ends up misleading readers. What happens to the valuable information contained in the fact that it did appear to work (according to the criterion) for about half of the participants? Sometimes such information is conveyed, but from personal experience working with researchers, I suspect it is lost in many cases – particularly if the raw data are never plotted and aren’t publicly shared (yet more rabbit holes to dodge).

It is worth considering the consequences of never digesting the information. A person who could have benefited from the treatment may log the headline away in their brain as a reason not to spend the effort trying meditation. Researchers may put time and effort into designing a larger study with more participants (with the stated goal of increasing statistical power) or they may work to change the meditation treatment before trying again to detect a difference based on averages.

What would happen if, instead, we decided to go beyond a focus on averages and spent time speculating about the split in outcomes observed for the participants in the treatment group? There are many potential explanations for why a treatment might elicit meaningful change in the score for some participants and not others.

Perhaps it reveals something as boring (though important!) as a fundamental problem with measurement – maybe the instrument was picking up stressful daily events for some participants rather than any effect of the treatment. Or maybe, because the participants knew the researchers hoped they would benefit from the treatment (no blinding in the design), some participants unknowingly answered the survey in a way consistent with expectations. Or maybe, upon asking follow-up questions, the researcher realizes the split coincides with self-reports of how often individuals seriously engaged with the meditations, so that it might reflect a sort of “dose.” Or maybe the split actually reflects how different individuals respond to the 4 weeks of meditation for as-yet-unknown reasons. And I could go on. The point is – letting go of tunnel vision on averages and common effects opens up so much for consideration and future research.

I hope awareness that “an effect” relies on assumptions worth questioning might invite something different – from readers, scientists, and journalists. We might choose not to downplay (or completely ignore) individual differences when choosing words to summarize research. We don’t have to be stuck using averages and common effects – there is room for creativity and more nuanced interpretations. I notice New York Times journalists inserting words like “can” and “may” into headlines; small changes I believe can make a big difference in what our brains first take away. Even those three-letter words provide a subtle invitation to ask follow-up questions.

Back to meditation

I started writing this now-long post in response to hearing and reading the phrase “research shows …” followed by words implying an effect that applies to anyone.  I am really interested in meditation research, but my motivation is more a desire to understand how practices of the mind are associated with physical changes in the body than a wish to use research to decide if the practice is worth doing for myself.  It is not a medical decision with the potential for serious negative risks – in fact, the risk of not trying it may be greater than any risk of trying it.

Would a few headlines with the flavor “research shows no effect of meditation on …” affect my decision to spend time practicing?  Definitely not.  I know my positive experiences and I know how much we, by necessity, simplify complex human behavior to carry out research.  This is a setting where I doubt the ability to measure what we’re really after and question how meaningful a reported estimate of “an effect” is.  The potential effects of mindfulness practices (like many other human interventions) are complicated and likely vary substantially across individuals and even over time within the same individual!  There’s a lot of uncertainty, and I’m okay with that.

Finally a post – thanks to Gelman’s blog

January 8, 2021 | General | No Comments

I’m not sure how many have noticed the sadly low frequency of my blog posts, but I have. My life looks and feels so very different, both professionally and personally, than it did when I started to write in earnest in September of 2019. In some ways, it feels like a lifetime ago – but it was a sweet few months I enjoyed in a liminal space that I am now very thankful for. I just didn’t realize quite how liminal the space was.

As we head into 2021, things feel chaotic and unsettled in so many ways. Like many of you, I am working to accept what I cannot change and to learn greater flexibility. I finally realize that to make my own writing happen, I will have to rearrange (and give up) some things I can control — it feels important for many reasons.

For now, thanks to Andrew Gelman for having an enormous blog lag (a different type of lag than mine) and providing a post this week for me to share to start things off again — titled Megan Higgs (statistician) and Anna Dreber (economist) on how to judge the success of a replication. It relates to this post I wrote last year, but provides a different look within an email exchange.

Welcome to 2021 everyone.

Given I have not found the time to finish one of the many, many draft posts I have started, I am cross-posting most of one I wrote for the Statisticians React to the News blog as the volunteer editor.

*************************************************

Ten weeks, ten posts.  It seems like a good time for blog editor reflection.

Here in the state of Montana, in the northern Rocky Mountains of the U.S., the air feels different as fall settles in.  We are lucky to be looking forward to weather that will put out local fires and settle smoke drifting in from the west.  But, as far as news, things don’t seem to be changing – COVID-19, politics, and extreme weather events.  And, like the stability of news topics, statisticians’ reactions to the news share many timeless messages.

Statisticians React to the News isn’t meant to be a COVID-19 blog, but it is not surprising it has started with a lot of COVID-19 commentary – that’s what’s in the news, and for good reason.  This blog has been a unique place to capture reactions from authors scattered across the globe (Brazil, Sweden, Italy, USA, Palestine, Philippines).  It has been a weekly reminder of the challenges we all share. We can always use such reminders – making it my favorite part of being involved with this blog.

Contributors have highlighted shared international challenges, such as collecting and reporting quality data in a pandemic; implementing broader testing to include asymptomatic people; understanding sometimes alarming false negative and false positive rates for tests; learning to recognize cognitive biases; the importance of sampling design; and the importance of data to support basic human rights.

We all look forward to a time when COVID-19 isn’t front and center in the news, but for now there is a lot to learn from statisticians’ reactions to the related news. The pandemic has created a common and pressing context highlighting long standing issues with collecting, sharing, and interpreting data.

Challenges discussed on the blog are now personal for many (most?) of us, as the virus continues to spread through our communities.  As I experienced in the last few weeks, a false sense of control and complacency is easy to come by – even for someone who thinks they fully recognize the enormous uncertainties involved in the situation.  Both of my careful, mask-wearing, mid-70s parents tested positive and, for some currently inexplicable reason (I feel very lucky!), made it through with mild symptoms despite my dad’s high risk. In the meantime, I received a negative test result, which was met with “So glad you don’t have COVID!” from most people I told.  I then felt professionally obligated to annoy them with a reminder of false negative rates – information it appeared most would have preferred I keep to myself. Ignorance certainly can be bliss (or at least more comfortable).
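For anyone wondering why that reminder is worth the annoyance, here is a rough Bayes-rule sketch (the sensitivity, specificity, and pre-test probability are numbers I made up for illustration, not the characteristics of any particular test or of my situation) showing how a plausible false negative rate keeps a negative result from meaning “no COVID.”

```python
# Rough Bayes-rule sketch with assumed, illustrative numbers -- not the
# characteristics of any particular test or of my actual situation.
def prob_infected_given_negative(prior, sensitivity, specificity):
    """P(infected | negative test result) via Bayes' rule."""
    p_neg_if_infected = 1 - sensitivity        # the false negative rate
    p_neg_if_not_infected = specificity        # the true negative rate
    numerator = p_neg_if_infected * prior
    denominator = numerator + p_neg_if_not_infected * (1 - prior)
    return numerator / denominator

# Suppose a known household exposure puts the pre-test probability around 30%,
# and assume a sensitivity of 80% and a specificity of 99%.
print(prob_infected_given_negative(prior=0.30, sensitivity=0.80, specificity=0.99))
# ~0.08 -- a negative result lowers the probability, but nowhere near to zero.
```

Under those made-up numbers, the chance of infection after a negative test is still around 8 percent, which is exactly why “So glad you don’t have COVID!” overstates what the result can tell us.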

Like any project, this blog is a work in progress – an adolescent trying new things, searching for its place in the world (I have two teenagers at home and couldn’t resist the analogy).  I am keenly aware of how much content there is to read and/or listen to each day, for better or worse.  I hope this blog will make the cut of earning your precious time, and if there’s anything we can do to help make that the case, please feel free to let me know.

Finally – I leave you with one reaction of my own.  Statistics semantics.  I have never had a love for things like grammar, spelling, and sentence structure.  But when it comes to Statistics semantics, I just can’t stop thinking about it (plus, the alliteration makes it hard to resist).  Statistics semantics.   In English, we have the phrase “That’s just semantics” – meant to express that you’re trivially worrying about choice of words that don’t affect the meaning (ironically, this phrase makes little sense in the context of Semantics!).

In Statistics, there are so many words that carry different meanings, or even just different emphasis, in everyday language as compared to a formal statistical context.  I have grown used to the subtle eye-rolls and half-snarky comments that often come in response to my statistical-wording-pickiness, but to me it is not “just semantics.”   We (in a very royal sense) need to be open to seeing the unintended implications of our word choices (e.g., significant, determine, best, random, answer, calculate, confidence, etc.).  The choices and context do matter – they send implicit messages to the reader. I used to say and write things that I now cringe at, and I’m well aware I will someday cringe at words I automatically use today.  When words and phrases feel like a harmless part of cultural norms (including scientific culture), it’s hard to gain the perspective needed to see their potential flaws.  The international voices heard in this blog add another layer of complexity that simultaneously challenges and unites statisticians over words.

To me, Statistics is fundamentally about using information (data and assumptions) to support inferences; and communicating those inferences fundamentally relies on the words we choose.  Regardless of the topic, how “statisticians react to the news” often comes down to the words used or omitted – Statistics semantics.  The same words on a page will be internalized differently by different readers (including different statisticians!).  An awareness of the positive and negative implications of our words can play a leading role in improving science communication.  One of my big hopes for this blog is for it to encourage us all to reflect more deeply on the words we use – while at the same time getting a unique dose of international news and perspectives.

Stories — and Science

September 10, 2020 | General | No Comments

Stories. Science.

The stories we tell ourselves, and that we adopt from others, are weighing heavily on my mind. They are everywhere in life — including science.

I have started many draft posts over the last couple of months, but life has intervened in both positive and negative ways to keep me from finishing them. I’ll leave out the negative here – a positive has been my time spent as editor of the new Statisticians React to the News blog sponsored by the International Statistical Institute (ISI). The Editorial Advisory Committee (me, Ashley Steel, John Bailer, Peter Guttorp, and Andrew Gelman) has been recruiting a group of international contributors for weekly posts. Ashley and I recently discussed the blog, with John Bailer, in a Stats+Stories podcast that comes out today, which also keeps me stuck on the topic of stories.

I am currently fascinated by time spent thinking and reading about the complex connections between Stories and Science – and the role Statistics may play in our stories. I’m not to the point of having any sweeping or novel insights here, but wanted to get a few thoughts out on “paper.”

In my experiences as a scientist, I have often seen the following simple narrative: Doing science is supposed to be an “objective” story-free way of gathering information and knowledge, but then you should think about how to attach “a story” to the work if you hope to get publicity for your work and/or disseminate the work to a broad audience. That is, recognition of the fundamental importance of stories in how humans process information appears to come in after the science has been done.

There is acknowledgement of the power of a story in the context of dissemination to “lay-people,” but there is little, if any, acknowledgement of how the power of story for all humans might be impacting the actual science being done. I see this as coming, at least in part, from another related narrative — that scientists, by their very nature and/or training, are able to rise above the cognitive challenges in logic and reasoning that hold down the rest of the “lay people” making up the “public.” To me, this is related to a scientist-as-hero narrative, as recently mentioned by Andrew Gelman on his blog (the post by coincidence features my kitten Tonks!), and I plan to say more on this in another post.

For many years now, I have cringed at statements implying (with seemingly no needed justification) that the process of doing science is “objective” and even scientists themselves are “objective.” Digging into what we mean by objective isn’t easy and hasn’t been overly productive for me so far. Today, I see the human brain’s creation and use of stories as maybe a way to steer a more productive, though related, conversation.

It is first important to acknowledge, and really sit with, the fact that scientists are humans — and all humans have a complicated relationship with stories that impacts all aspects of our lives. Scientists are part of the public, scientists are “lay people.” Scientists are not immune to relying on stories — and most importantly, scientists are not immune to a lack of awareness of how their attachment to stories may impact their work.

Photo by Suzy Hazelwood

As humans, we have all had experiences (whether we have admitted it to ourselves or not) of firmly believing a story about our personal lives, about society, or about our work — only to realize later that there was little truth to the story and we just weren’t able to see it or didn’t know enough at the time to question it. And, it is very hard to let go of stories we have lived for many years — even in the face of information that the story deserves questioning.

There is nothing unscientific about considering the potential positive and negative implications of our attachment to stories on the process of doing and disseminating science. In fact, we continue to learn about how our brains process information through stories by doing science!

So, why should we operate under the assumption that scientists are able to rise above stories in their day-to-day work? Is this assumption just another story we like to live by because it brings us comfort in the context of great uncertainty and complexity?

[To be continued in future posts…]

Statistical numbers from Facebook

July 25, 2020 | General | No Comments

I don’t often log into Facebook, but when I did this morning (to promote the International Statistical Institute’s new blog – “Statisticians React to the News”), I was met with an invitation to fill out a survey for the sake of helping to predict COVID-19. I’m always curious – and nervous – about such surveys. I decided to participate.

Here’s the first page, and of course I couldn’t help but be a bit statistically intrigued (if that’s a term) by the part I have highlighted.

I’m not sure how Qualtrics, Facebook, and the Delphi group work together to get/create weights for the survey design and/or subsequent statistical modeling, and I don’t plan to dig into it now. I really just find the language interesting, and amusing in some sense.

How is a “statistical number” different from any other type of number? How do they know they are weighting my participation “properly” with their fancy statistical number? In my opinion, it’s just another example of fancy statistical-sounding wording put out there to make people feel like they should trust what’s going on. It’s so easy to provide a false sense of sophistication and the impression that everything is under control.

Don’t worry everyone: Facebook knows how to assign “statistical numbers” for “proper” weighting. I feel so much better about the world.

Us vs. them isn’t working

July 22, 2020 | General | 2 Comments

When we start to notice something, we often see it everywhere. We wonder how we saw it only rarely, or not at all, before. But, that’s the amazing work awareness can do for us. It’s one of those experiences that throws the limitations of our cognitive abilities into focus, even if briefly — a needed dose of humility about what our brains can and cannot do. I saw the quote “Don’t believe everything you think” on a bumper sticker yesterday – I had forgotten how much I like it.

Here’s the thing I’m seeing everywhere lately — the us (scientists) vs. them (others) narrative. It presents as if we have a world of scientists living in their clean, white labs and then, across the tracks, the world of others — most often called “the public” or “lay people.” It’s a narrative we (at least in my generation) have been fed since birth — in education and through all types of media. It is language that has felt fully justified and normalized within our culture (at least within scientific culture). I think it’s counterproductive, and I might even go so far as to say dangerous. We are living in a time when it would serve us well to be very aware of such narratives and their potential effects on social cohesion — as well as to be aware that there is so much we are not yet aware of. It is quite humbling.

I think of myself as a scientist and I suspect most people reading this consider themselves to be scientists. So, my writing on this topic does come from that place.

Scientists vs. Others

Us vs. them doesn’t work when the goal is open sharing of information and conversation. Us vs. them turns the goal into winning. It promotes cohesion within each team, but not among individuals across teams. When the goal is to win, when the object is to beat the other side, it is nearly impossible to have coherent, productive conversations. We are living in a time of great divides that feel as if they are growing on a daily basis.

Scientists often talk of how to get information out to “the public” or how to convey information in a way “the public can understand.” The motivation behind these words grows out of good intentions and a sense of responsibility to disseminate work in a meaningful way beyond the silos existing in academia and scientific culture. But good intentions aren’t enough to cover up the air of condescension, elitism, and arrogance that is often implied through the us vs. them narrative language. That tone can make it difficult for “others” to take in the information, or even to want to, even when the scientist has the best of intentions and no conscious awareness of what is being implied. We all know (through plenty of mistakes) what a huge effect tone can have on our daily interactions with other humans, and this has to apply on a larger scale as well.

I see this as part of an enormous topic related to the importance of humility in science. I read a draft essay a few days ago that I generally agreed with, but that left a very sour taste in my mouth. It took reading it a couple of times for me to understand where the sourness came from — it was the scientists vs. others tone running underneath and coloring all the words. So this post is just a few thoughts for today that I wanted to put on paper after that experience.

Scientists are part of the public

As a culture, we tend to use the language, and maybe even believe, that scientists are a separate group from “the public.” And, I think it’s fair to say that a hierarchy is often implied — with scientists sitting above “the public” with their knowledge and credibility. I understand what’s meant by the language, but do not believe we think enough about what is implied and the message it sends to those perfectly capable thinkers in the “public” group who didn’t choose a career that affords them the label of “scientist.”

We are not a society split into “scientists” and “others.” A person who works as a scientist may study a very, very thin slice of the world. They may have a great deal of expertise and knowledge in a very specific area. They may have a PhD earned by dedicating years of their life to studying some small thing in minute detail. But when it comes to the millions of other topics studied by others, the scientist does not have expertise just because they are labeled a scientist. They are just a member of the public.

Scientists are part of the public. Scientists are “lay people.”

Science should not be about winning a debate

Will science win out? This was part of the theme of the draft essay I read. Who or what is science competing against? We need to ask ourselves this question, and whether the need-to-win mindset is at all productive. Will such language lead to joint listening and openness to considering different viewpoints and new information? I see no reason to believe so.

We are all humans. Yes, even the scientists.

How do you know what you know? How often do you ask yourself that, or question where your knowledge and beliefs come from? Humans make mistakes, and a lot of them, every day. Choosing a career that affords the label of “scientist” does not make us immune to the cognitive mistakes and lapses of awareness that all other humans suffer from. Science is not separate from the humans carrying it out and communicating about it. The belief that science somehow rises above the usual human faults in logic and communication is itself rather unscientific. Just because we want something to be “objective” and super-human does not mean it actually is. I think letting go of this belief is a crucial first step in changing the us vs. them narrative in science communication.