Lies, damned lies, and gluten sensitivity – a story of statistics

Originally posted on

Mar 4, 2015

Author’s note: I have not been keeping up with the latest research on gluten sensitivity. This article was written in 2015, and while it still serves as a good example of how to evaluate scientific research, I cannot say whether it represents the current thinking on gluten sensitivity.

You’ve heard the quote that Mark Twain made famous, “There are three kinds of lies: lies, damned lies, and statistics” [1] – it refers to the way that numbers can be used to win arguments. Statistics can be used deliberately to deceive, but they can also lead us quite innocently to conclusions that overreach the data. Here I’m thinking about the most recent publication on non-celiac gluten sensitivity from Di Sabatino et al. [2], “Small Amounts of Gluten in Subjects with Suspected Nonceliac Gluten Sensitivity: a Randomized, Double-Blind, Placebo-Controlled, Cross-Over Trial.”

At first glance, this study seems to say that gluten sensitivity is real. The abstract tells us:

“In a cross-over trial of subjects with suspected NCGS, the severity of overall symptoms increased significantly during 1 week of intake of small amounts of gluten, compared with placebo.”

This is the only conclusion that we see there. Later in the paper, though, we get some clarification:

“However, when we examined the individual patients’ overall scores we found that only a minority of the participants experienced a real worsening of symptoms under gluten.”

and

“If we look at the distribution of delta overall scores (gluten minus placebo), it is not surprising to note that a fair number of patients are victims of the nocebo effect…”

This sounds more like what we have been hearing from other gluten trials [3], and it certainly tempers the enthusiasm of the abstract. It even makes you wonder how the conclusion in the abstract could be so strong in light of the nocebo effect and negative gluten challenges. But there is no deliberate deception here, and, in fact, the authors caution that these results do not provide “crucial evidence in favor of the existence of this new syndrome.” It simply turns out that 3 of 59 participants – that’s 3 out of 59 people who had all originally believed that they were sensitive to low doses of gluten – reacted strongly enough during their gluten challenges to skew the group results in favor of gluten sensitivity. And how much stock can we place in these three individuals? Pretty much none. Let’s take a look at this in more detail.

Learning from the past

The gluten sensitivity debate has remained unsettled because of flaws in the previous studies. I have already discussed these problems in an article on Science-Based Medicine – essentially, either the baseline diets failed to exclude foods high in fermentable carbohydrates (FODMAPs), which are a source of considerable gastrointestinal discomfort, or the challenge capsules contained wheat instead of gluten. The Di Sabatino work seems to have taken this into account, and there are many good things about the way their study was carried out:

It was a randomized, double-blind, placebo-controlled food challenge.
With its crossover design, data was obtained from 59 participants, which is a large group for this type of study. A large group was needed because the researchers were looking for a relatively small change (15 points) in symptom ratings relative to placebo over a collection of 15 intestinal and 13 extra-intestinal symptoms.
Unlike previous trials, this study did not focus on IBS sufferers, who may not be the best candidates for gluten sensitivity. Participants were recruited from patients who were referred to two celiac clinics for suspected gluten intolerance.
Patients were excluded if they showed signs of lactose intolerance or sensitivity to high-FODMAP foods. Unfortunately, though, we don’t have any details on the criteria that were used to identify these sensitivities.
The placebo (rice starch) was selected to minimize fermentable carbohydrates in the intestine.
It seems that sufficient time was allowed for each of the challenges and the washout period in between (one week each).
Participants complied well with the baseline gluten-free diet, as evaluated using a standard questionnaire.
The study followed established statistical protocols.

I do have some misgivings about the baseline or ‘elimination’ diet, though. The elimination diet levels the playing field so that no effects from a participant’s regular diet are carried over into the study; it also serves as the baseline diet against which the effects of the placebo and gluten challenges should be measured. Ideally, the elimination diet would contain minimal food intolerance triggers, even going above and beyond the elimination of gluten. For example, histamine and other dietary amines can induce gastrointestinal symptoms and headaches (one of the extra-intestinal symptoms in this study) in some people [4], and either these people should be excluded from the trial or high-amine foods should be excluded from the baseline diet. Unfortunately, the Di Sabatino article gives us no details on the gluten-free diet that spanned the study, so I assume that the participants were allowed to select their own foods.

The plot thickens

Despite my misgivings, the effects of an uncontrolled elimination diet, as well as other random factors, can be accounted for by analyzing the challenge data on a group, not an individual, basis. Indeed, this study’s enthusiastic conclusion comes from the overall symptom scores for the group, which were greater for the gluten challenge than for the placebo (P = 0.034). Furthermore, the group symptom scores for many individual symptoms were also significantly higher for the gluten challenge than the placebo. However, the group data – although properly treated – does not give us the entire picture. As I mentioned earlier, it seems that the group response was skewed by three individuals who had an exceptionally high response to gluten compared with the placebo.

How can we know this? The article doesn’t offer much in the way of details on individual patients, except for a very interesting scatter plot of the ‘delta weekly overall score’ for each person. (The delta score is the overall weekly gluten score minus the overall weekly placebo score.) In this plot we see what looks to my ruler and laser-beam eyes like a Gaussian distribution of points centered around an average delta score of 12.2. On both extremes – at a distance of more than two standard deviations beyond the average – we see a few lonely points, including three on the positive end and two on the negative. The authors identify the three points in the positive tail as the only “true gluten sensitive” participants in the study. I’ll repeat – only three participants were deemed gluten-sensitive.

Don’t worry if you can’t visualize what I just described – here are the two main ideas that we get from this graph. First, it seems that the placebo response rate in this study is somewhere around 50%, which isn’t surprising for gluten sensitivity, but which is higher than the 35% that is seen when double-blind placebo-controlled food challenges are used to detect allergies [5]. This idea will come into play in a moment. Second, the three points lie more than two standard deviations (2 × 50.4) above the average delta score, and in a Gaussian distribution only about 5% of values fall that far from the average in either direction. In other words, the math says that these three results probably aren’t a fluke – they’re a definite reaction. Again, everything is on the statistical up-and-up.
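For readers who like to check the arithmetic, here is a minimal Python sketch of the two-standard-deviation cutoff. It assumes that the delta scores really are Gaussian, that 12.2 is the arithmetic mean over all 59 participants, and that each of the three reactors scored above the upper cutoff. The variable names are my own, and the last figure is only an upper bound implied by those assumptions, not a number taken from the paper.

```python
import math

N_PARTICIPANTS = 59   # participants with complete data
MEAN_DELTA = 12.2     # average delta score (gluten minus placebo)
SD_DELTA = 50.4       # standard deviation of the delta scores
N_REACTORS = 3        # points in the positive tail


def normal_tail(z):
    """One-tailed probability that a standard normal value exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Cutoffs two standard deviations either side of the average.
upper_cutoff = MEAN_DELTA + 2 * SD_DELTA
lower_cutoff = MEAN_DELTA - 2 * SD_DELTA

# Share of a Gaussian distribution lying beyond those cutoffs.
one_tail = normal_tail(2.0)
two_tails = 2 * one_tail

# If each reactor scored at least the upper cutoff, the remaining
# participants can average no more than this (assumption-based bound).
total_delta = N_PARTICIPANTS * MEAN_DELTA
min_reactor_sum = N_REACTORS * upper_cutoff
max_rest_mean = (total_delta - min_reactor_sum) / (N_PARTICIPANTS - N_REACTORS)

print(f"cutoffs: {lower_cutoff:.1f} and {upper_cutoff:.1f}")
print(f"one tail: {one_tail:.3f}, both tails: {two_tails:.3f}")
print(f"largest possible mean for the other 56: {max_rest_mean:.1f}")
```

Under these assumptions the upper cutoff is a delta score of 113, about 2.3% of a Gaussian distribution lies above it (about 4.6% if both tails are counted, which is where the rounded 5% comes from), and the other 56 participants could have averaged a delta of no more than about 6.8.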

But the big question is, while being statistically correct, do these results really make a strong case for gluten sensitivity? I don’t think so. There are little things that don’t feel right, like the two people who reacted to the placebo to a similar degree as the “true gluten sensitive” (I prefer to call them ‘reactors’). On top of this, the reactors made up only 5% of a group of 59 people who believed that they suffered from gluten sensitivity. Could the 3 reactors be outliers in the sense that we have two distributions here – one random (meaning, gluten isn’t causing any real effect) and one related to some other type of food intolerance? My doubts about the uncontrolled elimination diet are coming back.

The preceding paragraph is just speculation, though. Here’s the real clincher. In this study, each participant took one gluten challenge and one placebo challenge. This is a fine thing to do when you are going to look at the group averages, which the researchers did. However, once you start to place more weight on the three gluten reactors, you are looking at things on an individual basis, and to diagnose food sensitivity in an individual, you need more than one set of challenges. (Think of it this way – in the group case, you do one test on multiple people, but in the individual case, you do multiple tests on one person.) Assuming a typical placebo response rate, an individual must go through three gluten and three placebo challenges in order for us to be 95% certain that they didn’t pick out the three gluten challenges by luck [6]. Since the placebo response rate in this study is higher than normal, the three reactors really should have repeated the challenges even more than three times, but, unfortunately, we only have one set of results from each of them. In a funky bit of irony, statistics tells us that we cannot conclude that these individuals have gluten sensitivity based on the limited data.
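One way to see where the “three and three” figure comes from is a simple counting argument, sketched below in Python. This is my own simplified reading, not necessarily the exact calculation in [6]: it assumes that the gluten and placebo challenges are given in random order and that a person with no real sensitivity is effectively guessing which ones contained gluten. It ignores the placebo response rate altogether, and the function name is hypothetical.

```python
from math import comb


def chance_of_lucky_guess(pairs):
    """Probability of singling out every gluten challenge by pure guessing
    when `pairs` gluten and `pairs` placebo challenges are given in random order.
    """
    # There are C(2 * pairs, pairs) ways to choose which challenges were
    # gluten, and only one of those choices is correct.
    return 1 / comb(2 * pairs, pairs)


for pairs in range(1, 5):
    print(f"{pairs} gluten + {pairs} placebo: "
          f"{chance_of_lucky_guess(pairs):.3f} chance of a lucky match")
```

With one challenge of each kind, a guesser is right half the time. With three of each, there are 20 possible arrangements, so the chance of a lucky match falls to 1 in 20, or 5%.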

What’s next?

Life is full of compromises. Researchers look at things like group averages because double-blind placebo-controlled food challenges are time-consuming and difficult on the participants. To properly evaluate individual responses, participants would need to be on an extremely restrictive elimination diet (above and beyond excluding gluten) for months – just imagine the compliance problems and drop-out rates! Alternatively, the initial screening could be beefed up to identify likely placebo responders and to rule out more food intolerances, but imagine the costs! The current study addressed these issues in a reasonable way, and future researchers would do well to copy much of what Di Sabatino and colleagues have done. HOWEVER, we all must remember that gluten sensitivity cannot be identified in any individual without multiple gluten and placebo challenges.

References

1. Lies, damned lies, and statistics [Internet]. Wikipedia, the free encyclopedia. 2015 [cited 2015 Mar 3]. Available from: http://en.wikipedia.org/w/index.php?title=Lies,_damned_lies,_and_statistics&oldid=645997091

2. Di Sabatino A, Salvatore C, Biancheri P, Caio G, De Giorgio R, Di Stefano M, Corazza GR. Small Amounts of Gluten in Subjects with Suspected Nonceliac Gluten Sensitivity: a Randomized, Double-Blind, Placebo-Controlled, Cross-Over Trial. Clinical Gastroenterology and Hepatology. 2015. http://dx.doi.org/10.1016/j.cgh.2015.01.029

3. Biesiekierski JR, Peters SL, Newnham ED, Rosella O, Muir JG, Gibson PR. No Effects of Gluten in Patients With Self-Reported Non-Celiac Gluten Sensitivity After Dietary Reduction of Fermentable, Poorly Absorbed, Short-Chain Carbohydrates. Gastroenterology. 2013 Aug;145(2):320–328.e3. http://dx.doi.org/10.1053/j.gastro.2013.04.051

4. Maintz L, Novak N. Histamine and histamine intolerance. Am J Clin Nutr. 2007 May 1;85(5):1185–96.

5. Gellerstedt M, Bengtsson U, Niggemann B. Methodological issues in the diagnostic work-up of food allergy: a real challenge. Journal of Investigational Allergology and Clinical Immunology. 2007;17(6):350.

6. Bindslev-Jensen C. Standardization of double-blind, placebo-controlled food challenges. Allergy. 2001;56(s67):75–7.