Understanding Ferguson's δ: time to say good-bye?
© Terluin et al; licensee BioMed Central Ltd. 2009
Received: 23 February 2009
Accepted: 30 April 2009
Published: 30 April 2009
A critique of Hankins, M: 'How discriminating are discriminative instruments?' Health and Quality of Life Outcomes 2008, 6:36.
Recently Hankins (re-)introduced Ferguson's coefficient δ as an index of discrimination, to be distinguished from the well-known measurement properties validity and reliability [1, 2]. Hankins presented Ferguson's δ as a useful index of the degree to which an instrument discriminates between individuals, being "the ratio of the observed number of between-person differences to the theoretical maximum number possible". The value of δ varies between 0 (no discrimination at all) and 1 (maximal possible discrimination). The calculation is straightforward and Hankins provided a generalized formula for calculating δ for questionnaires with dichotomous as well as polytomous items.
Hankins' paper elicited two critical comments [3, 4]. Wyrwich referred to the work of Guyatt, who related discrimination to reliability, theoretically consistent correlations with other measures, and interpretability of small but important differences. Since Hankins failed to present relevant information regarding these issues, Wyrwich concluded that it is impossible to judge whether Ferguson's δ is a useful index. Whereas Hankins stated that discrimination is something other than reliability, Norman expressed the opposite view, i.e. that "reliability is discrimination". Scrutinizing Hankins' examples and adding one of his own, Norman illustrated his main point that Ferguson's δ fails to distinguish between true differences and measurement error. In his response, Hankins remarked that both Norman and Wyrwich made too much of his examples and seemed to have missed his point, which is that Ferguson's δ is an additional index of an instrument's measurement properties, besides reliability, validity and interpretability, and that Ferguson's δ can only be computed on the assumption that the measurement is valid and reliable.
In this letter, we will examine how exactly Ferguson's δ 'works' and what δ actually measures. More specifically, we will show that the magnitude of δ is only determined by the distribution of the scores in a given sample. Moreover, we will show that the standard computation of δ ignores reliability, but, when reliability is accounted for, δ becomes impossible to interpret. Our final conclusion will be that Ferguson's δ is not a useful attribute of a measurement instrument.
How Ferguson's δ works
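Hankins' generalized formula for Ferguson's δ can be written as

$$\delta = \frac{\left(1 + k(m-1)\right)\left(n^2 - \sum f_i^2\right)}{n^2\,k(m-1)}$$

in which k is the number of items, m is the number of response options per item, n is the sample size and $\sum f_i^2$ is the sum of squared frequencies of each score i. Note that k(m - 1) equals the score range of a scale, and 1 + k(m - 1) equals the total number of score categories q of an instrument.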
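As a minimal sketch of this computation, the function below assumes the generalized formula δ = (1 + k(m - 1))(n² - Σf_i²) / (n² k(m - 1)). The function name `ferguson_delta` and the sample scores are hypothetical and serve only as an illustration.

```python
from collections import Counter

def ferguson_delta(scores, k, m):
    """Ferguson's delta for k items with m response options each (assumed formula)."""
    n = len(scores)
    q = 1 + k * (m - 1)  # total number of score categories
    # Sum of squared frequencies of each observed score
    sum_sq = sum(f * f for f in Counter(scores).values())
    return q * (n * n - sum_sq) / (n * n * (q - 1))

# Hypothetical data: sum scores on 3 dichotomous items (k = 3, m = 2), range 0..3
scores = [0, 1, 1, 2, 2, 2, 3, 3]
delta = ferguson_delta(scores, k=3, m=2)
print(round(delta, 3))
```

Only the frequency with which each score occurs enters the calculation; the score values themselves play no role.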
When the formula is rewritten in terms of the number of score categories q as

$$\delta = \frac{n^2 - \sum f_i^2}{n^2 - n^2/q}$$

it becomes apparent that the denominator is corrected for the fact that a person cannot be discriminated from him- or herself (the shaded cells). Instead of all possible n² comparisons (all cells), the denominator represents all possible discriminating comparisons (the white cells). Note that all discriminating comparisons are counted twice. For instance, the subject with score '7' is compared with the subject with score '2' in two cells (see Figure 1): cell a contains the comparison between subject (i = 7) and subject (j = 2), while cell b contains the comparison between subject (i = 2) and subject (j = 7). Subject (i = 2) and subject (j = 2) are the same person, as are subject (i = 7) and subject (j = 7). It should also be noted that Ferguson's δ treats the score categories as the scores of a nominal (or categorical) scale: all differences (if present) between all subjects are valued equally. If the scale has ordinal properties (as in Hankins' examples), Ferguson's δ does not utilize the variation in the size of the differences between subjects.
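The counting argument can be illustrated with a short sketch. It assumes that the numerator of δ is n² minus the sum of squared score frequencies, and checks that this equals the number of ordered subject pairs whose scores differ. The function names and the two small samples are hypothetical.

```python
from collections import Counter
from itertools import product

def count_by_comparison(scores):
    """Count ordered pairs (i, j) with different scores; each pair is counted twice."""
    return sum(1 for a, b in product(scores, repeat=2) if a != b)

def count_by_frequencies(scores):
    """Same count obtained as n^2 minus the sum of squared score frequencies."""
    n = len(scores)
    return n * n - sum(f * f for f in Counter(scores).values())

# Two hypothetical samples: scores one point apart versus nine points apart
near = [1, 1, 2, 2]
far = [1, 1, 10, 10]
print(count_by_comparison(near), count_by_frequencies(near))
print(count_by_comparison(far), count_by_frequencies(far))
```

Both samples yield the same count, which illustrates the nominal treatment of scores: a difference of one point counts exactly as much as a difference of nine points.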
So, Ferguson's δ is always 1, irrespective of the number of score categories q, provided that the subjects are evenly (uniformly) distributed among the score categories. Even in the case of q = 2, Ferguson's δ remains 1 as long as half of the subjects score '1' and the other half score '2'. Whether this situation represents an example of excellent discrimination seems questionable. Intuitively, one expects an instrument to lose discriminative power when the number of score categories is limited to very small numbers, such as 2 or 3.
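The reasoning behind this result can be sketched as follows, assuming that δ is written in terms of q as the ratio of n² - Σf_i² to n² - n²/q, and that each of the q score categories holds exactly n/q subjects:

```latex
\sum_{i=1}^{q} f_i^2 = q \left( \frac{n}{q} \right)^2 = \frac{n^2}{q}
\qquad\Longrightarrow\qquad
\delta = \frac{n^2 - \sum f_i^2}{n^2 - n^2/q}
       = \frac{n^2 - n^2/q}{n^2 - n^2/q} = 1
```

The value of q cancels out completely, which is why the result holds for q = 2 just as it does for q = 10.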
This example illustrates what Norman already advanced, namely that Ferguson's δ does not distinguish between true differences and differences due to measurement error. In his words: "The problem with δ is that all it cares about are differences". Hankins replied that 'acceptable' reliability (and validity) must be presupposed in order to determine δ. Furthermore, Hankins suggested that the computation of δ should be adjusted for non-reliable differences, to take into account only meaningful differences. By current standards, the reliability of the scale in our example is fully 'acceptable' (reliability coefficient 0.86). Let us execute the suggested adjustment of δ for reliability, by assuming that the 'smallest detectable difference' (SDD) is a meaningful difference between subjects. The SDD is the smallest difference between two subjects that can, with 95% confidence, be attributed to a real difference in true scores. The SDD can be calculated from the standard error of measurement (SEM) using the formula $\mathrm{SDD} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$.
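One possible way to carry out such an adjustment is sketched below. Several things here are assumptions rather than the procedure used for the example in the text: the SEM is derived with the classical formula SEM = SD × √(1 - reliability), the SDD is taken as 1.96 × √2 × SEM, the standard deviation of 3.0 and the sample of ten subjects are hypothetical, and the adjustment simply counts only those pairs whose scores differ by at least the SDD.

```python
import math
from itertools import product

def sem_from_reliability(sd, reliability):
    """Classical standard error of measurement (assumed formula)."""
    return sd * math.sqrt(1 - reliability)

def sdd_from_sem(sem):
    """Smallest detectable difference between two subjects at 95% confidence."""
    return 1.96 * math.sqrt(2) * sem

def adjusted_delta(scores, q, sdd):
    """Delta counting only ordered pairs whose scores differ by at least the SDD."""
    n = len(scores)
    reliable = sum(1 for a, b in product(scores, repeat=2) if abs(a - b) >= sdd)
    return reliable / (n * n - n * n / q)

# Hypothetical sample: one subject on each score 1..10, SD of 3.0, reliability 0.86
scores = list(range(1, 11))
sdd = sdd_from_sem(sem_from_reliability(sd=3.0, reliability=0.86))
unadjusted = adjusted_delta(scores, q=10, sdd=1)  # any difference counts
adjusted = adjusted_delta(scores, q=10, sdd=sdd)
print(round(sdd, 2), round(unadjusted, 3), round(adjusted, 3))
```

In this hypothetical sample the unadjusted value is 1, while the adjusted value drops to less than half of that, because only differences of four points or more exceed the SDD.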
This result suggests that adjusting δ for non-reliable differences might have a large impact on its magnitude, even when reliability is 'acceptable'. But what does that tell us about the discriminative power of this instrument? What does δ represent after adjustment for non-reliable differences? We really don't know.
Hankins reported that Ferguson mentioned that δ was 1 when the distribution was uniform (as we confirmed), and that normal distributions typically produce δ values of about 0.90. Lower values of δ are associated with skewed distributions.
In daily life, uniform distributions are highly uncommon. More common are normal and skewed distributions. In addition, many health outcomes are characterized by floor or ceiling effects. We will now examine how δ is affected by different kinds of distributions.
Now, let us examine what happens to δ when the distribution is skewed. A skewed distribution is often present in health outcomes when the majority of subjects are normal, healthy or well. We construct a skewed distribution by taking the fourth power of the scores of the normal distribution, adjusting the range to the 1–10 range and rounding the scores to the nearest integer (Figure 6b). Here $\sum f_i^2$ is 218 and Ferguson's δ is 0.842.
A more skewed distribution is made by taking the tenth power of the scores of the normal distribution, adjusting the range to the 1–10 range and rounding the scores to the nearest integer (Figure 6c). Here $\sum f_i^2$ is 508 and Ferguson's δ is 0.484.
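The two reported values can be checked with a few lines of code. The sketch assumes δ = (n² - Σf_i²) / (n² - n²/q) with q = 10 score categories. The sample size n = 30 is not stated in the text; it is inferred here because it reproduces both reported values from the sums of squared frequencies 218 and 508.

```python
def delta_from_sum_sq(sum_sq, n, q):
    """Ferguson's delta from the sum of squared score frequencies (assumed formula)."""
    return (n * n - sum_sq) / (n * n - n * n / q)

# n = 30 is an inferred assumption; q = 10 follows from the 1-10 score range
n, q = 30, 10
moderate_skew = delta_from_sum_sq(218, n, q)
strong_skew = delta_from_sum_sq(508, n, q)
print(round(moderate_skew, 3), round(strong_skew, 3))
```

The more the subjects cluster in a few categories, the larger the sum of squared frequencies becomes and the lower δ falls.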
In the skewed distributions there is a clear floor effect discernible. These examples, and some others we have tried, suggest that a decrease of δ is associated with kurtosis, the clustering of subjects within one or a few response categories. If δ is indeed a reflection of the sample's distribution, it does not seem to tell us anything about the discriminative properties of the instrument.
We have shown that Ferguson's δ is only determined by the distribution of the subjects in a sample over the score categories of an instrument. If the distribution is uniform, then δ is always 1. To our surprise, the maximum value of δ turned out not to be limited by the number of score categories q. Because, at any given value of q (provided q > 1), δ can take on any value between 0 and 1, it is safe to say that δ is independent of q, the number of score categories of the instrument.
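A small sketch can make this independence from q concrete. It assumes δ = (n² - Σf_i²) / (n² - n²/q) and uses hypothetical frequency tables for a two-category instrument.

```python
def delta_from_frequencies(freqs, q):
    """Ferguson's delta from the number of subjects in each score category."""
    n = sum(freqs)
    return (n * n - sum(f * f for f in freqs)) / (n * n - n * n / q)

# Hypothetical samples of 100 subjects on an instrument with q = 2 categories
even_split = delta_from_frequencies([50, 50], q=2)
lopsided = delta_from_frequencies([90, 10], q=2)
one_category = delta_from_frequencies([100, 0], q=2)
print(even_split, round(lopsided, 2), one_category)
```

The same instrument yields the maximum value, an intermediate value or zero, depending only on how the sample happens to be distributed over the two categories.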
Does Ferguson's δ say anything about the discriminative power of an instrument? Take for example our real life example. Is it valid to say that the 4DSQ depression scale is poorly discriminative in an employee sample, just because it fails to discriminate among those employees who do not experience the kind of depressive symptoms that the scale measures? If we want to discriminate anything with the 4DSQ depression scale, then we want to discriminate those who do experience depressive symptoms from those who don't, and that is what the scale is doing reasonably well. There seems to be absolutely no point in requiring that a depression scale discriminates among individuals who do not have depressive symptoms. To put it in more general terms, there is no point in discriminating among people who belong to the same category.
We agree with Norman's point that Ferguson's δ simply ignores measurement error. Ferguson's δ does not distinguish between reliable and non-reliable differences. Although it is technically possible to adjust δ for non-reliable differences, this has a large impact on its magnitude. More problematic, though, is that we don't know how the resulting statistic should be interpreted. The important point is that reliability plays no role in the standard computation of δ. Hankins provided an example of an 8-item scale with a reliability coefficient (Cronbach's α) of 0.76 and a δ of 0.92. Surely, this δ had not been adjusted for non-reliable differences!
The conclusion seems inescapable that Ferguson's δ is a characteristic of a population and that it does not refer to any useful property of a measurement instrument. We therefore conclude that it is time to say good-bye to Ferguson's δ and let it slip into oblivion again.
- Hankins M: How discriminating are discriminative instruments? Health Qual Life Outcomes 2008, 6: 36. 10.1186/1477-7525-6-36
- Hankins M: Questionnaire discrimination: (re)-introducing coefficient delta. BMC Med Res Methodol 2007, 7: 19. 10.1186/1471-2288-7-19
- Wyrwich KW: Understanding the role of discriminative instruments in HRQoL research: can Ferguson's Delta help? Health Qual Life Outcomes 2008, 6: 82. 10.1186/1477-7525-6-82
- Norman GR: Discrimination and reliability: equal partners? Health Qual Life Outcomes 2008, 6: 81. 10.1186/1477-7525-6-81
- Guyatt GH: A taxonomy of health status instruments. J Rheumatol 1995, 22: 1188–1190.
- Hankins M: Discrimination and reliability: equal partners? Understanding the role of discriminative instruments in HRQoL research: can Ferguson's Delta help? A response. Health Qual Life Outcomes 2008, 6: 83. 10.1186/1477-7525-6-83
- de Vet HC, Bouter LM, Bezemer PD, Beurskens AJ: Reproducibility and responsiveness of evaluative outcome measures. Theoretical considerations illustrated by an empirical example. Int J Technol Assess Health Care 2001, 17: 479–487. [http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=101045] 10.1017/S0266462301106148
- Terluin B, van Marwijk HW, Adèr HJ, De Vet HC, Penninx BW, Hermens ML, van Boeijen CA, van Balkom AJ, van der Klink JJ, Stalman WAB: The Four-Dimensional Symptom Questionnaire (4DSQ): a validation study of a multidimensional self-report questionnaire to assess distress, depression, anxiety and somatization. BMC Psychiatry 2006, 6: 34. 10.1186/1471-244X-6-34
- Terluin B, Van Rhenen W, Schaufeli WB, De Haan M: The Four-Dimensional Symptom Questionnaire (4DSQ): measuring distress and other mental health problems in a working population. Work Stress 2004, 18: 187–207. 10.1080/0267837042000297535
- Hermens ML, van Hout HP, Terluin B, Adèr HJ, Penninx BW, van Marwijk HW, Bosmans JE, van Dyck R, De Haan M: Clinical effectiveness of usual care with or without antidepressant medication for primary care patients with minor or mild-major depression: a randomized equivalence trial. BMC Medicine 2007, 5: 36. 10.1186/1741-7015-5-36
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.