The influence of study population and definition of improvement on the smallest detectable change and the minimal important change of the neck disability index
© Schuller et al.; licensee BioMed Central Ltd. 2014
Received: 5 November 2013
Accepted: 8 April 2014
Published: 15 April 2014
Reported values of the minimal important change (MIC) and the smallest detectable change (SDC) for the neck disability index (NDI) differ strongly, raising questions about the generalizability of these parameters. The SDC and the MIC may be influenced by the study design or by the study population. We studied the influence of the type of anchor, the definition of improvement and population characteristics on the SDC and the MIC of the NDI.
We conducted a cohort study of 101 patients with non-specific, chronic neck pain. The SDC and the MIC were calculated using two types of external anchor. For each anchor we applied two different definitions to dichotomize the population into improved and unchanged patients. The influence of patient characteristics was assessed in relevant subgroups: patients with or without radiating pain, and patients with different baseline scores.
The influence of different anchors and different definitions of improvement on estimates of the SDC and the MIC was only minimal. The SDC and the MIC were similar for subgroups of patients with or without radiation, but differed strongly for subgroups of patients with higher or lower baseline scores.
Our study shows that estimates of the SDC and the MIC of the NDI can be influenced by population characteristics. We conclude that a single change score defining relevant change cannot be adopted by combining the results of previous studies.
The Neck Disability Index (NDI) was published by Vernon in 1991 as a patient-reported outcome measure of disability in patients with neck pain. It has been reported to be the most commonly used self-report instrument for evaluating functional status in clinical research on neck pain [2, 3]. A review published in 2008 stated that the NDI had been used in approximately 300 publications and translated into 22 languages.

Many studies have addressed the measurement properties of the NDI, and two systematic reviews of these studies were recently published. In a review by MacDermid et al., three comprehensive review articles and 41 studies that addressed at least one psychometric property were identified. The authors concluded that the NDI is reliable, valid and responsive in various patient populations, including patients with acute and chronic conditions and those with neck pain associated with musculoskeletal dysfunction, whiplash-associated disorders, or cervical radiculopathy. They stated that work on the smallest detectable change (SDC) and the minimal important change (MIC) is sparse and inconsistent, but followed Vernon in suggesting an accepted MIC of 5 points (10%). In a more recent systematic review, Schellingerhout et al. used the COSMIN checklist to assess the quality of the included studies. They concluded that the NDI scored positively on internal consistency, content validity, structural validity, hypothesis testing and responsiveness, but showed low reliability. According to Schellingerhout et al., no value for the MIC can be provided yet because the estimates are too diverse, and more studies are needed to determine the SDC and the MIC of the NDI in different subgroups of patients.
Table 1 Previous studies presenting estimates of SDC and MIC
From March to October 2009, patients with general, non-specific, chronic neck pain were recruited in four practices for OrthoManual Medicine in the Netherlands. Inclusion criteria were chronic non-specific neck pain, defined as neck pain lasting at least 3 months; age 18 years or older; and no contraindications for manipulative treatment. Duration of complaints, age, radiation into the arm(s), and the presence of concomitant headache were recorded. After signing an informed consent form, patients completed the NDI as a baseline measure (T0). Patients were treated by one of six OrthoManual physicians. Treatment was usually completed within three months, but some patients returned for treatment after this period because of persistent or recurring complaints. After a follow-up period of six months, patients were asked by email to complete the NDI again, together with questions on the global perceived effect (GPE) with respect to pain and to function (T1).
The neck disability index
The NDI contains ten items: seven relate to activities of daily living, two to pain, and one to concentration. Each item is scored on a 0-5 scale, giving a total score from 0 to 50, with higher scores indicating more severe disability. A series of studies has questioned the unidimensionality of the NDI, and various modifications have been suggested to improve it [17–21], but none of these has been widely adopted. In our study we therefore used the NDI as a unidimensional construct.
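The scoring rule is simple arithmetic, and the conversion between points and percentages recurs throughout this paper (a median of 24 points = 48%; an MIC of 5 points = 10%). A minimal Python sketch, assuming complete responses to all ten items (missing-item handling is not described here); the function names are hypothetical:

```python
def ndi_total(item_scores):
    """Sum the ten NDI item scores (each 0-5) into a total score of 0-50."""
    if len(item_scores) != 10:
        raise ValueError("the NDI has exactly ten items")
    if any(not 0 <= s <= 5 for s in item_scores):
        raise ValueError("each NDI item is scored from 0 to 5")
    return sum(item_scores)


def ndi_percent(points):
    """Express an NDI score (or change score) in points as a percentage of the 50-point maximum."""
    return 100 * points / 50


print(ndi_total([3, 2, 2, 3, 2, 3, 2, 3, 2, 2]))  # → 24
print(ndi_percent(24))  # → 48.0
print(ndi_percent(5))   # → 10.0
```

Because higher scores mean more disability, an improvement corresponds to a decrease in the total score.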
Type of anchor, definition of improvement and choice of clinical subgroups
- GPE-pain, unchanged = GPE 4
- GPE-pain, unchanged = GPE 3-5
- GPE-function, unchanged = GPE 4
- GPE-function, unchanged = GPE 3-5
For each combination we have presented means and standard deviations of the NDI scores at T0 and at T1, and of the changes in score between T0 and T1. For each combination we calculated the SDC and the MIC.
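The two definitions of improvement amount to two ways of cutting the 6-point GPE scale. Judging from the GPE tables, "unchanged = GPE 4" groups GPE 1-3 as improved and GPE 5-6 as worse, while "unchanged = GPE 3-5" groups GPE 1-2 as improved and GPE 6 as worse. The Python sketch below encodes that reading; the dictionary keys and function names are hypothetical, and the example reuses the frequencies listed in the first panel of the GPE table to check that the group sizes come out as reported.

```python
from collections import Counter

# Hypothetical encoding of the 6-point GPE scale (1 = completely recovered,
# 2 = much improved, 3 = slightly improved, 4 = unchanged, 5 = slightly worse,
# 6 = much worse) under the two definitions of "unchanged" used in the study.
DEFINITIONS = {
    "unchanged_gpe4": {"improved": {1, 2, 3}, "unchanged": {4}, "worse": {5, 6}},
    "unchanged_gpe3to5": {"improved": {1, 2}, "unchanged": {3, 4, 5}, "worse": {6}},
}


def classify_gpe(rating, definition):
    """Return 'improved', 'unchanged' or 'worse' for one GPE rating."""
    for label, levels in DEFINITIONS[definition].items():
        if rating in levels:
            return label
    raise ValueError(f"GPE rating must be 1-6, got {rating}")


def group_sizes(ratings, definition):
    """Count patients per category under one definition of improvement."""
    return dict(Counter(classify_gpe(r, definition) for r in ratings))


# Frequencies from the first panel of the GPE table (N = 99)
panel1_ratings = [1] * 14 + [2] * 41 + [3] * 17 + [4] * 23 + [5] * 2 + [6] * 2
print(group_sizes(panel1_ratings, "unchanged_gpe4"))     # → {'improved': 72, 'unchanged': 23, 'worse': 4}
print(group_sizes(panel1_ratings, "unchanged_gpe3to5"))  # → {'improved': 55, 'unchanged': 42, 'worse': 2}
```

Widening the unchanged band from GPE 4 to GPE 3-5 almost doubles the number of stable patients available for estimating the SDC, at the cost of moving the "slightly improved" patients out of the improved group.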
Patients with or without radiating pain. Patients with neck pain frequently have symptoms that radiate into the arm(s), whereas the NDI is designed to measure neck-specific disability. The NDI is also used to measure change in groups of patients with cervical radiculopathy, who may have arm symptoms without neck pain.
Patients with baseline scores below the median NDI score of 24 points (48%) versus those at or above it. Different baseline scores have been shown to influence the measurement properties of other instruments. To maximize group size, and thus statistical power, we used the median NDI score to divide our population into two groups with lower or higher baseline scores.
For each subgroup we calculated the SDC and the MIC, using the GPE on pain as the anchor. A GPE score of 3-5 was used to define stable patients, in order to increase the number of unchanged patients.
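A median split is not automatically a 50/50 split. The subgroup labels (Baseline < 24, N = 49; Baseline ≥24, N = 52) suggest that patients scoring exactly at the median were placed in the higher group, so ties at 24 points make the groups unequal. A minimal Python sketch of that convention; the function name and the example scores are hypothetical:

```python
import statistics


def split_at_median(baseline_scores):
    """Split patients into lower (< median) and higher (>= median) baseline groups.

    Returns the median and the indices of the patients in each group.
    Patients scoring exactly at the median go to the higher group, so the
    groups can differ in size when several patients share the median score.
    """
    median = statistics.median(baseline_scores)
    lower = [i for i, s in enumerate(baseline_scores) if s < median]
    higher = [i for i, s in enumerate(baseline_scores) if s >= median]
    return median, lower, higher


# Hypothetical baseline NDI scores for seven patients
median, lower, higher = split_at_median([12, 18, 24, 24, 24, 30, 35])
print(median, lower, higher)  # → 24 [0, 1] [2, 3, 4, 5, 6]
```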
The SDC was based on the standard error of measurement (SEM), which was derived from the variance components in the formula for the intraclass correlation coefficient for agreement ($\mathrm{ICC}_{\mathrm{agreement}}$). It was calculated in the group of patients considered to be unchanged as [25–27]

$$\mathrm{SDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}_{\mathrm{agreement}}$$
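The SDC calculation can be made concrete with a small Python sketch. It assumes the usual two-way random-effects decomposition described by de Vet et al., in which ICC_agreement = σ²_patient / (σ²_patient + σ²_occasion + σ²_error) and SEM_agreement = √(σ²_occasion + σ²_error). The estimators and function names are illustrative assumptions; the paper does not state which software was used. The input would be the T0 and T1 NDI scores of the patients classified as unchanged.

```python
import math


def variance_components(scores):
    """Two-way random-effects variance components for an n x k score table.

    scores: one row per patient, each with k repeated measurements
    (here k = 2: NDI at T0 and at T1). Returns (var_patient, var_occasion, var_error).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    # Residual sum of squares computed directly, so it can never be negative
    ss_error = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    var_error = ss_error / ((n - 1) * (k - 1))
    # Negative variance estimates are truncated at zero (a common convention)
    var_patient = max((ms_rows - var_error) / k, 0.0)
    var_occasion = max((ms_cols - var_error) / n, 0.0)
    return var_patient, var_occasion, var_error


def icc_agreement(scores):
    var_p, var_o, var_e = variance_components(scores)
    return var_p / (var_p + var_o + var_e)


def sdc95(scores):
    """1.96 * sqrt(2) * SEM_agreement, with SEM_agreement = sqrt(var_occasion + var_error)."""
    _, var_o, var_e = variance_components(scores)
    return 1.96 * math.sqrt(2) * math.sqrt(var_o + var_e)


# Hypothetical T0/T1 scores of three "unchanged" patients, each 2 points higher at T1
example = [[10, 12], [20, 22], [30, 32]]
print(round(sdc95(example), 2))          # → 3.92
print(round(icc_agreement(example), 3))  # → 0.98
```

In the example every patient shifts by the same 2 points, so all measurement error is systematic (occasion variance) and SDC95 = 1.96 × 2 = 3.92. Because the agreement version includes the occasion variance, a systematic shift between T0 and T1 enlarges the SDC, whereas a consistency-type ICC would ignore it.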
The MIC was calculated using a receiver operating characteristic (ROC) curve to establish the optimal cut-off point. The optimal cut-off point was defined as the point on the ROC curve with the highest sum of sensitivity and specificity, which minimizes the combined misclassification, (1 − sensitivity) + (1 − specificity). We report the MIC and the sensitivity and specificity at this cut-off point.
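The cut-off rule (highest sensitivity + specificity, equivalent to maximizing the Youden index J = sensitivity + specificity − 1) can be sketched in a few lines of Python. Assumptions, all of which are ours rather than the paper's: candidate cut-offs are the observed change scores (statistical packages often use midpoints between adjacent values instead); improvement is expressed as T0 − T1, so positive values mean less disability (the tables report T1 − T0); and patients rated as worse are excluded beforehand. The function name is hypothetical.

```python
def roc_optimal_cutoff(improvement, improved):
    """Cut-off on the change score with the highest sensitivity + specificity.

    improvement: NDI T0 - T1 per patient (positive = less disability)
    improved: True if the anchor labels the patient improved, False if unchanged
    Returns (cut_off, sensitivity, specificity); a patient is predicted
    improved when improvement >= cut_off. Ties keep the lowest cut-off.
    """
    pairs = list(zip(improvement, improved))
    n_improved = sum(1 for _, y in pairs if y)
    n_unchanged = len(pairs) - n_improved
    if n_improved == 0 or n_unchanged == 0:
        raise ValueError("need both improved and unchanged patients")
    best = None
    for cut in sorted(set(improvement)):
        tp = sum(1 for x, y in pairs if y and x >= cut)
        tn = sum(1 for x, y in pairs if not y and x < cut)
        sens, spec = tp / n_improved, tn / n_unchanged
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Hypothetical change scores: four unchanged and four improved patients
change = [0, 2, 4, 5, 3, 6, 8, 10]
label = [False, False, False, False, True, True, True, True]
print(roc_optimal_cutoff(change, label))  # → (6, 0.75, 1.0)
```

Here the improved patient with a change of only 3 points is missed at the chosen cut-off of 6, which is why sensitivity is 0.75; when improved and unchanged change scores overlap widely, both sensitivity and specificity at the optimal cut-off drop.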
Patient characteristics at inclusion (N = 101)
Mean age at inclusion (range)
Mean duration of complaints in months (range)
Mean NDI score at baseline (SD)
NDI scores at baseline and after 6 months for different levels of the GPE (N = 99)
NDI:T0, mean (sd)
NDI:T1, mean (sd)
NDI:T1-T0, mean (sd)
1 Completely recovered (N = 14)
2 Much improved (N = 41)
3 Slightly improved (N = 17)
4 Unchanged (N = 23)
5 Slightly worse (N = 2)
6 Much worse (N = 2)
GPE 1-3 (N = 72)
GPE 4 (N = 23)
GPE 5-6 (N = 4)
GPE 1-2 (N = 55)
GPE 3-5 (N = 42)
GPE 6 (N = 2)

NDI:T0, mean (sd)
NDI:T1, mean (sd)
NDI:T1-T0, mean (sd)
1 Completely recovered (N = 19)
2 Much improved (N = 34)
3 Slightly improved (N = 17)
4 Unchanged (N = 26)
5 Slightly worse (N = 2)
6 Much worse (N = 1)
GPE 1-3 (N = 70)
GPE 4 (N = 26)
GPE 5-6 (N = 3)
GPE 1-2 (N = 53)
GPE 3-5 (N = 45)
GPE 6 (N = 1)
Influence of the type of anchor and of the definition of improvement
SDC and MIC using two different anchors and two different ways to dichotomize GPE scores (N = 99)
Unchanged = GPE 4 (N = 23)
Unchanged = GPE 3-5 (N = 42)
Unchanged = GPE 4 (N = 26)
Unchanged = GPE 3-5 (N = 45)
Influence of clinical characteristics
Clinical subgroup analysis of SDC and MIC for the NDI. GPE on pain (3-5 = unchanged)
No radiation (N = 47)
Radiation (N = 54)
Baseline < 24 (N = 49)
Baseline ≥24 (N = 52)
Summary of findings
Using different types of anchor or different definitions of improvement had minimal influence on our estimates of the SDC and the MIC. Estimates of the SDC and the MIC were similar for patients with or without radiating pain, but differed for patients with different baseline scores, using a cut-off NDI score of 24 points (48%). Patients with a baseline score of 24 points or more had a higher MIC than patients with a lower baseline score. The SDC was similar in almost all analyses, but again varied strongly between patients with higher and lower baseline scores. We conclude that the differing estimates of the SDC and the MIC in our study are predominantly explained by patient characteristics.
Comparison with previous studies
Differences in population characteristics influence the estimation of the SDC and the MIC of the NDI. Could differences in patient characteristics alone explain the different estimates reported in previous studies? The results of these studies are presented in Table 1, ranked by the estimated MIC. The study by Cleland et al., which had the highest baseline NDI, does indeed report the highest MIC (9.5), but in that study the MIC dropped to 7.0 when a narrower definition of unchanged patients was used. Studies recruiting patients with cervical radiculopathy, regardless of coexisting neck pain, report the highest estimates of the SDC; however, patients with cervical radiculopathy do not necessarily also have neck pain. The perceived effect in these patients could relate to arm symptoms, whereas the NDI specifically measures neck symptoms. This would likely weaken the correlation between the NDI score and the GPE, thereby increasing the variance of the change scores in the group of stable patients. The result would be higher estimates of the SDC and lower sensitivity and specificity of the MIC. In our study we did not observe any difference in SDC between the subgroups of patients with or without radiating pain. This could be explained by the fact that we recruited patients with neck pain and did not specifically screen for radiculopathy. Judging by the high SDCs and the low sensitivity and specificity of the MIC in populations with a primary complaint of cervical radiculopathy, the NDI does seem to be less useful in these populations.
Because of the different populations recruited and the different methods used, it remains very difficult to compare the estimates from previous studies. In our view, different patient populations do seem to lead to different estimates of the SDC and the MIC, but it is unclear whether patient characteristics are the only source of these differences. The dual-factor structure of the NDI could possibly explain part of these findings. This might be addressed by a shorter version of the NDI with an improved factor structure, as suggested by several authors [17–21], but other instruments for measuring neck-related disability may also eventually prove more useful. In most studies the SDC is much higher than the MIC, reaching 17.9 in a population with cervical radiculopathy. With an SDC this high, a change score well above the MIC is needed to reliably label an individual patient as improved. This raises questions about the usefulness of the NDI for assessing change in individual patients.
Although we did not study the influence of the follow-up period, it is notable that the MIC in our study is rather low compared with most other studies, despite our long follow-up period of six months. This could possibly be explained by the strict recruitment of patients with chronic neck pain. Selecting a population with chronic neck pain could have excluded patients with a favourable short-term outcome while including patients with more stable, unchanging complaints. Patients with longstanding, stable complaints might also better recall the complaints they had before treatment. A comparable study by Jorritsma et al. used a follow-up period of three to five months and reported an MIC of 3.5. Although no information is available to explain the lower MIC in our study, this finding underlines our conclusion that estimates of the MIC are not invariant and are likely to be population-specific.
Strengths and limitations
The main strength of our study is that a single study was used to assess the influence of different anchors, of different definitions of improvement on the anchor, and of population characteristics on estimates of the SDC and the MIC. Using differently phrased anchors for pain and for function allowed us to study the influence of the phrasing of the anchor on the SDC and the MIC of the NDI within the same population, thereby excluding sampling bias as a source of differences between anchors. A possible weakness is the number of recruited patients: it was sufficient for the main analyses, but may have been relatively small for the subgroup analyses.
Another limitation lies in the method we used to calculate the MIC, which compares the change score with the global perceived effect. The global perceived effect has been reported to correlate more strongly with present status than with change in status. A patient with severe disability needs a large improvement to reach a good present status after treatment, and could therefore still end up in the group reporting to be unchanged despite a substantial improvement in NDI score. Conversely, a patient with a low baseline score but no real change after treatment will still have a good present status at follow-up, and could end up in the improved group despite a small NDI change score. This could explain the higher MIC estimates for patients with higher baseline scores; rather than reflecting a real need for a larger improvement in NDI score, it may be a shortcoming of using a global perceived effect as the external anchor.
It would be interesting to study the psychometric properties of the NDI in other populations, for example patients with whiplash-associated disorders (WAD), and with larger sample sizes in the subgroups. It would also be very interesting to study the suggested shorter versions with an improved factor structure, or to compare NDI scores with other legacy instruments or with recently developed IRT item banks for pain behaviour and pain interference [30, 31].
In terms of methodology, future studies may focus on alternatives to the GPE. Different patients may well have different perceptions of what magnitude of effect constitutes an important change, perhaps also depending on the treatment administered. A treatment that is costly, painful or strenuous might need a larger effect to be considered worthwhile, and a patient who has experienced severe side effects might not consider even a large improvement worthwhile. Developing other methods to estimate a sufficiently important change could open new perspectives, and there is still progress to be made in defining clinical relevance [32, 33].
We studied the influence of the type of anchor and the definition of improvement on the estimates of the SDC and the MIC of the NDI in a sample of patients with general, non-specific, chronic neck pain. The use of two different types of anchor and two definitions of improvement only had a minimal influence on these estimates. We also estimated the SDC and the MIC in subgroups of patients with or without radiating pain and with different baseline scores. Subgroups of patients with or without radiating pain had comparable estimates of the SDC and the MIC, but subgroups of patients with different baseline scores had different estimates of the SDC and the MIC, showing that these values can be influenced by population characteristics. It is therefore concluded that we cannot readily adopt a single change score to define relevant change by combining the results of previous studies. In line with other studies we also report the SDC to be much higher than the MIC, which is a limitation of the NDI. Quite a large change score is needed to reliably label a patient as improved, raising questions about the ability of the NDI to assess change in individual patients.
Acknowledgements and financial support
This study was originally conducted as a scientific paper within the specialist training for OrthoManual physicians. We thank the OrthoManual physicians Brouwer, Cossee, Cuppen, Jonquiere, and Savelkouls for recruiting patients for this study. The first author's position at the EMGO+ Institute for Health and Care Research is funded by the Dutch Association for Orthomanual Medicine (Nederlandse Vereniging voor OrthoManuele Geneeskunde, NVOMG).
- Vernon H, Mior S: The Neck Disability Index: a study of reliability and validity. J Manipulative Physiol Ther 1991, 14(7):409–415.
- MacDermid JC, Walton DM, Avery S, Blanchard A, Etruw E, McAlpine C, Goldsmith CH: Measurement properties of the neck disability index: a systematic review. J Orthop Sports Phys Ther 2009, 39(5):400–417.
- Pietrobon R, Coeytaux RR, Carey TS, Richardson WJ, DeVellis RF: Standard scales for measurement of functional outcome for cervical pain or dysfunction: a systematic review. Spine (Phila Pa 1976) 2002, 27(5):515–522.
- Vernon H: The Neck Disability Index: state-of-the-art, 1991–2008. J Manipulative Physiol Ther 2008, 31(7):491–502.
- Mokkink LB, Terwee CB, Patrick DL, Alonso J, Stratford PW, Knol DL, Bouter LM, de Vet HC: The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res 2010, 19(4):539–549.
- Schellingerhout JM, Verhagen AP, Heymans MW, Koes BW, de Vet HC, Terwee CB: Measurement properties of disease-specific questionnaires in patients with neck pain: a systematic review. Qual Life Res 2012, 21(4):659–670.
- Terwee CB, Roorda LD, Knol DL, de Boer MR, de Vet HC: Linking measurement error to minimal important change of patient-reported outcomes. J Clin Epidemiol 2009, 62(10):1062–1067.
- Cleland JA, Fritz JM, Whitman JM, Palmer JA: The reliability and construct validity of the Neck Disability Index and patient specific functional scale in patients with cervical radiculopathy. Spine (Phila Pa 1976) 2006, 31(5):598–602.
- Cleland JA, Childs JD, Whitman JM: Psychometric properties of the Neck Disability Index and Numeric Pain Rating Scale in patients with mechanical neck pain. Arch Phys Med Rehabil 2008, 89(1):69–74.
- Pool JJ, Ostelo RW, Hoving JL, Bouter LM, de Vet HC: Minimal clinically important change of the Neck Disability Index and the Numerical Rating Scale for patients with neck pain. Spine (Phila Pa 1976) 2007, 32(26):3047–3051.
- Trouli MN, Vernon HT, Kakavelakis KN, Antonopoulou MD, Paganas AN, Lionis CD: Translation of the Neck Disability Index and validation of the Greek version in a sample of neck pain patients. BMC Musculoskelet Disord 2008, 9:106.
- Vos CJ: Acute Neck Pain in General Practice. Thesis, Erasmus University Rotterdam; 2006.
- Young BA, Walker MJ, Strunce JB, Boyles RE, Whitman JM, Childs JD: Responsiveness of the Neck Disability Index in patients with mechanical neck disorders. Spine J 2009, 9(10):802–808.
- Young IA, Cleland JA, Michener LA, Brown C: Reliability, construct validity, and responsiveness of the neck disability index, patient-specific functional scale, and numeric pain rating scale in patients with cervical radiculopathy. Am J Phys Med Rehabil 2010, 89(10):831–839.
- Jorritsma W, Dijkstra PU, de Vries GE, Geertzen JH, Reneman MF: Detecting relevant changes and responsiveness of Neck Pain and Disability Scale and Neck Disability Index. Eur Spine J 2012, 21(12):2550–2557.
- Vos CJ, Verhagen AP, Koes BW: Reliability and responsiveness of the Dutch version of the Neck Disability Index in patients with acute neck pain in general practice. Eur Spine J 2006, 15(11):1729–1736.
- Gabel CP, Cuesta-Vargas AI, Osborne JW, Burkett B, Melloh M: Confirmatory factor analysis of the Neck Disability Index in a general problematic neck population indicates a one-factor model. Spine J 2013. doi:10.1016/j.spinee.2013.08.026.
- Johansen JB, Andelic N, Bakke E, Holter EB, Mengshoel AM, Roe C: Measurement properties of the norwegian version of the neck disability index in chronic neck pain. Spine (Phila Pa 1976) 2013, 38(10):851–856.
- van der Velde G, Beaton D, Hogg-Johnston S, Hurwitz E, Tennant A: Rasch analysis provides new insights into the measurement properties of the neck disability index. Arthritis Rheum 2009, 61(4):544–551.
- Walton DM, MacDermid JC: A brief 5-item version of the Neck Disability Index shows good psychometric properties. Health Qual Life Outcomes 2013, 11:108.
- Young SB, Aprill C, Braswell J, Ogard WK, Richards JS, McCarthy JP: Psychological factors and domains of neck pain disability. Pain Med 2009, 10(2):310–318.
- Osborn W, Jull G: Patients with non-specific neck disorders commonly report upper limb disability. Man Ther 2013, 18(6):492–497.
- Demoulin C, Ostelo R, Knottnerus JA, Smeets RJ: What factors influence the measurement properties of the Roland-Morris disability questionnaire? Eur J Pain 2010, 14(2):200–206.
- de Vet HC, Terwee CB, Mokkink LB, Knol D: Measurement in Medicine. Cambridge University Press; 2011.
- Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986, 1(8476):307–310.
- de Vet HC, Terwee CB, Knol DL, Bouter LM: When to use agreement versus reliability measures. J Clin Epidemiol 2006, 59(10):1033–1039.
- Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, Bouter LM, de Vet HC: Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007, 60(1):34–42.
- van der Roer N, Ostelo RW, Bekkering GE, van Tulder MW, de Vet HC: Minimal clinically important change for pain intensity, functional status, and general health status in patients with nonspecific low back pain. Spine 2006, 31(5):578–582.
- Kamper SJ, Ostelo RW, Knol DL, Maher CG, de Vet HC, Hancock MJ: Global Perceived Effect scales provided reliable assessments of health transition in people with musculoskeletal disorders, but ratings are strongly influenced by current status. J Clin Epidemiol 2010, 63(7):760–766.
- Amtmann D, Cook KF, Jensen MP, Chen WH, Choi S, Revicki D, Cella D, Rothrock N, Keefe F, Callahan L, Lai JS: Development of a PROMIS item bank to measure pain interference. Pain 2010, 150(1):173–182.
- Revicki DA, Chen WH, Harnam N, Cook KF, Amtmann D, Callahan LF, Jensen MP, Keefe FJ: Development and psychometric analysis of the PROMIS pain behavior item bank. Pain 2009, 146(1–2):158–169.
- Barrett B, Brown D, Mundt M, Brown R: Sufficiently important difference: expanding the framework of clinical significance. Med Decis Making 2005, 25(3):250–261.
- Ferreira ML, Herbert RD, Ferreira PH, Latimer J, Ostelo RW, Nascimento DP, Smeets RJ: A critical review of methods used to determine the smallest worthwhile effect of interventions for low back pain. J Clin Epidemiol 2012, 65(3):253–261.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.