

Psychometric evaluation of the Oswestry Disability Index in patients with chronic low back pain: factor and Mokken analyses



Disputes exist regarding the psychometric properties of the Oswestry Disability Index (ODI). The present study examined the reliability, validity, and dimensionality of the Chinese version of the ODI version 2.1 in a sample of 225 adult orthopedic outpatients with chronic low back pain [mean age (SD): 40.7 (11.4) years].


We conducted reliability analysis, exploratory bifactor analysis, confirmatory factor analysis, and Mokken scale analysis of the ODI. To validate the ODI, we used the Short-Form 36 questionnaire (SF-36) and visual analog scale (VAS).


The reliability and the discriminant and construct validities of the ODI were good. The fit statistics of the unidimensional model of the ODI were inadequate. The ODI was a weak Mokken scale (Hs = 0.31).


The ODI was a reliable and valid scale suitable for measuring disability in patients with low back pain. However, the ODI appeared to be multidimensional, which argues against using the raw score of the ODI as a measurement of disability.


Low back pain (LBP) is the second leading cause of disability in the world [1]. The level of disability in patients with LBP is an important outcome measure for clinical practice and research [2]. The Oswestry Disability Index (ODI) is one of the most commonly used scales that assess the disability related to LBP [2, 3].

The scoring of the ODI is the simple sum score of the items of the ODI multiplied by two [3]. The derived score of the ODI is a measurement of the level of disability related to LBP [3]. But the items of the ODI are ordinal measurements. To use the raw score of the ODI as a valid measurement, three assumptions are required: (1) the ODI is unidimensional; (2) all the items of the ODI are equally correlated with the measured construct (i.e. LBP-related disability); and (3) the point intervals are equal on the ODI [4]. But these assumptions are typically unchecked and unjustified [4]. Hence, to use the raw score of the ODI as a measurement, it is necessary to check whether the ODI meets these assumptions.

Disputes exist regarding the psychometric properties of the ODI. First, the construct and transcultural validities of the ODI are uncertain [5]. Second, the evidence on the unidimensionality of the ODI is conflicting [6, 7]. Several items of the ODI fit a unidimensional model poorly [6,7,8]. Third, the ODI appears to have a floor effect: it poorly differentiates patients with little disability [6, 7, 9]. Fourth, the ODI also has a ceiling effect that limits its ability to differentiate patients with high disability [6, 7, 9]. Such disputes suggest that the raw score of the ODI is problematic.

Lue et al. (2008) developed the Chinese version of the ODI 2.1 and claimed that the ODI was unidimensional solely because its Cronbach’s α was 0.90 [10]. This argument is invalid because Cronbach’s α does not measure dimensionality. Cronbach’s α is a function of the mean inter-item covariance and the number of items [11]. No clear relationship exists between Cronbach’s α and the dimensionality of a scale [11]. Lu et al. (2013) showed that the ODI fitted the Rasch model and suggested that the ODI is a unidimensional scale [7]. But in that study [7], the majority (6 of 10) of the items of the ODI violated monotonicity, suggesting that the ODI might be a multidimensional scale.

The present study addressed two research questions about the Chinese version of the ODI 2.1:

  1. Is the ODI a unidimensional scale?

  2. Is the ODI reliable and valid for measuring disability in adult orthopedic outpatients with chronic LBP?



We re-analysed the data from a cross-sectional sample of adult orthopedic outpatients with chronic LBP in Taiwan [12]. This study was conducted in the general orthopedics clinic of the Chang Gung Memorial Hospital, Linkou, from August 2008 to November 2010, and was approved by the Institutional Review Board of the same hospital. Inclusion criteria were as follows: (1) 20–65 years of age, and (2) chronic LBP that was defined as LBP for at least 3 months. Exclusion criteria were as follows: (1) receiving antidepressant or antipsychotic medication during the preceding 4 weeks, (2) psychotic symptoms, (3) mental retardation, and (4) severe cognitive impairment. All participants gave written informed consent before study enrollment.

The sample consisted of 225 patients [mean age (SD): 40.7 (11.4) years; 103 (45.8%) females]. Regarding comorbid mental illness per the DSM-IV criteria [13], 49 (21.8%) had major depressive disorder and 52 (23.1%) had at least one anxiety disorder. Severe LBP was present in 83 (36.9%) patients.


The Oswestry Disability Index (ODI)

The ODI consists of 10 items on the degree to which back (or leg) trouble has affected the ability to manage in everyday life [3]. The 10 items cover pain and daily function (pain intensity, personal hygiene, lifting, walking, sitting, standing, sleeping, sexual activity, social activity, and traveling). Each item is rated on a 6-point scale (0–5); a higher score indicates a higher level of disability related to LBP. The present study used the traditional Chinese version of the ODI 2.1 [10].
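The scoring rule used in this article (the sum of the 10 item ratings multiplied by two) can be sketched as follows. This is only an illustration: the function name `odi_score`, the input checks, and the example ratings are assumptions made here, and the handling of unanswered items is deliberately left out.

```python
def odi_score(items):
    """Return the ODI score: the sum of the 10 item ratings (0-5 each) times two."""
    if len(items) != 10:
        raise ValueError("the ODI has 10 items")
    if any(not isinstance(x, int) or x < 0 or x > 5 for x in items):
        raise ValueError("each item must be an integer rating from 0 to 5")
    return 2 * sum(items)


# Hypothetical ratings for one patient (items 1 to 10), for illustration only.
example = [3, 1, 2, 2, 1, 3, 0, 1, 3, 1]
print(odi_score(example))  # 34
```

The result ranges from 0 (no disability on any item) to 100 (maximum rating on every item).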

The short-form 36 questionnaire (SF-36)

The SF-36 consists of 36 items for measuring the general health status of patients [14]. The SF-36 has 8 subscales: physical functioning (PF), role limitations due to physical health problems (Role-physical, RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role limitations due to emotional problems (Role-emotional, RE), and mental health (MH). Each subscale is scored from 0 to 100; a higher score indicates better health status. We used the traditional Chinese version of the SF-36 [15]. We adopted two kinds of aggregate scoring of the SF-36. First, according to the Medical Outcomes Study conceptual model, the Physical Health score (PHS) is the sum of PF, RP, BP, and GH; the Mental Health score (MHS) is the sum of VT, SF, RE, and MH [14, 16]. Second, the standard physical component summary (PCS) and mental health component summary (MCS) scores were calculated with the formulae available in the study by Leese et al. [16] and the norms of the Taiwan population [15]:

  1. Standardized scores of the SF-36:

$$ PFZ=\frac{PF-92.24}{16.16},\quad RPZ=\frac{RP-83.65}{33.27},\quad BPZ=\frac{BP-84.84}{19.42},\quad GHZ=\frac{GH-69.29}{21.27} $$
$$ VTZ=\frac{VT-68.27}{18.66},\quad SFZ=\frac{SF-86.81}{17.05},\quad REZ=\frac{RE-79.4}{36.07},\quad MHZ=\frac{MH-73.01}{16.55} $$

  2. PCS1 and MCS1 scores:

$$ {PCS}_1=0.42402\times PFZ+0.35119\times RPZ+0.31754\times BPZ+0.24954\times GHZ+0.02877\times VTZ-0.00753\times SFZ-0.19206\times REZ-0.22069\times MHZ $$
$$ {MCS}_1=-0.22999\times PFZ-0.12329\times RPZ-0.09731\times BPZ-0.01571\times GHZ+0.23534\times VTZ+0.26876\times SFZ+0.43407\times REZ+0.4858\times MHZ $$

  3. PCS and MCS scores:

$$ PCS={PCS}_1\times 10+50 $$
$$ MCS={MCS}_1\times 10+50 $$
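The three scoring steps above, together with the simple-sum PHS and MHS, can be sketched in Python as follows. The norms and weights are copied from the formulae above; the function name `sf36_summaries`, the dictionary layout, and the example patient are assumptions made for illustration.

```python
# Taiwan norms (mean, SD) and component weights, copied from the formulae above.
NORMS = {
    "PF": (92.24, 16.16), "RP": (83.65, 33.27), "BP": (84.84, 19.42),
    "GH": (69.29, 21.27), "VT": (68.27, 18.66), "SF": (86.81, 17.05),
    "RE": (79.4, 36.07), "MH": (73.01, 16.55),
}
PCS_WEIGHTS = {"PF": 0.42402, "RP": 0.35119, "BP": 0.31754, "GH": 0.24954,
               "VT": 0.02877, "SF": -0.00753, "RE": -0.19206, "MH": -0.22069}
MCS_WEIGHTS = {"PF": -0.22999, "RP": -0.12329, "BP": -0.09731, "GH": -0.01571,
               "VT": 0.23534, "SF": 0.26876, "RE": 0.43407, "MH": 0.4858}


def sf36_summaries(subscales):
    """Compute PHS, MHS, PCS, and MCS from the eight 0-100 subscale scores."""
    # Step 1: standardize each subscale against the population norms.
    z = {k: (subscales[k] - mean) / sd for k, (mean, sd) in NORMS.items()}
    # Step 2: weighted sums of the standardized scores.
    pcs1 = sum(PCS_WEIGHTS[k] * z[k] for k in NORMS)
    mcs1 = sum(MCS_WEIGHTS[k] * z[k] for k in NORMS)
    # Step 3: rescale to mean 50 and SD 10; PHS and MHS are simple sums.
    return {
        "PHS": sum(subscales[k] for k in ("PF", "RP", "BP", "GH")),
        "MHS": sum(subscales[k] for k in ("VT", "SF", "RE", "MH")),
        "PCS": pcs1 * 10 + 50,
        "MCS": mcs1 * 10 + 50,
    }


# Hypothetical patient with reduced physical subscales, for illustration only.
patient = {"PF": 60, "RP": 50, "BP": 41, "GH": 55,
           "VT": 60, "SF": 75, "RE": 100, "MH": 72}
print(sf36_summaries(patient))
```

A respondent who scores exactly at the population mean on every subscale obtains PCS = MCS = 50 under this scheme.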

The visual analog scale (VAS)

Each patient rated the pain intensity of the back and lower legs on the VAS with a horizontal line of 10 cm [17]. In this study, VAS ≥ 7 was considered to indicate severe pain.

Statistical analysis

We conducted all analyses in R version 3.3.1 [18]. The P-values were two-tailed with the significance level of 0.05. We summarized the item statistics of the ODI with the likert package [19].


To assess the reliability of the ODI, we used the psych package to obtain four coefficients: Cronbach’s α, Revelle’s β, and McDonald’s ωt and ωh [20, 21]. Revelle’s β refers to the worst split-half reliability. The coefficient ωt refers to the amount of reliable variance in a scale. The coefficient ωh refers to an estimate of the general factor saturation of a scale. For each of the four coefficients, a value of 0.7 or higher marks good reliability. Also, the ordered sequence of the four coefficients offers insight into the dimensionality of a scale [21]. For a unidimensional scale, ωh should be equal to or greater than Cronbach’s α [21].
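For readers unfamiliar with Cronbach’s α, the textbook formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score) can be sketched as below. This is not the psych package routine used in the study; the function name `cronbach_alpha` and the toy responses are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses of five people to four items rated 0-5.
toy = [
    [0, 1, 0, 1],
    [2, 2, 1, 2],
    [3, 2, 3, 3],
    [4, 5, 4, 3],
    [5, 4, 5, 5],
]
print(round(cronbach_alpha(toy), 3))
```

The formula depends only on the number of items and the item covariances, which is why a high α by itself says nothing about how many dimensions underlie the items.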

We took two steps to get the coefficients ωt and ωh of the ODI. First, we conducted parallel analysis of the ODI data to decide the proper number of extracted factors. Second, we conducted exploratory bifactor analysis (EBA) of the ODI data to get ωt, ωh, and the explained common variance (ECV) for the general factor. If the ECV is larger than 60%, a unidimensional construct is confirmed [22]. We obtained the corresponding 95% bias-corrected and accelerated bootstrap confidence intervals of α, β, ωt, ωh, and the ECV with 10,000 bootstrap replications with the boot package [23, 24].
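Given a standardized, orthogonal bifactor solution, the ECV, ωh, and ωt follow from the loadings by standard formulas, sketched below. The loadings are invented for illustration and are not the study's estimates; the function name `bifactor_summary` and the input layout are likewise assumptions, and the psych package may differ in detail (for example in rotation and estimation).

```python
def bifactor_summary(general, groups):
    """ECV, omega_h, and omega_t from standardized orthogonal bifactor loadings.

    general: general-factor loading of each item.
    groups:  one list per group factor, same length as `general`,
             with 0.0 where an item does not load on that factor.
    """
    n_items = len(general)
    # Unique variance of each item: 1 minus its communality.
    uniqueness = [
        1 - general[i] ** 2 - sum(g[i] ** 2 for g in groups)
        for i in range(n_items)
    ]
    general_ss = sum(l ** 2 for l in general)
    group_ss = sum(l ** 2 for g in groups for l in g)
    general_part = sum(general) ** 2
    group_part = sum(sum(g) ** 2 for g in groups)
    total_variance = general_part + group_part + sum(uniqueness)
    return {
        "ECV": general_ss / (general_ss + group_ss),
        "omega_h": general_part / total_variance,
        "omega_t": (general_part + group_part) / total_variance,
    }


# Hypothetical loadings: four items, one general factor, two group factors.
summary = bifactor_summary(
    general=[0.6, 0.6, 0.6, 0.6],
    groups=[[0.4, 0.4, 0.0, 0.0], [0.0, 0.0, 0.4, 0.4]],
)
print({name: round(value, 3) for name, value in summary.items()})
```

With these invented loadings the ECV is about 0.69, which would exceed the 60% threshold cited above; the study's own ECV fell below it.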

Convergent and discriminant validities

To test the convergent validity of the unidimensional model of the ODI, we conducted confirmatory factor analysis (CFA) with the lavaan and matrixpls packages [25, 26]. If the composite reliability (CR) is greater than 0.7 and the average variance extracted (AVE) greater than 0.5, then the convergent validity is confirmed [27]. The cutoffs of model fit statistics are as follows: the root mean square error of approximation (RMSEA) < 0.06, the close fit (CFit) test that was non-significant (i.e., the probability value that the RMSEA ≤ 0.05 was greater than 0.05), the standardized root mean square residual (SRMR) ≤ 0.05, the weighted root mean square residual (WRMR) < 1.0, the comparative fit index (CFI) ≥ 0.95, and the Tucker-Lewis index (TLI) ≥ 0.95 [28,29,30].
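The CR and AVE criteria can be illustrated with the usual Fornell-Larcker formulas for standardized loadings of a single factor: AVE is the mean squared loading, and CR is (Σλ)² / [(Σλ)² + Σ(1 − λ²)]. The loadings below are invented, and the function names are assumptions; the study obtained its values from lavaan and matrixpls.

```python
def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: squared sum of loadings over itself plus the summed error variances."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def convergent_validity_ok(loadings, cr_cutoff=0.7, ave_cutoff=0.5):
    """Apply the two cutoffs quoted in the text (CR > 0.7 and AVE > 0.5)."""
    return (composite_reliability(loadings) > cr_cutoff
            and average_variance_extracted(loadings) > ave_cutoff)


# Hypothetical standardized loadings of 10 items on one factor.
loadings = [0.55, 0.70, 0.48, 0.72, 0.50, 0.66, 0.52, 0.60, 0.74, 0.68]
print(round(composite_reliability(loadings), 3),
      round(average_variance_extracted(loadings), 3),
      convergent_validity_ok(loadings))
```

A scale can reach a CR above 0.7 while its AVE stays below 0.5, because CR grows with the number of items whereas AVE does not.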

To test the discriminant validity of the ODI, we used the Fornell-Larcker criterion and the heterotrait-monotrait (HTMT) ratio with the PCS and MCS of the SF-36 [16, 27, 31]. As for the Fornell-Larcker criterion, if the AVE of the ODI is larger than the squared correlations between the ODI and the PCS and MCS of the SF-36, then the discriminant validity of the ODI is confirmed. As for the HTMT ratio method, two criteria of discriminant validity are as follows: (1) the HTMT ratio should be less than 0.85 (HTMT.85), and (2) the 90% normal bootstrap confidence interval should not include 1 (HTMTinference) [23, 24, 31]. We obtained the corresponding 95% confidence intervals of CR, AVE, and HTMT ratios of the ODI with 10,000 bootstrap replications [23, 24].
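Both discriminant-validity checks can be sketched in a few lines. The HTMT ratio below follows the definition of Henseler et al. [31]: the mean correlation between items of different constructs divided by the geometric mean of the average within-construct item correlations. The function names and toy data are assumptions made here, each construct is assumed to have at least two indicators, and how the indicators of the PCS and MCS were defined in the study is not stated in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _mean(values):
    values = list(values)
    return sum(values) / len(values)


def _within(items):
    """Average correlation among the items of one construct."""
    k = len(items)
    return _mean(pearson(items[i], items[j])
                 for i in range(k) for j in range(i + 1, k))


def htmt(items_a, items_b):
    """Heterotrait-monotrait ratio for two constructs (lists of item columns)."""
    between = _mean(pearson(a, b) for a in items_a for b in items_b)
    return between / sqrt(_within(items_a) * _within(items_b))


def fornell_larcker_ok(ave, correlation):
    """True if the AVE exceeds the squared correlation with the other construct."""
    return ave > correlation ** 2


# Made-up item columns for two constructs measured on six respondents.
construct_a = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]]
construct_b = [[1, 3, 2, 5, 4, 6], [2, 2, 3, 3, 5, 6]]
ratio = htmt(construct_a, construct_b)
print(round(ratio, 3), ratio < 0.85)
```

A ratio close to or above 1 means that items of different constructs correlate about as strongly as items of the same construct, which is the pattern that signals poor discriminant validity.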

Mokken scale analysis (MSA)

MSA is a non-parametric item response theory method and is useful for scrutinizing a scale [32, 33]. We conducted MSA of the ODI data with the mokken package [34, 35]. First, we obtained the three Loevinger’s scalability coefficients (H): item-pair (Hij), item (Hi), and scale (Hs) [34,35,36]. The rules of thumb for the H values are as follows: a scale is weak if 0.3 ≤ H < 0.4, moderate if 0.4 ≤ H < 0.5, and strong if H ≥ 0.5 [34,35,36]. Second, we examined local independence with the conditional association procedure, monotonicity with item-rest regression, and non-intersection with the restscore method [32, 34, 35, 37, 38]. Third, we assessed invariant item ordering (IIO) with the manifest IIO method and the backward selection procedure [34]. We then rated the IIO of the selected items on the coefficient HT [39]. The rules of thumb for HT values are as follows: a weak IIO if 0.3 ≤ HT < 0.4, moderate if 0.4 ≤ HT < 0.5, and strong if HT ≥ 0.5. We rated the reliability of a Mokken scale on the latent class reliability coefficient (LCRC) [40]. A reliable scale should have an LCRC ≥ 0.7.
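The scale coefficient H can be illustrated as the ratio of the summed item-pair covariances to the summed maximum covariances attainable given each item's marginal distribution. In the sketch below the maximum is obtained by sorting both item columns, which pairs the scores comonotonically. This is a simplified illustration and not the mokken package implementation; the function names, the "unscalable" label for H < 0.3, and the toy data are assumptions.

```python
from itertools import combinations


def _cov(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def scale_h(rows):
    """Loevinger's scale coefficient H for respondents (rows) by items (columns)."""
    k = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(k)]
    observed = maximum = 0.0
    for i, j in combinations(range(k), 2):
        observed += _cov(columns[i], columns[j])
        # Sorting both columns gives the largest covariance the marginals allow.
        maximum += _cov(sorted(columns[i]), sorted(columns[j]))
    return observed / maximum


def mokken_label(h):
    """Rule-of-thumb label from the text; 'unscalable' below 0.3 is an added label."""
    if h >= 0.5:
        return "strong"
    if h >= 0.4:
        return "moderate"
    if h >= 0.3:
        return "weak"
    return "unscalable"


# Made-up responses of six people to three items rated 0-5.
toy = [
    [0, 1, 0],
    [1, 0, 2],
    [2, 3, 1],
    [3, 2, 4],
    [4, 5, 3],
    [5, 4, 5],
]
h = scale_h(toy)
print(round(h, 3), mokken_label(h))
```

Under the rules of thumb above, the study's value of Hs = 0.31 falls in the weak band.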

Construct validity

We calculated the Pearson’s product-moment correlation coefficients between the ODI and the SF-36 aggregate scores [i.e. PHS, MHS, PCS, and MCS], and the VAS. As the ODI is a scale of LBP-related disability, we hypothesized that the absolute values of the correlation coefficients between the ODI and the physical domain (i.e. the PHS, PCS, and VAS) should be greater than those between the ODI and the mental domain (i.e. the MHS and MCS).


Table 1 shows the demographic and clinical characteristics of the sample. Among the 225 patients, 49 (21.8%) had major depressive disorder; of these, 21 (42.9%) were in a current major depressive episode, 21 (42.9%) were in partial remission of depression, and 7 (14.3%) were in full remission. Abnormal radiographic findings were present in 149 (66.2%) patients. Associated leg symptoms, including radiating leg pain, leg numbness, intermittent claudication, and neurological deficits, were present in 96 (42.7%) patients. Medical comorbidities were present in 68 (30.2%) patients.

Table 2 shows the item statistics of the ODI. As for the combined proportion of responses ≥ 3 (i.e. at least moderate disability), items 1 (pain intensity), 6 (standing), and 9 (social activity) were the highest among the 10 items of the ODI.

Table 3 shows the abridged summary of the ODI, SF-36, and VAS. Parallel analysis of the ODI data indicated that four factors should be extracted.

Table 4 shows the summary of the reliability and validity statistics of the ODI. The ODI had a Cronbach’s α, a McDonald’s ωt, and an LCRC greater than 0.7, indicating that the ODI was reliable. The ODI was multidimensional according to the following criteria: (1) ωh was lower than Cronbach’s α; (2) the ECV was below 60%; and (3) the fit statistics of the unidimensional model of the ODI were inadequate. The ODI had poor discriminant validity with the PCS but good discriminant validity with the MCS of the SF-36. The ODI was a weak Mokken scale (Hs = 0.31).

Table 5 shows the correlation coefficients between the ODI and other scales. The ODI correlated negatively with the aggregate scores of the SF-36 and positively with the VAS.

Table 6 shows the item scalability coefficients (Hi) of the ODI. No item of the ODI violated local independence or monotonicity; also, no serious violation of non-intersection existed. The backward selection procedure of the ODI removed 3 items (items 3, 5, and 7). The other 7 items had a weak IIO property (HT = 0.36). The hierarchical item ordering of the 7 items was the ascending order of the mean scores of these items [2 (personal hygiene), 10 (traveling), 4 (walking), 8 (sexual life), 9 (social life), and 1 (pain intensity)].

Table 1 Demographic and clinical characteristics of the sample (N = 225)
Table 2 Item statistics of the ODI
Table 3 Abridged summary of the ODI, SF-36, and VAS
Table 4 Reliability and validity statistics of the ODI
Table 5 Correlation coefficients between the ODI and other scales
Table 6 The item scalability coefficients (Hi) of the ODI


Our results supported the view that the ODI has good reliability [7, 10]. As for the dimensionality of the ODI, our results largely contradicted the view that the ODI is a unidimensional scale. On the one hand, evidence for the ODI as a multidimensional scale was as follows: (1) ωh was less than α [21]; (2) the ECV was less than 60% [22]; (3) the ODI had insufficient convergent validity; and (4) the fit statistics of the unidimensional model of the ODI were generally poor. On the other hand, our MSA showed that the ODI was a weak Mokken scale, which supports the use of the raw score of the ODI as a valid ordinal person measurement of disability [7]. However, half of the ODI items showed poor scalability (Hi < 0.3). These results further indicated the poor unidimensionality of the ODI [6, 8].

The ODI had good discriminant validity from the MCS and poor discriminant validity from the PCS. These results supported the view that the ODI measures a construct of physical disability [3]. The correlations between the ODI, the aggregate scores of the SF-36, and the VAS also supported the view that the ODI is a measure of physical disability.

Our results showed that 7 items of the ODI had a weak IIO property. The IIO property is a useful feature for the measurement of disability. For example, if a patient with LBP reports impaired personal hygiene, the patient would also be expected to report disability on the other 6 items. Likewise, when a patient reports improvement on the pain intensity item, the patient would be expected to have improved on the other 6 items. In addition, our results concurred with recent findings that personal hygiene is the most difficult item of the ODI [7].

Our results have two implications for research on the ODI. First, the raw score of the ODI might not be the ideal aggregate score of the ODI. Alternative scoring methods include dividing the raw score into 5 categories and using the individual items of the ODI [3]. Second, further research should consider multidimensional modelling of the ODI, for example, multidimensional scaling and multidimensional item response theory models [41, 42].

The present study has four limitations. First, this was a cross-sectional analysis of a single-site sample. We could not verify the test-retest reliability of the ODI. Also, selection bias unavoidably existed. Second, the sample size was modest. Third, the original study was not designed to investigate the research questions addressed in the present study. Fourth, we did not include a reliable and valid assessment of personality disorders. Polatin et al. (1993) reported that the prevalence of at least one personality disorder was as high as 51% among chronic LBP patients [43]. Hence, as regards the psychological factors of chronic LBP, it is crucial to include an assessment of personality disorders. However, to the best of our knowledge, there is still no validated Chinese version of the Structured Clinical Interview for DSM-IV-TR Axis II Personality Disorders (SCID-II), which is the standard assessment of personality disorders [44]. Hence, we cannot provide the prevalence of personality disorders in the study sample.


Using multiple methods, we showed that the ODI was a reliable and valid scale suitable for measuring disability in patients with low back pain. However, the ODI appeared to be a multidimensional scale, which argues against using the raw score of the ODI as a measurement of disability.



AVE: Average Variance Extracted

CFI: Comparative Fit Index

CFit test: Close Fit test

CR: Composite Reliability

ECV: Explained Common Variance

HTMT ratio: Heterotrait-Monotrait ratio

LBP: Low back pain

LCRC: Latent Class Reliability Coefficient

ODI: Oswestry Disability Index

RMSEA: Root Mean Square Error of Approximation

SF-36: Short-Form 36 questionnaire

• SF-36 subscales:

 - BP: Bodily Pain

 - GH: General Health

 - MH: Mental Health

 - PF: Physical Functioning

 - RE: Role-Emotional

 - RP: Role-Physical

 - SF: Social Functioning

 - VT: Vitality

• SF-36 composite scales:

 - MCS: Mental Health Component Summary

 - MHS: Mental Health Score

 - PCS: Physical Component Summary

 - PHS: Physical Health Score

SRMR: Standardized Root Mean Square Residual

TLI: Tucker-Lewis Index

VAS: Visual Analog Scale

WRMR: Weighted Root Mean Square Residual


  1. Andersson GB. Epidemiological Features of Chronic Low-Back Pain. Lancet. 1999;354:581–5.

  2. Sheahan PJ, Nelson-Wong EJ, Fischer SL. A Review of Culturally Adapted Versions of the Oswestry Disability Index: The Adaptation Process, Construct Validity, Test-Retest Reliability and Internal Consistency. Disabil Rehabil. 2015;37:2367–74.

  3. Fairbank JC, Pynsent PB. The Oswestry Disability Index. Spine. 2000;25:2940–53.

  4. Streiner D, Norman G. Health Measurement Scales: A Practical Guide To Their Development and Use. 4th ed. New York: Oxford University Press; 2008.

  5. Chiarotto A, Maxwell LJ, Terwee CB, Wells GA, Tugwell P, Ostelo RW. Roland-Morris Disability Questionnaire and Oswestry Disability Index: Which Has Better Measurement Properties for Measuring Physical Functioning in Nonspecific Low Back Pain? Systematic Review and Meta-Analysis. Phys Ther. 2016;96:1620–37.

  6. Brodke DS, Vadim G, Lawrence BD, Ryan SW, Neese A, Hung M. Oswestry Disability Index: A Psychometric Analysis with 1,610 Patients. Spine J. 2017;17:321–7.

  7. Lu Y-M, Wu Y-Y, Hsieh C-L, Lin C-L, Hwang S-L, Cheng K-I, et al. Measurement Precision of the Disability for Back Pain Scale-by Applying Rasch Analysis. Health Qual Life Outcomes. 2013;11:1–10.

  8. Lochhead LE, MacMillan PD. Psychometric Properties of the Oswestry Disability Index: Rasch Analysis of Responses in a Work-Disabled Population. Work. 2013;46:67–76.

  9. Dawson AP, Steele EJ, Hodges PW, Stewart S. Utility of the Oswestry Disability Index for Studies of Back Pain Related Disability in Nurses: Evaluation of Psychometric and Measurement Properties. Int J Nurs Stud. 2010;47:604–7.

  10. Lue Y-J, Hsieh C-L, Huang M-H, Lin G-T, Lu Y-M. Development of a Chinese Version of the Oswestry Disability Index Version 2.1. Spine. 2008;33:2354–60.

  11. Sijtsma K. On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha. Psychometrika. 2009;74:107–20.

  12. Wang H-Y, Fu T-S, Hsu S-C, Hung C-I. Association of Depression with Sleep Quality Might be Greater Than that of Pain Intensity among Outpatients with Chronic Low Back Pain. Neuropsychiatr Dis Treat. 2016;12:1993–8.

  13. First MB, Spitzer RL, Miriam G, Williams JB. Structured Clinical Interview for DSM-IV-TR Axis I Disorders, Research Version, Patient Edition (SCID-I/P). New York: Biometrics Research, New York State Psychiatric Institute; 2002.

  14. Ware JE, Kosinski M, Gandek B. SF-36 Health Survey: Manual and Interpretation Guide. 3rd ed. Lincoln, RI: QualityMetric Inc.; 2005.

  15. Lu J-F, Tseng H-M, Tsai Y-J. Assessment of Health-Related Quality of Life in Taiwan (I): Development and Psychometric Testing of SF-36 Taiwan Version. Taiwan J Public Health. 2003;22:501–11.

  16. Leese M, Schene A, Koeter M, Meijer K, Bindman J, Mazzi M, et al. SF-36 Scales, and Simple Sums of Scales, Were Reliable Quality-of-Life Summaries for Patients with Schizophrenia. J Clin Epidemiol. 2008;61:588–96.

  17. Bond MR, Pilowsky I. Subjective Assessment of Pain and Its Relationship to the Administration of Analgesics in Patients with Advanced Cancer. J Psychosom Res. 1966;10:203–8.

  18. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2016.

  19. Bryer J, Speerschneider K. likert: Analysis and Visualization Likert Items. 2016.

  20. Revelle W. psych: Procedures for Psychological, Psychometric, and Personality Research. Evanston, Illinois: Northwestern University; 2016.

  21. Zinbarg RE, Revelle W, Yovel I, Li W. Cronbach’s α, Revelle’s β, and McDonald’s ωH: Their Relations with Each Other and Two Alternative Conceptualizations of Reliability. Psychometrika. 2005;70:123–33.

  22. Reise SP, Moore TM, Haviland MG. Bifactor Models and Rotations: Exploring the Extent to which Multidimensional Data Yield Univocal Scale Scores. J Pers Assess. 2010;92:544–59.

  23. Canty A, Ripley B. boot: Bootstrap R (S-Plus) Functions. 2016.

  24. Davison AC, Hinkley DV. Bootstrap Methods and Their Applications. Cambridge: Cambridge University Press; 1997.

  25. Rosseel Y. lavaan: An R Package for Structural Equation Modeling. J Stat Softw. 2012;48:1–36.

  26. Rönkkö M. matrixpls: Matrix-based Partial Least Squares Estimation. 2016.

  27. Fornell C, Larcker DF. Evaluating Structural Equation Models with Unobservable Variables and Measurement Error. J Mark Res. 1981;18:39–50.

  28. Cook KF, Kallen MA, Amtmann D. Having a Fit: Impact of Number of Items and Distribution of Data on Traditional Criteria for Assessing IRT’s Unidimensionality Assumption. Qual Life Res. 2009;18:447–60.

  29. Heene M, Hilbert S, Draxler C, Ziegler M, Buhner M. Masking Misfit in Confirmatory Factor Analysis by Increasing Unique Variances: A Cautionary Note on the Usefulness of Cutoff Values of Fit Indices. Psychol Methods. 2011;16:319–36.

  30. Reeve BB, Hays RD, Bjorner JB, Cook KF, Crane PK, Teresi JA, et al. Psychometric Evaluation and Calibration of Health-Related Quality of Life Item Banks: Plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Med Care. 2007;45:S22–31.

  31. Henseler J, Ringle CM, Sarstedt M. A New Criterion for Assessing Discriminant Validity in Variance-Based Structural Equation Modeling. J Acad Market Sci. 2015;43:115–35.

  32. Mokken RJ. A Theory and Procedure of Scale Analysis: With Applications in Political Research. Reprint 2011 ed. Berlin, Germany: De Gruyter Mouton; 1971.

  33. Stochl J, Jones PB, Croudace TJ. Mokken Scale Analysis of Mental Health and Well-Being Questionnaire Item Responses: A Non-Parametric IRT Method in Empirical Research for Applied Health Researchers. BMC Med Res Methodol. 2012;12:74.

  34. van der Ark LA. New Developments in Mokken Scale Analysis in R. J Stat Softw. 2012;48:1–27.

  35. van der Ark LA, Croon MA, Sijtsma K. Mokken Scale Analysis for Dichotomous Items Using Marginal Models. Psychometrika. 2008;73:183–208.

  36. Loevinger J. The Technic of Homogeneous Tests Compared with Some Aspects of Scale Analysis and Factor Analysis. Psychol Bull. 1948;45:507–29.

  37. Straat JH. Using Scalability Coefficients and Conditional Association to Assess Monotone Homogeneity [Thesis/dissertation]. [Ridderkerk, Netherlands]: Proefschrift Tilburg University; Ridderprint; 2012.

  38. Sijtsma K, Molenaar IW. Introduction to Nonparametric Item Response Theory. 1st ed. Thousand Oaks: Sage; 2002.

  39. Ligtvoet R, van der Ark LA, te Marvelde JM, Sijtsma K. Investigating an Invariant Item Ordering for Polytomously Scored Items. Educ Psychol Meas. 2010;70:578–95.

  40. van der Ark LA, van der Palm DW, Sijtsma K. A Latent Class Approach to Estimating Test-Score Reliability. Appl Psych Meas. 2011;35:380–92.

  41. Hartig J, Höhler J. Multidimensional IRT Models for the Assessment of Competencies. Stud Educ Eval. 2009;35:57–63.

  42. Jaworska N, Chupetlovska-Anastasova A. A Review of Multidimensional Scaling (MDS) and its Utility in Various Psychological Domains. Tutor Quant Methods Psychol. 2009;5:1–10.

  43. Polatin PB, Kinney RK, Gatchel RJ, Lillo E, Mayer TG. Psychiatric Illness and Chronic Low-Back Pain. The Mind and the Spine: Which Goes First? Spine. 1993;18:66–71.

  44. First MB, Spitzer RL, Miriam G, Williams JB, Benjamin LS. Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II). Washington D.C.: American Psychiatric Press, Inc.; 1997.



All of the participants are acknowledged with appreciation. We would like to thank Ms. Sylvia Yu-Chien Hsiao for assistance with the managerial work of the present study.


This work has been financially supported by the Chang Gung Memorial Hospital Research Program grant (CMRPG 371701) and the National Science Council of Taiwan grant (NSC 102–2314-B-182A007). The sponsor had no role in study design; collection, analysis and interpretation of data; writing of the report; decision to submit the article for publication.

Availability of data and materials

The dataset supporting the conclusions of this article is included within the article (dataset-hung.csv). The R script of the analysis of the dataset (dataset-hung.csv) is included within the article (hung-script.R).

Author information

Conception and design: C-I Hung, C-Y Liu, T-S Fu. Study coordination: C-I Hung, T-S Fu. Acquisition of data: C-I Hung, T-S Fu. Analysis of data: C-P Lee. Interpretation of data: C-P Lee. Drafting and writing of manuscript: C-P Lee. Revision and approval of final version of the manuscript: C-P Lee, T-S Fu, C-Y Liu, C-I Hung. All authors read and approved the final manuscript.

Correspondence to Ching-I Hung.

Ethics declarations

Ethics approval and consent to participate

In accordance with Taiwan law, the study protocol was approved by the Chang Gung Medical Foundation Institutional Review Board (reference number: 97-0894B, approved on 2008/07/01). All patients provided written informed consent.

Consent for publication

Not applicable (the manuscript does not contain any individual persons data).

Competing interests

The authors declare that they have no competing interests. The authors alone are responsible for the content and writing of this article.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Lee, C., Fu, T., Liu, C. et al. Psychometric evaluation of the Oswestry Disability Index in patients with chronic low back pain: factor and Mokken analyses. Health Qual Life Outcomes 15, 192 (2017).



  • The Oswestry Disability Index (ODI)
  • Psychometrics
  • Validity
  • Reliability
  • Dimensionality
  • Low back pain