A comparison between the low back pain scales for patients with lumbar disc herniation: validity, reliability, and responsiveness

Background Although the Japanese Orthopedic Association Back Pain Evaluation Questionnaire (JOABPEQ), Numerical Pain Rating Scale (NPRS), Oswestry Disability Index (ODI), Roland Morris Disability Questionnaire (RMDQ), and Short Form 36 Health Survey (SF-36) have shown favorable psychometric properties in patients with low back pain (LBP), no study has yet determined these properties in patients with lumbar disc herniation (LDH) receiving conservative treatment. The current study therefore aimed to compare these scales in LDH patients receiving conservative treatment in order to select the better option for assessing disease severity. Methods LDH patients were invited to complete the JOABPEQ, NPRS, ODI, RMDQ, and SF-36 twice. Internal consistency was evaluated by Cronbach’s α. Test-retest reliability was tested by the intraclass correlation coefficient (ICC). The relationships among the scales were evaluated by Pearson correlation coefficients (r). Responsiveness was operationalised using the receiver operating characteristic (ROC) curve, together with a comparison of the smallest detectable change (SDC) and the minimum important change (MIC). Results A total of 353 LDH patients were enrolled. Cronbach’s α for four subscales of the Chinese JOABPEQ was over 0.70, and the ICCs for test-retest reliability were over 0.75. For functional status, marked negative correlations were seen between JOABPEQ Q2-Q4 and the ODI as well as the RMDQ (r = −0.634 to −0.752). For general health status, marked positive correlations were seen between Q5 Mental health and SF-36 PCS (r = 0.724) as well as SF-36 MCS (r = 0.736). In addition, the areas under the curve (AUCs) of the JOABPEQ ranged from 0.743 to 0.827, indicating acceptable responsiveness, as did those of the NPRS, ODI, and RMDQ. Conclusion The NPRS together with the ODI or RMDQ is recommended for studies of LDH patients; if quality of life also needs to be observed, the NPRS and JOABPEQ would be more appropriate than the SF-36.


Introduction
Lumbar disc herniation (LDH) is one of the common causes of low back pain (LBP) [1,2]. Symptomatic herniations present as lumbar radiculopathy, including radicular pain and sensory abnormalities, arising from both mechanical compression and chemical irritation of the nerve root [2]. LDH occurs in approximately 10% of the population, has a serious impact on patients' work and quality of life, is the most common reason for working-age individuals to undergo lumbar spine surgery, and generates a large economic burden [3,4].
No objective biological markers are available to evaluate LDH severity, and it is well known that the patient's own view of the results, captured with patient-reported outcome tools, remains a very important measure of treatment quality. Several patient-reported outcome tools have been used to assess LBP, such as the Numerical Pain Rating Scale (NPRS) and the Visual Analogue Scale (VAS) for pain intensity, the Roland Morris Disability Questionnaire (RMDQ) and the Oswestry Disability Index (ODI) for functional status, and the Short Form 36 Health Survey (SF-36) for general health status [5]. The Japanese Orthopedic Association Back Pain Evaluation Questionnaire (JOABPEQ) comprises five subscales: Q1 Low back pain for pain intensity; Q2 Lumbar function, Q3 Walking ability, and Q4 Social life function for functional status; and Q5 Mental health for general health status. It therefore offers a more comprehensive assessment of pain intensity, functional status, and quality of life [6]. Previous studies in LBP patients or patients after lumbar surgery concluded that there were small correlations between the JOABPEQ and NPRS; medium correlations between Q2 Lumbar function, Q3 Walking ability, and Q4 Social life function and the ODI, RMDQ, and Short Form 8 Health Survey (SF-8) physical component summary; and medium correlations between Q5 Mental health and the SF-8, SF-36, and EuroQol-5D (EQ-5D) [7][8][9][10].
Although all of these scales have shown favorable psychometric properties in patients with LBP, no study has yet determined these properties in patients with LDH receiving conservative treatment [7,10,11]. The RMDQ comprises 24 items, the ODI 10 items, the SF-36 36 items, and the JOABPEQ 25 items, so administering all of them undoubtedly adds to the burden on clinicians during research work. The current study was therefore carried out to compare the validity, reliability, and responsiveness of the JOABPEQ, NPRS, RMDQ, ODI, and SF-36 in LDH patients receiving conservative treatment, in order to select the better option for assessing disease severity.

Materials and methods
Patients and setting
LDH patients were consecutively recruited from Longhua Hospital affiliated to Shanghai University of Traditional Chinese Medicine and Shanghai Guanghua Hospital of Integrated Traditional Chinese and Western Medicine. To be eligible, participants were required to: (1) be aged 18-70 years; (2) be native Chinese speakers; (3) have radiculopathy related to the corresponding lumbar herniated disc, with or without LBP, for 1 week (radiculopathy including radicular pain and sensory abnormalities with numbness of the lower limb as the main symptom; weakness in the distribution of one or more lumbosacral nerve roots, focal paresis, restricted trunk flexion, and increased leg pain with straining, coughing, and sneezing are also indicative); (4) have magnetic resonance imaging showing single or multiple lumbar disc herniation within the previous half year; and (5) have signed the written informed consent. Exclusion criteria were: (1) LBP with other back pathologies, such as spondylolisthesis, ankylosing spondylitis, spinal fracture, rheumatoid arthritis, or LBP secondary to tumor or other disease; (2) pregnancy; and (3) mental disorders, cancer, or other malignant disease.

Ethical considerations
The full study protocol was approved by the Longhua Hospital Research Ethics Committee (No. 2016LCSY030). All patients participating in the study provided informed consent.

JOABPEQ
The JOABPEQ was developed from the original Japanese Orthopedic Association (JOA) scale for assessing LBP; it is disease specific, self-administered, and allows patient outcomes to be judged. It is made up of 25 LBP-related items classified into five multi-item subscales, namely, Q1 Low back pain, Q2 Lumbar function, Q3 Walking ability, Q4 Social life function, and Q5 Mental health. The score of each subscale ranges from 0 to 100 points, and a lower score is associated with worse dysfunction [15]. The five subscale scores should be used independently; adding all or some of them into a total score is not meaningful. According to a previous study, the simplified Chinese version of the JOABPEQ is a reliable and valid instrument for measuring functional status in patients with LBP [7].

NPRS
The NPRS is frequently employed to measure pain intensity, in which patients are asked to select a number (from 0 to 10) to represent their pain severity [16].

RMDQ
The RMDQ is a health status measure designed to be completed by patients to assess their physical disability due to LBP. It consists of 24 items addressing daily life and physical activity, such as personal care, sleeping, work and walking [17]. One point is assigned to each item, giving a total score from 0 (no disability) to 24 (maximum disability) points [12].

ODI
The ODI is commonly used in clinical trials to measure the functional status of patients with spinal disorders [17]. It comprises 10 dimensions, each with 6 levels scored from 0 (lowest disability) to 5 (highest disability). The total score is converted into a percentage, with a maximum of 100%. Version 2.1, adopted in the current study, has been translated and cross-culturally adapted for Chinese patients [18].
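The percentage conversion described for the ODI can be illustrated with a minimal sketch. The function name odi_percent is hypothetical, and the handling of unanswered sections (dropping them from the denominator) is an assumption that the text above does not state.

```python
def odi_percent(item_scores):
    """Convert ODI section scores (0-5 each) into a percentage.

    Unanswered sections are passed as None and dropped from the
    denominator; this treatment of missing answers is an assumption.
    """
    answered = [s for s in item_scores if s is not None]
    if not answered:
        raise ValueError("at least one section must be answered")
    if any(s < 0 or s > 5 for s in answered):
        raise ValueError("each section score must lie between 0 and 5")
    # Maximum possible score is 5 points per answered section.
    return 100.0 * sum(answered) / (5 * len(answered))
```

With all 10 sections answered at the highest level, the sketch returns the stated maximum of 100%.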

SF-36
The SF-36 is composed of 8 multi-item scales, which can assess the physical function, role limitations due to physical health problems, bodily pain, general health, vitality, social functioning, role limitations due to emotional problems and emotional well-being of patients [14]. Specifically, these eight scales have been aggregated into two summary measures, which are the Physical Component Summary (PCS) score and Mental Component Summary (MCS) score [19].

Follow-up
The patients were asked to return to the hospitals to complete the questionnaire booklet again 7-14 days after the first interview, at which point all LBP scales were assessed again. The global patient evaluation (GPE), a 7-point Likert scale, was also completed at the second interview [20]. The response options were completely recovered, much improved, slightly improved, unchanged, slightly worsened, much worsened, and worse than ever. This scale aimed to obtain patient ratings of improvement or deterioration as well as the importance of the changes.

Data analysis
Participants who had completed the questionnaires at baseline and 7 days later were included in the subsequent analyses. Continuous variables were summarized as the mean ± standard deviation unless otherwise noted. Data were tabulated using Microsoft Excel. Statistical analyses were carried out using SPSS (Version 21.0, SPSS, Gorinchem, The Netherlands). The Bland-Altman method was implemented using MedCalc statistical software (Version 19.1.7, Amazon, UK).

Internal consistency
The internal consistency of each domain was evaluated by Cronbach's α. In general, a Cronbach's α of > 0.7 was considered acceptable [21]. All completed baseline data were included in the analysis.
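For readers unfamiliar with the statistic, Cronbach's α for a subscale with k items is k/(k − 1) × (1 − sum of item variances / variance of the total score). The sketch below is a plain-Python illustration with a hypothetical function name; the analyses in this study were run in SPSS, not with this code.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: one list of item scores per respondent, all of equal length.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

A value above the 0.7 threshold cited above would then be read as acceptable internal consistency.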

Test-retest reliability
Test-retest reliability between baseline and the questionnaires completed 7 days later was assessed using the intraclass correlation coefficient (ICC) (two-way random effects model, absolute agreement). Generally, an ICC of > 0.7 is recommended as a minimum standard for reliability [22]. Only patients rated "no change" in their global evaluation were included, since treatment was not withheld from patients.
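A two-way random effects, absolute agreement ICC can be computed from the mean squares of a patients-by-occasions table. The sketch below uses the single-measure form; whether single or average measures were reported is not stated above, so that choice is an assumption, as is the function name.

```python
def icc_absolute_agreement(scores):
    """Single-measure ICC, two-way random effects, absolute agreement.

    scores: one row per patient, e.g. [baseline, retest].
    """
    n = len(scores)      # number of patients
    k = len(scores[0])   # number of occasions
    grand = sum(sum(r) for r in scores) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)               # between-patient mean square
    ms_cols = ss_cols / (k - 1)               # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1)) # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k / n * (ms_cols - ms_error)
    )
```

Because absolute agreement is requested, a systematic shift between the two occasions lowers the coefficient even when patients keep the same rank order.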

Construct validity
The relationships of the JOABPEQ, NPRS, RMDQ, ODI, and SF-36 were evaluated by means of Pearson correlation coefficients (r). According to Cohen's criteria, r = 0.2 can be considered a small correlation, r = 0.5 is a medium correlation, and r = 0.8 is a large correlation [23].
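As an illustration only, the Pearson coefficient between two scales can be computed as below. The function name and any input values are hypothetical and are not study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Because higher JOABPEQ scores indicate better status while higher ODI and RMDQ scores indicate more disability, agreement between these scales appears as a negative r.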

Measurement error
Standard error of measurement (SEM), smallest detectable change (SDC), and limits of agreement (LoA) according to the Bland-Altman method were used to quantify measurement error. The SEM indicates the precision of an outcome measure and was estimated as the square root of the within-subject variance of patients categorized as "unchanged" on the GPE. The SDC was calculated as 1.96 × √2 × SEM; an observed change larger than the SDC can be regarded, with 95% confidence, as a real change not caused by measurement error. Because an observed change is derived from two measurements (baseline and follow-up), measurement error enters twice, hence the factor √2 [24]. The LoA were obtained using the Bland-Altman method, in which the difference between baseline and final scores (Y-axis) was plotted against the mean of the baseline and final scores (X-axis) [25].
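The three quantities can be sketched for a pair of measurements per patient as follows. This is an assumption-laden illustration: the within-subject variance is taken from a one-way layout (sum of squared differences divided by 2n), the difference is taken as retest minus baseline, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def measurement_error(baseline, retest):
    """SEM, SDC and Bland-Altman limits of agreement for paired scores."""
    diffs = [b - a for a, b in zip(baseline, retest)]  # retest - baseline
    n = len(diffs)
    # Within-subject variance for two measurements per patient.
    within_var = sum(d * d for d in diffs) / (2 * n)
    sem = sqrt(within_var)
    sdc = 1.96 * sqrt(2) * sem
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return {
        "sem": sem,
        "sdc": sdc,
        "loa": (bias - half_width, bias + half_width),
    }
```

The limits of agreement are centred on the mean difference, so a systematic shift between occasions moves both limits rather than widening them.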

Minimum important change (MIC)
The MIC is defined as the minimal threshold of perceptible symptom improvement that patients consider meaningful [26]. Patients were divided into two groups based on the 7-point GPE Likert scale, namely the slightly improved and the unchanged groups. The difference in mean change score between the two groups was taken as the MIC [27].
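One reading of this anchor-based definition is the mean change in the "slightly improved" group minus the mean change in the "unchanged" group. The sketch below follows that reading, which is an assumption, and uses a hypothetical function name.

```python
from statistics import mean


def mic_mean_change(change_slightly_improved, change_unchanged):
    """Anchor-based MIC as a difference in mean change scores.

    Each argument is a list of per-patient change scores
    (follow-up minus baseline) for one GPE group.
    """
    return mean(change_slightly_improved) - mean(change_unchanged)
```

The resulting MIC can then be set against the SDC of the same scale to judge whether a minimally important change exceeds measurement error.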

Responsiveness
Responsiveness is defined as the ability of a questionnaire to detect clinically important changes over time, even when these changes are small [28]. The responsiveness of the JOABPEQ was assessed using the receiver operating characteristic (ROC) curve. For the ROC curve, patients rated on the 7-point GPE Likert scale as completely recovered, much improved, slightly improved, or unchanged were dichotomized into improved and unchanged groups. Sensitivity and the false-positive rate (1 − specificity) were plotted on the Y- and X-axes of the curve, respectively. The area under the curve (AUC) represents the probability that a measure correctly classifies patients as clinically importantly improved or unchanged. An AUC of 0.7-0.8 was considered acceptable and one of 0.8-0.9 excellent [29].
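The probabilistic reading of the AUC given above can be made concrete: it equals the proportion of (improved, unchanged) patient pairs in which the improved patient has the larger change score, with ties counted as one half. The sketch below assumes change scores are oriented so that larger values mean more improvement; the function name is hypothetical.

```python
def roc_auc(change_improved, change_unchanged):
    """AUC as the probability that an improved patient outscores an
    unchanged patient (ties count one half)."""
    wins = 0.0
    for a in change_improved:
        for b in change_unchanged:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(change_improved) * len(change_unchanged))
```

For scales on which lower scores are better, such as the NPRS, ODI, and RMDQ, the change scores would need their sign reversed before being passed in.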

Patient characteristics
A total of 353 LDH patients were enrolled during a 12-month period. The mean age of patients was 50.53 ± 13.48 years and over 55% were female. The duration of the disease was 261.28 ± 327.53 weeks. Of the 353 patients, 319 (90.37%) reported low back pain, 322 (91.22%) reported leg pain, 203 (57.51%) reported numbness of the lower limb, and 64 (18.13%) reported weakness of the lower limb. Over half of the patients had L4/L5 level herniation (236, 66.86%) and over half had L5/S1 level herniation (195, 55.24%). The baseline patient characteristics were shown in Table 1.
Finally, a total of 329 patients had completed the questionnaires twice at an interval of 8.87 ± 2.70 days, resulting in the response rate of 93.2%. Among them, 19 patients were rated as "completely recovered", 90 as "much improved", 139 as "slightly improved", 65 as "unchanged", 13 as "slightly worsened", and 3 as "much worsened". Meanwhile, no patient was rated as "worse than ever". The demographic characteristics of patients were presented in Table 1, and the study flow diagram in Fig. 1.   Table 2).

Construct validity
There  Table 3.

Measurement error
Sixty-five patients rated as "unchanged" were included in the measurement error analysis. The SDCs of the JOABPEQ subscales (Q1 Low back pain to Q5 Mental health), derived from the SEM, ranged from 4.29 to 8.14, and the SDCs of the NPRS, ODI, RMDQ, SF-36 PCS, and SF-36 MCS were 0.10, 3.43, 0.77, 3.41, and 3.93, respectively. The SDCs of the LBP scales were presented in Table 3. All 65 patients were included in the Bland-Altman plots, and the LoAs of each LBP scale were shown in Fig. 2. The LoAs of the JOABPEQ ranged from 5.70 to 11.05 (Q1 Low back pain 11.05, Q2 Lumbar function 11.15, Q3 Walking ability 10.25, Q4 Social life function 6.65, Q5 Mental health 5.7). The LoAs of the NPRS, ODI, RMDQ, SF-36 PCS, and SF-36 MCS were 0.56, 5.0, 2.95, 6.9, and 7.1, respectively.

MICs
Sixty-five patients rated as "unchanged" and 139 rated as "slightly improved" were included in the MIC analysis.  Table 4.

Responsiveness
The AUCs for the responsiveness of JOABPEQ scales were presented in Table 5, and Fig. 3. As could be

Summary of this study
To the best of our knowledge, the current study is the first to test the validity, reliability, and responsiveness of the JOABPEQ, NPRS, RMDQ, ODI, and SF-36 in LDH patients receiving conservative treatment. The questionnaires were selected on the basis of the characteristics of LBP. The main complaints in LDH are LBP, disability, and impaired quality of life. The NPRS focuses on pain intensity, the ODI and RMDQ on disability, and the SF-36 is the most common scale for assessing quality of life. The validity, reliability, and good sensitivity of the NPRS have been established in many clinical trials [16,30,31]. To some extent, the NPRS is superior to other scales such as the visual analogue scale and the verbal rating scale [31]. The ODI and RMDQ have likewise been verified as reliable and valid LBP measures [32,33]. Our results showed that most of the scales had acceptable internal consistency and reliability, except Q1 Low back pain in the JOABPEQ. For pain intensity, there were only small correlations between the NPRS and the other scales. For function, medium correlations were seen between the JOABPEQ and the ODI as well as the RMDQ; similarly, medium correlations were seen between the JOABPEQ and SF-36 for quality of life. As the AUCs of all the scales were over 0.70, their responsiveness was acceptable. In other words, for pain intensity the NPRS could not be replaced by Q1 Low back pain; Q2 Lumbar function, Q3 Walking ability, and Q4 Social life function performed similarly to the ODI and RMDQ; and Q5 Mental health could replace the SF-36, as it has higher responsiveness than the SF-36 with acceptable correlation (SF-36 PCS r = 0.724, SF-36 MCS r = 0.736).
Based on the validity, reliability, and responsiveness of the LBP scales, if a study is designed to focus on pain intensity and function, the NPRS together with the ODI or RMDQ is recommended. The ODI would be more applicable to cross-sectional surveys owing to a slim advantage in validity and reliability, whereas the RMDQ is more suitable for intervention trials owing to its higher responsiveness in LDH patients. If quality of life also needs to be observed, the NPRS and JOABPEQ would be more appropriate, as the SF-36, with 36 items, imposes a heavier workload on researchers and patients and is harder to score. Our results also suggest that the SF-36 displays poorer responsiveness in LDH patients than the other scales, which is consistent with findings from previous studies [34,35]. Additional measures burden participants even more than clinicians or researchers, and using more scales can make trials more expensive. These recommendations may therefore help studies of LDH to save resources and lighten the burden on researchers and participants.

Validity of the LBP scales
There were only small correlations between the NPRS and the JOABPEQ with respect to pain intensity, consistent with our previous research in LBP patients [7]. The NPRS is a single-item scale that assesses only pain intensity, whereas Q1 Low back pain focuses on the impact of pain on daily life, such as sleeping and rest. Although both of them  [7,9].

Strengths and limitations
The use of the GPE is controversial in most studies of this type, and the validity of a single-item design compared with a multi-item scale is also doubtful [44]. However, such limitations are inevitable. The GPE has a further disadvantage: it may be difficult for patients to recall their initial health status and compare it with their current status to assess any change, which may introduce bias [45]. In the current study, patients were asked to return to the hospitals to complete the questionnaire booklet again 7-14 days after the first interview. This is a relatively short interval, which may have made it easier for patients to recall their initial health status. Secondly, treatments such as acupuncture and manual therapies, which have favorable short-term effects, were allowed for LBP patients.
Furthermore, a small proportion of participants were retested within a short interval. They may therefore have been biased towards giving the same answers if they remembered some of the questions from the first administration, even though the questionnaires contain 96 items in total.

Conclusion
Based on the validity, reliability, and responsiveness of the LBP scales, if a study focuses on pain intensity and function, the NPRS together with the ODI for cross-sectional surveys or the RMDQ for intervention trials is recommended in LDH patients. If quality of life also needs to be observed, the NPRS and JOABPEQ would be more appropriate than the SF-36. These recommendations may help studies of LDH to save resources and lighten the burden on researchers as well as participants.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
Each author certifies that the Longhua Hospital Research Ethics Committee (No. 2016LCSY030) approved the human protocol for this investigation, that all investigations were conducted in conformity with ethical principles of research, and that informed consent for participation in the study was obtained.