Comparing the Chinese versions of two knee-specific questionnaires (IKDC and KOOS): reliability, validity, and responsiveness

Background The International Knee Documentation Committee Subjective Knee Form (IKDC) and the Knee Injury and Osteoarthritis Outcome Score (KOOS) are knee-specific questionnaires that have been widely used and translated into numerous languages. However, the differences in psychometric properties between the Chinese IKDC and KOOS remain unclear. The purpose of this study was to cross-culturally adapt the IKDC and KOOS into Chinese and to compare the psychometric properties of these two measures in patients with various knee injuries from the acute stage up to 12 weeks after treatment. Methods The original IKDC and KOOS were translated into Chinese following cross-cultural adaptation and translation guidelines. One hundred and seventy-three patients with various knee injuries were recruited and completed the Chinese IKDC, the Chinese KOOS, and a generic health status questionnaire (Chinese Short Form-36 [SF-36]). The reliability, internal consistency, content validity, convergent and divergent validity, and responsiveness of both the IKDC and KOOS were assessed with appropriate indices. Results The Chinese IKDC showed excellent reliability (ICC = 0.97) and strong internal consistency (Cronbach alpha = 0.87). The Chinese KOOS also showed good reliability (ICCs ranging from 0.89 to 0.95) and internal consistency (Cronbach alpha coefficients ranging from 0.76 to 0.97). The content validity of both questionnaires was excellent, with no floor or ceiling effects. Both the Chinese IKDC and KOOS were strongly associated with the physical component summary (PCS) score and weakly associated with the mental component summary (MCS) score of the SF-36. At 12 weeks after physical therapy, responsiveness to change was large for the Chinese IKDC (effect size = 0.95) and moderate for the Chinese KOOS subscales (effect sizes = 0.49–0.60). The ROC analyses yielded areas under the curve (AUC) of 0.83 for the Chinese IKDC and 0.67–0.79 for the Chinese KOOS subscales. The minimal clinically important difference (MCID) for the Chinese IKDC was 9.8; the AUCs for the Symptoms, Pain, Activities of Daily Living, Sport/Recreation, and Quality of Life subscales of the Chinese KOOS were 0.79, 0.76, 0.76, 0.76, and 0.67, respectively. Conclusion Both the Chinese IKDC and KOOS demonstrated good psychometric properties. However, the Chinese IKDC was more sensitive than the Chinese KOOS to changes at 2, 4, 8, and 12 weeks of physical therapy. The current study provides information for clinicians and researchers using these appraisal tools with Chinese-speaking patients with various knee disorders.


Background
Clinical outcome research is required to evaluate the benefits and cost effectiveness of new diagnostic, surgical, and rehabilitative approaches for treating knee problems [1]. Both performance-based and self-reported measures are often used to evaluate clinical outcomes of orthopedic patients. To be used among various language groups and in diverse countries, patient-oriented measures (represented by self-administered questionnaires) must be translated, adapted to distinct cultural characteristics, and validated using common processes to evaluate their psychometric properties [2].
In 2007, the International Knee Documentation Committee Subjective Knee Form (IKDC) and Knee Injury and Osteoarthritis Outcome Score (KOOS) were identified as leading instruments for assessing general knee-related quality of life; both include numerous questions on the symptoms and disabilities relevant to patients with knee disorders [3]. Quality-of-life measures capture the patient's perspective on the disease and its treatment, the perceived need for health care, and preferences concerning treatment and outcomes [4].
The IKDC and KOOS have been translated into and validated in several languages and have been widely used to evaluate various knee injuries [5–23]. To be used among multiple language groups and in diverse cultural settings, these questionnaires must be translated, culturally adapted, and validated against the original versions. The cross-cultural adaptation guidelines described by Guillemin et al. [2] are widely used for this purpose. Recommended criteria for selecting an instrument include the characteristics of the patients for whom it was developed, its content, and its psychometric properties [24].
In Chinese-speaking countries, few translated knee-specific questionnaires have been validated. Questionnaires designed as subjective scoring systems can be used in various countries if they are translated and validated for a specific language and population [2,25]. In addition, using culturally equivalent, standardized questionnaires reduces the difficulties of conducting meta-analyses in clinical research, enabling comparisons across studies and minimizing reporting bias among countries [26,27].
A review of the available outcome measures for knee ligament injuries indicated that the IKDC is the preferred measurement tool [28]. The IKDC provided the best overall measure of the critical symptoms and disabilities in a population of patients after articular cartilage repair surgery [29], and it was found to be more useful than the KOOS for evaluating patients with anterior cruciate ligament (ACL) ruptures [30]. However, the differences in psychometric properties between the Chinese IKDC and KOOS remain unclear. Responsiveness, the ability of an outcome measure to detect change in the construct of interest over time, is considered essential. Several researchers have examined the responsiveness of both the IKDC and KOOS, but most reported on detecting changes over periods longer than 3 months. The clinical utility of these instruments in reflecting improvement in patient status over a shorter period (<3 months) is unknown. Therefore, the purposes of this study were (1) to translate the English versions of the IKDC and KOOS into Chinese based on cross-cultural adaptation guidelines, (2) to evaluate and compare the reliability and validity of the Chinese IKDC and Chinese KOOS in patients with a variety of knee conditions, and (3) to determine and compare the responsiveness of the Chinese IKDC and KOOS in detecting clinical changes at 4 time points (2, 4, 8, and 12 weeks) after treatment.

Cross-cultural adaptation process
The IKDC and KOOS were translated and adapted from English to traditional Chinese versions by using a forward-backward translation protocol based on the guidelines of Guillemin et al. and the recommendations for the cross-cultural adaptation of health status measures of the American Academy of Orthopaedic Surgeons (AAOS) [2].
Two independent bilingual translators (native Chinese speakers) translated the original English versions of the IKDC and KOOS into Chinese. The informed translator was a physical therapist with 20 years of clinical experience treating adults with orthopedic disorders, and the uninformed translator was a computer engineer. A consensus meeting was held to resolve discrepancies between the two translations of each questionnaire, and synthesized versions of the IKDC and KOOS were produced after several wording amendments. The synthesized Chinese versions were then back-translated into English by two independent bilingual translators (native English speakers) who were blind to the original versions of both questionnaires. Neither back translator was aware of or informed about the concepts measured by the questionnaires.
An expert committee comprising the 2 forward translators, the 2 back translators, a methodologist, a clinician (rehabilitation physician), and a language specialist reviewed all translated versions of both questionnaires to ensure semantic, idiomatic, experiential, and conceptual equivalence between the languages. After consolidating all versions, the committee developed the pre-final versions of the Chinese IKDC and Chinese KOOS for pre-testing. Ten patients with various knee pathologies participated in the pre-testing step of the validation process and completed the pre-final versions of both questionnaires. The comprehensibility and cultural relevance of the items were discussed with each subject in a face-to-face interview.

Participants
The Research Ethics Committee at National Taiwan University Hospital approved this study. A convenience sample of 173 patients with various knee injuries undergoing physical therapy at the Department of Physical Medicine and Rehabilitation, National Taiwan University Hospital, was recruited (Table 1). All patients provided informed consent before entering the study. Patients were excluded if they had other joint problems affecting the lower extremities or back, systemic inflammatory rheumatic disease, neurological or vascular conditions, or psychiatric disorders. During the study period, all patients received physical therapy for their knee symptoms, including physical agents, joint mobilization, and stretching and strengthening exercises.

Instruments
Two disease-specific questionnaires, the Chinese IKDC and Chinese KOOS, the Chinese SF-36 [31] and a 15-point global rating of change (GROC) scale [32] were used in this study. The IKDC was originally designed to measure the symptoms and functional limitations in sports activities caused by various knee impairments [33]. High IKDC scores indicate a low level of symptoms and a high level of function, whereas low scores indicate a high level of symptoms and a low level of function. Thus, a score of 100 indicates no symptoms and no limitations regarding daily or sports activities [33].
The original KOOS questionnaire is an extension of the Western Ontario and McMaster Universities Osteoarthritis Index and is a well-designed, simple self-administered instrument developed to assess the short- and long-term symptoms and function of patients with knee injuries and osteoarthritis [34]. It is a 42-item disease-specific questionnaire comprising 5 subscales: symptoms, pain, activities of daily living (ADL), sports and recreation (Sport/Rec), and knee-related quality of life (QOL). Raw scores are calculated separately for each subscale and transformed to a 0–100 scale on which 0 indicates severe problems and 100 indicates no problems [20]. The KOOS has demonstrated reliability, validity, and responsiveness in distinct populations with varying pathologies, injury durations, ages, and activity levels [5, 6, 8, 9, 19-21, 23, 34].
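As an illustration of the 0–100 transformation, the sketch below assumes the scoring convention from the KOOS user's guide, which this paper does not spell out: each item is coded 0 (no problems) to 4 (extreme problems), and a subscale score is 100 minus the mean item score multiplied by 100/4. The function name `koos_subscale_score` is hypothetical, and rules for handling missing items are deliberately omitted.

```python
from statistics import mean


def koos_subscale_score(item_responses):
    """Transform one KOOS subscale's answered items to a 0-100 score.

    Assumes each item is coded 0-4 (0 = no problems, 4 = extreme problems);
    on the output scale 100 = no problems and 0 = extreme problems.
    """
    if not item_responses:
        raise ValueError("at least one answered item is required")
    if any(r not in (0, 1, 2, 3, 4) for r in item_responses):
        raise ValueError("KOOS items are coded 0-4")
    # 100 - (mean item score x 100 / 4): a mean of 0 maps to 100, a mean of 4 to 0
    return 100 - mean(item_responses) * 100 / 4


# Hypothetical answers of one patient to the 9 pain items
print(koos_subscale_score([1, 2, 1, 0, 1, 2, 1, 1, 0]))  # prints 75.0
```

Because each subscale is scored on its own, the KOOS yields five separate scores rather than a single total.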
The SF-36 comprises 8 subscales: physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role-emotional (RE), and mental health (MH) [35]. The PF, RP, and BP scales correlate most highly with, and contribute most to, the physical component summary (PCS) score. The MH, RE, and SF scales correlate most highly with, and contribute most to, the mental component summary (MCS) score. The VT, GH, and SF scales correlate notably with both the PCS and MCS [35]. The 8 subscales are scored from 0 to 100, with higher scores indicating better health status [36,37]. The Chinese version of the SF-36 has been validated for use in Taiwan [31,38].
A 15-point (−7 to +7) GROC was used to monitor changes that occurred between two time points [32]. The scale ranges from −7 (a very great deal worse) through 0 (no change) to +7 (a very great deal better); scores of +4 or more represent moderately better (+4), a good deal better (+5), a great deal better (+6), or a very great deal better (+7). For the test-retest reliability analysis, patients scoring between +2 (a little bit better) and −2 (a little bit worse) were considered clinically stable and were included. The GROC has also been used to measure a subject's impression of change after an intervention, and a cutpoint can be chosen to dichotomize patients according to whether or not they achieved significant improvement.

Procedure
The Chinese IKDC, Chinese KOOS, and Chinese SF-36 were administered to the participants at their first visit to the outpatient department; all 173 patients completed the three questionnaires before the first treatment. To assess test-retest reliability, 40 patients (mean age, 43 y) completed the Chinese IKDC and Chinese KOOS again after a 5- to 7-day interval and also rated the GROC. This interval was considered long enough for patients to have forgotten their previous responses but short enough that their condition would not have changed. To minimize clinical change, these patients received no treatment during the test-retest interval.
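The paper does not state which ICC form was used for test-retest reliability. As an illustration only, the sketch below assumes a two-way random-effects, absolute-agreement, single-measure ICC (often written ICC(2,1)) and keeps only patients whose GROC rating fell between −2 and +2, the stability criterion described for the GROC. The function names `stable_subset` and `icc_2_1` are hypothetical.

```python
from statistics import mean


def stable_subset(test, retest, groc):
    """Keep only patients whose GROC rating was between -2 and +2 (clinically stable)."""
    keep = [i for i, g in enumerate(groc) if -2 <= g <= 2]
    return [test[i] for i in keep], [retest[i] for i in keep]


def icc_2_1(test, retest):
    """Two-way random, absolute-agreement, single-measure ICC for two sessions."""
    n, k = len(test), 2
    if n != len(retest) or n < 2:
        raise ValueError("need paired scores for at least 2 patients")
    rows = list(zip(test, retest))
    grand = mean(list(test) + list(retest))
    # Sums of squares for patients (rows), sessions (columns), and residual error
    ss_rows = k * sum((mean(r) - grand) ** 2 for r in rows)
    ss_cols = n * sum((mean(c) - grand) ** 2 for c in (test, retest))
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    # Absolute agreement: a systematic shift between sessions lowers the ICC
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical IKDC scores; the third patient (GROC +5) is excluded as not stable
test = [55.2, 40.2, 71.3, 62.1]
retest = [57.5, 41.4, 70.1, 63.2]
groc = [0, 1, 5, -1]
print(round(icc_2_1(*stable_subset(test, retest, groc)), 3))
```

The absolute-agreement form was chosen here because a test-retest design is concerned with whether the same patient obtains the same score, not merely the same rank.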
For the purpose of analyzing the responsiveness of the Chinese IKDC and KOOS, follow-up reassessments were performed at 2, 4, 8, and 12 weeks following treatment. In addition, the GROC was administered again at the 12-week follow-up as the external criterion for indicating significant improvement with treatment.

Data management and analysis
Statistical analysis was conducted using SPSS version 20.0 (SPSS Inc., Chicago, IL). The Kolmogorov-Smirnov test was used to check the normality of all scores. The level of significance for all statistical procedures was set at P < 0.05.
The interpretability was evaluated by assessing the occurrence and distribution of floor and ceiling effects regarding baseline scores. A floor or ceiling effect of <15% was considered to be acceptable, which means that less than 15% of the respondents achieve the minimum or maximum possible scores [39]. Skewness statistics are usually evaluated informally; values < −1 or > +1 signal substantially non-normal distributions potentially in need of additional evaluation [40].
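A minimal sketch of the two interpretability checks, assuming scores on a 0–100 scale: the percentage of respondents at the lowest and highest possible scores (acceptable if each is below 15%), and a skewness statistic flagged when it falls outside −1 to +1. The skewness shown is the bias-adjusted Fisher-Pearson coefficient (G1), the form typically reported by statistical packages such as SPSS; whether the authors used exactly this form is an assumption. All function names are hypothetical.

```python
from statistics import mean


def floor_ceiling_percent(scores, lowest=0, highest=100):
    """Percentage of respondents at the minimum and maximum possible scores."""
    n = len(scores)
    floor = 100 * sum(s == lowest for s in scores) / n
    ceiling = 100 * sum(s == highest for s in scores) / n
    return floor, ceiling


def adjusted_skewness(scores):
    """Bias-adjusted Fisher-Pearson skewness (G1); needs non-constant data."""
    n = len(scores)
    if n < 3:
        raise ValueError("need at least 3 scores")
    m = mean(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in scores) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return g1 * (n * (n - 1)) ** 0.5 / (n - 2)


def summarize(scores):
    floor, ceiling = floor_ceiling_percent(scores)
    skew = adjusted_skewness(scores)
    return {
        "floor_pct": floor,
        "ceiling_pct": ceiling,
        "floor_ceiling_ok": floor < 15 and ceiling < 15,  # <15% criterion [39]
        "skewness": skew,
        "skew_flag": abs(skew) > 1,  # outside -1..+1 suggests non-normality [40]
    }
```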
To determine measurement precision, the standard error of measurement (SEM) was calculated by multiplying the standard deviation (SD) of the instrument's baseline score by the square root of 1 minus the ICC. The minimum detectable change at the 95% confidence level (MDC) was then computed by multiplying the SEM by 1.96 and by the square root of 2 [43].
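The two formulas above translate directly into code. This sketch only restates them; the function names are hypothetical. Applied to the SEM reported for the Chinese IKDC (3.2), it gives an MDC of about 8.87, in line with the 8.9 reported in the Results.

```python
import math


def standard_error_of_measurement(icc, baseline_sd):
    # SEM = SD_baseline * sqrt(1 - ICC)
    return baseline_sd * math.sqrt(1 - icc)


def minimal_detectable_change_95(sem):
    # MDC = SEM * 1.96 * sqrt(2); sqrt(2) reflects error in both of two measurements
    return sem * 1.96 * math.sqrt(2)


print(round(minimal_detectable_change_95(3.2), 2))  # prints 8.87
```

A change smaller than the MDC cannot be distinguished from measurement error with 95% confidence, which is why a lower SEM and MDC indicate a more precise instrument.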
The internal consistency of the first administration of each questionnaire was calculated using the Cronbach alpha to estimate the average correlations among items within a subscale [44]. An alpha value of 0.70 or greater indicated satisfactory internal consistency [45]; however, a value greater than 0.95 could indicate redundancy of one or more items [39].
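Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k − 1) × (1 − Σ item variances / total-score variance) for k items. The sketch below is a minimal implementation of that standard formula (sample variances, complete responses only), together with the 0.70 and 0.95 thresholds stated above; the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (same k items each)."""
    k = len(responses[0])
    if k < 2 or any(len(r) != k for r in responses):
        raise ValueError("need at least 2 items and equal-length rows")
    # zip(*responses) turns respondent rows into item columns
    item_variances = sum(variance(col) for col in zip(*responses))
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_variances / total_variance)


def interpret_alpha(alpha):
    if alpha > 0.95:
        return "satisfactory, but possible item redundancy"
    if alpha >= 0.70:
        return "satisfactory"
    return "below 0.70"
```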

Validity analysis
Construct validity refers to the degree to which a questionnaire measures the construct it is intended to measure. We tested the construct validity of the Chinese IKDC and KOOS by calculating Spearman's correlation coefficients between each instrument and the Chinese SF-36 scores, assessing both convergent and divergent validity [46]. We hypothesized a priori that the correlations of the IKDC and KOOS subscales with the SF-36 physical health scales (PF, RP, BP, and PCS) would be strong (convergent validity), whereas their correlations with the SF-36 mental health scales (GH, VT, SF, RE, MH, and MCS) would be weak (divergent validity). Spearman's correlation coefficients of >0.50, 0.35–0.50, and <0.35 were considered strong, moderate, and weak, respectively [23].
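Spearman's rho is Pearson's correlation computed on ranks. The sketch below is a minimal standard-library version, assuming tied values receive the average of their ranks, plus a helper that applies the strength thresholds above to the magnitude of rho (the paper does not say how negative coefficients were treated, so using the absolute value is an assumption). The function names are hypothetical; in practice a library routine such as `scipy.stats.spearmanr` would typically be used.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tie block
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def correlation_strength(rho):
    r = abs(rho)
    if r > 0.50:
        return "strong"
    if r >= 0.35:
        return "moderate"
    return "weak"
```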

Responsiveness analysis
Responsiveness of the two instruments was assessed with the effect size (ES) and the receiver operating characteristic (ROC) curve method. ES was calculated as the difference between the mean baseline and mean follow-up scores of a measure, divided by the standard deviation (SD) of its baseline score [47]. Four ESs, one for each follow-up time point, were computed for the Chinese IKDC and for each of the 5 Chinese KOOS subscales. An ES between 0.20 and 0.50 represents a change of at least one-fifth of the baseline SD and is considered small; an ES between 0.51 and 0.79 reflects a change of at least half the baseline SD and is considered moderate; and an ES of 0.80 or greater represents a change of at least four-fifths of the baseline SD and is considered large [47]. A larger ES indicates a greater ability to detect clinical change. Paired t-tests were used to compare the baseline and follow-up scores of the Chinese IKDC and KOOS after 2, 4, 8, and 12 weeks of physical therapy.
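The effect-size definition and the qualitative bands above can be sketched as follows, assuming the sample SD of the baseline scores and that higher scores mean better status (true of both the IKDC and KOOS), so improvement yields a positive ES. The paired t statistic is included for the within-group comparison; its p-value would come from a t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`). The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """(mean follow-up - mean baseline) / sample SD of baseline."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def es_magnitude(es):
    size = abs(es)
    if size >= 0.80:
        return "large"
    if size > 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"


def paired_t_statistic(baseline, follow_up):
    """t = mean difference / (SD of differences / sqrt(n))."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```

Computing one ES per follow-up (2, 4, 8, and 12 weeks) against the same baseline SD makes the four values directly comparable across time points.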
ROC curve analysis was used to establish the minimal clinically important difference (MCID) for the Chinese IKDC and each Chinese KOOS subscale. A GROC score of ≥ 4 (moderately better) was chosen as the cutoff for distinguishing patients who perceived themselves as having improved significantly from those who did not. The optimal cutoff point, the change score associated with the least misclassification, was determined with the Youden index and taken as the MCID [48]. For each change score of the Chinese IKDC and the Chinese KOOS subscales, sensitivity and specificity were calculated and used to plot the ROC curve, with sensitivity on the y axis and the false-positive rate (1 − specificity) on the x axis. The area under the curve (AUC) is the probability that a randomly selected improved patient has a larger change score than a randomly selected non-improved patient; it therefore reflects how well a measure discriminates between patients who did and did not improve meaningfully. An AUC of more than 0.70 was considered acceptable.
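The anchor-based MCID procedure can be sketched as below, under stated assumptions: patients with GROC ≥ +4 are treated as improved, each observed change score is tried as a candidate cutoff (a change at or above the cutoff counts as a positive classification), the cutoff with the largest Youden index J = sensitivity + specificity − 1 is kept, and the AUC is computed as the probability that an improved patient's change exceeds a non-improved patient's, counting ties as one half. Statistical packages may instead place cutoffs midway between adjacent observed values, so results can differ slightly. The function names are hypothetical.

```python
def improved_by_anchor(groc, cutoff=4):
    """External criterion: GROC >= +4 (moderately better or more)."""
    return [g >= cutoff for g in groc]


def youden_cutoff(changes, improved):
    """Return (cutoff, sensitivity, specificity) maximizing the Youden index."""
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both improved and not-improved patients")
    best = None
    for cut in sorted(set(changes)):
        tp = sum(1 for c, imp in zip(changes, improved) if imp and c >= cut)
        tn = sum(1 for c, imp in zip(changes, improved) if not imp and c < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:  # ties keep the smallest cutoff
            best = (j, cut, sens, spec)
    _, cut, sens, spec = best
    return cut, sens, spec


def roc_auc(changes, improved):
    """AUC as P(change of improved > change of not improved), ties count 0.5."""
    pos = [c for c, imp in zip(changes, improved) if imp]
    neg = [c for c, imp in zip(changes, improved) if not imp]
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else (0.5 if p == q else 0.0)
    return score / (len(pos) * len(neg))
```

The pairwise AUC computation is quadratic in sample size, which is unproblematic for a cohort of this size.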

Cross-cultural adaptation process
During the translation and adaptation stages, the pre-final versions of both the Chinese IKDC and KOOS were well accepted by the subjects in pre-testing. All subjects completed both questionnaires without missing items and demonstrated a clear understanding of the items. No major conceptual or cultural differences were found between the Chinese- and English-speaking populations. Therefore, the pre-final versions of the Chinese IKDC and KOOS were not modified further and were adopted as the final versions. To complete the final step of cross-cultural adaptation, the Chinese IKDC was submitted to the developer; it is now available for download from the website of the American Orthopaedic Society for Sports Medicine (AOSSM): https://www.sportsmed.org/AOSSMIMIS/members/downloads/research/IKDCChineseTraditional.pdf [49].

Interpretability
The mean, standard deviation (SD), median, mode, minimum, maximum, and skewness of the Chinese IKDC and the 5 KOOS subscales are shown in Table 2. The Chinese IKDC scores were normally distributed, and negligible numbers of patients showed floor or ceiling effects. The Chinese KOOS scores were also normally distributed. The percentages of subjects with the minimum possible score on the Sport/Rec and QOL subscales were 8.7% and 1.7%, respectively, and the percentages with the maximum possible score on the symptoms, pain, ADL, Sport/Rec, and QOL subscales were 0.6%, 0.6%, 4.6%, 1.7%, and 0.6%, respectively. Because all of these percentages were below the 15% threshold, no floor or ceiling effects were considered present in the Chinese KOOS.
The test-retest reliability of the Chinese IKDC was excellent, with an ICC of 0.97 (P < 0.001). Good test-retest reliability was also found for the Chinese KOOS, with ICCs of 0.89 or higher (Table 3).
The SEM and MDC of the Chinese IKDC were 3.2 and 8.9, respectively, smaller than those of the 5 Chinese KOOS subscales (SEM range: 5.1–8.8; MDC range: 14.2–24.3) (Table 3).
The Chinese IKDC demonstrated high internal consistency, with a Cronbach alpha of 0.87. Moderate to high internal consistency was also found for the Chinese KOOS subscales, with values ranging from 0.76 to 0.97 (Table 3).
Table 4 shows the correlations among the Chinese IKDC, the 5 subscales of the Chinese KOOS, and the Chinese SF-36 scores.
Table 5 shows the mean baseline scores, mean scores after treatment, and ES for the Chinese IKDC and KOOS at the 2-, 4-, 8-, and 12-week follow-ups. Within-group comparisons showed that at the 8- and 12-week follow-ups, the changes in the Chinese IKDC and in all of the Chinese KOOS subscales were statistically significant. At 2 and 4 weeks, despite small effect sizes (0.37 and 0.46, respectively), the changes in Chinese IKDC scores from baseline were still significant, whereas the changes in the Chinese KOOS subscales were mostly nonsignificant.

Responsiveness analysis
The area under the ROC curve (AUC), the minimal clinically important difference (MCID), and the sensitivity and specificity at the MCID are displayed in Table 6. At 12 weeks after intervention, the AUC was significantly different from 0.5 for the Chinese IKDC and all of the Chinese KOOS subscales. The AUC of the Chinese IKDC (0.83) was larger than those of all the KOOS subscales (0.67–0.79). The AUCs of the Chinese KOOS subscales were acceptable except for that of the QOL subscale (0.67), which fell below the 0.70 threshold.

Discussion
In this study, the original versions of the IKDC and KOOS were translated and validated to facilitate the assessment of Chinese-speaking patients with a variety of knee injuries. To our knowledge, this is the first study to concurrently examine and compare the psychometric properties of the Chinese IKDC and KOOS. In patients with various knee injuries, both the Chinese IKDC and the Chinese KOOS demonstrated excellent reliability and good validity, consistent with the findings of other studies. Cross-culturally adapted instruments are valuable, especially when international comparisons need to be made. The SEM and MDC of the Chinese IKDC (SEM 3.2; MDC 8.9) were smaller than the American values reported by Greco et al. [43]. The internal consistency of the Chinese IKDC was high (Cronbach alpha, 0.87) and similar to that of the original IKDC (Cronbach alpha, 0.92) [33]. Satisfactory Cronbach alpha values were also found for the Chinese KOOS symptoms (0.76), QOL (0.77), pain (0.88), and Sport/Rec (0.91) subscales. These data are similar to those of the Swedish KOOS [53], in which the Cronbach alpha values were 0.74 for the symptoms subscale and 0.71 for the QOL subscale. The Chinese KOOS ADL subscale had the highest Cronbach alpha (0.97), similar to that of the Swedish KOOS ADL subscale (0.95) [53]. However, because the Chinese value exceeded 0.95, some items of the Chinese KOOS ADL subscale may be redundant.
Following the validation process of the original IKDC, we used the Chinese SF-36 to evaluate the construct validity of both the Chinese IKDC and KOOS. Strong correlations (Spearman's rho = 0.54–0.79) between the Chinese IKDC and the physical health dimensions of the Chinese SF-36 confirm the convergent validity of the Chinese IKDC, and weak correlations (Spearman's rho = 0.21–0.34) between the Chinese IKDC and the mental health dimensions of the SF-36 support its divergent validity. The strong correlations of the Chinese IKDC with the PF and BP domains of the Chinese SF-36 are comparable to those reported for the original IKDC and other translated versions [7,10,12,13,17,33].
Our results showed that the correlations between the role physical subscale of the Chinese SF-36
The responsiveness analysis showed that both the Chinese IKDC and KOOS were able to detect change over time. The effect sizes of the Chinese IKDC and KOOS subscales increased over time, as expected, because all participants were receiving physical therapy.
To our knowledge, no other study has examined the responsiveness of the IKDC and KOOS at 2, 4, and 8 weeks after conservative treatment. These intervals were chosen because they correspond to common durations of physical therapy for knee disorders. Improvement was expected to accumulate from the start of treatment, and a reliable, valid, and responsive outcome measure should detect it. Therefore, responsiveness information for intervals shorter than 12 weeks is still of value for clinical application and research.
In this study, we established the AUC and MCID of the Chinese IKDC and KOOS. The MCID of the Chinese IKDC at the 12-week follow-up (9.8) was smaller than those reported by Irrgang et al. [54] (MCID = 11.5 and 20.5; 207 patients with various knee injuries at the 19-month follow-up) but larger than that reported by Greco et al. [55]. Because MCIDs may vary with patient groups, clinical characteristics, and analytical approaches, these findings should be interpreted with caution.
The psychometric properties of the IKDC and KOOS have been compared in previous studies [30,50]. A cross-sectional cohort study of patients on the waiting list for meniscal surgery and patients between 6 weeks and 6 months after meniscal surgery showed more favorable reliability and validity for the Dutch IKDC than for the Dutch KOOS. Despite a tendency to use the KOOS as the outcome measure for meniscal injuries, that study suggested that the IKDC Subjective Knee Form is the most applicable instrument for patients with meniscal injuries [50]. van Meer et al. compared the Dutch IKDC and the Dutch KOOS in patients with recent ACL ruptures. All KOOS subscales and the IKDC had good reliability; however, the KOOS did not perform optimally on relevance of the questions, construct validity, responsiveness, and ceiling effects, whereas the IKDC satisfied the criteria for all properties in this specific group. They concluded that the Dutch IKDC is more useful than the Dutch KOOS for evaluating patients with ACL injuries [30].
This study has some limitations. First, because the participants had heterogeneous diagnoses, the psychometric properties of the Chinese IKDC and KOOS could not be compared among knee-injury subgroups. However, both instruments are site-specific patient-reported outcome measures intended to measure the same construct in patients with a variety of knee problems, and because the same patients completed both questionnaires at the same time, their measurement properties could still be compared. Further studies of specific knee-injury subgroups are needed before these findings can be generalized to them. Second, the 12-week follow-up for responsiveness was relatively short, so these results should not be used to infer medium-term (6-month) or long-term (12-month) outcomes. We recommend further studies comparing the Chinese IKDC and KOOS in various patient groups with longer follow-up.

Conclusion
The Chinese IKDC and KOOS were culturally adapted and validated in a group of Chinese-speaking patients with various knee injuries. Both demonstrated high levels of reliability and validity. However, the Chinese IKDC performed better than the Chinese KOOS on psychometric properties including ICC, SEM, MDC, and Cronbach alpha, and it was more sensitive to change at 2, 4, 8, and 12 weeks of treatment. The Chinese IKDC and most of the Chinese KOOS subscales showed good capacity to discriminate clinically meaningful changes at 12 weeks after treatment, and the MCIDs of both instruments were determined. The current study provides information for clinicians and researchers using these appraisal tools with Chinese-speaking patients with various knee disorders.