The Dutch version of the knee injury and osteoarthritis outcome score: A validation study

Background The Knee Injury and Osteoarthritis Outcome Score (KOOS) was developed in Sweden. This questionnaire has proved to be valid for several orthopedic interventions of the knee. It has been formally translated and validated in several languages, but not yet in Dutch. The purpose of the present study was to evaluate the clinimetric properties of the Dutch version of the KOOS questionnaire in knee patients with various stages of osteoarthritis (OA).
Methods The Swedish version of the KOOS questionnaire was first translated into Dutch according to a standardized procedure and then tested for clinimetric quality. The study population consisted of patients with different stages of OA (mild, moderate and severe), patients after primary total knee arthroplasty (TKA), and patients after revision of a TKA. All patients completed the Dutch KOOS questionnaire, the SF-36 and a Visual Analogue Scale for pain. The following analyses were performed to evaluate the clinimetric quality of the KOOS: Cronbach's alpha (internal consistency), principal component analysis (factor analysis), intraclass correlation coefficients (reliability), Spearman's correlation coefficients (construct validity), and floor and ceiling effects.
Results For all patient groups, Cronbach's alpha was above 0.70 for all subscales except Symptoms in the severe OA group. The ICCs, assessed for the patient groups with mild and moderate OA and after revision of the TKA, were 0.70 or higher for all subscales except Sport/Recreation in the revision TKA group. Of the predefined hypotheses, 60% or more could be confirmed for the patients with mild and moderate OA and for the TKA patients. For the other patient groups, fewer than 45% could be confirmed. Ceiling effects were present in the mild OA group for the subscales Pain, Symptoms and ADL, and in the severe OA group for the subscale Sport/Recreation. Floor effects were found for the subscales Sport/Recreation and QoL in the severe OA and revision TKA groups.
Conclusion Based on these clinimetric properties, we conclude that the KOOS questionnaire seems to be suitable for patients with mild and moderate OA and for patients with a primary TKA. The Dutch version of the KOOS had a lower construct validity for patients with severe OA on a waiting list for TKA and for patients after revision of a TKA. Further validation studies of the Dutch version of the KOOS should also include a knee-specific questionnaire for assessing construct validity.


Introduction
There is consensus that patient-reported outcomes add value to clinical variables in evaluating patients' health. The underlying principle is that functional status and quality of life can be better described by the patients themselves than by a physician [1]. With regard to knee surgery, however, at the start of the present study almost no reliable and validated Dutch versions of disease-specific questionnaires were available to evaluate patients' functional status and quality of life after surgery. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) is recommended for the assessment of treatment effects in patients with osteoarthritis (OA); it was developed for elderly patients with OA and assesses pain, stiffness and function in daily living [2][3][4].
Traumatic knee injuries often damage structures such as ligaments, menisci and cartilage, and may lead to early development of OA. To follow patients after a trauma and to monitor changes in functional status and quality of life over time, a questionnaire is needed that covers both the short- and long-term consequences of a knee injury [5]. In other words, there is a clear need for an instrument that monitors not only the outcome in elderly patients with knee OA, but also the consequences of acute knee injury in physically active young adults. Roos et al. therefore developed such a questionnaire in Sweden [6,7]. The Knee Injury and Osteoarthritis Outcome Score (KOOS) evaluates the functional status and quality of life of patients with any type of knee injury who are at increased risk of developing OA, i.e. patients with anterior cruciate ligament (ACL) injury, meniscus injury or chondral injury. To date, the KOOS questionnaire has been validated for several orthopedic interventions, such as ACL reconstruction [7], total knee arthroplasty (TKA) [8], and meniscectomy [9]. It has been formally translated and validated in several languages, but not yet in Dutch.
The purpose of this study was therefore to translate the KOOS questionnaire into Dutch and to evaluate the clinimetric properties of the Dutch version of the KOOS questionnaire, in terms of internal consistency, reliability, validity, and floor and ceiling effects.
We studied the Dutch version of the KOOS in patients with different stages of OA: mild, moderate and severe OA and in patients after a primary TKA and after revision of the TKA.

Methods
The study was divided into two stages. First, the Swedish version of the KOOS questionnaire was translated into Dutch according to a standardized procedure [10]. Second, the translated version was tested for clinimetric quality in a prospective study.

Procedure of translation
The translation procedure included three steps [11]. First, two persons (T1 and T2) independently translated the Swedish version of the KOOS questionnaire into Dutch (forward translation); one translator had a technical background and the other a medical background, and both were native Dutch speakers. At a consensus meeting, the two translations were merged into one version (T-12) [10].
Second, two bilingual persons (T3 and T4), one with a background in education and the other with a background in chemistry, both native Swedish speakers, independently translated this Dutch version (T-12) back into Swedish (backward translation). They were blind to the original Swedish version.
Finally, all translators met to agree on the final Dutch version of the KOOS questionnaire, which was used in the present study. This final version was presented to 15 patients with knee complaints, who were asked whether they understood all items and whether they had any problems with the wording of the items. None of the patients reported problems with the items of the KOOS questionnaire.

Patients
We used five patient groups with different stages of knee OA, based on clinical and radiographic signs, to evaluate the clinimetric properties of the Dutch version of the KOOS questionnaire. All patients were under medical treatment at the department of Orthopedics at the Erasmus Medical Center in Rotterdam between 1990 and 2005.
The first patient group consisted of patients with mild OA who had undergone ACL reconstruction between 1994 and 1996. The second patient group consisted of patients with moderate OA who had undergone high tibial osteotomy (HTO) between 1998 and 2000; all patients in this group had a valgus correction of 5 to 14 degrees. The third patient group consisted of patients with severe OA who were on the waiting list for a TKA. The fourth patient group consisted of patients 6 months after a TKA performed between 2004 and 2006. The fifth patient group consisted of patients who had undergone revision of a failed primary TKA between 2001 and 2006. Patients unable to understand written Dutch were excluded. The Medical Ethics Committee at the Erasmus Medical Center approved all studies. The choice of our study population, except for the TKA population, was based on existing retrospective cohort studies.
Between June 2004 and July 2006, all participants were asked to complete three questionnaires at home: the Dutch KOOS, the SF-36 [12], and a Visual Analogue Scale for pain [13]. They were asked to complete the Dutch KOOS at home again two to three weeks later. For test-retest studies, the time interval needs to be short enough to support the assumption that patients remain stable, and long enough to prevent recall [14]. We considered an interval of three weeks appropriate for these patient populations. The local Medical Ethics Committee approved the study and all participants gave written informed consent.

Questionnaires
KOOS
The KOOS questionnaire covers five dimensions that are reported separately: Pain (nine items), Symptoms (seven items), activities of daily living (ADL, 17 items), function in sport and recreation (Sport/recreation, five items), and knee-related quality of life (QoL, four items). Standardized answer options are provided and each item is rated on a scale from 0 to 4. A normalized score (100 indicating no symptoms and 0 indicating extreme symptoms) is then calculated for each subscale. The format is user-friendly and the questionnaire takes about 10 minutes to complete. The KOOS questionnaire is self-explanatory and can be administered in the waiting room or used as a mailed survey [7]. The KOOS questionnaire includes the WOMAC Osteoarthritis Index LK 3.0 [2,3] in its complete and original format (with permission), so WOMAC scores can also be calculated. The WOMAC is used worldwide in elderly subjects with knee or hip OA [2]. The Dutch version of the WOMAC has been validated for patients with hip OA [15].
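The normalization to a 0-100 scale can be illustrated with a short Python sketch. It assumes the transformation commonly given in the KOOS scoring instructions (100 minus the mean item score multiplied by 100/4, so that 100 means no symptoms); the names `KOOS_ITEM_COUNTS` and `koos_subscale_score` are hypothetical, and rules for handling missing items are deliberately left out.

```python
# Number of items per KOOS subscale, as listed in the text above.
KOOS_ITEM_COUNTS = {"Pain": 9, "Symptoms": 7, "ADL": 17, "Sport/Rec": 5, "QoL": 4}


def koos_subscale_score(subscale, item_scores):
    """Normalize the raw 0-4 item scores of one subscale to 0-100 (100 = no symptoms)."""
    expected = KOOS_ITEM_COUNTS[subscale]
    if len(item_scores) != expected:
        raise ValueError(f"{subscale} expects {expected} items, got {len(item_scores)}")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each KOOS item is rated from 0 to 4")
    mean_item = sum(item_scores) / len(item_scores)
    # Assumed transformation: mean 0 (no problems) -> 100, mean 4 (extreme problems) -> 0.
    return 100 - mean_item * 100 / 4
```

For example, four QoL answers of 1, 2, 1 and 0 give a mean item score of 1.0 and therefore a subscale score of 75.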

Short Form-36 (SF-36)
The SF-36 is a generic health status questionnaire that contains 36 items. It measures eight dimensions (bodily pain; physical function; social function; role limitations because of physical problems; role limitations because of emotional problems; mental health; vitality; general health perceptions), is widely used, has been shown to be reliable and valid in the Dutch general population, and is easy to complete [1,16].

Visual analogue scale for pain
The Visual Analogue Scale (VAS) for pain is a simple way of measuring the intensity of pain. The 100-mm VAS is a unidimensional scale that is versatile, easy to use, and has been adopted in many settings. It has been shown to be valid and reliable [13].

Statistical analysis
Internal consistency
A high degree of homogeneity is desirable in a scale. This has two implications: 1) the items should be at least moderately correlated with each other, and 2) each item should correlate above 0.20 with the total scale score [14]. These two requirements form the basis of the various tests of homogeneity or internal consistency of a scale. Internal consistency was determined by calculating Cronbach's alpha. The widely accepted cut-off is that Cronbach's alpha should be 0.70 or higher for a set of items to be considered a (sub)scale [14,17].
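Both criteria can be illustrated with a small Python sketch. It computes Cronbach's alpha as k/(k-1) × (1 - sum of item variances / variance of the total score) and, for each item, its correlation with the total of the remaining items. Using this corrected item-total correlation rather than the correlation with the full total score is an assumption of the sketch; complete data are also assumed, and all function names are hypothetical.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def _variance(xs):
    """Population variance; the n vs n-1 choice cancels out in the alpha ratio."""
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def _pearson(x, y):
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = [list(col) for col in zip(*rows)]
    sum_item_var = sum(_variance(col) for col in items)
    total_var = _variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum_item_var / total_var)


def corrected_item_total(rows):
    """Correlation of each item with the sum of the other items in the (sub)scale."""
    result = []
    for j in range(len(rows[0])):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]
        result.append(_pearson(item, rest))
    return result
```

A subscale would then meet the criteria above if `cronbach_alpha` returns at least 0.70 and every value from `corrected_item_total` exceeds 0.20.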

Factor analysis
Factor analysis is a technique designed to reveal whether the pattern of responses on a number of tests can be explained by a smaller number of underlying traits or factors, each reflecting a different construct [14]. Streiner et al. noted that an absolute minimum of five subjects per variable is necessary, with the proviso that there are at least 100 subjects. Because each subgroup contained fewer than 100 patients, exploratory factor analyses were conducted on all KOOS items in the combined study population, using principal component analysis (PCA) with varimax rotation. We first extracted factors with eigenvalues greater than 1. Next, we carried out forced five-, four-, three-, two- and one-factor solutions.
First, we identified the number of meaningful factors based on the Scree plot and on the interpretation of the factor solutions. Using the Scree plot, we looked for a break between the factors with relatively large eigenvalues and those with smaller eigenvalues. Factors that appeared before the break were assumed to be meaningful; factors on the approximately horizontal line after the break were considered to account for only a trivial amount of variance and were therefore not considered meaningful. Second, we examined the factor structure and factor loadings after varimax rotation. Items with a factor loading below 0.50 on all factors could be considered for exclusion. In other words, factor analysis was performed to determine whether the KOOS questionnaire actually consists of five subscales.
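The eigenvalue-based steps can be sketched without external libraries. The code below builds the item correlation matrix, obtains its eigenvalues with the cyclic Jacobi method (a standard algorithm for symmetric matrices, chosen here only to keep the example dependency-free), and reports how many components pass the eigenvalue > 1 criterion and what share of the total variance each explains. The sorted eigenvalues are what a Scree plot displays. The varimax rotation and the factor loadings used in the study are not reproduced, and all function names are hypothetical.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the item columns (rows = respondents)."""
    cols = list(zip(*rows))
    n, k = len(rows), len(cols)
    means = [sum(c) / n for c in cols]
    norms = [math.sqrt(sum((v - m) ** 2 for v in c)) for c, m in zip(cols, means)]
    r = [[1.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            cov = sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j]))
            r[i][j] = r[j][i] = cov / (norms[i] * norms[j])
    return r


def jacobi_eigenvalues(matrix, tol=1e-22, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the updated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_summary(eigenvalues):
    """Number of components with eigenvalue > 1 and the variance share of each component."""
    total = sum(eigenvalues)  # equals the number of items for a correlation matrix
    retained = sum(1 for ev in eigenvalues if ev > 1)
    return retained, [ev / total for ev in eigenvalues]
```

Because the eigenvalues of a correlation matrix sum to the number of items, each variance share is simply the eigenvalue divided by the item count.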

Reliability
Reliability is the degree to which the results of a measurement are consistent across repeated measurements [14]. To estimate the test-retest reliability of the Dutch KOOS subscales, we calculated intraclass correlation coefficients (ICCs) with 95% confidence intervals (95% CI). Due to practical problems, we assessed test-retest reliability only in the mild and moderate OA groups and the revision TKA group. We used the ICC for a two-way random effects model of the absolute agreement type [18]. An ICC of 0.70 or above is generally considered good [14]. The standard error of measurement (SEM) is a measure of the absolute measurement error of a score, expressed in the unit of measurement of the instrument [19]. The SEM was calculated as the square root of the sum of the between-administration variance and the residual variance [20].
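The ICC and SEM described above can both be derived from a two-way analysis of variance on the patients × administrations score matrix. The sketch below assumes complete test-retest data and uses the usual mean-square formula for the single-measure, two-way random effects, absolute agreement ICC (often labelled ICC(2,1) or ICC(A,1)); the between-administration variance is estimated as (MS_administrations - MS_error)/n and the residual variance as MS_error. The function name is hypothetical.

```python
import math


def icc_agreement_and_sem(scores):
    """scores: one row per patient, one column per administration (e.g. test, retest).

    Returns the single-measure, two-way random effects, absolute agreement ICC and
    SEM = sqrt(between-administration variance + residual variance).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(r) != k for r in scores):
        raise ValueError("need a complete matrix with at least 2 patients and 2 administrations")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(r) / k for r in scores]
    col_means = [sum(r[j] for r in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between administrations
    ss_total = sum((x - grand) ** 2 for r in scores for x in r)
    ss_error = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    icc = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    var_admin = (ms_cols - ms_error) / n
    # max() only guards against tiny negative values caused by rounding.
    sem = math.sqrt(max(0.0, var_admin + ms_error))
    return icc, sem
```

For example, `icc_agreement_and_sem([[1, 2], [3, 4], [5, 6]])` returns an ICC of 8/9 (about 0.89) and an SEM of √0.5 (about 0.71): every retest score is exactly one point higher, so the ranking of patients is preserved, but the agreement-type ICC still penalizes the systematic shift between administrations.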

Validity
Validity is the degree to which an instrument measures the construct it is intended to measure. Because of the absence of a gold standard the validity was expressed in terms of construct validity, which concerns the extent to which a particular measure relates to other measures consistent with theoretically derived hypotheses for the constructs that are being measured [21]. The construct validity of the KOOS questionnaire was determined by comparing its results with the generic SF-36 and the VAS for pain.
Hypotheses were formulated about the expected magnitude and direction of the relationships between the subscales of the KOOS questionnaire and the other instruments. The hypotheses were based on the premise that the subscales of the KOOS questionnaire are clearly distinct. We defined the construct validity of the KOOS questionnaire as good if ≥ 75% of the hypotheses could be confirmed [22], moderate if 50-75% could be confirmed, and low if fewer than 50% could be confirmed. To evaluate the construct validity of the Dutch KOOS version, Spearman's correlations were calculated.
We formulated four hypotheses about convergent relations between the KOOS questionnaire, SF-36 and VAS for pain. The correlations between KOOS Pain and SF-36 bodily pain (BP), between KOOS Pain and SF-36 physical function (PF), between all KOOS subscales and the VAS for pain, and between KOOS ADL and SF-36 PF were expected to be ≥ 0.60. We expected KOOS Pain to correlate more strongly with SF-36 BP than with SF-36 PF, by at least 0.05. We further expected KOOS Pain to correlate more strongly with the VAS for pain than the other KOOS subscales did, again by at least 0.05. Finally, KOOS ADL was expected to correlate at least 0.05 more strongly with SF-36 PF than with the other SF-36 subscales.
We formulated five hypotheses about divergent relations, between each subscale of the KOOS questionnaire and SF-36 general health (GH), with expected correlations of ≤ 0.30. All other correlations between the KOOS subscales and the SF-36 were expected to be higher than 0.30 and lower than 0.60.
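The hypothesis-testing procedure can be sketched in Python: Spearman's rho is computed as the Pearson correlation of average ranks, each hypothesis is checked against its threshold, and the share of confirmed hypotheses is mapped onto the good/moderate/low rating. Comparing absolute values of rho is an assumption of this sketch, made because the VAS scores pain in the opposite direction to the KOOS, so its correlations with the KOOS are expected to be negative. The small tolerance in `stronger_by` only guards against floating-point rounding, and all names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def threshold_hypothesis_holds(rho, kind):
    """Check one magnitude hypothesis on |rho| (sign ignored by assumption)."""
    r = abs(rho)
    if kind == "convergent":
        return r >= 0.60
    if kind == "divergent":
        return r <= 0.30
    if kind == "intermediate":
        return 0.30 < r < 0.60
    raise ValueError(f"unknown hypothesis type: {kind}")


def stronger_by(rho_a, rho_b, margin=0.05):
    """Check that |rho_a| exceeds |rho_b| by at least `margin`."""
    return abs(rho_a) - abs(rho_b) >= margin - 1e-12


def rate_construct_validity(n_confirmed, n_total):
    """Map the share of confirmed hypotheses to the good/moderate/low rating."""
    share = n_confirmed / n_total
    if share >= 0.75:
        return "good"
    if share >= 0.50:
        return "moderate"
    return "low"
```

For instance, confirming 6 of 10 hypotheses gives `rate_construct_validity(6, 10) == "moderate"`, and a KOOS Pain-VAS correlation of -0.79 satisfies the convergent threshold under the absolute-value assumption.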

Floor and ceiling effects
The presence of floor and ceiling effects may influence the reliability, validity and responsiveness of an instrument: for example, an improvement after an intervention cannot be detected in respondents who already have the maximum score. Floor and ceiling effects were considered present if more than 15% of the respondents achieved the lowest or highest possible score, respectively [22].
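A minimal sketch of the 15% criterion, assuming normalized KOOS subscale scores on the 0-100 scale; the function name and its defaults are hypothetical.

```python
def floor_ceiling_effects(scores, lowest=0.0, highest=100.0, threshold=0.15):
    """Share of respondents at the lowest/highest possible score and whether it exceeds 15%."""
    if not scores:
        raise ValueError("no scores")
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,  # strictly "more than 15%"
        "ceiling_effect": ceiling_share > threshold,
    }
```

Because the criterion is "more than 15%", a subscale on which exactly 3 of 20 respondents (15%) score 0 is not flagged as having a floor effect.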
Data were analysed with SPSS statistical software version 10.1. The level of significance for all statistical procedures was p ≤ 0.05.

Results
Table 1 presents the characteristics of the five patient groups. The first patient group consisted of 36 patients with mild OA (response rate 79%). All of these patients completed the questionnaires for the cross-sectional validity analysis, and 35 completed the KOOS questionnaire twice for the test-retest reliability analysis. The second patient group consisted of 62 patients with moderate OA (response rate 76%) who completed the questionnaires for the cross-sectional validity analysis; 53 of them completed the KOOS questionnaire twice for the test-retest reliability analysis. The third patient group consisted of 47 patients with severe OA (response rate 54%). The fourth group consisted of 63 TKA patients (response rate 77%), and the fifth group of 54 patients after revision of the TKA (response rate 75%). These patients completed all questionnaires for the cross-sectional validity analysis, and 47 patients of the revision TKA group completed the KOOS questionnaire twice for the test-retest reliability analysis.

Internal consistency
Table 2 presents the internal consistency expressed by Cronbach's alpha. For all patient groups, Cronbach's alpha was above 0.71 for all subscales, indicating good internal consistency, except for the subscale Symptoms in the severe OA group, for which a Cronbach's alpha of 0.56 was found, indicating moderate internal consistency.

Factor analysis
The Scree plot showed a distinct break before factor 3, suggesting that only the first two factors were meaningful enough to be retained, i.e. that two factors may be adequate to describe the data. This initial solution accounted for 64% of the total variance of the Dutch version of the KOOS questionnaire (eigenvalue of 21.5 for the first factor and 3.7 for the second factor). However, in the two-factor solution many items loaded on both factors. We therefore chose a forced one-factor solution, which accounted for 51.0% of the variance. Factor loadings ranged from 0.37 to 0.85; the loading of item S4 was lower than 0.40.

Reliability
Table 3 presents the ICCs of all subscales of the KOOS questionnaire for the patient groups with mild and moderate OA and after revision of the TKA. In these patient groups the ICCs were 0.70 or higher, indicating good reliability, except for the subscale Sport/recreation in the revision TKA group (ICC 0.45). The SEM ranged between 5.2 and 9.0 for the mild OA group, between 5.8 and 11.6 for the moderate OA group, and between 7.2 and 24.6 for patients after revision of the TKA.

Validity
Of the predefined hypotheses, 60% or more could be confirmed for the study groups with mild and moderate OA and for the TKA patient population. For the severe OA group and the revision TKA group, fewer than 45% could be confirmed. Tables 4, 5, 6, 7 and 8 show the correlations between the KOOS subscales, the SF-36 subscales and the VAS for pain. Overall, the KOOS subscales correlated most strongly with the SF-36 subscales bodily pain and physical function. Correlations between the KOOS subscale Pain and the VAS for pain ranged from r = -0.28 to -0.79.

Floor and ceiling effects
Neither floor effects (worst possible score) nor ceiling effects (best possible score) were found for the patients with moderate OA or the TKA patients (Table 9). Ceiling effects were present only in the mild OA group, for the subscales Pain, Symptoms and ADL, and in the severe OA group, for the subscale Sport/Recreation. Floor effects were found for the subscales Sport/Recreation and QoL in the severe OA and revision TKA groups.

Discussion
The results of this validation study of the Dutch KOOS questionnaire showed good internal consistency for all study groups. Reliability was also good in the mild and moderate OA groups and the revision TKA group; it was not assessed in the patients with severe OA or the patients with a primary TKA. The construct validity was moderate for the patient groups with mild and moderate OA and for TKA patients, and lower for the severe OA and revision TKA patients. Ceiling effects were present in the mild OA group and in the severe OA group. Floor effects were seen in the severe OA group and the revision TKA group.
In this validation study, Cronbach's alpha was above 0.70 for almost all subscales in our patient groups. This indicates good internal consistency, in line with the studies of Roos et al. [6,8]. However, for the subscale Symptoms in the severe OA population we found a Cronbach's alpha of 0.56, indicating moderate internal consistency. Deleting one or more items did not increase the internal consistency. Kessler et al. and Xie et al. also found a lower Cronbach's alpha (< 0.70) for this subscale in patients with knee OA [23,24]. [25][26][27][28]. In the present study, the factor loading of item S4 (can you straighten your leg fully) was lower than 0.40, which suggests that this item might be excluded from the questionnaire. Although our preliminary results indicate that the Dutch version of the KOOS questionnaire contains a single factor, we retained the original subscales of the Swedish version of the KOOS questionnaire in our analyses. Based on our findings, however, we recommend additional factor analyses in other data sets before the number of subscales of the Dutch version of the KOOS questionnaire is changed.
In the present study, the test-retest reliability was good for the patient groups with mild OA (ICC 0.74-0.88), moderate OA (ICC 0.87-0.94) and patients after a revision TKA (ICC 0.73-0.89). A lower ICC (0.45) was found for the subscale Sport/recreation in patients after a revision TKA. After removal of all outliers, the ICC was still lower than 0.70 (ICC 0.62). It is plausible that questions about sport and recreation are less relevant for these older patients.
The construct validity of the KOOS questionnaire was determined by comparing the KOOS subscales with the subscales of the SF-36 and the VAS for pain. Correlations between subscales that measure the same construct were compared. In our study we found the highest corre- By reporting only correlation coefficients, it is not clear whether the construct validity of a questionnaire is sufficient. Terwee et al. therefore developed quality criteria for the design, methods and outcomes of studies on the measurement properties of health status questionnaires [22]. These authors recommended assessing construct validity by testing predefined hypotheses (e.g., about expected correlations between measures or expected differences in scores between 'known' groups). Without specific hypotheses there is a risk of bias, because in retrospect it is tempting to generate alternative explanations for low correlations instead of concluding that the questionnaire may not be valid. Terwee et al. give a positive rating for construct validity if hypotheses are specified in advance and at least 75% of the results are in accordance with these hypotheses [22]. Our choice that convergent correlations should be ≥ 0.60 and divergent correlations ≤ 0.30 is arbitrary, but there is no consensus in the literature on how to set these thresholds. Of our predefined hypotheses, 60% or more could be confirmed in both the mild and moderate OA groups and in patients after a TKA (moderate construct validity). Fewer than 45% of our hypotheses could be confirmed for patients with severe OA and after a revision TKA (lower construct validity).
The hypotheses were based on the premise that the subscales of the KOOS questionnaire are clearly distinct. However, factor analysis showed that all items of the Dutch version of the KOOS questionnaire seem to load on one factor. This may explain the overlap between the correlations of the different KOOS constructs with the SF-36, which is most obvious for the KOOS subscales Pain and ADL in relation to the SF-36 subscales BP and PF. Previous studies showed that the WOMAC subscales pain and physical function loaded on the same factor [25][26][27]. Apparently it is difficult for patients to distinguish between questions about pain and questions about physical functioning in ADL. In our opinion, this may be ascribed to the wording of the questions: the term difficulty (in Dutch: 'moeite') may not be clear to some patients. The meaning of this term should be clarified or the items reworded. This was also suggested by Stratford et al. and Terwee et al. [29,30].
Because clinimetric properties are known to vary between study populations [14], it is recommended to validate a questionnaire in the target population. This study showed that the clinimetric properties of the Dutch version of the KOOS questionnaire differed between the five patient groups, which supports this recommendation. In addition, future validation studies may evaluate the validity of the Dutch KOOS questionnaire by comparing its subscales with the Dutch Oxford 12-item knee questionnaire. This latter questionnaire is considered valid and reliable in patients with knee OA [31]; however, it had not yet been validated when we started the present study.
We observed ceiling effects only in the mild OA patient group, for the subscales Pain, Symptoms and ADL of the KOOS questionnaire. These patients plausibly have few knee complaints and no or only minor clinical signs of OA, which may explain the ceiling effects in this group. Floor effects were only found in the subscale Sport/recreation in the patients with severe OA and in patients after revision of the TKA. Roos et al. stated that questions about sport and recreation are also relevant for older patients [8]. However, this does not seem to apply to patients after revision of the TKA. Because of the severity of their disease and/or their higher age, these patients plausibly do not participate in sport and recreational activities. Dividing the revision population into patients younger and older than 65 years resulted in floor effects of over 50% in the older patients. Questions about sport may therefore be more relevant to younger than to older patients, which is not surprising because the KOOS questionnaire was originally developed for younger patients.
Our study is not without limitations. First, the selection of patients in the present study only allows statements on the reliability and validity of the KOOS questionnaire in patients with different stages of OA and its treatment. The questionnaire was not studied in patients after a meniscectomy or an ACL reconstruction, so the results of the present study cannot be generalized to patients with an acute knee trauma.
Second, a measurement tool can also be used to monitor the efficacy of an intervention or the course of a patient's disease. For this purpose, the tool needs to be sensitive enough to detect clinically relevant changes over a certain period of time (responsiveness). ICCs are strongly influenced by the heterogeneity of the study population.
The interpretation of the SEM, i.e. whether it should be regarded as a large or a small measurement error, depends on which changes on the KOOS subscales are minimally important. The smallest detectable change (defined as 1.96 × √2 × SEM) has to be smaller than the minimal important change [20]. Future studies should determine which changes in KOOS subscale scores constitute a minimal important change. In addition, the responsiveness of the KOOS questionnaire needs to be evaluated in a future study.
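The smallest detectable change follows directly from the SEM. The sketch below simply encodes the 1.96 × √2 × SEM formula from the text; the function name is hypothetical.

```python
import math


def smallest_detectable_change(sem, z=1.96):
    """SDC = z * sqrt(2) * SEM; sqrt(2) because a change score combines the error of two measurements."""
    return z * math.sqrt(2) * sem
```

As an illustration only, applying it to the SEM range reported for the mild OA group (5.2 to 9.0) gives smallest detectable changes of roughly 14.4 to 24.9 points on the 0-100 subscale scale.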