Health and Quality of Life Outcomes

Abstract

Background: Since the mid-eighties, responsiveness has been considered a property of health status questionnaires distinct from reliability and validity. The aim of the study was to assess the strength of the relationship between internal consistency reliability, referring to an instrument's sensitivity to differences in health status among subjects at one point in time, and responsiveness, referring to its sensitivity to health status changes over time.

Methods: We used three different datasets comprising the scores of patients on the Barthel, the SIP and the GO-QoL instruments at two points in time. The internal consistency was reduced stepwise by removing the item that contributed most to a scale's reliability. We calculated the responsiveness, expressed by the Standardized Response Mean (SRM), on each set of remaining items. The strength of the relationship between the internal consistency coefficients and SRMs thus obtained was quantified by Spearman rank correlation coefficients.

Results: Strong to perfect correlations (0.90–1.00) were found between internal consistency coefficients and SRMs for all instruments, indicating that the two can be used interchangeably.

Conclusion: The results contradict the conviction that responsiveness is a separate psychometric property. The internal consistency coefficient adequately reflects an instrument's potential sensitivity to changes over time.

Background
Responsiveness, a concept introduced in the mid-eighties by bio-medical researchers, is considered to be an essential measurement property of health status questionnaires, distinct from reliability and validity [1]. However, it can be questioned whether an instrument's sensitivity to health status changes over time, which refers to responsiveness, is different from an instrument's sensitivity to differences in health among subjects at one point in time, which refers to the psychometric concept of parallel forms reliability from the framework of classical test theory [2]. A number of theorists have argued that responsiveness is not a separate psychometric attribute of health status instruments, but merely some form of construct validity [3]. The aim of the study is to provide empirical evidence for this notion by investigating the relationship between instrument responsiveness and the traditional psychometric concept of parallel forms reliability.

Published: 03 February 2005; Received: 22 November 2004; Accepted: 03 February 2005
Health and Quality of Life Outcomes 2005, 3:8 doi:10.1186/1477-7525-3-8
This article is available from: http://www.hqlo.com/content/3/1/8
© 2005 Lindeboom et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Background
Health-related quality of life (HRQL) instruments should demonstrate adequate test-retest reliability and cross-sectional and longitudinal validity before investigators use them to assess outcomes in research studies. Whether responsiveness, the ability of an instrument to detect change in HRQL when change occurs, is a measurement property distinct from reliability and validity remains, however, controversial [1][2][3][4].
Lindeboom et al. purportedly tested the assumption that responsiveness is not a distinct measurement property, but is embodied in internal consistency reliability [5]. To investigate their hypothesis, the authors removed the item contributing most to internal consistency (as determined using Cronbach's alpha) in a step-wise fashion from the physical component of the Sickness Impact Profile, the Barthel activities of daily living scale and the psychosocial domain of the Graves' ophthalmopathy quality of life instrument, using data from three previous studies. Following each step-wise removal, they recalculated Cronbach's alpha and the standardised response means (SRM, change score divided by standard deviation of change score) of the remaining items. They then assessed the correlation of these new Cronbach's alphas with the new SRMs and observed strong associations (Spearman rank correlation coefficients between 0.90 and 1.00). They concluded that internal consistency reliability adequately reflects an instrument's responsiveness and that investigators can use the two entities interchangeably.
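The stepwise deconstruction described above can be sketched as follows. This is a minimal illustration of the procedure as we understand it, not Lindeboom's actual code; the data layout, helper names and example scores are our own assumptions.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a scale given as a list of per-item score
    lists (items[i][s] = score of subject s on item i)."""
    k = len(items)
    n = len(items[0])
    sum_item_vars = sum(statistics.pvariance(it) for it in items)
    totals = [sum(it[s] for it in items) for s in range(n)]
    return (k / (k - 1)) * (1 - sum_item_vars / statistics.pvariance(totals))

def srm(changes):
    """Standardised response mean: mean change / SD of change."""
    return statistics.mean(changes) / statistics.stdev(changes)

def rank(xs):
    """Average ranks (ties receive the mean of their rank positions)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(rank(xs), rank(ys))

def stepwise_alpha_vs_srm(t0, t1):
    """t0, t1: per-item score lists at baseline and follow-up (mutated).
    At each step, record alpha and the SRM of the remaining sum score,
    then remove the item contributing most to alpha (i.e. the item
    whose deletion lowers alpha the most)."""
    alphas, srms = [], []
    while len(t0) > 2:
        alphas.append(cronbach_alpha(t0))
        changes = [sum(it[s] for it in t1) - sum(it[s] for it in t0)
                   for s in range(len(t0[0]))]
        srms.append(srm(changes))
        # alpha-if-item-deleted; drop the item with the lowest value
        deleted = [cronbach_alpha(t0[:i] + t0[i + 1:]) for i in range(len(t0))]
        worst = deleted.index(min(deleted))
        t0.pop(worst)
        t1.pop(worst)
    return spearman(alphas, srms)

# illustrative two-time-point item scores (items x subjects), not real data
t0 = [[1, 2, 3, 4, 5], [2, 2, 3, 4, 4], [1, 1, 3, 5, 5], [2, 3, 3, 3, 4]]
t1 = [[2, 4, 4, 6, 6], [2, 3, 4, 4, 6], [3, 1, 4, 6, 5], [3, 4, 3, 5, 5]]
rho = stepwise_alpha_vs_srm(t0, t1)
```

Note that with a short deconstruction sequence the Spearman coefficient rests on very few (alpha, SRM) pairs, which is one reason such correlations can look deceptively strong.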
The first conceptual problem with the approach Lindeboom et al. chose is that they looked at the correlation of internal consistency reliability and responsiveness within single studies and instruments only. This approach does not take into account that responsiveness depends on the type of intervention while internal consistency reliability does not. Most HRQL measures may be very reliable, but internal consistency reliability has nothing to do with the therapy that produces the change. In contrast, if an intervention targets aspects of HRQL that are specifically covered by a disease-specific instrument, for example, responsiveness is likely to be high. If another intervention targets aspects other than those covered by the instrument, responsiveness will be lower. Thus the within-study approach does not take into account that responsiveness is not a fixed measurement property.
Another important issue to consider is the influence of other determinants of an instrument's responsiveness, such as the type of instrument (generic or disease-specific). There is ample evidence that responsiveness depends on the type of instrument [6][7][8][9]. Lindeboom's within-instrument approach does not take this issue into account.
Finally, if the within-instrument approach with step-wise deconstruction of domains is used, one would expect step-wise decreases of internal consistency reliability, responsiveness and other measurement properties such as cross-sectional validity, for the following reasons. Internal consistency reliability is reduced when the items contributing most to it are removed because the error term in the denominator increases. For the same reason, responsiveness deteriorates as the number of items decreases [10]. Thus a parallel decline of internal consistency reliability and responsiveness is likely even if there is no relationship between these two measurement properties. Indeed, using Lindeboom's approach one would expect high correlations between internal consistency reliability and other measurement properties such as cross-sectional validity, and could consequently conclude that they are all embodied in internal consistency reliability. The assessment of the relationship between internal consistency reliability and responsiveness should therefore include entire domains, as they were developed, validated and used in research.
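The mechanical parallel decline can be illustrated with classical test theory: by the Spearman-Brown formula, alpha falls as items are removed, and for a sum of k items whose changes are independent with mean δ and standard deviation σ, the SRM is k·δ / (√k·σ) = √k·δ/σ, which also falls. The values of r, δ and σ below are purely illustrative assumptions.

```python
def alpha_for_k(k, r=0.3):
    # Spearman-Brown prophecy: alpha of a k-item scale whose items
    # have average inter-item correlation r
    return k * r / (1 + (k - 1) * r)

def srm_for_k(k, delta=0.2, sigma=1.0):
    # SRM of a k-item sum score when each item changes independently
    # with mean delta and SD sigma:
    # mean change = k*delta, SD of change = sqrt(k)*sigma
    return k ** 0.5 * delta / sigma

ks = list(range(10, 1, -1))  # deconstruct a 10-item scale item by item
alphas = [alpha_for_k(k) for k in ks]
srms = [srm_for_k(k) for k in ks]
# both sequences decline in lockstep for purely mechanical reasons,
# even though the item parameters r, delta, sigma never interact
```

Under these assumptions the two quantities are perfectly rank-correlated by construction, which is exactly the artefact the within-instrument approach cannot distinguish from a genuine relationship.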
Having considered the methodological challenges and constraints above, we analysed the relationship between internal consistency reliability and responsiveness of entire domains across different instruments and studies using data from several of our previous studies.

Studies
A priori we defined the following eligibility criteria, both to ensure as unbiased a selection of datasets as possible and to ensure that it was theoretically possible to detect a correlation between internal consistency reliability and responsiveness if one existed:

1. Studies must have longitudinal follow-up with a baseline assessment and at least one follow-up assessment, completed by the CLARITY research group (McMaster University, Hamilton, Ontario, Canada) within the last five years.

2. Studies must have investigated an intervention of established effectiveness that induces changes in HRQL.

3. Studies must include ≥ 2 multi-item HRQL instruments that allow calculation of Cronbach's alpha, and instruments within a study must have different degrees of responsiveness (e.g. generic versus disease-specific) to ensure variability in responsiveness. We expected variability in Cronbach's alpha to be limited to values ≥ 0.60 because only those are generally accepted to represent sufficient internal consistency reliability [3].

Statistical analysis
We calculated Cronbach's alpha using baseline scores for each domain of each HRQL instrument or for the total instrument if domains did not exist. Similarly, for each domain or for a total score we calculated SRMs (change score divided by standard deviation of change score).
We calculated the correlation between Cronbach's alpha and the corresponding SRM using Pearson correlation coefficients across all studies and for each study separately. We then built linear regression models with the SRM as the dependent variable and Cronbach's alpha as the independent variable. Since the type of instrument (generic or disease-specific) affects the SRM [6][7][8][9], we introduced the type of instrument as a covariate into the regression models. For all regression models, we adjusted for possible clustering of data originating from the same group of patients (for example, patients from one study providing data for eight domains of the Short-Form Survey 36) by using the cluster function of STATA. We performed all statistical analyses with STATA for Windows version 8.2 (StataCorp, College Station, Texas, USA).
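The regression model described above can be sketched as follows. The data are invented for illustration, and the cluster-robust variance shown is the basic sandwich estimator without Stata's finite-sample correction, so it only approximates what the cluster option computes.

```python
import numpy as np

def ols_cluster(y, X, groups):
    """OLS with a cluster-robust (sandwich) variance estimate:
    V = (X'X)^-1 (sum_g X_g' e_g e_g' X_g) (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        Xg, eg = X[groups == g], resid[groups == g]
        s = Xg.T @ eg  # within-cluster score sum
        meat += np.outer(s, s)
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# illustrative data: SRM modelled from alpha and a generic/specific dummy,
# with domains clustered by the study they come from
alpha = np.array([0.70, 0.80, 0.90, 0.85, 0.75, 0.95])
specific = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # 1 = disease-specific
srm_vals = 0.2 + 0.0 * alpha + 0.4 * specific  # type, not alpha, drives SRM
X = np.column_stack([np.ones_like(alpha), alpha, specific])
study = np.array([1, 1, 2, 2, 3, 3])
beta, se = ols_cluster(srm_vals, X, study)
```

In this constructed example the coefficient on Cronbach's alpha is zero while the instrument-type dummy carries the whole effect, mirroring the pattern the regression analysis is designed to detect.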

Eligible Studies
The following four studies met the eligibility criteria:

Study 1 [11]
This prospective study measured HRQL in 85 patients with chronic obstructive pulmonary disease (COPD) before and after participation in Canadian inpatient respiratory rehabilitation programs similar to many inpatient programs worldwide [12]. All patients completed the interviewer-administered Chronic Respiratory Questionnaire (CRQ) including individualised and standardised dyspnea questions. In addition, patients completed the St. George's Respiratory Questionnaire (SGRQ) and the Short-Form Survey 36 (SF-36) [13] at the beginning and end of the rehabilitation program.

Study 2 [14]
This was a prospective randomised study of 177 patients with COPD before and after respiratory rehabilitation in Canada and the United States. We randomised patients to complete either the interviewer or self-administered CRQ [11,15]. All patients answered the individualised and standardised dyspnea questions of the CRQ. Patients also completed the SGRQ and the SF-36 at the beginning and end of the rehabilitation program.
Study 3 [16,17]
This prospective study enrolled 71 patients with COPD following a respiratory rehabilitation program at four sites in Switzerland, Germany and Austria. We also randomised patients to complete either the interviewer or self-administered CRQ, as in study 2 [11,15].

Figure 1. Relationship between internal consistency reliability and responsiveness, all studies. Relationship between Cronbach's alpha and standardised response mean for 79 domains or total scores of health-related quality of life instruments and symptom scales. The data come from four studies including 333 patients with chronic obstructive pulmonary disease following pulmonary rehabilitation and 183 patients with knee injury undergoing anterior cruciate ligament reconstruction or knee arthroscopy.

Study 4 [19]
This prospective study enrolled patients undergoing anterior cruciate ligament reconstruction (study 4a, n = 66) and knee arthroscopy (study 4b, n = 117) to determine their ability to recall pre-operative quality of life and functional status. Patients completed the disease-specific Anterior Cruciate Ligament Quality of Life questionnaire (ACL-QOL) [20] (study 4a) or the Western Ontario Meniscal Evaluation Tool (WOMET) [21] (study 4b), as well as the International Knee Documentation Committee (IKDC) Subjective Form [22], the Knee Injury and Osteoarthritis Outcome Score (KOOS) [23] and the SF-36, pre- and one year post-operatively.

Results
Tables 1 and 2 show the reliability coefficients and standardised response means for each study and instrument. The mean Cronbach's alpha across all studies was 0.83 (SD 0.08, range 0.61 to 0.97) and the mean standardised response mean was 0.59 (SD 0.33, range -0.08 to 1.45). Figure 1 shows the relationship between Cronbach's alpha and SRM across all studies. The correlation coefficient was 0.10 (95% CI -0.12 to 0.32). When we analysed each study separately, correlation coefficients ranged from -0.17 to 0.62 (Figure 2). Table 3 shows the regression equations to predict the SRM from Cronbach's alpha. In an analysis of all studies, internal consistency reliability as the sole independent variable did not predict responsiveness (p = 0.59, r² = 0.01). In contrast, an analysis that included the type of instrument showed that the generic versus specific categorisation predicted responsiveness (p = 0.01, r² = 0.37). Analysing the studies separately showed similar results (Figure 2). Only in study 4 was Cronbach's alpha a significant predictor in unadjusted analyses. Even in this case, when we introduced the type of instrument into the model, Cronbach's alpha was no longer a significant predictor.

Discussion
We assessed the relationship between internal consistency reliability and responsiveness and found no evidence to support the claim that investigators can use them interchangeably. In general, internal consistency reliability is a poor predictor of responsiveness. Consistent with previous findings [6], we showed that, in contrast to Cronbach's alpha, whether an instrument is a generic or a disease-specific HRQL instrument is a significant predictor of responsiveness.
Our findings contradict those presented by Lindeboom et al. We suspect that these differences are largely due to differences in conceptual and, thus, statistical approaches. In particular, Lindeboom's within instrument and within study approach fails to take into account that responsiveness depends on the type of instrument and on the intervention that produces change in HRQL. In our analyses, we evaluated the relationship between internal consistency reliability and responsiveness across instruments and studies.
One might argue that our failure to demonstrate a relationship between Cronbach's alpha and the SRM results from the limited variability in Cronbach's alpha across the instruments and their domains. Indeed, this limited variability in part explains the lack of relationship. Nevertheless, when choosing instruments for clinical trials, investigators will face Cronbach's alpha coefficients such as those shown in Tables 1 and 2. If they rely on these results to predict responsiveness, they will be misled. In particular, some domains with very high Cronbach's alpha coefficients (SF-36 bodily pain, 0.93; CRQ IA emotional function, 0.90) had low responsiveness (SRMs of 0.29 and 0.24, respectively).
Strengths of our study include the definition of a priori criteria to ensure an unbiased selection of studies with large variability in responsiveness, creating the greatest potential to detect a relationship if one existed. Furthermore, the inclusion of very different patient populations (chronic lung disease and knee pathology) and the consistency of results across these studies and populations enhance the generalizability of our study. Replication in other populations would further strengthen our conclusions.

Conclusion
Our study demonstrates that internal consistency reliability is a poor predictor of responsiveness and that both conceptual and statistical evidence exists to support the argument that they are distinct measurement properties of evaluative instruments.