Combining scores from different patient reported outcome measures in meta-analyses: when is it justified?

Puhan, Milo A; Soesilo, Irene; Guyatt, Gordon H; Schünemann, Holger J

doi:10.1186/1477-7525-4-94

Research
Open access
Published: 07 December 2006

Health and Quality of Life Outcomes volume 4, Article number: 94 (2006)

Abstract

Background

Combining outcomes and the use of standardized effect measures such as effect size and standardized response mean across instruments allows more comprehensive meta-analyses and should avoid selection bias. However, such analysis ideally requires that the instruments correlate strongly and that the underlying assumption of similar responsiveness is fulfilled. The aim of the study was to assess the correlation between two widely used health-related quality of life instruments for patients with chronic obstructive pulmonary disease and to compare the instruments' responsiveness on a study level.

Methods

We systematically identified all longitudinal studies that used both the Chronic Respiratory Questionnaire (CRQ) and the St. George's Respiratory Questionnaire (SGRQ) through electronic searches of MEDLINE, EMBASE, CENTRAL and PubMed. We assessed the correlation between CRQ (scale 1 – 7) and SGRQ (scale 0 – 100) change scores and compared responsiveness of the two instruments by comparing standardized response means (change scores divided by their standard deviation).

Results

We identified 15 studies with 23 patient groups. CRQ change scores ranged from -0.19 to 1.87 (median 0.35, IQR 0.14–0.68) and SGRQ change scores from -16.00 to 3.00 (median -3.00, IQR -4.73–0.25). The correlation between CRQ and SGRQ change scores was 0.88. Standardized response means of the CRQ (median 0.51, IQR 0.19–0.98) were significantly higher (p < 0.001) than those of the SGRQ (median 0.26, IQR -0.03–0.40).

Conclusion

Investigators should be cautious about pooling the results from different instruments in meta-analysis even if they appear to measure similar constructs. Despite high correlation in change scores, responsiveness of instruments may differ substantially and could lead to important between-study heterogeneity and biased meta-analyses.

Background

Systematic reviews and meta-analyses should include all available evidence to avoid selection bias and to increase the power of analyses of primary effects and effect modification by differences in patients and interventions. In meta-analysis of patient reported outcome (PRO) measures, effects are, however, often measured with different instruments. For example, meta-analyses of respiratory rehabilitation in chronic obstructive pulmonary disease (COPD) are typically either based on studies using the Chronic Respiratory Questionnaire (CRQ)[1] or the St George's Respiratory Questionnaire (SGRQ) [2–4].

Investigators often deal with this challenge by standardizing scores from different instruments and combining them as unit-free scores – effect sizes or standardized response means (SRMs) [2, 5, 6]. However, critics have noted that, because standard deviations (SD) may vary substantially from study to study, treatment effects that are homogeneous when expressed in their original unit can become heterogeneous when expressed as SRMs [7]. An alternative to standardisation is to directly transform PRO scores. For instance, investigators could transform CRQ into SGRQ scores or vice versa using transformation coefficients from regression analyses [8].
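The concern about standardisation can be illustrated with a minimal sketch. The numbers below are hypothetical and are not taken from any included study; the function name is an illustrative choice. The same raw change is assumed in three studies that differ only in the SD of change.

```python
# Hypothetical illustration: identical raw effects, different SDs of change.
raw_change = 0.5              # assumed mean change in original units, same in every study
change_sds = [0.5, 1.0, 2.0]  # hypothetical SDs of change in three studies


def standardized_response_mean(mean_change, sd_change):
    """SRM = mean change divided by the SD of change."""
    return mean_change / sd_change


srms = [standardized_response_mean(raw_change, sd) for sd in change_sds]
print(srms)  # [1.0, 0.5, 0.25]
```

Although the effect in original units is homogeneous by construction, the standardized values differ fourfold, which is the kind of induced heterogeneity the critics describe.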

For either method of combining scores, two important premises must ideally be met. First, the scores must correlate strongly, indicating that the instruments measure constructs that are similar enough to be combined. Second, the responsiveness – the instrument's ability to detect important changes even if those changes are small – should be similar. If instruments express different magnitudes of change for identical underlying effects, less responsive instruments will underestimate treatment effects, and meta-analyses will manifest heterogeneity that might falsely be attributed to variability in patients, interventions or effects of the interventions.

While investigators found moderate to strong correlations between the CRQ and SGRQ on an individual patient level [9–12] suggesting that they provide similar information, it is unknown whether a strong relationship between CRQ and SGRQ change scores exists on a study level. The objective of this study was to assess the correlation of CRQ and SGRQ change scores as well as their responsiveness on a study level and to evaluate the implications for combining scores in meta-analyses.

Methods

We conducted a systematic literature search to identify all longitudinal studies that used both the CRQ and SGRQ.

Search strategy

We began the literature search by identifying all studies that used the CRQ, searching with the keywords "chronic respiratory questionnaire", "chronic respiratory disease questionnaire", "CRQ" and "CRDQ" in MEDLINE (Ovid version, New York, New York, from inception to November 2004), EMBASE (DataStar version, Cary, North Carolina, from inception to November 2004) and the Cochrane Central Register of Controlled Trials (Oxford, United Kingdom, 2004, Issue 4). We also used the related articles feature in PubMed (National Library of Medicine, Bethesda, Maryland) for included articles to search for additional papers. In addition, we hand-searched the bibliographies of included primary studies and our own files.

Study selection criteria

Study design

Eligible studies included both randomized controlled trials and uncontrolled studies with a baseline measurement and at least one follow-up measurement of the CRQ and SGRQ.

Participants

We included studies if more than 90% of study participants had COPD defined by chronic airflow obstruction (FEV₁ less than 80% predicted) and little reversibility of airflow obstruction (reversibility of FEV₁ in % predicted in response to inhaled β-agonists below 20%).

Interventions

Any intervention, usual care, placebo or time (natural history).

Outcome measures

Studies had to include both the CRQ and SGRQ.

Study selection

Two members of the study team independently scrutinized the titles and abstracts of all identified citations (see Figure 1). We obtained the full text of any article that was deemed potentially eligible by one of the reviewers. The two reviewers then assessed the full text of all retrieved papers for eligibility, resolving disagreement by consensus.

Data extraction

Information extracted included details about patients, interventions, length of follow-up, study design, mean change scores (difference between follow-up scores and baseline) for the CRQ total and SGRQ total scores, and SD of CRQ total and SGRQ total baseline and change scores. If the SD was not available from reports, we used the median SD from studies that reported the SD. For studies in which a CRQ total score was not provided in the article, we calculated the mean of the four domain change scores. Because one cannot calculate SGRQ total scores on the basis of domain scores, we asked authors of articles reporting only SGRQ domain scores to provide total scores.
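The two data-handling rules described above (median imputation of a missing SD, and a CRQ total change taken as the mean of the four domain changes) can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and all numbers are hypothetical.

```python
from statistics import median


def impute_missing_sd(reported_sds):
    """Replace missing SDs (None) with the median of the SDs that were reported."""
    observed = [sd for sd in reported_sds if sd is not None]
    if not observed:
        raise ValueError("no reported SDs to take a median from")
    fill = median(observed)
    return [fill if sd is None else sd for sd in reported_sds]


def crq_total_from_domains(domain_changes):
    """Unweighted mean of the CRQ domain change scores."""
    return sum(domain_changes) / len(domain_changes)


# Hypothetical SDs of change from four studies, one of them unreported.
sds = impute_missing_sd([10.0, None, 14.0, 12.0])
# Hypothetical change scores for the four CRQ domains in one study.
crq_total = crq_total_from_domains([0.5, 0.25, 0.75, 0.5])
print(sds, crq_total)
```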

Quality assessment

We evaluated two aspects of study quality. First, we assessed whether the order of administration of the CRQ and SGRQ was randomised and second, whether investigators used validated versions of the CRQ and SGRQ. We considered questionnaires to be validated if the investigators referred to a reference for the validation process in the respective language.

Statistical analysis

We performed all analyses on a study level. We first calculated median CRQ and SGRQ scores together with their interquartile ranges (25th to 75th percentile). We then assessed the relationship between CRQ and SGRQ change scores using scatter plots and Spearman rank correlation coefficients. Our criterion for a strong correlation, 0.7 or more, exceeded that of previous studies with individual patient data (0.5) because the use of mean change scores is likely to increase correlation coefficients by lowering denominators of the correlation coefficients.
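A study-level Spearman rank correlation can be sketched in plain Python as the Pearson correlation of average ranks. The analysis itself was run in SPSS, so this is only an illustration; the data are hypothetical, and flipping the sign of the SGRQ changes so that improvement points the same way on both instruments is an assumption made here for readability.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical mean change scores for four patient groups.
crq_changes = [0.125, 0.5, 1.0, 1.75]
sgrq_changes = [1.0, -2.0, -5.0, -15.0]
sgrq_improvement = [-c for c in sgrq_changes]  # assumed sign flip: positive = better
rho = spearman(crq_changes, sgrq_improvement)
print(rho)
```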

To compare the responsiveness of the CRQ and SGRQ we calculated SRMs by dividing change scores by the SD of change scores and multiplying the resulting SGRQ SRM by -1 to adjust for the fact that negative scores indicate improvement on the SGRQ [13]. We then conducted a Wilcoxon signed-rank test. We performed all statistical analyses using SPSS for Windows version 12.0.1 (SPSS Inc, Chicago, Ill).
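The SRM calculation with the sign adjustment for the SGRQ, followed by the rank sums that underlie a Wilcoxon signed-rank test, can be sketched as below. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the data are invented, and only the signed-rank sums are computed, not a p-value.

```python
def srm(mean_change, sd_change, lower_is_better=False):
    """Standardized response mean; sign flipped when lower scores mean improvement."""
    value = mean_change / sd_change
    return -value if lower_is_better else value


def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples, dropping zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average rank for ties
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical (mean change, SD of change) pairs for three patient groups.
crq_srms = [srm(0.5, 1.0), srm(1.0, 1.0), srm(0.25, 1.0)]
sgrq_srms = [srm(-4.0, 16.0, True), srm(-6.0, 12.0, True), srm(1.0, 8.0, True)]
w_plus, w_minus = signed_rank_sums(crq_srms, sgrq_srms)
print(crq_srms, sgrq_srms, w_plus, w_minus)
```

In this invented example every CRQ SRM exceeds its paired SGRQ SRM, so all rank mass falls on the positive side.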

Results

The electronic database search yielded 538 citations of which 27 articles were potentially eligible (Figure 1); hand searching added another 4 articles. Full text review of these 31 articles demonstrated that 15 studies fulfilled our inclusion criteria. Most of the 16 excluded studies were review articles or included only the CRQ. Agreement on inclusion and exclusion was excellent (agreement in 94% of all decisions, chance-corrected kappa = 0.87).

Table 1 describes the 15 included studies that reported on 22 patient groups. In 11 groups, patients followed a respiratory rehabilitation program; 9 groups were cohorts without specific interventions or controls in randomized controlled trials receiving usual care, patient education or placebo; and in two groups, patients received inhaled bronchodilators. Sample size ranged from 21 to 183 and duration of follow-up from 4 to 52 weeks. In all studies, investigators used validated versions of the CRQ and SGRQ, and in four studies they randomized the order of administration of the CRQ and SGRQ. The results published by Bestall [14] and Wedzicha [15] were based on the same randomized controlled trial. We only included the last available data for each patient group, i.e. the 52-week follow-up data for patients with moderate to severe COPD and the 8-week follow-up data for patients with severe COPD. For one study, mean CRQ and SGRQ total change scores were not available [16].

Table 1 Characteristics of included studies

Full size table

Mean total scores at baseline ranged from 2.64 to 5.31 for the CRQ and from 40.3 to 69.6 for the SGRQ. The correlation coefficient for total scores at baseline was -0.86 (95% CI 0.62–1.00).

CRQ change scores ranged from -0.19 to 1.87 (median 0.35, IQR 0.14–0.68) and for the SGRQ from -16.10 to 3.00 (median -3.00, IQR -4.73–0.25). Figure 2 shows the strong correlation between CRQ and SGRQ change scores with a correlation coefficient of 0.88. One study [17] showed substantially larger effects than the others on both instruments and could have led to this strong correlation. However, a sensitivity analysis excluding the study by Man et al [17] showed that it had little influence on the correlation coefficient (r = 0.86).

Figure 3 shows the SRMs for the CRQ and SGRQ. SRMs ranged from -0.24 to 3.53 for the CRQ (median 0.51, IQR 0.19–0.98) and were significantly higher (p < 0.001) compared to standardised response means of the SGRQ (range from -0.29 to 2.71, median 0.26, IQR -0.03–0.40).

Discussion

We observed that high correlation between PRO measure change scores does not necessarily imply similar responsiveness. In our example of the CRQ and SGRQ we showed a strong correlation between total change scores on a study level, but the CRQ was substantially more responsive than the SGRQ. This finding indicates that these two measures provide very similar information and could justify the use of pooled estimates in meta-analyses on a conceptual, theoretical level. However, studies using the less responsive measure are likely to underestimate treatment effects and to introduce heterogeneity in study results.

Strengths of this study include the systematic review approach to identifying longitudinal studies using both the CRQ and SGRQ. A limitation of our approach is the lack of individual patient data to explore the association between CRQ and SGRQ change scores in greater detail.

Earlier studies indicated that the responsiveness of the CRQ is superior to that of the SGRQ when applied to the same patients [12, 13]. Our results extend these findings beyond the previous samples and demonstrate their generalizability. The phenomenon will not only lead to underestimates of effect in studies using the SGRQ, but could lead investigators who are unaware of different responsiveness to spuriously attribute variability to differences in patients, interventions, intervention effects or methodological quality.

The rehabilitation studies included in this systematic review indicate the extent of underestimation by the SGRQ. In the nine studies with patients following respiratory rehabilitation [8, 10, 12, 14, 15, 17–20], the median SGRQ change was 4.0 points or, expressed as an SRM, 0.31. The corresponding SRM based on the CRQ change scores was 0.62. If the SRM of 0.62 were expressed as SGRQ change scores, the corresponding change would be 8.0 points. The difference of 4 points between changes measured by the CRQ and SGRQ is substantial and equivalent to the minimal important difference of the SGRQ or to the shift from a minimal to a moderate difference [8].
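The arithmetic behind the 8.0-point figure can be reproduced under one assumption that is not stated in the text: that the conversion rescales the CRQ-based SRM by the SGRQ SD implied by the reported SGRQ change and SRM. The variable names are illustrative.

```python
# Values reported in the text.
sgrq_change = 4.0   # median SGRQ change in points (improvement)
sgrq_srm = 0.31     # the same change expressed as an SRM
crq_srm = 0.62      # SRM based on the CRQ change scores

# Assumption: back-calculate the SD of SGRQ change implied by the two SGRQ figures.
implied_sgrq_sd = sgrq_change / sgrq_srm          # about 12.9 points

# Express the CRQ-based SRM on the SGRQ scale.
crq_equivalent_sgrq_change = crq_srm * implied_sgrq_sd
underestimation = crq_equivalent_sgrq_change - sgrq_change

print(round(implied_sgrq_sd, 1), round(crq_equivalent_sgrq_change, 1), round(underestimation, 1))
```

Because 0.62 is twice 0.31, the equivalent change is twice the observed 4.0 points, which gives the 4-point gap discussed above.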

A conservative solution to the problem with meta-analysis that we raised is to restrict analyses to the most responsive available instrument. Investigators could develop alternatives allowing for combining trials with different instruments. Preferable alternatives to this conservative approach would include testing for an association between effect size and the outcome measure and, if there is no association within the individual meta-analysis, using all studies irrespective of outcome measure. Alternatively, investigators could introduce instrument as a variable in meta-regression models. Finally, if there is a strong linear relationship between instruments (as in this case), one could transform the scores of one instrument into those of another instrument. For example, SGRQ scores could be transformed into CRQ scores using the equation of a linear regression model where SGRQ was used to predict CRQ scores [8].
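The last option, transforming scores through a linear regression, can be sketched with ordinary least squares. The paired change scores below are hypothetical, and the fitted coefficients are not those of reference [8]; the function names are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept


def sgrq_to_crq(sgrq_change, slope, intercept):
    """Predict a CRQ change score from an SGRQ change score."""
    return intercept + slope * sgrq_change


# Hypothetical paired study-level change scores (negative SGRQ change = improvement).
sgrq_changes = [-2.0, -4.0, -6.0, -8.0]
crq_changes = [0.25, 0.5, 0.75, 1.0]

slope, intercept = fit_line(sgrq_changes, crq_changes)
predicted_crq = sgrq_to_crq(-4.0, slope, intercept)
print(slope, intercept, predicted_crq)
```

A study reporting only the SGRQ could then contribute a predicted CRQ change to the pooled analysis, with the caveat that the prediction carries the uncertainty of the fitted line.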

In theory, reasons for the superior responsiveness of the CRQ could include statistical reasons, differences in the aspects of HRQL measured by the CRQ and SGRQ, and the way these questionnaires are administered. A statistical reason why the CRQ is more responsive is the lower variability of CRQ scores leading to smaller noise terms. The domains of CRQ and SGRQ do not measure identical aspects of HRQL even though their total scores correlate highly. The domains of SGRQ focus on impairment from respiratory symptoms while the CRQ also addresses impairment from extra-pulmonary manifestations of COPD such as fatigue or depressive symptoms. It is therefore possible that the CRQ captures, by its broader approach, improvements of pulmonary and extra-pulmonary manifestations better than the SGRQ. However, the corresponding domains on the two questionnaires (e.g. symptoms and impact on the SGRQ compared with dyspnea and physical functioning) generally show greater responsiveness for the CRQ, indicating that this may not be the explanation. Finally, the administration format may also influence responsiveness. In two randomised trials where we compared the interviewer- and self-administered CRQ [10, 11] we found that the self-administered CRQ tends to be more responsive than the interviewer-administered CRQ. This was mainly due to lower baseline scores with the self-administered format. Patients may be more willing to express the severity of impairment in the absence of an interviewer. Thus self-administration might enhance responsiveness compared with interviewer-administration. The SGRQ is a self-administered questionnaire and the CRQ required, until recently and as was the case in the studies of this systematic review, an interviewer. If self-administration is associated with greater responsiveness, the analyses presented in this article may even underestimate differences in responsiveness between the CRQ and the SGRQ.

Conclusion

A strong relationship between two different instruments does not by itself justify combining them in meta-analysis. Responsiveness should also be similar; otherwise pooled estimates may become biased and substantial heterogeneity can arise. At present, investigators should remain cautious about combining results from trials that use different instruments without careful exploration of possible heterogeneity of the results.

Author disclosure

The CRQ is copyrighted by McMaster University, Hamilton, Canada; Principal Authors Dr. Gordon Guyatt and Dr. Holger Schünemann. Use of the CRQ requires permission by McMaster University and the authors.

References

1. Lacasse Y, Goldstein R, Lasserson TJ, Martin S: Pulmonary rehabilitation for chronic obstructive pulmonary disease. Cochrane Database Syst Rev 2006, CD003793.
2. Appleton S, Poole P, Smith B, Veale A, Bara A: Long-acting beta2-agonists for chronic obstructive pulmonary disease patients with poorly reversible airflow limitation. Cochrane Database Syst Rev 2002, CD001104.
3. Nannini L, Cates CJ, Lasserson TJ, Poole P: Combined corticosteroid and longacting beta-agonist in one inhaler for chronic obstructive pulmonary disease. Cochrane Database Syst Rev 2004, CD003794.
4. Sin DD, McAlister FA, Man SF, Anthonisen NR: Contemporary management of chronic obstructive pulmonary disease: scientific review. JAMA 2003, 290(17):2301–2312. 10.1001/jama.290.17.2301
5. Jones A, Fay JK, Burr M, Stone M, Hood K, Roberts G: Inhaled corticosteroid effects on bone metabolism in asthma and mild chronic obstructive pulmonary disease. Cochrane Database Syst Rev 2002, CD003537.
6. Saint S, Bent S, Vittinghoff E, Grady D: Antibiotics in chronic obstructive pulmonary disease exacerbations. A meta-analysis. JAMA 1995, 273(12):957–960. 10.1001/jama.273.12.957
7. Cummings P: Meta-analysis based on standardized effects is unreliable. Arch Pediatr Adolesc Med 2004, 158(6):595–597. 10.1001/archpedi.158.6.595
8. Schunemann HJ, Griffith L, Jaeschke R, Goldstein R, Stubbing D, Guyatt GH: Evaluation of the minimal important difference for the feeling thermometer and the St. George's Respiratory Questionnaire in patients with chronic airflow obstruction. J Clin Epidemiol 2003, 56(12):1170–1176. 10.1016/S0895-4356(03)00115-X
9. Rutten-van Molken M, Roos B, Van Noord JA: An empirical comparison of the St George's Respiratory Questionnaire (SGRQ) and the Chronic Respiratory Disease Questionnaire (CRQ) in a clinical trial setting. Thorax 1999, 54(11):995–1003.
10. Schunemann JS, Griffith L, Jaeschke R, Goldstein R, Stubbing D, Guyatt GH: A Randomized Trial to Evaluate the Self-Administered Standardized CRQ. Eur Respir J 2005, 25(1):31–40. 10.1183/09031936.04.00029704
11. Puhan MA, Behnke M, Laschke M, Lichtenschopf A, Brandli O, Guyatt GH, Schunemann HJ: Self-administration and standardisation of the chronic respiratory questionnaire: a randomised trial in three German-speaking countries. Respir Med 2004, 98(4):342–350. 10.1016/j.rmed.2003.10.013
12. Singh SJ, Sodergren SC, Hyland ME, Williams J, Morgan MD: A comparison of three disease-specific and two generic health-status measures to evaluate the outcome of pulmonary rehabilitation in COPD. Respir Med 2001, 95(1):71–77. 10.1053/rmed.2000.0976
13. Puhan MA, Guyatt GH, Goldstein R, Mador J, McKim D, Stahl E, Griffith L, Schunemann HJ: Relative responsiveness of the Chronic Respiratory Questionnaire, St. Georges Respiratory Questionnaire and four other health-related quality of life instruments for patients with chronic lung disease. Respir Med 2006.
14. Bestall JC, Paul EA, Garrod R, Garnham R, Jones PW, Wedzicha JA: Longitudinal trends in exercise capacity and health status after pulmonary rehabilitation in patients with COPD. Respir Med 2003, 97(2):173–180. 10.1053/rmed.2003.1397
15. Wedzicha JA, Bestall JC, Garrod R, Garnham R, Paul EA, Jones PW: Randomized controlled trial of pulmonary rehabilitation in severe chronic obstructive pulmonary disease patients, stratified with the MRC dyspnoea scale. Eur Respir J 1998, 12(2):363–369.
16. Connor MC, O'Shea FD, O'Driscoll MF, Concannon D, McDonnell TJ: Efficacy of pulmonary rehabilitation in an Irish population. Ir Med J 2001, 94(2):46–48.
17. Man SF, McAlister FA, Anthonisen NR, Sin DD: Contemporary management of chronic obstructive pulmonary disease: clinical applications 1. JAMA 2003, 290(17):2313–2316. 10.1001/jama.290.17.2313
18. Barr JT, Schumacher GE, Freeman S, LeMoine M, Bakst AW, Jones PW: American translation, modification, and validation of the St. George's Respiratory Questionnaire. Clin Ther 2000, 22(9):1121–1145. 10.1016/S0149-2918(00)80089-2
19. de Torres JP, Pinto-Plata V, Ingenito E, Bagley P, Gray A, Berger R, Celli B: Power of Outcome Measurements to Detect Clinically Significant Changes in Pulmonary Rehabilitation of Patients With COPD(*). Chest 2002, 121(4):1092–1098. 10.1378/chest.121.4.1092
20. Griffiths TL, Burr ML, Campbell IA, Lewis-Jenkins V, Mullins J, Shiels K, Turner-Lawlor PJ, Payne N, Newcombe RG, Ionescu AA, Thomas J, Tunbridge J, Lonescu AA: Results at 1 year of outpatient multidisciplinary pulmonary rehabilitation: a randomised controlled trial. Lancet 2000, 355(9201):362–368. 10.1016/S0140-6736(99)07042-7
21. Bourbeau J, Maltais F, Rouleau M, Guimont C: French-Canadian version of the Chronic Respiratory and St George's Respiratory questionnaires: an assessment of their psychometric properties in patients with chronic obstructive pulmonary disease. Can Respir J 2004, 11(7):480–486.
22. Desikan R, Mason HL, Rupp MT, Skehan M: Health-related quality of life and healthcare resource utilization by COPD patients: a comparison of three instruments. Qual Life Res 2002, 11(8):739–751. 10.1023/A:1020836719321
23. Hajiro T, Nishimura K, Jones PW, Tsukino M, Ikeda A, Koyama H, Izumi T: A novel, short, and simple questionnaire to measure health-related quality of life in patients with chronic obstructive pulmonary disease. Am J Respir Crit Care Med 1999, 159(6):1874–1878.
24. Harper R, Brazier JE, Waterhouse JC, Walters SJ, Jones NM, Howard P: Comparison of outcome measures for patients with chronic obstructive pulmonary disease (COPD) in an outpatient setting. Thorax 1997, 52(10):879–887.

Author information

Authors and Affiliations

Horten Centre, University of Zurich, Switzerland
Milo A Puhan
Department of Medicine, State University of New York at Buffalo, New York, USA
Irene Soesilo
Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, Canada
Gordon H Guyatt & Holger J Schünemann
Clinical Research Development and Information Translation (INFORMA) Unit, Department of Epidemiology, Italian National Cancer Institute Regina Elena, Rome, Italy
Holger J Schünemann

Corresponding author

Correspondence to Milo A Puhan.

Additional information

Competing interests

MAP and IS declare that they have no competing interests. HJS and GHG declare that they have no competing interests with the present analysis but they are the principal authors of the CRQ.

Authors' contributions

MP, HJS and GHG participated in the design of the study. MP and IS collected the data. MP, HJS and GHG carried out the statistical analysis and drafted the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Puhan, M.A., Soesilo, I., Guyatt, G.H. et al. Combining scores from different patient reported outcome measures in meta-analyses: when is it justified? Health Qual Life Outcomes 4, 94 (2006). https://doi.org/10.1186/1477-7525-4-94

Received: 17 October 2006
Accepted: 07 December 2006
Published: 07 December 2006
DOI: https://doi.org/10.1186/1477-7525-4-94
