The psychometric properties of the St George’s Respiratory Questionnaire (SGRQ) in patients with idiopathic pulmonary fibrosis: a literature review

Assessment of health-related quality of life (HRQL) is particularly important in patients with progressive and incurable diseases such as idiopathic pulmonary fibrosis (IPF). The St George’s Respiratory Questionnaire (SGRQ) has frequently been used to measure HRQL in patients with IPF, but it was developed for patients with obstructive lung diseases. The aim of this review was to examine published data on the psychometric performance of the SGRQ in patients with IPF. A comprehensive search was conducted to identify studies reporting data on the internal consistency, construct validity, test-retest reliability, and interpretability of the SGRQ in patients with IPF, published up to August 2013. In total, data from 30 papers were reviewed. Internal consistency was moderate for the SGRQ symptoms score and excellent for the SGRQ activity, impact and total scores. Validity of the SGRQ symptoms, activity, impact and total scores was supported by moderate to strong correlations with other patient-reported outcome measures and with a measure of exercise capacity. Most correlations were moderately strong between SGRQ activity or total scores and forced or static vital capacity, the most commonly used marker of IPF severity. There was evidence that changes in SGRQ domain and total scores could detect within-subject improvement in health status, and differentiate groups of patients whose health status had improved, declined or remained unchanged. Although the SGRQ was not developed specifically for use with patients with IPF, on balance, its psychometric properties are adequate and suggest that it may be a useful measure of HRQL in this patient population. However, several questions remain unaddressed, and further research is needed to confirm the SGRQ’s utility in IPF.

Electronic supplementary material
The online version of this article (doi:10.1186/s12955-014-0124-1) contains supplementary material, which is available to authorized users.


Introduction
Idiopathic pulmonary fibrosis (IPF) is a specific form of fibrosing interstitial pneumonia characterized by progressive worsening of dyspnea and lung function [1]. In the United States, the annual incidence of IPF has been estimated as 6.8-8.8 cases per 100,000 using narrow case definitions (requiring a definite pattern of Usual Interstitial Pneumonia [UIP] on high-resolution computed tomography [HRCT]), and as 16.3-17.4 cases per 100,000 using broad case definitions (including patients with a possible UIP-pattern on HRCT) [2]. Although IPF has a poor prognosis, with a median survival time from diagnosis of 2 to 3 years, the clinical course of IPF varies considerably [1,3]. Symptoms experienced by patients with IPF include non-productive cough, fatigue and chronic dyspnea, with the latter being the most prominent and disabling [4]. The morbidity associated with IPF has a broad and profound impact on patients' health-related quality of life (HRQL) [4,5].
As IPF is a progressive disease with no cure, HRQL and other patient-centered outcomes are important endpoints to evaluate in research and clinical practice [6]. Although no disease-specific measure of HRQL has been established as suitable for longitudinal research in patients with IPF, several HRQL instruments (and others, including symptom and generic quality of life questionnaires) have been used [7,8]. Which patient-centered instrument(s) (including HRQL questionnaires) to use in a particular study depends on a number of factors, including the design of the study, the intervention being assessed, the hypotheses being tested, and the characteristics of the comparator group (general population, patients with IPF of different severity, patients with another disease, etc.). In any situation, whether a generic HRQL instrument might perform as well as or better than a disease-specific HRQL instrument is uncertain.
In this review, we focused on the St George's Respiratory Questionnaire (SGRQ). Although originally developed for use in patients with chronic obstructive pulmonary disease (COPD) and asthma [8], it has frequently been used to evaluate HRQL in patients with IPF. The SGRQ is a 50-item questionnaire split into three domains: symptoms (assessing the frequency and severity of respiratory symptoms), activity (assessing the effects of breathlessness on mobility and physical activity), and impact (assessing the psychosocial impact of the disease) [9]. Scores are weighted such that every domain score and the total score range from 0 to 100, with higher scores indicating a poorer HRQL.
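The 0 to 100 scaling described above can be illustrated with a minimal sketch. It assumes that a score is the weighted sum of the responses a patient endorses, expressed as a percentage of the maximum possible weighted sum; the function name `sgrq_style_score` and the weights are hypothetical and are not taken from the SGRQ scoring manual.

```python
def sgrq_style_score(endorsed_weights, max_weights):
    """Return a 0-100 score: 100 * (sum of endorsed weights) / (maximum possible sum).

    endorsed_weights: weights of the responses the patient gave (hypothetical values).
    max_weights: the largest possible weight for every item in the domain.
    Higher scores indicate poorer HRQL, matching the direction described in the text.
    """
    max_total = sum(max_weights)
    if max_total <= 0:
        raise ValueError("maximum weights must sum to a positive value")
    observed = sum(endorsed_weights)
    if observed < 0 or observed > max_total:
        raise ValueError("endorsed weights fall outside the possible range")
    return 100.0 * observed / max_total


# Hypothetical three-item domain; the patient endorses the first two items.
example_score = sgrq_style_score([50.0, 25.0], [50.0, 25.0, 75.0])
```

Under this assumption, a patient who endorses nothing scores 0 and a patient who endorses every item at its maximum weight scores 100.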
The aim of this review was to assess the appropriateness of the SGRQ for measuring HRQL in patients with IPF by examining the evidence relating to the psychometric performance of the SGRQ in this population. A revised version of the SGRQ, the SGRQ-I, has been developed for use in patients with IPF [10]; however, studies assessing this tool are limited, and SGRQ-I data are not covered in this manuscript.

Search strategy and data extraction
A comprehensive literature review was conducted to identify articles that evaluated the psychometric properties of the SGRQ in patients with IPF. Following a PubMed search (see Additional file 1), articles were excluded if they were not published between 1 January 1991 (date of first publication of the SGRQ) and 31 August 2013, were not published in English, did not report data on the psychometric properties of the SGRQ in patients with IPF or duplicated clinical trial data reported in another article (Figure 1). Data extracted from the studies included study characteristics (country, duration, design, sample size), participant characteristics (age, gender, time since diagnosis, forced vital capacity [FVC]% predicted, diffusing capacity for carbon monoxide [DLCO]% predicted) and results of the psychometric tests.
Articles were selected that assessed any of the following psychometric properties of the SGRQ: internal consistency, convergent validity, known groups validity, test-retest reliability (reproducibility), responsiveness, minimal important difference (MID), and floor and ceiling effects [11]. Internal consistency refers to the degree to which the individual items within an instrument correlate with each other (i.e., tap the same underlying construct). This is determined using Cronbach's coefficient alpha, with ≥0.70 considered to indicate acceptable internal consistency for a multi-dimensional instrument. Convergent validity describes the degree to which two measures, hypothesized to measure the same construct, correlate. Known groups validity refers to the extent to which scores on an instrument distinguish groups that differ on a key variable, usually clinical in nature. For the described validity measures, correlations were regarded as weak if ≤0.30, moderate if 0.30-0.60, and strong if >0.60 [12]. Test-retest reliability assesses the ability of an instrument to produce consistent scores over repeated measurements in patients who are clinically stable. Responsiveness assesses the ability of an instrument to detect change in individuals who are hypothesized to have changed on the underlying construct (HRQL) and who are known to have experienced change in clinical status. MID estimates identify the smallest difference in the score on an instrument that patients perceive as important. Floor and ceiling effects are limitations that occur when an individual scores at the extremes of an instrument; if a patient's score is the lowest or highest possible value, the instrument is unable to detect a reduction or increase, respectively.
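The correlation bands used for the validity measures can be written as a small helper. This is a sketch only: the function name is hypothetical, and because the stated bands overlap at exactly 0.30, the code assumes that a magnitude of exactly 0.30 counts as weak (following the "≤0.30" wording).

```python
def classify_correlation(r):
    """Label a correlation coefficient by the magnitude bands stated in the text.

    Assumption: |r| == 0.30 is treated as weak, because the bands
    "weak if <=0.30" and "moderate if 0.30-0.60" overlap at that value.
    The sign of r is ignored; only the strength is classified.
    """
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude <= 0.30:
        return "weak"
    if magnitude <= 0.60:
        return "moderate"
    return "strong"


# Coefficients of the kind reported later in the review
labels = [classify_correlation(r) for r in (-0.72, 0.45, -0.26)]
```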

Results
A total of 30 papers were included in the review (Figure 1; Table 1).

Internal consistency
Data from a clinical trial of bosentan have been used to determine the internal consistency of the SGRQ in patients with IPF. Cronbach's alpha was 0.66 for the symptoms score and ≥0.84 for each of the SGRQ activity, impact and total scores [10,34].
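For readers unfamiliar with the statistic, Cronbach's alpha can be computed from item-level responses as in the sketch below. It uses the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the summed score); the function name and the example responses are hypothetical, and the sketch ignores the item weighting used in SGRQ scoring.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's coefficient alpha for a set of items.

    items: list of lists, one inner list of scores per item,
           aligned so that position i is the same respondent in every item.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Summed score per respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from four respondents to two identical items
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
acceptable = alpha >= 0.70  # threshold stated in the methods
```

Two identical items give an alpha of 1; items that vary independently of one another pull alpha toward 0.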

Convergent validity
Convergent validity was evaluated by extracting cross-sectional and longitudinal correlations between SGRQ scores and other patient-reported outcome measures (Table 2), an assessment of exercise capacity (Table 3), pulmonary function tests (PFTs) or partial pressure of arterial oxygen (Table 4), and assessments of fibrotic abnormalities on HRCT (Table 5).

Patient-reported outcome measures
In nine studies, investigators provided information on the correlation between SGRQ scores and other patient-reported outcome measures (Table 2). Moderate to strong correlations were observed between the SGRQ total score and the total scores on these instruments (Table 2). In general, moderate to strong correlations were observed between SGRQ domain scores and the total scores on these instruments. Likewise, moderate to strong correlations were observed between SGRQ domain or total scores and the total, physical complaints, extreme physical complaints, and functional ability sub-scale scores of the CQLQ (r = 0.34 to 0.81) [21], the total and sub-scale scores of the K-BILD (r = -0.59 to -0.89) [27], the SF-36 PCS, a composite score measuring overall physical health (r = -0.52 to -0.74) [10] and the Borg Dyspnea index (r = 0.35 to 0.56) [10,15]. For most measures and their sub-scales, correlations were weakest with the SGRQ symptoms score (when compared with other SGRQ domains or the total score).
In two studies, investigators evaluated correlations between SGRQ change scores and change scores from other patient-reported outcome measures (Table 2). In one study, correlations were moderately strong between change scores for the SGRQ activity, impact and total scores and change scores from the single-item dyspnea assessment (r = 0.59, 0.56 and 0.45, respectively) [28]. In the other study, investigators found that the correlation between the BDI change score and SGRQ total change score was -0.29 and not significant [25]. However, the BDI was designed to measure dyspnea severity at a single point in time and not to measure change in dyspnea severity [42].

Measures of exercise capacity
Correlation coefficients between SGRQ scores and a measure of exercise capacity are presented in Table 3. Distance covered during the 6-minute walk test (6MWD) is frequently used as a measure of exercise capacity in patients with IPF, and change in 6MWD has been shown to be a predictor of mortality in these patients [16]. In five cross-sectional studies in patients with IPF, investigators examined the relationship between the SGRQ total score and 6MWD. The strength of these correlations was moderate to strong in three (-0.45 to -0.72) [15,28,40] and weak in two (-0.26 and -0.28) [10,16] studies. In four cross-sectional studies, investigators examined the relationship between the SGRQ domain scores and the 6MWD [10,28,39,40]; the strength of these correlations was moderate to strong for the activity score in all four studies (r = -0.32 to -0.72), moderate to strong for the impact score (r = -0.41 to -0.63) and moderate for the symptoms score (r = -0.32 to -0.41) in three studies. In three studies, investigators examined the relationship between change in the SGRQ total score and change in 6MWD [16,25,28]; correlation coefficients ranged from -0.23 to -0.43.
Table 4 presents correlations between SGRQ scores and either PFTs or arterial blood gas analysis in patients with IPF. All correlations between the SGRQ total score and these variables were moderate to strong (r = -0.30 to -0.66, and p < 0.05 for all but one). There were moderate to strong correlations between the SGRQ activity score and the majority of pertinent PFT results (e.g., FVC or DLCO) or arterial blood gas analysis in all studies, while correlations between the SGRQ symptoms or impact domain scores and these variables were generally weak to moderate.
Results for FVC, the lung function parameter regarded as the most statistically useful physiological indicator of IPF severity, and the one most frequently used as a primary endpoint in contemporary clinical trials, were weakly to moderately correlated with SGRQ total and domain scores (r = -0.34 to -0.45 for the SGRQ total and -0.13 to -0.31 for the SGRQ domains).

HRCT
In one study of patients with IPF, investigators assessed correlations between SGRQ scores and the extent of fibrotic abnormalities on HRCT (degree of ground-glass opacity [CT-alv], interstitial opacity [CT-fib], and both [total score]) (Table 5). Correlations were moderately strong between the SGRQ symptoms, impact and total scores and CT-alv or total scores (r = 0.34 to 0.42) and moderately strong between the SGRQ activity score and both the CT-fib and total scores (r = 0.37 to 0.39) [28].

Known groups validity
Although there are no well-established categories of disease severity in IPF, it may be hypothesized that patients receiving supplemental oxygen represent patients with more severe disease. In two studies, investigators found that SGRQ total scores were worse in patients using supplemental oxygen versus those not using supplemental oxygen [15,38]. In one study by Chang and colleagues, the magnitude of difference between patients using versus not using oxygen was 4.7 (p < 0.05) [15].

Test-retest reliability (reproducibility)
No studies were found that reported data on the test-retest reliability of the SGRQ in patients with stable IPF.

Minimal important difference
A triangulation approach has been used to determine an MID estimate for SGRQ scores in patients with IPF [34]. Using both distribution- and anchor-based approaches (with FVC, DLCO and the TDI as anchors), the MID estimates for the SGRQ symptoms, activity, impact and total scores were 8, 5, 7 and 7 points, respectively.

Responsiveness
The responsiveness of the SGRQ domain and total scores has been assessed in one study [34]. Using data from a randomized placebo-controlled trial of bosentan, investigators assessed the ability of the SGRQ to discriminate among IPF patients who had experienced an improvement, decline, or no change in disease status over 6 months, as defined by three clinical anchors (change in FVC, DLCO, transition dyspnea index [TDI]). With the exception of the SGRQ symptoms score when DLCO was the anchor, changes in SGRQ domain and total scores differed significantly between patients who had declined, remained stable, or improved [34]. Change scores from the SGRQ total and its domains were reported for the DLCO and TDI response categories and ranged from +3 to +13, +1 to -5, and 0 to -12 for patients who declined, remained stable, or improved, respectively. The impact domain discriminated best between all categories of change for all three anchors [34].

SGRQ as an endpoint
In sixteen trials, investigators used the SGRQ domain and/or total scores as outcome variables. In four trials, investigators evaluated the within-subject change in SGRQ total score from baseline to end of treatment [22,23,32,37] (Table 6). In all four, improvements were observed in exercise endurance or FVC; among these, in three there was a significant decrease in SGRQ total score from baseline to end of treatment (8-24 weeks).
In the remaining 12 trials, investigators assessed whether the SGRQ domain and/or total scores differed between active and placebo groups (Table 7). In four of these [13,17,18,25], statistically significant between-group differences for the primary endpoint coincided with statistically significant between-group differences in at least one SGRQ total or domain score (range of between-groups difference in SGRQ total score: -6.1 to -13.4). Six studies [17,20,26,29-31] reported a lack of statistically significant treatment effect in the primary endpoint or SGRQ scores (range of between-groups difference in SGRQ total score reported in three studies: -0.5 to -3.0; scores were not reported in three studies). In three studies [19,33,41], the primary endpoint was not met, but the SGRQ total or domain scores were significantly different between treatment groups (range of between-groups difference in SGRQ total score: -3.3 to -6.1).
Four studies [20,31,33,41] reported changes from baseline in SGRQ total score in the placebo group. Adjusting for different trial durations, the SGRQ total score in the placebo arms of these trials deteriorated (increased) by a median of +4.9 (range: 3.2 to 10.6) per 52 weeks.
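The adjustment for trial duration can be sketched as follows. The review does not state how the adjustment was made, so this example assumes simple linear scaling to 52 weeks; the function name and the trial values are hypothetical and are not the data from the four cited studies.

```python
from statistics import median

WEEKS_PER_YEAR = 52


def annualized_change(change, duration_weeks):
    """Scale a change from baseline linearly to a 52-week period (assumed method)."""
    if duration_weeks <= 0:
        raise ValueError("trial duration must be positive")
    return change * WEEKS_PER_YEAR / duration_weeks


# Hypothetical placebo arms: (change in SGRQ total score, trial duration in weeks)
hypothetical_trials = [(2.0, 26), (5.0, 52), (9.0, 78)]
rates = [annualized_change(change, weeks) for change, weeks in hypothetical_trials]
median_rate = median(rates)
```

Linear scaling presumes a constant rate of deterioration over time, which may not hold in a disease with a variable clinical course.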

Floor and ceiling effects
No studies were found in which investigators reported data on floor and ceiling effects for the SGRQ in patients with IPF. However, in most studies, the minimum and maximum achievable SGRQ total scores (0 and 100, respectively) were outside an interval spanning twice the standard deviation around the reported means (Table 1). For the two studies in which investigators reported ranges for baseline SGRQ total scores, ranges did not include minimum or maximum possible values [24,38], thus confirming the absence of floor or ceiling effects in these studies.
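The indirect check described above can be expressed as a short helper. This sketch interprets "an interval spanning twice the standard deviation around the reported means" as the mean plus or minus 2 SD, which is an assumption; the function name and example values are hypothetical.

```python
def bounds_outside_two_sd(mean, sd, lowest=0.0, highest=100.0):
    """Return True if both scale limits lie outside mean +/- 2 SD.

    A True result suggests, but does not prove, that few patients
    scored at the floor (lowest) or ceiling (highest) of the scale.
    """
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    return lowest < mean - 2 * sd and highest > mean + 2 * sd


# Hypothetical baseline summaries (mean, SD) for the SGRQ total score
no_sign_of_limits = bounds_outside_two_sd(45.0, 18.0)   # interval 9 to 81
possible_floor = bounds_outside_two_sd(45.0, 25.0)      # interval -5 to 95
```

Because the check relies on summary statistics only, it cannot replace a count of patients who actually scored 0 or 100.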

Conclusions
Measurement standards and psychometric criteria have been proposed to assist with choosing an appropriate instrument to evaluate HRQL in patients with IPF [6,43]. As with any patient-reported outcome measure used in the study of any condition, an instrument must have face validity, internal consistency, test-retest reliability, longitudinal validity, and minimal floor and ceiling effects in the target patient population. The constellation of findings from studies identified in our search revealed that in patients with IPF, the internal consistency of the SGRQ activity and impact domains and the SGRQ total score was excellent, and the internal consistency of the symptoms domain was moderate, and in most studies, fell below the acceptable threshold of 0.7. The lower internal consistency of the symptoms domain is likely because it asks about a range of respiratory symptoms (cough, sputum, shortness of breath, wheezing and attacks of chest trouble), the majority of which apply to few patients with IPF whose major symptoms are shortness of breath and cough. In response data, off-target items create a weaker level of inter-relatedness among items in this domain, and thus lower internal consistency. This also contributes to the lower convergent validity of this domain, as the off-target items weaken the associations between its scores and clinical measures of IPF severity (e.g., patients may endorse wheezing or attacks of chest trouble, but these symptoms are unlikely related to a person's FVC). These off-target (for IPF) items in the symptoms domain detract from the SGRQ's face validity and would likely have been removed or modified in a tool specifically designed for use with patients with IPF. Overall, the symptoms domain may be well-suited for patients with COPD, but is not tailored to precisely assess symptoms in patients with IPF. The non-informative noise in the symptoms domain might also contribute to a less than optimal performance of the SGRQ total score. Overall, however, despite its weak face validity in IPF, the symptoms domain performs reasonably well in this population, and its potential to detract from the performance of the SGRQ total score is tempered because it contributes least to the SGRQ total score.
Table fragment and footnotes: VC% predicted −0.14, −0.54*, −0.61*, −0.56*. DLCO = diffusion capacity of the lung for carbon monoxide; FEV1 = forced expiratory volume in 1 second; FVC = forced vital capacity; PaO2 = partial pressure of oxygen dissolved in arterial blood; TLC = total lung capacity; TLCO = transfer factor of the lung for carbon monoxide; VC = vital capacity. *p < 0.05; †p < 0.01; ‡p < 0.001; §p < 0.0001; NS = non-significant. 1 Data reported refer to the original version of the SGRQ, not the SGRQ-I.
Convergent validity analyses seek to determine whether two measures, hypothesized to measure the same construct, do in fact correlate, and moderate, statistically significant correlations in the expected direction support convergent validity. Very strong or 'perfect' correlations suggest redundancy in measurement, so moderate correlations between a patient-reported outcome measure and another clinical variable support convergent validity of the patient-reported outcome measure while confirming that it contributes unique information not captured by the other clinical variable [5]. The SGRQ has been used as a secondary endpoint in several clinical trials conducted in patients with IPF. Among the select few in which the intervention outperformed placebo, SGRQ results were as one would anticipate, i.e., SGRQ scores improved in the group that benefited from the intervention. Although not a formal assessment of responsiveness, consistency between the changes in SGRQ scores and the changes in other endpoints supports responsiveness.
In sum, the limitations of the SGRQ in IPF should be noted, as it was not originally developed for use in patients with IPF. In particular, this applies to possible over-interpretation of results of individual domains. However, the cross-sectional correlations between SGRQ domain and total scores and other measures of patient-reported health status, exercise capacity or lung function, along with the ability of the SGRQ to distinguish patients who experience a change in clinical status or remain stable over time, support the SGRQ as a useful patient-reported outcome measure in IPF.
Our research has several limitations. We could identify only one study in which MID estimates for the SGRQ scores in IPF were determined [44]. This study used a triangulation approach and yielded MID estimates that were higher than those reported for COPD [45], but more research with additional datasets is needed to evaluate these estimates. In the meantime, the use of responder rates of patients experiencing a minimum change from baseline in SGRQ scores (or, perhaps more informatively, cumulative distribution plots) may be a useful assessment, as research suggests that it may be less dependent on the exact cutoff, i.e., the precise value of the MID [46].
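A responder rate and the data behind a cumulative distribution plot can be derived from per-patient change scores as sketched below. The function names and change scores are hypothetical, and the sketch assumes that a responder is a patient whose SGRQ score decreased (improved) by at least the MID.

```python
def responder_rate(changes, mid):
    """Proportion of patients whose score fell (improved) by at least `mid` points."""
    if not changes:
        raise ValueError("at least one change score is required")
    return sum(1 for change in changes if change <= -mid) / len(changes)


def cumulative_distribution(changes):
    """Return sorted (threshold, proportion with change <= threshold) pairs.

    Plotting these pairs shows the responder rate at every possible cutoff,
    so the result does not hinge on one chosen MID value.
    """
    if not changes:
        raise ValueError("at least one change score is required")
    n = len(changes)
    return [
        (threshold, sum(1 for change in changes if change <= threshold) / n)
        for threshold in sorted(set(changes))
    ]


# Hypothetical changes from baseline in SGRQ total score for eight patients
hypothetical_changes = [-12, -8, -7, -3, 0, 2, 5, 9]
rate = responder_rate(hypothetical_changes, mid=7)
curve = cumulative_distribution(hypothetical_changes)
```

Comparing the curves of two treatment arms shows whether one arm has more responders across the whole range of cutoffs rather than at a single threshold.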
No articles were identified that evaluated the test-retest reliability of the SGRQ in patients with stable IPF. Likewise, we could not locate a study in which floor and ceiling effects of SGRQ scores were reported, although an analysis of the reported baseline mean SGRQ total scores and their standard deviations suggested that there was no evidence for either. Furthermore, we did not assess the content validity of the SGRQ in patients with IPF, nor did we include analyses of articles published in languages other than English. Content validity and cultural adaption are important factors to consider for any patient-reported outcome measure, but these topics were beyond the scope of this evaluation of the SGRQ's psychometric properties. Therefore, it is evident that more research on the SGRQ is needed in this patient population.
The utility of a patient-reported outcome measure may be assessed only after a wealth of data becomes available. The assessment involves examining how the measure performs in the target population under several circumstances. The cache of available data has greatly advanced our understanding of HRQL in general, and the performance of the SGRQ in patients with IPF. For example, whilst the mean baseline SGRQ total score reported in IPF (around 45; interquartile range: 42-50) is similar to that reported in COPD trials [47,48], an analysis of the reported changes from baseline in the SGRQ total score in the placebo arms suggests that untreated patients with IPF deteriorate by +4.9 points over a period of 52 weeks. This contrasts with the experience in COPD, where patients on placebo show an improvement of 2-3 points per year [46], and reflects the progressive decline in health status seen in patients with IPF.
Finally, a major factor in this assessment revolves around how confidently response data from the measure can be used to make inferences about patients in the target population. For example, what can be said about a patient with IPF whose SGRQ score is 50? How does day-to-day functioning, or how a patient feels, change for an IPF patient whose SGRQ score increases by 10 over 6 months? Being able to answer these, and similar, questions confidently and accurately will further and more strongly support the validity of the SGRQ as an instrument capable of assessing domains of HRQL in this population. Until then, the balance of the data suggests that the SGRQ may be a suitable secondary endpoint for measuring HRQL in therapeutic trials of IPF.

Competing interests
JJS has served as a paid consultant for Boehringer Ingelheim and InterMune. KKB has served as a paid consultant for Boehringer Ingelheim. DE and CSC are full-time employees of Boehringer Ingelheim. If accepted for publication in Health and Quality of Life Outcomes, the article processing charge for this article would be covered by Boehringer Ingelheim.

Authors' contributions
JJS interpreted the data and prepared the manuscript. DE conducted the literature search, extracted and interpreted the data, and prepared the manuscript. CSC interpreted the data and prepared the manuscript. KKB interpreted the data and prepared the manuscript. All authors approved the final version of the manuscript.
Table footnotes: Test of statistical significance for the difference in mean change from baseline between groups. 3 Test of statistical significance for the difference in mean change from baseline between the nintedanib 150 mg bid and placebo groups. 4 Treatment continued for ≥12 months (data not available).