Validation of two generic patient-reported outcome measures in patients with type 2 diabetes

Background: Prior to using a generic patient-reported outcome measure (PRO), the measure should be validated within the target population. The purpose of the current study was to validate two generic measures in patients with type 2 diabetes. Methods: Patients with type 2 diabetes in Scotland and England completed two generic measures: the EQ-5D and the Psychological General Well-Being Index (PGWB). Two diabetes-specific measures were also administered: the Appraisal of Diabetes Scale (ADS) and the Diabetes Symptom Checklist-Revised (DSC-R). Analyses assessed reliability and validity. Results: There were 130 participants (53 Scotland; 77 England; 64% male; mean age = 55.7 years). Responses on the EQ-5D and PGWB reflected moderate impairment consistent with previous diabetes samples: mean EQ-5D index score, 0.75; EQ-5D VAS, 68.8; PGWB global score, 67.9. All scales of the PGWB demonstrated good internal consistency reliability (Cronbach's alpha = 0.77 to 0.97). The EQ-5D and PGWB demonstrated convergent validity through significant correlations with the ADS (r = 0.48 to 0.61), DSC-R scales (r = 0.33 to 0.81, except the ophthalmology subscale), and Body Mass Index (r = 0.15 to 0.38). The EQ-5D and PGWB discriminated between groups of patients known to differ in diabetes-related characteristics (e.g., history of hypoglycemia). Conclusion: Results support the use of the EQ-5D and PGWB among patients with type 2 diabetes, possibly in combination with condition-specific measures.


Background
Patient-reported outcome measures (PROs) that assess health-related quality of life (HRQL) and related constructs are often categorized as either generic or condition-specific [1][2][3]. Generic measures are designed for use among diverse populations with a broad range of medical conditions. These instruments can also be used to characterize healthy samples without a particular medical condition. In contrast, condition-specific measures are relevant to a particular group of patients, and they have been developed to assess specific populations, quantify aspects of functioning, and examine the impact of particular medical conditions or treatments.
A substantial body of literature has focused on comparing generic and condition-specific measures, while identifying advantages of each. Compared with generic measures, the primary advantage of condition-specific measures is that they are usually found to be more responsive to treatment-related change [4][5][6]. Because of their greater responsiveness to change, condition-specific instruments may be more likely to detect differences between treatment groups in clinical trials [7]. An advantage of generic PROs is that they can be used to compare among various populations, make comparisons to the general population, and estimate the relative impact of various medical conditions or treatments [1,2,8,9]. Generic measures also tend to correlate well with condition-specific measures [10,11], and in some studies, they have demonstrated responsiveness or convergent validity that was comparable to condition-specific measures [12][13][14][15][16]. Most importantly, generic measures are distinct from condition-specific measures in that they usually assess impact of disease and treatment on overall functioning or a broader range of health domains [8,17].
Because generic and condition-specific measures have different strengths and are conceptually distinct, it is often recommended to administer both types of instruments as part of a complete outcomes assessment in clinical trials [5,17,18,19]. Prior to using a generic measure, however, it is important that the instrument is shown to be reliable and valid in the specific population under investigation [9]. Instruments validated in one population will not necessarily perform well among patients with a different medical condition. Therefore, it is often necessary to validate generic PROs in multiple populations. The purpose of the current study was to examine the performance of two commonly used generic PROs in a sample of patients with type 2 diabetes, an increasingly prevalent disease associated with serious health risks and HRQL impairment [20][21][22][23][24][25][26][27].
One of the two measures evaluated in this study is the EuroQol EQ-5D, a brief health status instrument frequently used for clinical and economic appraisal [28,29]. The EQ-5D has been used in several large studies involving patients with type 2 diabetes to provide an estimate of HRQL and derive utilities, which are used to compute quality-adjusted life years (QALYs) in models evaluating treatment cost-effectiveness [30][31][32][33][34][35]. However, no studies were located focusing on validation of the EQ-5D among patients with type 2 diabetes. The other generic PRO examined in this study is the Psychological General Well-Being Index (PGWB), which was developed to measure self-perception of affective states and to assess a sense of subjective well-being or distress [36]. The PGWB has been used in several studies that included patients with type 2 diabetes [37,38], including one study that found adequate internal consistency reliability and convergent validity among a sample of 88 Native American patients with type 2 diabetes [39]. However, no studies were located validating the PGWB in a more general sample of patients with type 2 diabetes. The current analysis builds on these findings by evaluating these properties among a larger sample in the United Kingdom, while also assessing known-groups validity.

Participants
Participants were required to be (1) diagnosed with type 2 diabetes by a recognized medical professional (as indicated by patient self-report); (2) between 30 and 75 years old; (3) able to identify the age at which they were first diagnosed with diabetes; (4) able to read and understand English; and (5) willing and able to give informed consent prior to study entry. Patients were not eligible if they had cognitive impairment, hearing difficulty, or severe psychopathology that could interfere with the ability to complete the study measures. Participants were recruited through ten advertisements placed in four newspapers in Scotland and England from June to August of 2005. Interested patients responded by telephone, and 200 potential participants were screened to assess whether they met the study inclusion/exclusion criteria. Of the 200 screened patients, 132 were available, eligible, and willing to participate. Two participants who attended the interview were unable to complete the questionnaires relevant to the current analysis. Thus, the current study includes a sample of 130 individuals with type 2 diabetes.

Measures
EuroQol EQ-5D This questionnaire assesses functioning in five dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension is assessed by one item with three response options: no problems, some problems, and severe problems. Higher scores on these items indicate greater impairment. Responses to these five items are used to derive the weighted EQ-5D index score, which represents overall health with a possible range from -0.594 to 1.0 [40,41]. After completing the five dimension items, patients completed the single item EQ-5D visual analogue scale (VAS), on which they rated their current health on a scale ranging from 0 (worst imaginable health state) to 100 (best imaginable health state). Higher scores on the index score and VAS indicate better health status.
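The text does not list the weights used to derive the index score. As a minimal sketch, assuming the UK time trade-off value set for the three-level EQ-5D (chosen only because its range, -0.594 to 1.0, matches the range quoted above; whether it is the tariff behind [40,41] is an assumption), the index is 1.0 minus a constant for any departure from full health, minus one decrement per dimension at level 2 or 3, minus an extra term if any dimension is at level 3. The function name `eq5d_index` is hypothetical.

```python
# Assumed value set: UK time trade-off tariff for the three-level EQ-5D.
# For each dimension: (decrement at level 2, decrement at level 3).
DECREMENTS = {
    "mobility": (0.069, 0.314),
    "self_care": (0.104, 0.214),
    "usual_activities": (0.036, 0.094),
    "pain_discomfort": (0.123, 0.386),
    "anxiety_depression": (0.071, 0.236),
}
DIMENSIONS = tuple(DECREMENTS)  # fixed item order used by eq5d_index
CONSTANT = 0.081  # subtracted once if any dimension is above level 1
N3 = 0.269        # subtracted once if any dimension is at level 3


def eq5d_index(levels):
    """Hypothetical helper: levels are five responses coded 1, 2, or 3."""
    if len(levels) != 5 or any(lv not in (1, 2, 3) for lv in levels):
        raise ValueError("expected five responses coded 1, 2, or 3")
    if all(lv == 1 for lv in levels):
        return 1.0  # "no problems" on every dimension
    score = 1.0 - CONSTANT
    for dim, lv in zip(DIMENSIONS, levels):
        if lv > 1:
            score -= DECREMENTS[dim][lv - 2]
    if 3 in levels:
        score -= N3
    return round(score, 3)
```

With these weights the worst state (3 on all five dimensions) scores -0.594, the lower bound of the stated range, and a respondent reporting only some pain/discomfort scores 0.796.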

Psychological General Well-Being Index (PGWB)
The PGWB comprises 22 items in the following 6 dimensions: anxiety (5 items); depressed mood (3 items); positive well-being (4 items); self-control (3 items); general health (3 items); and vitality (4 items). Responses are rated on a six-point Likert scale ranging from 0 (reflecting the most distress) to 5 (reflecting the highest level of well-being) [42]. Scoring approaches for the PGWB have varied in previously published studies, with the global score often ranging from 0 to 100 (when items range from 0 to 5) or 22 to 132 (when items range from 1 to 6). For the current study, this instrument was scored following the approach proposed by Chassany et al. [42] in the recently published user's manual. The six PGWB raw subscale scores were computed by summing the item responses, and the raw global score is the sum of the six subscale scores. Then, these raw scores were transformed to a 0-100 scale by dividing each raw score by the maximum possible score, multiplying by 100, and rounding to one decimal place. Higher scores reflect better well-being.
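The summing and 0-100 transformation described above can be sketched in a few lines of Python. This is an illustration, not the manual's reference implementation: the names `score_pgwb` and `PGWB_ITEMS` are hypothetical, and the sketch assumes every item is already coded 0 to 5 with higher values meaning better well-being (i.e., any reverse-keyed items have already been recoded).

```python
# Items per dimension, as listed above (22 items in total).
PGWB_ITEMS = {
    "anxiety": 5,
    "depressed_mood": 3,
    "positive_well_being": 4,
    "self_control": 3,
    "general_health": 3,
    "vitality": 4,
}
ITEM_MAX = 5  # items scored 0 (most distress) to 5 (highest well-being)


def to_0_100(raw, max_raw):
    """Divide by the maximum possible score, multiply by 100, round to one decimal."""
    return round(raw / max_raw * 100, 1)


def score_pgwb(responses):
    """responses: dict mapping dimension name -> list of item scores (0-5)."""
    scores = {}
    raw_global = 0
    for dim, n_items in PGWB_ITEMS.items():
        items = responses[dim]
        if len(items) != n_items or any(not 0 <= x <= ITEM_MAX for x in items):
            raise ValueError(f"{dim}: expected {n_items} items scored 0-{ITEM_MAX}")
        raw = sum(items)
        raw_global += raw  # raw global score = sum of the raw subscale scores
        scores[dim] = to_0_100(raw, n_items * ITEM_MAX)
    scores["global"] = to_0_100(raw_global, sum(PGWB_ITEMS.values()) * ITEM_MAX)
    return scores
```

For example, a respondent answering 3 on every anxiety item and 5 on all other items would receive an anxiety score of 60.0 and a global score of 90.9 (raw 100 out of a possible 110).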

Appraisal of Diabetes Scale (ADS)
The ADS is a brief 7-item patient-reported scale that assesses the impact of diabetes on a patient's life. This condition-specific instrument was used to assess criterion validity of the EQ-5D and PGWB. The items, which were developed based on theory and previous research, include "How upsetting is having diabetes for you?" and "To what degree does your diabetes get in the way of your developing life goals?". The instrument has been shown to have acceptable internal consistency reliability, test-retest reliability, and construct validity [43]. Higher scores indicate greater negative impact of diabetes.

Diabetes Symptom Checklist-Revised (DSC-R)
The DSC-R is a revised version of the DSC-2, which was developed to measure both the frequency and perceived discomfort of physical and psychological symptoms associated with type 2 diabetes and its potential complications [44]. Like the ADS, the DSC-R was used to assess validity of the two generic instruments that were the focus of the current analysis. For each of the 34 DSC-R items, respondents first indicate whether they have experienced each symptom in the past month by circling "yes" or "no." If the patient answers "no," then the item is scored as a 0. If "yes" is selected, the participant proceeds to rate the perceived discomfort of the symptom on a 5-point scale ranging from 1 (not at all) to 5 (extremely). The instrument yields a total score and scores on the following subscales: psychology-fatigue, psychology-cognitive, neurology-pain, neurology-sensory, cardiology, ophthalmology, hypoglycemia, and hyperglycemia. The total score and all subscale scores range from 0 to 5, with higher scores indicating greater symptom burden.
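The two-step item response above maps to a single 0-5 item score. The text does not state how items are combined into scale scores; the sketch below assumes a scale score is the mean of its item scores, which is one rule consistent with the stated 0-5 range. The function names are hypothetical, and the assignment of items to subscales is not shown because it is not given here.

```python
def dscr_item_score(experienced, discomfort=None):
    """'No' -> 0; 'yes' -> the perceived discomfort rating (1-5)."""
    if not experienced:
        return 0
    if discomfort not in (1, 2, 3, 4, 5):
        raise ValueError("a 'yes' answer needs a discomfort rating from 1 to 5")
    return discomfort


def dscr_scale_score(item_scores):
    """Assumed scoring rule: mean of item scores, which stays within 0-5."""
    if not item_scores:
        raise ValueError("scale has no items")
    return sum(item_scores) / len(item_scores)
```

For three items answered "no", "yes" (discomfort 3), and "yes" (discomfort 5), the item scores are 0, 3, and 5, and the assumed scale score is about 2.67.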

Demographic and clinical information form
Patients completed a brief questionnaire that included questions on age, sex, ethnicity, living situation, employment, education, diabetes-related health, and general health. Participants' weight and height were measured at the beginning of each interview and recorded on the demographic and clinical information form so that patients' Body Mass Index (BMI) could be computed [45]. Items assessing diabetes-related health included questions asking whether participants had ever experienced hyperglycemia, daytime hypoglycemia, or nighttime hypoglycemia.
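BMI is weight in kilograms divided by the square of height in metres. The sketch below also flags obesity using the standard WHO adult cut-off of 30 kg/m²; the text refers to obesity but does not state the threshold used, so that cut-off is an assumption here, and the function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index in kg/m^2: weight divided by height squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(bmi_value, threshold=30.0):
    """Assumed WHO adult cut-off: BMI of 30 or more."""
    return bmi_value >= threshold
```

For example, a participant weighing 95 kg at 1.75 m has a BMI of about 31.0, close to the sample mean reported below.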

Data collection and statistical analysis procedures
Data were collected in Edinburgh and London during August 2005. All procedures and instruments were approved by an independent Institutional Review Board, and all participants provided written informed consent prior to completing any study measures. After signing the consent form, participants independently completed the questionnaires analyzed in the current study.
Statistical analyses were completed using SAS version 8.12 (SAS Institute, Cary, NC). Descriptive statistics were used to summarize demographic/clinical characteristics and scores on health status measures. Categorical variables are summarized in terms of frequencies and percentages, and for each continuous variable, the mean, standard deviation, median, range, percent at floor, and percent at ceiling are presented. In the current study, there were no missing data on any of the measures; therefore, no procedures for handling missing data were required. Internal consistency reliability is the extent to which individual items within a scale are related to one another. Internal consistency was examined for the PGWB subscales and global score using Cronbach's formula for coefficient alpha. Cronbach's alphas greater than 0.70 are generally considered acceptable [9].
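Cronbach's alpha for a scale of k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). A minimal standard-library sketch follows; the function name is hypothetical, and it assumes complete data (as in this study) and a total score that is not constant across respondents.

```python
from statistics import variance


def cronbach_alpha(data):
    """data: one list of k item scores per respondent (complete cases only)."""
    k = len(data[0])
    if k < 2 or any(len(row) != k for row in data):
        raise ValueError("need at least two items and k scores for every respondent")
    items = list(zip(*data))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in data])  # variance of scale totals
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Two items that always agree give an alpha of 1.0; values above the 0.70 benchmark cited above would be read as acceptable.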
Construct validity refers to the extent to which the instrument measures what it is intended to measure. To assess the construct validity of the EQ-5D and PGWB, both convergent and known-groups validity were examined. Convergent validity is the degree to which scores from the instrument undergoing evaluation are related to scores from an instrument measuring a similar construct. To examine convergent validity, Spearman correlations were computed between each of the two generic measures and the diabetes-specific patient-reported measures and patients' BMI. BMI was used as a criterion because body weight is an important health indicator for patients with type 2 diabetes. Roughly 40% to 50% of patients with diabetes meet criteria for obesity [26,46], and obesity is likely to exacerbate symptoms and metabolic abnormalities of type 2 diabetes, increase the risk of complications, and complicate the goal of achieving glycemic control [47][48][49][50]. Correlation coefficients were interpreted based on guidelines proposed by Cohen [51] suggesting that a correlation of 0.10 is small, 0.30 is moderate, and 0.50 is large. It was hypothesized that the generic measures under investigation would be significantly correlated with the condition-specific measures and patients' BMI, with correlation coefficients in the moderate to large range.
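Spearman's rho is the Pearson correlation computed on ranks, with tied values given the average of the ranks they span. The sketch below implements that with the standard library and labels coefficients using Cohen's benchmarks, treating 0.10, 0.30, and 0.50 as lower bounds on the absolute value; the "negligible" label below 0.10 and all function names are illustrative additions. In practice `scipy.stats.spearmanr` provides the same coefficient plus a p-value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either variable is constant


def cohen_label(r):
    """Cohen's benchmarks applied to |r| as lower bounds."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    return "negligible"
```

Under this rule, the EQ-5D index correlation of -0.52 with the ADS reported below would be labelled large and -0.33 moderate.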
Known-groups validity is a scale's ability to discriminate among groups of patients who are known to differ by a key indicator. Using t-tests, EQ-5D and PGWB scores were compared among groups of patients who differed in the following characteristics: symptom burden as reported on condition-specific measures; type of pharmacological treatment (injectable insulin vs. oral medication only); preference for weight change (patients who would like to lose weight vs. patients who would like to stay the same weight); and experience with daytime hypoglycemia, nighttime hypoglycemia, and hyperglycemia (patients who had each of these experiences vs. patients who did not). It was hypothesized that the EQ-5D and PGWB would discriminate between groups of patients in each of these comparisons.
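The text specifies t-tests but not whether variances were pooled. As a sketch under the assumption of Student's pooled-variance t-test, the statistic is the difference in group means divided by its pooled standard error, with n1 + n2 - 2 degrees of freedom. The function name is hypothetical; a p-value would then come from the t distribution (for example via `scipy.stats.ttest_ind`, whose `equal_var=True` default matches this pooled form).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance; returns (t, df)."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    se = sqrt(pooled_var * (1 / na + 1 / nb))  # standard error of the mean difference
    return (mean(group_a) - mean(group_b)) / se, df
```

A negative t here means the first group (for example, patients who have not experienced hypoglycemia) has the lower mean score.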

Sample description
A total of 130 eligible participants completed the study, 53 in Edinburgh and 77 in London (Table 1). The sample in London was more ethnically diverse than the sample in Edinburgh, which was 100% White. There were no other statistically significant differences between the two geographic groups with respect to demographics (e.g., gender, age, marital status, employment) or clinical characteristics (e.g., BMI, age when first diagnosed with diabetes, current treatment). Therefore, data from the two cities were pooled for all analyses.
The majority of participants were male (n = 84; 64.6%), and the mean age of the total sample was 55.7 years. Most of the patients were currently married (n = 79; 60.8%), over a third worked full-time (n = 46; 35.4%), and over a third were retired (n = 47; 36.2%). The mean Body Mass Index (BMI) of the total sample was 31.5,

Descriptive statistics and internal consistency reliability
There were no missing data on the EQ-5D (Table 2). The great majority of the 130 participants reported no problems on the EQ-5D self-care item (n = 120; 92.3%), while the other four dimension items reflected greater rates of difficulty. Roughly one third of participants reported having at least some problems in the dimensions of mobility (n = 42; 32.3%) and usual activities (n = 41; 31.5%). Approximately a third of patients reported having either some problems (n = 35; 26.9%) or extreme problems (n = 8; 6.2%) in the anxiety/depression dimension. The greatest rates of difficulty were found in the pain/discomfort dimension, with half of the sample reporting either some problems (n = 52; 40.0%) or extreme problems (n = 13; 10.0%). The pattern of responses was similar in the London and Edinburgh samples, although the participants in Edinburgh reported slightly greater rates of problems in mobility, self-care, and usual activities. The mean EQ-5D index score of 0.75 and VAS score of 68.8 both indicate a moderate level of overall impairment in this sample (Table 3). The mean index score was somewhat lower for the Edinburgh sample (0.70) than for the London sample (0.79). Analysis of floor and ceiling effects revealed that 40% of the sample had the maximum EQ-5D index score of 1. The VAS did not have a similar ceiling effect.
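Floor and ceiling effects such as the 40% at the maximum index score are simply the share of respondents at the lowest and highest possible values. A minimal sketch with a hypothetical function name; the example scores are toy values, not study data.

```python
def floor_ceiling(scores, minimum, maximum):
    """Return (percent at floor, percent at ceiling) for a list of scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("no scores")
    at_floor = sum(1 for s in scores if s == minimum)
    at_ceiling = sum(1 for s in scores if s == maximum)
    return 100 * at_floor / n, 100 * at_ceiling / n


# Toy EQ-5D index scores, using the -0.594 to 1.0 range given in the Measures section.
print(floor_ceiling([1.0, 1.0, 0.5, -0.594, 0.8], -0.594, 1.0))  # (20.0, 40.0)
```

Exact equality is appropriate here because the floor and ceiling are fixed, attainable scores; for continuous measures one might instead count values within a small tolerance.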
On the PGWB, there were no missing data, and mean subscale scores were similar in the two samples (Table 4). The greatest impairment was reflected in the vitality subscale, which had a mean score of 59.5 in the total sample of 130 participants. The positive well-being and general health subscales also reflected some impairment (mean scores of 61.1 and 63.4, respectively). Higher scores were found on the anxiety, depressed mood, and self-control subscales (69.9, 78.9, and 78.0, respectively). On the depressed mood subscale, 23.8% of the sample had the maximum score of 100, indicating that almost a quarter of this sample reported no problems with depression. The PGWB demonstrated good internal consistency reliability, with Cronbach's alphas for the six subscales ranging from 0.77 to 0.92 (Table 5). The alpha for the global score was 0.97.

Convergent validity
Convergent validity of the EQ-5D index score and VAS was demonstrated through significant correlations with the ADS, DSC-R, and patients' BMI (Table 6). These correlations were generally in the moderate to large range. Spearman correlations of the index score and VAS with the ADS had coefficients of -0.52 and -0.49, respectively (both p < 0.001). Correlations with the DSC-R total score were -0.64 and -0.53 (both p < 0.001). Correlations between the EQ-5D and the DSC-R subscales ranged from -0.33 to -0.61 (all p < 0.001), except for the correlations with the ophthalmology subscale, which were somewhat smaller (r = -0.22 and -0.19; both p < 0.05). Correlations of the EQ-5D index score and VAS with patients' BMI were -0.27 (p < 0.01) and -0.38 (p < 0.001), respectively.
Convergent validity of the PGWB global score was also supported (

Known-groups validity
Known-groups validity of the EQ-5D index score and VAS was supported in several group comparisons (Table 7). Both EQ-5D scores significantly discriminated between groups of patients categorized based on median splits of their ADS score and DSC-R total score (all p < 0.001). Groups with higher scores (indicating greater symptom burden) on these two diabetes-specific instruments had lower EQ-5D scores (indicating lower HRQL). In addition, patients who wanted to lose weight (n = 113) had significantly lower EQ-5D scores than patients who wanted to stay the same weight (n = 16) (both p < 0.001).
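The median-split groups above can be formed by cutting a diabetes-specific score at its sample median and collecting the generic-measure scores in each half. The text does not say how scores exactly at the median were assigned; the sketch assumes they join the lower group, and the function name is hypothetical. The two resulting lists can then be compared with a two-sample t-test.

```python
from statistics import median


def median_split(grouping_scores, outcome_scores):
    """Split outcome scores by the median of a grouping measure.

    Assumed tie rule: grouping scores equal to the median go to the low group.
    """
    if len(grouping_scores) != len(outcome_scores):
        raise ValueError("lists must be the same length")
    cut = median(grouping_scores)
    low = [o for g, o in zip(grouping_scores, outcome_scores) if g <= cut]
    high = [o for g, o in zip(grouping_scores, outcome_scores) if g > cut]
    return low, high


# Toy data: higher ADS (greater negative impact) paired with lower EQ-5D index.
low, high = median_split([1, 2, 3, 4], [0.9, 0.8, 0.6, 0.5])
```

With an even number of distinct values, as in the toy data, the median falls between observations and the tie rule has no effect.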
Patients who had experienced daytime hypoglycemia had significantly lower mean EQ-5D index (p < 0.05) and VAS (p < 0.001) scores than patients who had not experienced daytime hypoglycemia. Results followed a similar pattern for hyperglycemia and nighttime hypoglycemia, although comparisons were not consistently statistically significant.
The PGWB also discriminated between groups of patients who differed in scores on diabetes-specific instruments and clinical characteristics (Table 7). Groups with higher scores (indicating greater symptom burden) on the ADS and DSC-R total score had lower PGWB global scores (indicating lower overall well-being) (both p < 0.001).
Patients who wanted to lose weight (n = 113) had a significantly lower mean PGWB global score than patients who wanted to stay the same weight (n = 16) (p < 0.05). The PGWB global score also discriminated between groups of patients differing with regard to whether they had experienced daytime hypoglycemia, nighttime hypoglycemia, and hyperglycemia (all p < 0.05). Known-groups validity of the PGWB subscales was also supported (Table 8). Statistically significant group differences were found in most comparisons involving the anxiety, positive well-being, self-control, general health, and vitality subscales. The depressed mood subscale discriminated between groups of patients determined by median splits on the ADS and DSC-R, but significant differences on this subscale were not found in the other group comparisons.
No scales of the EQ-5D or PGWB significantly discriminated between patients treated with injectable medication (i.e., insulin with or without concomitant oral medication; n = 31) and patients treated only with oral medications. In general, scores were somewhat higher in the oral treatment group, but these differences between groups were not statistically significant.

Discussion
Current findings provide support for the use of the EQ-5D and PGWB in patients with type 2 diabetes. The PGWB had good internal consistency reliability, and both instruments demonstrated excellent convergent and known-groups validity. Convergent validity was supported through consistently significant correlations with self-reported, diabetes-specific measures of symptoms and their impact, as well as with patients' BMI, an objective health indicator that is particularly relevant to type 2 diabetes [20,26,52]. Furthermore, both the EQ-5D and PGWB discriminated between groups of patients who differed in self-reported impact of diabetes symptoms, preference for weight change, and experience with hyperglycemia and hypoglycemia.
The mean EQ-5D index score of 0.75 is consistent with EQ-5D ratings in other studies of patients with type 2 diabetes. For example, the mean index score was 0.74 in a sample of 1348 patients in the Netherlands [34] and 0.76 in a sample of 4189 patients from five European countries [32]. In previous studies, the PGWB has been scored in several ways (as described in the methods section of the current manuscript), making it difficult to compare among studies. In the current study, this instrument was scored using a relatively new standardized approach proposed in the recently published PGWB user's manual.
Using this approach, all subscale scores and the global score are normalized so that scores have a possible range of 0 to 100 [42]. The current results provide a benchmark for these normalized scores among a general sample of patients with type 2 diabetes in the UK.
Although the two generic measures were found to be valid in this sample, it would not be ideal to use either instrument as the only PRO in a clinical trial. A published analysis comparing generic to condition-specific measures across 43 randomized clinical trials found that the condition-specific measures were more responsive to change, particularly in studies with a nonzero therapeutic effect [6]. Thus, interpretation of trial outcomes based solely on generic instruments such as the PGWB or EQ-5D could fail to detect true treatment-related improvement, even though these two measures do appear to correlate well with condition-specific instruments. Furthermore, generic instruments may not capture the specific impairments within a particular population. For example, although all participants in the current sample had type 2 diabetes, 40% of the participants had the maximum EQ-5D index score of 1, which theoretically represents perfect health status. This ceiling effect suggests that the brief EQ-5D may not reflect the health-related problems of all patients with type 2 diabetes, particularly patients whose symptoms have an impact on functional domains other than the five EQ-5D dimensions. Given the different strengths of generic and condition-specific measures, we recommend using the EQ-5D and PGWB as part of an overall health outcomes battery that also includes condition-specific measures of symptom burden or HRQL. There are several well-validated instruments designed specifically to assess outcomes of treatment for type 2 diabetes [53,54], and the decision regarding which measures to use will depend on the specific aims of each study. The generic measures examined in the current study can complement previously validated condition-specific measures by providing an estimate of overall HRQL and allowing for comparisons across trials and populations.
Results of the current study are limited by the fact that data are only available at one point in time. Consequently, neither test-retest reliability nor responsiveness to change could be evaluated. Evaluation of responsiveness using longitudinal data is needed to ensure that the EQ-5D and PGWB would be sensitive to change in patients' condition when true change has occurred in either a clinical trial or naturalistic setting. It is hoped that longitudinal studies will build on the current findings by examining these measurement properties of the EQ-5D, PGWB, and other generic instruments in patients with type 2 diabetes.

Conclusion
Psychometric instrument evaluation is an ongoing process, and confidence in a PRO's performance is strengthened as data accumulate from multiple studies and samples [55]. The current study provides initial data suggesting that the EQ-5D and PGWB are appropriate for use in patients with type 2 diabetes, and future research may provide additional support for this conclusion.