Discriminative validity of the EQ-5D-5 L and SF-12 in older adults with arthritis

Background: The EQ-5D-5 L and the SF-12 are the most commonly used generic measures of health-related quality of life among people with arthritis. However, there is little evidence on how well the individual dimensions and domains of these instruments perform in this population. The objective was to examine the discriminative validity of the EQ-5D-5 L and the SF-12 version 2 (and SF-6D) in capturing the burden of arthritis on health-related quality of life in older adults.

Methods: Cross-sectional data from the Alberta Retired Teachers Association survey were used. A known-groups approach with a priori hypotheses was used to examine the discriminative validity of the domain and summary scores of the EQ-5D-5 L and the SF-12 version 2 (and SF-6D). Groups were defined based on self-reported arthritis, chronic pain level, presence and number of comorbidities, and self-reported health status.

Results: Mean age of respondents (N = 2844) was 68.6 (standard deviation [SD] 5.9) years; 54.8% were female, mean body mass index (BMI) was 27.2 kg/m² (SD 4.8), and 36.6% reported having arthritis. The overall mean EQ-5D-5 L index score was 0.86 (SD 0.11) and the mean SF-6D index score was 0.79 (SD 0.13). Participants with arthritis had lower EQ-5D-5 L (0.83, SD 0.13) and SF-6D (0.75, SD 0.13) index scores than those without arthritis (0.88, SD 0.09 and 0.81, SD 0.12, respectively). EQ-5D-5 L and SF-6D index scores demonstrated moderate discriminative validity, with a moderate effect size (0.5). Related dimensions and domains of the EQ-5D-5 L and SF-12 (e.g., mobility with physical functioning, pain/discomfort with bodily pain, and anxiety/depression with mental health) were moderately to strongly correlated (r = 0.6–0.7). Neither instrument adequately discriminated between participants with moderate and severe chronic pain of 6 months' duration.

Conclusion: Overall, the EQ-5D-5 L pain/discomfort and mobility dimensions and the SF-12 bodily pain scale had moderate discriminative ability among older adults with arthritis. However, both instruments had limited discriminative ability for chronic pain. The importance and nature of chronic pain assessment in a given application should be considered when choosing either instrument to measure health-related quality of life in this patient population.

Electronic supplementary material: The online version of this article (10.1186/s12955-019-1129-6) contains supplementary material, which is available to authorized users.


Background
Arthritis is one of the most common chronic conditions affecting men and women in Canada [1]. In 2014, around 4.8 million (16.5%) Canadians aged 15 and older reported having arthritis; this number is projected to increase to 7 million (20%) by 2031 [1,2]. The economic burden of arthritis in Canada, including health care costs and lost productivity, was estimated at 33 billion dollars in 2011 [3]. Osteoarthritis and rheumatoid arthritis are the most common forms of arthritis in Canada [3]. Arthritis brings a burden of pain and disability that affects those living with the disease every day [3]. Because of the chronic nature of arthritis, health-related quality of life (HRQL) has become an important indicator of its burden [4]. Persons with arthritis report an average of 4.0 more days of impaired physical health and 2.3 more days of impaired mental health during the previous 30 days than those without arthritis [4].
Generic preference-based measures such as the EQ-5D [5] and health profile measures such as the SF-12 [6] are the instruments most commonly used to assess HRQL in patients with arthritis. Evidence on the ability of the EQ-5D and SF-12 [7] to discriminate between patients with different levels of arthritis severity would help determine how well they capture the health burden in this population. Discriminative validity is the ability of an instrument to distinguish between groups that are expected to differ in their HRQL [8]. Evidence on discriminative validity can inform the choice of these measures for specific applications.
The EQ-5D initially had 3 levels (3 L) of problems for each of its five dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). More recently, a 5-level (5 L) version of the EQ-5D was developed with the aim of improving the measurement of health status [9]. Several studies have compared the measurement properties of the EQ-5D-3 L with those of the SF-36, SF-12 or SF-6D (a preference-based single index of health derived from the SF-36 or SF-12) among patients with knee pain, osteoarthritis, rheumatoid arthritis and psoriatic arthritis [5,10–13]. In these studies, the severity of these conditions was measured with the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) and the Michigan Hand Questionnaire. Evidence on the discriminative validity of the instruments from these studies is inconclusive, highlighting disagreement in the literature on the discriminative ability of the EQ-5D-3 L and SF-12 questionnaires in arthritis [5,10–13]. Importantly, all of these studies used the 3 L version of the EQ-5D, and none examined the discriminative validity of the individual dimension and domain scores of these instruments [5].
There is little evidence on how well the EQ-5D-5 L discriminates between patients with arthritis, how it compares with the SF-12, and how the individual dimensions and scales of these measures perform in this patient population. Our objective was to examine the discriminative validity of the individual components and overall scores of the EQ-5D-5 L and the SF-12 version 2 (and the SF-6D) in older adults with arthritis.

Data
This study involved secondary analysis of cross-sectional data collected through a survey of members of the Alberta Retired Teachers Association (ARTA) in Alberta, Canada. The objective of the ARTA survey was to better understand the factors that affect care and health outcomes in this population of older adults. Participants were recruited by sending an email with a link to the online survey to all members registered with ARTA (N = 14,248). Participants were included if they were retired Alberta teachers (as defined by the Alberta Retired Teachers Association). There were no restrictions on gender, race, or health status. Participants had to be able to understand, read and speak English and be of sound mind to complete the survey. A total of 6275 (44%) members responded to the survey, and 2514 (18%) completed it. Participants reviewed an information sheet and provided consent before participating in the survey. Ethical approval was obtained from the Health Research Ethics Board at the University of Alberta.

Measures
The EQ-5D-5 L is a preference-based measure of HRQL [9]. It consists of five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) and a visual analogue scale (VAS). Each dimension has 5 levels of problems: 1 "no problems", 2 "slight problems", 3 "moderate problems", 4 "severe problems" and 5 "extreme problems". Respondents are asked to indicate their level of problems on each of the five dimensions on the day of completion ("today") [9]. The EQ-5D-5 L describes 3125 distinct health states, with 11111 representing the best and 55555 the worst possible health state. Index scores were generated using the Canadian scoring algorithm [14] and range from −0.148 for the worst (55555) to 0.949 for the best (11111) health state [14]. A minimally important difference (MID) of 0.04 has been suggested for the index score [15]. The VAS score ranges from 0 (worst health imaginable) to 100 (best health imaginable).
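The figure of 3125 states follows directly from five dimensions with five levels each (5^5). As a minimal Python sketch (the names `all_health_states` and `describe` are illustrative, not part of any EQ-5D software), the descriptive system can be enumerated and a state code decoded as follows. It deliberately stops short of index scoring: the Canadian algorithm [14] has its own model and coefficients, which are not reproduced here.

```python
from itertools import product

# The five EQ-5D-5L dimensions, in the order used in health-state codes.
DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")

# Level labels, 1 (no problems) through 5 (extreme problems).
LEVELS = {1: "no problems", 2: "slight problems", 3: "moderate problems",
          4: "severe problems", 5: "extreme problems"}


def all_health_states():
    """Return every EQ-5D-5L state code, from '11111' to '55555'."""
    return ["".join(str(level) for level in combo)
            for combo in product(range(1, 6), repeat=len(DIMENSIONS))]


def describe(state):
    """Map a five-digit state code to {dimension: level label}."""
    if len(state) != 5 or any(ch not in "12345" for ch in state):
        raise ValueError(f"not a valid EQ-5D-5L state: {state!r}")
    return {dim: LEVELS[int(ch)] for dim, ch in zip(DIMENSIONS, state)}


states = all_health_states()
print(len(states))                            # 3125 (= 5 ** 5)
print(states[0], states[-1])                  # 11111 55555
print(describe("21311")["usual activities"])  # moderate problems
```

Decoding a state this way is also how dimension-level results (for example, the share of respondents above level 1 on a dimension) are derived from the raw codes.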
The SF-12 version 2 is derived from items in the SF-36 health survey [16]. It measures eight domains of functioning and well-being, asking respondents to consider their health over the past 4 weeks: physical functioning (PF), role limitations due to physical problems (RP), bodily pain (BP), general health perceptions (GH), energy and vitality (VT), social functioning (SF), role limitations due to emotional problems (RE) and mental health (MH) [16]. The eight domains can be summarized into physical component summary (PCS) and mental component summary (MCS) scores, which are expressed as standardized T-scores with a mean of 50 and an SD of 10. SF-12 scale and summary scores were calculated according to Fleishman, Selim and Kazis [14]. Domain scores were also converted into T-scores for this analysis. The SF-6D is derived from 7 items of the SF-12. The SF-6D classification system consists of six domains (PF, RP, BP, VT, SF and MH) and describes 7500 unique health states [6]. SF-6D index scores were generated using UK general population preferences and range from 0.345 for the worst (345555) to 1.0 for the best (111111) health state [6]. The first item of the SF-12 (SF-1), self-rated general health, was also used as an indicator of participants' overall health.
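The 7500 SF-6D health states can be checked the same way, as the product of the number of levels per domain. The sketch below assumes 3 levels for PF, 4 for RP (the role-limitation domain) and 5 for each of BP, VT, SF and MH; these counts are not stated in the text but are consistent with the worst state 345555 and with 7500 states in total. It also shows a generic linear T-score transformation; the reference mean and SD passed to the hypothetical `to_t_score` function are placeholders, not the norms used in this study.

```python
from math import prod

# ASSUMPTION: number of levels per SF-6D domain (SF-12 version), listed in
# the order the domains are named in the text: PF, RP, BP, VT, SF, MH.
SF6D_LEVELS = {"PF": 3, "RP": 4, "BP": 5, "VT": 5, "SF": 5, "MH": 5}


def sf6d_state_count(levels=SF6D_LEVELS):
    """Number of distinct health states the classification can describe."""
    return prod(levels.values())


def worst_state(levels=SF6D_LEVELS):
    """State code with every domain at its most severe level."""
    return "".join(str(n) for n in levels.values())


def to_t_score(raw, norm_mean, norm_sd):
    """Linear T-score transform: mean 50, SD 10 in the reference population.

    norm_mean and norm_sd are the reference-population mean and SD of the
    raw domain score; the values in the example below are hypothetical.
    """
    return 50 + 10 * (raw - norm_mean) / norm_sd


print(sf6d_state_count())                              # 7500
print(worst_state())                                   # 345555
print(to_t_score(70.0, norm_mean=80.0, norm_sd=20.0))  # 45.0
```

A domain score one reference SD below the reference mean thus maps to a T-score of 40, which is what makes domain scores comparable with the PCS and MCS scales.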
Pain intensity was measured using the 11-point Numeric Pain Rating Scale, on which participants rated their pain intensity over the past 6 months. Scores were categorized as 0, no pain; 1–3, slight pain; 4–6, moderate pain; and 7–10, severe pain [17,18]. Comorbidities were assessed using a self-reported chronic conditions question. Participants were asked whether a health professional had ever told them that they had any of the following 13 conditions: arthritis (e.g., osteoarthritis, rheumatoid arthritis), respiratory diseases, cardiovascular diseases, diabetes, high lipids, renal diseases, bowel disorders, musculoskeletal disorders, thyroid dysfunction, eye diseases, cancer, neurological diseases, and mental/psychological conditions. The total number of comorbidities reported by each participant was calculated and categorized as 0, 1, 2, or 3 or more chronic conditions. Other data included age, sex, highest level of education, current employment status, total annual household income, ethnicity, and body mass index (BMI), calculated from self-reported weight and height.
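The grouping rules above are simple enough to state as code. The Python sketch below (all function names are hypothetical) applies the pain cut-offs, collapses a count of chronic conditions into the four groups, and computes BMI from weight in kilograms and height in metres. The overweight/obese flag uses the conventional WHO cut-off of 25 kg/m², which the text does not state explicitly and is an assumption here.

```python
def pain_category(nprs):
    """Map an 11-point Numeric Pain Rating Scale score (0-10) to a group."""
    if not 0 <= nprs <= 10 or int(nprs) != nprs:
        raise ValueError("NPRS score must be a whole number from 0 to 10")
    if nprs == 0:
        return "no pain"
    if nprs <= 3:
        return "slight pain"
    if nprs <= 6:
        return "moderate pain"
    return "severe pain"


def comorbidity_group(n_conditions):
    """Collapse a count of chronic conditions into '0', '1', '2' or '3+'."""
    if n_conditions < 0:
        raise ValueError("count cannot be negative")
    return str(n_conditions) if n_conditions < 3 else "3+"


def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2 from self-reported weight and height."""
    return weight_kg / height_m ** 2


def overweight_or_obese(bmi_value):
    """ASSUMPTION: WHO cut-off, BMI >= 25 kg/m^2 is overweight or obese."""
    return bmi_value >= 25


print([pain_category(s) for s in (0, 3, 4, 7)])
# ['no pain', 'slight pain', 'moderate pain', 'severe pain']
print(comorbidity_group(5))                # 3+
print(round(bmi(70, 1.75), 1))             # 22.9
print(overweight_or_obese(bmi(85, 1.75)))  # True
```

Whether arthritis itself is counted in `n_conditions` is left to the caller; the analysis of participants with arthritis uses the number of other chronic conditions.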

Statistical Analysis
Participants with complete EQ-5D-5 L and SF-12 data were included in this analysis (N = 2844). Descriptive statistics were computed for all variables. Ceiling effects for each measure were reported as the percentage of participants in the best possible health states (11111 for EQ-5D-5 L and 111111 for SF-6D).
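A ceiling effect reduces to a simple proportion. The Python sketch below computes the percentage of respondents in the best possible state; the toy data are constructed only to reproduce the counts reported in the results (662 and 194 of 2844 at the best EQ-5D-5 L and SF-6D states), with every other respondent given an arbitrary placeholder state rather than real survey responses.

```python
def ceiling_effect(states, best_state):
    """Percentage of respondents whose health state equals best_state."""
    states = list(states)
    if not states:
        raise ValueError("no responses")
    n_best = sum(1 for s in states if s == best_state)
    return 100 * n_best / len(states)


N = 2844
# Toy data: only the number of best-state respondents matches the study.
eq5d = ["11111"] * 662 + ["21111"] * (N - 662)
sf6d = ["111111"] * 194 + ["211111"] * (N - 194)

print(round(ceiling_effect(eq5d, "11111")))   # 23
print(round(ceiling_effect(sf6d, "111111")))  # 7
```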
The discriminative validity of the EQ-5D-5 L and SF-12 was evaluated by first examining the extent to which each instrument distinguished between participants with and without arthritis and then, among those with arthritis, between groups defined by level of pain (no, slight, moderate, severe) as a proxy for severity of the condition. The number of other chronic conditions (none, 1, 2, or 3+) and self-reported health (excellent, very good, good, and fair/poor) were used as indicators of participants' overall health. One-way ANOVA, independent-samples t-tests and chi-square tests were used, as appropriate, to test differences between groups. Effect sizes were used to examine the magnitude of differences and were calculated as the difference in mean scores between participants with and without arthritis divided by the pooled standard deviation. Effect sizes were interpreted as small (0.2–0.49), moderate (0.5–0.79) or large (≥ 0.8) [19].
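The effect-size calculation and its interpretation can be sketched as follows, assuming the usual pooled-SD formula for two independent groups (a Cohen's d-type statistic). Values below 0.2 are labelled "negligible" here; that label is an assumption, since the text defines only the small, moderate and large bands. The example uses hypothetical group statistics, not study data.

```python
from math import sqrt


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1, sd1, n1, mean2, sd2, n2):
    """Difference in means (group 1 minus group 2) divided by the pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


def interpret(es):
    """Label |effect size| using the bands given in the text [19]."""
    es = abs(es)
    if es >= 0.8:
        return "large"
    if es >= 0.5:
        return "moderate"
    if es >= 0.2:
        return "small"
    return "negligible"  # ASSUMPTION: the text does not name this band


# Hypothetical groups: no arthritis (group 1) vs arthritis (group 2).
d = effect_size(10.0, 2.0, 50, 9.0, 2.0, 50)
print(d, interpret(d))  # 0.5 moderate
```

Taking the absolute value in `interpret` means the sign convention (with minus without, or the reverse) affects only the direction of the effect, not its label.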
First, we hypothesized that the EQ-5D-5 L index score, VAS, SF-12 domain and summary scores and SF-6D index score would all be lower in arthritis patients with higher levels of pain severity, whereas EQ-5D-5 L dimension levels (on which higher levels indicate more problems) would be higher. Second, the same scores were expected to be lower, and EQ-5D-5 L dimension levels higher, in arthritis patients with more comorbidities. Finally, the same scores were expected to be lower, and EQ-5D-5 L dimension levels higher, in arthritis patients with poorer general health status.
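These a priori known-groups hypotheses amount to checking that group means move in the expected direction as severity increases. A minimal Python sketch (the function name is hypothetical) that flags adjacent groups violating the expected ordering is shown below. The example uses the moderate and severe chronic pain means reported in the results, where both the SF-6D reversal and the tied EQ-5D-5 L means are flagged. It checks direction only, not statistical significance.

```python
def ordering_violations(group_means, higher_is_worse=False):
    """Return adjacent group pairs that break the hypothesized ordering.

    group_means: list of (label, mean) ordered from least to most severe.
    For index, VAS and SF-12 scores (higher = better health), means should
    fall as severity rises; for EQ-5D-5L dimension levels (higher = more
    problems), pass higher_is_worse=True so that means should rise.
    A tie counts as a violation because it shows no discrimination.
    """
    violations = []
    for (label_a, mean_a), (label_b, mean_b) in zip(group_means, group_means[1:]):
        ok = mean_b > mean_a if higher_is_worse else mean_b < mean_a
        if not ok:
            violations.append((label_a, label_b))
    return violations


# Moderate vs severe chronic pain means reported in the results.
sf6d = [("moderate", 0.71), ("severe", 0.74)]
eq5d = [("moderate", 0.79), ("severe", 0.79)]
print(ordering_violations(sf6d))  # [('moderate', 'severe')]
print(ordering_violations(eq5d))  # [('moderate', 'severe')]
```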
The relationships between the EQ-5D-5 L and SF-12 components were examined using Spearman's correlation coefficients. Coefficients < 0.20 were considered to indicate no correlation, 0.20–0.34 weak, 0.35–0.50 moderate, and > 0.50 strong correlation [20]. We expected the mobility, self-care, usual activities and pain/discomfort dimensions of the EQ-5D-5 L to have strong negative correlations with the SF-12 PCS, PF, RP, BP and GH scores. The anxiety/depression dimension of the EQ-5D-5 L was expected to have strong negative correlations with the SF-12 MCS, SF, RE and MH scores. Weak negative correlations were expected between the mobility, self-care, usual activities and pain/discomfort dimensions of the EQ-5D-5 L and the mental health domains and summary score (MCS, SF, RE and MH) of the SF-12.
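Because EQ-5D-5 L dimensions are ordinal with many tied responses, Spearman's coefficient is computed on ranks, with tied values given their average rank. The standard-library Python sketch below implements it as the Pearson correlation of average ranks and applies the cut-offs from the text; the mobility levels and PF T-scores in the example are hypothetical, not study data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def strength(rho):
    """Label |rho| with the cut-offs used in this study [20]."""
    r = abs(rho)
    if r > 0.50:
        return "strong"
    if r >= 0.35:
        return "moderate"
    if r >= 0.20:
        return "weak"
    return "absent"


mobility = [1, 1, 2, 3, 4]       # hypothetical EQ-5D-5L levels
pf_t = [55, 52, 45, 38, 30]      # hypothetical PF T-scores
rho = spearman(mobility, pf_t)
print(round(rho, 3), strength(rho))  # -0.975 strong
```

The tie in the two level-1 responses is why the coefficient falls just short of -1 even though the two variables move in perfectly opposite directions.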

General characteristics of participants
Of the 6275 respondents to the ARTA survey, 2844 had complete EQ-5D-5 L and SF-12 data and were included in this analysis. The mean age of participants was 68.6 years (SD 5.9; range 51 to 86); 55% were female. Half (51%) of the participants had a university education, 91% were retired, 41% had a total annual household income of CAD$60,000–99,999, and 97% were Caucasian (Table 1). Participants had a mean BMI of 27.2 kg/m² (SD 4.8; range 13.1 to 56.8), with 64% being overweight or obese. Participants reported a mean of 2.0 (SD 1.7) chronic conditions, and 34% reported 3 or more. The most common chronic condition in this population was arthritis (37%). Other musculoskeletal conditions, including osteoporosis, fibromyalgia and back problems, were reported by 31%, while high lipids and eye diseases were reported by 23% and 24%, respectively (Table 1).
The mean EQ-5D-5 L index score was 0.86 (SD 0.11), with scores ranging from 0 to 0.95 and a distribution skewed towards full health, with a ceiling effect of 23% (n = 662) (Additional file 1: Figure S1). The mean SF-6D index score was 0.79 (SD 0.13), with scores ranging from 0.37 to 1 and a bimodal distribution, with a ceiling effect of 7% (n = 194) (Additional file 1: Figure S1). Respondents with arthritis had a mean age of 68.1 (SD 5.8) years, and 61% were female (Table 1). Respondents with arthritis also reported more comorbidities, with 37% reporting 3 or more other chronic conditions compared with 18.7% of those without arthritis.

Discriminative validity of EQ-5D-5 L and SF-12
The mean difference in index scores between participants with and without arthritis was 0.05 for the EQ-5D-5 L and 0.06 for the SF-6D, representing a moderate effect size (0.5) for both index scores (Table 2). The proportion of participants reporting problems on the EQ-5D-5 L pain/discomfort and mobility dimensions was 25 and 28 percentage points higher, respectively, among those with arthritis than among those without. However, the corresponding differences for self-care and anxiety/depression were only 4 and 7 percentage points, respectively. Participants with arthritis also had lower PCS (by 4.4 points) and MCS (by 2.4 points) scores than those without arthritis (Table 2). Effect sizes were moderate for PCS (0.5), PF (0.5) and RP (0.5), while BP had the largest effect size (0.6) (Table 2).
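Dimension-level comparisons of this kind dichotomize each EQ-5D-5 L dimension into "no problems" versus "any problems". The Python sketch below assumes that "reporting problems" means any level from 2 to 5, which the text implies but does not define, and computes the percentage-point difference between two groups using hypothetical responses rather than study data.

```python
def percent_with_problems(levels):
    """Percentage of respondents at level 2-5 (any problem) on a dimension."""
    levels = list(levels)
    if not levels:
        raise ValueError("no responses")
    return 100 * sum(1 for lv in levels if lv >= 2) / len(levels)


def difference_in_percent(group_a, group_b):
    """Percentage-point difference in 'any problem' between two groups."""
    return percent_with_problems(group_a) - percent_with_problems(group_b)


# Hypothetical pain/discomfort levels, not study data.
with_arthritis = [1, 2, 2, 3, 4, 2, 1, 3]
without_arthritis = [1, 1, 2, 1, 1, 2, 1, 1]
print(difference_in_percent(with_arthritis, without_arthritis))  # 50.0
```

Dichotomizing discards the distinction between slight and extreme problems, so these differences complement, rather than replace, the index-score effect sizes.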
Both EQ-5D-5 L and SF-6D index scores differed significantly (P < 0.001) across the groups defined by chronic pain severity, number of comorbidities and self-rated health (Tables 3, 4 and 5), with differences ranging from 0.02 to 0.21 for EQ-5D-5 L index scores and from 0.02 to 0.13 for SF-6D index scores. Participants with more comorbidities had lower EQ-5D-5 L index scores and more problems on the EQ-5D-5 L dimensions than those with fewer comorbidities. Similarly, those with poorer self-rated health had lower EQ-5D-5 L index scores and more problems on each of the dimensions. SF-12 domain and summary scores were lower for participants with higher levels of chronic pain, more comorbidities, and poorer self-rated general health. Participants with moderate and severe chronic pain on the pain rating scale reported similar mean EQ-5D-5 L index scores (0.79, SD 0.13 and 0.79, SD 0.19, respectively). Contrary to expectations, participants with moderate chronic pain on the pain rating scale reported lower SF-6D index scores (0.71, SD 0.12) than those with severe chronic pain (0.74, SD 0.15). The same reversed pattern was observed for the EQ-5D-5 L dimensions and the SF-12 domain and summary scales (Table 3). Also counter to our expectations, 19% of participants with severe chronic pain reported no problems on the EQ-5D-5 L pain/discomfort dimension.

Discussion
In this study, we found that the EQ-5D-5 L and SF-6D index scores demonstrated moderate discriminative validity in this population. However, the EQ-5D-5 L dimensions and SF-12 domain scores demonstrated limited discriminative validity, with small effect sizes. This was consistent with previous reports on the EQ-5D-5 L and the SF-36, SF-12 and SF-6D in the general population and in people with chronic pain, osteoarthritis and type 2 diabetes [21–28]. Significant differences in the discriminative validity of the EQ-5D-5 L dimensions and SF-12 domains have also been reported in the literature; however, the magnitudes of the differences were small to moderate [21–23]. Furthermore, more participants responded at levels 3, 4 and 5 of the EQ-5D-5 L than have been reported at level 3 of the EQ-5D-3 L in the literature [5]. This difference could be attributed to the high ceiling effect of the earlier 3 L version of the EQ-5D. The use of the Canadian value set for the EQ-5D-5 L index score and the UK value set for the SF-6D index score could also have contributed to the differences observed in this study [23,24,26].
An important observation in this study was the limited ability of both the EQ-5D-5 L and SF-12 to discriminate between participants with moderate versus severe chronic pain of 6 months' duration. Participants with moderate chronic pain reported more problems on the EQ-5D-5 L pain/discomfort dimension than those with severe chronic pain. In addition, 19% of participants who reported severe pain on the chronic pain scale over the last 6 months reported no problems on the pain/discomfort dimension of the EQ-5D. These results could be related to the reference period for reporting pain in each instrument: the EQ-5D-5 L asks respondents to indicate their level of pain/discomfort "today", the SF-12 assesses pain "over the past 4 weeks", and the chronic pain scale asks about pain over the last 6 months. Stone et al. [29] used an Ecological Momentary Assessment (EMA) approach to assess pain over a week in patients with rheumatoid arthritis. They found large individual differences in the variation of pain and fatigue over time and at different times of the day. In the current study, participants reporting severe pain of 6 months' duration on the chronic pain scale could have been experiencing very mild to moderate pain on the day of assessment (captured by the EQ-5D-5 L) or during the preceding 4 weeks (captured by the SF-12). Momentary assessment of pain in our study could have identified times within the day at which participants had higher or lower levels of pain, which could be attributed to medication intake, stressors or adequacy of sleep [29].

This study has some limitations. First, the study had a low response rate and was limited to well-educated, Caucasian older adults, which is largely inherent in using retired teachers and their dependents as the sampling frame. This limits the generalizability of these results to other arthritis populations that may differ from this sample. Second, the Canadian value set was used to calculate the EQ-5D-5 L index score, while the UK value set was used for the SF-6D index score. The UK general population may value health differently from the Canadian general population; however, this affects only the index scores and not the dimensions or domains of the instruments. Third, self-reported data are subject to under- or over-reporting, which could bias our results. We are unable to estimate the extent to which this may have occurred in this survey; nonetheless, even if the prevalence of arthritis was under-reported, this would not affect the assessment of the discriminative validity of these instruments. Finally, we did not have data on the type, location, and severity of arthritis or the pain intensity; this information would have been very useful in examining the discriminative validity of these instruments in patient sub-groups and should be explored in future research.

Conclusion
This study found moderate discriminative validity for the EQ-5D-5 L and SF-6D index scores. The pain/discomfort and mobility dimensions of the EQ-5D-5 L and the bodily pain domain of the SF-12 demonstrated the highest discriminative power. However, the results suggest that these instruments are limited in their ability to capture the burden of pain in this patient population, owing to variation in pain duration and severity over time. This has important implications for researchers and clinicians who use these tools to assess chronic pain. The duration of pain and the timing of its assessment are very important in capturing HRQL and should be considered when interpreting data obtained from these instruments in patients with arthritis.