A conceptual framework for EQ-5D dimensions
The International Classification of Functioning, Disability and Health (ICF) and the Wilson and Cleary model [11] are two recommended models for conceptualizing the relationships between dimensions in HRQoL instruments. The ICF provides a standard language and framework for describing health and health-related states and comprises two parts, each with two components [12]. Part 1 refers to functioning and disability and consists of (a) body functions and structures, and (b) activities and participation. Part 2 refers to contextual factors incorporating (a) environmental factors, and (b) personal factors. Body functions refer to physiological and psychological functions of body systems (e.g. symptoms such as pain or anxiety), while activity refers to the execution of a task or action (e.g. self-care), and participation refers to involvement in a life situation (e.g. work). The EQ-5D-3L was classified in an ICF framework [13] using linking rules [14]. Its five dimensions were classified into two ICF components, such that pain/discomfort (PD) and anxiety/depression (AD) were linked to the ICF component of body functions, while mobility (MO), self-care (SC), and usual activities (UA) were linked to the ICF component of activity and participation.
The ICF has considerable overlap with the Wilson and Cleary model [15, 16] that depicts dominant causal pathways between five levels of health outcomes: biological and physiological factors, symptoms (corresponding to the ICF component of body functions and defined as the patient’s perception of an abnormal physical, emotional or cognitive state), functioning (corresponding to the ICF component of activity and participation), general health perceptions, and overall quality of life. The Wilson and Cleary conceptual model has been empirically validated in populations with different health conditions [17,18,19,20,21,22,23,24].
Based on these models, we propose the following causal pattern between the 5 EQ-5D dimensions. Firstly, the “symptom” dimensions of pain/discomfort (PD) and anxiety/depression (AD) are assumed to be primarily causal indicators, and the “activity/participation” dimensions of mobility (MO), self-care (SC), and usual activities (UA) to be effect variables, i.e. PD and AD cause changes in the HRQoL construct that are manifested as changes in MO, SC, and UA. Physiological symptoms such as pain and discomfort are clear drivers of activity/participation items and influence walking and self-care [25, 26] and daily activities [27]. This relationship is likely to be unidirectional, as it is unlikely that a change in mobility or self-care would alter the level of pain experienced. We assume a predominantly causal link between AD and activity/participation (MO, SC and UA), though with AD having less influence on MO (i.e. walking) than on SC and UA, as depressive symptoms explain only a small portion of the variability in mobility scores [28]. Anxiety and depression can cause disability by worsening other symptoms or by leading to limitations in activity, e.g. lack of interest in self-care [29] and activities of daily living [30]. It has been noted, however, that the relationship with emotional well-being may be bidirectional [2, 15], because physical symptoms, impairments, activity limitations, or participation restrictions can cause anxiety and/or depression [29].
Secondly, we assume mobility (MO) to be both cause and effect in nature, e.g. pain/discomfort (PD) can cause limitations in MO, which in turn can cause changes in SC and UA. This places MO in an intermediate position between PD and the other two activity/participation dimensions [3, 31]. Temporal priority has further been indicated by a hierarchical onset of disability among elderly people, where problems with walking preceded problems with self-care (e.g. bathing and dressing) [32].
Thirdly, we consider self-care (SC) and usual activities (UA) as similar dimensions that tap into activities of daily living. However, SC is more specific in that it refers to washing and dressing, while UA has a wider scope and encompasses participation in educational, employment, and social activities. Based on this conceptual framework, a number of testable models were specified (see Figs. 1 and 2) to be explained further below.
Data
An online survey was administered in 2012 in six countries (Australia, Canada, Germany, Norway, UK, US) by a global panel company [33]. Respondents were initially asked if they had any of seven listed chronic diseases and to rate their overall health on a [0–100] visual analogue scale (VAS), where 0 represented the least desirable health and 100 represented the best possible physical, mental, and social health. Respondents qualified for the “healthy group” if they reported no chronic diseases and a VAS rating of overall health of at least 70. Respondents then completed several HRQoL instruments, including the EQ-5D-5L. Of the 7933 respondents, 6173 reported a chronic disease (arthritis, asthma, cancer, depression, diabetes, hearing loss, heart disease). For further details on respondent recruitment, see Richardson et al., 2012 [33].
Distribution of EQ-5D health states
Spearman’s rank correlations were computed across the responses to the 5 EQ-5D dimensions. Frequency distributions of EQ-5D health states were used to examine the pattern of responses across the main distinction between symptoms (causes) and activity/participation (effects). Two subscales were created from the EQ-5D items: a Symptom subscale formed by summing the PD and AD level numbers (each from level 1 to 5), and an Activity/participation subscale formed by summing the MO, SC and UA level numbers. The relationship between the two subscales is illustrated with a graph, and descriptive statistics are provided in the Appendix.
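The construction of the two subscales can be sketched as follows. This is a minimal illustration, assuming each response is stored as a five-digit EQ-5D-5L profile in the conventional order MO, SC, UA, PD, AD; the function name `subscales` and the example profiles are hypothetical and are not taken from the survey data.

```python
# Conventional EQ-5D-5L dimension order within a profile string.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def subscales(profile: str) -> tuple[int, int]:
    """Return (symptom, activity_participation) sums for one profile."""
    if len(profile) != 5 or any(ch not in "12345" for ch in profile):
        raise ValueError(f"not a valid EQ-5D-5L profile: {profile!r}")
    levels = dict(zip(DIMENSIONS, (int(ch) for ch in profile)))
    # Symptom subscale: PD + AD, so the possible range is 2..10.
    symptom = levels["PD"] + levels["AD"]
    # Activity/participation subscale: MO + SC + UA, possible range 3..15.
    activity = levels["MO"] + levels["SC"] + levels["UA"]
    return symptom, activity


# Hypothetical profiles, for illustration only.
for p in ("11111", "21232", "55555"):
    print(p, subscales(p))
```

Under this coding, full health ("11111") gives the minimum on both subscales and "55555" the maximum.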
Structural equation modelling (SEM)
Two model-testing procedures in SEM were used: confirmatory tetrad analysis (CTA) and confirmatory factor analysis (CFA). While CTA is assumed to be the best empirical approach for determining whether items should be treated as causal or effect indicators [8], agreement between the two approaches would provide more confidence in our conceptual model than either one alone [34, 35]. Both procedures investigate the path directionality between items and an underlying construct, but each has features suited to the current investigation. CFA enables testing of the hypothesised intermediate position of mobility between PD and the underlying construct, while CTA allows comparison of models that are not nested in the standard log-likelihood ratio (LR) test, but nested according to the implied vanishing tetrads (explained below).
Confirmatory tetrad analysis
CTA seeks to determine whether items of a latent variable should be treated as causal or effect indicators [34, 36]. While a parameter estimator such as the maximum likelihood (ML) method is usually applied when testing general SEM, the CTA test does not estimate parameters, but only tests model fit using Chi-square (χ2). The CTA test statistic depends on the tetrads produced by a model. Following Bollen and Ting [36], consider a latent variable indicated by four observed items (x1–x4). The effect of the latent variable on the items can be written as Eq. 1:
$$ {x}_i={\lambda}_{i1}{\xi}_1+{\delta}_i $$
(1)
where δi is the random measurement error (disturbance) term with E(δi) = 0 for all i, COV(δi, δj) = 0 for i ≠ j, and COV(ξ1, δi) = 0 for all i. The population covariances (σij) of the observed items are given as Eq. 2 below:
$$ {\upsigma}_{\mathrm{ij}}={\lambda}_{i1}{\lambda}_{j1}\phi $$
(2)
where σij is the population covariance of items i and j (i ≠ j), and ϕ is the variance of ξ1.
A tetrad is ‘the difference between the product of a pair of covariances and the product of another pair among four random variables’ (Bollen & Ting, 2000, p.5) [34]. Thus, the four observed items produce six covariances, which can be arranged into three tetrads using Kelley’s notation [37], i.e.
$$ {\displaystyle \begin{array}{c}{\uptau}_{1234}={\upsigma}_{12}{\upsigma}_{34}-{\upsigma}_{13}{\upsigma}_{24}\\ {}{\uptau}_{1342}={\upsigma}_{13}{\upsigma}_{42}-{\upsigma}_{14}{\upsigma}_{32}\\ {}{\uptau}_{1423}={\upsigma}_{14}{\upsigma}_{23}-{\upsigma}_{12}{\upsigma}_{43}\end{array}} $$
(3)
where τijkl is the population tetrad that refers to σijσkl – σikσjl. If the tetrad equals zero, that is τijkl = 0, it is referred to as a vanishing tetrad. Hence, if the four observed items were effect indicators, the model would imply three vanishing tetrads (i.e. all tetrads in Eq. 3 should equal 0). Furthermore, vanishing tetrads implied by a model include redundant vanishing tetrads (i.e. any two of the vanishing tetrads in Eq. 3 would imply the third) [34]. Therefore, only two vanishing tetrads are non-redundant. Redundant vanishing tetrads should be excluded from the test. This exclusion makes the covariance matrix of the tetrads, which is part of the test statistic, non-singular, so that its inverse exists. For a theoretical background on the tetrad, see [36].
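As a short worked check (added here for illustration, using only Eqs. 2 and 3), substituting the model-implied covariances of Eq. 2 into the first tetrad of Eq. 3 shows why it vanishes when all four items are effect indicators:

$$ {\uptau}_{1234}=\left({\lambda}_{11}{\lambda}_{21}\phi \right)\left({\lambda}_{31}{\lambda}_{41}\phi \right)-\left({\lambda}_{11}{\lambda}_{31}\phi \right)\left({\lambda}_{21}{\lambda}_{41}\phi \right)=0 $$

The same substitution gives zero for the other two tetrads. The redundancy follows from the symmetry of covariances (σij = σji): the three tetrads in Eq. 3 sum to zero identically, whatever the model, so if any two vanish the third must vanish as well.

$$ {\uptau}_{1234}+{\uptau}_{1342}+{\uptau}_{1423}=0 $$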
Regardless of the number of observed items, only four random variables are considered at a time (giving tetrads built from four covariances, e.g. σ12, σ34, σ13 and σ24), and this process is repeated for all combinations of the observed items. For every foursome of items, there are three possible vanishing tetrads. Considering an all-effect model with five observed variables (e.g. one item for each of the 5 EQ-5D dimensions), there will be five different combinations of four items, and each set will have three tetrads. Thus, the model would imply 15 vanishing tetrads. We could then test H0: τ = 0 against H1: τ ≠ 0 based on sample data. If the vanishing tetrads implied by the model do vanish, the model would fit well (a non-significant χ2 test), and the null hypothesis would not be rejected. If the test were highly significant, it would favour a causal indicator structure. However, if the χ2 statistic were 0 with 0 degrees of freedom, it would indicate an all-causal indicator model (as there are no model-implied non-redundant vanishing tetrads with this structure) [8].
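The enumeration for five items can be sketched in a few lines of Python. This is an illustrative sketch only: the loadings and factor variance below are hypothetical values, not estimates from the study, and the covariances are the population values implied by Eq. 2 rather than sample covariances, so no test statistic is computed.

```python
from itertools import combinations

# Hypothetical loadings and factor variance (not estimates from the study).
loadings = {"MO": 0.8, "SC": 0.7, "UA": 0.9, "PD": 0.6, "AD": 0.5}
phi = 1.5


def cov(i, j):
    """Model-implied covariance of two distinct items under Eq. 2."""
    return loadings[i] * loadings[j] * phi


def tetrads(a, b, c, d):
    """The three tetrads of Eq. 3 for the foursome (a, b, c, d)."""
    return (
        cov(a, b) * cov(c, d) - cov(a, c) * cov(b, d),
        cov(a, c) * cov(d, b) - cov(a, d) * cov(c, b),
        cov(a, d) * cov(b, c) - cov(a, b) * cov(d, c),
    )


# Every subset of four items out of the five EQ-5D dimensions.
foursomes = list(combinations(loadings, 4))
all_tetrads = [t for four in foursomes for t in tetrads(*four)]

print(len(foursomes), len(all_tetrads))
# Under an all-effect model every tetrad is zero up to rounding error.
print(all(abs(t) < 1e-12 for t in all_tetrads))
```

With sample data the tetrads would not be exactly zero, which is why a formal test is needed to decide whether they vanish in the population.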
SEM models are traditionally referred to as nested when we constrain or free a set of parameters and conduct the LR test to statistically compare models. However, some models that are not nested in parameters can be nested in terms of vanishing tetrads. That is, models are nested ‘if the model-implied non-redundant vanishing tetrads from one model are contained within the set of implied non-redundant vanishing tetrads from the other model’ ([8], p.1532). When models are compared (i.e. nested), a χ2- difference test is formed, and a highly significant p-value would provide support for the model with fewest implied vanishing tetrads.
Three alternative models were developed for the CTA of EQ-5D dimensions (Fig. 1). Model 1 tested for any causal pattern, where all 5 EQ-5D items were treated as effect indicators, indicated by the arrows pointing away from the HRQoL construct. Models 2 and 3 are multiple indicators multiple causes (MIMIC) models: Model 2 tested whether symptom items (PD and AD) should be treated as causal indicators (indicated by the arrows pointing from the items to the HRQoL construct) and activity/participation items (MO, SC and UA) as effect indicators. Model 3 treated symptom items (PD and AD) and mobility (MO) as causal indicators, and SC and UA as effect indicators. A bootstrap tetrad test was used to minimize the problem of non-normality [38].
As explained above, an all-effect indicator model with the 5 EQ-5D items (Model 1) would imply 15 vanishing tetrads. However, a model specifying only the three activity/participation items as effect indicators (Model 2) would imply only nine vanishing tetrads (a subset of the 15 vanishing tetrads). As illustrated in Bollen and Ting [34], this model implies nine tetrads because we always consider four random variables at a time, and any foursome of the items in Model 2, which has three effect indicators, would imply either three or one vanishing tetrads. Removing one causal indicator always leaves three items specified as effect indicators, whereas removing one effect indicator always leaves two items specified as effect indicators. A foursome that includes three or four effect indicators implies three vanishing tetrads (i.e. they are tetrad equivalent, which means they cannot be distinguished in terms of vanishing tetrads), while a foursome with two effect indicators implies only one vanishing tetrad. Considering Model 2 with three effect indicators and two causal indicators, the five subsets of four items would produce nine model-implied vanishing tetrads: each of the two foursomes formed by removing a causal indicator implies three vanishing tetrads (3 + 3), and each of the three foursomes formed by removing an effect indicator implies one vanishing tetrad (1 + 1 + 1).
Following a similar procedure, Model 3 implies three vanishing tetrads. Note that a model with only one effect indicator has zero vanishing tetrads [34]. Both Model 2 and Model 3 could be compared with the all-effect indicator model with a nested CTA using a χ2 difference test. If this test is highly significant, the model with the fewest vanishing tetrads would be favoured. In this scenario, the test is against the appropriateness of the additional vanishing tetrads implied by the all-effect indicator model. Note that models that are not nested in the standard LR test can be nested in CTA. For instance, Model 3 in CTA has fewer vanishing tetrads than Model 2 and is therefore nested in Model 2. CTA is estimated using the Stata user command referred to as “tetrad” [39].
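The counts for the three models can be reproduced with a short sketch that applies the per-foursome rule stated in the text (three vanishing tetrads for three or four effect indicators, one for two effect indicators). It assumes, following the note on single-effect-indicator models, that a foursome with fewer than two effect indicators implies none. The function name is hypothetical, and the counts include redundant tetrads.

```python
from itertools import combinations

ITEMS = ("MO", "SC", "UA", "PD", "AD")


def implied_vanishing_tetrads(effect_items):
    """Count model-implied vanishing tetrads over all foursomes of ITEMS."""
    total = 0
    for four in combinations(ITEMS, 4):
        n_effect = sum(item in effect_items for item in four)
        if n_effect >= 3:
            total += 3  # three or four effect indicators: all three vanish
        elif n_effect == 2:
            total += 1  # two effect indicators: one vanishing tetrad
        # Assumption: fewer than two effect indicators implies none.
    return total


# Effect indicators in each model; the remaining items are causal indicators.
models = {
    "Model 1": {"MO", "SC", "UA", "PD", "AD"},
    "Model 2": {"MO", "SC", "UA"},
    "Model 3": {"SC", "UA"},
}
counts = {name: implied_vanishing_tetrads(eff) for name, eff in models.items()}
print(counts)
```

The resulting counts of 15, 9 and 3 match those given in the text for Models 1, 2 and 3.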
Confirmatory factor analysis
The models in Fig. 1 can be tested using CFA. Furthermore, a MIMIC model illustrated in Fig. 2 specified the hypothesized relationships among EQ-5D dimensions where MO has an intermediate position. Due to the uncertain nature of AD and the investigation of reversed causality, alternative models were also specified (not illustrated).
Maximum likelihood (ML) estimation is considered robust when using non-continuous data [40,41,42] or data that violate multivariate normality assumptions [43,44,45]. However, since ML can be affected by deviation from normality [46], bootstrap standard errors (with 1000 bootstrap draws) were used [47]. Model fit to data was examined using fit indices, i.e. the comparative fit index (CFI), the Tucker-Lewis index (TLI), root-mean square error of approximation (RMSEA), standardized root-mean square residual (SRMR), Akaike information criterion (AIC) and sample-size adjusted Bayesian information criterion (SABIC). CFI and TLI values greater than 0.95, and SRMR less than 0.08 represent a well-fitting model [48]. While RMSEA less than 0.05 is considered to reflect a good fit [49], values as high as 0.08 reflect adequate fit [50]. AIC and SABIC are only meaningful when different models are compared, and models with the lowest values are those with the best fit.
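The cut-offs above can be collected into a small helper for screening model output. This is a sketch under stated assumptions: the function name, the label "poor" for RMSEA above 0.08, and the example values are all hypothetical; only the thresholds themselves come from the text.

```python
def assess_fit(cfi, tli, rmsea, srmr):
    """Apply the cut-offs quoted in the text to a set of fit indices."""
    if rmsea < 0.05:
        rmsea_label = "good"
    elif rmsea <= 0.08:
        rmsea_label = "adequate"
    else:
        rmsea_label = "poor"  # label chosen here, not taken from the text
    return {
        "CFI": cfi > 0.95,
        "TLI": tli > 0.95,
        "SRMR": srmr < 0.08,
        "RMSEA": rmsea_label,
    }


# Hypothetical fit indices, for illustration only.
example = assess_fit(cfi=0.97, tli=0.96, rmsea=0.06, srmr=0.03)
print(example)
```

AIC and SABIC are left out of the helper because they have no absolute threshold and are only interpreted by comparing models.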
Statistical analyses were performed in Stata version 14.0 (StataCorp LP), except the path analyses which were performed with Mplus version 6.11.