Exploring the causal and effect nature of EQ-5D dimensions: an application of confirmatory tetrad analysis and confirmatory factor analysis

Background The relationship between the various items in an HRQoL instrument is a key aspect of interpreting and understanding preference weights. The aims of this paper were i) to use theoretical models of HRQoL to develop a conceptual framework for causal and effect relationships among the five dimensions of the EQ-5D instrument, and ii) to empirically test this framework. Methods A conceptual framework depicts the symptom dimensions [Pain/discomfort (PD) and Anxiety/depression (AD)] as causal indicators that drive a change in the effect indicators of activity/participation [Mobility (MO), Self-care (SC) and Usual activities (UA)], where MO has an intermediate position between PD and the other two effect dimensions (SC and UA). Confirmatory tetrad analysis (CTA) and confirmatory factor analysis (CFA) were used to test this framework using EQ-5D-5L data from 7933 respondents in six countries, classified as healthy (n = 1760) or in one of seven disease groups (n = 6173). Results CTA revealed the best fit for a model specifying SC and UA as effect indicators and PD, AD and MO as causal indicators. This was supported by CFA, revealing a satisfactory fit to the data: CFI = 0.992, TLI = 0.972, RMSEA = 0.075 (90% CI 0.062–0.088), and SRMR = 0.012. Conclusions The EQ-5D appears to include both causal indicators (PD and AD) and effect indicators (SC and UA). Mobility played an intermediate role in our conceptual framework, being a cause of problems with Self-care and Usual activities, but also an effect of Pain/discomfort. However, the empirical analyses of our data suggest that Mobility is mostly a causal indicator.


Background
Health-related quality of life (HRQoL) instruments comprise items that relate to various aspects of health and functioning. Previous research has attempted to classify the items included in these instruments as being causal or effect indicators of HRQoL [1]. Effect indicators (also called reflective indicators) can be seen as manifestations of an underlying construct. Thus, a change in the construct will lead to, or drive, a change in the effect indicators. In contrast, causal indicators (also called formative indicators) drive a change in the construct. There is evidence to suggest that symptoms have a strong causal component that drives a change in other items [2,3]. The research into the causal nature of various HRQoL items has been limited to disease-specific instruments. No studies have investigated causal relationships in generic preference-based measures of HRQoL, commonly referred to as health state utility (HSU) instruments [4], which have an important role in cost-effectiveness analyses that are increasingly being used to aid policy decisions. Based on theoretical models, and methodological lessons from previous research, this paper seeks to fill a knowledge gap by identifying a causal pattern in the most widely applied HSU instrument, the EQ-5D [5][6][7]. The causal pattern of items in the cancer-specific EORTC QLQ-C30 instrument has been investigated in three studies. Using applied graphical methods and cross-tabulation of response frequencies, Fayers et al. found strong evidence that physiological symptom items (e.g. nausea, memory problems, shortness of breath) were causal, while items such as poor concentration, irritability, and feeling tense were likely to be effect indicators [2]. Boehmer and Luszczynska applied confirmatory factor analysis and found satisfactory fit for a model with both causal indicators (symptoms e.g. fatigue, pain) and effect indicators (e.g. physical, role, cognitive, social, and emotional functioning) [3]. 
It was noted that physical functioning and pain might be intermediate types of indicators. Using eight EORTC QLQ-C30 items, Bollen et al. provided an example of confirmatory tetrad analysis (CTA) and concluded that symptom items (e.g. shortness of breath, problems sleeping, lack of appetite) should be treated as causal indicators, while global health status and quality of life should be treated as effect indicators [8].
Factor analysis is a common psychometric approach for investigating the relationship between items and unobserved constructs. It is one technique within structural equation modelling (SEM) and is used for scale design and validation. However, factor analysis usually assumes a set of homogeneous items and is often not appropriate if both causal and effect items are present [2]. Other SEM techniques, by contrast, incorporate causal paths to model the relationships among different types of items [9,10]. Confirmatory tetrad analysis may be the best empirical approach for determining whether items should be treated as causal or effect indicators [8]. This paper is the first to apply CTA to HSU instruments.
The aims of the current paper were: first, to develop a conceptual framework for causal and effect relationships among the five dimensions of the EQ-5D instrument based on theoretical models of HRQoL, and second, to test this framework using EQ-5D-5L data from six countries (N = 7933). More knowledge of the causal pattern is useful for at least two reasons: i) it provides a better understanding of the relative importance of the five health dimensions as reflected in the preference-based value sets, and ii) it informs the discussion on whether and how the QALY might be extended, e.g. by expanding the descriptive system to include additional symptom items (causal) or functioning items (effect).

A conceptual framework for EQ-5D dimensions
The International Classification of Functioning, Disability and Health (ICF) and the Wilson and Cleary model [11] are two recommended models for conceptualizing the relationships between dimensions in HRQoL instruments. The ICF provides a standard language and framework for describing health and health-related states and comprises two parts, each with two components [12]. Part 1 refers to functioning and disability and consists of (a) body functions and structures, and (b) activities and participation. Part 2 refers to contextual factors incorporating (a) environmental factors, and (b) personal factors. Body functions refer to physiological and psychological functions of body systems (e.g. symptoms such as pain or anxiety), while activity refers to the execution of a task or action (e.g. self-care), and participation refers to involvement in a life situation (e.g. work). The EQ-5D-3L was classified in an ICF framework [13] using linking rules [14]. Its five dimensions were classified into two ICF components, such that pain/discomfort (PD) and anxiety/depression (AD) were linked to the ICF component of body functions, while mobility (MO), self-care (SC), and usual activities (UA) were linked to the ICF component of activity and participation.
The ICF has considerable overlap with the Wilson and Cleary model [15,16] that depicts dominant causal pathways between five levels of health outcomes: biological and physiological factors, symptoms (corresponding to the ICF component of body functions and defined as the patient's perception of an abnormal physical, emotional or cognitive state), functioning (corresponding to the ICF component of activity and participation), general health perceptions, and overall quality of life. The Wilson and Cleary conceptual model has been empirically validated in populations with different health conditions [17][18][19][20][21][22][23][24].
Based on these models, we propose the following causal pattern between the five EQ-5D dimensions. Firstly, the "symptom" dimensions of pain/discomfort (PD) and anxiety/depression (AD) were assumed to be primarily causal indicators, and the "activity/participation" dimensions of mobility (MO), self-care (SC), and usual activities (UA) to be effect indicators, i.e. PD and AD cause changes in the HRQoL construct that are manifested as changes in MO, SC, and UA. Physiological symptoms such as pain and discomfort are clear drivers of activity/participation items and influence walking and self-care [25,26] and daily activities [27]. Their influence is likely to be unidirectional, as it is unlikely that a change in mobility or self-care would alter the level of pain experienced. We assume a predominantly causal link between AD and activity/participation (MO, SC and UA), though with AD having less influence on MO (i.e. walking) than on SC and UA, as depressive symptoms explain only a small portion of the variability in mobility scores [28]. Anxiety and depression can cause disability by worsening other symptoms or by leading to limitations in activity, e.g. lack of interest in self-care [29] and activities of daily living [30]. It has been noted, however, that emotional well-being may be bidirectional [2,15], because physical symptoms, impairments, activity limitations, or participation restrictions can also cause anxiety and/or depression [29].
Secondly, we assume mobility (MO) to be both cause and effect in nature, e.g. pain/discomfort (PD) can cause limitations in MO, which in turn can cause changes in SC and UA. This places MO in an intermediate position between PD and the other two activity/participation dimensions [3,31]. Temporal priority has further been indicated by a hierarchical onset of disability among elderly people, where problems with walking preceded problems with self-care (e.g. bathing and dressing) [32].
Thirdly, we consider self-care (SC) and usual activities (UA) as similar dimensions that tap into activities of daily living. However, SC is more specific in that it refers to washing and dressing, while UA has a wider scope and encompasses participation in educational, employment, and social activities. Based on this conceptual framework, a number of testable models were specified (see Figs. 1 and 2) to be explained further below.

Data
An online survey was administered in 2012 in six countries (Australia, Canada, Germany, Norway, UK, US) by a global panel company [33]. Respondents were initially asked if they had any of seven listed chronic diseases and to rate their overall health on a [0-100] visual analogue scale (VAS), where 0 represented the least desirable health and 100 represented the best possible physical, mental, and social health. Respondents qualified for the "healthy group" if they reported no chronic diseases and a VAS rating of overall health of at least 70. Respondents then completed several HRQoL instruments, including the EQ-5D-5L. Of the 7933 respondents, 6173 reported a chronic disease (arthritis, asthma, cancer, depression, diabetes, hearing loss, heart disease). For further details on respondent recruitment, see Richardson et al., 2012 [33].

Distribution of EQ-5D health states
Spearman's rank correlations were computed between the responses to the five EQ-5D dimensions. Frequency distributions of EQ-5D health states were used to examine the pattern of responses across the main distinction between symptoms (causes) and activity/participation (effects). Two subscales were created from the EQ-5D items: a Symptom subscale, formed by summing the PD and AD level numbers (each from level 1 to 5, giving a range of 2 to 10), and an Activity/participation subscale, formed by summing the MO, SC and UA level numbers (range 3 to 15). The relationship between the two subscales is illustrated with a graph, and descriptive statistics are provided in the Appendix.
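The subscale construction and the rank correlations described above can be sketched as follows. This is a minimal Python illustration, assuming health states are coded as five-digit strings in the standard EQ-5D order (MO, SC, UA, PD, AD); the helper names and the five example states are hypothetical and are not study data (the study's analyses were run in Stata). Because tied levels are very common with a five-level item, tied values receive average ranks before the Pearson correlation of the ranks is taken.

```python
from statistics import mean

# EQ-5D-5L states are five-digit strings in the fixed order MO, SC, UA, PD, AD,
# each digit a level from 1 (no problems) to 5.
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def parse_state(state: str) -> dict[str, int]:
    """Split a health state such as '11121' into its five dimension levels."""
    if len(state) != 5 or any(c not in "12345" for c in state):
        raise ValueError(f"not an EQ-5D-5L state: {state!r}")
    return {dim: int(c) for dim, c in zip(DIMENSIONS, state)}


def subscales(state: str) -> tuple[int, int]:
    """Return (Symptom, Activity/participation): PD+AD in 2..10, MO+SC+UA in 3..15."""
    lv = parse_state(state)
    return lv["PD"] + lv["AD"], lv["MO"] + lv["SC"] + lv["UA"]


def average_ranks(values) -> list[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def spearman(x, y) -> float:
    """Spearman's rho computed as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


states = ["11111", "11121", "11122", "21232", "33343"]  # hypothetical respondents
symptom, activity = zip(*(subscales(s) for s in states))
print(symptom, activity)                    # (2, 3, 4, 5, 7) (3, 3, 3, 5, 9)
print(round(spearman(symptom, activity), 3))  # 0.894
```

The same `spearman` function applied to two dimension columns (for example the MO and UA levels from `parse_state`) gives the pairwise correlations of the kind reported in Table 1.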

Structural equation modelling (SEM)
Two model-testing procedures in SEM were used: confirmatory tetrad analysis (CTA) and confirmatory factor analysis (CFA). Although CTA has been suggested to be the best empirical approach for determining whether items should be treated as causal or effect indicators [8], agreement between the two approaches would provide more confidence in our conceptual model than either one alone [34,35]. Both procedures investigate the direction of the paths between items and an underlying construct, but each has unique features that are useful for the current investigation: CFA enables testing of the hypothesised intermediate position of mobility between PD and the underlying construct, while CTA allows comparison of models that are not nested in the standard likelihood ratio (LR) test but are nested in terms of their implied vanishing tetrads (explained below).

Confirmatory tetrad analysis
CTA seeks to determine whether the items of a latent variable should be treated as causal or effect indicators [34,36]. Whereas general SEMs are usually tested with a parameter estimator such as maximum likelihood (ML), the CTA test does not estimate parameters; it only tests model fit with a chi-square (χ²) statistic that depends on the tetrads implied by a model. Following Bollen and Ting [36], consider a latent variable $\xi_1$ indicated by four observed items ($x_1$ to $x_4$). The effect of the latent variable on the items can be written as Eq. 1:

$$x_i = \lambda_i \xi_1 + \delta_i, \qquad i = 1, \dots, 4 \tag{1}$$

where $\lambda_i$ is the loading of item $i$ and $\delta_i$ is the random measurement error (disturbance) term, with $E(\delta_i) = 0$ for all $i$, $\mathrm{COV}(\delta_i, \delta_j) = 0$ for $i \neq j$, and $\mathrm{COV}(\xi_1, \delta_i) = 0$ for all $i$. The population covariances of the observed items are then given by Eq. 2:

$$\sigma_{ij} = \lambda_i \lambda_j \phi, \qquad i \neq j \tag{2}$$

where $\sigma_{ij}$ is the population covariance between items $i$ and $j$, and $\phi$ is the variance of $\xi_1$. A tetrad is 'the difference between the product of a pair of covariances and the product of another pair among four random variables' (Bollen & Ting, 2000, p.5) [34]. The four observed items produce six covariances, which can be arranged into three tetrads using Kelley's notation [37]:

$$\tau_{1234} = \sigma_{12}\sigma_{34} - \sigma_{13}\sigma_{24}, \qquad \tau_{1342} = \sigma_{13}\sigma_{42} - \sigma_{14}\sigma_{32}, \qquad \tau_{1423} = \sigma_{14}\sigma_{23} - \sigma_{12}\sigma_{43} \tag{3}$$

where $\tau_{ijkl} = \sigma_{ij}\sigma_{kl} - \sigma_{ik}\sigma_{jl}$ is the population tetrad. A tetrad that equals zero ($\tau_{ijkl} = 0$) is referred to as a vanishing tetrad. Hence, if the four observed items were effect indicators, the model would imply three vanishing tetrads (i.e. all tetrads in Eq. 3 would equal 0). However, the vanishing tetrads implied by a model include redundant ones: any two of the vanishing tetrads in Eq. 3 imply the third [34], so only two are non-redundant. Redundant vanishing tetrads should be excluded from the test, because this exclusion makes the covariance matrix of the tetrads, which enters the test statistic, non-singular, so that its inverse exists. For a theoretical background on the tetrad, see [36].
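Substituting the one-factor covariance structure $\sigma_{ij} = \lambda_i\lambda_j\phi$ ($i \neq j$) into the first Kelley tetrad shows why it vanishes for effect indicators, and adding the three tetrads (using $\sigma_{ij} = \sigma_{ji}$) shows why only two of them are non-redundant:

$$\tau_{1234} = (\lambda_1\lambda_2\phi)(\lambda_3\lambda_4\phi) - (\lambda_1\lambda_3\phi)(\lambda_2\lambda_4\phi) = \lambda_1\lambda_2\lambda_3\lambda_4\phi^2 - \lambda_1\lambda_2\lambda_3\lambda_4\phi^2 = 0$$

$$\tau_{1234} + \tau_{1342} + \tau_{1423} = (\sigma_{12}\sigma_{34} - \sigma_{13}\sigma_{24}) + (\sigma_{13}\sigma_{24} - \sigma_{14}\sigma_{23}) + (\sigma_{14}\sigma_{23} - \sigma_{12}\sigma_{34}) = 0$$

The second identity holds for any covariance matrix, so whenever two of the three tetrads vanish, the third must vanish as well.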
Regardless of the number of observed items, only four items are considered at a time (e.g. the tetrad $\tau_{1234}$ involves the covariances $\sigma_{12}$, $\sigma_{34}$, $\sigma_{13}$ and $\sigma_{24}$), and this process is repeated for all combinations of four observed items. Each foursome has three possible vanishing tetrads. For an all-effect model with five observed variables (e.g. one item for each of the five EQ-5D dimensions), there are five different combinations of four items, each with three tetrads, so the model implies 5 × 3 = 15 vanishing tetrads. The hypotheses $H_0: \tau = 0$ and $H_1: \tau \neq 0$, where $\tau$ denotes the model-implied vanishing tetrads, can then be tested with sample data. If the vanishing tetrads implied by the model do vanish, the χ² test is non-significant, the null hypothesis is not rejected, and the model fits well. A highly significant test rejects the all-effect specification and would favour a structure that includes causal indicators. A χ² of 0 with 0 degrees of freedom corresponds to an all-causal indicator model, as this structure implies no non-redundant vanishing tetrads [8].
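To make the counting concrete, the sketch below builds the covariance matrix implied by a one-factor, all-effect model for five items (Eq. 2, plus a diagonal of unique variances), enumerates every foursome, and evaluates the three Kelley tetrads for each. The loadings, factor variance and error variance are arbitrary assumed values, not estimates from the EQ-5D data. As a contrast, it then adds an unmodelled covariance between two items (for example, correlated errors) and counts how many tetrads stop vanishing.

```python
from itertools import combinations

# Hypothetical parameters, chosen only for illustration (one item per EQ-5D dimension).
LOADINGS = [0.8, 0.7, 0.9, 0.6, 0.75]
PHI = 1.5          # variance of the latent variable xi_1
ERROR_VAR = 0.4    # unique variance on the diagonal (never enters a tetrad)


def implied_cov(loadings, phi, error_var):
    """Covariances of a one-factor, all-effect model: sigma_ij = lambda_i * lambda_j * phi."""
    n = len(loadings)
    return [[loadings[i] * loadings[j] * phi + (error_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def kelley_tetrads(cov, foursome):
    """The three tetrads tau_ijkl, tau_iklj, tau_iljk for items (i, j, k, l)."""
    i, j, k, l = foursome
    s = lambda a, b: cov[a][b]
    return [
        s(i, j) * s(k, l) - s(i, k) * s(j, l),
        s(i, k) * s(l, j) - s(i, l) * s(k, j),
        s(i, l) * s(j, k) - s(i, j) * s(l, k),
    ]


def all_tetrads(cov):
    """Three tetrads for every foursome of items: 3 * C(n, 4) values in total."""
    return [t for quad in combinations(range(len(cov)), 4)
            for t in kelley_tetrads(cov, quad)]


cov = implied_cov(LOADINGS, PHI, ERROR_VAR)
taus = all_tetrads(cov)
print(len(taus))                          # 15 = 5 foursomes x 3 tetrads
print(max(abs(t) for t in taus) < 1e-12)  # True: every tetrad vanishes up to rounding

# Add a covariance between items 0 and 1 that the factor does not explain;
# only the six tetrads that contain sigma_01 stop vanishing.
cov[0][1] += 0.2
cov[1][0] += 0.2
print(sum(abs(t) > 1e-12 for t in all_tetrads(cov)))  # 6
```

In a real test the sample tetrads are never exactly zero, which is why the χ² statistic weighs them against their sampling covariance rather than checking for exact zeros.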
Structural equation models are traditionally referred to as nested when one can be obtained from the other by constraining or freeing a set of parameters, and such models are compared statistically with the LR test. However, some models that are not nested in terms of parameters can be nested in terms of vanishing tetrads. That is, models are nested 'if the model-implied non-redundant vanishing tetrads from one model are contained within the set of implied non-redundant vanishing tetrads from the other model' ([8], p.1532). When nested models are compared, a χ²-difference test is formed, and a highly significant p-value supports the model with the fewest implied vanishing tetrads.
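The arithmetic of a χ²-difference test between two nested models can be sketched as below. This is a hedged, standard-library-only illustration: `chi2_sf` is a hypothetical helper that computes the chi-square upper-tail probability for integer degrees of freedom (in practice `scipy.stats.chi2.sf` would normally be used), and the χ² values in the usage line are invented. In the tetrad test the degrees of freedom correspond to the number of non-redundant vanishing tetrads tested, so the difference in degrees of freedom reflects the additional tetrads implied by the more restrictive model. The sketch does not reproduce the internals of the Stata tetrad command.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(x; 1) = erfc(sqrt(x/2)), Q(x; 2) = exp(-x/2) and the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    k = 1 if df % 2 else 2
    q = math.erfc(math.sqrt(half)) if k == 1 else math.exp(-half)
    while k < df:
        # Work on the log scale so large chi-square values do not overflow.
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Difference test: the restricted model implies more vanishing tetrads (more df)."""
    diff = chi2_restricted - chi2_general
    ddf = df_restricted - df_general
    return diff, ddf, chi2_sf(diff, ddf)


diff, ddf, p = chi2_difference_test(40.0, 10, 8.0, 4)  # hypothetical values
print(diff, ddf, f"{p:.2e}")  # 32.0 6 1.63e-05
```

A small p-value, as here, would reject the extra vanishing tetrads of the restrictive model and favour the model with fewer implied vanishing tetrads.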
Three alternative models were developed for the CTA of EQ-5D dimensions (Fig. 1). Model 1 tested for any causal pattern, where all 5 EQ-5D items were treated as effect indicators, indicated by the arrows pointing away from the HRQoL construct. Models 2 and 3 are multiple cause multiple indicator (MIMIC) models: Model 2 tested whether symptom items (PD and AD) should be treated as causal indicators (indicated by the arrows pointing from the items to the HRQoL construct) and activity/participation items (MO, SC and UA) as effect indicators. Model 3 treated symptom items (PD and AD) and mobility (MO) as causal indicators, and SC and UA as effect indicators. A bootstrap tetrad test was used to minimize the problem of non-normality [38].
As explained above, an all-effect indicator model with the five EQ-5D items (Model 1) would imply 15 vanishing tetrads. A model specifying only the three activity/participation items as effect indicators (Model 2) would imply only nine vanishing tetrads, a subset of the 15. As illustrated by Bollen and Ting [34], this count follows from considering four items at a time. A foursome that includes three or four effect indicators implies three vanishing tetrads (such foursomes are tetrad equivalent, meaning that they cannot be distinguished in terms of vanishing tetrads), while a foursome with only two effect indicators implies a single vanishing tetrad. In Model 2, removing either of the two causal indicators always leaves a foursome with three effect indicators, whereas removing any one of the three effect indicators always leaves a foursome with two. Across the five foursomes, Model 2 therefore implies 3 + 3 (foursomes without one causal indicator) + 1 + 1 + 1 (foursomes without one effect indicator) = 9 vanishing tetrads.
Following a similar procedure, Model 3 implies three vanishing tetrads: each of the three foursomes that omit one causal indicator contains two effect indicators (SC and UA) and implies one vanishing tetrad, whereas the two foursomes that omit an effect indicator contain only one effect indicator and imply none. Note that a model with only one effect indicator implies no vanishing tetrads [34]. Both Model 2 and Model 3 can be compared with the all-effect indicator model in a nested CTA using a χ² difference test. If this test is highly significant, the model with the fewest vanishing tetrads is favoured; in this scenario, the test is against the appropriateness of the additional vanishing tetrads implied by the all-effect indicator model. Models that are not nested in the standard LR test can thus be nested in CTA: for instance, the three vanishing tetrads implied by Model 3 are a subset of the nine implied by Model 2, so the two models are nested in the CTA sense. CTA was performed with the user-written Stata command tetrad [39].
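The counting rules above can be checked mechanically. The sketch below is a minimal illustration that assumes a single latent HRQoL variable and identifies each tetrad by its foursome together with the one pairing of the four items that it omits (each Kelley tetrad omits exactly one of the three ways to split four items into two pairs). Under that assumption, a foursome with two causal and two effect indicators implies only the tetrad that omits the (cause, cause) + (effect, effect) pairing. It lists the vanishing tetrads implied by Models 1 to 3, confirms the 15, 9 and 3 counts, and checks the subset relations behind the nested comparisons. Like the text, it counts all implied vanishing tetrads, including redundant ones, and computes no test statistic.

```python
from itertools import combinations

ITEMS = ("MO", "SC", "UA", "PD", "AD")

# Effect indicators under each CTA model in Fig. 1; all other items are causal.
MODELS = {
    "Model 1": {"MO", "SC", "UA", "PD", "AD"},
    "Model 2": {"MO", "SC", "UA"},
    "Model 3": {"SC", "UA"},
}


def pairings(quad):
    """The three ways to split four items into two pairs; each tetrad omits one of them."""
    a, b, c, d = quad
    return [
        frozenset({frozenset({a, b}), frozenset({c, d})}),
        frozenset({frozenset({a, c}), frozenset({b, d})}),
        frozenset({frozenset({a, d}), frozenset({b, c})}),
    ]


def implied_vanishing_tetrads(effects):
    """Set of (foursome, omitted pairing) tetrads that a one-factor MIMIC model implies vanish."""
    implied = set()
    for quad in combinations(ITEMS, 4):
        e = [x for x in quad if x in effects]
        c = [x for x in quad if x not in effects]
        if len(e) >= 3:    # three or four effect indicators: all three tetrads vanish
            implied.update((frozenset(quad), p) for p in pairings(quad))
        elif len(e) == 2:  # two effect indicators: one tetrad vanishes
            implied.add((frozenset(quad), frozenset({frozenset(e), frozenset(c)})))
        # zero or one effect indicator: no vanishing tetrad
    return implied


sets = {name: implied_vanishing_tetrads(eff) for name, eff in MODELS.items()}
for name, s in sets.items():
    print(name, len(s))  # Model 1 15, Model 2 9, Model 3 3

print(sets["Model 3"] <= sets["Model 2"] <= sets["Model 1"])  # True
```

The subset check is what licenses the nested χ² comparisons even though Models 2 and 3 are not nested in the usual parameter sense.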

Confirmatory factor analysis
The models in Fig. 1 can also be tested using CFA. In addition, a MIMIC model (Model 4), illustrated in Fig. 2, specified the hypothesized relationships among the EQ-5D dimensions in which MO has an intermediate position.
Given the uncertain nature of AD, and to investigate reverse causality, further alternative models were specified (not illustrated).
Maximum likelihood (ML) estimation is considered robust when using non-continuous data [40][41][42] or data that violate multivariate normality assumptions [43][44][45]. However, since ML can be affected by deviations from normality [46], bootstrap standard errors (with 1000 bootstrap draws) were used [47]. Model fit to the data was examined using the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root-mean square error of approximation (RMSEA), the standardized root-mean square residual (SRMR), the Akaike information criterion (AIC) and the sample-size adjusted Bayesian information criterion (SABIC). CFI and TLI values greater than 0.95, and SRMR values less than 0.08, represent a well-fitting model [48]. While an RMSEA of less than 0.05 is considered to reflect a good fit [49], values as high as 0.08 reflect adequate fit [50]. AIC and SABIC are only meaningful when different models are compared, with the lowest values indicating the best fit. Statistical analyses were performed in Stata version 14.0 (StataCorp LP), except the path analyses, which were performed with Mplus version 6.11.
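The fit indices listed above follow standard formulas, sketched below in Python for reference. This is an illustrative sketch, not the Mplus or Stata implementation: RMSEA is written with the N - 1 denominator, which some programs replace with N; CFI and TLI compare the fitted model with a baseline (independence) model; and SRMR is taken as given because it requires the residual covariance matrix. The function names are hypothetical, and the final line applies the cutoffs cited above to the Model 3 values reported in the Abstract.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA with the (N - 1) denominator; some software uses N instead."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


def aic(loglik, k):
    """Akaike information criterion for k free parameters."""
    return -2.0 * loglik + 2.0 * k


def sabic(loglik, k, n):
    """Sample-size adjusted BIC: ln(N) in BIC is replaced by ln((N + 2) / 24)."""
    return -2.0 * loglik + k * math.log((n + 2) / 24)


def meets_cutoffs(cfi_v, tli_v, rmsea_v, srmr_v):
    """Cutoffs cited in the text: CFI, TLI > 0.95; SRMR < 0.08; RMSEA < 0.05 good, <= 0.08 adequate."""
    return {
        "CFI": cfi_v > 0.95,
        "TLI": tli_v > 0.95,
        "SRMR": srmr_v < 0.08,
        "RMSEA good": rmsea_v < 0.05,
        "RMSEA adequate": rmsea_v <= 0.08,
    }


print(meets_cutoffs(0.992, 0.972, 0.075, 0.012))
# {'CFI': True, 'TLI': True, 'SRMR': True, 'RMSEA good': False, 'RMSEA adequate': True}
```

Because AIC and SABIC depend on the log-likelihood scale of a given data set, only differences between models fitted to the same data are interpretable, which matches the caveat in the text.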

Results
Respondent characteristics (age, sex, education, and disease group) are provided in Tables 4 and 5 in the Appendix. The healthy respondents and those reporting a chronic disease were similar in sex and education, but, as could be expected, those with a chronic disease were older. As shown in Table 1, the highest Spearman's rank correlation was between MO and UA (0.73) and the lowest was between AD and MO (0.26), which is consistent with our conceptual model. The correlation between PD and SC was lower than those between PD and MO and between PD and UA. Table 2 shows the frequency distribution of EQ-5D-5L health states in terms of decrements in symptom items or activity/participation items. Excluding those who reported full health (health state 11111), the most prevalent states were three that had only slight decrements in PD and/or AD: 11121 (slight pain/discomfort), 11122 (slight pain/discomfort and slight anxiety/depression), and 11112 (slight anxiety/depression). Together, these three accounted for more than one-third (34.9%) of all reported non-perfect health states. When all health states with decrements in symptoms but no decrements in activity/participation (i.e. MO + SC + UA = 3, PD + AD > 2) were included, 47% (3031 respondents) of those not in full health were covered. In contrast, only 1.5% (94 respondents) of those not in full health reported decrements in activity/participation without any decrements in symptoms (i.e. MO + SC + UA > 3, PD + AD = 2), suggesting that symptoms precede problems with activity/participation. Figure 3 shows the relationship between increases in the summary score of the symptom items (from 2 to 10 on the horizontal axis) and the corresponding summary score of the activity/participation items (from 3 to 15 on the vertical axis). The corresponding data are shown in Table 6 in the Appendix.
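The health-state classification used for Table 2 can be expressed as a small function. The sketch below assumes the standard EQ-5D-5L digit order (MO, SC, UA, PD, AD); the class labels, function names and example states are hypothetical. Its last line checks the reported percentages against a denominator that excludes the 1530 respondents in full health (11111) mentioned later in the Results.

```python
def classify(state: str) -> str:
    """Classify an EQ-5D-5L state (digits in the order MO, SC, UA, PD, AD)."""
    mo, sc, ua, pd, ad = (int(c) for c in state)
    activity, symptom = mo + sc + ua, pd + ad
    if activity == 3 and symptom == 2:
        return "full health"
    if activity == 3:    # PD + AD > 2, no activity/participation decrement
        return "symptoms only"
    if symptom == 2:     # MO + SC + UA > 3, no symptom decrement
        return "activity/participation only"
    return "both"


def shares(states):
    """Percentage of respondents in each class, excluding those in full health."""
    counts = {}
    for s in states:
        label = classify(s)
        counts[label] = counts.get(label, 0) + 1
    denom = len(states) - counts.get("full health", 0)
    return {k: 100 * v / denom for k, v in counts.items() if k != "full health"}


example = ["11111", "11121", "11122", "21121", "21111", "11112"]  # hypothetical
print(shares(example))

non_full = 7933 - 1530  # respondents not reporting 11111
print(round(100 * 3031 / non_full, 1), round(100 * 94 / non_full, 1))  # 47.3 1.5
```

With the full sample of 7933 as the denominator, 3031 and 94 respondents would instead correspond to about 38% and 1.2%.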
The results indicate that increasing pain/discomfort and anxiety/depression are associated with increasing problems with mobility, self-care and usual activities, but that problems on these activity/participation items appear to lag behind the symptoms. This supports the suggestion from Table 2 that symptoms precede problems with activity/participation.
The results of the CFA are presented in Table 3. Model 1 and Model 2 produced poor fit to the data, while Model 3 produced satisfactory model fit based on CFI, TLI, RMSEA, and SRMR. These results are in line with the finding from CTA that Model 3 produced a better fit than the first two models. Model 4 (only tested with CFA) produced a satisfactory fit similar to Model 3. However, the information criteria AIC and SABIC indicate that Model 3 is the preferred one.
An alternative model specifying AD as an effect indicator alongside SC and UA did not produce a good fit with either CTA (χ² = 927.93, df = 6, p < 0.0001) or CFA (CFI = 0.965; TLI = 0.922; RMSEA = 0.122; SRMR = 0.026). Further models investigated other specifications of the interrelationships among the three causal indicators (MO, PD and AD) in Model 4, including PD causing AD (and the reverse), PD causing AD and MO, and PD causing AD and MO with MO also a cause of AD. All these models fitted poorly compared with the chosen model (results not reported here). The main CTA and CFA analyses were performed on the full sample (N = 7933); removing the 1530 respondents reporting full health (11111) produced similar results.

Discussion
We developed a conceptual framework for an empirical investigation of the causal and effect nature of EQ-5D dimensions. Based on theoretical models of HRQoL, the dimensions were classified either as symptoms, and thus causal indicators (PD and AD), or as activities/participation, and thus effect indicators (MO, SC and UA) [2,12,15]. Empirically, SC and UA acted as effect indicators, while MO, PD and AD appeared to be causal in nature, driving changes in SC and UA. Although MO could play an intermediate role, as depicted in Fig. 2, the results suggest that MO is predominantly causal.
There are reasons to believe that the role of AD might vary with the severity of anxiety or depression. At moderate or severe levels (levels 3 to 5), AD could reflect a clinical symptom that may cause dysfunction (MO, SC, UA) and typically requires treatment. At a mild level (level 2), it could reflect subjective well-being, which may vary with personality traits (e.g. optimism vs pessimism, or level of neuroticism), and thus act more as an effect variable, in line with the finding that emotional well-being in the EORTC QLQ-C30 was an effect variable [3]. Further investigation of the various disease groups might indicate whether the causal nature of AD is disease-specific.
Our observation of a causal pattern across EQ-5D dimensions supports the need for preference weighting [2]. The EQ-5D-5L value sets based on population preferences in four western countries (Canada, England, Spain, the Netherlands) [52][53][54][55] reveal striking similarities in the relative importance of the five dimensions. The dimensions that our conceptual model classified as causal indicators (PD and AD) have similar preference weights, and each is on average 50% stronger than each of the three effect indicators (MO, SC, UA): the sum of the weights of the two symptom dimensions equals the sum of the weights of the three functioning dimensions, so a symptom dimension carries on average one-half of that total and a functioning dimension one-third, a ratio of 3:2. One possible reason why the two causal dimensions matter more to people than the three effect dimensions is that it may be easier to adapt to functional impairments than to pain/discomfort and anxiety/depression.
The current findings may be useful when exploring additional dimensions that could act as 'bolt-ons' to the five core EQ-5D dimensions. While these five dimensions have proved relevant to patients across the spectrum of diagnoses and to the general population, the EuroQol Group has been investigating whether additional dimensions such as vision, tiredness, or sleep could enhance the instrument's performance in some settings [56]. An interesting question is whether an HSU instrument like the EQ-5D should broaden its operationalization of the HRQoL concept in the direction of effect dimensions (e.g. social connections/network or general well-being) or in the direction of causal dimensions (e.g. vision or tiredness). Most quality of life instruments include both causal and effect indicators [57].
Causal indicators are important to measure because they affect HRQoL [2] and are often treated to avoid disruption of HRQoL. This is the rationale behind many healthcare interventions (e.g. treating arthritic pain to enable a person to continue working).
Some limitations of the analyses presented here should be acknowledged. First, the multi-instrument comparison (MIC) study is based on respondents who volunteered to participate, which may introduce self-selection bias. Second, it is difficult to claim causality from cross-sectional data. Third, CTA is primarily intended to test for model misspecification, and a rejected effect-indicator specification does not necessarily mean that the indicators are causal rather than effect indicators [35]. Future research should ideally use panel data, which would better illustrate the expected temporal relationship between causal and effect dimensions.

Conclusion
Based on theoretical models of HRQoL, we developed a conceptual framework for causal and effect relationships among the five dimensions of the EQ-5D instrument. Empirical testing on EQ-5D-5L data from a large multinational survey provided supporting evidence that the EQ-5D comprises both causal variables (Mobility, Pain/discomfort, Anxiety/depression) and effect variables (Self-care and Usual activities).

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Authors' contributions TGK analyzed and interpreted the data. CG and JAO were major contributors in writing the manuscript. All authors read and approved the final manuscript.
Ethics approval and consent to participate Data for this study were obtained from the multi-instrument comparison study which was approved by the Monash University Human Research Ethics Committee (Project numbers: CF11/1758-2011000974 and CF11/3192-2011001748).

Consent for publication
Not applicable.

Competing interests
CG is a member of the EuroQol Group. The other authors declare that they have no competing interests.