Skip to main content

Preference heterogeneity in health valuation: a latent class analysis of the Peru EQ-5D-5L values



Preference heterogeneity in health valuation has become a topic of greater discussion among health technology assessment agencies. To better understand heterogeneity within a national population, valuation studies may identify latent groups that place different absolute and relative importance (i.e., scale and taste parameters) on the attributes of health profiles.


Using discrete choice responses from a Peruvian valuation study, we estimated EQ-5D-5L values on a quality-adjusted life-year (QALY) scale accounting for latent heterogeneity in scale and taste, as well as controlling heteroskedasticity at task level variation.


We conducted a series of latent class analyses, each including the 20 main effects of the EQ-5D-5L and a power function that relaxes the constant proportionality assumption (i.e., discounting) between value and lifespan. Taste class membership was conditional on respondent-specific characteristics and their experience with the composite time trade-off (cTTO) tasks. Scale class membership was conditional on behavioral characteristics such as survey duration and self-stated difficulty level in understanding tasks. Each analysis allowed the scale factor to vary by task type and completion time (i.e., heteroskedasticity).


The results indicated three taste classes: a quality-of-life oriented class (33.35%) that placed the highest value on levels of severity, a length-of-life oriented class (26.72%) that placed the highest value on lifespan, and a middle class (39.71%) with health attribute effects lower than the quality class and lifespan effect lower than the length-of-life oriented class. The EQ-5D-5L values ranged from − 2.11 to 0.86 (quality-of-life oriented class), from − 0.38 to 1.02 (middle class), and from 0.36 to 1.01 (length-of-life oriented class). The likelihood of being a member of the quality-of-life class was highly dependent on whether the respondent completed the cTTO tasks (p-value < 0.001), which indicated that the cTTO tasks might cause the Peru respondents to inflate the burden of health problems on a QALY scale compared to those who did not complete the cTTO tasks. The results also showed two scale classes as well as heteroskedasticity within each scale class.


Accounting for taste and scale classes simultaneously improveds understanding of preference heterogeneity in health valuation. Future studies may confirm the differences in taste between classes in terms of the effect of quality of life and lifespan attributes. Furthermore, confirmatory evidence is needed on how behavioral variables captured within a study protocol may enhance analyses of preference heterogeneity.


In economic evaluation, valuing health outcomes on a quality-adjusted life-year (QALY) scale is a standard practice in cost utility analysis (CUA) to enable comparisons across different healthcare interventions. Many forms of disease-specific or generic instruments have been adapted to describe domains of health-related quality of life (HRQoL) in health valuation studies [1]. Using QALYs, health valuation studies assess the value of health profiles relative to health-related quality and length of life.

The EQ-5D-5L is a widely used preference-based instrument to measure and value people's health-related quality of life (HRQoL). Corresponding to the five dimensions of health, this descriptive system has five attributes: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. In the latest version of the EQ-VT protocol, each attribute has five levels—no problems, slight problems, moderate problems, severe problems, and extreme problems/unable [2].

According to EuroQol Valuation Technology (EQ-VT) protocol (2.0), the primary method to collect preference evidence on EQ-5D-5L profiles is the composite time trade-off (cTTO) technique [3]. The protocol also included ordinal tasks (i.e., paired comparisons) that were designed to overcome some of the limitations of the cTTO. For example, the cTTO method assumes that respondents do not discount their future health (e.g., pain today has the same effect on choice as pain ten years from now) [4]. Also, in paired comparisons, respondents need to indicate a preference with a discrete set of options that are likely to be less burdensome than cTTO questions that require a respondent to match two health profiles by trading off life years [5]. The inclusion of ordinal tasks within the EQ-VT protocol followed a significant period of piloting and testing of different approaches [3]. This paper examines the ordinal responses collected during the Peru EQ-5D-5L valuation study, specifically to explore the heterogeneity in EQ-5D-5L values on a QALY scale [6].

In the analysis of preference evidence, heteroskedasticity and heterogeneity have long been identified as potential issues that need to be further explored. Preference and behaviors vary within a sample, and different methodologies have been adapted to address those issues in analyzing and interpreting preference evidence [7]. Similarly to the presence of individual preference differences, heteroskedasticity, another source of outcome variability, refers to differences in the variance of the response variable attributable to observable (e.g., task-level) factors. It is important to point out that heteroskedasticity should not be considered a form of preference heterogeneity in case the differences in response variability are induced solely by task characteristics rather than by individual differences in utility valuations.

When latent, preference heterogeneity refers to the extent of variation of individuals' tastes caused by sources that are not observed by the researchers. The effect of these unobservable sources on preferences can be channeled through different latent pathways. First, respondents systematically like or dislike different alternatives that reflect the relative importance of the attributes, and individuals with similar relative importance can be grouped into clusters or classes. (i.e., taste classes). Thus, taste heterogeneity refers to differences in the relative effects of each attribute level. For example, some respondents may weigh improved health profiles with shorter lifespans; others may prefer living longer with compromised health. Another source of heterogeneity is scale heterogeneity, which refers to more subtle differences in the absolute effects of all attributes. Individuals with similar scales can also be grouped into a scale class. Scale class differences may be related to differences in utility caused by differences in the randomness of individual behavior that creates scale variation across persons [8]. Previously, some health valuation papers observed preference heterogeneity using the standard latent class model where no scale issues were controlled for [9, 10]. However, estimating differences in attribute importance between respondents without controlling for scale heterogeneity can often mislead the interpretation of taste heterogeneity, which is confounded by scale heterogeneity [11].

Using data on respondent characteristics and behaviors, a scale-adjusted latent class (SALC) model [12] can account for taste and scale heterogeneity simultaneously by identifying latent classes of persons who differ in their relative importance (taste classes), as well as latent scale classes – groups of people who differ by how intense (or indifferent) their preferences are. Furthermore, by accounting for task complexity, we adapt this model to allow for heteroskedasticity within each scale class. A similar fashion of the heteroskedastic SALC model has been applied by Rigby et al. [13] in Best–Worst Scaling (BWS) data and Karim et al. [14] in health valuation in pits scale. To our knowledge, no studies previously estimated the heteroskedastic SALC model on a QALY scale.

In this paper, we hypothesize that both taste and scale latent classes co-exist within a national population and that heteroskedasticity is associated with observable differences in scale at task-level characteristics. Hence, the paper aims to demonstrate the implications of controlling heteroskedasticity and heterogeneity simultaneously in health valuation and separating taste and scale heterogeneity on a QALY scale. In addition, we also hypothesized that the time preference among individuals does not hold the constant proportionality assumption, and the marginal value of lifespan follows a decreasing pattern (i.e., discounting, time preferences) [4]. The results of this secondary analysis of the Peru EQ-5D-5L study have implications for future analyses of ordinal responses in health valuation, particularly those that attempt to characterize the values of a heterogeneous population.


Study design and data collection

The details of the Peru EQ-5D-5L study design and data collection process can be found in the original paper [6]. In brief, the study followed version 2 of the EQ-VT protocol developed by the EuroQol Group, which was developed specifically for valuing EQ-5D-5L health states using computer-assisted personal interviews [2]. As mentioned in the original paper, a random sample of adults (N = 1000) was recruited for a household survey in Lima, Arequipa, and Iquitos. Some of the respondents (N = 300) were randomly selected to complete 11 cTTO tasks prior to the ordinal ones. All respondents completed ten paired comparisons with five EQ-5D-5L attributes (i.e., the latent scale pair A vs. B) followed by twelve matched pairs (i.e., A vs. B and B vs. C) with EQ-5D-5L and lifespan attributes. Within the matched pairs, 50% of the respondents received questions with a shared lifespan attribute in the first pair and the rest with differential lifespan attributes. An example of a paired comparison and matched pair is presented in Fig. 3.

Econometric analysis

For the original publication, Augustovski et al. [6] estimated EQ-5D-5L values on a QALY scale for the general population of Peru. This econometric analysis extends its predecessor's logit estimation by examining heterogeneity and heteroskedasticity in health valuation.

Specification of health value

In this analysis, the value \((V)\) of a health outcome is defined on a QALY scale, where there holds a proportional relationship between the independent values of quality of life \({V}_{1}(Q)\) and length of life\({V}_{2}(T)\).Footnote 1 The quality of life is specified by EQ-5D-5L health profilesFootnote 2 where we have used an additive regression (\({V}_{1}\left(Q\right)=\left(1-{\beta }^{\mathrm{^{\prime}}}X\right)))\) to show the effect of quality of life on the value of health. Here, \(X\) is the EQ-5D-5L attributes, and \(\beta\) shows the incremental difference between severity levels under each dimension.Footnote 3 For example, \({\beta }_{2}\) represents the difference in value between severity levels 2 and 3 under the mobility dimension [\({MO}_{\mathrm{2,3}}\)]. By construction, all coefficients on the incremental dummies (\({\beta }_{k} ; k=1\; to\; 20)\) are hypothesized to be positive and represent losses in quality due to increases in the level of severity of a health condition from the full health profile 1. The value of the length of life, \({V}_{2}\left(T\right),\) is defined by a power function (\({V}_{2}\left(T\right)= {T}^{\alpha }\)) with power \((\alpha )\) less than one to capture discounting effect of future health.

$$V={V}_{1}\left(Q\right){\times V}_{2}\left(T\right)$$
$$V=\left(1-{\beta }^{^{\prime}}X\right)\times {T}^{\alpha }$$
$$V= \left(1- \left(\begin{array}{c}{\beta }_{1}{MO}_{\mathrm{1,2}}+{\beta }_{2}{MO}_{\mathrm{2,3}}+ {\beta }_{3}{MO}_{\mathrm{3,4}}+{\beta }_{4}{MO}_{\mathrm{4,5}}+\\ {\beta }_{5}{SC}_{\mathrm{1,2}}+{\beta }_{6}{SC}_{\mathrm{2,3}}+ {\beta }_{7}{SC}_{\mathrm{3,4}}+{\beta }_{8}{SC}_{\mathrm{4,5}}+\\ {\beta }_{9}{UA}_{\mathrm{1,2}}+{\beta }_{10}{UA}_{\mathrm{2,3}}+ {\beta }_{11}{UA}_{\mathrm{3,4}}+{\beta }_{12}{UA}_{\mathrm{4,5}}+\\ {\beta }_{13}{PD}_{\mathrm{1,2}}+{\beta }_{14}{PD}_{\mathrm{2,3}}+ {\beta }_{15}{PD}_{\mathrm{3,4}}+{\beta }_{16}{PD}_{\mathrm{4,5}}+\\ {\beta }_{17}{AD}_{\mathrm{1,2}}+{\beta }_{18}{AD}_{\mathrm{2,3}}+ {\beta }_{19}{AD}_{\mathrm{3,4}}+{\beta }_{20}{AD}_{\mathrm{4,5}}\end{array}\right)\right)\times {T}^{\alpha }$$

Specification of model

The standard framework for the analysis of ordinal response data is the random utility model (RUM). The underlying assumption of the random utility model (RUM) is that an individual considers some alternatives and chooses the alternative that results in the highest expected utility at any given choice occasion. In a simple setup, each individual is faced with a particular choice task where they present a pair of alternatives (i.e., health profiles) and are asked to choose the preferred one between the two outcomes in each pair. In health valuation, this approach to health profiles is known as the episodic random utility model, which is distinct from its instant and angular counterparts [15].

In this health valuation study, respondents consider multiple health profiles across the paired comparison tasks. For each respondent \(i\), utility \(({U}_{itj})\) of health profile, \(j\), in the choice situation, \(t\), is composed of an explainable component, \({V}_{itj}\) and an unexplainable random component \({\varepsilon }_{itj}\) (i.e., \({U}_{itj}={V}_{itj}+ {\varepsilon }_{itj}\)), where \({V}_{itj}\) included both quality and quantity attributes with associated coefficients (\({\beta }_{k}\; and\; \alpha ; k=1\; to \;20)\) to be estimated.

Conditional and heteroskedastic logit

Considering the random component of each health profile as an extreme value type 1 distribution, the choice per task formed logistic distribution, and under the conditional logit [16] framework, for individual \(i\) the probabilityFootnote 4 of selecting health profile \(j\) in task \(t\) is,

$$P\left( {y_{itj} = 1|\beta ,\alpha ,\mu } \right) = \frac{{{\text{exp}}\left( {\mu V_{itj} } \right)}}{{\mathop \sum \nolimits_{g = 1}^{J} {\text{exp}}\left( {\mu V_{itg} } \right)}} = \frac{{\exp \left( {{\upmu }\left( {\left( {1 - \beta^{{\prime }} X_{itj} } \right) \times T_{itj}^{\alpha } } \right)} \right)}}{{\mathop \sum \nolimits_{g = 1}^{J} \exp \left( {{\upmu } \left( {\left( {1 - \beta^{{\prime }} X_{itg} } \right) \times T_{itg}^{\alpha } } \right)} \right)}}$$

The conditional logit (CL) model is commonly used on ordinal tasks data for health preference studies, and the model gives an average preference weight \((\beta )\) to each attribute and a power (\(\alpha\)), which are homogeneous for everyone with a constant scale parameter, \(\mu\). The parameters are estimated in a natural log scale (i.e., log odds ratio), and the scale is usually assumed to be 1. In this analysis, we measured preference in a QALY scale, and we would need the appropriate transformation procedure of the log-odds scale to the QALY scale. An important issue regarding the logit model specification is that the health value is defined by the difference in health profiles between the paired alternatives in each task. The value of 1 QALY, which is the difference between the two anchor points [immediate death (T = 0) and full health (11,111) with T = 1 year], is captured by the scale parameter \(\mu\). Hence, rather than fixing the scale parameter, \(\mu\), we estimated it, which worked as a conversion factor from the QALY scale to the log odds scale. In a heteroskedastic logit, the constant error variance assumption of the CL model was relaxed by assuming that its variance may vary between tasks systematically in response to task format. Variations in task format (\({Z}_{3})\) had been hypothesized to influence the scale parameter (i.e., scale and variance are inversely related), namely, whether the task is a latent scale pair or a matched pair and the duration to answer each of the tasks. In order to have scale as a non-negative value, the scale parameter was modeled as follows:

$$\mu_{it} = {\text{exp}}\left( {\gamma^{{\prime }} z_{3it} } \right)$$

where \({z}_{3it}\) are hypothesized variables that affect the scale, and testing the sign and significance of associated \(\gamma\) parameters explains how task characteristics influence the scale (i.e., heteroskedasticity).

Latent class analysis of taste and scale heterogeneity

Preference heterogeneity from latent sources may be modeled using individual-specific or class-specific parameters (e.g., mixed or latent class logit). The scale-adjusted latent class (SALC) analysis used in this study examined class-specific heterogeneity (scale and taste) in the presence of heteroskedasticity [8, 13, 17]. The SALC model is an advanced econometric approach where in addition to the standard decomposition of the population into \(M\) distinct taste classes, the model allows for heterogeneity in scale with distinct \(S\) scale classes (each with relatively different error variances). The model allows controlling for the differences in the error variance by assuming that despite sharing the same taste structure within the same class (\({\beta }_{m}\) and \({\alpha }_{m}\)), some people may display different levels of uncertainty (\({\gamma }_{s}\)), thereby belonging to different scale classes (s) [12]. Like a heteroskedastic logit model by class, the probability of choice then becomes:

$$P\left( {y_{itj} = 1{|}s,m} \right) = \frac{{\exp \left( {\exp \left( {\gamma_{s}^{{\prime }} z_{3it} } \right) \left( {\left( {1 - \beta_{m}^{{\prime }} X_{itj} } \right) \times T_{itj}^{{\alpha_{m} }} } \right)} \right)}}{{\mathop \sum \nolimits_{g = 1}^{J} \exp \left( {\exp \left( {\gamma_{s}^{{\prime }} z_{3it} } \right) \left( {\left( {1 - \beta_{m}^{{\prime }} X_{itg} } \right) \times T_{itg}^{{\alpha_{m} }} } \right)} \right)}}$$

The diagram in Fig. 1 shows the pathway of the SALC model to identify scale and taste classes separately. To avoid misidentification of taste and scale classes, we have used separate sets of covariates to identify each individual's taste and scale class membership (\({Z}_{1}\) and \({Z}_{2}\), respectively) as well as the third set of covariates \({Z}_{3}\) to identify heteroskedasticity within each scale class [8].

Fig. 1
figure 1

The SALC analysis in health valuation

The difference among the taste classes captured the difference in taste variation across groups of people by different attribute coefficients across classes once the scale was adjusted. So, we hypothesized that the sources associated with the likelihood of belonging to a taste class \({Z}_{1}\) include individual characteristics (e.g., demographics—age, gender) and the subject's experiences (e.g., self-stated health condition, whether they care for themselves or family members, and whether they have completed the cTTO task prior to ordinal tasks). As for the potential sources of scale heterogeneity, individuals' likelihood of belonging to a particular scale class was considered to be associated with the irregularities and idiosyncratic features of choice behavior which were not associated with any particular attribute level which captured the variability across subjects. The variables to identify the scale class membership of each individual \({Z}_{2}\) were demographic characteristics (age, gender), choice uncertainty (e.g., difficulties in understanding the questions, attribute difference, and choosing the best answer), perception (e.g., completed cTTO questions or not), and behavioral phenomena (e.g., length of the survey). All grade-of-membership variables were included as dummy variables in the model.

The taste and scale class membership probabilities are as follows:

$$p\left( {m|{\updelta }} \right) = \frac{{\exp \left( {\delta_{m}^{{\prime }} z_{{1{\text{i}}}} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{M} \exp \left( {\delta_{j}^{{\prime }} z_{{1{\text{i}}}} } \right)}}$$
$$p\left( {s|{\uptheta }} \right) = \frac{{\exp \left( {\theta_{s}^{{\prime }} z_{{2{\text{i}}}} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{S} \exp \left( {\theta_{j}^{{\prime }} z_{{2{\text{i}}}} } \right)}}$$

And the full choice model for each respondent \(i\) becomes

$$P\left( {y_{i} {|}\beta ,\alpha ,\gamma ,\delta ,\theta } \right) = \mathop \sum \limits_{m = 1}^{M} P\left( {m{|}\delta } \right)\mathop \sum \limits_{s = 1}^{S} P\left( {s{|}\theta } \right)\mathop \prod \limits_{t = 1}^{T} \mathop \prod \limits_{{{\text{j}} = 1}}^{J} P\left( {y_{itj} |s,m} \right)^{{y_{itj} }}$$

Under a SALC framework (Fig. 1), a respondent's probability of a choice depends on the attributes of the health profiles (X and \(T\)) and the respondent's characteristics and behaviors as well as task characteristics \((Z_{1} ,Z_{2}\), and \(Z_{3}\) respectively). Due to the lack of data and convergence issues, models with more than three taste classes or two scale classes were not estimated. This secondary analysis was conducted in recognition that the original study is not sufficiently large enough to identify the full breadth of taste and scale classes within the Peru general population. Statistical analyses were done in R 4.0.2 [18,19,20]. Following the standard practice of maximization routine for latent class models, we have conducted Maximum likelihood estimation using the expected-maximization (EM) algorithm procedure [21]. For the maximization of the expected function in each step of the algorithm, we used the maxLik package [19]. Multiple starting values using the random number generator were used to prevent ending up with a local solution of the likelihood estimation (“Appendix 2”) [21, 22]. The best-fitted model with the optimal number of taste and scale class combinations is identified using the Bayesian information criterion (BIC) [23]. A significance level of 0.05 was considered statistically significant.


In total, 1000 respondents were interviewed for the study. Augustovski et al. [6] has a discussion on the sample background characteristics in detail. This secondary analysis was based on a balanced panel (i.e., respondents completed all tasks) with 983 respondents and dropped 17 respondents who did not complete the whole questionnaire. Each respondent has 34 completed tasks [e.g., 10 latent scale pairs, 12 matched pairs (A vs. B), and 12 matched pairs (B vs. C)].

Difference between homoskedastic and heteroskedastic results

Table 1 presents the conditional logit (CL) model and heteroskedastic logit (HCL) model results. From the CL to HCL model, the BIC value decreased 38,981.67–38,383.63, which showed that the HCL is a better fitted model. Also, the HCL model has consistent estimates with all positive coefficients in quality-of-life (QoL) attributes, whereas two of the 20 QoL coefficients in the CL model had negative estimates. Except for the changes from slight to moderate problem under usual activity and no problem to slight problem under anxiety/depression, all the other coefficients were statistically significant (p-value < 0.05) in the HCL model. The ranking of coefficients of both models showed similar order where the maximum effects in decrement of health value occurred from changes in attribute level from severe to extreme across all the dimensions. Compared to the CL, the standard error is improved by around 29.10% among the QoL attributes in the heteroskedastic model. Also, the correlation between the QoL coefficients is 0.887 (Lin's concordance; 95% CI 0.808–0.935). These results suggested that the heteroskedastic model improved the model fit with coherent coefficient estimates and improved precision level.

Table 1 Value set from the conditional and heteroskedastic logit model

While looking at the scale function, we saw some interesting effects of task-level characteristics on the scale under the specification of the HCL (Table 4). The intercept of the heteroskedastic model corresponds to the logarithmic scale parameter for the reference category, i.e., latent pair tasks completed between 15 and 29 s. As we are estimating the coefficients on a QALY scale, in addition to the variability, the intercept also represents the conversion value from the QALY scale to the natural scale of the logit model. So, based on the results, the intercept represents the conversion of QALY to log odds as 4.020 for latent pair responses which were completed between 15 and 29 s, and matched pair responses increased variability by negatively affecting the scale, 1.077.Footnote 5 Under the latent scale pair responses, compared to the responses that took between 15 and 29 s to complete, other response time categories (either less than 15 or greater than 29 s) were associated with lower values for the scale parameter. And under matched pair, if the task completion time was between 1 and 14 s, it positively affected the scale compared to the baseline period of 30–59 s (p-value < 0.05).

Latent class analysis of taste and scale heterogeneity

We conducted latent class analyses with up to three taste and/or scale classes. The estimated latent class models with an unadjusted scale converged successfully up to two classes; however, in SALC analysis, the highest converged model is the ‘2scale-3taste’ class. The BIC values among the converged models showed that the model with two scale classes and three taste classes has the lowest BIC value (35,112.00) (Table 5). So, we have selected the ‘2scale-3taste’ class model as the best-fitted model to show scale and taste variation among the respondents.

Among the three taste classes in the ‘2scale-3taste’ class SALC analysis, the first taste class is considered the most quality-of-life oriented class with the largest coefficients in all health attributes and with the lowest coefficient value in the power of the lifespan attribute. Around 33 percent of the respondents fell into this class and preferred the quality of their health more than their lifespan (Table 2, column 1) in terms of valuing health. All the coefficients in this class had the expected positive sign. Across all the dimensions, the highest loss in QALY occurred at the changes in level from severe [4] to extreme [5], and the least loss in QALY occurred from changing levels from slight [2] to moderate [3]. This class was also most sensitive to pain/discomfort. The coefficients associated with changes in severity levels from slight to moderate, moderate to severe, and severe to extreme are the highest in the pain/discomfort dimension compared to other dimensions. Having the lowest power value of lifespan attribute indicated that the group of people associated with this class had the highest decreasing marginal value of lifespan with respect to lifespan. For health profiles such as 55,555, 44,444, and 33,333, the health value associated with this class was worse than death, − 2.114, − 0.958, and − 0.130, respectively. Also, the QALY range in this class was from − 2.114 to 0.866.

Table 2 Value sets of three taste classes in scale class one of the two-scale class three-taste class model

Contrary to taste class 1 (quality-of-life oriented class), taste class 3 (Table 2, column 7) had the least effect on the health attributes and the largest effect on lifespan attribute in valuing health. This means around 26.72 percent of the respondents who belonged to this class preferred to value their lifespan (e.g., the quantity of life years) more than their quality of life compared to the other two taste classes. There were negative signs in three coefficients (level change from no problem to slight problem in dimensions mobility, self-care, anxiety/depression). Even in the extreme health condition across all dimensions, 55,555, they possessed a positive value of life (0.356), and the QALY ranged between 0.356 and 1.009.

Lastly, taste class 2 in Table 2 was comprised of around 40 percent of the respondents. This class is considered a middle class whose coefficients fell between the quality-of-life oriented class (taste class 1) and length-of-life oriented class (taste class 3). For this class, the highest QALY loss occurred due to the inability to walk, followed by usual activity, pain/discomfort, self-care, and anxiety/disorder. Although at the extreme condition of 55,555, the health value is lower than death (− 0.384), in situations such as 44,444 and 33,333, the QALY is positive, 0.274, and 0.792, respectively. The power value of lifespan attributes indicated that the marginal value of life decreases faster than the quality-of-life oriented class and slower than the length-of-life oriented class. The QALY range in this class was − 0.384 to 1.022.

The distribution of individual grade-of-membership in taste class 1 and taste class 2 from the 2scale-3taste class SALC model is shown in Fig. 4. The likelihood of being a member of the highest quality-sensitive class (taste class 1) was highly dependent on whether the respondent completed the cTTO task (Table 3). Those who completed the cTTO tasks were more likely to be in the quality-of-life oriented class (i.e., the odds ratio of the cTTO coefficient is less than 1 for taste class 2 and taste class 3 grade-of-membership), i.e., they found health problems to be relatively more burdensome. For example, the effect of being at the extreme stage in any health dimension (Fig. 2) reduces the value from full health (QALY 1) to the lowest level in taste class 1. On the other hand, the effect of not completing the cTTO task was higher in taste class 3 than in taste class 2, and for the same health detriment, the burden was higher in taste class 2 compared to taste class 3. In summary, completing the cTTO task caused the Peru respondents to inflate the burden of health problems on a QALY scale compared to those who did not complete the cTTO task (i.e., framing).

Table 3 Taste class grade-of-membership variables of the SALC model
Fig. 2
figure 2

The effect of health problems on a quality-adjusted life-year

Comparing the coefficients of the two scale classes (Table 6) indicated that the effect of both latent pair and matched pair responses was higher on the scale in scale class 2, meaning scale class 2 had a lower error variance (less random class). The effect of task duration among latent pairs was similar between the two scale classes. Compared to the base category (task completed between 15 and 29 s), both shorter and longer times required to complete tasks decreased scale (increased variances). Similarly, for the matched pair, the effect was also similar between the two scale classes; compared to the base category (task completed between 30 and 59 s), shorter duration of tasks increased the scale, and longer duration tasks reduced the scale. The results concluded that the task types induced the major differences in scale classes.

The inclusion of the hypothesized variables to identify scale classes showed that none of the scale covariates were significant in identifying scale class membership (Table 7). Around 34 percent of the respondents belonged to scale class 1, and around 66 percent belonged to scale class 2. Figure 5 showed the distribution of individual grade-of-membership for scale class 2. Interestingly, respondents of the less random class (i.e., scale class 2) were most likely to be middle and older (45–80 years) who reported a better understanding of health states and were unlikely to complete the cTTO tasks.


In this paper, we explored preference heterogeneity in health valuation among the general Peruvian population. We identified possible task-level factors that might affect the variance within scale classes (heteroskedasticity) and separated latent groups based on different absolute and relative importance (i.e., scale and taste classes). The heteroscedastic model improved the precision of estimated parameters as expected.

The SALC analysis showed that the EQ-5D-5L values in Peru are heterogeneous, with three distinct taste classes. The fundamental difference between the three taste classes was their preference for quality versus quantity of health. All the coefficients on the EQ-5D-5L variables were consistent with its descriptive system in the quality-of-life oriented class (all positive and most of them with significant coefficients). Also, this class's estimated power value for the lifespan attribute was the lowest. In contrast, the power parameter was highest, and the coefficients on the QoL attributes were lowest for the quantity-of-life oriented class. Lastly, a middle class also existed, which fell in between the quality and quantity-oriented classes.

The grade-of-membership results showed a positive association between completing the cTTO and quality-of-life oriented class membership. This suggests that cTTO completion before ordinal tasks caused a paradigm shift in respondents' choices and might influence the burden of health problems. While the SALC analysis can disentangle taste and scale heterogeneity using different sets of covariates, further research is needed to better understand how the cTTO completion influences respondents to select quality over quantity of life (e.g., greater familiarity with health problems and the concept of worse-than-death profiles). In addition, understanding the fundamental nature of people to trade-off between quality and quality of life while valuing their health is an interesting hypothesis that can be further explored in other valuation studies with paired comparison responses only.

A well-known criticism of the SALC model is that the model fails to separate scale and taste heterogeneity [24]. This was addressed in this analysis by incorporating distinct variables in the grade of membership equations (e.g., completion of the cTTO). Overall, the SALC analyses performed well in separating two taste and two scale classes and accounting for heteroskedasticity within the scale classes. However, the small size of this study may have limited the number of classes that could be identified, suggesting further analysis with larger datasets. In conclusion, these latent class analyses provided insight into heterogeneous EQ-5D-5L values in Peru. The study demonstrated taste heterogeneity and its association with scale heterogeneity in the presence of heteroskedasticity. Future studies may extend the SALC analysis under the Zermelo–Bradley–Terry specification similar to the one used in the original study. Furthermore, heterogeneity has been analyzed, assuming that unobserved heterogeneity holds discrete distributions for scale and taste. A hybrid approach may be further explored with continuous distributions. Based on these results, future valuation studies may collect large samples and capture more respondent characteristics and behavioral variables (e.g., experience with health problems) that better identify preference heterogeneity in health valuation.

Availability of data and materials

The dataset and analysis code are available from the corresponding author upon request.


  1. \(Q\) represents the quality of life and \(T\) represents length of life.

  2. EQ-5D-5L has five attributes: MO = mobility, SC = self-care, UA = usual activity, AD = anxiety/depression, PD = pain/discomfort.

  3. On a QALY scale, the coefficient of the first level (1) of each dimension is 0 by construction \({V}_{1}\left(11111\right)=1\), for which the regression has 20 parameters.

  4. In the probability function, \({y}_{itj}\) represents a dummy (binary) variable, taking a value 1 if individual \(i\) has chosen health profile \(j\) in task \(t\), and a value 0 otherwise.

  5. The scale function is in log scale. So, the effect of the latent pair is exp (1.3914) = 4.020, and the effect of matched pair is exp (1.3914–1.3176) = 1.077.



Best–worst scaling


EuroQol 5 domains 5 levels


EuroQol valuation technology




Health preference research


Health-related quality of life


Quality-adjusted life-year


Quality of life


Scale-adjusted latent class


Composite time trade-off


  1. Nemeth G. Health related quality of life outcome instruments. Eur Spine J. 2006;15(Suppl 1):S44-51.

    Article  Google Scholar 

  2. Balestroni G. Bertolotti G [EuroQol-5D (EQ-5D): an instrument for measuring quality of life]. Monaldi Arch Chest Dis. 2012;78:155–9.

    Google Scholar 

  3. Stolk E, Ludwig K, Rand K, et al. Overview, update, and lessons learned from the international EQ-5D-5L Valuation work: version 2 of the EQ-5D-5L valuation protocol. Value Health. 2019;22:23–30.

    Article  Google Scholar 

  4. Craig BM, Rand K, Bailey H, et al. Quality-adjusted life-years without constant proportionality. Value Health. 2018;21:1124–31.

    Article  Google Scholar 

  5. Robinson A, Spencer AE, Pinto-Prades JL, et al. Exploring differences between TTO and DCE in the valuation of health states. Med Decis Mak. 2016;37:273–84.

    Article  Google Scholar 

  6. Augustovski F, Belizán M, Gibbons L, et al. Peruvian valuation of the EQ-5D-5L: a direct comparison of time trade-off and discrete choice experiments. Value Health. 2020;23:880–8.

    Article  Google Scholar 

  7. Vass CM, Boeri M, Karim S, et al. Accounting for preference heterogeneity in discrete-choice experiments: a review of the state of practice. Value Health (Under Review). 2021.

  8. Karim S, Craig BM, Poteet S. Does controlling for scale heterogeneity better explain respondents’ preference segmentation in discrete choice experiments? A case study of us health insurance demand. Med Decis Mak. 2021;41:573–83.

    Article  Google Scholar 

  9. Mulhern BJ, Bansback N, Norman R, et al. Valuing the SF-6Dv2 classification system in the United Kingdom using a discrete-choice experiment with duration. Med Care. 2020;58:566–73.

    Article  Google Scholar 

  10. Ramos-Goñi JM, Oppe M, Estévez-Carrillo A, et al. Accounting for unobservable preference heterogeneity and evaluating alternative anchoring approaches to estimate country-specific EQ-5D-Y value sets: a case study using spanish preference data. Value Health. 2021;25:835–43.

    Article  Google Scholar 

  11. Louviere J, Eagle T. Confound it! That pesky little scale constant messes up our convenient assumptions. In: Proceedings of the sawtooth software conference; 2006. P. 211–28.

  12. Magidson J, Vermunt J. Removing the scale factor confound in multinomial logit choice models to obtain better estimates of preference 1. In: Sawtooth Software Conference. 2007.

  13. Rigby D, Burton M, Lusk JL. Journals, preferences, and publishing in agricultural and environmental economics. Am J Agric Econ. 2015;97:490–509.

    Article  Google Scholar 

  14. Karim S, Craig BM, Groothuis-Oudshoorn CGM. Exploring the importance of controlling heteroskedasticity and heterogeneity in health valuation: a case study on Dutch EQ-5D-5L. Health Qual Life Outcomes. 2022;20:85.

    Article  Google Scholar 

  15. Craig BM, Oppe M. From a different angle: a novel approach to health valuation. Soc Sci Med. 2010;70:169–74.

    Article  Google Scholar 

  16. Clogg CC, Petkova E, Haritou A. Statistical methods for comparing regression coefficients between models. Am J Sociol. 1995;100:1261–93.

    Article  Google Scholar 

  17. Davis KJ, Burton M, Kragt ME. Discrete choice models: scale heterogeneity and why it matters. 2016.

  18. Hadley W, Mara A, Jennifer B, et al. Welcome to the tidyverse. J Open Source Softw. 2019;4:1686.

    Article  Google Scholar 

  19. Henningsen A, Toomet O. maxLik: a package for maximum likelihood estimation in R. Comput Stat. 2011;26:443–58.

    Article  Google Scholar 

  20. Team RDC. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, 2020.

  21. Train KE. EM Algorithms for nonparametric estimation of mixing distributions. J Choice Model. 2008;1:40–69.

    Article  Google Scholar 

  22. Vermunt JK, Magidson JJBMSII. Technical guide for Latent GOLD 4.0: Basic and advanced. 2005.

  23. Latent class models. In: Hensher DA, Rose JM, Greene WH, editors. Applied choice analysis. 2 edn. Cambridge: Cambridge University Press; 2015.

  24. Hess S, Train K. Correlation and scale in mixed logit models. J Choice Model. 2017;23:1–8.

    Article  Google Scholar 

  25. Dean N, Raftery AE. Latent class analysis variable selection. Ann Inst Stat Math. 2010;62(1):11–35.

    Article  Google Scholar 

  26. Vermunt JK, Magidson J. Technical guide for latent GOLD 4.0: basic and advanced. Belmont: Statistical Innovations Inc; 2005.

    Google Scholar 

Download references


The authors would like to thank the EuroQol Group for funding this study- E.Q. Project 138-2020RA (Preference heterogeneity in health valuation: Peru EQ-VT data).


EuroQol Group funded this study, EQ Project 138-2020RA. Fonds de recherche du Québec-Santé (FRQS).

Author information

Authors and Affiliations



RAT, FA, and BMC supported the data. SK and BMC were involved in the development of the methodology, data analysis, and manuscript draft. All authors reviewed the manuscript and contributed.

Corresponding author

Correspondence to Suzana Karim.

Ethics declarations

Ethical approval and consent to participate

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Consent for publication

Not applicable.

Competing interests

Suzana Karim received grants from the EuroQol Research Foundation. Dr. Benjamin M Craig is a member of the EuroQol Research Foundation.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

See Figs. 3, 4, 5 and Tables 4, 5, 6 and 7.

Fig. 3
figure 3

Latent-scale (single-pair ordinal response). Initial screen example

Fig. 4
figure 4

Distribution of Individual grade-of-membership of taste classes in 2scale-3taste class SALC model

Fig. 5
figure 5

Distribution of Individual grade-of-membership of scale class 2 in 2scale-3taste class SALC model

Table 4 Scale factors of conditional and heteroskedastic logit model
Table 5 Summary of the converged estimated models
Table 6 Heteroskedastic variables of the SALC model
Table 7 Scale class grade-of-membership variables of the SALC model

Appendix 2

Defining the starting values

We have used the standard procedure for conducting the latent class analysis. We conducted a sequence of models, starting with a one-class model and then specifying models with one additional class at a time. Finding the starting values is one of the major issues in implementing the EM algorithm successfully. To make the model insensitive to starting values is important because we seek the global maximum of the log-likelihood rather than a local maximum. As no existing algorithm can distinguish between a global maximum and a local maximum of the log-likelihood, we have repeatedly fit models with randomly selected start values and chosen the solution with the highest converged log-likelihood value. In this study, we did not use any algorithms to select start values for each parameter. Rather, we followed a stepwise method where at the end of the estimation of a given class (K), we used the estimated values of the parameters of the current model as the starting values for the estimation of the additional class (K + 1) model [25]. For the starting value of the dependent variable parameters of the additional class, we started with random variables from a uniform distribution range from 0 to 1. Also, for the starting values of the covariate parameters of the additional class, we used starting values of 0 followed by the Latent Gold software package [26].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Karim, S., Craig, B.M., Tejada, R.A. et al. Preference heterogeneity in health valuation: a latent class analysis of the Peru EQ-5D-5L values. Health Qual Life Outcomes 21, 1 (2023).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: