Comparison of four value sets derived using different TTO and DCE approaches: application to the new region-specific PBM, AP-7D

Background AP-7D is a newly developed preference-based measure (PBM) for East and Southeast Asia. However, no value set has been established yet. Comparing the characteristics of value sets obtained by different methods is necessary to identify the most appropriate methodology for the AP-7D valuation survey. Method We surveyed the general population’s preferences for AP-7D health states using four valuation methods: (a) composite time trade-off (cTTO); (b) simple discrete choice experiment (DCE); (c) DCE with duration; and (d) ternary DCE. In Japan, we collected approximately 1,000 samples for cTTO tasks through a face-to-face survey and 2,500 samples for each of the three DCE tasks. Respondents were selected through quota sampling based on sex and age. The cTTO data were analyzed using linear mixed and Tobit models; the DCE data were analyzed using simple and panel conditional logit models. Where the results of the analysis showed inconsistencies, a constrained model was used. Results Since all the unconstrained models, except simple DCE, showed one or more inconsistencies, constrained models were used for the analyses. The minimum values for the models were as follows: TTO model, -0.101; simple DCE model, -0.106; DCE with duration model, -0.706; ternary DCE model, -0.306. The scores for the DCE with duration model were much lower than those for the other models. Although the value sets for AP-7D differed among the four valuation methods, the ternary DCE model showed intermediate characteristics between those of the cTTO and DCE with duration models. Compared with the EQ-5D-5L, the distributions of all the scores on the Japanese AP-7D were shifted to the left. While “Energy” was one of the domains with the least influence on the AP-7D score in all four models, “Burden to others” had one of the largest impacts on preferences. Conclusion We constructed four value sets using different TTO and DCE methods. Our findings are expected to contribute to the development not only of AP-7D but also of other preference-based measures.
Supplementary Information The online version contains supplementary material available at 10.1186/s12955-024-02233-2.


Introduction
Economic evaluations to measure the outcomes of healthcare technologies are often based on the calculation of quality-adjusted life-years (QALYs). Public health technology assessment (HTA) agencies normally recommend their use for cost-effectiveness analysis. For example, in Japan, new HTA systems for drug and medical-device pricing were introduced in 2019 [1]. Preference-based measures (PBMs) are generally used to measure the utilities of health states, which can then be used for calculating QALYs. Various PBMs have been developed, including generic PBMs [2][3][4][5][6], PBMs for pediatric and/or adolescent populations [7,8], disease-specific PBMs [9,10], and PBMs for social care [11,12]. However, until now, these PBMs have mainly been developed for Western countries; for example, the EQ-5D was developed in Europe, HUI in Canada, 15D in Northern Europe, AQoL in Australia, and SF-6D, ASCOT, and CHU-9D in the UK.
Considering this situation, we developed a new Asia preference-based measure-7 dimension (AP-7D) [13] (Additional file 1) based on an interview survey and qualitative analysis of data from nine Asian countries and regions (i.e., Indonesia, Japan, Korea, mainland China, Malaysia, the Philippines, Singapore, Taiwan, and Thailand). The AP-7D was developed through a collaboration between HTAsiaLink and the Center for Outcomes Research and Economic Evaluation for Health (C2H) in Japan to reflect the concepts that East and Southeast Asians consider important for utility measurement.
After the new PBM was developed, we needed to construct a value set of AP-7D for every country. The value set might differ between countries because of differences in culture, population characteristics, and potential issues with questionnaire translation. Therefore, it is important to develop value sets in each country and compare them across countries to better understand the differences in preferences for the AP-7D states among countries or regions. However, we currently lack a methodology for appropriately valuing the health states of AP-7D. Several standard methods are used to construct value sets in valuation surveys. Time trade-off (TTO), discrete choice experiments (DCE), DCE with duration, and ternary DCE are typical examples of valuation methods. There is no consensus on which method is the most appropriate, as this depends on the characteristics of the PBM. Thus, in this study, we constructed the first preliminary value sets for AP-7D in Japan and compared four valuation methods to identify the most appropriate methodology for the AP-7D valuation survey.

AP-7D
The AP-7D was co-developed by HTAsiaLink and the Center for Outcomes Research and Economic Evaluation for Health (C2H), National Institute of Public Health (NIPH) in Japan, and was established based on East and Southeast Asian concepts of health and health-related impacts. Our new PBM comprises seven domains: pain/discomfort (PD), mental health (MH), energy (EN), mobility (MO), work/school (WS), interpersonal interactions (II), and burden to others (BO), each of them classified on a four-grade scale (not at all, a little, quite a bit, and very much). AP-7D was originally developed in English and then translated into eight local languages. The instrument is shown in the Supplement.

Composite TTO, Simple DCE, DCE with duration, and ternary DCE
We evaluated the AP-7D health states using the composite TTO (cTTO) [14], simple DCE [15], DCE with duration, and ternary DCE methods [16]. The TTO survey respondents always began with a conventional TTO task, i.e., a choice between living for 10 years in a health state described by the AP-7D and living for x years in full health. If they considered the presented AP-7D state to be better than immediate death (i.e., x > 0), the value of x was varied until indifference was reached, and the value of the AP-7D state was x/10. If the participants considered immediate death to be better than living for 10 years in the AP-7D state (i.e., x < 0), a lead-time TTO [17] was started, which allowed estimation of negative values. In lead-time TTO, a set of choices is offered between "y years of life in full health" and "10 years in full health followed by 10 years in the presented AP-7D state". The value of y was varied until indifference was reached, and the value of the AP-7D state was (y-10)/10.
The DCE method presented two health states (A and B) described by AP-7D. In the case of DCE with duration and ternary DCE, expected life-years (1, 4, 7, and 10 years) were combined with the AP-7D description. In the simple DCE and DCE with duration methods, the respondents chose the option they preferred between the two given choices. In the ternary method, three health states (state A, state B, and "immediate death") were shown to the respondents, and they were asked to identify what they believed were the best and the worst health states.
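The conversion from a cTTO indifference point to a health-state value described above can be illustrated with a minimal sketch. The function name `ctto_value` and its arguments are hypothetical and are not taken from the survey software; the sketch also assumes that the indifference point lies between 0 and 10 years in both task types.

```python
def ctto_value(years_full_health: float, lead_time: bool) -> float:
    """Convert a cTTO indifference point (in years) into a health-state value.

    years_full_health: x in the conventional task, y in the lead-time task.
    lead_time: True if the respondent judged the state worse than death
               and therefore completed the lead-time TTO task.
    """
    # Assumption: the indifference point is bounded by the 10-year time frame.
    if not 0 <= years_full_health <= 10:
        raise ValueError("indifference point must lie between 0 and 10 years")
    if lead_time:
        # y years in full health vs. 10 years in full health + 10 years in the state
        return (years_full_health - 10) / 10
    # x years in full health vs. 10 years in the state
    return years_full_health / 10


print(ctto_value(7.5, lead_time=False))  # state better than death
print(ctto_value(5.0, lead_time=True))   # state worse than death
```

Under these assumptions the resulting values range from -1 (y = 0 in the lead-time task) to 1 (x = 10 in the conventional task).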

Face-to-face survey for cTTO
A face-to-face survey was conducted to collect the cTTO data. Respondents (aged 20-69 years) were recruited through a panel owned by a research company, based on non-random quota sampling by sex and age. As it was challenging to recruit elderly people for this survey during the COVID-19 outbreak, considering their high risk of contracting COVID-19, respondents aged > 69 years could not be recruited for the valuation of AP-7D.
The target sample size was approximately 1,000. This was based on the number of subjects included in the EQ-5D-5L valuation survey. The respondents were asked to visit a survey center in Tokyo. Computer-assisted personal interviews (CAPI) were performed with the interviewers' support in a one-on-one, 60-min session at the survey center.
We prepared 14 blocks, and each block included 8 cTTO tasks based on an orthogonal design. The blocked orthogonal design was generated by Ngene, which considers D-error minimization. Each respondent was randomly allocated to one block. Three training TTO tasks were completed before the actual TTO tasks [18]. The health states in the block were shown in random order. Responses were automatically collected as electronic data.

Online survey for DCE
An online survey was conducted to collect DCE data, including simple DCE, DCE with duration, and ternary DCE. Respondents (aged 20-69 years for consistency with the face-to-face population) were recruited through a Japanese web panel, based on quota sampling by sex and age. The target sample size was approximately 2,500 for each of the DCE valuation methods, namely, simple DCE, DCE with duration, and ternary DCE. Each block had 15 pairs, and each respondent was randomly allocated to one of 10 blocks, based on the D-optimal design methods in Ngene. The health state pairs in the block and the position of the cards (left or right) were shown in random order to prevent ordering and positioning effects.

Statistical analysis
We calculated the numbers and percentages for the background factors, which were then compared with the norm data. The total time taken to complete all the 8 TTO or 15 DCE tasks was also calculated.

a) cTTO
Responses to the TTO task were converted into TTO scores as described in the subsection "Composite TTO, Simple DCE, DCE with duration, and ternary DCE". The data were analyzed using a linear mixed model with "1-utility" as the dependent variable. The constant term and dummy variables representing the levels of the seven dimensions (7 × [4 − 1] = 21) were treated as fixed effects, and the respondents were treated as random effects. Interaction with any level 4 responses was considered by adding the N4 term (N4 = 1, if any level 4 responses were included in the health states) to the normal linear mixed model. The N34 term was also similarly defined (N34 = 1, if any level 3 or 4 responses were included in the health states) to consider the effects on the worst health states, which were observed in the EQ-5D-3L and -5L valuation surveys in a few countries. In addition, the TTO score was censored at 1. Considering these distribution characteristics, the Tobit model was also used for the cTTO data.

b) Simple DCE
The DCE data were analyzed using simple and panel conditional logit models with the same 21 dummy variables as in the cTTO model. As in the cTTO analysis, N4 and N34 terms were also considered in the conditional logit model. These analyses extracted the latent coefficients for AP-7D scoring. The DCE latent "dis-score," defined as the sum of the latent DCE coefficients for each health state, was converted to the utility scale.
To convert the latent DCE scores to a scale anchored at full health (1) and death (0), the modeled DCE values were anchored using the observed cTTO values. The linear relationship between the mean latent DCE scores and the mean cTTO values of the 112 health states measured in the face-to-face survey was estimated. Finally, the DCE coefficients were transformed using the estimated linear mapping function.
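The anchoring step can be sketched as an ordinary least-squares fit of mean cTTO disutility (1 - cTTO value) on the mean latent DCE score, followed by rescaling. This is only an illustration under stated assumptions: the function names `fit_line` and `anchored_utility` are hypothetical, the data are synthetic, and the study itself used SAS and Stata rather than this code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def anchored_utility(latent_score, slope, intercept):
    """Map a latent DCE score to the utility scale via the fitted line."""
    return 1.0 - (slope * latent_score + intercept)


# Synthetic illustration only: latent scores and cTTO disutilities per state.
latent = [0.5, 1.0, 2.0, 3.0, 4.0]
ctto_disutility = [0.2 * s + 0.05 for s in latent]

slope, intercept = fit_line(latent, ctto_disutility)
print(round(slope, 3), round(intercept, 3))
print(round(anchored_utility(2.0, slope, intercept), 3))
```

In the study, the fit would use one point per health state, that is, the 112 states valued in the face-to-face survey.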

c) DCE with duration and ternary DCE
Simple and panel conditional logit models with or without N4 or N34 interactions were used to analyze the choice tasks, as for the simple DCE data. In the case of ternary DCE, a task was separated into two dichotomous choices, and in the immediate death profile, the duration was treated as 0. For both types of DCE data, the model for the estimation of coefficients was based on Bansback et al. [19] and included continuous duration (time) as well as the interaction between the duration and each domain. Assuming $t_{ij}$ to be the duration and $u_{ij}$ to be the utility of profile $j$ for individual $i$, $u_{ij}$ can be formulated as follows:
$$u_{ij} = \beta_1 t_{ij} + \beta_2' x_{ij} t_{ij} + \varepsilon_{ij}$$
where $x_{ij}$ denotes the vector of dummy variables for the levels of each domain and $\varepsilon_{ij}$ denotes the error term. However, the estimated $\beta_2$, which is the vector of all the DCE coefficients in each domain, is not anchored to death (0) or full health (1). To convert the latent coefficients to the disutility of each level, we divided the estimated $\beta_2$ (vector) by the coefficient of time ($\beta_1$, scalar).
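The rescaling of the duration-interaction coefficients by the time coefficient can be sketched as follows. The sign convention is an assumption made for this illustration: the interaction coefficients are taken to be negative for worse levels and the time coefficient positive, so that the disutility is reported as a positive decrement. The function names, level labels, and numbers are hypothetical and are not estimates from this study.

```python
def anchor_by_duration(beta_time, beta_levels):
    """Rescale duration-interaction coefficients onto the utility scale.

    beta_time: coefficient of duration (scalar), assumed positive.
    beta_levels: dict mapping a level label (e.g. "PD4") to its
                 duration-interaction coefficient.
    Returns a dict of disutilities (positive values reduce utility).
    """
    if beta_time <= 0:
        raise ValueError("the duration coefficient is assumed to be positive")
    # Assumed sign convention: disutility = -(interaction coefficient / time coefficient)
    return {label: -coef / beta_time for label, coef in beta_levels.items()}


def state_utility(levels_present, disutilities):
    """Utility of a health state: 1 minus the sum of its level disutilities."""
    return 1.0 - sum(disutilities[label] for label in levels_present)


# Hypothetical coefficients for illustration only.
disutilities = anchor_by_duration(0.4, {"PD4": -0.2, "BO4": -0.1})
print(disutilities)
print(state_utility(["PD4", "BO4"], disutilities))
```

Because the ratio does not depend on the latent scale of the logit model, the result is anchored at 1 for full health (no decrements) and at 0 for a state that respondents value like zero duration.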
If the estimated disutility was not consistent (consistency implied that "weights at the higher level in the same domain were higher and those at the lower level were lower"), the inconsistent levels were combined and the data were reanalyzed using the same model ("constrained" model).
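A consistency check of this kind can be sketched as a comparison of adjacent levels within each domain. The helper name `find_inconsistencies` and the example values are hypothetical; the sketch only flags the levels that would be candidates for combining and does not refit any model.

```python
def find_inconsistencies(disutilities):
    """Flag levels whose disutility is lower than that of the level below.

    disutilities: dict mapping a domain code to the disutilities of
                  levels 2, 3, and 4 (level 1 is the reference, 0.0).
    Returns a list of (domain, level) pairs that break monotonicity.
    """
    flagged = []
    for domain, values in disutilities.items():
        previous = 0.0  # level 1 is the reference level
        for level, value in enumerate(values, start=2):
            if value < previous:
                # This level would be combined with the adjacent lower level.
                flagged.append((domain, level))
            previous = value
    return flagged


# Hypothetical disutilities for illustration only.
example = {
    "PD": [0.03, 0.08, 0.15],
    "EN": [-0.01, 0.05, 0.10],  # level 2 is below the level 1 reference
    "II": [0.02, 0.10, 0.08],   # level 4 is below level 3
}
print(find_inconsistencies(example))
```

Each flagged level would then share a single dummy variable with the adjacent lower level in the constrained model.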
These analyses were performed using SAS 9.4 and Stata 17.

Results
The collected sample included 1,050 respondents for the cTTO tasks; 2,725 respondents for the simple DCE tasks; 2,739 for the DCE with duration tasks; and 2,742 for the ternary DCE tasks. Thus, we were able to collect more samples than planned. The median total response time was 19.8 min (interquartile range (IQR) 17.5-23.0 min) for the eight TTO questions, 7.1 min (IQR 4.5-10.8 min) for the 15 DCE questions, 7.7 min (IQR 4.8-12.1 min) for the 15 DCE with duration questions, and 8.2 min (IQR 5.2-13.5 min) for the 15 ternary DCE questions. The face-to-face TTO tasks required more time than the web-based DCE tasks. The response times for the DCE with duration and ternary DCE tasks were longer than those for the simple DCE tasks.

Demographic factors
Table 1 shows the background characteristics of the respondents. The actual percentages of the population by age category are 10.1% (aged 20-29), 10.9% (30-39), 13.9% (40-49), 14.0% (50-59), and 12.0% (60-69). We sampled every age category with the same weight so that the generations were equally represented. The median household income was in the range of JPY 5 to 7 million. Compared with the average household income of all Japanese families of JPY 5.6 million in 2021 [20], the household income of the respondents was slightly higher. According to the 2019 Labour Force Survey [21], full-time and part-time workers accounted for 31.6% and 13.7%, respectively. In total, 24.3% of Japanese individuals had graduated from university or graduate school in 2017, and 61.3% and 31.6% were married and unmarried, respectively, in 2015. Thus, the characteristics of the respondents were comparable to those of the general population. However, as the respondents were recruited based on non-random sampling, the differences may influence the results.

cTTO
The 1,050 respondents collectively yielded 8,400 TTO data points. Excluding health state [1111111], the TTO score was highest for health state [2222222] (0.79) and lowest for health state [4444444] (-0.14) (Additional file 1). In the task of evaluating health state [4444444], 47 respondents (62.7%) preferred the worst state [4444444] to death and 28 (37.3%) evaluated it as worse than death (WTD). Considering all responses, only 10.1% (N = 849) were evaluated as WTD health states. As the misery score (the sum of level scores across dimensions) increased, the mean cTTO value decreased and the standard deviation increased (Fig. 1). Figure 2 shows the distribution of the cTTO values. The peak of the distribution was at a cTTO score of 0.5, and the density of cTTO scores < 0 was very low.
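The misery score used above is simply the sum of the seven level digits of a health state. A minimal sketch, assuming states are written as seven-character strings of levels 1-4 (the function name `misery_score` is hypothetical):

```python
def misery_score(state: str) -> int:
    """Sum of the level scores across the seven AP-7D dimensions."""
    if len(state) != 7 or any(ch not in "1234" for ch in state):
        raise ValueError("state must be seven digits, each between 1 and 4")
    return sum(int(ch) for ch in state)


print(misery_score("1111111"))  # full health
print(misery_score("2222222"))
print(misery_score("4444444"))  # worst state
```

With four levels in each of seven dimensions, the score ranges from 7 for [1111111] to 28 for [4444444].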
Table 2(a) presents the coefficients obtained using the inconsistent and consistent models. One inconsistency (the second level of energy (EN)) was observed in the simple linear mixed models (model 1, model 3, and model 4), and the level was combined with the first level (model 2). However, the results of the Tobit model did not reveal any inconsistency. No significant interactions were observed in model 3 and model 4. The estimated utility values for the worst health state [4444444] were -0.02 (model 1) and -0.101 (model 4).
Figure 3 shows the relationship between the observed disutility and the derived DCE values. The fit of the linear regression seems satisfactory.
Therefore, levels 1 and 2 of EN and levels 1 and 2 of WS were combined. In contrast to the DCE with duration data, the panel conditional logit model (model 20) showed an increased number of inconsistencies, although the AIC of model 20 was smaller than that of model 16. The scores estimated using the DCE with duration model were much lower than the scores estimated using the other models.
Figure 4 shows the distribution of the utility of all the health states described by AP-7D and the Japanese EQ-5D-5L. Compared with the EQ-5D-5L based on the Japanese value set, all the scores on the Japanese AP-7D were shifted to the left. The distributions of the results obtained using the TTO and simple DCE tasks overlapped. Those obtained using the ternary DCE method were distributed between the results obtained with the simple DCE and DCE with duration methods. Figure 5 compares the coefficients of the worst level (level 4) obtained using the four valuation methods. BO showed one of the largest decrements among the seven domains.

Discussion
We constructed four value sets for AP-7D. All models, except simple DCE, showed some inconsistencies (the second level of EN in cTTO, DCE with duration, and ternary DCE, and the second level of WS in the ternary DCE), but the number was limited. Therefore, constrained models were constructed and the first preliminary value sets for AP-7D in Japan were calculated. As shown in Fig. 5, EN was among the domains with the least influence on the AP-7D score in all four models. In particular, the second level of EN was not significantly less preferred than the first, except in the DCE with mapping model. In contrast, PD, MO, and BO had the largest coefficients in the scoring algorithm in all models. It is noteworthy that BO showed one of the largest impacts on preferences, similar to PD and MO. We think that this result may reflect the characteristics of Japanese people, who are very hesitant to trouble others. The influences of MH, WS, and II differed depending on the model. The coefficient of MH was larger in the DCE with duration and ternary DCE models than in the TTO and DCE with mapping models. The influence of the WS domain was similar to that of the EN domain in the ternary DCE model. The scoring algorithm derived using the DCE with mapping model showed that the coefficients of the II domain were the smallest, which implies that they were smaller than those of the EN domain. Thus, the importance of some domains differed among the models.
We used four valuation methods to construct a value set for AP-7D. The minimum value was the highest (-0.101) for the TTO model and the lowest for the DCE with duration model (-0.706). The EQ-5D-5L was also valued by the cTTO method, and the utility of the worst health state in the Japanese EQ-5D-5L was -0.025 [22], which is the highest recorded value in the world. Japanese people have a strong risk-averse feeling about death and are reluctant to trade health states with death. Therefore, the Japanese TTO-based value set may overestimate the utility of each health state. In contrast, the utility scores obtained using DCE with duration were very low. This means that Japanese people willingly trade life-years with their health state, although they do not prefer death. It is difficult to interpret this; they may imagine that a reduction in life-years is different from death. This means that the utility scores obtained using DCE with duration may underestimate the utility of the AP-7D health states. However, ternary DCE included the "immediate death" card. In the ternary tasks, the respondents traded health states with death while also responding to DCE with duration tasks. The value set obtained using the ternary DCE method showed intermediate characteristics between the value sets obtained using the cTTO and DCE with duration tasks. However, the Japanese guidelines for economic evaluation recommend using EQ-5D-5L ("8.2.1 If Japanese QoL scores (utilities) are newly collected for a cost-effectiveness analysis, EQ-5D-5L is recommended as the first choice."). NICE in the UK and HAS in France, for example, also require the submission of EQ-5D-based utility scores. It may be important that new instruments are valued using a cTTO-based method similar to that used for the EQ-5D-5L.
Fig. 4 Distribution of the Japanese EQ-5D-5L and AP-7D
Fig. 5 Coefficients of the worst level (level 4) obtained using the four valuation methods
One limitation of this study was the sampling method. Neither the face-to-face nor the web survey allowed respondents to be chosen randomly across Japan. Although the major background factors of the respondents were similar to those of the Japanese population, the influence of the sampling method may not be negligible. In addition, our sample was limited to people aged 20-69 years, because elderly people were a high-risk population for COVID-19 and could not be recruited during the outbreak. We recognize that it would be better to include more elderly people in our survey. The inclusion criteria for respondents have to be reconsidered when the actual valuation survey is performed. The face-to-face mode was used only for the TTO survey, and the difference in survey mode could have influenced the results. Additionally, the survey was limited to Japan. It is unclear whether our findings and discussions can be generalized to other countries. Moreover, the influence of the COVID-19 outbreak, which could have changed the preferences for health states, is unknown.

Conclusion
We constructed and compared four value sets for the Japanese AP-7D, which paves the way for considering valuation methods for an international AP-7D valuation survey. To reflect people's preferences more appropriately for effective decision making, we have to consider the methods to be applied. As discussed above, the value sets differ considerably depending on the valuation method, especially in the range of the measurement (negative utility scores). In addition, our findings could contribute to the development of not only AP-7D but also other PBMs. The choice of "immediate death" significantly impacts the results, and the degree of death-risk acceptance may differ among countries, reflecting their respective cultures. Our goal is to show this instrument as a good alternative to existing PBMs, such as EQ-5D, SF-6D, and HUI. This is the first step of our future plan to improve decision making. To select the most appropriate valuation methods, we require more qualitative and deliberative processes involving expert as well as non-expert members. The input of actual decision makers may also be required.

Fig. 1 The relation between utility and severity of health states
Fig. 2

Fig. 3 The relation between cTTO and DCE disutility

Table 2(b) presents the parameter estimates obtained from the DCE data. No inconsistencies were observed between groups in any of the models. No significant interactions were observed in model 3 and model 4. Using the coefficients of model 8 in Table 2(b), latent DCE scores were computed for the AP-7D states, because the AIC of model 8 was the smallest. The linear relation was estimated to predict the cTTO values based on the latent DCE values. The estimated equation from the regression of the cTTO score (disutility) on the latent DCE score was 1 - cTTO score (disutility) = 0.223*x + 0.0433, where x denotes the latent DCE score. The DCE coefficients were rescaled using this equation.

Table 2(c) shows the results for the DCE with duration and ternary DCE methods. The estimated coefficients using a simple conditional logit model showed two inconsistencies (model 10 to model 12), and levels 1 and 2 of EN and levels 3 and 4 of II were combined. The results by the panel conditional logit model (model 14) showed only one inconsistency. Similarly, two inconsistencies were observed in the coefficients of ternary DCE in model 16 to model 18 shown in Table 2(d).

Table 3 presents the anchored results obtained using the coefficients from the constrained models (model 5, model 9, model 15, and model 19). The selections from some models were determined mainly considering the number of

Table 1
Background factors

Table 2
Estimated coefficients by TTO and DCE

Table 3
Scoring algorithm for AP-7D by all four models
PD pain/discomfort, MH mental health, EN energy, MO mobility, WS work/school, II interpersonal interactions, BO burden to others