Valuing Chinese medicine quality of life-11 dimensions (CQ-11D) health states using a discrete choice experiment with survival duration (DCETTO)

Objective To explore generating a health utility value set for the Chinese medicine Quality of life-11 Dimensions (CQ-11D), a utility instrument designed to assess patients’ health status while receiving TCM treatment, among the Chinese population. Methods The study was designed to recruit at least 2400 respondents across mainland China to complete one-to-one, face-to-face interviews. Respondents completed ten discrete choice experiment with survival duration (DCETTO) tasks during interviews. The conditional logit models were used to generate the health utility value set for the CQ-11D using the DCETTO data. Results A total of 2,586 respondents were invited to participate in the survey and 2498 valid interviews were completed (a completion rate of 96.60%). The modified conditional logit model with combing logically inconsistent levels was ultimately selected to construct the health utility value set for the CQ-11D instrument. The range of the measurable health utility value was -0.868 ~ 1. Conclusion The study provides the first utility value set for the CQ-11D among the Chinese population. The CQ-11D and corresponding utility value set can be used to measure the health utility values of patients undergoing traditional Chinese medicine interventions, and further facilitate relevant cost-utility analyses. The application of the CQ-11D can support TCM resource allocation in China. Supplementary Information The online version contains supplementary material available at 10.1186/s12955-023-02180-4.


Introduction
Economic evaluations of health care interventions often involve incremental cost-effectiveness ratios, where the quality-adjusted life-year (QALY) is used to capture the health outcome of different interventions.Generic preference-based measures (GPBMs) are commonly used to calculate the QALY.Most of the GPBMs, such as the EQ-5D and SF-6D, were developed in Europe and North America, and are often translated into other languages to use in many non-English speaking countries [1].One of the advantages of using these international GPBMs is that researchers can use the same instrument to measure the health-related quality of life (HRQoL) of populations in different counties or regions, allowing for cross-country/cross-cultural comparisons [2].Similarly, when these GPBMs are applied in China for assessing health outcomes associated with Traditional Chinese Medicine (TCM), researchers can apply the adapted versions with their corresponding health utility value sets generated based on the Chinese population.
However, health is a culturally related concept, and health evaluation indicators formulated in the Western cultural environment may not include Chinese cultural views on health.A study evaluating the similarities and differences of health-related quality of life concepts between the East and the West compared 8 HRQoL instruments developed in the Chinese cultural context with 3 HRQoL instruments developed outside China.This study found that, although there is a consensus between the East and West on some of the HRQoL domains, domains such as emotional control, weather adaptation, social adaptation, spirituality, and skin color are unique to the Chinese cultural background [3].Mao Z et al. [4] conducted a Q-methodological investigation study, and the results showed that several HRQoL domains were rated highly as most important by a diverse range of Chinese respondents but were not covered in the commonly used Western HRQoL instrument, such as the EQ-5D.
Traditional Chinese medicine is a model related to the concept of health in Chinese traditional culture, which better reflects the understanding of Chinese culture on health.Some HRQoL instruments developed in China, such as the Chinese quality of life scale [5,6], the Chinese PRO scale [7], and the sub-health assessment scale [8], have designed health indicators including spirit, appetite, sleep, and other concepts, and have been widely used among the Chinese population.These instruments all reflect the relevant domains of TCM health concept such as "unity of body and spirit, unity of man and nature, unity of man and society," "seven emotions", and "shape, spirit, and emotion."In brief, well-rounded health is the unity of inseparability of the body (including orifices of sense organs) and spirit (including emotion and mind), adaptation to the natural environment and society as well as the harmony of social contact.In TCM terms, the body is an outward manifestation of the spirit, and the spirit is the master of the body.Therefore, the coordination between body and spirit [9] and correspondence between the natural environment and the human body constitute the TCM holism that maintains the consistency of the bio-psycho-social medical model (Fig. 1).However, there are no items with similar meanings as these concepts in the international GPBMs such as the EQ-5D.Therefore, the health states described by these international instruments may not be consistent with TCM theories [10][11][12][13], and therefore, might not be comprehensive for evaluating TCM treatments.Besides, it is commonly recognized that the EQ-5D is not sensitive enough for assessing sub-health conditions due to its ceiling effects, whereas the SF-6D is not adequate for discriminating mild diseases [14,15].based on the optimization of the first version of the Chinese Medicine QOL assessment scale (CM-QOL), with the overall view of traditional Chinese medicine and the health concept as the guiding ideology, using literature research, patient interviews, expert consultation, questionnaire surveys and constructed through standard processes.CM-QOL includes 19 items, such as complexion, appetite, sleep quality, stool, and attention.To make the original scale more suitable for compiling discrete choice experiment (DCE) tasks to develop a health utility value set, the CQ-11D was developed by modifying the items included in the CM-QOL.This proves was referred to a previous study modifying the SF-36 to the SF-6D conducted by Brazier et al. [16].The basic principles of the modification are the followings: ① To avoid the redundancy of entries, if there are two or more items that basically describe the same aspect of health and are closely related, then only one entry is kept; ② Items with negative descriptions are preferentially reserved because these items are considered more relevant to health assessments and services.After modification, a TCM HRQoL instrument, the CQ-11D, with 11 items and four levels for each item was finally developed.After evaluations of the measurement properties, it was demonstrated that the CQ-11D has good reliability, construct validity, and standard correlation validity [17].The CQ-11D has been issued by the China Association of Chinese Medicine under the standard number T/CACM1372-2021 with a release date of August 18, 2021, and an implementation date of August 18, 2021 [18].
DCE with survival duration (DCE TTO ) is a relatively new preference elicitation technique that is successfully used to generate health utility value sets for GPBMs in many different countries [19][20][21][22][23][24][25].This technique has not been used previously to value a TCM HRQoL instrument.Respondents complete a series of choice sets, including health state descriptions with an corresponding survival duration.Responses are modeled to generate a set of coefficients that lying on the 1-0 full health-dead QALY scale to calculate the utility values of all health states described by the classification system [26].
Since TCM plays an important role as a kind of Complementary and Alternative Medicine (CAM) for healthcare systems worldwide, a validated instrument for assessing disease impacts and health outcomes is needed for TCM interventions.We aimed to develop the health utility value set for the CQ-11D.This article reports the valuation of the CQ-11D in China using online DCE TTO among a representative sample of the Chinese general population.

CQ-11D instrument
Holism based on TCM theories was used to guide the development of the CQ-11D.The methods for formulating the instrument included searching the literature, interviewing patients, consulting experts, and using a questionnaire survey.The original instrument consisted of two parts: a self-rated health status questionnaire and a visual analog instrument score.The self-assessed health status questionnaire had 11 questions (Table 1) and was divided into two sections: ① Xing Shen Tong Ju-Xing (physical functioning) Dimension, which centered around physical functioning, contains 8 questions and ② Xing Shen Tong Ju-Shen (psychological well-being) Dimension, which centered around psychological wellbeing, contained three questions.According to a preliminary survey of developing CQ-11D, its feasibility evaluation results showed a good acceptance rate.The total Cronbach's α of the scale is 0.820, and the Cronbach's α of each dimension is greater than 0.6, indicating that the instrument had a good internal consistency.Using the exploratory factor analysis method, the KMO value of the scale is 0.791, Bartlett's sphericity test χ2 = 318.414,P < 0.05, which is suitable for factor analysis.The factor analysis results showed that the cumulative contribution rate of variance of the three common factors is 58.603%, and the items in the three common factors have the inherent logical relationship of the scale, indicating the instrument had structural validity.The CQ-11D and the EQ-5D-3L as standard benchmark instruments correlated with 0.651, indicating a good standard validity [17].

Investigation method and content
The DCE TTO questionnaire was developed by the Lighthouse Studio 9.9.2 software.The accompanying survival time dimensions were set to 4 levels, namely 1 year, 4 years, 7 years, and 10 years.A total of 700 pairs of health conditions were selected and distributed to 70 sets of DCE TTO tasks were generated using the balanced overlap method [27][28][29].Each set (i.e., ten DCE TTO tasks) was randomly selected during the survey for the respondent to answer; the task order and the left-right position of health states within each task were all randomized [29].Mock tests were performed on the generated discretechoice questionnaires to evaluate the equilibrium of health status extraction.A simulated sample size of 2,400 cases were set in Lighthouse Studio software to test the quality of the discrete choice experimental design.The interaction between each item and the dimension of survival time was checked.The test results showed that 12,015 (50.06%) of the 24,000 choices with a simulated sample size of 2,400 chose option 1, and 11,985 (49.94%) chose option 2. As a general guideline, the standard error should be 0.05 or less for main-effects procedures and 0.10 or less for interaction-effects procedures.The test results show that the standard errors of the main effects are all less than 0.05 (Table 2), the standard errors of the interaction effects are all less than 0.10, and the level of each item is well balanced.Other parts of the questionnaire included CQ-11D, basic information questionnaire, six-dimensional health survey summary form SF-6D, EQ-5D-3L, etc.

Respondent and interviewer
For discrete choice experiments, an average of more than 20 respondents should answer each set of questionnaires in order to estimate a reliable model [30].The DCE TTO design of this study generated 70 sets of questionnaires, so the effective sample size of this study was planned to be 2,400 respondents.Numerous provinces and cities in mainland China were selected for the investigation.The surveyed provinces and cities spread in North China, Northeast China, East China, Central China, South China, Northwest China, and Southwest China with a total of 28 provinces and municipalities, including 118 prefecture-level cities, to cover sufficient geographical distribution and diversified levels of economic development in China.A stratified sampling method was applied, in which two quotas were set for age and sex, to ensure these distributions of the sample resembled those of the general Chinese population (Table 3) [31].Recruit participants by posting recruitment advertisements in a way that is convenient for the interviewer.Recruitment was conducted in publicly accessible places (Parks, shops, streets, and university campuses) and private areas (participants' residences).Respondents are required to meet the following inclusion criteria: ① Age ≥ 18 years old; ② Chinese citizens with Chinese nationality; ③ Have been living in Mainland China for the past five years; ④ Agree to participate in this research.Respondents are also required not to meet the exclusion criteria: ① Have listening, speaking, reading, and writing difficulties or are unable to understand the interview content; ② Abnormal mental condition.The main steps of the investigation were as follows: ① The respondents were screened into the research and informed consent; ② The interviewer guided the respondent to complete the CQ-11D questionnaire; ③ The interviewer guided the respondent in completing the DCE TTO tasks.In addition, after completing the DCE TTO tasks, respondents were asked to self-assess the difficulty of understanding and answering these tasks according to a 5-point Likert scale ranging from very easy to very difficult; ④ The interviewer guided the respondent to complete the background information questionnaire and the EQ-5D-3L and the SF-6D; ⑥ Recorded the time for the respondent to complete the survey; ⑦ Checked whether the questionnaire was clear and complete.

Quality control
A total of 125 interviewers divided into six teams were involved with one quality control leader and one project supervisor in each group.The following quality control methods were carried out.
(1) Interviewer training.All interviewers received a full-day training, including DCE operational processes, questionnaire examples, and quality control requirements to ensure equivalent task understanding, standard procedures, and good respondent interactions.(2) Team management.All interviewers were divided into six teams.Each team was designated a team leader who was responsible for the management and guidance of interviewers and collecting survey recordings for quality control; there was also a supervisor interviewer who was mainly responsible for the supervision of the process, follow-up visits for respondents, and review of quality control materials (interview sound recordings, informed consents, and other materials) to ensure the data quality.
(3) Questionnaire invalidation criteria.1)The respondent had difficulty understanding the task, was impatient, did not cooperate with the interviewer, or did not respond according to relevant requirements and instructions; 2) The interviewer failed to operate in accordance with the research specifications or the interviewer's manual; 3) The respondent  failed to complete the entire questionnaire; 4) The time of completing the questionnaire was too short (less than 5 min), which affected the quality of the interview.(4) The unique design of DCE TTO task choice: Each item of the DCE TTO task includes four levels, and the corresponding degree words are in the order of best, relatively good, relatively poor, the worst corresponds to the four colors of dark green, light green, light red, and dark red, respectively, in order to facilitate the respondents to understand and remember the degree of the health state (Fig. 2).( 5) Data entry: Two research team members daily entered and checked the data to ensure accuracy.( 6) Identification of potentially problematic data: Identified the data who always select the same options, such as "AAA AAA AAAA"; or select "ABABABABAB" in the DCE TTO [16,32,33].

Statistical and analysis methods
The DCE TTO data were analyzed under the random utility framework using a conditional logit model, which assumes a homogenous preference from the respondents, following the model specification proposed by Bansback et al. [16,19]: Among them, U i represented potential utility, t dl represents survival time, x dl t dl represented the interaction between item dimension level and survival time, t represented the main effect of survival time, and it was taken as a linear continuous variable [28].The DCE TTO value for each health state can be anchored on the QALY scale as follows: The variable definitions in the model construction of this study are shown in S1.The dependent variable y is the choice of each respondent, and it is a binary variable with a value of 0 or 1.Independent variables include survival duration, which is considered to be a linear continuous variable.In addition, there are 11 items of the CQ-11D, including "activity" (HD: hd2y, hd3y, hd4y), "appetite" (SY: sy2y, sy3y, sy4y), "Stool status"(DB: db2y, db3y, db4y), "Sleep quality" (SM: sm2y, sm3y, sm4y), "Vigor" (JS: js2y, js3y, js4y), "Dizziness" (TY: ty2y, ty3y, ty4y), "Palpitation" (XH: xh2y, xh3y, xh4y), "Pain" (TT: (1) tt2y, tt3y, tt4y), "Fatigue" (PL: pl2y, pl3y, pl4y), "Irritability" (FZ: fz2y, fz3y, fz4y) and "Frustrated" (JL: jl2y, jl3y, jl4y).Excel 2016 was used for saving, merging, screening, and basic data conversion.Descriptive statistics were applied by SPSS (Version 20) to summarize the detailed number and proportion of respondents of the specific level of demographic variables.STATA 15.0 was used to construct conditional logit models.We conducted the t-test for continuous variables and the χ2 or Fisher's exact test for categorical variables.Differences in the distribution of characteristics and model coefficients were considered statistically significant if p < 0.05.A correlation coefficient and difference test were used to determine if respondents' responses were consistent and whether health evaluation results differed across instruments.Because of the large sample size in this study, Spearman correlation coefficients and Pearson correlation coefficients are calculated simultaneously in the correlation analysis if the variable does not conform to the normal distribution.For the EQ-5D-3L, the utility value was calculated using the Chinese value set conducted in 2014 [34], and for the SF-6D, the utility value was calculated using the Chinese Hong value set [35,36].
This study protocol was approved by the ethics committee of the Beijing University of Chinese Medicine (Approval number: 2021BZYLL03012).Informed consent was obtained from all respondents included in the study.

Characteristics of the sample
A total of 2,586 respondents were involved, of which 88 interviews were excluded because the respondents did not complete the whole interview (N = 57), or the interviews did not meet the inclusion criteria (N = 5), or answered with logical inconsistencies (N = 9), or the interview took less than 5 min (N = 17).Finally, a total of 2498 respondents were included (Fig. 3).As illustrated in Table 4, 46.08% were males, 42.91% were agricultural accounts, and each geographic distribution ranged from 8.85% to 17.53%.The characteristics of respondents were close to those of the general Chinese population.
The mean ± SD time of the interviews was 14.5 ± 5.9 min, the minimum was 5.0 min, and the maximum was 52.0 min.68.29% of the respondents thought that the health status displayed by the DCE TTO tasks was very easy or easy to understand, and 7.65% of the respondents thought it was difficult or very difficult to understand; in terms of tasks choice, 50.56% of the respondents thought it was very easy or easy, and 18.33% of the respondents thought it was difficult or very difficult.Overall, the DCE TTO tasks were relatively easy to complete by the general Chinese population.Nevertheless, potentially problematic answer patterns were observed in respondents who always selected the same options (e.g., 25 respondents responded ' AAA AAA AAAA' , 5 respondents responded 'BBBBBBBBBB' and 6 respondents responded ' ABABABABAB') in the DCE TTO .These very small proportion of respondents (i.e., 1.40% of total respondents) were not observed noticeable differences in demographic characteristics, and some answers may be due to random errors.Therefore, these respondents were not excluded from this study [28].

Construction of health utility value set
The results of the conditional logit model estimations are shown in Table 5.There are five non-monotonic coefficients in the conditional logit model, namely SY, SM, TY, PL, and JL.We therefore modified the conditional logit model by combing level 2 with level 1 for the "Appetite " dimension, level 2 with level 1 for the "Sleep Quality" dimension, level 2 with level 1 for the "Dizziness" dimension, level 2 with level 1 for the "Fatigue " dimension, and level 2 and level 1 for the "Anxiety or depression" dimension.The latent utility values generated was then anchored by using the coefficient of the survival time dimension to obtain the anchored coefficients (Table 5).

CQ-11D value set
According to the anchored results of the conditional logit model, it can be determined that the utility value set of CQ-11D based on the health preference of the general population in China was presented in Table 6.The formula for calculating the health utility value of CQ-11D based on the health preference of the Chinese population is as follows: All the health states described by the CQ-11D can be calculated using this function.For example, if the health status is 11,111,111,111 for all the dimensions, the health utility value is 1.The utility value of the health state "13,112,121,223" can be calculated as U (13112121223) = 1-0-0.102-0-0-0.022-0-0.007-0-0-0.006-0.052= 0.811.The worst health state "44,444,444,444" utility value can be calculated as U (44444444444) = 1-0.500-0.149-0.099-0.118-0.143-0.135-0.131-0.211-0.114-0.109-0.159= -0.868.

Fig. 3 Flow chart of sample inclusion
The range of measurable utility values of the CQ-11D is -0.868 ~ 1.

Comparative study on the utility value of CQ-11D, SF-6D, EQ-5D-3L
The correlation analysis results are shown in S2.The utility value measured by the CQ-11D was significantly correlated with the utility value measured by the SF-6D and EQ-5D-3L.As shown in Table 7, the utility value   difference between the CQ-11D, SF-6D, and EQ-5D-3L was statistically significant (P < 0.01) (S3), and the mean and median utility values of the EQ-5D-3L instrument were the largest, the mean and median utility values of the SF-6D instrument were the smallest, while the mean and median utility values of the CQ-11D were in the middle.Correlation analysis results are shown in S2; the utility value measured by the CQ-11D is significantly correlated with that measured by the SF-6D and EQ-5D-3L.A statistically significant difference (p < 0.01) was observed between the CQ-11D, SF-6D, and EQ-5D-3L utility values (S3).Aside from that, EQ-5D-3L had the highest mean and median utility values, the SF-6D had the lowest, and the CQ-11D had a mean and median utility value that was midway between the two.When comparing health utility values measured by the CQ-11D, SF-6D, and EQ-5D-3L, the proportions with a health utility value of 1 were 11.53%, 7.77%, and 62.49%, respectively.The specific results are shown in Figs. 4, 5 and 6.

Discussion
This study reports a Chinese-specific value set for the CQ-11D that can be used for economic evaluations.The measurement range of the value set is -0.868 ~ 1.This study has two substantial advantages.First, the utility of undergoing TCM treatment can be directly quantified for economic evaluation.The CQ-11D can be used to solve the problems of insufficient presentation of PROs and difficult access to health utility values for TCM interventions.It also provides an effective measurement tool for clinical and economic evaluations of TCM.Secondly, it captures aspects of Chinese culture and TCM theory that are not included in other GPBMs (such as appetite, stool, and dizziness).The largest utility decrements in Action and life self-care (HD), Pain (TT), Anxiety or depression (JL), and Appetite (SY) dimensions had more impact on utility values but were not fully reflected in generic instruments.
Even though the utility decrements for the TCM theories-sensitive dimensions were smaller than those for the more generic dimensions, their inclusion provides a more relevant measure of utility for TCM treatment under the Chinese culture.The pits state in this study is -0.868, which is relatively low.Similarly, in the valuation study in Australia for the QLU-C10D, which has a similar number of dimensions to CQ-11D, it was observed that the pits state was -0.96 [37].Different health state classification systems, valuation methods, and utility functional forms, as well as country-specific cultural disparities in views toward trading between mortality and morbidity, all contribute to variations in the value of health states [38].One could argue that a lower pits state value indicates a wider range in a utility value set, which might result in more variation between interventions in CUAs [37].
In this study, the DCE TTO questionnaire was developed by the Lighthouse Studio 9.9.2 software.The accompanying survival time dimensions were set to 4 levels, namely 1 year, 4 years, 7 years, and 10 years.A total of 700 pairs of health conditions were selected and distributed to 70 sets of DCE TTO tasks were generated using the balanced overlap method [27][28][29].It should be noted that this is not the first time used the lighthouse studio software to develop DCE TTO design.It has been used in previous  valuation studies which use DCE or DCETTO [28,39].In previous studies, there were some (statistically insignificant) inconsistent coefficients in the DCE TTO model.
Following the previous study, the adjacent inconsistent levels were combined when developing value sets, to produce a fully consistent model [28,32,40,41].Finally, The latent utility values generated by the DCE TTO data needs to be rescaled using the coefficient of the additional dimension of survival duration.Compared with traditional approaches such as the standard gamble (SG) and time trade-off (TTO), this approach is easy to understand, operate and manage.The DCE TTO requires respondents to simply point out that option A is better than option B without going through an iterative process to determine at which point the respondents believe that A and B are indistinguishable [19,42].
Nevertheless, there are certain limitations in that the research process is affected by the understanding and compliance of some populations as a result of the effective sample size included in the age quota.There are more people in the 18-29 age group, and the number of 50-59-year-old participants is relatively small.For a pits state of -0.868, we have some important issues for future research.A related issue is sensitivity to varying degrees of impairment.Assess the sensitivity of CQ-11D to differences in the impact of different degrees (mild or extreme) of QOL and compare them on a generic instrument [37].There were also some limitations in the experiment design.In this study, the DCE TTO questionnaire was developed by the Lighthouse Studio 9.9.2 software.Sawtooth Software's procedure does not formally estimate D-efficiency and assumes that designs that are level balanced and near orthogonal will lead to identified preference-model parameters.A disadvantage, however, is that design heterogeneity could be confounded with taste heterogeneity and scale differences [27].The respondents choose among sets of experimentally controlled sets of profiles and these choices are modeled via multinomial logit as a function of the experimental design variables [29].However, the multinomial logit model which means it cannot tell the preference heterogeneities of different groups of respondents among the whole sampled population [43].Thus this study was likely to favor a conditional logit regardless of whether preference heterogeneity was in fact present.However, based on previous studies it was considered necessary to explore the results of mixed logit models [28,39,44].This study explores the results of mixed logit model construction.Since the experiment design bias did not show significant preference heterogeneity, the corresponding results were not presented in the research results section.The relevant results can be found in S4.Another problem is the generate the value set under nonlinear temporal preferences.Jonker et al. find that the best statistical fit was obtained when using a hyperbolic discount function, which resulted in smaller QALY decrements and fewer health states classified as worse than immediate death [45,46].It's unlikely to be able to assess non-linear time preferences in this study given that it was optimized under linear time preferences.In the future, the value set of the CQ-11D can be further improved based on the aforementioned research issues.

Conclusion
The study provides the first value set for the CQ-11D, which can facilitate cost-utility analyses when applied to data collected with the CQ-11D prospectively and retrospectively.The valuation tool of the CQ-11D was developed for measuring the quality of life and health utility of patients undergoing traditional Chinese medicine interventions.The application of CQ-11D can support TCM resource allocation in China.

Fig. 2 A
Fig. 2 A Sample set of DCETTO choice task (A: Chinses version; B: English version).Note: The corresponding degree words are in the order of best, relatively good, relatively poor, and the worst corresponds to the four colors of dark green, light green, light red, and dark red respectively

Fig. 4 Fig. 5 Fig. 6
Fig.4 Histogram of the frequency distribution of the health utility value of the general population in China on CQ-11D

Table 2
Experimental extraction of equilibrium main effect simulation test results items 1-11 correspond to 11 items of the traditional Chinese Medicine quality of Life Assessment scale (CQ-11D), and item 12 represents the dimension of survival time

Table 3
Sample quota design * The proportion of China's adult population was calculated using the data from China Statistical Yearbook (2019); a 18-19-year-old population data in the "China Statistical Yearbook (2019)" was included in the 15-19-year-old population, so it was obtained by calculating the average population of each age in the 15-19 years old

Table 4
Basic information of respondents

Table 5
Conditional logit model and logit model calculation results

Table 6
CQ-11D utility value set