A comparison of EQ-5D index scores using the UK, US, and Japan preference weights in a Thai sample with type 2 diabetes
© Sakthong et al; licensee BioMed Central Ltd. 2008
Received: 04 February 2008
Accepted: 23 September 2008
Published: 23 September 2008
Data are scarce on how EQ-5D index scores compare when the UK, US, and Japan preference weights are applied to other populations. This study aimed to examine the differences and agreement among these three weights, their psychometric properties (test-retest reliability, convergent validity, and known-groups validity), and the impact of differences in the EQ-5D scores on the outcome of cost-utility analysis in Thai people.
A convenience sample of 303 type 2 diabetic outpatients (18 years or older) from a cross-sectional study was examined. ANOVA and post hoc Bonferroni tests were used to determine the differences among the three EQ-5D scores. Agreement among the EQ-5D scores was assessed using intraclass correlation coefficients (ICCs) and Bland-Altman plots. ICCs were also used to examine test-retest reliability. Spearman's rho correlation coefficients were used to assess convergent validity between the EQ-5D scores and sociodemographic data, clinical data, and health status. Mann-Whitney U tests were used to test the differences in EQ-5D scores between known groups defined by HbA1c level (cut point of 7%) and by the presence of diabetic complications, namely neuropathy, retinopathy, nephropathy, and cardiovascular diseases. Seven hypothetical decision trees were created to evaluate the impact of differences in the EQ-5D scores on the incremental cost-utility ratio (ICUR).
The US weights yielded higher scores than those of the UK and the Japan weights (p < 0.001, both), while the UK and the Japan weighted scores did not differ (p > 0.05). Both UK and US scores had more agreement with each other than with the Japan scores. Regarding psychometric properties, the Japan scheme provided better test-retest reliability, convergent and known-groups validity than both UK and US schemes. The variation in EQ-5D scores estimated from UK, US, and Japan preference weights had a marginal impact on ICUR (range: 1.23–6.32%).
Because the Japan model showed better psychometric properties than the UK and US models, and the differences in these EQ-5D scores had a small impact on ICUR, we recommend that the Japan scheme be used in Thai people for both clinical and policy purposes. However, more research is needed.
The health utility (HU) approach to assessing health-related quality of life (HRQoL) is a commonly used technique for determining preferences for health outcomes in evaluations of public health and healthcare interventions, such as cost-utility analyses (CUA) [1, 2]. In CUA, a utility score is assigned to each health state on a cardinal scale, in which dead = 0 and perfect health = 1, to reflect preferences for different outcomes. The utility score is incorporated into quality-adjusted life-years (QALYs), which combine, in a single index, gains or losses in quantity of life (life expectancy) and quality of life (HU). The EuroQoL (EQ-5D) is the most frequently used HU instrument for calculating QALYs based on actual measurements of patients' HRQoL.
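As a minimal sketch of how a utility score and a duration combine into QALYs, the example below sums utility-weighted time over periods. The function name, the patient profile, and all numbers are hypothetical and chosen only for illustration; they are not taken from this study.

```python
def qalys(periods):
    """Sum utility-weighted time. Each period is a (utility, years) pair,
    with utility on the scale dead = 0, perfect health = 1."""
    return sum(utility * years for utility, years in periods)

# Hypothetical profiles (illustrative numbers only)
with_treatment = qalys([(0.5, 2), (0.75, 4)])   # 2 years at 0.5, then 4 years at 0.75
without_treatment = qalys([(0.5, 6)])           # 6 years at 0.5
qalys_gained = with_treatment - without_treatment

print(with_treatment, without_treatment, qalys_gained)
```

The same utility gain counts for more QALYs the longer it is sustained, which is why both quantity and quality of life enter the index.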
The EQ-5D instrument consists of a five-item descriptive system of health states and a visual analog scale (VAS). Responses to the five items can be converted into a utility index score by using value sets (preference weights) elicited from a general population. The best-known preference weights were derived from samples of the United Kingdom (UK) population and are the original weights for estimating EQ-5D index scores. The UK-based preference weights are applied to other populations when country-specific weights are not available. However, evidence suggests that valuations of health states can differ between countries because of differences in demographic backgrounds, social-cultural values, and economic systems [5–8]. Thus, it is advisable to use country-specific weights in a given country if they are available.
Unfortunately, EQ-5D preference weights for Thai people are not yet available. Valuing the EQ-5D health states nationwide is a complex, time-consuming, and expensive task, so where country-specific weights are unavailable, applying existing preference weights from elsewhere is essential. Nevertheless, it is not known which country's weighting scheme is appropriate for the Thai population. Besides the UK weights, a number of other countries have their own population-based preference weights for the EQ-5D [7, 9–14]. Of these, the United States (US) weighting scheme is unique in using a D1 model, which differs from the UK model (N3 model) that was applied in other countries' models. Studies have also shown that EQ-5D scores derived from the US weights differed from those derived from the UK weights [15–17].
Japan was the first Asian country to develop its own EQ-5D preference weights, in 2002. The Japan model was therefore chosen to represent Asian preference weights. We were interested in how much EQ-5D index scores differ when the UK, US, and Japan preference weights are used. Little was also known about the psychometric properties of these schemes in different cultural contexts and in specific patient samples (all three models were developed in general populations). Therefore, we set out to determine the differences and agreement among the scores produced by these three countries' preference weights (the three countries are also located on three different continents) using a Thai patient sample. Their psychometric properties, including test-retest reliability, convergent validity, and known-groups validity, were also explored. These properties would provide additional evidence of validity for the use of the EQ-5D index score in Thai settings. Moreover, we examined the impact of differences in the EQ-5D scores on the outcome of CUA using hypothetical scenarios.
Subjects and procedures
The data used in this paper were derived from a cross-sectional study. In that study, a convenience sample of 303 type 2 diabetic outpatients was recruited from the General Police Hospital in Bangkok, Thailand, between January and June 2007. Patients with type 2 diabetes who were waiting to see physicians were approached to participate. Eligible patients were at least 18 years old and able to understand the Thai language. Patients with health problems or cognitive impairments that prevented them from completing the interview were excluded. The face-to-face interviews included the Morisky Medication Adherence Scale, the Center for Epidemiologic Studies Depression (CES-D) scale, the EQ-5D questionnaire, the VAS, and sociodemographic and clinical data, together with a review of medical records. In addition, about one-fifth of the sample (N = 64) was randomly selected for a one- to two-week test-retest reliability assessment conducted by telephone. This study was approved by the Ethics Committee of the Police Hospital.
EQ-5D: UK, US, and Japan preference weights
The EQ-5D includes a five-item descriptive system, with one item for each of the following health attributes: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each attribute has three levels: no problem, some problem, and major problem. Three levels across five attributes generate a total of 243 possible health states.
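The count of 243 follows from three levels on each of five attributes (3 to the power 5). A short sketch enumerating the states is given below; the five-digit state code (for example 11111 for no problems on any attribute) is the conventional EQ-5D notation and is assumed here rather than taken from the text above.

```python
from itertools import product

DIMENSIONS = ("mobility", "self-care", "usual activities",
              "pain/discomfort", "anxiety/depression")
LEVELS = (1, 2, 3)  # 1 = no problem, 2 = some problem, 3 = major problem

# One five-digit code per combination of levels across the five attributes
states = ["".join(str(level) for level in combo)
          for combo in product(LEVELS, repeat=len(DIMENSIONS))]

print(len(states))            # 243
print(states[0], states[-1])  # 11111 33333
```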
The UK valuation study followed the Measurement and Valuation of Health (MVH) protocol in a sample of the general adult population of the United Kingdom (England, Scotland, and Wales) [4, 19]. The preference values for 42 core health states were elicited using the time trade-off (TTO) method. Regression models fitted to the valuations of these 42 health states were then used to predict the index scores for all possible EQ-5D health states. The UK model consists of a set of variables representing each EQ-5D health dimension, with two dummy variables representing the levels of each dimension. A dichotomous variable (N3) was also added to the model to indicate whether level 3 (major problem) occurs in at least one dimension.
The US health state valuation study was based on the UK MVH protocol. However, the US algorithm replaced the N3 variable with D1, which represents the number of dimensions at level 2 or 3 beyond the first.
The Japan valuation study was a quasi-replication of the UK MVH study using a modified protocol, where each respondent was presented with 17 health states, instead of 42 health states. The plain main effects model was preferred.
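The two model-specific terms described above can be sketched as simple functions of a health state. This is only an illustration of the N3 and D1 definitions as worded in this section; the function names are hypothetical, and the regression coefficients of the published UK, US, and Japan models are deliberately not reproduced here.

```python
def n3(state):
    """UK model term: 1 if any dimension is at level 3, otherwise 0."""
    return int(any(level == 3 for level in state))

def d1(state):
    """US model term: number of dimensions at level 2 or 3 beyond the first."""
    impaired = sum(1 for level in state if level > 1)
    return max(0, impaired - 1)

# States are tuples of levels in the order: mobility, self-care,
# usual activities, pain/discomfort, anxiety/depression
for state in [(1, 1, 1, 1, 1), (1, 1, 2, 1, 1), (2, 1, 2, 3, 1)]:
    print(state, "N3 =", n3(state), "D1 =", d1(state))
```

A plain main effects model, as preferred for Japan, uses only the per-dimension level dummies and neither of these terms.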
The EQ-5D index scores were calculated using the UK, US, and Japan preference weights. We first determined the differences among the three index scores using ANOVA, followed by post hoc Bonferroni tests. Agreement among the EQ-5D scores using the UK, US, and Japan preference weights was also assessed using intraclass correlation coefficients (ICCs) and Bland-Altman plots. We then examined the psychometric properties of these EQ-5D scores using the following approaches: one- to two-week test-retest reliability, convergent validity, and known-groups validity.
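As a sketch of the Bland-Altman agreement analysis mentioned above, the example below computes the mean difference (bias) between two sets of index scores and the limits of agreement, assuming the usual definition of bias plus or minus 1.96 standard deviations of the differences. The scores are invented for illustration and the function name is hypothetical; this is not the study's SPSS procedure.

```python
from statistics import mean, stdev

def bland_altman(scores_a, scores_b):
    """Return (bias, lower limit, upper limit) for paired scores,
    using bias +/- 1.96 * SD of the paired differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical index scores for the same five patients under two weight sets
scores_a = [0.80, 0.70, 0.90, 0.60, 1.00]
scores_b = [0.75, 0.70, 0.80, 0.65, 1.00]

bias, lower, upper = bland_altman(scores_a, scores_b)
within = sum(lower <= a - b <= upper for a, b in zip(scores_a, scores_b))
print(f"bias={bias:.3f}, limits=({lower:.3f}, {upper:.3f}), "
      f"within limits: {within}/{len(scores_a)}")
```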
Test-retest reliability was evaluated using intraclass correlation coefficients (ICCs). For convergent validity, we used Spearman's rho correlation coefficients to assess the associations between the three EQ-5D scores and sociodemographic data, clinical data, and health status, including age, gender, income, duration of diabetes, body mass index (an indicator of obesity), HbA1c level, number of diabetic complications, CES-D scores, and VAS scores.
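Spearman's rho is the Pearson correlation computed on ranks, with tied values given their average rank. A minimal standard-library sketch is shown below; the helper names and the data are hypothetical, and constant inputs (zero variance) are not handled.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical EQ-5D index scores and VAS scores for six patients
eq5d = [0.52, 0.69, 0.76, 0.76, 0.85, 1.00]
vas = [0.40, 0.60, 0.55, 0.70, 0.80, 0.90]
print(round(spearman_rho(eq5d, vas), 3))
```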
For known-groups validity, we examined the ability of the three EQ-5D scores using the UK, US, and Japan preference weights to discriminate between clinical known groups defined by HbA1c level (below versus equal to or above 7%) and by the presence or absence of diabetic complications, namely neuropathy, retinopathy, nephropathy, and cardiovascular diseases. Mann-Whitney U tests were used to test the differences in EQ-5D index scores between these known groups because the distributions of EQ-5D utility scores had a number of outliers. All analyses were performed using SPSS version 13.0.
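To illustrate the statistic behind the known-groups comparison, the sketch below computes the Mann-Whitney U values by counting, over all cross-group pairs, how often a score in one group exceeds a score in the other, with ties counted as one half. Only the U statistic is computed, not the p-value; the data and function name are hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b). U_a counts pairs where a value from group_a
    exceeds a value from group_b, with ties contributing 0.5."""
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b

# Hypothetical EQ-5D index scores by glycemic control group
hba1c_below_7 = [0.9, 0.8, 0.85]
hba1c_7_or_above = [0.6, 0.8, 0.7]

u_a, u_b = mann_whitney_u(hba1c_below_7, hba1c_7_or_above)
print(u_a, u_b)  # 8.5 0.5
```

Because the statistic depends only on the ordering of scores, a few extreme values do not dominate the comparison, which is the stated reason for preferring it here.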
Table: Data component of the base-case scenario (decision tree 1). Headings: Drug A (new drug); Drug B (existing drug); Probability of treatment results.
Table: Characteristics of type 2 diabetic patients (N = 303)
Mean ± SD: 61.1 ± 11.4
Income (Baht per month), median (25th percentile, 75th percentile): 5,000 (0, 16,300)
Duration of diabetes (years), mean ± SD: 12.2 ± 8.4
Mean ± SD: 7.7 ± 1.7%
Body mass index (kg/m2), mean ± SD: 26.7 ± 1.7
Median (25th percentile, 75th percentile): 5 (2, 10)
VAS scores (usual scores: 0–1), mean ± SD: 0.69 ± 0.16
Descriptive statistics of EQ-5D index scores using the UK, US, and Japan preference weights
Descriptive statistics of differences in EQ-5D index scores using the UK, US, and Japan preference weights
Agreement between the EQ-5D index scores using UK, US, and Japan preference weights
Table: Agreement between UK, US, and Japan weights, reported as ICC (95% CI) for the pairs UK & US, UK & Japan, and US & Japan.
The Bland-Altman plot of the UK and US weights showed that 96.4% of the difference scores were between the limits of agreement, 3.3% were below the lower agreement line, and 0.3% were above the upper agreement line (Figure 2A). Approximately 64% of the UK weighted scores were lower than the US weighted scores (difference less than zero), 31% were equal, and 5% were higher (difference greater than zero).
The Bland-Altman plot of the UK and Japan weights showed that 96% of the difference scores were between the limits of agreement, 4% were below the lower agreement line, and none were above the upper agreement line (Figure 2B). Approximately 64% of the UK weighted scores were higher than the Japan weighted scores (difference greater than zero), 21% were equal, and 15% were lower (difference less than zero).
The Bland-Altman plot of the US and Japan weights showed that 96.4% of the difference scores were between the limits of agreement, 3.6% were below the lower agreement line, and none were above the upper agreement line (Figure 2C). Approximately 75% of the US weighted scores were higher than the Japan weighted scores (difference greater than zero), 21% were equal, and 4% were lower (difference less than zero).
Table: One- to two-week test-retest reliability of EQ-5D index scores using UK, US, and Japan preference weights (N = 64), reported as ICC (95% CI).
Table: Convergent validity of EQ-5D index scores using UK, US, and Japan preference weights. Variables: gender (0 = male, 1 = female); duration of diabetes (years); body mass index (kg/m2); HbA1c level (%); number of diabetic complications.
Table: Known-groups validity of the EQ-5D index scores using UK, US, and Japan preference weights. Groups: HbA1c < 7%; HbA1c ≥ 7%.
The impact of the differences in EQ-5D index scores using UK, US, and Japan preference weights on cost-utility analysis
Table: Impact of differences in EQ-5D index scores using UK, US, and Japan preference weights on ICUR for 7 hypothetical decision trees. Columns: hypothetical decision tree; EQ-5D index scores when the drug was effective (success); EQ-5D index scores when the drug was not effective (failure); incremental cost (Baht); % difference in ICUR from base case (decision tree 1).
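The quantities named in the table headings above can be related by a short sketch. It assumes a two-branch decision tree per drug (success or failure), a one-year time horizon so that expected utility equals QALYs, and the same success and failure utilities for both drugs. All probabilities, costs, and utilities below are hypothetical, as are the function names; they are not the values used in this study.

```python
def expected_utility(p_success, u_success, u_failure):
    """Probability-weighted utility over the success and failure branches."""
    return p_success * u_success + (1 - p_success) * u_failure

def icur(incremental_cost, p_new, p_existing, u_success, u_failure, years=1.0):
    """Incremental cost divided by incremental QALYs (new vs existing drug)."""
    qaly_new = expected_utility(p_new, u_success, u_failure) * years
    qaly_existing = expected_utility(p_existing, u_success, u_failure) * years
    return incremental_cost / (qaly_new - qaly_existing)

def pct_difference(value, base):
    return 100.0 * (value - base) / base

# Hypothetical inputs: Drug A succeeds in 80% of patients, Drug B in 60%,
# and Drug A costs 10,000 Baht more per patient.
base_case = icur(10_000, 0.8, 0.6, u_success=0.80, u_failure=0.60)
alternative = icur(10_000, 0.8, 0.6, u_success=0.85, u_failure=0.60)

print(round(base_case), round(alternative),
      round(pct_difference(alternative, base_case), 1))
```

Under these assumptions only the gap between the success and failure utilities enters the denominator, so two weighting schemes that shift both utilities by a similar amount change the ICUR very little.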
To the best of our knowledge, this is the first study to examine the differences among, and the cross-cultural validity of, EQ-5D scores derived from the UK, US, and Japan preference weights. The results showed significant differences across the three EQ-5D index scores. The US weights yielded higher scores than the UK and Japan weights (p < 0.001, both), while the UK and Japan weighted scores did not differ (p > 0.05). The EQ-5D index scores derived from the UK and Japan weights were also comparable to the mean EQ-5D score of 0.75 reported for type 2 diabetes in a previous study. The UK and US scores agreed more with each other than with the Japan scores. As for psychometric properties, the Japan scheme provided better test-retest reliability, convergent validity, and known-groups validity than both the UK and US schemes. We also determined the impact of the differences in these EQ-5D index scores on the outcome of CUA. Variation in utility scores estimated from the UK, US, and Japan preference weights had a relatively small impact on CUA (range: 1.23–6.32%).
Our study showed that the US weighted scores were higher than the UK weighted scores. This result is consistent with a previous study conducted in US patients living with HIV infection. However, our study yielded a larger mean difference (0.05) than the previous study (0.03). This may be due to differences in the health states of the patient populations. Johnson et al found that the discrepancy between the US and UK schemes was smaller for better health states but larger for extreme health problems. This finding is similar to ours (see Figure 2A). In the previous US study, the HIV patients had better health (mean EQ-5D scores using the US and UK weights were 0.87 and 0.84, respectively) than the diabetic patients in the present study (mean EQ-5D scores using the US and UK weights were 0.81 and 0.76, respectively). This may explain why a larger mean difference between the US and UK scores was found in the present study.
This study also showed that the EQ-5D index scores using the US scheme were higher than those using the Japan scheme, with an estimated mean difference of 0.07, while the UK model yielded slightly higher scores than the Japan model, with a mean difference of 0.02 (not statistically significant). No previous study has compared US and Japan weighted scores; however, the large discrepancy may be attributable to differences in algorithms, cultures, research methods, and/or other factors. Tsuchiya and colleagues reported that the Japan scheme yielded consistently higher scores than the UK weights except for very mild states. This finding contrasts with our results, in which the mean UK weighted scores were slightly higher than the mean Japan index scores, although not significantly so. Also, the Bland-Altman plot (Figure 2B) showed that the majority of the UK weighted scores (62%) were higher than the Japan weighted scores except for the extreme health states. These different results may arise because the previous study was conducted in a general population, whereas we used a real patient population. Utility weights derived from a heterogeneous general population and applied to a patient population may be less precise in detecting differences across cultures. In addition, because of differences in population ratings and healthcare settings between Japan and Thailand, EQ-5D valuations may perform differently when applied to different populations.
It is not surprising that the UK and US preference weights agreed more with each other than with the Japan weights, because both are Western countries whose cultures differ from that of Japan, an Asian country. Moreover, the Japan scheme provided better test-retest reliability, convergent validity, and known-groups validity than both the UK and US schemes in this Thai sample. These results may reflect the fact that Thailand is an Asian country whose culture is closer to that of Japan than to those of the UK and US. Thus, the Japan weights are more likely than the UK or US weights to be appropriate for EQ-5D valuations in Thai people.
Even though our results showed differences in EQ-5D scores derived from the UK, US, and Japan weights, the impact on ICUR was marginal. This leads to the question of which preference weights should be used and in what situations. Our results suggest that if the EQ-5D index score is used as an HRQoL measure for clinical decision making, such as using the utility scores as a clinical indicator to monitor patients' health status, the Japan weights should be applied for Thais. However, if one would like to conduct a CUA or CEA whose outcomes are QALYs gained, the choice of weighting scheme matters little. Nevertheless, if we have to recommend one scheme, the Japan weights are the most appropriate because they demonstrated better psychometric properties than the UK and US weights.
The results of this study need to be interpreted in light of the following limitations. First, we used only cross-sectional data. Differences in change scores may have a greater impact on ICUR than differences in absolute scores. Thus, further study should be done with longitudinal data. Second, our data were derived from diabetic outpatients, so the results are limited to a specific patient group. The findings are unlikely to generalize to other patient populations, and other clinical populations need investigation. Finally, we used a simple hypothetical decision tree model to examine the impact of variability in EQ-5D index scores on ICUR. Using real CUA data would be more informative.
In this study, we compared EQ-5D valuations using algorithms developed in the UK, US, and Japan general populations and cross-validated them using a Thai patient sample. Our results suggest that the US scheme provided higher EQ-5D index scores than the UK and Japan schemes, while the UK and Japan weighted scores did not differ significantly. However, the impact of the differences in these EQ-5D index scores on the outcome of CUA was quite small. The UK and US scores agreed more with each other than with the Japan scores. The Japan scheme provided better test-retest reliability, convergent validity, and known-groups validity than both the UK and US schemes. We recommend that, among these three weights, the Japan model be used in Thai people. However, more research is needed.
- HRQoL: health-related quality of life
- VAS: visual analog scale
- CES-D: Center for Epidemiologic Studies Depression
- ANOVA: analysis of variance
- ICCs: intraclass correlation coefficients
- 95% CI: 95% confidence interval
- clinically important difference
- ICUR: incremental cost-utility ratio.
This research was supported by a grant from Chulalongkorn University. The authors thank the diabetic patients for providing their valuable data, and the nurses and physicians for their assistance in collecting the data.
- Gold MR, Russell LB, Siegel JE, Weinstein MC: Cost-effectiveness in health and medicine. New York: Oxford University Press; 1996.
- Drummond MF, Sculpher MJ, Torrance GW, O'Brien BJ, Stoddart GL: Methods for the economic evaluation of health care programmes. New York: Oxford University Press; 2005.
- Rasanen P, Roine E, Sintonen H, Semberg-Konttinen V, Ryynanen OP, Roine R: Use of quality-adjusted life years for the estimation of effectiveness of health care: A systematic literature review. Int J Technol Assess Health Care 2006, 22: 235–241. 10.1017/S0266462306051051
- Dolan P: Modeling valuations for EuroQol health states. Med Care 1997, 35: 1095–1108. 10.1097/00005650-199711000-00002
- Guillemin F, Bombardier C, Beaton D: Cross-cultural adaptation of health-related quality of life measures: Literature review and proposed guidelines. J Clin Epidemiol 1993, 46: 1417–1432. 10.1016/0895-4356(93)90142-N
- Kaplan RM, Feeny D, Revicki DA: Methods for assessing relative importance in preference based outcome measures. Qual Life Res 1993, 2: 467–475. 10.1007/BF00422221
- Badia X, Roset M, Herdman M, Kind P: A comparison of United Kingdom and Spanish general population time trade-off values for EQ-5D health states. Med Decis Making 2001, 21: 7–16.
- Busschbach J, Weijnen T, Nieuwenhuizen Oppe S, Badia X, Dolan P, Greiner W, Kind P, Krabbe P, Ohinmaa A, Roset M, Sintonen H, Tsuchiya A, Williams A, Yfantopoulos J, De Charro F: A comparison of EQ-5D time trade-off values obtained in Germany, United Kingdom, Spain. In The measurement and valuation of health status using EQ-5D: A European perspective. Edited by: Brooks RG, Rabin F, De Charro. Boston: Kluwer Academic Publishers; 2003:143–165.
- Claes C, Greiner W, Uber A, Graf Schulenberg J-M: An interview-based comparison of the TTO and VAS values given to EuroQoL states of health by the German population. In Proceedings of the 15th Plenary Meeting of the EuroQol group: 1–2 September 1998; Hannover, Germany. Edited by: Greiner W, Graf von der Schulenberg J-M, Piercy J. Hannover: Centre for Health Economics and Health Systems Research, University of Hannover; 1999:13–38.
- Wittrup-Jensen KU, Lauridsen JT, Gudex C: Estimating Danish EQ-5D tariffs using the time trade-off (TTO) and visual analog scale (VAS) methods. In Proceedings of the 15th Plenary Meeting of the EuroQol group: 6–7 September 2001; Copenhagen, Denmark. Edited by: Norinder AL, Pedersen KM, Roos P. Lund: Swedish Institute for Health Economics; 2002:257–292.
- Tsuchiya A, Ikeda S, Ikekami N, Nishimura S, Sakai I, Fukuda T, Hamashima C, Hisashige A, Tamura M: Estimating an EQ-5D population value set: the case of Japan. Health Econ 2002, 11: 341–345. 10.1002/hec.673
- Jelsma J, Hansen K, de Weerdt W, de Cock P, Kind P: How do Zimbabweans value health states? Popul Health Met 2003, 1: 1–11. 10.1186/1478-7954-1-1
- Shaw JW, Johnson JA, Coons SJ: US valuation of the EQ-5D health states: development and testing of the D1 valuation model. Med Care 2005, 43: 203–220. 10.1097/00005650-200503000-00003
- Min-Woo J, Sang-Il L: General population time trade-off values for 42 EQ-5D health states in South Korea. J Prev Med Public Health 2007, 40: 169–176.
- Johnson JA, Luo N, Shaw JW, Kind P, Coons SJ: Valuations of the EQ-5D health states: Are the United States and United Kingdom different? Med Care 2005, 43: 221–228. 10.1097/00005650-200503000-00004
- Nan L, Johnson JA, Shaw JW, Coons SJ: A comparison of EQ-5D index scores derived from the US and UK population-based scoring functions. Med Decis Making 2007, 27: 321–326. 10.1177/0272989X07300603
- Huang IC, Willke RJ, Atkinson MJ, Lenderking WR, Frangakis C, Wu AW: US and UK versions of the EQ-5D preference weights: Does choice of preference weights make a difference? Qual Life Res 2007, 16: 1065–1072. 10.1007/s11136-007-9206-4
- Chabunthom R, Sakthong P: Effect of depression on medication adherence and glycemic control in patients with type 2 diabetes. Thai Journal of Hospital Pharmacy 2009, in press.
- Dolan P, Gudex C, Kind P, Williams A: The time trade-off method: Results from a general population study. Health Econ 1996, 5: 141–154. 10.1002/(SICI)1099-1050(199603)5:2<141::AID-HEC189>3.0.CO;2-N
- Bland MJ, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet 1986, 1: 307–310.
- Brazier J, Deverill M: A checklist for judging preference-based measures of health related quality of life: learning from psychometrics. Health Econ 1999, 8: 41–51. 10.1002/(SICI)1099-1050(199902)8:1<41::AID-HEC395>3.0.CO;2-#
- Brazier J, Roberts J, Tsuchiya A, Busschbach J: A comparison of the EQ-5D and SF-6D across seven patient groups. Health Econ 2004, 13: 873–884. 10.1002/hec.866
- Rosner B: Fundamental of Biostatistics. Pacific Grove, California: Duxbury Thomson Learning; 2000.
- Weinberger M, Oddone EZ, Samsa GP, Landsman: Are health-related quality of life measures affected by the mode of administration? J Clin Epidemiol 1996, 49: 135–140. 10.1016/0895-4356(95)00556-0
- Colton T: Regression and correlation. In Statistics in medicine. Little, Brown and Company; 1974:307–310.
- Matza LS, Boye KS, Yurgin KS: Validation of two generic patient-reported outcome measures in patients with type 2 diabetes. Health Qual Life Outcomes 2007, 5:
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.