Bayesian nonparametric estimation of EQ-5D utilities for United States using the existing United Kingdom data

Background Valuations of health state descriptive systems such as the EQ-5D or SF-6D have been conducted in different countries. There is scope to use the results from one country as informative priors in the analysis of a study in another, so that the new country's utilities are estimated better than they would be from its own data alone. Methods Data from 2 EQ-5D valuation studies were analyzed, in which values for 42 health states were elicited from representative samples of the UK and US populations using the time trade-off technique. A Bayesian non-parametric approach was applied to predict the health utilities of the US population, with the UK results used as informative priors in the model to improve their estimation. Results Employing the additional information from the UK data produced US utility estimates that were more precise than was possible using the US study data alone. Conclusion This method is likely to be useful in countries where conducting large valuation studies is not feasible.


Background
In the era of preference-based measures of health-related quality of life (HRQoL), several multi-attribute health status classifications have been developed. These tools include the EQ-5D [1], HUI2 and 3 [2,3], AQoL [4], QWB [5] and the most recent SF-6D [6], in addition to condition-specific classifications [7]. They allow a person to describe their health state at a given point in time, and they attach to that description an empirically derived health state value that can be used to estimate quality-adjusted life years (QALYs), a commonly used effectiveness measure in cost-utility analysis, a specific form of cost-effectiveness analysis [8].
The EQ-5D has become one of the most commonly used health preference tools for measuring HRQoL, mainly in Europe, and it is gaining popularity in North America. One of the main advantages of its worldwide use is the possibility of employing the results from one country to improve those of another, so that the second country's utility estimates are more precise than would have been possible when analyzing that country's data alone.
In a previous attempt to model health preference data, Kharroubi et al. [9] developed a nonparametric Bayesian method that addresses the intrinsic characteristics of individual health state valuations, making it more theoretically appropriate than the conventional parametric models adopted previously [6,10,11]. This method has been applied to the SF-6D UK health state preference data based on the standard gamble (SG) approach [12], and extended to address covariates [13]. The work has since been taken up for other countries, with applications to the SF-6D Hong Kong (HK) and Japan valuation data [14,15], and for other preference-based measures such as the HUI2 UK [16]. Further, it has been extended to handle the joint US-UK EQ-5D data valued with the time trade-off (TTO) technique [17,18], and recently the joint UK-HK and UK-Japan SF-6D data sets [19,20].
As with the benefits obtained from the worldwide application of the EQ-5D, a key advantage of the nonparametric Bayesian approach is the possibility of using the results generated in one country as informative priors for the model implemented in another country. To our knowledge, no previous work has explored this potential benefit. It is likely that this kind of analysis (borrowing strength from existing countries' valuations) will permit much smaller studies than have hitherto been employed when developing valuations for new countries. This will be hugely important in countries without the same capacity to conduct large-scale health state valuations.
The objectives of this research are (a) to develop a Bayesian statistical method to enable evidence from one country to serve as prior information for a study in another, and (b) to apply this method to the analysis of a valuation study for EQ-5D in the US using the already existing UK data.
A brief description of both the UK and the US EQ-5D valuation studies and the data sets adopted is provided in the Methods section of this manuscript. Then, in the Modelling section, we describe the Bayesian non-parametric model developed by Kharroubi et al. [9], together with the methodological advances deemed necessary to obtain better estimates of the preference utility function. In the Results section, we present the findings obtained by applying the modified Bayesian method to the US/UK EQ-5D data sets and compare them to the results generated by the original model of Kharroubi et al. [9]. Finally, in the Discussion section, we discuss our results and their implications for future uses of the EQ-5D and for modelling in this field.

Methods
The EQ-5D descriptive system
The EQ-5D was developed by a group of researchers distributed over seven centers in five countries. The name stands for EuroQoL-5D, or European quality of life with 5 health dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has 3 levels (no problem, moderate problem, and severe problem, coded as level 1 to level 3 respectively), so their combinations generate 243 different health states, and each state is described by a five-digit code built from the three levels. For example, 11111 and 33333 describe the best and worst health states respectively. In addition, unconsciousness and immediate death were added to the valuation process, but not to the descriptive system, in order to complete the process.
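The structure of the descriptive system can be made concrete with a short sketch. It enumerates every combination of the three levels over the five dimensions and decodes a five-digit code into a description. The function names are illustrative and are not part of any EQ-5D software.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVEL_LABELS = {1: "no problem", 2: "moderate problem", 3: "severe problem"}


def all_states():
    """Return every five-digit EQ-5D state code, from 11111 to 33333."""
    return [
        "".join(str(level) for level in combo)
        for combo in product(sorted(LEVEL_LABELS), repeat=len(DIMENSIONS))
    ]


def describe(state):
    """Map a five-digit code to a dimension -> level description."""
    if len(state) != len(DIMENSIONS):
        raise ValueError("an EQ-5D state code has exactly five digits")
    return {dim: LEVEL_LABELS[int(ch)] for dim, ch in zip(DIMENSIONS, state)}


states = all_states()
print(len(states))            # number of states in the descriptive system
print(states[0], states[-1])  # best and worst states
print(describe("11123"))
```

The count follows from 3 levels on each of 5 dimensions, that is 3^5 = 243 states.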

The valuation survey and data set
For use in cost-effectiveness analysis of health technologies, we need to assign a value to each health state that represents its utility, where 1 corresponds to perfect health, 0 corresponds to being dead and negative scores correspond to health states judged worse than being dead. These utility indexes were obtained through the survey developed by the UK Measurement and Valuation of Health (MVH) group at the University of York, using a variant of the visual analog scale (VAS) and the time trade-off (TTO) techniques [21]. A representative sample of the UK general population was interviewed in their own homes using TTO and VAS, with each respondent asked to value 12 health states. A total of 42 health states of the EQ-5D (excluding full health) were valued in this way. The states were selected using a stratified random sampling method to ensure a balance of very mild (e.g. 11112, 11211, 21111 …), mild (e.g. 11122, 11113, 12121 …), moderate (e.g. 13212, 12222, 22222 …) and severe (e.g. 33232, 32223, 23313 …) states. A detailed description of the study is provided elsewhere [10].
The EQ-5D valuation study in the US used the same states and valued them using the same methods, but it departed from the simple sampling design adopted in the UK study. The research group used a 4-stage cluster sampling design that focused on 2 of the largest minority groups, Hispanics and non-Hispanic Blacks [22]. The UK study interviewed a total of 3395 individuals with a response rate of 64%, while the US study interviewed a total of 4048 individuals with a response rate of 59.4%. After excluding respondents with incomplete or inconsistent responses, usable valuation data were obtained from 2997 and 3773 respondents in the UK and US respectively, and both samples are representative of their populations on sociodemographic and economic characteristics [10,22].
Both the US and UK studies used the TTO method [10,22] to elicit health state values. Briefly, for states considered better than death, respondents were asked to identify, by bracketing, the number of years x (x ≤ 10) in full health that they regarded as equivalent to 10 years in the state in question. Hence, the smaller the value of x at the point of indifference, the worse the state is regarded. For states considered worse than being dead, respondents were asked to decide whether they preferred immediate death or spending (10 − x) years in that state followed by x years in full health. Here, the more time in the full health state 11111 that is needed to compensate for a shorter period in the state in question (that is, the larger x), the worse the state. Scores were then transformed using the formula x/10 for states regarded as better than death and −x/10 for states considered worse than death, which bounds them on the scale −1 to 1, with x being the time spent in full health [10,21].
Another difference between the UK and US studies is the allocation of the 42 health states across respondents. In the UK study, 41 health states (excluding 33333) were divided into 4 groups based on severity, and each individual was randomly assigned 11 states of varying severity (2 very mild, 3 mild, 3 moderate, and 3 severe) in addition to state 33333. In the US study, respondents were randomized to 1 of 5 groups of predefined health states. Four groups formed the modelling sample, and each included the worst state 33333 in addition to 11 randomly selected health states (2 very mild states and 9 states selected from the remaining 36 states). The 5th group, the validation sample, included state 33333 and 11 health states randomly selected from the remaining 41 states. A further difference was that the UK interviews were conducted in English, while the US interviews were conducted in either English or Spanish. Both valuation studies have been described in detail previously [10,22].
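The transformation of TTO responses into scores can be sketched in a few lines. This is a minimal illustration of the x/10 and −x/10 rule stated above; the function name and argument names are hypothetical.

```python
def tto_score(x, better_than_dead):
    """Convert a TTO response into a score on the scale -1 to 1.

    x is the number of years in full health at the point of indifference
    (0 <= x <= 10). States better than dead score x / 10; states worse
    than dead score -x / 10.
    """
    if not 0 <= x <= 10:
        raise ValueError("x must lie between 0 and 10 years")
    return x / 10 if better_than_dead else -x / 10


# A respondent indifferent between 7.5 years in full health and 10 years
# in the state being valued.
print(tto_score(7.5, better_than_dead=True))

# A state judged worse than dead, with x = 4 years in full health needed
# to compensate for 6 years in the state.
print(tto_score(4, better_than_dead=False))
```

Because x is bounded by 10, every score produced this way lies between −1 and 1.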

Modelling
The EQ-5D preference-based health state measure defines 243 possible health states, whereas the empirical survey could gather valuations for only a small subset. The purpose of modelling is therefore to estimate health state utility values for all EQ-5D states based on the 42 valued states. Parametric models with assumptions about the functional form have been implemented earlier. However, Kharroubi et al. [9] developed a more realistic and flexible Bayesian non-parametric model as an alternative to the long-used parametric approach. In the next section, we review the Bayesian non-parametric model created by Kharroubi et al. [9], which we refer to henceforth as the K-Model. This forms the basis for the modified model developed in the following section, which we refer to as the KM-Model.

The K-Model
Kharroubi et al. [9] propose the following model
$$y_{ij} = 1 - \alpha_j \{1 - u(x_{ij})\} + \varepsilon_{ij}, \qquad (1)$$
where $i = 1, 2, \ldots, I_j$ and $j = 1, 2, \ldots, J$, $x_{ij}$ is the ith health state valued by respondent j, $y_{ij}$ is the dependent variable representing the TTO valuation given by respondent j for that health state, $\alpha_j$ is a random respondent term and $\varepsilon_{ij}$ is a zero-mean random error term. Kharroubi et al. [9] proposed the distributions $\alpha_j \sim LN(t_j^T \theta, \tau^2)$ and $\varepsilon_{ij} \sim N(0, \upsilon^2)$, where $t_j$ is the vector of covariates for respondent j. Kharroubi et al. [9] next model the prior distribution for u(x) as follows:
$$u(x) \sim N(\gamma + \beta^T x, \sigma^2), \qquad (2)$$
where $\gamma$, $\beta$ and $\sigma^2$ are unknown hyperparameters. Given Eq. (2), it is worth noting that x is a vector consisting of discrete levels on each of the five health dimensions. In addition, the mean function of (2) represents a prior belief that the utility will be roughly linear and additive in its different dimensions, whereas a parametric model would have imposed linearity and additivity as assumptions. In our model, the function is free to vary around the mean according to its multivariate normal distribution, and so it can take any functional form suggested by the data. It is for this reason that the model is described as nonparametric, which makes it more realistic and appropriate. If the data are strong, they will overrule the prior expectation. In practice, however, the data will rarely be that strong, so the prior model smooths the empirical relationship suggested by the data towards the form suggested by the mean function of Eq. (2). More details on this are given in Kharroubi et al. [9].
Additionally, the values of $u(x)$ and $u(x')$ for two separate states $x$ and $x'$ have a correlation $c(x, x')$ that decreases as the distance between $x$ and $x'$ increases, and is defined as
$$c(x, x') = \exp\Big\{-\sum_{d=1}^{5} b_d (x_d - x'_d)^2\Big\}, \qquad (3)$$
where, for d = 1, 2, …, 5, $x_d$ and $x'_d$ are the levels of dimension d in health states $x$ and $x'$ respectively, and $b_d$ is a roughness parameter for dimension d that controls how closely the true utility function is expected to adhere to a linear form in that dimension. This function expresses the belief that if the states $x$ and $x'$ are very similar (their levels are close in all dimensions, so they may be adjacent), their utilities will be almost the same, and hence that the preference function varies smoothly as the health state changes. Kharroubi et al. [9] provide a more thorough explanation of this point.
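The behaviour of the correlation described above can be illustrated with a short sketch. It assumes a squared-exponential form, $c(x, x') = \exp\{-\sum_d b_d (x_d - x'_d)^2\}$, which has the stated properties (it equals 1 for identical states and decays with distance). The roughness values used here are hypothetical and are not estimates from either study.

```python
import math


def parse_state(code):
    """Turn a five-digit state code such as '11223' into a tuple of levels."""
    return tuple(int(ch) for ch in code)


def correlation(x, x_prime, b):
    """Assumed squared-exponential correlation between two health states.

    x and x_prime are tuples of dimension levels; b holds one roughness
    parameter per dimension (hypothetical values in this sketch).
    """
    if not (len(x) == len(x_prime) == len(b)):
        raise ValueError("states and roughness vector must have equal length")
    distance = sum(bd * (xd - xpd) ** 2 for bd, xd, xpd in zip(b, x, x_prime))
    return math.exp(-distance)


b = [0.1] * 5  # hypothetical roughness parameters, one per dimension

same = correlation(parse_state("11111"), parse_state("11111"), b)
adjacent = correlation(parse_state("11111"), parse_state("11112"), b)
distant = correlation(parse_state("11111"), parse_state("33333"), b)
print(same, adjacent, distant)
```

Under this assumed form, adjacent states are strongly correlated and the correlation falls towards zero for states that differ on every dimension. Larger values of b make the decay faster, which corresponds to allowing the utility function to depart more from linearity.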
Finally, Kharroubi et al. [9] note that the population mean utility for a given health state x is
$$\bar{u}(x) = 1 - E(\alpha)\{1 - u(x)\},$$
where $E(\alpha)$ is the mean value of $\alpha$ over the whole population. $E(\alpha)$ will only be equal to 1 if the mean and median of $\alpha$ are the same, which is not generally the case. More details on the evaluation of $E(\alpha)$ are given in Kharroubi et al. [9].
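The gap between the mean and the median of a log-normal respondent effect can be checked numerically. The sketch below makes two simplifying assumptions that should be read as such: α follows LN(μ, τ²) with no covariates, and the respondent effect scales the disutility 1 − u(x), so that the population mean utility is 1 − E(α){1 − u(x)}. The parameter values are hypothetical.

```python
import math


def lognormal_mean(mu, tau):
    """Mean of a log-normal LN(mu, tau^2) variable: exp(mu + tau^2 / 2)."""
    return math.exp(mu + 0.5 * tau ** 2)


def lognormal_median(mu):
    """Median of a log-normal LN(mu, tau^2) variable: exp(mu)."""
    return math.exp(mu)


def population_mean_utility(u_median, mu, tau):
    """Assumed form 1 - E(alpha) * (1 - u(x)) for the mean utility."""
    return 1.0 - lognormal_mean(mu, tau) * (1.0 - u_median)


mu, tau = 0.0, 0.5  # hypothetical values; the median of alpha is 1 here
print(lognormal_median(mu), lognormal_mean(mu, tau))
print(population_mean_utility(0.6, mu, tau))
```

With a median of 1 and any τ > 0, E(α) exceeds 1, so under these assumptions the population mean utility of a state is lower than its median utility u(x).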

The KM-Model
In this section, we extend the non-parametric model to include the existing UK results obtained from the K-Model as informative priors, with the aim of improving the accuracy of the predictions of the US population utility function.
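As in the K-Model, the ith valuation provided by respondent j in the US study is modelled as
$$\tilde{y}_{ij} = 1 - \tilde{\alpha}_j \{1 - \tilde{u}(x_{ij})\} + \varepsilon_{ij},$$
where $\varepsilon_{ij}$ is the error term with distribution $\varepsilon_{ij} \sim N(0, \tilde{\upsilon}^2)$ and $\tilde{\alpha}_j$ is the random respondent effect.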
We next take $\tilde{u}(x)$ to be the utility function of health state x evaluated in the US study. Based on Eq. (2), the prior distribution for $\tilde{u}(x)$ is also multivariate normal, with a mean function and a variance-covariance matrix that are built from $E(u(x))$ and $\mathrm{cov}(u(x), u(x'))$, the mean health utility of health state $x$ and the variance-covariance matrix of $u(x)$ and $u(x')$ respectively, both obtained from the analysis of the existing UK data, and from $c(x, x')$, the correlation between $\tilde{u}(x)$ and $\tilde{u}(x')$ defined analogously to Eq. (3). In addition to the advantages discussed in The K-Model section, this new modelling of the utility function $\tilde{u}(x)$ allows the existing data in one country to contribute substantial prior information to the analysis of the study in another country. The inclusion of $E(u(x))$ and $\mathrm{cov}(u(x), u(x'))$ in the mean function and the variance-covariance matrix of the prior distribution for $\tilde{u}(x)$ is therefore likely to produce more precise estimates for the US than would have been possible without it, that is, using the US data alone.
Finally, as noted by Kharroubi et al. [9], the population mean utility for a given health state x is
$$\bar{\tilde{u}}(x) = 1 - E(\tilde{\alpha})\{1 - \tilde{u}(x)\},$$
where $E(\tilde{\alpha})$ is the mean value of $\tilde{\alpha}$ over the whole US population. $E(\tilde{\alpha})$ will only be equal to 1 if the mean and median of $\tilde{\alpha}$ are the same, which is not generally the case. Therefore, the population mean health state utility is not the same as the median health state utility $\tilde{u}(x)$. General theory and full technical details of the new Bayesian statistical model in this article are given in Kharroubi [23]. Programs to undertake the Bayesian approach were written in Matlab. We will be pleased to supply the Matlab code on request. However, the code is not general and users will need to modify it for their own purposes.

Results
We now apply the KM-Model to the analysis of a valuation study for EQ-5D in the US using the already existing UK data. The posterior distribution of the UK utility function is used as a prior distribution to analyze the study in the US. This is compared to the analysis of the US data alone using the original K-Model. The two models are compared in terms of their predictive ability, using plots of predicted against actual values, the root mean squared error (RMSE), plots of the standardized residuals and Bland-Altman agreement plots. These assessments are undertaken within the full estimation sample and out of sample, for a random selection of 3 states, by re-estimating the models on data sets that exclude these 3 states.
The two models are compared in terms of their predictive ability in Figs. 1 and 2. Figure 1 presents the predicted and actual mean values for the 42 health states valued in the survey together with full health, ordered according to the predicted values. Figure 1a shows the US predicted mean health state valuations from the K-Model (line with square markers), along with the actual mean health state valuations (line with diamond markers) and the errors, computed as the difference between the two valuations (line with triangle markers). Figure 1b shows the corresponding results from the KM-Model, that is, the US data analyzed with the UK results as informative priors. Comparing the plots, it is clear that the KM-Model predicts the data quite well, and better than the K-Model for all health states. In particular, although the KM-Model provides only marginally better predictions for moderate health states, it predicts the mild (health states 2-9 on the graph) and severe (health states 39-43 on the graph) health states quite well. Moreover, the plots show larger differences between the predicted and actual valuations for the K-Model, indicating that the KM-Model is less prone to systematic bias. To better quantify the gains in terms of bias, Fig. 2a shows the Bland-Altman agreement plot [24] for the K-Model, in which the differences between the predicted and actual mean health state valuations are plotted against the averages of the two valuations. The solid line represents the average bias (the average of the differences) and the dotted lines are the 95% limits of agreement. Figure 2b presents the corresponding agreement plot for the KM-Model. The plots suggest that the KM-Model shows better agreement, since the width of its 95% limits of agreement is 0.096 (0.085 − (−0.011)), which is narrower than that of the K-Model at 0.109 (0.105 − (−0.004)).
In addition, the average bias is smaller for the KM-Model than for the K-Model (0.037 and 0.051 respectively). Similarly, the standard deviation of the differences is smaller for the KM-Model than for the K-Model (0.024 and 0.028 respectively), which is consistent with the larger variation of the differences in Fig. 2a. In contrast, it is clear from Fig. 2b that the KM-Model differences are well behaved.
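The link between the average bias, the standard deviation of the differences and the 95% limits of agreement can be sketched as follows. The sketch assumes the usual Bland-Altman convention of bias ± 1.96 standard deviations and the sample (n − 1) standard deviation; whether the reported figures used exactly these conventions is an assumption. The paired valuations in the second example are hypothetical.

```python
from statistics import mean, stdev


def limits_from_summary(bias, sd, z=1.96):
    """95% limits of agreement from the mean and SD of the differences."""
    return bias - z * sd, bias + z * sd


def bland_altman(predicted, actual):
    """Return (bias, lower limit, upper limit) for paired valuations."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    bias = mean(diffs)
    lower, upper = limits_from_summary(bias, stdev(diffs))
    return bias, lower, upper


# Limits implied by the summary statistics quoted in the text.
for name, bias, sd in (("KM-Model", 0.037, 0.024), ("K-Model", 0.051, 0.028)):
    lower, upper = limits_from_summary(bias, sd)
    print(f"{name}: {lower:.3f} to {upper:.3f} (width {upper - lower:.3f})")

# Hypothetical paired valuations, for illustration only.
print(bland_altman([0.80, 0.55, 0.10, -0.25], [0.78, 0.50, 0.04, -0.31]))
```

Applied to the quoted summary statistics, this gives limits of roughly −0.010 to 0.084 for the KM-Model and −0.004 to 0.106 for the K-Model, close to the limits reported above; the small discrepancies are presumably due to rounding of the summary values.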
The inferences for the mean health state utility values of the 42 states valued in the valuation survey are shown in Table 1. For each state, the table reports the predicted mean, standard deviation and corresponding 95% credible interval for the population mean health state utility from both models, in addition to the results for the population mean health state utility from the UK study that were used as informative priors in the KM-Model. Notice that some of these states (those marked with an asterisk) were randomly selected from the 200 remaining EQ-5D health states (excluding full health) that were not used in the experimental study, and so their estimates are derived from the fitted model. Across the 42 states (omitting perfect health), the KM-Model proved to be the better predictive tool, with a root mean squared error (RMSE) of 0.044 versus 0.058 for the K-Model.
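For reference, the RMSE used to compare the models is the square root of the average squared difference between predicted and observed mean valuations. A minimal sketch follows; the utility values in the example are hypothetical and are not taken from Table 1.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical mean utilities for four health states.
observed = [0.85, 0.60, 0.20, -0.30]
model_a = [0.80, 0.66, 0.27, -0.24]
model_b = [0.83, 0.63, 0.23, -0.28]
print(rmse(model_a, observed), rmse(model_b, observed))
```

A lower RMSE indicates predictions that are, on average, closer to the observed means, which is the sense in which the KM-Model is said to predict better.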
Other notable differences between the models are reflected in Table 1. For instance, the pits state has a predicted utility of −0.3082 from the K-Model and −0.3114 from the KM-Model, whereas the observed value is −0.346. Moreover, the standard deviations of the KM-Model are smaller because it employs the UK results as priors, hence the better estimation. The models also differ in monotonicity. Out of the 243 adjacent health state pairs, non-monotonicity is observed in 20% of cases in the K-Model, while the rate is 10% in the KM-Model.
A clearer representation of the differences is given in Fig. 3, which shows the predicted values from the K-Model and KM-Model against the observed mean values of the 42 health states, together with the perfect predictions indicated by a 45-degree line of unity (solid line). In theory, we would expect the predicted values from the two models to lie roughly on the perfect predictions line. Although both models validate well in terms of predictive performance, Fig. 3b shows the predictions from the KM-Model to be closer to the theoretical line, whereas Fig. 3a shows a larger scatter of points deviating from the solid line. This again indicates that the KM-Model produces better predictions.
Another way to check the validity of the models is to plot histograms of standardized residuals. Fig. 4a and b present histograms of the standardized residuals across all 45,276 valuations for the K-Model and KM-Model respectively. Theoretically, we would expect these to follow a N(0,1) distribution. In practice, the theory is generally supported by both figures, although  To this end, we perform an out-of-sample leave-one-out prediction at the level of health states. This is done by removing, one at a time, three of the 42 observed health states from both sets of data, fitting the model to the remaining 41 (using the UK 41 as informative priors), and then predicting the left-out state in the US data. Table 2 displays the resulting out-of-sample predictions from both models, along with their observed mean values from the valuation study. The KM-Model outperforms the K-Model in predictive performance, with an RMSE of 0.0274 for the KM-Model compared to 0.0358 for the K-Model. It is worth noting finally that the posterior standard deviations in Table 2 are larger than those in Table 1, since these are out-of-sample predictions, whereas the predictions in Table 1 were for states included in the estimation data.
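A monotonicity check of this kind can be sketched as follows. The sketch assumes that two states are "adjacent" when they differ by one level on exactly one dimension, and that a pair is non-monotonic when the worse state receives the higher predicted utility. The text does not spell out the definition of adjacency used for the reported rates, so the pair count here need not match it, and the utilities are hypothetical.

```python
from itertools import product

LEVELS = (1, 2, 3)
N_DIMENSIONS = 5


def adjacent_pairs():
    """Yield (better, worse) pairs differing by one level on one dimension."""
    for state in product(LEVELS, repeat=N_DIMENSIONS):
        for d in range(N_DIMENSIONS):
            if state[d] < max(LEVELS):
                worse = state[:d] + (state[d] + 1,) + state[d + 1:]
                yield state, worse


def non_monotonic_rate(utility):
    """Share of adjacent pairs in which the worse state has higher utility."""
    pairs = list(adjacent_pairs())
    violations = sum(1 for better, worse in pairs if utility[better] < utility[worse])
    return violations / len(pairs)


# Hypothetical additive utilities: each extra level of severity costs 0.1.
utility = {
    state: (10 - sum(level - 1 for level in state)) / 10
    for state in product(LEVELS, repeat=N_DIMENSIONS)
}
print(non_monotonic_rate(utility))  # additive utilities are monotonic

# Perturb the worst state so that it is valued above its neighbours.
utility[(3, 3, 3, 3, 3)] = 1.0
print(non_monotonic_rate(utility))
```

A purely additive function never violates monotonicity, whereas a flexible nonparametric fit can, which is why the rate is a useful diagnostic when comparing the two models.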

Discussion
In this paper we have developed a Bayesian statistical method for estimating the utility values of health states defined by the EQ-5D generic descriptive system, in order to generate QALYs and hence to conduct cost-utility analysis of health care interventions. The new method enables evidence from one country to serve as prior information for a study in another. We have also applied this method to the analysis of a valuation study for EQ-5D in the United States using the already existing United Kingdom data. The methodology builds on a successful Bayesian nonparametric modelling of the UK EQ-5D valuation data. The posterior distribution of the UK utility function was used as a prior distribution to analyze the new study in the US.
We have shown that the new modelling of the utility function allowed the UK data to contribute substantial prior information to the analysis of the US study. As a result, the US utilities for the 42 EQ-5D health states were estimated much more precisely than would have been possible using the US study data alone, and they respect the inherent monotonicity of the underlying utility measure to a greater degree. Careful model checking, including prediction of left-out data, confirms that the new KM-Model fits well, and better than the K-Model.
The novel part of the analysis was to make use of experience in one country to help with the analysis of a study in another. There is also scope to use this to help with the design of a study in another country, and so to enable a smaller sample to be used in the new country. The choice of new health states to be valued in a follow-on study was made to provide information primarily about parts of the EQ-5D descriptive space of 243 health states that were less well estimated from the US study alone and the UK evidence. Work is in progress on demonstrating this idea in the context of a smaller country.
In the analysis presented here we have data on two countries that are culturally similar (UK and US). The results suggest that drawing extra information from the UK produces better estimation of the US utilities than using the US data alone. A worthwhile next step is to explore the use of our model when we have data on two countries that are sufficiently different. Work is also in progress on exploring whether using the UK data might help with the design and analysis of a valuation study for SF-6D in Hong Kong.
Health technology assessment is an international endeavor, with pharmaceutical clinical trials being conducted in different countries. The World Health Organization undertakes cost-effectiveness analysis of interventions across national boundaries. As more statutory funding bodies around the world demand cost-effectiveness assessments, effectiveness analysis will become more international, with syntheses of data across countries. A key element in this will be the valuation of health states in order to calculate QALYs, so precise estimation of health state utility values is an important component. For instance, the K-Model estimates the health utility for state 33323 to be −0.2184, whereas the KM-Model gives −0.1894, and so the difference in utility estimates is almost 0.03. This could result in an increase in QALYs from a treatment that extends life by  The modified model presented here is also applicable to other preference-based measures such as SF-6D and HUI, as well as to more condition-specific preference-based measures. Ongoing work is on the application to the SF-6D measure. Matlab code for undertaking the KM-Model and K-Model is available upon request.

Conclusion
In conclusion, the new Bayesian statistical method is a powerful technique that might be applied to design and analyze health-related quality of life utility valuation studies for a wide range of health state descriptive systems when data already exist in another country. It is likely that this kind of analysis (borrowing strength from existing countries' valuations) will permit much smaller studies than have hitherto been employed when developing valuations for new countries. These results will be hugely important in countries without the same capacity to conduct large-scale health state valuations.