Predicting EQ-5D-3L index value from the PROMIS-29 profile for the United Kingdom, France, and Germany

Background: EQ-5D health state utilities (HSU) are commonly used in health economics to compute quality-adjusted life years (QALYs). The EQ-5D, which is country-specific, can be derived directly or by mapping from self-reported health-related quality of life (HRQoL) scales such as the PROMIS-29 profile. The PROMIS-29 from the Patient-Reported Outcomes Measurement Information System is a comprehensive assessment of self-reported health with excellent psychometric properties. We sought to find optimal models predicting the EQ-5D from the PROMIS-29 in the United Kingdom, France, and Germany and compared their prediction performance with that of a US model.
Methods: We collected EQ-5D-5L and PROMIS-29 profiles in three samples representative of the general populations in the UK (n=1,509), France (n=1,501), and Germany (n=1,502). We used stepwise regression with backward selection to find the best models to predict the EQ-5D from all seven PROMIS-29 domains. We investigated the agreement between the observed and predicted EQ-5D in all three countries using various indices of prediction performance, including Bland-Altman plots to examine the performance along the HSU continuum.
Results: The EQ-5D was best predicted in France (nRMSE FRA = 0.075, nMAE FRA = 0.052), followed by the UK (nRMSE UK = 0.076, nMAE UK = 0.053) and Germany (nRMSE GER = 0.079, nMAE GER = 0.051). The Bland-Altman plots show that the inclusion of higher-order effects reduced the overprediction of low HSU scores.
Conclusions: Our models provide a valid method to predict the EQ-5D from the PROMIS-29 for the UK, France, and Germany.
The EQ-5D-5L and EQ-5D-3L are patient-reported HRQoL questionnaires. From these questionnaires, the preference-based generic EQ-5D-5L or EQ-5D-3L index value can be derived (4–7,26). Five health dimensions are involved: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension of the EQ-5D-3L has three levels (i.e. response options): "No problems" (or 1), "Some problems" (2), and "Extreme problems" (3), defining 3⁵ or 243 different health states. Each dimension of the EQ-5D-5L has five levels: "No problems" (or 1), "Slight problems" (2), "Moderate problems" (3), "Severe problems" (4), and "Extreme problems" (5), defining 5⁵ or 3,125 different health states. Because the 5L version can differentiate more health states and is more sensitive than the 3L version, we used the EQ-5D-5L questionnaire in our study.


Background
Quality-adjusted life years (QALYs) are routinely used in cost-utility analyses (CUA) to evaluate the economic effectiveness of health care innovations or interventions (1). QALYs are of particular importance in health technology assessments (HTAs) (2). For example, the National Institute for Health and Care Excellence (NICE) in England and Wales has endorsed QALYs to compare health care interventions from an economic perspective (1). In light of budget constraints in publicly funded health care systems, QALYs serve as a benchmark for the allocation of scarce resources in a way that maximizes utility to individuals and to society (2).
A QALY is defined as the product of the number of life years and a health state utility (HSU) score that represents the value of a particular health state. HSU scores can at best reach a value of 1 (full health). A value of 0 is equivalent to being dead, and health states with a negative value are considered worse than dead. Individual HSU scores are patient-reported, generic, preference-based measures of health-related quality of life (HRQoL) (3). The most frequently used generic HRQoL measure is the EuroQol EQ-5D-5L or EQ-5D-3L index value (in the remainder of this paper referred to as EQ-5D), differentiating up to 3,125 (i.e., 5⁵) health states. The EQ-5D is the default HSU score for economic evaluations demanded by HTA agencies such as NICE (4–7).
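As a minimal sketch of the definition above, the following Python snippet sums life years weighted by HSU over consecutive periods. The function name and the example patient are hypothetical, and discounting of future years, which many economic evaluations apply, is deliberately left out.

```python
def qalys(periods):
    """Sum duration (in years) times health state utility over periods.

    periods -- iterable of (years, hsu) pairs; hsu is at most 1 (full health),
               0 is equivalent to dead, negative values are worse than dead.
    """
    return sum(years * hsu for years, hsu in periods)


# Hypothetical patient: 2 years at HSU 0.80, then 3 years at HSU 0.50.
total = qalys([(2.0, 0.80), (3.0, 0.50)])
print(round(total, 2))  # 3.1
```

A period spent in a health state valued below zero lowers the total, which is how states considered worse than dead enter the calculation.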
The Patient-Reported Outcomes Measurement Information System (PROMIS), on the other hand, is increasingly used internationally to measure clinical and condition-specific, non-preference HRQoL because of its favourable psychometric properties: high validity, high reliability, high precision, and flexible administration (8,9). PROMIS provides a common metric for a large variety of health domains, aiming at comprehensive assessment, standardization, and integration of different measures and items. It constitutes a collection of generic and condition-specific, non-preference-based patient-reported outcome measures (PROMs) that have been developed using item response theory (IRT) (10). For each PROM, so-called item banks have been developed comprising items that are highly informative regarding the construct to be measured and that do not function substantially differently across the most prominent demographic groups (e.g., women and men) (11,12). These item banks can be used to develop tailored short forms or for computerized adaptive testing (CAT) (13). PROMIS overcomes significant limitations of legacy instruments such as ceiling effects and, having been translated into many languages and showing invariance to nationality, is becoming the international reference measurement approach for PROMs (9,14–16).
For economic evaluations, the preference-based EQ-5D is best obtained directly using the EQ-5D-3L or EQ-5D-5L questionnaire. If direct assessment is not available, a common strategy is to estimate HSU scores by using a mapping algorithm from a non-preference-based PROM such as PROMIS (14,17–20). Little consensus exists on which mapping method is the most appropriate. In a recent systematic review, 147 studies mapping to the EQ-5D were identified (17). In more than 75% of these, ordinary least squares (OLS) linear regression was used. Although OLS linear regression showed robust results compared to alternative methods, it has several drawbacks (21,22): First, predicted HSU scores may fall outside the possible range of the metric (i.e., values greater than one). Second, the relationship between a non-preference-based PROM and HSU might be non-linear, meaning that the impact of health domains differs across the HSU continuum (22).
As PROMIS is increasingly used in clinical, non-preference HRQoL measurement and the EQ-5D is the required preference-based measurement for economic evaluations, developing a mapping between the two would make it possible to use PROMIS for economic evaluations. As both are multidimensional generic HRQoL measures covering similar dimensions or domains (EQ-5D mobility and EQ-5D self-care vs PROMIS physical function, EQ-5D pain/discomfort vs PROMIS pain interference, EQ-5D anxiety/depression vs PROMIS anxiety or PROMIS depression, EQ-5D usual activities vs PROMIS ability to participate in social roles and activities), we can reasonably assume conceptual overlap, as previous mapping studies have done (19,20).
Mapping PROMIS to EQ-5D also opens a perspective for the use of other PROMs in economic evaluations: Because of its invariance property, PROMIS domains can also be measured using items from a different condition-specific measure that is anchored to the PROMIS metrics. For example, items from self-reported anxiety measured by MASQ, PANAS and GAD-7 are anchored on the PROMIS Anxiety metric (23). Items from the BDI-2, CES-D, and PHQ-9, measuring depression, are anchored on the PROMIS Depression metric (24). Therefore, mapping from PROMIS T-scores to EQ-5D enables the mapping of a broad range of PROMs to the EQ-5D via PROMIS.
Using OLS linear regression on US data, Revicki (2009) estimated a model to predict the EQ-5D from five PROMIS T-scores (19): physical function, fatigue, pain interference, anxiety, and depression. For this PROMIS domain model, Revicki reports that approximately 57% (adjusted R²) of the variance in EQ-5D can be explained by the variables in the model, and the intraclass correlation coefficient (ICC) is 0.73. Furthermore, 95% of all the residuals are between -0.20 (2.5%) and 0.15 (97.5%). The relatively small width of these so-called empirical limits of agreement (LoA) is indicative of an appropriately fitted model. However, Revicki also reported that the model does not work very well for low levels of health (EQ-5D < 0.40). Revicki used the EQ-5D-3L questionnaire and applied the US EQ-5D-3L value set by Shaw (2005) (25). As health preferences differ between countries, the EQ-5D and mappings are country-specific (26,27). Revicki's model can therefore only be used to predict the EQ-5D from PROMIS in the US.
Therefore, the primary aim of this study is to develop mapping functions from PROMIS-29 to the EQ-5D for the UK, France, and Germany so that PROMIS can be used for economic evaluations in these countries. For each health domain, we explored the form of its relationship with the EQ-5D and examined whether these relationships would be the same across the three countries under investigation. We also aimed at improving prediction performance by including higher-order terms. Furthermore, we investigated whether the optimal models would be structurally equivalent across countries and compared the prediction performance of our models to that of Revicki's model.

Samples
Data were collected online by an independent polling company (Ipsos) in April and May 2015. Quota sampling was employed to obtain samples representative of the general population with respect to sex, age, occupation, region, and population density of the UK (n=1,509), France (n=1,501), and Germany (n=1,502). Sample weights were calculated using the random iterative method (RIM) to match the latest data available in each country (census 2011 for the UK and Germany, census 2012 for France).
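RIM weighting is commonly implemented as raking (iterative proportional fitting). The sketch below illustrates that general idea on a hypothetical two-variable sample; it is an assumption about the technique, not a description of the polling company's actual procedure. The variable names and target shares are invented for the example.

```python
def rake(rows, targets, n_iter=50):
    """Rescale weights until each weighted margin matches its target share.

    rows    -- list of dicts, one per respondent, e.g. {"sex": "f", "age": "old"}
    targets -- {variable: {category: target population share}}
    Every target category must occur in the sample at least once.
    """
    weights = [1.0] * len(rows)
    for _ in range(n_iter):
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total per category of this variable.
            current = {cat: 0.0 for cat in shares}
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Scale each respondent so the category reaches its target share.
            for i, row in enumerate(rows):
                cat = row[var]
                weights[i] *= shares[cat] * total / current[cat]
    return weights


def weighted_share(rows, weights, var, cat):
    """Weighted proportion of respondents falling into one category."""
    return sum(w for r, w in zip(rows, weights) if r[var] == cat) / sum(weights)


# Hypothetical sample of 100 respondents and invented census targets.
sample = (
    [{"sex": "f", "age": "young"}] * 30
    + [{"sex": "f", "age": "old"}] * 30
    + [{"sex": "m", "age": "young"}] * 10
    + [{"sex": "m", "age": "old"}] * 30
)
targets = {"sex": {"f": 0.5, "m": 0.5}, "age": {"young": 0.4, "old": 0.6}}
weights = rake(sample, targets)
```

After convergence, the weighted shares of every category agree with the targets, while the sum of the weights stays equal to the sample size.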
Participation in our general population samples was voluntary, and Ipsos complied with data protection laws. If a respondent chose to drop out at some point, the data given until that point were not included. As skipping items was not possible, there were no missing data.

PROMIS domains and item banks
We used the PROMIS-29 v2.0 Profile to assess seven core domains of health, each assessed with four items: physical function, fatigue, pain interference, anxiety, depression, sleep disturbance, and the ability to participate in social roles and activities (referred to as participation in the remainder of this article) plus the visual analogue scale (VAS) expressing pain intensity on a scale ranging from 0 to 10 (28). PROMIS-29 has, compared to other short forms, enough items to achieve a sufficient degree of precision while maintaining a reasonable response burden. Items are measured on five levels (e.g. "never", "rarely", "sometimes", "often", "always" or "not at all", "a little bit", "somewhat", "quite a bit", "very much") and refer to the past 7 days (except physical function). Answers yield a number from one to five, which, once fed into the online PROMIS converter (http://www.healthmeasures.net/score-and-interpret/calculate-scores), give one corresponding PROMIS T-score (M = 50, SD = 10) per domain with the US general population as a reference. Note that due to the invariance property of IRT, T-scores obtained from the PROMIS-29 are on the same metric as the scores Revicki used in his analysis, though these scores were generated using different items. For desirable constructs (e.g., physical function), higher T-scores indicate better health, whereas for undesirable domains (e.g., depression), higher T-scores indicate poorer health states.
An earlier analysis of the data used in this study revealed that scores on the seven health domains of the PROMIS-29 are measurement invariant across the UK, France, and Germany except for one item (33).
Note that people in different countries value health states differently, so both EQ-5D index values are country-specific (25,26,34–36). They can be derived from the EQ-5D-5L questionnaire using either the crosswalk to the 3L value set or the new 5L value sets (26). Crosswalks to the 3L value sets are available for the UK, France, and Germany (4,26). A 5L value set is available for Germany (35). There is also one for England, which is not equivalent to our sample of the UK, and none yet for France (36,37). We therefore used the 3L crosswalk set for all three samples, thereby ensuring comparability among our samples and with Revicki's model, which used the 3L value set for the US (19,25,26).
The value assigned to each of these health states with the 3L value set is determined using time trade-off (TTO) and visual analogue scale (VAS) as preference elicitation methods (4,26). The maximum value for the best health state of 11111 is 1.00 or "full health" while 0.00 is considered "dead".
The minimum value, for the worst health state of 55555, is negative and thus considered "worse than dead": -0.594 in the UK, -0.530 in France, and -0.205 in Germany. In the remainder of this paper, when referring to our EQ-5D-3L index value, we use the term EQ-5D.

Statistical analysis

Relationships among individual health domains and health state utility across the UK, France, and Germany
To obtain a first impression of the form of the relationships among individual health domains and HSU and to judge whether the relationships are stable across the three countries under investigation, we plotted the seven domain scores against HSU in the UK, France, and Germany.

Optimal models for predicting health state utility in the three countries
We applied stepwise regression with backward selection to find the best models to predict the EQ-5D for the UK, France, and Germany, starting with full models that incorporated linear, quadratic, and cubic effects for all seven PROMIS-29 domains. We included polynomials up to the third degree as we expected that such polynomials can more flexibly fit the observed data, e.g. in case of nonlinear relationships between predictors and outcome.
We used raw polynomials for linear, quadratic and cubic effects in order to obtain coefficients which can be used for prediction independently.
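To make the notion of raw polynomials concrete, the sketch below expands each domain T-score into its first, second, and third power without orthogonalization. The function name, the key format, and the domain identifiers are illustrative choices, not the authors' code.

```python
DOMAINS = [
    "physical_function", "fatigue", "pain_interference", "anxiety",
    "depression", "sleep_disturbance", "participation",
]


def raw_polynomial_terms(t_scores, degree=3):
    """Expand each domain T-score into raw (non-orthogonal) powers 1..degree."""
    terms = {}
    for domain, t in t_scores.items():
        for power in range(1, degree + 1):
            terms[f"{domain}^{power}"] = t ** power
    return terms


# Hypothetical respondent scoring at the reference mean (T = 50) on all domains.
terms = raw_polynomial_terms({domain: 50.0 for domain in DOMAINS})
print(len(terms))          # 21
print(terms["fatigue^2"])  # 2500.0
print(terms["fatigue^3"])  # 125000.0
```

Because the powers are taken on the original T-score metric, a fitted coefficient can be multiplied directly with the corresponding power of a new respondent's T-score, with no need to reproduce an orthogonal basis from the estimation sample.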
Because sociodemographic factors such as age and sex are known to be useful in predicting HSU, they were also entered as possible predictors (17).
The PROMIS pain intensity VAS was not included as pain is already covered by the pain interference domain, which proved to be superior to the VAS (38). Also, while all other domains comprise four items, the pain intensity domain within PROMIS-29 has only this single item, which is not measured on a T-score metric.
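Backward selection over such a candidate set can be sketched as follows, assuming the BIC as the selection criterion (the criterion this study reports using). This is a simplified pure-Python illustration on synthetic, standardized predictors, not the authors' R analysis: the helper names are hypothetical, and solving the normal equations directly is numerically fragile for raw cubic terms, so a statistics package would be the safer choice in practice.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit via the normal equations."""
    n, p = len(X), len(X[0])
    # Augmented system [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def bic(data, y, names):
    """Gaussian BIC up to a constant: n*ln(RSS/n) + k*ln(n), with intercept."""
    n = len(y)
    X = [[1.0] + [data[name][i] for name in names] for i in range(n)]
    return n * math.log(ols_rss(X, y) / n) + (len(names) + 1) * math.log(n)


def backward_select(data, y):
    """Drop one predictor at a time for as long as doing so lowers the BIC."""
    kept = list(data)
    best = bic(data, y, kept)
    while kept:
        scores = {name: bic(data, y, [k for k in kept if k != name])
                  for name in kept}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break
        kept.remove(candidate)
        best = scores[candidate]
    return kept


# Synthetic data: the outcome depends on x1 only; "noise" is unrelated.
random.seed(0)
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 - 0.1 * a + random.gauss(0, 0.05) for a in x1]
data = {"x1": x1, "noise": noise}
kept = backward_select(data, y)
```

The informative predictor survives elimination because removing it raises the residual variance far more than the penalty of ln(n) per parameter can offset.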
The Bayesian information criterion (BIC) was used to steer the inclusion and exclusion of predictors in the stepwise regression analyses (39). We chose nRMSE and nMAE as measures of the prediction precision and bias as they are preferred over either R² or BIC used by Revicki (19,40). The nRMSE is the normalized root of the mean of the squared residuals between observed and predicted scores, and the nMAE is the normalized mean of the absolute residuals. Both are normalized with respect to the different scale ranges of the EQ-5D in the UK, France, and Germany. We also determined the width between the 95% empirical limits of agreement and compared it to the width between the 95% theoretical limits of agreement (i.e., ± 1.96 * SD(residuals)). To check the prediction performance along the HSU continuum, Bland-Altman plots were used.
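The indices described above can be sketched in a few lines. The snippet assumes that normalization means division by the country-specific scale range (for example 1 - (-0.594) = 1.594 for the UK), that residuals are observed minus predicted values, and that the empirical limits are the 2.5th and 97.5th percentiles of the residuals; the function name and the toy values are hypothetical.

```python
import statistics


def prediction_metrics(observed, predicted, scale_min, scale_max=1.0):
    """nRMSE, nMAE, and widths of the empirical and theoretical 95% LoA."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    scale_range = scale_max - scale_min
    rmse = (sum(r * r for r in residuals) / n) ** 0.5
    mae = sum(abs(r) for r in residuals) / n
    # 39 cut points for n=40; the first is the 2.5th, the last the 97.5th percentile.
    cuts = statistics.quantiles(residuals, n=40, method="inclusive")
    sd = statistics.stdev(residuals)
    return {
        "nRMSE": rmse / scale_range,
        "nMAE": mae / scale_range,
        "empirical_LoA_width": cuts[-1] - cuts[0],
        "theoretical_LoA_width": 2 * 1.96 * sd,
    }


# Toy values on the UK scale, which ranges from -0.594 to 1.
observed = [1.0, 0.5, 0.0, -0.5]
predicted = [0.9, 0.6, 0.1, -0.6]
metrics = prediction_metrics(observed, predicted, scale_min=-0.594)
```

Dividing by the scale range makes the error indices comparable across countries whose value sets have different minima.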
We used cross-validation to check for overfitting (41). With this in-sample cross-validation technique, the initial dataset is randomly split into 10 subsamples of approximately equal size. One of these subsamples is kept for validation, while the other nine subsamples are used for parameter estimation. This process is repeated ten times, so that each subsample serves once for validation, and the results are averaged across repetitions. Overfitting would show when a model's nRMSE is substantially smaller than the average nRMSE of the models of the 10 subsamples.
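The procedure can be illustrated with a deliberately simplified model. The sketch below uses a single-predictor linear regression on synthetic data in place of the multi-domain polynomial models, and reports the unnormalized RMSE; all names, coefficients, and data are hypothetical.

```python
import random


def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def rmse(x, y, model):
    """Root mean squared error of a fitted line on the given data."""
    b0, b1 = model
    return (sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)) / len(x)) ** 0.5


def cross_validated_rmse(x, y, k=10, seed=1):
    """Average held-out RMSE over k folds of near-equal size."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([x[i] for i in train], [y[i] for i in train])
        scores.append(rmse([x[i] for i in held_out],
                           [y[i] for i in held_out], model))
    return sum(scores) / k


# Synthetic T-scores and utilities; the coefficients are invented.
rng = random.Random(0)
t = [rng.gauss(50, 10) for _ in range(300)]
hsu = [1.2 - 0.008 * a + rng.gauss(0, 0.05) for a in t]
in_sample = rmse(t, hsu, fit_line(t, hsu))
cross_validated = cross_validated_rmse(t, hsu)
```

A cross-validated error close to the in-sample error, as expected for this simple model, suggests little overfitting; a much larger cross-validated error would be the warning sign described above.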
We used R version 3.4.1, IBM SPSS Statistics version 23, and Microsoft Excel version 15 to run the analyses.

Impact of misspecified mapping functions on the prediction performance
To the best of our knowledge, as of September 2020, the mapping function by Revicki was the only one available for predicting the EQ-5D from the PROMIS-29 T-scores (19):
EQ-5D = 1.0266 + 0.0077 * Physical Functioning - 0.0021 * Fatigue - 0.0040 * Pain Interference - 0.0023 * Anxiety - 0.0022 * Depression
We were interested in quantifying the detrimental effect of applying this foreign mapping function to the data collected in Europe. Note that application of Revicki's model to the data collected in the UK, France and Germany (i) disregards the country specificity of the EQ-5D, (ii) does not utilize the potential predictive value of the two PROMIS-29 health domains not used by Revicki, (iii) does not take higher-order effects into account, and in combination with the foregoing, (iv) disregards country dependency of the form of relationships (i.e., the specific values of the regression coefficients used).
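For illustration, Revicki's equation as quoted above can be written as a small function. The coefficients are taken from the equation in the text; the function name, the dictionary keys, and the two example profiles are hypothetical.

```python
REVICKI_INTERCEPT = 1.0266
REVICKI_COEFFICIENTS = {
    "physical_function": 0.0077,
    "fatigue": -0.0021,
    "pain_interference": -0.0040,
    "anxiety": -0.0023,
    "depression": -0.0022,
}


def predict_eq5d_revicki(t_scores):
    """Linear US mapping from five PROMIS T-scores to the EQ-5D index."""
    return REVICKI_INTERCEPT + sum(
        coef * t_scores[domain] for domain, coef in REVICKI_COEFFICIENTS.items()
    )


# Respondent at the US reference mean (T = 50) on all five domains.
average = {domain: 50.0 for domain in REVICKI_COEFFICIENTS}
print(round(predict_eq5d_revicki(average), 4))  # 0.8816

# Hypothetical very healthy profile: high function, few symptoms.
healthy = {"physical_function": 60.0, "fatigue": 35.0, "pain_interference": 35.0,
           "anxiety": 35.0, "depression": 35.0}
print(predict_eq5d_revicki(healthy) > 1.0)  # True
```

The second profile shows a known drawback of an unbounded linear mapping: the prediction can exceed 1, the maximum of the HSU scale.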
Because we were also interested in which factor is mainly responsible for the differences in prediction performance, we moved stepwise from Revicki's model to our models as follows: First, we used the five health domains of Revicki's model, but with regression coefficients optimized towards the data collected in each country separately. Second, we investigated the incremental value of adding either sleep disturbance, participation, or both to the prediction equation. Third, we allowed for incorporation of quadratic and/or cubic effects.

Sample characteristics
We only briefly summarize the most important differences between the three samples here. The interested reader is referred to for a comprehensive overview of the marginal distributions of sex, age, educational level, occupational status, and income in the three samples.
Participants in the German sample (mean age = 50.0 years old) were slightly older than participants in the French (48.4 years old) and UK samples (47.8 years old). Participants in the German sample were more likely to have a low educational background (23.4%) than participants in the French (7.6%) and UK samples (8.1%). Participants in the French sample were more likely to be unemployed/inactive (48.4%) than participants in the German (41.5%) and UK samples (39.4%).

Relationships among individual health domains and health state utility across the UK, France, and Germany
The relationships among the seven PROMIS domains and HSU expressed by the EQ-5D score in the three European countries are displayed in Figure 1.

Optimal models for predicting health state utility in the three countries
Recall that we used stepwise regression with backward selection to find optimal models for predicting the EQ-5D scores for the UK, France, and Germany. The primary models thus comprised linear, quadratic, and cubic effects for each PROMIS domain plus effects for age and sex. Effects that did not significantly improve the prediction performance were sequentially removed from these models. The coefficients of the final models to optimally estimate the EQ-5D from PROMIS-29 for the UK, France, and Germany can be found in table 1. Coefficients are displayed as negative exponentials with four digits, beginning with the first non-zero digit of the coefficient. HSU is expressed on a scale ranging from -0.594 (UK), -0.530 (France), and -0.205 (Germany) to 1, and the PROMIS domains are expressed as T-scores (M=50). All the coefficients displayed differ significantly from zero at p < 0.01.
First, the regression coefficients of the higher-order effects appear to be much smaller than those for the linear effects, as the values of the predictor variables (with M=50) are taken to the power of two for the quadratic effects (M² = 2,500) and to the power of three for the cubic effects (M³ = 125,000). Hence, these coefficients have a substantially larger impact on the scale of the criterion than their magnitude suggests.
Second, the single standardized regression coefficients shown in table 1 should not be used to infer the form of the relationship between the individual health domains and the EQ-5D because we have up to three effects (linear, quadratic, and cubic) in each health domain, and the relationship thus must be described by the summed effect of all three effects. Furthermore, not all coefficients are in agreement with figure 1, which plotted the relationship of a single health domain to the EQ-5D, irrespective of the values in all the other health domains. Instead, the regression coefficients are optimal given the effect of all the other effects already taken into account (stepwise procedure), which also explains why the final models in the three countries are so different. Age, for example, has a positive effect on HSU in the UK, a negative effect on HSU in France, and no effect on HSU in Germany. Although out of the 23 possible predictors twelve (UK and France) and ten (Germany) were kept in the final models, only four effects were consistently chosen across countries: the linear effect of participation, the quadratic effect of physical functioning, and cubic effects of depression and pain interference.
The prediction performance of these models is summarized in table 2. HSU expressed by the EQ-5D can be best mapped from the PROMIS-29 in France (nRMSE FRA = 0.075, nMAE FRA = 0.052), followed by the UK (nRMSE UK = 0.076, nMAE UK = 0.053) and Germany (nRMSE GER = 0.079, nMAE GER = 0.051). Furthermore, for all three countries, the widths of the empirical limits of agreement are always smaller than the widths of the theoretical limits of agreement. All models were confirmed by 10-fold cross-validation, having a marginally smaller nRMSE and nMAE compared with the mean nRMSE and mean nMAE, respectively, of the 10 models of the cross-validation subsamples. The prediction performances of the final models along the HSU continuum are depicted in the Bland-Altman plots in figure 2. Note that especially in the German sample, there are not many respondents with low HSU (EQ-5D < 0.2). Furthermore, prediction performance appears to be slightly better for high levels of HSU (EQ-5D > 0.8) than for intermediate or low HSU.

Impact of misspecified mapping functions on the prediction performance
The differences in the prediction performances between the applications of Revicki's model versus our models are depicted in table 3. The application of Revicki's model to the European data would systematically underestimate the EQ-5D for the UK (-0.10) and for France (-0.09) but not for Germany.
The prediction performance of Revicki's model is the best in Germany, and the differences in the prediction performances between Revicki's and our mapping functions are smaller in Germany than for the UK or for France, as indicated by the values of the nRMSE, nMAE, and empirical LoAs. The last step was to investigate which factor was mainly responsible for the observed differences in the prediction performances between Revicki's model and our models. Neither the model with country-specific coefficients for Revicki's five health domains (M1) nor the incorporation of sleep disturbance and/or participation (M2c) improves the prediction performance for low levels of HSU, but the incorporation of quadratic and cubic effects (M3) does. That is, overprediction of these health states is clearly reduced by adding these higher-order effects to the three regression equations.

Summary of main findings
We developed optimal models for mapping the EQ-5D from the PROMIS-29 in the UK, France, and Germany. Furthermore, we showed that the incorporation of higher-order effects into the regression equations substantially reduced overestimation of low HSU. The EQ-5D can therefore now be predicted from PROMIS-29 in three major European countries to calculate QALYs in CUA for HTAs, enabling the use of PROMIS for economic evaluations in Europe. This is of practical importance since HTA agencies demand the EQ-5D as the HSU for QALYs, while PROMIS is more frequently used in clinical, non-preference HRQoL measurement. We believe our models are highly applicable, achieving a good degree of precision, also in the lower spectrum of health, while avoiding high complexity by using a manageable number of predictors. In terms of the nRMSE and nMAE, our models perform very well compared to what is usually reported for mapping algorithms (17,42–46).
The major comparator to our models is Revicki's OLS linear US model, the only one predicting the EQ-5D from PROMIS-29. All our models perform better in terms of R-squared and ICC, while the LoA were comparable. Revicki reported neither MAE nor RMSE. Furthermore, Revicki's model uses the EQ-5D with the US value set as target measure, while we use the value sets from the UK, France, and Germany, respectively. We demonstrated that the application of Revicki's US model to European data will yield biased results, especially for poor health states. However, this model performs well in upper ranges of health. One might therefore consider using a foreign model with domestic data as a second-best option to predict the EQ-5D for QALYs in CUA if a country-specific mapping algorithm is not available, especially in a group of healthier patients. This decision might make sense, for example, when using our German model for Austrian data or using Revicki's US model for Canadian data, since in both cases, cultural proximity can reasonably be assumed.
Apart from Revicki's model predicting the EQ-5D from PROMIS-29, there is also another model of his, predicting the EQ-5D from PROMIS Global Health (GH) items, using linear regression in a US sample (19). Thompson (2017) mapped PROMIS-GH to the EQ-5D in a US sample applying linear and equipercentile equating, treating PROMIS-GH items as categorical variables (20). Compared to our models, both models differ with respect to population, source measure, and target measure: They use the US value set for the EQ-5D, while we use the ones for the UK, France, and Germany, respectively. Thompson's models additionally differ in the mapping method applied. In terms of R-squared, our model for Germany performs at least as well as, and our models for the UK and France perform better than, both Revicki's and Thompson's PROMIS-GH models. In terms of MAE, all our models perform better. Despite Thompson's different method, low EQ-5D scores were still overestimated (20). Neither study reported an RMSE.
Generally, however, researchers should be aware that the consequences of working with a suboptimal mapping algorithm can be substantial: the incremental cost-effectiveness ratio (ICER) in costs per QALY can range between British pound sterling (GBP) 18,000 and GBP 32,000 depending on which mapping algorithm is used (47). NICE has adopted a threshold of GBP 30,000 per QALY representing the public's maximum additional willingness to pay for a new treatment or a new drug compared to the existing standard of care (48). Consequently, imprecise mapping methods have a great impact on CUA in HTAs and thus on which innovations are made available to patients.

Strengths and limitations
This study was conducted using three large samples representative of the general population in three European countries. To ensure comparability, the sampling strategies were the same across countries. This strength of our study is directly related to its foremost weakness: Severe health states are not frequently observed in the general population, and the proposed models therefore rely on few observations for low health states. Furthermore, our models allowed judgement of the incremental value of incorporating two additional health domains and higher-order effects for HSU prediction.
Finally, some authors have argued against OLS regression as a type of mapping method even though, as outlined above, it is the most widely used method. First, arguments against that method are due to the phenomenon of regression to the mean. Second, linear regression models tend to predict HSU scores greater than one, which is a value that is impossible by definition of HSU (22). In our study, the risk of predicting HSU values greater than one is circumvented by incorporation of non-linear trends.

Directions for future research and the PROMIS Preference Score (PROPr) for QALYs
Our mapping functions should be confirmed in samples with a greater frequency of low health states. Therefore, we are planning to replicate our findings with data collected from spine patients who were assessed before surgery. It would also be interesting to examine whether regressing the EQ-5D dimensions on the PROMIS domain scores first and then calculating the EQ-5D from the regressed EQ-5D dimensions has incremental value (49).
PROMIS data can also be used to estimate a new preference-based HSU score: Hanmer developed the PROMIS Preference Score (PROPr) to compute HSU for QALYs directly from 7 PROMIS health domains: cognition, depression, fatigue, pain, physical function, sleep disturbance, and participation (50–54). Note that these 7 PROMIS domains are not equivalent to the 7 domains of the PROMIS-29 profile (anxiety is missing in the PROPr, while cognition is missing in the PROMIS-29). The PROPr was valued in US preferences using the standard gamble method (SG), while the EQ-5D uses TTO (25,37,52,55).
The PROPr could potentially be used instead of the EQ-5D in CUA. Since many European HTA authorities such as NICE specifically demand the use of the well-established EQ-5D to measure HSU in CUA, mapping the PROMIS-29 to the EQ-5D will still be needed (48). Also, as of September 2020, there is no PROPr value set for European preferences (52,53).

Conclusion
Our mapping functions can be used to predict the EQ-5D from the PROMIS-29 for CUA in HTA for the UK, France and Germany. The inclusion of polynomial regression terms decreases the prediction bias for lower health states.
Our results support the assertion that mapping functions are country-specific. The application of Revicki's model to the data collected in the three European countries leads to biased HSU estimates for the UK and France and to less precise estimates in all three countries. Estimation of country-specific regression coefficients for the five health domains identified by Revicki strongly improves the average prediction performance but does not remedy the overestimation of low health states.

Declarations
Ethical Approval and Consent to participate: All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study. Participation was voluntary.
Consent for publication: Not applicable.
Availability of supporting data: Data are available on reasonable request.
Figure 1: Relationships among the PROMIS domains and health state utility expressed by the EQ-5D score
Figure 2: Bland-Altman plots of the predicted and observed health state utility scores for the UK, France, and Germany