Comparing the mapping between EQ5D5L, EQ5D3L and the EORTCQLQC30 in nonsmall cell lung cancer patients
Health and Quality of Life Outcomes volume 14, Article number: 60 (2016)
Abstract
Background
Several mapping algorithms have been published with the EORTCQLQC30 for estimating EQ5D3L utilities. However, none are available with EQ5D5L. Moreover, a comparison between mapping algorithms in the same set of patients has not been performed for these two instruments simultaneously. In this prospective data set of 100 nonsmall cell lung cancer (NSCLC) patients, we investigate three mapping algorithms using the EQ5D3L and EQ5D5L and compare their performance.
Methods
A prospective noninterventional cohort of 100 NSCLC patients was followed up for 12 months. EQ5D3L, EQ5D5L and EORTCQLQC30 were assessed monthly. EQ5D5L was completed at least 1 week after EQ5D3L. A random effects linear regression model, a beta-binomial (BB) model and a Limited Dependent Variable Mixture Model (LDVMM) were used to determine mapping algorithms from the QLQC30 to the EQ5D3L and EQ5D5L. Simulation, cross validation and other statistical measures were used to compare the performance of the algorithms.
Results
Mapping from the EQ5D5L was better: lower AIC, RMSE and MAE and higher R^{2} were reported with the EQ5D5L than with EQ5D3L regardless of the functional form of the algorithm. The BB model proved to be more useful for both instruments: for the EQ5D5L, AIC was −485, R^{2} was 75 %, MAE was 0.075 and RMSE was 0.092. The corresponding values for EQ5D3L were −385, 69 %, 0.099 and 0.113. The mean observed vs. predicted utilities were 0.572 vs. 0.577 and 0.515 vs. 0.523 for EQ5D5L and EQ5D3L respectively, for OLS; for BB, these were 0.572 vs. 0.575 and 0.515 vs. 0.518 respectively and for LDVMM 0.532 vs. 0.515 and 0.569 vs. 0.572 respectively. Less overprediction at poorer health states was observed with EQ5D5L.
Conclusions
The BB mapping algorithm is confirmed to offer a better fit for both EQ5D3L and EQ5D5L. The results confirm previous and more recent results on the use of BB type modelling approaches for mapping. It is recommended that in studies where EQ5D utilities have not been collected, an EQ5D5L mapping algorithm is used.
Background
Health Related Quality of Life (HRQoL) is an important outcome from both clinical and economic perspectives. For cancer patients, it can be considered as a measure of the tradeoff between survival benefit, toxicity from treatments and the physical and emotional wellbeing of the patients [1]. HRQoL is also considered to be an important predictor of survival [2]. Furthermore, HRQoL is critical for understanding the economic value of (cancer) treatments, because some cancer treatments are expensive while their clinical benefits are modest and the burden of adverse events is high. Therefore, assessment of the risk-benefit relationship of cancer treatments can be guided by HRQoL outcomes [3].
One feature of health economic evaluation is the use of generic HRQoL measures to determine patient level health utilities for adjusting clinical outcomes to generate Quality Adjusted Life Years (QALYs) [4]. Utilities from commonly used generic HRQoL measures such as EQ5D3L or EQ5D5L are not always available. In such cases, patient level utilities are estimated using ‘mapping’ or ‘crosswalking’, where a statistical algorithm developed from a condition-specific measure (e.g. the cancer specific EORTCQLQC30) is used to predict the utilities.
The advantages and limitations of mapping have been discussed in detail elsewhere (Khan, 2014; Brazier, 2010) [5, 6]. Recently Crott (2014), Arnold (2015) and Doble (2015) [7–9] examined the performance of the most common mapping algorithms applied to the QLQC30. Several limitations of some of the simpler mapping algorithms from the EQ5D3L were noted. These related to untenable assumptions of linearity, homoscedasticity, multimodality, skewness, censoring and an over reliance on R^{2} as the metric of model performance; and in some cases poor over prediction, particularly at poorer health states [5, 7, 8, 10, 11]. Mapping algorithms based on EQ5D3L have been shown to consistently overpredict utilities, particularly at poorer health states [5, 6]. In order to address some of the limitations, alternative functional and statistical forms of mapping algorithms were examined (Kharroubi 2007, Crott, 2010, Khan, 2014, Hernandez, 2012, Sabourin et al., 2015) [5, 10–13]. These functional forms in some cases generated improved predictive capability (e.g. Hernandez, 2012, Khan, 2014). In some cases however, changing the functional form did not offer improved prediction over and above simpler models [5, 6]. Moreover, when applied to external data, some of the algorithms performed poorly [7, 8].
In addition to the statistical framework of mapping algorithms, questions have been raised about the usefulness and indeed validity of mapping (Round, 2012) [14]. It is suggested that it is unclear what exactly is being predicted from mapping models, because the target is unknown (Round, 2012) [14]. However, this is precisely what a mapping model is supposed to do: to estimate the unknown utilities, which we assume to be ‘knowable’ based on reasonable assumptions. Although this and other criticisms of mapping are important [5, 6, 15], they are perhaps not strong enough to dismiss mapping altogether. Indeed, about 25 % of health technology appraisal (HTA) submissions to NICE in the UK have used mapping (Longworth, 2013) [16], while in Australia this was reported to be about 24 % (Scuffham, 2008) [17]. Moreover, the published mapping models (for the QLQC30) suggest the unknown utilities are likely to be ‘knowable’ to some extent, because some mapping algorithms have been shown to yield close approximations of the target mean utility. Therefore, mapping can serve a useful purpose for estimating patient level utilities and continues to be used in HTAs of cancer drugs for estimating utilities (or in sensitivity analyses) despite these criticisms.
Separately, concerns have also been raised about the sensitivity of the EQ5D3L and, by extension, of the derived mapped utilities [18–21]. Most mapping algorithms using the EORTCQLQC30 (QLQC30) are based on EQ5D3L. Given the reported limitations and criticisms levelled against the EQ5D3L and the consequent development of the EQ5D5L, a mapping algorithm for the EQ5D5L appears to be the next logical step in this area of research.
There are two commonly used generic HRQoL measures for determining utilities used in health economic evaluation: EQ5D3L and the more recent EQ5D5L. The main difference between these two instruments is that the latter has responses measured on a 5 point scale, with many more health states [22]. EQ5D3L has been suggested to have limited discriminative ability and less power to detect between group differences compared with EQ5D5L [22–24]. Research is ongoing as to the best value sets for use with EQ5D5L. Meanwhile, an interim scoring is currently available for EQ5D5L using a crosswalk algorithm from EQ5D3L to EQ5D5L.
In this research we compare the performance of three mapping algorithms (from QLQC30): a Random Effects linear model, a Beta-Binomial (BB) model and a Limited Dependent Variable Mixture Model (LDVMM), for each of two utility measures: EQ5D5L and EQ5D3L, separately. To our knowledge, no study of mapping compares algorithms from both instruments in the same set of patients; and none are available between EQ5D5L and QLQC30, particularly from a nonsmall cell lung cancer (NSCLC) patient population. Khan & Morris (2014), using data from a randomized controlled trial (RCT) [5], showed that a three-part BB model performed best amongst other commonly used algorithms. This analysis examines mapping models using data from NSCLC patients in a real world NHS setting. This will offer researchers a way of computing patient-level utilities from the EQ5D5L (and EQ5D3L) with greater generalizability than an RCT.
Methods
Study design
A single cohort prospective (noninterventional) follow up study in 100 NSCLC patients was designed. Patients with histologically confirmed NSCLC gave informed consent (for data collection and follow up) and were followed up during their routine anticancer treatment and cancer management for a period of at least 12 months. Patients were recruited between March 2014 and July 2015 from the Liverpool and Clatterbridge Cancer Centre. The study received local ethics approval (Liverpool Central) and research was conducted in compliance with the Helsinki declaration.
EQ5D5L, EQ5D3L and QLQC30 assessments were carried out monthly from registration. EQ5D3L and EQ5D5L were assessed at least 1 week apart to avoid potential for ‘carry over’. Patients were given the HRQoL forms to take home and they returned them by post or when they attended their next hospital visit. They were instructed to complete the EQ5D3L in the first week and the EQ5D5L in the second (or third) week of each month.
Instruments
EQ5D3L is widely used for economic evaluation and has 243 health states; for each state, a corresponding utility value is available [5, 6]. In this paper, we use the UK tariffs based on the Time Trade-Off (TTO) method [23]. The raw scores from the EQ5Ds were converted into an index ranging from −0.549 to 1, where 1 denotes 'perfect' quality of life, 0 denotes death and values below 0 denote states 'worse than death'. EQ5D5L consists of the same five dimensions as EQ5D3L (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression), but with an expanded 5 point scale (compared to the 3 point scale of EQ5D3L) [25]. The levels are ‘no problems’, ‘slight problems’, ‘moderate problems’ and ‘severe problems’ in all five dimensions, and ‘unable’ in mobility, self-care and usual activities or ‘extreme problems’ in pain/discomfort and anxiety/depression. The scoring of EQ5D5L uses an interim crosswalk based algorithm (UK value sets) between EQ5D3L and EQ5D5L (Van Hout, 2012) in the absence of a full value set [22, 26].
The EORTC QLQC30 is an established instrument for measuring HRQoL in various cancers [27]. QLQC30 has 15 domains, scored on a 0 to 100 scale. The scoring consists of 5 function scales: Physical Function (PF), Role Function (RF), Emotional Function (EF), Cognitive Function (CF) and social functioning (SF). There are also 9 symptom scales: Fatigue (FA), Nausea & Vomiting (NV), Pain (PA), Dyspnoea (DY), Insomnia (IN), Appetite Loss (AL), Constipation (CO), Diarrhoea (DI) and Financial Problems (FI); there is also a global health status score (QL). For the global health and function domains, high scores indicate better QoL. For the symptom domains, low scores indicate better symptoms.
Statistical methods
Three models were used to compare the mapping.
Linear random effects model
The linear model with a random effect is an extension of the ordinary least squares (OLS) model. One important difference is that subject level effects are included (sometimes called a mixed effects model). In the context of mapping, because utility scores are observed for each subject on more than one occasion, the responses are not independent. The subject level differences (between subject variability) can be modelled with a random effect. The model is termed a mixed effects model because variability of utilities occurs both between and within subjects. This model is relatively easy to use when applied to an external data set to predict patient level utilities. This is important because, in practice, a mapping algorithm should be as simple and practical to use as possible. Overly complicated models require more assumptions and hence introduce greater uncertainty. The principle of parsimony should be adopted when developing a mapping model. The model form in a general linear mixed model framework is:
$$Y = X\beta + Zu + \varepsilon$$
where X and Z are the design matrices for the fixed and random effects, β is a matrix with the fixed effects parameters (e.g. the 15 coefficients of the QLQC30), u is a matrix (or vector) with the random (subject) terms and ε is the experimental error term.
Limited dependent variable mixture model (LDVMM)
A second model, proposed by Hernandez et al. [10] and belonging to the class of limited dependent variable (LDV) models, is the so-called Adjusted Limited Dependent Variable Mixture Model (ALDVMM). This particular model has several noteworthy features. The first is that it assumes additivity of effects (as in a linear model). The second is that it involves a latent variable that is censored. The censoring occurs (as in a TOBIT model) because some values are considered to be unobservable. Hernandez et al. [10] noted that since there is a gap in utilities between the values 0.833 and 1 for the EQ5D3L, the preferences for health states are in effect ‘cut off’ on the higher side: values at (or above) 0.833 are set to a value of 1 (this essentially captures the ceiling effect). That is, if a patient’s (true) utility is >0.833, the instrument (EQ5D) cannot capture this and we assume a value of 1.
The LDV type models generate predicted estimates in a more complex way, which involves finding the probability that the unobserved (latent) value is above or below the censoring threshold (e.g. 0.833) using the ratio of the probability density function (PDF) to the cumulative distribution function (CDF). This feature of the LDVs allows several distributions to be modelled simultaneously. Hernandez et al. [10] modelled data against the (simpler) health assessment questionnaire (HAQ) in an arthritis population. The greater the number of latent classes, the greater the complexity of interpretation. Application of 3 classes in the context of 15 QLQC30 domain parameters is likely to lead to a much more complex latent class structure and therefore two classes (two mixed distributions) are used for both the EQ5D3L and EQ5D5L in this analysis. This is justified by observing the kernel density estimates, which suggest a bimodal distribution for EQ5D3L (values between about −0.549 and 0.3, and between 0.3 and 1) in this data set (see Fig. 1). For the EQ5D5L, the mixture of distributions is not obvious, although there is marked skewness.
The model form for the mixture model used in this context is now described in further detail:
Assume responses Y (i.e. EQ5D utilities) whose distribution depends on an unobservable random variable S, which can occupy one of k states (k = 2 in this example); the number of states might be unknown but is at least known to be finite. Since S is not observable, it is referred to as a latent variable. Let π_{j} denote the probability that S takes on state j. For example, in the case of the EQ5D3L for the ALDVMM, j = 1 might refer to values of EQ5D3L < 0.833 and j = 2 would refer to states such that EQ5D3L utilities are > 0.833.
Conditional on S, the distribution of the response Y is assumed to be f_{j}(y; α_{j}, β_{j} | S = j). This expression means that, for each state j of S, a model of the form f_{j}(y; α_{j}, β_{j}) can be used to determine the relationship between Y (the EQ5D) and a set of predictors with coefficients β_{j} (e.g. the 15 QLQC30 coefficients). For example, for j = 1 (values of EQ5D3L between −0.549 and 0.3), the EQ5D3L utilities are assumed to follow a Normal distribution. For values between 0.3 and 1 (j = 2), the data can be considered to follow a Beta Binomial (BB) distribution. In another scenario, a Weibull function could be used for j = 1 and a Normal distribution for j = 2; there would then be 6 parameters to estimate (2 parameters for the Weibull, 2 parameters for the Normal and two mixing probabilities, π_{1} and π_{2}, the probabilities of observations belonging to one or the other class). These 6 parameters do not include the QLQC30 predictors, for which a further 16 parameters are estimated.
The following mixture models were simultaneously fitted:

(i) EQ5D as a function of 15 QLQC30 domain scores (Normal distribution assumed between −0.549 and 0.30, for example)
(ii) EQ5D as a function of 15 QLQC30 domain scores (Beta Binomial distribution assumed between 0.30 and 1, for example)
(iii) The mixing probabilities as a function of the 15 QLQC30 domain scores (two mixing probabilities which classify observations as belonging to distributions in (i) or (ii))
Clearly, the above modelling approach is complex, perhaps unnecessarily so, and can lead to model non-convergence. Its practical implementation as an external algorithm is therefore an important consideration. A transformation may be needed if specific distributions are assumed (e.g. when modelling negative values). For example, for values between −0.549 and 0.30, a Gamma (or Beta Binomial) distribution would not be possible without one, because neither distribution is defined for negative values.
Therefore, in this analysis two distributions are considered for modelling:

(i) Assume Normality between −0.549 and <0.30 for the 15 predictor variables
(ii) Assume Beta Binomial between >=0.30 and 1.0 for the 15 predictor variables
The predicted estimates are determined in a complicated way from the ratio of the PDF to the CDF of the EQ5D responses and using the estimated mixing probabilities. The mixing probabilities can be interpreted as the proportions of observations belonging to each of the two distributions. If the mixing probabilities were 0.5, then 50 % of the EQ5D3L utilities might be considered to follow a Normal distribution and the remaining 50 % a different distribution. A useful exposition of finite mixture models can be found in Schlattmann (2009) [28].
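To make the role of the mixing probabilities concrete, the following minimal sketch evaluates a two-class finite mixture density, f(y) = π_{1} f_{1}(y) + π_{2} f_{2}(y), and the probability that a given utility belongs to each latent class. It is an illustration only: both components are taken to be Normal for simplicity (the analysis here used a Normal and a BB component), and the function names, mixing probabilities and component parameters are hypothetical rather than estimates from this study.

```python
import math


def normal_pdf(y, mu, sigma):
    """Density of a Normal(mu, sigma^2) distribution evaluated at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_density(y, pis, components):
    """Finite mixture density: sum over classes j of pi_j * f_j(y)."""
    return sum(p * f(y) for p, f in zip(pis, components))


def class_probabilities(y, pis, components):
    """Probability that y belongs to each latent class, given the mixture."""
    weighted = [p * f(y) for p, f in zip(pis, components)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical two-class example (values chosen for illustration only).
pis = [0.3, 0.7]  # mixing probabilities pi_1 and pi_2
components = [
    lambda y: normal_pdf(y, 0.05, 0.15),  # class 1: poorer health states
    lambda y: normal_pdf(y, 0.70, 0.12),  # class 2: better health states
]
post = class_probabilities(0.10, pis, components)
```

With these illustrative values, a utility of 0.10 is assigned almost entirely to the first class, which is how the mixing probabilities "classify" observations between the two distributions.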
A maximum likelihood estimation for continuous and discrete response distributions is used based on a dual quasi-Newton optimization algorithm using the SAS® software [29]. A global maximum was sought by using initial starting values to find a local maximum, followed by rerunning the model using estimates generated from previous model runs as new starting values.
Beta binomial model
For the ALDVMM described above, censoring occurs at 0.833 for the EQ5D3L. This is not the case for the EQ5D5L, where values between 0.833 and 1 do exist. For this reason (Fig. 1) the distribution of the EQ5D5L can be considered appropriate for modelling on a continuous type scale between −0.549 and 1.0 (after a transformation of (Y − a)/(b − a), where a and b are the lower and upper bounds of the scale), and therefore the BB model is the third model that is considered for mapping. The details of the BB model are elaborated and discussed in Khan & Morris (2014) [5], where it showed an improved fit compared with simpler linear and LDV type models (e.g. TOBIT and CLAD).
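The rescaling step can be sketched as follows. This is a minimal illustration, assuming that a and b are the lower and upper bounds of the utility scale quoted in the text; the function names are hypothetical and not taken from the original analysis.

```python
# Assumed bounds of the utility scale (a and b), as quoted in the text.
LOWER, UPPER = -0.549, 1.0


def to_unit_interval(y, a=LOWER, b=UPPER):
    """Rescale a utility y in [a, b] to the unit interval via (y - a) / (b - a)."""
    return (y - a) / (b - a)


def from_unit_interval(p, a=LOWER, b=UPPER):
    """Map a value p in [0, 1] back to the original utility scale."""
    return a + p * (b - a)
```

A model fitted on the unit interval produces predictions on that same scale, so the inverse function would be needed wherever predictions are to be reported as utilities.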
Model performance criteria
Several model performance statistics were used, including the root mean square error (RMSE), which is a measure of model fit (lower values indicate better fit), mean prediction error, R^{2}, mean absolute error (MAE), and the percentage of predictions >1 and < −0.594. Chai (2014) argues that the RMSE is more appropriate than the MAE, particularly if the errors are Normally distributed [30]. In addition, the Akaike Information Criterion (AIC) values and the percentage of predictions within a target range (e.g. ±5 %, ±10 %) of the observed values were determined.
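The fit statistics listed above can be computed along the following lines. This is a sketch under stated assumptions: R^{2} is taken as 1 − SS_res/SS_tot, and "within ±10 %" is interpreted as an absolute error no larger than 10 % of the absolute observed value; the original analysis may have used different definitions. The function name and dictionary keys are hypothetical.

```python
import math


def performance(observed, predicted, tolerance=0.10):
    """Summary statistics comparing predicted with observed utilities."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Assumed definition: prediction within +/- tolerance of the observed value.
    within = sum(
        1 for o, e in zip(observed, errors) if abs(e) <= tolerance * abs(o)
    )
    return {
        "mean_error": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
        "pct_within": 100.0 * within / n,
        "pct_over": 100.0 * sum(1 for e in errors if e > 0) / n,
    }


# Toy data for illustration only (not study data).
stats = performance([0.2, 0.5, 0.8], [0.3, 0.5, 0.7])
```

The "pct_over" entry counts predictions higher than the observed value by any amount, which matches how overprediction is described later in the Results.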
Simulation and cross validation
Multivariate simulation (1,000 simulations using Fleishman methods) [31, 32] was used to test the uncertainty of the models. The method of Fleishman uses higher order moments (e.g. kurtosis and skewness) to generate correlated simulated data regardless of the distribution of each of the original variables. The steps involved in simulation require computing the mean, SD, skewness and kurtosis for each of the observed 15 QLQC30 domain scores. Using the Fleishman (1978) [31] power transform:
$$Y = \alpha + \beta Z + \delta Z^{2} + \gamma Z^{3}$$
The values of α, β, δ and γ are estimated from randomly generated data Z, normally distributed with mean of zero and a variance of 1 and the observed measures for kurtosis and skewness. The values of α, β, δ and γ are estimated through a process of iteration so that Y can be determined. The derived Y (e.g. 15 QLQC30 scores) are simulated (correlated) responses which are not necessarily normally distributed. Khan et al. [5] have shown that the QLQC30 scores are unlikely to follow a Normal distribution in most cases.
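The iterative estimation of the four coefficients can be sketched as below. This is a simplified, univariate illustration only: it solves Fleishman's three moment equations by Newton iteration with a numerical Jacobian and then applies the cubic transform to standard Normal draws. The names a, b, c and d stand for the four coefficients of the cubic in increasing order of the power of Z; the function names, the solver and the use of excess kurtosis as input are assumptions, and the step that induces correlation between the 15 domain scores is not shown.

```python
import random


def fleishman_residuals(b, c, d, skew, excess_kurt):
    """Fleishman's variance, skewness and kurtosis equations (zero at a solution)."""
    return [
        b * b + 6 * b * d + 2 * c * c + 15 * d * d - 1.0,
        2 * c * (b * b + 24 * b * d + 105 * d * d + 2) - skew,
        24 * (b * d + c * c * (1 + b * b + 28 * b * d)
              + d * d * (12 + 48 * b * d + 141 * c * c + 225 * d * d))
        - excess_kurt,
    ]


def solve3(A, r):
    """Solve the 3x3 system A x = r by Gaussian elimination with pivoting."""
    M = [row[:] + [ri] for row, ri in zip(A, r)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, 3):
            factor = M[i][col] / M[col][col]
            for j in range(col, 4):
                M[i][j] -= factor * M[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][j] * x[j] for j in range(i + 1, 3))) / M[i][i]
    return x


def fleishman_coefficients(skew, excess_kurt, iterations=50, h=1e-6):
    """Iterate from the Normal case (b=1, c=d=0) to the target moments."""
    coef = [1.0, 0.0, 0.0]
    for _ in range(iterations):
        f0 = fleishman_residuals(*coef, skew, excess_kurt)
        if max(abs(v) for v in f0) < 1e-12:
            break
        J = [[0.0] * 3 for _ in range(3)]
        for k in range(3):  # central-difference Jacobian, column by column
            up, dn = coef[:], coef[:]
            up[k] += h
            dn[k] -= h
            fu = fleishman_residuals(*up, skew, excess_kurt)
            fd = fleishman_residuals(*dn, skew, excess_kurt)
            for i in range(3):
                J[i][k] = (fu[i] - fd[i]) / (2 * h)
        step = solve3(J, f0)
        coef = [v - s for v, s in zip(coef, step)]
    b, c, d = coef
    return -c, b, c, d  # a = -c keeps the mean of the transformed variable at zero


def simulate(mean, sd, skew, excess_kurt, n, seed=None):
    """Draw n values with the requested mean and SD and approximate skew/kurtosis."""
    a, b, c, d = fleishman_coefficients(skew, excess_kurt)
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        values.append(mean + sd * (a + b * z + c * z * z + d * z ** 3))
    return values
```

Not every combination of skewness and kurtosis is attainable with a cubic transform, so in practice the iteration should be checked for convergence before the coefficients are used.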
For each simulated data set, cross validation was used. Half (50 %) of the simulated dataset (randomly selected) was used to develop the mapping model and the other half was used to test the model (out of sample predictions). For each realization (i.e. dataset simulated), the model performance statistics (e.g. RMSE and R^{2}) were generated and reported. Because there is no theoretical reason for using 50 % of the data to develop the model, other cutoffs (e.g. 75 % vs 25 %) were also considered.
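The split-sample procedure can be outlined as follows. This is a generic sketch, not the code used in the study: `fit` and `evaluate` are hypothetical placeholders for whichever mapping model and performance statistic are being examined, and the seeding scheme is an arbitrary choice made for reproducibility.

```python
import random


def split_sample(rows, fraction=0.5, seed=None):
    """Randomly split rows into a development set and a test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def cross_validate(datasets, fit, evaluate, fraction=0.5, seed=0):
    """For each simulated data set, fit on one part and evaluate on the rest."""
    results = []
    for i, rows in enumerate(datasets):
        train, test = split_sample(rows, fraction, seed + i)
        model = fit(train)
        results.append(evaluate(model, test))
    return results


# Toy usage: the "model" is just the mean of the development half.
toy = [list(range(985))]
out = cross_validate(
    toy,
    fit=lambda train: sum(train) / len(train),
    evaluate=lambda m, test: sum(abs(t - m) for t in test) / len(test),
)
```

Changing `fraction` to 0.75 gives the alternative 75 % vs 25 % cutoff mentioned above.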
Results
Between March 2014 and July 2015, a total of 100 patients were registered for follow up, of whom two withdrew before follow up started. Consequently, 98 (98 %) were followed up and included in the statistical analysis; 23 patients (23 %) died during the follow up and 2 patients (2 %) dropped out due to personal reasons (Fig. 2 CONSORT). There were a total of 985 observations (responses) across 98 patients for each of the EQ5D5L and EQ5D3L HRQoL forms; HRQoL forms were completed by 97/98 (99 %) patients at baseline; completion rates at 3 and 6 months were 78/98 (79 %) and 41/98 (55 %) respectively. Completion rates were similar for all three (EQ5D5L, EQ5D3L and QLQC30) instruments. There were 146 health states (5 % of all possible health states) observed with EQ5D5L and 62 (26 %) with EQ5D3L. The most frequent health states with the EQ5D5L were 11111 (6 %), followed by 21222 (5 %), 43533 (3 %) and 31331 (3 %). For EQ5D3L these were 21222 (11 %), followed by 22222 (10 %), 22221 (7 %), 22322 (6 %) and 11111 (6 %).
Demographics
Median age was 69 years (range 39 to 86); 55/98 (56 %) were male, 67/98 (68 %) were ex-smokers and 19/98 (19 %) were current smokers. There were 61/98 (64 %) patients with Eastern Cooperative Oncology Group (ECOG) performance status 0–2 and the remainder had ECOG >2; ECOG is used as a measure of wellbeing (and prognosis), with higher values suggesting poorer prognosis; 15/98 (15 %) were Stage III and 83/98 (85 %) were Stage III and higher. Histology subtypes were 43/98 (44 %) with adenocarcinoma and 36/98 (37 %) with squamous cell. The remainder were of varying subtypes (Table 1).
Performance of EQ5D5L and EQ5D3L Mapping Algorithms
Overall
The best performing model regardless of EQ5D3L or EQ5D5L was the BB model (Table 2 & Fig. 3): for EQ5D5L, the AIC, R^{2}, RMSE, MAE and % predicted to within ±5 % and ±10 % were −485.3, 75 %, 0.092, 0.075, 29 % and 59 % respectively; for EQ5D3L these were −385.4, 69 %, 0.113, 0.099, 21 % and 47 % respectively. The BB therefore had good model fit characteristics and predicted more utilities to within ±10 % of the observed value compared to other models, particularly for the EQ5D5L.
Random effects model
The performance of the random effects model was comparable to the LDVMM. Table 3 shows the parameter estimates for the 15 QLQC30 coefficients. If all scores for the functional domain, Global score and Finance score are assumed to be perfect (i.e. score of 100) and no signs and symptoms are present (i.e. score of 0), the predicted EQ5D3L and EQ5D5L scores are estimated to be about 0.89 and 0.96 respectively. On the other hand, if symptom and functional scores are the worst possible (scores of 0 and 100 for function and symptoms respectively), the predicted EQ5D3L and EQ5D5L fall to about 0.10 and 0.09 respectively. EQ5D5L therefore predicts better at both extremes (Table 5).
Beta binomial model
Following on from above, the BB can be used to predict the EQ5D using a standard logit link: $P/(1 - P) = \exp(\alpha + \beta X)$, such that $P = 1/[1 + \exp(-(\alpha + \beta X))]$, where P is the predicted EQ5D and X are the QLQC30 scores.
The first step is to predict the EQ5D using the estimates in Table 4. Setting the QLQC30 function and symptom scores to perfect HRQoL (scores of 100 and 0 respectively), the predicted EQ5D5L is estimated as:
$1/[1 + \exp(-(\alpha + \beta X))] = 1/[1 + \exp(-(0.2255 + 100\,\beta_{PF} + 100\,\beta_{SF} + \dots + 0\,\beta_{FA} + \dots + 0\,\beta_{FI}))] = 0.983$. Hence, the predicted EQ5D5L is 0.983, approximating the value 1.00. Table 5 below shows results from scenarios between the 3 models.
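A prediction of this kind can be sketched in a few lines. The intercept 0.2255 is the value quoted in the text; the domain coefficients below are hypothetical placeholders (the fitted estimates are those reported in Table 4, which is not reproduced here), only two domains are shown, and the function name is an assumption.

```python
import math


def predict_logit(alpha, betas, scores):
    """Inverse-logit prediction: P = 1 / (1 + exp(-(alpha + sum(beta_k * x_k))))."""
    linear = alpha + sum(betas[k] * scores[k] for k in betas)
    return 1.0 / (1.0 + math.exp(-linear))


ALPHA = 0.2255  # intercept quoted in the text
# Hypothetical coefficients for two domains only, for illustration.
BETAS = {"PF": 0.02, "FA": -0.01}

best = predict_logit(ALPHA, BETAS, {"PF": 100, "FA": 0})    # perfect function, no fatigue
worst = predict_logit(ALPHA, BETAS, {"PF": 0, "FA": 100})   # worst function, worst fatigue
```

Because the inverse logit is bounded, predictions from this form always lie strictly between 0 and 1, unlike those from a linear model.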
LDVMM
The LDVMM estimates are more complicated to generate as they involve two distributions and two mixing probabilities. Consequently more than 32 parameters are involved in determining predictions for the best and worst case scenarios (Table 6). The LDVMM also predicts well at extremes, despite similar R^{2} and RMSE to the random effects model (Table 5 and Table 7). However, the LDVMM is much more complex to use as an algorithm. Users would also need to know details of the mixing probabilities as well as make stronger assumptions about the mixed distribution. Other mixtures were also considered but the Normal/Beta mixture offered the best (smallest AIC) fitting model.
Health states
EQ5D3L predictions by health state were generally as observed in the literature (Khan & Morris 2014) [5]: overprediction at poorer health states. There does however appear to be some evidence that mapping algorithms based on EQ5D5L may yield improved predicted utilities at poorer health states. In particular, the BB model showed improved predictions regardless of the instrument.
The predictions at poorer health states (Fig. 4) present some interesting findings. Modelling with the LDVMM consisted of a BB and a Normal distribution. Values >0.30 were modelled assuming a BB distribution. Predictions at poorer health states (assumed to be −0.549 to 0.30) appear slightly worse. Better predictions with the LDVMM are observed for EQ5D values >=0.30. This supports a BB algorithm as a plausible model for developing a mapping algorithm.
The predicted values are notably worse for the EQ5D3L. About 50 % of predicted utilities were overpredictions (higher than the observed value by any amount) with the EQ5D5L; for EQ5D3L this was 67 %; 93 % vs 97 % of utilities were overpredictions for the EQ5D5L vs EQ5D3L respectively.
Simulation and cross validation
Each simulated data set of 985 observations for EQ5D5L and EQ5D3L was subjected to cross validation using a 50 % random sample (about 492 observations each for EQ5D5L and EQ5D3L) for the BB model. Hence, a total of 1,000 R^{2}, RMSE and mean predicted values were obtained (Table 8 and Figures 5.4 – 5.7). For EQ5D5L and EQ5D3L respectively, the average (mean) R^{2} from the BB model was 76 % (range 51 % to 89 %) and 68 % (range 38 % to 79 %); RMSEs averaged around 0.099 (range 0.069 to 0.155) and 0.113 (range 0.058 to 0.177). Simulations from the Random Effects and LDVMM models showed similar performance to each other, but both were worse than the BB.
Predicted mean utilities were closer to the observed for the EQ5D5L: 0.572 vs. 0.575 whereas, for the EQ5D3L these were 0.515 vs. 0.518 (Table 8 and Figs 5, 6, 7 and 8). Hence, out of sample predictions for the EQ5D5L appeared more accurate than those of the EQ5D3L, particularly with the BB model. When a different cutoff was used (e.g. 75 % to model the data and 25 % for prediction), there were no changes in conclusions.
Discussion
We have developed and compared three mapping algorithms for the EQ5D5L and EQ5D3L using contemporary and novel modelling methods. We have shown that EQ5D5L may offer better prediction at poorer health states where several previous algorithms with EQ5D3L have, by and large, overpredicted. Modest improvements of an algorithm based on EQ5D5L over one based on EQ5D3L in terms of statistical metrics (e.g. R^{2}, percent predicted) have been confirmed with a BB model in this and previous analyses [5]. Young et al. [33] suggested that twopart models may offer a way to predict the different parts of the distribution in the context of mapping with improved performance for handling overprediction. More recently, Crott [34] confirms the suitability of the BB type models over other models. In this analysis we have confirmed the bimodal nature of the EQ5D5L value sets noted earlier (Oppe et al.) [24] (Fig. 6).
This is the first time to our knowledge a mapping algorithm has been developed simultaneously from EQ5D5L and EQ5D3L in the same lung cancer patients using EORTCQLQC30 and compared with each other in a real world NHS setting. Previous works with the EQ5D5L highlighted some of the limitations of the EQ5D3L relating to aspects such as bimodality of utilities and a lack of sensitivity to detect differences between treatment groups [35–37]. Some earlier mapping models did not take this into account. Cheung et al. [25] for example, report an algorithm using the FACTB in a breast cancer population with R^{2} of around 48 % (AIC was not reported).
In this analysis, overprediction at poorer health states still exists with EQ5D5L, although it is not as marked as EQ5D3L. It is yet to be seen whether the final value sets (Oppe et al.) [24] currently being developed and validated will impact predictions at poorer health states. The reasons for overprediction may be due to several factors, including the functional form of the model, the range of the scale (5 point vs 3 point scale), number of health states and other clinical characteristics. Khan & Morris [5] previously suggested overestimates at poorer health states may be related to other factors such as poorer prognosis. Preliminary evidence of this is shown by observing the relationship between ECOG performance and EQ5D utilities (Table 9). It is possible a further complexity is required in the modelling by using the joint distribution of utilities and other outcomes (e.g. Adverse events) to model the QLQC30 scores.
In this study, the EQ5D5L and 3L assessments were taken close together in time. Therefore, there may be some concern about ‘carryover’ or recall bias. To check this, we determined whether health state responses were recorded similarly. For example, if a response of 11112 was observed for EQ5D3L, we checked whether this was also observed for EQ5D5L (responses >3 are not possible for EQ5D3L). We noted that for 15 of the 146 (EQ5D5L) health states, the responses for EQ5D5L and EQ5D3L were the same; for example, patients gave responses of 11111 to both EQ5D5L and EQ5D3L in 18 of the 985 (pairs of) observations (<2 %). In the vast majority of cases the responses were different. This suggests that patients did not recall their previous responses and that carryover is unlikely.
There are several limitations of this research. The first is that the sample size is small with relatively few health states, although it is larger than that used for the algorithm reported by Kontodimopoulos (2009) [38]. Secondly, inferences need to be restricted to a similar NSCLC population until further evidence emerges of wider applicability across tumour types. Thirdly, external validation in an independent data set was not possible and therefore cross-validation was used as a ‘second best’, accompanied by simulation for out of sample predictions. Fourthly, insufficient numbers of events were available for reliable computation of QALYs and therefore the impact on QALYs could not be reliably assessed at this time. Finally, the values of the EQ5D5L are crosswalked from the EQ5D3L and are therefore subject to uncertainty. However, in the absence of a final EQ5D5L value set, and given that the EQ5D5L is being used in current clinical research, using the EQ5D3L crosswalk sets should be considered acceptable in the interim.
Despite these limitations, this is the first mapping algorithm for the EQ5D5L using real world data with enhanced generalizability outside the RCT context. Further research is nevertheless required.
Conclusion
Mapping algorithms developed from EQ5D5L appear to provide improved estimates of utilities compared with EQ5D3L, particularly at poorer health states. Two part models fit the data well and this result confirms earlier and more recent work. It is recommended that in studies where EQ5D utilities have not been collected, an EQ5D5L mapping algorithm is used.
Panel: research in context
Systematic review
We carried out an extensive review of the literature before designing this study. At the time, no comparison of HRQoL responses across several important HRQoL instruments, particularly the EQ5D3L and EQ5D5L, had been made in a lung cancer patient population. Understanding HRQoL continues to be an important aspect of managing NSCLC patients, and this research will be valuable for future economic evaluations and for understanding the way different HRQoL instruments measure utility.
Interpretation
We have demonstrated that the EQ5D5L can be mapped from the EORTCQLQC30 successfully. Our findings suggest that the EQ5D5L may be a preferred choice for mapping in NSCLC patients because of its higher R^{2} and improved prediction, both in general and at poorer health states, where EQ5D3L algorithms have been shown to overpredict. The results of this study may lead to wider use of the EQ5D5L.
References
 1.
Damm K, Roeske N, Jacob C. Healthrelated quality of life questionnaires in lung cancer trials: a systematic literature review. Health Econ Rev. 2013;3(1):15.
 2.
Davidoff AJ, Tang M, Seal B, Edelman MJ. Chemotherapy and survival benefit in elderly patients with advanced nonsmallcell lung cancer. J Clin Oncol. 2010;28:2191–7.
 3.
Montazeri A, Milroy R, Hole D, McEwen J, Gillis CR. Quality of life in lung cancer patients: as an important prognostic factor. Lung Cancer. 2001;31(2–3):233–40.
 4.
Brazier J, Ratcliffe J, Salomon JA, Tsuchiya A. Measuring and valuing health benefits for economic valuation. Oxford: Oxford University Press; 2007. p. 360. ISBN: 9780198569824.
 5.
Khan I, Morris S. A nonlinear betabinomial regression model for mapping EORTC QLQ C30 to the EQ5D3L in lung cancer patients: a comparison with existing approaches. Health Qual Life Outcomes. 2014;12(1).
 6.
Brazier J, Yang Y, Tsuchiya A, Rowen D. A review of studies mapping (or cross walking) nonpreference based measures of health to generic preferencebased measures. Eur J Health Econ. 2009;11(2):215–25.
 7.
Crott R. Mapping algorithms from QLQC30 to EQ5D utilities: no firm ground to stand on yet. Expert Rev Pharmacoecon Outcomes Res. 2014;14(4):569–76.
 8.
Arnold D, Rowen D, Versteegh M, Morley A, Hooper C, Maskell N. Testing mapping algorithms of the cancerspecific EORTC QLQC30 onto EQ5D in malignant mesothelioma. Health Qual Life Outcomes. 2015;13:6.
 9.
Doble B, Lorgelly P. Mapping the EORTC QLQC30 onto the EQ5D3L: assessing the external validity of existing mapping algorithms. Qual Life Res. 2015. Epub ahead of print.
 10.
Hernández Alava M, Wailoo A, Ara R. Tails from the Peak District: adjusted limited dependent variable mixture models of EQ5D questionnaire health state utility values. Value Health. 2012;15(3):550–61.
 11.
Kharroubi SA, Brazier JE, Roberts J, O'Hagan A. Modelling SF6D health state preference data using a nonparametric Bayesian method. J Health Econ. 2007;26(3):597–612.
 12.
Sabourin C, Crott R, Aballea S, Toumi M. Alternative regression methods for mapping utilities in oncology; ISPOR 18th Annual European Congress; Milan, Italy; November, 2015; http://www.ispor.org/ScientificPresentationsDatabase/Presentation/60984
 13.
Crott R, Briggs A. Mapping the QLQC30 quality of life cancer questionnaire to EQ5D patient preferences. Eur J Health Econ. 2010;11(4):427–34.
 14.
Round J. Is a QALY still a QALY at the end of life? J Health Econ. 2012;31(3):521–7.
 15.
Round J. Capturing information loss in estimates of uncertainty that arise from mapping algorithms. Aberdeen: Health Economics Study Group (HESG); 2008.
 16.
Longworth L, Rowen D. Mapping to Obtain EQ5D utility values for use in NICE health technology assessments. Value Health. 2013;16(1):202–10.
 17.
Scuffham P, Whitty J, Mitchell A, Viney R. The use of QALY weights for QALY calculations. Pharmacoeconomics. 2008;26(4):297–310.
 18.
Malkin A, Goldstein J, Perlmutter M, Massof R. Responsiveness of the EQ5D to the effects of low vision rehabilitation. Optom Vis Sci. 2013;90(8):799–805.
 19.
Krahn M, Bremner K, Tomlinson G, Ritvo P, Irvine J, Naglie G. Responsiveness of diseasespecific and generic utility instruments in prostate cancer patients. Qual Life Res. 2006;16(3):509–22.
 20.
Buchholz I, Thielker K, Feng Y, Kupatz P, Kohlmann T. Measuring changes in health over time using the EQ5D3L and 5L: a headtohead comparison of measurement properties and sensitivity to change in a German inpatient rehabilitation sample. Qual Life Res. 2014;24(4):829–35.
 21.
Richardson J, Khan M, Iezzi A, Maxwell A. Comparing and explaining differences in the magnitude, content, and sensitivity of utilities predicted by the EQ5D, SF6D, HUI 3, 15D, QWB, and AQoL8D multiattribute utility instruments. Med Decis Making. 2014;35(3):276–91.
 22.
Van Hout B, Janssen MF, et al. Interim scoring for the EQ5D5L: mapping the EQ5D5L to EQ5D3L value sets. Value Health. 2012;15(5):708–15.
 23.
Dolan P. Modeling valuations for EuroQol health states. Med Care. 1997;35(11):1095–108.
 24.
Oppe M, Devlin N, van Hout B, Krabbe P, de Charro F. A program of methodological research to arrive at the new international EQ5D5L valuation protocol. Value Health. 2014;17(4):445–53.
 25.
Cheung Y, Luo N, Ng R, Lee C. Mapping the functional assessment of cancer therapybreast (FACTB) to the 5level EuroQoL Group’s 5dimension questionnaire (EQ5D5L) utility index in a multiethnic Asian population. Health Qual Life Outcomes. 2014;12:180.
 26.
http://www.euroqol.org/abouteq5d/valuationofeq5d/eq5d5lvaluesets.html.
 27.
Groups.eortc.be. Questionnaires  EORTC. [online] Available at: http://groups.eortc.be/qol/eortcqlqc30 2016 [Accessed 10 Oct. 2012].
 28.
Schlattmann P. Medical applications of finite mixture models. Statistics for Biology and Health. Springer; 2009.
 29.
Dave K, Allen MD. Introducing the FMM procedure for finite mixture models. Paper 328-2012. SAS Global Forum 2012. Cary, NC: SAS Institute Inc.; 2012.
 30.
Chai T, Draxler RR. Root mean square error (RMSE) or mean absolute error (MAE)? – arguments against avoiding RMSE in the literature. Geosci Model Dev. 2014;7:1247–50.
 31.
Fleishman A. A method for simulating nonnormal distributions. Psychometrika. 1978;43(4):521–32.
 32.
Pourahmadi M, Daniels M, Park T. Simultaneous modelling of the Cholesky decomposition of several covariance matrices. J Multivariate Analysis. 2007;98(3):568–87.
 33.
Young T, Mukuria C, Rowen D, Brazier J, Longworth L. Mapping functions in healthrelated quality of life: mapping from two cancerspecific healthrelated qualityoflife instruments to EQ5D3L. Med Decis Making. 2015;35(7):912–26.
 34.
Crott R. Direct mapping of the QLQC30 to EQ5D preferences: a comparison of regression methods. Pharmacoeconomics. 2016 (in press).
 35.
Lee C, Luo N, Ng R, Wong N, Yap Y, Lo S, Chia W, Yee A, Krishna L, Wong C, Goh C, Cheung Y. Comparison of the measurement properties between a short and generic instrument, the 5level EuroQoL Group’s 5dimension (EQ5D5L) questionnaire, and a longer and diseasespecific instrument, the functional assessment of cancer therapy-breast (FACTB), in Asian breast cancer patients. Qual Life Res. 2012;22(7):1745–51.
 36.
Kim S, Kim H, Lee S, Jo M. Comparing the psychometric properties of the EQ5D3L and EQ5D5L in cancer patients in Korea. Qual Life Res. 2011;21(6):1065–73.
 37.
Pattanaphesaj J, Thavorncharoensap M. Measurement properties of the EQ5D5L compared to EQ5D3L in the Thai diabetes patients. Health Qual Life Outcomes. 2015;13(1):14.
 38.
Kontodimopoulos N, Aletras V, Paliouras D, Niakas D. Mapping the cancerspecific EORTC QLQC30 to the preferencebased EQ5D, SF6D, and 15D instruments. Value Health. 2009;12(8):1151–7.
Acknowledgements
We are most grateful to all the participating patients and local research staff for their helpful advice and comments throughout the study. We are also grateful to Veronica Kelly, Aisha Khan, Hana Barlas and the anonymous reviewers for helping to improve the manuscript.
Participating clinicians and centres
Liverpool Heart and Chest Hospital, Liverpool and Clatterbridge Cancer Centre Wirral (J Maguire).
Financial support
None
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
Conception and design: IK and JM. Provision of study materials or patients: JM, BM, IK. Statistical analysis: IK. Data interpretation: IK, SM, NP. Manuscript writing: IK, JM, SM, NP, ZB. Final approval of manuscript: IK, JM, SM, NP, BM, ZB. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Keywords
 Root Mean Square Error
 NSCLC Patient
 Mapping Algorithm
 Poor Health State
 Mean Absolute Error