Mapping the Minnesota living with heart failure questionnaire (MLHFQ) to EQ-5D-5L in patients with heart failure

Background Mapping algorithms can be used to convert scores from a non-preference-based instrument to health state utilities. The objective of this study was to develop mapping algorithms that enable Minnesota Living with Heart Failure Questionnaire (MLHFQ) scores to be converted into EQ-5D-5L utility scores for use in heart failure related cost-utility studies. Method Patients diagnosed with heart failure were recruited in Australia. Mapping algorithms were developed using both direct and indirect response mapping approaches. Three model specifications were considered to predict the EQ-5D-5L utility score using the MLHFQ total score (Model 1), MLHFQ domain scores (Model 2), or MLHFQ item scores (Model 3). Six regression techniques, each capable of coping with skewness, heteroscedasticity, ceiling effects and/or the potential presence of outliers in the data set, were used to identify the optimal mapping functions for each of the three models. Goodness-of-fit of the models was assessed using six indicators. In the absence of an external validation dataset, predictive performance was assessed using three-fold cross-validation. In the indirect response mapping, EQ-5D-5L responses were predicted separately from the MLHFQ item scores using an ordered logit model. Results A total of 141 patients participated in the study. The lowest mean absolute error (MAE) was recorded for the multivariable fractional polynomials (MFP) model in all three model specifications. For the indirect response mapping, performance was comparable with the direct mapping approach based on root mean squared error (RMSE) but worse based on MAE. Conclusion The MLHFQ can be mapped onto EQ-5D-5L utilities with good predictive accuracy using both direct and indirect response mapping techniques. The reported mapping algorithms would facilitate calculation of health utility for economic evaluations related to heart failure.


Introduction
Cardiovascular disease (CVD) is one of the leading causes of death in developed countries [1]. Heart failure (HF) is the fastest growing CVD in the world and poses a significant global burden, affecting nearly 26 million people worldwide [2]. It is a chronic debilitating illness in which the symptoms worsen as the disease progresses. Disease progression is associated with a significant impact on physical and social wellbeing, increased hospitalization [3,4] and increased mortality. It is estimated that over 61,000 (6.9 per 1000 person-years) Australians aged ≥ 45 years are diagnosed with clinically overt HF every year. Heart failure accounts for an estimated 150,000 hospitalisations and over 1 million days in hospital per annum [5,6] and poses a significant burden on health budgets globally. It is estimated that 1-2% of total healthcare expenditure in Europe and North America is spent on the treatment of HF [7]. In Australia, the annual cost of managing HF in the community is approximately $900 million, and nearly $2.7 billion when the additional cost of in-patient care is considered [8].
Economic evaluation presents evidence to inform comparative decisions, particularly about value for money. It is used by the regulatory agencies of Australia, the United Kingdom and Switzerland in their evaluations of the cost-effectiveness of new health interventions prior to funding. Cost-utility analysis (CUA) is one method of economic evaluation that has been used to inform resource allocation decisions [9]. In CUA, health benefits are usually measured by quality-adjusted life-years (QALYs).
QALYs incorporate changes in both life expectancy and quality of life in a single metric. Utility is the component of the QALY that accounts for quality of life, and it is measured using generic multi-attribute utility instruments (MAUIs) such as the EQ-5D or the SF-6D [10]. However, evidence indicates that disease-specific quality of life instruments are superior to generic instruments (e.g. EQ-5D), owing to their greater sensitivity to changes in quality of life [11][12][13]. Most disease-specific quality of life instruments, however, are not preference-based and cannot directly generate utilities. Non-preference-based instruments measure, but do not value, health states. Currently there are no HF-specific MAUIs available to estimate utility, so generic instruments such as the EQ-5D are widely used [14].
In this context, mapping algorithms are important because they can convert scores from a disease-specific instrument to utilities. Mapping algorithms have been successfully developed for many disease-specific quality of life instruments, including instruments related to cardiovascular diseases [15,16]. However, to the authors' knowledge, there is currently no study on the development of a mapping algorithm for a heart failure specific instrument. The main objective of this study was to develop mapping algorithms that enable Minnesota Living with Heart Failure Questionnaire (MLHFQ) scores to be converted into utility scores for use in heart failure related cost-utility studies.

Study design
Ambulatory patients were recruited from cardiology out-patient clinics at the Royal Brisbane and Women's Hospital (RBWH), Brisbane, Australia. The RBWH is the largest hospital in Queensland and has nearly 1000 hospital beds. Patients with HF attending the clinics between January 2018 and March 2018 were included in the study. Patients with documented evidence of HF were recruited using a convenience sampling method and, upon recruitment, the diagnosis was confirmed by the clinical cardiology staff.
Following informed consent, the study participants completed a three-section questionnaire. The first section covered socio-demographic information, such as age and sex, and the diagnosis of the patient. The second section comprised the five-level EQ-5D questionnaire (EQ-5D-5L) and the third section the Minnesota Living with Heart Failure Questionnaire (MLHFQ). Institutional ethics committee approval was obtained from the Griffith University Human Research Ethics Committee (Reference no. 2017/069).

Instruments
The source instrument for mapping was the MLHFQ and the target instrument was the EQ-5D-5L.
EuroQol five-dimensional questionnaire (EQ-5D)
The three-level version of the EQ-5D is the most widely used preference-based instrument [17]. In 2011 a new version of the instrument, the five-level EQ-5D (EQ-5D-5L), was developed to improve the ability of the instrument to measure small changes in health state, especially in patients with milder conditions [18]. The EQ-5D has been used to assess the quality of life of heart failure patients [19]. Furthermore, the EQ-5D-5L is a valid instrument for use in health research in Australia [20]. The instrument contains five domains: mobility, self-care, usual activities, pain/discomfort and anxiety/depression. Each domain has one item and each item has five response levels, with one denoting no problems and five denoting extreme problems. Thus, the EQ-5D-5L can define 3125 mutually exclusive health states [21]. Since its introduction, the EQ-5D-5L has been used to measure health state utility in different disease conditions [22][23][24], including cardiovascular disease [25,26]. In this study the EQ-5D-5L was scored using the widely used UK tariff [21] because an Australian-specific tariff is not yet available; UK tariffs are therefore commonly used in Australia to calculate EQ-5D-5L utility scores [27,28]. The EQ-5D-5L utility scores (based on the UK tariff) range from −0.594 (the worst health state) to 1.0 (the best health state), where 0 equals being dead and negative values represent health states considered worse than dead.
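The count of 3125 health states follows from five domains with five levels each (5^5). As a minimal illustration, the Python sketch below enumerates the profiles; the five-digit string notation (for example "11111" for no problems in any domain) and all names in the code are assumptions made for this example, not part of the study.

```python
from itertools import product

LEVELS = (1, 2, 3, 4, 5)  # 1 = no problems, 5 = extreme problems
DIMENSIONS = (
    "mobility",
    "self_care",
    "usual_activities",
    "pain_discomfort",
    "anxiety_depression",
)


def all_health_states():
    """Return every EQ-5D-5L profile as a five-digit string."""
    return [
        "".join(str(level) for level in profile)
        for profile in product(LEVELS, repeat=len(DIMENSIONS))
    ]


states = all_health_states()
print(len(states), states[0], states[-1])
```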
Minnesota living with heart failure questionnaire (MLHFQ)
The MLHFQ is a self-administered, 21-item disease-specific instrument for patients with heart failure [29]. It has been widely used to assess quality of life among heart failure patients [30][31][32]. Each item is scored on a 6-point Likert scale (0 to 5), so the total score can range from 0 to 105, with higher scores indicating greater impairment in health-related quality of life. The MLHFQ has two domains: a physical domain (eight items, score range 0 to 40) and an emotional domain (five items, score range 0 to 25).
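The scoring described above can be sketched in Python as follows. The item-to-domain assignment used here (items 2 to 7, 12 and 13 for the physical domain; items 17 to 21 for the emotional domain) is not stated in this paper; it is an assumption based on the commonly used MLHFQ scoring convention and should be checked against the instrument's scoring manual. All function and variable names are hypothetical.

```python
# Assumed item-to-domain assignment (not given in the paper).
PHYSICAL_ITEMS = (2, 3, 4, 5, 6, 7, 12, 13)  # eight items, range 0-40
EMOTIONAL_ITEMS = (17, 18, 19, 20, 21)       # five items, range 0-25
ALL_ITEMS = tuple(range(1, 22))              # 21 items, range 0-105


def score_mlhfq(responses):
    """Return total and domain scores from a dict {item number: score 0-5}."""
    if set(responses) != set(ALL_ITEMS):
        raise ValueError("responses must cover items 1 to 21 exactly")
    if any(not 0 <= value <= 5 for value in responses.values()):
        raise ValueError("each item score must lie between 0 and 5")
    return {
        "total": sum(responses[i] for i in ALL_ITEMS),
        "physical": sum(responses[i] for i in PHYSICAL_ITEMS),
        "emotional": sum(responses[i] for i in EMOTIONAL_ITEMS),
    }


worst_case = score_mlhfq({item: 5 for item in ALL_ITEMS})
print(worst_case)
```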

Statistical analysis
The patient characteristics were summarized using the mean (standard deviation [SD]) and median for continuous variables, and frequency (percentage) for categorical variables. Normality of the continuous variables was assessed using the Shapiro-Wilk test. A scatter plot and the Spearman correlation coefficient were used to describe the correlation between the MLHFQ score and the EQ-5D-5L utility score. The magnitude of the correlation coefficients (r) was interpreted according to Guilford's criteria [33]. According to these criteria, correlation coefficients are divided into five categories depending on the strength of the association: very low (r: 0.00-0.20), low (r: 0.21-0.40), moderate (r: 0.41-0.60), high (r: 0.61-0.80) and very high (r: 0.81-1.00).
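The interpretation bands listed above can be expressed as a small lookup. The sketch below is illustrative only; it assumes that the bands apply to the absolute value of r (so that negative correlations are classified by their magnitude) and that values falling between the printed band edges are assigned to the lower band. The function name is hypothetical.

```python
def guilford_category(r):
    """Classify a correlation coefficient by the bands quoted in the text."""
    magnitude = abs(r)  # assumption: sign is ignored when judging strength
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude <= 0.20:
        return "very low"
    if magnitude <= 0.40:
        return "low"
    if magnitude <= 0.60:
        return "moderate"
    if magnitude <= 0.80:
        return "high"
    return "very high"


print(guilford_category(-0.580))
```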
Both direct and indirect response mapping were conducted in the study.
The best regression method to develop a predictive model is a widely discussed topic. The current consensus is that there is no one method that fits all data sets [34]. To get around this uncertainty, during direct mapping, six regression techniques were used on the same dataset and the best was chosen based on validation parameters. Six regression techniques, each of which has the capability to cope with either skewness, heteroscedasticity, ceiling effects and/or the potential presence of outliers in the data set [35], were used to identify the optimal mapping functions for each of the three models:

Ordinary least squares (OLS)
In OLS, the coefficients and the intercept are calculated by minimising the sum of the squared differences between the observed and predicted utility scores. The model assumes that the errors are normally distributed with mean zero and have a constant variance (homoscedasticity) [36]. OLS is the most widely reported method in the mapping literature, although utility data often violate these assumptions [35].
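As a minimal sketch of this estimation step, the Python code below fits a least-squares model of utility on a score and, optionally, its squared term. It is not the Stata code used in the study; the data are synthetic and noise-free, the coefficients are invented for illustration, and the function names are hypothetical.

```python
import numpy as np


def fit_ols(score, utility, add_squared=True):
    """Least-squares fit of utility on score (and optionally score squared)."""
    score = np.asarray(score, dtype=float)
    utility = np.asarray(utility, dtype=float)
    columns = [np.ones_like(score), score]
    if add_squared:
        columns.append(score ** 2)
    design = np.column_stack(columns)
    # lstsq minimises the sum of squared residuals ||utility - design @ coef||^2
    coef, *_ = np.linalg.lstsq(design, utility, rcond=None)
    return coef  # [intercept, linear term, (squared term)]


def predict_ols(coef, score):
    """Evaluate the fitted polynomial at the given scores."""
    score = np.asarray(score, dtype=float)
    powers = np.vstack([score ** k for k in range(len(coef))])
    return coef @ powers


# Synthetic illustration only: invented coefficients, no noise.
demo_score = np.arange(0, 106, 5, dtype=float)
demo_utility = 0.9 - 0.004 * demo_score - 0.00002 * demo_score ** 2
demo_coef = fit_ols(demo_score, demo_utility)
print(demo_coef)
```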

Generalized linear modelling (GLM)
GLM allows the errors to have a skewed distribution by specifying the distribution a priori. All potential combinations of family and link functions were investigated and the one with the best mapping performance was chosen for each model. For Models 1 and 2, the gamma distribution with the identity link function produced the best prediction model, while the Gaussian distribution with the identity link function (which is equivalent to OLS) produced the best prediction model for Model 3.

Censored least absolute deviations (CLAD)
This method is best suited to outcome variables censored at the lower or upper endpoint. It estimates conditional medians rather than means and is thus robust to distributional assumptions and heteroscedasticity [37].

Multivariable fractional polynomials (MFP)
MFP is a useful modelling technique when the dependent and independent variables have a non-linear relationship [38]. Different underlying regression methods were tested (such as the OLS and GLM mentioned above), and median regression produced the best prediction model in all three models.
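To illustrate the idea behind fractional polynomials, the sketch below searches the conventional first-degree power set (−2, −1, −0.5, 0, 0.5, 1, 2, 3, with 0 denoting the natural logarithm) for the single transformation of a predictor that fits best. Two simplifications are assumed and should be noted: the fit criterion here is the least-squares residual sum of squares rather than the median regression reported in this study, and the predictor must be strictly positive (a score of 0 would need shifting first). The data are synthetic and all names are hypothetical.

```python
import numpy as np

# Conventional fractional polynomial powers; 0 stands for log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, power):
    """Apply one fractional polynomial transformation to a positive array."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("x must be strictly positive; shift it first")
    return np.log(x) if power == 0 else x ** power


def best_fp1_power(x, y):
    """Return (power, rss, coef) for the best first-degree FP by least squares."""
    y = np.asarray(y, dtype=float)
    best = None
    for power in FP_POWERS:
        z = fp_transform(x, power)
        design = np.column_stack([np.ones_like(z), z])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        if best is None or rss < best[1]:
            best = (power, rss, coef)
    return best


# Synthetic illustration: y is an exact function of log(x).
demo_x = np.linspace(1.0, 10.0, 50)
demo_y = 1.0 + 2.0 * np.log(demo_x)
demo_power, demo_rss, demo_fp_coef = best_fp1_power(demo_x, demo_y)
print(demo_power, demo_fp_coef)
```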

Robust MM estimator (MM)
This method is useful when the presence of either heteroscedasticity or outliers limits the use of traditional regression methods [39].

Beta regression model (BETA)
This method is robust to skewness and can estimate both unimodal and bimodal utilities [35].
The direct mapping algorithms were developed using the aforementioned regression techniques. In particular, three model specifications were considered to predict the EQ-5D-5L utility score, using the MLHFQ total score (Model 1), MLHFQ domain scores (Model 2), or MLHFQ item scores (Model 3). Based on the previous literature [40,41], squared terms of the MLHFQ total score, MLHFQ domain scores and MLHFQ item scores were added as independent variables to the linear models (i.e. OLS, CLAD, MM) in order to account for a non-linear relationship between EQ-5D-5L utility values and the MLHFQ. For the non-linear models (i.e. GLM, MFP and BETA), only the original terms were included, since a potential non-linear relationship is handled during the modelling process. Socio-demographic characteristics such as age and sex were included in the models to improve predictive performance. A forward stepwise regression method was used to identify the statistically significant predictors (i.e. P < 0.05) to be included in the final mapping functions.
In the indirect response mapping, the response to each EQ-5D-5L dimension was predicted separately from the MLHFQ item scores using an ordered logit model [42]. This produces a set of mapping algorithms, one for each of the five EQ-5D-5L dimensions. Predicted dimension responses make it possible to calculate country-specific EQ-5D-5L utilities by applying country-specific tariffs, not just the UK tariff used in this study. The MLHFQ items used to predict each of the EQ-5D-5L dimension responses were selected using a forward stepwise regression technique.

Assessing model performance
Goodness-of-fit of the models was mainly assessed using the mean absolute error (MAE) and the root mean squared error (RMSE). MAE was computed as the mean of the absolute differences between the predicted and observed EQ-5D-5L utilities, while RMSE was computed as the square root of the mean squared difference between the observed and predicted EQ-5D-5L utilities. More weight was given to MAE, as it is easily interpretable and considered to be less sensitive to outliers [43]. Four additional criteria were also considered to assess the models: (a) exactness of the predicted sample mean; (b) the range of predictions; (c) the proportion of predicted utilities deviating from observed values by an absolute error < 0.03 and < 0.05; and (d) intra-class correlation coefficients. In the absence of an external validation dataset, predictive performance of the models was assessed using 3-fold cross-validation [44,45]. The data set was randomly divided into three equal-sized sections using random number generation algorithms. During each iteration, two groups (67% of the data set) were allocated to the 'estimation sample' and all six regression models were applied to develop the coefficients. The remaining group (33% of the data set) was then used as the 'validation sample', for which the estimates generated during the previous step were used to calculate the predicted values. This process was repeated three times, so that each of the three subgroups was used in both the estimation and validation iterations. Thereafter, the validation results were pooled and model performance was assessed based on the pooled goodness-of-fit statistics (RMSE and MAE).
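The error measures and the pooled three-fold procedure described above can be sketched in Python as follows. This is an illustrative outline rather than the study's Stata implementation: the intra-class correlation coefficient is omitted, the `fit` and `predict` callables are hypothetical placeholders for any of the six regression techniques, and the demonstration uses synthetic noise-free data with a straight-line fit.

```python
import numpy as np


def fit_metrics(observed, predicted):
    """MAE, RMSE and the supplementary accuracy summaries described in the text."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    abs_err = np.abs(observed - predicted)
    return {
        "mae": float(abs_err.mean()),
        "rmse": float(np.sqrt(np.mean((observed - predicted) ** 2))),
        "share_abs_err_lt_0.03": float(np.mean(abs_err < 0.03)),
        "share_abs_err_lt_0.05": float(np.mean(abs_err < 0.05)),
        "mean_observed": float(observed.mean()),
        "mean_predicted": float(predicted.mean()),
        "min_predicted": float(predicted.min()),
        "max_predicted": float(predicted.max()),
    }


def three_fold_cv(x, y, fit, predict, seed=0):
    """Pool out-of-sample predictions over three random folds, then score them."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, 3)
    pooled = np.empty_like(y)
    for k, validation_idx in enumerate(folds):
        estimation_idx = np.concatenate(
            [fold for j, fold in enumerate(folds) if j != k]
        )
        model = fit(x[estimation_idx], y[estimation_idx])
        pooled[validation_idx] = predict(model, x[validation_idx])
    return fit_metrics(y, pooled)


# Synthetic illustration: 141 noise-free observations and a straight-line fit.
demo_x = np.arange(141.0)
demo_y = 0.9 - 0.005 * demo_x
cv_metrics = three_fold_cv(
    demo_x,
    demo_y,
    fit=lambda xs, ys: np.polyfit(xs, ys, 1),
    predict=lambda model, xs: np.polyval(model, xs),
)
print(cv_metrics["mae"], cv_metrics["rmse"])
```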
The "Mapping onto Preference-based measures reporting Standards" (MAPS) checklist was followed in this study [46]. All statistical analyses were conducted using STATA Software version 15.0.

Sample characteristics
A total of 141 patients diagnosed with heart failure participated in the study. The mean age of the study participants was 63.3 (SD 14.8) years and more than half (n = 96; 68.0%) were male (Table 1). The mean and median EQ-5D-5L utility scores were 0.6619 (SD 0.27) and 0.708 (0.553-0.877), respectively. The mean MLHFQ total score was 28.9 (SD 23.5). Frequency distribution plots of the EQ-5D-5L utility score and the MLHFQ total score are depicted in Fig. 1. EQ-5D-5L utility values were negatively skewed while the MLHFQ total score was positively skewed, indicating that both were non-normally distributed. This was confirmed by the Shapiro-Wilk test of normality (p < 0.001). A moderately strong negative correlation was observed between EQ-5D-5L utility scores and the MLHFQ total score (Spearman correlation coefficient (r) = −0.580; p < 0.001) (Fig. 2).

Prediction of EQ-5D-5L utility scores
In the direct mapping, six regression methods and three model specifications were assessed separately. Age (p > 0.2) and sex (p > 0.8) were consistently non-significant in all regressions and were therefore excluded from the regression models. Of the 21 items in the instrument, only three (items 04, 17 and 21) were found to be statistically significant in the forward stepwise regression. Thus, only those three items were included in the final equation of Model 3. Furthermore, the squared terms used in the OLS, CLAD and MM models were not significant and were removed from the final models. As in the direct mapping, age (p > 0.05) and sex (p > 0.05) were not significant in the indirect response mapping. Of the 21 items in the instrument, those that were statistically significant in predicting each of the EQ-5D-5L dimension responses (selected using the forward stepwise regression method) are indicated below.
Mobility ➔ Item 3, Item 5 and Item 15.
Table 2 summarises the key goodness-of-fit statistics for the different model and method combinations based on the full sample (both direct and indirect response mapping). In the direct mapping, all models under predicted
Regarding the indirect response mapping, results show that the performance was comparable with the direct mapping approach based on RMSE but was worse based on MAE.

Validation
In the absence of an external validation dataset, predictive performance of the models was assessed using three-fold cross-validation (Table 3). All models were assessed for goodness of fit using the MAE and RMSE, and a consistent pattern was seen in all three model specifications (direct mapping): OLS and GLM showed the lowest RMSE values and the MFP estimates showed the lowest MAE values. Based on the results in Table 3, it is concluded that the mapping algorithms developed using the MFP regression technique exhibited the best ability to predict the EQ-5D-5L utility score from the MLHFQ total score, MLHFQ domain scores and MLHFQ item scores.
Regarding the indirect response mapping, the validation results were very similar to the results for the main sample.

Best performing models
The best models to predict the EQ-5D-5L utility score from the MLHFQ total score, MLHFQ domain scores and MLHFQ item scores were selected on the basis of their performance in the cross-validation step, with more weight put on the MAE following evidence in the literature [43]. The MFP regression technique performed best in all three model specifications. Detailed goodness-of-fit indicators for these three models are given in Table 4. Fig. 3 shows the scatter plots of observed versus predicted EQ-5D-5L utilities for the selected best performing models. Table 4 (Fig. 4). Table 5 reports the detailed MFP regression coefficients for each model specification, which can be used to predict the EQ-5D-5L utility score in the three specification scenarios. The transformation scores of the MFP models are as follows.
MLHFQ emotional domain transformation factor: −6.595744681.
For example, the EQ-5D-5L utility score could be predicted from the MLHFQ total score using the following equation.

MFP
Step 1: Calculate a transformed MLHFQ total score.
Calculating the country-specific utility values using the indirect response mapping algorithms is a three-step process.
Step 2: Check the predicted EQ-5D-5L mobility response score against the cut-off values (see Table 6).
Predicted EQ-5D-5L mobility response score ➔ 3.899258. Since 3.899258 lies between 2.64014 and 4.632778, the final EQ-5D-5L mobility response is 3. Repeat the same process for all other EQ-5D-5L dimensions.
Step 3: Apply country-specific tariffs to calculate the country-specific EQ-5D-5L utilities.
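The cut-off comparison in Step 2 can be sketched in Python as follows. The two middle cut-off values are those quoted in the worked example above; the first and last cut-offs in the code are hypothetical placeholders, because the full set is reported in Table 6 and is not reproduced here. The function name is also hypothetical. Applying a tariff in Step 3 is not shown, since it requires a published country-specific value set.

```python
from bisect import bisect_right


def response_level(predicted_score, cut_points):
    """Map a predicted ordered-logit score to a response level (1 to 5).

    With four ascending cut points, a score below the first cut point gives
    level 1, a score between the second and third gives level 3, and so on.
    """
    cuts = list(cut_points)
    if cuts != sorted(cuts):
        raise ValueError("cut points must be in ascending order")
    return bisect_right(cuts, predicted_score) + 1


# Middle two values are from the worked example in the text; the first and
# last values are HYPOTHETICAL placeholders standing in for Table 6.
MOBILITY_CUTS = (1.0, 2.64014, 4.632778, 6.0)

mobility_level = response_level(3.899258, MOBILITY_CUTS)
print(mobility_level)
```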

Discussion
This is the first study to map the MLHFQ onto EQ-5D-5L utility scores using both direct and indirect response mapping techniques. Any previous study that did not use a preference-based instrument but included the MLHFQ in its data collection can use these algorithms to calculate utility values and use them to estimate the cost-effectiveness of the intervention in cost per QALY terms. Our regression analyses showed that the EQ-5D-5L utility scores of heart failure patients in our sample were best predicted by the MFP regression model. Furthermore, the results indicated that the indirect response mapping algorithms can be used effectively to calculate country-specific utility values.
The mean MLHFQ score in the study population was 28.7 (SD 23.5). However, mean MLHFQ values available in the literature show wide variation. According to Fu et al. (2016) [47] and Mogle et al. (2017) [48], who validated the MLHFQ in Taiwanese and Spanish populations, the mean MLHFQ scores were 25.3 and 27.8, respectively, but a couple of studies have reported higher mean MLHFQ scores [49,50]. Therefore, the sample used in the present study may not represent the full spectrum of the HF population. Concurrent validity between two instruments implies conceptual overlap between them, and evidence indicates that this is an important determinant of a successful mapping analysis [51,52]. In the present study a moderately strong negative correlation was observed between EQ-5D-5L utility scores and the MLHFQ total score (r = −0.580), MLHFQ physical domain score (r = −0.5773) and MLHFQ emotional domain score (r = −0.5498), implying good concurrent validity. The best models to predict the EQ-5D-5L utility score were selected on the basis of their performance in the cross-validation step, with more weight put on the MAE following evidence in the literature [43]. The choice of error measure can affect selection of the best performing model. RMSE is strongly influenced by scale and is more sensitive to outliers, whereas MAE is easily interpretable, avoids the need for trimming and is considered to be less sensitive to outliers [43]. Therefore, MAE is considered a reliable error measure. The MFP regression technique performed best in all three model specifications, i.e. it had the lowest MAE.
The absence of previous comparable mapping studies between the MLHFQ and the EQ-5D-5L precludes direct comparison of the validity parameters of this study with the literature. However, the validity criteria used in the present study to select the best predictive model gave mixed results. The MAE values of all three models were relatively high compared with the values reported in the literature [13,40]. Furthermore, the RMSE values were also relatively high in the present study, indicating a larger deviation from the observed values. In our analysis all three models over-predicted the more severe health states. The minimum observed value was −0.2630, but the predicted minimum values in the three models were 0.1593, 0.1505 and 0.1775. This narrow range of predicted values, which is commonly reported in the mapping literature [53,54], could explain the relatively high RMSE values in the present study.
The substantial decrements in EQ-5D-5L utility weights in the severe health states and conceptual differences between the two instruments are believed to be the reasons for this commonly observed narrow range of predicted values [55,56]. Thus, the algorithms presented in this paper should only be used to predict the mean utility score of a sample and should not be used to make individual predictions. We also conducted indirect response mapping to predict the responses to each of the EQ-5D-5L dimensions. This enables the calculation of utility values from different country-specific value sets of the EQ-5D-5L, so the reported indirect mapping algorithms can also be used by researchers in other countries.
Table 5 were statistically significant (all P < 0.05) EQ-5D dimensions: MO mobility, SC self-care, UA usual activities, PD pain/discomfort, AD anxiety/depression, /cut# estimated cut points; **p < 0.001, *p < 0.05. Standard errors in parentheses.
However, it is important to note that the results of the indirect mapping algorithms depend on whether heart failure patients in other countries have a response pattern similar to that of the patients in the present study. This study has several strengths. Firstly, we used six regression methods to predict EQ-5D-5L utilities to account for the distribution of the data in the sample. OLS is less suited to data sets with skewed distributions and heteroscedasticity [36]. The GLM allows the errors to have a skewed distribution by specifying the distribution a priori. The MM-estimator is useful in the presence of either heteroscedasticity or outliers [39]. Beta regression is robust to skewness [35]. Nevertheless, MFP outperformed all the other regression methods used. MFP is a useful modelling technique when the dependent and independent variables have a non-linear relationship [38]. The superiority of MFP over other robust regression models has been demonstrated previously as well [57]. Secondly, this is the only algorithm available in the medical literature that converts HF-specific quality of life scores into EQ-5D-5L utility scores.
This study is not without limitations. Firstly, our sample size was 141. Although mapping studies have been conducted with similar sample sizes [51,54], further mapping studies using a larger sample are recommended to evaluate the reliability of the mapping algorithms reported here. Secondly, the models were validated using an internal sample, whereas validation using an external sample would have been ideal. Thirdly, since the presented algorithms may over-estimate utilities for the severe health states, they may underestimate the utility gain in a study. Fourthly, the EQ-5D-5L was scored using the UK tariff since an Australian-specific tariff is not available at present.

Conclusion
In conclusion, to the authors' knowledge, this is the first algorithm in the literature that converts HF-specific quality of life scores into EQ-5D-5L utility scores using both direct and indirect response mapping techniques. The reported mapping algorithms would facilitate the calculation of QALYs in CUA related to heart failure.