Responsiveness and minimal clinically important difference of the Minnesota living with heart failure questionnaire

Background: The Minnesota Living with Heart Failure Questionnaire (MLHFQ) is one of the most widely used health-related quality of life questionnaires for patients with heart failure (HF). The objective of the present study was to explore the responsiveness of the MLHFQ by estimating the minimal detectable change (MDC) and the minimal clinically important difference (MCID) in Spain.
Methods: Patients hospitalized for HF in the participating hospitals completed the MLHFQ at baseline and 6 months, plus anchor questions at 6 months. To study responsiveness, patients were classified as having “improved”, remained “the same” or “worsened”, using anchor questions. We used the standardized effect size (SES) and the standardized response mean (SRM) to measure the magnitude of the change scores, and calculated the MDC and MCID.
Results: Overall, 1211 patients completed the baseline and follow-up questionnaires 6 months after discharge. The mean changes in all MLHFQ domains followed a trend (P < 0.0001) with larger gains in quality of life among patients classified as “improved”, smaller gains among those classified as “the same”, and losses among those classified as “worsened”. The SES and SRM responsiveness parameters in the “improved” group were ≥ 0.80 on nearly all scales. Among patients classified as “worsened”, effect sizes were < 0.40, while among patients classified as “the same”, the values ranged from 0.24 to 0.52. The MDC ranged from 7.27 to 16.96. The MCID based on patients whose response to the anchor question was “somewhat better” ranged from 3.59 to 19.14 points.
Conclusions: All of these results suggest that all domains of the MLHFQ have a good sensitivity to change in the population studied.


Background
The Heart Failure Association of the European Society of Cardiology defines heart failure (HF) as "a clinical syndrome characterized by typical symptoms (e.g. breathlessness, ankle swelling and fatigue) that may be accompanied by signs (e.g. elevated jugular venous pressure, pulmonary crackles and peripheral edema) caused by a structural and/or functional cardiac abnormality, resulting in a reduced cardiac output and/or elevated intracardiac pressures at rest or during stress" [1]. HF is a highly prevalent condition, associated with significant morbidity and a poor prognosis [2][3][4]. A 2013 update from the American Heart Association estimated that there are 5.1 million people with HF in the USA and 23 million worldwide [5]. The incidence of HF also increases significantly with age, and hence, because of the aging population, the prevalence of HF can be expected to increase substantially in the future [6]. In addition, the majority of patients with HF experience a considerable reduction in health-related quality of life (HRQoL) [7].
The HRQoL of patients with HF is an important outcome as it reflects the impact of HF on their daily lives [8]. HF patients experience high levels of physical, functional and emotional distress [9]. Indeed, there is evidence that adults with HF have poorer HRQoL than those without HF [10][11][12][13].
In recent decades, various specific HRQoL questionnaires for patients with HF have become regarded as important assessment tools [14][15][16][17]. Among these, one of the most widely known and used is the Minnesota Living with Heart Failure Questionnaire (MLHFQ) [16,18]. It is a self-administered disease-specific questionnaire for patients with HF [19], comprising 21 items. It provides a total score as well as scores for two domains, physical and emotional. The questionnaire has been translated into and validated in Spanish [14].
There are several definitions of measurement responsiveness [20,21]. For this study, we defined responsiveness as the ability of an instrument to detect real changes in the concept being measured [22]. The responsiveness of the Spanish version of the MLHFQ has not been studied in depth, although it is very important to determine whether an HRQoL questionnaire is able to detect changes over time in the patient that occur naturally or due to clinical intervention. In addition, HRQoL questionnaires should be easily interpretable by clinicians, and one of the most common ways to facilitate interpretation is ascertaining the minimal detectable change (MDC) and the minimal clinically important difference (MCID) [23]. The latter may vary by patient characteristics or clinical status.
Therefore, the objective of the present study was to examine the responsiveness of the MLHFQ, using distribution and anchor-based approaches [21,22,24,25,26]. Among distribution-based methods, we calculated the MDC at the individual level. We also compared the MDC with the MCID for each domain [23,27]. To the best of our knowledge, only one previous study has analyzed the responsiveness of the Spanish version of the MLHFQ [14], and our study is the first to explore the responsiveness of this questionnaire by estimating MDC and MCID in Spain.

Sample and materials
This was a secondary analysis of a prospective, multicenter cohort study carried out in 13 Spanish hospitals: two in the Canary Islands, one in Catalonia, four in Andalusia and six in the Basque Country. The cohort consisted of patients hospitalized with HF in the cardiology or internal medicine departments of the participating hospitals between January 2009 and May 2013. Within the study period, the first admission was considered the index admission. The Institutional Review Boards of each hospital approved the study. All patients signed a declaration of informed consent.
The inclusion criteria were: having a diagnosis of HF (International Classification of Diseases, 9th Revision, Clinical Modification code 428), being more than 18 years of age, completing the questionnaires at baseline and 6 months, and consenting to participate in the study. Patients who developed an HF episode during hospitalization, those referred from other health care centers and those who died during their hospital stay or within 6 months from discharge were excluded from the study.
All eligible patients were sent a letter informing them about the study and asking for their voluntary participation.
Clinical data including smoking history, left ventricular ejection fraction, New York Heart Association classification (NYHA) [28] and comorbidities (assessed with Charlson's Index [29]) were extracted from clinical records. In addition, during admission, patients completed the MLHFQ, plus questions requesting sociodemographic information. At 6 months, we sent the same questionnaires to each participant plus anchor questions. These questions were different for each of the MLHFQ dimensions. For the physical dimension, the anchor question was "How would you rate your physical problems related to your heart disease compared with how you felt 6 months ago?" For the emotional dimension the anchor question was "How would you rate your emotional problems related to your heart disease compared with how you felt 6 months ago?" The response options for both questions were presented as 5-point ordinal scales (1 = much worse, 2 = somewhat worse, 3 = about the same, 4 = somewhat better, 5 = much better).
The 21 items of the MLHFQ [16,18] are rated on six-point Likert scales, representing different degrees of impact of HF on HRQoL, from 0 (no) to 5 (very much). It provides a total score (range 0-105), from best to worst HRQoL, as well as scores for two subscales, the physical (range 0-40) and emotional (range 0-25) domains, composed of 8 and 5 items respectively. The other eight items (of the total of 21) are only considered for the calculation of the total score. The MLHFQ has been translated and culturally adapted into at least 34 languages, and has demonstrated good psychometric properties in numerous studies [8,14,15,16,30,31,32,33,34,35,36].
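The scoring just described can be illustrated with a minimal Python sketch. The item numbers assigned to each subscale are an assumption taken from the commonly published MLHFQ scoring key, not from this text, and the function name `score_mlhfq` is hypothetical.

```python
# Item numbers for the two subscales are an assumption based on the commonly
# published MLHFQ scoring key; they are not stated in the text above.
PHYSICAL_ITEMS = (2, 3, 4, 5, 6, 7, 12, 13)  # 8 items, range 0-40
EMOTIONAL_ITEMS = (17, 18, 19, 20, 21)       # 5 items, range 0-25


def score_mlhfq(responses):
    """Return total, physical and emotional scores from 21 item ratings.

    `responses` maps item number (1-21) to a rating from 0 to 5.
    Higher scores indicate worse HRQoL.
    """
    if set(responses) != set(range(1, 22)):
        raise ValueError("expected ratings for items 1 to 21")
    if any(not 0 <= rating <= 5 for rating in responses.values()):
        raise ValueError("each rating must lie between 0 and 5")
    return {
        "total": sum(responses.values()),
        "physical": sum(responses[i] for i in PHYSICAL_ITEMS),
        "emotional": sum(responses[i] for i in EMOTIONAL_ITEMS),
    }


# A patient rating every item 5 reaches the maximum of each score range.
worst = score_mlhfq({item: 5 for item in range(1, 22)})
```

The sketch does not handle missing items; how incomplete questionnaires were treated is not described here.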

Statistical analysis
Descriptive statistics
Descriptive data are expressed as frequencies and percentages for qualitative variables and means with standard deviations (SDs) for quantitative variables. We compared sociodemographic and clinical characteristics, and MLHFQ scores at baseline between the responders and non-responders at follow-up using the chi-square test or Student's t-test.

Statistically significant change
To study responsiveness, patients were classified as having "improved", remained "the same" or "worsened", according to their anchor question responses. Those who, in response to the anchor questions at 6 months, rated their condition as "much better" or "somewhat better" compared to 6 months earlier were classified as having "improved", those who rated their condition as "about the same" were classified as having remained "the same", and patients who rated their condition as "somewhat worse" or "much worse" were classified as having "worsened". For the total score of MLHFQ, the response was calculated taking into account the answers to the anchor questions of the physical and emotional domains. Specifically, those who rated their condition as "improved" in both anchor questions or "improved" in one and "the same" in other anchor question were classified as "improved"; and those who rated them as "the same" in both anchor questions were classified as "the same"; while those who rated them as "worsened" in either of the anchor questions were classified as "worsened".
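The classification rules above can be written as a short Python sketch. The function names are hypothetical; the response coding (1 = much worse to 5 = much better) follows the anchor scale described earlier.

```python
def classify_domain(anchor):
    """Map a 5-point anchor response to a change group.

    1 = much worse, 2 = somewhat worse, 3 = about the same,
    4 = somewhat better, 5 = much better.
    """
    if anchor not in (1, 2, 3, 4, 5):
        raise ValueError("anchor response must be an integer from 1 to 5")
    if anchor >= 4:
        return "improved"
    if anchor == 3:
        return "the same"
    return "worsened"


def classify_total(physical_anchor, emotional_anchor):
    """Combine both anchor responses into the group used for the total score."""
    groups = {classify_domain(physical_anchor), classify_domain(emotional_anchor)}
    if "worsened" in groups:   # worsened in either question
        return "worsened"
    if "improved" in groups:   # improved in both, or improved plus the same
        return "improved"
    return "the same"          # the same in both questions


example = classify_total(4, 3)
```

Note that, under these rules, a patient who improved on one anchor question and worsened on the other is assigned to the "worsened" group.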
Ceiling and floor effects at baseline and 6 months after discharge were examined to evaluate the discriminatory power of the scales in each subgroup of patients, and we used 15% as the critical value for such effects [37]. Means and SDs were calculated for the MLHFQ scales at baseline and 6 months after discharge in each subgroup of patients, and a paired t-test or the nonparametric Wilcoxon signed-rank test was used to compare scores at the two time points. Changes in MLHFQ scores were calculated by subtracting follow-up scores from baseline scores, a positive result indicating a gain in quality of life. Mean changes were also compared among the three subgroups by analysis of variance with the Scheffe test for multiple comparisons, or the nonparametric Kruskal-Wallis test.
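As an illustration of the change-score convention and the 15% criterion, the following Python sketch uses hypothetical function names and made-up scores. It takes floor and ceiling effects to be the proportions of patients at the lowest and highest possible score, which is a common reading but an assumption here.

```python
def change_score(baseline, follow_up):
    """Baseline minus follow-up: a positive value is a gain in quality of life."""
    return baseline - follow_up


def floor_ceiling(scores, min_score, max_score, threshold=0.15):
    """Proportion of patients at the lowest and highest possible score.

    `threshold` is the 15% critical value; the effect is flagged when either
    proportion exceeds it.
    """
    n = len(scores)
    floor = sum(s == min_score for s in scores) / n
    ceiling = sum(s == max_score for s in scores) / n
    return {
        "floor": floor,
        "ceiling": ceiling,
        "flagged": floor > threshold or ceiling > threshold,
    }


# Made-up physical subscale scores (possible range 0-40).
effects = floor_ceiling([0, 10, 20, 40], min_score=0, max_score=40)
gain = change_score(baseline=30, follow_up=20)
```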

Standardized effect size and standardized response mean
To measure the responsiveness of the MLHFQ, we calculated the standardized effect size (SES), defined as the mean change score divided by the SD of the baseline scores, and standardized response mean (SRM), defined as the mean change score divided by the SD of the change scores [22]. Cohen's benchmarks were used to classify the magnitude of the effect sizes: < 0.20, not significant; 0.20 to 0.49, small; 0.50 to 0.79, moderate; and ≥ 0.80, large [24]. We hypothesized that there would be larger HRQoL changes in patients who rated their condition as better or worse than in patients who considered that they had remained "the same".
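A minimal Python sketch of the two indices and the benchmark labels is given below. The data are made up, the function names are hypothetical, and the sample SD (n - 1 denominator) is assumed because the text does not say which SD estimator was used.

```python
from statistics import mean, stdev


def ses(baseline, follow_up):
    """Standardized effect size: mean change / SD of baseline scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


def srm(baseline, follow_up):
    """Standardized response mean: mean change / SD of change scores."""
    changes = [b - f for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def cohen_label(effect):
    """Classify the magnitude of an effect size with the benchmarks above."""
    size = abs(effect)
    if size < 0.20:
        return "not significant"
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "moderate"
    return "large"


# Made-up scores for five patients (higher score = worse HRQoL).
baseline = [30, 25, 35, 20, 28]
follow_up = [20, 22, 21, 15, 25]
ses_value = ses(baseline, follow_up)
srm_value = srm(baseline, follow_up)
label = cohen_label(ses_value)
```

Both indices share the same numerator, so they differ only when the spread of baseline scores differs from the spread of change scores.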

MDC and MCID
The MDC expresses the minimal magnitude of change above which the observed change is likely to be real and not just measurement error. To estimate the MDC, the standard error of measurement (SEM), which represents the amount of error associated with the assessment of an individual [38], was estimated first using the formula $\mathrm{SEM} = SD_{T1} \times \sqrt{1 - R}$, where $SD_{T1}$ is the SD of the sample at baseline and $R$ is the reliability coefficient. Cronbach's α was used as a reliability measure [39]. From the SEM, the MDC was derived as follows [38]: $\mathrm{MDC} = z \times \sqrt{2} \times \mathrm{SEM}$. A 95% confidence level (MDC95%) was established, corresponding to a z-value of 1.96: if a patient has a change score at or above the MDC95% threshold, it is possible to state with 95% confidence that this change is reliable and not the result of a measurement error.
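The calculation can be sketched in Python as follows, assuming the conventional forms SEM = SD_T1 × sqrt(1 − R) and MDC95% = 1.96 × sqrt(2) × SEM. The baseline SD and Cronbach's α used in the example are hypothetical values chosen for easy arithmetic, not figures from this study.

```python
import math


def sem(sd_baseline, reliability):
    """Standard error of measurement from the baseline SD and reliability R."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_baseline * math.sqrt(1 - reliability)


def mdc(sem_value, z=1.96):
    """Minimal detectable change; z = 1.96 gives the 95% confidence level.

    The sqrt(2) factor reflects measurement error at both time points.
    """
    return z * math.sqrt(2) * sem_value


# Hypothetical inputs: baseline SD of 10 points, Cronbach's alpha of 0.91.
sem_example = sem(sd_baseline=10.0, reliability=0.91)
mdc95_example = mdc(sem_example)
```

With these inputs the SEM is about 3 points and the MDC95% about 8.3 points, so an individual change smaller than that could not be separated from measurement error.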
To assess the usefulness of anchor questions in establishing the MCID, we have evaluated their validity through the association between anchor question responses and the change scores in MLHFQ domains, by calculating partial correlation coefficients, controlling for baseline score. We hypothesized that these correlations should be higher than 0.50 [40].
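One way to compute such a partial correlation is the first-order formula built from pairwise Pearson correlations, sketched below in Python. This is an illustrative choice rather than a description of the statistical software actually used; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, control):
    """Correlation between x and y after removing the linear effect of control."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, control)
    r_yz = pearson(y, control)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up data: anchor responses (1-5), change scores and baseline scores.
anchor = [5, 4, 4, 3, 3, 2, 1, 4]
change = [15, 9, 4, 2, 6, -3, -5, 11]
baseline_score = [35, 30, 22, 18, 28, 15, 12, 33]
r_partial = partial_corr(anchor, change, baseline_score)
```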
We used two different statistical methods to calculate values for MCID. MCID reflects the smallest changes after a clinical intervention or natural progression that are meaningful for the patient. First, the MCID was estimated for MLHFQ considering the mean change score for patients whose response to the corresponding anchor question was "somewhat better" [41]. For the total score, this group corresponded to those patients whose response to one anchor question was "somewhat better" and to the other was "about the same", and those whose response to both questions was "somewhat better". In addition, to calculate cut-off values for MCID, we used the receiver operating characteristic (ROC) curve approach, considering the dichotomized anchor question responses (improved vs the same or worsened) as the dependent variable, and the change score for each dimension as an independent variable. For each dimension, the cut-off that maximized the sum of sensitivity and specificity was considered the optimal cut-off.
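The ROC-based cut-off selection can be illustrated with a small pure-Python sketch. It evaluates only the observed change scores as candidate thresholds and keeps the first maximum in case of ties, both of which are simplifying assumptions; the function name and data are hypothetical.

```python
def optimal_cutoff(change_scores, improved):
    """Return (cut-off, sensitivity + specificity) maximizing the sum.

    A patient is predicted "improved" when the change score is >= cut-off.
    `improved` holds the dichotomized anchor response (True = improved,
    False = the same or worsened).
    """
    n_pos = sum(improved)
    n_neg = len(improved) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both anchor groups must be present")
    best = None
    for cut in sorted(set(change_scores)):
        true_pos = sum(c >= cut and flag for c, flag in zip(change_scores, improved))
        true_neg = sum(c < cut and not flag for c, flag in zip(change_scores, improved))
        total = true_pos / n_pos + true_neg / n_neg
        if best is None or total > best[1]:
            best = (cut, total)
    return best


# Made-up change scores and dichotomized anchor responses for eight patients.
changes = [12, 9, 7, 6, 4, 2, 0, -3]
is_improved = [True, True, True, True, False, True, False, False]
cutoff, sens_plus_spec = optimal_cutoff(changes, is_improved)
```

In this toy sample a cut-off of 6 points identifies four of the five improved patients and all three of the others.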
Further, we estimated the MCID and MDC proportions, which are the proportions of the sample with change scores exceeding the MCID and MDC95%, respectively. Finally, the MCID was divided by the MDC95% to determine whether the MCID exceeded the MDC95% [27]. If this ratio is greater than 1, the MCID can be distinguished from measurement error.
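A short Python sketch of the ratio and proportion calculations follows. The MCID and MDC values are the anchor-based emotional-domain and total-score figures reported in this paper; the function names and the change scores passed to `proportion_exceeding` are hypothetical.

```python
def mcid_mdc_ratio(mcid, mdc95):
    """Ratio above 1 means the MCID exceeds measurement error."""
    return mcid / mdc95


def proportion_exceeding(change_scores, threshold):
    """Share of patients whose change score exceeds the given threshold."""
    return sum(c > threshold for c in change_scores) / len(change_scores)


# (anchor-based MCID, MDC95%) as reported for two MLHFQ scores.
reported = {"emotional": (3.59, 7.27), "total": (19.14, 16.96)}
ratios = {name: round(mcid_mdc_ratio(mcid, mdc95), 2)
          for name, (mcid, mdc95) in reported.items()}

# Made-up total-score changes for four patients, compared with the MDC95%.
share = proportion_exceeding([20, 5, 18, -3], threshold=16.96)
```

The resulting ratios are about 0.49 for the emotional domain and 1.13 for the total score, so only the latter clears the measurement-error threshold.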
All effects were considered statistically significant at P < 0.05. All statistical analyses were performed using SAS 9.2 (SAS Inc., Cary, NC) and IBM SPSS Statistics for Windows, Version 23.0 (IBM Corp; Armonk, NY).

Descriptive statistics
During the recruitment period, 2565 patients hospitalized for HF fulfilled the selection criteria, agreed to participate and completed the baseline questionnaires. Of these, 416 died within 6 months, and of the remaining 2149 patients, 1211 (56.36%) completed the questionnaires 6 months after discharge. Table 1 shows descriptive statistics for sociodemographic, clinical and HRQoL data at baseline of responders and non-responders at 6 months. There were no statistically significant differences between responders and non-responders in body mass index or smoking history. In contrast, non-responders were older (78.45 vs. 75.92) and were more likely to be women (52.51% vs. 46.49%), to have a left ventricular ejection fraction > 45% (63.65% vs. 57.85%), to be in NYHA class III or IV at discharge (50.29% vs. 37.89%) and to have a high score (> 3) in the Charlson comorbidity index (23.71% vs. 19.49%). The distribution of the MLHFQ scores at baseline spanned the full possible range: from 0 to 40 for the physical subscale, from 0 to 25 for the emotional subscale and from 0 to 105 for the total scale. In responders, the mean (SD) MLHFQ baseline scores were 26.14 (9.66), 11.73 (7.18) and 55.45 (23.66) for the physical, emotional and total scales, respectively. Non-responders had significantly higher mean scores in all MLHFQ domains.

Statistically significant change
All the MLHFQ scales showed minor floor and ceiling effects (< 15%) both at baseline and at six months after discharge. The mean changes in all MLHFQ domains followed a gradient (P < 0.0001) with larger gains in quality of life among patients classified as "improved", smaller gains among those classified as "the same", and losses among those classified as "worsened" (Table 2). Six months after discharge, the MLHFQ physical, emotional, and total scores decreased 10.41, 4.07 and 21.83 points, respectively, among patients classified as "improved", all of these changes being statistically significant (P < 0.0001). Among patients classified as "worsened", losses in quality of life were detected in all MLHFQ domains, with negative mean changes, although the changes in the total scale were not significant. Among patients classified as "the same", changes were also significant in all MLHFQ domains (P < 0.0001) but were smaller than the changes in "improved" patients.

SES and SRM
The SES and SRM responsiveness parameters in "improved" patients were higher than 0.80 for the physical and total scales, but the values were 0.57 and 0.51 for the MLHFQ emotional scale. Among patients classified as "worsened", effect sizes were below 0.40, while among patients classified as "the same", values ranged from 0.24 to 0.52.

MCID and MDC
MDC ranged from 7.27 for the emotional domain to 16.96 for the total score of the MLHFQ (Table 3). The MCID based on patients whose response to the anchor question was "somewhat better" ranged from 3.59 points in the emotional domain to 19.14 points in the total score. The MCID proportion based on anchor questions was similar for all domains, ranging from 35.25% in the physical domain to 37.49% in the emotional domain. The ratio of the MCID based on anchor questions to the MDC95% exceeded 1 for the physical domain and total score of the MLHFQ, but was less than 1 for the emotional domain.
The MCID based on ROC analysis ranged from 1.75 points in the emotional domain to 8.20 points in the total score, and the MCID proportion ranged from 46.76% in the emotional domain to 53.41% in the total score. The ratio of MCID based on ROC analysis to MDC was less than 1 in all domains.

Table footnote: Paired t-test to compare the mean scores at baseline and at 6 months after discharge. SD, standard deviation; SES, standardized effect size; SRM, standardized response mean; CI, 95% confidence interval. MLHFQ physical subscale scores range from 0 to 40, emotional subscale scores from 0 to 25, and total scores from 0 to 105, with higher scores indicating worse health status. Changes were calculated by subtracting follow-up scores from baseline scores; a positive result indicates a gain in quality of life. Superscript letters (a, b, c) indicate differences among the three subgroups (improved, equal, and worsened) according to Scheffe's test for multiple comparisons at P < 0.05.

Discussion
The results of this prospective observational study with a large sample of patients with HF offer new information about the responsiveness, MCID and MDC of the MLHFQ. This questionnaire was highly responsive, capturing changes in HRQoL 6 months after discharge. There is extensive evidence supporting the responsiveness of the MLHFQ and its capacity to discriminate between different magnitudes of change in patients' HRQoL. A systematic review with meta-analyses carried out by Garin et al. evaluated and compared data on the conceptual model and metric properties of several HF-specific HRQoL instruments and concluded that the evidence would primarily support the use of the MLHFQ [15].
The small ceiling and floor effects and the use of the full range of scores in a sample which covers the full range of severity, indicate that the instrument is likely to detect improvement or deterioration.
We have analyzed the validity and reliability of our anchor questions through correlations, as described in the literature [42,43]. Partial correlations between anchor question responses and change scores were nearly 0.50 for all MLHFQ domains. This could be due to the anchor question for the total score not being a direct question asked to the patient, but rather a calculated response. Specifically, it was calculated by combining the responses to the other two anchor questions, and this could be expected to affect its validity and reliability.
In our study, the anchor question responses indicated that patients who reported improvement gained more points than patients who remained the same or worsened in all domains of the MLHFQ. In line with this, a large effect (SES > 0.90) was found in "improved" patients in the physical domain and total score, and a moderate effect (SES 0.57) in the emotional domain. Taken together, these findings suggest that all domains of the MLHFQ have a good sensitivity to change in our population. These results are similar to those of previous responsiveness studies in other languages [14,18,30,44,45], and what is more, our SES and SRM for "improved" patients and our change scores for improvement in all domains are larger than those reported in the other studies analyzed [14,18,44,45]. The SES has also been found to be lower in the emotional domain than in the other domains in several MLHFQ responsiveness studies [14,30,45]. Nevertheless, in our case, this effect was moderate, being larger than SES values considered non-significant or small in other responsiveness studies.
On the other hand, in "worsened" patients, the effect size was small or not significant in nearly all domains. That is, the instrument is more responsive to improvement than to worsening, and does not reflect changes well in patients whose health deteriorates. These results are similar to the outcomes in the Spanish validation of the MLHFQ, the effect sizes for the patients showing "deterioration" only being > 0.26 for all domains [14]. In line with this, other studies carried out in the US [45,46] also found this questionnaire to lack discriminative power for detecting negative changes. For this reason, the MCID for "worsened" patients was not calculated. Nevertheless, for patients whose response to the anchor question was "somewhat better", we found that changes were large enough to exceed the MDC at the individual level with a 95% level of confidence in the physical domain and total score of the MLHFQ. That is, in both cases, the change observed was greater than that required to be considered a true change, and hence, the MLHFQ can be considered responsive enough to detect true changes in HF patients at the individual level in the physical domain and total score.
Considering cut-off values determined by the ROC analysis, we found less conservative values for MCID, patients needing fewer points to detect such a difference. In addition, the ratio obtained from ROC analysis was not greater than 1, meaning that the change could not be distinguished from measurement error. We conclude that the MCID based on our anchor questions is more appropriate for detecting true changes.
As in our study, studies of the responsiveness and sensitivity of the MLHFQ in other language versions show similar results. In general, patients with an improvement (measured in different ways) experienced, on average, large improvements in HRQoL. However, patients with no change still experienced a moderate improvement, and those who worsened had, on average, little to no change in HRQoL [14,18,44,45,46,47,48]. Likewise, these studies confirm that the MCID in the MLHFQ exceeded predefined criteria and that the MLHFQ may be more clinically valid for patients with HF than other instruments [48].
This study has various limitations that should be taken into account. Comparing responders and non-responders, those who did not respond were found to have poorer baseline MLHFQ scores. This might have skewed the results, but any such bias would have been in our favor, as patients with a poorer health status at baseline tend to have more gains in HRQoL at follow-up. Hence, the effect size might have been smaller in our responders than it would have been in the non-responders. In addition, the differences identified may have been statistically significant due to the large sample size, while not seeming to be clinically relevant. Effect sizes (ES) are measures of the magnitude of the change scores, rather than the validity of the change scores. Therefore, ES should be considered inappropriate as parameters of responsiveness. However, we have included these measures because they are frequently used and easily identifiable in the literature. On the other hand, we have used the Spanish version of the questionnaire, and therefore, the results may not be generalizable to other populations or languages.
Nevertheless, the current study also has a number of strengths. Despite there being other studies of MLHFQ responsiveness, to the best of our knowledge, this is the first study that explores the responsiveness of the MLHFQ by estimating MDC and MCID in Spain. The present study provides data on responsiveness of the MLHFQ that could help to interpret changes detected by the questionnaire. In particular, the analysis included calculation of the MDC and MCID, a type of anchor-based method that is considered important for patients and clinicians and directly reflects their points of view [49].
Recently, Bilbao et al. [50] compared different factor structures of the MLHFQ proposed by several authors and found that their results supported the existence of a third factor, a social dimension, with good validity and reliability. They concluded that Munyombwe's [34] model had the best psychometric properties among the social factors proposed. Unfortunately, we have not analyzed the responsiveness of this factor because we did not have an appropriate anchor question to measure it. Further studies are needed to explore the responsiveness of the social factor proposed by Munyombwe [34].

Conclusions
To sum up, all of these results suggest that all domains of the MLHFQ have a good sensitivity to change in the population studied. It is very important to determine the responsiveness of all MLHFQ domains because these domains could reveal physiological or pathological changes, and detection of such changes could allow interventions to prevent further deterioration in heart function, and thereby reduce repeat hospitalization and mortality rates.