Health state utility values in major depressive disorder treated with pharmacological interventions: a systematic literature review

Background: Major depressive disorder (MDD) is associated with decreased patient well-being and symptoms that can cause substantial impairments in patient functioning and even lead to suicide. Worldwide, MDD currently causes the second-most years lived with disability and is predicted to become the leading cause of disability by 2030. Utility values, capturing patient quality of life, are required in economic evaluations for new treatments undergoing reimbursement submissions. We aimed to identify health state utility values (HSUVs) and disutilities in MDD for use in future economic evaluations of pharmacological treatments.
Methods: Embase, PubMed, Econlit, and Cochrane databases, plus gray literature, were searched from January 1998 to December 21, 2018, with no language or geographical restrictions, for relevant studies that reported HSUVs and disutilities for patients with MDD receiving pharmacological interventions.
Results: 443 studies were identified; 79 met the inclusion criteria. We focused on a subgroup of 28 articles that reported primary utility data from 16 unique studies of MDD treated with pharmacological interventions. HSUVs were elicited using EQ-5D (13/16, 81%; EQ-5D-3L: 11/16, 69%; EQ-5D-3L or EQ-5D-5L not specified: 2/16), EQ-VAS (5/16, 31%), and standard gamble (1/16, 6%). Most studies reported baseline HSUVs defined by study entry criteria. HSUVs for a first or recurrent major depressive episode (MDE) ranged from 0.33 to 0.544; when estimates stratified by painful physical symptoms were included, the range widened to 0.20 (patients with painful physical symptoms) to 0.61 (patients without). HSUVs for an MDE with inadequate treatment response ranged from 0.337 to 0.449. Three studies reported HSUVs defined by MADRS or HAMD-17 clinical thresholds. Patient characteristics were highly heterogeneous between the studies. One study reported disutility estimates associated with treatment side effects.
Conclusions: Published HSUVs in MDD, elicited using methods accepted by health technology assessment bodies, are available for future economic evaluations. However, the evidence base is limited, and it is important to select HSUVs that are appropriate for the intervention being evaluated and that align with clinical health state definitions used within an economic model. Future studies are recommended to elicit HSUVs for new treatments and their side effects and add to the existing evidence where data are lacking.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12955-021-01723-x.


Introduction
Patients with depressive disorders can experience sadness, loss of interest or pleasure, feelings of guilt or low self-worth, disturbed sleep or appetite, feelings of tiredness, and poor concentration. These symptoms can cause substantial impairments in a patient's ability to function and, in some cases, may lead to suicide [1]. There are two main subcategories of depressive disorders: major depressive disorder (MDD), in which patients experience major depressive episodes (MDEs), and dysthymia, which is a chronic and milder form of depression [1]. An analysis of the Global Burden of Disease database by Liu et al. [2] found that 93.7% of patients with depression in 2017 had MDD. It is estimated that MDD causes the second-most years lived with disability, after lower back pain [3]. The worldwide incidence of MDD increased from an estimated 162 million cases in 1990 to 241 million cases in 2017 [2], and MDD is predicted to become the leading cause of disability by 2030 [4].
Patients with MDD who experience an MDE can be classified based on clinical thresholds of disease severity [for example, mild, moderate, or severe as adopted in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V)], assessed using scales such as the Hamilton Depression Rating Scale (HAMD) or the Montgomery-Åsberg Depression Rating Scale (MADRS), and on the duration of the disorder (for example, first or recurrent MDE). Major depressive disorder can be managed pharmacologically with different classes of antidepressant treatments (ADTs), including selective serotonin reuptake inhibitors (SSRIs), serotonin-norepinephrine reuptake inhibitors (SNRIs), bupropion, tricyclic antidepressants (TCAs), and monoamine oxidase inhibitors (MAOIs), as well as antipsychotics [5]. Patients who do not respond to or tolerate an initial treatment, or who relapse, usually switch to a different class of ADT or augment agents.
New pharmacological treatments are being developed to improve clinical outcomes for patients with MDD, and economic evaluations may need to be performed to assess their value. Economic evaluations are performed to assess the cost-effectiveness of the new treatments in relation to treatments already available in local health care markets. Many health care payers require cost-utility analyses that use quality-adjusted life-years (QALYs) as the main measurement of effectiveness [6]. The QALY is a generic measure of disease burden that allows comparative analyses of the value of medical interventions to be conducted. QALYs capture both the quantity and quality of life and are calculated by multiplying the time spent in each health state by the corresponding health state utility value (HSUV) [7]; HSUVs quantify health-related quality of life (HRQoL) as a single value on a scale from 0 (dead) to 1 (perfect health). Some HRQoL instruments allow for negative values for health states worse than death. Health state utility values represent the strength of an individual's preferences for specific health-related outcomes and can be elicited using different instruments and techniques. Discrete condition-specific health states can be measured directly using choice-based methods such as standard gamble (SG), time trade-off, and discrete-choice experiments; ranking exercises; or a visual analogue scale (VAS) [8]. Indirect measurement of HSUVs is most commonly performed using generic multi-attribute utility instruments such as the EQ-5D, Short Form six dimensions (SF-6D), and Health Utilities Index that define health states according to scores on multiple distinct domains of health. Scores are converted to HSUVs by using utility tariffs derived from general population surveys that account for public preferences. Disease-specific instruments can be used to measure HSUVs in a similar way or by mapping results to a generic instrument.
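The QALY arithmetic described above can be illustrated with a minimal Python sketch. The function name, the durations, and the utility values are hypothetical and chosen only for illustration; they are not taken from any study in this review.

```python
from typing import Sequence


def total_qalys(durations_years: Sequence[float], utilities: Sequence[float]) -> float:
    """Sum, over health states, of time spent (years) multiplied by the HSUV."""
    if len(durations_years) != len(utilities):
        raise ValueError("each duration needs exactly one utility value")
    for t, u in zip(durations_years, utilities):
        if t < 0:
            raise ValueError("durations must be non-negative")
        # HSUVs cannot exceed 1 (perfect health); negative values are allowed
        # because some instruments score states as worse than death.
        if u > 1:
            raise ValueError("utility values cannot exceed 1")
    return sum(t * u for t, u in zip(durations_years, utilities))


# Hypothetical year: 3 months in a depressive episode, 9 months in remission.
example_qalys = total_qalys([0.25, 0.75], [0.40, 0.80])
print(round(example_qalys, 3))  # 0.25 * 0.40 + 0.75 * 0.80
```

With these assumed inputs, one calendar year yields roughly 0.7 QALYs, which shows how time spent in a low-utility state reduces the total below one full QALY.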
Many European health technology assessment (HTA) bodies prefer that HSUVs be measured indirectly by using generic, preference-based instruments, with the EQ-5D being the most popular instrument [9]. Across 25 European countries with pharmacoeconomic guidelines, only two countries prefer HSUVs to be measured directly [6].
Keywords: Major depressive disorder, Health state utility values, Disutilities, Systematic review
Economic models used to assess the cost-effectiveness of new treatments need to capture health states experienced by patients with MDD throughout the course of the disease. Health states used in cost-utility models can include different severities of depression (i.e., mild, moderate, and severe), different levels of treatment response (i.e., remission, response, and no response or refractory), a return to normal health (i.e., recovery), and disease progression (i.e., relapse and recurrence). Health states can be defined by thresholds in clinical scores such as HAMD and MADRS. Transition probabilities calculated using efficacy data from clinical trials and published, long-term outcome data are used to predict the movement of patients between the modelled health states over time. Time spent in each health state is multiplied by the corresponding HSUV to calculate QALYs of patients receiving each treatment being assessed.
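The model structure described above (health states, transition probabilities, and HSUVs combined into QALYs) can be sketched as a simple cohort simulation. This is a minimal illustration under stated assumptions, not a model from the review: the three states, the monthly transition probabilities, and the utility values are all hypothetical, and no discounting or half-cycle correction is applied.

```python
from typing import List, Sequence, Tuple


def run_cohort_model(
    start: Sequence[float],
    transition: Sequence[Sequence[float]],
    utilities: Sequence[float],
    n_cycles: int,
    cycle_years: float,
) -> Tuple[float, List[float]]:
    """Return (QALYs per patient, final state distribution) for a cohort model.

    No discounting or half-cycle correction is applied in this sketch.
    """
    n = len(start)
    if len(utilities) != n or len(transition) != n:
        raise ValueError("dimension mismatch between inputs")
    for row in transition:
        if len(row) != n or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each transition row must have n entries summing to 1")

    dist = list(start)
    qalys = 0.0
    for _ in range(n_cycles):
        # Accrue QALYs for the share of the cohort in each state this cycle.
        qalys += cycle_years * sum(p * u for p, u in zip(dist, utilities))
        # Move the cohort: new share of state j = sum over i of dist[i] * P(i -> j).
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return qalys, dist


# Hypothetical three-state model: no response, response, remission.
STATE_UTILITIES = [0.55, 0.70, 0.85]  # illustrative HSUVs, not study values
MONTHLY_TRANSITIONS = [               # illustrative probabilities, not trial data
    [0.70, 0.20, 0.10],
    [0.10, 0.60, 0.30],
    [0.05, 0.10, 0.85],
]
qalys_1y, final_dist = run_cohort_model(
    [1.0, 0.0, 0.0], MONTHLY_TRANSITIONS, STATE_UTILITIES,
    n_cycles=12, cycle_years=1 / 12,
)
```

Running the same function with a second set of transition probabilities for a comparator treatment would give the incremental QALYs between the two treatments.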
The aim of this systematic literature review was to identify published HSUVs and disutility values for treatment-related adverse events that can be used to populate future economic models of pharmacological treatments for adult patients with MDD. Furthermore, we set out to highlight gaps in the evidence base and discuss considerations for cost-utility analyses informing reimbursement decisions.

Methods
A systematic literature review was performed, using a prespecified protocol, to identify utility studies for adult patients with MDD receiving pharmacological treatment, including those on adjunctive treatment. Literature searches were conducted in PubMed, Embase, Econlit, and Cochrane databases from January 1, 1998, to December 21, 2018. Literature search strategies were designed using Medical Subject Headings (MeSH) and free-text terms (see Additional files 1-4). References of identified systematic literature reviews and cost-effectiveness analyses were searched to identify primary utility studies. Additionally, the websites of health technology agencies and relevant conferences were searched. These included the National Institute for Health and Care Excellence (NICE); the Scottish Medicines Consortium; the Canadian Agency for Drugs and Technologies in Health; the National Institute for Health Research, Health Technology Assessment Database; the International Society for Pharmacoeconomics and Outcomes Research (ISPOR); the Tufts Cost-Effectiveness Analysis Registry; and the American Psychiatric Association. Conference abstracts from ISPOR meetings were indexed in Embase at the time of the searches, so separate hand searches were not performed for this conference.
For inclusion, studies were required to be conducted in adults (aged ≥ 18 years) with MDD receiving pharmacological treatment, published in English, and to report utility or disutility estimates. Excluded from the review were studies that included children (aged < 18 years), studies in which patients received only non-pharmacological interventions, studies that reported quality-of-life data only, and conference abstracts published before 2016.
Screening was performed by one researcher, with a random 20% quality check performed by a second researcher, using the predefined inclusion and exclusion criteria (see Additional file 5). Screening was conducted in two stages; at level 1, titles and abstracts were screened for eligibility, and at level 2, full-text articles of those included at level 1 were obtained and screened to confirm eligibility. If an agreement could not be reached on the eligibility of a study, a third researcher was consulted to reach consensus on the eligibility of the study. One researcher extracted data from the eligible studies included in the review. A second researcher performed a quality check of all extracted data back to the original source.

Results
A total of 441 unique records were identified in the literature searches after the removal of duplicates. After the initial screening of titles and abstracts, 93 articles were progressed to full-text review. Of those, 77 articles met the predefined inclusion criteria. An additional two articles were identified through supplemental searches, resulting in a total of 79 articles meeting the predefined inclusion criteria. A total of 28 articles reporting primary utility data for MDD treated with pharmacological interventions were included as the focus of this manuscript, and 51 articles that did not report primary utility data were excluded from this manuscript. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram [10] presents the study selection process (Fig. 1). Table 1 presents information about the population, study type and recruitment, and utility data reported in the 28 articles. Articles reporting separate analyses of the same study are grouped together, and studies are organized by geographic region and country. Eleven of the included articles reported analyses from four unique studies in Asia. Of these, seven articles report analyses from a prospective observational study that recruited patients from six East Asian countries [11][12][13][14][15][16][17], with utility values first published by Lee et al. [14]. Two articles reported analyses from a prospective observational study in Japan [18,19], with utility values first published by Kuga et al. [19]. Kim et al. [20] and Husain et al. [21] reported results from a cross-sectional study in South Korea and a randomized controlled trial (RCT) in Pakistan, respectively.
Ten of the included articles reported analyses from eight unique studies in Europe. Of these, two articles reported analyses from an RCT that recruited patients from 14 countries [22,23], with utility values first published by Montgomery et al. [22]. Garcia-Cebrian et al. [24] reported results from a prospective observational study that recruited patients from 12 countries, and Reed et al. [25] reported a subsequent analysis of this study. Of the remaining six studies in Europe, three are economic evaluations alongside clinical trials [26][27][28], one is an RCT [29], and two are prospective observational studies [30,31]. Of these, the studies reported by Kuyken et al. [27], Serfaty et al. [29], and Morriss et al. [28] were conducted in the United Kingdom (UK), the study reported by Sapin et al. [30] was in France, the study reported by Fernandez et al. [26] was in eight European countries, and the study reported by Saragoussi et al. [31] was in five European countries.
Two of the included articles reported unique studies in the Americas. Soares et al. [32] reported results from an RCT conducted in five countries across North and South America. Revicki and Wood [33] reported results from a prospective observational study conducted in Canada and the United States (US).

Table 1 (excerpt) Population, study type and recruitment, and utility data reported in the included articles
[Article label missing]. Utility data: Baseline EQ-5D-3L^a utility scores and change from baseline at 6 months were reported, stratified by the presence of PPS.
Chen et al. [12], Taiwan subanalysis. Population: Subgroup analysis of 194 Taiwanese patients from study conducted in Lee et al. [14]. Utility data: Baseline EQ-5D-3L^a utility scores and change from baseline at 3 months were reported, stratified by the presence of PPS and prescribed intervention (SNRI or SSRI).
Li et al. [15], China subanalysis. Population: Subgroup analysis of 299 Chinese patients from study conducted in Lee et al. [14]. Utility data: EQ-VAS scores were reported at baseline and at 3 months for the overall population, stratified by the presence of PPS and prescribed intervention (SSRI or SNRI).
Novick et al. [17], multinational (China, Hong Kong, Malaysia, Singapore, South Korea, and Taiwan). Population: Subgroup analysis of 426 patients who started ADT at the baseline visit and had information on adherence during the follow-up period in the study conducted in Lee et al. [14]. Utility data: EQ-5D-3L^a utility scores were reported at baseline and 3 months for patients with clinically reported adherence or nonadherence to ADT.
Novick et al. [16], China subanalysis. Population: Subgroup analysis of 300 Chinese patients from study conducted in Lee et al. [14]. Utility data: EQ-5D-3L^a utility and EQ-VAS scores were reported at baseline and at 3 months for the overall population; scores stratified by the presence of PPS were reported at 3 months.
[Article label missing]. Utility data: Baseline EQ-5D^b utility scores and change from baseline at 2, 4, 6, and 12 weeks were reported, stratified by the prescribed intervention (duloxetine or SSRIs).
Studies linked to primary publication reported by [article label missing]
Kuga et al. [18], Japan. Population: Subgroup analyses of study conducted in Kuga et al. [19]. Utility data: EQ-5D^b utility scores stratified by the prescribed intervention (duloxetine or SSRIs) were reported at week 12 for the following subgroups (at baseline): number of MDEs, BPI-SF average pain score, and HAMD-17 total score.
^b The use of EQ-5D-3L or EQ-5D-5L was not stated, nor could it be deduced based on the date of the study or references.
Brockbank et al. Health Qual Life Outcomes (2021) 19:94

Five of the included articles reported analyses of two unique studies in more than one region. Duenas et al. [34] first reported results from a prospective observational study that recruited patients from 12 countries across Asia, Europe, and the Americas. Three of the articles reported subsequent analyses of this study [35][36][37]. Florea et al. [38] reported results from an across-study comparison of six multinational clinical trials of vortioxetine.
Overall, 16 unique studies were reported in the 28 included articles. Twenty-two of the articles (13 unique studies) reported utility estimates elicited indirectly; all of these used the EQ-5D. Of these, 19 articles (11 unique studies) used the EQ-5D-3L, and use of the EQ-5D-3L or EQ-5D-5L could not be determined in three articles (2 unique studies). Thirteen articles (6 unique studies) [11, 12, 15-17, 21-25, 35-37] reported utility estimates elicited directly, of which 12 used a VAS (EQ-VAS) and one used the SG technique [33]. Seven articles (3 unique studies) reported utility estimates elicited using both the EQ-5D and EQ-VAS. Table 2 presents a summary of the HSUVs reported in the included studies. Information about the health state and clinical features, MDEs and prior therapy of the population, and the interventions are presented for each HSUV, along with the instrument used. Reported utility estimates where a specific health state could not be ascertained (for example, aggregated utility estimates for a study population at a time point where patients had different levels of treatment response) and indirect instrument scores where utility tariffs had not been applied (for example, EQ-5D domain scores) were excluded. Six articles did not report relevant HSUVs and were excluded from Table 2 [19,32,[34][35][36]38]. Reed et al. [25] reported the same HSUV as that in the primary analysis by Garcia-Cebrian et al. [24]. A quality assessment of the studies reporting relevant HSUVs (using criteria from Papaioannou et al. [39]) is presented in Additional file 6.
Most of the articles included in Table 2 reported HSUVs at baseline, with the MDD health state defined by the study population entry criteria. Many of the articles reported utility estimates at subsequent time points where a specific health state could not be ascertained. Considerable heterogeneity in patient characteristics was found between the studies, and a wide range of utility values was reported. The baseline HSUVs can be differentiated based on key features of the study populations, including severity of MDD, current MDE status (i.e., presenting with a first or new episode, or within an existing episode), lines of prior therapy, and presence of comorbidities (analyses of the studies first reported by Lee et al. [14] and Duenas et al. [34] stratified baseline estimates by the presence of painful physical symptoms [PPS]). Treatments under investigation were also heterogeneous; some articles specified which treatments were investigated, while others stated that investigations included only SSRIs or treatment as usual, which made comparison between the studies difficult. Several articles reported HSUVs at baseline for patients presenting with a first or recurrent MDE who were about to start a new treatment. The utility estimates ranged from 0.33 [30] to 0.544 [17] at baseline. The range widened when estimates stratified by the presence of PPS were included; the lowest estimate for patients with PPS was 0.20 [14], and the highest estimate for patients without PPS was 0.61 [13]. Several articles reported HSUVs at baseline for patients with an existing MDE who inadequately responded to treatment and were about to switch therapy, with utility estimates ranging from 0.337 [28] to 0.449 [23].
Three studies reported utility estimates for health states defined by specific clinical thresholds [20,30,33]. Kim et al. [20] reported EQ-5D HSUVs for South Korean patients receiving ADT during the usual course of care stratified by disease severity defined by MADRS score thresholds (very severe, severe, moderate, mild, or remission). The HSUVs increased progressively through disease severities, from 0.615 for patients with very severe MDD (MADRS score: 35-60) to 0.806 for patients with mild MDD or remission (MADRS score: 0-25). Sapin et al. [30] reported EQ-5D HSUVs for French patients who had received first-line ADT for 8 weeks, stratified by treatment response (responder remitters, responder nonremitters, nonresponders). Remission was defined by a MADRS score threshold, whereas responder nonremitter and no response were defined by thresholds for percentage change in MADRS score. This was the only study to use thresholds based on percentage changes in clinical scores and to report HSUVs at a specific time point after study entry. The HSUVs increased from 0.33 for patients with MDD at baseline before treatment to 0.58 for patients with no response (< 50% decrease from baseline in MADRS score), 0.72 for patients with nonremitting response (≥ 50% decrease from baseline in MADRS score), and 0.85 for patients in remission (MADRS score ≤ 12). Revicki and Wood [33] reported directly elicited HSUVs for Canadian and US patients who were receiving, or who had completed within the 2 months before study entry, an ADT regimen (nefazodone, fluoxetine, or imipramine), stratified by disease severity (severe, moderate, mild, remission on or off treatment) defined by HAMD score thresholds (the thresholds were not reported). The HSUVs were stratified by treatment received and increased progressively through disease severities, from 0.30 for patients with severe MDD (untreated) to 0.86 for patients in remission (off treatment). Kuyken et al. [27] reported EQ-5D HSUVs for UK patients with recurrent MDD with three or more previous MDEs in full or partial remission. However, remission was defined using the DSM-IV at study entry rather than by a clinical measure used within the trial.
Table 2 footnotes: EQ-5D-3L was not explicitly stated in the study, but was deduced, either from the date of the study or from the date of references to EQ-5D methodology (the EQ-5D-5L was introduced after 2009). ^c The use of EQ-5D-3L or EQ-5D-5L was not stated, nor could it be deduced based on the date of the study or references.
Revicki and Wood [33] was the only study to report disutility estimates associated with treatment side effects (Table 3). Patients had been treated previously with fluoxetine, imipramine, nefazodone, or a combination of treatments; however, the study did not report adverse events by the different treatments. Disutilities were reported for several key adverse events associated with the ADTs, but the study was published in 1998 and may not represent current practice. Disutilities were calculated as the difference between mean SG utilities elicited directly from patients with and without specific adverse events. The mean differences ranged from 0.01 for nausea or dry mouth to 0.12 for nervousness and light-headedness/dizziness (Table 3), with the latter being the only statistically significant difference (P = 0.030).
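The disutility calculation described for Revicki and Wood [33], the difference between mean utilities of patients without and with a given adverse event, can be sketched as follows. The function name and the patient-level utilities are hypothetical; the sketch illustrates the arithmetic only and does not reproduce the study's data or its statistical testing.

```python
from statistics import mean
from typing import Sequence


def disutility(
    utilities_without_event: Sequence[float],
    utilities_with_event: Sequence[float],
) -> float:
    """Mean utility without the adverse event minus mean utility with it.

    A positive result represents a utility loss associated with the event.
    """
    if not utilities_without_event or not utilities_with_event:
        raise ValueError("both groups need at least one utility value")
    return mean(utilities_without_event) - mean(utilities_with_event)


# Hypothetical SG utilities for two small groups of patients.
without_dizziness = [0.80, 0.75, 0.85]
with_dizziness = [0.70, 0.65, 0.69]
example_disutility = disutility(without_dizziness, with_dizziness)
print(round(example_disutility, 2))
```

Because the two groups are compared as observed, a difference computed this way can also reflect other differences between the groups, such as depression severity.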

Discussion
Health state utility values in MDD are required for use in cost-utility analyses for new treatments. Health states experienced by patients with MDD include different severities of depression (i.e., mild, moderate, and severe), different levels of treatment response (i.e., remission, response, and no response), a return to normal health (i.e., recovery), and disease progression (i.e., relapse and recurrence). This systematic literature review identified 79 articles reporting utility estimates for patients with MDD receiving pharmacological treatment, of which 28 reported primary utility data across a range of health states that can be used in economic models.

HSUVs
Overall, a range of HSUVs were identified that can be used as parameters in a cost-utility model. However, the values were predominantly captured at study baseline, with health states defined by study entry criteria rather than specific clinical thresholds. Many of the studies, particularly RCTs, did capture utility estimates at other time points, but specific health states could not be determined. Such estimates may be suitable to include in simple economic analyses mirroring clinical trials but not for models with distinct health states requiring HSUVs. Three studies did report HSUVs for different depression severity levels that could be used for models with health states defined by corresponding clinical thresholds. Only one study (Sapin et al. [30]) reported HSUVs for a response health state, which was defined by a percentage change in clinical score. While this definition is often used in clinical practice, response could be defined by using a specific clinical threshold within an economic model that allows use of alternative HSUVs based on depression severity. Similarly, no HSUVs were identified specifically for a relapse health state, but relapse could be defined by using a specific clinical threshold within an economic model. For a recovery health state (i.e., a return to normal health), general population utility values could be used. While HSUVs are available to populate a model, the evidence base is limited, and it is important to select values that align with clinical thresholds used within a model.
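To make the point about aligning HSUVs with health state definitions concrete, the following Python sketch assigns a response state from MADRS scores using the thresholds reported for Sapin et al. [30] in the Results (remission: MADRS score ≤ 12; response: ≥ 50% decrease from baseline; no response: < 50% decrease) and looks up the corresponding HSUV. The function and state names are hypothetical, and checking the remission threshold before the percentage change is an assumption of this sketch rather than a rule stated in the source.

```python
# HSUVs at 8 weeks as reported for Sapin et al. [30] in the Results.
HSUV_BY_STATE = {
    "remission": 0.85,
    "response_without_remission": 0.72,
    "no_response": 0.58,
}


def classify_response(baseline_madrs: float, current_madrs: float) -> str:
    """Assign a health state from baseline and current MADRS total scores."""
    if baseline_madrs <= 0:
        raise ValueError("baseline MADRS score must be positive")
    if current_madrs < 0:
        raise ValueError("current MADRS score cannot be negative")
    # Assumption: the remission threshold is checked before percentage change.
    if current_madrs <= 12:
        return "remission"
    relative_decrease = (baseline_madrs - current_madrs) / baseline_madrs
    if relative_decrease >= 0.5:
        return "response_without_remission"
    return "no_response"


# Hypothetical patient: baseline score 30, score 14 at follow-up.
state = classify_response(30, 14)
utility = HSUV_BY_STATE[state]
```

If a model instead defines response by an absolute score threshold, the lookup table would need HSUVs from a study that used matching severity definitions.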
The pharmacological treatments that were captured by the systematic literature review included SSRIs (including escitalopram and fluoxetine), SNRIs (including duloxetine and venlafaxine), TCAs, selective serotonin receptor antagonists, and serotonin antagonist and reuptake inhibitors. Some of the studies identified did not specify the treatment, instead listing SSRIs, SNRIs, or physician's choice. Utility estimates are lacking for augmentation agents used alongside ADT, such as lithium and atypical antipsychotics. Future studies could be performed to elicit HSUVs for augmentation agents commonly used for treatment-resistant MDD. This would allow the impact of adverse events associated with augmentation agents on HRQoL to be more easily evaluated. If treatment-independent HSUVs are used in a cost-utility model, it is important to select values from studies with a population that corresponds to that of the intervention being evaluated in an economic analysis.

Disutility estimates
Revicki and Wood [33] was the only study to report disutility estimates for treatment-related adverse events.
Future studies could be conducted to elicit disutilities for a comprehensive set of adverse events associated with current MDD treatments. It is important that the impact of adverse events on HRQoL be accurately captured in models that do not use treatment-specific HSUVs, particularly when comparisons are made between treatments with similar efficacy that can be differentiated by their side-effect profiles.
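One common way to capture side-effect profiles in a model that uses treatment-independent HSUVs is to subtract incidence-weighted disutilities from the health state utility. The Python sketch below illustrates this under stated assumptions: the base HSUV and the adverse event incidences are hypothetical, the disutilities are the lowest and highest mean differences reported in the Results (0.01 and 0.12), and the events are assumed to act additively and independently.

```python
from typing import Mapping


def utility_with_adverse_events(
    base_hsuv: float,
    incidence: Mapping[str, float],
    disutilities: Mapping[str, float],
) -> float:
    """Base HSUV minus the incidence-weighted sum of adverse event disutilities."""
    loss = 0.0
    for event, probability in incidence.items():
        if not 0.0 <= probability <= 1.0:
            raise ValueError(f"incidence for {event} must lie between 0 and 1")
        if event not in disutilities:
            raise KeyError(f"no disutility available for {event}")
        loss += probability * disutilities[event]
    return base_hsuv - loss


# Disutilities from the Results; incidences and base HSUV are hypothetical.
DISUTILITIES = {"nausea": 0.01, "light_headedness_dizziness": 0.12}
treatment_a = utility_with_adverse_events(
    0.72, {"nausea": 0.20, "light_headedness_dizziness": 0.10}, DISUTILITIES
)
treatment_b = utility_with_adverse_events(
    0.72, {"nausea": 0.20, "light_headedness_dizziness": 0.02}, DISUTILITIES
)
```

In this illustration the two treatments share the same base HSUV, so the utility difference between them comes entirely from the assumed difference in dizziness incidence.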

Suitability for HTA
Utility estimates elicited indirectly using generic, preference-based instruments are preferred by most HTA bodies, and the EQ-5D is often specifically recommended in pharmacoeconomic guidelines [6]. The majority of the studies (13) reported utility estimates elicited indirectly using the EQ-5D. Of these, 11 studies used the EQ-5D-3L; use of the three-level or five-level version could not be determined in 2 studies. Seven articles stated the tariff used for valuation; all used the UK tariff. Three studies reported utility estimates elicited using both the EQ-5D and EQ-VAS [16,17,37]; of these, the EQ-5D utilities are preferred by most HTA bodies [6]. In total, six studies reported utility estimates elicited directly from patients; five of these used the EQ-VAS [11,12,15,22,23], and one used the SG technique [33]. Utility estimates elicited directly using choice-based tasks such as SG are generally considered methodologically superior to those elicited using rating tasks such as a VAS because they incorporate additional information about individual risk attitude. Moreover, rating tasks are prone to scaling biases [7]; as such, the identified utility estimates elicited using EQ-VAS may be less preferable to HTA bodies than the other directly elicited estimates using SG and indirectly elicited EQ-5D estimates. The relevance of studies should be assessed in accordance with local HTA requirements before data are used in a cost-utility model.
Collection of EQ-5D data is recommended in future trials for new treatments that will undergo reimbursement submissions to HTA bodies. Moreover, it is important to elicit HSUVs for the core health states that will be used within a cost-utility model, such as response, remission, relapse, and no response, using appropriate clinical thresholds. Additional utility studies could be performed to elicit utility estimates for health states and adverse events needed within a model, for which there is a paucity of data.

Study limitations
The primary aim of the study was to identify utility estimates that can be used to populate future economic models for new pharmacological treatments in MDD. Therefore, the literature review focused on studies of patients with MDD receiving pharmacological treatment. However, nonpharmacological interventions such as cognitive-behavioral therapy and transcranial magnetic stimulation are also used in the treatment of MDD, particularly for patients with more severe or treatment-resistant MDD. Studies in patients with MDD receiving nonpharmacological treatments without pharmacological treatment were outside the scope of this review. Additionally, screening of articles was conducted by a single researcher with a random 20% quality check of studies performed by a second researcher, which means that there is a small chance that relevant studies were missed.

Conclusions
This study systematically identified published HSUVs and disutilities for patients with MDD receiving pharmacological treatment that can be used as parameters within future economic evaluations. Health state utility values, elicited using methods accepted by HTA bodies, are available for key MDD health states defined by clinical thresholds. However, there is a limited evidence base from studies with heterogeneous populations and clinical definitions. It is important to select HSUVs that are appropriate for the intervention being evaluated and that align with clinical health state definitions used within a model. Only one study reported disutilities associated with adverse events. It is recommended to elicit HSUVs in clinical trials for new treatments that may undergo reimbursement submissions to HTA bodies and to conduct utility studies where data gaps exist.