 Research
 Open Access
 Published:
Mapping analysis to estimate EQ5D utility values using the COPD assessment test in Korea
Health and Quality of Life Outcomesvolume 17, Article number: 97 (2019)
Abstract
Background
There is no research on mapping algorithms between EQ5D and COPD assessment test (CAT) in Korea. The purpose of this study was to develop mapping algorithms that predict EQ5D3 L utility from the CAT in patients with COPD.
Methods
Survey data of 300 COPD patients were collected from three tertiary teaching hospitals in Korea. To predict EQ5D3 L utility from the CAT, various models were assessed. Models were developed using randomly split training samples. Subsequently, the models were validated based on root mean square error (RMSE) and mean absolute error (MAE) in validation samples. The models were also validated using the bootstrap method, which involves iterative splitting, training, and validating of the sample data at least 10,000 times. Average RMSEs and MAEs were used as criteria for model selection.
Results
The recommended mapping algorithms were based on ordinary least squares (OLS) regression models, which revealed five CAT items (chest tightness, breathlessness, activity, leaving home, and energy) as statistically significant on the EQ5D3 L. The mapping models estimated the overall mean of EQ5D3 L utilities effectively, but EQ5D3 L utilities for severe (low utility) patients (< 0.6) were overestimated as the observed EQ5D3 L utilities were often distributed over 0.6.
Conclusion
Mapping algorithms can be used to predict EQ5D3 L utilities from the CAT. However, mapping algorithms should be used cautiously when applied to groups with greater disease severity.
Introduction
Chronic obstructive pulmonary disease (COPD) has a high global prevalence and mortality, and its socioeconomic burden continues to rise. In 2007 − 2012, the COPD prevalence in Korea was 13.1% in people aged over 40 years. Currently, 20.5% of men and 7.3% of women suffer from COPD [1]. The symptoms of COPD include bronchial closure and shortness of breath, which are known to reduce the mobility of patients, to negatively impact mental health, and, as a consequence, to reduce quality of life (QoL) [2]. In a study comparing the quality of life of patients with chronic diseases, COPD patients scored the lowest [3].
The COPD assessment test (CAT) was designed to provide a simple and reliable measure of health status in COPD patients. The CAT questionnaire consists of eight simple items about coughing, phlegm, chest tightness, breathlessness, home activities, leaving home, sleep problems, and energy. Each item is scored on a scale of 0–5, such that the total CAT score will fall within a range from 0 to 40. A score of 0 represents the best possible health status, and higher scores indicate worsening health. The St. George’s Respiratory Questionnaire (SGRQ) shares several similarities with the CAT, but it is a more exhaustive screening assessment because it contains 50 questions [4]. Recently, the COPD guidelines recommended using the CAT to assess COPD patients. Hence, the use of the CAT is being expanded in many clinical practices and studies [5].
EQ5D3 L (developed by the EuroQoL Group) is a measure of health status comprising five dimensions. Each dimension has three levels: no problems (1), some problems (2), and extreme problems (3). EQ5D3 L can indicate 243 health status combinations, from 11,111 (perfect health) to 33,333 (worst health). Preferencebased utilities can be obtained from the EQ5D3 L health status, allowing calculations of the qualityadjusted life year (QALY) for cost–utility analyses. Valuation studies for the 243 EQ5D3 L health status indicators were launched in the U in 1995; studies targeting Koreans began in 2006 [6, 7].
The CAT does not show the respondents’ quantitative utilities or preferences because it measures health status related to lung disease. However, measures of quantitative utility are required for economic evaluations, such as cost–utility analysis (CUA). In cases where only CAT data is available for CUA, a mapping algorithm that can estimate quantitative utility measures from the CAT, such as the EQ5D utility index, can be useful. Previous mapping studies to estimate EQ5D3 L utilities for COPD patients have developed algorithms from the SGRQ [8,9,10], the clinical COPD questionnaire (CCQ) [11], and the CAT [12].
In this report, we describe the first mapping study of Korean patients with COPD. Further, this study used the CAT and EQ5D3 L data of 299 patients to develop mapping algorithms. In Korea, an economic evaluation is required to approve new drug reimbursements and guidelines recommend using a Koreanbased utility index. Therefore, this mapping algorithm may be useful for economic evaluations of Korean COPD patients and similar patient populations.
Methods
Data
A total of 300 COPD patients from three tertiary teaching hospital were surveyed from July–December, 2014. All three hospitals are located in Seoul, the metropolitan city, and therefore most of the patients are urban residents.
The survey was approved by the Institutional Review Board (IRB) of each institute. Patients were eligible for the study if they fulfilled the following criteria: 1) were over the age of 40, 2) had been diagnosed with COPD before 2013, 3) had continuously received outpatient care since January 2013, 4) had an FEV1 (forced expiratory volume in 1 s)/FVC (forced vital capacity) ratio of less than 0.7 immediately after using bronchodilators, and 5) had more than 10 packyears of smoking history. Patients were excluded if they suffered an acute exacerbation in the six weeks prior to the survey or a cardiovascular event (myocardial infarction or arrhythmias) in the three months prior to the survey.
The survey was conducted by nurses from the respective hospitals using the Korean version of the CAT and EQ5D3 L questionnaire. The nurses explained the questionnaires and conducted inperson interviews with the patients. After the interview, the patients’ characteristics (sex, age, duration of COPD, lung function measurement, complication, prescription drugs, resource usage, etc.) were recorded by reviewing their medical records. One of the 300 patients withdrew their consent, so the results from 299 patients were included in this study. For the CAT instrument used in this survey, a Korean version had been evaluated for validity by Lee et al. [13], Hwang et al. [14]. Both evaluations concluded that the Korean version of CAT had good internal consistency and could be used to assess the impacts of COPD on patient health.
Table 1 shows the descriptive statistics of patient characteristics that contains sex, age, BMI, smoking history and FEV1% predicted. The mean age of the patients was 69.2 years, and greater than 74% of the study population consisted of patients over 65 years. The proportion of males was 86.3%. The mean BMI was 22.85, and 8.7% of the patients reported a BMI of less than 18.5 (underweight). Mean smoking history was 36.9 packyears, and 85% of the patients were current or exsmokers. The mean duration of COPD was 5.4 years. The FEV1/FVC ratio represents the proportion of a person’s vital capacity that they are able to expire in the first second of forced expiration (FEV1) to the full, forced vital capacity (FVC). The result of this ratio is expressed as FEV1%. FEV1% predicted is defined as FEV1% of the patient divided by the average FEV1% in the population for any person of similar age, sex, and body composition. Less FEV1% predicted value means more severe condition. Severity is divided into four groups based on the FEV1% predicted value. The mean predicted FEV1% was 55.8%, and approximately 90% of patients had moderate or severe COPD (Table 1).
The results of the EQ5D3 L and CAT questionnaire survey of 299 people are presented in Table 2. The most frequent response for every EQ5D3 L item was 1 (42.1–72.6%), and very few respondents (0.7–2.3%) selected option 3. Eightytwo respondents (27.4%) chose option 1 for all five items. Subsequently, the EQ5D3 L utilities were calculated using the method for Korean population developed by Lee et al. [7].
EQ5D3 L utility = 1–0.050  0.096 M2–0.418 M3–0.046SC2–0.136SC3–0.051UA20.208UA3–0.037PD2–0.151PD3–0.043 AD2–0.043 AD3–0.050 N3 (M2, mobility level 2; M3, mobility level 3; SC2, selfcare level 2; SC3, selfcare level 3; UA2, usual activities level 2; UA3, usual activities level 3; PD2, pain or discomfort level 2; PD3, pain or discomfort level 3; AD2, anxiety or depression level 2; AD3, anxiety or depression level 3; N3, any dimension on level 3)
The mean of utility scores was 0.83 (SD = 0.15). The mean of total CAT scores was 16.38, with scores ranging from 0 to 38. Among the eight items, the fourth item (breathlessness) was calculated as the highest (severe) average score at 3.23, and the lowest average score was reported for the first item (cough) at 1.68. There was no missing value in the variables used for this study.
Model development
The mapping models were developed using the EQ5D3 L utilities as the dependent variables and either the total CAT score or eight scores of each CAT item as the explanatory variables in the following formulas:
Models 1 and 2 used the total CAT score, age, and sex as explanatory variables. In contrast, Model 3 used eight scores of each CAT item instead of the total CAT score. Backward stepwise selection of explanatory variables was used with significance defined as α = 0.05. The following estimation methods were used: ordinary least squares (OLS), generalized linear models (GLM), Tobit models, and beta regression. Because EQ5D3 L utilities have skewed and censored values, we used and compared GLM, Tobit and beta regression as well as OLS. The probability distributions and link function of GLM that we investigated are Gaussianlog, Poissonlog, gammainverse, quasiidentity.
A twopart model was also considered as an alternative estimation method to analyze skewed data [8]. In this study, a large proportion (27%) of observed EQ5D3 L utilities had a value of 1, which indicated perfect health status. The first part of the twopart model, logistic regression, would determine the probability of having a perfect health status. The second part would use previous OLS, GLM, Tobit and beta regression estimations to predict EQ5D3 L utilities. The EQ5D3 L utility score was calculated using the following equation:
P(perfect health) is the probability of perfect health obtained from logistic regression. The PredictedEQ5D3 LUtility value is derived from previous OLS, GLM, Tobit and beta regression estimations.
All statistical analyses were conducted using R (ver 3.3.3; R Foundation for Statistical Computing).
The R code used for this analysis is provided as Additional file 2.
Model validation
The datasets of 299 COPD patients were randomly split into a training set of 150 patients (50%) and a validation set of 149 patients (50%). The training dataset was used to develop the models. The validation set was used to validate the models through calculations and comparisons of root mean square errors (RMSE) and mean absolute errors (MAE).
The bootstrap method was used to generate a robust estimate of RMSE and MAE in the limited sample size. The previously described framework, which consists of random splitting, training and validation, was iterated 10,000 times to collect 10,000 RMSEs and MAEs. The means of the collected RMSEs and MAEs were used as criteria for model selection. As such, the model with the lowest RMSE or MAE was selected as the most suitable method.
Results
Model development
Table 3 presents the results of mapping models using OLS and Gaussian loglink GLM. All RMSEs and the MAEs values represent the means of 10,000 bootstrap values. Among the four types (Gaussianlog, Poissonlog, gammainverse, and quasiidentity) of GLM results, the Gaussianlog model generated the best (lowest RMSE and MAE) values. Both the Tobit and beta regression reported higher RMSE and MAE than OLS or GLM. Additionally, the twopart models showed worse performance (higher RMSE and MAE) than the corresponding single equation models. The results of Tobit and beta regression that are not presented in Table 3 are provided as Additional file 1.
Among the total CAT score models (Models 1 and 2), OLS1 resulted in the lowest RMSE and GLM1 resulted in the lowest MAE. Model 2, which included the square of the total CAT score as an explanatory variable, performed worse than Model 1, which only included the total CAT score. As for the selected items model (Model 3), OLS3 resulted in lower RMSE and GLM3 resulted in lower MAE. Due to the simplicity of use as well as the accuracy of the estimation, we would recommend OLS1 and OLS3 models rather than GLMs. Of which, the selected CAT items model (OLS3) provided more accurate estimates than the total CAT score model (OLS1). However, the OLS1 model must be used to estimate EQ5D3 L utilities when the total CAT score is the only known value.
Recommended models
Table 4 presents the recommended mapping models—OLS1 and OLS3 models. To create a bestfit mapping algorithm for EQ5D3 L utility predictions, OLS equations were estimated using the full 299 patient dataset. Both models included the age variable as a negative coefficient value. The sex variable was not statistically significant and was excluded. In the selected items model, the third, fourth, fifth, sixth, and eighth CAT items had significantly negative effects on the EQ5D3 L utilities, whereas the other items did not. The significant effects of the third, fifth, sixth and eighth CAT items were consistent with results from a previous mapping study by Hoyle et al. [12]. In this study, one additional item (breathlessness) was included in the model. The equations of the two recommended mapping models are as follows.
Five items whose responses had significant effects on the EQ5D3 L utility scores
Q3: My chest does not feel tight at all (very tight) Q4: When I walk up a hill or one flight of stairs I am not breathless (very breathless) Q5: I am not limited doing any activities at home (very limited) Q6: I am confident leaving my home despite my lung condition (not at all confident) Q8: I have lots of energy (no energy at all) 
Scatter plots of predicted and observed EQ5D3 L utilities are shown in Fig. 1. As illustrated by the scatter plots, the mapping model overestimated EQ5D3 L utilities for severe status patients (observed EQ5D3 L utilities < 0.6). The range of observed EQ5D3 L utilities (0.095 − 1) differed from the range of predicted EQ5D3 L utilities (0.6 − 1).
Disease severity
Table 5 presents observed and predicted EQ5D3 L utilities categorized by COPD severity. Both OLS1 and OLS3 produced similar utilities to EQ5D3 L for moderate and severe health status; however, the utility was underestimated for mild COPD and overestimated for very severe COPD.
Discussion
This study developed mapping algorithms to predict EQ5D3 L utility using CAT responses from 299 Korean COPD patients’ survey data. Unlike previous mapping studies that used data collected from two or more preceding randomized clinical trial studies, this study used survey data conducted at three tertiary teaching hospital. Under the Korean healthcare system, patients are not obligated to visit a general practitioner to get a referral and are free to visit any hospital. Under this circumstances, patients without serious symptoms are allowed to visit tertiary hospitals. For this reason, symptoms of COPD can vary in severity, even if the data is collected from a tertiary hospital.
The mapping model can estimate the effects of lung health on QoL, and it can be used for economic evaluations that require a quantitative utility measure. RMSE and MAE results, which serve as indicators of algorithm performances, were comparable to previous mapping studies in COPD [8,9,10,11,12]. To identify appropriate models, we investigated a wide range of mapping algorithms, including OLS, GLM, Tobit, beta regression, and two part models. In addition, we used the bootstrap method to generate robust estimates of RMSE and MAE in the limited sample size of 299 patients. Based on the results, OLS regression models are the recommended mapping algorithms as more complex models failed to improve results.
The recommended OLS models estimated the overall mean of EQ5D3 L utilities accurately. However, the recommended and the other mapping algorithms used in this study overestimated EQ5D3 L utilities for low utility patients (< 0.6). The predicted EQ5D3 L utilities have a floor effect of 0.6. The cause of this floor effect could be from the conceptual overlap and differences between CAT and EQ5D. The three CAT items about cough (Q1), phlegm (Q2) and sleep (Q7) were little correlated with each five EQ5D3 L dimension, conceptually and in practice. And, the pain/discomfort dimension of EQ5D3 L were little correlated with each eight CAT item. These differences between CAT and EQ5D means a limitation of predicting EQ5D using CAT. Additional causes of the floor effect could be (i) the lack of women in the sample (Table 1) and (ii) the lack of severe patients in the sample that only 11 patients in the sample had utilities below 0.6 (Fig. 1). This overestimation for patients with severe health states has been reported in previous mapping studies; as such, this is recognized as a general problem with mapping studies [8,9,10,11,12]. Therefore, when using the mapping models to predict EQ5D3 L utilities by severity, these predictions are likely to be biased.
This study is the first mapping study of Korean patients with COPD. We conducted iterated estimation using the bootstrap method. The general procedure for model development consisted of sample splitting, training, and validating. The validation step is a weak point because the model performance ranking based on goodnessoffit can change if the split sample changes. Therefore, selecting a model based on a single result has a probability of flawed model selection (as the model will be wellfitted to the specific validation set only, but not wellfitted generally). The iterated estimation process using the bootstrap method limits the impact of sample split and reduces the probability of poor model selection. From the bootstrap method, we ranked the mapping models based on their mean RMSEs and MAEs. These rankings were the same as rankings based on RMSE and MAE estimations using the full data set. From this finding, we questioned whether the sample split is necessary in the model development process.
It is known that EQ5D3 L utility is not distributed as a Gaussian distribution, for which OLS is suitable. Many previous mapping studies considered other algorithms as more suitable alternatives for the censored distribution of EQ5D3 L utility. However, OLSbased algorithms were recommended by most (80%) of the previous mapping studies as well as this study [15]. Because the OLS estimator minimizes the sum of squared error (which minimizes RMSE), OLS shows the lowest RMSE and would be selected as the best model when RMSE is used as a criterion. Considering other algorithms, other model selection criteria are needed, not just RMSE.
The study had some limitations, including a relatively small sample size (299 patients), inclusion of few severe status patients, and access to only one survey dataset (which could not conduct external validation). This skewed distributed sample might have introduced bias in the severe patient group estimation. Despite these limitations, the mapping algorithms in this study can be used to predict the mean level of EQ5D3 L utility in similar COPD patient groups.
Conclusions
In this study, mapping algorithms were capable of predicting EQ5D3 L utility from the CAT of this population and comparable patient populations. The algorithms predicted overall, yet not in severe groups, mean EQ5D3 L utilities accurately, but these predictions in very severe groups were likely to be biased. Therefore, mapping algorithms should be used cautiously when calculated for severe disease groups.
Abbreviations
 CAT:

COPD assessment test
 CCQ:

Clinical COPD questionnaire
 COPD:

Chronic obstructive pulmonary disease
 CUA:

Cost utility analysis
 EQ5D:

EuroQol5 dimension
 FEV1:

First second of forced expiration
 FVC:

Full forced vital capacity
 GLM:

Generalized linear model
 MAE:

Mean absolute error
 OLS:

Ordinary least squares
 QALY:

Quality adjusted life year
 RMSE:

Root mean square error
 SGRQ:

St. George’s Respiratory Questionnaire
References
 1.
Park HJ, Lee AY, Lee SH, et al. Comorbidities in obstructive lung disease in Korea: data from the fourth and fifth Korean National Health and nutrition examination survey. Int. J. Chron. Obstruct. Pulmon. Dis. 2015;10:1571–82.
 2.
Kang KS, Na SO, Yu YB, Shin JH. The quality of life in COPD patients according to gender: based on the 4th Korea National Health and nutrition examination survey. J Korean Acad Community Health Nurs. 2015;26:61–8.
 3.
Stavem K, Lossius MI, Kvien TK, Guldvog B. The healthrelated quality of life of patients with epilepsy compared with angina pectoris, rheumatoid arthritis, asthma and chronic obstructive pulmonary disease. Qual Life Res. 2000;9:865–71.
 4.
Jones PW, Harding G, Berry P, Wiklund I, Chen WH, Kline Leidy N. Development and first validation of the COPD assessment test. Eur Respir J. 2009;34:648–54.
 5.
The Korean Academy of Tuberculosis and Respiratory Disease & Clinical Research Center for Chronic Obstructive Airway Disease Research Center. COPD Guidelines Revision Committee 2014. (in Korean).
 6.
Kang EJ, Shin HS, Park HJ, et al. A valuation of health status using EQ5D. Korean J Health Econ Policy. 2006;12:19–43 in Korean.
 7.
Lee YK, Nam SH, Chuang LH, et al. South Korean time tradeoff values for EQ5D health states: modeling with observed values for 101 health states. Value Health. 2009;12:1187–93.
 8.
Starkie HJ, Briggs AH, Chambers MG, Jones P. Predicting EQ5D values using the SGRQ. Value Health. 2011;14:354–60.
 9.
Stahl E, Lindberg A, Jansson SA, Ronmark E, Svensson K, Andersson F, et al. Healthrelated quality of life is related to COPD disease severity. Health Qual Life Outcomes. 2005;3:56.
 10.
Oba Y. Costeffectiveness of longacting bronchodilators for chronic obstructive pulmonary disease. Mayo Clin Proc. 2007;82:575–82.
 11.
Boland MR, van Boven JF, Kocks JW, van der Molen T, Goossens LM, Chavannes NH, Ruttenvan Mölken MP. Mapping the clinical chronic obstructive pulmonary disease questionnaire onto generic preferencebased EQ5D values. Value Health. 2015;18:299–307.
 12.
Hoyle CK, Tabberer M, Brooks J. Mapping the COPD assessment test onto EQ5D. Value Health. 2016;19:469–77.
 13.
Lee SH, Lee JS, Song JW, et al. Validation of the Korean version of chronic obstructive pulmonary disease assessment test (CAT) and Dyspnea12 questionnaire. Tuberc. Respir. Dis. 2010;69:171–6 in Korean.
 14.
Hwang YI, Jung KS, Lim SY, et al. A validation study for the Korean version of chronic obstructive pulmonary disease assessment test (CAT). Tuberc. Respir. Dis. 2013;74:256–63.
 15.
Dakin H. Review of studies mapping from quality of life or clinical measures to EQ5D: an online database. Health Qual Life Outcomes. 2013;11:151.
Acknowledgements
Not applicable
Funding
This study was supported by Boehringer Ingelheim Korea Ltd. SEC had previously received research fund from Boehringer Ingelheim Korea Ltd. The funding body did not play a role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.
Availability of data and materials
Please contact author for data requests.
Author information
Affiliations
Contributions
All authors contributed to the conception and design of the study and the interpretation of the results. EAL and GSS collected and reviewed previous studies. JL and EB analyzed the data. JL and DK were involved in drafting the manuscript. SEC conceived of the study, and revised the manuscript to ensure its critically important content. All authors have read and approved the final manuscript.
Corresponding author
Correspondence to SangEun Choi.
Ethics declarations
Ethics approval and consent to participate
This study received ethics approval from following three IRB committee’s.
Seoul Saint Mary’s Hospital IRB Secretariat(ED14144)
Korea University Medical Center at Anam IRB committee(KUGH14146)
Korea University Medical Center at Guro IRB committee(KC14QIMI0470)
This study received a statement on consent to participate under the ‘Ethics, consent and permissions’ heading.
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interest.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Supplement 1: Comparison of the models based on RMSE, MAE (full sample). Supplement 2. Comparison of the models based on the means of RMSEs, MAEs from 10,000 times iterated estimation. (R 22 kb)
Additional file 2:
Mapping EQ5D using the CAT. (R 23 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 EQ5D
 CAT
 COPD
 Bootstrap
 Korea