CHF-PROM: validation of a patient-reported outcome measure for patients with chronic heart failure

Tian, Jing; Xue, Jiangping; Hu, Xiaojuan; Han, Qinghua; Zhang, Yanbo

doi:10.1186/s12955-018-0874-2

Research
Open access
Published: 20 March 2018

Health and Quality of Life Outcomes volume 16, Article number: 51 (2018)

Abstract

Background

Due to a lack of an appropriate disease-specific patient-reported outcome (PRO) instrument for chronic heart failure including its social support and treatment aspects in China, this study was performed to develop a patient-reported outcome measure (PROM) for patients with chronic heart failure and evaluate its reliability, validity, and feasibility.

Methods

According to the standard PROM guidelines established by the Food and Drug Administration, an item pool was formed by reviewing a large amount of relevant literature and interviewing patients with chronic heart failure about their main symptoms. The preliminary scale was then created after adjusting the items and language with the help of patients and experts in the field. Next, 155 participants (105 patients with chronic heart failure from 8 hospitals in different districts and 50 healthy subjects) were recruited for a pilot survey using questionnaires containing these items. The participants’ responses were analyzed using classical test theory and item response theory to select high-quality items and determine the subdomains of the scale. This was followed by a formal investigation in the same eight hospitals. In total, 360 patients and 100 healthy subjects were included to evaluate the reliability, validity, and feasibility of the items. Through this process, the final scale was established.

Results

The final scale comprised 12 subdomains with 57 items related to physical, psychological, social, and therapeutic areas. The data analysis results of the formal investigation showed that the PROM for chronic heart failure had good reliability, validity, and feasibility. Reliability was verified by Cronbach’s alpha coefficient, which was 0.913 for the total scale, 0.903 for the physical domain, 0.941 for the psychological domain, 0.827 for the social domain, and 0.839 for the therapeutic domain. The construct validity results met the relative criteria of confirmatory factor analysis. Discriminant validity was represented by score comparisons of nine subdomains. The response rate and the effective rate of return of the CHF-PROM were 98.94% and 98.92%, respectively.

Conclusions

The final scale coincides with the theoretical framework and better reflects the overall quality of life of patients with chronic heart failure. This scale can be used as a valid instrument to evaluate clinical treatment and clinical trials of chronic heart failure.

Background

Heart failure (HF) is a syndrome caused by a functional heart disorder in which the heart is unable to meet the needs of the body at normal pressure [1]. As a complex clinical syndrome, HF is the terminal phase of all systemic heart diseases of various causes. More than 26 million individuals have HF, and this number is increasing. By 2050, an estimated 20% of people aged > 65 years will have developed HF [2]. HF has become an overwhelming threat to human health and social development. Based on the severity of disease, HF can be divided into acute HF (AHF) and chronic HF (CHF) [3].

CHF is the final stage of heart disease. It is a complex clinical syndrome characterized by dyspnea, edema, and fatigue [4]. Its treatment includes medical therapy, mechanical circulatory assistance, and cardiac transplantation [5]. Individual therapeutic strategies based on patients’ reported outcomes, which can reflect patients’ individual situations, have been proven effective for relieving the symptoms of CHF and improving patients’ quality of life (QoL). Compared with many other chronic diseases, CHF affects QoL more profoundly. QoL has become a major concern in modern medicine in recent years. However, clinical management and research have not taken CHF into consideration to a satisfactory degree [6]. Depression and social function disability have been shown to have a significant impact on QoL in patients with CHF [7]. Other factors affecting QoL include treatment compliance, satisfaction with treatment, and adverse effects of related treatments [8]. Additionally, decisions regarding therapy can change over time depending on the feelings of the patients and their families.

Patient-reported outcomes (PROs) are based on health-related quality of life (HRQoL). HRQoL reflects patients’ overall feelings regarding their disease and the corresponding therapy. As a central part of PROs, HRQoL is essential for evaluating patients’ health status [9]. PROs are not summaries provided by medical professionals but are instead patient-centered self-reports of patients’ feelings regarding their health state, functional status, and therapeutics. Thus, PROs are helpful in diagnosis and therapy and are of significant importance in clinical practice [10,11,12,13,14]. Widely accepted by medical professionals, PROs make use of patients’ feedback and treat patient self-evaluation as an important aspect of the end-point in clinical trials. In 2006, the United States Food and Drug Administration circulated a publication entitled “Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling” [9], which further standardized the development and validation of PROs both clinically and academically [15,16,17].

Health-related quality of life instruments include generic measures and disease-specific measures. All of these can reflect the quality of life of patients. General measurements for patients with chronic HF include the Nottingham Health Profile, Simple SF-36 Health Survey Questionnaire, and World Health Organization Quality of Life Scale–Brief Version [18]. These general measurements are not specific for CHF; therefore, they cannot specifically and completely represent the situation of patients with CHF. However, disease-specific measures quantify more clinically relevant domains than generic health status measures and are often more sensitive to clinical change. As the terminal phase of all organic heart diseases, CHF has specific clinical features and treatments; therefore, development of disease-specific measures for HF is necessary. Meanwhile, specific measurements used in the clinical setting include the Minnesota Living with Heart Failure Questionnaire (MLHFQ), Chronic Heart Failure Questionnaire, Kansas City Cardiomyopathy Questionnaire (KCCQ), and Quality of Life Index–Cardiac Version [18,19,20,21]. Among these, the MLHFQ and KCCQ are more popular than the others. The MLHFQ was the first questionnaire used in HF and has been translated and culturally adapted into at least 34 languages. It contains 21 items, most of which focus on physical and emotional domains; only one focuses on therapy [19, 20]. The Chronic Heart Failure Questionnaire evaluates fatigue, dyspnea, and emotion [20]. The KCCQ reports an overall summary score and five subdomain scores: physical limitations, symptoms, self-efficacy, social interference, and HRQoL. It focuses more on physical limitations, symptoms, and HRQoL and gives little attention to self-efficacy and social interference [18]. The Quality of Life Index–Cardiac Version was established in Europe and can be used for all types of heart disease [20].

Notably, doctors change treatment plans based on their patients’ social support and therapy status. For example, if the patient’s compliance decreases during the treatment period, the doctor can identify the specific cause by calculating the score of the related items in the scale. This may provide doctors with a relatively objective way to improve patients’ compliance. Additionally, the score for the social support dimension of the scale can reflect the patient’s family situation and social environment. This could guide community doctors to help patients or their family members solve the corresponding problems and to provide better community medical services. However, existing questionnaires rarely assess such factors [18, 20, 21].

Therefore, developing a Chinese questionnaire, specifically one that is culturally relevant to mainland China, is necessary because the management of CHF strongly depends on the different societal value systems, medical provision priorities, and economic environments in this country. We herein propose a measure based on PROs for patients with chronic HF to improve the current questionnaire for cardiovascular disease and guide clinical treatment.

Methods

1. Establishment of CHF-PROM

1.1 Conceptual framework construction

A conceptual framework for the CHF-PROM was constructed by considering the principles for developing PRO scales established by the Food and Drug Administration [22], previous life-quality questionnaires for patients with HF, and the relevant theories of CHF. The CHF-PROM should include four domains: the physical domain (PHD), psychological domain (PSD), social domain (SOD), and therapeutic domain (TRD).

1.2 Item generation

We consulted a large number of relevant studies and related questionnaires [9, 18,19,20,21,22]. The patients’ major disease symptoms, psychological and social conditions, and satisfaction towards medical services or side effects of treatment were also collected. The item pool was generated according to all of this information.

1.3 Formation of preliminary scale

Face-to-face interviews regarding the above-mentioned items were conducted, and patients’ subjective opinions were taken into consideration. The item pool was applied to 10 patients with CHF in hospitals or communities (5 males, 5 females; average age, 65 years). During this process, the patients were asked to point out words they could not understand, and items were added or deleted as necessary. The items were revised by three cardiovascular disease experts, a psychologist, and a sociologist, who were invited to make suggestions regarding all four domains. Based on the patients’ and experts’ opinions, the CHF-PROM was further modified to form a preliminary scale. The scores of the items were calculated using a 5-point Likert scale.

1.4 Determination of the preliminary scale and formation of the final scale

1.4.1 Survey sample and sample size

Patients were enrolled from eight different hospitals in Shanxi Province, China. The inclusion criteria for this study were an age of > 18 years, a principal diagnosis of chronic heart failure according to the 2013 ACC/AHA guideline on HF [2], and consent to fill out the questionnaire. We excluded patients with combined psychiatric disorders and those who were incapable of understanding or completing the questionnaire because of language barriers or intellectual disabilities. Healthy subjects were defined as people who had not been diagnosed with any diseases by physicians. Healthy subjects who matched the basic characteristics of patients with CHF were recruited from communities of Shanxi Province. Before recruiting healthy subjects, the investigators contacted related departments of target communities to obtain support from community workers. At the same time, full preparations for publicity were made by creating posters to display in the communities. Documents that introduced the survey were also distributed. Healthy subjects who were willing to participate in the questionnaire survey provided written informed consent. The participants filled out the questionnaire by following the same survey process followed by patients with CHF. In cases of missing data, we corrected and supplemented the data in a timely manner. For factor analysis, Nunnally [23] suggested that the number of subjects should be at least 10 times the number of study variables. Some scholars have suggested that the actual sample size should be 5 to 10 times greater than the number of observed variables to obtain accurate parameter estimates and reliable results [24].

The purpose of our study was thoroughly explained to all participants. Written informed consent was obtained from all participants. These questionnaires were made available on the first day of hospitalization. During hospitalization, the patients independently completed the questionnaires according to their own physical conditions by following the instructions provided by the investigators. For the elderly patients who were unable to complete the questionnaires, the investigators read the content of the questionnaires and/or filled in the answers according to the patients’ selections without any suggestions. Data entry and its verification are important in the process of data management in clinical studies [25]. Double data entry was adopted to control data quality using EpiData3.1 software. In total, 105 patients and 50 healthy subjects were enrolled in the pilot study. Various statistical analyses were conducted to select high-quality items and develop the preliminary scale, such as the classical test theory [e.g., discrete trend, factor analysis, correlation coefficient, Cronbach’s α if item deleted (CAID) and corrected item-total correlation (CITC)] and item response theory. A further larger-scale survey involving 365 patients with CHF and 100 healthy subjects was conducted by using the preliminary scale.

1.4.2 Scale scoring

Patients responded to each item on a 5-point Likert scale to reflect how often they had experienced each issue during the past 2 weeks. An initial value ranging from 0 to 4 was assigned for each category (0 = never, 1 = occasionally, 2 = about half of the time, 3 = often, and 4 = almost every day). To ensure a consistent relationship between the responses to all items and the PROM, all responses were transformed in the following way: positively scored items were recorded as the original score plus 1, while negatively scored items were recorded as 5 minus the original score. This resulted in a score ranging from 1 to 5 for each item, with a higher score associated with a more positive PROM.
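The transformation described above can be sketched in a few lines of Python. This is only an illustration of the stated rule; the function name `transform_score` and the flag `positively_scored` are hypothetical and do not come from the study materials.

```python
def transform_score(raw: int, positively_scored: bool) -> int:
    """Map a raw 0-4 Likert response onto the 1-5 scale described in the text.

    Positively scored items become raw + 1; negatively scored items become
    5 - raw, so that a higher transformed score is always more favorable.
    """
    if raw not in range(5):
        raise ValueError("raw response must be an integer from 0 to 4")
    return raw + 1 if positively_scored else 5 - raw
```

For example, a raw response of 4 ("almost every day") would become 5 on a positively scored item and 1 on a negatively scored item.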

1.4.3 Item reduction based on both CTT and IRT

Discrete trend

A low discrete degree indicated that the subjects were inclined to select the same answer. In other words, the items were not useful for indicating differences. The scores generally exhibited a normal distribution; thus, the standard deviation was calculated for every item. Items with a low standard deviation (< 1.0) were deleted. Generally, a value of > 1.0 indicates that the participants may select different answers for an item [26].
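As a rough sketch of this screening step, the following Python function flags items whose standard deviation falls below the cut-off. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify; the function name and the item identifiers in the usage note are hypothetical.

```python
from statistics import stdev


def low_dispersion_items(responses, threshold=1.0):
    """Return the ids of items whose sample standard deviation is below threshold.

    responses maps an item id to the list of transformed scores given by all
    respondents. The use of the sample SD is an assumption of this sketch.
    """
    return [item for item, scores in responses.items() if stdev(scores) < threshold]
```

Calling `low_dispersion_items({"A": [3, 3, 3, 4], "B": [1, 5, 1, 5]})` would flag only item A, because almost all respondents chose the same answer for it.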

Exploratory factor analysis

Considering the small sample size, an exploratory factor analysis was performed and the solution was rotated separately in each field (physical, psychological, social, and therapeutic). We determined the number of factors according to the eigenvalue and variance contribution ratio. The eigenvalue should be > 1.0, and the maximum cumulative variance contribution rate was 70%. Items with low factor loading (< 0.4) were removed. Generally, it was considered that the measurable variable (e.g., item) was mainly affected by this potential factor (e.g., subdomain) if factor loading was ≥0.4 [27].

CAID

The CAID and CITC were used to evaluate the internal consistency among the items. If an item had a negative effect on the internal consistency of its own dimension, Cronbach’s α coefficient increased greatly when the item was deleted. A CITC of < 0.4 indicated that an item was poorly correlated to the scale. In this circumstance, the item should be deleted [28].
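To make the two statistics concrete, the sketch below computes CAID and CITC for each item of one dimension in plain Python. It uses the usual definitions (alpha recomputed without the item, and the Pearson correlation between the item and the sum of the remaining items); the function names are hypothetical, and this is not the SPSS procedure used in the study. At least three items are needed so that alpha can still be computed after one is removed.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists (same respondents)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(i) for i in items) / variance(totals))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def item_diagnostics(items):
    """Return one (CAID, CITC) pair per item, in the order the items were given."""
    results = []
    for idx, item in enumerate(items):
        rest = items[:idx] + items[idx + 1:]
        rest_totals = [sum(scores) for scores in zip(*rest)]
        results.append((cronbach_alpha(rest), pearson(item, rest_totals)))
    return results
```

Under the criteria in the text, an item would be a candidate for deletion when its CAID is clearly higher than the alpha of the full dimension or when its CITC is below 0.4.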

Correlation coefficient

The representativeness of an item was measured by the correlation coefficient with its own subdomain. An item with a correlative value of < 0.6 was generally considered to be poorly correlated to the corresponding subdomain [29]. Such an item was removed.

IRT

IRT is part of modern measurement theory and was proposed to overcome the defects of CTT [30]. It is also called latent trait theory and has advantages in terms of item selection and test construction. It claims that the relationship between subjects’ abilities and their responses to an item can be described as a function. The basic task is to define this relationship. In brief, IRT can be viewed as a probabilistic method for discussing the relationship between subjects’ potential traits and their responses to items.

If we set θ as a subject’s ability, then p(θ) is the probability that the subject will respond to an item correctly. The functional relationship can be reflected by a curve called the item characteristic curve. We selected two important parameters on the curve: a reflects the degree of discrimination, and b indicates the item difficulty. A graded response model appropriate for hierarchical and continuous data was constructed considering the 5-point Likert scale used in this study, extending a unidimensional model to a multidimensional one [31]. Five parameters were estimated in our study, namely a, b₁, b₂, b₃, and b₄, where b₁ is the difficulty level parameter between Answers 1 and 2, and so on, and b₁ < b₂ < b₃ < b₄. Here, a must have a value of > 0.60, and b ranges from − 3 to 3. Items supported by at least three methods were retained in the final CHF-PROM.
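The following Python sketch illustrates how the five parameters determine the response probabilities of one item. It assumes a unidimensional logistic graded response model without a scaling constant, which is a simplification of the model described above and not necessarily the exact parameterization used by Multilog; the function names are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Probabilities of the 5 response categories for one item.

    theta is the latent trait, a the discrimination parameter, and thresholds
    the ordered difficulty parameters [b1, b2, b3, b4].
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probability of responding in category k or higher.
    cum = [1.0] + [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    # Category probability is the difference between adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


def passes_irt_screen(a, thresholds):
    """Apply the screening rule from the text: a > 0.60 and every b within [-3, 3]."""
    ordered = all(lo < hi for lo, hi in zip(thresholds, thresholds[1:]))
    return a > 0.60 and ordered and all(-3 <= b <= 3 for b in thresholds)
```

Plotting the output of `grm_category_probs` over a range of theta values would give curves of the kind summarized in an item characteristic curve matrix.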

2. Validation of the final scale

Reliability

We calculated Cronbach’s alpha coefficients for four fields and the total scale to measure the internal consistency of the CHF-PROM. Generally, a value of > 0.70 indicates that individual items provide an adequate contribution to the overall scale [32].
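For reference, the standard definition of Cronbach's alpha for a scale of k items can be written as follows, where s_i^2 is the variance of item i and s_T^2 is the variance of the total score. The symbols are introduced here for illustration and are not taken from the study.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_T^{2}}\right)
```

The coefficient approaches 1 when the items vary together, so that the total-score variance is much larger than the sum of the individual item variances.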

Validity

Content validity

The patients’ opinions were typically consulted to validate the content with respect to how well the items met the empirical indexes of interest [33].

Construct validity

We subjected the factor structure of the scale to confirmatory factor analysis (CFA). The model was assessed with respect to the following relative goodness-of-fit statistics: root mean square error of approximation (values of < 0.08 indicated adequate fit and values of < 0.05 indicated close fit of the data to the model) [34], normed fit index (values of ≥0.90), non-normed fit index (values of ≥0.90), incremental fit index (values of ≥0.90), comparative fit index (values of ≥0.90), and root mean square residual (values of < 0.09) [33]. We used LISREL 8.70 to assess the construct validity with CFA.
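The fit criteria listed above can be collected into a small checking routine, sketched here in Python. The abbreviations used as dictionary keys and the function name are choices made for this illustration, and any fit values passed in would come from the CFA software output rather than from this code.

```python
# Thresholds as stated in the text; the key names are illustrative abbreviations.
FIT_CRITERIA = {
    "RMSEA": lambda v: v < 0.08,   # adequate fit (< 0.05 would be close fit)
    "NFI": lambda v: v >= 0.90,
    "NNFI": lambda v: v >= 0.90,
    "IFI": lambda v: v >= 0.90,
    "CFI": lambda v: v >= 0.90,
    "RMR": lambda v: v < 0.09,
}


def unmet_criteria(fit):
    """Return the names of the reported indices that miss their threshold.

    fit maps an index name to its value; indices not present are skipped.
    """
    return [name for name, ok in FIT_CRITERIA.items() if name in fit and not ok(fit[name])]
```

An empty list returned by `unmet_criteria` would mean that every reported index satisfies the criterion stated above.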

Discriminant validity

We determined the discriminant validity by comparing the mean scores for every subdomain of the CHF-PROM between healthy people and patients with CHF. We compared the differences using a t-test, with the significance level set at P < 0.05 [35].
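A minimal sketch of such a two-group comparison is given below. The text does not state which variant of the t-test was used, so this example assumes Welch's unequal-variance form; the function name is hypothetical, and converting the statistic and degrees of freedom to a P value is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t, df) for Welch's two-sample t-test on two lists of scores."""
    na, nb = len(group_a), len(group_b)
    # Squared standard errors of the two group means.
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

In this setting, `group_a` and `group_b` would hold the scores of one subdomain for patients with CHF and for healthy subjects, respectively, and the comparison would be repeated for each subdomain.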

Feasibility

We evaluated the feasibility of the CHF-PROM by examining the response rate, completion rate, response time to completion, percentage of missing data, and score distribution. We considered response and return rates of < 85% to be inadequate and a completion time of 30 min to be acceptable. SPSS 16.0, Multilog 7.03, EpiData 3.1, and LISREL 8.70 were used to conduct the data analysis. The entire study flow diagram is presented in Fig. 1.

Results

Generation of item pool

After consulting relevant literature and interviewing patients with CHF, we established four domains as described in the Methods section: physical domain, psychological domain, social domain, and therapeutic domain. These 4 domains were then divided into 12 subdomains and a pool of 67 items (see Additional file 1). The conceptual framework of the instrument is shown in Fig. 2.

Formation of preliminary scale

Establishment of the CHF-PROM was based on published literature and related questionnaires. Consultants were also needed to improve the validity of the questionnaire [3, 7, 8, 12,13,14,15]. According to the advice provided by patients and experts in this field, six items were removed (“PHD1. Do you feel that your limb is weak?”, “PHD15. Do you have constipation?,” “PSD13. Do you often check things over and over again?,” “PSD14. Do you often wash your hands or count over and over again?,” “PSD22. Do you feel that people do not judge your achievements properly?,” and “TRD6. Did you think the examinations are necessary?”), three items were added (“Do you feel that your illness is a burden to your family?,” “Do you know the side effects of the drugs?,” and “Are you worried about the side effects of the drugs?”), and one item was divided into two items (“PSD4. Do you feel less concentrated and forget things easily?”). As a result, we generated 65 items for the CHF-PROM.

Item selection

Participant characteristics

The screening phase involved 105 patients and 50 healthy subjects. The patients with CHF had an average age of 69.16 ± 11.24 years. The normal subjects had an average age of 56.96 ± 14.96 years. The basic characteristics of the patients with CHF and healthy subjects are shown in Table 1. The demographic data were compared using the chi-square test for categorical variables.

Table 1 Demographic characteristics of the participants in the item-selection phase

First item-selection phase

Five statistical methods within the CTT and IRT were used to select the items. Items PHD3, PHD7, PSD12, and SOD9 were deleted according to the above-mentioned criteria. As a result, the initial scale contained 61 items, 10 subdomains, and 4 domains.

Second item-selection phase

As shown in Table 2, PHD9, PHD10, PHD14, PSD2, PSD18, PSD19, PSD20, PSD21, SOD1, and TRE1 were deleted according to the discrete trend (s < 0.96). PHD4, PHD5, PHD8, PHD9, PHD10, PHD13, PHD14, and PHD15 were removed according to the factor analysis. PHD9, PHD10, TRE11, and TRE12 were deleted because the correlation coefficient was < 0.6. We also deleted PHD16 and SOD6 based on the CAID method. SOD6, SOD8, TRE1, TRE2, TRE3, TRE4, TRE11, and TRE12 were eliminated according to IRT. Figure 3 shows the item characteristic curve matrix of each item. Items proposed by at least three methods were retained. The final scale contained 57 items, 12 subdomains, and 4 domains (see Additional file 2). The final construction frame is shown in Table 3.

Table 2 Summary of the second item-selection phase using CTT and IRT

Table 3 Structure of the 57 items in the final scale

Validation of the scale

The scale was validated in a large-scale sample. The sample size was determined based on Nunnally’s rule, and the actual sample size was only slightly below the target. Patients were enrolled from different departments of eight different hospitals in Shanxi Province, China. Some patients were not willing to complete the questionnaire because of their physical condition at that time, fear of disclosing private information, and other factors. In these target hospitals, several departments of cardiology were participating in investigations using other psychological questionnaires and were therefore unwilling to take part in the survey, because bias may be introduced into the study results if inpatients with CHF complete two questionnaires simultaneously. In total, 470 questionnaires were distributed and 467 were collected (98.50%). There were 460 valid questionnaires (patients with CHF, 360; healthy people, 100). The patients with CHF had an average age of 69.87 ± 10.60 years, and the healthy subjects had an average age of 57.06 ± 14.67 years. The participants’ baseline data are shown in Table 4. The demographic data were compared using the chi-square test for categorical variables.

Table 4 Baseline data of the participants in the formal survey
