CHF-PROM: validation of a patient-reported outcome measure for patients with chronic heart failure
Health and Quality of Life Outcomes volume 16, Article number: 51 (2018)
Due to a lack of an appropriate disease-specific patient-reported outcome (PRO) instrument for chronic heart failure including its social support and treatment aspects in China, this study was performed to develop a patient-reported outcome measure (PROM) for patients with chronic heart failure and evaluate its reliability, validity, and feasibility.
According to the standard PROM guidelines established by the Food and Drug Administration, an item pool was formed by reviewing a large amount of relevant literature and interviewing patients with chronic heart failure about their main symptoms. Thus, the primary scale was created after adjusting the items and language with the help of patients and experts in the field. Next, 155 patients from 8 hospitals in different districts were recruited for a pilot survey using questionnaires containing these items. The patients’ responses were analyzed using the classical test theory and item response theory to select high-quality items and determine the subdomains of the scale. This was followed by a formal investigation in the same eight hospitals. In total, 360 patients and 100 healthy subjects were included to evaluate the reliability, validity, and feasibility of the items. Through this process, the final scale was established.
The final scale comprised 12 subdomains with 57 items related to physical, psychological, social, and therapeutic areas. The data analysis results of the formal investigation showed that the PROM for chronic heart failure had good reliability, validity, and feasibility. Reliability was verified by Cronbach’s alpha coefficient, which was 0.913 for the total scale, 0.903 for the physical domain, 0.941 for the psychological domain, 0.827 for the social domain, and 0.839 for the therapeutic domain. The construct validity results met the relevant criteria of confirmatory factor analysis. Discriminant validity was represented by score comparisons of nine subdomains. The response rate and the effective rate of return of the CHF-PROM were 98.94% and 98.92%, respectively.
The final scale coincides with the theoretical framework and better reflects the overall quality of life of patients with chronic heart failure. This scale can be used as a valid instrument to evaluate clinical treatment and clinical trials of chronic heart failure.
Heart failure (HF) is a syndrome caused by a functional heart disorder in which the heart is unable to meet the needs of the body at normal pressure. As a complex clinical syndrome, HF is the terminal phase of all systemic heart diseases arising from various causes. More than 26 million individuals have HF, and this number is increasing. By 2050, an estimated 20% of people aged > 65 years will have developed HF. HF has become an overwhelming threat to human health and social development. Based on the severity of disease, HF can be divided into acute HF (AHF) and chronic HF (CHF).
CHF is the final stage of heart disease. It is a complex clinical syndrome characterized by dyspnea, edema, and fatigue. Its treatment includes medical therapy, mechanical circulatory assistance, and cardiac transplantation. Individual therapeutic strategies based on patients’ reported outcomes, which can reflect patients’ individual situations, have been proven effective for relieving the symptoms of CHF and improving patients’ quality of life (QoL). Compared with many other chronic diseases, CHF affects QoL more profoundly. QoL has become a major concern in modern medicine in recent years. However, clinical management and research have not taken CHF into consideration to a satisfactory degree. Depression and social function disability have been shown to have a significant impact on QoL in patients with CHF. Other factors affecting QoL include treatment compliance, satisfaction with treatment, and adverse effects of related treatments. Additionally, decisions regarding therapy can change over time depending on the feelings of the patients and their families.
Patient-reported outcomes (PROs) are based on health-related quality of life (HRQoL). HRQoL reflects patients’ overall feelings regarding their disease and the corresponding therapy. As a central part of PROs, HRQoL is essential and indispensable for evaluating patients’ health status. PROs are not summaries provided by medical professionals but are instead patient-centered self-reports of patients’ feelings regarding their health state, functional status, and therapeutics. Thus, PROs are helpful in diagnosis and therapy and are of significant importance in clinical practice [10,11,12,13,14]. Widely accepted by medical professionals, PROs make use of patients’ feedback and view patient self-evaluation as an important aspect of the end-point in clinical trials. In 2006, the United States Food and Drug Administration circulated a publication entitled “Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims”, which further standardized the development and validation of PROs both clinically and academically [15,16,17].
Health-related quality of life instruments include generic measures and disease-specific measures. All of these can reflect the quality of life of patients. Generic measures for patients with chronic HF include the Nottingham Health Profile, the 36-Item Short Form Health Survey (SF-36), and the World Health Organization Quality of Life Scale–Brief Version. These generic measures are not specific to CHF; therefore, they cannot specifically and completely represent the situation of patients with CHF. In contrast, disease-specific measures quantify more clinically relevant domains than generic health status measures and are often more sensitive to clinical change. As the terminal phase of all organic heart diseases, CHF has specific clinical features and treatments; therefore, development of disease-specific measures for HF is necessary. Disease-specific measures used in the clinical setting include the Minnesota Living with Heart Failure Questionnaire (MLHFQ), Chronic Heart Failure Questionnaire, Kansas City Cardiomyopathy Questionnaire (KCCQ), and Quality of Life Index–Cardiac Version [18,19,20,21]. Among these, the MLHFQ and KCCQ are more popular than the others. The MLHFQ was the first questionnaire used in HF and has been translated and culturally adapted into at least 34 languages. It contains 21 items, most of which focus on physical and emotional domains; only one focuses on therapy [19, 20]. The Chronic Heart Failure Questionnaire evaluates fatigue, dyspnea, and emotion. The KCCQ reports an overall summary score and five subdomain scores: physical limitations, symptoms, self-efficacy, social interference, and HRQoL. It focuses more on physical limitations, symptoms, and HRQoL and gives little attention to self-efficacy and social interference. The Quality of Life Index–Cardiac Version was established in Europe and can be used for all types of heart disease.
Notably, doctors change treatment plans based on their patients’ social support and therapy status. For example, if the patient’s compliance decreases during the treatment period, the doctor can identify the specific cause by calculating the score of the related items in the scale. This may provide doctors with a relatively objective way to improve patients’ compliance. Additionally, the score for the social support dimension of the scale can reflect the patient’s family situation and social environment. This could guide community doctors to help patients or their family members solve the corresponding problems and provide better community medical services. However, existing questionnaires rarely assess such factors [18, 20, 21].
Therefore, developing a Chinese questionnaire, specifically one that is culturally relevant to mainland China, is necessary because the management of CHF strongly depends on the different societal value systems, medical provision priorities, and economic environments in this country. We herein propose a measure based on PROs for patients with chronic HF to improve the current questionnaire for cardiovascular disease and guide clinical treatment.
1. Establishment of CHF-PROM
1.1 Conceptual framework construction
A conceptual framework for the CHF-PROM was constructed by considering the principles for developing PRO scales established by the Food and Drug Administration, previous life-quality questionnaires for patients with HF, and the relevant theories of CHF. The CHF-PROM should include four domains: the physical domain (PHD), psychological domain (PSD), social domain (SOD), and therapeutic domain (TRD).
1.2 Item generation
We consulted a large number of relevant studies and related questionnaires [9, 18,19,20,21,22]. Information on the patients’ major disease symptoms, psychological and social conditions, satisfaction with medical services, and side effects of treatment was also collected. The item pool was generated from all of this information.
1.3 Formation of preliminary scale
Face-to-face interviews regarding the above-mentioned items were conducted. Patients’ subjective opinions were taken into consideration. The item pool was applied to 10 patients with CHF in hospitals or communities (5 males, 5 females; average age, 65 years). During this process, the patients were asked to point out words they could not understand, and items were added or deleted as necessary. The items were revised by three cardiovascular disease experts, a psychologist, and a sociologist, who were invited to make suggestions regarding all four domains. Based on the patients’ and experts’ opinions, the CHF-PROM was further modified to form a preliminary scale. The scores of the items were calculated using a 5-point Likert scale.
1.4 Determination of the preliminary scale and formation of the final scale
1.4.1 Survey sample and sample size
Patients were enrolled from eight different hospitals in Shanxi Province, China. The inclusion criteria for this study were an age of > 18 years, a principal diagnosis of chronic heart failure according to the 2013 ACCF/AHA guideline on HF, and consent to fill out the questionnaire. We excluded patients with combined psychiatric disorders and those who were incapable of understanding or completing the questionnaire because of language barriers or intellectual disabilities. Healthy subjects were defined as people who had not been diagnosed with any diseases by physicians. Healthy subjects who matched the basic characteristics of patients with CHF were recruited from communities of Shanxi Province. Before recruiting healthy subjects, the investigators contacted related departments of target communities to obtain support from community workers. At the same time, full preparations for publicity were made by creating posters to display in the communities. Documents that introduced the survey were also distributed. Healthy subjects who were willing to participate in the questionnaire survey provided written informed consent. The participants filled out the questionnaire by following the same survey process followed by patients with CHF. In cases of missing data, we corrected and supplemented the data in a timely manner. For factor analysis, Nunnally suggested that the number of subjects should be at least 10 times the number of study variables. Some scholars have suggested that the actual sample size should be 5 to 10 times greater than the number of observed variables to obtain accurate parameter estimates and reliable results.
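The subjects-to-variables rule of thumb described above can be sketched in a few lines. This is only an illustration: the function name `sample_size_range` and the item count of 60 are hypothetical and are not taken from the study.

```python
def sample_size_range(n_items, low_ratio=5, high_ratio=10):
    """Return the (lower, upper) number of subjects implied by a
    subjects-per-item rule of thumb (5 to 10 subjects per observed variable)."""
    if n_items <= 0:
        raise ValueError("n_items must be a positive integer")
    return n_items * low_ratio, n_items * high_ratio


# Hypothetical scale length, used only to illustrate the arithmetic.
lower, upper = sample_size_range(60)
print(f"Target sample size: {lower} to {upper} subjects")
```

Under this rule, the upper bound corresponds to Nunnally's stricter 10:1 ratio and the lower bound to the more permissive 5:1 ratio.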
The purpose of our study was thoroughly explained to all participants. Written informed consent was obtained from all participants. These questionnaires were made available on the first day of hospitalization. During hospitalization, the patients independently completed the questionnaires according to their own physical conditions by following the instructions provided by the investigators. For the elderly patients who were unable to complete the questionnaires, the investigators read the content of the questionnaires and/or filled in the answers according to the patients’ selections without any suggestions. Data entry and its verification are important in the process of data management in clinical studies. Double data entry was adopted to control data quality using EpiData 3.1 software. In total, 105 patients and 50 healthy subjects were enrolled in the pilot study. Various statistical analyses were conducted to select high-quality items and develop the preliminary scale, such as the classical test theory [e.g., discrete trend, factor analysis, correlation coefficient, Cronbach’s α if item deleted (CAID) and corrected item-total correlation (CITC)] and item response theory. A further larger-scale survey involving 365 patients with CHF and 100 healthy subjects was conducted by using the preliminary scale.
1.4.2 Scale scoring
Patients responded to each item on a 5-point Likert scale to reflect how often they had experienced each issue during the past 2 weeks. An initial value ranging from 0 to 4 was assigned for each category (0 = never, 1 = occasionally, 2 = about half of the time, 3 = often, and 4 = almost every day). To ensure a consistent relationship between the responses to all items and the PROM, all responses were transformed in the following way: positively scored items were recorded as the original score plus 1, while negatively scored items were recorded as 5 minus the original score. This resulted in a score ranging from 1 to 5 for each item, with a higher score associated with a more positive PROM.
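The transformation described above can be written as a small function. The sketch below assumes that each item carries a flag saying whether it is positively or negatively scored; the function name `transform_score` and the example responses are illustrative only.

```python
def transform_score(raw, positive):
    """Map a raw Likert response (0-4) onto the 1-5 analysis scale.

    Positively scored items: raw + 1.
    Negatively scored items: 5 - raw.
    A higher transformed score always means a more positive outcome.
    """
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError("raw response must be an integer from 0 to 4")
    return raw + 1 if positive else 5 - raw


# Hypothetical responses: (raw value, is the item positively scored?)
responses = [(0, True), (4, True), (0, False), (3, False)]
print([transform_score(raw, positive) for raw, positive in responses])
```

For a negatively scored item such as a symptom question, "never" (0) becomes 5 and "almost every day" (4) becomes 1, so both item types point in the same direction.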
1.4.3 Item reduction based on both CTT and IRT
A low degree of dispersion indicated that the subjects were inclined to select the same answer. In other words, such items were not useful for indicating differences. The scores generally exhibited a normal distribution; thus, the standard deviation was calculated for every item. Items with a low standard deviation (< 1.0) were deleted. Generally, a value of > 1.0 indicates that the participants may select different answers for an item.
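A minimal sketch of this screening step follows. It assumes that the sample standard deviation is used (the text does not say whether the sample or population formula was applied), and the item labels and scores are invented for illustration.

```python
from statistics import stdev


def low_dispersion_items(responses, threshold=1.0):
    """Return the items whose sample standard deviation is below the threshold.

    responses maps an item label to the list of transformed scores (1-5)
    given by all respondents.
    """
    return [item for item, scores in responses.items()
            if stdev(scores) < threshold]


# Hypothetical data for two items answered by seven respondents.
responses = {
    "ITEM_A": [1, 5, 2, 4, 3, 5, 1],  # answers spread over the whole scale
    "ITEM_B": [3, 3, 3, 4, 3, 3, 3],  # nearly everyone picks the same answer
}
flagged = low_dispersion_items(responses)
print(flagged)
```

Here only the second item is flagged, because almost all respondents chose the same category.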
Exploratory factor analysis
Considering the small sample size, an exploratory factor analysis was performed and the solution was rotated separately in each field (physical, psychological, social, and therapeutic). We determined the number of factors according to the eigenvalue and variance contribution ratio. The eigenvalue should be > 1.0, and the maximum cumulative variance contribution rate was 70%. Items with low factor loading (< 0.4) were removed. Generally, it was considered that the measurable variable (e.g., item) was mainly affected by this potential factor (e.g., subdomain) if factor loading was ≥ 0.4.
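The two decision rules in this paragraph (eigenvalue > 1.0 and factor loading ≥ 0.4) can be illustrated as follows. The sketch assumes that the eigenvalues and rotated loadings have already been produced by a factor-analysis package; the function names and all numbers are hypothetical.

```python
def retained_factors(eigenvalues, min_eigenvalue=1.0):
    """Count factors with eigenvalue above the cutoff and return the
    share of total variance that those factors explain together."""
    kept = [ev for ev in eigenvalues if ev > min_eigenvalue]
    return len(kept), sum(kept) / sum(eigenvalues)


def weak_items(loadings, min_loading=0.4):
    """Return items whose largest absolute loading on any retained
    factor is below the cutoff."""
    return [item for item, values in loadings.items()
            if max(abs(v) for v in values) < min_loading]


# Hypothetical eigenvalues for a six-item domain (they sum to 6).
eigenvalues = [2.6, 1.5, 0.7, 0.5, 0.4, 0.3]
n_factors, cumulative = retained_factors(eigenvalues)
print(n_factors, round(cumulative, 3))

# Hypothetical rotated loadings on the two retained factors.
loadings = {
    "ITEM_A": [0.72, 0.10],
    "ITEM_B": [0.15, -0.65],
    "ITEM_C": [0.31, 0.22],
}
print(weak_items(loadings))
```

The absolute value is used because a strongly negative loading still indicates that the item is driven by the factor.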
The CAID and CITC were used to evaluate the internal consistency among the items. If an item had a negative effect on the internal consistency of its own dimension, Cronbach’s α coefficient increased greatly when the item was deleted. A CITC of < 0.4 indicated that an item was poorly correlated to the scale. In this circumstance, the item should be deleted.
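Both statistics can be computed directly from the item scores. The sketch below is one standard way of doing so, not necessarily the exact routine used in SPSS: the CITC is taken as the Pearson correlation between an item and the sum of the remaining items, and the CAID as Cronbach's α recomputed without that item. The data are invented.

```python
from statistics import mean, pvariance


def cronbach_alpha(items):
    """items: list of equal-length score lists, one list per item."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    item_variance = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_diagnostics(items):
    """Return one (CITC, alpha-if-item-deleted) pair per item."""
    results = []
    for i, scores in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(row) for row in zip(*rest)]
        results.append((pearson(scores, rest_total), cronbach_alpha(rest)))
    return results


# Hypothetical scores of six respondents on four items of one subdomain.
items = [
    [1, 2, 3, 4, 5, 5],
    [2, 2, 3, 4, 4, 5],
    [1, 3, 3, 5, 4, 5],
    [3, 5, 1, 2, 4, 2],  # does not follow the pattern of the other items
]
overall = cronbach_alpha(items)
for index, (citc, caid) in enumerate(item_diagnostics(items), start=1):
    flag = "review" if citc < 0.4 or caid > overall else "keep"
    print(f"item {index}: CITC={citc:.2f}, alpha if deleted={caid:.2f} -> {flag}")
```

In this toy data set the fourth item has a negative CITC, and removing it raises α well above the overall value, which is the pattern the paragraph describes.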
The representativeness of an item was measured by the correlation coefficient with its own subdomain. An item with a correlative value of < 0.6 was generally considered to be poorly correlated to the corresponding subdomain. Such an item was removed.
IRT is part of modern measurement theory and was proposed to overcome the defects of CTT. It is also called latent trait theory and has advantages in terms of item selection and test construction. It claims that the relationship between subjects’ abilities and their responses to an item can be described as a function. The basic task is to define this relationship. In brief, IRT can be viewed as a probabilistic method for discussing the relationship between subjects’ potential traits and their responses to items.
If we set θ as a subject’s ability, then p(θ) is the probability that the subject will respond to an item correctly. The functional relationship can be reflected by a curve called the item characteristic curve. We selected two important parameters on the curve: a reflects the degree of discrimination, and b indicates the item difficulty. A graded response model appropriate for hierarchical and continuous data was constructed considering the 5-point Likert scale used in this study, extending a unidimensional model to a multidimensional one. Five parameters were estimated in our study, namely a, b1, b2, b3, and b4, where b1 is the difficulty level parameter between Answers 1 and 2, and so on, and b1 < b2 < b3 < b4. Here, a must have a value of > 0.60, and b ranges from −3 to 3. Items supported by at least three methods were retained in the final CHF-PROM.
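To make the roles of a and b1 to b4 concrete, the sketch below evaluates the category probabilities of a graded response model for a single item. It assumes the usual logistic form without the 1.7 scaling constant, which the text does not specify, and it does not estimate parameters (that was done with Multilog in the study). The parameter values are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Probabilities of the five response categories for one item.

    theta      : latent trait level of the respondent
    a          : discrimination parameter (must be positive)
    thresholds : ordered difficulty parameters b1 < b2 < b3 < b4
    """
    if a <= 0:
        raise ValueError("discrimination a must be positive")
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Probability of responding in category k or higher, bracketed by 1 and 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def passes_irt_screen(a, thresholds, min_a=0.60, b_range=(-3.0, 3.0)):
    """Apply the screening rule from the text: a > 0.60 and -3 <= b <= 3."""
    low, high = b_range
    return a > min_a and all(low <= b <= high for b in thresholds)


# Hypothetical item parameters.
a, thresholds = 1.2, [-1.5, -0.5, 0.5, 1.5]
probs = grm_category_probs(0.0, a, thresholds)
print([round(p, 3) for p in probs], passes_irt_screen(a, thresholds))
```

Because each category probability is the difference between two adjacent cumulative curves, the ordering b1 < b2 < b3 < b4 is what guarantees that all five probabilities are positive.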
2. Validation of the final scale
We calculated Cronbach’s alpha coefficients for four fields and the total scale to measure the internal consistency of the CHF-PROM. Generally, a value of > 0.70 indicates that individual items provide an adequate contribution to the overall scale.
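For reference, the standard definition of Cronbach's alpha for a scale or domain of k items is given below. The symbols are notation introduced here rather than taken from the paper: s_i^2 is the variance of item i, and s_T^2 is the variance of the summed score over the k items.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} s_i^{2}}{s_T^{2}}\right)
```

The coefficient approaches 1 when the items covary strongly, so that the variance of the total score is much larger than the sum of the individual item variances.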
The patients’ opinions were typically consulted to validate the content with respect to how well the items met the empirical indexes of interest.
We subjected the factor structure of the scale to confirmatory factor analysis (CFA). The model was assessed with respect to the following relative goodness-of-fit statistics: root mean square error of approximation (values of < 0.08 indicated adequate fit and values of < 0.05 indicated close fit of the data to the model), normed fit index (values of ≥ 0.90), non-normed fit index (values of ≥ 0.90), incremental fit index (values of ≥ 0.90), comparative fit index (values of ≥ 0.90), and root mean square residual (values of < 0.09). We used LISREL 8.70 to assess the construct validity with CFA.
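The cutoffs listed above can be collected into a small checker. This is only a convenience sketch for reading CFA output: the abbreviations used as dictionary keys and the example values are assumptions, not results from the study.

```python
# Cutoffs as stated in the text; keys are conventional abbreviations.
CRITERIA = {
    "RMSEA": lambda v: v < 0.08,   # root mean square error of approximation
    "NFI": lambda v: v >= 0.90,    # normed fit index
    "NNFI": lambda v: v >= 0.90,   # non-normed fit index
    "IFI": lambda v: v >= 0.90,    # incremental fit index
    "CFI": lambda v: v >= 0.90,    # comparative fit index
    "RMR": lambda v: v < 0.09,     # root mean square residual
}


def check_fit(statistics):
    """Return {index name: True/False} for each supplied fit statistic."""
    return {name: CRITERIA[name](value) for name, value in statistics.items()}


# Hypothetical output of a CFA run for one domain.
example = {"RMSEA": 0.061, "NFI": 0.93, "NNFI": 0.95,
           "IFI": 0.96, "CFI": 0.96, "RMR": 0.052}
result = check_fit(example)
print(result, "all criteria met:", all(result.values()))
```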
We determined the discriminant validity by comparing the mean scores for every subdomain of the CHF-PROM among the healthy people and patients with CHF. We compared the differences using a t-test, with the significance level set at P < 0.05.
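A known-groups comparison of this kind can be sketched with the standard library alone. The text does not state which form of the t-test was used, so the example below assumes Welch's unequal-variance version, and it approximates the two-sided P value with the normal distribution, which is reasonable only for large groups such as those in the formal survey. The scores are invented.

```python
from statistics import NormalDist, mean, variance


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic, its degrees of freedom, and a
    large-sample (normal) approximation of the two-sided P value."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical mean subdomain scores for a few healthy subjects and patients.
healthy = [4.5, 4.2, 4.8, 4.6, 4.4, 4.7]
patients = [3.1, 2.8, 3.5, 3.0, 3.3, 2.9]
t, df, p = welch_t(healthy, patients)
print(f"t = {t:.2f}, df = {df:.1f}, significant at 0.05: {p < 0.05}")
```

For small groups, the P value should instead be taken from the t distribution with the computed degrees of freedom, for example with a statistics package.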
We evaluated the feasibility of the CHF-PROM by examining the response rate, completion rate, response time to completion, percentage of missing data, and score distribution. We considered response and return rates of < 85% to be inadequate and a completion time of 30 min to be acceptable. SPSS 16.0, Multilog 7.03, EpiData 3.1, and LISREL 8.70 were used to conduct the data analysis. The entire study flow diagram is presented in Fig. 1.
Generation of item pool
After consulting relevant literature and interviewing patients with CHF, we established four domains as described in the Methods section: physical domain, psychological domain, social domain, and therapeutic domain. These 4 domains were then divided into 12 subdomains and a pool of 67 items (see Additional file 1). The conceptual framework of the instrument is shown in Fig. 2.
Formation of preliminary scale
Establishment of the CHF-PROM was based on published literature and related questionnaires. Consultants were also needed to improve the validity of the questionnaire [3, 7, 8, 12,13,14,15]. According to the advice provided by patients and experts in this field, six items were removed (“PHD1. Do you feel that your limb is weak?”, “PHD15. Do you have constipation?,” “PSD13. Do you often check things over and over again?,” “PSD14. Do you often wash your hands or count over and over again?,” “PSD22. Do you feel that people do not judge your achievements properly?,” and “TRD6. Did you think the examinations are necessary?”), three items were added (“Do you feel that your illness is a burden to your family?,” “Do you know the side effects of the drugs?,” and “Are you worried about the side effects of the drugs?”), and one item was divided into two items (“PSD4. Do you feel less concentrated and forget things easily?”). As a result, we generated 65 items for the CHF-PROM.
The screening phase involved 105 patients and 50 healthy subjects. The patients with CHF had an average age of 69.16 ± 11.24 years. The normal subjects had an average age of 56.96 ± 14.96 years. The basic characteristics of the patients with CHF and healthy subjects are shown in Table 1. The demographic data were compared using the chi-square test for categorical variables.
First item-selection phase
Five statistical methods within the CTT and IRT were used to select the items. Items PHD3, PHD7, PSD12, and SOD9 were deleted according to the above-mentioned criteria. As a result, the initial scale contained 61 items, 10 subdomains, and 4 domains.
Second item-selection phase
As shown in Table 2, PHD9, PHD10, PHD14, PSD2, PSD18, PSD19, PSD20, PSD21, SOD1, and TRE1 were deleted according to the discrete trend (s < 0.96). PHD4, PHD5, PHD8, PHD9, PHD10, PHD13, PHD14, and PHD15 were removed according to the factor analysis. PHD9, PHD10, TRE11, and TRE12 were deleted because the correlation coefficient was < 0.6. We also deleted PHD16 and SOD6 based on the CAID method. SOD6, SOD8, TRE1, TRE2, TRE3, TRE4, TRE11, and TRE12 were eliminated according to IRT. Figure 3 shows the item characteristic curve matrix of each item. Items proposed by at least three methods were retained. The final scale contained 57 items, 12 subdomains, and 4 domains (see Additional file 2). The final construction frame is shown in Table 3.
Validation of the scale
The scale was validated in a large-scale sample. The sample size was determined based on Nunnally’s rule. The sample size was only slightly below the target sample size. Patients were enrolled from different departments of eight different hospitals in Shanxi Province, China. Some patients were not willing to participate in the questionnaire because of their physical condition at that time, fear of disclosing their privacy, and other factors. In these target hospitals, several departments of cardiology were participating in investigations using other psychological questionnaires and were therefore unwilling to take part in the survey. Bias may be introduced into the study results if inpatients with CHF participate in two questionnaires simultaneously. In total, 470 questionnaires were sent out and 467 were collected (98.50%). There were 460 valid questionnaires (patients with CHF, 360; healthy people, 100). The patients with CHF had an average age of 69.87 ± 10.60 years, and the healthy subjects had an average age of 57.06 ± 14.67 years. The participants’ baseline data are shown in Table 4. The demographic data were compared using the chi-square test for categorical variables.
Cronbach’s alpha coefficients for the four domains and overall scale are shown in Table 5. In general, this questionnaire showed great reliability.
The results of the CFA were as follows: PHD measurement model: 16 items corresponding to 3 latent variables; PSD measurement model: 21 items corresponding to 4 latent variables; SOD measurement model: 8 items corresponding to 2 latent variables; and TRD measurement model: 12 items corresponding to 3 latent variables. Table 6 shows the goodness-of-fit for the CFA. The results showed that the model correlated well with the reference standard. The parameter estimation results of the CFA are presented in Table 7.
In this survey, the scores of each subdomain in addition to the therapeutic domain and total score of the scale between the patients and healthy subjects showed significant differences (see Table 8). These differences indicated that the scale was able to distinguish people in different groups.
In the large-scale clinical investigation, the response rate of the CHF-PROM was 98.94%, and the effective rate of return was 98.92%. The average completion time of CHF-PROM was 15 min. The score distribution of each item was analyzed. No major floor or ceiling effects were found. Only 0.06% of the responses to the psychological domain were missing. These findings suggest that the CHF-PROM is feasible for use in clinical practice.
As a chronic disorder, CHF requires special management from patients and their families, including adjustment of daily habits, fluid management, and heart rate management. Based on detailed PROs, medical professionals can provide individual instructions to patients to improve their quality of life and reduce re-hospitalization and mortality rates. We established the present CHF-PROM because of the brevity of previous HF questionnaires, which were translated directly from abroad and focused little on social support and therapy status. We applied four domains (physical domain, psychological domain, social domain, and therapeutic domain) and performed a large-scale survey of healthy subjects and patients with CHF in 8 hospitals to generate this CHF-PROM, which can more fully reflect the health status of patients with CHF.
We developed the present CHF-PROM in compliance with the development principles and processes of international scales. The CHF-PROM was developed in three stages: generation of the item pool, a pilot survey to form the preliminary scale, and use of large-scale clinical trials to form the final scale. To ensure that each selected item was sensitive, representative, and independent, we adopted different statistical methods in the process of generating the scale. The average time spent performing PRO data collection was about 15 min. This is thought to have been an acceptable time for the inpatients. During this time period, the inpatients could complete the questionnaires and provide accurate responses. The time of data collection should be controlled when performing a questionnaire-based study. The timing of data collection might have influenced the responses.
The methods employed to develop related scales are still limited to CTT. Our study is innovative in that IRT was applied in addition to CTT. IRT has some advantages over CTT. Using IRT, estimation of parameters is independent of the number of measured subjects. It is also possible to indicate the accuracy of the test capability [38,39,40]. Besides statistical methods, clinical professional knowledge was also required during the process of item selection. The item “PHD18: Do you often feel nauseous?” met the requirements of statistical methods for item selection, but it did not describe a typical symptom of CHF; therefore, we deleted this item. Ten items were removed based on joint consideration of the CTT results, IRT results, and clinical knowledge. The final scale contained 57 items, 12 subdomains, and 4 domains.
We also evaluated the reliability, validity, and feasibility of the scale for 360 patients and 100 healthy subjects. The results showed that this novel scale is a reliable instrument. The CHF-PROM was generated to overcome the deficiencies in the existing HF scales. However, this study had some limitations. First, some problems exist in the personal basic information section of the scale. Economic income and consumption levels vary among different provinces and cities, making adoption of a single evaluating system of QoL inappropriate for patients with CHF. Previous studies have reported that patients’ incomes, living conditions, life events, and education levels are the main factors influencing mental health, and among them, income most strongly affects living conditions and life events. Thus, income and educational level are included in the basic information of the scale. Second, some problems exist in selection of the items. We removed four items with poor sensitivity, independence, representativeness, and discrimination in the preliminary experiment. Our results suggest that every aspect should be considered in the future design of relevant scales. Finally, although our large-scale survey has indicated that the CHF-PROM is a valid instrument, our samples were collected in only a limited area and are not completely representative of all patients with CHF. To further revise and improve the scale, more efforts are needed to recruit larger numbers of patients from different provinces and regions and even different countries for cross-language scale adjustment to develop a CHF-PROM with wider applicability. These adjusted versions will also need to be validated separately from this Chinese version.
In this study, we developed a CHF-PROM that showed better reliability, validity, and feasibility than previously established scales. The CHF-PROM provided the patients a greater chance to participate in treatment decisions, suggesting that PROs can be used in more clinical trials and diagnostic settings in the future. This will allow doctors to obtain more comprehensive medical information, and PROs will become an important indicator of the end-point in curative effects.
CAID: Cronbach’s α if item deleted
CFA: Confirmatory factor analysis
CHF-PROM: Patient-reported outcome measure for chronic heart failure
CITC: Corrected item-total correlation
CTT: Classical test theory
HRQoL: Health-related quality of life
IRT: Item response theory
Heart Failure Society of A, Lindenfeld J, Albert NM, et al. HFSA 2010 comprehensive heart failure practice guideline. J Card Fail. 2010;16:e1–194.
Ambrosy AP, Fonarow GC, Butler J, et al. The global health and economic burden of hospitalizations for heart failure: lessons learned from hospitalized heart failure registries. J Am Coll Cardiol. 2014;63:1123–33.
Blecker S, Paul M, Taksler G, et al. Heart failure-associated hospitalizations in the United States. J Am Coll Cardiol. 2013;61:1259–67.
Morrissey RP, Czer L, Shah PK. Chronic heart failure: current evidence, challenges to therapy, and future directions. Am J Cardiovasc Drugs. 2011;11:153–71.
Jessup M, Bozkurt B, Butler J, et al. 2013 ACCF/AHA guideline for the management of heart failure. J Am Coll Cardiol. 2013;62:147–239.
Nieminen MS, Dickstein K, Fonseca C, et al. The patient perspective: quality of life in advanced heart failure with frequent hospitalisations. Int J Cardiol. 2015;191:256–64.
Schowalter MG, Gelbrich S, Störk JP, et al. Generic and disease-specific health-related QoL in patients with chronic systolic heart failure: impact of depression. Clin Res Cardiol. 2013;102:269–78.
Parissis JT, Nikolaou M, Farmakis D, et al. Self-assessment of health status is associated with inflammatory activation and predicts long-term outcomes in chronic heart failure. Eur J Heart Fail. 2009;11:163–9.
U.S. Department of Health and Human Services FDA Center for drug evaluation and research, U.S. Department of Health and Human Services FDA Center for biologics evaluation and research, U.S. Department of Health and Human Services FDA Center for devices and radiological health. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims: draft guidance. Health Qual Life Outcomes 2006; 4(1):79.
Patrick DL, Chiang YP. Measurement of health outcomes in treatment effectiveness evaluations: conceptual and methodological challenges. Med Care. 2000;38(Suppl 9):II14–25.
Donaldson MS. Taking stock of health-related quality of life measurement in oncology practice in the United States. J Natl Cancer Inst Monogr. 2004;(33):155–67.
Ware JE. Conceptualization and measurement of health-related quality of life: comments on an evolving field. Arch Phys Med Rehabil. 2003;84(4 Suppl 2):S43–51.
Emery MP, Perrier LL, Acquadro C. Patient-reported outcome and quality of life instruments database (PROQOLID): frequently asked questions. Health Qual Life Outcomes. 2005;3:12.
Bradley C. Feedback on FDA’s February 2006 draft guidance on PRO measures from a developer of PRO measures. Health Qual Life Outcomes. 2006;4:78.
McHorney CA. Generic health measurement: past accomplishments and a measurement paradigm for the 21st century. Ann Intern Med. 1997;127(8 Pt 2):743–50.
White EB. Outcomes: essential information for clinical decision support: an interview with Ellen B. White. Interview by Melinda L. J Health Care Finance 1998; 24(3):71–81.
Dodd BG, De Ayala R, Koch WR. Computerized adaptive testing with polytomous items. Appl Psychol Meas. 1995;19(1):5–22.
Green CP, Porter CB, Bresnahan DR, et al. Development and evaluation of the Kansas City cardiomyopathy questionnaire: a new health status measure for heart failure. J Am Coll Cardiol. 2000;35(5):1245–55.
Rector TS, Cohn JN. Assessment of patient outcome with the Minnesota living with heart failure questionnaire: reliability and validity during a randomized, double-blind, placebo-controlled trial of pimobendan. Am Heart J. 1992;124(4):1017–25.
Bilbao A, Escobar A, García-Perez L, et al. The Minnesota living with heart failure questionnaire: comparison of different factor structures. Health Qual Life Outcomes. 2016;14(23):1–11.
Sijtsma K, Emons WHM, Bouwmeester S, et al. Nonparametric IRT analysis of quality-of-life scales and its application to the World Health Organization quality-of-life scale (WHOQOL-Bref). Qual Life Res. 2008;17:275–90.
U.S. Department of Health and Human Services FDA Center for Drug Evaluation and Research, U.S. Department of Health and Human Services FDA Center for Biologics Evaluation and Research, U.S. Department of Health and Human Services FDA Center for Devices and Radiological Health. Guidance for industry: patient-reported outcome measures: use in medical product development to support labeling claims: draft guidance. Health Qual Life Outcomes. 2006;11(4):79.
Nunnally JC. Psychometric theory. New York: McGraw-Hill; 1967.
Anthoine E, Moret L, Regnault A, et al. Sample size used to validate a scale: a review of publications on newly-developed patient reported outcomes measures. Health Qual Life Outcomes. 2014;176:1–10.
Paulsen A, Overgaard S, Lauritsen JM. Quality of data entry using single entry, double entry and automated forms processing--an example based on a study of patient-reported outcomes. PLoS One. 2012;7:e35087.
Cappelleri JC, Lundy JJ, Hays RD. Overview of classical test theory and item response theory for the quantitative assessment of items in developing patient-reported outcomes measures. Clin Ther. 2014;36:648–62.
Lai JS, Cook K, Stone A, et al. Classical test theory and item response theory/Rasch model to assess differences between patient-reported fatigue using seven-day and four-week recall periods. J Clin Epidemiol. 2009;62:991–7.
Waller J, Ostini R, Marlow LA, et al. Validation of a measure of knowledge about human papillomavirus (HPV) using item response theory and classical test theory. Prev Med. 2013;56:35–40.
Meads DM, Bentall RP. Rasch analysis and item reduction of the hypomanic personality scale. Personal Individ Differ. 2008;44:1772–83.
Zhang YB. Latent variable analysis. Beijing: Higher Education Press; 2009. p. 1–5.
Hambleton RK, Swaminathan H. Item response theory: principles and applications. Springer Science & Business Media; 2013.
Meads DM, McKenna SP, Doward LC, et al. Development and validation of the asthma life impact scale (ALIS). Respir Med. 2010;104:633–43.
Brady MJ, Cella DF, Mo F, et al. Reliability and validity of the functional assessment of Cancer therapy-breast quality-of-life instrument. J Clin Oncol. 1997;15:974–86.
Kim SH, Cohen AS. A comparison of linking and concurrent calibration under the graded response model. Appl Psychol Meas. 2002;26(1):25–41.
Broekman BF, Niti M, Nyunt MS, et al. Validation of a brief seven-item response bias-free geriatric depression scale. Am J Geriatr Psychiatry. 2011;19:589–96.
Thompson LE, Bekelman DB, Allen LA, et al. Patient-reported outcomes in heart failure: existing measures and future uses. Curr Heart Fail Rep. 2015;12:236–46.
Neal DJ, Corbin WR, Fromme K. Measurement of alcohol-related consequences among high school and college students: application of item response models to the Rutgers alcohol problem index. Psychol Assess. 2006;18(4):402–14.
Hambleton RK, Jones RW. Comparison of classical test theory and item response theory and their applications to test development. Educational Measurement: Issues and Practice. 1993;12(3):38–47.
Hagman BT, Kuerbis AN, Morgenstern J, et al. An item response theory (IRT) analysis of the short inventory of problems-alcohol and drugs (SIP-AD) among non-treatment seeking men-who-have-sex-with-men: evidence for a shortened 10-item SIP-AD. Addict Behav. 2009;34(11):948–54.
Patoo M, Allahyari AA, Moradi AR, et al. Persian version of functional assessment of Cancer therapy-breast (FACT-B) scale: confirmatory factor analysis and psychometric properties. Asian Pac J Cancer Prev. 2015;16(9):3799–803.
Fonarow GC. Clinical risk prediction tools in patients hospitalized with heart failure. Rev Cardiovasc Med. 2012;13(1):e14–23.
Kanatas A, Velikova G, Roe B, et al. Patient-reported outcomes in breast oncology: a review of validated outcome instruments. Tumori. 2012;98(6):678–88.
We are grateful for the cooperation of the community hospitals in Taiyuan City. We also thank Angela Morben, DVM, ELS, from Liwen Bianji, Edanz Editing China (www.liwenbianji.cn/ac), for editing the English text of a draft of this manuscript.
This study was funded by the National Natural Science Foundation of China (Grant No. 81273180) and the Key Research and Development Project of Shanxi Province (Grant No. 201603D321101).
Availability of data and materials
The study data are available from the corresponding author upon reasonable request.
Ethics approval and consent to participate
The study protocol received medical and ethical approval from Shanxi Medical University. All participants provided written informed consent and received compensation for their time and effort.
Consent for publication
All authors have approved the manuscript for publication.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.