Development and validation of a patient-reported outcome measure for stroke patients

Luo, Yanhong; Yang, Jie; Zhang, Yanbo

doi:10.1186/s12955-015-0246-0

Research article
Open access
Published: 08 May 2015



Health and Quality of Life Outcomes volume 13, Article number: 53 (2015)


Abstract

Background

Family support and patient satisfaction with treatment are crucial for aiding in the recovery from stroke. However, current validated stroke-specific questionnaires may not adequately capture the impact of these two variables on patients undergoing clinical trials of new drugs. Therefore, the aim of this study was to develop and evaluate a new stroke patient-reported outcome measure (Stroke-PROM) instrument for capturing more comprehensive effects of stroke on patients participating in clinical trials of new drugs.

Methods

A conceptual framework and a pool of items for the preliminary Stroke-PROM were generated by consulting the relevant literature and other questionnaires created in China and other countries, and by interviewing 20 patients and 4 experts to ensure that all germane parameters were included. During the first item-selection phase, classical test theory and item response theory were applied to a preliminary scale completed by 133 patients with stroke. During the item re-evaluation phase, classical test theory and item response theory were used again, this time with 475 patients with stroke and 104 healthy participants. During the scale assessment phase, confirmatory factor analysis was applied to the final scale of the Stroke-PROM using the same study population as in the second item-selection phase. Reliability, validity, responsiveness and feasibility of the final scale were tested.

Results

The final scale of Stroke-PROM contained 46 items describing four domains (physiology, psychology, society and treatment). These four domains were subdivided into 10 subdomains. Cronbach’s α coefficients for the four domains ranged from 0.861 to 0.908. Confirmatory factor analysis supported the validity of the final scale, and the model fit indices satisfied the prespecified criteria. Differences in the Stroke-PROM mean scores were significant between patients with stroke and healthy participants in nine subdomains (P < 0.001), indicating that the scale showed good responsiveness.

Conclusions

The Stroke-PROM is a patient-reported outcome multidimensional questionnaire developed especially for clinical trials of new drugs and is focused on issues of family support and patient satisfaction with treatment. Extensive data analyses supported the validity, reliability and responsiveness of the Stroke-PROM.

Background

Stroke is the second leading cause of mortality worldwide [1], and stroke survivors are often severely disabled for the rest of their lives [2]. More than 85% of strokes occur in developing countries [3]. Epidemiological surveys have shown that there are 1.5–2 million new cases of stroke each year in China. The age-adjusted annual incidence rate of stroke is 116–219 per 100,000 people, and the annual mortality rate from stroke is 58–142 per 100,000 people [4].

Stroke has considerable adverse physical and psychological impacts on patients over time [5,6]. For the diagnosis and treatment of stroke and its sequelae, therefore, purely objective indicators do not accurately measure the multifaceted impact of stroke on patients. Assessment of the effects of treatment on any individual patient should include the patient’s own evaluation of therapy, or patient-reported outcome (PRO) [7]. A PRO is any report of the status of a patient’s health condition that comes directly from the patient, without interpretation of the patient’s response by a clinician or anyone else [8,9].

In recent years, multiple measures, including generic and disease-specific measures, have been used to assess outcomes of patients with stroke. Generic instruments are useful for comparing quality-of-life impact across populations with different diseases; however, disease-specific tools are generally more responsive and sensitive to disease-specific issues and are therefore more appropriate for clinical trials in which specific therapeutic interventions are being evaluated [10,11]. Although PRO tools developed specifically for stroke do exist (e.g., Newcastle Stroke-Specific Quality of Life Measure; Stroke and Aphasia Quality of Life Scale-39 item version; Stroke Impact Scale version 2.0), a review of these instruments yielded no measure that captures PROs associated with family support and patient satisfaction with treatment, two particularly significant issues for many stroke survivors [12-15]. Given the absence of stroke-specific measures covering family support and treatment satisfaction, a more comprehensive multidimensional scale that evaluates all facets of health status in patients with stroke is needed.

Therefore, the aim of this study was to develop an understandable, reliable and valid PRO measure for patients with stroke that captures valuable data from the patient’s viewpoint. This article reports on the development of the initial pool of items, selection of the final item set, and evaluation of a new stroke patient-reported outcome measure (Stroke-PROM).

Methods

Ethics statement

The study protocol and the Stroke-PROM were reviewed and approved by the Medical Ethics Committee of Shanxi Medical University. Participants signed informed consent forms prior to study participation, and all were compensated for their time.

Study population and design

Patients were enrolled from nine different hospitals, communities, and rural areas in Shanxi province in China. Clinical investigators at all study sites recruited participants using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). Patients participating in this study were diagnosed with stroke by a physician and were not in the acute phase of stroke. The severity of poststroke sequelae in these patients varied from mild to severe. Individuals with tetraplegia, psychosis, or serious comorbidities (e.g., cancer) were excluded. Control participants were recruited from lists of patients who did not have cerebrovascular disease, cancer, or mental illness. Investigators helped patients with severe visual impairments fill in the questionnaires according to the patients’ verbal responses to items.

Ten patients with stroke were interviewed to identify potential items for use in the questionnaire. Five patients with stroke, three physician experts in stroke and one psychometric expert were interviewed for item revision and refinement to ensure that all items were appropriate and relevant. Five stroke patients were interviewed to evaluate their comprehension of each item. For the first item pool reduction, 135 patients with stroke were recruited from nine different hospitals, communities, and rural areas in Shanxi province; valid data from 133 participants were collected. For the item re-evaluation phase and the validation phase of the Stroke-PROM, 485 patients with stroke and 110 controls from the same nine geographical regions were recruited, but only 475 and 104, respectively, were available to participate in the study. There was no overlap in the participants who contributed to the first and second item-reduction processes [16,17].

Development of the Stroke-PROM

The Stroke-PROM was developed in four phases: (1) conceptual framework construction and preliminary item generation; (2) formation of the initial scale by the first item-selection process; (3) formation of the final scale through item re-evaluation in a second item-selection process; and (4) validation of the Stroke-PROM. Phase 1 involved a qualitative analysis, whereas the other three phases used quantitative analyses. A flowchart of this four-phase developmental process is shown in Figure 1.

Identifying the conceptual framework and preliminary item content

A comprehensive review of existing stroke questionnaires was performed to identify an appropriate conceptual framework (see Figure 2). Four domains and 10 subdomains were generated. In-depth open-ended interviews of 10 stroke patients (5 men and 5 women; ages: ≤45, n = 2; 45–65, n = 5; ≥65, n = 3) were conducted to identify potential items for the Stroke-PROM using the selected conceptual framework. Patients were interviewed about their symptoms, their main psychological burden, the effects of stroke on them and their families, and their evaluations of the therapeutic effect and medical workers. As a result, a bank of 62 potential items was generated. Four chief physicians and five patients (3 men and 2 women; ages: ≤45, n = 1; 45–65, n = 2; ≥65, n = 2; education: high school degrees or above), all of whom were recruited from the First Hospital of Shanxi Medical University and Second Hospital of Shanxi Medical University, participated in revising these 62 preliminary items using a content validity index (CVI) (62 items and scale structure described in Additional file 1: Appendices 1–1 and 1–2).

The CVI is widely used for quantifying content validity for scales. Item-level CVI (I-CVI) is calculated by having experts rate the relevance of each item to its own subdomain (1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, 4 = highly relevant). The I-CVI of each item is defined as the number of experts offering a rating of 3 or 4, divided by the number of experts.
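As an illustration of the I-CVI definition above, the short Python sketch below computes the proportion of raters who score an item 3 or 4. The function name and the ratings in the usage example are hypothetical and are not taken from Table 3.

```python
def item_cvi(ratings):
    """Item-level content validity index (I-CVI).

    ratings: relevance ratings on the 4-point scale
    (1 = not relevant ... 4 = highly relevant), one per rater.
    Returns the share of raters who gave a rating of 3 or 4.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 4")
    relevant = sum(1 for r in ratings if r >= 3)
    return relevant / len(ratings)


# Hypothetical ratings from nine raters for a single item.
example = [4, 3, 4, 2, 4, 3, 3, 4, 4]
print(f"I-CVI = {item_cvi(example):.3f}")  # 8 of 9 raters chose 3 or 4
```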

To adjust for chance agreement, the multi-rater kappa statistic (κ*) was adopted; it is calculated as follows [18,19]:

$$ P_c=\left[\frac{n!}{A!\,(n-A)!}\right]\times 0.5^{n} $$

where P_c is the probability of chance agreement, n is the number of experts, and A is the number of experts rating the item as relevant (a rating of 3 or 4). κ* was calculated from the I-CVI and the probability of chance agreement as follows:

$$ \kappa^{*}=\frac{\text{I-CVI}-P_c}{1-P_c} $$

Each item on the scale was then rated as “fair,” “good,” or “excellent” according to the following criteria: fair, κ* = 0.40–0.59; good, κ* = 0.60–0.74; excellent, κ* > 0.74. Any item rated “fair” was deleted.
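The chance-corrected rating can be sketched in a few lines of Python, following the two formulas above term by term. The function names are hypothetical, the band boundaries are applied with `>=` at 0.40 and 0.60, and the “poor” label for κ* < 0.40 is an assumption, since the text defines only the fair, good and excellent bands.

```python
from math import comb


def chance_agreement(n_raters, n_agree):
    """P_c = C(n, A) * 0.5**n: probability that A of n raters agree by chance."""
    return comb(n_raters, n_agree) * 0.5 ** n_raters


def modified_kappa(n_raters, n_agree):
    """kappa* = (I-CVI - P_c) / (1 - P_c), with I-CVI = A / n."""
    if n_raters < 1 or not 0 <= n_agree <= n_raters:
        raise ValueError("need n_raters >= 1 and 0 <= n_agree <= n_raters")
    i_cvi = n_agree / n_raters
    p_c = chance_agreement(n_raters, n_agree)
    return (i_cvi - p_c) / (1 - p_c)


def rate_item(kappa):
    """Rating bands from the text; 'poor' below 0.40 is an assumption."""
    if kappa > 0.74:
        return "excellent"
    if kappa >= 0.60:
        return "good"
    if kappa >= 0.40:
        return "fair"
    return "poor"


# Nine raters (for example, four physicians and five patients), varying agreement.
for agree in (9, 8, 6, 5):
    k = modified_kappa(9, agree)
    print(agree, round(k, 3), rate_item(k))
```

With nine raters, 8 agreeing raters give κ* of about 0.887 (excellent), 6 give about 0.601 (good) and 5 give about 0.410 (fair), which shows how quickly the chance correction bites as agreement drops.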

Five participants with stroke (2 men, 3 women; ages: ≤45, n = 1; 45–65, n = 2; ≥65, n = 2) were interviewed to evaluate their comprehension of each item. Items that were ambiguous, misunderstood, or rarely answered were reworded. The preliminary scale was developed after modifying the item pool based on the suggestions of the physician experts, psychometric expert and patients as well as on the outcomes of the comprehension tests for patients. In the end, the preliminary tool included 4 domains, 11 subdomains, and 60 items.

Formation of the initial scale

One hundred thirty-three patients were told that the aim of the questionnaire was to measure how much their stroke had affected them. For each issue presented in an item, patients responded using a five-point Likert scale to reflect how often they experienced the issue, where 0 = never, 1 = occasionally, 2 = about half of the time, 3 = often, and 4 = always. Scores of positively worded items were recoded as the original score plus 1, whereas scores of negatively worded items were recoded as 5 minus the original response. This recoding produced a score range for each item of 1 to 5, with a higher score reflecting a more positive PRO.
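The recoding rule above maps raw 0–4 responses onto a common 1–5 scale on which higher is always better. A minimal Python sketch follows; the function name is hypothetical, and which items count as positively worded is not listed in the text, so the example simply treats a symptom item as negatively worded.

```python
def recode(response, positively_worded):
    """Map a raw Likert response (0 = never ... 4 = always) to a 1-5 score.

    Positively worded items: score = response + 1.
    Negatively worded items: score = 5 - response.
    In both cases a higher score reflects a more positive PRO.
    """
    if response not in (0, 1, 2, 3, 4):
        raise ValueError("raw responses must be integers from 0 to 4")
    return response + 1 if positively_worded else 5 - response


# A symptom item such as "Do your hands tremble ...?" (treated here as negatively
# worded): answering "always" (4) yields the least favourable score.
print(recode(4, positively_worded=False))
```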

The item-reduction processes of the preliminary scale were based on both classical test theory (CTT) (e.g., dispersion [standard deviation], factor analysis, correlation coefficient, Cronbach’s α if item deleted [CAID] values, and corrected item-total correlation [CITC]) and item response theory (IRT). CTT was used to reduce the number of items of the Stroke-PROM in the first four of the following steps, and IRT was used in the fifth step.

In step 1, the standard deviation of the scores for every item was calculated. An item with a low standard deviation differentiates poorly among respondents and should be removed; thus, items with a standard deviation <0.96 were deleted in this study.
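Step 1 amounts to a simple filter on item score dispersion. The sketch below is hypothetical: the text does not say whether the sample or population standard deviation was used, so the sample SD from Python’s `statistics.stdev` is assumed, and the item ids and scores are invented.

```python
from statistics import stdev


def low_variability_items(scores_by_item, threshold=0.96):
    """Return ids of items whose sample SD of recoded 1-5 scores is below threshold."""
    return [item for item, scores in scores_by_item.items()
            if stdev(scores) < threshold]


# Hypothetical recoded scores for two items from five respondents.
scores = {
    "Q1": [1, 5, 3, 2, 4],  # widely spread responses (SD about 1.58)
    "Q2": [3, 3, 4, 3, 3],  # almost everyone answers the same (SD about 0.45)
}
print(low_variability_items(scores))
```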

In step 2, principal component factor analysis with varimax rotation was used to aid item reduction. The Kaiser–Meyer–Olkin measure of sampling adequacy was >0.5 [17]. Items with low factor loadings (<0.4), or with a loading on another factor close to their loading on their primary factor, were considered for removal.
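Given a loading matrix that has already been extracted and varimax-rotated (for example in SPSS), the screening rule in step 2 can be sketched as below. The text does not quantify “close to other factors”, so the 0.1 gap used to define a cross-loading is a hypothetical choice, as are the item ids and loadings.

```python
def flag_by_loadings(loadings, min_loading=0.4, cross_gap=0.1):
    """Flag items from a rotated loading matrix.

    loadings: dict mapping item id -> list of loadings, one per factor.
    An item is flagged if its largest absolute loading is below min_loading,
    or if its second-largest absolute loading is within cross_gap of the
    largest (a cross-loading). cross_gap = 0.1 is a hypothetical choice.
    """
    flagged = []
    for item, row in loadings.items():
        ranked = sorted((abs(x) for x in row), reverse=True)
        primary = ranked[0]
        secondary = ranked[1] if len(ranked) > 1 else 0.0
        if primary < min_loading or primary - secondary < cross_gap:
            flagged.append(item)
    return flagged


# Hypothetical varimax-rotated loadings on three factors.
loadings = {
    "Q1": [0.72, 0.10, 0.05],  # clean loading on factor 1
    "Q2": [0.35, 0.20, 0.10],  # largest loading below 0.4
    "Q3": [0.52, 0.48, 0.05],  # loads almost equally on two factors
}
print(flag_by_loadings(loadings))
```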

In step 3, an item was considered for deletion when the Pearson correlation coefficient between the item score and the score of its own subdomain was <0.6, indicating that the item did not represent the subdomain well.
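A minimal sketch of step 3 follows. It assumes the subdomain score is the plain sum of the subdomain’s items, including the item being tested (the text does not say whether the item was excluded), and uses a hand-written Pearson correlation so the example needs only the standard library. Item ids and scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def weakly_related_items(subdomain, threshold=0.6):
    """subdomain: dict item id -> scores, respondents in the same order.

    The subdomain score is the sum of all its items (the item itself included).
    """
    items = list(subdomain)
    n_resp = len(subdomain[items[0]])
    totals = [sum(subdomain[i][r] for i in items) for r in range(n_resp)]
    return [i for i in items if pearson(subdomain[i], totals) < threshold]


# Hypothetical scores of five respondents on a three-item subdomain.
subdomain = {
    "Q1": [1, 2, 3, 4, 5],
    "Q2": [2, 1, 4, 3, 5],
    "Q3": [3, 3, 1, 5, 2],
}
print(weakly_related_items(subdomain))
```

Here Q1 and Q2 correlate at about 0.94 and 0.73 with the subdomain score, while Q3 correlates at only about 0.27 and is flagged.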

In step 4, the internal consistency of items was evaluated using the CITC and CAID values. An item was considered to contribute substantially to the measured construct when its CITC value exceeded 0.45. The CAID value shows how much an item contributes to the reliability of the Stroke-PROM: if Cronbach’s α increases when the item is deleted, the item contributes poorly to internal consistency and should be deleted. Therefore, an item was deleted in the present study when its CITC value was <0.45 and its deletion increased Cronbach’s α [20-22].
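The step-4 rule combines two statistics: the corrected item-total correlation (the item against the sum of the other items) and Cronbach’s α recomputed without the item. The Python sketch below is a hypothetical illustration on invented data; it uses the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance) and needs at least three items so that α is defined after a deletion.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """items: list of per-item score lists (respondents in the same order)."""
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(resp) for resp in zip(*items)]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def citc_caid_screen(items, names, citc_min=0.45):
    """Return (alpha, items to delete) using the step-4 rule:
    delete when CITC < citc_min AND alpha rises if the item is dropped."""
    alpha_all = cronbach_alpha(items)
    to_delete = []
    for j, name in enumerate(names):
        rest = [col for i, col in enumerate(items) if i != j]
        rest_total = [sum(resp) for resp in zip(*rest)]
        citc = pearson(items[j], rest_total)  # corrected item-total correlation
        caid = cronbach_alpha(rest)           # Cronbach's alpha if item deleted
        if citc < citc_min and caid > alpha_all:
            to_delete.append(name)
    return alpha_all, to_delete


# Hypothetical scores of five respondents on three items.
items = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [3, 3, 1, 5, 2]]
alpha, drop = citc_caid_screen(items, ["Q1", "Q2", "Q3"])
print(round(alpha, 3), drop)
```

In this toy example Q2 also has a CITC below 0.45, but removing it would lower α, so only Q3 meets both conditions; this is why the rule requires both criteria rather than either one.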

In step 5, IRT was applied to reduce the number of items in the Stroke-PROM. The discrimination parameter (a) and the difficulty parameters (b) of each item were estimated. Generally, items with discrimination values <0.4 should be deleted. Because each item has five response categories, it has four difficulty parameters (b₁, b₂, b₃, b₄), which are expected to lie between −3 and 3; items with any difficulty value outside this range should be considered for removal [23].
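The text does not name the IRT model, but five ordered categories with four difficulty (threshold) parameters estimated in Multilog is consistent with Samejima’s graded response model, which is assumed in the sketch below. The function names, the optional 1.7 scaling constant and the parameter values are illustrative assumptions; in practice the parameters would come from the estimation software, and only the screening rule is taken from the text.

```python
from math import exp


def grm_category_probs(theta, a, thresholds, scale=1.0):
    """Category probabilities under Samejima's graded response model.

    theta: latent trait level; a: discrimination; thresholds: ordered b_1..b_m.
    Returns P(X = k | theta) for k = 0..m (m + 1 ordered categories).
    scale is the optional logistic scaling constant (1.0 or 1.7).
    """
    def p_at_or_above(b):
        # Probability of responding in the category tied to b or any higher one.
        return 1.0 / (1.0 + exp(-scale * a * (theta - b)))

    cumulative = [1.0] + [p_at_or_above(b) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def irt_flags(a, thresholds, a_min=0.4, b_min=-3.0, b_max=3.0):
    """Apply the step-5 screening rules to one item's estimated parameters."""
    reasons = []
    if a < a_min:
        reasons.append("low discrimination")
    if any(b < b_min or b > b_max for b in thresholds):
        reasons.append("difficulty outside [-3, 3]")
    return reasons


# Hypothetical parameter estimates for a five-category item.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -0.8, 0.5, 1.9])
print([round(p, 3) for p in probs], irt_flags(1.5, [-2.0, -0.8, 0.5, 1.9]))
print(irt_flags(0.3, [-3.6, -1.0, 0.2, 1.1]))
```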

Both the statistical results and the clinical relevance of items were taken into account before any item was deleted. The initial scale was obtained by removing the selected items from the preliminary scale.

Formation and validation of the final scale

Thus, an initial scale was generated following the evaluation and selection of items from the preliminary scale. To ensure the reliability and validity of each item included in this initial Stroke-PROM, the items were re-evaluated based on a second item-selection of the initial scale. The CTT and IRT were applied once again to re-evaluate the items in the initial Stroke-PROM using the data gathered from 475 stroke patients, generating the final scale. The final Stroke-PROM tool was then evaluated for validity, reliability, and responsiveness using the data obtained from these 475 stroke patients as well as 104 control participants.

Content validity

Content validity was achieved by referring to relevant literature, consulting questionnaires from China and other countries, interviewing 10 patients to identify potential items, and consulting with 5 patients, 3 physician experts and 1 psychometric expert for item revision and refinement to ensure that all items were appropriate and relevant. Content validity was confirmed using the CVI.

Construct validity

Confirmatory factor analysis with model fit indices was performed to investigate the factor structure of the scale [23]. The model was considered to fit well when the goodness-of-fit index (GFI), normed fit index (NFI), non-normed fit index (NNFI), incremental fit index (IFI) and comparative fit index (CFI) were all >0.9 and the root mean square residual (RMR) was <0.09. GFI, RMR, NFI and CFI range from 0 to 1.
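Four of the incremental indices can be computed from the chi-square statistics of the fitted model and of the baseline (independence) model, using their standard definitions; GFI and RMR require the full observed and model-implied covariance matrices and are simply supplied here. The sketch below checks the thresholds stated above. All chi-square values and the model degrees of freedom are hypothetical and are not those reported in Table 9; LISREL reports these indices directly.

```python
def incremental_fit(chi2_model, df_model, chi2_base, df_base):
    """NFI, NNFI (TLI), CFI and IFI from model and baseline (independence) chi-squares."""
    nfi = (chi2_base - chi2_model) / chi2_base
    nnfi = (chi2_base / df_base - chi2_model / df_model) / (chi2_base / df_base - 1)
    denom = max(chi2_model - df_model, chi2_base - df_base, 0.0)
    cfi = 1.0 if denom == 0 else 1 - max(chi2_model - df_model, 0.0) / denom
    ifi = (chi2_base - chi2_model) / (chi2_base - df_model)
    return {"NFI": nfi, "NNFI": nnfi, "CFI": cfi, "IFI": ifi}


def failed_criteria(indices):
    """Return the indices that miss the thresholds stated in the text."""
    failures = [name for name in ("GFI", "NFI", "NNFI", "IFI", "CFI")
                if not indices[name] > 0.9]
    if not indices["RMR"] < 0.09:
        failures.append("RMR")
    return failures


# Hypothetical values. With 46 items, the independence model has
# 46 * 47 / 2 - 46 = 1035 degrees of freedom.
fit = incremental_fit(chi2_model=1500.0, df_model=944, chi2_base=30000.0, df_base=1035)
fit.update({"GFI": 0.91, "RMR": 0.05})  # GFI and RMR need the covariance matrices
print({k: round(v, 3) for k, v in fit.items()}, failed_criteria(fit))
```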

Reliability

Cronbach’s α coefficients for the four domains and the total scale were calculated to measure the internal consistency of the Stroke-PROM. Generally, a Cronbach’s α coefficient ≥0.7 indicates an acceptable level of internal consistency.

Discriminant validity

The modified Rankin Scale, a frequently used scale for measuring the degree of disability and dependence in the daily activities of people who have had a stroke, was used as the stroke outcome measure in the present study. This ordered scale ranges from 0 (no symptoms) to 5 (severe disability). Discriminant validity was assessed by comparing the mean scores for every subdomain of the Stroke-PROM among healthy participants with those among groups of stroke patients as defined by the Rankin scale, except for the subdomain of treatment. The comparison of means was performed using analysis of variance, with the significance level set at p < 0.05. The rejection of the null hypothesis would indicate that the scale has the ability to differentiate between healthy controls and stroke patients with varying degrees of disability and dependence as defined by the modified Rankin scale.
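The known-groups comparison above rests on a one-way analysis of variance. The stdlib sketch below computes the F statistic and its degrees of freedom for one subdomain; the p-value would then come from the F distribution (for example `scipy.stats.f.sf(F, df_between, df_within)`, or `scipy.stats.f_oneway` directly). The group labels, including how Rankin levels are grouped, and all scores are hypothetical.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic for a dict of group label -> list of scores.

    Returns (F, df_between, df_within).
    """
    scores = [x for g in groups.values() for x in g]
    grand_mean = sum(scores) / len(scores)
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((x - means[label]) ** 2
                    for label, g in groups.items() for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical subdomain scores; the grouping of Rankin levels is illustrative only.
groups = {
    "healthy": [4, 5, 6],
    "mRS 0-1": [2, 3, 4],
    "mRS 2-3": [1, 2, 3],
}
print(one_way_anova(groups))
```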

Feasibility

The feasibility of the Stroke-PROM tool was evaluated by examining the response rate, completion rate and time to completion. Response and completion rates above 95% were deemed adequate, and completion times of 8 to 13 minutes were considered acceptable.
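The feasibility criteria can be checked mechanically once the counts are available. The text does not define the denominators, so the sketch below assumes that the response rate is returned/distributed questionnaires, that the completion rate is fully completed/returned questionnaires, and that the 8–13 minute window applies to the mean completion time; the function name and numbers are hypothetical.

```python
def feasibility(n_distributed, n_returned, n_completed, minutes):
    """Summarize feasibility against the thresholds stated in the text.

    Assumed definitions: response rate = returned / distributed,
    completion rate = fully completed / returned; the 8-13 minute window
    is applied to the mean completion time.
    """
    response_rate = n_returned / n_distributed
    completion_rate = n_completed / n_returned
    mean_minutes = sum(minutes) / len(minutes)
    acceptable = (response_rate > 0.95 and completion_rate > 0.95
                  and 8 <= mean_minutes <= 13)
    return response_rate, completion_rate, mean_minutes, acceptable


# Hypothetical counts and completion times.
print(feasibility(100, 98, 97, [8.0, 9.5, 9.2]))
```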

Data analysis software

Data analyses were conducted using SPSS 13.0, Multilog 7.03 and LISREL 8.70 software.

Results

Participant characteristics

Tables 1 and 2 show the characteristics of the 133 patients with stroke who completed the preliminary scale and of the 475 stroke patients and 104 control participants who completed the initial scale.

Table 1 Demographic characteristics of 133 patients with stroke in the first item-selection phase


Table 2 Demographic characteristics of 475 stroke patients and 104 controls in the second item-selection phase and the Stroke-PROM validation phase


The demographic characteristics of the participants shown in Tables 1 and 2 indicated that the stroke sample population consisted of more men than women, more than 75% of all participants were over 45 years of age, and more than 80% were married. Additionally, approximately 70% of all participants had junior high school education or less. Table 2 shows that the proportion of men was slightly higher among stroke patients than among healthy controls, and that the proportion of participants over 65 years old was slightly higher among stroke patients than among healthy controls. The average time since stroke diagnosis was approximately 6.3 months for patients in the first item-reduction phase and 7.2 months for those in the re-evaluation and validation phases.

Item generation

Four domains, 10 subdomains and a pool of 62 items were generated for the Stroke-PROM based on consulting relevant literature, examining other questionnaires, and interviewing 10 patients to ensure that all germane topics were included. The items and construction of the Stroke-PROM tool are described in Additional file 1: Appendices 1–1 and 1–2.

Four chief physicians and five patients (identified by different letters of the alphabet), all of whom had a high school education or above, participated in the revision of the 62 items of the Stroke-PROM by rating the items according to the CVI (see Table 3).

Table 3 Content validity index based on grade of patients and experts of preliminary Stroke-PROM


On the basis of the CVI results, advice from patients and experts, and the clinical relevance of items, seven items (PHD3, PHD5, PHD6, PHD9, PHD16, PSD18 and SOD1 in Additional file 1: Appendix 1-1) were deleted, and a cognition subdomain was added. Items PHD10 and PHD11 in Additional file 1: Appendix 1-1 were retained on the advice of patients and experts and in light of other stroke-specific scales, and were assigned to the new cognition subdomain. The following five items were also added: Have you felt any limb abnormalities [such as a burning sensation]?; Do your hands tremble when you reach for or pick up things?; Do you have trouble remembering the date?; When you see an object suddenly, do you struggle to bring its name to mind?; When others talk about your disease, do you prefer not to discuss it? (PHD2, PHD7, PHD10, PHD11 and PSD18 in Additional file 1: Appendix 2-1) [13-15]. The CVI values of the five added items were then calculated; all five had κ* values >0.74 and were therefore rated “excellent.”

Five stroke patients of varying educational levels were interviewed to evaluate their comprehension of each item. Items that were ambiguous, misunderstood or rarely answered were reworded using comprehension tests for patients with stroke. The preliminary scale was developed after modifying the item pool based on the advice of the experts and the outcomes of the comprehension tests. The preliminary scale included 4 domains, 11 subdomains and 60 items. The items were also reordered (see Additional file 1: Appendices 2–1 and 2–2).

Item reduction

The two-step item selection process is described in Tables 4 and 5. This iterative process resulted in a final version that comprised 46 items within 10 subdomains. (The deletion of the compliance subdomain is explained in the next section.) Each subdomain was named according to its constituent items.

Table 4 Results of the first item-selection phase using CTT and IRT


Table 5 Results of the second item-selection phase using CTT and IRT


First item-selection phase based on CTT and IRT

Five statistical methods (within CTT and IRT) were used to select items, and any item recommended for deletion by two or more methods was deleted. Final decisions to delete or retain each item were based on its item-selection results, its clinical importance, and other stroke-specific scales (see Table 4).
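The “two or more methods” rule is a simple vote count across the five screening steps, with a clinical override for items judged essential. The Python sketch below illustrates that logic; the method labels, item ids and flags are hypothetical and do not reproduce Table 4.

```python
from collections import Counter


def items_to_delete(flags_by_method, min_votes=2, retained_on_clinical_grounds=()):
    """Apply the 'two or more methods' rule.

    flags_by_method: dict method name -> iterable of item ids that method flags.
    retained_on_clinical_grounds: items kept despite the vote (expert override).
    """
    votes = Counter(item for flagged in flags_by_method.values() for item in set(flagged))
    return sorted(item for item, n in votes.items()
                  if n >= min_votes and item not in retained_on_clinical_grounds)


# Hypothetical flags from the five screening methods.
flags = {
    "SD": {"Q1", "Q2"},
    "factor analysis": {"Q1", "Q2", "Q3"},
    "item-subdomain r": {"Q3"},
    "CITC/CAID": set(),
    "IRT": {"Q4"},
}
print(items_to_delete(flags, retained_on_clinical_grounds={"Q1"}))
```

Here Q1, Q2 and Q3 each receive two votes and Q4 only one; Q1 is then kept on clinical grounds, mirroring the kind of expert override described for PHD8 in the next paragraph.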

As seen in Table 4, 12 items met the deletion criterion; however, PHD8 (Do you remember what happened two days ago?) was retained because previously published results indicate that this item is crucial for the assessment of cognition [15]. PSD1 (Are you more prone to worry since your illness?) and PSD2 (Do you get angry easily?) did not discriminate well, so PSD2 was deleted in accordance with the opinion of patients and experts. PSD15 (Have you felt depressed while in a cheerful atmosphere?) was deleted because it was not considered closely relevant to stroke. As a result, 13 items (PHD18, PHD19, PSD2, PSD5, PSD8, PSD9, PSD15, PSD20, SOD1, SOD2, THD1, THD2, THD3; Additional file 1: Appendix 2-1) were deleted. All items in the compliance subdomain (THD1, THD2, THD3) were deleted; thus, this subdomain was also removed [13-15].

Therefore, the initial scale contained 47 items, 10 subdomains, and 4 domains (see Additional file 1: Appendices 3–1 and 3–2).

Re-evaluation phase based on CTT and IRT

To ensure the reliability and validity of each item included in the initial scale, we re-evaluated the items in this scale in a second item selection. All items met the retention criteria except item THD3; thus, CTT and IRT analyses in the re-evaluation phase led to deletion of item THD3 (Are you satisfied with your medical expenses?) (see Table 5). As a result, the final scale contained 46 items, 10 subdomains and 4 domains (see Tables 6 and 7). This revision of the Stroke-PROM is described in Table 8.

Table 6 Bank of 46 items in the final Stroke-PROM


Table 7 Scale structure of the bank of 46 items of the final Stroke-PROM


Table 8 Process of revising the Stroke-PROM


Evaluation of the scale

The validity, reliability, and responsiveness of the remaining 46 items were assessed and the results are presented in the sections below.

Content validity

The content validity was achieved as outlined in the Methods and was confirmed using the values obtained for the CVIs (see Table 3 and “Item generation” in the Results).

Construct validity

We conducted confirmatory factor analysis (CFA) on the 46 Stroke-PROM items. All fit indices (GFI, RMR, NFI, NNFI, CFI, IFI) met the prespecified criteria (see Table 9).

Table 9 Goodness of fit statistics of the Stroke-PROM


Table 10 presents the 10 subdomains, their corresponding items and standardized factor loadings produced from the CFA. The standardized factor loadings for each of the 46 Stroke-PROM items were above 0.5, except for items PHD1, PHD2, and PHD3; however, these three items were recommended for retention by the results of CTT and IRT analyses. The results indicated that the 46 items showed salient loadings on their specific subdomains and that these 10 subdomains corresponded well to the 10 conceptualized in the design phase, indicating good construct validity.

Table 10 Maximum likelihood estimation of CFA for the Stroke-PROM


Reliability

Cronbach’s α coefficient ≥0.70 is considered acceptable for internal consistency. Cronbach’s α coefficient was 0.905 for the total score, and for the four domains, it ranged from 0.861 to 0.908. These results indicated high internal consistency (see Table 11).

Table 11 Cronbach’s α coefficient of four domains and total scale


Discriminant validity

The discriminant validity of each subdomain was examined by comparing mean scores across healthy participants and groups of stroke patients defined by their modified Rankin scores. As Table 12 shows, scores for 9 of the 10 subdomains differed significantly across healthy participants and stroke patients with different degrees of disability and dependence as defined by the modified Rankin scale. Because healthy participants were not receiving treatment and therefore could not answer the items in the treatment domain, they were not included in the comparison for the SAT subdomain; among stroke patients, SAT subdomain scores did not differ significantly across Rankin levels. Overall, the Stroke-PROM was able to differentiate between healthy participants and stroke patients with varying degrees of disability and dependence as defined by the modified Rankin scale.

Table 12 Subdomain scores obtained using the Stroke-PROM instrument in healthy controls and stroke patients with varying degrees of disability and dependence as defined by the modified Rankin scale (mean ± SD)


Feasibility

Both the response rate and the completion rate of the Stroke-PROM tool were more than 97%. The average completion time was 8.9 minutes.

Discussion

In this study, we developed and validated a Stroke-PROM for use in the evaluation of outcomes for patients with stroke. The US Food and Drug Administration (FDA) has highlighted the importance of the use of PRO in clinical trials and provided guidance regarding the development of PROMs [24]. The development strategy for the Stroke-PROM in this study complied with those guidelines. To the best of our knowledge, this is the first Stroke-PROM specifically developed and validated for use in clinical trials of new drugs for stroke that includes physical, psychological, social and therapeutic domains [25].

The most commonly used stroke-specific measures, the National Institutes of Health Stroke Scale and the Canadian Neurological Scale, are clinician-reported outcome measures that assess only the physical aspects of stroke [26-28]. Although the Stroke Impact Scale Version 2.0, the Stroke and Aphasia Quality of Life Scale-39, the Newcastle Stroke-Specific Quality of Life Measure and the Stroke-Specific Quality of Life measure are all multidimensional PRO measures, none of them assesses family support or patient satisfaction with treatment [13-15,29]. In contrast to these instruments, our instrument includes both subdomains and therefore fills a gap in stroke PRO measurement [13-15,29-31].

Stroke has considerable adverse physical and psychological impacts on patients over time. Stroke patients need help, understanding and care from their families [32]. Indeed, a growing body of research demonstrates the importance of family relationships for the recovery of functional capacity after stroke [33,34]. A stroke survivor’s family is often the most important source of long-term support during the patient’s recovery and treatment, and family support plays a significant role throughout the poststroke recovery period [35-38]. Family members can provide the stroke survivor with physical and emotional support, such as help with daily care and understanding [39]. Therefore, family support is a necessary addition to the Stroke-PROM.

Satisfaction with treatment is a main outcome measure in new drug clinical trials [24,40]. A Stroke-PROM tool can be used to measure treatment benefit or risk during clinical trials for medical products. Additionally, Stroke-PROM instruments provide optimal information from the patient’s perspective for use in drawing conclusions about the effectiveness of treatment [24]. Thus, the inclusion of a subdomain for treatment satisfaction provides an opportunity for new drug clinical trial participants to integrate into the overall evaluation the different aspects of their responses to treatment, including pain relief, function improvement, and side effects, as well as to provide feedback about the potential acceptability of a new drug and their overall trust in the drug treatment [24,41,42]. Therefore, the subdomain of satisfaction with treatment is also a prudent addition to the Stroke-PROM.

The Stroke-PROM presented here would complement existing stroke-specific measures and has particular value for extending our understanding of the impact of family support and patient satisfaction with treatment in clinical trials of new drugs for stroke. During clinical trials, the Stroke-PROM can be used to simultaneously measure the effect of a medical intervention on several concepts, that is, the measured parameter, such as a symptom or group of symptoms, the medical intervention effects on a particular function or group of functions, or a group of symptoms or functions shown to measure the severity of a health condition. The use of the Stroke-PROM as an outcome measure in clinical trials may facilitate evaluation of the effectiveness across several therapeutic modalities. From the researcher’s perspective, the scale may capture the patient’s experience and treatment benefit or risk, assist researchers in determining which patients with stroke benefit meaningfully from treatment, and facilitate between-trial comparisons [24]. From the pharmaceutical company’s perspective, such an instrument may increase the efficiency of discussions with the FDA during the medical product development process, and provide optimal information from a patient’s perspective for use in making conclusions about treatment effects at the time of medical product approval [24]. From a regulatory perspective, the Stroke-PROM tool may provide a standardized method for assessing treatment effectiveness on basic symptoms so that claims can be supported with PRO evidence in medical product clinical trials [43].

In contrast to the development of other stroke-specific instruments, our study used I-CVI, IRT and CFA as rigorous evidence for item selection, validity and reliability. First, content validity is an essential step in the development of any new scale. None of the previously developed instruments for stroke used statistical methods such as CVI to quantify content validity as was done for the Stroke-PROM tool in our study. The FDA places particular emphasis on demonstrating content validity using open-ended interviews with patients [24]. Identifying the items of the Stroke-PROM based on a review of the literature and other stroke questionnaires, face-to-face interviews with patients, discussions with stroke professionals and the CVI further strengthened the content validity of the preliminary scale in our study.

Second, in the item-selection phase, analyses based on both CTT and IRT were used to delete items. IRT-based analysis has been used more heavily than CTT-based analysis in the construction of scales for measuring subjective attributes, and it allows more accurate examination of the features of each scale item. Existing stroke-specific instruments have focused exclusively on CTT statistics (e.g., exploratory factor analysis, Cronbach’s α coefficient) [13,29], and to our knowledge no other stroke-specific instrument has been developed using IRT. CTT statistics are associated with certain disadvantages, whereas methods based on IRT offer several advantages for refining items and therefore improve on CTT [44].

In our study, both CTT and IRT analyses were repeated in the second sample to finalize the item content, and the results showed that the final Stroke-PROM had a high degree of reliability and validity.
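On the CTT side, the item-level statistics named in the Abbreviations (Cronbach’s α, the corrected item-total correlation, CITC, and Cronbach’s α if item deleted, CAID) can all be computed from a respondents-by-items score matrix. The following is a minimal pure-Python sketch with hypothetical function names and data; any deletion thresholds (for example, a minimum CITC) are study-specific choices that this section does not restate.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondent rows of k >= 2 item scores."""
    k = len(scores[0])
    items = list(zip(*scores))
    totals = [sum(row) for row in scores]
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_statistics(scores):
    """(CITC, CAID) for each item; needs at least 3 items so CAID is defined."""
    stats = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [sum(row) - row[j] for row in scores]  # total score without item j
        reduced = [row[:j] + row[j + 1:] for row in scores]
        stats.append((pearson(item, rest), cronbach_alpha(reduced)))
    return stats


# Hypothetical responses: 5 respondents x 4 items.
scores = [
    [4, 3, 4, 2],
    [2, 2, 3, 1],
    [5, 4, 5, 3],
    [3, 3, 2, 2],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(scores), 3))
for j, (citc, caid) in enumerate(item_statistics(scores), start=1):
    print(f"item {j}: CITC={citc:.2f}, CAID={caid:.2f}")
```

An item with a low CITC, or one whose removal raises α (CAID above the overall α), is a typical CTT signal for deletion; repeating the calculation in a second sample, as described above, checks that such decisions are not artifacts of one dataset.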

Third, CFA supported the instrument’s hypothesized internal structure, confirming that the Stroke-PROM is multidimensional. No other stroke-specific instrument has established construct validity with CFA.
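Several of the CFA fit indices listed in the Abbreviations (NFI, NNFI, IFI and CFI) are simple functions of the chi-square statistic and degrees of freedom of the fitted model and of the baseline (independence) model. The sketch below uses the standard textbook formulas with hypothetical inputs; it does not reproduce the Stroke-PROM’s reported fit. GFI and RMR are omitted because they require the observed and model-implied covariance matrices rather than summary chi-squares.

```python
from dataclasses import dataclass


@dataclass
class ChiSquare:
    value: float  # chi-square statistic
    df: int       # degrees of freedom


def incremental_fit(model: ChiSquare, baseline: ChiSquare) -> dict:
    """Incremental fit indices comparing a fitted CFA model with the
    baseline (independence) model."""
    cm, dm = model.value, model.df
    cb, db = baseline.value, baseline.df
    nfi = (cb - cm) / cb
    nnfi = (cb / db - cm / dm) / (cb / db - 1)  # also known as the Tucker-Lewis index
    ifi = (cb - cm) / (cb - dm)
    # CFI is undefined (0/0) if both models fit perfectly; not handled in this sketch.
    cfi = 1 - max(cm - dm, 0) / max(cb - db, cm - dm, 0)
    return {"NFI": nfi, "NNFI": nnfi, "IFI": ifi, "CFI": cfi}


# Hypothetical values: fitted model chi2 = 100 on 35 df, baseline chi2 = 1000 on 45 df.
fit = incremental_fit(ChiSquare(100.0, 35), ChiSquare(1000.0, 45))
print({name: round(v, 3) for name, v in fit.items()})
# {'NFI': 0.9, 'NNFI': 0.912, 'IFI': 0.933, 'CFI': 0.932}
```

All four indices approach 1 as the fitted model’s chi-square shrinks relative to the baseline; values of about 0.90 or above are conventionally read as acceptable, though the criterion applied in this study is not restated here.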

In summary, our results showed that the scale was valid, reliable and feasible, and that it discriminated well between healthy controls and stroke patients with varying degrees of disability and dependence, as defined by the modified Rankin scale. Although the Stroke-PROM was developed primarily to evaluate the therapeutic effects of new drugs in clinical trials, this study showed that it can also differentiate patients with stroke from healthy controls. We therefore believe that the instrument has an important role in clinical practice as well as in clinical trials.
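Discrimination between patients and healthy controls rests on comparing subdomain mean scores between the two groups. This section does not state which test was used, so the sketch below assumes one common choice: Welch’s two-sample t statistic with Welch–Satterthwaite degrees of freedom, reported alongside a standardized mean difference (Cohen’s d with a pooled SD). Function names and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled)


# Hypothetical subdomain scores for two groups.
patients = [62, 55, 70, 48, 66, 59]
controls = [78, 82, 75, 88, 80, 79]
t, df = welch_t(patients, controls)
print(round(t, 2), round(df, 1), round(cohens_d(patients, controls), 2))
```

A p-value then follows from the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=False)` returns the same Welch t statistic together with its p-value. The effect size helps because, with large samples, even small group differences reach P < 0.001.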

Limitations and further development

This study has several limitations, which we plan to address in future studies.

First, our evaluation of validity did not explicitly address criterion validity. Most of the patients with stroke were elderly, and our stroke experts advised that completing two or more scales would have been a significant burden for them. Stroke patients often experience disturbances of consciousness and physical restlessness, so adding further tests could cause test fatigue and thereby reduce the validity and reliability of measurement. Therefore, rather than asking patients to complete more than one scale, we deferred a thorough examination of criterion validity to a future investigation.

Second, test–retest reliability was not assessed as part of the validation process, partly because of the additional burden it would have placed on patients and partly because of the difficulty of following up patients in their home communities and rural areas. We therefore demonstrated reliability only through internal consistency; however, we evaluated the reliability of items at two points in the process: during item selection and during scale evaluation.

Third, the stroke patient sample differed slightly from the healthy participant sample in two ways: the stroke patient population had a higher proportion of males and of individuals over 65 years old. Future studies should seek to balance these groups.

Fourth, because of limited resources (both funding and personnel), the samples may not be representative of the entire population of patients with stroke: all participants were recruited from Shanxi Province in northern China. Future studies should therefore evaluate the reliability and validity of the Stroke-PROM in a nationwide sample.

Finally, the Stroke-PROM was administered only to native Chinese speakers. Further work is therefore required to test the strengths and weaknesses of this instrument across various national, cultural and language contexts.

Conclusions

Our results provide evidence of satisfactory reliability and validity for the Stroke-PROM. However, the instrument will require further revision and improvement through testing in different populations, and this ongoing process will include further validity and reliability testing across the various applications of the instrument. The Stroke-PROM is not meant to replace existing stroke-specific measures but to provide additional valuable information on patients with stroke. This new instrument may be helpful in both routine medical practice and clinical research.

Abbreviations

PRO: Patient-reported outcome
PROM: Patient-reported outcome measure
HRQOL: Health-related quality of life
EQ-5D: EuroQol 5 Dimension
SF-36: Short Form 36
SF-12: Short Form 12
SIP: Sickness Impact Profile
NEWSQOL: Newcastle Stroke-Specific Quality of Life Measure
NIHSS: National Institutes of Health Stroke Scale
SAQOL-39: Stroke and Aphasia Quality of Life Scale, 39-item version
SIS: Stroke Impact Scale & Stroke Toolbox
SS-QOL: Stroke-Specific Quality of Life Measure
CNS: Canadian Neurological Scale
mRS-SI: Structured Interview for the Modified Rankin Scale
CTT: Classical test theory
SD: Standard deviation
CAID: Cronbach’s α if item deleted
IRT: Item response theory
CITC: Corrected item-total correlation
CVI: Content validity index
I-CVI: Item-level CVI
CFA: Confirmatory factor analysis
KMO: Kaiser-Meyer-Olkin
GFI: Goodness-of-fit index
NFI: Normed fit index
NNFI: Non-normed fit index
IFI: Incremental fit index
CFI: Comparative fit index
RMR: Root mean square residual
PHD: Physical domain
PSD: Psychological domain
SOD: Social domain
THD: Therapeutic domain
SOS: Somatic symptom
COG: Cognition
VEC: Verbal communication
SHS: Self-help skills
ANX: Anxiety
DEP: Depressed
AVO: Avoid
SOC: Social contacts
FAS: Family support
COM: Compliance
SAT: Satisfaction
FDA: Food and Drug Administration

References

1. Kaste M. Every day is a world stroke day act now, be a stroke champion and a torchbearer. Stroke. 2010;41:2449–50.
2. Markku K. World stroke day. Stroke. 2011;42:2715.
3. Claiborne Johnston S, Shanthi M, Mathers CD. Global variation in stroke burden and mortality: estimates from monitoring, surveillance, and modeling. Lancet Neurol. 2009;8:345–54.
4. Cerebrovascular disease group of the neurology branch of Chinese medical association for writing ischemic stroke secondary prevention guidelines. The secondary prevention guidelines of Chinese ischemic stroke and transient ischemic attack. Chin J Neurol. 2010;43(2):154–60.
5. Jones P, Harding G, Wiklund I, Berry P, Leidy N. Improving the process and outcome of care in COPD: development of a standardised assessment tool. Prim Care Respir J. 2009;18(3):208–15.
6. Rinu Susan R, Sarma PS, Pandian JD. Psychosocial problems, quality of life, and functional independence among Indian stroke survivors. Stroke. 2010;41:2932–7.
7. Björn S, Mika N, Per-Olof E, Paul H, Eva Wikström J. Validation of the clinical COPD questionnaire (CCQ) in primary care. Health Qual Life Outcomes. 2009;7:26.
8. U.S. Department of Health and Human Services, Food and Drug Administration, Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Center for Devices and Radiological Health (CDRH). Guidance for industry, patient-reported outcome measures: use in medical product development to support labeling claims. Clinical/Medical 2009. [http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/default.htm]
9. Deshpande PR, Surulivel R, Lakshmi Sudeepthi B, Abdul Nazir CP. Patient-reported outcomes: a new era in clinical research. Perspect Clin Res. 2011;2(4):136–44.
10. Reda AA, Daniel K, Janwillem WH K, Geertjan W, Constant P, van Schayck N. Reliability and validity of the clinical COPD questionnaire and chronic respiratory questionnaire. Respir Med. 2010;104:1675–82.
11. Windisch W, Budweiser S, Heinemann F, Pfeifer M, Rzehak P. The severe respiratory insufficiency questionnaire was valid for COPD patients with severe chronic respiratory failure. J Clin Epidemiol. 2008;61:848–53.
12. The Patient-Reported Outcome and Quality of Life Instruments Database. [http://www.proqolid.org/]
13. Buck D, Jacoby A, Massey A, Steen N, Sharma A, Ford GA. Development and validation of NEWSQOL, the Newcastle stroke-specific quality of life measure. Cerebrovasc Dis. 2004;17:143–52.
14. Hilari K, Byng S, Lamping DL, Smith SC. Stroke and aphasia quality of life scale-39 (SAQOL-39): evaluation of acceptability, reliability, and validity. Stroke. 2003;34:1944–50.
15. Duncan PW, Dennis W, Sue Min L, Dallas J, Susan E, Louise Jacobs L. The stroke impact scale version 2.0: evaluation of reliability, validity, and sensitivity to change. Stroke. 1999;30:2131–40.
16. Xiaoxv Y, Xiaochen T, Yeqing T, Rui Y, Yunxia W, Shiyi C, et al. Development and validation of a tuberculosis medication adherence scale. PLoS One. 2012;7(12):e50328.
17. Richieri R, Boyer L, Reine G, Loundou A, Auquier P, Lançon C, et al. The schizophrenia caregiver quality of life questionnaire (S-CGQoL): development and validation of an instrument to measure quality of life of caregivers of individuals with schizophrenia. Schizophr Res. 2011;126:192–201.
18. Polit DF, Cheryl Tatano B, Owen SV. Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Res Nurs Health. 2007;30:459–67.
19. Wynd CA, Bruce S, Michelle Atkins S. Two quantitative approaches for estimating content validity. West J Nurs Res. 2003;25(5):508–18.
20. Yusoff MSB. The Dundee ready educational environment measure: a confirmatory factor analysis in a sample of Malaysian medical students. Int J Humanities Social Sci. 2012;2(16):313–21.
21. Steven DM. Reliability: on the reproducibility of assessment data. Metric Med Educ. 2004;38:1006–12.
22. Yusoff MSB, Abdul Rahim AF, Yaacob MJ. The development and validity of the medical student stressor questionnaire (MSSQ). ASEAN Journal of Psychiatry. 2010;11(1).
23. tot van Nispen Pannerden SC, Candel MJJM, Zwakhalen SMG, Hamers JPH, Curfs LMG, Berger MPF. An item response theory-based assessment of the pain assessment checklist for seniors with limited ability to communicate (PACSLAC). J Pain. 2009;10(8):844–53.
24. U.S. Department of Health and Human Services, Food and Drug Administration Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, Center for Devices and Radiological Health. Guidance for industry patient-reported outcome measures: use in medical product development to support labeling claims. Clin/Med. 2009;12:1–39.
25. van der Molen T, Brigitte WM W, Siebrig S, ten Hacken NHT, Postma DS, Juniper EF. Development, validity and responsiveness of the Clinical COPD Questionnaire. Health Qual Life Outcomes. 2003;1:13.
26. van Swieten JC, Koudstaal PJ, Visser MC, Schouten HJ, van Gijn J. Interobserver agreement for the assessment of handicap in stroke patients. Stroke. 1988;19:604–7.
27. Lyden PD, Mei L, Levine SR, Brott TG, Broderick J. A modified national institutes of health stroke scale for use in stroke clinical trials. Stroke. 2001;32:1310–7.
28. Cote R, Hachinski VC, Shurvell BL, Norris JW, Wolfson C. The Canadian neurological scale: a preliminary study in acute stroke. Stroke. 1986;17:731–7.
29. Williams LS, Weinberger M, Harris LE, Clark DO, Biller J. Development of a stroke-specific quality of life scale. Stroke. 1999;30:1362–9.
30. Wang Y, Zhao H, Liu Z, Liu B. Reliability, validity and response of patient reported outcome scale in stroke patients with spastic paralysis. Chinese General Practice. 2009;12:1168–70.
31. Buck D, Jacoby A, Massey A, Ford G. Evaluation of measures used to assess quality of life after stroke. Stroke. 2000;31:2004–10.
32. Watanabe Y, Araki S, Kurihara M. Health-related quality of life of stroke patients’ families during the patients’ hospitalization: a pilot study in Japan. Int J Rehabil Res. 2003;26(1):43–5.
33. Palmer S, Glass TA. Family function and stroke recovery: a review. Rehabil Psychol. 2003;48(4):255–65.
34. Cameron JI, Gary N, Gignac MAM, Mark B, Grace W, Theresa G, et al. Randomized clinical trial of the timing it right stroke family support program: research protocol. Health Serv Res. 2014;14:18.
35. Clark MS, Sally R, Adrian W. A randomized controlled trial of an education and counselling intervention for families after stroke. Clin Rehabil. 2003;17:703–12.
36. Clark PC, Dunbar SB, Shields CG, Viswanathan B, Aycock DM, Wolf SL. Influence of stroke survivor characteristics and family conflict surrounding recovery on caregivers’ mental and physical health. Nurs Res. 2004;53(6):406–13.
37. Lawrence M, Kinn S. Needs, priorities, and desired rehabilitation outcomes of family members of young adults who have had a stroke: findings from a phenomenological study. Disabil Rehabil. 2013;35(7):586–95.
38. Anne V-M, Marcel P, Jan Willem G, Berlekom SBV, Trudi Van Den B, Eline L. Rehabilitation of stroke patients needs a family-centred approach. Disabil Rehabil. 2006;28(24):1557–61.
39. Han B, Haley WE. Family caregiving for patients with stroke: review and analysis. Stroke. 1999;30:1478–85.
40. Francesco M, Harin Padma N, Andrew M, Brock GB, Gregory B, Sanjeev A, et al. Tadalafil in the treatment of erectile dysfunction following bilateral nerve sparing radical retropubic prostatectomy: a randomized, double-blind, placebo controlled trial. J Urol. 2004;172:1036–41.
41. Dworkin RH, Turk DC, Wyrwich KW, Beaton D, Cleeland CS, Farrar JT, et al. Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain. 2008;9(2):105–21.
42. Burgio KL, Locher JL, Goode PS, Michael Hardin J, Joan McDowell B, Marianne D, et al. Behavioral vs drug treatment for urge urinary incontinence in older women: a randomized controlled trial. JAMA. 1998;280(23):1995.
43. Linda A, Krithika R, Polyxane M, Carolyn B, Rod B, Robin C. Development and validation of the Impact of Dry Eye on Everyday Life (IDEEL) questionnaire, a patient-reported outcomes (PRO) measure for the assessment of the burden of dry eye on patients. Health Qual Life Outcomes. 2011;9:111.
44. Hagman BT, Kuerbis AN, Morgenstern J, Bux DA, Parsons JT, Heidinger BE. An item response theory (IRT) analysis of the short inventory of problems-alcohol and drugs (SIP-AD) among non-treatment seeking men-who-have-sex-with-men: evidence for a shortened 10-item SIP-AD. Addict Behav. 2009;34:948–54.

Acknowledgements

This research was supported by a grant from the National Natural Science Foundation of China (grant no. 81273180).

Author information

Authors and Affiliations

Department of Health Statistics, School of Public Health, Shanxi Medical University, 56 South XinJian Road, Taiyuan, Shanxi Province, 030001, People’s Republic of China
Yanhong Luo, Jie Yang & Yanbo Zhang

Corresponding author

Correspondence to Yanbo Zhang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors participated in the design of the study; YHL participated in data analysis and drafted the manuscript; JY collected and analysed data; YBZ developed the original concept for this study, supervised the data analysis and revised the manuscript. All authors read and approved the final manuscript for this study.

Additional file

Additional file 1:

Appendix 1–1. Bank of 62 preliminary items of the Stroke-PROM.
Appendix 1–2. Scale structure of the bank of 62 preliminary items of the Stroke-PROM.
Appendix 2–1. Bank of 60 items of the preliminary Stroke-PROM.
Appendix 2–2. Scale structure of the bank of 60 items of the preliminary Stroke-PROM.
Appendix 3–1. Bank of 47 items of the initial Stroke-PROM.
Appendix 3–2. Scale structure of the bank of 47 items of the initial Stroke-PROM.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Luo, Y., Yang, J. & Zhang, Y. Development and validation of a patient-reported outcome measure for stroke patients. Health Qual Life Outcomes 13, 53 (2015). https://doi.org/10.1186/s12955-015-0246-0

Received: 22 November 2014
Accepted: 17 April 2015
Published: 08 May 2015
DOI: https://doi.org/10.1186/s12955-015-0246-0
