Development and validation of a patient-reported outcome measure for stroke patients

Background Family support and patient satisfaction with treatment are crucial for aiding in the recovery from stroke. However, current validated stroke-specific questionnaires may not adequately capture the impact of these two variables on patients undergoing clinical trials of new drugs. Therefore, the aim of this study was to develop and evaluate a new stroke patient-reported outcome measure (Stroke-PROM) instrument for capturing more comprehensive effects of stroke on patients participating in clinical trials of new drugs. Methods A conceptual framework and a pool of items for the preliminary Stroke-PROM were generated by consulting the relevant literature and other questionnaires created in China and other countries, and interviewing 20 patients and 4 experts to ensure that all germane parameters were included. During the first item-selection phase, classical test theory and item response theory were applied to an initial scale completed by 133 patients with stroke. During the item-revaluation phase, classical test theory and item response theory were used again, this time with 475 patients with stroke and 104 healthy participants. During the scale assessment phase, confirmatory factor analysis was applied to the final scale of the Stroke-PROM using the same study population as in the second item-selection phase. Reliability, validity, responsiveness and feasibility of the final scale were tested. Results The final scale of Stroke-PROM contained 46 items describing four domains (physiology, psychology, society and treatment). These four domains were subdivided into 10 subdomains. Cronbach’s α coefficients for the four domains ranged from 0.861 to 0.908. Confirmatory factor analysis supported the validity of the final scale, and the model fit index satisfied the criterion. 
Differences in the Stroke-PROM mean scores were significant between patients with stroke and healthy participants in nine subdomains (P < 0.001), indicating that the scale showed good responsiveness. Conclusions The Stroke-PROM is a patient-reported outcome multidimensional questionnaire developed especially for clinical trials of new drugs and is focused on issues of family support and patient satisfaction with treatment. Extensive data analyses supported the validity, reliability and responsiveness of the Stroke-PROM. Electronic supplementary material The online version of this article (doi:10.1186/s12955-015-0246-0) contains supplementary material, which is available to authorized users.


Background
Stroke is the second leading cause of mortality worldwide [1], and stroke survivors are often severely disabled for the rest of their lives [2]. More than 85% of strokes occur in developing countries [3]. Epidemiological surveys have shown that there are 1.5-2 million new cases of stroke each year in China. The age-adjusted annual incidence rate of stroke is 116-219 per 100,000 people, and the annual mortality rate from stroke is 58-142 per 100,000 people [4].
Stroke has considerable adverse physical and psychological impacts on patients over time [5,6]. For the diagnosis and treatment of stroke and its sequelae, therefore, purely objective indicators do not accurately measure the multifaceted impact of stroke on patients. Assessment of the effects of treatment on any individual patient should include the patient's own evaluation of therapy, or patient-reported outcome (PRO) [7]. A PRO is any report of the status of a patient's health condition that comes directly from the patient, without interpretation of the patient's response by a clinician or anyone else [8,9].
In recent years, multiple measures, including generic and disease-specific measures, have been used to assess outcomes of patients with stroke. Generic instruments are useful for comparing quality of life impact in populations with different diseases; however, disease-specific tools are generally more responsive and sensitive to disease-specific issues and are therefore more appropriate for clinical trials in which specific therapeutic interventions are being evaluated [10,11]. Although PRO tools developed specifically for stroke do exist (e.g., Newcastle Stroke-Specific Quality of Life Measure; Stroke and Aphasia Quality of Life Scale-39 item version; Stroke Impact Scale version 2.0), a review of these instruments yielded no measure that captures PRO associated with family support and patient satisfaction with treatment, two particularly significant issues for many stroke survivors [12][13][14][15]. Given the absence of stroke-specific measures covering the subdomains of family support and treatment satisfaction, a more comprehensive multidimensional scale is needed that evaluates all facets of health status in patients with stroke.
Therefore, the aim of this study was to develop an understandable, reliable and valid PRO measure for patients with stroke that captures valuable data from the patient's viewpoint. This article reports on the development of the initial pool of items, selection of the final item set, and evaluation of a new stroke patient-reported outcome measure (Stroke-PROM).

Ethics statement
The study protocol and the Stroke-PROM were reviewed and approved by the Medical Ethics Committee of Shanxi Medical University. Participants signed informed consent forms prior to study participation, and all were compensated for their time.

Study population and design
Patients were enrolled from nine different hospitals, communities, and rural areas in Shanxi province in China. Clinical investigators at all study sites recruited participants using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). Patients participating in this study were diagnosed with stroke by a physician and were not in the acute phase of stroke. The severity of poststroke sequelae in these patients varied from mild to severe. Individuals with tetraplegia, psychosis, or serious comorbidities (e.g., cancer) were excluded. Control participants were recruited from lists of patients who did not have cerebral vascular disease, cancer, or mental illness. Investigators helped patients with severe visual impairments fill in the questionnaires according to the patients' verbal responses to items.
Ten patients with stroke were interviewed to identify potential items for use in the questionnaire. Five patients with stroke, three physician experts in stroke and one psychometric expert were interviewed for item revision and refinement to ensure that all items were appropriate and relevant. Five stroke patients were interviewed to evaluate their comprehension of each item. For the first item pool reduction, 135 patients with stroke were recruited from nine different hospitals, communities, and rural areas in Shanxi province; valid data from 133 participants were collected. For the item-revaluation phase and the validation phase of the Stroke-PROM, 485 patients with stroke and 110 controls from the same nine geographical regions were recruited, but only 475 and 104, respectively, were available to participate in the study. There was no overlap in the participants who contributed to the first and second item-reduction processes [16,17].

Development of the Stroke-PROM
The Stroke-PROM was developed in four phases: (1) conceptual framework construction and preliminary item generation; (2) formation of the initial scale by the first item-selection process; (3) formation of the final scale by an item-revaluation process based on the second item-selection process; and (4) validation of the Stroke-PROM. Phase 1 involved a qualitative analysis, whereas the other three phases used quantitative analyses. A flowchart of this four-phase developmental process is shown in Figure 1.

Identifying the conceptual framework and preliminary item content
A comprehensive review of existing stroke questionnaires was performed to identify an appropriate conceptual framework (see Figure 2). Four domains and 10 subdomains were generated. In-depth open-ended interviews of 10 stroke patients (5 men and 5 women; ages: ≤45, n = 2; 45-65, n = 5; ≥65, n = 3) were conducted to identify potential items for the Stroke-PROM using the selected conceptual framework. Patients were interviewed about their symptoms, their main psychological burden, the effects of stroke on them and their families, and their evaluations of the therapeutic effect and medical workers. As a result, a bank of 62 potential items was generated. Four chief physicians and five patients (3 men and 2 women; ages: ≤45, n = 1; 45-65, n = 2; ≥65, n = 2; education: high school degrees or above), all of whom were recruited from the First Hospital of Shanxi Medical University and Second Hospital of Shanxi Medical University, participated in revising these 62 preliminary items using a content validity index (CVI) (62 items and scale structure described in Additional file 1: Appendices 1-1 and 1-2).
The CVI is widely used for quantifying content validity for scales. Item-level CVI (I-CVI) is calculated by having experts rate the relevance of each item to its own subdomain (1 = not relevant, 2 = somewhat relevant, 3 = quite relevant, 4 = highly relevant). The I-CVI of each item is defined as the number of experts offering a rating of 3 or 4, divided by the number of experts.
As an adjustment for chance agreement, the multirater kappa statistic (K*) was adopted and is described as follows [18,19]:

$$P_c = \frac{n!}{A!\,(n-A)!} \times 0.5^{n}$$

where P_c is the probability of chance agreement, n is the number of experts, and A is the number approving with good relevance. K* was calculated using the I-CVI and the probability of chance agreement as follows:

$$K^{*} = \frac{\text{I-CVI} - P_c}{1 - P_c}$$

Each item on the scale was then rated as "fair," "good," or "excellent," based on the following rating criteria: fair, K* = 0.40-0.59; good, K* = 0.60-0.74; excellent, K* > 0.74. Any item that received a "fair" rating was deleted.
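The I-CVI and its chance-adjusted kappa can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the example ratings are hypothetical, the chance term is the binomial probability that exactly A of n raters agree when each rates at random, and the rating bands are treated as contiguous (a K* just below 0.60 counts as "fair"), which is an assumption.

```python
from math import comb


def i_cvi(ratings):
    """Item-level CVI: share of raters giving the item a 3 or 4."""
    agree = sum(1 for r in ratings if r >= 3)
    return agree / len(ratings)


def modified_kappa(ratings):
    """Chance-adjusted agreement K* for a single item."""
    n = len(ratings)                       # number of raters
    a = sum(1 for r in ratings if r >= 3)  # raters judging the item relevant
    p_c = comb(n, a) * 0.5 ** n            # probability of chance agreement
    return (a / n - p_c) / (1 - p_c)


def rate_item(k_star):
    """Map K* to the rating bands quoted in the text."""
    if k_star > 0.74:
        return "excellent"
    if k_star >= 0.60:
        return "good"
    if k_star >= 0.40:
        return "fair"
    return "below fair"


# Hypothetical ratings from nine raters on the 1-4 relevance scale.
example = [4, 4, 3, 4, 3, 4, 4, 3, 2]
k = modified_kappa(example)
print(round(i_cvi(example), 3), round(k, 3), rate_item(k))
```

With eight of nine raters judging the item relevant, the chance term is small, so K* stays close to the raw I-CVI.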
Five participants with stroke (2 men, 3 women; ages: ≤45, n = 1; 45-65, n = 2; ≥65, n = 2) were interviewed to evaluate their comprehension of each item. Items that were ambiguous, misunderstood, or rarely answered were reworded. The preliminary scale was developed after modifying the item pool based on the suggestions of the physician experts, psychometric expert and patients as well as on the outcomes of the comprehension tests for patients. In the end, the preliminary tool included 4 domains, 11 subdomains, and 60 items.

Formation of the initial scale
One hundred thirty-three patients were told that the aim of the questionnaire was to measure how much their stroke had affected them. For each issue presented in an item, patients responded using a five-point Likert scale to reflect how often they experienced the issue, where 0 = never, 1 = occasionally, 2 = about half of the time, 3 = often, and 4 = always. Scores of positively worded items were recoded as the original score plus 1, whereas scores of negatively worded items were recoded as 5 minus the original response. This recoding produced a score range for each item of 1 to 5, with a higher score reflecting a more positive PRO.
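The recoding rule above amounts to a small function. The sketch below assumes raw responses are stored as integers 0 to 4; the function name and the boolean flag are hypothetical.

```python
def recode(raw, positively_worded):
    """Convert a raw 0-4 Likert response to a 1-5 item score.

    Positively worded items: raw + 1.
    Negatively worded items: 5 - raw, so that a higher score
    always reflects a more positive outcome.
    """
    if raw not in (0, 1, 2, 3, 4):
        raise ValueError("raw response must be an integer from 0 to 4")
    return raw + 1 if positively_worded else 5 - raw


# "Always" (4) on a negatively worded item gives the lowest score.
print(recode(4, positively_worded=False))
# "Always" (4) on a positively worded item gives the highest score.
print(recode(4, positively_worded=True))
```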
The item-reduction processes of the preliminary scale were based on both classical test theory (CTT) (e.g., discrete trend, factor analysis, correlation coefficient, Cronbach's α if item deleted [CAID] values, and corrected item-total correlation [CITC]) and item response theory (IRT). CTT was used to reduce the number of items of the Stroke-PROM in the first four of the following steps, and IRT was used in the fifth step.
In step 1, the standard deviation of the score for every item was calculated. A low standard deviation indicates that an item differentiates poorly among respondents; thus, items with a low standard deviation (<0.96) were deleted in this study.
In step 2, a principal component factor analysis with varimax rotation aided in item reduction. The value for the Kaiser-Meyer-Olkin measure of sampling adequacy was >0.5 [17]. Items with a low factor loading (<0.4) or with loadings of similar size on more than one factor were considered for removal.
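The step 1 screen can be illustrated as follows. This is a sketch under stated assumptions: the source does not say whether the sample or population standard deviation was used, so the sample version is assumed here, and the item labels and scores are invented for the example.

```python
from statistics import stdev

SD_CUTOFF = 0.96  # threshold quoted in the text


def low_variance_items(scores_by_item, cutoff=SD_CUTOFF):
    """Return the items whose sample standard deviation falls below the cutoff."""
    return [item for item, scores in scores_by_item.items()
            if stdev(scores) < cutoff]


# Hypothetical recoded scores (1-5) from five respondents.
scores = {
    "item_A": [1, 2, 3, 4, 5],  # spread across the scale
    "item_B": [3, 3, 3, 4, 3],  # nearly everyone gives the same answer
}
print(low_variance_items(scores))
```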
In step 3, an item was considered for deletion when the Pearson correlation coefficient between the item and its own subdomain was <0.6, which indicated that the item did not represent the subdomain well.
In step 4, the internal consistency of items was evaluated using the CITC and CAID values. An item was considered to have contributed highly to the measured construct when the CITC value was more than 0.45. The CAID value also indicates how much each item contributes to the reliability of the Stroke-PROM. An increase in Cronbach's α when an item is deleted (i.e., a CAID value above the overall α) indicates that the item contributes poorly to reliability and should be deleted. Therefore, an item was deleted in the present study when the CITC value was <0.45 and the CAID value increased [20][21][22].
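The two step 4 statistics can be computed directly from the item scores. The sketch below is illustrative only: the function names and the four-item data set are hypothetical, sample variances are assumed, and an item is flagged when its CITC is below 0.45 and deleting it raises Cronbach's α above the value for the full item set.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """items: list of per-item score lists, one entry per respondent."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_diagnostics(items, citc_cutoff=0.45):
    """Return overall alpha and, per item, (CITC, CAID, flagged-for-deletion)."""
    alpha_all = cronbach_alpha(items)
    results = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(row) for row in zip(*rest)]
        citc = pearson(col, rest_total)   # item vs. total of the other items
        caid = cronbach_alpha(rest)       # alpha if this item is deleted
        results.append((citc, caid, citc < citc_cutoff and caid > alpha_all))
    return alpha_all, results


# Hypothetical scores: three consistent items and one unrelated item.
data = [
    [1, 2, 3, 4, 5, 5],
    [1, 2, 3, 4, 5, 4],
    [2, 2, 3, 4, 5, 5],
    [3, 1, 4, 2, 3, 2],
]
alpha_all, results = item_diagnostics(data)
print(round(alpha_all, 3), [flag for _, _, flag in results])
```

Correlating each item with the total of the remaining items, rather than with the full total, is what makes the correlation "corrected": the item does not inflate its own coefficient.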
In step 5, IRT was applied to reduce the number of items in the Stroke-PROM. Each item's parameters of discrimination (α) and difficulty (b) were estimated. Generally, items with discrimination values <0.4 should be deleted. The value of the four degrees of difficulty (b 1 , b 2 , b 3 , b 4 ) ranged from −3 to 3. Items with degrees of difficulty (b 1 , b 2 , b 3 , b 4 ) values outside the range of −3 to 3 should be considered for removal [23].
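Once the item parameters have been estimated, the step 5 screen is a simple range check. The sketch below assumes the discrimination and the four threshold parameters have already been estimated by dedicated IRT software; the function name and the parameter values are hypothetical.

```python
def irt_flag(discrimination, thresholds, min_discrimination=0.4,
             difficulty_range=(-3.0, 3.0)):
    """Flag an item whose estimated IRT parameters fall outside the stated limits.

    discrimination: estimated discrimination parameter of the item
    thresholds: estimated difficulty parameters (b1..b4 for a five-point item)
    """
    low, high = difficulty_range
    weak = discrimination < min_discrimination
    out_of_range = any(b < low or b > high for b in thresholds)
    return weak or out_of_range


# Hypothetical estimates for three items.
print(irt_flag(1.2, [-2.0, -0.5, 0.8, 2.1]))   # within limits
print(irt_flag(0.3, [-2.0, -0.5, 0.8, 2.1]))   # weak discrimination
print(irt_flag(1.0, [-3.4, -1.0, 0.5, 2.0]))   # b1 outside the range
```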
Both the statistical results and the clinical relevance of each item were taken into account before its deletion. The initial scale was formed by removing the selected items from the preliminary scale.

Formation and validation of the final scale
Thus, an initial scale was generated following the evaluation and selection of items from the preliminary scale. To ensure the reliability and validity of each item included in this initial Stroke-PROM, the items were reevaluated based on a second item-selection of the initial scale. The CTT and IRT were applied once again to reevaluate the items in the initial Stroke-PROM using the data gathered from 475 stroke patients, generating the final scale. The final Stroke-PROM tool was then evaluated for validity, reliability, and responsiveness using the data obtained from these 475 stroke patients as well as 104 control participants.

Content validity
Content validity was achieved by referring to relevant literature, consulting questionnaires from China and other countries, interviewing 10 patients to identify potential items, and consulting with 5 patients, 3 physician experts and 1 psychometric expert for item revision and refinement to ensure that all items were appropriate and relevant. Content validity was confirmed using the CVI.

Construct validity
Confirmatory factor analysis with the index of model fit was performed to investigate the factor structure of the scale [23]. The model indicates a good fit when the goodness-of-fit index (GFI), normed fit index (NFI), non-normed fit index (NNFI), incremental fit index (IFI) and comparative fit index (CFI) are all >0.9, and the root mean square residual (RMR) is <0.09. GFI, RMR, NFI and CFI range from 0 to 1.
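The fit criteria listed above can be expressed as a single check applied to the indices reported by the modelling software. This is a sketch only: the function name and the index values are hypothetical and are not results from this study.

```python
def meets_fit_criteria(indices):
    """Apply the stated cutoffs: five fit indices > 0.9 and RMR < 0.09."""
    must_exceed = ("GFI", "NFI", "NNFI", "IFI", "CFI")
    return all(indices[name] > 0.9 for name in must_exceed) and indices["RMR"] < 0.09


# Hypothetical output from a confirmatory factor analysis.
example = {"GFI": 0.93, "NFI": 0.95, "NNFI": 0.96, "IFI": 0.97,
           "CFI": 0.97, "RMR": 0.05}
print(meets_fit_criteria(example))
```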

Reliability
Cronbach's α coefficients for the four domains and the total scale were calculated to measure the internal consistency of the Stroke-PROM. Generally a Cronbach's α coefficient ≥ 0.7 indicates an acceptable level of internal consistency.

Discriminant validity
The modified Rankin Scale, a frequently used scale for measuring the degree of disability and dependence in the daily activities of people who have had a stroke, was used as the stroke outcome measure in the present study. This ordered scale ranges from 0 (no symptoms) to 5 (severe disability). Discriminant validity was assessed by comparing the mean scores for every subdomain of the Stroke-PROM among healthy participants with those among groups of stroke patients as defined by the Rankin scale, except for the subdomain of treatment. The comparison of means was performed using analysis of variance, with the significance level set at p < 0.05. The rejection of the null hypothesis would indicate that the scale has the ability to differentiate between healthy controls and stroke patients with varying degrees of disability and dependence as defined by the modified Rankin scale.
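The comparison of group means rests on the one-way analysis of variance F statistic, which can be sketched as below. The group scores are invented for illustration; the sketch returns only F and its degrees of freedom, and the p-value would be obtained from the F distribution in a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical subdomain scores: healthy controls and two disability levels.
groups = [[4, 5, 6], [2, 3, 4], [1, 2, 3]]
print(one_way_anova(groups))
```

A large F relative to the critical value for the given degrees of freedom leads to rejection of the null hypothesis that all group means are equal.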

Feasibility
The feasibility of the Stroke-PROM tool was evaluated by examining the response rate, completion rate and response time to completion. Response and return rates above 95% were deemed adequate, and completion times of 8 to 13 minutes were considered acceptable.

Data analysis software
Data analyses were conducted using SPSS 13.0, Multilog 7.03 and LISREL 8.70 software.

Participant characteristics
Tables 1 and 2 show the characteristics of the 133 patients with stroke who completed the preliminary scale and of the 475 stroke patients and 104 control participants who completed the initial scale.
The demographic characteristics of the participants shown in Tables 1 and 2 indicated that the stroke sample population consisted of more men than women, more than 75% of all participants were over 45 years of age, and more than 80% were married. Additionally, approximately 70% of all participants had junior high school education or less. Table 2 shows that the proportion of males with stroke was a little higher than that of healthy males, and that among participants over 65 years old, the proportion of stroke patients was slightly higher than that of healthy participants. The average length of time since stroke diagnosis was approximately 6.3 months.

Item generation
Four domains, 10 subdomains and a pool of 62 items were generated for the Stroke-PROM based on consulting relevant literature, examining other questionnaires, and interviewing 10 patients to ensure that all germane topics were included. The items and construction of the Stroke-PROM tool are described in Additional file 1: Appendices 1-1 and 1-2. Four chief physicians and five patients (distinguished by different letters of the alphabet) who had attained a high school degree or above participated in the revision of the 62 items of the Stroke-PROM by rating the items according to the CVI (see Table 3).
On the basis of the CVI results, advice from patients and experts, and the clinical relevance of items, seven items (item PHD3, PHD5, PHD6, PHD9, PHD16, PSD18 and SOD1 shown in Additional file 1: Appendix 1-1) were deleted, and the subdomain of cognition was added. Items PHD10 and PHD11 shown in Additional file 1: Appendix 1-1 were retained based on the advice of patients and experts and other stroke-specific scales. These two items were assimilated into the newly added cognitive subdomain. The following five items were also added: Have you felt any limb abnormalities [such as a burning sensation]?; Do your hands tremble when you reach for or pick up things?; Do you have trouble remembering the date?; When you see an object suddenly, do you struggle to bring its name to mind?; When others talk about your disease, do you prefer not to discuss it? (PHD2, PHD7, PHD10, PHD11 and PSD18 shown in Additional file 1: Appendix 2-1) [13][14][15]. The CVI values of the five added items were calculated, the K * values of the five added items were all >0.74, and the five added items were rated "excellent." Five stroke patients of varying educational levels were interviewed to evaluate their comprehension of each item. Items that were ambiguous, misunderstood or rarely answered were reworded using comprehension tests for patients with stroke. The preliminary scale was developed after modifying the item pool based on the advice of the experts and the outcomes of the comprehension tests. The preliminary scale included 4 domains, 11 subdomains and 60 items. The items were also reordered (see Additional file 1: Appendices 2-1 and 2-2).

Item reduction
The two-step item selection process is described in Tables 4 and 5. This iterative process resulted in a

First item-selection phase based on CTT and IRT
Five statistical methods (within CTT and IRT) were used to select items. Any item recommended for deletion by two or more methods was deleted. All items were deleted or added based on their item selection results, their clinical importance, and other stroke-specific scales (see Table 4). As seen in Table 4, 12 items were removed; however, PHD8 (Do you remember what happened two days ago?) was not deleted, because previously published results indicate that this item is crucial for the assessment of cognition [15]. PSD1 (Are you more prone to worry since your illness?) and PSD2 (Do you get angry easily?) did not discriminate well, so PSD2 was deleted in accord with the opinion of patients and experts. PSD15 (Have you felt depressed while in a cheerful atmosphere?) was deleted because it was not deemed closely relevant to stroke. As a result, 13 items (PHD18, PHD19, PSD2, PSD5, PSD8, PSD9, PSD15, PSD20, SOD1, SOD2, THD1, THD2, THD3; Additional file 1: Appendix 2-1) were deleted. All items in the compliance subdomain (THD1, THD2, THD3) were deleted; thus, this subdomain was also deleted [13][14][15]. Therefore, the initial scale contained 47 items, 10 subdomains, and 4 domains (see Appendices 3-1 and 3-2).
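The deletion rule (two or more of the five methods recommending removal, subject to clinical override) can be sketched as a vote count. The method labels and the flag values below are hypothetical and do not reproduce Table 4; only the retention of PHD8 on clinical grounds is taken from the text.

```python
def items_to_delete(flags_by_item, clinically_retained=frozenset(), min_votes=2):
    """Return items flagged by at least `min_votes` methods and not retained on clinical grounds."""
    return sorted(
        item for item, flags in flags_by_item.items()
        if sum(flags.values()) >= min_votes and item not in clinically_retained
    )


# Hypothetical per-method recommendations (True = method recommends deletion).
flags = {
    "PHD8":  {"sd": True,  "factor": True,  "correlation": False, "citc_caid": False, "irt": False},
    "PHD18": {"sd": True,  "factor": False, "correlation": True,  "citc_caid": True,  "irt": False},
    "PSD1":  {"sd": False, "factor": False, "correlation": False, "citc_caid": False, "irt": True},
}
print(items_to_delete(flags, clinically_retained={"PHD8"}))
```

Keeping the clinical override as an explicit argument makes it visible which items were retained against the statistical recommendation.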

Revaluation phase based on CTT and IRT
To ensure the reliability and validity of each item included in the initial scale, we reevaluated the items in this scale based on a second item selection. The evaluation results suggested that all items performed satisfactorily, except for item THD3. Thus, CTT and IRT analyses in the revaluation phase led to deletion of item THD3 (Are you satisfied with your medical expenses?) (see Table 5). As a result, the final scale contained 46 items, 10 subdomains, and 4 domains (see Tables 6 and 7). This revision of the Stroke-PROM is described in Table 8.

Evaluation of the scale
The validity, reliability, and responsiveness of the remaining 46 items were assessed and the results are presented in the sections below.

Content validity
The content validity was achieved as outlined in the Methods and was confirmed using the values obtained for the CVIs (see Table 3 and "Item generation" in the Results).

Construct validity
We conducted confirmatory factor analysis (CFA) on the 46 Stroke-PROM items. The fit indices (GFI, RMR, NFI, NNFI, CFI, IFI) met the standard requirements (see Table 9). Table 10 presents the 10 subdomains, their corresponding items and standardized factor loadings produced from the CFA. The standardized factor loadings for each of the 46 Stroke-PROM items were above 0.5, except for items PHD1, PHD2, and PHD3; however, these three items were recommended for retention by the results of CTT and IRT analyses. The results indicated that the 46 items showed salient loadings on their specific subdomains, and these 10 subdomains correlated well with the 10 that were conceptualized in the design phase and indicated good construct validity.

Reliability
Cronbach's α coefficient ≥0.70 is considered acceptable for internal consistency. Cronbach's α coefficient was 0.905 for the total score, and for the four domains, it ranged from 0.861 to 0.908. These results indicated high internal consistency (see Table 11).

Discriminant validity
The discriminant validity of each subdomain was examined by comparing mean scores across healthy participants and the groups of stroke patients as defined by their modified Rankin scores. Table 12 indicates that the scores for 9 of the 10 subdomains were significantly different across healthy participants and stroke patients with different degrees of disability and dependence as defined by the modified Rankin scale. Because healthy participants were not treated and therefore could not answer the items in the treatment domain, no comparison with healthy participants was made for the SAT subdomain. Among stroke patients, the SAT subdomain scores were not significantly different across the Rankin levels. Overall, the Stroke-PROM was able to differentiate between healthy participants and stroke patients with varying degrees of disability and dependence as defined by the modified Rankin scale.

Feasibility
Both the response rate and the completion rate of the Stroke-PROM tool were more than 97%. The average completion time was 8.9 minutes.

Discussion
In this study, we developed and validated a Stroke-PROM for use in the evaluation of outcomes for patients with stroke. The US Food and Drug Administration (FDA) has highlighted the importance of the use of PRO in clinical trials and provided guidance regarding the development of PROMs [24]. The development strategy for the Stroke-PROM in this study complied with those guidelines. To the best of our knowledge, this is the first Stroke-PROM specifically developed and validated for use in clinical trials of new drugs for stroke to include physical, psychological, social and therapeutic domains [25].
The most commonly used stroke-specific measures, the National Institutes of Health Stroke Scale and the Canadian Neurological Scale, are clinician-reported outcome measures that assess only the physical aspects of stroke [26][27][28]. Although the Stroke Impact Scale Version 2.0, the Stroke and Aphasia Quality of Life Scale-39, the Newcastle Stroke-Specific Quality of Life Measure, and the Stroke-Specific Quality of Life measures are all multidimensional PRO measures, no measures have been developed that assess the subdomains of family support and patient satisfaction with treatment [13][14][15]29]. In contrast to other stroke-specific instruments, our instrument includes these subdomains and therefore fills a gap in the research arena for a stroke PROM [13][14][15][29][30][31].
Stroke has considerable adverse physical and psychological impacts on patients over time. Stroke patients need help, understanding and care from their families [32]. Indeed, a growing body of research demonstrates the importance of family relationships for the recovery of functional capacity after stroke [33,34]. A stroke survivor's family is often the most important source of long-term support during the patient's recovery and treatment, and family support plays a significant role throughout the poststroke recovery period [35][36][37][38]. Family can supply the stroke survivor with physical and mental support, such as providing care in daily life and understanding [39]. Therefore, family support is a necessary addition to the Stroke-PROM.
Satisfaction with treatment is a main outcome measure in new drug clinical trials [24,40]. A Stroke-PROM tool can be used to measure treatment benefit or risk during clinical trials for medical products. Additionally, Stroke-PROM instruments provide optimal information from the patient's perspective for use in drawing conclusions about the effectiveness of treatment [24]. Thus, the inclusion of a subdomain for treatment satisfaction provides an opportunity for new drug clinical trial participants to integrate into the overall evaluation the different aspects of their responses to treatment, including pain relief, function improvement, and side effects, as well as to provide feedback about the potential acceptability of a new drug and their overall trust in the drug treatment [24,41,42]. Therefore, the subdomain of satisfaction with treatment is also a prudent addition to the Stroke-PROM.
The Stroke-PROM presented here would complement existing stroke-specific measures and has particular value for extending our understanding of the impact of family support and patient satisfaction with treatment in clinical trials of new drugs for stroke. During clinical trials, the Stroke-PROM can be used to simultaneously measure the effect of a medical intervention on several concepts (that is, the things being measured), such as a symptom or group of symptoms, the effects on a particular function or group of functions, or a group of symptoms or functions shown to measure the severity of a health condition. The use of the Stroke-PROM as an outcome measure in clinical trials may facilitate evaluation of the effectiveness across several therapeutic modalities. From the researcher's perspective, the scale may capture the patient's experience and treatment benefit or risk, assist researchers in determining which patients with stroke benefit meaningfully from treatment, and facilitate between-trial comparisons [24]. From the pharmaceutical company's perspective, such an instrument may increase the efficiency of discussions with the FDA during the medical product development process, and provide optimal information from a patient's perspective for use in making conclusions about treatment effects at the time of medical product approval [24]. From a regulatory perspective, the Stroke-PROM tool may provide a standardized method for assessing treatment effectiveness on basic symptoms so that claims can be supported with PRO evidence in medical product clinical trials [43].
In contrast to the development of other stroke-specific instruments, our study used I-CVI, IRT and CFA as rigorous evidence for item selection, validity and reliability. First, content validity is an essential step in the development of any new scale. None of the previously developed instruments for stroke used statistical methods such as CVI to quantify content validity as was done for the Stroke-PROM tool in our study. The FDA places particular emphasis on demonstrating content validity using open-ended interviews with patients [24]. Identifying the items of the Stroke-PROM based on a review of the literature and other stroke questionnaires, face-to-face interviews with patients, discussions with stroke professionals and the CVI further strengthened the content validity of the preliminary scale in our study.
Second, in the item-selection phase, analyses based on both CTT and IRT were used to delete items. The IRT-based analysis was used more heavily than that based on CTT in the construction of scales for measuring subjective attributes. IRT-based analysis also afforded more accurate examinations of the features of each scale item than the analyses based on CTT. Existing stroke-specific instruments had focused exclusively on CTT statistics (e.g., exploratory factor analysis, Cronbach's α coefficient) [13,29]. No other stroke-specific instruments existed that had been developed using IRT. CTT statistics are associated with certain disadvantages, whereas methods based on IRT offer several advantages to refine items and therefore to improve on CTT [44].
In our study, both CTT and IRT analyses were repeated during finalization of the item content in the second sample. The results showed that the final Stroke-PROM had a high degree of reliability and validity.
Third, the instrument's presumed internal structure, supported by CFA, confirmed that the Stroke-PROM measure is multidimensional in nature. No other stroke-specific instruments have established construct validity with CFA.
In summary, our results showed that the scale was valid, reliable and feasible and had strong discriminative properties between healthy controls and stroke patients with varying degrees of disability and dependence as defined by the modified Rankin scale. Although the Stroke-PROM tool was developed primarily for use in clinical trials of new drugs to evaluate their clinical therapeutic effects, this study showed that the Stroke-PROM also had strong discriminative measurement properties and could be used to differentiate patients with stroke from healthy controls. We therefore believe that there is an important role for this Stroke-PROM instrument in clinical practice as well as in clinical trials.

Limitations and further development
The scale has several potential limitations that we will address in future studies.
First, in the evaluation of validity, our study did not explicitly address criterion validity. Most of the patients with stroke were elderly, and completion of two or more scales would have been a significant burden for them, according to our experts on stroke. Stroke patients often experience disturbances of consciousness and physical restlessness; thus, adding more tests could produce test fatigue in these patients, thereby reducing the validity and reliability of measurement. Therefore, instead of asking patients to complete more than one scale, we chose to delay a thorough examination of criterion validity for a future investigation. Second, test-retest reliability was not measured as part of the validation process. This was due in part to the additional burden it would have placed on patients, but also because of the difficulties inherent in follow-up with patients in their home communities and rural areas. We therefore demonstrated reliability only with internal consistency; however, we conducted our reliability evaluation of items at two points in the process: during the phases of item selection and scale evaluation.
Third, the stroke patient sample differed slightly from the healthy participant sample in two ways: the stroke patient population had a higher proportion of males and of individuals over 65 years old. Future studies should seek to balance these groups.
Because of limited resources (both funding and personnel), the sample populations may not be representative of the entire population of patients with stroke. Our participants were from only the Shanxi province in northern China. Thus, future studies should evaluate reliability and validity of the Stroke-PROM instrument with a nationwide sample.
The Stroke-PROM was administered to native Chinese-speaking individuals. Therefore, further work is required to test the strengths and weaknesses of this instrument across various national, cultural and language contexts.

Conclusions
Our results provided evidence for satisfactory reliability and validity of the Stroke-PROM. However, the instrument will require additional revisions and improvements through testing in different populations. The ongoing process of modifying the Stroke-PROM will also encompass further validation and reliability testing across various applications of the instrument. The Stroke-PROM is not meant to replace existing stroke-specific measures, but to provide further valuable information on patients with stroke. This innovative instrument may be helpful in both routine medical practice and clinical research.