The development and preliminary validation of a Preference-Based Stroke Index (PBSI)

Background Health-related quality of life (HRQL) is a key issue in disabling conditions like stroke. Unfortunately, HRQL is often difficult to quantify in a comprehensive measure that can be used in cost analyses. Preference-based HRQL measures meet this challenge. To date, there is no preference-based HRQL measure for stroke that could be used as an outcome in clinical and economic studies of stroke. The aim of this study was to develop the first stroke-specific health index, the Preference-Based Stroke Index (PBSI). Methods The PBSI includes 10 items: walking, climbing stairs, physical activities/sports, recreational activities, work, driving, speech, memory, coping and self-esteem. Each item has a 3-point response scale. Items known to be impacted by a stroke were selected. Scaling properties and preference weights obtained from individuals with stroke and their caregivers were used to develop a cumulative score. Results Compared to the EQ-5D, the PBSI showed no ceiling effect in a high-functioning stroke population. Moderately high correlations were found between the PBSI and the physical function (r = 0.78), vitality (r = 0.67) and social functioning (r = 0.64) scales of the SF-36. The lowest correlation was with the role emotional scale of the SF-36 (r = 0.32). Our results indicated that the PBSI can differentiate patients by severity of stroke (p < 0.05) and level of functional independence (p < 0.0001). Conclusions Content validity and preliminary evidence of construct validity have been demonstrated. Further work is needed to develop a multiattribute utility function and to gather information on psychometric properties of the PBSI.


Background
There is increasing recognition that clinical benefits from the patient's point of view can best be quantified in terms of health-related quality of life (HRQL). This concept emerged in the mid-1980s when the need was identified for a construct that would capture the impairments, functional states, perceptions and social opportunities that can be influenced by disease [1]. HRQL has been clearly identified as being influenced by an individual's capacity to perform and participate in various activities [2][3][4] and thus becomes highly meaningful in a disease such as stroke, where the impact is often life-long and multidimensional. One approach to assessing HRQL in various populations is to use health profiles. Health profiles, whether generic, like the SF-36 [5], or specific, like the Stroke Impact Scale (SIS) [6], have been used in many studies of stroke [7][8][9][10][11]. They are useful in identifying the extent to which health status is affected and, more precisely, in identifying the dimensions where the difficulties arise. However, the scoring systems of health profiles are often developed on the basis of sub-scales with no single summary score of overall health status. The absence of a summary score complicates the use of health profiles, like the SF-36, in studies where cost is an issue. Indeed, would an intervention qualify as cost-effective if it had a positive impact on physical health but a negative one on mental health? Unless one knew the relative importance attached to both dimensions, it would be impossible to conclude whether there was an overall net improvement or deficit in HRQL. The complication of using health profiles then becomes evident: if the intervention is cost-effective on one hand but not on the other, should it be offered or not?
Also available are health indexes that portray the HRQL of an individual on selected domains that are weighted to reflect the person's preferences. Recognizing the importance of integrating the person's value system [12] in the assessment of one's HRQL, health indexes go one step further than health profiles. This portrait of health is assigned a value ranging from 0 (death) to 1 (perfect health). This value is assumed to represent the preference an individual has for this health state and it can be obtained using different elicitation techniques, the most common being the standard gamble (SG), time trade-off (TTO) and visual analog scales (VAS). Preference scores obtained under risk and uncertainty are called "utilities" while those elicited without these conditions are called "values".
Generic health indices, like the Health Utilities Index (HUI) [13,14], the EuroQoL (EQ-5D index) [15,16] or the Quality of Well-Being (QWB) [17] scales, have been developed to provide a classification of health states weighted on the basis of individuals' preferences. Each health state generated by any of the scales is associated with a single comprehensive score. Studies in stroke have reported a more frequent use of the EQ-5D [9][10][11][18][19][20] compared to the HUI2 or HUI3 [21,22], perhaps due to the brevity and ease of completion of the EQ-5D index compared to the latest versions of the HUI (HUI2 or HUI3). To date, no studies in stroke have reported the use of the QWB.
While both measures, the HUI (HUI2 or HUI3 versions) and the EQ-5D index, demonstrate good psychometric properties [9,20,21,22], they lack content validity for use with the stroke population. Indeed, the HUI is more 'impairment' oriented and neglects the activity component of health as defined by the World Health Organization [23], while the EQ-5D index does not include certain problems that are prevalent in stroke survivors, such as speech [24] and cognition [25][26][27]. Further, there is some evidence of a ceiling effect of the EQ-5D when used with the stroke population [11].
While a few disease-specific health indices have been developed during the past few years [28,29], there has not been one for stroke. The need for a stroke health index has been recognized for several reasons. First, with its relatively stable incidence rate and declining mortality [30], stroke is expected to remain one of the most prevalent chronic diseases in the aged, generating high costs for our health care system. Second, new stroke treatments (e.g. drug therapy) are emerging and their impact will need to be measured. Third, with the aging of the population, stroke is only one among many health conditions our health system will need to deal with in future years. With ongoing financial constraints in the health sector, resource allocation will become highly competitive. By definition, generic health indices provide a common metric upon which treatments across or among diseases can be compared, favoring an equitable allocation of resources, but in practice these comparisons remain challenging and somewhat controversial.
Our objective was to develop a stroke-specific health index that would take into account the person's preferences for stroke relevant health states. This paper outlines the process used to develop and evaluate the first preference-based stroke index, the PBSI, for use as a comprehensive measure of HRQL post-stroke and as an outcome in cost-effectiveness studies.

Subjects and methods
The PBSI was developed by a series of steps. Different samples of subjects were used for each of these steps. Table 1 describes the population sources and socio-demographic characteristics of subjects for each step of the study.

Item generation
The first step was to identify items that were prevalent, yet specific, to the stroke population. The data for this step came from a longitudinal cohort study of the long-term outcome of stroke [31]. At the time of this study, 493 persons with stroke had been interviewed approximately 6 months post-stroke and followed intermittently over time. In parallel, a population-based sample of 442 community-dwelling individuals without stroke, frequency matched by age and city district, was also recruited and interviewed. Both groups (stroke and controls) were interviewed over the telephone on measures of disability and HRQL: SF-36 [4], EQ-5D, Barthel Index [32], IADL Subscale and Social Resource Scale of the OARS [33], Reintegration to Normal Living Index [34], and Modified Mini Mental Status Questionnaire [35].
Collectively, these scales contained 92 items and the ratings on these items were used to identify prevalent and stroke-specific items. Items were retained if they met the following criteria: 1) prevalence (i.e. defined as an identified difficulty) in at least 20% of stroke subjects, 2) a significant difference in prevalence between stroke and controls, and 3) a φ coefficient of 0.300 or more, indicating a significant association between the prevalence of the problem and having a stroke [36]. Items describing the same activity were removed to avoid redundancy. In addition, 13 items covering areas of mastery, cognition, dexterity, driving and communication were added in order to cover the full spectrum of activities, participative experiences and emotions known to be affected by stroke. This process provided our first pool of 43 items.
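The first and third retention criteria above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names and the 2x2 counts are hypothetical (only the group sizes of 493 and 442 come from the text), and the second criterion, the significance test on the difference in prevalence, is not implemented here.

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2x2 table laid out as:
                 difficulty   no difficulty
    stroke           a              b
    controls         c              d
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    return (a * d - b * c) / denom

def passes_prevalence_and_phi(a, b, c, d, min_prevalence=0.20, min_phi=0.300):
    """Criteria 1 and 3 only; the significance test (criterion 2) is omitted."""
    prevalence_in_stroke = a / (a + b)
    return prevalence_in_stroke >= min_prevalence and phi_coefficient(a, b, c, d) >= min_phi

# Hypothetical counts for one item: 200 of 493 stroke subjects and
# 40 of 442 controls report a difficulty.
print(round(phi_coefficient(200, 293, 40, 402), 3))
print(passes_prevalence_and_phi(200, 293, 40, 402))
```

With these made-up counts the item would be retained on both criteria; an item reported equally often in both groups would yield a φ of zero and be dropped.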

Item selection
These items were assembled into a questionnaire. Members of the longitudinal cohort study who were more than two years post-stroke and living in the community were asked to rate their performance on each of these items using a standard five-point scale from 1 (having no difficulty) to 5 (being unable to do it). Subsequently, they were asked to rate the importance of these items to their overall quality of life, also on a five-point scale, from 1 (not important) to 5 (extremely important). They were also asked to report any additional activities, roles or emotional states they felt had been impacted by their stroke. An impact score, formed as the product of performance and importance, was calculated [37] and the 43 items were ranked according to this impact score. In total, 149 subjects received the performance questionnaire and, from that group, 124 were also sent the one on importance; 91 and 70 persons responded to these questionnaires, respectively. From this survey, items with an impact score > 6.0 and with at least 40% of stroke subjects reporting some difficulty were selected. To further reduce this set of items, correlational analyses were performed. Correlations above 0.75, identifying possible redundancy, were carefully considered and the item presenting the lowest item-to-total correlation was removed. Items generated by subjects were used to assess whether or not important or difficult activities, roles or emotions were missing from our first pool of 43 items.
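The selection rule described above can be sketched as follows. This is a minimal illustration, not the study's actual computation: the text does not say how ratings were aggregated across respondents, so the sketch assumes the impact score is the product of the mean performance rating and the mean importance rating, and it treats any performance rating above 1 as "some difficulty". All ratings shown are hypothetical.

```python
from statistics import mean

def impact_score(performance, importance):
    # Assumption: product of mean ratings (each on the 1-5 scales described above).
    return mean(performance) * mean(importance)

def proportion_with_difficulty(performance):
    # Assumption: a rating above 1 (no difficulty) counts as "some difficulty".
    return sum(1 for rating in performance if rating > 1) / len(performance)

def is_selected(performance, importance, min_impact=6.0, min_difficulty=0.40):
    return (impact_score(performance, importance) > min_impact
            and proportion_with_difficulty(performance) >= min_difficulty)

# Hypothetical ratings for two items; the two questionnaires had different
# numbers of respondents, so the lists need not have the same length.
hard_and_important = is_selected([1, 3, 4, 2, 1], [4, 5, 4])
easy_but_important = is_selected([1, 1, 2, 1, 1], [5, 4, 5])
print(hard_and_important, easy_but_important)
```

Under these assumptions an item can be rated as very important and still be dropped when few respondents find it difficult.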

Development of the three-point scale
To facilitate completion, a three-point scale was the goal. Descriptive statements reflecting three different levels of observable functions of community-living stroke survivors were generated for each of the remaining items. For example, the worst level of the walking item was described as being able to walk only a few steps or using a wheelchair. Because of the specificity of each descriptive statement for a given item, the ordinality of the 3-point scale was tested. A convenience sample of 29 undergraduate students rated each descriptive statement on a 10 cm long visual analog scale (VAS) [38]. Anchors varied in relation to the item. For example, the anchors for the walking statements were 0 = unable to walk and 10 = able to walk normally. Since there were 10 items with 3 descriptive statements each, students were asked to rate 30 randomly organized statements. Following comments and ratings, some statements were reworded. Figure 1 shows the mean VAS ratings.

Pilot testing the PBSI
We pilot tested the PBSI to determine if it demonstrated large inter-subject variation and compared this to that of a generic health index, the EQ-5D. Frequency distributions of subjects' ratings across response levels were examined. An item that was distributed across levels was judged to be contributing valuable information to the measure and this performance was considered as a preliminary indication of its ability to capture different severity levels. Community-dwelling long-term stroke survivors who had ended their participation in the two-year prospective study on stroke, and who had not participated in the first phase of this project, were sent the PBSI, the EQ-5D 5-item questionnaire and its thermometer scale (EQ-VAS). In total, 170 subjects were surveyed but only 68 responded; subsequent follow-up revealed that 9 had moved, 8 were deceased and 85 refused or could not be reached. The overall participation rate was 41%, all were living in the greater Montreal area and 53% were men (Table 1).

Elicitation of preference weights
Figure 1: Mean VAS rating scores of response options on English questionnaires (n = 29).

Preferences were obtained to verify the ability of stroke survivors to go through a task of preference elicitation and to estimate whether stroke survivors differed from persons without stroke when providing the weights. Thirty subjects with stroke and 30 caregivers were estimated to be sufficient to detect a between-group difference of 0.10 in mean preference values with approximately 90% power and an alpha level of 0.05, assuming a standard deviation of 0.13 or less. An analysis based on ranks was also carried out. It was hypothesized that if subjects positioned the 9 corner states (CS; a corner state is a multidimensional health state in which all items are described by their best level while one item is set at its worst level) on the thermometer in a similar order, the preference weight given to each corner state would be reinforced and, to a certain degree, confirmed. For example, subject 1 could choose to position the corner states within a range of 30 to 70 while subject 2 could use a range between 45 and 80. But if both subjects placed the same corner state as their lowest value, then the preference for this corner state would be confirmed, even though it would have a large standard deviation due to differences in ratings (30 vs 45). Preferences were elicited on a convenience sample of 32 persons who had recently sustained a stroke (6 weeks to 6 months previously) and 28 caregivers who were participants in a randomized clinical trial of case management for stroke. The mean age of stroke subjects was 67.6 (sd = 11.3) and 75% were men. Caregivers were on average younger (59.4 (sd = 19.7)) and 22% were men (Table 1). Selection criteria for this preference elicitation task restricted the sample to those who could speak French or English, without apparent cognitive deficits or aphasia.
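The sample size statement above can be checked approximately with a normal approximation for the comparison of two means. The sketch below is not the authors' calculation: the text does not say whether a one-sided or two-sided test was assumed, so both are computed, and the function name and its parameters are introduced here for illustration only.

```python
from statistics import NormalDist

def approx_power_two_means(delta, sd, n_per_group, alpha=0.05, two_sided=True):
    """Normal-approximation power for detecting a difference `delta` between
    two independent groups of equal size with common standard deviation `sd`.
    The (negligible) rejection probability in the opposite tail is ignored."""
    z = NormalDist()
    standard_error = sd * (2 / n_per_group) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    return z.cdf(delta / standard_error - z_crit)

# Values taken from the text: difference 0.10, sd 0.13, 30 subjects per group.
power_two_sided = approx_power_two_means(0.10, 0.13, 30, two_sided=True)
power_one_sided = approx_power_two_means(0.10, 0.13, 30, two_sided=False)
print(round(power_two_sided, 2), round(power_one_sided, 2))
```

Under this approximation the one-sided figure is about 0.91 and the two-sided figure about 0.85, so the stated "approximately 90%" is closest to a one-sided test at the 0.05 level.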
Face-to-face interviews were conducted at the home of the subject by one interviewer. On average, 10 to 15 minutes were required to do the task. To reduce contamination, the caregiver was asked to leave the room while the stroke subject was performing the task and vice versa. Subjects were given a 50 cm long vertical thermometer with anchors ranging from 0 (worst possible health state) to 100 (best possible health state). To test the subject's comprehension of the task, two unidimensional health states (HS) were given as practice. Each subject received 'I wear glasses' and 'I have severe pain all day' and was asked to place these health states on the thermometer in relation to the anchors. If the subject was unable to perform this task or gave an incoherent answer (it is assumed that wearing glasses is a more desirable health state and should, therefore, be positioned above having severe pain), further instructions were given. If comprehension difficulties persisted, the task was ended. If the subject succeeded, preferences were assessed for the set of health states. Subjects were asked to rate four HS and nine corner states (CS). The four HS described the following: being dead, being unconscious, all best levels of items in the PBSI, and all worst levels of items. While there are 10 items on the PBSI, only 9 CS were described. Walking and stairs were combined to avoid an unrealistic statement. The ratings of corner states are essential components of multi-attribute utility models and are considered easier to understand and rate than the positive attribute itself.
For example, the corner state of the speech item is the following:

I can hardly be understood by anyone when I speak
But I can:
Walk in the community as I desire
Be satisfied with myself most of the times

The development of a preference-weighted cumulative index
The development of a preference-weighted cumulative scoring system became essential to compare scoring distributions and to test correlational evidence of validity. The interval properties of the response scales of the items in the PBSI were such that a simple index based on assigning values to levels and summing could be used for comparative purposes. The preference weights were incorporated into the index to create a temporary preference-weighted cumulative PBSI. To be aggregated into a single score, items within a measure must demonstrate they share a common structure with the construct of interest [39]. We tested the presence of a hypothesized common structure across the items through a factor analysis. An ideal situation would be to have all items under one single factor, or if this cannot be attained, item-to-total correlations above 0.4 are desirable [39] and to have items with similar means and standard deviations [40].
Data on the PBSI, available for 127 subjects who were participants in a randomized clinical trial of case management for stroke [Mayo et al, unpublished work], were used to conduct the factor analysis. Data were collected at baseline (within seven days post-discharge from hospital), at 6 weeks and at 6 months post-discharge. A variety of outcomes, including HRQL, physical and social functioning as well as mental or emotional status, were assessed via face to face interviews. This analysis used the 6 month post-discharge data obtained on the PBSI. Subjects were, on average, 71 ± 13.7 years of age and most were men (59%). This sample size was large enough to respect the 10:1 ratio (subjects per variable) considered a minimal requirement to obtain a "good" factorial analysis [40].
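Alongside the factor analysis, the homogeneity of a set of items is commonly summarized with Cronbach's alpha. The sketch below shows the standard formula on hypothetical item ratings; it is offered only as an illustration of the statistic and does not use the study data. The function name and the toy responses are assumptions.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    n_items = len(responses[0])
    item_variances = [variance([row[j] for row in responses]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings on a 3-point scale for four respondents and three items.
toy_responses = [
    [1, 1, 2],
    [2, 2, 2],
    [3, 2, 3],
    [1, 1, 1],
]
print(round(cronbach_alpha(toy_responses), 2))
```

Items that move together across respondents push alpha toward 1; items that vary independently push it toward 0.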

Preliminary validation of the measure
By six months post-stroke, motor and functional recovery plateaus in most individuals, resulting in a stable health status [41]. Complete data on HRQL and functional measures were available on ninety-one subjects. Subjects were primarily men (64.4%) and on average, aged 69.4 ± 15.5 years. Most had no limitations in their ADL (mean

Construct validity
Construct validity can be seen as the extent to which the measure is consistent with its theoretical framework. In this study, convergent and known-groups approaches were used to examine construct validity. For comparison purposes, a utility value was calculated for the EQ-5D index using United Kingdom (UK) weights [42] for health states lasting 10 years.

Convergent validity
Convergent validity was demonstrated through testing a priori hypotheses comparing the PBSI with an instrument measuring a similar construct, the SF-36. Correlations above 0.60 were identified as reflecting a strong association [33]. Higher coefficients were not necessarily desired as these would indicate strong similarity between the measures. Conversely, lower coefficients would indicate that measures were assessing different constructs. It was expected that the PBSI would correlate moderately (.4 <r < .6) with the physical functioning, role physical, social functioning, general health perceptions and vitality scales of the SF-36. Lower correlations (r < .4) were expected for the pain, mental health index and role emotional scales as these domains are not directly measured by the PBSI.
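The a priori bands above can be expressed as a small helper. This is a sketch under assumptions: the text gives "above 0.60" as strong, .4 < r < .6 as moderate and r < .4 as low, but does not say how values falling exactly on 0.4 or 0.6 are treated, so the boundary handling below is a choice made here. The function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association_band(r):
    """Classify a correlation using the bands stated in the text.
    Assumption: values exactly at 0.40 or 0.60 are counted as moderate."""
    size = abs(r)
    if size > 0.60:
        return "strong"
    if size >= 0.40:
        return "moderate"
    return "low"

# Correlations quoted elsewhere in the paper, used here only as inputs.
for label, r in [("physical function", 0.78), ("bodily pain", 0.48), ("role emotional", 0.33)]:
    print(label, association_band(r))
```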

Known-groups validity
Results obtained from two distinct groups of individuals known to differ in the construct being assessed were used to assess the validity of the PBSI. Neurological status in the acute phase of stroke, as measured by the Canadian Neurological Scale (CNS) [44], was used to define two groups. While no relationship had been established between severity of neurological status at stroke onset and HRQL at 6 months post-stroke, we know that individuals with a severe stroke are more likely to have long-term activity limitations [44] and, consequently, to experience a lower HRQL. Subjects were also grouped according to their functional autonomy as measured by the Barthel Index. The Index is known to be a predictor of functional recovery and discharge destination [45], both outcomes being likely to affect HRQL. We first hypothesized that at 6 months post-stroke, subjects with severe neurological deficits at onset of stroke (score < 9 on the CNS) would have lower scores on the PBSI than subjects presenting with very mild or no deficits at onset (CNS score of 11 and 11.5), and second, that stroke subjects presenting a marked dependence in functional activities (Barthel Index score ≤ 60) would have a significantly lower PBSI score than those who are fully independent in functional activities. Student's t-tests were performed to compare mean scores of subjects.
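The known-groups comparison can be sketched with the pooled-variance form of Student's t statistic. The example below uses hypothetical PBSI-like scores and stops at the statistic and its degrees of freedom; obtaining a p-value additionally requires the t distribution, which is not part of this sketch. The function name and the data are assumptions.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Student's t statistic (equal variances assumed) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, n_a + n_b - 2

# Hypothetical index scores for a severe-onset and a mild-onset group.
severe = [0.55, 0.70, 0.62, 0.74, 0.68]
mild = [0.80, 0.85, 0.78, 0.90, 0.76]
t_value, degrees_of_freedom = pooled_t_statistic(severe, mild)
print(round(t_value, 2), degrees_of_freedom)
```

A negative statistic here means the first group (severe onset) has the lower mean score, which is the direction hypothesized in the text.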

Development of the instrument
Only 30 of the 92 items included in our initial pool of items were found to be significantly impacted by a stroke in terms of prevalence. When surveyed on the importance and performance of each of these 30 items and the 13 items added to cover the full spectrum of activities and emotions known to be affected by stroke, long-term stroke survivors rated most items as high impact (importance * difficulty), omitting only eight of them (refer to Table 2). Two referred to activities of daily living, feeding and performing personal hygiene; in both cases, importance scores were very high (4.27 ± 1.29 and 4.44 ± 1.22 respectively), but these items were discarded because of their low performance scores (1.42 ± 0.94 and 1.39 ± 0.91 respectively), indicating that they were not reported as difficult activities. Similar results were found for two speech-related items (understanding a conversation with one person and following a conversation with three persons), where scores of importance were very close to 4.00 but few people rated these as difficult. This led to the rejection of these two items. Two IADL activities, preparing meals and doing own housework, were also dropped because of low performance and importance scores. Finally, participation in social activities as well as performance of moderate activities were discarded because of a low impact score.
Most items derived from the literature [24,46,47] generated high impact scores and a large majority of them were kept. The remaining 35 items were then analysed in terms of their frequency distributions on the performance questionnaire. Only 12 items were removed because they were not often reported to be difficult to perform by long-term stroke survivors. A correlation matrix was built using the 23 performance-rated items. Mobility-related items were scrutinized to avoid redundancy. For this reason only one stair climbing item and one walking item were kept. A work item merging both the "quantity" and the "quality" of work was developed.
A speech item was forced into the measure for content validity. Aphasia may severely limit an individual in the accomplishment of his activities and restrict participation. This limitation in speech was not a prevalent difficulty among the group of subjects surveyed, yet, was identified as very important in this study and in others [48,49]. The items performing vigorous activity and performing moderate activities (from the SF-36) were both rated as not important by respondents yet a large proportion of subjects generated items related to vigorous sports or hobbies that are physically demanding. An item related to performing sports and physically demanding activities was, therefore, used to encompass a mixed concept of vigorous and moderate activities. A total of ten items, with inter-items correlations ranging between 0.216 and 0.719, all significant at p < 0.01 (Table 3), were kept in the final version of the PBSI.

Pilot study
The PBSI demonstrated a good capacity to capture different health states. Figures 2 and 3 illustrate the distribution of responses across levels on each item of the PBSI and the EQ-5D respectively. Three items showed poor distribution of responses across levels (speech, memory and self-esteem): rarely did subjects report severe difficulties in these areas. This finding was not surprising considering that these subjects were long-time community-dwelling stroke survivors. However, contrary to the mobility item of the EQ-5D response option '3' (being bedridden), the three mobility items of the PBSI were likely to be scored on each possible level, assuming a more diverse population of stroke survivors in which various severity levels would be captured.
Among respondents, 17 rated their HRQL with a perfect EQ-5D score (11111). Of these, 7 subjects also scored 1 (or best level) on all of the 10 items of the PBSI. The mean EQ-VAS value for this group of subjects (perfect score on both EQ-5D and PBSI) was 85.6 (sd = 9.1). However, 10 subjects who scored perfectly on the EQ-5D reported having some limitation in at least one of the 10 items of the PBSI. These non-perfect PBSI ratings were associated with a mean EQ-VAS value of 72.4 (sd = 12.4). This difference is important and highlights the capacity of the PBSI to discriminate subjects with activity limitations from those with no activity limitations, as well as the impact of these limitations on the individual's overall rating of his/her HRQL.

Preference weights
In total, 67 persons were asked to complete the task; 7 could not manage the example and, therefore, were not asked to continue. Most subjects who failed the example task appeared unable to imagine someone else in the situation they were presented and asked to rate. They tended to refer to their situation only. Table 4 shows means and medians of each health state for both groups of subjects. For each subject, the health states were ranked according to their value on the VAS. Both stroke subjects and caregivers reported speech to be the domain that would most severely affect their HRQL if it became limited following a stroke (disutility = 0.34). On most domains, caregivers and subjects reported similar values (see Table 4). Five subjects (4 stroke subjects and one caregiver) rated the health state being dead as 100. They were prompted to rate death as if they were to die that day. Each of them expressed they were not afraid of dying and if it were to happen in the very near future, they would consider this event as positive. This high preference for death was not shared by the majority of subjects who rated death as 0.
The rating of the corner state coping was more highly variable than any other corner states. Coping is a relatively abstract construct and may, therefore, be more difficult to imagine. Both caregivers and subjects rated the 'all worst levels' which can be seen as a description of a severe stroke health state, below 0.20 (mean 0.15 ± .09). Driving was the only domain where differences in mean scores between stroke and caregivers reached statistical significance (p < 0.049). These differences cannot be explained by the proportion of drivers in each group (60% of stroke subjects were drivers compared to 83% of caregivers) but could be explained by the large proportion of women in the caregiver group (78%). Even though most of them were drivers, many performed this activity occasionally, leaving most of the driving to their spouses. The expected ranking of corner states was determined from mean preference weights obtained from the overall sample. Since preference weights did not statistically differ between stroke subjects and caregivers, data from both groups were merged to provide one large sample size of 60 subjects. Friedman's Chi-square was significant indicating that there is a general association between corner states mean scores and their ranks (p = 0.0001). This emphasizes that both groups of subjects rated the health states in a consistent manner.
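The rank-based analysis can be illustrated with the Friedman chi-square statistic, computed on within-subject ranks of the corner states. The sketch below is a minimal version under assumptions: it uses mid-ranks for ties but applies no tie correction, the function names are hypothetical, and the ratings reuse the illustrative ranges from the methods (30 to 70 and 45 to 80) rather than study data.

```python
def midranks(values):
    """Ranks from 1..k within one subject, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = shared_rank
        i = j + 1
    return ranks

def friedman_statistic(ratings):
    """Friedman chi-square (no tie correction) for rows = subjects, columns = states."""
    n_subjects, n_states = len(ratings), len(ratings[0])
    rank_sums = [0.0] * n_states
    for row in ratings:
        for state, rank in enumerate(midranks(row)):
            rank_sums[state] += rank
    return (12 / (n_subjects * n_states * (n_states + 1)) * sum(r * r for r in rank_sums)
            - 3 * n_subjects * (n_states + 1))

# Two subjects using different parts of the thermometer but the same ordering.
same_order = friedman_statistic([[30, 50, 70], [45, 60, 80]])
opposite_order = friedman_statistic([[30, 50, 70], [80, 60, 45]])
print(same_order, opposite_order)
```

When subjects order the states identically the statistic reaches its maximum of n(k - 1), regardless of the absolute values they used; when their orderings cancel out it falls to zero.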

Development of a preference-weighted cumulative index score
Loadings of items are reported in Table 5 as well as item means and standard deviations. All items except the one on physical activity/sport have mean values very close to one another and standard deviations within a similar range. With an unweighted variance of 35.6%, a one-factor model probably does not provide the best fit with the data, yet 9 out of 10 items have loading weights above the required value of 0.4 [39]. The homogeneity of the 10 items was reinforced by an internal consistency estimate of 0.84 (Cronbach's alpha). Only driving, with a very low weight of 0.15, has a weak contribution to the overall variance of the factor. The fact that this single item appears to contribute minimally to the measure did not preclude its inclusion on the PBSI. Loading weights obtained from the factor analysis were not used as weights for the response options of the PBSI; rather, each item on the PBSI is scaled by a 3-point response set that was shown to have reasonably equal intervals (Fig. 1). An unweighted scoring system would calculate a move from one response option to another on two different items as contributing similarly to the overall HRQL score. The interval property of response options was used to assign weights to each response option, so that a move from '1' to '2' on two items would not yield a similar reduction in the overall HRQL. We hypothesized that the preference weights obtained for each item on the PBSI would follow the same interval pattern and be equally spaced. Therefore, a person with a '3' on the speech item (disutility of 0.33) would lose 6.7% of the overall HRQL compared to a loss of 4.4% with a '3' on recreational activities, assuming all other items being scored as perfect. A move from a '3' to a '2' on each of these items would then result in a gain of 3.35% and 2.2% for the speech and recreational activity items, respectively. The scoring formula was recalibrated so that a person with no limitations would obtain the highest possible score, that is, 1.0, while the person presenting the worst possible health state would obtain a PBSI score of 0.

Figure 2: Proportion of subjects (%) choosing level 1 (no problem), level 2 (moderate problem) or level 3 (severe problem) on each item of the PBSI.
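The logic of an equally spaced, preference-weighted cumulative score anchored at 1 and 0 can be sketched as follows. This is an illustration of the general idea only, not the PBSI scoring formula: the exact formula and the item weights are not given in this section, so every disutility below is hypothetical except the 0.33 quoted for speech, and the sketch does not attempt to reproduce the 6.7% and 4.4% figures.

```python
# HYPOTHETICAL level-3 disutilities; only the item names and the 0.33 for
# speech come from the text. They are placeholders, not the study's weights.
HYPOTHETICAL_DISUTILITY = {
    "walking": 0.25, "climbing stairs": 0.25, "physical activities/sports": 0.20,
    "recreational activities": 0.22, "work": 0.24, "driving": 0.21,
    "speech": 0.33, "memory": 0.30, "coping": 0.26, "self-esteem": 0.27,
}

def cumulative_index(levels, disutility=HYPOTHETICAL_DISUTILITY):
    """Score in [0, 1]: all items at level 1 gives 1.0, all at level 3 gives 0.0.
    Equal spacing is assumed, so level 2 costs half of the level-3 disutility."""
    worst_possible_loss = sum(disutility.values())
    loss = sum(disutility[item] * (levels[item] - 1) / 2 for item in disutility)
    return 1.0 - loss / worst_possible_loss

best_state = {item: 1 for item in HYPOTHETICAL_DISUTILITY}
speech_at_3 = dict(best_state, speech=3)
speech_at_2 = dict(best_state, speech=2)
print(cumulative_index(best_state), round(cumulative_index(speech_at_3), 3),
      round(cumulative_index(speech_at_2), 3))
```

The property carried over from the text is that a move from '3' to '2' on an item recovers half of what a move from '3' to '1' would, and that items with larger disutilities move the score more.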

Convergent validity
Pearson correlation coefficients are presented in Table 6. Correlations between the PBSI and most of the SF-36 subscales were moderately high and significant (p 0.005). The PBSI correlated moderately with the bodily pain (BP) (r = 0.48) and mental health (MH) (r = 0.44) subscales of the SF-36. The lowest correlation was with the role emotional (RE) subscale of the SF-36 (r = 0.33). This subscale has been shown to correlate poorly with other HRQL measures [11,49] and was recently identified as having a strong ceiling effect which would limit its value in stroke studies [50]. As anticipated, the EQ-5D index performed better than the PBSI on only two domains, BP (r = 0.69) and RE (r = 0.35), which are directly assessed by the EQ-5D and not the PBSI. A moderately high correlation was found between the PBSI and the EQ-5D index score (r = 0.76). When both measures were correlated to the EQ-VAS score,

Figure 3: Distribution of responses (%) on items of the EQ-5D among a group of community dwelling stroke survivors (n = 68).

Known-groups validity
When subjects were divided according to the severity of their stroke, those presenting with a severe stroke at onset (CNS score < 9) reported a much lower PBSI score (0.67) compared to those who had a very mild stroke (CNS score > 11), who obtained a mean PBSI score of 0.81 (p < 0.05) (Table 7).
Differences in PBSI scores for subjects who presented major difficulties performing their ADL (PBSI score of 0.47) compared to those reporting moderate difficulties (PBSI score of 0.57) and to those with no difficulty (PBSI score of 0.82) were statistically significant (p < 0.0001) (Table 7). This difference would also be considered as clinically meaningful [51]. However, because a very small number of subjects had a Barthel Index score less than 60, statistical significance could not be reached when this group was compared to the intermediate functioning group.

Discussion
The PBSI is a 10-item stroke specific health index developed for economic purposes, more specifically as an outcome for use in cost-effectiveness studies [see Additional file 1]. The PBSI encompasses the most important and commonly impacted domains of HRQL in relation to stroke. It generates 59,049 multidimensional health states, each defined by a preference-weighted cumulative score which captures the losses and gains in the various health components affected by stroke. The PBSI is short and easy to administer. It is available in Canadian French and English.
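The count of 59,049 health states follows from 10 items with 3 response levels each, that is, 3 to the power of 10. The short sketch below checks this; the names `N_ITEMS` and `LEVELS` are illustrative only.

```python
from itertools import product

N_ITEMS = 10        # items on the PBSI
LEVELS = (1, 2, 3)  # response options per item

# Each health state is one combination of a level per item.
n_states = len(LEVELS) ** N_ITEMS

# The same count obtained by enumerating every combination explicitly.
n_enumerated = sum(1 for _ in product(LEVELS, repeat=N_ITEMS))

print(n_states, n_enumerated)
```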
As the first stroke-specific health index, content validity was a priority. Major efforts were made to ensure the selection of the most appropriate sample of items to describe HRQL post stroke. Content validity is recognized as a crucial component of instrument development. The methods used to develop the content of the PBSI combined different procedures that have been used previously in the development of HRQL instruments, including the estimation of impact scores [37] and the generation of items felt to be impacted upon by stroke survivors [52]. This methodology optimized the content validity of the PBSI. These domains have been recognized by other developers to be meaningful post-stroke [6,53].
The wide spectrum of PBSI scores obtained in the population studied indicates that a large number of different health states can be captured by the PBSI and confirms that the measure has neither a ceiling effect nor a floor effect. This evidence was reinforced by comparing the PBSI scores with those of the EQ-5D-index. Our results demonstrated a ceiling effect in the EQ-5D-index, which had not been previously identified in stroke studies [10,11] but had been reported in other populations [54,57]. The absence of a ceiling effect in a high functioning group of individuals is another indication of the validity of the selected domains and response options. Contrary to the EQ-5D-index, where very few, if any, stroke survivors will choose response option 3 on the mobility item, the PBSI offers respondents the possibility, on each item, of choosing among three option levels that are realistic or likely to occur following a stroke. The fact that even community living stroke survivors chose the most severe response option (option 3) on each of the 10 items is quite promising for future performance of the PBSI in a more heterogeneous group of stroke subjects.
Convergent validity was demonstrated through correlation of the PBSI with a generic health status measure, the SF-36. Only the Role Physical scale of the SF-36 did not exactly reach the desired correlation (r = 0.48). The largest correlation was with the physical functioning scale of the SF-36. This was not a surprise as the items walking, stairs management, and the physically demanding activities/sports on the PBSI were generated from the SF-36 questionnaire. While they were slightly modified to meet the 3-point response scale of the PBSI, the domains were similar. As expected, the PBSI was poorly associated with the role emotional and mental health scales of the SF-36. Our findings are similar to those obtained in studies comparing the SF-36 to the EQ-5D [11] in a stroke population and the SF-36 to the QWB scale [49], when used among a general population and patients with renal problems. It was surprising to see that bodily pain was indirectly captured by the PBSI (r = 0.48). The item on pain had been dropped in the developmental process because of poor association with stroke. However, because pain is frequently assessed in HRQL instruments, and contradictory conclusions are reported in the literature as to whether or not pain impacts HRQL post-stroke [21,22,56], we wanted to confirm its exclusion from the PBSI. Our findings support the exclusion of pain. Finally, the PBSI was able to discriminate between groups of individuals on the basis of their functional independence level and according to the severity of their stroke at onset.
This preliminary validation provided evidence of construct validity in a group of stroke subjects, at six months post-stroke. Further information needs to be gathered with regard to its ability to be responsive to change over time and its validity among more severely disabled stroke survivors. Nonetheless, these results are promising and will lead to future development and assessment of the PBSI. There were a number of potential limitations involved in the development of this instrument. First, a convenience sample of community living stroke survivors was surveyed to generate an initial item pool. It is possible that some problematic areas were missed as individuals who were surveyed were relatively high functioning. However, as they were compared to a group of community living individuals who were, on average, younger and did not have a stroke, it is more likely that more items rather than fewer were kept in the first developmental step of the instrument, which was actually an advantage. While we do not think that the selection of items will affect the generalizability of the PBSI across the range of stroke severity, further research is required to test the ability of the PBSI to capture HRQL among the wide range of possible health states post-stroke. It is interesting that items related to mood or depression did not meet our selection criteria. While the absence of such items could be seen as a limitation, it is important to remember that items covering "emotions" often have poor inter-rater reliability coefficients. In stroke studies, when a subject's HRQL is assessed by a proxy because of aphasia or cognitive deficits, maintaining high reliability coefficients is crucial. Nonetheless, we recognize the need, in stroke studies, to capture the mood and emotions of stroke survivors through the use of complementary generic or stroke-specific measures capturing these domains.
Further research is needed to evaluate the ability of the PBSI to determine HRQL of stroke survivors in relation to their recovery. This will require access to longitudinal data, which will become available in the next year. The PBSI also needs to be validated against another generic health index, such as the HUI3, and against a stroke specific profile such as the Stroke Impact Scale. These comparisons will provide valuable information about the validity of the PBSI and add to its value as an outcome measure in stroke studies. Another important step to be undertaken in the near future is the assessment of test-retest reliability. To date, stability of results of the PBSI has not been tested directly. Rather, reliability estimates were inferred to be adequate based on data from the parent instruments (the SF-36, Barthel Index, MMSE, etc.) and on the expected stability of the attributes being measured by the PBSI (assuming no change in HRQL). Formal assessment will be undertaken to verify these assumptions. Some have argued that HRQL by definition is not a stable construct and that test-retest reliability estimates are therefore not appropriate [62]. However, variation in HRQL estimates is highly dependent on the domains and attributes chosen to define the construct. For example, disease-specific measures defining HRQL mainly in terms of symptoms may vary over short periods of time, leading to poor stability. This does not apply to the PBSI, in which HRQL is described by a comprehensive set of impairments and activities known to evolve slowly after the first month post-stroke and to be much more likely to remain stable unless a major change in health occurs.
Finally, the fact that a multi-attribute preference-based scoring system has not yet been developed can be seen as an immediate limitation in the use of the PBSI. Consequently, the next step in the development of the PBSI will be the creation of a mathematical model to quantify each unique health state as a single value. The model will be developed using the multiattribute utility method [61]. While the current scoring system of the PBSI is adequate and respects the necessary conditions for items on an instrument to be summed, it does not take into account possible interactions between items. With its present scoring system, the PBSI gives a value of 0 to the worst stroke scenario (major stroke) and 1 to the best stroke scenario. With a multiattribute model, these scenarios would be given the values obtained from our survey of stroke survivors and caregivers, that is, 0.19 and 0.93 on a 0 to 1 scale where 0 represents death and 1, perfect health.
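To make the difference between the two scales concrete, the sketch below linearly re-anchors a current PBSI score (0 = worst stroke scenario, 1 = best stroke scenario) onto the death-to-perfect-health scale, using the two surveyed values reported above (0.19 and 0.93). This linear mapping is an assumption made purely for illustration. It is not the multiattribute utility function referred to in the text, which would also have to account for interactions between items, and the name `reanchor` is hypothetical.

```python
WORST_ANCHOR = 0.19  # surveyed value of the worst stroke scenario (from the text)
BEST_ANCHOR = 0.93   # surveyed value of the best stroke scenario (from the text)


def reanchor(pbsi_score):
    """Map a 0-1 PBSI score onto the 0 = death, 1 = perfect health scale.

    Illustrative linear interpolation between the two surveyed anchors only;
    it ignores any interaction between items.
    """
    if not 0.0 <= pbsi_score <= 1.0:
        raise ValueError("pbsi_score must lie between 0 and 1")
    return WORST_ANCHOR + (BEST_ANCHOR - WORST_ANCHOR) * pbsi_score


for score in (0.0, 0.5, 1.0):
    print(score, round(reanchor(score), 2))
```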
While a consensus is slowly emerging about the need to obtain societal weights for health states to provide rational and objective means of comparing health programs across diseases, this applies to generic health indexes, not to specific measures. Individuals with stroke and their caregivers represent the ideal sample to elicit preference values for stroke specific health states. Access to community dwelling stroke subjects and caregivers is realistic, and not only did our data demonstrate that these two groups expressed similar values, they also showed that comprehension of the rating scale technique was feasible among all subjects. As mentioned previously, the index summary score of the PBSI should only be used with the stroke population. Therefore, obtaining only societal weights would not be relevant or purposeful.

Conclusion
Preference-based measures are expected to become more prominent in the future. The concept, desirability of a health state, is highly meaningful as it may help decision-makers to better target interventions and programs, taking into account gains and losses in the most important domains for that population. This further highlights the need for disease-specific instruments, such as the PBSI. The content validity of the PBSI and its ability to capture health states across the continuum of stroke severity are likely to enhance its responsiveness and make it more appealing than generic instruments like the HUI or EQ-5D, in stroke studies where cost-effectiveness is an issue.

Authors' contributions
LP designed and conducted this study as part of her PhD. NM, SWD and AC provided feedback and guidance on this doctoral work. All authors read and approved the final manuscript.