Patient Uncertainty Questionnaire-Rheumatology (PUQ-R): development and validation of a new patient-reported outcome instrument for systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) in a mixed methods study

Background: An in-depth qualitative exploration of uncertainty in systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) led to the development of a five-domain conceptual framework of patient uncertainty in these two conditions. The purpose of this study was to develop and evaluate a new patient-reported outcome (PRO) instrument for patient uncertainty in SLE and RA on the basis of this empirically developed conceptual framework.
Methods: Cognitive debriefing interviews were conducted to pre-test the initial items generated on the basis of the preliminary qualitative exploration of patient uncertainty in SLE and RA. Two separate field tests were conducted in five hospital sites to evaluate the measurement properties of the new instrument; the first to identify and form scales, and the second to assess measurement properties of the final version in an independent sample. Psychometric evaluation was conducted in line with the Rasch Measurement Theory (RMT), examining the extent to which sample-to-scale targeting was satisfactory, measurement scales were constructed effectively and the sample was measured successfully. Traditional psychometric techniques were also used to provide complementary analyses best understood by clinicians.
Results: Pre-testing supported the relevance, acceptability and comprehensibility of the initial items. Findings indicated that the Patient Uncertainty Questionnaire-Rheumatology (PUQ-R) fulfilled the expectations of RMT to a large extent (including person separation index 0.73 to 0.91). The PUQ-R comprises 49 items across five scales: symptoms and flares (14 items), medication (11 items), trust in doctor (8 items), self-management (6 items) and impact (10 items), which further displayed excellent measurement properties as assessed against the traditional psychometric criteria (including Cronbach’s alpha 0.82 to 0.93).
Conclusion: The PUQ-R has been developed and evaluated specifically for patients with SLE and RA. By quantifying uncertainty, the PUQ-R has the potential to support evidence-based management programmes and research.
Electronic supplementary material: The online version of this article (doi:10.1186/s12955-016-0432-8) contains supplementary material, which is available to authorized users.


Background
The importance of considering chronic diseases and their treatment beyond clinical morbidity is increasingly being recognised in many disciplines, including rheumatology [1,2]. In patients with systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA), aspects of the patient perspective, including physical symptoms such as pain and fatigue as well as health-related quality of life (HRQoL), are not always associated with clinical markers of disease [1,3,4,5]. Similarly, it is increasingly recognised that patients' perceptions and appraisal of their condition affect psychosocial and physical functioning [6] and can further influence treatment adherence [7]. One such perception is patient uncertainty, which is considered to be particularly relevant in unpredictable conditions like SLE and RA [8][9][10]. Patient uncertainty has been portrayed as a cognitive stressor with significant implications for patient well-being and management [11,12].
Cognitive theories view uncertainty as a cognitive state associated with a perceived lack of knowledge and a subjective evaluation or appraisal process which is an inherent part of life [13][14][15]. It is therefore unsurprising that disruptive life events like a chronic illness have also been associated with an inevitable sense of uncertainty [8,16,17]. The patient uncertainty literature is dominated by the Uncertainty in Illness Theory (UIT) and corresponding instruments [18][19][20]. These were initially developed in the 1980s to address uncertainty in prediagnostic, diagnostic, treatment, and acute illness; the theory was later re-conceptualised (RUIT) to address enduring uncertainty in chronic illness [21]. The UIT and RUIT define uncertainty as a cognitive state in which a patient is unable to assign meaning to illness-related events, and they focus primarily on the sources and appraisal of uncertainty.
Although the UIT provides a very useful generic context for patient uncertainty, qualitative investigations indicate a multidimensional nature of the concept that the UIT neglects, and highlight the importance of illness-specific exploration of uncertainty. Specifically, qualitative findings in RA, HIV and cancer display how different illness characteristics, for example the illness course, contagiousness, differential treatment advice, and mortality risk, impose different dimensions of uncertainty in different illness groups that can prevail in all aspects of life [22][23][24][25][26].
Our previous in-depth exploration of patient uncertainty in SLE and RA, using interviews with both patients and rheumatology health-care professionals (HCPs), confirmed this [9]. Patients expressed uncertainty across a variety of domains both directly and indirectly associated with SLE and RA. These were inductively categorised in a five-domain framework comprising (i) symptoms and prognosis, related to uncertainties of symptom and health status interpretation and disease progression; (ii) medical management, related to uncertainty of current and future treatment effectiveness as well as uncertainties around doctors' knowledge and ability to treat a patient; (iii) self-management, related to uncertainties around how best to manage and control symptoms and health; (iv) impact, related to uncertainties around the potential consequences of disease on all aspects of a person's life; and finally (v) social functioning, related to uncertainties around disclosing and handling the diagnosis within the social circle.
Even though this exploration was conducted in parallel across the two conditions, analysis showed that, qualitatively, the uncertainty domains relevant to SLE and RA patients were overarching; hence a common framework was put forward. In line with the heightened clinical complexity of SLE, SLE patients reported quantitatively more uncertainties per patient on average; however, younger RA patients reported qualitatively and quantitatively comparable uncertainties to SLE patients, i.e. uncertainties in the same domains and sub-domains [9].
The manifestation of patient uncertainty in SLE and RA appeared complex, as it comprised different states and not just the inability to assign meaning to illness-related events [19]; these included a lack of knowledge or understanding, difficulty in interpretation or judgement, unpredictability, and the expectation of potential consequences or risks related to the different domains. Patient quotations related to uncertainty were often expressed with an apparent sense of worry and anxiety, an issue that was also indicated by HCPs, who further suggested an association of patient uncertainty with treatment adherence and general well-being [9].
This work demonstrated the importance of illness-specific assessment of patient uncertainty, as it expanded previous theories [19,27] by the addition of domains such as impact (comprising issues of family planning and functionality) and social functioning (comprising issues of disclosing diagnosis, support and reactions from social circles) [9]. Additionally, within domains that had been described before, such as illness progression and treatment, the rheumatology conceptualisation introduced uncertainties around issues relevant to SLE and RA that earlier theories had not referred to, such as multi-organ involvement, unpredictability of flares, medication toxicity and ineffectiveness.
In addition, these findings indicated the insufficiency of existing instruments to adequately capture uncertainty in SLE and RA. Despite their popularity, the UIT instruments were originally developed in the 1980s using data from hospitalised patients and targeting acute uncertainty [20]; therefore, their content validity in rheumatology is questionable. Furthermore, in light of more recent guidelines for patient-reported outcome (PRO) development, it is fundamental to support any PRO with an empirically derived conceptual framework to ensure that its items are appropriate and comprehensive relative to the concept of interest in the specific context of use, and so to safeguard its content validity [28][29][30].
In this paper, we take the next steps in the process of developing and evaluating a new PRO instrument for patient uncertainty in SLE and RA. The rising profile of the patient perspective has consequently increased interest in PRO instruments which quantify it [31]. Developing and evaluating PROs which are fit for purpose and provide clinically meaningful and interpretable data is crucial, particularly when numbers generated by them are used to make important decisions about patient care [31,32]. To address this, more comprehensive and advanced psychometric techniques are increasingly being used and have therefore been chosen in this study.

Methods
International guidelines and criteria for PRO instruments were used for the development and evaluation process of the PUQ-R [28,29,33,34,35]. The process comprised three stages with independent SLE and RA samples. As the goal was to develop a PRO instrument that could be used across the full range of disease severity in SLE and RA, patients from all disease stages were included in this process. National Research Ethics Committee approval was obtained for this study as well as local Research and Development approval at each of the participating sites.

Stage 1: Item generation & pre-testing
Item generation involved the development of an exhaustive pool of potential item strings for each domain within the patient uncertainty conceptual framework [9]. Item strings were developed on the basis of patient quotes that were coded as uncertain in the preliminary phase of this study [9]. Following principles of item construction [28, 36,37], we aimed to have an adequate range of items to cover the breadth of content within each of the five conceptual domains. Items were constructed in lay language using as many of the patients' own words as possible whilst aiming for brevity and minimal semantic overlap. Item generation was performed in parallel but independently for SLE and RA.
Participants involved in the qualitative interviewing stage of this study [9] were re-invited to participate in the cognitive debriefing interviews. Participants were instructed to complete the initial items whilst thinking aloud, to note any queries or problem questions, and to discuss these with the interviewer [38]. Interviews were digitally recorded and timed. Interview records were reviewed for any issues related to wording ambiguities, relevance and acceptability, in relation to each item, response scale and set of instructions.

Stage 2: Field test 1
A field test was set up in five hospitals in England: University College Hospital, Kings College Hospital, Royal Blackburn Hospital, Robert Jones and Agnes Hunt Orthopaedic Hospital and Leicester Royal Infirmary. Participants were eligible for participation if they were at least 18 years old, met standard criteria for SLE or RA diagnosis and were fluent in English. Participants with a significant co-morbid diagnosis were excluded. Participants were recruited via two routes: through the post and during outpatient appointments. Personalised letters, standardised instructions and a reminder letter were used to achieve the highest possible response rate [39]. Study materials consisted of a demographics questionnaire and the first draft of the PUQ-R. Examination of these results led to scale modifications and the second draft of the PUQ-R instrument.

Stage 3: Field test 2
A second field test was set up in four of the participating hospitals (excluding Kings College Hospital). Participant eligibility and recruitment were identical to the first field test. A demographics questionnaire and the second draft of the PUQ-R were administered. This consisted of the five revised scales, including symptoms and flares, medication, trust in doctor, self-management and impact. Rasch analysis was used to evaluate the measurement properties of the PUQ-R scales and to make any necessary additional revisions. Traditional psychometric techniques were then used to assess the measurement properties of the final version of the PUQ-R and complement the psychometric evaluation.

Stage 2 & 3 statistical analyses
Different psychometric techniques are available for developing and evaluating the scientific rigour of PRO instruments [31]. The modern psychometric paradigm of Rasch Measurement Theory (RMT) [40] offers a mathematically testable model which allows for rigorous testing of measurement properties and therefore leads to the development of instruments which are scientifically sound. A detailed outline of the advantages of RMT over traditional psychometrics is presented elsewhere [31,41].

Rasch measurement theory analysis
Psychometric evaluation of the PUQ-R scales was performed in line with Rasch Measurement Theory (RMT) using the RUMM2030 software [42]. RMT analysis examines the extent to which observed raw scores match the scores expected by the Rasch model, which indicates the degree to which the summing of scale items results in rigorous measurement (2). The evaluation of a rating scale using Rasch analysis aims to evaluate three broad aspects [32]:

1. How adequate is the sample to scale targeting?
Scale-to-sample targeting refers to the comparison between the range of the trait (i.e. uncertainty) measured by the scale and the range of the trait measured in the study sample. Targeting was evaluated through examination of the relative distribution of sample and item thresholds as plotted against the same metric scale of logits (the unit of measurement in RMT analysis), where item thresholds reflect the difficulty of each of the multiple response options of each item and the item threshold mean is always set at zero logits [32,43,44]. Proximity of the person location mean to the item threshold mean indicates adequate targeting [45].

2. To what extent has a measurement scale been constructed successfully?
Information from four different tests was gathered in order to address this question [41].

2.1 Do the response categories work as intended?
Response category thresholds were examined for disordering, as the RMT expects them to be ordered in a sequential manner (i.e., "0 = very uncertain", "1 = somewhat uncertain", "2 = somewhat certain", "3 = very certain") when plotted on the measurement continuum, to reflect the decreasing level of uncertainty the responses denote [32,41].

2.2 Do the PUQ-R scale items define a single variable?
RMT expects items within a scale to be cohesive in defining a single measurement continuum [41,46]. Three "fit" indicators were examined to assess this. Item fit residuals assess whether the item-person interaction is in line with the RMT. Fit residuals reflect the difference between the observed scores and the ones expected by the Rasch model (i.e. observed - expected = residual) and are expected to be distributed between -2.5 and +2.5 [32]. Chi-square statistics assess whether the item-trait interaction is in line with the RMT. Chi-square is a summary statistic computed by dividing the sample into six groups (class intervals) based on their trait level (i.e. level of uncertainty). For items to fit the RMT, it is expected that the chi-square probabilities would not be significant (p > 0.01) [32,47,48]. Item characteristic curves (ICC) are graphical indicators of fit which are used to complement the interpretation of the fit residuals and chi-square probabilities [32,43].
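To make the threshold-ordering and residual checks described above more concrete, the following is a minimal sketch, not the RUMM2030 implementation, assuming a partial credit parameterisation of the polytomous Rasch model. The function names and the threshold values are hypothetical and chosen only for illustration. It computes category probabilities, the expected score and a standardised residual for a single person-item response; the item-level fit residuals reported by Rasch software summarise such residuals across persons, and that aggregation is not reproduced here.

```python
import math

def category_probabilities(theta, thresholds):
    """Probability of each response category under a partial credit model.

    theta: person location in logits.
    thresholds: item threshold locations in logits
                (k thresholds give k + 1 response categories).
    """
    # Category x has exponent sum_{j <= x} (theta - tau_j); category 0 has 0.
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def standardized_residual(observed, theta, thresholds):
    """(observed - expected) / sqrt(model variance) for one response."""
    probs = category_probabilities(theta, thresholds)
    expected = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - expected) ** 2 * p for k, p in enumerate(probs))
    return (observed - expected) / math.sqrt(variance)

def thresholds_ordered(thresholds):
    """True when thresholds increase strictly along the continuum."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))

# Hypothetical 4-category item (0 = very uncertain ... 3 = very certain).
example_thresholds = [-1.5, 0.0, 1.5]
print(category_probabilities(0.0, example_thresholds))
print(standardized_residual(3, 0.0, example_thresholds))
print(thresholds_ordered(example_thresholds))
```

A disordered item would be one for which `thresholds_ordered` returns `False`, for example thresholds of -1.5, 0.8 and 0.2 logits.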

2.3 Do responses to one item bias responses to others?
RMT expects that the response to one item should not directly influence the response to another, as this would bias measurement estimates (inflate or deflate reliability). Response dependency is assessed via correlations between item residuals (observed score - expected score = residual). As the Rasch model expects local independence of items, item residuals are also expected to be unrelated and to reflect only random error. Residual correlations were used to examine response bias [43,44] in line with the r > 0.30 rule of thumb, but residual correlations below 0.40 were considered acceptable [49].

2.4 Is the performance of the scales stable across relevant groups?
The RMT expects the measurement continuum to perform consistently across different sample groups. Item stability was assessed through differential item functioning (DIF) [32,41,50]. DIF explores the relationship between item responses and group membership by examining the observed response differences between class intervals within groups [51]. DIF was assessed between the SLE and RA groups using ANOVA.

3. How has the sample been measured?
Two indicators were used to examine measurement of the specific sample.

3.1 Is the sample separated by the PUQ-R scales?
A scale is expected to detect differences in the levels of the trait within a sample and also to detect changes in trait levels over time. Within the RMT paradigm the person separation index (PSI) is calculated to assess this [32,41]. The PSI is computed as the ratio of the variation of person estimates relative to the estimated error for each person [52]. In other words, the PSI displays how much of the variation in person-location estimates can be associated with random error, where a score of 0 indicates all error and a score of 1 no error at all [32].
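As an illustration of the ratio described above, the sketch below uses one common formulation of a separation-type reliability index: the proportion of observed variance in person location estimates that is not attributable to estimation error. This is an assumption made for illustration; the exact computation used by RUMM2030 is not given in the text and may differ in detail. The function name and the example values are hypothetical.

```python
from statistics import mean, variance

def person_separation_index(locations, standard_errors):
    """Share of person-location variance not attributable to error.

    locations: person location estimates in logits.
    standard_errors: standard error of each person estimate.
    Returns a value near 1 when error is small relative to the spread
    of persons, and near 0 when error accounts for most of the spread.
    """
    observed_variance = variance(locations)
    error_variance = mean([se ** 2 for se in standard_errors])
    return (observed_variance - error_variance) / observed_variance

# Hypothetical person estimates and standard errors (not study data).
example_locations = [-1.0, 0.0, 1.0, 2.0]
example_errors = [0.5, 0.5, 0.5, 0.5]
print(person_separation_index(example_locations, example_errors))
```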

3.2 To what extent are raw scores linear?
The extent to which ordinal raw scores approach linear (interval) measurement, and their subsequent transformation onto an interval scale, were assessed. This is important as a one-point difference on a scale is not necessarily the same across the breadth of the scale [41,53]. Considering the stringent mathematical criteria of the RMT, minor deviations of raw scores from interval/linear measurement are expected.

Traditional test theory analysis
To complement the psychometric evaluation, the final draft of the PUQ-R scales was further tested to determine whether it fulfilled the traditional psychometric criteria, which are grounded in widely accepted guidelines [28, 33,35]. Four traditional psychometric properties (Table 1) were assessed using the IBM SPSS Statistics 19 software package. Finally, some preliminary construct validity analyses were performed by evaluating differences between the SLE and RA scores across the five PUQ-R scales and convergence of these with other measures of treatment adherence [54], mood [55] and quality of life [56].

Results

Stage 1: Item development & pre-testing
A total of 82 items were generated for the new instrument, called the Patient Uncertainty Questionnaire-Rheumatology (PUQ-R). Items were grouped into five hypothesized scales reflecting the five conceptual domains the items were derived from [9]. Specifically, the PUQ-R comprised 26 items related to the symptoms and prognosis, 27 items to the medical management, 5 items to the self-management, 18 items to the impact, and 6 items to the social functioning conceptual framework domains [9]. Even though the volume of uncertainty quotations in the SLE sample was greater, item generation resulted in qualitatively the same breadth of items in both conditions. To this effect, two versions of the PUQ-R were developed, consisting of exactly the same items but with a distinctive reference to either lupus or arthritis within the item string. In an attempt to keep the response scale proximal to the latent variable under assessment [28], all items were scored on a 4-point Likert scale reflecting four different degrees of uncertainty.
A total of 20 patients, 10 SLE and 10 RA, were recruited for the cognitive debriefing interviews, the details of which have been described elsewhere [9]. The initial PUQ-R items were well received by participants. No items were omitted, and the completion time ranged from 8 to 30 minutes, including time spent discussing and commenting on items (mean = 18.75, SD = 6.84). A "not applicable" response option was added to address issues of relevance and problems with the response scale. The wording of 5 items and two sets of instructions was simplified to avoid any ambiguities, and one item was split into two to address uncertainty in the workplace and in the social circle separately. These changes did not impact on the initial content and structure of the PUQ-R.

Stage 2: Field test 1
At an average response rate of 60.9 %, a total sample of 383 participants was recruited (Table 2). Analyses and interpretation of the RMT psychometric tests resulted in modifications and a second draft of the PUQ-R containing 51 items in total. RMT analysis retained the symptoms and flares, self-management and impact scales whilst splitting the medical management scale into two scales: medication and trust in doctor. Finally, the social functioning items were reduced and merged with the impact scale as they did not perform sufficiently as an independent scale. Two items which displayed significant DIF between the two conditions were retained in the scales but split by DIF and analysed separately, i.e. they were presented in a different order in the SLE and RA versions of the symptoms and flares and medication scales to reflect the different level of difficulty each item had for each condition. The performance of the revised scales improved when re-evaluated within the same sample.

Table footnote: CITC, corrected item total correlation; IIC, item-item correlation; ITC, item total correlation. Psychometric properties are adapted from and explained in more detail in Cano et al 2010 [66].

Stage 3: Field test 2
At an average response rate of 63.4 %, a total sample of 279 participants was recruited (Table 2). The second draft of the PUQ-R scales performed as consistently well in the second field test as in the first. Further revisions were only made to the symptoms and flares scale, which was reduced by two items (Additional file 1). The psychometric evaluation of the PUQ-R scales is presented in line with the methods discussed above, at more length for the RMT analysis and in summary for the traditional psychometrics.
RMT Analysis: How adequate is the sample to scale targeting?
PUQ-R scales presented good targeting, as the range of uncertainty measured by the scales matched the range of uncertainty in the sample to a satisfactory degree, except for the self-management scale, whose targeting was adequate but could be improved. Figure 1 displays the sample-to-scale distributions for the symptoms and flares scale, showing very good targeting. In comparison, the self-management scale targeting graph (Fig. 2) shows many person measurements located on the right-hand side of the continuum, signifying respondents with the highest scores, i.e. less uncertainty, who are not covered by the scale items. This can also be deduced from the self-management person mean score (1.276), which is the highest of all PUQ-R scales and the one furthest away from the item mean score (which is always set at zero logits). Person location mean scores for the remaining scales were 0.067, 0.675, 0.845 and -0.246 for the symptoms and flares, medication, trust in doctor and impact scales respectively.

RMT analysis: to what extent have the measurement scales been constructed successfully?
The PUQ-R scales were constructed successfully, as findings displayed only minor deviations from the RMT expectations. All item response categories were ordered in sequence apart from three out of forty-nine items: item 34 of the self-management scale, which was consistently disordered in the first field test, and items 15RA and 49 of the medication and impact scales respectively, which were evaluated for the first time in this field test. The response category "somewhat uncertain" was problematic for items 34 and 15RA and the "somewhat certain" category for item 49 (Fig. 3). Item goodness of fit was excellent for three of the PUQ-R scales, as only one item of the trust in doctor scale and three items of the impact scale displayed statistical misfit, with fit residuals outside the recommended criterion and significant chi-square probabilities (Table 3). However, when misfit was assessed graphically via the ICCs (graphs not presented can be obtained from authors), misfit was marginal for items 41 and 45 of the impact scale. More evident misfit was displayed by item 33 of the trust in doctor scale and item 49 of the impact scale, which both underestimated the trait, presenting scores higher than expected at the lower end of the continuum (i.e. less uncertainty for the less able persons) and lower scores than expected at the higher end of the continuum (i.e. more uncertainty for more able persons).
Some response bias was revealed in the final version of the medication scale items evaluated for the first time in the second field test (Table 3). Another two item pairs, the symptoms and flares items 13 and 14 and the trust in doctor items 26 and 27, displayed significant response bias and produced high residual correlation coefficients. The performance of the scale items was stable across SLE and RA, as only one item (item 45) displayed significant statistical DIF between the two conditions. Assessing this graphically revealed that observed scores for item 45, related to functionality, were higher than expected for the SLE sample and lower than expected for the RA sample.

Fig. 1 (symptoms and flares scale targeting). Targeting is satisfactory as the spread of sample and item threshold distributions are well matched. This is also displayed by the person mean location (0.067), which is very close to the item threshold mean location, which is always set at zero.

Fig. 2 PUQ-R Self-management Scale Targeting. The upper histogram (pink blocks) represents the sample distribution for the scale total score whereas the lower histogram (blue blocks) represents the scale item threshold distribution plotted on the same linear measurement continuum. Targeting is suboptimal. The item threshold distribution does not match the sample distribution well, as no items are located beyond the +3 logit location. This is also displayed by the person mean location (1.276), which is higher than the item threshold mean location, which is always set at zero.

RMT analysis: How has the sample been measured?
All PUQ-R scales produced high PSI values (0.73-0.91), thus confirming their ability to separate the sample (Table 3). The linearity of measurement was evaluated graphically by plotting the raw scores on a graph against interval measurement (graphs not presented can be obtained from authors). Graphs for all PUQ-R scales displayed an expected sub-optimal S-shaped relationship between raw scores and interval measurement, and the interval-level scores were used to calculate a transformed 0-100 interval scoring for each of the five scales.
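The text does not state how the 0-100 scoring was derived from the interval-level estimates. As a minimal sketch, assuming a simple linear rescaling of the logit estimate associated with each raw score, the transformation could look as follows. The function names and the raw-score-to-logit table are hypothetical and are not taken from the PUQ-R.

```python
def rescale_0_100(logit, min_logit, max_logit):
    """Linearly map a logit estimate onto a 0-100 range."""
    if max_logit <= min_logit:
        raise ValueError("max_logit must be greater than min_logit")
    return 100.0 * (logit - min_logit) / (max_logit - min_logit)

def build_score_table(raw_to_logit):
    """Map each raw total score to a transformed 0-100 interval score."""
    lowest = min(raw_to_logit.values())
    highest = max(raw_to_logit.values())
    return {raw: rescale_0_100(logit, lowest, highest)
            for raw, logit in raw_to_logit.items()}

# Hypothetical conversion table: raw scores are equally spaced, but the
# logit estimates are stretched at the extremes (the S-shaped relationship).
example_raw_to_logit = {0: -3.2, 1: -1.9, 2: -0.8, 3: 0.0,
                        4: 0.8, 5: 1.9, 6: 3.2}
for raw, score in build_score_table(example_raw_to_logit).items():
    print(raw, round(score, 1))
```

In this hypothetical table, a one-point raw score change near the extremes corresponds to a larger change in the transformed score than a one-point change near the middle, which is the reason interval-level scoring is preferred when comparing score differences.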

Traditional psychometrics
The PUQ-R scales satisfied the traditional psychometric analysis criteria (Table 1). PUQ-R scale acceptability (quality & targeting) was excellent, with very low percentages of scale-level missing data and no floor and ceiling effects or any statistical skewness (Table 4). Scaling assumptions were further met, as the range of corrected item total correlations (CITCs) and the mean item-to-item correlation (IIC) for all scales lay above the 0.30 criterion. PUQ-R scale mean scores were also very close to the actual mid-point. Findings also greatly supported the reliability of the PUQ-R scales, with Cronbach's alpha coefficients well above the 0.70 criterion for all scales, which further satisfied the item-level validity criteria. Preliminary examination of the construct validity of the PUQ-R scales showed significant relationships between different PUQ-R scales and measures of treatment compliance, depression, anxiety, and physical and mental quality of life (Table 5). Means comparison between the SLE and RA samples revealed a significant difference only in the symptoms and flares scale, with higher scores for the SLE patients (t = -4.40, df = 277, p < 0.001), and non-significant differences across all other scales. This was in line with the heightened clinical complexity of SLE and previous qualitative findings [9].
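For readers less familiar with the reliability and scaling criteria referred to above, the following sketch shows the standard formulas for Cronbach's alpha and the corrected item total correlation (the correlation of an item with the sum of the remaining items). It is an illustration only: the study used IBM SPSS Statistics, the function names are hypothetical, and the response matrix below is invented rather than study data.

```python
import math
from statistics import mean, variance

def cronbach_alpha(rows):
    """Cronbach's alpha for complete responses.

    rows: one list of item scores per respondent (no missing values).
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def corrected_item_total_correlation(rows, item):
    """Correlate one item with the total of all the other items."""
    item_scores = [row[item] for row in rows]
    rest_scores = [sum(row) - row[item] for row in rows]
    return pearson(item_scores, rest_scores)

# Invented responses on a 0-3 scale: five respondents, three items.
example_rows = [[0, 1, 0], [1, 1, 1], [2, 2, 3], [3, 3, 3], [3, 2, 3]]
print(cronbach_alpha(example_rows))
print(corrected_item_total_correlation(example_rows, 0))
```

The values returned can then be compared against the 0.70 (alpha) and 0.30 (CITC) criteria cited in the text.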

Discussion
The PUQ-R is a PRO instrument developed using comprehensive qualitative methodology, incorporating the input of patients with SLE and RA and rheumatology HCPs, and rigorous psychometric techniques in line with best practice guidelines [28, 29,33,35] and rheumatology outcome recommendations [57,58]. It quantifies patient uncertainty in SLE and RA across five different domains: symptoms and flares, medication, trust in doctor, self-management and impact (Additional file 1). These were suggested as important aspects of the SLE and RA illness experience by patients themselves in a preliminary study [9] which uncovered aspects of patient uncertainty not covered by older generic theories and instruments [18][19][20][27]. The empirical content development of the PUQ-R [9] supports its relevance for patients with SLE and RA, and the subsequent pre-testing of items ensures that the instrument is acceptable and appropriate for patients. The extensive quantitative RMT psychometric analysis supported the suitability of use of the PUQ-R scales [41], which also displayed excellent measurement properties when assessed against the traditional psychometric criteria [33]. Preliminary construct validity examinations indicated a negative association of different uncertainty aspects with other important patient outcomes.

Fig. 3 Threshold maps for all PUQ-R scales. The x-axis represents the measurement continuum of the trait (uncertainty), with decreasing levels from left to right. The y-axis shows each of the items. Response categories are "Very Uncertain" labeled as 0; "Somewhat Uncertain" labeled as 1; "Somewhat Certain" labeled as 2 and "Very Certain" labeled as 3. Thresholds for items are missing and replaced with ** if they are disordered, i.e. response categories do not appear in a consecutive increasing order in relation to the construct (x-axis).

Although these findings support the PUQ-R's measurement properties, developing an instrument using rigorous methodology is an on-going process [28,29]. An RMT psychometric evaluation provides a vehicle for evidence-based scale improvement by signifying areas of sub-optimal performance. In this respect, the RMT psychometric evaluation of the PUQ-R satisfies all criteria for its initial use, but further highlights areas needing improvement, including the sample-to-scale targeting for the self-management scale and item dependency for the medication scale, that would benefit from further empirical testing.
Finally, the raw ordinal total scores of the PUQ-R scales did not reflect interval measurement. However, this was an expected finding as raw scores are ordinal and unsurprisingly have unequal intervals. The advantage of RMT analysis is the ability to obtain implied interval measurements [59] which can be used to calculate a transformed 0-100 interval scoring for subsequent use. This issue is not always addressed in PRO instruments; however, it is highly important, particularly when interpreting scores from a total ordinal scale which have unequal intervals [41]. This analysis therefore benefits from the provision of interval-level transformed scoring.
Patient uncertainty has been linked with unfavourable outcomes in SLE and RA [9,26,60,61] and in chronic illness in general [11,12]. The PUQ-R is the first instrument developed to quantify patient uncertainty specific to SLE and RA and also, to the authors' knowledge, the first instrument to quantify uncertainty as a multidimensional concept across different domains. The PUQ-R could therefore be used in studies exploring the impact of patient perceptions on outcomes of disease such as HRQoL, physical symptoms like pain and fatigue, as well as treatment adherence [1,4,5,6,7,9]. Several self-management interventions in chronic illness and rheumatology have drawn from the biopsychosocial model and other social cognition theories to improve moderating variables of chronic illness, such as patient perceptions, self-efficacy and coping [1,62]. Preliminary construct validity analysis indicates that higher uncertainty across different but not all domains is associated with lower treatment adherence, higher levels of depression and anxiety, and poorer HRQoL.
If these relationships are established, patient uncertainty could be targeted as a moderating variable in self-management interventions to evaluate whether it is amenable to change and whether it can subsequently influence other patient outcomes. For example, such work could test whether decreasing levels of uncertainty in relation to the trust patients have in their doctors would improve treatment adherence in the SLE sample, or whether decreasing levels of medication and impact uncertainty would improve depression levels in RA and HRQoL in both conditions. These are potential uses of the PUQ-R instrument in patient research and management.
Lastly it is important to acknowledge potential limitations of this work and areas for future work. The sample size for both field tests was sufficient considering the general "rule of thumb" recommending 5 to 10 participants per scale item [63]; however, there was room for improvement as far as the response rate is concerned. Response rates exceeded the reported 60 % average response rate in medical and nursing surveys [64,65]; nevertheless, a post-hoc investigation revealed that changes in study design could have improved this.
Screening for all three stages of this study did not limit the sample to a specific disease stage, as the intention was to develop a PRO instrument applicable across all ranges of disease. Future work should aim to evaluate whether levels of disease severity influence the levels of uncertainty expressed by patients, as well as to establish the psychometric performance of the PUQ-R across all stages of SLE and RA disease using a clinical measure of disease. Finally, a more extensive exploration of the construct validity, minimal clinically important difference and responsiveness of the PUQ-R should follow, using longitudinal data and clinical measures of disease which were not available during this study.

Conclusions
The PUQ-R was developed and evaluated in line with best practice guidelines [28, 29, 33-35] and rheumatology outcome recommendations [57,58], using comprehensive methodology and a large amount of patient input. A new instrument like the PUQ-R therefore enhances the field of health measurement in rheumatology by offering the opportunity to quantify, in a valid and meaningful way, aspects of the patient perspective within SLE and RA. This study contributes a scientifically rigorous instrument to SLE and RA health measurement and further offers a useful template for the rigorous step-wise development and validation of PRO instruments.