Psychometric properties of morning joint stiffness duration and severity measures in patients with moderately to severely active rheumatoid arthritis

Background To assess the measurement properties of two single-item patient-reported outcome (PRO) measures that assessed the length of time (in minutes) and severity of morning joint stiffness (MJS) experienced each day. Methods Data from two Phase 3, randomized placebo-controlled (and active-controlled [RA-BEAM]), clinical studies assessing the safety and efficacy of baricitinib in adults with moderately to severely active rheumatoid arthritis (RA) were used to evaluate the psychometric properties of the Duration of MJS and Severity of MJS PROs. Results Test-retest reliability of Duration of MJS and Severity of MJS was supported through large intraclass correlation coefficients among stable patients (coefficient range for both studies: 0.88 to 0.93). In support of construct validity, moderate correlations were evidenced between Duration of MJS and other related patient- and clinician-reported assessments of RA symptoms and patient functioning, whereas moderate-to-strong correlations were evidenced between these same patient- and clinician-reported assessments and Severity of MJS. Statistically significant differences between the median and mean values of Duration of MJS and Severity of MJS for differing categories of RA disease severity supported known-groups validity. Finally, large and statistically significant differences in change scores from Day 1 to Week 12 for patients defined as responders versus non-responders using the American College of Rheumatology 20 criteria supported the responsiveness of both PROs. Conclusion Duration of MJS and Severity of MJS PROs demonstrated reliability, validity, and responsiveness in adults with moderately to severely active RA, supporting the measurement of these key symptoms in clinical trials. Electronic supplementary material The online version of this article (doi: 10.1186/s12955-017-0813-7) contains supplementary material, which is available to authorized users.


Background
Rheumatoid arthritis (RA) is a systemic, inflammatory, autoimmune disease. The expression and outcomes of the disease vary, ranging from mild, limited disease to severe disease that is associated with progressive joint destruction, significantly compromised health-related quality of life (HRQOL), and reduced survival. Some clinical symptoms of RA, such as morning joint stiffness (MJS), may follow a circadian pattern and the rhythm of pro-inflammatory cytokines, such as interleukin-6 (IL-6). The increase in nocturnal anti-inflammatory cortisol seen in patients with RA is generally insufficient to suppress the ongoing joint inflammation, often resulting in joint stiffness in the morning [1][2][3].
Prior research illustrates the importance of early morning stiffness or difficulty moving joints for patients with RA. In a study of 916 patients with recent onset of RA (disease duration ≤24 months), Westhoff et al. [4] reported that for many patients, morning stiffness impacts their HRQOL, functional capability, and ability to continue to work. These findings are supported by recent survey research conducted by Mattila et al. [5] in patients with RA (duration >6 months) who experience impairment of morning function at least 3 times per week. Of the 534 working respondents, 47% indicated morning stiffness affected their work performance, and 33% indicated it led to a late arrival at work. Of the 224 retired participants, 159 (71%) stopped working earlier than their expected retirement age, with 64% of these participants giving RA-related morning stiffness as a reason for their retirement [5]. Indeed, MJS is an important and clinically meaningful element of disease activity [6].
Patient-reported outcome (PRO) measures of symptoms like MJS are important tools to aid clinicians in treating patients with RA, facilitate doctor-patient communication to improve the quality of patient care, and contribute to better patient outcomes [7]. Despite the importance of MJS to patients with RA [4,6], the assessment of MJS is not currently a measured component in several recommended endpoints in clinical trials, such as the American College of Rheumatology (ACR), Disease Activity Score modified to include the 28 diarthrodial joint count (DAS28), Clinical Disease Activity Index (CDAI), or Simplified Disease Activity Index (SDAI).
To address this need, two daily electronic PRO diary items were created to assess both the duration and severity of MJS from the patient's perspective. The content validity of these two items was supported through a targeted literature review, interviews with health-care providers, and qualitative concept elicitation and cognitive debriefing interviews with patients with RA [8]. These interviews confirmed the relevance of duration and severity of MJS as symptoms of RA, as well as the appropriateness of terminology used to assess these symptoms.
This study reports the assessment of the psychometric properties (i.e., reliability, validity, and responsiveness) of two PRO items administered daily, the Duration of MJS and Severity of MJS, in patients with moderately to severely active RA who participated in two Phase 3 clinical trials of baricitinib, RA-BEAM and RA-BUILD.

Methods
Patient population
RA-BEAM
RA-BEAM (N = 1305) was a randomized, double-blind, double-dummy, placebo- and active-controlled, parallel-arm, 52-week study designed to assess improvements in disease activity, structural preservation, and PROs including physical function, safety, and tolerability with oral baricitinib 4-mg once daily in patients with RA who had inadequate responses to methotrexate. Full details regarding the primary efficacy and safety outcomes of this study have been reported previously [9]. Briefly, patients were aged ≥18 years with active RA (≥6/68 tender and ≥6/66 swollen joints; serum high-sensitivity C-reactive protein [hsCRP] ≥6 mg/L). Comparisons were made to placebo and to adalimumab, a tumor necrosis factor (TNF)-α inhibitor and a standard-of-care biologic disease-modifying antirheumatic drug (DMARD) in this setting.

RA-BUILD
RA-BUILD (N = 684) was a randomized, double-blind, placebo-controlled, parallel-group, 24-week study designed to assess improvements in disease activity, structural preservation, and PROs including physical function, safety, and tolerability with oral baricitinib 2-mg and 4-mg once daily in patients with RA who were refractory to or intolerant of conventional synthetic DMARDs (csDMARDs). Full details regarding the primary efficacy and safety outcomes of this study have been reported previously [10]. Briefly, patients were aged ≥18 years with active RA (≥6/68 tender and ≥6/66 swollen joints; hsCRP ≥3.6 mg/L [upper limit of normal 3.0 mg/L]) and an insufficient response (despite prior therapy) or intolerance to ≥1 csDMARDs. Comparisons were made to placebo.

Patient-reported outcomes (PROs)
Duration of morning joint stiffness, severity of morning joint stiffness, severity of worst tiredness, and severity of worst joint pain
Duration of MJS is a single-item PRO designed to capture information on self-reported length of time, in minutes, that a patient's MJS lasted each day. Specifically, patients were asked: "Please indicate how long your morning joint stiffness lasted today," and responded with the number of hours and minutes. Durations recorded as >12 h (720 min) were censored at 720 min.
Severity of MJS, Severity of Worst Tiredness, and Severity of Worst Joint Pain are all single-item PROs designed to capture the severity of MJS, worst tiredness, and worst joint pain experienced that day, respectively. Patients were asked to complete each at the end of their day. Specifically, for Severity of MJS, patients were asked: "Please rate the overall level of morning joint stiffness you had from the time you woke up today." All three of these PROs are anchored at 0 and 10, where 0 represents "no joint stiffness," "no tiredness," or "no joint pain," and 10 represents "joint stiffness as bad as you can imagine," "tiredness as bad as you can imagine," or "joint pain as bad as you can imagine," respectively.
For RA-BEAM and RA-BUILD, all four PROs were assessed using an electronic diary on a daily basis through Week 12. The Day 1 assessment was the first assessment at the end of the patient's day following the randomization visit (Week 0, Visit 2). The Week 1 assessment refers to the weekly average values from Days 2 to 8. Assessments at Weeks 2, 4, 8, and 12 refer to weekly average values of the 7 days prior to Weeks 2, 4, 8, and 12 visits, respectively. Recognizing that late-shift workers, individuals who work outside of the hours of 9 am until 5 pm, could not complete the electronic diary (at home) at the end of Day 1, the Day 2 assessment (if available) was used to impute missing Day 1 values so that more patients could be included in the psychometric analyses utilizing the Day 1 data.
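The diary scoring rules described above (censoring of durations at 720 min, weekly averaging, and use of the Day 2 entry when Day 1 is missing) can be illustrated with a minimal Python sketch. The function names are hypothetical, and the handling of missing days within a week (averaging whatever entries are available) is an assumption, since the text does not state how incomplete weeks were treated.

```python
from statistics import mean

MAX_DURATION_MIN = 720  # durations recorded as >12 h are censored at 720 min


def duration_minutes(hours, minutes):
    """Convert a diary entry given in hours and minutes to minutes, censored at 720."""
    return min(hours * 60 + minutes, MAX_DURATION_MIN)


def weekly_average(daily_scores):
    """Average the available daily scores in a 7-day window (None = missing).

    Assumption: missing days are simply skipped; the source does not
    describe a minimum number of completed days.
    """
    observed = [s for s in daily_scores if s is not None]
    return mean(observed) if observed else None


def day1_score(day1, day2):
    """Use the Day 1 entry; fall back to the Day 2 entry when Day 1 is missing."""
    return day1 if day1 is not None else day2
```

For example, `duration_minutes(13, 0)` is censored to 720, and `day1_score(None, 5)` falls back to the Day 2 value.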
Medical Outcomes Study 36-item Short Form Health Survey version 2 Acute (SF-36)
The SF-36 is a generic, 36-item PRO that measures general health status. The SF-36 includes eight domains of health status evaluated over the previous week: physical function, role limitations-physical, bodily pain, general health perceptions, vitality, social function, role limitations-emotional, and mental health. Two component scores, the Physical Component Score (PCS) and the Mental Component Score (MCS), are derived based on the 8 domain scores [11]. Domain and component scores are derived using established formulas [11], with higher scores indicating better health status or functioning. Acceptable psychometric properties of this instrument have been demonstrated elsewhere [12,13].
Health Assessment Questionnaire-Disability Index (HAQ-DI)
The HAQ-DI assesses patients' physical function or disability over the past week. The HAQ-DI contains 24 questions that query the degree of difficulty a person has in accomplishing tasks in 8 functional areas (dressing, arising, eating, walking, hygiene, reaching, gripping, and activities). Responses in each functional area are scored from 0, indicating "no difficulty," to 3, indicating "inability to perform a task" in that area. The HAQ-DI total score, ranging from 0 to 3 (higher values indicate worse functioning), is obtained by summing the highest score within each functional area and dividing by the number of functional areas answered [14]. The reliability and validity of this instrument have been documented previously [15].
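The HAQ-DI total score rule stated above (highest item score per functional area, summed and divided by the number of areas answered) can be sketched as follows. This is an illustration of that one sentence only: the function name and input layout are hypothetical, and any further scoring rules of the published instrument are not modeled.

```python
def haq_di(area_items):
    """Compute a HAQ-DI-style total score.

    area_items maps each functional area name to a list of item scores
    (0 to 3, or None for an unanswered item). An area counts as answered
    when at least one of its items has a score.
    """
    area_scores = []
    for items in area_items.values():
        answered = [s for s in items if s is not None]
        if answered:
            # The area score is the highest (worst) item score in the area.
            area_scores.append(max(answered))
    if not area_scores:
        return None
    return sum(area_scores) / len(area_scores)
```

With three answered areas scoring 1, 2, and 3 and one unanswered area, the result is 6 / 3 = 2.0.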
Quick Inventory of Depressive Symptomatology Self-Rated-16 (QIDS-SR16)
The QIDS-SR16 is a 16-item PRO intended to assess the existence and severity of symptoms of depression as listed in the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders, 4th Edition (DSM-IV) [16]. Patients were asked to consider each statement as it relates to the way they have felt for the past 7 days. There is a unique 4-point ordinal scale for each item, with scores ranging from 0 to 3 reflecting increasing depressive symptoms as the item score increases. The instrument measures 9 core symptom domains that are used to define a depressive episode: sad mood; concentration; self-criticism; suicidal ideation; interest; energy/fatigue; sleep disturbance; decrease or increase in appetite or weight; and psychomotor agitation or retardation. The QIDS-SR16 total score is derived as the sum of the scores across the 9 scale domains. The psychometric properties, including reliability, validity, and sensitivity to change, of this instrument have been demonstrated elsewhere [17].
Patient's assessment of pain
Patient's current pain was assessed at each study visit with the use of a 0- to 100-mm visual analogue scale (VAS), with higher scores indicating more severe pain.
Patient's Global Assessment of Disease Activity (PtGA)
The PtGA, a rating of the patient's current disease activity, was assessed at each study visit and recorded on a 0- to 100-mm VAS, with higher scores indicating more active RA.

Clinician-reported assessments
Physician's Global Assessment of Disease Activity (PhGA)
The PhGA was assessed at each study visit and recorded on a 0- to 100-mm VAS, with higher scores indicating more active RA.

Clinical signs and symptoms measures
American College of Rheumatology 20 (ACR20)
An ACR20 response (i.e., a binary variable indicating achieving or not achieving a response) was measured at each study visit and is defined as at least a 20% improvement from baseline in both tender joint count (TJC) (0 to 68) and swollen joint count (SJC) (0 to 66), and in at least 3 of the following 5 assessments: patient's assessment of pain, PtGA, PhGA, HAQ-DI, and hsCRP.
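The ACR20 definition above is a simple decision rule, which the following Python sketch makes explicit. The dictionary keys and function names are hypothetical, and the treatment of a zero baseline (counted as no improvement) is an assumption not taken from the source.

```python
def pct_improvement(baseline, current):
    """Percent improvement from baseline for a measure where lower is better."""
    if baseline == 0:
        # Assumption: no improvement can be shown from a zero baseline.
        return 0.0
    return 100.0 * (baseline - current) / baseline


def acr20_response(baseline, current, threshold=20.0):
    """Return True when the ACR20 rule described in the text is met.

    Both joint counts must improve by at least `threshold` percent, and so
    must at least 3 of the 5 remaining assessments.
    """
    joints_ok = all(
        pct_improvement(baseline[k], current[k]) >= threshold
        for k in ("tjc68", "sjc66")
    )
    core = ("pain", "ptga", "phga", "haq_di", "hscrp")
    n_improved = sum(
        pct_improvement(baseline[k], current[k]) >= threshold for k in core
    )
    return joints_ok and n_improved >= 3
```

A patient whose tender and swollen joint counts improve by 50% and 20%, with pain, PtGA, and HAQ-DI each improving by more than 20%, would be classified as a responder even if PhGA and hsCRP improve by only 10%.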
Clinical Disease Activity Index (CDAI)
The CDAI is a tool for measurement of disease activity in RA that integrates measures of physical examination, patient self-assessment, and evaluator assessment [18]. The CDAI was assessed at each study visit and is calculated by adding together scores from the following assessments: number of swollen joints (0 to 28), number of tender joints (0 to 28), PtGA on a VAS (0 to 10 cm), and PhGA on a VAS (0 to 10 cm). Total scores are calculated using established formulas [18]. Thresholds have been established for the CDAI (remission: 0.0 to ≤2.8; low disease activity: >2.8 to ≤10; moderate disease activity: >10 to ≤22; high disease activity: >22 to ≤76) [19].
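As a worked illustration of the CDAI sum and thresholds just described, a minimal Python sketch follows. The function names are hypothetical; the global assessments are taken in centimetres, so a value recorded on a 0- to 100-mm VAS would first be divided by 10.

```python
def cdai(sjc28, tjc28, ptga_cm, phga_cm):
    """CDAI as the sum of the four components listed in the text (range 0 to 76)."""
    return sjc28 + tjc28 + ptga_cm + phga_cm


def cdai_category(score):
    """Map a CDAI score to the disease activity categories given in the text."""
    if score <= 2.8:
        return "remission"
    if score <= 10:
        return "low"
    if score <= 22:
        return "moderate"
    return "high"
```

For example, 4 swollen joints, 6 tender joints, a PtGA of 5.0 cm, and a PhGA of 4.5 cm give a CDAI of 19.5, which falls in the moderate category.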
Disease Activity Score (28 joints) (DAS28)
The DAS28 is a composite score that is based on a 28-joint count (both TJC 0 to 28 and SJC 0 to 28), hsCRP or erythrocyte sedimentation rate (ESR), and PtGA and was measured at each study visit. Total scores are calculated using established formulas [20]. Patients can be categorized into 4 groups (remission: <2.6; low disease activity: ≥2.6 to ≤3.2; moderate disease activity: >3.2 to ≤5.1; high disease activity: >5.1).
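The four DAS28 categories can be written as a small lookup, sketched below in Python. Only the thresholds stated above are encoded; the DAS28 score itself is assumed to have been computed already with the established formulas [20], and the function name is hypothetical.

```python
def das28_category(score):
    """Map a DAS28 score to the four categories given in the text."""
    if score < 2.6:
        return "remission"
    if score <= 3.2:
        return "low"
    if score <= 5.1:
        return "moderate"
    return "high"
```

Note that the boundary values 2.6 and 3.2 both fall in the low disease activity group, and 5.1 falls in the moderate group.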

Statistical analyses
The distribution of scores for the Duration of MJS and Severity of MJS PROs was assessed using descriptive statistics at Day 1, including mean (SD), median, range, and ceiling/floor effects.

Reliability (test-retest)
Test-retest reliability was assessed in stable patients during the interval between Weeks 1 and 2 and again between Weeks 4 and 8. Stable patients were defined as those with a ≤5-point difference [21] on the 0 to 100 PtGA between each assessment period. Intraclass correlation coefficients (ICCs) were calculated between the initial (Week 1 or 4) and retest (Week 2 or 8) scores to evaluate test-retest reliability. An ICC of ≥0.70 was considered good agreement [22].
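The test-retest procedure above can be sketched in Python as a stability filter followed by an ICC. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form is an assumption made for illustration; the function names are hypothetical.

```python
def stable_pairs(test, retest, ptga_test, ptga_retest, max_diff=5):
    """Keep (test, retest) score pairs for patients whose PtGA changed by <= max_diff."""
    return [
        (t, r)
        for t, r, g1, g2 in zip(test, retest, ptga_test, ptga_retest)
        if abs(g1 - g2) <= max_diff
    ]


def icc_2_1(pairs):
    """Two-way random-effects, absolute-agreement, single-measure ICC.

    Assumption: this ICC form is chosen for illustration only.
    """
    n, k = len(pairs), 2
    grand = sum(sum(p) for p in pairs) / (n * k)
    row_means = [sum(p) / k for p in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    # Sums of squares for patients (rows), occasions (columns), and total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Because this form measures absolute agreement, a constant shift between test and retest lowers the coefficient: identical pairs give 1, whereas the pairs (1, 2), (2, 3), (3, 4) give 2/3.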

Convergent and discriminant validity (construct validity)
Construct validity was assessed by Pearson correlations at Day 1 and Week 12 between the scores of Duration of MJS and Severity of MJS, and the scores of other clinical/PRO endpoints: Severity of Worst Tiredness, Severity of Worst Joint Pain, SF-36 domain and component scores, HAQ-DI, QIDS-SR16, patient's assessment of pain, PtGA, TJC28, SJC28, PhGA, and hsCRP. Cohen's conventions were used to interpret the absolute value of the correlation results, where a correlation >0.5 is large, 0.3 to 0.5 is moderate, 0.1 to <0.3 is small, and <0.1 is insubstantial [23].
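The correlation step and the interpretation bands above can be illustrated with a short Python sketch. The function names are hypothetical, and the code is not the study's SAS implementation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Interpret |r| using the bands given in the text."""
    size = abs(r)
    if size > 0.5:
        return "large"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "small"
    return "insubstantial"
```

Because the bands apply to the absolute value, a correlation of -0.45 (for instance, with a scale on which higher scores mean better health) is labeled moderate.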
It was hypothesized that moderate or large correlations supporting convergent validity would be demonstrated between Duration of MJS and Severity of MJS and the PRO instruments measuring concepts related to RA symptoms (SF-36 PCS, SF-36 Bodily Pain), their impact on functioning (SF-36 Social Functioning, SF-36 Vitality, SF-36 Physical Functioning, HAQ-DI), and clinician-reported/laboratory assessments of disease activity (TJC28, SJC28, PhGA, and hsCRP). Discriminant validity was assessed by Pearson correlations at Day 1 and at Week 12 between Duration of MJS, Severity of MJS, and PROs measuring distally related concepts (SF-36 MCS, SF-36 Role Emotional, QIDS-SR16), where small correlations were hypothesized.

Known-groups validity
Known-groups validity was evaluated using the Kruskal-Wallis test to distinguish median Duration of MJS and an analysis of variance (ANOVA) model to distinguish mean Severity of MJS between subgroups defined by the DAS28-ESR thresholds (<2.6; ≥2.6 and ≤3.2; >3.2 and ≤5.1; >5.1) measured at Day 1 and Week 4, and CDAI (0.0 to ≤2.8; >2.8 to ≤10; >10 to ≤22; and >22 to ≤76) measured at Day 1 and Week 4. The Scheffé adjustment was used for multiple comparisons. Subgroups were combined in instances of small sample sizes (i.e., <5% of the total sample size for the subgroup).
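The two test statistics named above can be sketched in plain Python: the Kruskal-Wallis H statistic for the Duration of MJS and the one-way ANOVA F statistic for the Severity of MJS. This is a simplified illustration with hypothetical function names: the H statistic is shown without a tie correction, p-values (which require the chi-square and F distributions) are not computed, and the Scheffé adjustment is not shown.

```python
def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = average_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups."""
    pooled = [v for g in groups for v in g]
    grand = sum(pooled) / len(pooled)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

In practice each group would hold the scores of patients in one DAS28-ESR or CDAI category, and larger statistics indicate clearer separation between the disease activity groups.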

Responsiveness
Due to anticipated skewness of the Duration of MJS PRO, responsiveness was evaluated using a nonparametric randomization-based analysis of covariance (ANCOVA) methodology [24] to assess significant differences in median change in Duration of MJS from Day 1 to Week 12 between ACR20 responders and nonresponders at Week 12, controlling for Duration of MJS at Day 1. For Severity of MJS, responsiveness was evaluated using an ANCOVA methodology to assess significant differences in mean change. Responsiveness was also assessed using disease activity as measured by DAS28-hsCRP at Week 12, using the following subgroups: DAS28-hsCRP <2.6, DAS28-hsCRP ≥2.6 and DAS28-hsCRP ≤3.2, and DAS28-hsCRP >3.2. An overall statistically significant difference (p < 0.05) with statistically significant subgroup comparisons was hypothesized.
SAS® statistical software Version 9.4 (SAS Institute Inc., Cary, NC, USA) was used to conduct all analyses.
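To illustrate the idea behind the ANCOVA comparison of mean change in Severity of MJS between responders and nonresponders, the following Python sketch fits change = b0 + b1 × responder + b2 × baseline by ordinary least squares and returns b1, the baseline-adjusted group difference. It is a simplified illustration with hypothetical function names, not the study's SAS implementation; it does not compute p-values and does not implement the nonparametric randomization-based method used for Duration of MJS.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ancova_group_effect(change, responder, baseline):
    """Baseline-adjusted difference in mean change (responders minus nonresponders).

    responder holds 1 for responders and 0 for nonresponders.
    """
    rows = [[1.0, float(g), float(b)] for g, b in zip(responder, baseline)]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, change)) for i in range(3)]
    return solve(xtx, xty)[1]
```

A negative adjusted difference means that, at the same Day 1 score, responders improved (decreased) more than nonresponders.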

Results
Baseline demographics for the total modified intent-to-treat population, patients with Day 1 diary scores, and patients with Week 12 diary scores are provided in Table 1. Baseline and Week 12 scores for Duration of MJS PRO, Severity of MJS PRO, and other assessments are presented in Table 2, while score distributions for Duration of MJS PRO and Severity of MJS PRO are found in Additional file 1: Table S1. As can be seen in Tables 1 and 2, there was a large amount of missing data at the Day 1 assessment period. These missing data were due to multiple reasons as shown in Additional file 1: Table S2, such as the diary device alarm not sounding until the following day or the diary device  These values provide evidence for substantial agreement between assessment periods among stable patients.

Convergent and discriminant validity
Results supporting convergent validity of Duration of MJS and Severity of MJS are found in Table 3. For Severity of MJS, moderate-to-large associations were found between Severity of MJS and other assessments of RA symptoms, assessments of function, and clinician-reported assessments at Day 1 and Week 12. At Day 1 and Week 12, respectively, Severity of MJS was most strongly associated with Severity of Worst Joint

Known-groups validity
Due to small sample sizes in the lower DAS28-ESR subgroups at Day 1 (i.e., <5% of the sample in each score category) in RA-BEAM and RA-BUILD, patients were categorized into only 2 subgroups: ≤5.1 and >5.1 (Table 4). However, at Week 4, the Duration of MJS was significantly longer and the Severity of MJS was significantly greater in patients with higher CDAI scores than in patients with low CDAI scores in both studies (all p = 0.001) (Table 5). All values increased linearly with higher CDAI scores, where all post-hoc comparisons using Scheffé adjustment were statistically significant (p = 0.001), except for the comparison of median Duration of MJS values between CDAI categories of 0.0 to ≤10.0 versus >10.0 to ≤22.0 in RA-BUILD. These findings indicate that the Duration of MJS and Severity of MJS PROs are able to distinguish between known groups based on disease severity.

Responsiveness
As hypothesized, the responsiveness of the Duration of MJS and Severity of MJS PROs was supported by large and statistically significant differences in median and mean change in Duration of MJS and Severity of MJS, respectively, from Day 1 to Week 12 between patients defined as responders or nonresponders on the basis of the ACR20 at Week 12 (Table 6). For example, for Duration of MJS in RA-BEAM, median change from Day 1 to Week 12 was −35.1 for responders in comparison to −7.0 for nonresponders (p = 0.001). Similarly, for Severity of MJS in RA-BEAM, mean change was −3.2 compared to −1.1 for responders and nonresponders, respectively (p = 0.001).
Comparable findings were seen when using DAS28-hsCRP as an anchor, as pairwise comparisons assessing significant differences in median and mean change between DAS28-hsCRP groups of <2.6 versus ≥3.2 (p = 0.001 for all comparisons), and ≥2.6 and <3.2 versus ≥3.2 were statistically significant (p < 0.01 for all comparisons) (Table 6). However, the change scores comparisons between <2.6 versus ≥2.6 and <3.

Discussion
The psychometric properties, including reliability, validity, and responsiveness, of Duration of MJS Week 12 when defining responders using the ACR20. Sensitivity to change was also supported when using the DAS28-hsCRP as an anchor, except for comparisons between the lowest disease activity groups. However, although these symptoms have been demonstrated to be important to patients with RA, an assessment of these symptoms was excluded from the core set of disease activity measures. Specifically, rheumatologists have frequently used morning stiffness as an indicator that patient medication changes are needed [25]. However, in 1993, the core set of disease activity measures for use in RA clinical trials was revised. Studies with therapies available at the time (e.g., csDMARDs, auranofin, nonsteroidal anti-inflammatory drugs [NSAIDs]) suggested that MJS was not sensitive to change with treatment compared to control, and this lack of ability to discriminate treatment from control contributed to this symptom's exclusion from the core set of measures [26]. Subsequent to its removal from the ACR core set in 1993, morning stiffness has been found to be an important clinical status indicator driving changes to DMARD therapy [27]. Additional work by Boers et al. [28] further demonstrated that both duration and severity of morning stiffness are related to other measures of disease activity. The findings from the present study provide support that both the Duration of MJS and Severity of MJS PROs are not only associated with other measures of disease activity in RA but also assess distinct symptom experiences of patients. Furthermore, these measures are sensitive enough to detect change over time in adults with moderately to severely active RA. However, future research is needed to determine the relationship of these instruments to patient outcomes over the long term, and the usability of such instruments in a clinical setting.
Given the advent of electronic PRO diaries, these instruments could be used in a clinical setting and collected daily, thus enhancing the dialogue between patients and care providers. Further, using applications on mobile phones, these instruments can be easily completed by patients to track and collect their daily symptoms.
Despite the strong findings related to the reliability, validity, and responsiveness of the Duration of MJS and Severity of MJS PROs, a key limitation to this investigation is the missing data at the Day 1 assessment. As noted above, the missing data were due to a variety of reasons (e.g., device alarm not sounding until the following day). However, sensitivity analyses were conducted after imputing missing Day 1 Duration of MJS and Severity of MJS PRO scores, and results for the evaluations of reliability, validity, and responsiveness of these two PROs remained the same. In addition, the evaluation of responsiveness was limited as the studies' baseline assessment period consisted of only data at Day 1 rather than a full 7 days of data as was available at Week 12. Finally, all analyses were pooled across countries due to small sample sizes (<5 participants) within certain participating countries. Thus, future research is needed to assess the psychometric properties of both instruments within each country.

Conclusion
The findings from the current study demonstrate that the Duration of MJS and Severity of MJS items are unique PROs that are fit for the purpose of measuring two key aspects of MJS in adult patients with moderately to severely active RA in clinical trials.