For the past 15 years, PRO measures of physical function, HRQoL and fatigue have played an increasingly prominent role in evaluating the safety and efficacy of RA treatment. It is well recognized that these measures provide important complementary information for understanding the efficacy of treatment beyond the traditional clinical endpoints of ACR, DAS and SDAI responses. While most evaluations of the efficacy of recently approved RA treatments present clinical outcomes and additionally cite improvements in physical function, HRQoL and fatigue [34, 35], few have tried to link the two types of outcomes to further our understanding of the patient benefits associated with a response by standard clinical endpoints for RA. The objective of this study was to link changes in several PROs to improvements in ACR and SDAI responses. In addition, analyses were designed to determine whether further incremental benefits in PROs, including physical function, HRQoL and fatigue, accrue with larger improvements in these clinical endpoints beyond what would be expected as MIDs.
Analyses conducted here went beyond standard correlation analyses to investigate the magnitude of mean changes in physical function, HRQoL and fatigue scores associated with various ranges of change in standard efficacy endpoints in RA RCTs. As expected, mean improvements in HAQ-DI, SF-36 and FACIT scores differed significantly across groups of patients categorized according to their magnitude of change in each clinical endpoint investigated. Very good agreement was observed between what was defined as “clinically meaningful” for each clinical endpoint and what is defined as the MID for scales of the SF-36, HAQ-DI, and FACIT instruments. For example, with few exceptions, mean changes from baseline in scores across all SF-36 domains and summary scores and FACIT-Fatigue scales among patients in the minimal ACR-N response category (20-49%) met or exceeded the MID for these PRO instruments [36, 37]. Likewise, mean changes from baseline in HAQ-DI, SF-36 and FACIT met the MID among patients with “minor” improvements in SDAI, and among patients in the category representing the smallest meaningful change (-11 to -20) in VAS pain and PtGA. These results mutually validate the cut points established as clinically meaningful for clinical endpoints and as MIDs for PROs, and highlight that even the smallest benefit observed with treatment in each clinical endpoint is associated with clinically meaningful improvements in physical function, HRQoL and fatigue.
Another key finding from these analyses was that there was considerable incremental improvement across all physical function, HRQoL and fatigue scores associated with greater levels of improvement in each clinical endpoint beyond what would be considered of minimal clinical significance. For example, analyses involving ACR-N showed that, as patients met higher thresholds of improvement on ACR-N, there were incremental improvements in HRQoL and fatigue. At the first level of meaningful response by ACR-N (ACR-N category of 20 to 49%), mean changes from baseline in all SF-36 domains and FACIT met established definitions for MID. With few exceptions, the magnitude of mean changes in SF-36 and FACIT doubled, and in some instances tripled, at the next level of ACR-N response (ACR-N category of 50 to 69%) and were of moderate to large effect sizes. Changes of this magnitude could potentially be considered really important differences (RID). Lastly, at the highest ACR-N response category (ACR-N ≥ 70%), mean changes from baseline in each SF-36 and FACIT score increased further compared to lower thresholds of ACR-N response and were all in the range of large effect sizes.
A similar pattern of results was observed with analyses based on SDAI, where HAQ-DI could also be included because it is not a component of the SDAI. Mean changes in physical function, HRQoL, and fatigue scores increased incrementally from no meaningful change to “minimal” and from “minimal” to “major” improvements by SDAI. In the “minimal” improvement SDAI category, mean changes in HAQ-DI, SF-36 and FACIT scores exceeded the MID, with a few exceptions, and the magnitude of changes was in the range of small to moderate effect sizes. Going from the “minimal” to the “moderate” improvement category of SDAI, changes from baseline in HAQ-DI, SF-36 and FACIT more than doubled in magnitude, and in the “major” improvement category of SDAI they were in the range of large effect sizes, also considered as ≥ RID.
As expected, significant improvements in physical function, HRQoL, and fatigue were observed with reductions in VAS pain scores. With the exception of the SF-36 MH domain, mean improvements from baseline exceeded the MID even in subjects reporting the smallest category of pain reduction (VAS change of -11 to -20 points). Mean changes from baseline in HAQ-DI, SF-36 and FACIT were generally in the small to moderate effect size range at this category of pain reduction. The magnitude of improvements increased incrementally at the next highest category of pain reduction (VAS change of -21 to -40 points), where they were generally in the moderate effect size range. Finally, the largest improvements in physical function, HRQoL and fatigue scores were observed at the highest category of pain reduction (VAS change of < -40 points) and were, with few exceptions, in the large effect size range.
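The known-groups comparison described above can be illustrated with a minimal sketch. It assigns each patient to a category of VAS pain change, averages the change in a PRO score within each category, and checks the mean against an MID. The function names, the example records and the MID value of 5 points are hypothetical and are not taken from the study. The sketch also assumes that VAS changes are recorded in whole points and that a positive PRO change denotes improvement.

```python
from statistics import mean

# Hypothetical MID for an illustrative PRO scale; not a value from the study.
ASSUMED_MID = 5.0


def pain_category(vas_change):
    """Map a VAS pain change (negative = less pain) to the categories in the text."""
    if vas_change < -40:
        return "< -40"
    if vas_change <= -21:
        return "-21 to -40"
    if vas_change <= -11:
        return "-11 to -20"
    return "no meaningful change"


def summarize(records, mid=ASSUMED_MID):
    """Group (vas_change, pro_change) pairs by pain category.

    Returns, per category, the group size, the mean PRO change and
    whether that mean meets or exceeds the assumed MID.
    """
    by_category = {}
    for vas_change, pro_change in records:
        by_category.setdefault(pain_category(vas_change), []).append(pro_change)
    return {
        category: {
            "n": len(changes),
            "mean_change": mean(changes),
            "meets_mid": mean(changes) >= mid,
        }
        for category, changes in by_category.items()
    }


# Hypothetical (VAS pain change, PRO score change) pairs for eight patients.
EXAMPLE_RECORDS = [
    (-5, 1.0), (-8, 3.0),
    (-12, 6.0), (-20, 8.0),
    (-25, 11.0), (-40, 13.0),
    (-55, 18.0), (-60, 22.0),
]

for category, stats in summarize(EXAMPLE_RECORDS).items():
    print(category, stats)
```

For a scale on which lower scores indicate improvement, such as HAQ-DI, the sign of the change or the direction of the MID comparison would need to be reversed.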
Mean changes in physical function, HRQoL and fatigue scores differed significantly across groups of patients that differed in their level of change defined by PtGA and MDGA. In general, patients with the greatest improvements in PtGA and MDGA also reported the greatest improvements in HRQoL, physical function, and fatigue. However, differences in mean changes from baseline were not always ordered consistently across categories of improvement defined by PtGA. For example, in many instances mean improvements from baseline were either the same or reversed in order of magnitude between the two intermediate categories of improvement (-11 to -20 and -21 to -40), indicating that there were few discernible differences in physical function, HRQoL and fatigue benefits between the two. Comparing the results observed for PtGA and MDGA, it appeared that larger improvements in MDGA were required before meaningful changes in physical function, HRQoL and fatigue were reported. With few exceptions, mean changes in physical function, HRQoL and fatigue scores were smaller than the MID threshold on each scale at the first category of improvement (-11 to -20 points) on the physician global assessment, while most changes in physical function, HRQoL and fatigue scores met the MID at the first category of improvement on the patient global assessment. While it has been established that a 10-point improvement in MDGA is clinically meaningful, the data from this study suggest that there were no discernible benefits in physical function, HRQoL and fatigue until a change of at least -21 points.
The results of this study may be of value to investigators wishing to interpret the importance of change in PROs in treatment studies of RA. Specifically, the results indicate the magnitude of change in PROs that one might expect to observe as treatment produces a greater response on clinical outcomes. With newer treatments developed for RA, it has become more commonplace in RA treatment studies to go beyond the minimal threshold of improvement (ACR20 response criteria) and to include evaluations of treatment efficacy in terms of ACR50 and ACR70 response thresholds. The results of this study provide potentially useful thresholds of improvement in PROs that go beyond the threshold of minimal importance established for these tools.
A limitation of this study concerns the statistical tests used to assess the significance of differences in mean PRO score changes across groups of patients differing in the magnitude of clinical outcomes. Specifically, in several instances sample sizes were relatively small, resulting in large differences in score variances across groups, which violates an assumption underlying ANOVA. To address this limitation, the analyses of known-groups differences were conducted in two alternative ways to assess the robustness of the results determined with ANOVA. First, a non-parametric test, the Kruskal-Wallis test, was conducted. This test makes no assumption of equal variances across comparison groups. The results of these analyses were all statistically significant, confirming the results of the ANOVA tests. Second, groups with small sample sizes were collapsed with an adjacent category and the ANOVA tests were repeated. The results of these analyses were also all statistically significant. Since the main objective of this study was to determine the magnitude of mean PRO score changes associated with incremental improvement in clinical outcome measures, the original analyses were presented despite the violation of the assumption underlying ANOVA.
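As an illustration of the first robustness check, the following is a minimal sketch of the Kruskal-Wallis H statistic with the standard correction for ties, written in plain Python. It is not the code used in the study, and the function names and example data are hypothetical. For reasonably sized groups, H is commonly compared with a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups. Statistical packages such as SciPy (`scipy.stats.kruskal`) provide the p-value directly.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # average of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H and its degrees of freedom."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Accumulate (rank sum)^2 / group size, walking the pooled ranks in group order.
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction, len(groups) - 1


# Hypothetical PRO score changes in three response categories.
EXAMPLE_GROUPS = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_statistic, degrees_of_freedom = kruskal_wallis_h(EXAMPLE_GROUPS)
print(h_statistic, degrees_of_freedom)
```

Because the statistic depends only on ranks, unequal score variances across small groups affect it less directly than they affect the ANOVA F statistic, which is the motivation for using it as a check here.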