Table 4 Area under the receiver operating characteristic curve (AUC) for depression measures detecting moderate improvement

From: Responsiveness of PROMIS and Patient Health Questionnaire (PHQ) Depression Scales in three clinical trials

Accuracy for detecting moderate improvement, by retrospective and prospective Global Rating of Change (GRC). Trial columns give AUC (95% CI).

| Depression measure* | Average accuracy across trials: Retrospective GRC | Average accuracy across trials: Prospective GRC | Retrospective GRC: CAMEO | Retrospective GRC: SPACE | Retrospective GRC: SSM | Prospective GRC: CAMEO | Prospective GRC: SPACE | Prospective GRC: SSM |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PROMIS 4-item | .603 | .751 | .625 (.522–.728) | .640 (.553–.727) | .545 (.473–.617) | .773 (.650–.895) | .819 (.700–.934) | .663 (.488–.838) |
| PROMIS 6-item | .619 | .745 | .653 (.553–.753) | .645 (.556–.734) | .560 (.487–.634) | .767 (.642–.892) | .811 (.697–.926) | .657 (.491–.823) |
| PROMIS 8-item | .610 | .751 | .632 (.530–.734) | .642 (.555–.728) | .557 (.483–.631) | .751 (.619–.883) | .816 (.702–.929) | .687 (.539–.844) |
| PROMIS Short-form | .625 | .757 | .680 (.583–.777) | .638 (.551–.725) | .558 (.484–.631) | .760 (.638–.881) | .836 (.729–.942) | .676 (.514–.838) |
| PHQ-9 | .636 | .682 | .625 (.526–.724) | .669 (.592–.747) | .614 (.542–.686) | .705 (.568–.841) | .660 (.532–.787) | .681 (.515–.846) |
| PHQ-2 | .588 | .631 | .587 (.482–.692) | .616 (.537–.695) | .562 (.492–.632) | .609 (.455–.764) | .679 (.553–.806) | .605 (.424–.785) |
| SF-36 Mental Health | | | .580 (.473–.687) | | | .810 (.675–.944) | | |

  1. *AUC is the probability of correctly discriminating between patients who have improved and those who have not. Any improvement is a rating of at least “a little better”; moderate improvement is a rating of at least “moderately better”
  2. Follow-up was at 6 months for CAMEO and at 3 months for SPACE and SSM. The proportion of patients reporting moderate improvement by retrospective GRC was 25%, 24%, and 42% in CAMEO, SPACE, and SSM, respectively. The proportion reporting moderate improvement by prospective GRC was 13%, 11%, and 7% in CAMEO, SPACE, and SSM, respectively
  3. There were no significant differences at P < .01 (using Bonferroni’s correction for multiple comparisons) between any of the retrospective AUCs. In the SPACE trial, the prospective AUCs were significantly lower for the PHQ-9 (P = .008) and PHQ-2 (P = .004) than for the PROMIS Short-form (P values were in the .01 to .02 range for comparisons with the other PROMIS scales). In the CAMEO trial, the prospective AUC was significantly lower for the PHQ-2 than for the SF-36 (P = .007)
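
The first footnote's reading of AUC as a probability can be illustrated with a minimal sketch. It assumes that responsiveness is judged from each patient's score change (baseline minus follow-up) and that ties count as half a correct ranking. The function name `auc_pairwise` and the sample values are hypothetical and are not data from the CAMEO, SPACE, or SSM trials.

```python
from itertools import product


def auc_pairwise(improved_changes, not_improved_changes):
    """Return the fraction of (improved, not-improved) patient pairs in which
    the improved patient shows the larger score change; ties count as 0.5."""
    pairs = list(product(improved_changes, not_improved_changes))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for imp, non in pairs:
        if imp > non:
            score += 1.0
        elif imp == non:
            score += 0.5
    return score / len(pairs)


# Hypothetical decreases in depression score (illustration only).
improved = [6, 4, 5, 2]          # patients rated at least "moderately better"
not_improved = [1, 3, 2, 0, -1]  # all other patients

auc = auc_pairwise(improved, not_improved)
print(f"AUC = {auc:.3f}")
```

An AUC of .5 under this definition means the measure ranks pairs no better than chance, and 1.0 means every improved patient changed more than every non-improved patient.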