Reliability

| Test | Definition | Review question(s) |
| --- | --- | --- |
| Test-retest | Stability of scores over time when no change has occurred in the concept of interest | Does the PRO instrument reliably measure the concepts it was designed to measure? |
| Internal consistency | Whether the items in a domain are intercorrelated, as evidenced by an internal consistency statistic (e.g., coefficient alpha) | Were appropriate reliability tests conducted? |
| Inter-interviewer reproducibility (for interviewer-administered PROs only) | Agreement between responses when the PRO is administered by two or more different interviewers | What was the quality of the evidence of reliability? |
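Internal consistency is usually summarized with coefficient (Cronbach's) alpha, which compares the sum of the item variances with the variance of the summed domain score. A minimal sketch using invented responses for a hypothetical four-item domain (no real PRO instrument is implied):

```python
# Illustrative sketch only: coefficient alpha for a hypothetical PRO domain.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented responses: 6 patients x 4 items on a 1-5 scale
scores = np.array([
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
])
print(round(cronbach_alpha(scores), 2))
```

Values near 1 indicate highly intercorrelated items; the acceptable threshold depends on the intended use of the domain score.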
Validity

| Test | Definition | Review question(s) |
| --- | --- | --- |
| Content-related | Whether items and response options are relevant and comprehensive measures of the domain or concept | Do items in the verbatim copy of the PRO instrument appear to measure the concepts they are intended to measure in a useful way? Have patients similar to those participating in the clinical trial confirmed the completeness and relevance of all items? |
| Ability to measure the concept (also known as construct-related validity; can include tests for discriminant, convergent, and known-groups validity) | Whether relationships among items, domains, and concepts conform to what is predicted by the conceptual framework for the PRO instrument and its validation hypotheses | Do observed relationships between the items and domains confirm the hypotheses in the conceptual framework? Do results compare favorably with results from a similar but independent measure? Do results distinguish one group from another based on a prespecified variable that is relevant to the concept of interest? |
| Ability to predict future outcomes (also known as predictive validity) | Whether future events or status can be predicted by changes in the PRO scores | Do PRO scores predict subsequent events or outcomes accurately? |
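Two of the construct-validity tests named above reduce to simple statistics: convergent validity is typically assessed as a correlation with an independent measure of a similar concept, and known-groups validity as a score difference between groups defined by a prespecified clinical variable. A minimal sketch on invented scores; the groups and data are assumptions, not from any instrument:

```python
# Illustrative sketch only: convergent and known-groups validity checks
# on invented data (no real PRO instrument is implied).
import numpy as np

# Convergent validity: PRO domain scores should correlate with an
# independent measure of a similar concept.
pro_scores  = np.array([12, 18, 9, 22, 15, 7, 20, 11])
independent = np.array([14, 20, 10, 25, 16, 8, 21, 12])
r = np.corrcoef(pro_scores, independent)[0, 1]  # Pearson correlation

# Known-groups validity: scores should differ between groups defined by a
# prespecified clinical variable (here, hypothetical mild vs. severe disease).
mild   = np.array([9, 11, 7, 12, 10])
severe = np.array([18, 22, 20, 15, 21])
group_difference = severe.mean() - mild.mean()

print(round(r, 2), round(group_difference, 1))
```

In practice the expected direction and rough magnitude of each relationship would be stated in the validation hypotheses before the data are examined.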
Ability to detect change

| Test | Definition | Review question(s) |
| --- | --- | --- |
| Includes calculations of effect size and standard error of measurement, among others | Whether PRO scores are stable when there is no change in the patient and change in the predicted direction when there has been a notable change in the patient, as evidenced by an effect size statistic. Ability to detect change is always specific to a time interval. | Has ability to detect change been demonstrated in a comparative trial setting, comparing mean group scores or the proportion of patients who experienced a response to the treatment? Has ability to detect change been assessed for the time interval appropriate to the study? |
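The effect-size and standard-error-of-measurement calculations mentioned above can be sketched as follows. The data are invented, the standardized response mean is one effect-size choice among several, and the 0.85 test-retest reliability is an assumption:

```python
# Illustrative sketch only: ability-to-detect-change statistics on invented
# baseline and follow-up scores for patients who improved on treatment.
import numpy as np

baseline  = np.array([40, 35, 50, 45, 38, 42, 47, 36])
follow_up = np.array([48, 39, 56, 47, 45, 44, 58, 38])
change = follow_up - baseline

# One common effect-size statistic: the standardized response mean
# (mean change divided by the standard deviation of change).
srm = change.mean() / change.std(ddof=1)

# Standard error of measurement: baseline SD x sqrt(1 - reliability),
# with an assumed test-retest reliability of 0.85.
reliability = 0.85
sem = baseline.std(ddof=1) * np.sqrt(1 - reliability)

print(round(srm, 2), round(sem, 2))
```

Because ability to detect change is specific to a time interval, these statistics would be computed for the same interval used in the planned trial.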
Interpretability

| Test | Definition | Review question(s) |
| --- | --- | --- |
| Smallest difference that is considered clinically important; this can be a specified difference (the minimum important difference (MID)) or, in some cases, any detectable difference. The MID is used as a benchmark to interpret mean score differences between treatment arms in a clinical trial | Difference in mean score between treatment groups that provides convincing evidence of a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. The definition of an MID using a clinical anchor is sometimes called an MCID. | The FDA is specifically requesting comment on appropriate review of the derivation and application of an MID in the clinical trial setting. |
| Responder definition – used to identify responders in clinical trials for analyzing differences in the proportion of responders between treatment arms | Change in score that would be clear evidence that an individual patient experienced a treatment benefit. Can be based on experience with the measure using a distribution-based approach, a clinical or nonclinical anchor, an empirical rule, or a combination of approaches. | The FDA is specifically requesting comment on appropriate review of the derivation and application of responder definitions when used in clinical trials. |
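One common (but not the only) anchor-based approach takes the mean change among patients who report a small improvement on a global anchor as the MID, then applies that value as a per-patient responder threshold. A minimal sketch; the anchor categories, scores, and arms are all invented:

```python
# Illustrative sketch only: an anchor-based MID estimate and a responder
# comparison between treatment arms, on invented data.
import numpy as np

# Anchor-based MID: mean change among patients who rated themselves
# slightly improved ("little") on a hypothetical global anchor.
change_scores = np.array([6, 4, 7, 5, 3, 9, 2, 8, 5, 6])
anchor = np.array(["little", "none", "little", "little", "none",
                   "much", "none", "much", "little", "little"])
mid = change_scores[anchor == "little"].mean()

# Responder definition applied per patient: change >= MID counts as a
# response; compare the proportion of responders between arms.
treated = np.array([8, 3, 9, 7, 6, 2, 10, 5])
placebo = np.array([2, 5, 1, 6, 3, 0, 4, 2])
p_treated = (treated >= mid).mean()
p_placebo = (placebo >= mid).mean()

print(round(mid, 1), p_treated - p_placebo)
```

Note the two uses are distinct: the MID benchmarks a difference in group means, while the responder threshold classifies individual patients before the arms' responder proportions are compared.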