Domain | Criteria |
---|---|
Test–retest reliability | Reliability is the ability of a measure to reproduce the same value on two separate administrations when there has been no change in health. The intra-class correlation coefficient or weighted kappa should be ≥ 0.70 for group comparisons and ≥ 0.90 if scores are to be used for decisions about an individual patient [2]. The mean difference between time point 1 (T1) and time point 2 (T2) (paired t test or Wilcoxon signed-rank test) and its 95% CI should also be reported. |
Internal consistency | Internal consistency assesses whether the items are measuring the same thing. A Cronbach’s alpha of ≥ 0.70 is considered good; for group comparisons it should not exceed 0.92, as a higher value is taken to indicate that items in the scale could be redundant. Item-total correlations should be ≥ 0.20 [14]. |
Content validity | Content validity measures the extent to which the items clearly reflect the domains of interest. To achieve good content validity, there must be evidence that the instrument was developed by consulting patients and experts as well as by undertaking a literature review. Patients should be involved in the development stage and in item generation, and the opinion of patient representatives should be sought on the constructed scale [2, 14, 16]. |
Construct validity | Construct validity assesses how well an instrument measures what it was intended to measure. A correlation coefficient of ≥ 0.60 is considered strong evidence of construct validity. Authors should make specific directional hypotheses and estimate the strength of correlation before testing [2, 14, 15]. |
Criterion validity | Criterion validity assesses the degree of empirical association of the PROM with external criteria or other measures. A good argument should be made as to why an instrument is a gold standard and correlation with the gold standard should be ≥ 0.70 [15]. |
Responsiveness | Responsiveness assesses the ability of the PROM to detect change when change is expected. Available methods include t tests, effect sizes, standardised response means (SRMs), and Guyatt’s responsiveness index. Standardised effect sizes and SRMs of less than 0.2 are considered small, around 0.5 moderate, and 0.8 or above large [17]. There should be statistically significant changes in score of an expected magnitude [8]. |
Floor-ceiling effects | A floor or ceiling effect is considered present if 15% or more of respondents achieve the lowest or the highest possible score on the instrument, respectively [15]. |
Acceptability | Acceptability is reflected by the completeness of the data supplied. 80% or more of the data should be complete [16]. |
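Several of the criteria in the table above are simple summary statistics and can be computed directly from a matrix of item scores. The following sketch illustrates three of them — Cronbach's alpha (internal consistency), floor/ceiling percentages, and the standardised response mean (responsiveness) — on toy data invented for this example; the dataset, variable names, and score range are assumptions, not values from the source.

```python
import statistics

# Toy data (assumed for illustration): 6 respondents x 4 items, each item
# scored 0-4. t1 is the baseline administration; t2 is a follow-up at which
# improvement is expected, so change scores are meaningful for the SRM.
t1 = [
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [0, 1, 1, 0],
    [2, 2, 3, 2],
    [4, 4, 4, 4],
    [1, 1, 2, 1],
]
t2 = [
    [2, 2, 3, 2],
    [4, 3, 4, 4],
    [1, 1, 2, 1],
    [3, 3, 3, 3],
    [4, 4, 4, 4],
    [2, 2, 2, 2],
]

def cronbach_alpha(rows):
    """Internal consistency: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [statistics.variance(col) for col in zip(*rows)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def floor_ceiling(totals, lo, hi):
    """Percentage of respondents at the minimum / maximum possible total score."""
    n = len(totals)
    return (100 * sum(t == lo for t in totals) / n,
            100 * sum(t == hi for t in totals) / n)

def srm(before, after):
    """Standardised response mean: mean change score / SD of change scores."""
    diffs = [b - a for a, b in zip(before, after)]
    return statistics.mean(diffs) / statistics.stdev(diffs)

totals1 = [sum(r) for r in t1]
totals2 = [sum(r) for r in t2]

alpha = cronbach_alpha(t1)                               # good if 0.70 <= alpha <= 0.92
floor_pct, ceiling_pct = floor_ceiling(totals1, 0, 16)   # flag if either is 15% or more
responsiveness = srm(totals1, totals2)                   # ~0.2 small, ~0.5 moderate, >=0.8 large
```

Note that on real data the ICC for test–retest reliability would normally come from a variance-components (ANOVA) model rather than a simple correlation; the sketch deliberately sticks to the criteria that reduce to one-line formulas.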