Measurement properties of the Dutch–Flemish patient-reported outcomes measurement information system (PROMIS) physical function item bank and instruments: a systematic review

Abma, Inger L.; Butje, Bas J. D.; ten Klooster, Peter M.; van der Wees, Philip J.

doi:10.1186/s12955-020-01647-y

Table 1 Criteria for good measurement properties^a

Measurement property	Rating	Criteria
Main category: Validity
Content validity	+	≥ 85% of the items are relevant for the construct of interest, the target population, and the context of use AND no key concepts are missing (comprehensiveness) AND > 85% of the items are comprehensible for the population of interest^b
	?	Not all information for ‘+’ reported
	−	Criteria for ‘+’ not met
Structural validity^c	+	CTT (CFA): CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08
	+	IRT/Rasch: No violation of unidimensionality: CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08 OR (for item banks only) bifactor model: standardized loadings on the common factor (H) > 0.30 and larger than loadings on the group factors OR high coefficient omega (> 0.80) and a high ECV (> 0.60); AND (for item banks: OR) no or limited violation of local independence: residual correlations among the items after controlling for the dominant factor < 0.20 in ≥ 95% of item pairs OR in < 95% of item pairs but evidence shown that impact is negligible OR Q3 values < 0.37; AND no violation of monotonicity: adequate-looking graphs OR item scalability (Hi) > 0.30; AND (not for item banks) adequate model fit: IRT: χ² p-value > 0.001; Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > −2 and < 2
	?	Not all information for ‘+’ reported OR residual correlations among the items after controlling for the dominant factor < 0.20 in < 95% of item pairs but no evidence shown on the impact
	−	Criteria for ‘+’ not met
Hypothesis testing for construct validity	+	Result is in accordance with hypothesis^d
	?	No hypothesis defined (by the review team)
	−	Result is not in accordance with hypothesis^d
Cross-cultural validity/measurement invariance	+	No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR DIF in ≤ 5% of item pairs for group factors (e.g., McFadden’s R² < 0.02) OR DIF in > 5% of item pairs but evidence shown that impact is negligible
	?	Neither multiple group factor analysis nor DIF analysis performed, OR DIF in > 5% of item pairs and no evidence shown on impact
	−	Important differences between group factors OR DIF was found in > 5% of item pairs with no mention of impact or evidence showing that impact is not negligible
Main category: Reliability
Internal consistency/measurement precision	+	CTT: At least low evidence^e for sufficient structural validity^f AND Cronbach’s alpha(s) ≥ 0.70 for each unidimensional scale or subscale; IRT: At least low evidence^e for sufficient structural validity^f AND reliability coefficient ≥ 0.90 over a range of at least two standard deviations around the average of the study population (or for ≥ 68% of the study population)
	?	Criteria for “At least low evidence^e for sufficient structural validity^f” not met
	−	Criteria for “At least low evidence^e for sufficient structural validity^f” met AND other criteria for ‘+’ not met
Reliability	+	ICC or weighted Kappa ≥ 0.70
	?	ICC or weighted Kappa not reported
	−	ICC or weighted Kappa < 0.70
Measurement error	+	SDC or LoA < MIC^e
	?	MIC not defined
	−	SDC or LoA > MIC^e
Main category: Responsiveness
Responsiveness	+	Result is in accordance with hypothesis^d OR AUC ≥ 0.70
	?	No hypothesis defined (by the review team)
	−	Result is not in accordance with hypothesis^d OR AUC < 0.70

AUC, area under the curve; CAT, computerized adaptive test; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; DIF, differential item functioning; ECV, explained common variance; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; RMSEA, root mean square error of approximation; SDC, smallest detectable change; SEM, standard error of measurement; SRMR, standardized root mean square residual; TLI, Tucker–Lewis index
“+” = sufficient, “?” = indeterminate, “−” = insufficient
^aAdjusted from the COSMIN criteria [30, 31] as described in the “Methods” section
^bFrom the COSMIN guidelines on evaluating content validity [30]
^cStructural validity is not relevant for CATs
^dThe results of all studies taken together should show that at least 75% of the results are in accordance with the hypotheses [31]
^eAs defined by grading the evidence according to the GRADE approach
^fThis evidence may come from a different study
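
The CTT structural-validity row joins its fit indices with OR, so a single index meeting its cut-off is enough for ‘+’. A minimal sketch of that rule, assuming the reported indices are available as plain numbers (None when not reported). The function name rate_cfa_fit is hypothetical, and rating a study as ‘−’ when it reports only failing indices is a simplifying assumption of this sketch.

```python
from typing import Optional


def rate_cfa_fit(cfi: Optional[float] = None,
                 tli: Optional[float] = None,
                 rmsea: Optional[float] = None,
                 srmr: Optional[float] = None) -> str:
    """Rate CTT structural validity against the Table 1 CFA thresholds.

    None means "not reported". Because the criteria are joined by OR,
    one index meeting its threshold is enough for '+'.
    """
    if all(v is None for v in (cfi, tli, rmsea, srmr)):
        return "?"  # no fit information reported: indeterminate
    meets = (
        (cfi is not None and cfi > 0.95)
        or (tli is not None and tli > 0.95)
        or (rmsea is not None and rmsea < 0.06)
        or (srmr is not None and srmr < 0.08)
    )
    return "+" if meets else "-"  # ASCII "-" stands for the table's "−"


# Example: CFI misses its cut-off, but RMSEA alone satisfies the OR rule.
print(rate_cfa_fit(cfi=0.93, rmsea=0.05))
```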
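
The local-independence part of the IRT/Rasch row counts the share of item pairs whose residual correlation (after controlling for the dominant factor) is below 0.20, and falls back on evidence about impact when that share is under 95%. A minimal sketch, assuming a symmetric residual-correlation matrix is available; values are compared exactly as the table states them (< 0.20). The names share_low_residual and rate_local_independence, and the example matrix, are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence


def share_low_residual(resid: Sequence[Sequence[float]],
                       cutoff: float = 0.20) -> float:
    """Share of distinct item pairs whose residual correlation is below cutoff.

    resid is a symmetric item-by-item matrix; only the upper triangle is read.
    """
    pairs = list(combinations(range(len(resid)), 2))
    low = sum(1 for i, j in pairs if resid[i][j] < cutoff)
    return low / len(pairs)


def rate_local_independence(resid: Sequence[Sequence[float]],
                            impact_negligible: Optional[bool] = None) -> str:
    """'+' if >= 95% of pairs are below 0.20; otherwise use impact evidence."""
    if share_low_residual(resid) >= 0.95:
        return "+"
    if impact_negligible is None:
        return "?"  # < 95% of pairs and no evidence on impact
    return "+" if impact_negligible else "-"


# Hypothetical 3-item residual matrix: one of the three pairs exceeds 0.20.
resid = [
    [1.00, 0.05, 0.25],
    [0.05, 1.00, 0.10],
    [0.25, 0.10, 1.00],
]
print(share_low_residual(resid), rate_local_independence(resid))
```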
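
For the CTT half of the internal consistency row, Cronbach’s alpha is computed per unidimensional (sub)scale and compared with 0.70, but only once structural validity is supported. A minimal sketch using the standard alpha formula; the function names, the example scores, and the boolean flag standing in for the GRADE-based structural-validity judgement are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rate_internal_consistency_ctt(alpha: float,
                                  structural_validity_ok: bool) -> str:
    """Table 1 CTT rule: structural validity evidence is a precondition."""
    if not structural_validity_ok:
        return "?"
    return "+" if alpha >= 0.70 else "-"


# Hypothetical scores: 5 respondents answering 3 items of one subscale.
scores = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 1], [3, 3, 4]]
alpha = cronbach_alpha(scores)
print(round(alpha, 3), rate_internal_consistency_ctt(alpha, True))
```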
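
The IRT half of the same row asks for a reliability of at least 0.90 for at least 68% of the study population (or over a range of two standard deviations). On a standardized theta metric, reliability at a given score is commonly approximated as 1 − SE²; PROMIS reports T-scores (mean 50, SD 10), so the sketch below rescales the SE by 10. The conversion, the function names, and the SE values are assumptions for illustration, and the two-SD-range variant is not implemented.

```python
from typing import Sequence


def reliability_from_t_se(se_t: float, sd: float = 10.0) -> float:
    """Reliability implied by a T-score standard error: 1 - (SE / SD)^2."""
    return 1.0 - (se_t / sd) ** 2


def share_reliable(se_values: Sequence[float], threshold: float = 0.90) -> float:
    """Share of respondents whose score reaches the reliability threshold."""
    ok = sum(1 for se in se_values if reliability_from_t_se(se) >= threshold)
    return ok / len(se_values)


# Hypothetical per-respondent standard errors on the T-score metric.
ses = [2.5, 3.0, 3.1, 4.0, 2.8]
share = share_reliable(ses)
print(share, share >= 0.68)
```

An SE of about 3.16 T-score points corresponds to a reliability of 0.90 under this approximation.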
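
Table 1 does not say which ICC form is meant in the reliability row. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, ICC(2,1) in Shrout and Fleiss’s notation, computed from the usual two-way ANOVA mean squares; other ICC forms use a different denominator. The function names and the test-retest data are hypothetical.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data is subjects (rows) by measurement occasions (columns).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def rate_reliability(icc: float) -> str:
    return "+" if icc >= 0.70 else "-"


# Hypothetical T-scores of 4 patients at test and retest.
test_retest = [[10, 12], [14, 13], [20, 21], [8, 9]]
icc = icc_2_1(test_retest)
print(round(icc, 3), rate_reliability(icc))
```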
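
The measurement-error row compares the smallest detectable change with the MIC. A common way to obtain the SDC for individual change is SDC = 1.96 × √2 × SEM, where √2 reflects that a change score combines two measurements; the table does not state how the SDC was derived, so this formula is an assumption here. The function names and the SEM and MIC values are hypothetical.

```python
import math
from typing import Optional


def sdc_individual(sem: float) -> float:
    """Smallest detectable change for an individual: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2) * sem


def rate_measurement_error(sdc: float, mic: Optional[float]) -> str:
    if mic is None:
        return "?"  # MIC not defined
    # Table 1 defines '+' for SDC < MIC and '−' for SDC > MIC; equality is
    # not covered, and this sketch treats it as not sufficient.
    return "+" if sdc < mic else "-"


sdc = sdc_individual(sem=2.0)  # SEM in T-score points (hypothetical value)
print(round(sdc, 2), rate_measurement_error(sdc, mic=6.0))
```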
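
For the AUC route in the responsiveness row, the area under the ROC curve can be computed without fitting a curve: it equals the Mann-Whitney probability that a randomly chosen patient who changed (according to an external anchor) has a larger change score than a randomly chosen patient who did not, with ties counted as one half. A minimal sketch; the anchor-based grouping, the function names, and the scores are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(changed: Sequence[float], stable: Sequence[float]) -> float:
    """AUC as P(change score of a 'changed' patient > that of a 'stable' one),
    counting ties as 0.5."""
    wins = 0.0
    for c in changed:
        for s in stable:
            if c > s:
                wins += 1.0
            elif c == s:
                wins += 0.5
    return wins / (len(changed) * len(stable))


def rate_responsiveness_auc(auc: float) -> str:
    return "+" if auc >= 0.70 else "-"


# Hypothetical change scores, grouped by an external anchor question.
improved = [5, 8, 10]
not_improved = [1, 4, 6]
auc = auc_mann_whitney(improved, not_improved)
print(round(auc, 3), rate_responsiveness_auc(auc))
```

The double loop is O(n·m), which is fine for study-sized samples; a rank-based computation scales better for large groups.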
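
Footnote d governs both hypothesis testing for construct validity and responsiveness: results are pooled across studies and rated against a 75% threshold, read here as at least 75%. A minimal sketch, assuming each tested hypothesis contributes one equally weighted pass/fail outcome (the table does not state a weighting). The function name rate_hypotheses is hypothetical.

```python
from typing import Sequence


def rate_hypotheses(results: Sequence[bool], required_share: float = 0.75) -> str:
    """Pool hypothesis-test outcomes across studies (footnote d of Table 1).

    results holds one boolean per tested hypothesis: True if the observed
    result was in accordance with the hypothesis.
    """
    if not results:
        return "?"  # no hypotheses defined by the review team
    share = sum(results) / len(results)
    return "+" if share >= required_share else "-"


# Hypothetical: 7 of 9 pooled results agree with the hypotheses (about 78%).
print(rate_hypotheses([True] * 7 + [False] * 2))
```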

ISSN: 1477-7525

Health and Quality of Life Outcomes
