Table 1 Criteria for good measurement propertiesᵃ

From: Measurement properties of the Dutch–Flemish patient-reported outcomes measurement information system (PROMIS) physical function item bank and instruments: a systematic review

Measurement property

Rating

Criteria

Main category: validity

Content validity

 + 

≥ 85% of the items are relevant for the construct of interest, the target population, and the context of use AND no key concepts are missing (comprehensiveness) AND > 85% of items are comprehensible for the population of interestᵇ

?

Not all information for ‘+’ reported

 − 

Criteria for ‘+’ not met

Structural validityᶜ

 + 

CTT

CFA: CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08

IRT/Rasch

No violation of unidimensionality: CFI or TLI or comparable measure > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08

OR (for item banks only)

Bifactor model: Standardized loadings on common factor (H) are > 0.30 and larger than loadings on group factors OR high coefficient omega (> 0.80) and a high ECV (> 0.60)

AND (for item banks: OR)

No or limited violation of local independence: Residual correlations among the items after controlling for the dominant factor < 0.20 in ≥ 95% of item pairs OR in < 95% of item pairs but evidence shown that impact is negligible OR Q3’s < 0.37

AND

No violation of monotonicity: Adequate looking graphs OR item scalability (Hi) > 0.30

AND (not for item banks)

Adequate model fit

IRT: χ² p-value > 0.001

Rasch: infit and outfit mean squares ≥ 0.5 and ≤ 1.5 OR Z-standardized values > −2 and < 2

?

Not all information for ‘+’ reported OR residual correlations among the items after controlling for the dominant factor < 0.20 in < 95% of item pairs but no evidence shown on the impact

 − 

Criteria for ‘+’ not met

Hypothesis testing for construct validity

 + 

Result is in accordance with hypothesisᵈ

?

No hypothesis defined (by the review team)

 − 

Result is not in accordance with hypothesisᵈ

Cross-cultural validity/measurement invariance

 + 

No important differences found between group factors (such as age, gender, language) in multiple group factor analysis OR DIF in ≤ 5% of item pairs for group factors (e.g., McFadden’s R² < 0.02) OR DIF in > 5% of item pairs but evidence shown that impact is negligible

?

No multiple group factor analysis OR DIF analysis performed, OR DIF in > 5% of item pairs and no evidence shown on impact

 − 

Important differences between group factors OR DIF was found in > 5% of item pairs with no mention of impact or evidence showing that impact is not negligible

Main category: Reliability

Internal consistency/measurement precision

 + 

CTT

At least low evidenceᵉ for sufficient structural validityᶠ AND Cronbach’s alpha(s) ≥ 0.70 for each unidimensional scale or subscale

IRT

At least low evidenceᵉ for sufficient structural validityᶠ AND reliability coefficient ≥ 0.90 over a range of at least two standard deviations around the average of the study population (or ≥ 68% of the study population)

?

Criteria for “At least low evidenceᵉ for sufficient structural validityᶠ” not met

 − 

Criteria for “At least low evidenceᵉ for sufficient structural validityᶠ” met AND other criteria for ‘+’ not met

Reliability

 + 

ICC or weighted Kappa ≥ 0.70

?

ICC or weighted Kappa not reported

 − 

ICC or weighted Kappa < 0.70

Measurement error

 + 

SDC or LoA < MICᵉ

?

MIC not defined

 − 

SDC or LoA > MICᵉ

Main category: Responsiveness

Responsiveness

 + 

Result is in accordance with hypothesisᵈ OR AUC ≥ 0.70

?

No hypothesis defined (by the review team)

 − 

Result is not in accordance with hypothesisᵈ OR AUC < 0.70

  1. AUC, area under the curve; CFA, confirmatory factor analysis; CFI, comparative fit index; CTT, classical test theory; DIF, differential item functioning; ECV, explained common variance; ICC, intraclass correlation coefficient; IRT, item response theory; LoA, limits of agreement; MIC, minimal important change; RMSEA, root mean square error of approximation; SEM, standard error of measurement; SDC, smallest detectable change; SRMR, standardized root mean residuals; TLI, Tucker–Lewis index
  2. “ + ” = sufficient, “?” = indeterminate, “ − ” = insufficient
  3. ᵃ Adjusted from the COSMIN criteria [30, 31] as described in the “Methods” section
  4. ᵇ From the COSMIN guidelines on evaluating content validity [30]
  5. ᶜ Structural validity is not relevant for CATs
  6. ᵈ The results of all studies taken together should show that 75% of the results are in accordance with the hypotheses [31]
  7. ᵉ As defined by grading the evidence according to the GRADE approach
  8. ᶠ This evidence may come from a different study
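
Several of the rating rules in Table 1 are simple threshold checks, so they can be sketched as small functions. The Python sketch below is illustrative only and is not part of the review or of COSMIN: the function names and the ASCII rating strings are hypothetical, a missing value is modelled as None, and two boundary cases the table leaves open are filled in by assumption (SDC or LoA exactly equal to the MIC is rated insufficient, and "75% of the results" is read as "at least 75%"). It covers only the CFA fit criterion, reliability, measurement error, and the pooled hypothesis rule from the footnote on hypotheses.

```python
from typing import Optional, Sequence

# Rating symbols from Table 1 (ASCII "-" stands in for the minus sign).
SUFFICIENT, INDETERMINATE, INSUFFICIENT = "+", "?", "-"


def rate_reliability(icc_or_kappa: Optional[float]) -> str:
    """ICC or weighted Kappa >= 0.70 is sufficient; not reported is indeterminate."""
    if icc_or_kappa is None:
        return INDETERMINATE
    return SUFFICIENT if icc_or_kappa >= 0.70 else INSUFFICIENT


def rate_measurement_error(sdc_or_loa: float, mic: Optional[float]) -> str:
    """SDC or LoA < MIC is sufficient; MIC not defined is indeterminate.

    Assumption: equality (SDC or LoA == MIC) is rated insufficient,
    because the table only specifies '<' and '>'.
    """
    if mic is None:
        return INDETERMINATE
    return SUFFICIENT if sdc_or_loa < mic else INSUFFICIENT


def rate_cfa_fit(
    cfi: Optional[float] = None,
    tli: Optional[float] = None,
    rmsea: Optional[float] = None,
    srmr: Optional[float] = None,
) -> str:
    """CTT criterion: CFI or TLI > 0.95 OR RMSEA < 0.06 OR SRMR < 0.08."""
    checks = []
    if cfi is not None:
        checks.append(cfi > 0.95)
    if tli is not None:
        checks.append(tli > 0.95)
    if rmsea is not None:
        checks.append(rmsea < 0.06)
    if srmr is not None:
        checks.append(srmr < 0.08)
    if not checks:
        # No fit index reported at all: not all information for '+' is available.
        return INDETERMINATE
    return SUFFICIENT if any(checks) else INSUFFICIENT


def rate_hypotheses(results: Sequence[bool]) -> str:
    """Pooled hypothesis rule: at least 75% of results confirm the hypotheses.

    Each entry is True when a result is in accordance with its hypothesis.
    Assumption: an empty sequence means no hypotheses were defined.
    """
    if len(results) == 0:
        return INDETERMINATE
    share_confirmed = sum(results) / len(results)
    return SUFFICIENT if share_confirmed >= 0.75 else INSUFFICIENT
```

For example, under these assumptions rate_cfa_fit(cfi=0.93, rmsea=0.05) yields "+" because the criteria are joined by OR, so a single index meeting its threshold is enough. The IRT/Rasch structural validity rules, the DIF criteria, and the GRADE-dependent internal consistency rules involve more judgment and are deliberately left out of the sketch.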