
Table 1 Summary of the research questions, the psychometric properties studied, the statistical analyses applied, and the results for the PROMIS Global Mental Health subscale (4 items) and the PROMIS Global Physical Health subscale (4 items) in the total Dutch general population sample (N = 4370)

From: Psychometric properties of the patient-reported outcomes measurement information system scale v1.2: global health (PROMIS-GH) in a Dutch general population

| Research questions from a user's perspective | Research questions from a psychometric perspective | Psychometric property | Analyses per subscale | Statistic | Criteria | Reference | Software package with reference | Results: GMH | Results: GPH |
|---|---|---|---|---|---|---|---|---|---|
| 1. Is it legitimate to calculate IRT-based scores for this measure? | Do the items assess only one construct? | Unidimensionality^a | CFA | CFI | ≥ 0.95 | [44] | Mplus software (version 6.0) [45] | 0.98 | 0.99 |
| | | | | TLI | ≥ 0.95 | [44] | | 0.95 | 0.97 |
| | | | | RMSEA | ≤ 0.06 | [44] | | **0.22** | **0.12** |
| | | | | SRMR | ≤ 0.08 | [44] | | 0.04 | 0.03 |
| | | | Exploratory Bifactor Analysis | ECV | > 0.70 | [46] | R package psych (version 1.7.8) [47] | 0.80 | 0.71 |
| | | | | ωH | > 0.80 | [48] | | **0.75** | **0.65** |
| | Do the items relate to the construct being measured only? | Local independence | Residual correlation matrix^b | r | ≤ 0.20 | [25] | Mplus software (version 6.0) [49] | all r < 0.20 | all r < 0.20 |
| | Do the probabilities of higher responses to the items increase with increasing levels of the construct? | Monotonicity | Mokken scale analysis | Hi | ≥ 0.30 | [50] | R-package Mokken (version 2.8.4) [51] | See Table 3 | See Table 3 |
| | | | | H | > 0.50 | [50] | | 0.60 | 0.54 |
| | | | ICCs^c | Graphic display | | [52, 53] | | See Fig. 1a | See Fig. 1b |
| | Can the relationship between the items and the construct be described using an IRT model? | IRT-model fit | Logistic GRM model fit | S-X² and p of the items | p ≥ 0.001* | [44, 45] | R-package mirt (version 3.3.2) [54] | See Table 3 | See Table 3 |
| 2. Is this measure able to discriminate between different levels of the construct/trait? | Do the items have the ability to discriminate between different levels of the construct/trait? | Range of item discrimination | IRT-modelling | α^d | > 1.0 | [52] | R-package mirt (version 3.3.2) [54] | See Table 3 | See Table 3 |
| 3. Does this measure cover the relevant range of the construct/trait? | Do the items cover the relevant range of the construct/trait? | Range of item difficulties | IRT-modelling | β^e | N/A | [52] | R-package mirt (version 3.3.2) [54] | See Table 3 | See Table 3 |
| 4. Is this measure reliable? | What is the overall precision of this measure in this sample? | Internal consistency | Internal consistency | Cronbach's alpha | > 0.70 | [55] | SPSS software (version 21 for Windows) | 0.83 | 0.78 |
| | What is the contribution of the individual items to this overall precision? | Internal consistency | Internal consistency | Cronbach's alpha if item deleted | Reduction of total alpha | [56] | | See Table 3 | See Table 3 |
| | | | Corrected item-to-total correlation | rs | ≥ 0.40 | [57] | | See Table 3 | See Table 3 |
| | What is the precision of this measure at different levels of the construct/trait? | Precision | TIC and IIC^f | Graphic display | | [52, 53] | | See Fig. 2a | See Fig. 2b |
| 5/6. Does this measure function in the same way in different (sub)groups? | Can this measure be used to compare (sub)groups in terms of demographic variables? | Measurement invariance | DIF^g | Change in McFadden R² | > 0.02 | [44, 58, 59] | R-package lordif (version 0.3–3) [58] | See Table 3 | See Table 3 |
| | Can this measure be used to compare the scores of English-speaking persons (who responded to its original English version) with those of Dutch-speaking persons (who responded to its Dutch-Flemish translation)? | Cross-cultural validity | DIF^g | Change in McFadden R² | > 0.02 | [44, 58, 59] | R-package lordif (version 0.3–3) [58] | See Table 3 | See Table 3 |

Abbreviations: α, item discrimination parameters estimated under the Graded Response Model; β, item difficulty parameters estimated under the Graded Response Model; CFA, Confirmatory Factor Analysis; CFI, Comparative Fit Index; DIF, Differential Item Functioning; ECV, Explained Common Variance; GMH, Global Mental Health; GPH, Global Physical Health; GRM, Graded Response Model; H, scalability coefficient for the scale; Hi, scalability coefficient for the item; ICC, Item Characteristic Curve; IIC, Item Information Curve; IRT, Item Response Theory; N/A, not appropriate; p, p-value; r, residual correlation; rs, Spearman correlation coefficient; RMSEA, Root Mean Square Error of Approximation; S-X², item fit statistic under the Graded Response Model; SRMR, Standardized Root Mean Square Residual; TIC, Test Information Curve; TLI, Tucker-Lewis Index; ωH, Omega Hierarchical
Statistic values beyond the recommended cut-offs are presented in bold
The research questions have been formulated from a user perspective (the clinicians or researchers who intend to apply the measure) and from a psychometric perspective (the researchers who investigate the psychometric properties of a measure)
The numbers next to the questions refer to the numbering of the measurement properties reported in the Methods
a A confirmatory two-factor analysis was first run on the entire Global Health measure to confirm its two-factor structure. Once the two-factor structure was confirmed, each subscale was analysed separately to confirm its unidimensionality, i.e., with a unidimensional CFA [60] (fitted using a mean- and variance-adjusted weighted least squares estimator) and an Exploratory Bifactor Analysis [48] (performed using a Schmid-Leiman procedure [61]); a sketch of the bifactor indices follows these notes
b Resulting from the single-factor CFA
c The ICC graphs in Fig. 1, plotted for each item, visually illustrate the probability of selecting each item response across levels of ability
d Item slopes indicate the ability of an item to discriminate between people with adjacent levels of the latent trait
e Item thresholds refer to item difficulty and locate the items along the latent trait; see the GRM sketch after these notes
f TICs and IICs plot the information across the latent trait at the total-score level and at the item level, respectively [52, 53]. In a unidimensional scale, the standard error (SE) is the reciprocal of the square root of the information (SE = 1/√information) [62]; for each level of the latent trait and for each item, the information can therefore be converted to a reliability estimate, interpretable like a Cronbach's alpha, using the formula 1 − SE² [52]. Information values of 10, 5, and 3.45 are therefore equal to internal reliability values of 0.90, 0.80, and 0.70, respectively [62]; a worked example follows these notes
g A DIF analysis [53] was performed using an ordinal logistic regression framework, in which three regression models are compared to detect DIF: model 1 (item responses are predicted by the latent trait only), model 2 (item responses are predicted by the latent trait and group membership), and model 3 (item responses are predicted by the latent trait, group membership, and the interaction between these two terms). Uniform and non-uniform DIF are present if model 2 fits better than model 1 and if model 3 fits better than model 2, respectively. The impact of DIF on the item scores and on the total score was assessed by visual inspection of the ICCs per group and of the test characteristic curves per group, respectively; a sketch of these nested models follows these notes
* Given the large sample size (N = 4370), we drew 10 mutually exclusive random samples of 473 subjects each in order to minimize the chance that even small fit differences would yield statistically significant results
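
For readers unfamiliar with the bifactor indices mentioned in note a, one common formulation of ECV and ωH from the exploratory bifactor literature is sketched below. The notation (λ_g for a general-factor loading, λ_s for a specific-factor loading, θ²_i for an item's unique variance) is introduced here for illustration and is not taken from the article.

```latex
% Common definitions of ECV and omega-hierarchical for a bifactor solution
% (general factor g, specific factors s, items i); notation is illustrative.
\[
\mathrm{ECV} \;=\; \frac{\sum_i \lambda_{gi}^{2}}
                        {\sum_i \lambda_{gi}^{2} \;+\; \sum_s \sum_{i \in s} \lambda_{si}^{2}},
\qquad
\omega_H \;=\; \frac{\left(\sum_i \lambda_{gi}\right)^{2}}
                    {\left(\sum_i \lambda_{gi}\right)^{2}
                     + \sum_s \left(\sum_{i \in s} \lambda_{si}\right)^{2}
                     + \sum_i \theta_i^{2}}.
\]
% ECV: share of the common variance explained by the general factor (criterion > 0.70).
% omega_H: proportion of total score variance due to the general factor (criterion > 0.80).
```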
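Notes d and e refer to the discrimination (α) and threshold (β) parameters of the graded response model. A minimal statement of the model in the logistic slope-threshold parameterization is given below as a sketch; the notation is illustrative rather than quoted from the article.

```latex
% Graded response model: probability of responding in category k or higher
% to item i, for a person with latent trait level theta.
\[
P\!\left(X_{i} \ge k \mid \theta\right)
  \;=\; \frac{1}{1 + \exp\!\big(-\alpha_i\,(\theta - \beta_{ik})\big)},
\qquad
P\!\left(X_{i} = k \mid \theta\right)
  \;=\; P\!\left(X_{i} \ge k \mid \theta\right) - P\!\left(X_{i} \ge k+1 \mid \theta\right).
\]
% alpha_i (note d): item slope/discrimination (table criterion > 1.0);
% beta_ik (note e): threshold locating response category k along the latent trait.
```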
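A worked version of the information-to-reliability conversion in note f, using the standard IRT relation between information and the standard error of measurement:

```latex
% Conversion of test/item information I(theta) to a reliability-like value.
\[
\mathrm{SE}(\theta) = \frac{1}{\sqrt{I(\theta)}},
\qquad
\text{reliability}(\theta) = 1 - \mathrm{SE}(\theta)^{2} = 1 - \frac{1}{I(\theta)}.
\]
% Worked values: I = 10    ->  1 - 1/10   = 0.90
%                I = 5     ->  1 - 1/5    = 0.80
%                I = 3.45  ->  1 - 1/3.45 ≈ 0.71  (the ~0.70 benchmark cited in note f)
```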
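The three nested models in note g can be written as cumulative (ordinal) logistic regressions of the item response on the latent trait θ and group membership g. This is a schematic sketch of the general approach implemented in lordif, not a transcription of its code; τ_k denotes the category thresholds and the sign convention of the cumulative logit is left generic.

```latex
% Nested ordinal logistic regression models used to detect DIF for item i,
% with latent trait theta, group indicator g, and category thresholds tau_k.
\begin{align*}
\text{Model 1:}\quad & \mathrm{logit}\, P(X_i \ge k) = \tau_k + \beta_1 \theta \\
\text{Model 2:}\quad & \mathrm{logit}\, P(X_i \ge k) = \tau_k + \beta_1 \theta + \beta_2 g \\
\text{Model 3:}\quad & \mathrm{logit}\, P(X_i \ge k) = \tau_k + \beta_1 \theta + \beta_2 g + \beta_3 (\theta \times g)
\end{align*}
% Uniform DIF: Model 2 improves on Model 1 (beta_2 differs from 0);
% non-uniform DIF: Model 3 improves on Model 2 (beta_3 differs from 0).
% Effect size: change in McFadden pseudo-R^2, R^2 = 1 - log L_model / log L_null,
% flagged when the change exceeds the table's criterion of 0.02.
```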