Research questions from a users’ perspective | Research questions from a psychometric perspective | Psychometric property | Analyses per subscale | Statistic | Criteria | Reference | Software package with reference | Results: GMH | Results: GPH |
---|---|---|---|---|---|---|---|---|---|
1. Is it legitimate to calculate an IRT-based score for this measure? | Do the items assess only one construct? | Unidimensionality<sup>a</sup> | CFA | CFI |  ≥ 0.95 | [44] | Mplus software (version 6.0) [45] | 0.98 | 0.99 |
 |  |  |  | TLI |  ≥ 0.95 | [44] |  | 0.95 | 0.97 |
 |  |  |  | RMSEA |  ≤ 0.06 | [44] |  | 0.22 | 0.12 |
 |  |  |  | SRMR |  ≤ 0.08 | [44] |  | 0.04 | 0.03 |
 |  |  | Exploratory Bifactor Analysis | ECV |  > 0.70 | [46] | R package psych (version 1.7.8) [47] | 0.80 | 0.71 |
 |  |  |  | ω<sub>H</sub> |  > 0.80 | [48] |  | 0.75 | 0.65 |
 | Do the items relate to the construct being measured only? | Local independence | Residual correlation matrix<sup>b</sup> | r |  ≤ 0.20 | [25] | Mplus software (version 6.0) [49] | all r < 0.20 | all r < 0.20 |
 | Do the probabilities of higher responses to the items increase with increasing levels of the construct? | Monotonicity | Mokken scale analysis | H<sub>i</sub> |  ≥ 0.30 | [50] | R-package Mokken (version 2.8.4) [51] | See Table 3 | See Table 3 |
 |  |  |  | H |  > 0.50 | [50] |  | 0.60 | 0.54 |
 |  |  |  | ICCs<sup>c</sup> | Graphic display |  |  | See Fig. 1a | See Fig. 1b |
 | Can the relationship between the items and the construct be described using an IRT-model? | IRT-model fit | Logistic GRM model fit | S-X<sup>2</sup> and p of the items | p ≥ 0.001* |  | R-package mirt (version 3.3.2) [54] | See Table 3 | See Table 3 |
2. Is this measure able to discriminate between different levels of the construct/trait? | Do the items have the ability to discriminate between different levels of the construct/trait? | Range of item discrimination | IRT-modelling | α<sup>d</sup> |  > 1.0 | [52] | R-package mirt (version 3.3.2) [54] | See Table 3 | See Table 3 |
3. Does this measure cover the relevant range of the construct/trait? | Do the items cover the relevant range of the construct/trait? | Range of item difficulties | IRT-modelling | β<sup>e</sup> | N/A | [52] | R-package mirt (version 3.3.2) [54] | See Table 3 | See Table 3 |
4. Is this measure reliable? | What is the overall precision of this measure in this sample? | Internal consistency | Internal consistency | Cronbach’s alpha |  > 0.70 | [55] | SPSS software (version 21 for Windows) | 0.83 | 0.78 |
 | What is the contribution of the individual items to this overall precision? | Internal consistency | Internal consistency | Cronbach’s alpha if item deleted | Reduction of total alpha | [56] |  | See Table 3 | See Table 3 |
 |  |  | Corrected item-to-total correlation | rs |  ≥ 0.40 | [57] |  | See Table 3 | See Table 3 |
 | What is the precision of this measure at different levels of the construct/trait? | Precision | TIC and IIC | Graphic display |  |  |  | See Fig. 2a | See Fig. 2b |
5/6. Does this measure function in the same way in different (sub)groups? | Can this measure be used to compare (sub)groups in terms of demographic variables? | Measurement invariance | DIF<sup>g</sup> | Change in McFadden R<sup>2</sup> |  > 0.02 |  | R-package lordif (version 0.3–3) [58] | See Table 3 | See Table 3 |
 | Can this measure be used to compare the scores of English-speaking persons (who responded to its English original version) and Dutch-speaking persons (who responded to its Dutch-Flemish translation)? | Cross-cultural validity | DIF<sup>g</sup> | Change in McFadden R<sup>2</sup> |  > 0.02 |  | R-package lordif (version 0.3–3) [58] | See Table 3 | See Table 3 |
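
The internal-consistency rows of the table (Cronbach's alpha, alpha if item deleted, corrected item-to-total correlation) can be illustrated with a minimal sketch. The table reports that these statistics were obtained with SPSS; the Python below is only a stand-in that applies the textbook formulas, not the authors' procedure. The function names, the `toy_items` data, and the flagging logic are hypothetical and were chosen for illustration; the thresholds (alpha > 0.70, corrected item-to-total correlation ≥ 0.40, alpha falling when an item is deleted) are taken from the Criteria column.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns (one list of scores per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # sum score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_diagnostics(items):
    """Per item: alpha if the item is deleted and the corrected item-to-total correlation."""
    overall = cronbach_alpha(items)
    rows = []
    for i, col in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(scores) for scores in zip(*rest)]  # total without item i
        alpha_deleted = cronbach_alpha(rest)
        r_corrected = pearson(col, rest_total)
        rows.append({
            "alpha_if_deleted": alpha_deleted,
            "corrected_item_total_r": r_corrected,
            # Criteria column: deleting the item should reduce total alpha
            "lowers_alpha_when_deleted": alpha_deleted < overall,
            # Criteria column: corrected item-to-total correlation >= 0.40
            "meets_r_criterion": r_corrected >= 0.40,
        })
    return rows


# Hypothetical toy data: 4 items answered by 6 respondents on a 1-5 scale.
toy_items = [
    [3, 4, 2, 5, 4, 1],
    [3, 5, 2, 4, 4, 2],
    [2, 4, 3, 5, 3, 1],
    [4, 4, 2, 5, 3, 2],
]

alpha = cronbach_alpha(toy_items)
print("alpha:", round(alpha, 2), "meets > 0.70 criterion:", alpha > 0.70)
for number, row in enumerate(item_diagnostics(toy_items), start=1):
    print("item", number, row)
```

Items are stored column-wise so that deleting an item is a simple list slice. Sample variances (n − 1 denominator) are used throughout, which is an assumption about the convention; because alpha depends only on a ratio of variances, the choice does not change its value.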