Psychometric properties of the patient-reported outcomes measurement information system scale v1.2: global health (PROMIS-GH) in a Dutch general population

Pellicciari, Leonardo; Chiarotto, Alessandro; Giusti, Emanuele; Crins, Martine H. P.; Roorda, Leo D.; Terwee, Caroline B.

doi:10.1186/s12955-021-01855-0

Table 1 Summary of the research questions, the psychometric properties studied, the statistical analyses applied, and the results for the PROMIS Global Mental Health subscale (4 items) and the PROMIS Global Physical Health subscale (4 items) in the total Dutch general population sample (N = 4370)

Research questions from a user perspective	Research questions from a psychometric perspective	Psychometric property	Analyses per subscale	Statistic	Criteria	Reference	Software package with reference	Results: GMH	Results: GPH
1. Is it legitimate to calculate IRT-based scores for this measure?	Do the items assess only one construct?	Unidimensionality^a	CFA	CFI	≥ 0.95	[44]	Mplus software (version 6.0) [45]	0.98	0.99
				TLI	≥ 0.95	[44]		0.95	0.97
				RMSEA	≤ 0.06	[44]		0.22	0.12
				SRMR	≤ 0.08	[44]		0.04	0.03
			Exploratory Bifactor Analysis	ECV	> 0.70	[46]	R package psych (version 1.7.8) [47]	0.80	0.71
				ωH	> 0.80	[48]		0.75	0.65
	Do the items relate to the construct being measured only?	Local independence	Residual correlation matrix^b	r	≤ 0.20	[25]	Mplus software (version 6.0) [49]	all r < 0.20	all r < 0.20
	Do the probabilities of higher responses to the items increase with increasing levels of the construct?	Monotonicity	Mokken scale analysis	H_i	≥ 0.30	[50]	R-package mokken (version 2.8.4) [51]	See Table 3	See Table 3
				H	> 0.50	[50]		0.60	0.54
				ICCs^c	Graphic display	[52, 53]		See Fig. 1a	See Fig. 1b
	Can the relationship between the items and the construct be described using an IRT-model?	IRT-model fit	Logistic GRM model fit	S-X² and p of the items	p ≥ 0.001*	[44, 45]	R-package mirt (version 3.3.2) [54]	See Table 3	See Table 3
2. Is this measure able to discriminate between different levels of the construct/trait?	Do the items have the ability to discriminate between different levels of the construct/trait?	Range of item discrimination	IRT-modelling	α^d	> 1.0	[52]	R-package mirt (version 3.3.2) [54]	See Table 3	See Table 3
3. Does this measure cover the relevant range of the construct/trait?	Do the items cover the relevant range of the construct/trait?	Range of item difficulties	IRT-modelling	β^e	N/A	[52]	R-package mirt (version 3.3.2) [54]	See Table 3	See Table 3
4. Is this measure reliable?	What is the overall precision of this measure in this sample?	Internal consistency	Internal consistency	Cronbach’s alpha	> 0.70	[55]	SPSS software. Version 21 for Windows	0.83	0.78
	What is the contribution of the individual items to this overall precision?	Internal consistency	Internal consistency	Cronbach’s alpha if item deleted	Reduction of total alpha	[56]		See Table 3	See Table 3
			Corrected item-to-total correlation	r_s	≥ 0.40	[57]		See Table 3	See Table 3
	What is the precision of this measure at different levels of the construct/trait?	Precision	TIC and IIC^f	Graphic display		[52, 53]		See Fig. 2a	See Fig. 2b
5/6. Does this measure function in the same way in different (sub)groups?	Can this measure be used to compare (sub)groups in terms of demographic variables?	Measurement invariance	DIF^g	Change in McFadden R²	> 0.02	[44, 58, 59]	R-package lordif (version 0.3–3) [58]	See Table 3	See Table 3
	Can this measure be used to compare the scores of English-speaking persons (who responded to its English original version) and Dutch-speaking persons (who responded to its Dutch-Flemish translation)?	Cross-cultural validity	DIF^g	Change in McFadden R²	> 0.02	[44, 58, 59]	R-package lordif (version 0.3–3) [58]	See Table 3	See Table 3

α, Item Discrimination Parameters estimated under the Graded Response Model; β, Item Difficulty Parameters estimated under the Graded Response Model; CFA, Confirmatory Factor Analysis; CFI, Comparative Fit Index; DIF, Differential Item Functioning; ECV, Explained Common Variance; GMH, Global Mental Health; GPH, Global Physical Health; GRM, Graded Response Model; H, scalability coefficient for the scale; H_i, scalability coefficient for the item; ICC, Item Characteristic Curve; IIC, Item Information Curve; IRT, Item Response Theory; N/A, not appropriate; p, p-value; r, residual correlation; r_s, Spearman correlation coefficient; RMSEA, Root Mean Square Error of Approximation; S-X², item fit statistics under the Graded Response Model; SRMR, Standardized Root Mean Square Residual; TIC, Test Information Curve; TLI, Tucker Lewis Index; ωH, Omega-Hierarchical
Statistic values beyond the recommended cut-off are presented in bold
The research questions have been formulated from a user perspective (the clinicians or researchers who intend to apply the measure) and from a psychometric perspective (the researchers who investigate the psychometric properties of a measure)
The numbers next to the questions refer to the numbers of the measurement properties reported in the Methods
^aA confirmatory two-factor analysis on the entire Global Health measure was initially run in order to confirm the two-factor structure. Once the two-factor structure was confirmed, analyses were performed on each subscale separately to confirm their unidimensionality, i.e., a unidimensional CFA [60] (fitted using a mean- and variance-adjusted Weighted Least Squares estimator) and an Exploratory Bifactor Analysis [48] (performed using a Schmid-Leiman procedure [61])
^bResulting from the single factor CFA
^cICC graphs in Fig. 1, plotted for each item, visually illustrate the probability of selecting each item response across levels of the latent trait
^dItem slopes indicate the ability of an item to discriminate between people with adjoining values on the latent trait
^eItem thresholds refer to item difficulty, and locate the items along the latent trait
^fTICs and IICs plot the information across the latent trait at the total score level and at the item level, respectively [52, 53]. In a unidimensional scale, the standard error (SE) is the reciprocal of the square root of the information (SE = 1/√information) [62]; for each level of the latent trait and for each item, information can be converted to a measure of reliability, interpretable like a Cronbach’s alpha, using the formula 1 − SE² [52]. Information values of 10, 5 and 3.45 therefore correspond to internal reliability values of approximately 0.90, 0.80, and 0.70, respectively [62]
^gA DIF analysis [53] was performed using an ordinal logistic regression framework. In this framework, three regression models are compared to detect DIF: model 1 (item responses are predicted by the latent trait only), model 2 (item responses are predicted by the latent trait and group membership) and model 3 (item responses are predicted by the latent trait, group membership and the interaction between these two terms). Uniform DIF is present if model 2 fits better than model 1, and non-uniform DIF is present if model 3 fits better than model 2. The impact of DIF on the item score and on the total score was assessed by visual inspection of ICCs per group and of test characteristic curves per group, respectively
^*Given the large sample size (N = 4370), we drew 10 mutually exclusive random samples of 473 subjects each to reduce the chance that even small differences in fit would yield statistically significant results
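
The conversion from information to reliability described in footnote f can be illustrated with a short Python sketch. This is not the authors' code: it assumes the standard IRT relations SE = 1/√information and reliability = 1 − SE², and the function names are hypothetical, chosen only for this illustration.

```python
import math


def standard_error(information: float) -> float:
    """Standard error of the trait estimate, assuming SE = 1 / sqrt(information)."""
    if information <= 0:
        raise ValueError("information must be positive")
    return 1.0 / math.sqrt(information)


def reliability(information: float) -> float:
    """Reliability on a Cronbach's-alpha-like scale, assuming 1 - SE**2."""
    return 1.0 - standard_error(information) ** 2


if __name__ == "__main__":
    # Information values quoted in footnote f.
    for info in (10, 5, 3.45):
        se = standard_error(info)
        rel = reliability(info)
        print(f"information={info}: SE={se:.3f}, reliability={rel:.2f}")
```

Under these assumptions, reliability reduces to 1 − 1/information, so information values of 10 and 5 give 0.90 and 0.80, and 3.45 gives about 0.71, close to the 0.70 threshold used in the table.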

ISSN: 1477-7525

Health and Quality of Life Outcomes
