From: Measuring self-reported ability to perform activities of daily living: a Rasch analysis
Steps in the analysis | Procedures | Indicators/criteria |
---|---|---|
1. Selecting a Rasch measurement model | Evaluation of the log-likelihood ratio | A non-significant (p > 0.05) log-likelihood ratio indicates that the data fit an interval scale model, i.e. the Rasch Rating Scale Model |
2. The psychometric properties of the ADL-I rating scale | Frequency distribution across response categories should be either uniform or peak in central or extreme categories to illustrate optimal use of the categories | |
| | Average category measures should advance monotonically up the rating scale, indicating that persons who experience higher quality of performance have higher item ratings | |
| | Scale category outfit mean square (MnSq) values should be ≤ 2.0 | |
| | Threshold calibrations should advance monotonically, with no threshold disordering | |
| | Thresholds should increase by at least 1.4 logits to show distinction between categories, but by no more than 5 logits to avoid large gaps in the variable [29, 30] | |
3. Principal Component Analysis (PCA) | Identification of possible secondary dimensions within the data | The proportion of variance explained by the measure must be > 50%. The largest secondary dimension should have an eigenvalue < 2.0 (i.e. less than the strength of two items) to support unidimensionality [33] |
| | Examination of potential secondary dimensions: division of ADL-I items into three clusters based on item loadings, estimation of a measure for each person on each cluster and performance of Pearson correlations between measures | A disattenuated correlation (correlation based on measures adjusted for their standard error) > 0.7 between clusters would support unidimensionality [33] |
4. Item goodness-of-fit | Examination of infit and outfit statistics. Underfitting items were removed one at a time, in order of highest MnSq value, considering high infit MnSq values first | MnSq values between 0.7 and 1.3, combined with z values < 2.0, indicated item fit [34] |
| | Removal of underfitting items was planned to stop when all items met the criteria for acceptable goodness-of-fit | Assuming the PCA does not support the presence of a secondary dimension in the data, an instrument is generally considered to be unidimensional when no more than 5% of the items fail to fit the Rasch model (p < 0.05) [32] |
5. Person goodness-of-fit | Evidence of person-response validity was evaluated by examining the person goodness-of-fit statistics | The criterion for acceptable person goodness-of-fit was infit MnSq values < 1.3 associated with a z value < 2.0 [35, 36]. It was accepted that, by chance, up to 5% of the sample would fail to demonstrate acceptable goodness-of-fit without a serious threat to validity [36, 37] |
6. After removal of misfitting items | Persons with maximum scores on this shorter version were removed, and analyses of rating scale properties, PCA and person goodness-of-fit repeated | Determine if scale properties and unidimensionality had improved |
7. Differential Item Functioning (DIF) | Determine if item difficulty estimates vary across gender and diagnostic groups | An item was considered to display DIF when the difference in item difficulty estimates between groups was > 0.50 logits [38] and statistically significant (p < 0.01) [33, 39, 40] |
8. Differential Test Functioning (DTF) | Scatterplots of the variance of person ability measures across versions were produced | A criterion was set that no more than 5% of the participants should differ significantly (z-values exceeding ± 1.96) between the two measures [41] |
9. Reliability and precision | Determine if the mean item difficulty measure was appropriately targeted to the mean person ability measure | The mean person ability measure would be close to zero for a well-targeted instrument [23] |
| | Examining the item-person map | Dispersion of item difficulty and person ability measures was evaluated for a reasonable match |
| | Precision was evaluated by overall separation and reliability indices | Separation indices should be at least 2.0 to obtain a desired reliability coefficient of 0.80 for replicability of person ability and item difficulty ordering [42]. The closer the reliability index was to 1.0 (range 0.0 to 1.0) the better [43] |
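
Several of the numeric criteria in the table can be expressed as simple checks. The following is a minimal Python sketch, not the authors' analysis code and not the interface of any Rasch software. All function and parameter names are hypothetical, and the inputs (threshold calibrations, item difficulty estimates, p-values, misfit counts) are assumed to have been exported from a Rasch program beforehand. The conversion from separation to reliability uses the standard relation R = G² / (1 + G²), under which a separation of 2.0 gives 4 / (1 + 4) = 0.80, matching the criterion in step 9.

```python
from typing import Sequence


def reliability_from_separation(separation: float) -> float:
    """Standard Rasch relation between separation G and reliability R:
    R = G^2 / (1 + G^2)."""
    return separation ** 2 / (1.0 + separation ** 2)


def thresholds_acceptable(
    thresholds: Sequence[float],
    min_gap: float = 1.4,  # minimum advance in logits (step 2 of the table)
    max_gap: float = 5.0,  # maximum advance in logits (step 2 of the table)
) -> bool:
    """True if each threshold exceeds the previous one by min_gap..max_gap logits.

    A disordered threshold yields a negative gap and therefore fails as well.
    """
    gaps = [upper - lower for lower, upper in zip(thresholds, thresholds[1:])]
    return all(min_gap <= gap <= max_gap for gap in gaps)


def displays_dif(
    difficulty_a: float,
    difficulty_b: float,
    p_value: float,
    min_contrast: float = 0.50,  # logits (step 7 of the table)
    alpha: float = 0.01,         # significance level (step 7 of the table)
) -> bool:
    """True if the between-group difficulty contrast is both large and significant."""
    return abs(difficulty_a - difficulty_b) > min_contrast and p_value < alpha


def misfit_within_tolerance(
    n_misfitting: int, n_total: int, max_share: float = 0.05
) -> bool:
    """True if no more than max_share of persons (or items) fail the fit criteria."""
    return n_misfitting / n_total <= max_share


if __name__ == "__main__":
    # Illustrative values only; they are not results from the study.
    print(reliability_from_separation(2.0))
    print(thresholds_acceptable([-2.0, 0.0, 2.1]))
    print(displays_dif(0.2, 0.9, p_value=0.004))
    print(misfit_within_tolerance(4, 100))
```

Keeping each criterion in its own function with the cut-off as a default argument makes it easy to substitute other published cut-offs without changing the logic.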