
Table 1 Overview of the Rasch analysis

From: Measuring self-reported ability to perform activities of daily living: a Rasch analysis

Each step below lists the procedure followed and the indicators/criteria used to evaluate it.

1. Selecting a Rasch measurement model
Procedure: Evaluation of the log likelihood ratio.
Indicator/criterion: A non-significant (p > 0.05) log likelihood ratio indicates that the data fit an interval-scale model, i.e. the Rasch Rating Scale Model.
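To make the decision rule concrete, the minimal sketch below computes a log likelihood ratio test, assuming the Rating Scale Model (common thresholds) is compared against a less restricted Partial Credit Model; the log-likelihood values, degrees of freedom and names are hypothetical and not taken from the paper.

```python
# Hypothetical likelihood ratio test: Rating Scale Model (RSM) nested in the
# Partial Credit Model (PCM). A non-significant result supports the simpler RSM.
from scipy.stats import chi2

log_lik_rsm = -1423.7   # hypothetical log-likelihood of the restricted model (RSM)
log_lik_pcm = -1415.9   # hypothetical log-likelihood of the unrestricted model (PCM)
df = 36                 # hypothetical difference in number of threshold parameters

lr = -2 * (log_lik_rsm - log_lik_pcm)   # likelihood ratio statistic
p_value = chi2.sf(lr, df)               # upper-tail chi-square probability

print(f"LR = {lr:.1f}, df = {df}, p = {p_value:.3f}")  # p > 0.05 -> retain the RSM
```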

2. The psychometric properties of the ADL-I rating scale
Procedure: Following Linacre's guidelines [28,29,30].
Indicators/criteria (a sketch checking these diagnostics follows the list):
- The frequency distribution across response categories should be either uniform or peak in central or extreme categories, illustrating optimal use of the categories.
- Average category measures should advance monotonically up the rating scale, indicating that persons who experience higher quality of performance have higher item ratings.
- Scale category outfit mean square (MnSq) values should be ≤ 2.0.
- Threshold calibrations should advance monotonically, with no threshold disordering.
- Thresholds should increase by at least 1.4 logits to show distinction between categories, but by no more than 5 logits to avoid large gaps in the variable [29, 30].
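The sketch below checks these category diagnostics on invented numbers; the average category measures, thresholds and outfit values are hypothetical, and in practice they would come from the Rasch software output.

```python
# Hypothetical rating scale diagnostics for the Linacre criteria listed above.
avg_measures = [-1.8, -0.6, 0.7, 1.9]   # average category measures (logits), low to high category
thresholds   = [-2.4, -0.3, 2.6]        # threshold calibrations (logits)
outfit_mnsq  = [1.1, 0.9, 1.0, 1.3]     # category outfit mean squares

measures_advance = all(a < b for a, b in zip(avg_measures, avg_measures[1:]))
thresholds_ordered = all(a < b for a, b in zip(thresholds, thresholds[1:]))
gaps = [b - a for a, b in zip(thresholds, thresholds[1:])]
gaps_ok = all(1.4 <= g <= 5.0 for g in gaps)     # advance by 1.4 to 5 logits
outfit_ok = all(m <= 2.0 for m in outfit_mnsq)   # category outfit MnSq <= 2.0

print(measures_advance, thresholds_ordered, gaps_ok, outfit_ok)
```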

3. Principal Component Analysis (PCA)
Procedure: Identification of possible secondary dimensions within the data.
Indicators/criteria:
- The proportion of variance explained by the measure must be > 50%.
- The largest secondary dimension should have an eigenvalue < 2.0 (i.e. less than two items) to support unidimensionality [33].
Procedure: Examination of potential secondary dimensions: the ADL-I items were divided into three clusters based on item loadings, a measure was estimated for each person on each cluster, and Pearson correlations between the cluster measures were computed.
Indicator/criterion: A disattenuated correlation (correlation corrected for measurement error in the cluster measures) > 0.7 between clusters would support unidimensionality [33] (see the sketch below).
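A minimal sketch of the disattenuation formula, assuming hypothetical cluster reliabilities and an invented observed correlation; the paper's actual values come from its Rasch software output.

```python
# Disattenuate an observed correlation between two cluster measures by dividing
# by the square root of the product of their reliabilities (values hypothetical).
import math

r_observed = 0.62   # hypothetical Pearson correlation between cluster person measures
rel_a = 0.78        # hypothetical reliability of cluster A measures
rel_b = 0.81        # hypothetical reliability of cluster B measures

r_disattenuated = r_observed / math.sqrt(rel_a * rel_b)
print(f"disattenuated r = {r_disattenuated:.2f}")   # > 0.7 supports unidimensionality
```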

4. Item goodness-of-fit
Procedure: Examination of infit and outfit statistics. Items displaying underfit were removed one at a time, in order of highest MnSq values, considering high infit MnSq values first.
Indicator/criterion: MnSq values between 0.7 and 1.3, with associated z values below 2.0, indicated item fit [34] (a computational sketch follows this step).
Procedure: Removal of underfitting items was planned to stop when all remaining items met the criteria for acceptable goodness-of-fit.
Indicator/criterion: Provided the PCA does not support the presence of a secondary dimension in the data, an instrument is generally considered unidimensional when no more than 5% of the items fail to fit the Rasch model (p < 0.05) [32].
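For orientation, the sketch below computes infit and outfit mean squares for a single item from its residuals, assuming hypothetical observed-minus-expected residuals and model variances; real analyses take these from the Rasch software.

```python
# Hypothetical infit/outfit mean squares for one item from Rasch residuals.
import numpy as np

residuals = np.array([0.4, -0.8, 0.2, 1.1, -0.3])   # observed minus expected scores
variances = np.array([0.9, 0.8, 1.0, 0.7, 0.9])     # model variances of the observations

z_squared = residuals**2 / variances                 # squared standardized residuals
outfit_mnsq = z_squared.mean()                       # unweighted, outlier-sensitive
infit_mnsq = (residuals**2).sum() / variances.sum()  # information-weighted

print(f"infit = {infit_mnsq:.2f}, outfit = {outfit_mnsq:.2f}")  # 0.7-1.3 expected
```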

5. Person goodness-of-fit
Procedure: Evidence of person-response validity was evaluated by examining the person goodness-of-fit statistics.
Indicators/criteria (see the sketch after this step):
- The criterion for acceptable person goodness-of-fit was infit MnSq values < 1.3 with associated z values < 2.0 [35, 36].
- It was accepted that, by chance, up to 5% of the sample would fail to demonstrate acceptable goodness-of-fit without a serious threat to validity [36, 37].
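A short sketch of how this screen might be applied, with invented person fit statistics; the cut-offs mirror the criteria above.

```python
# Hypothetical person-fit screen: flag persons with infit MnSq >= 1.3 and z >= 2.0,
# then check that no more than 5% of the sample misfits.
person_fit = [(1.05, 0.4), (1.42, 2.3), (0.88, -1.1), (1.21, 1.6)]  # (infit MnSq, z)

misfit_share = sum(m >= 1.3 and z >= 2.0 for m, z in person_fit) / len(person_fit)
print(f"{misfit_share:.1%} misfitting persons (criterion: at most 5%)")
```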

6. After removal of misfitting items
Procedure: Persons with maximum scores on this shorter version were removed, and the analyses of rating scale properties, PCA and person goodness-of-fit were repeated.
Indicator/criterion: Determine whether scale properties and unidimensionality had improved.

7. Differential Item Functioning (DIF)
Procedure: Determine whether item difficulty estimates vary across gender and diagnostic groups.
Indicator/criterion: An item was considered to display DIF when the difference in item difficulty estimates between groups was > 0.50 logits [38] and statistically significant (p < 0.01) [33, 39, 40].
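The sketch below applies this two-part rule to one item, assuming the groups' difficulty estimates and standard errors are available and using a normal approximation for the significance test; all values and names are hypothetical.

```python
# Hypothetical DIF check for one item across two groups (e.g. men vs. women).
from scipy.stats import norm

diff_group1, se_group1 = 0.35, 0.12    # hypothetical item difficulty (logits) and SE
diff_group2, se_group2 = -0.28, 0.10   # hypothetical item difficulty (logits) and SE

contrast = diff_group1 - diff_group2
z = contrast / (se_group1**2 + se_group2**2) ** 0.5
p = 2 * norm.sf(abs(z))

has_dif = abs(contrast) > 0.50 and p < 0.01
print(f"contrast = {contrast:.2f} logits, p = {p:.4f}, DIF = {has_dif}")
```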

8. Differential Test Functioning (DTF)
Procedure: Scatterplots of the variance of person ability measures across versions were produced.
Indicator/criterion: A criterion was set that no more than 5% of the participants should differ significantly (z values exceeding ± 1.96) between the two measures [41].
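As a rough illustration of this criterion, the sketch below standardizes the difference between each person's measures from the two versions by their combined standard error and counts how many exceed ± 1.96; the measures and errors are invented.

```python
# Hypothetical DTF check: compare person ability measures from two test versions.
pairs = [
    # (measure_version1, se1, measure_version2, se2), all in logits, hypothetical
    (0.80, 0.35, 0.65, 0.34),
    (-1.10, 0.40, -0.25, 0.38),
    (1.95, 0.45, 1.80, 0.44),
]

z_values = [(m1 - m2) / (se1**2 + se2**2) ** 0.5 for m1, se1, m2, se2 in pairs]
share = sum(abs(z) > 1.96 for z in z_values) / len(pairs)
print(f"{share:.1%} of persons differ significantly (criterion: at most 5%)")
```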

9. Reliability and precision
Procedure: Determine whether the mean item difficulty measure was appropriately targeted to the mean person ability measure.
Indicator/criterion: The mean person ability measure would be close to zero for a well-targeted instrument [23].
Procedure: Examination of the item-person map.
Indicator/criterion: The dispersion of item difficulty and person ability measures was evaluated for a reasonable match.
Procedure: Precision was evaluated by overall separation and reliability indices (see the sketch below).
Indicators/criteria:
- Separation indices should be at least 2.0 to obtain a desired reliability coefficient of 0.80 for replicability of person ability and item difficulty ordering [42].
- The closer the reliability index is to 1.0 (range 0.0 to 1.0), the better [43].
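The sketch below shows the usual relation between separation and reliability, assuming a hypothetical spread of person measures and an average measurement error; the numbers are invented, not the paper's results.

```python
# Hypothetical person separation and reliability from the spread of person
# measures and their root mean square standard error (values invented).
import math

sd_persons = 1.30   # SD of observed person ability measures (logits)
rmse = 0.55         # root mean square standard error of those measures

true_sd = math.sqrt(max(sd_persons**2 - rmse**2, 0.0))  # error-corrected spread
separation = true_sd / rmse                             # should be at least 2.0
reliability = separation**2 / (1 + separation**2)       # approaches 1.0 as separation grows

print(f"separation = {separation:.2f}, reliability = {reliability:.2f}")
```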