
Table 3 Classical Test Theory Analyses: Properties, Definitions and Acceptability Criteria

From: Development and validation of a new instrument to measure perceived risks associated with the use of tobacco and nicotine-containing products

Property | Definitions and Acceptability Criteria
Data quality | Data quality refers to the extent to which participants accept the scale items and, consequently, provide usable responses. Missing data indicate a lack of acceptability and/or a lack of applicability of the items from the participant's perspective. Item-level missing data should be < 10% [58].
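The item-level missing-data criterion can be checked column by column. The following is a minimal Python sketch, not the authors' analysis code; it assumes responses are stored as a list of respondent rows with `None` marking a missing answer, and the function names `item_missing_rates` and `flag_missing_items` are hypothetical.

```python
from typing import List, Optional, Sequence

Response = Optional[float]  # None marks a missing answer (assumed data layout)


def item_missing_rates(responses: Sequence[Sequence[Response]]) -> List[float]:
    """Return, for each item (column), the proportion of respondents with a missing answer."""
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [
        sum(1 for row in responses if row[j] is None) / n_respondents
        for j in range(n_items)
    ]


def flag_missing_items(rates: Sequence[float], threshold: float = 0.10) -> List[int]:
    """Indices of items failing the criterion 'item-level missing data < threshold'."""
    return [j for j, rate in enumerate(rates) if rate >= threshold]


if __name__ == "__main__":
    # Hypothetical data: 4 respondents x 3 items
    data = [
        [1, 2, None],
        [2, None, None],
        [3, 3, 4],
        [4, 4, 5],
    ]
    rates = item_missing_rates(data)
    print(rates)                      # [0.0, 0.25, 0.5]
    print(flag_missing_items(rates))  # [1, 2]
```

The threshold is a parameter so that a stricter or looser cut-off than 10% can be tried without changing the function.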
Scaling assumptions | Scaling assumptions refer to the extent to which it is legitimate to sum a set of item scores, without weighting or standardisation, to produce a single total score [59, 60]. Summing item scores is considered legitimate when the items:
• are approximately parallel (i.e., they measure at the same point on the scale). This criterion is satisfied when items have similar mean scores [61];
• contribute similarly to the variation of the total score (i.e., they have similar variances); otherwise, they should be standardised. This criterion is satisfied when items have similar standard deviations [62];
• measure a common underlying construct, as otherwise combining them to produce a single score is not appropriate [63]. This criterion is satisfied when items have adequate corrected item-total correlation (ITC ≥ 0.30) [64];
• contain a similar proportion of information about the construct being measured; otherwise, they should be weighted differently [61]. This criterion is satisfied when items have similar ITCs [64].
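The four scaling checks rest on three per-item statistics: mean, standard deviation and corrected item-total correlation (the correlation of an item with the sum of the remaining items). The sketch below computes them in plain Python under stated assumptions: complete data (no missing responses), rows are respondents and columns are items, and the names `scaling_summary`, `flag_low_itc` and `pearson` are hypothetical. The table gives no numeric cut-off for "similar" means, SDs or ITCs, so only the ITC ≥ 0.30 rule is applied automatically.

```python
import math
import statistics
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either variable has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def scaling_summary(responses: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Mean, SD and corrected item-total correlation (ITC) for each item.

    Assumes complete data: every respondent answered every item.
    """
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]  # unweighted summed score
    summary = []
    for j in range(n_items):
        item = [row[j] for row in responses]
        # "Corrected" ITC: correlate the item with the total of the other items only.
        rest = [total - x for total, x in zip(totals, item)]
        summary.append({
            "mean": statistics.fmean(item),
            "sd": statistics.stdev(item),
            "itc": pearson(item, rest),
        })
    return summary


def flag_low_itc(summary: Sequence[Dict[str, float]], threshold: float = 0.30) -> List[int]:
    """Indices of items whose corrected ITC is below the threshold or undefined."""
    return [j for j, s in enumerate(summary) if math.isnan(s["itc"]) or s["itc"] < threshold]


if __name__ == "__main__":
    # Hypothetical 5 respondents x 4 items; item 4 is keyed in the opposite direction.
    data = [
        [1, 2, 1, 5],
        [2, 2, 3, 4],
        [3, 4, 3, 2],
        [4, 4, 5, 2],
        [5, 5, 4, 1],
    ]
    summary = scaling_summary(data)
    for j, s in enumerate(summary):
        print(f"item {j + 1}: mean={s['mean']:.2f} sd={s['sd']:.2f} itc={s['itc']:.2f}")
    print("low ITC:", flag_low_itc(summary))  # low ITC: [3]
```

Removing the item from the total before correlating is what makes the ITC "corrected"; without it, every item would be partly correlated with itself.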
Targeting | Scale-to-sample targeting refers to the extent to which the range of the construct measured by the scale matches the range of that construct in the study sample. Adequate targeting provides greater confidence in judgments about the performance of the scale when interpreting results, whereas poor targeting limits measurement precision: respondents with extreme scores form a sub-sample in which changes within individuals and differences between individuals will be underestimated. Scale scores should span the entire range; floor effects (proportion of the sample at the minimum possible score) and ceiling effects (proportion of the sample at the maximum possible score) should be low (< 15%) [65]; and skewness, i.e., the standardised third central moment of the score distribution, which captures its asymmetry, should lie between −1 and +1 [66]. There are no published criteria for item-level targeting.
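The scale-level targeting criteria can be computed from the total scores alone. This Python sketch assumes the theoretical minimum and maximum of the scale are known and passed in; the names `targeting_summary` and `targeting_ok` are hypothetical. Skewness is computed here as the population moment coefficient g1 = m3 / m2^(3/2); statistical packages often report a bias-adjusted version, which differs slightly in small samples.

```python
import math
import statistics
from typing import Dict, Sequence


def targeting_summary(scores: Sequence[float], min_score: float, max_score: float) -> Dict[str, float]:
    """Observed range, floor/ceiling proportions and skewness of total scores."""
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    mean = statistics.fmean(scores)
    m2 = sum((s - mean) ** 2 for s in scores) / n  # second central moment
    m3 = sum((s - mean) ** 3 for s in scores) / n  # third central moment
    skewness = m3 / m2 ** 1.5 if m2 > 0 else math.nan  # standardised third moment
    return {
        "observed_min": min(scores),
        "observed_max": max(scores),
        "floor": floor,
        "ceiling": ceiling,
        "skewness": skewness,
    }


def targeting_ok(summary: Dict[str, float], limit: float = 0.15, skew_limit: float = 1.0) -> bool:
    """Floor and ceiling below the limit and skewness within +/- skew_limit (NaN fails)."""
    return (
        summary["floor"] < limit
        and summary["ceiling"] < limit
        and abs(summary["skewness"]) <= skew_limit
    )


if __name__ == "__main__":
    # Hypothetical total scores on a scale whose possible range is 0-10
    scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]
    result = targeting_summary(scores, min_score=0, max_score=10)
    print(result)                # floor 0.0, ceiling 0.0, skewness about 0.30
    print(targeting_ok(result))  # True
```

Note that this example also shows why "span the entire range" is reported separately: the observed scores run only from 2 to 7 even though floor, ceiling and skewness all pass.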
Reliability | Reliability refers to the extent to which scale scores are free from random error. High reliability indicates that scores are associated with little random error, i.e., are consistent. Internal consistency reliability estimates the random error associated with total scores from the intercorrelations among the items [67]. The recommended level for adequate scale internal consistency is a Cronbach's alpha coefficient ≥ 0.80 [67] and item-total correlations > 0.30 [58].
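Cronbach's alpha is commonly computed as α = k/(k − 1) × (1 − Σ var(item) / var(total)), where k is the number of items. The sketch below implements that formula in plain Python, assuming complete data with rows as respondents and columns as items; the function name `cronbach_alpha` is hypothetical and this is not the authors' software.

```python
import statistics
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data (rows = respondents, columns = items).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variances (n - 1 denominator) for items and total; the ratio is
    # unchanged as long as the same denominator is used for both.
    item_variances = [statistics.variance([row[j] for row in responses]) for j in range(k)]
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical 5 respondents x 4 items; item 4 is keyed in the opposite direction.
    data = [
        [1, 2, 1, 5],
        [2, 2, 3, 4],
        [3, 4, 3, 2],
        [4, 4, 5, 2],
        [5, 5, 4, 1],
    ]
    alpha = cronbach_alpha(data)
    print(f"alpha = {alpha:.2f}")  # alpha = -0.47
    print("adequate" if alpha >= 0.80 else "below 0.80")
```

The negative value in this toy example arises because one item runs against the others, which is the same situation the corrected item-total correlation would flag.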