The findings of the present study suggest that the SF-36 exhibits good validity and reliability when used in long-term survivors of childhood cancer. Apart from ceiling effects observed in some scales, most properties were satisfactory with regard to conventional psychometric criteria.
Eighty-eight percent of the 10,189 subjects completed all items, which is identical to the 88% observed in the UK general population [14, 15]. The percentage of missing values per item ranged from 0.5 to 2.9 percent, indicating that no particular item on the SF-36 was left unanswered substantially more often than the others. Missing values per item were roughly similar to those found in a UK normative general population sample (range: 0.43–1.91) [14, 15].
Ceiling effects were highest in the role limitation-physical (79.0%) and role limitation-emotional (77.3%) scales. The ceiling effect observed in these two scales can be explained, at least partly, by the dichotomous format (yes vs. no) of the items (4a-d, 5a-c) that measure these concepts. Ceiling effects in the role limitation-physical and role limitation-emotional scale scores have been found in other populations [10], so this effect cannot solely be attributed to the specific responses of childhood cancer survivors. The limitations of these dichotomous items have been recognised by their developers and, in the newer version of the SF-36 (version 2), a 5-point Likert scale is used instead of the dichotomous scale. According to the developers, this should reduce the ceiling and floor effects generally observed within these items and thereby improve precision.
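As an illustration of how such percentages are obtained, the following minimal sketch computes the share of respondents at the lowest and highest possible score of a 0–100 scale. The function name `ceiling_floor_percent` and the example scores are hypothetical and are not taken from the study data.

```python
def ceiling_floor_percent(scores, min_score=0.0, max_score=100.0):
    """Return (% at floor, % at ceiling) for a list of scale scores.

    Missing scores are expected to be passed as None and are ignored.
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no observed scores")
    n = len(observed)
    floor_pct = 100.0 * sum(1 for s in observed if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in observed if s == max_score) / n
    return floor_pct, ceiling_pct


# Hypothetical role limitation-physical scores: four yes/no items can only
# produce the values 0, 25, 50, 75 and 100, so many respondents share the top score.
example_scores = [100, 100, 100, 75, 100, 50, 100, 0, 100, 100]
floor_pct, ceiling_pct = ceiling_floor_percent(example_scores)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
```

With only a handful of attainable values per scale, a healthy sample concentrates at the maximum, which is the pattern described above for the two role limitation scales.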
The ceiling effects observed in the PF, RP, BP, SF, and RE scale scores may be a consequence of the relative youth of the childhood cancer survivor cohort. A young population generally has better health status than the general population; consequently, it will tend to score higher on most scales, making ceiling effects more common. When these ceiling effects were compared between cancer survivors and a sample from the general population [10, 14] with a similar age distribution, comparable percentages were found (results not shown), indicating that the ceiling effects we observed are not specifically related to the population of childhood cancer survivors. The observed ceiling effects may instead reflect a general lack of sensitivity of these scales. In large studies (n > 100), however, these ceiling effects should not cause major difficulties when testing whether two groups differ statistically with regard to their mean score on an SF-36 scale. It has been shown that, probably as a result of the Central Limit Theorem, parametric methods for comparing means, such as the t-test, are fairly robust against violations of normality [16]. The question remains, however, whether the mean is an appropriate summary measure in the presence of non-normality. It is therefore advisable, in addition to reporting mean scores or mean differences, to report scale scores and scale score differences at the median and at outer centiles, for example the 25th and 75th [14, 17]. This will give the reader insight into the actual spread of the scale scores.
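The recommendation to report centiles alongside the mean can be sketched as follows. This is only an illustration: the function name `summarize_scale` and the scores are hypothetical, and the "inclusive" interpolation rule is an assumption, since other centile definitions give slightly different values in small samples.

```python
import statistics


def summarize_scale(scores):
    """Return the mean, median and 25th/75th centiles of a list of scale scores."""
    # quantiles(..., n=4) returns the three quartile cut points.
    p25, median, p75 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "p25": p25,
        "median": median,
        "p75": p75,
    }


# Hypothetical, left-skewed scale scores with a cluster at the maximum of 100.
example_scores = [40, 60, 85, 90, 95, 100, 100, 100, 100]
summary = summarize_scale(example_scores)
print(summary)
```

In this skewed example the mean lies well below the median, and the 75th centile coincides with the maximum score, which is the kind of information a mean alone would hide.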
Item-internal consistency was acceptable, as all item-scale correlations exhibited a value that exceeded 0.40. According to our findings, there is substantial evidence of discriminant validity of the SF-36 scales in adult survivors of childhood cancer. This implies that the SF-36 scales can discriminate between the different concepts that should be measured.
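The 0.40 criterion can be checked with a calculation along the following lines. This is a minimal sketch that assumes each item is correlated with the sum of the remaining items in its scale (a correction for overlap); whether the present analysis applied that correction is not stated in this section. The function names and the response data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_rest_correlations(rows):
    """Correlate each item with the sum of the other items in the same scale.

    rows: one list of item responses per respondent, all of the same length.
    """
    n_items = len(rows[0])
    correlations = []
    for j in range(n_items):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]  # scale total without item j
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical responses of five respondents to a three-item scale.
example_rows = [[1, 1, 1], [2, 2, 3], [3, 3, 2], [4, 4, 4], [5, 5, 5]]
correlations = item_rest_correlations(example_rows)
print([round(r, 3) for r in correlations])
print(all(r > 0.40 for r in correlations))
```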
Cronbach's alpha exceeded 0.70 for all scales, indicating high internal consistency. The reliability, therefore, is high. However, the high alpha coefficients observed may be partly due to the ceiling effects observed in some scales.
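For reference, Cronbach's alpha for a scale of k items can be computed from the item variances and the variance of the summed score, as in the sketch below. The function name `cronbach_alpha` and the response data are hypothetical and serve only to illustrate the standard formula.

```python
import statistics


def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    rows: one list of item responses per respondent, all of the same length.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [statistics.variance([row[j] for row in rows]) for j in range(k)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five respondents to a three-item scale.
example_rows = [[1, 1, 1], [2, 2, 3], [3, 3, 2], [4, 4, 4], [5, 5, 5]]
alpha = cronbach_alpha(example_rows)
print(round(alpha, 3))
```

Because alpha depends on the ratio of item variances to total-score variance, a compressed score distribution changes both terms, which is why the coefficients should be read together with the ceiling effects noted above.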
The PCS and MCS scales explained over 70% of the total variance, which is somewhat higher than the 66% of variance explained by these scales in the UK normative data [14, 15]. The correlations between the eight scales and the PCS and MCS scales in our data were largely similar to those found in the UK normative data [14, 15]. Survivors showed a slightly higher correlation between the role-emotional scale and the PCS than the general population; this might be attributed to a higher prevalence, among childhood cancer survivors, of physical problems that are associated with role-emotional problems. For example, neurological problems are more common among childhood cancer survivors, and may affect the survivor both physically and mentally. The moderate correlation of the bodily pain scale with both the PCS and MCS scales among young survivors (16–19 years) suggests that, compared with older survivors, young survivors who experience bodily pain are more likely to be affected mentally as well as physically. Nonetheless, the overall findings support the scaling assumptions of the SF-36 scales in our data, indicating that a two-factor model can be used in evaluating health status among long-term survivors of childhood cancer.
Because of the population-based design of our study, the validity and reliability of the SF-36 were assessed in a representative sample of all childhood cancer survivors in the UK. Respondents were similar to non-respondents with respect to age, sex, diagnosis of childhood cancer, and cancer treatment (Reulen et al. Health status of adult survivors of childhood cancer: a large scale population-based study from the British Childhood Cancer Survivor Study (submitted)).
A limitation of this study was that only cross-sectional data were available, so we were not able to assess the responsiveness of the SF-36 over time. However, as survivors within this study will be followed up over time and will probably complete another SF-36 questionnaire, we expect to have the opportunity to assess responsiveness in future analyses. Hence, the findings in this paper are only applicable to the use of the SF-36 in studies comparing childhood cancer survivors with other groups.
In conclusion, the findings presented in this paper provide support for the validity and reliability of the SF-36 when used in long-term survivors of childhood cancer. These findings should encourage other researchers and health care practitioners to use the SF-36 when assessing health status in this population, bearing in mind its susceptibility to ceiling effects. The ceiling effects observed were, however, not specific to childhood cancer survivors and may therefore indicate a general lack of sensitivity of some SF-36 scales.