The aim of this study was to evaluate the psychometric properties of KIDSCREEN-27 with a Rasch analysis in a national cohort of childhood cancer survivors. Overall, the results were satisfactory, with acceptable item goodness-of-fit in 23 of 27 items, acceptable unidimensionality for four of the five dimensions, and acceptable person goodness-of-fit in four of the five dimensions. No uniform DIF was detected between the childhood cancer survivors and the comparison group. Given the growing number of survivors in society, it is important to find an instrument that can be used as a screening tool at follow-up visits, and this Rasch analysis could be a first step towards choosing an appropriate instrument. However, given the relatively small sample size (N = 63), the results presented in this paper must be applied with some caution, even though it has been suggested that the Rasch model can be used to perform exploratory work with small samples: thirty items administered to 30 individuals should be able to deliver statistically stable measures, given reasonable targeting and fit. Based on the results from the Rasch analysis of KIDSCREEN-27, we recommend that the instrument be used among populations of childhood cancer survivors with similar age ranges.
The response category and threshold disordering that was found was based on a small number of responses/scores, and the number of observations in each rating scale category therefore did not always meet the criterion suggested by Linacre. Taking action (e.g., by collapsing response categories) based upon very few unexpected responses in a small sample may also be inappropriate. If the response categories had been collapsed within this study, it would probably have resulted in even fewer misfits, and thus in improved internal scale validity and person response validity. As a small sample may limit the inferences that can be drawn from the fit statistics, the findings presented here may actually underestimate the psychometric performance of KIDSCREEN-27 in a sample of childhood cancer survivors.
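To illustrate why sparse categories arise so easily in a sample of this size, the following minimal sketch flags rating scale categories with too few observations. It assumes that the criterion referred to is the commonly cited guideline of at least 10 observations per category and that each item has five response categories; the counts are hypothetical and are not taken from the study data.

```python
# Assumption: the criterion is the commonly cited guideline of at least
# 10 observations per rating scale category.
MIN_OBSERVATIONS_PER_CATEGORY = 10


def sparse_categories(category_counts, minimum=MIN_OBSERVATIONS_PER_CATEGORY):
    """Return the categories whose observation count is below the minimum."""
    return [
        category
        for category, count in sorted(category_counts.items())
        if count < minimum
    ]


# Hypothetical response counts for one item answered by 63 respondents
# on a five-category scale (illustrative only, not study data).
hypothetical_counts = {1: 2, 2: 6, 3: 15, 4: 22, 5: 18}

sparse = sparse_categories(hypothetical_counts)
print("Categories below the guideline:", sparse)
```

With 63 respondents spread over five categories, the average is fewer than 13 observations per category, so even a moderately skewed response distribution leaves the lowest categories below such a guideline.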
Item goodness-of-fit analysis revealed that 23 of 27 items fitted the model. Three items, “Have you been able to run well?” (1.60), “Have your parent(s) treated you fairly?” (1.62), and “Have you been able to rely on your friends?” (1.51), displayed underfit to the model, i.e., too much variation in the data compared to expectations from the Rasch model. These items were all outside the critical range for rating scales (0.6–1.4), but when compared with the range for clinical observations (0.5–1.7), all items fitted within the range. It should also be noted that a high proportion (16%) of respondents did not answer one of the items (“Have you been able to run well?”). Most of these participants had, for various reasons, not run during the previous week. As the content of these items is relevant for cancer survivors [30, 31], we chose not to omit them from the scale, an approach that has previously been used in scale evaluation. It has been stated in the literature that the guidelines regarding fit statistics are intended to help in detecting problems with items, not just to decide which items should be excluded from a test. However, as our criterion required that no item display unacceptable goodness-of-fit, the findings in relation to scale validity were mixed. Considering that the sample is fairly small, and that previous studies have shown reasonable item fit for both KIDSCREEN-27 and KIDSCREEN-52 [13, 14], we need to verify whether these findings are stable in larger samples of cancer survivors or whether they are due to individual variations in this limited dataset. As none of the items displayed DIF relative to the comparison group, the fit statistics are not seen as a major threat to validity, but rather as a concern to monitor in further studies, since the findings do indicate that some individuals score these items differently than expected based on the overall pattern found in the sample.
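The comparison of the three reported values against the two ranges can be sketched as follows. This is only an illustration of the classification logic using the MnSq values and ranges quoted in the text; the function and variable names are hypothetical and do not correspond to any Rasch software.

```python
# MnSq values reported in the text for the three underfitting items.
reported_mnsq = {
    "Have you been able to run well?": 1.60,
    "Have your parent(s) treated you fairly?": 1.62,
    "Have you been able to rely on your friends?": 1.51,
}

# Acceptable ranges quoted in the text.
RATING_SCALE_RANGE = (0.6, 1.4)
CLINICAL_OBSERVATION_RANGE = (0.5, 1.7)


def classify_fit(mnsq, lower, upper):
    """Label a mean-square value relative to an acceptable range."""
    if mnsq > upper:
        return "underfit"  # more variation than the model expects
    if mnsq < lower:
        return "overfit"  # less variation than the model expects
    return "acceptable"


fit_summary = {
    item: {
        "rating_scale": classify_fit(value, *RATING_SCALE_RANGE),
        "clinical_observation": classify_fit(value, *CLINICAL_OBSERVATION_RANGE),
    }
    for item, value in reported_mnsq.items()
}

for item, labels in fit_summary.items():
    print(item, labels)
```

All three items are labelled as underfitting under the rating scale range and as acceptable under the wider clinical observation range, which is the pattern described above.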
The item “Have you felt fit and well?” showed overfit (less variation than the model expects), which can indicate redundancy or similar ratings across all participants. As low MnSq values may not be a major threat to validity, this item may be of less concern when KIDSCREEN-27 is validated within this sample.
Regarding unidimensionality, the results revealed that the underlying constructs were measured to an acceptable extent, except for the Autonomy & Parent Relation dimension, which showed indications of multidimensionality. We therefore recommend that this dimension be tested further among childhood cancer survivors. This possible weakness may arise because the dimension represents a merger of three separate dimensions in the 52-item version: Autonomy, Parent Relations & Home Life, and Financial Resources. In contrast, Robitail et al. showed that all five dimensions in the 27-item version were unidimensional for the whole sample (n = 22827) with regard to infit statistics. They also performed a confirmatory factor analysis that showed acceptable fit to the model. However, an exploratory factor analysis showed that a few items loaded similarly on more than one dimension. Additional analysis of unidimensionality, such as PCA of residuals, was not performed in that study. In the present study, the variance explained by the secondary dimension (1st contrast) also exceeded the recommended 5% in all dimensions, which can be explained by the fact that there are relatively few items within each dimension in KIDSCREEN-27. The concept of HRQoL has many different aspects, and the dimensions should measure distinct parts of the concept while still being interrelated. Qualitative interviews were conducted with the same sample prior to the collection of the questionnaire-based data, and their results supported the content validity of KIDSCREEN-27 among childhood cancer survivors.
Person goodness-of-fit analysis revealed that, in one dimension (Psychological Well-being), the proportion of participants who did not demonstrate acceptable goodness of fit was above 5% (Table 2). As the number of such participants was small, it was not possible to carry out more in-depth analyses at the subgroup level in this study. At the individual level, no clear pattern was found among the participants who did not demonstrate acceptable goodness of fit: they were three females and two males, aged 13 to 22 years, with different diagnoses represented. Future studies with larger samples would allow for more in-depth exploration, and also for monitoring associations between item and participant misfit. A limited number of responses due to a small sample will also affect the precision of the item calibration measures. Larger samples will therefore allow for more precise analyses providing evidence of scale validity (e.g., collapsing response categories and exploring residual correlations).
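A short arithmetic sketch shows how few misfitting participants are needed to exceed the 5% criterion in a sample of this size. It assumes, for illustration only, that all 63 participants contributed responses to a dimension; the function names are hypothetical.

```python
import math

SAMPLE_SIZE = 63  # N reported for the study; assumed here to apply per dimension
MAX_MISFIT_PROPORTION = 0.05  # 5% criterion for person goodness-of-fit


def max_misfitting_persons(n_persons, threshold=MAX_MISFIT_PROPORTION):
    """Largest number of misfitting persons that still meets the criterion."""
    return math.floor(n_persons * threshold)


def exceeds_criterion(n_misfit, n_persons, threshold=MAX_MISFIT_PROPORTION):
    """True if the proportion of misfitting persons is above the threshold."""
    return n_misfit / n_persons > threshold


allowed = max_misfitting_persons(SAMPLE_SIZE)
print("Misfitting persons allowed:", allowed)
print("One more exceeds the criterion:", exceeds_criterion(allowed + 1, SAMPLE_SIZE))
```

Under this assumption, only three misfitting participants are compatible with the criterion, and a fourth pushes the proportion above 5%. A dimension can therefore fail the criterion on the basis of a handful of response patterns, which leaves no room for subgroup analyses.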
According to the person-item map, the most challenging dimension was Physical Well-being. The most challenging item was “Was physically active?” and the least challenging item was “Able to talk to parent(s) when wanted to?” It is not surprising that Physical Well-being was the most challenging dimension, since this is the aspect of HRQoL in which survivors are expected to experience impairments and difficulties related to complications of diagnosis and treatment.
According to the results of the DIF analyses, the items do not appear to work differently for survivors of childhood cancer compared to young people of the same age without a cancer experience. To our knowledge, one previous study has reported DIF results for KIDSCREEN-27 across different European countries, but no study has reported DIF between the sexes, age groups or health status. Regarding KIDSCREEN-52, previous results have shown that none of the items displayed any sizeable DIF by age group (8–11 vs. 12–18 years), sex or health status. In a study comparing children with and without cerebral palsy (CP), however, some items showed statistically significant DIF, although this was seen more frequently in the proxy version of the instrument. Based on our findings, further validation studies are suggested to explore unique diagnostic profiles in HRQoL, even though this study did not indicate such profiles in relation to survivors of childhood cancer.
An important strength of the present study is that a unique and representative (for five years of survival) national cohort of childhood cancer survivors in Sweden has been followed from 2004 onwards, with several data collection occasions. However, some limitations of the present study should be mentioned. Firstly, the small sample of survivors of childhood cancer limits the possibility of drawing firm conclusions regarding the robustness of the instrument. Because of the relatively small groups, more sophisticated DIF analyses, e.g., for different specific diagnoses, could not be performed. Secondly, as time since diagnosis was relatively short, conclusions regarding the instrument’s performance cannot be drawn for the entire follow-up period after diagnosis. Continued evaluation of the instrument’s psychometric performance in a long-term perspective is recommended, especially as health problems are known to increase over time. Larger cohort studies in a European context would be of value in order to achieve higher power and to monitor item and person response validity in more detail. Some participants exceeded the instrument’s recommended upper age limit of 18 years, but those older than 18 years expressed no uncertainties when responding to the items.