
Feasibility, reliability and validity of the health-related quality of life instrument Child Health Utility 9D (CHU9D) among school-aged children and adolescents in Sweden



This study was conducted in a general population of schoolchildren in Sweden, with the aim of assessing the psychometric properties of a generic preference-based health-related quality of life (HRQoL) instrument, the Swedish Child Health Utility 9D (CHU9D), among schoolchildren aged 7–15 years, and in subgroups aged 7–9, 10–12 and 13–15 years.


In total, 486 school-aged children, aged 7–15 years, completed a questionnaire including the CHU9D, the Pediatric Quality of Life Inventory 4.0 (PedsQL), KIDSCREEN-10, and questions on general health, long-term illness, and sociodemographic characteristics. Psychometric testing covered feasibility, internal consistency reliability, test–retest reliability, construct validity, factorial validity, concurrent validity, convergent validity and divergent validity.


The CHU9D had very few missing values, a minimal ceiling effect, and no floor effect. The instrument achieved satisfactory internal consistency (Cronbach’s alpha > 0.7) and strong test–retest reliability (r > 0.6). Confirmatory factor analyses supported the proposed one-factor structure of the CHU9D. For the child algorithm, RMSEA = 0.05, CFI = 0.95, TLI = 0.94, and SRMR = 0.04. For the adult algorithm, RMSEA = 0.04, CFI = 0.96, TLI = 0.95, and SRMR = 0.04. The CHU9D utility value correlated moderately or strongly with the KIDSCREEN-10 and PedsQL total scores (r > 0.5–0.7). The CHU9D discriminated as anticipated on health and on three of five sociodemographic characteristics (sex, age, and custody arrangement, but not socioeconomic status or ethnic origin).


This study provides evidence that the Swedish CHU9D is a feasible, reliable and valid measure of preference-based HRQoL in children. The study furthermore suggests that the CHU9D is appropriate for use among children 7–15 years of age in the general population, as well as among subgroups aged 7–9, 10–12 and 13–15 years.


Economic evaluations, including cost-utility analysis (CUA), have a central role in healthcare decision-making [1]. CUA results are typically expressed as the incremental cost of an intervention per quality-adjusted life year (QALY), a common outcome unit that enables comparisons across clinical areas. QALYs can be calculated by multiplying the time spent in a health state by the health-related quality of life (HRQoL) utility weight associated with that health state, using the area under the curve (AUC) method [1]. HRQoL can be captured by multi-attribute utility (MAU) instruments addressing key generic health domains.
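As a minimal illustration of the AUC method mentioned above, the following sketch computes QALYs by trapezoidal integration of utility over time. The function name `qalys_auc`, the assumption of linear change between measurement points, and the example trajectory are hypothetical and are not taken from the study.

```python
def qalys_auc(times_years, utilities):
    """Area under the utility curve (trapezoidal rule), expressed in QALYs.

    times_years: measurement times in years, in non-decreasing order.
    utilities:   utility weights (0 = dead, 1 = perfect health) at those times.
    Assumes utility changes linearly between measurement points.
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need at least two matching time/utility points")
    total = 0.0
    for i in range(1, len(times_years)):
        dt = times_years[i] - times_years[i - 1]
        if dt < 0:
            raise ValueError("times must be non-decreasing")
        # Duration multiplied by the average utility over the interval.
        total += dt * (utilities[i] + utilities[i - 1]) / 2
    return total


# Hypothetical trajectory: utility 0.70 at baseline, 0.80 at 6 months, 0.90 at 12 months.
example_qalys = qalys_auc([0.0, 0.5, 1.0], [0.70, 0.80, 0.90])
```

With a constant utility the calculation reduces to duration times utility weight, which is the multiplication described in the text.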

Existing pediatric MAU instruments are mostly adapted from adult instruments or developed from the perspective and preferences of adults [2,3,4]. Such adult-based HRQoL measures may be cognitively challenging for younger populations [5]. They may also capture health aspects less pertinent to pediatric populations, while failing to tap into others of particular importance to children’s HRQoL, and adult preferences for health states may differ from child preferences [6]. To overcome some of these problems, an MAU instrument, the Child Health Utility 9D (CHU9D), was developed specifically with and for children [7,8,9], with scoring algorithms obtained from an adult population [10] as well as a child population [11].

The original English version of the CHU9D has demonstrated sound psychometric properties in 7 to 11-year-olds in the UK [8] and in 11 to 17-year-olds in Australia [12,13,14]. Linguistic and cognitive skills, however, vary over the course of childhood, and the ability to understand and respond to a questionnaire may differ between children of different ages. Therefore, acceptable psychometric properties should also be assured in narrower age strata. Furthermore, to enable valid and reliable CUA in pediatric populations more widely, the CHU9D is being translated into other languages [15]. However, HRQoL is context dependent [16]. Therefore, the psychometric properties of translated instruments should be assured in their specific cultural contexts [17].

The CHU9D has been translated into Swedish. The current study is the first to investigate the psychometric properties of the Swedish CHU9D.

This study aimed to investigate the feasibility, reliability, and construct validity of the CHU9D among school-aged children attending grades 1–9 in Sweden (ages 7–15 years), and in subgroups attending grades 1–3, 4–6 and 7–9 (ages 7–9, 10–12 and 13–15 years).


Study population

The study used a convenience sample of children attending grades 1–9 (ages 7–15 years) in four elementary schools in a city in Northern Sweden with around 89,000 inhabitants, of whom around 9000 are 7–15 years old. A minimum of 50 children were included from each grade (1–9), resulting in a general population sample of at least 150 children from each of the three elementary school stages: grades 1–3; 4–6; and 7–9 (ages 7–9, 10–12 and 13–15 years). The only exclusion criterion was not being fluent in Swedish.

Data collection procedure

Headmasters and class teachers were informed about the study. Following their consent, children were informed about the study in class and parents received written information. Informed consent to participate was sought from the child. For children below 15 years, parental consent was also requested. Once consent had been obtained, children attending school on the day of the survey filled in a questionnaire at school, assisted by a specially trained research assistant (school nurse). Approximately 59% of the approached students participated in the study. Parents of children in grades 1–3 filled in a short questionnaire at home. In the younger age groups (grades 1–3), the research assistant read the questions and response alternatives aloud, one by one. In each grade, approximately half of the participating schoolchildren filled in the CHU9D again, 7–15 days after the first assessment. Whole classes were randomly selected for the second assessment. Teachers facilitated the process and were present, but were not actively involved in the data collection.


The children answered a questionnaire including three measures of HRQoL, the CHU9D, KIDSCREEN-10 [18, 19] and the Pediatric Quality of Life Inventory 4.0 generic core scale (PedsQL) [20, 21], along with measures of health and socio-demographic background. Given the lower cognitive ability of the younger children, some questions were omitted from the child questionnaire in grade 1, and in grades 1–3, some background questions were parent-reported.


The preference-based CHU9D measures HRQoL through nine dimensions: worried, sad, pain, tiredness, annoyed, schoolwork/homework, sleep, daily routines, and ability to join in activities. The recall period is today or last night, and each dimension has five severity levels [7, 22]. Responses were converted to utilities (on a scale from 0, implying dead, to 1, implying perfect health) using both available scoring algorithms: the child-generated Australian algorithm [11] and the adult-generated UK algorithm [10], hereafter referred to as the child and adult algorithms.


KIDSCREEN-10 addresses 8–18-year-old children and captures overall non-preference-based HRQoL based upon 10 underlying dimensions: feeling fit and well, being full of energy, feeling sad, feeling lonely, having enough time for oneself, being able to do what you want in your spare time, being treated fairly by parent(s), having fun with friends, doing well in school, and being able to pay attention (in school) [18]. The recall period is one week and items are answered on a 5-point scale from 1 (not at all or never) to 5 (extremely or always). Item scores were recoded so that higher values equal better HRQoL. HRQoL sum scores were then calculated and assigned Rasch person parameters (PP), which were further transformed into values with a mean of 50 and a standard deviation of approximately 10 [19]. KIDSCREEN-10 has demonstrated acceptable reliability, construct validity and criterion validity [18]. This instrument was only filled in by children in grades 2–9.


The non-preference based PedsQL has 23 age-specific items that capture 4–18-year-old children’s overall HRQoL, as well as the underlying psychosocial (15 items) and physical (8 items) dimensions [21]. The psychosocial dimension has three sub-dimensions: emotional, social and preschool/school functioning and wellbeing (5 items each). Recall time is one month and items are scored on a 5-point scale between 0 (never a problem) and 4 (almost always a problem). Scores were reversed and linearly transformed to a 0–100 scale with higher scores representing higher HRQoL. Mean values were computed for each dimension. The instrument has demonstrated acceptable psychometric properties in numerous countries, including Sweden [23].
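The reverse-scoring and linear transformation described above can be sketched as follows. The function name `pedsql_scale_score` is hypothetical, and skipping unanswered items is an assumption made for illustration, not the instrument's official missing-data rule.

```python
def pedsql_scale_score(item_responses):
    """Mean PedsQL-style score on a 0-100 scale (higher = better HRQoL).

    item_responses: raw item answers coded 0 (never a problem) to
    4 (almost always a problem); None marks an unanswered item.
    Each answer is reversed and linearly rescaled: 0->100, 1->75, 2->50, 3->25, 4->0.
    """
    answered = [r for r in item_responses if r is not None]
    if not answered:
        return None  # no information to score
    for r in answered:
        if r not in (0, 1, 2, 3, 4):
            raise ValueError(f"invalid response code: {r!r}")
    transformed = [100 - 25 * r for r in answered]
    return sum(transformed) / len(transformed)


# Hypothetical five-item sub-dimension with answers 0, 1, 2, 0 and 4.
example_score = pedsql_scale_score([0, 1, 2, 0, 4])
```

The same function can be applied to the items of any one dimension, or to all 23 items for a total score.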


Self-reported general health was assessed by the question “How do you feel in general”. The question has five response alternatives, which were merged into three categories: not good/fair (hereafter “not good”), good, and very good/excellent (hereafter “very good”). This question was only asked in grades 2–9.

Long-term illness/disability was studied by asking about the presence of seven specified pediatric chronic or long-term health problems (eczema, asthma, allergies, depression, epilepsy, ADHD, diabetes) and one alternative named “others”. In grades 1–3, this information was obtained from the parents and in grades 4–9 from the child. Children with at least one chronic or long-term health problem were classified as having a long-term illness/disability.

Socio-demographic variables

Sex, grade, children’s and parents’ country of birth, and custody arrangement were measured by specific questions. In grades 1–3, parents provided information about country of birth.

Socio-economic status was measured by the Family Affluence Scale (FAS) [24]. The FAS assesses self-reported access to an own bedroom and a dishwasher at home, as well as the number of family cars, holidays abroad in the past 12 months, bathrooms at home and computers at home. A sum score was generated ranging from 0 to 13 points, with higher scores indicating higher affluence. The FAS index has shown acceptable cross-cultural reliability and criterion validity in a study of eight European countries, including Denmark and Norway [24].

Major life events

To assure equivalence of CHU9D scores at the two assessments used for the test–retest analyses (see statistical analysis), children were asked “Has anything out of the ordinary happened that makes you feel better or worse today”. The response alternatives were “no” and “yes, please describe”. The answers were independently reviewed by two of the authors (KL and MV) to determine whether any child needed to be excluded from the test–retest analysis due to a major life event at either the first or the second time of participation. If there were any uncertainties, a third author (SP) was consulted.

Statistical analysis

Differences between the CHU9D utility scores obtained with the child and the adult algorithms were estimated using a one-sample multivariate test of means.

Feasibility was examined by estimating floor and ceiling effects and the proportion of missing values. Cronbach’s coefficient α was used to evaluate internal consistency reliability, with values ≥ 0.7 considered acceptable for group comparisons and ≥ 0.9 for individual comparisons [25]. Furthermore, utility scores from the first and second data collection were compared (test–retest reliability) using intraclass, canonical and Spearman’s correlations for scale comparisons, along with Spearman’s correlations and weighted kappa statistics for dimension comparisons. Correlations of 0.00–0.19 were considered very weak, 0.20–0.39 weak, 0.40–0.59 moderate, 0.60–0.79 strong and 0.80–1.00 very strong [26]. Kappa values below 0.00 were considered to signal poor agreement, 0.00–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement [27].
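The reliability criteria above can be sketched in code. The study itself used Stata; the snippet below is only an illustrative sketch using the standard formula for Cronbach's α and the interpretation bands quoted in the text. All function names are hypothetical, and treating each band edge as inclusive on the lower category for kappa is an assumption.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def correlation_strength(r):
    """Label a correlation coefficient using the bands cited in the text [26]."""
    a = abs(r)
    if a < 0.20:
        return "very weak"
    if a < 0.40:
        return "weak"
    if a < 0.60:
        return "moderate"
    if a < 0.80:
        return "strong"
    return "very strong"


def kappa_agreement(kappa):
    """Label a kappa value using the bands cited in the text [27]."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical data: three respondents answering two items.
example_alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

An α at or above 0.7 would meet the group-comparison criterion, and at or above 0.9 the individual-comparison criterion.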

Construct validity was assessed through confirmatory factor analysis testing the hypothesized one-factor structure of the CHU9D (factorial validity). The following model fit indices and cut-offs were used to confirm model adequacy: comparative fit index (CFI) and Tucker-Lewis index (TLI) values ≥ 0.90 (acceptable fit) or ≥ 0.95 (excellent fit) [28], root mean square error of approximation (RMSEA) values ≤ 0.08 (acceptable fit) or ≤ 0.05 (good fit) [29], and standardized root mean square residual (SRMR) values ≤ 0.08 (acceptable fit) [30]. Construct validity was also explored by comparing the total CHU9D utility scores to the total KIDSCREEN-10 and PedsQL scores, using Spearman’s correlations (concurrent validity). Spearman’s correlations were furthermore used to test whether the conceptually alike dimensions of the CHU9D and the other HRQoL instruments (see Table 1) were correlated (convergent validity), and differences between the correlations of conceptually alike and unlike dimensions were explored (divergent validity).
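The fit cut-offs listed above can be expressed as a small lookup. This is a sketch only: the function name `classify_fit` and the label "below cut-off" for values that miss every threshold are assumptions, and the example uses the fit values reported for the child algorithm in the abstract.

```python
def classify_fit(cfi, tli, rmsea, srmr):
    """Label each fit index according to the cut-offs quoted in the text."""

    def incremental(value):
        # CFI and TLI: higher is better.
        if value >= 0.95:
            return "excellent"
        if value >= 0.90:
            return "acceptable"
        return "below cut-off"

    def rmsea_label(value):
        # RMSEA: lower is better.
        if value <= 0.05:
            return "good"
        if value <= 0.08:
            return "acceptable"
        return "below cut-off"

    return {
        "CFI": incremental(cfi),
        "TLI": incremental(tli),
        "RMSEA": rmsea_label(rmsea),
        "SRMR": "acceptable" if srmr <= 0.08 else "below cut-off",
    }


# Fit values reported for the child algorithm in grades 1-9.
child_fit = classify_fit(cfi=0.95, tli=0.94, rmsea=0.05, srmr=0.04)
```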

Table 1 Conceptually alike dimensions between the CHU9D and the instruments KIDSCREEN-10 and PedsQL

Finally, construct validity was assessed by the known-groups method, i.e. by comparing CHU9D utility scores by sex, school stage (grades 1–3, 4–6, 7–9), parental country of birth (Sweden: none, one or both parents), custody arrangement (living with both parents or not), family affluence (FAS: < 25th percentile, 25th–75th percentile, > 75th percentile), general health status (not good, good, very good), and long-term illness or disability (yes or no). Differences between groups were estimated using the Mann–Whitney U-test and the Kruskal–Wallis test. Lower utility scores were anticipated in girls (although no sex differences were expected in early pre-adolescence) and in older age groups, and in those with a foreign background, not living with both parents, having lower family affluence, not having good general health, or having a long-term illness or disability [12, 31,32,33].
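For a two-group comparison such as the ones above, the Mann–Whitney U statistic can be sketched in plain Python. The study used Stata for the actual tests; the function name `mann_whitney_u` and the example utility scores below are hypothetical, and the sketch returns only the U statistics, not a p-value.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, using average ranks for ties."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    # Pool the observations, remembering which group each one came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values occupy ranks i+1 .. j; each receives the average rank.
        avg_rank = (i + 1 + j) / 2
        for k in range(i, j):
            if combined[k][1] == 0:
                rank_sum_x += avg_rank
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    return u_x, u_y


# Hypothetical utility scores for two small groups.
example_u = mann_whitney_u([0.90, 0.80, 0.95], [0.60, 0.70, 0.50])
```

U_x counts the pairs in which a value from the first group exceeds a value from the second (ties count one half), so a U_x close to n1 × n2 indicates that the first group tends to score higher.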

The analyses were performed for grades 1–9 in total, and separately for each of the three school stages studied, using Stata version 16.1 (Stata Corp LP, College Station, Texas, USA). Relationships mentioned in the results section are statistically significant at the 5% level (p < 0.05).


The suggested minimum sample size for feasibility and reliability tests was n = 50, and for factor analysis 4–10 persons per item or at least n = 100 [34]. There is also a rule of thumb stating that validations of questionnaires should include at least 5–10 persons per item, here equivalent to a sample of n = 45–90 [35]. Thus, for the current analyses, the sample size of at least 50 children in each grade allows for analyses at school-stage level (n of at least 150) or above.
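The arithmetic behind these figures is simple enough to check directly. In this sketch the helper name `persons_per_item_range` and the constant names are hypothetical; the numbers come from the text.

```python
def persons_per_item_range(n_items, low=5, high=10):
    """Sample size range implied by a persons-per-item rule of thumb."""
    return n_items * low, n_items * high


CHU9D_ITEMS = 9        # nine CHU9D dimensions, one item each
MIN_PER_GRADE = 50     # minimum number of children included per grade
GRADES_PER_STAGE = 3   # each school stage spans three grades

rule_of_thumb = persons_per_item_range(CHU9D_ITEMS)   # 5-10 persons per item
stage_minimum = MIN_PER_GRADE * GRADES_PER_STAGE      # minimum n per school stage
```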


Descriptive characteristics

Data were collected from 486 children. Of these, 473 (97%) answered all CHU9D questions and were thus included in the analysis. These participants were evenly distributed between the three studied school stages, but included slightly fewer girls than boys (Table 2). The great majority (92%) were born in Sweden, had parents who were both born in Sweden (77%), and lived with both parents (81%). Affluence ranged between 2 and 13 on the Family Affluence Scale, with only two children scoring below 6 points (not seen in the table). Two thirds of the children reported having very good general health and about half reported having a long-term illness/disability, mainly allergy, asthma, and eczema (not seen in the table). Mean CHU9D utility scores were 0.74 (SD ± 0.21) when using the child algorithm and 0.85 (SD ± 0.11) when using the adult algorithm (p < 0.001). Measured by KIDSCREEN-10 and PedsQL, the corresponding mean scores were 41.17 (SD ± 6.10) and 82.12 (SD ± 12.94), respectively.

Table 2 Descriptive characteristics of the participants


No child reported the worst possible health state for all nine dimensions (no floor effect), while 8.5% reported the best possible health state, i.e. no problems, for all dimensions (ceiling effect) (Table 3). Stratified analyses revealed varying ceiling effects in the three school stages, i.e. approximately 12% in grades 1–3; 8% in grades 4–6 and 5% in grades 7–9.
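Scale-level floor and ceiling effects as defined above can be computed as in the following sketch. Coding the five severity levels as 1 (no problems) to 5 (worst) is an assumption made for illustration, and the function name and example profiles are hypothetical.

```python
def floor_ceiling_effects(profiles, best=1, worst=5):
    """Percentage of respondents at the floor and at the ceiling of the scale.

    profiles: list of respondents, each a list of nine dimension levels.
    Floor   = worst level reported on every dimension.
    Ceiling = best level (no problems) reported on every dimension.
    """
    if not profiles:
        raise ValueError("no profiles supplied")
    n = len(profiles)
    at_floor = sum(all(level == worst for level in p) for p in profiles)
    at_ceiling = sum(all(level == best for level in p) for p in profiles)
    return 100 * at_floor / n, 100 * at_ceiling / n


# Hypothetical sample of four children, one of whom reports no problems at all.
example_profiles = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 2, 1, 3, 1, 1, 2, 1, 1],
    [2, 2, 1, 4, 2, 1, 3, 1, 1],
    [1, 1, 1, 2, 1, 1, 1, 1, 1],
]
floor_pct, ceiling_pct = floor_ceiling_effects(example_profiles)
```

Applying the same function to a single-dimension profile gives the dimension-level floor and ceiling percentages.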

Table 3 Floor and ceiling effects for the CHU9D among school children in grades 1–9 and stratified by school stages

When each of the nine dimensions was examined separately, floor effects were generally below 2% and, with a few exceptions, this was true across all three school stages. The “tiredness” dimension, however, revealed an overall floor effect of 9%, varying from 5% in grades 4–6 to 16% in grades 7–9. Ceiling effects, on the other hand, ranged overall from 21% (tiredness) to 77% (sadness), and the general pattern of lower ceiling effects at higher school stages was seen for each of the nine dimensions.

Test–retest reliability and internal consistency

In total, 255 children filled in the CHU9D twice. Of these, 13 children were excluded from the test–retest analyses: 11 because of a major event happening between rating occasions and 2 because of missing values in several dimensions at the second assessment. Thus, 242 children were included in the test–retest analyses: 73 from grades 1–3, 81 from grades 4–6 and 88 from grades 7–9.

At scale level, canonical, Spearman’s and intraclass correlations all showed strong (> 0.7) or very strong (> 0.8) correlations between the two occasions (Table 4). Similar results were seen for each of the three school stages studied. In addition, all analyses of internal consistency revealed Cronbach’s alpha values above 0.7. Thus, overall and at each of the studied school stages, the CHU9D scale met the reliability criteria for group-level comparisons.

Table 4 Test–retest and internal consistency reliability of the CHU9D at scale level, among school children in grades 1–9, and stratified by school stages (n = 242)

When the individual dimension scores from the two test–retest occasions were compared one by one, correlations were moderate or strong (0.41–0.71), while kappa agreement was moderate or close to moderate for 6 dimensions (0.39–0.54) and fair for 2 dimensions (“pain” 0.32 and “annoyed” 0.35) (Table 5). These results were to some extent similar for the separate school stages, but at each of the three school stages, there were cases of weak or non-significant correlations and fair or non-significant kappa agreement. The specific dimensions with these weaker test–retest results varied between school stages.

Table 5 Test–retest reliability of the CHU9D dimensions, among school children in grades 1–9, and stratified by school stages (n = 242)

Construct validity

Factorial validity

Confirmatory factor analyses showed that all dimensions loaded significantly on the latent factor and that all loadings were above 0.4, except for ability to join in activities (0.32, child algorithm). Furthermore, as seen in Table 6, the CFI, TLI, SRMR and RMSEA values met the criteria for acceptable or excellent fit for the studied single-factor model, supporting that the nine dimensions of the CHU9D measure a single latent construct. Acceptable model fit was demonstrated with both the child and adult algorithms, in the whole sample (grades 1–9: CFI and TLI ≥ 0.94; RMSEA and SRMR ≤ 0.05) and in two of the three school stages separately (grades 4–6 and 7–9: CFI and TLI ≥ 0.92; RMSEA and SRMR ≤ 0.06). In grades 1–3, there was weaker support for the model. Modification indices suggested a high correlation between the dimensions “worried” and “sad”, which is theoretically plausible. Model fit improved for grades 1–3 after allowing the correlation between these two dimensions in the adjusted model (CFI and TLI 0.83–0.89; RMSEA and SRMR ≤ 0.07).

Table 6 Model fit in confirmatory factor analyses testing the CHU9D one-factor structure among school children in grades 1–9 and stratified by school stages (child and adult algorithms) (n = 473)

Concurrent validity

Strong correlations were found between the total score of the CHU9D (child/adult algorithm) and the total scores of KIDSCREEN-10 (0.61/0.62) and PedsQL (0.62/0.61). Stratified analyses showed that, in the two older school stages, the CHU9D total scores correlated strongly with the KIDSCREEN-10 and PedsQL scores (r > 0.6 and 0.7, respectively). In grades 1–3, these correlations were moderate, with r just above 0.5 (both KIDSCREEN-10 and PedsQL).

Convergent and divergent validity

The nine individual CHU9D dimensions generally demonstrated moderate (r > 0.4), or close to moderate, relationships to the conceptually alike KIDSCREEN-10 and PedsQL dimensions (Table 7). There was also a pattern of stronger correlations between alike dimensions than between unlike dimensions. One exception was that, unexpectedly, the CHU9D dimension capturing ability to do daily routines had a slightly closer relationship to the PedsQL dimensions emotional and school functioning (r 0.38 and 0.36, respectively) than to the physical functioning dimension (r 0.31). Another exception was that the dimension on ability to join in activities showed the strongest relationship with the KIDSCREEN-10 dimension “fit and well” (r 0.33), and not with the expected dimension regarding ability to do desirable spare-time activities (“able to do things”, r 0.26). In each of the three separate school stages, the results overall followed the same pattern as described above, although correlations seemed to be slightly weaker among the youngest children.

Table 7 Correlations between CHU9D dimension scores and dimension scores in KIDSCREEN-10 and PedsQL, among children (n = 473)

Known-groups validity

Table 8 shows that CHU9D utility scores mostly differed as expected when comparing children with varying characteristics. As compared to their counterparts, higher scores were found among boys, younger children, those living with both parents, those reporting better general health and those without a long-term illness or disability (with both the child and adult algorithms). However, there were no utility score differences depending on parental country of birth or family affluence. To further investigate the lack of statistical differences in utility scores by family affluence, we conducted several other analyses in which we stratified FAS scores into 2, 4 and 5 categories with different set and relative cut-offs, tested with and without imputation for the 37 cases with missing FAS scores. All analyses confirmed the initial results (data not shown).

Table 8 Comparison of CHU9D utility scores between children with different sociodemographic and health characteristics (n = 473)

The sample size only allowed grade-stratified analyses by sex and long-term illness/disability. In grades 4–6 and 7–9, these analyses showed patterns similar to those reported above, but for long-term illness only when applying the child algorithm. In grades 1–3, no such differences were seen.


This study investigated the feasibility, reliability and construct validity of the Swedish CHU9D among school children attending grades 1–9 in Sweden (ages 7–15), and separately for children in grades 1–3, 4–6 and 7–9 (ages 7–9, 10–12 and 13–15 years).

Very few missing values, minimal ceiling effects, and no floor effects support the general feasibility of the Swedish CHU9D. However, ceiling effects were relatively high for several of the underlying CHU9D dimensions. This is not surprising given that many children in general populations are expected to be in good health. Similar results have been shown when using the English, Danish and Chinese CHU9D in general populations of school-aged children [13, 14, 31, 33, 36]. Notably, though, across these studies, as well as in the current study, floor effects in the CHU9D dimensions are mainly below 5% and ceiling effects below 85%. This indicates that the CHU9D is capable of detecting improvement in general population studies of school-aged children, in Sweden and elsewhere.

The reliability of the Swedish CHU9D was supported by strong test–retest correlations and agreement, along with established internal consistency. Again, this is in line with the results of studies using the English, Danish and Chinese CHU9D in their cultural contexts [33, 36, 37]. Notably, though, for most of the individual CHU9D dimension scores, we only found moderate or close to moderate test–retest correlations and agreement. Similar findings were reported in a UK study of 6–7-year-olds using a shorter test–retest timeframe (morning to afternoon) [31] and a Chinese two-week test–retest study of 8–17-year-olds [36]. Thus, across language versions and cultural contexts, the CHU9D dimensions show signs of some inconsistency over time. This may be due to the shifting nature of the concepts studied, in combination with the short recall period (today). Furber et al. [37] report that one third of the children did not consider the day of the study a typical day in terms of the assessed CHU9D concepts, indicating that these concepts fluctuate from day to day. Further research should investigate how this potential shortcoming influences the instrument’s sensitivity to change over time (responsiveness).

The current study confirms the proposed one-factor structure of the CHU9D. In the absence of a gold standard for HRQoL measurement, it is not possible to prove conclusively that this factor measures HRQoL. However, we found a strong correlation between the total scale score of the CHU9D and two HRQoL instruments with demonstrated reliability and validity (KIDSCREEN-10 and PedsQL). Furthermore, the strongest correlations were seen between the CHU9D dimensions and the conceptually overlapping KIDSCREEN-10 and PedsQL dimensions, while correlations were weaker for non-overlapping concepts. Other studies have shown similar results when comparing the original English CHU9D to KIDSCREEN-10 and PedsQL [14, 38] or the Danish and Chinese CHU9D to the PedsQL [33, 36]. Taken together, these results indicate that the CHU9D may be used as a measure of HRQoL.

Although we and other researchers [14, 33, 38] find correlations to be strongest between the CHU9D dimensions and the alike KIDSCREEN-10 and PedsQL dimensions, these correlations are merely moderate. This is not surprising, given that the questions and response alternatives are phrased somewhat differently in the three instruments. Also, the recall period is “today” in the CHU9D, while KIDSCREEN-10 and PedsQL have recall periods of one week and one month, respectively.

Consistent with studies from other countries [12, 31, 33, 36], we confirmed that the Swedish CHU9D is able to detect anticipated HRQoL differences depending on health outcomes and sociodemographic characteristics such as sex, age and custody arrangement. We did not, however, replicate previously shown differences by socioeconomic status (SES) [13, 14] and ethnic origin [32]. This may be attributed to the fact that we required children to be fluent Swedish speakers, leading to the exclusion of families who are less rooted in Swedish society and thereby potentially to less diversity in ethnicity and affluence. Thus, our results overall support the known-groups validity of the Swedish CHU9D, but additional studies are required to confirm its ability to discriminate between children with different SES and ethnic origin.

The CHU9D was initially developed for ages 7–11 years [8]. Our study supports that the instrument is acceptable for use among children up to 15 years of age. Furthermore, in each of the three school stages studied, we found acceptable floor and ceiling effects, strongly correlated test–retest CHU9D utility scores, an internal consistency allowing for group comparisons, and moderate to strong correlations between the CHU9D scale and two established HRQoL scales. These findings suggest that the CHU9D is feasible, reliable and valid for use, not only in wide age ranges, but also in narrower strata comprising only children aged 7–9 years, 10–12 years, or 13–15 years.

Our results confirm earlier findings showing that CHU9D utility scores are higher when applying the adult as compared to the child algorithm [9, 37, 39]. This may be attributed to the disparities in valuation methods used to assess utility weights in child (best–worst) and adult populations (standard gamble) [39]. It may also be explained by adults and children giving different values to CHU9D generated health states, i.e. that adults, as compared to children, generally place less weight on mental health impairment (sadness, worries, being annoyed) and impairment in daily functioning (schoolwork, daily routines, activities) but comparably higher weight on health states dominated by physical impairment (pain, tiredness, sleep problems) [6]. Thus, the CHU9D adult algorithm may not accurately reflect children’s preferences.

Notably, we found that the algorithm used influenced the size of between-group differences. Comparing, for instance, those with “not good” and “very good” general health, we found the utility score difference to be twice as large when applying the child algorithm as when applying the adult algorithm (mean score difference: 0.33 vs. 0.17). Such differences suggest that the choice of algorithm may influence the interpretation of future economic evaluations of pediatric health interventions, highlighting the importance of this choice.

We acknowledge some limitations. Although including only children fluent in Swedish diminishes bias due to language barriers, which is a strength, it may also have biased the discriminative analysis regarding SES and ethnic origin. Also, given that the study was based on a convenience sample, it cannot provide normative information about HRQoL levels. Another limitation is that this general population study, with self- and parent-reported health, could not evaluate the applicability of the CHU9D in clinical populations. Likewise, the study design did not allow evaluation of responsiveness to change in health status over time. In addition, the long-term illness/disability measure was based on self-report by children or parents and was not confirmed via medical records.


This study provides support that the Swedish CHU9D is a feasible, reliable and valid measure of HRQoL that holds psychometric qualities comparable to those of the original English CHU9D. The study furthermore suggests that the CHU9D is appropriate for use among 7–15-year-old children in the general population, as well as among subgroups aged 7–9, 10–12 and 13–15 years. To provide further support for the CHU9D as a useful health outcome measure in health economic evaluations, future studies should investigate the performance of the CHU9D in clinical samples. Longitudinal studies are also needed to test the instrument’s sensitivity to change.

Availability of data and materials

Not applicable.


  1. Drummond MF, Sculpher MJ, Torrance GW, O’Brien BJ, Stoddart GL. Methods for the economic evaluation of health care programmes. Oxford: Oxford University Press; 2005.


  2. Chen G, Ratcliffe J. A review of the development and application of generic multi-attribute utility instruments for paediatric populations. Pharmacoeconomics. 2015;33(10):1013–28.


  3. Kwon J, Kim SW, Ungar WJ, Tsiplova K, Madan J, Petrou S. A systematic review and meta-analysis of childhood health utilities. Med Decis Mak. 2018;38(3):277–305.


  4. Thorrington D, Eames K. Measuring health utilities in children and adolescents: a systematic review of the literature. PLoS ONE. 2015;10(8):e0135672.


  5. Eiser C, Morse R. Quality-of-life measures in chronic diseases of childhood. Health Technol Assess. 2001;5(4):1–157.


  6. Ratcliffe J, Huynh E, Stevens K, Brazier J, Sawyer M, Flynn T. Nothing about us without us? A comparison of adolescent and adult health-state values for the child health utility-9D using profile case best-worst scaling. Health Econ. 2016;25(4):486–96.

    Article  Google Scholar 

  7. Stevens K. Developing a descriptive system for a new preference-based measure of health-related quality of life for children. Qual Life Res. 2009;18(8):1105–13.

    Article  Google Scholar 

  8. Stevens K. Assessing the performance of a new generic measure of health-related quality of life for children and refining it for use in health state valuation. Appl Health Econ Health Policy. 2011;9(3):157–69.

    Article  Google Scholar 

  9. Stevens KJ. Working with children to develop dimensions for a preference-based, generic, pediatric, health-related quality-of-life measure. Qual Health Res. 2010;20(3):340–51.

    Article  Google Scholar 

  10. Stevens K. Valuation of the child health utility 9D index. Pharmacoeconomics. 2012;30(8):729–47.

    Article  Google Scholar 

  11. Ratcliffe J, Huynh E, Chen G, Stevens K, Swait J, Brazier J, Sawyer M, Roberts R, Flynn T. Valuing the child health utility 9D: using profile case best worst scaling methods to develop a new adolescent specific scoring algorithm. Soc Sci Med. 2016;157:48–59.

    Article  Google Scholar 

  12. Chen G, Flynn T, Stevens K, Brazier J, Huynh E, Sawyer M, Roberts R, Ratcliffe J. Assessing the health-related quality of life of australian adolescents: an empirical comparison of the child health utility 9D and EQ-5D-Y instruments. Value Health. 2015;18(4):432–8.

    Article  Google Scholar 

  13. Ratcliffe J, Stevens K, Flynn T, Brazier J, Sawyer M. An assessment of the construct validity of the CHU9D in the Australian adolescent general population. Qual Life Res. 2012;21(4):717–25.

    Article  Google Scholar 

  14. Stevens K, Ratcliffe J. Measuring and valuing health benefits for economic evaluation in adolescence: an assessment of the practicality and validity of the child health utility 9D in the Australian adolescent population. Value Health. 2012;15(8):1092–9.

    Article  Google Scholar 

  15. University of Sheffield. School of Health and Related Research. Measuring and Valuing Health. Accessed 01 Sept 2020.

  16. WHO QoLAG. What quality of life?/The WHOQOL Group. World Health Forum. 1996;17(4):354–6.

  17. Lohr KN. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002;11(3):193.

    Article  Google Scholar 

  18. Ravens-Sieberer U, Erhart M, Rajmil L, Herdman M, Auquier P, Bruil J, Power M, Duer W, Abel T, Czemy L. Reliability, construct and criterion validity of the KIDSCREEN-10 score: a short measure for children and adolescents’ well-being and health-related quality of life. Qual Life Res. 2010;19(10):1487–500.

    Article  Google Scholar 

  19. Ravens-Sieberer U, Gosch A, Erhart M, Rueden U, Nickel J, Kurth B-M, Duer W, Fuerth K, Czemy L, Auquier P. The KIDSCREEN Questionnaires—quality of life questionnaires for children and adolescents—handbook. Lengerich: Papst Science Publisher; 2006.

  20. Varni JW, Burwinkle TM, Seid M. The PedsQL TM 4.0 as a school population health measure: feasibility, reliability, and validity. Qual Life Res. 2006;15(2):203–15.

    Article  Google Scholar 

  21. Varni JW, Seid M, Kurtin PS. PedsQL™ 4.0: Reliability and validity of the pediatric quality of life inventory™ version 4.0 generic core scales in healthy and patient populations. Med Care. 2001;800–12.

  22. Stevens K. The child health utility 9D (CHU9D)—a new paediatric preference based measure of health related quality of life. In: PRO Newsletter. vol. 43; 2010.

  23. Petersen S, Hägglöf B, Stenlund H, Bergström E. Psychometric properties of the Swedish PedsQL, Pediatric Quality of Life Inventory 4.0 generic core scales. Acta Paediatr. 2009;98(9):1504–12.

    Article  Google Scholar 

  24. Torsheim T, Cavallo F, Levin KA, Schnohr C, Mazur J, Niclasen B, Currie C, Group FDS. Psychometric validation of the revised family affluence scale: a latent variable approach. Child Indic Res. 2016;9(3):771–84.

    Article  Google Scholar 

  25. Aaronson N, Alonso J, Burnam A, Lohr KN, Patrick DL, Perrin E, Stein RE. Assessing health status and quality-of-life instruments: attributes and review criteria. Qual Life Res. 2002;11(3):193–205.

    Article  Google Scholar 

  26. Evans JD. Straightforward statistics for the behavioral sciences. Pacific Grove: Thomson Brooks/Cole Publishing Co; 1996.

    Google Scholar 

  27. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;159–74.

  28. Marsh HW, Hau K-T, Wen Z. In search of golden rules: Comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler’s (1999) findings. Struct Equ Model. 2004;11(3):320–41.

    Article  Google Scholar 

  29. Brown MW, Cudeck R. Alternative ways of assessing model fit. Test Struct Equ Models. 1993;154:136-62.  

    Google Scholar 

  30. Hu LT, Bentler PM. Fit indices in covariance structure modeling: sensitivity to underparameterized model misspecification. Psychol Methods. 1998;3(4):424.

  31. Canaway AG, Frew EJ. Measuring preference-based quality of life in children aged 6–7 years: a comparison of the performance of the CHU-9D and EQ-5D-Y–the WAVES pilot study. Qual Life Res. 2013;22(1):173–83.

    Article  Google Scholar 

  32. Frew EJ, Pallan M, Lancashire E, Hemming K, Adab P. Is utility-based quality of life associated with overweight in children? Evidence from the UK WAVES randomised controlled study. BMC Pediatr. 2015;15(1):211.

    Article  Google Scholar 

  33. Petersen KD, Ratcliffe J, Chen G, Serles D, Frøsig CS, Olesen AV. The construct validity of the Child Health Utility 9D-DK instrument. Health Qual Life Outcomes. 2019;17(1):1–12.

    Article  Google Scholar 

  34. Terwee CB, Bot SD, de Boer MR, van der Windt DA, Knol DL, Dekker J, Bouter LM, de Vet HC. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42.

    Article  Google Scholar 

  35. Kline P. The handbook of psychological testing. 1993.

  36. Yang P, Chen G, Wang P, Zhang K, Deng F, Yang H, Zhuang G. Psychometric evaluation of the Chinese version of the Child Health Utility 9D (CHU9D-CHN): a school-based study in China. Qual Life Res. 2018;27(7):1921–31.

    Article  Google Scholar 

  37. Furber G, Segal L. The validity of the Child Health Utility instrument (CHU9D) as a routine outcome measure for use in child and adolescent mental health services. Health Qual Life Outcomes. 2015;13(1):22.

    Article  Google Scholar 

  38. Petersen KD, Chen G, Mpundu-Kaambwa C, Stevens K, Brazier J, Ratcliffe J. Measuring health-related quality of life in adolescent populations: an empirical comparison of the CHU9D and the PedsQL TM 4.0 short form 15. Patient-Patient-Center Outcomes Res. 2018;11(1):29–37.

    Article  Google Scholar 

  39. Ratcliffe J, Stevens K, Flynn T, Brazier J, Sawyer MG. Whose values in health? An empirical comparison of the application of adolescent and adult values for the CHU-9D and AQOL-6D in the Australian adolescent general population. Value Health. 2012;15(5):730–6.

    Article  Google Scholar 

Download references


The authors would like to thank all children who participated in the study. We would also like to acknowledge Britt-Marie Andersson and Ulrika Järvholm for their contributions to the data collection.


Open access funding provided by Umeå University.

Author information

Authors and Affiliations



Concept and design: SP, KL, MV, IF, AI, KS. Acquisition of data: KL, SP. Analysis and interpretation of data: MV, SP, KL, IF, AI, KS. Drafting of the manuscript: KL, SP, MV. Critical revision of the paper for important intellectual content: KL, MV, IF, AI, KS, SP. Statistical analysis: MV. Administrative, technical, or logistic support: KL, SP. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kristina Lindvall.

Ethics declarations

Ethics approval and consent to participate

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Regional Ethics Research Committee (Dnr 2017/281–31). Informed consent to participate was sought from the child. For children below 15 years, parental consent was also requested.

Consent for publication

All authors have consented to publication.

Competing interests

Katherine Stevens is the developer of the CHU9D. All authors, however, declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence can be viewed on the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Lindvall, K., Vaezghasemi, M., Feldman, I. et al. Feasibility, reliability and validity of the health-related quality of life instrument Child Health Utility 9D (CHU9D) among school-aged children and adolescents in Sweden. Health Qual Life Outcomes 19, 193 (2021).
