Health economic evaluations routinely use the QALY as a summary measure of health outcome. The value of the QALY is that it is a generic measure that enables comparisons between a diverse range of health services, programs and interventions. In a health system where budgets are limited but demand for health services is high, policy makers need to make decisions about which services, programs and interventions to fund. Consideration of cost per QALY is a key part of this decision-making process [44].
In this context, psychotherapy-based services such as specialist CAMHS are competing for resources against pharmaceutical companies that have developed medications for many of the conditions seen in CAMHS (e.g. methylphenidate for ADHD, antidepressants for depression and anxiety) and are seeking to have those medications included in pharmaceutical benefits schemes. Pharmaceutical companies are well versed in the use of PBHRQOL instruments and the calculation of cost per QALY, as such information is provided in guidelines for submissions to the Pharmaceutical Benefits Advisory Committee [45]. In contrast, there are relatively few cost-utility studies of psychotherapy interventions, and the use of PBHRQOL instruments in CAMHS is rare. For example, a search on “utility” in the PEDE database [46] returned 173 cost-utility studies, of which only 12 concerned child and adolescent mental health disorders, and 9 of those 12 evaluated pharmaceutical treatments. This state of affairs disadvantages psychotherapy-based CAMHS, which lack QALY data to support the effectiveness of their interventions.
In this study we explored the potential value of the Child Health Utility (CHU9D) as a routine outcome measure for use in CAMHS. The CHU9D is a preference-based instrument that generates utility weights which can be used to calculate QALYs for use in health economic evaluations. Of particular interest was whether the CHU9D was quick and easy to use, whether it could act as a suitable proxy for mental health symptoms, and whether it generated utility weights similar to those measured in other child and adolescent mental health population studies.
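To make the link between utility weights and cost per QALY concrete, the following minimal sketch computes QALYs as the area under a utility-time curve (trapezoidal rule) and then an incremental cost per QALY. It is an illustration only and not the analysis performed in this study: the function names, utility values, time points and incremental cost are all hypothetical.

```python
from typing import Sequence


def qalys(times_years: Sequence[float], utilities: Sequence[float]) -> float:
    """Area under the utility-time curve, using the trapezoidal rule.

    times_years: assessment times in years, in increasing order.
    utilities:   utility weight observed at each assessment.
    """
    if len(times_years) != len(utilities) or len(times_years) < 2:
        raise ValueError("need at least two matched (time, utility) points")
    total = 0.0
    for i in range(1, len(times_years)):
        interval = times_years[i] - times_years[i - 1]
        if interval <= 0:
            raise ValueError("times must be strictly increasing")
        # Average utility over the interval, weighted by its length in years.
        total += interval * (utilities[i] + utilities[i - 1]) / 2.0
    return total


def cost_per_qaly(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost divided by incremental QALYs gained."""
    if incremental_qalys <= 0:
        raise ValueError("incremental QALYs must be positive")
    return incremental_cost / incremental_qalys


# HYPOTHETICAL data: utilities at baseline, 6 months and 12 months.
times = [0.0, 0.5, 1.0]
treated = [0.60, 0.75, 0.80]
comparator = [0.60, 0.65, 0.65]

qaly_gain = qalys(times, treated) - qalys(times, comparator)
# HYPOTHETICAL incremental cost of the intervention, in dollars.
ratio = cost_per_qaly(1750.0, qaly_gain)
print(f"QALYs gained: {qaly_gain:.4f}; cost per QALY: {ratio:.0f}")
```

With a single measurement point, as in this study, only the utility weights themselves are available; a QALY estimate of this kind requires at least one follow-up assessment.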
From a clinical perspective, the CHU9D was quick and easy to administer, and caregivers had little trouble answering the questions, suggesting it could be implemented with minimal burden on caregivers. Three of the instrument’s 9 items relate directly to emotional symptoms: worried, sad, and annoyed. Additionally, the 3 items that were found to be significant predictors of the SDQ total score (schoolwork, sleep, and daily routine) measure impacts in areas commonly disrupted in children with a wide variety of mental health disorders. In a recent study of clinicians’ behaviour and attitudes towards routine outcome measurement, administrative load and instrument relevance were highlighted as barriers to implementation [47]. The brevity and broad clinical relevance of the CHU9D are therefore important when considering the likelihood of clinicians endorsing and taking up the instrument in clinical practice.
Two administration issues were identified and need to be explored further. The first is that the instrument asks about the child’s functioning ‘today’. Almost one third of caregivers in our study reported that ‘today’ was not a typical day for the child, which may have led some to underestimate their child’s dysfunction. The second is that a small number of parents were unable to rate the child’s functioning that day due to limited exposure to their child. In a population where it is common for separated parents to share access to their child, this issue may occur more frequently than in non-mental-health populations. We suggest one modification to the instrument which might help address this problem, subject to further testing: adjusting the wording so that caregivers rate a ‘typical day’ if they report not having the information required to answer for the actual day in question.
Psychometrically, the instrument performed adequately, although we were only able to test a couple of aspects of its performance. The obtained Cronbach alpha of .781 is challenging to interpret. On one hand, it compares favourably to the alpha of .66 found by Foster Page et al. [48], suggesting the items converge better in a mental health population than in a dental population. It was also higher than our cutoff of 0.7 but not in the high 0.9s, suggesting the items are tapping into a central construct (i.e. quality of life) without indicating that some items are redundant. On the other hand, it is difficult to define an ideal value for alpha for an instrument that is designed to measure the multi-dimensional nature of quality of life in children. We expect that further validation exercises in clinical populations with samples large enough for factor analysis will help illuminate the factor structure of the instrument for different clinical populations.
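For readers less familiar with the statistic, the sketch below shows the standard formula for Cronbach’s alpha, k/(k−1) × (1 − sum of item variances / variance of the total score), applied to a small respondent-by-item table. It is a minimal illustration under stated assumptions, not the software used in this study: the function name and the response data are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent
    and one column per item (sample variances, n - 1 denominator)."""
    n_respondents = len(responses)
    if n_respondents < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("need at least two items and equal-length rows")

    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# HYPOTHETICAL responses: 5 caregivers rating 4 items on a 1-5 scale.
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
print(f"alpha = {cronbach_alpha(example):.3f}")
```

Because alpha rises with both the number of items and the average inter-item correlation, a value in the high 0.9s from only 9 items would imply very highly correlated items, which is why such values are usually read as a sign of redundancy.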
In terms of validity, despite having a different focus and reference time period to the SDQ (today vs the last 6 months), there was evidence of moderate convergence between the instruments. The correlation between the SDQ and the CHU9D was in the moderate to strong range, item-level correlations were in the expected direction, and children in the abnormal range on the SDQ showed significantly lower utility weights than children in the borderline/normal ranges. These findings are important because a predictable relationship between quality of life and child and adolescent mental health supports the use of PBHRQOL instruments in this population.
From an economic perspective, we noted two things in relation to the utility weights generated by the CHU9D tariffs. First, utility weights from this study were significantly higher (i.e. indicating better quality of life) than those collected in other child and adolescent mental health populations. Whilst the comparison was highly fraught, as the comparative studies used different instruments and populations, we believe this issue warrants further investigation. If competing instruments generate significantly different utility weights in the same population, the interpretation of economic evaluations may be influenced by the choice of utility instrument in addition to the performance of the assessed interventions, a finding noted elsewhere [49,50]. Our current hypothesis is that the CHU9D overestimates quality of life compared to other instruments in mental health populations, consistent with findings from other CHU9D studies [24]. The implications of this for the choice of instrument used to measure the impact of child and adolescent mental health interventions need further exploration.
Second, we noted the failure of both CHU9D tariffs to capture the full range of utility weights from 0 to 1. Both tariffs have a floor utility weight of .3, similar to that seen in the SF-6D, a widely used adult utility instrument [51]. This smaller range can lead to an over-prediction of utility in poor health states [52] and an underestimation of utility change in intervention studies [53]. Thus the CHU9D might over-predict utility in severe CAMHS presentations, for example, severe mental illness (schizophrenia, depression) with suicidal ideation and suicide attempts. Interventions evaluated using the instrument may also show smaller utility gains (and therefore higher cost-utility estimates) than might be seen in adult populations where the EQ-5D, which has a significantly wider score range, is used. Both situations potentially disadvantage economic evaluations of interventions in child and adolescent mental health compared to adult interventions. Within the limited utility range, the Australian Adolescent Tariff generated a wider spread of scores and consistently lower mean values. We therefore suggest that further work on these tariffs is needed to reduce the floor effect and to explore differences in ratings between adolescents and adults. We note, for example, that a recent modification of the classification system of the SF-6D has provided preliminary evidence of being able to reduce the floor effect [54], and such an approach may be relevant to the CHU9D.
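The arithmetic behind this concern can be sketched with a deliberately simplified toy model. It assumes, purely for illustration, that a tariff with a floor compresses a latent 0 to 1 health score linearly into the range from the floor to 1; real tariffs are not derived this way, and the function names, latent scores and intervention cost below are hypothetical.

```python
def observed_utility(latent: float, floor: float) -> float:
    """Map a latent 0-1 health score onto a scale whose minimum is `floor`.

    ASSUMPTION: linear compression, used only to illustrate the floor effect.
    """
    if not 0.0 <= latent <= 1.0:
        raise ValueError("latent score must lie in [0, 1]")
    if not 0.0 <= floor < 1.0:
        raise ValueError("floor must lie in [0, 1)")
    return floor + (1.0 - floor) * latent


def cost_per_qaly_gained(cost: float, utility_gain: float, years: float = 1.0) -> float:
    """Cost divided by QALYs gained, for a utility gain sustained over `years`."""
    if utility_gain <= 0 or years <= 0:
        raise ValueError("utility gain and duration must be positive")
    return cost / (utility_gain * years)


FLOOR = 0.3                 # floor utility weight reported for both CHU9D tariffs
before, after = 0.40, 0.60  # HYPOTHETICAL latent scores pre/post intervention
cost = 5000.0               # HYPOTHETICAL intervention cost

gain_full_range = after - before
gain_with_floor = observed_utility(after, FLOOR) - observed_utility(before, FLOOR)

print(f"gain on a 0-1 scale:      {gain_full_range:.2f}")
print(f"gain with a floor of 0.3: {gain_with_floor:.2f}")
print(f"cost per QALY, 0-1 scale: {cost_per_qaly_gained(cost, gain_full_range):.0f}")
print(f"cost per QALY, floor 0.3: {cost_per_qaly_gained(cost, gain_with_floor):.0f}")
```

Under this toy assumption every utility gain shrinks by a factor of 1 − floor, so the cost per QALY is inflated by 1/(1 − floor), roughly 43% for a floor of .3. The size of the effect in practice depends on how each tariff was actually derived.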
Limitations
Due to the nature of the study we were unable to test a number of useful metrics. For example, we were not able to calculate test-retest reliability or the sensitivity of the instrument to change in mental health symptoms over time, having only one measurement point. These are particularly important metrics, as the value of the CHU9D for economic evaluation will depend on its capacity to reliably detect change due to intervention, rather than large natural fluctuations in an individual’s responses. We were also unable to explore concordance between caregiver and child self-completed versions of CHU9D as the telephone survey methodology was not well suited to collection of responses from children and adolescents. As a result, we recommend further psychometric validation of the CHU9D with a focus on repeated measures and multiple raters (e.g. child, caregiver, therapist).
In terms of the study sample, although it was drawn randomly from a list of active CAMHS clients, difficulties in contacting people from the list possibly led to the sample being higher functioning than a true random sample. In essence, families who were not contactable were assumed to have greater dysfunction, although we did not have data to test this hypothesis. There is also the question of whether the study sample is representative of CAMHS clients elsewhere. Comparisons not reported in this paper, however, show that SDQ scores in our sample are very similar to SDQ scores collected routinely from other Australian CAMHS; hence we have reasonable confidence that the study sample is at least fairly representative of clients of CAMHS in Australia.
It should also be noted that there is a broader debate relating to the use of preference-based quality of life instruments in children and adolescents [55]. Concerns include that the valuation procedures used in defining the tariffs may not translate well from adult measures to children and adolescents, the capacity of children to understand and complete instruments, the accuracy of proxy raters, the need to consider family interactions in children’s measures, and the wide variation in utility weights noted for different childhood disorders from existing instruments. We also note (and this applies to adult instruments as well) that tariffs generated in one population or age group may not be comparable to those in other populations, further hampering efforts to use preference-based HRQOL instruments to facilitate cross comparisons. Whilst it is beyond the scope of this paper to address these issues in any depth, it should be noted that there are valid arguments that preference-based instruments (as they currently stand) might not be the best fit for child and adolescent populations, and therefore alternative metrics of outcomes with economic relevance (e.g. school attendance) should be similarly explored.
Future work
The telephone survey method used in this study proved to be a viable and efficient way of communicating with caregivers about the mental health and quality of life of their children receiving mental health services. The process, which was separate from their clinical care, caused minimal disruption to clients and therapists. As such, this method might be suitable for exploring test-retest reliability, sensitivity to change, and comparisons between PBHRQOL instruments. Telephone or web-based survey methods may also be suitable for tracking adolescents receiving services. For example, Ratcliffe and colleagues were able to obtain consent and collect data from adolescents using a web-based survey [24]. Eliciting answers to the CHU9D from very young children has required direct contact [21]; thus studies involving children as young as 5 might need to be situated within clinics.
Future studies could thus employ telephone and web-based survey methods, but with a larger range of utility measures, as well as a follow-up assessment and sub-samples with repeated measures. Studies looking to obtain scores from children may need to supplement telephone and web-based methods with assessments conducted within the clinical services themselves. Templates for such studies include the Multi Instrument Comparison Project [56]. The outcomes of comparing the performance of different utility instruments in children and adolescents would provide much clearer guidance on whether preference-based instruments are a suitable addition to mental health services, and if so, which ones are superior.