Comparing patients' ratings of their QOL with proxy-ratings obtained from their significant others is important for deciding whether these proxy-ratings are a useful measure when a patient's ability to report on his or her QOL diminishes due to physical or cognitive deterioration.
Our study found that, for a considerable number of subscales of the EORTC QLQ-C30 and QLQ-BN20, proxy-ratings by significant others can be regarded as useful. This was especially true for Physical Functioning, Sleeping Disturbances, Appetite Loss, Constipation, Financial Impact and Taste Alterations. Poorer rater agreement was found for Social Functioning, Emotional Functioning, Cognitive Functioning, Fatigue, Pain, Dyspnoea and Seizures. For these scales, both the correlations and the percentage of agreement (+/- 5 points) were low. However, with the exception of Social Functioning and Dyspnoea, the means of patients' ratings and proxy-ratings were rather similar (less than 5 points difference).
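The two agreement measures referred to above, the mean difference between raters and the percentage of pairs agreeing within +/- 5 points, can be sketched as follows. This is a minimal illustration, not the analysis code of the study: the function name `agreement_summary` and the four score pairs are hypothetical, and scores are assumed to lie on the 0 to 100 scale.

```python
from statistics import mean


def agreement_summary(patient, proxy, tolerance=5.0):
    """Summarise rater agreement for paired 0-100 scale scores.

    Returns the mean difference (proxy minus patient) and the percentage
    of pairs whose absolute difference is within `tolerance` points.
    """
    if len(patient) != len(proxy) or not patient:
        raise ValueError("patient and proxy must be non-empty and of equal length")
    diffs = [pr - pa for pa, pr in zip(patient, proxy)]
    within = sum(1 for d in diffs if abs(d) <= tolerance)
    return {
        "mean_difference": mean(diffs),
        "percent_within_tolerance": 100.0 * within / len(diffs),
    }


# Hypothetical scores for four patient-proxy pairs (illustration only).
patient_scores = [50.0, 66.7, 33.3, 100.0]
proxy_scores = [50.0, 66.7, 66.7, 83.3]
summary = agreement_summary(patient_scores, proxy_scores)
print(summary)
```

In this made-up example only half of the pairs agree within 5 points, yet the mean difference stays below 5 points because over- and under-ratings partly cancel. This is how similar group means can coexist with a low percentage of agreement.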
The additional module QLQ-BN20 showed fairly good rater agreement for most scales. Worst agreement was found for Seizures and Bladder Control.
With reference to Osoba et al. [17] and King [18], we considered mean differences above 5 points as relevant rater disagreement. Taking this into account, discrepancies between proxy- and self-ratings were rather insignificant for most scales. No uniform pattern was found with respect to systematic under- or over-rating by proxies.
Another important issue is the extent of rater agreement across the scale range, especially with regard to the generalisability of our results to patients in a poor condition. Analysis of Bland and Altman plots indicates that agreement is worst for the central section of a scale. This finding is probably a result of the fact that, towards the ends of a scale, possible differences between raters are necessarily minimised by the limited scale range.
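The effect of a bounded scale on Bland and Altman plots can be made concrete with a short sketch. The functions below are illustrative assumptions rather than the procedure used in the study: `bland_altman` computes the usual quantities (pairwise means, differences, mean difference and 95% limits of agreement as mean +/- 1.96 SD), and `max_abs_difference` gives the largest difference that is arithmetically possible for a given pairwise mean on a 0 to 100 scale.

```python
from statistics import mean, stdev


def bland_altman(patient, proxy):
    """Return pairwise means, differences, bias and 95% limits of agreement."""
    if len(patient) != len(proxy) or len(patient) < 2:
        raise ValueError("need at least two paired scores of equal length")
    diffs = [pr - pa for pa, pr in zip(patient, proxy)]
    pair_means = [(pa + pr) / 2.0 for pa, pr in zip(patient, proxy)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return pair_means, diffs, bias, limits


def max_abs_difference(pair_mean, low=0.0, high=100.0):
    """Largest |proxy - patient| possible when both scores lie in [low, high]
    and their mean equals pair_mean."""
    return 2.0 * min(pair_mean - low, high - pair_mean)


# The possible disagreement shrinks towards the ends of the scale.
for m in (5.0, 25.0, 50.0, 75.0, 95.0):
    print(m, max_abs_difference(m))
```

A pair with a mean of 50 can differ by up to 100 points, whereas a pair with a mean of 95 can differ by 10 points at most, so some of the better agreement seen near the scale ends follows from the scale bounds alone.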
Overall, proxy-ratings performed somewhat better for more overt aspects of QOL such as physical symptoms, whereas ratings on social and psychological aspects showed less congruency.
A limitation of our study is the small sample size, which did not allow the detection of small mean differences between patient and proxy ratings. For the same reason, it was not possible to perform subgroup analyses on certain patient groups. In addition, patients in a very bad physical condition would have been of importance to our study, as proxy-ratings are most useful in that patient group. However, due to ethical considerations it was not possible to include such patients, since the burden caused by filling in both questionnaires was considered unacceptable for them. Another limitation of our study is the high rate of significant others refusing participation in the study.
The results for accuracy (percentage of mean differences equal to or below 5 points) may have been affected by the number of items in a scale or, more precisely, by the number of possible scores on a scale. Two contrary effects can be expected from this. On the one hand, a low number of possible scores increases agreement due to chance. On the other hand, if the distance between two possible scores is higher than 10 points (e.g. for scales containing one or two items), only exact agreement is taken into account by this accuracy parameter.
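How the number of items determines the possible scores can be illustrated with a small sketch. It assumes a four-point response format per item and the linear transformation of the mean raw item score to a 0 to 100 scale, as in the standard EORTC scoring procedure; the function names are hypothetical.

```python
def possible_scores(n_items, n_categories=4):
    """All attainable 0-100 scores for a scale built from n_items items,
    each with n_categories ordered response categories (assumed: 4)."""
    steps = n_items * (n_categories - 1)  # number of raw-sum increments
    return [100.0 * k / steps for k in range(steps + 1)]


def score_step(n_items, n_categories=4):
    """Distance between two neighbouring attainable scores."""
    return 100.0 / (n_items * (n_categories - 1))


for n in range(1, 6):
    print(n, len(possible_scores(n)), round(score_step(n), 2))
```

Under these assumptions a single-item scale has four possible scores about 33 points apart and a two-item scale has seven possible scores about 17 points apart, so any non-identical pair of ratings on such scales differs by far more than 5 points.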
The study most similar to ours [6] found more pronounced mean differences for Physical Functioning, Role Functioning, Cognitive Functioning, Social Functioning and Fatigue (all between 5 and 10 points). With the exception of Physical Functioning, these scales also showed only a moderate proportion of exact agreement. A slight difference from our study is that Sneeuw et al. [6] used a previous version of the QLQ-C30, which employed a dichotomous response format for the scales Physical Functioning and Role Functioning.
Proxies' relationship with the patient, age, gender and culture showed no significant association with rater agreement. However, agreement was worse in patients with mental confusion, cognitive impairments and motor deficits. We think that low rater agreement in patients with severe cognitive impairments should not be considered per se an indication of inaccurate proxy rating. It might also reflect patients' inability to report on their condition. On the other hand, it may also be difficult for proxies to understand the individual consequences of cognitive decline. Additional clinical variables, as more objective criteria, may be helpful in evaluating rater disagreement in this patient group.
In a recent study by Brown et al. [21] on rater agreement in patients with newly diagnosed high-grade gliomas, proxy-ratings by a caregiver chosen by the patient also showed good congruence. As QOL instrument, this study employed the FACT-Br [22]. The correlation between patient-ratings and caregiver-ratings was 0.63 at baseline and 0.64 at the 2- and 4-month follow-ups; the percentage of agreement (+/- 10 points on a scale ranging from 0 to 100) was 63-68% at the three assessment time points.
With regard to the type of proxy-rating, proxy-raters can differ not only in their relation to the patient (significant other, treating physician, caregiver etc.) but also in the perspective they take towards the patient. Gundy and Aaronson [23] investigated whether proxy-ratings differ depending on whether the proxy rates the patient from the patient's perspective or makes his or her own assessment of the patient. No differences with regard to bias were found between the two types of ratings, although it should be mentioned that the study might not have been sufficiently powered to detect possible differences between them.
Taking our own findings and those from similar studies into account, the assessment of QOL in brain cancer patients through ratings from their significant others seems to be a feasible strategy for gaining information about important aspects of a patient's QOL if the patient is not able to provide this information personally. However, in general, rater agreement is lower for psychosocial issues than for physical symptoms.
In a research context, proxy-ratings may help to reduce bias from patients dropping out of studies because of deteriorating health, and in a clinical context, proxy-ratings could contribute to medical decision making. Future research should further evaluate the impact of patient and proxy characteristics on rater agreement and include further criteria for the accuracy of proxy-ratings.