Sample
We recruited a cross-sectional sample of cancer patients from centers in four European countries: the Netherlands Cancer Institute (the Netherlands), Kufstein County Hospital (Austria), the Mount Vernon Cancer Centre and Basingstoke & North Hampshire Hospital (the United Kingdom) and the Jagiellonian University Medical College (Poland). To obtain a heterogeneous sample we included any cancer patient (on- or off-treatment) aged above 18 years. Patients were invited to participate in the study via mail or during a clinic visit, and were asked to complete the EORTC QLQ-C30 and an additional questionnaire with anchor items.
Assessment instruments
EORTC QLQ-C30
The EORTC QLQ-C30 [10] is an internationally validated and widely used cancer-specific HRQOL instrument. It comprises five functioning scales (physical, social, role, cognitive, and emotional functioning), eight symptom scales (fatigue, nausea/vomiting, pain, dyspnea, sleep disturbances, appetite loss, constipation, and diarrhea), a financial impact item, and a global quality of life (QOL) scale. All scale scores are linearly transformed to range from 0 to 100. For the functioning scales and global QOL, higher scores indicate better functioning; for the symptom scales, higher scores indicate a higher symptom burden.
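The linear transformation can be illustrated with the formulas from the EORTC QLQ-C30 Scoring Manual. The raw score is the mean of a scale's item responses, and it is rescaled by the item range: 3 for the four-point items and 6 for the seven-point global QOL items. This is a minimal sketch under those assumptions. The function names are hypothetical, and the sketch omits the manual's rules for missing items.

```python
from statistics import mean


def raw_score(items):
    """Raw score: mean of the item responses that belong to one scale."""
    return mean(items)


def transform(items, kind, item_range=3):
    """Linearly transform a scale's raw score to the 0-100 range.

    kind: "functional", "symptom" or "global".
    item_range: maximum minus minimum possible response
                (3 for the 1-4 items, 6 for the 1-7 global QOL items).
    """
    rs = raw_score(items)
    if kind == "functional":
        # Reversed so that higher scores indicate better functioning
        return (1 - (rs - 1) / item_range) * 100
    if kind in ("symptom", "global"):
        # Higher scores indicate more symptoms (symptom) or better QOL (global)
        return (rs - 1) / item_range * 100
    raise ValueError(f"unknown scale kind: {kind}")


# Example: two global QOL items answered 5 and 6 on the 1-7 scale
print(transform([5, 6], "global", item_range=6))  # 75.0
```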
Anchor items
An expert panel comprising four PRO researchers, three psycho-oncologists, three oncologists, and a biostatistician generated a set of anchor items to assess the clinical importance of the functional health problems and symptoms covered by the QLQ-C30. The EORTC Quality of Life Group reviewed this set independently as part of a grant review process, during which it also underwent external anonymous peer review. The anchor items were additionally presented and discussed in plenary at an EORTC Quality of Life Group meeting. The anchor items were worded as follows, with PF/EF/FA/PA standing for the domain being assessed (e.g., physical functioning, emotional functioning, fatigue, pain):
- Has your PF/EF/FA/PA been a burden to you?
- Has your PF/EF/FA/PA limited your daily activities?
- Have you needed any help or care for your PF/EF/FA/PA?
Response categories: “not at all”, “a little”, “quite a bit”, “very much”.
A functional limitation or symptom was considered to be potentially clinically important if any of the three anchor items was answered positively. More specifically, a patient was considered to have a problem of at least “minimal clinical importance”, if s/he reported at least “a little” for any of the three anchor items. Patients who rated their problem/symptom as “quite a bit” or “very much” for any anchor item were classified as having a problem of “clinical importance”. In line with this, we labeled the two possible thresholds to be investigated further TMCI (Threshold for Minimal Clinical Importance) and TCI (Threshold for Clinical Importance).
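The TMCI/TCI rule above can be expressed as a short classification function. This sketch assumes the usual EORTC four-point coding (1 = “not at all”, 2 = “a little”, 3 = “quite a bit”, 4 = “very much”). The function and constant names are hypothetical.

```python
# Assumed EORTC four-point coding:
# 1 = "not at all", 2 = "a little", 3 = "quite a bit", 4 = "very much"
A_LITTLE = 2
QUITE_A_BIT = 3


def classify(burden, limitation, help_needed):
    """Return (tmci_positive, tci_positive) for one patient and one domain."""
    answers = (burden, limitation, help_needed)
    # TMCI: at least "a little" on any of the three anchor items
    tmci = any(a >= A_LITTLE for a in answers)
    # TCI: "quite a bit" or "very much" on any of the three anchor items
    tci = any(a >= QUITE_A_BIT for a in answers)
    return tmci, tci


print(classify(1, 2, 1))  # (True, False)
print(classify(1, 1, 4))  # (True, True)
```

Every TCI-positive case is therefore also TCI-positive's weaker counterpart, TMCI-positive, because the TCI cut-off is the stricter of the two.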
Statistical analysis
Sample characteristics are presented as means, standard deviations, and absolute and relative frequencies.
In a first step, we calculated prevalence rates (i.e., the relative frequency of positive cases) of symptoms and functional problems in our sample, based on the definitions of TMCI and TCI given above. We used these prevalence rates to decide which anchor definition (TMCI or TCI) to employ for further analysis and for the development of the final thresholds. We considered an anchor definition too sensitive if it classified the majority of patients as positive cases on multiple domains. Such a definition would not be sustainable in clinical practice, because too many patients would then require additional help and/or intervention.
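The prevalence screen can be sketched as follows. The text does not specify how many domains count as “multiple”. This sketch therefore assumes that a definition is too sensitive when more than one domain has a prevalence above 50%. `flags_by_domain` is a hypothetical mapping from each domain name to per-patient positive/negative flags.

```python
def prevalence(flags):
    """Relative frequency of positive cases (one bool per patient)."""
    return sum(flags) / len(flags)


def too_sensitive(flags_by_domain, max_domains=1):
    """Hypothetical rule: flag a definition as too sensitive if more than
    `max_domains` domains classify the majority (>50%) of patients as positive."""
    majority = [domain for domain, flags in flags_by_domain.items()
                if prevalence(flags) > 0.5]
    return len(majority) > max_domains, majority


# Toy TMCI flags for four patients (illustrative data, not study results)
tmci = {
    "fatigue": [True, True, False, True],
    "pain": [True, False, True, True],
    "nausea": [False, False, True, False],
}
print(too_sensitive(tmci))  # (True, ['fatigue', 'pain'])
```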
To assess the discriminatory power of the QLQ-C30 scales, we calculated effect sizes (Cohen’s d) for the differences in mean QLQ-C30 scale scores between patients classified as having a clinically important problem and those not so classified. To determine the diagnostic accuracy of the QLQ-C30 scales against the external anchors, we conducted receiver operating characteristic (ROC) analyses and calculated the area under the curve (AUC). Following Hosmer and Lemeshow (1989) [19], we classified diagnostic accuracy as poor (AUC < 0.70), acceptable (0.70–0.80), or excellent (> 0.80).
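The three quantities in this paragraph can be sketched as below. Two choices here are assumptions rather than details from the text. Cohen’s d uses the pooled standard deviation. The AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half; this equals the Mann-Whitney U statistic divided by the number of positive-negative pairs. How the 0.70 and 0.80 boundaries are assigned is also assumed. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled SD (pooled SD is an assumed choice)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def auc(pos_scores, neg_scores):
    """P(positive case scores higher than negative case), ties count 1/2.

    Assumes higher scores indicate problems (symptom scales); negate the
    scores first for functioning scales and global QOL."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def accuracy_label(a):
    """Cut-offs as stated in the text; boundary handling is assumed."""
    if a < 0.70:
        return "poor"
    if a <= 0.80:
        return "acceptable"
    return "excellent"


# Toy fatigue scores (illustrative only): anchor-positive vs. anchor-negative
pos = [66.7, 77.8, 55.6, 100.0]
neg = [22.2, 33.3, 55.6, 11.1]
a = auc(pos, neg)
print(a, accuracy_label(a))  # 0.96875 excellent
```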
We selected threshold scores for the QLQ-C30 scales primarily on the basis of Youden’s J [20], i.e., the sum of sensitivity and specificity minus one. When different thresholds yielded comparable values of Youden’s J, we chose the one with higher sensitivity, because we considered sensitivity more important than specificity for initial screening of individual patients for problems. We also calculated correlations (Spearman’s rho) between the three anchor items.
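The threshold selection rule, including the tie-break in favour of sensitivity, can be sketched as follows. The sketch classifies scores at or above the cut-off as positive, which matches the symptom scales. For functioning scales and global QOL, the scores could be negated first and the returned cut-off negated back. The text does not quantify “comparable”, so the 0.01 tolerance is a hypothetical choice. Both positive and negative cases must be present.

```python
def sens_spec(scores, positive, cut):
    """Sensitivity and specificity when scores >= cut are classified positive."""
    tp = sum(1 for s, y in zip(scores, positive) if y and s >= cut)
    fn = sum(1 for s, y in zip(scores, positive) if y and s < cut)
    tn = sum(1 for s, y in zip(scores, positive) if not y and s < cut)
    fp = sum(1 for s, y in zip(scores, positive) if not y and s >= cut)
    return tp / (tp + fn), tn / (tn + fp)


def select_threshold(scores, positive, tolerance=0.01):
    """Return (cut, sensitivity, specificity, J) maximising Youden's J.

    Cut-offs whose J lies within `tolerance` (hypothetical) of the maximum
    are treated as comparable; the most sensitive of them is chosen."""
    results = []
    for cut in sorted(set(scores)):
        sens, spec = sens_spec(scores, positive, cut)
        results.append((cut, sens, spec, sens + spec - 1))
    best_j = max(r[3] for r in results)
    comparable = [r for r in results if r[3] >= best_j - tolerance]
    return max(comparable, key=lambda r: r[1])


# Toy data: cut-offs 40 and 60 tie on J = 0.75; 40 is more sensitive
scores = [10, 20, 30, 40, 50, 60, 70, 80]
positive = [False, False, False, True, False, True, True, True]
print(select_threshold(scores, positive))  # (40, 1.0, 0.75, 0.75)
```

For the correlations between anchor items, `scipy.stats.spearmanr` is one readily available implementation of Spearman’s rho.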
We used binary logistic regression analysis to investigate whether diagnostic accuracy and thresholds were stable across patient groups. The regression analysis included the dichotomous external anchor as the dependent variable, and the QLQ-C30 scale score and the grouping variables (sex, age, stage, country, treatment status) as independent variables. In such a model, a significant main effect of the grouping variable indicates a difference between patient groups in the diagnostic accuracy (sensitivity/specificity) of the anchor-based threshold. A significant interaction term (grouping variable × scale) indicates that the optimal threshold, i.e., the one providing the lowest misclassification rate, differs across patient groups. To account for multiple testing in these sensitivity analyses, we considered p-values below 0.01 statistically significant.
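The invariance check can be sketched as follows. The sketch makes three assumptions: one model per grouping variable, a group indicator coded 0/1, and a plain Newton-Raphson maximum-likelihood fit with two-sided Wald tests. The helper names (`design`, `fit_logit`) are hypothetical. In practice, a statistics package such as statsmodels (`statsmodels.formula.api.logit`) would normally replace this hand-rolled solver.

```python
from math import erfc, exp, sqrt


def _solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def design(scores, groups):
    """Columns: intercept, scale score, group (0/1), group x score interaction."""
    return [[1.0, s, g, g * s] for s, g in zip(scores, groups)]


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression; returns (beta, se, wald_p)."""
    k = len(X[0])
    beta = [0.0] * k

    def probs_and_info(b):
        p = [1.0 / (1.0 + exp(-sum(bj * xj for bj, xj in zip(b, row)))) for row in X]
        info = [[sum(pi * (1 - pi) * row[i] * row[j] for row, pi in zip(X, p))
                 for j in range(k)] for i in range(k)]
        return p, info

    for _ in range(max_iter):
        p, info = probs_and_info(beta)
        # Score vector (gradient of the log-likelihood)
        grad = [sum((yi - pi) * row[j] for row, yi, pi in zip(X, y, p))
                for j in range(k)]
        step = _solve(info, grad)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = probs_and_info(beta)
    # Sampling variances = diagonal of the inverse information matrix
    se = [sqrt(_solve(info, [float(i == j) for j in range(k)])[i]) for i in range(k)]
    # Two-sided Wald p-values: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    wald_p = [erfc(abs(bj / sj) / sqrt(2)) for bj, sj in zip(beta, se)]
    return beta, se, wald_p
```

With `beta, se, p = fit_logit(design(scores, groups), anchor)`, `p[2]` is the Wald p-value for the group main effect and `p[3]` is the p-value for the group × score interaction. Each is compared against the 0.01 level stated above. The fit does not converge if the anchor is perfectly separated by the predictors.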