 Research
 Open Access
 Published:
Interpreting small treatment differences from quality of life data in cancer trials: an alternative measure of treatment benefit and effect size for the EORTCQLQC30
Health and Quality of Life Outcomes volume 13, Article number: 180 (2015)
Abstract
Background
The EORTCQLQC30 is a widely used health related quality of life (HRQoL) questionnaire in lung cancer patients. Small HRQoL treatment effects are often reported as mean differences (MDs) between treatments, which are rarely justified or understood by patients and clinicians. An alternative approach using odds ratios (OR) for reporting effects is proposed. This may offer advantages including facilitating alignment between patient and clinician understanding of HRQoL effects.
Methods
Data from six CRUK sponsored randomized controlled lung cancer trials (2 small cell and 4 in nonsmall cell, in 2909 patients) were used to HRQoL effects. Results from BetaBinomial (BB) standard mixed effects were compared. Preferences for ORs vs MDs were determined and Time to Deterioration (TD) was also compared.
Results
HRQoL effects using ORs offered coherent interpretations: MDs >0 resulted in ORs >1 and vice versa; effect sizes were classified as ‘Trivial’ if the OR was between 1 ± 0.05 (i.e. 0.95 to 1.05); ‘Small’: for 1 ± 0.1; ‘Medium’: 1 ± 0.2 and ‘Large’: OR <0.8 or >1.20. Small HRQoL effects on the MD scale may translate to important treatment differences on the OR scale: for example, a worsening in symptoms (MD) by 2.6 points (p = 0.1314) would be a 17 % deterioration (p < 0.0001) with an OR. Hence important differences may be missed with MD; conversely, small ORs are unlikely to yield large MDs because methods based on OR model skewed data well. Initial evidence also suggests oncologists prefer ORs over MDs since interpretation is similar to hazard ratios.
Conclusion
Reporting HRQoL benefits as MDs can be misleading. Estimates of HRQoL treatment effects in terms of ORs are preferred over MDs. Future analysis of QLQC30 and other HRQoL measures should consider reporting HRQoL treatment effects as ORs.
Background
Health related quality of life (HRQoL) is an important endpoint in cancer trials for several reasons. First, where effect sizes are small, HRQoL can ‘add value’ to expensive cancer treatments. Secondly, considerable time is spent completing instruments for the purpose of estimating the impact of treatments on HRQoL. Therefore, such efforts should result in HRQoL effects that are meaningful and interpretable, especially where HRQoL is a primary or coprimary endpoint [1]. Thirdly, some anticancer treatments exhibit serious sideeffects, despite improvements in overall survival (OS); HRQoL is also reported to be a predictor of survival in lung cancer patients [2], the leading cause of death among cancers [3]. It would be important to understand for example, how survival differs between patients with ‘poor’ baseline HRQoL, compared to those with ‘Good’ HRQoL. Finally, HRQoL outcomes are often required for costeffectiveness analyses and drug reimbursement [4, 5]. Therefore, understanding and interpreting HRQoL data is crucial in evaluating cancer treatments.
The EORTCQLQC30 (QLQC30) is a widely used cancer specific instrument [6]. The instrument has 30 questions from which 15 domains (sub scales) are determined, consisting of 5 ‘function’ scales, 8 ‘symptom’ scales, a global quality of life (QL) scale and a finance scale (FI). For QL and function domains, high scores indicate better HRQoL. For symptom domains (and FI), low scores indicate better HRQoL.
Treatment effects from the QLQC30 are often reported as mean differences (MDs) [7], despite scores having heavily skewed distributions with ceiling effects (many patients with scores of 0 or 100) and censored data due to progressive disease, death or failure to complete questionnaires. The interpretation of HRQoL MDs can be more complicated than survival endpoints. Consequently, alternative measures of treatment effect have been proposed.
Maringwa suggests a minimally important ‘difference over time’ as a measure of effect [8]. The area under the curve (AUC) can be difficult to interpret, although useful for reducing multiple observations to a single value [9]. However, if HRQoL is measured at a few time points (e.g. baseline and month 12), the AUC will have limited value. Moreover, the interpretation of the effect can become tricky (e.g. for HRQoL scores of 100 at each of 0, 1 and 2 months, the AUC score is but the original HRQoL scale is 0 to 100).
Categorizing scores: e.g. improvements in symptoms from ‘moderate’ or ‘severe’ (67–100 points at baseline) to ‘non’ or ‘little’ (0 to 33 points) was proposed by Langendjik [10]. Reck and Norman [11, 12] suggested ‘noted’ changes in HRQoL occur when a ‘shift’ of greater than half of the baseline standard deviation is observed). Time to HRQoL deterioration (TD) has been suggested (Anota) [13]. However different definitions of ‘deterioration’ lead to different conclusions and median TD may not be estimable (e.g. few events) and further complicated by nonproportional hazards (PH). Interpretation of effects with TD using HRs is however similar to ORs. Reporting a ‘Trend’ is also a way of describing HRQoL over time (Schaake) [14], although difficult to interpret (e.g. how much ‘more trend’ is there for experimental vs. control?).
The above measures of HRQoL effects can be difficult to interpret for patients and clinicians. The mean is often the statistic of choice to define treatment effect sizes for HRQoL endpoints in most of these measures.
One commonly reported clinically relevant effect size proposed by Osoba and King [6, 15, 16] is ≥10 points MD (on any domain), a value used as a benchmark by researchers to determine whether HRQoL benefits exist [7]. Some researchers interpret a 10 point improvement as a difference between treatments, while others as a 10 point change (improvement) from baseline (Hirsh) [6, 17], which is not always possible. For example, if a patient scores 8 points (or 92 points) at baseline, a reduction (or increase) of 10 points is not possible. Moreover, ‘important’ treatment differences need not be the same for symptom as functional scales. A worsening of 5 points in a symptom scale may be more important than a 10 point improvement in a functional scale.
For HRQoL endpoints, the magnitude of effect sizes are often considered to be clinically relevant if a difference of 10 points is observed, regardless of whether HRQoL is a primary or secondary outcome. Such requirements are not expected of other secondary clinical endpoints in cancer trials (e.g. time to progression (TTP)). One reason may be that secondary endpoints are not powered or there is a clinical rationale that the secondary outcome cannot be expected to yield effects similar to primary endpoints. In a similar vein, effect sizes should not be expected to be uniform across HRQoL domains for demonstrating treatment benefit because some smaller effect sizes (e.g. < 10 points) may be important. In this research we attempt to show that some small effect sizes on a MD scale might be dismissed as clinically irrelevant but remain important on a relative scale.
Little attention has been given to smaller HRQoL effects (MDs) which are often glossed over unless a ‘statistically significant’ pvalue is reported alongside. Small MDs tend to be perceived as offering limited HRQoL benefit but can mask important improvements, particularly when data are analysed using an alternative scale (e.g. OR scale). This presents a challenge for setting thresholds for defining clinically relevant HRQoL effect sizes. Moreover, ORs can facilitate an interpretation of effects similar to hazard ratios (HR), familiar to many oncologists (OR are interpreted in a similar way to HRs).
Therefore, in this article after presenting baseline characteristics, we offer effect size categories based on the OR and describe example situations of the relationship between ORs and MDs. We discuss aspects of statistical significance of small effects in the context of ORs and MDs and compare preferences between ORs vs MDs from several clinicians; Finally, we compare ORs and MDs with time a to deterioration (TD) approach (TD ≥5 points) following Anota [13].
Methods
Data
HRQoL data from six randomized controlled trials (RCT) conducted by the CRUK & UCL CTC were analayzed [9, 18–22]. These were selected because they comprised of all patient level QLQC30 data available in the CTC database from RCTs in lung cancer which had been published.

(i)
‘TOPICAL’: A phase III trial in NSCLC patients unfit for chemotherapy comparing erlotinib with placebo [18]; N = 670 patients.

(ii)
‘SOCCAR’: A phase II trial comparing concurrent vs. sequential chemotherapy in NSCLC patients [19]; N = 130.

(iii)
‘Study 10’: A phase II trial comparing Gemcitabine/Carboplatin versus Cisplatin/Etoposide in patients with small cell lung cancer (SCLC) [20]; N = 241.

(iv)
‘Study 11’: A phase III trial comparing Gemcitabine/Carboplatin versus Mitomycin/Ifosfamide /Cisplatin in patients with stage IIIB or IV NSCLC [9]; N = 422

(v)
‘Study 12’: A phase III trial comparing Thalidomide combined with chemotherapy versus chemotherapy alone in SCLC patients [21]; N = 724

(vi)
Study 14: A phase III trial comparing Thalidomide/Gemcitabine/Carboplatin versus Gemcitabine/Carboplatin alone in NSCLC patients [22]; N = 722
Assessments
Data were collected during clinic visits and questionnaires returned by patients during follow up; QLQC30 was assessed at several time points including baseline, pre and post chemotherapy and at monthly intervals for at least 24 months or until disease progression.
Statistical analysis
Patient level HRQoL scores for each of the 15 domain scores were analysed using a a repeated measures [21, 22] analysis for reporting MDs and a more novel Beta Binomial (BB) model in a mixed model framework [23] for reporting ORs. For the BB model, responses were transformed to a (0,1) scale using the transformation [23] Ya/ba, where a and b are the minimum and maximum possible scores and Y the observed response. For example, a score of 80 is transformed as 80 0/(100a) = 80/100 = 0.8. Dichotomization is not required for a BB model to generate ORs.
The BB model has been used in a variety of applications [23–25]. Its advantages over standard (linear) models in terms of statistical properties are widely reported [25, 26]. The BB is also flexible because it models scores at the extreme ends of the scale (e.g. many patients scoring 0 or 100), a common feature of QLQC30 scores, using zero–one inflated model [25, 26]. MDs were classified similar to those described by Cocks [7]; ‘Trivial’ (0–3 points), ‘Small’ (3–10 points), ‘Modest’/ ‘Medium’ (10–15 points) and ‘Large’ (>15 points). Similarly, ORs were classified as 1 ± 0.05 (‘Trivial’), 1 ± 0.1 (‘Small’), 1 ± 0.2 (‘Medium’) and <0.8 or >1.2 (‘Large’). Time to Deterioration (TD) was determined using the first time where scores reduced/increased by ≥ 5 points. Patients without deterioration were censored. A KaplanMeier and Cox proportional hazards (PH) analysis was carried out.
A pilot survey was carried out to determine preliminary evidence of whether clinicians and/or patients preferred ORs or MDs for expressing treatment effects. Three items, physical function (PF), Pain (PA) and cognitive function (CF) from the 15 domains were randomly selected and presented to each of five clinicians and their patients (where possible). Patients/clinicians were asked to state preferences for ORs or MDs (Additional file 1). Lower/High scores express preferences for ORs; scores close to 5 express indifference.
Results
Demographics and baseline characteristics
The median age was 64 years (range 27–86 years) with oldest patients in the TOPICAL trial (median age 77); 61 % were male; 67 % were ECOG (0–1), 24 % ECOG 2 and 9 % ECOG 3 (Table 1); less than half were stage IIIaIIIb (47 %) [9, 18–22]. Most QLQC30 responses were >90 % complete at baseline (Additional file 2: Table S1) with the exception of study 10 (about 60 % complete). More than 50 % of data were available for at least 5 time points.
Distribution of QLQC30
Most (>85 %) QLQC30 responses were very skewed (Fig. 1 & Additional file 2: Figure S1). For TOPICAL, 14/15 (93 %) of scores had alpha or beta values (special values associated with a BB distribution relating to the mean and variance) <1; KolmogorovSmirnov tests rejected normality (pvalue <0.001). Therefore, using the mean as a measure of HRQoL benefit and consequently MDs is not considered a suitable reporting metric for HRQoL scores. Statistical analysis should be conducted according to the underlying (true) distribution of the data. The distribution of QLQC30 scores from the six trials were not normally distributed in most (≥85 %) of cases.
Relationship between MDs and ORs
Few 4/90 (4 %) HRQoL treatment effects (MDs) were ‘Large’ (>15 points) or ‘Medium’ (10–15 points); 27/90 (30 %) were ‘Small’ (3–10 points) and 59/90 (66 %) ‘Trivial’ (0–3 points) MDs; For ORs, 22/90 (24 %) were ‘Large’ (effects > 20 %) or ‘Medium’ (effects between 10 % to 20 %) with the rest being ‘Small’ or ‘Trivial (10 % and 5 % respectively). ORs were therefore more than seven times more likely to detect larger differences which can yield up to 20 % improvements in HRQoL ([0.24/0.76]/[0.04/0.96]) compared with MDs (Tables 2 and 3).
Additional file 2: Figure S2 shows the relationship between MDs and ORs and shows general agreement in terms of the direction of effects (i.e. observations in the upper right quadrant are ORs >1 and MDs >0; estimates in the lower left are ORs < 1 and MDs <0).
Four examples are provided to understand the relationship between ORs and MDs.
Example 1: when MDs are small but ORs are large
In the TOPICAL Trial the MD for constipation (CO) symptoms were 2.6 points (p = 0.1314) while this was an OR of 1.17 (p < 0.0001) – the choice of interpretation is ‘a worsening in CO by a mean difference of 2.6 points with erlotinib compared to placebo’ vs ‘patients are 17 % more likely of having worsening CO symptoms with erlotinib compared to placebo’. The MD scale gives the impression that CO symptoms worsens by a ‘Trivial’ amount of 2.6 points (Table 2). This tends to occur when responses are skewed (Fig. 1 and Additional file 2: Figures S1, S2 and S3). In the presence of heavily skewed data, the OR is a suitable choice for presenting HRQoL effects from the QLQC30.
Example 2: when MDs are ‘Large’ but ORs are ‘Medium’ or ‘Small’
In the TOPICAL trial, patients had worse diarrhoea (DI) with erlotinib: MD of 15.1 (‘Large’ effect) points (p <0.001) with a corresponding OR of 1.12 (p = 0.0505). The DI scores were considerably skewed (Fig. 1) which might explain why the larger MD corresponded with only 12 % (‘Medium’ effect) higher odds of diarrhoea with erlotinib compared to placebo (OR = 1.12). The OR appears to have modified the ‘Large’ effect size (borderline significance) to a smaller (nonsignificant) effect size.
Example 3: when MDs are ‘Medium’ but ORs are ‘Large’
In study 10, RF improved by a MD of about 13 points (Table 2) with the experimental treatment – a ‘Medium’ effect. Using an OR, this was an improvement in role function by almost 30 % (OR =1.29 ‘). On examination of Additional file 2: Figure S1, responses fell into only three distinct categories at 0, 50 and 100 and scores were not Normally distributed making use of the MD questionable. The OR approach has relegated a ‘Medium’ effect to a ‘Large’ effect.
Example 4: when MDs and ‘ORs agree on the direction of effects
In the TOPICAL trial, two of the MDs (MD of 3.2 and 3.6 in TOPICAL; pvalues of 0.0017 and 0.0007 for PF and CF respectively) had corresponding ORs of 1.10 and 1.14 (pvalue = 0.0168 and 0.0107). Both MDs and ORs are in agreement that PF and CF are improving with the experimental treatment. Hence, on average, patients had 10 % and 14 % higher odds of improved PF and CF on erlotinib compared with placebo respectively (Table 2).
The above are a limited number of examples reflecting the challenges associated with defining thresholds of HRQoL differences with the MD. Another issue that can complicate interpretation is when small effects become difficult to interpret and justification is made through statistical significance. Statistical significance of small HRQoL effects are often reported, but the clinical relevance not always discussed. Table 3 shows that 28/90 (31 %) of ‘small’ or ‘Trivial’ effects based on MD were statistically significant compared with 7/90 (8 %) for ORs.
Example 5: Potentially unreliable statistically significant conclusions using MD
In study 12, for Diarrhoea, the MD was −2.3 (p = 0.0017). The corresponding OR was 1.05 (p = 0.2909). The clinical relevance of the small improvement in DI symptoms with experimental treatment might be difficult to judge. On the ORs scale, DI is actually shown to be worse: a 5 % likelihood of worsening diarrhoea (a common side effect with this chemotherapy) on the experimental treatment. Examination of Additional file 2: Figure S2 shows heavily skewed DI scores – with about 15 % of patients showing worsening DI symptoms. The choice of a mean statistic here is likely to lead to an unreliable or unexpected statistical conclusion. Further examples of differing statistical conclusions between ORs and MDs are shown in Additional file 2: Tables S2, S3.
Effect size classification for ORs and MDs
Estimates for OR effect size categories similar to those described earlier [7] were determined using a cumulative frequency plots from MDs and ORs (Fig. 2 and Additional file 2: Tables S2, S3, S4). Effect sizes in terms of ORs were broadly classified as: ‘Trivial’: ORs within ±5 % of 1 (i.e. ORs between 0.95 and 1.05); ‘Small’ effects (ORs 1.05 1.10 or 0.90 – 0.95); ‘Medium effects (ORs 1.10 – 1.20 or 0.800.90) and ‘Large’ effects ORs either >1.20 or <0.80. Additional file 2: Table S4 shows that 12/59 (20 %) of ‘Trivial’ effects based on MDs might be clinically important because on an OR scale these were ‘Medium’ or ‘Large’. Consequently some clinically important effects may be missed using MDs.
Figure 2 shows median HRQoL effect sizes are 2.5 points (half of effect sizes are ≤2.5), roughly equivalent to 7 % changes in HRQoL on the OR scale; similarly for the lower and upper quartiles, 25 % of effect sizes ≤1 point or 4 % changes on the OR scale; and 75 % of effect sizes are ≤3.6 points (ORs of about 1.10).
Secondly, for effect sizes of 1, 3, 5 10 and >15 points, the equivalent ORs are about 1.02, 1.07, 1.13, 1.25 and 1.37 respectively. The threshold for a large effect size of >15 points is challenging: patients expected to improve/worsen by almost 40 %. This may be a difficult target for some cancer drugs to achieve when compared with each other.
Summary of preference scores from survey
Five lung cancer clinicians completed a pilot (Additional file 1) survey (London UCH, Liverpool, Leeds, Chester and Imperial College London). At this time no patient responses were available. Hence a total of 15 scores from 5 clinicians who expressed preferences for either ORs or MDs for each of PF, Pain and CF were analysed. Stronger preferences were expressed for ORs over MDs: mean scores of 2.4, 3.1 and 2.8 for PF, Pain and CF respectively. Hence, initial evidence suggests clinician preference was greater for ORs than MDs. The results would need to be confirmed in a larger sample.
Comparison with time to deterioration
The time it takes for a patient to deteriorate from baseline by ≥5 was not possible for about 13 % HRQoL domain scores due to too few events (i.e. patients did not show of ≥5 points). Moreover, a TD of ≥5 points was not always possible because scores were clustered in values such as 16.7, 33.3 and 66.6 (e.g. as in CF scores for TOPICAL Fig. 1). No patient experienced (or could experience) a TD of exactly 5, 10 or 15 points (the possible values of the QLQC30 for CF were only 0, 16.7, 33.3, 50.0, 66.7, 83.3 and 100). The median TD (Additional file 2: Table S5) was not calculable for some symptom and function scores: for CF, a HR of 1.05 (p = 0.241) was reported: patients had a 5 % increased risk of deteriorating (≥5 point reduction) CF with erlotinib compared to placebo. The OR of 1.14 and MD of 3.2 in contrast show improvements in CF. The definition of deterioration is therefore critical for a valid estimate to be possible. When the TD for CF was changed to ≥16 points (‘Large’ effect), the medians become calculable as 77 vs 87 months for erlotinib vs placebo (HR = 0.92; p = 0.56): the risk of deterioration in CF was slightly worse (by 8 %) with erlotinib compared to placebo. The Kaplan Meier curves cross and the PH assumption was violated, a complication the OR analysis avoids.
Conclusion
An alternative metric to the commonly reported MD was presented in the form of ORs. Skewness of QLQC30 scores might render statistical and clinical interpretation of MDs questionable. Alternative effect size categories for ORs were proposed. We have also shown a relationship between ORs and MDs for QLQC30 measures; ORs can on the one hand reveal important HRQoL effects which might otherwise be missed with MDs, particularly those perceived to be ‘Trivial’ or ‘Small’. Conversely, effect sizes based on MDs thought to be ‘Medium’ or ‘Large’ may appear less exaggerated with ORs; Treatment effects from TD type analyses did not always result in estimates of effect sizes and interpretations were complicated by non PH assumptions. Finally we showed results from a pilot survey which suggest oncologists may prefer ORs over MDs for interpreting QLQC30 effects.
The use of the ORs has been used previously in HRQoL data. Feddern et al. (2015) [27] reports them for assessment of pain; Chie et al. (2015) [28] uses a propensity score (logistic regression) approach to report odds of HRQoL deterioration; Kurita et al. (2015) [29] use ORs with the QLQC30 in renally impaired patients. In these analyses scores were dichotomized in order to generate the ORs. In our analysis, no such dichotomization (and consequent loss of information) was required due to flexibility of the BetaBinomial regression approach.
Patient and clinician understanding of MDs have not been previously shown to be concordant [7] and this may in part be due to how HRQoL benefits are expressed to patients. Clinicians and patients may find it easier to agree on relative quantities than absolute differences. The pilot survey results may support relative quantities. The choice between interpretations such as: “your diarrhoea will be worse with the new treatment by 15 points, on average” instead of: “the likelihood of diarrhoea with the new treatment is significantly higher by about 11 % compared to placebo”, is a matter of preference, but the latter may be appealing for some. Aligning understanding of smaller effect sizes is increasingly important with the emergence of novel treatments for lung cancer being compared with each other (and not just placebo).
There are several advantages and disadvantages of both MDs and ORs. First, ORs evaluate relative (instead of absolute) treatment effects. For objective endpoints, absolute differences (e.g. 4 vs 3 months survival) may provide easier interpretations of treatment benefits (although the effects are median and not mean differences in cancer trials). However, HRQoL are selfreported endpoints for which even the most experienced clinician has difficulty interpreting. For such endpoints, a relative scale may be more useful. If treatment effects from primary endpoints are judged by relative quantities (e.g. hazard ratios), there are no reasons why treatment effects from HRQoL endpoints should not also be assessed this way. Both survival time and HRQoL share some similar distributional properties (e.g. skewed or censored). There is some concern that effects near the boundaries (floor/ceiling) will be overvalued with ORs compared to effects around the middle. However, such concerns can be addressed through the use of zero–one inflated models (Khan, 2014) [25] which model the over/under dispersion.
Secondly, the OR model assumes a fixed odds ratio over time (i.e. the effect is constant over time), which may not hold in a longitudinal QoL setting. Reliable interpretation of MDs also depends on an absence of treatment by time interactions (i.e. ORs and MDs are not dependent on specific time points). Thirdly, statistical models for MDs will provide predicted patient level HRQoL responses. For example, a patient taking experimental treatment with a certain demographic profile might yield a predicted PF score (e.g. 5 points). Similarly, a model for estimating ORs can be used to predicted a probability of a achieving a specific PF score for a given patient (group of patients) on the experimental treatment (response curves are advocated by the FDA for patient reported outcomes) [30].
The suggested effect size of >10 units on the QLQC30 was proposed almost two decades ago when fewer treatment comparators were available [15]. Few (about 2 %) MDs were >10 points and this research confirms earlier conclusions that small changes in HRQoL can be important (Cella, 2002) [7, 31]. Importantly, the implications of skewed distributions were not factored in when the magnitude of effect sizes were defined in earlier research.
There are several strengths and limitations of this analysis. First, a large sample size is used from clinical trials in similar groups of patients. Secondly, established criteria for classifying effect sizes were used for MDs [7]. Third, the BB model is a robust approach to analysing skewed data with ceiling effects, without arbitrary dichotomisation of responses. Finally, interpreting ORs is similar to that of HRs which many oncologists are familiar with.
Although the BB approach offers an alternative approach to analyse and interpret HRQoL effects, it is more complex. The complexity is outweighed by the benefits of reliable and potentially easier to interpret estimates of effect. A further limitation is that analysis has been restricted to lung cancer patients, but can be applied to other tumour types and disease areas. The classifications suggested for ORs in this analysis are arbitrary (even if based on the observed data) and different results can occur with alternative categories. Definition of effect sizes may require some threshold to be set which may necessarily be subjective. However, a starting point in our view is that the most appropriate metric is used to present HRQoL effects in cancer patients, an area for further research. The initial survey results too should also be confirmed in a larger sample size.
Treatment effects for HRQoL from the QLQC30 should be reported using relative quantities such as ORs which appear to be clinically intuitive, easier to interpret and where analysis involves modelling the skewed distribution of responses.
Highlights
The highlights of this paper are:

Mean differences in HRQoL are difficult to interpret for clinicians and patients alike, especially when the difference is small.

An alternative measure to reporting and interpreting HRQoL treatment differences using a relative quantity such as an odds ratio can greatly facilitate patient –clinician understanding of a ‘relevant’ HRQoL improvement.

We offer a way in which mean differences in HRQoL can be interpreted as approximate odds ratios. Effect sizes are categorized as ‘Trivial, ‘Small’ ‘Medium’ and ‘Large’ for odds ratios in a similar way to mean differences

Although the BB approach offers an alternative approach to analyse and interpret HRQoL effects, it is more complex. The complexity is outweighed by the benefits of reliable and potentially easier to interpret estimates of effect.

Our approach will allow patients and clinicians to align their understanding of treatment benefits using HRQoL outcomes.
References
 1.
Stephens RJ, Hopwood P, Girling DJ, Machin D. Randomized trials with quality of life endpoints: are doctors' ratings of patients' physical symptoms interchangeable with patients' selfratings? Qual Life Res. 1997;6(3):225–36.
 2.
Lemonnier I, Guillemin F, Arveux P, ClémentDuchêne C, Velten M, WoronoffLemsi MC, et al. Quality of life after the initial treatments of nonsmall cell lung cancer: a persistent predictor for patients' survival. Health Qual Life Outcomes. 2014;12:73. doi:10.1186/147775251273.
 3.
Cancer Research UK. (2014a). Lung cancer Key Facts. Available: http://www.cancerresearchuk.org/cancerinfo/cancerstats/keyfacts/lungcancer/cancerstatskeyfactsonlungcancerandsmoking. Last accessed 10th Oct 2015.
 4.
Brazier, J.E., Rowen, D. NICE DSU Technical Support Document 11: Alternatives to EQ5D for generating health state utility values; 2011. Available from http://www.nicedsu.org.uk.
 5.
Khan. I; Design & Analysis of Clinical Trials For Costeffectiveness & Reimbursement; 315 Pages Chapman & Hall; 2015 (in press)
 6.
Scott NW, Fayers PM, Aaronson NK, Bottomley A, de Graeff A, Groenvold M, et al. EORTC QLQC30 Reference Values. Brussels: EORTC Quality of Life Group Publications; 2008.
 7.
Cocks K, King MT, Velikova G, Martyn StJames M, Fayers PM, Brown JM. Evidencebased guidelines for determination of sample size and interpretation of the European Organisation for the Research and Treatment of Cancer Quality of Life Questionnaire Core 30. J Clin Oncol. 2011;29(1):89–96. doi:10.1200/JCO.2010.28.0107. Epub 2010 Nov 22.
 8.
Maringwa J, Quinten C, King M, Ringash J, Osoba D, Coens C, et al. Minimal clinically meaningful differences for the EORTC QLQC30 and EORTC QLQBN20 scales in brain cancer patients. Ann Oncol. 2011;22(9):2107–12. doi:10.1093/annonc/mdq726. Epub 2011 Feb 15.
 9.
Rudd RM, Gower NH, Spiro SG, Eisen TG, Harper PG, Littler JA, et al. Gemcitabine plus carboplatin versus mitomycin, ifosfamide, and cisplatin in patients with stage IIIB or IV nonsmallcell lung cancer: a phase III randomized study of the London Lung Cancer Group. J Clin Oncol. 2005;23(1):142–53. Study 11.
 10.
Langendijk JA, ten Velde GP, Aaronson NK, de Jong JM, Muller MJ, Wouters EF. Quality of life after palliative radiotherapy in nonsmall cell lung cancer: a prospective study. Int J Radiat Oncol Biol Phys. 2000;47(1):149–55.
 11.
Reck M, von Pawel J, Macha HN, Kaukel E, Deppermann KM, Bonnet R, et al. Efficient palliation in patients with smallcell lung cancer by a combination of paclitaxel, etoposide and carboplatin: quality of life and 6years'followup results from a randomised phase III trial. Lung Cancer. 2006;53(1):67–75. Epub 2006 May 19.
 12.
Norman GR, Sloan JA, Wyrwich KW. Interpretation of changes in healthrelated quality of life: the remarkable universality of half a standard deviation. Med Care. 2003;41(5):582–92.
 13.
Anota A, Hamidou Z, PagetBailly S, Chibaudel B, BascoulMollevi C, Auquier P, et. al., Time to healthrelated quality of life score deterioration as a modality of longitudinal analysis for healthrelated quality of life studies in oncology: do we need RECIST for quality of life to achieve standardization? Qual Life Res. 2013 Nov 26 doi:10.1007/s1113601305836.
 14.
Schaake W, de Groot M, Krijnen WP, Langendijk JA, van den Bergh AC. Quality of life among prostate cancer patients: a prospective longitudinal populationbased study. Radiother Oncol. 2013;108(2):299–305. doi:10.1016/j.radonc.2013.06.039. Epub 2013 Aug 8.
 15.
Osoba D, Rodrigues G, Myles J, Zee B, Pater J. Interpreting the significance of changes in healthrelated qualityoflife scores. J Clin Oncol. 1998;16(1):139–44.
 16.
King MT. The interpretation of scores from the EORTC quality of life questionnaire QLQC30. Qual Life Res. 1996;5(6):555–67.
 17.
Hirsh V. Are the data on quality of life and patient reported outcomes from clinical trials of metastatic nonsmallcell lung cancer important? World J Clin Oncol. 2013;4(4):82–4. doi:10.5306/wjco.v4.i4.82.
 18.
Lee SM, Khan I, Upadhyay S, Lewanski C, Falk S, Skailes G, et al. Firstline erlotinib in patients with advanced nonsmallcell lung cancer unsuitable for chemotherapy (TOPICAL): a doubleblind, placebocontrolled, phase 3 trial. Lancet Oncol. 2012;13(11):1161–70. doi:10.1016/S14702045(12)704126. Epub 2012 Oct 16.
 19.
Maguire J, Khan I, McMenemin R, O’Rourke N, McNee S, Kelly V, et al. SOCCAR: A randomised phase II trial comparing sequential versus concurrent chemotherapy and radical hypofractionated radiotherapy in patients with inoperable stage III NonSmall Cell Lung Cancer and good performance status. Eur J Oncol. 2014;50(17):2939–49.
 20.
Lee SM, James LE, Qian W, Spiro S, Eisen T, Gower NH, et al. Comparison of gemcitabine and carboplatin versus cisplatin and etoposide for patients with poorprognosis small cell lung cancer. Thorax. 2009;64(1):75–80. doi:10.1136/thx.2007.093872. Epub 2008 Sep 11.(Study 10).
 21.
Lee SM, Woll PJ, Rudd R, Ferry D, O'Brien M, Middleton G, et al. Antiangiogenic therapy using thalidomide combined with chemotherapy in small cell lung cancer: a randomized, doubleblind, placebocontrolled trial. J Natl Cancer Inst. 2009;101(15):1049–57. doi:10.1093/jnci/djp200. Epub 2009 Jul 16 (study 12).
 22.
Lee SM, Rudd R, Woll PJ, Ottensmeier C, Gilligan D, Price A, et al. Randomized doubleblind placebocontrolled trial of thalidomide in combination with gemcitabine and Carboplatin in advanced nonsmallcell lung cancer. J Clin Oncol. 2009;27(31):5248–54. doi:10.1200/JCO.2009.21.9733. Epub 2009 Sep 21.
 23.
Ferrari SLP, CribariNeto F. Beta regression for modelling rates and proportions. J Appl Stat. 2004;31(7):799–815.
 24.
Swearingen CJ, Castro, M.S.M. and Bursac, z. Inflated beta regression: Zero, one and everything in between. SAS Global Forum. 2012. http://support.sas.com/resources/papers/proceedings12/3252012.pdf. Access 10th Oct 2015.
 25.
Khan I, Morris S. A nonlinear betabinomial regression model for mapping EORTC QLQ C30 to the EQ5D3L in lung cancer patients: a comparison with existing approaches. Health Qual Life Outcomes. 2014;12:163.
 26.
Ospina R, Ferrari SLP. A general class of zeroorone inflated beta regression models. Comput Stat Data Anal. 2012;56:1609–23.
 27.
Feddern ML, Jensen TS, Laurberg S.Chronic pain in the pelvic area or lower extremities after rectal cancer treatment and its impact on quality of life: a populationbased crosssectional study Pain. 2015 Sep;156(9):1765–71. doi:10.1097/j.pain.0000000000000237.
 28.
Chie WC, Yu F, Li M, Baccaglini L, Blazeby JM, Hsiao CF, et. al., Qual Life Res. 2015 Oct;24(10):2499–506. doi:10.1007/s1113601509858. Epub 2015 May 6.
 29.
Kurita GP, Lundström S, Sjøgren P, Ekholm O, Christrup L, Davies A, et al. Renal function and symptoms/adverse effects in opioidtreated patients with Cancer. Acta Anaesthesiol Scand. 2015;59(8):1049–59. doi:10.1111/aas.12521. Epub 2015 May 5.
 30.
McLeod LD, Coon CD, Martin SA, Fehnel SE, Hays RD. Interpreting patientreported outcome results: US FDA guidance and emerging methods. Expert Rev Pharmacoecon Outcomes Res. 2011;11(2):163–9.
 31.
Cella D, Hahn EA, Dineen K. Meaningful change in cancerspecific quality of life scores: differences between improvement and worsening. Qual Life Res. 2002;11(3):207–21.
Acknowledgments
I would like to acknowledge the chief investigators of the trials for kindly agreeing to using the data (Prof. Siow MingLee for TOPICAL, Study 10, Study 12, Study 14; Prof. Robin Rudd for Study 11and Dr. Joe Maguire for the SOCCAR study). In addition CRUK and the London Lung Cancer Group (LLCG) are also acknowledged for funding the trials which led to this analysis being possible.
The data from this research came from Cancer Research UK Funded trials and the London Lung Cancer Group with Grant numbers: C11922/A4558 (TOPICAL), C1438/A4147 (SOCCAR), 1074994 (Study 10), 1074994 (Study 11) C1438/A2932 (Study 12) and an educational grant from Eli LIlly (Study 14).
Data sharing
No additional data available beyond what is reported in this manuscript.
Author information
Additional information
Competing interests
The authors have no conflict of interest to declare.
Authors’ contributions
Iftekhar Khan conceived the idea, performed the statistical analysis, wrote and interpreted the results and the text of the manuscript. Martin Forster and Zahid Bashir reviewed the text of the manuscript. All authors read and approved the final manuscript.
Additional files
Additional file 1:
Short survey questionnaire. (DOC 45 kb)
Additional file 2:
Supplementary tables and figures. Figure S1: Distribution of QLQC30 responses. a)SOCCAR, b)Study 10, c)Study 11, d)Study 12, e)Study 14. (xaxis is QLQC30 score on a scale of 0 to 1 and  y axis is relative frequency). Figure S2: Plot of Odds Ratios vs. MDs for all 15 domains (all trials). a) Overall, b) Functional Domain, c) Symptom Domain. Vertical reference lines are ±10 points for MDs and zero (no effect line). Horizontal reference lines are 1 (no effect) and 0.8 and 1.20 for ORs. Figure S3(a): Comparison of erlotinib vs placebo responses for EF in TOPICAL trial. (b): Comparison of erlotinib vs placebo responses for CO in TOPICAL trial. Example showing MDs not statistically different but ORs statistically significant: Higher proportion of placebo responses for lower scores and higher proportion of erlotinib responses in some categories of emotional function (EF) >0.6 (or 60); distribution is skewed. (DOC 955 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Khan, I., Bashir, Z. & Forster, M. Interpreting small treatment differences from quality of life data in cancer trials: an alternative measure of treatment benefit and effect size for the EORTCQLQC30. Health Qual Life Outcomes 13, 180 (2015) doi:10.1186/s1295501503746
Received
Accepted
Published
DOI
Keywords
 EORTCQLQC30
 Lung cancer
 Quality of life
 Beta binomial
 Treatment effect size
 MD: Mean Differences
 ORs: Odds Ratios