Health Related Quality of Life (HRQoL) outcome measures are being increasingly used in research trials, but less so in routine clinical practice. The interpretation of HRQoL scores raises many issues. [1–7] The scales and instruments used may be unfamiliar to many clinicians and patients, who may be uncertain of the meaning of the scale values and summary scores. [8]

Repeated experience and familiarity with a wide variety of physiological measures, such as blood pressure or forced expiratory volume, have allowed clinicians to interpret the results meaningfully. [9, 10] In contrast, the meaning of a change in score of x points on an HRQoL instrument is less intuitively apparent, not only because the scale has unfamiliar units, but also because health professionals seldom use HRQoL measures in routine clinical practice.

In clinical trials, where HRQoL instruments are being increasingly used as primary outcome measures, it is simple to determine the statistical significance of a change in HRQoL, but placing the magnitude of these changes in a context that is meaningful for health professionals, patients and other stakeholders (pharmaceutical and medical device developers, insurance payers, regulators, governments) has not been so easy. Ascertaining the magnitude of change that corresponds to a minimal important difference would help address this problem. [11] The perspective taken when setting a standard for important change can influence both the assessment approach and the way in which an important difference is determined. [5] The minimal important difference (**MID**), from the patient perspective, can be defined as "*the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's management*". [9]

Thus, individual change standards are needed to provide meaningful interpretation of HRQoL intervention and treatment effects and to classify patients, based on this standard, as improved, stable or declined. To date, two broad strategies have been used to interpret differences or changes in HRQoL following treatment: [12] distribution-based approaches – the effect size (ES); and anchor-based measures – the minimum clinically important difference (MCID).

Distribution-based approaches rely on relating the difference between treatment and control groups to some measure of variability. The most popular approach uses Cohen's standardised effect size [13]: the mean change divided by the standard deviation, which serves as an "effect size index" suitable for sample size estimation. Cohen suggested that standardised effect sizes of 0.2 to 0.5 should be regarded as "small", 0.5 to 0.8 as "moderate" and those above 0.8 as "large". Cohen's effect size may be influenced by the degree of homogeneity or heterogeneity in the sample. Distribution-based methods rely on expressing an effect in terms of the underlying distribution of the results. Investigators may express effects in terms of between-person standard deviation units, within-person standard deviation units, and the standard error of measurement. [2]
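As a minimal sketch of the distribution-based idea, the following Python computes a standardised effect size and maps it to the bands quoted above. The function names are hypothetical, and three choices are assumptions of this sketch rather than part of Cohen's guidance as summarised here: the sign of the effect is ignored, values below 0.2 are labelled "below small", and the shared boundary at 0.5 is assigned to "moderate".

```python
def standardised_effect_size(mean_change, standard_deviation):
    """Mean change divided by the standard deviation."""
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return mean_change / standard_deviation


def cohen_label(effect_size):
    """Map an effect size to the bands quoted in the text.

    Assumptions of this sketch: the sign is ignored, values under 0.2 are
    called "below small", and exactly 0.5 is treated as "moderate".
    """
    magnitude = abs(effect_size)
    if magnitude < 0.2:
        return "below small"
    if magnitude < 0.5:
        return "small"
    if magnitude <= 0.8:
        return "moderate"
    return "large"


if __name__ == "__main__":
    # Illustrative numbers only, not taken from any dataset.
    es = standardised_effect_size(mean_change=3.0, standard_deviation=10.0)
    print(es, cohen_label(es))
```

Because the denominator is a sample standard deviation, the same mean change yields a larger effect size in a homogeneous sample than in a heterogeneous one, which is the sensitivity noted above.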

Four statistics commonly used to index responsiveness are: [14]

1. the paired *t*-statistic;
2. the effect size statistic;
3. the standardised response mean; [17]
4. the responsiveness statistic. [18]

The formulae for these statistics are as follows, where D = raw score change on measure; SE = standard error of the difference; SD = standard deviation at time 1; SD* = standard deviation of D; SD# = standard deviation of D among stable subjects (those whose true status is constant over time):

Paired *t*-statistic = D/SE

Effect size (ES) statistic = D/SD

Standardised response mean (SRM) = D/SD*

Responsiveness statistic = D/SD#

The paired *t*-statistic is best suited to pre-post assessments of interventions of known efficacy. The effect size statistic relates change over time to the standard deviation of baseline scores. The standardised response mean compares change to the standard deviation of change. The responsiveness statistic looks at HRQoL change relative to variability for clinically stable respondents. The effect size statistic ignores variation in change entirely, the *t*-statistic ignores information about variation in scores for clinically stable respondents, and the responsiveness statistic ignores information about variation in scores for clinically unstable respondents.
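The four indices can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a reference implementation: the function name, argument names and the `stable` flag are hypothetical, D is taken as the mean of the individual score changes, sample (n − 1) standard deviations are used throughout, and SE is computed as SD* divided by the square root of the number of pairs, as in the usual paired *t*-test.

```python
import math
import statistics


def responsiveness_statistics(baseline, follow_up, stable):
    """Return the four responsiveness indices for paired scores.

    baseline, follow_up: equal-length sequences of scores at time 1 and time 2.
    stable: sequence of booleans, True where the subject's true status is
            judged constant over time (hypothetical flag for this sketch).
    """
    if not (len(baseline) == len(follow_up) == len(stable)):
        raise ValueError("all inputs must have the same length")

    changes = [after - before for before, after in zip(baseline, follow_up)]
    stable_changes = [c for c, is_stable in zip(changes, stable) if is_stable]

    d = statistics.mean(changes)                   # D: mean raw score change
    sd_baseline = statistics.stdev(baseline)       # SD: standard deviation at time 1
    sd_change = statistics.stdev(changes)          # SD*: standard deviation of D
    se_change = sd_change / math.sqrt(len(changes))  # SE (assumed paired-t form)
    sd_stable = statistics.stdev(stable_changes)   # SD#: SD of D among stable subjects

    return {
        "paired_t": d / se_change,
        "effect_size": d / sd_baseline,
        "standardised_response_mean": d / sd_change,
        "responsiveness": d / sd_stable,
    }


if __name__ == "__main__":
    # Made-up scores for six subjects, for illustration only.
    baseline = [0.60, 0.55, 0.70, 0.65, 0.50, 0.75]
    follow_up = [0.66, 0.58, 0.71, 0.73, 0.49, 0.80]
    stable = [False, True, True, False, True, False]
    for name, value in responsiveness_statistics(baseline, follow_up, stable).items():
        print(f"{name}: {value:.3f}")
```

All four share the numerator D and differ only in the denominator. Under the assumptions above, the paired *t*-statistic is the standardised response mean multiplied by the square root of the number of pairs, so it grows with sample size while the other three do not.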

Anchor-based methods examine the relationship between an HRQoL measure and an independent measure (or anchor) to elucidate the meaning of a particular degree of change. Thus, anchor-based approaches require an independent standard or anchor that is itself interpretable and at least moderately correlated with the instrument being explored. [2] One anchor-based approach uses an estimate of the MID: the difference on the HRQoL scale corresponding to a self-reported small but important change on a global scale. [9]
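One way to picture this anchor-based estimate is as the mean HRQoL score change among respondents whose global rating indicates a small but important change. The Python sketch below assumes exactly that; the function name, the rating labels and the example data are hypothetical, and the handling of direction (improvement versus deterioration) is deliberately left out.

```python
import statistics


def anchor_based_mid(score_changes, global_ratings, small_change_labels):
    """Mean score change among respondents reporting a small but important change.

    score_changes: change in HRQoL score for each respondent.
    global_ratings: each respondent's answer on the global (anchor) scale.
    small_change_labels: the anchor categories taken to mean "small but
        important change" (hypothetical labels chosen by the analyst).
    """
    selected = [
        change
        for change, rating in zip(score_changes, global_ratings)
        if rating in small_change_labels
    ]
    if not selected:
        raise ValueError("no respondents reported a small but important change")
    return statistics.mean(selected)


if __name__ == "__main__":
    # Made-up data for five respondents, for illustration only.
    changes = [0.05, 0.03, 0.00, 0.10, 0.04]
    ratings = ["somewhat better", "somewhat better", "same",
               "much better", "somewhat better"]
    print(anchor_based_mid(changes, ratings, {"somewhat better"}))
```

The estimate is only as good as the anchor: which global rating categories count as "small but important" is a judgement made by the analyst, not something the calculation can settle.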

Norman *et al* mention several problems with the global assessment of change, including that the reliability and validity of the global scale have not been established and that the judgement of change is psychologically difficult. [19] Another limitation of the global rating is that it does not represent a criterion or gold standard for the assessment of change, and yet we use the global rating as an anchor to define small, medium and large changes. [9, 11]

No single approach to interpretability is perfect. As Guyatt *et al* suggest, the use of multiple strategies is likely to enhance the interpretability of any particular instrument. [2] Therefore, we used both distribution-based and anchor-based approaches to try to establish the interpretability of the SF-6D, a new single summary preference-based measure of health derived from the SF-36.

The SF-36 is one of the most widely used HRQoL outcome measures in the world today. It contains 36 questions measuring health across eight dimensions – physical functioning, role limitation because of physical health, social functioning, vitality, bodily pain, mental health, role limitation because of emotional problems and general health. Responses to each question within a dimension are combined to generate a score from 0 to 100, where 100 indicates "good health". [20] Thus, the SF-36 generates a profile of HRQoL outcomes (on up to eight dimensions), which makes statistical analysis and interpretation difficult. [8]

The developers of the SF-36 have suggested that, on the general health dimension, a five-point difference (on the 0–100 scale) is the smallest score change achievable by an individual and can be considered 'clinically and socially relevant'. [21] Angst *et al* found the MCID ranged from 3.3 to 5.3 points on the physical function dimension and 7.2 to 7.8 points on the bodily pain dimension in patients with osteoarthritis of the hip or knee. [22] Hays and Morales also provide information on what a clinically important difference is for the SF-36 scales. They conclude that the MCID for the SF-36 is "typically in the range of 3–5 points", although they also recommend caution in interpreting 3–5 points on the SF-36 dimensions as the MCID. [23]

The method of scoring the SF-36 is not based on preferences. The simple scoring algorithm for the eight dimensions assumes equal intervals between the response choices and that all items are of equal importance, which may not be appropriate. The SF-6D is a new single summary preference-based (utility) measure of health derived from the SF-36. [24, 25] Empirical work is required to determine the smallest change in SF-6D scores that can be regarded as important. We used anchor-based methods to determine the MID for the SF-6D for various datasets.