### The model

The SF-36 assesses health across eight dimensions using 36 items and produces a score on a 0–100 scale for each dimension. The dimensions are specific health domains such as physical functioning, social functioning and vitality. These scores are not comparable across dimensions and are not based on individual preferences; therefore, they cannot be used directly to generate QALYs. The SF-36 can, however, be used to generate a preference-based index via the SF-6D [5].

The EQ-5D is the most widely used generic preference-based measure of health-related quality of life. It produces utility scores anchored at 0 for dead and 1 for perfect health, and these scores represent preferences for particular health states. The descriptive system has 5 dimensions (mobility, self-care, usual activity, pain/discomfort and anxiety/depression), each with 3 levels (no problems, some problems, extreme problems), which together define 243 unique health states. This study uses the UK TTO value set in its main analysis [6]. The EQ-5D valued using the UK TTO value set is preferred by NICE [1]. The SF-6D has been found to differ from the EQ-5D [7], and so, to achieve comparability between studies using different measures, this paper explores an alternative strategy of mapping.
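As a quick check on the count of health states, the descriptive system can be enumerated directly. The sketch below is illustrative only; the five-digit string encoding (for example `11111` for no problems on any dimension) and the function name are assumptions made here, not part of the study.

```python
from itertools import product

# Dimension names as listed in the text; levels coded 1-3
# (1 = no problems, 2 = some problems, 3 = extreme problems).
DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activity",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3)


def enumerate_states():
    """Return every EQ-5D profile as a 5-character string of levels."""
    return [
        "".join(str(level) for level in profile)
        for profile in product(LEVELS, repeat=len(DIMENSIONS))
    ]


states = enumerate_states()
print(len(states))  # 3**5 = 243
```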

### Model specifications

Regression analysis is used to examine the relationship between the EQ-5D utility score and the SF-36. The explanatory variables are the 8 dimension scores (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional and mental health), the squared dimension scores, and interaction terms derived using the product of two dimension scores. The dependent variable, the EQ-5D utility score, is measured on a -1 to 1 scale. The 8 dimension scores of the SF-36 are rescaled onto a 0–1 scale to enable easier interpretation of the results, and the squared terms and interaction terms are generated using the rescaled scores.
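The construction of the explanatory variables can be sketched as follows. This is a minimal illustration, not the code used in the study: the abbreviations (`PF`, `RP`, and so on), the function name, and the choice to form all 28 pairwise products are assumptions made here.

```python
from itertools import combinations

# Short labels for the 8 SF-36 dimensions (labels chosen for this sketch).
DIMS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]


def build_regressors(scores):
    """Build regressors from one respondent's 0-100 dimension scores.

    Returns three dicts: rescaled scores (x), squared terms (r) and
    pairwise interaction terms (z), all based on the 0-1 rescaled scores.
    """
    x = {d: scores[d] / 100.0 for d in DIMS}
    r = {f"{d}^2": x[d] ** 2 for d in DIMS}
    # Assumption: every pair of distinct dimensions gets an interaction term.
    z = {f"{a}*{b}": x[a] * x[b] for a, b in combinations(DIMS, 2)}
    return x, r, z


x, r, z = build_regressors({d: 50 for d in DIMS})
print(len(x), len(r), len(z))  # 8 main effects, 8 squares, 28 interactions
```

Under these assumptions model (1) uses only `x`, model (2) adds `r`, and model (3) adds `z`.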

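Three models are estimated: (1) all dimensions; (2) all dimensions and squared terms; (3) all dimensions, squared terms and interactions. The general model, with intercept *α* and coefficient vectors **β**, **θ** and **δ**, is defined as

$$
y_{ij} = \alpha + \boldsymbol{\beta}'\mathbf{x}_{ij} + \boldsymbol{\theta}'\mathbf{r}_{ij} + \boldsymbol{\delta}'\mathbf{z}_{ij} + \varepsilon_{ij}
$$
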
where *i* = 1, 2, ..., *n* represents individual respondents and *j* = 1, 2, ..., *m* represents the 8 different dimensions. The dependent variable, *y*, represents the EQ-5D utility score, **x** represents the vector of SF-36 dimensions, **r** represents the vector of squared terms, **z** represents the vector of interaction terms and *ε*_{ij} represents the error term. This is an additive model which imposes no restrictions on the relationship between dimensions. The squared terms are designed to pick up non-linearities in the relationship between dimension scores and the EQ-5D index. There is no reason for this relationship to be linear: for physical functioning, for example, there is evidence that the same differences in scores at the lower end of the scale indicate larger differences in functioning than at the upper end [8]. Interaction terms are important since there is evidence from other measures that dimensions are not additive [9]. Statistical measures of explanatory power, predictive ability, and model specification are reported.

The sample used here is a patient dataset (described below) where respondents are included each time they are treated, and hence some respondents have multiple observations. Random effects models are used to take account of this data structure. The estimated models are used to generate predicted EQ-5D scores. Predictive ability is assessed using line graphs of the observed and predicted EQ-5D utility scores ordered by observed tariff value of EQ-5D state, mean error, mean absolute error and mean squared error.
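The three summary statistics of predictive ability can be computed as in the sketch below. The sign convention (predicted minus observed) and the function name are assumptions made for illustration; the text does not state which convention is used.

```python
def prediction_errors(observed, predicted):
    """Return mean error, mean absolute error and mean squared error.

    Errors are taken as predicted minus observed (an assumed convention).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
    }


# Toy values, not taken from the dataset.
stats = prediction_errors([1.0, 0.5, 0.0], [0.9, 0.6, 0.2])
print(stats)
```

The mean error can be close to zero even when individual predictions are poor, because positive and negative errors cancel; the absolute and squared measures do not have this property.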

EQ-5D utility scores are known to exhibit a ceiling effect, where a large proportion of subjects rate themselves in full health with a utility score of 1, and hence the data can be interpreted as being bounded or censored at 1. Ignoring the bounded nature of the EQ-5D will result in biased and inconsistent estimates, and hence the random effects tobit model is an appropriate alternative [10]. The tobit model with an upper censoring limit of 1 is defined as

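$$
y_i = \begin{cases} y_i^{*} & \text{if } y_i^{*} < 1 \\ 1 & \text{if } y_i^{*} \geq 1 \end{cases}
$$

where $y_i^{*}$ is the observed EQ-5D utility score and $y_i$ is the bounded measure of the EQ-5D score.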
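To make the role of the censoring limit concrete, the sketch below evaluates the log-likelihood of a pooled tobit with an upper limit of 1 under normal errors. It deliberately omits the random effect used in the study, and the function and argument names are assumptions made here; it is not the estimation routine that was run in STATA.

```python
import math


def normal_pdf(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def normal_cdf(v):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def tobit_loglik(y, mu, sigma, upper=1.0):
    """Log-likelihood of a tobit model censored from above at `upper`.

    y     : recorded scores (values at `upper` are treated as censored)
    mu    : linear predictions, one per observation
    sigma : standard deviation of the normal error term
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi >= upper:
            # Censored observation: probability that the latent value
            # lies at or above the limit, P(mu + e >= upper).
            total += math.log(normal_cdf((mi - upper) / sigma))
        else:
            # Uncensored observation: ordinary normal density.
            total += math.log(normal_pdf((yi - mi) / sigma) / sigma)
    return total


# Toy inputs, not taken from the dataset.
print(tobit_loglik([1.0, 0.62], [0.95, 0.70], sigma=0.2))
```

Observations recorded at 1 contribute a tail probability rather than a density, which is how the likelihood accommodates the ceiling effect.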

However, the tobit model also produces biased estimates in the presence of heteroscedasticity or non-normality [10, 11]. The censored least absolute deviations (CLAD) model is therefore also used here, since it produces consistent estimates in the presence of heteroscedasticity and non-normality [10, 12]. STATA version 9 was used for all regression analysis, and CLAD was performed using programs written for [13]. SPSS version 12 was used for statistical analysis.
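The criterion that CLAD minimises can be written down compactly: with an upper censoring limit of 1, the fitted value is the linear prediction capped at 1, and the objective is the sum of absolute deviations from it. The sketch below only evaluates this objective for a given coefficient vector; it does not reproduce the programs cited in [13], and the names used are assumptions made here.

```python
def clad_objective(y, X, beta, upper=1.0):
    """Sum of absolute deviations between y and the capped linear prediction.

    y    : recorded scores
    X    : list of regressor rows (include a leading 1 for the intercept)
    beta : coefficient vector with the same length as each row of X
    """
    total = 0.0
    for yi, xi in zip(y, X):
        linear = sum(b * v for b, v in zip(beta, xi))
        # Predictions above the censoring limit are capped at the limit.
        total += abs(yi - min(upper, linear))
    return total


# Toy inputs, not taken from the dataset.
y = [1.0, 0.4]
X = [[1.0, 1.0], [1.0, 0.5]]
print(clad_objective(y, X, beta=[0.0, 1.2]))
```

Because the criterion is based on the median rather than on a normal likelihood, it does not rely on homoscedastic or normally distributed errors.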

### Reliability and robustness

To examine whether the estimated relationships are reliable and robust across inpatient and outpatient settings and across medical conditions, we estimate model (3) as outlined above for subsets of the sample data^{i}. The model is estimated separately for inpatients and outpatients, and for the medical conditions of neoplasms, diseases of the circulatory system and diseases of the digestive system, as defined by ICD classifications C, I and K respectively.

### Comparison to existing mapping functions

Our models are compared to existing approaches [3, 4, 10] to determine whether their mapping approaches are more or less reliable for a patient dataset. The existing models from the literature are applied using the published results and algorithms rather than re-estimated using our dataset. We take this approach because mapping is used in economic evaluations to estimate the EQ-5D using the SF-36 (or SF-12) when this is the only health status measure that has been included in the trial. In practical applications, therefore, the published results and algorithms are used and it is not feasible to re-estimate the model.

Franks et al. [3] regress the EQ-5D utility score on PCS-12 and MCS-12, squared terms and cross-products using OLS. PCS and MCS are the physical and mental component summary scores estimated using factor analysis and shown to contain most of the information contained in the 8 dimensions of the SF-36 [14]. In accordance with this approach PCS-12 and MCS-12 are centred on the means used in the paper [3] and the published coefficients are used to produce predicted EQ-5D utility scores.^{ii} Another study [15] uses similar variables and estimation techniques to [3] in order to predict EQ-5D scores from the SF-12 and hence the model is not analysed here separately.

Gray et al. [4] use a response mapping approach that uses a multinomial logit model to estimate the probability that a respondent will choose a particular level for each dimension of the EQ-5D using responses to the 12 items included in the SF-12 (general health, climbing stairs, moderate activities, accomplish less due to physical health, work limitations, accomplish less due to emotional problems, work carefully, pain interference, calm, energy, down-hearted and low, interference with social activities). Subsequently predicted EQ-5D level responses for each dimension are generated using Monte Carlo simulation methods and the corresponding EQ-5D utility score for that health state is calculated. We use the available algorithm to predict EQ-5D utility scores [4].^{iii}
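The simulation step of a response mapping approach can be sketched as follows. This is an illustration of the general idea only, not the published algorithm of [4]: the predicted probabilities are supplied by the caller, the averaging over draws is an assumption made here, and `toy_tariff` is a hypothetical placeholder rather than the UK TTO value set.

```python
import random


def draw_level(probs, rng):
    """Draw an EQ-5D level (1, 2 or 3) from predicted probabilities."""
    return rng.choices((1, 2, 3), weights=probs, k=1)[0]


def simulate_utility(dimension_probs, tariff, n_draws=1000, seed=0):
    """Average the utility of health states drawn by Monte Carlo simulation.

    dimension_probs : five (p1, p2, p3) tuples, one per EQ-5D dimension
    tariff          : function mapping a 5-tuple of levels to a utility score
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        state = tuple(draw_level(p, rng) for p in dimension_probs)
        total += tariff(state)
    return total / n_draws


def toy_tariff(state):
    """Hypothetical placeholder tariff; NOT a published value set."""
    return 1.0 - 0.1 * sum(level - 1 for level in state)


# Hypothetical predicted probabilities for one respondent.
probs = [(0.7, 0.25, 0.05)] * 5
print(simulate_utility(probs, toy_tariff))
```

In an application the placeholder tariff would be replaced by the value set of interest, and the probabilities would come from the fitted multinomial logit models.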

Sullivan and Ghushchyan [10] regress the US EQ-5D utility score on PCS-12 and MCS-12, the product of PCS-12 and MCS-12, and sociodemographic variables using OLS, tobit and CLAD. It is not appropriate to use their exact model [10], because they use the US-based EQ-5D values [16] rather than the UK-based values [6] and, further, report only models that include sociodemographic variables unavailable in our dataset. Instead we have used the tobit and CLAD estimation techniques suggested in [10], as outlined above, and re-estimated the model using our dataset.

### The data

The Health Outcomes Data Repository, HODaR, is a dataset collated by Cardiff Research Consortium. The data are collected from a prospective survey of inpatients and outpatients at Cardiff and Vale NHS Hospitals Trust, which is a large university hospital in South Wales, UK. The survey is linked to existing routine hospital health data to provide a dataset with sociodemographic, health-related quality of life and ICD classification data^{iv}. The survey includes all subjects aged 18 years or older and excludes individuals who are known to have died. The survey also excludes people with a primary diagnosis on admission of a psychological illness or learning disability. As well as information on inpatients, the survey includes outpatient clinics on a rotational basis, where all patients within the selected clinic are surveyed. The response rate in HODaR prior to October 2003 was around 36%, and strategies were subsequently implemented to improve response rates to around 50% [17].

The inpatient sample has 31,236 eligible observations across 27,620 individuals from August 2002 to November 2004, and of these there are 25,783 complete responses across 23,179 individuals for SF-36 and EQ-5D questions and hence this is the sample used here. The outpatient sample has 9,081 eligible observations across 8,610 individuals collected from June 2002 to November 2004, and of these there are 7,465 complete responses across 7,122 individuals. The dataset covers a wider range of conditions and severity than the general population datasets used in existing mapping approaches, and hence may be more similar to datasets used in economic evaluation.