Correlated physical and mental health summary scores for the SF-36 and SF-12 Health Survey, V.1

Background: The SF-36 and SF-12 summary scores were derived using an uncorrelated (orthogonal) factor solution. We estimate SF-36 and SF-12 summary scores using a correlated (oblique) physical and mental health factor model.

Methods: We administered the SF-36 to 7,093 patients who received medical care from an independent association of 48 physician groups in the western United States. Correlated physical health (PCSc) and mental health (MCSc) scores were constructed by multiplying each SF-36 scale z-score by its respective scoring coefficient from the obliquely rotated two-factor solution. PCSc-12 and MCSc-12 scores were estimated using an approach similar to the one used to derive the original SF-12 summary scores.

Results: The estimated correlation between SF-36 PCSc and MCSc scores was 0.62. There were far fewer negative factor scoring coefficients for the oblique factor solution compared to the factor scoring coefficients produced by the standard orthogonal factor solution. Similar results were found for PCSc-12 and MCSc-12 summary scores.

Conclusion: Correlated physical and mental health summary scores for the SF-36 and SF-12 derived from an obliquely rotated factor solution should be used along with the uncorrelated summary scores. The new scoring algorithm can reduce inconsistent results between the SF-36 scale scores and physical and mental health summary scores reported in some prior studies. (Subscripts c = correlated and uc = uncorrelated.)

36 that has two summary measures: the Physical (PCS-12) and Mental (MCS-12) Component Summary scores [2]. Higher scores represent better health.
The standard scoring algorithm for the SF-36 and SF-12 version 1 summary measures is based on a factor analytic technique that forces the scores to be orthogonal [2,3]. Figure 1 depicts the conceptual framework on which the orthogonal component summary scores are based. The model assumes that physical and mental health constructs are uncorrelated (Φ = 0). Recent studies have shown inconsistent results between the 8 SF-36 scale scores and the PCS and MCS [4-7]. For example, a study of 482 patients initiating antidepressant treatment found improvements from baseline to 3 months of 0.28-0.49 SD units on the physical health scales (physical functioning, role limitations due to physical health problems, pain, general health), but the PCSuc was essentially unchanged (from 51 to 50). These patients had large improvements on the emotional well-being scale (1.67 SD) [8]. Taft et al. concluded that the discrepancies between results for the SF-36 scale scores and component scores are a result of the negatively weighted scales used in the PCS and MCS scoring algorithm [5,6]. The scoring algorithm for PCS includes positive weights for the physical functioning, role-physical, bodily pain, general health and vitality scales and negative weights for the social functioning, role-emotional and emotional well-being scales [3]. The scoring algorithm for MCS includes positive weights for the vitality, social functioning, role-emotional, and emotional well-being scales and negative weights for the physical functioning, role-physical, bodily pain and general health scales [3]. As such, higher mental health scale scores drive the PCS down and higher physical health scale scores drive the MCS down (and vice versa).
The objective of this study is to estimate the SF-36 summary scores (PCSc and MCSc) from a correlated (oblique) physical and mental health factor solution. In addition, we derive weights that can be used to create SF-12 component summary scores from the correlated factor model (PCSc-12 and MCSc-12). We hypothesize that the correlated factor model will produce better correspondence between the scale and summary scores. The results are compared to those obtained from the standard uncorrelated approach [3]. (Summary scores with a subscript "c" are based on oblique [correlated] factor analysis whereas summary scores with the subscript "uc" are created via orthogonal [uncorrelated] factor analysis.)

Sample
The sample consists of a random selection of patients receiving medical care from the Unified Medical Group Association (UMGA), an independent association of physicians in the western United States [9,10]. Patients were at least 18 years of age and had a minimum of one provider visit during the year prior to the data collection period from October 1994 to June 1995. Study participants were mailed $2 cash along with a 12-page questionnaire assessing HRQOL, patient evaluations of health care, utilization and demographic characteristics. Those who had not yet responded were sent a questionnaire two weeks later and were given a reminder telephone call. There were 7,093 respondents, a 59% response rate after adjusting for undeliverable surveys, ineligible respondents, and deceased. Our analysis was conducted on patients who had complete data for the SF-36 (n = 6,931).

Figure 1. Conceptual model for the SF-36 health survey. Orthogonal (uncorrelated) model assumes the correlation between physical and mental health constructs is fixed at 0 (Φ = 0). Oblique (correlated) model allows correlation between the physical and mental health constructs. δ denotes error terms (uniqueness terms) associated with each scale. Directional associations exist between the physical and mental health constructs and the 8 scales (as indicated by the arrows); however, the associations vary from large (e.g., physical functioning on physical health) to close to zero (e.g., emotional well-being on physical health).

Deriving Weights for Correlated SF-36 PCSc and MCSc
The method used here is identical to that used by Ware et al. [3] except the factors were allowed to be correlated. Factor analysis of the 8 SF-36 scale scores with a two-factor oblique rotation was used to estimate the physical and mental health factor scoring coefficients (weights). PCSc was then constructed by multiplying each SF-36 scale z-score by its respective physical factor scoring coefficient and summing the eight products. Similarly, MCSc was created by multiplying each SF-36 scale z-score by its respective mental factor scoring coefficient and summing the eight products. The component scores were then transformed so that each had a mean of 50 and a standard deviation of 10 (T-score) in the sample.
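The scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the weight vectors are placeholders rather than the published Table 3 coefficients (only the two small negative entries echo values quoted in the text), the scale order and function names are our own, and we assume the z-scores are computed from the sample's own means and standard deviations.

```python
import numpy as np

# Assumed scale order: PF, RP, BP, GH, VT, SF, RE, EWB.
# Placeholder factor scoring coefficients, for illustration only;
# they are NOT the published Table 3 weights.
PHYSICAL_WEIGHTS = np.array([0.20, 0.31, 0.23, 0.20, 0.13, 0.11, 0.03, -0.03])
MENTAL_WEIGHTS = np.array([-0.02, 0.03, 0.04, 0.10, 0.29, 0.14, 0.20, 0.35])


def t_score(raw):
    """Rescale raw composites to mean 50 and SD 10 within the sample."""
    return 50.0 + 10.0 * (raw - raw.mean()) / raw.std()


def summary_scores(scale_scores):
    """Return (PCSc, MCSc) T-scores for an (n_patients, 8) array of scale scores."""
    # Standardize each scale to a z-score using the sample mean and SD (assumption).
    z = (scale_scores - scale_scores.mean(axis=0)) / scale_scores.std(axis=0)
    # Multiply each z-score by its coefficient and sum the eight products.
    pcs_raw = z @ PHYSICAL_WEIGHTS
    mcs_raw = z @ MENTAL_WEIGHTS
    return t_score(pcs_raw), t_score(mcs_raw)


# Simulated 0-100 scale scores, used only to exercise the functions.
rng = np.random.default_rng(0)
demo_scales = rng.uniform(0, 100, size=(500, 8))
pcs_c, mcs_c = summary_scores(demo_scales)
```

By construction, both returned arrays have a mean of 50 and a standard deviation of 10 in the simulated sample.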

Sensitivity Analysis
To illustrate the potential differences in scores produced by the weights derived from the uncorrelated versus correlated factor analysis, we determined summary scores if the scales that load heavily on physical health (physical functioning, role-physical, bodily pain, general health) have z-scores of 1 and the scales that load heavily on mental health (vitality, social functioning, role-emotional and emotional well-being) have z-scores equal to 0.3. Then we determined the summary scores if the z-scores for scales loading heavily on physical health are equal to 0.3 and z-scores for scales loading heavily on mental health are equal to 1.
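The logic of this sensitivity check can be sketched as follows. The two weight vectors are hypothetical and chosen only to contrast a mental health composite that weights the physical scales negatively with one whose cross-domain weights are close to zero; they are not the published coefficients, and the output is a raw weighted sum rather than a T-score.

```python
import numpy as np

# Assumed scale order: PF, RP, BP, GH (physical), then VT, SF, RE, EWB (mental).
PHYSICAL = slice(0, 4)
MENTAL = slice(4, 8)

# Hypothetical mental-health weights, chosen only to show the sign pattern.
# They are NOT the published coefficients.
WEIGHTS_NEGATIVE_CROSS = np.array([-0.20, -0.10, -0.10, -0.05, 0.20, 0.30, 0.40, 0.50])
WEIGHTS_SMALL_CROSS = np.array([-0.02, 0.03, 0.04, 0.10, 0.20, 0.20, 0.20, 0.30])


def profile(physical_z, mental_z):
    """Build an 8-scale z-score profile with one value per domain."""
    z = np.empty(8)
    z[PHYSICAL] = physical_z
    z[MENTAL] = mental_z
    return z


def raw_composite(z, weights):
    """Weighted sum of z-scores, before any T-score rescaling."""
    return float(z @ weights)


# Physical scales at z = 1, mental scales at z = 0.3.
good_physical = profile(1.0, 0.3)
with_negative = raw_composite(good_physical, WEIGHTS_NEGATIVE_CROSS)
with_small = raw_composite(good_physical, WEIGHTS_SMALL_CROSS)
```

With these placeholder weights, `with_negative` is slightly below zero (-0.03) even though every mental health z-score is above the mean, whereas `with_small` is clearly positive (0.42). The two raw sums are on different scales, so only their signs are comparable.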

Deriving Correlated SF-12 PCSc and MCSc
To derive weights for the SF-12 summary measures, the SF-36 PCSc and MCSc were regressed in separate models on the SF-12 items. Dummy variables were created for each of the response choices of the 12 items, allowing the relationship of each level of each SF-12 item to vary rather than assuming a linear relationship. Following Ware et al. [2], the most favorable response choice for a question was the holdout category. As such, the parameters (weights) estimated are decrements associated with different SF-12 response choices. The predicted values in the models were the PCSc-12 and MCSc-12 scores, respectively.
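The dummy-variable regression described above can be sketched with ordinary least squares. This is an illustrative simplification on simulated data: we assume every item has the same number of response levels (the real SF-12 items do not), that level 0 is the most favorable response, and the function names and the "true" decrements are invented for the example.

```python
import numpy as np


def dummy_code(responses, n_levels):
    """One-hot encode item responses, dropping the most favorable level.

    responses: (n_patients, n_items) integers in 0..n_levels-1, where level 0
    is assumed to be the most favorable response choice (the holdout).
    """
    columns = []
    for item in range(responses.shape[1]):
        for level in range(1, n_levels):  # skip level 0, the holdout category
            columns.append((responses[:, item] == level).astype(float))
    return np.column_stack(columns)


def fit_item_weights(responses, summary, n_levels):
    """OLS of a summary score on the dummies; returns (intercept, weights)."""
    design = np.column_stack([np.ones(len(summary)), dummy_code(responses, n_levels)])
    beta, *_ = np.linalg.lstsq(design, summary, rcond=None)
    return beta[0], beta[1:]


# Simulated data: 3 items with 3 response levels each (hypothetical).
rng = np.random.default_rng(1)
responses = rng.integers(0, 3, size=(2000, 3))
true_decrements = np.array([-2.0, -5.0, -1.0, -3.0, -4.0, -8.0])
summary = 60.0 + dummy_code(responses, 3) @ true_decrements + rng.normal(0, 0.5, 2000)

intercept, weights = fit_item_weights(responses, summary, 3)
# Predicted values play the role of the short-form summary score.
predicted = intercept + dummy_code(responses, 3) @ weights
```

Because the most favorable response is the holdout for every item, the fitted intercept estimates the score of a respondent in the best possible health, and each weight is the decrement for choosing a less favorable response.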

Results
Thirty-five percent of the sample was male. The majority was Caucasian (80%). The average age was 50 (SD = 18). The majority of the sample had either gone to vocational school, had some college, or completed college (55%) and had a household income greater than $20,000 (77%). Other sample characteristics and average scale and summary scores are given in Table 1. There were no differences between the demographic characteristics (gender, race, age, education, income) of the total respondent sample (n = 7,093) versus the analytic sample (n = 6,931). We also compared adult members in the sampling frame who visited the physician within the last 365 days (n = 1,203,001) and those who returned the questionnaire (n = 7,093). Those who returned a questionnaire were slightly more likely to be older and female, to have hypertension, and to have visited the physician group more recently [10].

Factor Analysis Results
The oblique two-factor solution indicated that role-physical (0.76), physical functioning (0.71), bodily pain (0.66) and general health (0.53) loaded heavily on the physical factor whereas emotional well-being (0.84), role-emotional (0.59), vitality (0.58) and social functioning (0.39) loaded most heavily on the mental factor. The estimated correlation between the two factors was 0.62 (Table 2).
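For readers who want the role of the factor correlation spelled out, the following is a sketch under the assumption of a standard two-factor common factor model, which is how we read Figure 1; the text does not state the estimation details. The notation is ours: λ with subscripts P and M denotes a scale's loadings on the physical and mental factors, and Φ is the factor correlation as in Figure 1. The model-implied correlation between two different scales i and j is then:

```latex
r_{ij} \;=\; \lambda_{iP}\lambda_{jP} \;+\; \lambda_{iM}\lambda_{jM}
\;+\; \Phi\,\bigl(\lambda_{iP}\lambda_{jM} + \lambda_{iM}\lambda_{jP}\bigr),
\qquad i \neq j .
```

Setting Φ = 0 removes the last term and gives the orthogonal model; the oblique solution reported here instead estimates Φ = 0.62.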
The oblique factor solution yielded fewer negative factor scoring coefficients than the orthogonal factor solution used by Ware et al. [3]. For the physical health factor, only emotional well-being had a negative coefficient (-0.03); for the mental health factor, only physical functioning had a negative coefficient (-0.02). The magnitudes of the negative factor scoring coefficients are smaller than those derived in the orthogonal model (Table 3).

Sensitivity Analysis Results
As shown in Table 4, when the SF-36 physical health scale scores are 1 SD and the mental health scales are 0.3 SD above the mean, the PCSuc score is 62.2 (1.2 SD above the mean) and the MCSuc score is 49.6 (equal to the mean). As such, the MCSuc does not reflect the fact that the mental health scales are better than the mean. The alternative scoring algorithm results in a PCSc score that is 1 SD above the mean (60.0) and an MCSc score that is 0.5 SD above the mean (54.6). Similar results were found when the physical health scale scores were 0.3 SD above the mean and the mental health scale scores were 1 SD above the mean, resulting in a PCSuc score of 50.1 (at the mean) and an MCSuc score of 62.8 (1.2 SD above the mean). However, the alternate scoring algorithm produced a PCSc score of 55.1 and an MCSc score of 60.3 (0.5 SD and 1 SD above the mean, respectively).

*Factor scoring coefficients are weights applied to the variables (the 8 scales) used to construct the factors (physical and mental). Factor scoring coefficients are based on factor loadings and vary by how much a particular variable contributes to the factor. Variables that correlate highly with a factor have large factor loadings and larger factor scoring coefficients. Factor scoring coefficients are multiplied by the z-scores for the scales and summed in order to obtain the estimated factor scores. n = 6,931.

Table 5 lists the SF-12 items, the variable names, the parameters estimated previously from the regression models where the orthogonal PCSuc and MCSuc were regressed on the SF-12 items, and the parameters estimated here from regressing the obliquely derived PCSc and MCSc scores on the SF-12 items [2]. It is informative to compare the parameters estimated for the PCSc-12 and MCSc-12 to those estimated for the PCSuc-12 and MCSuc-12. Since the most favorable response choice for each item is the reference group, the y-intercept is the PCS-12 or MCS-12 score for a person who is in the best possible health (respondent selects the most positive response choice for all questions). Hence, the parameters estimated are decrements associated with each response choice for the items. For an individual item, response choices that represent a more favorable health state should have smaller decrements than response choices that represent a less favorable health state, such that we would expect negative coefficients in descending order of magnitude for the response choices of each item. This is not the case for four items in the PCSuc-12 model and five items in the MCSuc-12 model. In fact, the parameters estimated are positive, implying an increase in score, if the respondent chooses a non-favorable response choice over the most favorable response choice. These items are denoted with a "*" or "+" in Table 5.
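The ordering expectation described above can be checked mechanically. The sketch below uses hypothetical item names and coefficient values (they are not taken from Table 5) and flags any item whose estimates are not all negative or do not become more negative as the response choice becomes less favorable.

```python
def is_ordered_decrement(coefs):
    """Check one item's estimates for its non-holdout response choices.

    coefs must be ordered from the most favorable to the least favorable
    response choice. The item passes if every estimate is negative and each
    estimate is no larger than the one before it.
    """
    all_negative = all(c < 0 for c in coefs)
    non_increasing = all(a >= b for a, b in zip(coefs, coefs[1:]))
    return all_negative and non_increasing


def flag_items(item_coefs):
    """Return the names of items that violate the expected ordering."""
    return [name for name, coefs in item_coefs.items()
            if not is_ordered_decrement(coefs)]


# Hypothetical estimates, for illustration only.
example = {
    "item_A": [-1.2, -3.4, -7.9],  # ordered decrements
    "item_B": [0.8, -0.5, -2.0],   # a positive estimate
    "item_C": [-2.0, -1.0, -4.0],  # negative but out of order
}
flagged = flag_items(example)
```

Here `flagged` contains `item_B` and `item_C`, the two kinds of weighting discrepancy discussed in the text.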

Regression Analysis Results
In the PCSc-12 model, all parameters estimated were negative and in descending order of magnitude except for the response choices for two items (SF2 and EWB3). Similarly, in the MCSc-12 model, three items have higher estimates for less favorable response choices (PF02, PF04, and SF2). The magnitude of the weighting discrepancies is smaller than that obtained in the orthogonal model [2].
Correlations amongst the SF-36 and SF-12 summary measures are similar when the summary measure is derived using the correlated rather than the uncorrelated algorithm. The correlation between PCSc and PCSc-12 was 0.98 whereas the correlation between the PCSuc and PCSuc-12 was 0.96. Similarly, the correlation between the MCSc and the MCSc-12 was slightly higher (0.97) than the correlation between the MCSuc and MCSuc-12 (0.96) (Table 6).

Discussion
The SF-36 is one of the most commonly used HRQOL measures. Summary scores can be used to minimize problems with multiple comparisons. Ware et al. argue that the orthogonal method of developing summary scores is mathematically simpler and makes the interpretation of each scale less complicated compared to the oblique method [11,12]. However, several studies have shown that product-moment correlations between the physical and mental health factors range from 0.32 to 0.66, suggesting a moderate to strong correlation between the two components [13]. Summary scores that are forced to be uncorrelated may yield contradictory results compared to the scale scores. Our data demonstrate that this can be problematic if one assesses the significance of summary scores first and then assesses the scale scores only if the summary scores are significant. Alternatively, if the summary scores are presented alone, without the scale scores, the study may fail to detect an effect of an intervention or an important association with physical health, mental health or both. In fact, specific guidance regarding the SF-12 emphasizes the use of the summary scores because of the limitations of the 8 scale scores [14,15]. The present study suggests that limitations of the summary scores need to be taken into account as well.
This paper provides an alternative scoring algorithm for the SF-36 (version 1) and the SF-12 (version 1) physical and mental health summary scores. Our approach to constructing these scores is the same as the approach taken by Ware et al. [2,3] except we allow the physical and mental health constructs to be correlated. By allowing the constructs to be correlated, our results reduce the negative weights that were causing scale and summary score inconsistencies in the scoring algorithm for the uncorrelated SF-36 summary measures. Similarly, our approach reduced the positive weights in the scoring algorithm for the uncorrelated SF-12 summary measures that result in weighting discrepancies. Thus, we conclude that by removing the constraint of "uncorrelated factors," it is likely the discrepancies between the scale and composite scores will be reduced.
While this manuscript focused on the method of composite score construction developed by Ware et al. [2,3], it is important to note that an alternative algorithm for the construction of correlated mental health and physical health summary measures exists [16,17]. The RAND-36 method is based on item response theory (IRT) scoring for scale scoring and uses only the 4 scales that are primarily indicative of physical health (physical functioning, role limitations due to physical health problems, pain, general health perceptions) and mental health (emotional well-being, role limitations due to emotional problems, social functioning, vitality), respectively, in creating the summary scores. Future research should also examine whether the RAND-36 method resolves inconsistent results between the SF-36 scale scores and the summary scores.
We recognize that there are several limitations inherent to this study. First, our sample includes only those receiving care from UMGA health plans, which may limit generalizability. When comparing the UMGA sample characteristics to those of the general population studied by Ware et al. [2,3], there were some differences with respect to age, gender and race between the two samples [1,18]. Second, the majority of the study sites included in this study were from the West Coast, which would also limit generalizability. Third, non-responders accounted for 41% of the patients contacted. As such, we do not know if the characteristics of the non-responders are the same as those of the responders. Hence, while this study derived weights based on one sample, we recommend that a similar approach be applied in other samples including the original sample from the general population that was used to generate the uncorrelated summary scores [18,19]. Lastly, even with the correlated factor solution, there are still some negative factor scoring coefficients.

Conclusion
Summary scores that are forced to be uncorrelated may yield inconsistent results compared to the scale scores from which they are derived. This manuscript provides an alternative approach of deriving summary scores that allows the scores to be correlated. In this sample, the alternate scoring algorithm produced weights for scale scores and items that make it more likely that consistent results will be obtained for summary scores and scale scores. When presenting results from the SF-36 and SF-12 version 1, we recommend presenting the summary scores for the PCSc and MCSc derived from an obliquely rotated factor solution along with the scale scores and uncorrelated summary scores. Future research should be dedicated to deriving a scoring algorithm from an optimal correlated physical and mental health factor solution that is based on the general population, but the scoring algorithm presented in this manuscript can be employed until that is available. Lastly, we recommend that a similar approach be applied to derive summary measures for version 2 of the SF-36 and SF-12.