Correlated physical and mental health composite scores for the RAND-36 and RAND-12 health surveys: can we keep them simple?

Background: The RAND-36 and RAND-12 (equivalent to version 1 of the SF-36 Health Survey and SF-12 Health Survey, respectively) are widely used measures of health-related quality of life. However, there are diverging views regarding how to create the physical health and mental health composite scores of these questionnaires. We present a simple approach using an unweighted linear combination of subscale scores for constructing composite scores for physical and mental health that assumes these scores should be free to correlate. The aim of this study was to investigate the criterion validity and convergent validity of these scores.
Methods: We investigated oblique and unweighted RAND-36/12 composite scores from a random sample of the general Norwegian population (N = 2107). Criterion validity was tested by examining the correlation between unweighted composite scores and weighted scores derived from oblique principal component analysis. Convergent validity was examined by analysing the associations between the different composite scores, age, gender, body mass index, physical activity, rheumatic disease, and depression.
Results: The correlations between the composite scores derived by the two methods were substantial (r = 0.97 to 0.99) for both the RAND-36 and RAND-12. The effect sizes of the associations between the oblique versus the unweighted composite scores and other variables had comparable magnitudes.
Conclusion: The unweighted RAND-36 and RAND-12 composite scores demonstrated satisfactory criterion validity and convergent validity. This suggests that if the physical and mental composite scores are free to be correlated, the calculation of these composite scores can be kept simple.


Background
The RAND-36, and its brief version, the RAND-12 (equivalent to version 1 of the SF-36 Health Survey and SF-12 Health Survey, respectively), are freely available and widely used measures of generic health-related quality of life (HRQoL) [1][2][3][4]. HRQoL refers to "how health impacts on an individual's ability to function and his or her perceived well-being in physical, mental and social domains of life" [3]. The RAND-36/12 provides data on eight subscale scores and two composite scores of physical and mental health. The use of composite scores has become quite popular as they can simplify the interpretation of the findings [2]. However, despite the widespread use of the RAND-36/12 composite scores, the choice of method for constructing them has been a controversial issue for decades [5][6][7][8][9][10].
Originally, Ware et al. [2,11] provided algorithms for constructing composite scores based on orthogonal principal component analysis (PCA) to create a physical composite summary (PCS) and a mental composite summary (MCS) for version 1 of the SF-36/12 Health Surveys. They aimed to create pure PCS and MCS scores with little overlapping variance. To achieve this, all the scales/items must be included in the two composite scores but have different weights. However, this orthogonal approach has been criticized for producing inconsistencies between the composite scores and the observed data [5][6][7][9].
Thus, a range of alternative scoring algorithms has been developed that do not restrict the correlations between the PCS and MCS [12]. One of the best documented alternatives to the orthogonal PCS and MCS was published by Farivar et al. in 2007, using oblique PCA to create the RAND-36/12 composite scores, which allowed correlations between them [7]. Overall, approaches such as this seem to be less prone to produce inconsistencies with the observed data [5][6][7].
On the other hand, a correlated PCS and MCS might not be without limitations. For example, a PCS and MCS from oblique PCA tend to be very strongly correlated, inducing multicollinearity [8]. Another issue is that the weights from oblique PCA fluctuate according to sample characteristics, making standardization across samples problematic [12,13]. Furthermore, several authors have advocated the use of weights from confirmatory factor analysis (CFA) to create a PCS and MCS that are permitted to correlate [14,15]. However, using CFA to construct composite scores can be problematic from a theoretical point of view, as a composite score, by nature, is a multidimensional construct [13,16].
Hence, there are many alternatives to an orthogonal PCS and MCS for the RAND-36/12, making it difficult for researchers to decide which one to use [6,12]. It has been argued that we often tend to make HRQoL scores unnecessarily complicated, for example, by using weighted scores [17]. Thus, simple unweighted composite scores have been proposed for the RAND-36/12. Such simple composite scores show promising criterion validity [5,18]. This is not surprising given the strong correlations among the indicators of the RAND-36/12 composite scores. Weighting is probably of little value under such conditions [19], a logic that also applied when the original developers decided not to use weights for the SF-36 subscores [20]. However, the studies that have used unweighted RAND-36/12 composite scores did not report data on their convergent validity [5,18]. This is a limitation, as convergent validity is a crucial part of evaluating psychometric properties [21].
We present a simple approach to constructing unweighted composite scores for the RAND-36/12, which implicitly assumes that these scores should be allowed to correlate. The composite scores were created by a linear combination of (1) the four subscales that have been shown to be primarily indicative of physical health (physical functioning, physical role functioning, bodily pain, and general health), and (2) the four subscales that have been shown to be primarily indicative of mental health (vitality, social functioning, emotional role functioning, and mental health). The aim of this study was to investigate the criterion and convergent validity of these scores by comparing them to established oblique composite scores. We hypothesized that unweighted and oblique composite scores would be highly correlated with each other and demonstrate equal convergent validity.

Design and study participants
We reused data from a representative survey of the general population of Norwegian adults aged 18-79 years. The methods have been described in detail previously [22]. In brief, the sample consisted of 2107 persons (36% response rate) who completed the Norwegian version of the RAND-36 (equivalent to the SF-36 version 1) as a postal questionnaire in 2015. All the items in the RAND-12 were taken directly from the RAND-36.

Demographic and other variables
We included self-reported data on age (10-year intervals), gender (women, men), marital/cohabitation status (no, yes), education (elementary school, high school, university < 4 years, and university ≥ 4 years), strenuous physical activity habits (never, less than 1 h per week, 1-2 h per week, and ≥ 3 h per week), self-reported height and weight (body mass index), and self-reported history of being diagnosed with a rheumatic disease or depression (no, yes) [22].

RAND-36 measures and scoring
Oblique RAND-36 PCS and MCS composite scores were created using the method and scoring coefficients described by Farivar et al. [7]. First, all the items were standardized into 0-100 scores. Second, eight subscales were created based on the mean scores of items belonging to the same scale: physical functioning (10 items), physical role functioning (4 items), bodily pain (2 items), general health (5 items), vitality (4 items), social functioning (2 items), emotional role functioning (3 items), and mental health (5 items). The subscale scores ranged from 0 to 100, with higher scores indicating better HRQoL. Third, eight z-scores were made using the means and standard deviations of the SF-36 subscales from a 1998 US norm population, described in the manual of Ware et al. [2]. Fourth, the z-scores were weighted using published scoring coefficients from oblique factor analysis. The only difference between the method of Ware et al. [2] and the present one is that we weighted the composite scores using published scoring coefficients derived from the oblique PCA in the study of Farivar et al. [7]. Finally, T-scores were made with a mean of 50 (SD = 10) representing the average scores in the US norm population [2].
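The third, fourth, and final steps can be summarized compactly. The notation below is introduced here only for illustration and is not taken from the cited manuals: S_k is the 0-100 score on subscale k, \mu_k and \sigma_k are the US norm mean and standard deviation for that subscale [2], and w_k^{PCS} and w_k^{MCS} stand for the published oblique scoring coefficients [7]. No numerical values are reproduced, and the form of the T-score transformation is assumed to be the conventional one (mean 50, SD 10).

```latex
\begin{align}
z_k &= \frac{S_k - \mu_k}{\sigma_k}, \qquad k = 1, \dots, 8 \\
\mathrm{PCS}_{\mathrm{oblique}} &= 50 + 10 \sum_{k=1}^{8} w_k^{\mathrm{PCS}} \, z_k \\
\mathrm{MCS}_{\mathrm{oblique}} &= 50 + 10 \sum_{k=1}^{8} w_k^{\mathrm{MCS}} \, z_k
\end{align}
```

Written this way, it is apparent that every subscale contributes to both oblique composite scores, with only the weights differing between them.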
The unweighted RAND-36 PCS and MCS composite scores were based on the original subscales, ranging from 0 to 100. Previous studies have shown that four subscale scores predominantly reflect physical health, while four others predominantly reflect mental health [6]. Thus, the unweighted RAND-36 PCS was created by adding the subscale scores for physical functioning, physical role functioning, bodily pain, and general health and dividing the sum by 4. The unweighted RAND-36 MCS was created by adding the subscale scores for vitality, social functioning, emotional role functioning, and mental health and dividing the sum by 4. This is quite similar to the RAND-HSI scoring but without the weights [6]. The unweighted RAND-36 PCS and MCS ranged from 0 to 100, with higher scores indicating better HRQoL.
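The unweighted scoring described above amounts to two plain averages. The following is a minimal Python sketch of that calculation; the function name, the dictionary keys, and the example values are hypothetical and chosen only for illustration, and the handling of missing subscale scores is not addressed because the text does not describe it.

```python
# Subscale groupings as described in the text; the key names are illustrative.
PHYSICAL = ("physical_functioning", "physical_role", "bodily_pain", "general_health")
MENTAL = ("vitality", "social_functioning", "emotional_role", "mental_health")


def unweighted_composites(subscales):
    """Return (PCS, MCS), each the plain mean of four 0-100 subscale scores."""
    pcs = sum(subscales[name] for name in PHYSICAL) / len(PHYSICAL)
    mcs = sum(subscales[name] for name in MENTAL) / len(MENTAL)
    return pcs, mcs


# Made-up subscale scores for one respondent (not study data).
example = {
    "physical_functioning": 90.0,
    "physical_role": 75.0,
    "bodily_pain": 80.0,
    "general_health": 65.0,
    "vitality": 60.0,
    "social_functioning": 87.5,
    "emotional_role": 100.0,
    "mental_health": 72.0,
}

pcs, mcs = unweighted_composites(example)
print(pcs, mcs)  # 77.5 79.875
```

Because each composite is a mean of 0-100 subscale scores, the result stays on the 0-100 metric without any norm data or scoring coefficients.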

RAND-12 measures and scoring
The oblique RAND-12 PCS and MCS composite scores were also created by the method of Farivar et al. [7]. The scoring was based on regressing the oblique RAND-36 PCS and MCS T-scores, in separate models, on the RAND-12 items. From these results, weighted dummy variables were used to create RAND-12 PCS and MCS T-scores, with higher scores indicating better HRQoL.
The unweighted RAND-12 PCS and MCS composite scores were created by standardizing the 12 items to 0-100 scores, in the same way as for the RAND-36. Eight subscales were created, based on mean scores of items belonging to the same scale: physical functioning (2 items), physical role functioning (2 items), bodily pain (1 item), general health (1 item), vitality (1 item), social functioning (1 item), emotional role functioning (2 items), and mental health (2 items). Subscale scores ranged from 0 to 100, with higher scores indicating better HRQoL. The unweighted RAND-12 PCS score was created by adding the subscale scores for physical functioning, physical role functioning, bodily pain, and general health and dividing the sum by 4. The unweighted RAND-12 MCS score was created by adding the subscale scores for vitality, social functioning, emotional role functioning, and mental health and dividing the sum by 4. The unweighted RAND-12 PCS and MCS scores ranged from 0 to 100, with higher scores indicating better HRQoL.
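The first two steps, from item responses to subscale scores, can be sketched as follows. This is an illustration under stated assumptions rather than the scoring syntax used in the study: items are assumed to be rescaled linearly to 0-100, the direction of each item (whether it needs reversing) must be taken from the official scoring instructions, and the function names, key names, and example responses are hypothetical. Only the item counts per subscale come from the text.

```python
# Number of RAND-12 items per subscale, as listed in the text.
RAND12_ITEMS_PER_SUBSCALE = {
    "physical_functioning": 2,
    "physical_role": 2,
    "bodily_pain": 1,
    "general_health": 1,
    "vitality": 1,
    "social_functioning": 1,
    "emotional_role": 2,
    "mental_health": 2,
}


def rescale_item(raw, low, high, reverse=False):
    """Linearly map a raw response in [low, high] onto 0-100 (assumed recoding)."""
    score = (raw - low) / (high - low) * 100
    return 100 - score if reverse else score


def subscale_scores(items_by_subscale):
    """Average the 0-100 item scores within each subscale."""
    scores = {}
    for name, items in items_by_subscale.items():
        expected = RAND12_ITEMS_PER_SUBSCALE[name]
        if len(items) != expected:
            raise ValueError(f"{name}: expected {expected} item(s), got {len(items)}")
        scores[name] = sum(items) / len(items)
    return scores


# Made-up responses for two of the eight subscales (not study data).
partial = subscale_scores({
    "physical_functioning": [rescale_item(2, 1, 3), rescale_item(3, 1, 3)],
    "bodily_pain": [rescale_item(2, 1, 5, reverse=True)],
})
print(partial)  # {'physical_functioning': 75.0, 'bodily_pain': 75.0}
```

Scoring the subscales first means that a single-item subscale such as bodily pain carries the same weight in the composite as a two-item subscale such as physical functioning.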

Statistics
Characteristics of the study population are presented as means and standard deviations or raw numbers and percentages. Descriptive statistics of the RAND-36/12 composite scores were based on z-scores (means = 0 and standard deviations = 1) because the different versions of the composite scores are not on the same metric. We present features of score distributions by medians, modes, floor, and ceiling effects (percentages), item-total correlations corrected for overlap, along with values for skewness and kurtosis. Floor or ceiling effects might be a problem if ≥ 15% of respondents obtain the worst or best possible score [23]. The corrected item-total correlations were calculated for the unweighted RAND-36/12 composite scores: PCS (physical functioning, physical role functioning, bodily pain, and general health) and MCS (vitality, social functioning, emotional role functioning, and mental health), with values ≥ 0.4 indicating scores that consist of highly correlated variables [20]. Values of skewness ≥ 2 and kurtosis ≥ 7 suggest distributions of scores that begin to depart substantially from normality [24]. Associations between subscale scores and the PCS and MCS composite scores were examined using Pearson correlations or Spearman rank correlations. Criterion validity was examined by Pearson correlations between unweighted composite scores and scores derived from the oblique factor scoring coefficients [7]. Based on previous research and theory, the correlations between the unweighted and oblique RAND-36/12 composite scores measuring the same construct should be ≥ 0.95 [5,18,21]. 
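Two of the descriptive checks above, floor and ceiling effects and item-total correlations corrected for overlap, are simple enough to sketch in plain Python. The analyses in this study were run in SPSS; the code below is only an illustration with hypothetical function names and made-up toy data. "Corrected for overlap" is implemented in the usual sense: each subscale is correlated with the sum of the other subscales in the same composite, excluding itself.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corrected_item_total(columns):
    """Correlate each subscale with the sum of the *other* subscales."""
    names = list(columns)
    n = len(columns[names[0]])
    result = {}
    for name in names:
        rest = [sum(columns[other][i] for other in names if other != name)
                for i in range(n)]
        result[name] = pearson(columns[name], rest)
    return result


def floor_ceiling(scores, worst=0, best=100):
    """Percentage of respondents at the worst and at the best possible score."""
    n = len(scores)
    floor = 100 * sum(s == worst for s in scores) / n
    ceiling = 100 * sum(s == best for s in scores) / n
    return floor, ceiling


# Toy data for five respondents on the four physical subscales (not study data).
toy = {
    "physical_functioning": [100, 85, 60, 95, 40],
    "physical_role": [100, 75, 50, 100, 25],
    "bodily_pain": [90, 70, 45, 80, 35],
    "general_health": [85, 60, 50, 75, 30],
}
item_total = corrected_item_total(toy)
floor_pct, ceiling_pct = floor_ceiling(toy["physical_role"])
print(floor_pct, ceiling_pct)  # 0.0 40.0
```

With these helpers, the thresholds quoted in the text (corrected item-total correlations ≥ 0.4, floor or ceiling effects ≥ 15%) can be checked directly against the returned values.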
Convergent validity was examined using Spearman rank coefficients between the composite scores and variables known to be related to HRQoL: age (years, continuous); sex (women = 0, men = 1); body mass index (units, continuous); physical activity (strenuous physical activity: never = 0, less than 1 h per week = 1, 1-2 h per week = 2, ≥ 3 h per week = 3); rheumatic disease (no = 0, yes = 1); and depression (no = 0, yes = 1) [2,11][25][26][27][28]. For convergent validity, a correlation ≥ 0.2 suggests an effect size that might be of practical importance [29]. SPSS version 27 was used to perform the statistical analyses (IBM Corporation).
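For readers without access to SPSS, the Spearman rank coefficient used for convergent validity can be sketched in plain Python as the Pearson correlation of average ranks, which handles the many ties produced by binary and ordinal variables. The function names and the toy data are hypothetical, and this is an illustration of the statistic rather than a reproduction of the study's analysis.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Toy data (not study data): an unweighted MCS and depression coded no = 0, yes = 1.
mcs_scores = [88.0, 45.0, 79.5, 52.0, 91.0, 60.0]
depression = [0, 1, 0, 1, 0, 0]
rho = spearman(mcs_scores, depression)
of_practical_importance = abs(rho) >= 0.2  # threshold quoted in the text
```

The absolute value is compared with the 0.2 threshold because several of the variables, such as depression, are expected to correlate negatively with the composite scores.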

Results
The characteristics of the study participants are presented in Table 1. Table 2 shows that the means and standard deviations of unweighted RAND-36 versus RAND-12 scores (0-100) were not directly comparable, with the RAND-12 composite scores being somewhat lower. Descriptive statistics of the RAND-36/12 composite z-scores and features of score distributions showed approximately similar results for unweighted and oblique scores (Table 3). An exception was that the unweighted RAND-12 composite scores had slightly higher ceiling values, although none of them reached the 15% threshold. The corrected item-total correlations for the unweighted RAND-36/12 composite scores ranged from 0.52 to 0.74, indicating scores that consist of highly correlated variables. The correlations between the RAND-36/12 composite scores and the respective subscale scores showed a pattern in which the oblique PCS scores had stronger correlations with the subscales representing mental health, and vice versa (Tables 4-5). The correlations between the composite scores derived from the two methods were very strong (r = 0.97 to 0.99) for both the RAND-36 and RAND-12 (Table 6). The correlation between the PCS and MCS was weaker for the unweighted method than for the oblique method for both the RAND-36 (r = 0.61 vs. r = 0.78) and RAND-12 (r = 0.58 vs. r = 0.79). The effect sizes of the associations between the oblique versus unweighted composite scores and other variables had comparable magnitudes, indicating similar convergent validity (Table 7).

Discussion
We found strong correlations between the composite scores derived by the two methods for both the RAND-36 and RAND-12 and that the effect sizes of the associations between the oblique versus the unweighted composite scores and other variables had comparable magnitudes, also indicating similar convergent validity. The features of score distributions of the corresponding composite scores showed approximately similar results, except for unweighted RAND-12 composite scores having slightly higher ceiling effects.
To the best of our knowledge, this is the first study to report both the criterion validity and convergent validity of unweighted RAND-36/12 composite scores. However, two prior studies have reported the criterion validity of the RAND-36 or RAND-12 composite scores using two other methods for constructing unweighted scores. Grassi et al. [30] used data from the European Community Respiratory Health Survey and compared SF-36 composite scores derived from oblique PCA with those from an unweighted scoring system. The unweighted PCS was calculated as the sum of 18 items, while the MCS included 19 items. The correlation between the oblique and unweighted PCS was 0.97, and 0.96 between oblique and unweighted MCS. The correlation between the unweighted PCS and MCS was 0.61.
Hagell et al. [5] applied data from people with Parkinson's disease and stroke to compare SF-12 composite scores derived from the RAND-12 HSI algorithm  The scoring methods in these two studies differed slightly from ours by using the sum of items to create raw scores, while we used unweighted linear combinations of subscale scores based on items that were standardized, ranging from 0 to 100. We think that a two-step method that initially scores the subscales and then uses them to create composite scores is more intuitive, considering that the subscales have different numbers of items. However, the practical difference between our approach and the two other unweighted approaches for scoring composite scores seems to be minor. These findings are not surprising, given the strong correlations between the items that contribute to the RAND-36/12 composite scores. We found that the correlations between the unweighted RAND-36/12 PCS and MCS were weaker than those created from oblique PCA. A reason for this is that oblique PCA produces weights for creating PCS and MCS that increase the correlation between these scores [7]. In the unweighted approach, no constraints are imposed, and the PCS and MCS are completely free to correlate. This could be a strength favouring unweighted RAND-36/12 composite scores, as correlations approaching 0.80 may induce multicollinearity if the PCS and MCS are used as independent variables in the same model [31].
Regarding convergent validity, the associations between the oblique versus the unweighted RAND-36/12 composite scores and other variables had comparable magnitudes. An exception was that age was more strongly correlated with the unweighted PCS scores than the oblique ones. This could reflect that the oblique PCS scores were based on all subscales, which are negatively, neutrally, or positively correlated with age. There also seems to be a subtle tendency for the oblique PCS and MCS to have more similar effect sizes than the unweighted PCS and MCS. This probably reflects the stronger correlations between the oblique PCS and MCS. The strengths of this study include a sufficiently large sample from a general population, and that convergent validity was examined. One limitation of the study is that weight, height, physical activity, rheumatic disease, and depression were assessed by self-reports. However, the included measures have been shown to have acceptable validity [32][33][34]. A second limitation is that our cross-sectional design did not allow us to study longitudinal changes in unweighted versus weighted composite scores. Thus, differences in responsiveness to change should be explored in future studies.
The main implication of this study is that we can keep the calculation of the RAND-36/12 composite scores simple. This has several advantages, such as the standardization of scoring across studies and populations. In this paper, we calculated composite scores ranging from 0 to 100, but the data can easily be converted to T-scores. An advantage of the previous scoring approaches is good comparability between PCS-12 and PCS-36 and between MCS-12 and MCS-36 [2,7]. Such comparability is not seen with our current proposed scoring method. However, similar comparability could be derived by regressing the unweighted RAND-36 PCS and MCS T-scores, in separate models, on the two unweighted RAND-12 composite scores. This could be explored in a future study using a cross-validated design. It should be emphasized that our findings do not imply that weighted composite scores of HRQoL are never useful or that prior studies using different oblique composite scores for the RAND-36/12 have led to erroneous results. However, we propose that weighting is likely to be redundant if analyses of composite scores show corrected item-total correlation values ≥ 0.4. This knowledge should also be useful to consider when developing composite scores for new HRQoL instruments.
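The conversion of a 0-100 composite score to a T-score mentioned above can be sketched as follows. This assumes the conventional T-score definition (mean 50, SD 10 in a reference population); the function name is hypothetical, and the reference mean and standard deviation in the example are made-up placeholders, not norm values from this study or from any published manual.

```python
def to_t_score(score, ref_mean, ref_sd):
    """Convert a 0-100 composite score to a T-score (mean 50, SD 10)
    relative to a reference population's mean and standard deviation."""
    return 50 + 10 * (score - ref_mean) / ref_sd


# Placeholder reference values for illustration only (not real norm data).
t = to_t_score(80.0, ref_mean=70.0, ref_sd=20.0)
print(t)  # 55.0
```

The choice of reference population determines what a T-score of 50 means, so the reference mean and standard deviation would need to be reported alongside any T-scores derived this way.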

Conclusions
In conclusion, the unweighted RAND-36/12 composite scores demonstrated satisfactory validity. Consequently, the calculation of these composite scores can be kept simple when we want them to be free to correlate. Future studies should examine the external validity of our findings and the sensitivity to change of unweighted versus weighted composite scores in different populations.