Comparing the Psychometric Properties of the SF-36 and SF-12 as Measures of Quality of Life among Adolescents in China: A Large-Sample Cross-Sectional Study

Objective: To compare the psychometric properties of the SF-36 and the SF-12, in order to supply evidence for the selection of quality of life (QOL) instruments and for decision-making processes that promote the QOL of adolescents. Methods: Stratified cluster random sampling was adopted. The Short-Form 36 (SF-36) was used to assess QOL. The Pearson correlation coefficient was used to assess correlation. Cronbach's alpha and construct reliability (CR) were used to evaluate the reliability of the SF-36 and the Short-Form 12 (SF-12), and criterion validity and average variance extracted (AVE, convergent validity) were used to evaluate validity. Confirmatory factor analysis was used to calculate the factor loading of each item, from which CR and AVE were obtained. The Samejima graded response model (two-parameter logistic form) from item response theory was used to estimate the discrimination, difficulty and average information of each item. Results: A total of 19,428 respondents were included in the study. The mean age was 14.78 years (SD = 1.77). High correlations were found between corresponding domains and components of the two scales. The reliability of each SF-36 domain was better than that of the corresponding SF-12 domain. The PF, RP, BP and GH domains of the SF-36 had good construct reliability (CR > 0.6). The criterion validities of the SF-36 were slightly higher in some corresponding dimensions, except PCS. The convergent validities of the SF-12 were higher than those of the SF-36 in PF, RP, BP and PCS. The BP, SF, RP and VT items of the SF-12 had acceptable item discriminations, which were higher than in the SF-36. The average item information of BP, VT, SF, RE and MH was poor in both the SF-36 and the SF-12. Conclusion: The two component measures (PCS and MCS) of the SF-12 appeared to perform at least as well as those of the SF-36 in cross-sectional settings in adolescents. Whether some domains, for instance SF and BP, are suitable for adolescents needs further study.


Introduction
Youth involves identity building; such experiences can shape adolescents' attributes and attitudes, leading to risky behaviors in their lives [1]. Because of individual experiences, experiments and transformations, the determinants of health and disease in adolescence span the social and psychological fields [2]. A deeper understanding of how adolescents view their lives allows a greater understanding of their health. The health-related quality of life of school adolescents has been discussed in some international studies. 'Health-related quality of life' (HRQOL) is a comprehensive model of subjective health that covers the physical, social, psychological and functional aspects of individual well-being as a multidimensional and subjective construct [3,4]. Understanding adolescents' quality of life is essential for guiding the organization of resources and the decision-making processes that promote it [5,6].
The Short-Form 36 (SF-36) was developed and validated as an appropriate generic short-form health survey for measuring quality of life (QOL), and it was widely applied to assess important QOL domains in the Medical Outcomes Study [7]. The SF-36 consists of eight QOL domains (PF, physical functioning; RP, role physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role emotional; MH, mental health) that comprise two summary measures: the physical component summary (PCS, calculated from PF, RP, BP and GH) and the mental component summary (MCS, calculated from VT, SF, RE and MH) [8]. One of the major advantages of the SF-36 is that it allows QOL scores to be compared across different groups [9]. However, because the SF-36 was not originally designed to measure important QOL domains specific to adolescents, some studies have found the SF-36, especially the mental component summary, to be relatively insensitive to variations in different populations over time [10-12].
A substantially shorter questionnaire, the SF-12, was developed by Ware and colleagues; it reduces the number of items from 36 to 12 to lessen the considerable burden that the SF-36 places on respondents and investigators [13,14]. Most respondents complete the SF-12 in less than a third of the time usually needed to complete the SF-36 [8]. Ware showed that the two instruments were highly correlated, and that about 90% of the variation in both the physical and mental component summary measures of the SF-36 was explained by the corresponding summary measures of the SF-12 [15]. Subsequent studies comparing the two scales have reported varying results depending on the disease or health condition of interest [16-18]. The SF-12 and SF-36 are available in many languages and have been applied to many kinds of groups, including adolescents [19]. Although studies have demonstrated that both scales are valid instruments for adolescents, they have rarely been used to evaluate the QOL of adolescents in China; in other words, few studies have focused on the quality of life of healthy adolescents in China [2,20].
In adolescence, studies of perceived QOL have predominantly surveyed chronic patients in hospital or outpatient settings [21,22]. More recently, interest in studying healthy groups has grown, and such studies have been performed in other contexts, such as schools [23,24], because they help to recognize and monitor adolescents vulnerable to poor health-related quality of life [25,26]. Although some studies have used the SF-12 and SF-36 to investigate perceived QOL in adolescents, it remains unclear which of the two scales is more suitable for adolescents [27].
Thus, our study aimed to evaluate the QOL of adolescent school students in China using the SF-36 and SF-12 and, by comparing the psychometric properties of the two scales, to supply evidence for the selection of QOL instruments and for decision-making processes that promote the quality of life of adolescents.

Study design and Sample
Stratified cluster random sampling was adopted [28]. First, regions were divided by geographical location: Guangdong, Shanghai, Shenyang, Wuhan, Xi'an and Yunnan represented the south, east, north, central, northwest and southwest regions, respectively. These areas were chosen to ensure proper representation by including participants from geographically diverse areas. Second, middle schools were randomly selected, followed by grades (first grade of junior school to third grade of high school), and all students enrolled in and effectively attending the selected classes were eligible, except for those with any physical or mental condition that prevented them from completing the questionnaires.
The score of each subscale was calculated from the responses to the individual items comprising that subscale, using a z-score transformation. The subscale scores were then aggregated using standard methods to estimate the physical and mental summary scores [29].
SF-12 scores (eight subscales, PCS-12 and MCS-12) were calculated using the SF-12 items embedded in the SF-36 [30]. This approach has been shown to be equivalent to scoring the SF-12 administered as a standalone questionnaire [16]. All summary scores range from 0 to 100, with higher scores indicating better QOL.

Statistical analysis
For descriptive analyses, we summarized the overall demographics and QOL of the sample. We calculated the means and standard deviations of the QOL scores from the SF-36 and SF-12. To test how closely the two scales were related, the Pearson correlation coefficient was computed between corresponding subscales of the SF-36 and SF-12.
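The correlation step can be illustrated with a minimal sketch. The function name `pearson_r` and the five pairs of domain scores are hypothetical and serve only to show the computation; they are not data from this study.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre each variable on its mean
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical PF scores for five respondents (illustration only).
sf36_pf = [95.0, 80.0, 100.0, 65.0, 90.0]
sf12_pf = [100.0, 75.0, 100.0, 50.0, 100.0]
r_pf = pearson_r(sf36_pf, sf12_pf)
```

In practice the same quantity is returned by `np.corrcoef(x, y)[0, 1]`.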
Cronbach's alpha and construct reliability (CR) were used to evaluate the reliability of the SF-36 and SF-12, and validity was assessed by criterion validity and average variance extracted (AVE).
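For reference, Cronbach's alpha for a domain with k items is k/(k-1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below assumes a complete respondents-by-items matrix and uses sample variances; the function name and the small response matrix are hypothetical.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                                # number of items
    item_var_sum = items.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)         # variance of total score
    return float(k / (k - 1) * (1.0 - item_var_sum / total_var))

# Hypothetical responses of three adolescents to a two-item domain.
demo_items = [[1, 2], [2, 1], [3, 3]]
demo_alpha = cronbach_alpha(demo_items)
```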
Confirmatory factor analysis was used to calculate the factor loading of each item, and CR and AVE were then obtained from the factor loadings. Criterion validity was expressed as the correlation between the score of each subscale and self-reported health status.
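The text does not state the formulas used for CR and AVE. A minimal sketch follows, assuming the commonly used expressions based on standardized loadings with uncorrelated errors: CR is the squared sum of loadings divided by that quantity plus the summed error variances, and AVE is the mean squared loading. The loadings shown are hypothetical, not values from Fig. 1.

```python
def construct_reliability(loadings):
    """CR from standardized factor loadings (assumed formula, see lead-in)."""
    s = sum(loadings)
    error_var = sum(1.0 - l * l for l in loadings)  # error variance = 1 - loading^2
    return s * s / (s * s + error_var)

def average_variance_extracted(loadings):
    """AVE as the mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)

# Hypothetical standardized loadings for a three-item domain.
demo_loadings = [0.7, 0.8, 0.6]
demo_cr = construct_reliability(demo_loadings)
demo_ave = average_variance_extracted(demo_loadings)
```

Under these assumptions, the CR > 0.6 criterion reported in the abstract would be checked as `demo_cr > 0.6` for each domain.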
Given the evaluation results of the sample and the ordered, multi-category form of the scale items, the Samejima graded response model (two-parameter logistic form) from item response theory was used to estimate the discrimination, difficulty and average information of each item [31]. Multilog 7.03 and Amos 20.0 were used to process the data.
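The quantities estimated by the graded response model can be sketched as follows. This is not the Multilog procedure: it only evaluates category probabilities and item information for given parameters, assuming a logistic metric without a scaling constant and strictly increasing thresholds. Averaging the information over an ability grid from -3 to 3 is one possible reading of "average information" and is an assumption, as are the function names and the parameter values.

```python
import numpy as np

def _cumulative(theta, a, b):
    """P(X >= k) for each boundary, padded with 1 (lowest) and 0 (highest)."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    ones = np.ones((theta.size, 1))
    zeros = np.zeros((theta.size, 1))
    return np.hstack([ones, p_star, zeros])

def grm_category_probs(theta, a, b):
    """Probability of each response category at ability theta."""
    cum = _cumulative(theta, a, b)
    return cum[:, :-1] - cum[:, 1:]

def grm_item_information(theta, a, b):
    """Item information: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative(theta, a, b)
    slope = a * cum * (1.0 - cum)          # derivative of each cumulative curve
    probs = cum[:, :-1] - cum[:, 1:]
    dprobs = slope[:, :-1] - slope[:, 1:]
    return (dprobs ** 2 / probs).sum(axis=1)

# Hypothetical five-category item: discrimination a, difficulties b.
a_demo = 1.5
b_demo = [-2.0, -1.0, 0.0, 1.0]
grid = np.linspace(-3.0, 3.0, 61)
avg_info = float(grm_item_information(grid, a_demo, b_demo).mean())
```

With a single threshold the model reduces to the two-parameter logistic model, whose information at theta = b equals a squared divided by 4, which gives a quick check of the implementation.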

Sample characteristics
Of the 20,226 questionnaires received, 798 had missing responses on some SF-36 items, leaving 19,428 respondents in the study. The mean age of the respondents was 14.78 years (standard deviation, SD = 1.77), and 49.4% (9595) were boys. In both the SF-36 and the SF-12, the physical functioning (PF) mean score was the highest and the role emotional (RE) mean score was the lowest. The largest mean difference in scores between the two scales was in the social functioning (SF) domain. Among corresponding domains of the two scales, role emotional was the most strongly correlated (r = 0.923). Details are given in Table 1.
Reliability and validity results are given in Table 2. The factor loadings from the confirmatory factor analysis that were used to calculate CR and AVE are shown in Fig. 1.
The item response theory estimates are given in Table 3. The item discriminations ranged from 0.45 to 2.73, a wide gap. The item difficulties increased monotonically from the lowest to the highest response level, which meets the difficulty assumptions of the model. The average information of each item ranged from 0.07 to 1.02.
In the SF-36, the PF, RP, GH and RE domains had acceptable item discriminations (> 1); the remaining dimensions discriminated less well, especially BP and SF, probably because teenagers are strongly homogeneous in terms of physical pain and social function. In the SF-12, on the other hand, BP, SF, RP and VT had higher item discriminations than in the SF-36.
According to the relevant literature, an amount of information greater than 25 on the scale indicates that the quality of the evaluation items is good; the amount of information
Studies have shown that the two scales discriminate between adolescents with physical and mental health problems and perform well in association with other clinical criteria [19,32,33]. A study of 31,357 adolescents in Hong Kong showed that the two components and a single general health component of the standard Chinese SF-12 were appropriate health indicators for Chinese adolescents [20]. Studies have also shown that the SF-12 correlates highly with the SF-36 in obese and non-obese patients [3,4].
However, many problems remain, such as high correlation between the two components, low internal reliability and ceiling effects in individual domains [34]. In comparisons of the SF-12 and SF-36, previous studies in patients with specific diseases or health conditions have generally found moderate to high correlations between corresponding domains and components of the two scales [18,35]. Our study also demonstrated these correlations. Since the SF-12 is embedded in the SF-36, we expected reasonably high correlations. Overall, the dimensions of the SF-12 reflected 64.5-92.3% of the corresponding dimensions of the SF-36.
Low reliability and validity of the social functioning domain were also noted. This might indicate questionable reliability and validity of the instruments or a lack of representation [3]. It might also be attributed to inconsistent responding, which can occur when adolescents complete a questionnaire without comprehending the items [20]. Related research has shown that, because of the brevity of the SF-12, it is not possible to obtain reliable information for each of its eight domains, so one cannot draw conclusions about specific domains [36]. Indeed, we found that the SF-36 was more reliable than the SF-12. At the same time, comparing the validity of the SF-12 with that of the SF-36 showed no loss and even a slight improvement. However, we also found that the criterion validities of PF, SF and MH were low, with self-reported health as our criterion. Related research found that performing moderate activities or climbing several flights of stairs would not present problems for most adolescents, because they are typically physically fit and active; combined with a limited social life and the adolescent mental state, this makes inconsistent responding possible [20].
Unlike previous research [34,36-38], we found that the BP and SF domains, rather than PF, had poor item discriminations, and that BP, SF, RP and VT had higher item discriminations in the SF-12 than in the SF-36.
We think that, compared with the PF items, the items in other domains are not easy for teenagers to understand, resulting in a lack of sensitivity when measuring adolescents. Similarly, the SF-12 loses some of the information provided by the eight dimensions of the SF-36; however, using the two summary dimensions of the SF-12 is advantageous in adolescents, which is consistent with the results of studies in other populations [20].
Methodological limitations should be mentioned. The participants were stratified by geographical area to minimize the risk of possible regional bias. However, the regions chosen were vast and included small towns and big cities as well as rural areas [39,40]. Differences due to these circumstances might exist but would not come to light in this design. Additionally, response consistency differed between samples because of the characteristics of adolescence, which may have biased the results [41].

Conclusion
In general, our study suggested that the SF-12 correlated highly with the SF-36 among adolescents in China. If only the two component measures (PCS and MCS) are of interest, the SF-12 appeared to perform at least as well as the SF-36 in cross-sectional settings in adolescents; hence, using the SF-12 in place of the SF-36 might be appropriate in this situation. Whether some domains, for instance SF and BP, are suitable for adolescents needs further study.

Availability of data and materials
The study data is available upon request.