The 12-item medical outcomes study short form health survey version 2.0 (SF-12v2): a population-based validation study from Tehran, Iran
Health and Quality of Life Outcomes volume 9, Article number: 12 (2011)
The SF-12v2 is the improved version of the SF-12v1. This study aimed to validate the SF-12v2 in Iran.
A random sample of the general population aged 18 years and over living in Tehran, Iran completed the instrument. Reliability was estimated using internal consistency and validity was assessed using known-groups comparison and convergent validity. In addition the factor structure of the questionnaire was extracted by performing both exploratory and confirmatory factor analyses (EFA and CFA).
In all, 3685 individuals were studied (1887 male and 1798 female). Internal consistency for both summary measures was satisfactory. Cronbach's α for the Physical Component Summary (PCS-12) was 0.87 and for the Mental Component Summary (MCS-12) it was 0.82. Known-groups comparison showed that the SF-12v2 discriminated well between men and women and between those who differed in age and educational status (P < 0.05). Furthermore, as hypothesized, the physical functioning, role physical, bodily pain and general health subscales correlated higher with the PCS-12, while the vitality, social functioning, role emotional and mental health subscales correlated higher with the MCS-12. Finally, the exploratory factor analysis indicated a two-factor structure (physical and mental health) that jointly accounted for 59.9% of the variance. The confirmatory factor analysis also indicated a good fit to the data for the two-latent structure (physical and mental health).
Although the findings could not be generalized to the Iranian population, overall the findings suggest that the SF-12v2 is a reliable and valid measure of health related quality of life among Iranians and now could be used in future health outcome studies. However, further studies are recommended to establish its stability, responsiveness to change, and concurrent validity for this health survey in Iran.
The SF-12 is the abridged practical version of the 36-item Short Form Health Survey (SF-36) that was developed as an applicable instrument for measuring health-related quality of life [1, 2]. The instrument contains the same eight subscales as the original 36-item questionnaire: physical functioning (PF, 2 items), role limitations due to physical problems (RP, 2 items), bodily pain (BP, 1 item), general health perceptions (GH, 1 item), vitality (VT, 1 item), social functioning (SF, 1 item), role limitations due to emotional problems (RE, 2 items) and mental health (MH, 2 items). The psychometric properties and factor structure of the SF-12 have been examined in several studies worldwide. Overall, the results have indicated that the instrument is a reliable and valid measure that can be used in a variety of population groups [3–9].
The SF-12v2 incorporates a number of changes from version 1, including changes to item wording and response options. The response options for items of the RP and RE scales have been extended from 2 to 5, whilst the response categories for the VT and MH items have been reduced from 6 to 5. Moreover, two items were reworded. Although the SF-12v2 gives estimates of all 8 domains, there is more interest in focusing on two distinct overall physical and mental health concepts known as the Physical Component Summary (PCS) and the Mental Component Summary (MCS).
The reliability and validity of the SF-12v2 have been investigated in numerous studies. The results of the Medical Expenditure Panel Survey (MEPS) have shown that both component scores of the SF-12v2 have adequate reliability and validity and should be suitable for use for a variety of purposes within this database. The Chinese version of the instrument has also been acknowledged as an appropriate health indicator in Chinese adolescents. In addition, it has been demonstrated that the measure is suitable for assessment of health status in a variety of population groups, such as people with diabetes, rheumatoid arthritis, hemophilia, cervical and lumbosacral disorders, and other health-related conditions [17–20].
Although in recent years we have witnessed the development of several health-related quality of life instruments in Iran [see http://www.Qolbank.ir], Iranian versions of well-developed and well-known questionnaires are still lacking. Since 1997 we have been working with the Medical Outcome Trust, and now with QualityMetric Inc., to provide standard Iranian versions of one of the most popular general health-related quality of life instruments, the Short Form Health Survey. It was hoped that this might contribute to the existing literature and give both researchers and health professionals an opportunity to use the questionnaire in their research and practice. Thus, as part of a large study on the application of the urban health equity assessment and response tool (Urban HEART) in Tehran, and alongside our previous efforts [22, 23], the aim of this study was to investigate the psychometric properties of the Iranian version of the SF-12v2 in a general Iranian population. The second objective of the study was to establish normative data for the questionnaire in Iran.
The questionnaire and scoring
Permission was sought from QualityMetric Inc. to develop the Iranian version of the SF-12v2 (License agreement #CT103890/OP008065). Since we had previously developed the Iranian versions of the SF-36v1 and SF-12v1 [22, 23], the SF-12v2 was derived from the SF-12v1 and was used in this study.
To calculate the PCS-12 and the MCS-12 scores we used the QualityMetric Health Outcomes Scoring Software 2.0. The software uses all 12 items to produce scores for the PCS-12 and the MCS-12 and applies a norm-based scoring algorithm empirically derived from the data of a US general population survey. It has been recommended that the US-derived summary scores, which assume a mean of 50 and a standard deviation (SD) of 10, be used in order to facilitate cross-cultural comparison of results [2, 4]. In theory, the possible scores for the PCS-12 and the MCS-12 range from 0 (the worst) to 100 (the best).
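The norm-based metric described above (mean of 50, SD of 10 in the reference population) can be illustrated with a minimal sketch. It shows only the generic linear T-score transformation; it is not the QualityMetric algorithm, whose item weights are not given in this paper. The function name and the reference mean and SD passed to it are hypothetical placeholders.

```python
def norm_based_score(raw, ref_mean, ref_sd):
    """Map a raw aggregate score onto a norm-based metric (mean 50, SD 10).

    ref_mean and ref_sd are the mean and standard deviation of the raw score
    in the reference population. They are assumptions supplied by the caller,
    not values taken from the SF-12v2 scoring software.
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (raw - ref_mean) / ref_sd  # standardize against the reference population
    return 50.0 + 10.0 * z         # rescale so the reference mean is 50 and the SD is 10
```

Under this transformation a respondent one reference SD above the reference mean scores 60, and one SD below scores 40, which is what makes scores comparable across samples scored against the same norms.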
A cross-sectional population-based study was conducted in Tehran, Iran in 2009. The ethics committee of the Iranian Center for Education, Culture and Research (ACECR) approved the study. The Iranian version of the SF-12v2 was administered to a random sample of individuals aged 18 years and over. To select a representative sample of the general population, a multi-stage area sampling procedure was applied. Every household within the 22 municipal districts of Tehran had the same probability of being sampled. A team of trained interviewers collected the data and all participants were interviewed in their homes. The interviews were carried out with each individual's informed consent.
In addition to descriptive statistics (including floor and ceiling effects), several tests were performed to assess the psychometric properties of the Iranian version of the SF-12v2, following the approach of the International Quality of Life Assessment (IQOLA) Project. To test reliability, the internal consistency of the summary measures was estimated using Cronbach's alpha coefficient, and an alpha equal to or greater than 0.70 was considered satisfactory. Validity was assessed using known-groups comparison to test how well the instrument discriminates between subgroups of the study sample that differed in their health conditions. Health condition was recorded with a separate item in the introductory part of the questionnaire that asked each respondent to report whether they were suffering from a chronic illness. This included cardiovascular, musculoskeletal, gastrointestinal, hematological, neurological and chronic respiratory diseases, diabetes, and cancers. It was expected that those who reported being free of chronic conditions would have higher scores on all measures than those who reported having one or more chronic conditions. The t-test was used for comparison. Furthermore, convergent validity was assessed using item-scale correlations. This approach examines the correlation between similar attributes in order to establish convergent validity (known as multitrait analysis). Correlations were calculated using Spearman's correlation coefficient (rho). It was expected that item scores would correlate higher with their own hypothesized scale than with other scales, and that the PF, RP, BP and GH scores would correlate higher with the PCS-12, whereas the VT, SF, RE and MH scores would correlate higher with the MCS-12. Correlation values of 0.40 or above were considered satisfactory (r = 0.81-1.0 excellent, 0.61-0.80 very good, 0.41-0.60 good, 0.21-0.40 fair, and 0.20 or below poor).
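As an illustration of the reliability criterion above, the following is a minimal sketch of Cronbach's alpha computed from item-level responses using only the Python standard library. The function name and the data layout (one list of scores per item) are assumptions made for illustration; the paper does not state which software or item groupings were used to compute alpha.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a set of items.

    items: list of columns, one per item; each column holds the scores of the
    same respondents in the same order (hypothetical layout for this sketch).
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("all items need the same number (>= 2) of respondents")
    # Total score of each respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)
```

A value returned by this function would then be compared with the 0.70 threshold that the text treats as satisfactory.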
The factor structure of the questionnaire was extracted by performing both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Exploratory factor analysis was performed using principal component analysis with oblique rotation. It was hypothesized that a two-factor solution would be obtained with eigenvalues greater than 1. Finally, confirmatory factor analysis was performed with a two-factor model (physical component summary and mental component summary) specified for the analysis. We report several goodness-of-fit indicators: the goodness of fit index (GFI), the adjusted goodness of fit index (AGFI), the root mean square error of approximation (RMSEA), the normed fit index (NFI), and the comparative fit index (CFI). The GFI and AGFI are chi-square based calculations independent of degrees of freedom. The recommended cut-off for acceptable values is ≥ 0.90. The RMSEA tests the fit of the model to the covariance matrix. As a guideline, values of < 0.05 indicate a close fit and values below 0.11 indicate an acceptable fit. The NFI and CFI values range from 0 to 1, with a value greater than 0.90 indicating an acceptable fit to the data [27, 28].
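The cut-off rules listed above can be written down as a small checking routine. This is a sketch that only encodes the thresholds stated in the text; the function name and the shape of the returned dictionary are hypothetical, and the indices themselves would come from whatever structural equation modelling software is used.

```python
def assess_fit(gfi, agfi, rmsea, nfi, cfi):
    """Compare CFA fit indices with the cut-offs stated in the text.

    Returns a dict: True/False for GFI, AGFI, NFI and CFI, and a label
    ("close", "acceptable" or "poor") for RMSEA.
    """
    if rmsea < 0.05:
        rmsea_label = "close"        # RMSEA below 0.05: close fit
    elif rmsea < 0.11:
        rmsea_label = "acceptable"   # RMSEA below 0.11: acceptable fit
    else:
        rmsea_label = "poor"
    return {
        "GFI": gfi >= 0.90,    # recommended cut-off >= 0.90
        "AGFI": agfi >= 0.90,  # recommended cut-off >= 0.90
        "RMSEA": rmsea_label,
        "NFI": nfi > 0.90,     # greater than 0.90 is acceptable
        "CFI": cfi > 0.90,     # greater than 0.90 is acceptable
    }
```

Returning a per-index result rather than a single verdict keeps it visible when indices disagree, which is common in practice.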
In all, 4337 individuals were approached. Of these, 3685 individuals (1887 male and 1798 female) agreed to take part in the study, giving a response rate of 85.0%. The mean age of the respondents was 35.6 years (SD = 14.7) and most had secondary education (51.1%). The demographic characteristics of the study sample are shown in Table 1.
The results showed that both summary measures exceeded the 0.70 level for Cronbach's alpha, indicating satisfactory internal consistency (α for the PCS-12 and the MCS-12 was 0.87 and 0.82, respectively). The mean score for the PCS-12 was 42.3 (SD = 11.4) and for the MCS-12 it was 44.6 (SD = 11.9). For both the PCS-12 and the MCS-12 the percentage of respondents scoring at the lowest level (i.e. floor effect) and at the highest level (i.e. ceiling effect) was negligible (the frequency was 1 for each). The descriptive statistics for the SF-12v2 scales and its summary measures are shown in Table 2. In addition, to provide normative data for subgroups of the study sample, the summary scores for different age groups, for males and females, and for people with different levels of education are presented in Table 3.
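Floor and ceiling effects as described above are simply the share of respondents at the extremes of a score. A minimal sketch follows; the function name is hypothetical, and the minimum and maximum are supplied by the caller, since the paper does not say whether the lowest and highest levels refer to the theoretical or the observed range.

```python
def floor_ceiling_effects(scores, min_score, max_score):
    """Return (floor %, ceiling %) for a list of scores.

    min_score and max_score are supplied by the caller; whether they are the
    theoretical or the observed extremes is an assumption of this sketch.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s <= min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s >= max_score) / n
    return floor_pct, ceiling_pct
```

With one respondent at each extreme in a sample of 3685, as reported here, both percentages would be about 0.03%.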
Known-groups comparison showed that the SF-12v2 discriminated well between subgroups of people who differed in their health condition. As hypothesized, those without any chronic conditions scored higher on the PCS-12 and the MCS-12 than those with a chronic condition. To avoid the danger of collinearity between chronic pathology and age, the same analysis was applied to the older age groups only, and the same results were obtained, as expected (Table 3).
The results of the correlation analysis demonstrated that item scores correlated higher with their own hypothesized scale than with other scales, and that the PF, RP, BP, and GH subscales correlated higher with the PCS-12 score, while the VT, SF, RE, and MH subscales correlated higher with the MCS-12 score, lending support to the instrument's good convergent validity. Table 4 shows the item-scale correlation matrix for the SF-12 subscales and summary measures.
Principal component analysis with oblique rotation yielded two factors. The results are shown in Table 5. The eigenvalues for the two factors that explained most of the variance observed were 5.80 and 1.37, respectively. The two-factor structure (physical and mental health) jointly accounted for 59.9% of the variance. The results indicated that the PF, RP, BP, and GH items loaded higher on the physical health component and the VT, SF, RE, and MH items loaded higher on the mental health component.
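The link between the eigenvalues and the share of variance explained can be made explicit with a short sketch. It assumes a principal component analysis on the correlation matrix of the 12 items, in which case the eigenvalues sum to the number of items; the paper does not state this detail, and the function name is hypothetical.

```python
def retained_variance(eigenvalues, n_items):
    """Apply the eigenvalue-greater-than-1 rule and compute variance explained.

    Assumes a PCA on a correlation matrix, so the total variance equals
    n_items (an assumption of this sketch, not stated in the paper).
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    retained = [ev for ev in eigenvalues if ev > 1.0]  # eigenvalue > 1 criterion
    proportion = sum(retained) / n_items
    return retained, proportion


# Eigenvalues as reported in the text, with the 12 SF-12v2 items.
factors, share = retained_variance([5.80, 1.37], 12)
```

With the rounded eigenvalues given in the text this yields a share of roughly 60%, in line with the 59.9% reported; the small difference presumably reflects rounding.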
Finally, the results of the confirmatory factor analysis are shown in Figure 1. The two-factor model, that is, physical component summary (PCS-12) and mental component summary (MCS-12), was specified and tested. The results provided a good fit to the data, lending support to the original hypothesized structure of the questionnaire, with GFI = 0.93, AGFI = 0.87, RMSEA = 0.10 (90% CI = 0.10 to 0.11), NFI = 0.96, and CFI = 0.96.
This study reported the psychometric properties of the Iranian version of the SF-12v2 in a general population in Tehran. The results indicated that the instrument is a reliable and valid measure that can be used in monitoring and measuring population health status. Since the present study used the norm-based scoring algorithms for calculating the PCS-12 and the MCS-12, the results from this study can also be used for cross-cultural comparisons of health-related quality of life. Studies of the psychometric properties of the SF-12v2 in different cultures have also shown satisfactory results [12, 13]. Indeed, evidence suggests that the instrument is applicable among diverse population clusters and is appropriate as a health status measure in subgroups of a population [14–17]. The findings from this study indicated that women, older age groups and people with lower educational status had poorer health than men, younger respondents and those with better educational status. The findings are consistent with results from other studies carried out in different settings [12–14, 22]. In addition, known-groups comparison indicated that the SF-12v2 summary components were able to distinguish very well between subgroups of respondents who differed in chronic health problems.
This study used a relatively large sample of the general population. Therefore, as has been suggested, the results of this study might be considered Iranian normative data for the 12-item Short Form Health Survey version 2 (SF-12v2) and perhaps could be used as a basis for comparison with specific populations in future studies. However, one might argue that a sample from the capital is not necessarily representative of the entire country. In general this is true, but since Tehran has become a multicultural metropolitan area, it has been suggested that a sample from the general population in Tehran could be regarded as a representative sample of the general population in Iran. The migration rate from the rest of the country to Tehran (due to its apparent attractiveness, living facilities, job opportunities, etc.) is very high. A random sample of the general population in Tehran is therefore very likely to reach people from almost all parts of Iran.
The hypothesis regarding the item-component correlations was also supported. As expected, the PF, RP, BP and GH subscales correlated higher with the PCS-12, while the VT, SF, RE and MH subscales correlated higher with the MCS-12 score (Table 4). This finding is somewhat different from that reported by Ware et al., where physical functioning, role physical and bodily pain correlated most highly with the PCS; mental health, role emotional and social functioning correlated most highly with the MCS; and vitality, general health and social functioning had a relatively high correlation with both components. However, a number of studies have shown that the vitality item correlates higher with the PCS than with the MCS score. It is argued that this might be due to cultural differences among people from different countries, or that it might simply have occurred due to translation problems [22, 30]. In addition, it has been reported that even the translation of concepts such as social functioning can be difficult in some Asian cultures. As Ware indicates, the most important empirical point to note is that scales that load highest on the physical component are most responsive to treatments that change physical morbidity, whereas scales that load highest on the mental component respond to drugs and therapies that target mental health.
In general, the psychometric tests of the Iranian version of the SF-12v2 showed satisfactory results. Principal component analysis with oblique rotation supported a two-factor structure for the instrument, consistent with the original conceptual model of the instrument [1, 2]. A recent study on deriving the SF-12v2 physical and mental health summary scores with different scoring algorithms suggested that the summary scores were more consistent with changes in individual scales when oblique rotation was performed. The authors thus concluded that oblique rotation would be preferable when performing factor analysis for the SF-12v2. In addition, the results obtained from the confirmatory factor analysis indicated that the two-factor model fitted the data very well. A study in Chinese adolescents reported that a one-factor structure also showed a satisfactory fit in the CFA.
The findings from this study indicated that overall the Iranian version of the SF-12v2 performed better than the Iranian version of the SF-12v1. Cronbach's alpha for the PCS and the MCS of version 1 was 0.73 and 0.72, while for version 2 it was 0.87 and 0.82, respectively. Similarly, the results from the EFA indicated that the two-factor structure for version 1 jointly accounted for 57.8% of the variance observed, whereas for version 2 it accounted for 59.9%.
Although this study did not provide evidence for test-retest reliability, responsiveness to change or other psychometric tests, the findings showed that the Iranian version of the SF-12v2 is a reliable instrument for measuring health-related quality of life. Future studies could focus on other psychometric properties of the questionnaire and also on different applications of the instrument. In addition, since the study sample was from Tehran, the data from this sample cannot with certainty be generalized to the whole Iranian population. This is a major limitation.
In general, the findings suggest that the SF-12v2 is a reliable and valid measure of health-related quality of life among the Iranian population and could now be used in future health outcome studies. However, further studies are recommended to establish stronger psychometric properties for this health survey in Iran.
SF-12v2: The 12-item Short Form Health Survey version 2
IQOLA: International Quality of Life Assessment
PCS: Physical Component Summary
MCS: Mental Component Summary
EFA: exploratory factor analysis
CFA: confirmatory factor analysis
Ware JE, Kosinski M, Keller SD: A 12-item Short-Form Health Survey: construction of scales and preliminary tests of reliability and validity. Medical Care 1996, 34: 220–233. 10.1097/00005650-199603000-00003
Gandek B, Ware JE, Aaronson NK, Apolone G, Bjorner JB, Brazier JE, Kaasa S, Leplege A, Prieto L, Sullivan M: Cross-validation of item selection and scoring for the SF-12 Health Survey in nine countries: results from the IQOLA Project. J Clin Epidemiol 1998, 51: 1171–1178. 10.1016/S0895-4356(98)00109-7
Jayasinghe UW, Proudfoot J, Barton CA, Amoroso C, Holton C, Davies GP, Beilby J, Harris FM: Quality of life of Australian chronically-ill adults: patient and practice characteristics matter. Health and Quality of Life Outcomes 2009, 7: 50. 10.1186/1477-7525-7-50
Kontodimopoulos N, Pappa E, Niakis D, Tountas Y: Validity of SF-12 summary scores in a Greek general population. Health and Quality of Life Outcomes 2007, 5: 55. 10.1186/1477-7525-5-55
Gandhi SK, Salmon JW, Zhao SZ, Lambert BL, Gore PR, Conrad K: Psychometric evaluation of the 12-item Short Form Health Survey (SF-12) in osteoarthritis and rheumatoid arthritis clinical trials. Clinical Therapeutics 2001, 23: 1080–1098. 10.1016/S0149-2918(01)80093-X
Maurischat C, Herschbach P, Peters A, Bullinger M: Factorial validity of the Short-Form 12 (SF-12) in patients with diabetes mellitus. Psychology Science Quarterly 2008, 50: 7–20.
Kodraliu G, Mosconi P, Groth N, Carmosino G, Perilli A, Gianicolo EA, Rossi A, Apolone G: Subjective health status assessment: evaluation of the Italian version of the SF-12 Health Survey. Results from the MiOS Project. J Epidemiol Biostat 2001, 6: 305–316. 10.1080/135952201317080715
Hoffman CH, McFarland BH, Kinzie D, Bresler L, Rakhlin D, Wolf S, Kovas AE: Psychometric properties of a Russian version of the SF-12 Health Survey in a refugee population. Comprehensive Psych 2005, 46: 390–397. 10.1016/j.comppsych.2004.12.002
Lam CL, Tse EY, Gandek B: Is the standard SF-12 health survey valid and equivalent for a Chinese population? Qual Life Res 2005, 14: 539–547. 10.1007/s11136-004-0704-3
Ware JE, Kosinski M, Turner-Bowker DM, Gandek B: How to score version 2 of the SF-12 HEALTH Survey. Lincoln, RI: Quality Metric Incorporated; 2002.
Cheak-Zamora NC, Wyrwich KW, McBride TD: Reliability and validity of the SF-12v2 in the medical expenditure panel survey. Qual Life Res 2009, 18: 727–735. 10.1007/s11136-009-9483-1
Fong DY, Lam CL, Mak K, Lo WS, Lai YK, Ho SY, Lam TH: The Short Form Health Survey was a valid instrument in Chinese adolescents. J Clin Epidemiol 2010, 63: 1020–1029. 10.1016/j.jclinepi.2009.11.011
Monteagudo Piqueras O, Hernando Arizaleta L, Palomar Rodriguez JA: Reference values of the Spanish version of the SF-12v2 for the diabetic population. Gac Sanit 2009, 23: 526–532. 10.1016/j.gaceta.2008.11.005
Linde L, Sørensen J, Østergaard M, Hørslev-Petersen K, Rasmussen C, Jensen DV, Hetland ML: What factors influence the health status of patients with rheumatoid arthritis measured by the SF-12v2 Health Survey and the Health Assessment Questionnaire? J Rheumatol 2009, 36: 2183–2189. 10.3899/jrheum.090134
Brown TM, Lee WC, Joshi AV, Pashos CL: Health-related quality of life and productivity impact haemophilia patients with inhibitors. Haemophilia 2009, 15: 911–917. 10.1111/j.1365-2516.2009.02032.x
Lee CE, Browell LM, Jones DL: Measuring health in patients with cervical and lumbosacral spinal disorders: is the 12-item short-form health survey a valid alternative for the 36-item short-form health survey? Arch Phys Med Rehabil 2008, 89: 829–833. 10.1016/j.apmr.2007.09.056
Yang M, Wallenstein G, Hagan M, Guo A, Chang J, Kornstein S: Burden of premenstrual dysphoric disorder on health-related quality of life. J Womens Health 2008, 17: 113–121.
Sutton D, Raines DA: Health-related quality of life following a surgical weight loss intervention. Appl Nurs Res 2010, 23: 52–56. 10.1016/j.apnr.2008.01.001
McBride O, Adamson G, Bunting BP, McCann S: Assessing the general health of diagnostic orphans using the Short Form Health Survey (SF-12v2): a latent variable modeling approach. Alcohol Alcohol 2009, 44: 67–76.
Sloan RA, Sawada SS, Martin CK, Church T, Blair SN: Associations between cardiorespiratory fitness and health-related quality of life. Health Qual Life Outcomes 2009, 7: 47. 10.1186/1477-7525-7-47
Asadi-Lari M, Vaez-Mahdavi MR, Faghihzadeh S, Montazeri A, Farshad A, Kalantari N, Maher A, Golmakani MM, Salehi GH, Motevallian SA, Malek-Afzali H: The application of urban health equity assessment and response tool (Urban HEART) in Tehran; concepts and framework. Medical Journal of the Islamic Republic of Iran 2010, 24: 115–125.
Montazeri A, Goshtasebi A, Vahdaninia M, Gandek B: The Short Form Health Survey: translation and validation study of the Iranian version. Qual Life Res 2005, 14: 875–882. 10.1007/s11136-004-1014-5
Montazeri A, Vahdaninia M, Mousavi SJ, Omidvari S: The Iranian version of 12-item Short-Form Health Survey (SF-12): factor structure, internal consistency and construct validity. BMC Public Health 2009, 9: 341. 10.1186/1471-2458-9-341
Saris-Baglama RN, Dewey CJ, Chisholm GB, Plumb E, Kosinski M, Bjorner JB, Ware JE: QualityMetric Health Outcomes Scoring Software 2.0: User's Guide. Lincoln, R.I: QualityMetric Incorporated; 2007.
Nunnally JC, Bernstein IR: Psychometric Theory. 3rd edition. New York: McGraw-Hill; 1994.
Campbell DT, Fiske DW: Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin 1959, 56: 81–105. 10.1037/h0046016
Marsh HW, Hau K, Wen Z: In search of golden rules: comment on hypothesis testing approaches to setting cut-off values for fit indexes and dangers in over generalizing Hu and Bentler's findings. Structural Equation Modelling 2004, 11: 320–341. 10.1207/s15328007sem1103_2
Byrne BM: Structural Equation Modelling. Mahwah, NJ: Lawrence Erlbaum Associates Publishers; 1998.
Gandek B, Ware JE: Methods for validating and norming translations of health status questionnaires: The IQOLA Project approach. J Clin Epidemiol 1998, 51: 953–959. 10.1016/S0895-4356(98)00086-9
Bullinger M, Alonso J, Apolone G, Leplege A, Sullivan M, Wood-Dauphinee S, Gandek B, Wagner A, Aaronson N, Bech P, Fukuhara S, Kaasa S, Ware JE: Translating health status questionnaires and evaluating their quality: The IQOLA Project approach. International Quality of Life Assessment. J Clin Epidemiol 1998, 51: 913–923. 10.1016/S0895-4356(98)00082-1
Lim LLY, Seubsman S, Sleigh A: The SF-36 health survey: tests of data quality, scaling assumptions, reliability and validity in healthy men and women. Health Qual Life Outcomes 2008, 6: 52. 10.1186/1477-7525-6-52
Ware JE, Kosinski M, Keller SK: SF-36 Physical and Mental Summary Scales: A User's Manual. Boston, MA: The Health Institute; 1994.
Fleishman JA, Selim AJ, Kazis LE: Deriving SF-12v2 physical and mental health summary scores: a comparison of different scoring algorithms. Qual Life Res 2010, 19: 231–241. 10.1007/s11136-009-9582-z
We are grateful to QualityMetric Inc. for their kind permission to validate the Iranian version of the SF-12v2 and for providing us with the QualityMetric Health Outcomes Scoring Software 2.0. We are also grateful to the Iranian Students' Polling Agency (ISPA) for helping us to collect the data.
The authors declare that they have no competing interests.
AM was the main investigator, provided the questionnaire, carried out the analysis, and wrote the paper. MV contributed to the analysis and the writing process. MAL contributed to the data collection and the study management. SJM contributed to the study design, and analysis. SO contributed to the study design and drafting. MT contributed to the CFA analysis. All authors read and approved the manuscript.