The 12-item General Health Questionnaire (GHQ-12) is widely used as a unidimensional instrument, but factor analyses have suggested that it contains two or three factors. Little is known about whether these factors, if they exist, are useful for revealing between-patient differences in clinical state and health-related quality of life.
We addressed this issue in a cross-sectional survey of out-patients with psychological disorders in Singapore. The participants (n = 120) completed the GHQ-12, the Beck Anxiety Inventory, and the Short-Form 36 Health Survey. Confirmatory factor analysis was used to compare six previously proposed factor structures for the GHQ-12. Factor scores of the best-fitting model, as well as the overall GHQ-12 score, were assessed in relation to clinical and health-related quality of life variables.
The 3-factor model proposed by Graetz fitted the data better than a unidimensional model, two 2-factor models, and two other 3-factor models. However, the three factors were strongly correlated. Their values varied in a similar fashion in relation to clinical and health-related quality of life variables.
The 12-item General Health Questionnaire contains three factors, namely Anxiety and Depression, Social Dysfunction, and Loss of Confidence. Nevertheless, using them separately offers few practical advantages in differentiating clinical groups or in identifying associations with clinical or health-related quality of life variables.
Recent studies of disease burden have demonstrated the importance of psychological disorders. For instance, depression was the fourth leading cause of disease burden worldwide in 2000, accounting for 4.4% of total disability-adjusted life years. The 12-item General Health Questionnaire (GHQ-12) has been widely used in many countries to detect psychological morbidity, and major national studies such as the British Household Panel Survey (BHPS) also employ it. Calibrating this instrument may therefore benefit a large community of researchers.
While the longer versions of the GHQ are normally considered multidimensional, the GHQ-12 is often regarded as measuring a single dimension of psychological health. For example, Corti analyzed the GHQ-12 data in the BHPS and maintained that the high Cronbach's alpha indicated that the instrument was unidimensional. However, several authors have suggested that the GHQ-12 contains two or three clinically meaningful factors. Using principal component analysis, Politi et al. identified two factors: general dysphoria and social dysfunction. Andrich and van Schoubroeck suggested that the positively worded items formed one factor and the negatively worded items another. Graetz, Martin, and Worsley and Gribbin proposed three different 3-factor models. In a multi-centre study, although considerable between-centre variation was found, the final solutions tended to have either two or three factors.
Using confirmatory factor analysis (CFA) to analyze the BHPS data, Cheung compared various models and found that the 3-factor model proposed by Graetz gave the best fit. Its factors are anxiety and depression (4 items), social dysfunction (6 items), and loss of confidence (2 items). In a study of employees in New Zealand, Kalliath et al. also used CFA to compare various models and likewise found that Graetz's 3-factor model gave better goodness-of-fit than the others. However, they maintained that none of the models they examined fitted sufficiently well, and they therefore modified the instrument to propose a short (8-item) version of the GHQ. In a study of college students and young adolescents in Australia, French and Tait found that Graetz's model not only fitted the data better than the other models but also met some fit-index targets, such as a Comparative Fit Index > 0.95. In a study of a rural population in Australia, the model of Worsley and Gribbin fitted best and that of Graetz was second best.
While the structure of the GHQ-12 has been studied using factor analysis, the construct validity and usefulness of the resulting factors have seldom been tested. The question is whether the additional information provided by two or three factors, if they exist, is clinically useful. In other words, are multiple scores more useful than a single total score in helping us to understand respondents' health status?
The purpose of this study was therefore two-fold. First, we aimed to compare the previously proposed models of the GHQ-12 in an Asian population and identify the best-fitting one; it was not our objective to assess their absolute level of fit or to derive a new model or version of the GHQ. Second, we aimed to assess whether the identified factors relate to clinical and health-related quality of life variables in different ways.
Subjects and study design
A consecutive sample of outpatients with anxiety disorders and/or depressive disorders was recruited from a psychiatric clinic at a tertiary hospital in Singapore. Inclusion criteria were the presence of any anxiety disorder and/or major depressive disorder, literacy in English or Chinese, and completion of an informed consent form. Patients with organic brain syndrome or psychosis were excluded.
During routine consultation visits, a psychiatrist ascertained the diagnoses of recruited patients using DSM-IV criteria and assessed the severity of their psychiatric disorders on a Clinical Global Impression (CGI) scale ranging from 1 (very mild) to 5 (very severe). Patients were then given a questionnaire containing the General Health Questionnaire (GHQ-12), the Beck Anxiety Inventory (BAI), and the Short Form-36 Health Survey (SF-36) for self-completion. Identical English and Chinese questionnaires were prepared, and subjects chose the language they preferred. A research assistant checked returned questionnaires for completeness.
The General Health Questionnaire (GHQ-12) consists of 12 items, each assessing the severity of a mental health problem over the past few weeks on a 4-point scale (scored 0 to 3). Item scores were summed to give a total score ranging from 0 to 36, with higher scores indicating worse health. The Chinese version of the GHQ-12 used in this study had been validated [17, 18]. A previous study of the 60- and 30-item English and Chinese versions of the GHQ yielded comparable scale scores, suggesting that the two language versions are equivalent.
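The Likert scoring described above can be sketched as follows. This is a minimal illustration only. It assumes that each response has already been coded from 0 (best) to 3 (worst), and the function name `ghq12_likert_total` is hypothetical.

```python
def ghq12_likert_total(responses):
    """Sum 12 GHQ-12 item scores coded 0-3 into a 0-36 total (higher = worse).

    Hypothetical helper: assumes responses are already coded 0..3.
    """
    if len(responses) != 12:
        raise ValueError("GHQ-12 requires exactly 12 item responses")
    for r in responses:
        if r not in (0, 1, 2, 3):
            raise ValueError(f"item score {r!r} is outside 0-3")
    return sum(responses)


print(ghq12_likert_total([1] * 12))  # 12
print(ghq12_likert_total([3] * 12))  # 36 (maximum possible)
```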
The Beck Anxiety Inventory (BAI) is a valid and reliable self-report checklist for anxiety symptoms. This instrument consists of 21 items, each describing an anxiety symptom; respondents rate how much they have been bothered by each symptom over the past week on a 4-point scale. Responses to all items are summed to give a total score ranging from 0 to 63, with higher scores indicating more severe anxiety. The authors developed a Chinese BAI using forward and back translation and refined it after a pilot study of subjects with anxiety disorders.
The Short Form 36 Health Survey (SF-36) is a 36-item questionnaire assessing functional health-related quality of life (HRQoL) in 8 domains: physical functioning, role limitations due to physical problems, bodily pain, general health, vitality, social functioning, role limitations due to emotional problems, and mental health. The instrument yields a score from 0 to 100 for each domain, with higher scores indicating better HRQoL. The validity and reliability of the SF-36 have been extensively documented. In Singapore, both the UK English and Chinese (Hong Kong) versions of the SF-36 have been validated [23, 24], and these two language versions appear to be equivalent.
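The 0 to 100 domain scores come from a linear rescaling of each domain's raw score, as described in the SF-36 scoring manuals. The sketch below illustrates that step only. It assumes item recoding (for example, of reversed items) has already been done, and `sf36_transform` is a hypothetical name. In the example, the physical functioning domain has 10 items, each scored 1 to 3, so its raw range is 10 to 30.

```python
def sf36_transform(raw, lowest, highest):
    """Linearly rescale a raw SF-36 domain score to 0-100.

    transformed = (raw - lowest possible) / (possible range) * 100
    Assumes item-level recoding has already been applied.
    """
    if highest <= lowest:
        raise ValueError("highest possible raw score must exceed lowest")
    if not lowest <= raw <= highest:
        raise ValueError("raw score outside its possible range")
    return (raw - lowest) / (highest - lowest) * 100


# Physical functioning: 10 items scored 1-3, so raw scores range from 10 to 30.
print(sf36_transform(20, 10, 30))  # 50.0
```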
Various factor structures of the GHQ-12 were tested by confirmatory factor analysis. Model I was unidimensional. Model IIA contained 2 factors: General Dysphoria and Social Dysfunction. Model IIB also contained 2 factors: positively worded items formed one factor and negatively worded items formed another. Model IIIA contained the 3 factors identified by Martin: Cope, Stress and Depress. Model IIIB was the 3-factor model proposed by Graetz: Anxiety and Depression, Social Dysfunction, and Loss of Confidence. Model IIIC was also a 3-factor model: Anhedonia-Sleep Disturbance, Social Performance and Loss of Confidence. In the confirmatory factor analysis, the number of factors and the assignment of GHQ-12 items to factors were pre-specified according to each model, and the loading of each item on its factor was estimated by maximum likelihood.
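Pre-specifying a model fixes how many parameters it estimates, and information criteria such as the AIC penalize that number. The sketch below counts free parameters and degrees of freedom for a covariance-only CFA. It rests on several assumptions: simple structure (each item loads on exactly one factor), factor variances fixed at 1, and all factor correlations freely estimated. Whether every one of the six models has simple structure is also an assumption, and the function names are hypothetical. The factor sizes [4, 6, 2] for Model IIIB follow the item counts given for Graetz's factors.

```python
def cfa_parameter_count(factor_sizes):
    """Free parameters of a simple-structure CFA with standardized factors.

    factor_sizes: number of items loading on each factor, e.g. [4, 6, 2].
    Assumes each item loads on one factor only and factor variances are fixed at 1.
    """
    p = sum(factor_sizes)                 # observed items
    k = len(factor_sizes)                 # latent factors
    loadings = p                          # one loading per item
    error_variances = p                   # one unique variance per item
    factor_correlations = k * (k - 1) // 2
    return loadings + error_variances + factor_correlations


def cfa_degrees_of_freedom(factor_sizes):
    """Distinct variances/covariances minus free parameters."""
    p = sum(factor_sizes)
    moments = p * (p + 1) // 2            # 78 for 12 items
    return moments - cfa_parameter_count(factor_sizes)


print(cfa_degrees_of_freedom([12]))       # Model I: 54
print(cfa_degrees_of_freedom([4, 6, 2]))  # Model IIIB: 51
```

Each extra factor frees only its correlations with the other factors, so the 3-factor model spends three more parameters than the unidimensional one.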
Methodologists have emphasized that it is desirable to use several indicators to examine a model's goodness-of-fit. The fit of the six models was assessed by three measures. Akaike's Information Criterion (AIC) penalizes the maximized log-likelihood of a model according to its number of parameters; a model with a lower AIC is more plausible than one with a higher AIC. Whereas the AIC is informative only when comparing models, the Comparative Fit Index (CFI) indicates how well a single model fits, relative to a baseline model in which the items are uncorrelated. Its values range between 0 and 1, and a CFI larger than 0.90 is conventionally taken to indicate an acceptable model. (Hu and Bentler suggested the more stringent cutoff of 0.95; we discuss this cutoff in a later section.) The Root Mean Square Error of Approximation (RMSEA) assesses the amount of approximation error in a model; an RMSEA larger than 0.08 indicates too much error.
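The three indices can be written down directly from a model's fit statistics. The sketch below uses textbook formulas: AIC as −2 ln L + 2q, CFI from model and baseline chi-square values, and RMSEA with the (N − 1) convention. Software packages differ in details, for example by reporting AIC on the chi-square scale or using N instead of N − 1. The inputs in the example are hypothetical numbers, not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: -2 ln L + 2q (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation, (N - 1) convention."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical values: a 51-df model, a 66-df baseline, N = 120.
print(round(cfi(100.0, 51, 1000.0, 66), 3))  # 0.948
print(round(rmsea(100.0, 51, 120), 3))
print(aic(-1000.0, 27))                      # 2054.0
```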
The best-fitting model was examined in detail. The Kruskal-Wallis test was used to compare the overall GHQ-12 and factor scores of patients with different diagnoses. Pearson's correlation coefficient (r) was used to assess the association between GHQ-12 scores and the Beck Anxiety Inventory, Clinical Global Impression and SF-36 scores, and Fisher's z transformation was used to construct 95% confidence intervals for r.
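Fisher's z transformation works in three steps. Apply atanh to r, build a normal-theory interval with standard error 1/√(n − 3), and then back-transform with tanh. The sketch below shows this using only the Python standard library. The correlation of 0.5 is an illustrative value, n = 120 matches the sample size, and `pearson_r_ci` is a hypothetical name.

```python
import math
from statistics import NormalDist


def pearson_r_ci(r, n, level=0.95):
    """Confidence interval for Pearson's r via Fisher's z transformation."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)                              # Fisher's z
    se = 1.0 / math.sqrt(n - 3)                    # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


lo, hi = pearson_r_ci(0.5, 120)
print(round(lo, 2), round(hi, 2))  # 0.35 0.62
```

The interval is asymmetric around r because tanh compresses values near ±1. This is why the transformation is preferred over a plain r ± 1.96·SE interval.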
Results and Discussion
A total of 120 participants (63 men and 57 women) were included in the analysis (Table 1). Most respondents (90%) were Chinese, and the mean (SD) age was 43.1 (12.7) years. Sixty-six percent of the participants chose to complete the English version of the questionnaire. Table 1 shows the mean clinical and HRQoL scores reported by men and women. Men tended to have less anxiety, better Clinical Global Impression scores, and higher SF-36 scores.
Table 2 shows goodness-of-fit statistics for the 1-, 2- and 3-factor models. The 3-factor model (IIIB) proposed by Graetz (1991) was the best on all three fit statistics, with the lowest AIC and RMSEA and the highest CFI (0.935). All six models produced RMSEAs exceeding 0.08. The unidimensional model (Model I) had the highest AIC, the highest RMSEA and the lowest CFI.
Figure 1 displays the standardized factor loadings and between-factor correlations of Model IIIB. The factor loadings ranged between 0.72 and 0.90, and the three factors were strongly correlated: 0.89 between factor 1 (Anxiety and Depression) and factor 2 (Social Dysfunction), 0.83 between factor 2 and factor 3 (Loss of Confidence), and 0.90 between factors 1 and 3. These strong correlations suggest that even if there are in fact three factors, in practice it may be very difficult to discern them.
Having established that Graetz's 3-factor model fitted the data better than the other models, we calculated the factor scores. Because the loadings within each factor did not vary substantially (Figure 1), we used unweighted sums of the relevant items for simplicity. Table 3 shows the mean (SD) factor scores and overall GHQ-12 score by clinical diagnosis. Some patients had multiple diagnoses; we assigned each of them to one of three major clinical diagnoses. The three factor scores and the overall GHQ-12 score behaved in fairly similar ways. All four scores differed significantly between patients with and without depression, and none differed significantly between patients with and without generalized anxiety disorder. Patients with panic disorder had lower scores on the Loss of Confidence factor (difference = 0.68; P = 0.043). The pooled SD of the two diagnostic groups was about 1.75, so the between-group difference was approximately 0.4 SD.
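The unweighted factor scores can be sketched as follows. The item-to-factor assignment is the one commonly reported for Graetz's model: items 2, 5, 6 and 9 for Anxiety and Depression, items 1, 3, 4, 7, 8 and 12 for Social Dysfunction, and items 10 and 11 for Loss of Confidence. This assignment is stated here as an assumption to be checked against Graetz's original paper. It matches the 4/6/2 item counts given earlier. The names `GRAETZ_FACTORS` and `graetz_factor_scores` are hypothetical.

```python
# ASSUMPTION: item numbers (1-12) as commonly reported for Graetz's model;
# verify against the original source before use.
GRAETZ_FACTORS = {
    "anxiety_depression": (2, 5, 6, 9),
    "social_dysfunction": (1, 3, 4, 7, 8, 12),
    "loss_of_confidence": (10, 11),
}


def graetz_factor_scores(responses):
    """Unweighted factor sums plus the overall total for 12 item scores (0-3)."""
    if len(responses) != 12:
        raise ValueError("expected 12 item scores")
    scores = {
        # items are numbered from 1, list positions from 0
        name: sum(responses[i - 1] for i in items)
        for name, items in GRAETZ_FACTORS.items()
    }
    scores["total"] = sum(responses)
    return scores


print(graetz_factor_scores([1] * 12))
# {'anxiety_depression': 4, 'social_dysfunction': 6, 'loss_of_confidence': 2, 'total': 12}
```

Because each item belongs to exactly one factor, the three factor scores always add up to the overall GHQ-12 score.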
Table 4 presents the correlations between the three factors of Graetz's model and the BAI, the Clinical Global Impression score, and the SF-36 scales. The three factors were correlated with these 10 clinical and HRQoL variables to very similar degrees.
Several previous confirmatory factor analyses found that Graetz's 3-factor model gave better fit to survey data from Australia, Britain and New Zealand. In this study we examined the issue in an Asian population in Singapore, most of whom are ethnic Chinese. All three goodness-of-fit indices employed (AIC, CFI and RMSEA) agreed that Graetz's 3-factor model out-performed the other five models. Its CFI was 0.935. Conventionally, a CFI of 0.90 or larger is taken as evidence of sufficient fit, although a more stringent criterion of a CFI larger than 0.95 has recently been proposed and debated [27, 28]. The RMSEA also indicated that even the best-fitting model did not fit well, using 0.08 as the cut-off. However, our aim was to compare the models rather than to modify the instrument, so for our purpose what matters is the relative goodness-of-fit of the six models, not the absolute values of the fit indices. We consider the "correctness" and the "usefulness" of a model to be two fairly separate issues. Although the goodness-of-fit of Graetz's model was limited, we therefore examined its factor scores in relation to external criteria in order to reach a conclusion about the usefulness of the model.
The one-dimensional model was the worst according to all three goodness-of-fit indices.
The three factors in the model proposed by Graetz were strongly correlated with each other, with correlation coefficients of about 0.8 to 0.9. Such strong correlations suggest that even if there were indeed three different factors, in practice it would be quite difficult to differentiate them. French and Tait also found strong correlations between the factors, which led them to recommend that it may be prudent to use the overall score rather than overinterpret the factors within the GHQ-12. We examined the three factor scores and the overall GHQ-12 score in relation to clinical diagnoses, and the four scores behaved in fairly similar ways. The Loss of Confidence score did differ significantly between patients with and without panic disorder, whereas the other three scores did not. However, the difference was only about 0.4 SD, which is smaller than a recommended threshold of 0.5 SD corresponding to a minimal clinically important difference for health status questionnaires. We also examined the associations between the three GHQ factor scores and the Beck Anxiety Inventory, the Clinical Global Impression score, and the 8 scales of the SF-36. The three factors were associated with these clinical and HRQoL variables to similar degrees.
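Written out, the standardized difference behind the 0.4 SD figure uses the Loss of Confidence values reported for patients with and without panic disorder (difference 0.68, pooled SD about 1.75). The notation below is ours, not the study's.

```latex
d = \frac{\bar{x}_{\text{without panic}} - \bar{x}_{\text{with panic}}}{s_{\text{pooled}}}
  \approx \frac{0.68}{1.75} \approx 0.39 < 0.5
```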
Two limitations of the study should be noted. First, the sample size was somewhat small for confirmatory factor analysis. Second, the participants were all clinical cases, and this homogeneity might have made it more difficult to detect variations in GHQ-12 scores. We believe that the question of the relative plausibility of the various factor models has been sufficiently answered by this and several previous studies [10–12]. Nevertheless, future studies of non-clinical participants with larger samples would help to further assess the practical usefulness of the GHQ-12 factors.
Several studies, including the present one, have found that Graetz's 3-factor model of the GHQ-12 is more plausible than other models. However, the factors were strongly correlated and difficult to discern. Our analysis of the three GHQ scores in relation to clinical variables and aspects of health-related quality of life did not appear to be more informative than analysis of a single overall GHQ-12 score. As such, from a pragmatic point of view we consider it acceptable to use this instrument as a one-dimensional measure. Unless one has specific questions that are best answered by a subset of the three factors, there is no need to consider the multi-dimensionality.
Ustun TB, Ayuso-Mateos JL, Chatterji S, Mathers C, Murray CJ: Global burden of depressive disorders in the year 2000. Br J Psychiatry 2004, 184: 386–392. 10.1192/bjp.184.5.386
Wiggins RD, Schofield P, Sacker A, Head J, Bartley M: Social position and minor psychiatric morbidity over time in the British Household Panel Survey 1991–1998. J Epidemiol Community Health 2004, 58: 779–787. 10.1136/jech.2003.015958
Corti L: For better or worse? Annual change in smoking, self-assessed health and subjective well-being. In Changing Households: The British Household Panel Survey 1990–1992. Edited by: Buck N, Gershuny J, Rose D, Scott J. Colchester: University of Essex; 1994:199–219.
McHorney CA, Ware JE Jr, Lu JF, Sherbourne CD: The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care 1994, 32: 40–66.
Lam CLK, Gandek B, Ren XS, Chan MS: Tests of scaling assumptions and construct validity of the Chinese (HK) version of the SF-36 Health Survey. J Clin Epidemiol 1998, 51: 1139–1147. 10.1016/S0895-4356(98)00105-X
Thumboo J, Feng PH, Soh CH, Boey ML, Thio ST, Fong KY: Validation of the Chinese SF-36 for quality of life assessment in patients with systemic lupus erythematosus. Lupus 2000, 9: 708–712. 10.1191/096120300673421268
Thumboo J, Fong KY, Machin D, Chan SP, Leong KH, Feng PH, Thio ST, Boey ML: A community based study of scaling assumptions and construct validity of the English (UK) and Chinese (HK) SF-36 in Singapore. Qual Life Res 2001, 10: 175–188. 10.1023/A:1016701514299
Thumboo J, Fong KY, Chan SP, Machin D, Feng PH, Thio ST, Boey ML: The equivalence of English and Chinese SF-36 versions in bilingual Singapore Chinese. Qual Life Res 2002, 11: 495–503. 10.1023/A:1015680029998
Marsh HW, Balla JR, Hau KT: An evaluation of incremental fit indices: a clarification of mathematical and empirical properties. In Advanced Structural Equation Modeling: Issues and Techniques. Edited by: Marcoulides GA, Schumacker RE. Mahwah, NJ: Lawrence Erlbaum Associates; 1996:315–354.
Marsh HW, Hau KT, Wen Z: In search of golden rules: comment on hypothesis-testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing Hu and Bentler's (1999) findings. Structural Equation Modeling 2004, 11: 320–341. 10.1207/s15328007sem1103_2
Norman GR, Sloan JA, Wyrwich KW: Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care 2003, 41: 582–592. 10.1097/00005650-200305000-00004
FG carried out the confirmatory factor analysis, interpreted the findings, and drafted part of the manuscript. NL designed the study, participated in the development of the statistical framework and interpreted the findings. JT participated in the study design, discussion of the statistical framework, and the interpretation of findings. CF participated in the study design and carried out the data collection and clinical assessments. SCL participated in the study design and discussion and interpretation of findings. YBC conceived of the study, developed the statistical framework, carried out part of the statistical analysis, and drafted part of the manuscript. All authors read and approved the final manuscript.
Gao, F., Luo, N., Thumboo, J. et al. Does the 12-item General Health Questionnaire contain multiple factors and do we need them?
Health Qual Life Outcomes 2, 63 (2004). https://doi.org/10.1186/1477-7525-2-63