The Four-Dimensional Symptom Questionnaire (4DSQ) in the general population: scale structure, reliability, measurement invariance and normative data: a cross-sectional survey

Background The Four-Dimensional Symptom Questionnaire (4DSQ) is a self-report questionnaire measuring distress, depression, anxiety and somatization with separate scales. The 4DSQ has been extensively validated in clinical samples, especially from primary care settings. Information about measurement properties and normative data in the general population was lacking. In a Dutch general population sample we examined the 4DSQ scales’ structure, the scales’ reliability and measurement invariance with respect to gender, age and education, the scales’ score distributions across demographic categories, and normative data.
Methods 4DSQ data were collected in a representative Dutch Internet panel. Confirmatory factor analysis was used to examine the scales’ structure. Reliability was examined by Cronbach’s alpha, and coefficients omega-total and omega-hierarchical. Differential item functioning (DIF) analysis was used to evaluate measurement invariance across gender, age and education.
Results The total response rate was 82.4 % (n = 5273/6399). The depression scale proved to be unidimensional. The other scales were best represented as bifactor models consisting of a large general factor and one or more smaller specific factors. The general factors accounted for more than 95 % of the reliable variance of the scales. Reliability was high (≥0.85) by all estimates. The distress, depression and anxiety scales were invariant across gender, age and education. The somatization scale demonstrated some lack of measurement invariance as a result of decreased thresholds for some of the items in young people (16–24 years) and increased thresholds in elderly people (65+ years). The somatization scale was invariant regarding gender and education. The 4DSQ scores varied significantly across demographic categories, but the explained variance was small (<6 %). Normative data were generated for gender and age categories. Approximately 17 % of the participants scored above average on the distress scale, whereas 12 % scored above average on the somatization scale. The percentages of people scoring high enough on depression or anxiety to suspect the presence of depressive or anxiety disorder were 4.1 % and 2.5 %, respectively.
Conclusions Evidence supports reliability and measurement invariance of the 4DSQ in the general Dutch population. The normative data provided in this study can be used to compare a subject’s 4DSQ scores with a general population reference group.
Electronic supplementary material The online version of this article (doi:10.1186/s12955-016-0533-4) contains supplementary material, which is available to authorized users.


Background
The Four-Dimensional Symptom Questionnaire (4DSQ) is a self-report questionnaire comprising four scales measuring distress, depression, anxiety and somatization [1]. The 4DSQ was developed in Dutch general practice and is currently used by increasing numbers of family and occupational physicians, physiotherapists, social workers, counsellors, and primary care psychologists. The 4DSQ is intended to be used in both clinical and research settings. The distress scale aims to measure the kind of symptoms people experience when they are "under stress" as a result of high demands, psychosocial difficulties, daily hassles, life events, or traumatic experiences [2]. The distress scale measures people's most general, most basic response to stress of any kind. The distress score reflects any mental health problem and indicates the degree of subjective psychological suffering [3]. The depression scale measures symptoms that are relatively specific to depressive disorder, notably, anhedonia and negative cognitions [4,5]. The anxiety scale measures symptoms that are relatively specific to anxiety disorder [6]. Scores on the 4DSQ depression and anxiety scales indicate the likelihood of a (DSM-IV) depressive or anxiety disorder [7,8]. The somatization scale measures symptoms of somatic distress and somatoform disorder [9,10].
The 4DSQ has been validated in selected, mainly clinical samples from primary care settings [1,7,8,11]. The present paper aims to evaluate the 4DSQ scales' measurement properties in the general Dutch population and to provide normative data. In particular, we examined the following scale characteristics: the scales' factor structures, the scales' reliability, the scales' measurement invariance with respect to gender, age and education, the scales' score distributions across demographic categories, and normative data for the general Dutch population.

Design and participants
The present study was performed in the LISS panel (LISS: Longitudinal Internet Study in the Social Sciences), an Internet panel consisting of a representative sample of Dutch-speaking non-institutionalized individuals from approximately 5,000 households in the Netherlands, managed by CentERdata [12]. The LISS panel is based on a true probability sample drawn from the population register by Statistics Netherlands. All eligible people were approached in traditional ways (i.e., by letters, telephone calls and/or house visits) with an invitation to participate in the panel. Households that could not otherwise participate were provided with a computer and Internet connection. As participation was not open to people not included in the sample drawn by Statistics Netherlands, self-selection is not an issue in the LISS panel. Imminent under-coverage of specific groups (e.g., youths, ethnic minorities) due to reduced willingness to participate or increased attrition is actively counteracted by targeted oversampling of those groups in additional "refreshment" samples [12]. Panel members complete online questionnaires on a monthly basis, receiving a reimbursement of €7.50 for a questionnaire of 30 min. In July 2013, the 4DSQ was presented to a random sample (n = 771) of all available panel members aged 16 years and older. In October 2013, the 4DSQ was presented to all then available panel members aged 16 years and older, except those who had already completed the 4DSQ in July (the October questionnaire was presented to 5659 participants). For the present study the July and October samples were pooled.

Measurements
The 4DSQ comprises four symptom scales: distress (16 items), depression (6 items), anxiety (12 items), and somatization (16 items). The 4DSQ uses a timeframe reference of 7 days. The items are answered on a 5-point frequency scale from "no" to "very often or constantly". In order to calculate sum scores the responses are coded on a 3-point scale: "no" (0 points), "sometimes" (1 point), "regularly", "often", and "very often or constantly" (2 points). By lumping the response categories "regularly", "often", and "very often or constantly" together relatively more weight is put on the number of symptoms experienced than on their perceived frequency. The 4DSQ is freely available for non-commercial use at www.4dsq.eu.
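The recoding rule described above can be sketched in a few lines of Python. This is an illustration only, not the official scoring software; the function name and the example answers are hypothetical.

```python
# Map the five response categories onto the 0/1/2 points used for sum scores.
RESPONSE_POINTS = {
    "no": 0,
    "sometimes": 1,
    "regularly": 2,
    "often": 2,
    "very often or constantly": 2,
}


def scale_score(responses):
    """Return the sum score for the items of a single 4DSQ scale."""
    return sum(RESPONSE_POINTS[answer] for answer in responses)


# Hypothetical answers to the six depression items.
example = ["no", "sometimes", "often", "no",
           "very often or constantly", "regularly"]
print(scale_score(example))  # 0 + 1 + 2 + 0 + 2 + 2 = 7
```

Because the three highest categories all count as 2 points, a 16-item scale such as distress has a maximum score of 32.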

Analyses

Weighting
In order to account for selective non-response (of e.g. people with low income) and to obtain results applicable to the general Dutch population, the responders were weighted using inverse response probability weighting [13]. All analyses were performed on weighted data.
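As a rough illustration of the idea, the sketch below computes inverse response probability weights in a simplified, stratum-based form: each responder is weighted by the inverse of the response rate in his or her stratum. The strata, counts, and function name are hypothetical, and the procedure actually used in the study [13] may differ (for example, by modelling response probabilities from several characteristics at once).

```python
from collections import Counter


def inverse_response_weights(sample_strata, responder_strata):
    """Weight each responder by 1 / (response rate of the responder's stratum).

    The weights are rescaled to sum to the number of responders.
    """
    invited = Counter(sample_strata)
    responded = Counter(responder_strata)
    raw = [invited[s] / responded[s] for s in responder_strata]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical example: 10 low-income invitees of whom 5 responded,
# and 10 high-income invitees of whom all 10 responded.
sample = ["low"] * 10 + ["high"] * 10
responders = ["low"] * 5 + ["high"] * 10
weights = inverse_response_weights(sample, responders)

low_share = sum(w for w, s in zip(weights, responders) if s == "low") / sum(weights)
print(round(low_share, 2))  # weighted share of the low-income stratum
```

In this toy example the under-represented stratum regains its share of the invited sample after weighting.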

Confirmatory factor analysis
The total sample of responders was randomly divided into two equally sized groups, a "training set" (n = 2636) that was used for model selection, and a "validation set" (n = 2637) that was used for validation of the models obtained in the training set [14].
We examined the latent structure of the 4DSQ scales using scale-wise confirmatory factor analyses (CFA), using the package "lavaan" version 0.5-17 in R 3.1.2 [15,16]. The item responses were treated as ordered categories. Diagonally weighted least squares (DWLS) was used for model estimation [17] and mean and variance adjusted test statistics were computed. Fit measures indicating good fit included the comparative fit index (CFI) >0.95, Tucker-Lewis index (TLI) >0.95 and root mean square error of approximation (RMSEA) <0.06 [18]. An RMSEA value <0.05 indicates "close fit" to the data [19]. In addition, we examined the matrix of residual correlations and aimed for less than 5 % of the residual correlations (in absolute values) greater than 0.1. For each scale, we started by fitting a one-factor model in the training set. Informed by the modification indices, we iteratively improved model fit by allowing residual item variances to correlate (but only when the items shared specific content justifying correlated residual variance). Note that correlated item residuals suggest the presence of additional "specific" factors beyond the general factor of the scale [20]. Therefore, a fitting one-factor model with correlated residual variances was transformed into a corresponding bifactor model by defining the items with correlated residuals as indicators of one or more "group" (or specific) factors [21]. The bifactor model is characterized by one large general factor on which all items load, and one or more smaller group factors on which subsets of items load [22]. Psychological constructs are often "multifaceted", and the bifactor model makes it possible to model a general factor representing the overall target construct of the scale, whereas one or more group factors model specific "facets" of the construct [23].
The bifactor models obtained in the training set were subsequently validated in the validation set using the model parameters (factor structure and loadings) from the training set.
To provide insight into the relationships between the (sub)scales we obtained factor scores for the general and specific factors in the validation set, and calculated Pearson product moment correlations.
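The fit criteria used in this section (CFI and TLI >0.95, RMSEA <0.06, and fewer than 5 % of residual correlations above 0.1 in absolute value) can be written down as simple checks. The sketch below is in Python rather than R and does not call lavaan; the function names and the example matrix are hypothetical.

```python
def fit_is_good(cfi, tli, rmsea):
    """Apply the cut-offs for good fit: CFI > 0.95, TLI > 0.95, RMSEA < 0.06."""
    return cfi > 0.95 and tli > 0.95 and rmsea < 0.06


def share_large_residuals(residual_corr, cutoff=0.1):
    """Fraction of unique off-diagonal residual correlations with |r| > cutoff."""
    n = len(residual_corr)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    large = sum(1 for i, j in pairs if abs(residual_corr[i][j]) > cutoff)
    return large / len(pairs)


# Hypothetical 16-item residual matrix with four residual correlations above 0.1.
n_items = 16
residuals = [[0.0] * n_items for _ in range(n_items)]
for i, j in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    residuals[i][j] = residuals[j][i] = 0.12

share = share_large_residuals(residuals)
print(f"{100 * share:.1f} % of residual correlations exceed 0.1")
```

A 16-item scale has 120 unique item pairs, so four large residual correlations amount to 3.3 %, which stays under the 5 % target.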

Reliability
Reliability was assessed in the total sample. Conventional Cronbach's alpha values were calculated using the R package "psych" [24]. Cronbach's alpha represents a lower bound to reliability [25]. In addition, we calculated coefficients omega-total and omega-hierarchical based on the standardized factor loadings derived from the bifactor models obtained in the CFAs, as described by Reise [22]. Omega-total reflects the proportion of the total variance that is due to all common (general and group) factors, whereas omega-hierarchical reflects the proportion of the total variance that is accounted for by the general factor alone [21]. Omega-hierarchical can be viewed as reflecting the general factor saturation of a scale [26].
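For readers who want to see how the two omega coefficients follow from bifactor loadings, here is a minimal Python sketch. It assumes standardized items and mutually orthogonal factors, which is the usual bifactor set-up; the loadings in the example are hypothetical and are not taken from the 4DSQ models.

```python
def omega_coefficients(general, groups):
    """Return (omega_total, omega_hierarchical) from standardized loadings.

    general: general-factor loading for every item of the scale.
    groups:  one dict per group factor, mapping item index -> loading.
    Assumes orthogonal factors and unit-variance (standardized) items.
    """
    communality = [g ** 2 for g in general]
    for group in groups:
        for item, loading in group.items():
            communality[item] += loading ** 2
    unique_variance = sum(1.0 - h2 for h2 in communality)

    general_variance = sum(general) ** 2
    group_variance = sum(sum(group.values()) ** 2 for group in groups)
    total_variance = general_variance + group_variance + unique_variance

    omega_total = (general_variance + group_variance) / total_variance
    omega_hierarchical = general_variance / total_variance
    return omega_total, omega_hierarchical


# Hypothetical six-item scale with one group factor on the first two items.
omega_t, omega_h = omega_coefficients([0.7] * 6, [{0: 0.4, 1: 0.4}])
print(round(omega_t, 3), round(omega_h, 3))
```

For a scale without group factors the two coefficients coincide, since all common variance is then general-factor variance.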
In addition to reliability, we calculated standard errors of measurement (SEM) using the formula SEM = SD × √(1 − r), in which SD is the standard deviation and r is the reliability of the scale. We used omega-total for r. SEM is a useful measure of measurement precision.
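The SEM formula, SEM = SD × √(1 − r), and the confidence interval it implies around an observed score can be illustrated as follows. The standard deviation used here is a hypothetical value chosen for the example; the reliability of 0.976 is the omega-total reported for the distress scale.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed, sem, z=1.96):
    """Approximate 95 % confidence interval around an observed score."""
    return observed - z * sem, observed + z * sem


# Hypothetical SD of 6.5 scale points combined with omega-total = 0.976.
sem = standard_error_of_measurement(6.5, 0.976)
low, high = confidence_interval(10, sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

With these inputs the SEM comes out at roughly one scale point, so an observed score of 10 carries an interval of about two points on either side.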

Measurement invariance
Measurement invariance is present when a scale measures the same construct (e.g., distress) in the same way across different groups of responders (e.g., women and men) [27]. Then the scale scores can be assumed to convey the same meaning (i.e., validity) across those groups. Psychological constructs, such as distress, are often measured using multi-item questionnaires. The responses to the items are thought to be driven by the latent (i.e., not directly observable) construct or trait (e.g., distress). Thus, the items' responses are indicators of the underlying latent trait and together provide information about responders' positions on the trait. The relationship between item responses and the underlying trait is defined by two characteristics, the correlation between the trait and the item responses, and the "threshold" of the item relative to the trait. The threshold of the item is represented by the level of the latent trait at which 50 % of the respondents endorse the item. Items are said to "function the same" when they have the same item characteristics (i.e., correlation and threshold) with respect to the underlying trait. When the items of a scale function the same in different groups, the scale can be assumed to have the same validity in these groups. Whether or not items function the same in different groups can be assessed using "differential item functioning" (DIF) analysis [28]. There are several methods to detect DIF, but no single method has proven superior to the others [29]. Some authors, therefore, suggest using two different methods [30]. We used a parametric method, hybrid ordinal logistic regression (HOLR) as implemented in the R package "lordif" version 0.2.2 [31], and a non-parametric method, the Mantel-Haenszel (M-H) method as implemented in the statistical program jMetrik 3.0 [32].
We tested DIF with respect to gender, age (age groups: 16-24, 25-44, 45-64 and 65+ years) and education (categorized in lower, intermediate, and higher education) in the training set. The criterion for DIF was the group factor explaining >2 % of the item variance (McFadden's R²) in the HOLR method, or a standardized mean difference (SMD) in item score >0.1 between groups in the M-H method. Unlike the M-H method, the HOLR method is capable of testing more than 2 groups simultaneously. With the M-H method, we tested one pair of groups at a time (e.g., lower education versus intermediate education, lower education versus higher education, and intermediate education versus higher education). To account for multiple testing we adopted p < 0.001 as significance level.
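The two effect-size criteria for DIF can be expressed as simple decision rules. The sketch below does not reproduce "lordif" or jMetrik; it only shows how the thresholds would be applied to quantities those programs report. The function names and log-likelihood values are hypothetical, and taking the absolute value of the SMD is an assumption made here because the direction of DIF is reported separately.

```python
def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo-R2: 1 - LL(model) / LL(null model)."""
    return 1.0 - loglik_model / loglik_null


def holr_flags_dif(loglik_trait_only, loglik_with_group, loglik_null,
                   threshold=0.02):
    """Flag DIF when the group variable adds more than `threshold` to pseudo-R2."""
    gain = (mcfadden_r2(loglik_with_group, loglik_null)
            - mcfadden_r2(loglik_trait_only, loglik_null))
    return gain > threshold


def mh_flags_dif(smd, threshold=0.1):
    """Flag DIF when the standardized mean difference exceeds the threshold."""
    return abs(smd) > threshold


# Hypothetical log-likelihoods for one item.
print(holr_flags_dif(-700.0, -675.0, -1000.0))  # pseudo-R2 gain of 0.025
print(holr_flags_dif(-700.0, -690.0, -1000.0))  # pseudo-R2 gain of 0.010
print(mh_flags_dif(0.14), mh_flags_dif(-0.05))
```

An item flagged by either rule would be listed as showing DIF; only items flagged by neither rule would count as DIF-free in both methods.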
The effect of DIF on the mean scale score (i.e., differential test functioning; DTF) was subsequently evaluated in the validation set. We regressed the raw scale score on the group variable while adjusting for the sum score of the items that were found to be free of DIF in both methods. The resulting difference in mean total score between 2 groups is denoted as the DTFR statistic [33]. We calculated effect sizes, denoted d_DTF, by dividing the DTFR values by the scale's standard deviation. These effect sizes can be interpreted in the usual way: 0.2 represents a small effect, 0.5 a moderate effect, and 0.8 a large effect [34].
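The conversion from a DTFR value to an effect size and its conventional label can be sketched as below. The standard deviation in the example is hypothetical; the DTFR of 1.234 is the value reported in the Results for the youngest versus the oldest age group on somatization. Treating effects below 0.2 as "negligible" is a labelling choice made for this sketch.

```python
def dtf_effect_size(dtfr, scale_sd):
    """d_DTF: the DTFR value divided by the scale's standard deviation."""
    return dtfr / scale_sd


def label_effect(d):
    """Conventional labels: 0.2 small, 0.5 moderate, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# DTFR from the Results; the SD of 5.0 scale points is a hypothetical value.
d = dtf_effect_size(1.234, 5.0)
print(round(d, 3), label_effect(d))
```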

Association with demographic characteristics and normative data
We examined the associations between 4DSQ scores and demographic characteristics using univariate analysis of variance (ANOVA) in the total sample. Furthermore, we calculated normative data by gender and age group, providing the distribution parameters mean, standard deviation and skewness, and percentile scores.

Demographics and response
In total, either in July or October 2013, the 4DSQ was presented to 6399 LISS participants (31 non-responders in July received the 4DSQ again in October). The response rate was 5273/6399 (82.4 %). The demographic characteristics of the total sample and the responders are presented in Table 1. Standardized residuals >2 or < −2 indicate over- or underrepresentation among the responders. Underrepresented were younger and unmarried people, people with paid work or studying/school going, people with low personal income, and people with a non-Western or unknown ethnicity. Overrepresented were retired and widowed people. After weighting, the responders sample mirrored the total sample almost perfectly. There were no significant differences between the responders in July and the responders in October, except for age: the July responders were on average 1.7 years older than the October responders (see Additional file 1). This probably reflected differences between the panel members available in July and those available in October. There were no significant differences between the training set and the validation set of responders (see Additional file 2).

Confirmatory factor analysis

Distress
The one-factor model of the distress scale with correlated residuals in 3 item doublets demonstrated good fit to the data in the training set (Table 2). No more than 4 residual correlations (3.3 %) exceeded 0.10 (in absolute values); none of the residuals exceeded 0.20. The correlated item doublets were, in order of importance, #47-#48, referring to consequences of upsetting events, #20-#39, related to disturbed sleep, and #32-#36, expressing failure to cope. The corresponding bifactor model fitted the data well. In order to allow identification of the model, the loadings of the item doublets were constrained to be equal. The same bifactor model in the validation set, using the factor loadings from the training set, fitted the data slightly better. The confidence interval of the RMSEA, lying entirely below 0.05, indicated close fit of the model to the data. Figure 1 displays the bifactor model of distress in the upper left part.

Depression
The one-factor model of the depression scale demonstrated good fit without the need to allow residuals to correlate (Table 2). Consequently, there was no need to define group factors in a bifactor model. The one-factor model was replicated in the validation set, demonstrating close fit to the data. The one-factor model of depression is shown in the upper right part of Fig. 1.

Anxiety
The one-factor model of the anxiety scale, with one residual correlation (between #21 and #27; both items refer to free floating anxiety), demonstrated good fit (Table 2). The corresponding bifactor model was confirmed in the validation set, showing close fit to the data. The model is shown in the lower left part of Fig. 1.

Somatization
The one-factor model of the somatization scale needed correlated residuals between two item triplets and one item doublet to obtain good fit (Table 2). The item triplets were #09-#12-#13 (gastro-intestinal symptoms) and #02-#04-#05 (musculoskeletal symptoms), whereas the item doublet concerned items #15-#16 (cardiovascular or thoracic symptoms). The corresponding bifactor model fitted well in the training set. This model was replicated in the validation set, showing close fit to the data (Table 2). The model is displayed in the lower right part of Fig. 1.
Table 3 displays the correlation matrix of the 4DSQ factor scores. The correlations between the general factors were largely in agreement with correlations between the raw scale scores in previous studies [1]. The correlations with the (residualized) specific factors were all small.

Reliability
The different reliability coefficients are summarized in Table 4. All coefficients were over 0.85 and many were over 0.90, suggesting (more than) adequate reliability of the scales. Given the omega-hierarchical values, the general factors accounted for the lion's share of the scales' total reliable variance. The SEM values were relatively small compared with the scales' ranges. For instance, the SEM of the distress scale (range 32 points) was 1 point, indicating that the 95 % confidence interval of an observed distress score of x was x − 1.96 to x + 1.96.

Measurement invariance
Items that demonstrated DIF for gender, age or education in the training set are listed in Table 5. The items of the depression scale were all free of DIF. Regarding the other scales, a total of 17 items were found to have DIF by either method (i.e., HOLR or M-H). Only 4 items were flagged for DIF by both methods. Most DIF was due to the factor age. Figure 2 illustrates DIF by age for two items, showing the expected item score as a function of the trait score, i.e., the DIF-free item response theory theta score. The slopes of the curves represent the item-trait correlation. The horizontal shift of the curves for different age groups indicates different item thresholds across the age groups. The thresholds for headache (left panel) and irritability (right panel) increased progressively with increasing age. Older people reported less headache and irritability than younger people at comparable levels of somatization and distress, respectively.
Differential test functioning (DTF; i.e., the effect of DIF on the scale score) is presented in Table 6. The largest DTF effect concerned the effect of age on the somatization score: younger people (16-24 years) scored on average 1.234 scale points higher on the somatization scale than elderly people (65+ years), adjusted for the true level of somatization. Similarly, they scored on average 1.234 − 0.561 = 0.673 scale points higher than young adults (25-44 years) and 1.234 − 0.355 = 0.879 scale points higher than older adults (45-64 years), all adjusted for differences in somatization trait levels across the age groups. This DTF effect resulted from some of the somatization items having lower thresholds in younger people (16-24 years) than in older people and some (partly other) somatization items having higher thresholds in elderly people (65+ years) than in younger people. In terms of effect size, however, the DTF effect of age on the somatization score constituted only a small effect, and only when comparing the youngest group (16-24 years) with the oldest group (65+ years). All other DTF effects were negligible from a practical point of view (i.e., considering the effect sizes d_DTF).

Fig. 1 (legend) The general factors are represented by "dis", "dep", "anx", and "som". The other factors represent group factors: "sleep" = disturbed sleep, "cope" = failure to cope, "upset" = symptoms related to past upsetting events, "ff-anx" = free floating anxiety, "musc" = musculoskeletal symptoms, "g-int" = gastro-intestinal symptoms, "c-vas" = cardiovascular symptoms. Coefficients are standardized factor loadings.

Associations with demographic characteristics
Table 7 demonstrates that the mean 4DSQ scores for distress, depression, anxiety and somatization varied significantly across demographic characteristics. Women scored higher than men (with the exception of depression; p = 0.054). Younger people (16-24 years) scored higher and elderly people (65+ years) scored lower than "working age" people (25-64 years). People of non-Dutch descent scored higher than native Dutch people. People with lower education scored higher than people with higher education. Divorced people scored higher than married people. Disabled and unemployed people scored higher than people with paid work. And, finally, there was a clear (negative) gradient of the 4DSQ scores with the personal income level. Nevertheless, the explained variance, expressed as eta-squared, did not exceed 6 % for any of the characteristics explaining any of the 4DSQ scores. The largest effects were observed for somatization, 5.6 % of its variance being explained by employment status. Employment status was the demographic characteristic with the largest effects on all 4DSQ scores, explaining 4.4 % of distress, 3.2 % of depression, 4.0 % of anxiety, and 5.6 % of somatization.
It is important to note that DTF was responsible for most of the differences in mean somatization scores across the age categories. Taking DTF into account (and taking the age group 65+ as reference), the youngest group (16-24 years) scored 5.92 − 1.23 = 4.69 for somatization, which is only marginally higher than the mean somatization score of the oldest group (65+ years): 4.55. Similarly, young adults (25-44 years) scored 4.87 − 0.56 = 4.31 and older adults (45-64 years) scored 4.94 − 0.36 = 4.58 on somatization after taking DTF into account. DTF did not account for other differences in 4DSQ scores.

Normative data by gender and age
Table 8 provides normative data by gender and age category. Clearly, the 4DSQ scores were positively skewed, as is normally the case with symptom questionnaires in non-clinical populations [35]. The depression and anxiety scores were more heavily skewed than the distress and somatization scores as a result of sizeable "floor effects": 77.8 % of all women and 79.7 % of all men scored zero on the depression scale, and 62.9 % of the women and 73.1 % of the men scored zero on the anxiety scale. In contrast, only 16.4 % of the women and 25.6 % of the men scored zero on the distress scale, and 12.5 % of the women and 21.5 % of the men scored zero on the somatization scale.
Regarding currently applicable cut-offs of the 4DSQ (see: www.4dsq.eu), most participants (at least 75 %) scored in the "normal" ranges of the 4DSQ scales (Table 9). Regarding distress and somatization, 17.5 % and 12.3 % of all participants, respectively, scored above "normal" (i.e., >10). Even fewer people scored above "normal" for depression (>2, 9.4 %) or anxiety (>3, 9.7 %). Only 4.1 % scored high enough on depression to qualify for an immediate diagnostic assessment for depressive disorder, and no more than 2.5 % scored high enough on anxiety to qualify for an immediate diagnostic assessment for anxiety disorder.

Discussion
This study examined the 4DSQ scales' structure, reliability and measurement invariance in the general population. In addition, the study examined the 4DSQ's associations with demographic characteristics and provided normative data by gender and age.

Scale structure
The depression scale proved to be an almost perfectly unidimensional scale. The other scales were best represented as bifactor structures, each consisting of a large general factor underlying all the items of the scale and one or more smaller "group" or "specific" factors underlying subsets of items. The general factor represents the target construct of the scale. The smaller group factors may represent certain specific "facets" of the construct.
The distress scale contained two substantive group factors that have been found in previous studies in clinical samples and translations of the 4DSQ [36,37]: a sleep factor (items #20 and #39) and a factor associated with having experienced past upsetting events (items #47 and #48). The sleep factor may be explained by assuming that not everyone is equally vulnerable to sleep disturbances when distressed. The upsetting events factor is probably due to the fact that not every distressed person has experienced past stressful or traumatic events. Nevertheless, the sleep items and upsetting events items still demonstrated rather high loadings on the general distress factor, providing valuable information about the general distress level. In addition, the items provide valuable information about one possible cause of distress (past upsetting events) and one possible consequence of distress (sleep disturbance). The distress group factor consisting of item #32 ("can't cope anymore") and item #36 ("can't face it anymore") was more likely due to over-similarity of the items. An indication for over-similarity may be found in the low group factor loadings relative to the general factor loadings. The anxiety scale probably also contained a group factor due to over-similarity of the items #21 ("vague feeling of fear") and #27 ("feeling frightened").
The somatization scale demonstrated three group factors that have also been encountered in previous studies in clinical and population samples and translations of the 4DSQ [36-38]: a musculoskeletal factor (items #02, #04 and #05), a gastrointestinal factor (items #09, #12 and #13) and a cardiovascular (or thoracic symptoms) factor (items #15 and #16). These specific factors have also been found in other studies using other scales of physical symptoms [39]. In the 4DSQ somatization scale all items contributed substantively to the general factor, but in addition some items provided extra information about certain "facets" of the clinical picture. While experiencing various levels of "general" somatization, some people tended to report relatively more musculoskeletal symptoms while others tended to report relatively more cardiovascular or gastrointestinal symptoms. This resulted in some variation within the somatization syndrome. The somatization "facets" may even be affected differentially by internal or external stressors. For instance, in residents living near a newly constructed high-voltage power line, the rise in somatization was uniquely due to a rise in musculoskeletal and gastrointestinal symptoms [38].

Table footnotes: Effect size: R², item score variance (%) explained by the group factor (hybrid ordinal logistic regression method); SMD, standardized mean difference (Mantel-Haenszel method; multiple SMDs are noted as a range, e.g. 0.10/0.14 means from 0.10 to 0.14). c Direction of DIF: one group tends to score higher (>) or lower (<) than the other group due to DIF.

Fig. 2 Illustration of differential item functioning (DIF) by age. Expected mean item scores as a function of the latent trait score derived from item response theory (IRT) modelling, accounting for DIF. The left-hand panel displays the mean item score of item 8 as a function of the trait score for somatization, by age category. The right-hand panel displays the mean item score of item 26 as a function of the trait score for distress, by age category. The graphs were obtained from the program "lordif".

Reliability
We provided Cronbach's alpha values to allow comparison with earlier studies and other scales. Cronbach's alpha is often used as a measure of "internal consistency reliability" but it is usually not the best reliability estimate [25,40]. Cronbach's alpha often underestimates a scale's true reliability [40]. A better alternative is coefficient omega, based on a "bifactor" representation of the scale's factor structure [22]. The 4DSQ scales proved to be highly reliable (omega-total >0.90), which enables application in clinical settings (where individual scores must be interpreted). The total scale scores predominantly represent general factor variance (i.e., distress: 0.952/0.976 = 97.5 %, anxiety: 0.959/0.963 = 99.6 %, somatization: 0.896/0.944 = 94.9 %), confirming that the 4DSQ scales were "essentially unidimensional", the total scores mainly reflecting a single common factor [41]. The depression scale only had one (general) factor. Consequently, the 4DSQ scales can safely be used as unidimensional instruments to measure their respective constructs.
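The percentages quoted above are the ratio of omega-hierarchical to omega-total for each scale. A short Python check, using only the coefficients given in the text (the variable and function names are illustrative):

```python
# (omega-hierarchical, omega-total) per scale, as quoted in the text.
OMEGAS = {
    "distress": (0.952, 0.976),
    "anxiety": (0.959, 0.963),
    "somatization": (0.896, 0.944),
}


def general_factor_share(omega_h, omega_t):
    """Share of the reliable variance attributable to the general factor."""
    return omega_h / omega_t


shares = {scale: general_factor_share(h, t) for scale, (h, t) in OMEGAS.items()}
for scale, share in shares.items():
    print(f"{scale}: {100 * share:.1f} %")
```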

Measurement invariance
Despite the existence of some degree of differential item functioning (DIF) in 17 items, the net effect of DIF on the mean scale score was negligible in most instances. This means that the 4DSQ scales measure the same constructs in the same way across gender, age and education. The only exception concerned the effect of (young) age on the somatization score. Because young people (16-24 years) had lower thresholds for a number of somatization symptoms (e.g., headache) they tended to score on average about 1 scale point higher than people over 25 years, compared to the true level of somatization. This has consequences for the interpretation of somatization scores in young people: a score of 11 in young people (16-24 years) corresponds with a score of 10 in older people. So, without taking DIF into account, young people's somatization scores would overestimate their true levels of somatization. To be "fair" to young people with respect to the interpretation of their somatization scores, their age-specific cut-off points of the somatization scale should be raised by 1 point. This ensures that the cut-off points retain the same meaning across all age groups.
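The proposed age-specific cut-off can be illustrated with a small sketch. It assumes the "above normal" somatization threshold of >10 quoted in the Results and applies the one-point shift to the 16-24 age group only; the function names are hypothetical and this is not an official scoring rule.

```python
def somatization_cutoff(age, base_cutoff=10):
    """Return the 'above normal' cut-off, raised by 1 point for ages 16-24."""
    return base_cutoff + 1 if 16 <= age <= 24 else base_cutoff


def somatization_above_normal(score, age):
    """True when the score exceeds the age-specific cut-off."""
    return score > somatization_cutoff(age)


# A score of 11 is above "normal" at age 40, but not at age 20.
print(somatization_above_normal(11, 40), somatization_above_normal(11, 20))
```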

Associations with demographic characteristics
By and large, the associations between the 4DSQ dimensions and demographic variables were in line with what is known about risk factors for poor mental health: higher scores were associated with female gender, younger age, lower education, lower income, being divorced, being unemployed or disabled, and being an immigrant (e.g., [42-45]). However, the net effect of the demographic variables on the 4DSQ scores, in terms of explained variance (given the eta-squared values), was small: in most cases no more than a few per cent. Remarkably, the way the 4DSQ scores varied across the demographic categories was very similar across the 4DSQ dimensions. For instance, women scored higher than men, non-Western migrants scored higher than native Dutch people, and unemployed people scored higher than employed people on all four 4DSQ scales.

Normative data
Normative data are helpful to interpret the clinical significance of individual 4DSQ scores. The "average" person, representing at least 75 % of the general population, scored in the lower third of the scale range for distress and somatization, and not at all on the depression and anxiety scales. About one in six people (17.5 %) experienced "more than average" distress, including normal, but more severe responses to psychosocial stress, loss and adversity, as well as pathological responses such as depressive or anxiety disorder (see endnote 2). Regarding somatization, one in eight people (12.3 %) experienced more than average somatization. This group largely overlapped with the more than average distressed group, the percentage of people experiencing either more than average distress or more than average somatization or both being 22.1 %. Thus, the experience of some distress and/or some somatization is rather common among the general population. In contrast, however, the experience of specific symptoms of depressive or anxiety disorder is relatively uncommon in the general population. The 4DSQ depression score is best at detecting moderate-to-severe DSM-IV major depressive disorder, the kind of depression that is more likely to require specific treatment [7]. Only 4.1 % of the people had depression scores high enough (i.e., >5) to suspect depressive disorder. With respect to anxiety, the 4DSQ anxiety score detects the majority of anxiety disorders, especially panic disorder, agoraphobia, social phobia, obsessive compulsive disorder and posttraumatic stress disorder [8]. Only 2.5 % of the people scored high enough on anxiety (i.e., >9) to suspect one or more anxiety disorders. These figures are largely in agreement with previous general population studies [43,46], taking into account that some studies report 12-month prevalence instead of point-prevalence and that the 4DSQ is less effective in detecting specific phobias (such as spider phobia and claustrophobia).
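The size of the overlap between the two "more than average" groups follows from the three percentages reported above by inclusion-exclusion. The short sketch below works through that arithmetic; the function name is an illustrative assumption, and the derived figures are our own calculation from the reported percentages rather than results stated in the study.

```python
def overlap_percentage(pct_a: float, pct_b: float, pct_either: float) -> float:
    """Inclusion-exclusion: P(A and B) = P(A) + P(B) - P(A or B)."""
    return pct_a + pct_b - pct_either


pct_distress = 17.5       # more than average distress
pct_somatization = 12.3   # more than average somatization
pct_either = 22.1         # either or both

pct_both = overlap_percentage(pct_distress, pct_somatization, pct_either)
share_of_somatizers = 100 * pct_both / pct_somatization

print(round(pct_both, 1))            # about 7.7 % of the population in both groups
print(round(share_of_somatizers, 1)) # share of the somatization group also distressed
```

On these figures, well over half of the people with more than average somatization also reported more than average distress, which is consistent with the statement that the groups largely overlap.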

Practical implications
The (essentially) unidimensional structure of the 4DSQ scales supports the continued use of simple sum scores. Given the fairly homogeneous factor loadings within the scales, we do not expect any added value from weighted sum scores. Moreover, researchers and practitioners can take advantage of the availability of normative data expressed in conventional sum scores. High reliability and measurement precision make the 4DSQ suitable for application in clinical situations.

Limitations and strengths
This study has a number of strengths including its large sample size (n > 5000), the representativeness of the sample, and the high response rate (>80 %). Moreover, because detailed demographic information was available, we were able to correct for non-response bias through inverse response probability weighting. A limitation, however, is that one can never be certain that all factors associated with non-response have been accounted for.
A second limitation of the study, given that depression and other moods demonstrate (some) seasonal variation [47], is that most of the data were collected in October. However, evidence suggests that psychological symptom levels during autumn approximate the average levels across the year. A third limitation is that equivalence of the Internet-based 4DSQ compared to the paper-and-pencil version has not yet been established. However, differences between Web-based and corresponding paper-and-pencil versions of questionnaires are usually small [48][49][50]. Nevertheless, this is a direction for future research.
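To make the inverse response probability weighting mentioned among the strengths more concrete, the sketch below shows the general idea: each responder is weighted by the inverse of his or her estimated probability of responding, so that under-represented groups count more heavily. This is a generic illustration under assumed inputs, not the procedure actually used in this study; the function name, the scores and the response probabilities are all invented.

```python
def ipw_weighted_mean(scores, response_probs):
    """Weighted mean of scores using inverse response probability weights.

    scores: observed scale scores of responders.
    response_probs: estimated probability of responding for each responder
        (how these are estimated, e.g. from demographic data, is outside
        the scope of this sketch).
    """
    if len(scores) != len(response_probs):
        raise ValueError("scores and response_probs must have equal length")
    if any(p <= 0 or p > 1 for p in response_probs):
        raise ValueError("response probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in response_probs]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Invented example: the second responder belongs to a group that responds
# less often, so that score receives twice the weight of the first.
print(ipw_weighted_mean([10, 20], [0.5, 0.25]))
```

The correction is only as good as the model for the response probabilities, which is why unmeasured factors associated with non-response remain a limitation.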

Conclusions
In the general Dutch population, the 4DSQ comprises four reliable, (essentially) unidimensional scales measuring distress, depression, anxiety and somatization. With the exception of measuring somatization in people aged 16-24 years, the 4DSQ scales measure their respective constructs in the same way across gender, age and educational groups. Young people tend to score higher on the somatization scale than older people, and for that reason we recommend raising the somatization cut-offs by 1 point for the age group 16-24 years. We have provided normative data by gender and age to assist the interpretation of individual 4DSQ scores.
Endnotes
1 In July 2013 the 4DSQ was presented to all available panel members of 16 years and older in two forms, the standard present tense form and an alternative past tense form. The purpose was to examine whether these forms produced different responses. It was the original plan to present the standard form to the larger part of the panel and the alternative form to a relatively small subsample. Unfortunately, however, the forms were swapped so that the smaller subsample was presented with the standard form and the greater subsample with the alternative form. As it was suspected that the form could have an effect on the way people respond to the 4DSQ (which was later partly confirmed [51]), it was decided to present the standard form of the 4DSQ again in October to those panel members who had not completed a standard 4DSQ in July. The present study includes the responders who completed the standard present tense form of the 4DSQ, either in July or in October.
2 The relationship between distress and depression/anxiety is characterized by a non-reciprocal hierarchy