Psychometric evaluation and predictive validity of Ryff's psychological well-being items in a UK birth cohort sample of women

Background Investigations of the structure of psychological well-being items are useful for advancing knowledge of what dimensions define psychological well-being in practice. Ryff has proposed a multidimensional model of psychological well-being and her questionnaire items are widely used, but their latent structure and factorial validity remain contentious. Methods We applied latent variable models for factor analysis of ordinal/categorical data to a 42-item version of Ryff's psychological well-being scales administered to women aged 52 in a UK birth cohort study (n = 1,179). Construct (predictive) validity was examined against a measure of mental health recorded one year later. Results Inter-factor correlations among four of the first-order psychological well-being constructs were sufficiently high (> 0.80) to warrant a parsimonious representation as a second-order general well-being dimension. Method factors for questions reflecting positive and negative item content, orthogonal to the construct factors and assumed independent of each other, improved model fit by removing nuisance variance. Predictive validity correlations between psychological well-being and a multidimensional measure of psychological distress were dominated by the contribution of environmental mastery, in keeping with earlier findings from cross-sectional studies that have correlated well-being and severity of depression. Conclusion Our preferred model included a single second-order factor, loaded by four of the six first-order factors, two method factors, and two more distinct first-order factors. Psychological well-being is negatively associated with dimensions of mental health. Further investigation of precision of measurement across the health continuum is required.


Background
Recent years have seen a widening interest in research on aspects of well-being [1][2][3][4]. Extensive research on subjective well-being (SWB) which focuses mainly on how people feel, e.g. positive affect, negative affect and life satisfaction (see review by Diener et al.) [5], has begun to be complemented by a heightened interest in how well people perceive aspects of their functioning, e.g. the extent to which they feel they are in control of their lives, feel that what they do is meaningful and worthwhile, and have good relationships with others e.g. [6,7]. This perspective is often referred to as psychological well-being (PWB) and is based on a eudaimonic perspective, rather than the hedonic perspective of subjective well-being research.
This new focus has necessitated the theoretical development of new constructs as well as questionnaire items to measure psychological well-being in clinical and population samples. The work of Ryff and colleagues has been at the forefront of this endeavour.
Ryff's scales of Psychological Well-being [8,9] were designed to measure six theoretically motivated constructs of psychological well-being: autonomy (independence and self-determination); environmental mastery (the ability to manage one's life); personal growth (being open to new experiences); positive relations with others (having satisfying, high-quality relationships); purpose in life (believing that one's life is meaningful); and self-acceptance (a positive attitude towards oneself and one's past life).
Despite the widespread interest in Ryff's theoretical framework, and application of the Ryff PWB items, the psychometric properties of the proposed sub-scales remain contentious. In particular there has been concern over issues of factorial validity and distinctiveness. Do the items intended to measure each theoretical domain, really do so? Do the items capture information from more than one domain? Are fewer dimensions actually revealed by empirical data collected to test the multidimensional theory?
Previous psychometric studies of the Ryff PWB are summarised in Table 1. To date no independent investigation of the factorial validity of Ryff's well-being items has unequivocally supported the a priori six-factor structure. Authors of existing studies either challenge the value of so many theoretical constructs, whose scores correlate >0.8 or 0.9, or have not confirmed the fit of the proposed model [10][11][12][13][14]. Many of these studies have reached similar conclusions despite analysing different short and long forms of the Ryff scales. As shown in Table 1, versions with different numbers of items have been applied in a variety of settings and samples. The original instrument included 120 items (20 per dimension) but shorter versions comprising 84 items (14 per dimension), 54 items (9 per dimension), 42 items (7 per dimension) and 18 items (3 per dimension) are now widely used. It is important to note that the overlap among items in the alternative versions of the Ryff scales is limited; for example, the 18-item version has only six items in common with the 42-item version, one item for each dimension.
Ryff's own studies [7,9] have reported high correlations among scores for the constructs that were proposed as independent. It is possible that the measures may not, in practice, adequately operationalise the constructs proposed by her theory. For example, in Ryff's first study, which employed 120 items, the inter-correlations among factor scores for the six dimensions ranged from 0.32 to 0.76. Associations were particularly strong between personal growth and purpose in life; self-acceptance and purpose in life; and environmental mastery and self-acceptance [9]. Indeed, the magnitude of these inter-factor associations prompted Ryff and Keyes [7] to estimate a second-order factor model which invoked a general PWB factor to explain associations among the first-order constructs, so clearly they acknowledged the high interdependencies among the six factors.
In a psychometric investigation of multiple samples, Springer & Hauser [14] factor analysed Ryff PWB items from three large North American studies: the Wisconsin Longitudinal Survey (42 items and 12 items); MIDUS, Midlife in the United States (18 items); and the National Survey of Families and Households (NSFH II) (18 items). Their results, based on internal construct validity arguments alone, seem to provide yet further evidence that the Ryff PWB items may either measure fewer than six distinct constructs, or that the theoretical constructs exist at two levels of definition.
Psychometric studies of multi-item questionnaires often see a need to isolate components of response tendency that are due to methodological features, e.g. design or wording of items [15,16]. Springer & Hauser [14] introduced a single latent variable (a method factor) to isolate the covariance among responses common to all negatively worded Ryff items. In their study, this component of their model was found to considerably improve model fit. In their response [17] to a commentary on their conclusions by Ryff and Singer [18] they reported a test of a 4-factor model based on the four most highly correlated dimensions (environmental mastery, personal growth, purpose in life and self-acceptance) and compared this to a 4-factor model using the same items but where item allocation was based on positive and negative wording and position (i.e. earlier or later) in the instrument. They demonstrated similar indices of model fit between the two models.
The penultimate column of Table 1 reports the factor analysis method used by existing psychometric studies.
Most existing work has examined the dimensionality of Ryff PWB items using the traditional linear factor model, which assumes that responses are continuous scores on an interval scale metric [7,11,12]. Hauser and Springer's analysis [14,17] was performed using a factor model that provides an ordinal/graded treatment of the Likert style response scales. Model estimation was based on polychoric correlations among items and weighted least squares (WLS) methodologies. They argued that application of the standard linear model was inappropriate. Application of linear statistical models to ordinal data can result in biased estimates of factor loadings [19][20][21][22]. Categorical data factor analysis models are considered to be more theoretically appropriate in their statistical underpinnings for Likert scaled (ordinal) data [23][24][25][26].
In addition to these considerations, which have focused entirely on issues of internal construct and factorial validity of the Ryff PWB items, it is important to consider evidence for the construct validity of the PWB in relation to other dimensions of mental health and well-being.
Ryff [7] reported correlations from three cross-sectional studies that included measures of happiness, life satisfaction and depression in addition to PWB items. Positive associations were found between measures of happiness and life satisfaction and all PWB dimensions, but with the strongest correlations for self-acceptance and environmental mastery. Conversely, the severity of depressive symptoms was negatively associated with all PWB dimensions, but with the strongest negative correlations again evident for environmental mastery and self-acceptance. In a small sample of Swedish white collar workers (N = 91) Lindfors [27] reported a correlation of -0.61 between the score on a short screening measure for minor psychiatric morbidity (the 12-item General Health Questionnaire) [28] and a total (sum) score from the 18-item Ryff. These results suggest 1) some overlap between reported psychological well-being and the absence of depressive symptoms, and 2) positive associations with other measures of subjective well-being. More external construct validity evidence is desirable since the convergent and divergent validity of PWB measures is still not well understood. Longitudinal studies of PWB and related constructs are of value since it is of intrinsic interest to examine the consequences of PWB for other outcomes, and to contribute new data on predictive validity, which is currently absent. The existing studies are limited by being based almost solely on concurrent self-report data.
Motivated by the controversy over the dimensionality of Ryff PWB items and methodological developments described in existing studies (Table 1) we aimed to provide the first independent examination of the a priori structure of the Ryff PWB items in a UK population-based sample. In doing so we use methods that are theoretically appropriate for factor analysis of ordinal data and compare the fit of models with the following components:
a) single (unidimensional) versus multi-factor (multidimensional) models;
b) incorporation of method factors;
c) consideration of hierarchical models with second-order factors.
Because few studies have reported any prospective consequence or correlates of population variations in levels of PWB we also examine the predictive validity i.e. the longitudinal association between the PWB constructs and a summary measure of psychological distress comprising the 28-item General Health Questionnaire [29].

Sample
The sample comprised participants from the Medical Research Council's National Survey of Health and Development (NSHD), the 1946 British birth cohort study. The NSHD is a stratified sample of singleton births occurring to married parents in England, Scotland and Wales during the week of 3-9 March 1946 (see [30,31]). The sample comprised 5,362 individuals (2,547 women) and data have been collected regularly since childhood. The representativeness of the study sample has been well documented [30,31]. A comparison of the sample retained at age 43 and 53 with population census data has shown that the NSHD survey members are generally representative of the national population of a similar age [32].
An annual sub-study of women's health in midlife was undertaken by postal questionnaire between the ages of 47-54. This study included 1,778 (70%) of the original cohort of women; the others had died (6%), previously refused to take part (12%) or lived abroad and were not in contact with the study or could not be traced (13%). The Ryff PWB was sent to the 1,421 women who had completed at least one women's health questionnaire in the previous 2 years. The representativeness of the sample of women who completed the Ryff items at age 52 has not been established in the same terms with respect to population census data. However, we compared the sample of women who completed the PWB and participated in the age 53 follow-up (N = 1108), or in the age 43 follow-up where age 53 data were not available (N = 57), with those who were involved in the follow-ups but did not complete the PWB (n = 413). Ryff completers were of higher social class (chi-sq 16.6, df = 1, p < 0.001), more likely to be married (chi-sq 9.9, df = 1, p = 0.002) and more educated (chi-sq 63.0, df = 1, p < 0.001) than non-completers. There was no difference due to employment status. This comparison excluded women (n = 50) who completed the Ryff items but neither the age 53 nor age 43 follow-ups. Comparative socio-demographic data were not available for the excluded group of women.
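As a quick consistency check on the completer versus non-completer comparison, the reported p-values can be recovered from the reported statistics. The sketch below assumes all three statistics are chi-square values on 1 degree of freedom (the source does not label the education statistic explicitly); the dictionary keys are illustrative labels only.

```python
from scipy.stats import chi2

# Test statistics as reported in the text, each on df = 1.
# Assumption: the education value (63.0) is also a chi-square statistic.
reported = {
    "social class": 16.6,
    "marital status": 9.9,
    "education": 63.0,
}

# Upper-tail probability of the chi-square distribution with 1 df.
p_values = {label: chi2.sf(stat, df=1) for label, stat in reported.items()}

for label, p in p_values.items():
    print(f"{label}: chi-sq = {reported[label]}, p = {p:.2g}")
```

The marital status value rounds to 0.002 and the other two fall well below 0.001, in line with the figures quoted above.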

Measures
Psychological well-being
A forty-two item version of the Ryff PWB was included in the women's health questionnaire at age 52 on the recommendation of C. Ryff (personal communication from C. Ryff to DK 1998). The response format for all items comprised six ordered categories labelled from 'disagree strongly' to 'agree strongly'. Twenty PWB items were positively worded and 22 negatively worded. Prior to analysis, negatively worded items were reverse scored so that high values indicated well-being. This made it easier to identify floor and ceiling effects. Full question wording of the 42 items is shown in Table 2.
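The reverse scoring step can be sketched as follows. This is a minimal illustration, assuming responses are coded 1 to 6 in a respondents-by-items array with missing answers stored as NaN; the function name and the column indices passed to it are hypothetical, not taken from the study.

```python
import numpy as np


def reverse_score(responses, negative_items, n_categories=6):
    """Return a copy in which negatively worded items are reverse scored.

    responses: 2-D array-like (respondents x items), coded 1..n_categories.
    negative_items: column indices of negatively worded items (hypothetical).
    """
    scored = np.array(responses, dtype=float)  # copy; NaN marks missing data
    # On a 1..6 scale, 1 <-> 6, 2 <-> 5, 3 <-> 4; NaN stays NaN.
    scored[:, negative_items] = (n_categories + 1) - scored[:, negative_items]
    return scored
```

After this step a high value on every item indicates greater well-being, so floor and ceiling effects can be read off the same end of each response scale.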

The General Health Questionnaire
One year after the Ryff items were completed, women survey members completed the 28 items of the "scaled" General Health Questionnaire [29]. The GHQ-28 is a multidimensional measure of psychological distress. The GHQ-28 comprises four sub-scales, Somatic symptoms, Anxiety/Insomnia symptoms, Social Dysfunction and Severe Depression, each with seven questions. Few of the items address positive aspects of function, although some items are positively worded [33,34]. A psychometric analysis conducted by the authors has shown that responses to GHQ-28 items in this cohort can be modelled in terms of four a priori first-order factors which all load (>0.80) on a higher (second) order latent factor capturing psychological distress.

Psychometric modelling
Method of factor analysis
Confirmatory factor analyses were performed treating the six category PWB items as ordinal response variables. Model estimation was performed using robust Weighted Least Squares [26] (rWLS; estimator = Weighted Least Squares Mean and Variance adjusted (WLSMV)) procedures in Mplus Version 3.13 [35]. Estimation using rWLS returns modified standard errors and a corrected chi-square test statistic of model fit. Unlike normal-theory maximum likelihood (ML) estimation for factor analysis of continuous scores, our use of Muthén's categorical data factor analysis methodology provides asymptotically unbiased, consistent and efficient parameter estimates, as well as a correct chi-square test of fit with dichotomous or ordinal observed variables [26]. To compare non-nested models, we report the sample size adjusted Bayesian Information Criterion (ssaBIC) from traditional linear factor analysis models that treat the ordinal responses as continuous (metric) variables (interval scores). In all models, individuals with partially missing item level data were included, since estimation of missing data patterns is possible under both estimators (traditional ML and WLSMV).
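To illustrate what treating items as ordinal means in practice, the sketch below estimates a polychoric correlation between two ordinal items by a simple two-step maximum likelihood approach. It is not the WLSMV procedure used in Mplus; it only shows the underlying idea that each item is a coarsened version of a latent normal variable. The function names, the 0-based category coding and the finite bound standing in for infinity are all assumptions of this example.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

BOUND = 8.0  # finite stand-in for +/- infinity on the latent normal scale


def thresholds(x, n_cat):
    """Step 1: thresholds from the cumulative category proportions."""
    counts = np.bincount(x, minlength=n_cat)
    cum = np.cumsum(counts)[:-1] / counts.sum()
    cum = np.clip(cum, 1e-6, 1 - 1e-6)  # guard against empty end categories
    return np.concatenate(([-BOUND], norm.ppf(cum), [BOUND]))


def polychoric(x, y, n_cat):
    """Step 2: maximise the likelihood over rho, items coded 0..n_cat-1."""
    x, y = np.asarray(x), np.asarray(y)
    tx, ty = thresholds(x, n_cat), thresholds(y, n_cat)
    table = np.zeros((n_cat, n_cat))
    np.add.at(table, (x, y), 1)  # observed contingency table
    points = np.array([[a, b] for a in tx for b in ty])

    def neg_loglik(rho):
        bvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
        cdf = bvn.cdf(points).reshape(len(tx), len(ty))
        # Rectangle probabilities by inclusion-exclusion on the CDF grid.
        cell = cdf[1:, 1:] - cdf[:-1, 1:] - cdf[1:, :-1] + cdf[:-1, :-1]
        return -np.sum(table * np.log(np.clip(cell, 1e-12, None)))

    result = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return result.x
```

Applied to simulated data in which two latent normal variables are cut into a few ordered categories, the Pearson correlation of the category codes is typically attenuated, whereas the polychoric estimate should land close to the latent correlation.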

Stages in analysis
Models were estimated based on combinations of the following three model components: number of first-order factors (1 or 6); method factors (none, positive, negative, or both); second-order factors (present versus absent).
We introduced "method" factors in order to isolate nuisance variance due to item wording or content that was unrelated to the constructs being measured [15,16,36]. Inclusion of a method factor removed from the model any common tendency to respond similarly to PWB items with either positive or negative item content. Our method factors isolated between-item covariance orthogonal to the measured constructs. Technically these were assumed to be uncorrelated with the construct factors, and with each other. Each method factor was examined separately and then both were modelled simultaneously.
Table 2 notes: Original coding, no recoding for positive and negative wording (1 = strongly disagree to 6 = strongly agree). Italics = questions with negative item content. Modal response categories are highlighted in bold. Questions have been ordered by dimension, and do not reflect the order in which questions were asked of respondents. *Analysis sample = 1,179 (includes subjects with data on at least 36/42 questions).
The magnitude of some inter-factor correlations reported by previous studies has given rise to the suggestion that the item-factor correspondences for some items are very weak; this can be tested by comparing the fit of the a priori measurement model with one based on arbitrary allocation of items to factors (this is tantamount to saying that all measure well-being, but none measure any particular component or dimension of PWB). We generated four random item-factor models in order to evaluate the improvement of the a priori model over this scenario. We report the average fit statistics across the four solutions since all four random solutions were similar.
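The random item-factor comparison can be sketched as a random partition of the items. The source does not say how the arbitrary allocations were drawn, so this example assumes each random factor receives seven items, matching the sizes of the a priori scales; the function name, the item numbering and the seeds are hypothetical.

```python
import random

FACTORS = ["A", "E", "G", "P", "R", "S"]  # labels for the six constructs


def random_item_factor_model(n_items=42, factors=FACTORS, seed=None):
    """Randomly allocate items 1..n_items to factors in equal-sized blocks."""
    rng = random.Random(seed)
    items = list(range(1, n_items + 1))
    rng.shuffle(items)
    per_factor = n_items // len(factors)  # assumption: equal block sizes
    return {
        factor: sorted(items[i * per_factor:(i + 1) * per_factor])
        for i, factor in enumerate(factors)
    }


# Four random allocations, mirroring the four random models in the text.
allocations = [random_item_factor_model(seed=s) for s in range(4)]
```

Each allocation would then be fitted as a six-factor model in the same way as the a priori model, and the resulting fit statistics averaged.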

Post-hoc modelling refinements
Further structural refinements were identified based on consideration of modification indices, and a slightly revised model was proposed (see results).

Construct validity of the PWB constructs with respect to subsequent mental health
In order to examine the association between scores on the psychological well-being constructs, under our preferred model, and another measure of health (predictive validity), we linked the PWB scores for the women to their responses to the GHQ-28 conducted one year later.

Results
Our analysis sample includes 1,179 respondents who completed at least 85% of PWB items (36 out of 42 questions); 957 had complete data on all items. Descriptive statistics revealed a general positive skew towards the well-being end of the response scales (Table 2). Responses to the most positive category were common (ranging from 12%-60%) and for just over half of the items this formed the modal category. These results indicated a ceiling effect on measurement of the individual items comprising the well-being scale. For questions including positive item content, responses to the lowest levels of well-being were few, often as few as 1-2% of responses to that question.
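The inclusion rule (at least 36 of 42 items answered) is straightforward to express in code. The sketch below assumes a respondents-by-items array with missing answers stored as NaN; the function name is hypothetical.

```python
import numpy as np


def analysis_sample_mask(responses, min_answered=36):
    """Boolean mask of respondents who answered at least min_answered items."""
    responses = np.asarray(responses, dtype=float)
    answered = np.sum(~np.isnan(responses), axis=1)  # non-missing items per row
    return answered >= min_answered
```

A respondent with six missing items is retained (36/42 is about 86%), whereas one with seven missing items is excluded.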
Each model is described in a single line in Table 3. This table includes a model reference number, the modelling components included, and fit statistics/information criteria.

Models A0-A3
Our first set of models (A0-A3, Table 3) tested the a priori model against a model with random item-factor associations (A0) and a unidimensional model with all 42 items loading on a single latent factor (A1). Here model A2 is the a priori model, and A3 is extended to incorporate a second-order factor (loaded by all six first-order factors).
Model fit was poor for all models in terms of all criteria (CFI <= 0.70; TLI < 0.90; RMSEA > 0.11; WRMR > 2.8) (Table 3). The worst model in terms of the ssaBIC (highest value) was A0, with random item-factor associations. The a priori model (A2) returned a lower ssaBIC value than the unidimensional model (A1). The model with a second-order factor (A3) returned a higher ssaBIC value than the a priori model with only first-order factors.
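For readers less familiar with the fit criteria quoted above, the conventional textbook definitions of RMSEA, CFI and TLI can be sketched as below. These are generic formulas computed from a model chi-square and a baseline (independence model) chi-square. They are an assumption of this example rather than a description of the Mplus output, since the WLSMV chi-square and degrees of freedom are adjusted, and some software uses N rather than N - 1 in the RMSEA.

```python
import math


def rmsea(chi2_m, df_m, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to a baseline model."""
    d_model = max(chi2_m - df_m, 0.0)
    d_base = max(chi2_b - df_b, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    return (ratio_b - ratio_m) / (ratio_b - 1.0)
```

Lower RMSEA and higher CFI and TLI indicate better fit, which is the direction of the thresholds quoted in the text.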

Models B1-B5
Our second set of models (B1-B5) repeated A1-A3 with one or both method factors. Compared to models A0-A3, any model incorporating either or both method factors improved model fit and substantially reduced the ssaBIC, regardless of the number of factors. Even in the model assuming a unidimensional construct of PWB (single first-order factor), but with both method factors, the ssaBIC dropped by more than 2,000 points. In the a priori models with both method factors (B4, B5) RMSEA approached 0.08 and TLI approached 0.94, but CFI remained below 0.80. These two models (B4, a priori; and B5, a priori plus second-order) were within 110 ssaBIC points of each other and were indistinguishable on all other indices of fit.
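The ssaBIC comparisons can be made concrete with a short sketch. It assumes the usual sample-size adjustment, in which n in the BIC penalty is replaced by (n + 2)/24; to our understanding this is the definition Mplus reports, but readers should confirm it against the software documentation. The log-likelihoods and parameter counts in any usage would come from the fitted linear factor models and are not reproduced here.

```python
import math


def bic(loglik, n_params, n):
    """Standard Bayesian Information Criterion."""
    return -2.0 * loglik + n_params * math.log(n)


def ssa_bic(loglik, n_params, n):
    """Sample-size adjusted BIC: n is replaced by (n + 2) / 24 in the penalty."""
    return -2.0 * loglik + n_params * math.log((n + 2) / 24.0)
```

Because the adjusted penalty is smaller than the standard BIC penalty, the ssaBIC is less severe on models with additional parameters such as method factors; in either case the model with the lower value is preferred.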

Interpretation of factor loadings from selected models
We report factor loadings from two models (A2 and B4) in columns 2 and 3 of Table 4. Inter-factor correlations for models A2 and B4 are shown as lower and upper diagonal entries in Table 5. In general, four factors were strongly associated (environmental mastery (E), personal growth (G), purpose in life (P) and self-acceptance (S)), but autonomy (A) and positive relations (R) were more distinct, correlating <0.6 with these four constructs, and only 0.4 with each other. It is therefore particularly interesting to inspect the magnitude of the factor loadings for these four versus two constructs in the second-order model. In Table 6 we report the second-order factor loadings from these models; the two lowest loadings were for autonomy (A) and positive relations (R); all other loadings were 0.8 or above.
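The link between second-order loadings and inter-factor correlations can be made explicit. As a sketch, assuming standardised factors and uncorrelated first-order disturbances (the symbols below are illustrative notation, not taken from the tables), the correlation implied between two first-order factors is the product of their second-order loadings:

```latex
\rho_{ij} = \gamma_i \, \gamma_j \quad (i \neq j),
\qquad \text{e.g. } \gamma_i = \gamma_j = 0.8 \;\Rightarrow\; \rho_{ij} = 0.64 .
```

Lower loadings on the general factor therefore go hand in hand with lower implied correlations with the other constructs, which is the pattern described for autonomy and positive relations.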

Post-hoc models
In a final round of modelling (Table 7) we found it useful to drop two items from personal growth (G) that exhibited a complex pattern of cross-loadings. Both of the excluded items, G2 (I don't want to try new ways of doing things - my life is fine the way it is) and G3 (I think it is important to have new experiences that challenge how I think about myself and the world), are complex questions, capturing more than one issue, and include both positive and negative item content. Item E1 (I do not fit very well with the people and the community around me) loaded more highly on positive relations (R) than on its designated factor (environmental mastery (E)), reflecting the initial part of the question concerned with relationships with others. We therefore chose to model this item on positive relations (R).
Examination of residuals also suggested potential overlap between two questions from positive relations (R), 'people would describe me as a giving person, willing to share my time for others' and 'most people see me as loving and affectionate', and so we allowed correlated residuals between these two items. These small modifications to the a priori model, together with method factors, improved fit statistics for TLI and RMSEA (Models PH2 & PH3). The CFI, however, still remained below 0.86.
We also tested a six-factor model (PH4) where four constructs (environmental mastery (E), personal growth (G), purpose in life (P), self-acceptance (S)) loaded onto a second-order factor, and autonomy (A) and positive relations (R) remained as first-order factors (freely correlated). This model is drawn as a path diagram in Figure 1. Goodness of fit statistics for this model (PH4) were similar to the modified model (PH3) with all 6 constructs loading on the second-order factor. The distinctiveness of A and R from the four constructs that are most highly related (E, G, P, S) can be seen in the magnitude of the first-order factor inter-correlations from the modified model (PH2; Table 5) and the second-order factor loadings (PH3; Table 6), which were both less than 0.75 (50% common variance).

Construct validity: predictive validity of the PWB for GHQ
The estimated correlation between our second-order PWB factor (model PH3 based on 40 items) and the GHQ-28 second-order factor was -0.45. The correlations among the a priori first-order PWB factors and the GHQ-28 second-order factor were low (-0.10 to 0.08) except for environmental mastery (E) (-0.52).

Discussion
In this study we provide the first confirmatory test of the factorial validity and structure of Ryff's Psychological Well-being (PWB) scales (42-item version) in the UK. In contrast to previous studies, our sample come from the UK and comprise only women who are surviving members of a national birth cohort study which began in 1946. This sample completed the Ryff items as part of an annual women's health survey in midlife and also completed a mental health measure one year later.
In our psychometric modelling we evaluated the fit of categorical (ordinal response) factor models with single and six construct factors, first and second-order factors, and method factors, as well as providing a reference comparison to a model with random item-factor associations. Like all previous research we were unable to identify a model that fitted the data well, although a number of modelling components appeared to be useful in improving model fit to the data, and therefore determine our conclusions regarding the factorial validity of Ryff's measures, with reference to her theory, and in regard to these 42 items. Our results indicate the following:
1) We found conceptual and empirical value (improved model fit) from the addition of both positive and negative method factors to address methodological artefacts. Springer & Hauser [14] suggested the addition of a negative method factor (correlated with the construct factors) to the Ryff PWB model, but we extended this approach to the addition of both positive and negative method factors which were independent of both each other and the measured constructs. Models incorporating a single (either positive or negative) method factor offered an improvement ...
2) Regarding dimensionality of the PWB measure, and empirical associations among the a priori constructs, we found that in our sample four of the six dimensions of well-being (environmental mastery (E), personal growth (G), purpose in life (P), and self-acceptance (S)), as operationalised by these 42 items, were sufficiently highly correlated to warrant introduction of a general well-being factor, as a second-order general factor, that explained the association among the first-order constructs. We could not justify the inclusion of the remaining two dimensions (autonomy (A) and positive relations (R)) on this second-order construct since they were more independent of these four factors, of the second-order factor, and of each other. This gives some credence to claims that there are fewer than six dimensions underpinning Ryff's PWB items. However, our interpretation is in terms of the hierarchical organisation of the six factors, which seem to span two conceptual levels [7]. Further replications of this structure are warranted.
Table notes (fit statistics): ... [43]. WRMR = weighted root mean residual (good fit <1.0) [44]. ssaBIC = sample size adjusted BIC statistic (lower numbers show improvement among non-nested models).
3) Finally, we found a strong negative association between a measure of mental health (severity of psychological distress based on the GHQ-28) and the PWB, which were measured one year apart. The major contribution to this predictive association came from the environmental mastery items. This replicates a finding reported by Ryff & Keyes [7] using cross-sectional data. A possible explanation of this finding from attribution theory is that people who perceive their environment as uncontrollable, i.e. score low on the environmental mastery construct, and attribute this lack of control to some internal cause that is global and stable, feel helpless to prevent future negative outcomes and consequently experience depression [37,38]. There is also some overlap in item content to do with task-related and role functioning between Ryff's environmental mastery items and some items in the GHQ-28.
Validation against more objective measures could be useful, since most data concern other self-report questions.
We tested the fit of random item-factor models in our data. Our random item-factor models differed by 1,356 BIC points from the ssaBIC for the theoretical model. This indicates to us that there are still some fragile item-factor associations in the six-factor model, otherwise this comparison would yield a much larger reduction in BIC when comparing theoretical to random models.
Previous authors have concluded that the empirical data are not consistent with a six-factor model [11]. We do not reject the six construct factors, but see the value of a more parsimonious model, based on a hierarchical representation of the proposed dimensions. This approach is common in mental health epidemiology and personality research but does not seem to be as frequently adopted in well-being literature.
Our second-order factor model requires the item to factor mapping established for the first-order factors, for its definition, since it is the second-order (more general) factor that is proposed as the explanation for the association among environmental mastery (E), personal growth (G), purpose in life (P), and self-acceptance (S). Examination of item content suggests that this second-order factor may encapsulate a motivational aspect of well-being which incorporates notions of goal orientation and self-direction. Our finding that there are three (rather than six) distinct factors -autonomy, positive relations and motivation/self-direction -is reminiscent of the work of Deci and Ryan [39,40] which postulates that well-being results from the fulfillment of three basic psychological needs -autonomy, relatedness and competence. It could be argued that our second-order factor bears a relationship to Deci and Ryan's concept of competence. However it should be noted that while there is overlap between the autonomy concepts of Ryff and of Deci & Ryan, the latter focus on the core concept of personal control while Ryff's items include an element of not caring what others think.
The three-factor structure of well-being has also been suggested by Kafka & Kozma [11]. Their factor analyses of the Ryff PWB (120 items) (see Table 1 for details), which also included the Satisfaction with Life Scales (SWLS) and Memorial University of Newfoundland Scale of Happiness (MUNSH), extracted three main factors, with the first comprising environmental mastery (E), personal growth (G), purpose in life (P), and self-acceptance (S); the second MUNSH & SWLS; and the third autonomy (A) and positive relations (R). Given these results we advocate a more parsimonious approach to examine antecedents and correlates of general well-being, as defined by the second-order factor, and/or to examine the specific antecedents or correlates of the first-order factors. These are scientific questions at different levels of generality, and should be recognized as such.
Many aspects of our modelling results do suggest that some Ryff items may measure more than one of the six constructs in the theory. This possibility requires further theoretical work that was not undertaken here, but should form an agenda for future research, and for future factorial complexity studies.

What are the implications of our factor analysis results for users of the Ryff PWB scales?
The factor loadings from our preferred model (Figure 1) indicate that many item-factor loadings for Ryff PWB items on the six construct factors are low, with only 11/40 exceeding 0.70. This would suggest that shortening this version of the PWB may not be practically possible, since the item reliabilities for almost three-quarters of the items are probably too low to allow for reliable estimation of construct scores. We note that others have suggested shorter versions, or motivated the need for them to reduce respondent burden in well-being surveys [12]. Existing short versions, e.g. the 18-item PWB, do not include many items from the 42 analysed here.
Figure 1. Psychological well-being modified 40-item model, with second-order factor. EGPS = general well-being factor comprising four first-order factors, environmental mastery, personal growth, purpose in life and self-acceptance. The model also includes residual correlation between R1 & R6 (not shown).
Figure caption: External criterion validation of psychological well-being (modified 40-item model) with second-order GHQ-28.
Related to this observation, Figure 1 shows that 3 out of the 6 first-order factors have only 1 high factor loading (>0.70); a standardized loading of 0.70 corresponds to the underlying construct explaining only about 50% of the variance in item response. This brings into question the definition of the constructs in terms of these single high-loading items. These results suggest to us that future studies should continue to examine the internal construct validity of the PWB items. They also indicate that the items in this version, and perhaps other long versions, are not sufficient to define robust latent constructs: more items with high loadings should increase the stability of the factor solutions recovered across different study samples (we thank an anonymous referee for distinguishing these two suggestions).
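The link between a loading of 0.70 and roughly 50% explained variance follows from squaring a standardized loading. The sketch below is only illustrative: it assumes standardized loadings, and the loading values and function names are hypothetical, not the estimates reported in Figure 1.

```python
from typing import Iterable


def explained_variance(loading: float) -> float:
    """Share of item variance attributable to the factor.

    Assumes a standardized loading on a single factor, so the
    share is simply the squared loading.
    """
    return loading ** 2


def count_high_loadings(loadings: Iterable[float], threshold: float = 0.70) -> int:
    """Count items whose absolute loading exceeds the threshold."""
    return sum(1 for value in loadings if abs(value) > threshold)


# Hypothetical loadings for one seven-item first-order factor
# (illustrative values only, NOT taken from Figure 1).
example_loadings = [0.82, 0.55, 0.48, 0.61, 0.39, 0.66, 0.52]

if __name__ == "__main__":
    print(f"Variance explained at 0.70: {explained_variance(0.70):.2f}")
    print(f"Items above 0.70: {count_high_loadings(example_loadings)}")
```

With these made-up values only one item clears the 0.70 threshold, which mirrors the situation described above for three of the six first-order factors.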
Applied researchers who do not wish to execute complex latent variable models will not be able to distinguish contributions to variance from method versus construct sources, and are at a disadvantage in terms of their ability to define and refine both conceptually relevant and psychometrically important variants of multidimensional scale analysis. However, in these instances, a parsimonious account of associations of other variables with Ryff PWB outcomes may be achieved by summing all items loading on the four factors that define our second-order well-being continuum (EGPS; environmental mastery, personal growth, purpose in life and self-acceptance). In samples similar in composition to ours, researchers might wish to consider using our factor loadings in Figure 1 as weights to form weighted sum scores. Further insights into the latent structure of the Ryff items will require equally complex models and replications of these results with method factors. In other areas of multivariate statistics the role of model-based analyses is also central, e.g. missing data modelling using maximum likelihood.
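The two scoring options mentioned above (a simple sum over the EGPS items, or a sum weighted by factor loadings) can be sketched as follows. This is a minimal illustration under stated assumptions: the item labels, responses and weights are hypothetical placeholders rather than values from Figure 1, and negatively worded items are assumed to have been reverse-scored beforehand.

```python
from typing import Iterable, Mapping


def unit_weighted_score(responses: Mapping[str, float], items: Iterable[str]) -> float:
    """Simple sum of responses over the chosen items (every weight is 1)."""
    return float(sum(responses[item] for item in items))


def loading_weighted_score(
    responses: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Sum of responses, each multiplied by that item's loading used as a weight."""
    missing = set(weights) - set(responses)
    if missing:
        raise ValueError(f"No response for items: {sorted(missing)}")
    return float(sum(weights[item] * responses[item] for item in weights))


# Hypothetical respondent and weights, one item per EGPS factor
# (labels and numbers are placeholders, NOT estimates from Figure 1).
responses = {"E1": 4, "G1": 5, "P1": 3, "S1": 6}
weights = {"E1": 0.5, "G1": 0.5, "P1": 1.0, "S1": 0.25}

if __name__ == "__main__":
    print(unit_weighted_score(responses, weights.keys()))
    print(loading_weighted_score(responses, weights))
```

Either score is only an approximation to the latent EGPS factor, since neither separates method variance from construct variance.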
The nature of our data ensures we are undertaking a pure test of the structure of PWB items, since our sample is homogeneous with respect to age and gender (women aged 52). Our study design therefore minimises the impact of socio-demographic characteristics. Although our sample suggested Ryff completers were more likely to be of higher socio-economic backgrounds than non-completers, comparative studies using the PWB in nationally representative samples, e.g. MIDUS and NSFH [7,14], do not report details regarding representativeness or non-completion. Therefore, it is not possible to assess whether the imbalance noted in our sample is likely to be present in other nationally representative samples of PWB using self-completion methods. Further, it could be argued that our conclusions regarding the latent structure of Ryff PWB items may be unique to this cohort and to the 42-item version of the Ryff PWB, but we believe that our results are similar enough to those of other studies to suggest that our psychometric conclusions and modelling innovations have validity outside of this sample.
Future research could apply further psychometric refinement to the Ryff PWB dimensions, by exploring scoring and effective measurement range using item response theory methodology.

Conclusion
Our psychometric analyses of the Ryff 42-item PWB suggest that the addition of two method factors to reflect positive and negative item content improves model fit. A revised model with a single second-order factor, loaded by four of the six first-order factors (environmental mastery, personal growth, purpose in life and self-acceptance), two method factors, and two more distinct first-order factors (autonomy and positive relations) provided the most parsimonious solution in this birth cohort sample. Psychological well-being was negatively associated with mental health, but further investigation of precision of measurement across the health continuum is required.