
Psychometric evaluation and predictive validity of Ryff's psychological well-being items in a UK birth cohort sample of women



Investigations of the structure of psychological well-being items are useful for advancing knowledge of what dimensions define psychological well-being in practice. Ryff has proposed a multidimensional model of psychological well-being and her questionnaire items are widely used, but their latent structure and factorial validity remain contentious.


We applied latent variable models for factor analysis of ordinal/categorical data to a 42-item version of Ryff's psychological well-being scales administered to women aged 52 in a UK birth cohort study (n = 1,179). Construct (predictive) validity was examined against a measure of mental health recorded one year later.


Inter-factor correlations among four of the first-order psychological well-being constructs were sufficiently high (> 0.80) to warrant a parsimonious representation as a second-order general well-being dimension. Method factors for questions reflecting positive and negative item content, orthogonal to the construct factors and assumed independent of each other, improved model fit by removing nuisance variance. Predictive validity correlations between psychological well-being and a multidimensional measure of psychological distress were dominated by the contribution of environmental mastery, in keeping with earlier findings from cross-sectional studies that have correlated well-being and severity of depression.


Our preferred model included a single second-order factor, loaded by four of the six first-order factors, two method factors, and two more distinct first-order factors. Psychological well-being is negatively associated with dimensions of mental health. Further investigation of precision of measurement across the health continuum is required.


Recent years have seen a widening interest in research on aspects of well-being [1–4]. Extensive research on subjective well-being (SWB), which focuses mainly on how people feel, e.g. positive affect, negative affect and life satisfaction (see review by Diener et al. [5]), has begun to be complemented by a heightened interest in how well people perceive aspects of their functioning, e.g. the extent to which they feel they are in control of their lives, feel that what they do is meaningful and worthwhile, and have good relationships with others (e.g. [6, 7]). This perspective is often referred to as psychological well-being (PWB) and is based on a eudaimonic perspective, rather than the hedonic perspective of subjective well-being research.

This new focus has necessitated the theoretical development of new constructs as well as questionnaire items to measure psychological well-being in clinical and population samples. The work of Ryff and colleagues has been at the forefront of this endeavour.

Ryff's scales of Psychological Well-being [8, 9] were designed to measure six theoretically motivated constructs of psychological well-being: autonomy – independence and self-determination; environmental mastery – the ability to manage one’s life; personal growth – being open to new experiences; positive relations with others – having satisfying high quality relationships; purpose in life – believing that one’s life is meaningful; and self-acceptance – a positive attitude towards oneself and one’s past life.

Despite the widespread interest in Ryff's theoretical framework, and application of the Ryff PWB items, the psychometric properties of the proposed sub-scales remain contentious. In particular there has been concern over issues of factorial validity and distinctiveness. Do the items intended to measure each theoretical domain, really do so? Do the items capture information from more than one domain? Are fewer dimensions actually revealed by empirical data collected to test the multidimensional theory?

Previous psychometric studies of the Ryff PWB are summarised in Table 1. To date no independent investigation of the factorial validity of Ryff's well-being items has unequivocally supported the a priori six-factor structure. Authors of existing studies either challenge the value of so many theoretical constructs, whose scores correlate >0.8 or 0.9, or have not confirmed the fit of the proposed model [10–14].

Table 1 Summary of psychometric studies of Ryff's Scales of Psychological Well-being

Many of these studies have reached similar conclusions despite the analysis of different short and long forms of the Ryff scales. As shown in Table 1, versions with different numbers of items have been applied in a variety of settings and samples. The original instrument included 120 items (20 per dimension) but shorter versions comprising 84 items (14 per dimension), 54 items (9 per dimension), 42 items (7 per dimension) and 18 items (3 per dimension) are now widely used. It is important to note that the overlap among items in the alternative versions of the Ryff scales is limited; for example, the 18-item version has only six items in common with the 42-item version, one item for each dimension.

Ryff's own studies [7, 9] have reported high correlations among scores for the constructs that were proposed as independent. It is possible that the measures may not, in practice, adequately operationalise the constructs proposed by her theory. For example, in Ryff's first study, which employed 120 items, the inter-correlations among factor scores for the six dimensions ranged from 0.32 to 0.76. Associations were particularly strong between personal growth and purpose in life; self-acceptance and purpose in life; and environmental mastery and self-acceptance [9]. Indeed, the magnitude of these inter-factor associations prompted Ryff and Keyes [7] to estimate a second-order factor model which invoked a general PWB factor to explain associations among the first-order constructs, so clearly they acknowledged the high inter-dependencies among the six factors.

In a psychometric investigation of multiple samples, Springer & Hauser [14] factor analysed Ryff PWB items from three large North American studies: the Wisconsin Longitudinal Survey (42-items and 12-items); MIDUS – Midlife in the United States (18-items); and the National Survey of Families and Households (NSFH II) (18-items). Their results, based on internal construct validity arguments alone, seem to provide yet further evidence that the Ryff PWB items may either measure fewer than six distinct constructs, or that the theoretical constructs exist at two levels of definition.

Psychometric studies of multi-item questionnaires often see a need to isolate components of response tendency that are due to methodological features, e.g. design or wording of items [15, 16]. Springer & Hauser [14] introduced a single latent variable (a method factor) to isolate the covariance among responses common to all negatively worded Ryff items. In their study, this component of their model was found to considerably improve model fit. In their response [17] to a commentary on their conclusions by Ryff and Singer [18], they reported a test of a 4-factor model based on the four most highly correlated dimensions (environmental mastery, personal growth, purpose in life and self-acceptance) and compared this to a 4-factor model using the same items but where item allocation was based on positive and negative wording and position (i.e. earlier or later) in the instrument. They demonstrated similar indices of model fit between the two models.

The penultimate column of Table 1 reports the factor analysis method used by existing psychometric studies. Most existing work has examined the dimensionality of Ryff PWB items using the traditional linear factor model, which assumes that responses are continuous scores on an interval scale metric [7, 11, 12]. Hauser and Springer's analysis [14, 17] was performed using a factor model that provides an ordinal/graded treatment of the Likert style response scales. Model estimation was based on polychoric correlations among items and weighted least squares (WLS) methodologies. They argued that application of the standard linear model was inappropriate. Application of linear statistical models to ordinal data can result in biased estimates of factor loadings [19–22]. Categorical data factor analysis models are considered to be more theoretically appropriate in their statistical underpinnings for Likert scaled (ordinal) data [23–26].

In addition to these considerations that have focused entirely on issues of internal construct and factorial validity of the Ryff PWB items, it is important to consider evidence for the construct validity of the PWB in relation to other dimensions of mental health and well-being.

Ryff [7] reported correlations from three cross-sectional studies that included measures of happiness, life satisfaction and depression in addition to PWB items. Positive associations were found between measures of happiness and life satisfaction and all PWB dimensions, but with the strongest correlations for self-acceptance and environmental mastery. Conversely, the severity of depressive symptoms was negatively associated with all PWB dimensions, but with the strongest negative correlations again evident for environmental mastery and self-acceptance. In a small European sample of Swedish white collar workers (N = 91), Lindfors [27] reported a correlation of -0.61 between the score on a short screening measure for minor psychiatric morbidity (the 12-item General Health Questionnaire) [28] and a total (sum) score from the 18-item Ryff. These results suggest 1) some overlap between reported psychological well-being and the absence of depressive symptoms, and 2) positive associations with other measures of subjective well-being. More external construct validity evidence is desirable since the convergent and divergent validity of PWB measures is still not well understood. Longitudinal studies of PWB and related constructs are of value since it is of intrinsic interest to examine the consequences of PWB for other outcomes, and to contribute new data on predictive validity, which is currently absent. The existing studies are limited by being based almost solely on concurrent self-report data.

Motivated by the controversy over the dimensionality of Ryff PWB items and methodological developments described in existing studies (Table 1) we aimed to provide the first independent examination of the a priori structure of the Ryff PWB items in a UK population-based sample. In doing so we use methods that are theoretically appropriate for factor analysis of ordinal data and compare the fit of models with the following components:

  a) single (unidimensional) versus multi-factor (multidimensional) models,

  b) incorporation of method factors,

  c) consideration of hierarchical models with second-order factors.

Because few studies have reported any prospective consequences or correlates of population variations in levels of PWB, we also examine predictive validity, i.e. the longitudinal association between the PWB constructs and a summary measure of psychological distress comprising the 28-item General Health Questionnaire [29].



The sample comprised participants from the Medical Research Council's National Survey of Health and Development (NSHD), the 1946 British birth cohort study. The NSHD is a stratified sample of singleton births occurring to married parents in England, Scotland and Wales during the week of 3–9 March 1946 (see [30, 31]). The sample comprised 5,362 individuals (2,547 women) and data have been collected regularly since childhood. The representativeness of the study sample has been well documented [30, 31]. A comparison of the sample retained at age 43 and 53 with population census data has shown that the NSHD survey members are generally representative of the national population of a similar age [32].

An annual sub-study of women's health in midlife was undertaken by postal questionnaire between the ages of 47–54. This study included 1,778 (70%) of the original cohort of women; the others had died (6%), previously refused to take part (12%) or lived abroad and were not in contact with the study or could not be traced (13%). The Ryff PWB was sent to the 1,421 women who had completed at least one women's health questionnaire in the previous 2 years. The representativeness of the sample of women who completed the Ryff items at age 52 has not been established in the same terms with respect to population census data. However, we compared the women who completed the PWB and participated in the age 53 follow-up (n = 1,108), or in the age 43 follow-up where age 53 data were not available (n = 57), with those who were involved in the follow-ups but did not complete the PWB (n = 413). Ryff completers were of higher social class (chi-sq 16.6, df = 1, p < 0.001), more likely to be married (chi-sq 9.9, df = 1, p = 0.002) and more educated (chi-sq 63.0, df = 1, p < 0.001) than non-completers. There was no difference by employment status. This comparison excluded women (n = 50) who completed the Ryff items but neither the age 53 nor age 43 follow-ups. Comparative socio-demographic data were not available for the excluded group of women.
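The p-values quoted for the completer versus non-completer comparisons can be checked from the reported statistics alone. The sketch below is an illustration rather than the authors' procedure: it relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so the upper-tail probability is erfc(sqrt(x / 2)). The function name and the dictionary labels are ours.

```python
import math


def chi2_sf_df1(statistic: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With df = 1 the chi-square variable is a squared standard normal,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Chi-square statistics as reported in the text (each with df = 1).
# The dictionary keys are labels chosen for this illustration.
reported = {"social class": 16.6, "marital status": 9.9, "education": 63.0}

p_values = {name: chi2_sf_df1(stat) for name, stat in reported.items()}

for name, p in p_values.items():
    print(f"{name}: chi-sq = {reported[name]}, p = {p:.2g}")
```

This shortcut only applies when df = 1; for other degrees of freedom a general chi-square survival function would be needed.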


Psychological well-being

A forty-two item version of the Ryff PWB was included in the women's health questionnaire at age 52 on the recommendation of C. Ryff (personal communication from C. Ryff to DK, 1998). The response format for all items comprised six ordered categories labelled from 'disagree strongly' to 'agree strongly'. Twenty PWB items were positively worded and 22 negatively worded. Prior to analysis, negatively worded items were reverse scored so that high values indicated well-being. This made it easier to identify floor and ceiling effects. Full question wording of the 42 items is shown in Table 2.
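Reverse scoring of the negatively worded items can be sketched as follows. This is a minimal illustration that assumes the six categories are coded 1 ('disagree strongly') to 6 ('agree strongly'); the numeric coding, the item labels and the function names are assumptions, not details given in the text.

```python
N_CATEGORIES = 6  # assumed coding: 1 = 'disagree strongly' ... 6 = 'agree strongly'


def reverse_score(response: int, n_categories: int = N_CATEGORIES) -> int:
    """Mirror a response on a 1..n_categories scale (1 <-> 6, 2 <-> 5, ...)."""
    if not 1 <= response <= n_categories:
        raise ValueError(f"response {response} outside 1..{n_categories}")
    return n_categories + 1 - response


def score_items(responses: dict, negatively_worded: set) -> dict:
    """Return responses with negatively worded items reversed.

    Missing answers (None) are passed through unchanged.
    """
    scored = {}
    for item, value in responses.items():
        if value is None:
            scored[item] = None
        elif item in negatively_worded:
            scored[item] = reverse_score(value)
        else:
            scored[item] = value
    return scored


# Hypothetical item labels, for illustration only.
example = {"item_pos": 5, "item_neg": 2, "item_skipped": None}
scored_example = score_items(example, negatively_worded={"item_neg"})
print(scored_example)
```

After this step a high value means higher well-being on every item, which is what makes floor and ceiling effects directly comparable across items.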

Table 2 Response frequencies, Ryff 42-item Psychological Well-Being Scale (N = 1214*).

The General Health Questionnaire

One year after the Ryff items were completed, women survey members completed the 28 items of the "scaled" General Health Questionnaire [29]. The GHQ-28 is a multidimensional measure of psychological distress. The GHQ-28 comprises four sub-scales, Somatic symptoms, Anxiety/Insomnia symptoms, Social Dysfunction and Severe Depression, each with seven questions. Few of the items address positive aspects of function, although some items are positively worded [33, 34]. A psychometric analysis conducted by the authors has shown that responses to GHQ-28 items in this cohort can be modelled in terms of four a priori first-order factors which all load (>0.80) on a higher (second) order latent factor capturing psychological distress.

Psychometric modelling

Method of factor analysis

Confirmatory factor analyses were performed treating the six category PWB items as ordinal response variables. Model estimation was performed using robust Weighted Least Squares [26] (rWLS; estimator = Weighted Least Squares Mean and Variance adjusted (WLSMV)) procedures in Mplus Version 3.13 [35]. Estimation using rWLS returns modified standard errors and a corrected chi-square test statistic of model fit. Unlike normal-theory maximum likelihood (ML) estimation for factor analysis of continuous scores, our use of Muthén's categorical data factor analysis methodology provides asymptotically unbiased, consistent and efficient parameter estimates, as well as a correct chi-square test of fit with dichotomous or ordinal observed variables [26]. To compare non-nested models, we report the sample size adjusted Bayesian Information Criterion (ssaBIC) from traditional linear factor analysis models that treat the ordinal responses as continuous (metric) variables (interval scores). In all models, individuals with partially missing item level data were included, since estimation with missing data patterns is possible under both estimators (traditional ML and WLSMV).
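For readers unfamiliar with the ssaBIC, the sketch below shows how it differs from the ordinary BIC. It assumes the commonly used sample size adjustment, which replaces n in the penalty term with (n + 2) / 24; we take this to be what the software reports, but the text does not state the formula. The log-likelihoods and parameter counts are made-up values for illustration, not results from this study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Ordinary BIC: -2 logL + k * ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def ssabic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Sample size adjusted BIC: -2 logL + k * ln((n + 2) / 24).

    The adjustment (assumed here) softens the penalty per parameter.
    """
    n_star = (n_obs + 2) / 24.0
    return -2.0 * log_likelihood + n_params * math.log(n_star)


N_OBS = 1179  # analysis sample size reported in the paper

# Hypothetical models: the larger one adds parameters (e.g. method factor loadings).
model_small = {"log_likelihood": -70000.0, "n_params": 126}
model_large = {"log_likelihood": -68900.0, "n_params": 168}

ssabic_small = ssabic(n_obs=N_OBS, **model_small)
ssabic_large = ssabic(n_obs=N_OBS, **model_large)

# Lower values indicate the preferred model.
print(f"small: {ssabic_small:.1f}  large: {ssabic_large:.1f}")
```

In this made-up case the larger model has the lower ssaBIC, meaning its gain in log-likelihood outweighs the penalty for the extra parameters.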

Stages in analysis

Models were estimated based on combinations of the following three model components: number of first-order factors (1 or 6); method factors (none, positive, negative, or both); second-order factors (present versus absent).

We introduced "method" factors in order to isolate nuisance variance due to item wording or content that was unrelated to the constructs being measured [15, 16, 36]. Inclusion of a method factor removed from the model any common tendency to respond similarly to PWB items with either positive or negative item content. Our method factors isolated between-item covariance orthogonal to the measured constructs. Technically these were assumed to be uncorrelated with the construct factors, and with each other. Each method factor was examined separately and then both were modelled simultaneously.

The magnitude of some inter-factor correlations reported by previous studies has given rise to the suggestion that the item-factor correspondences for some items are very weak; this can be tested by comparing the fit of the a priori measurement model, with one based on arbitrary allocation of items to factors (this is tantamount to saying that all measure well-being, but none measure any particular component or dimension of PWB). We generated four random item-factor models in order to evaluate the improvement of the a priori model over this scenario. We report the average fit statistics across the four solutions since all four random solutions were similar.
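An arbitrary allocation of items to factors can be generated as in the sketch below. It assumes a balanced design (seven items on each of six factors, as in the a priori model), item numbers 1 to 42, and fixed seeds for reproducibility; none of these details is stated in the text, so treat them as illustrative assumptions.

```python
import random

FACTORS = ["A", "E", "G", "R", "P", "S"]  # the six a priori constructs
ITEMS_PER_FACTOR = 7
N_ITEMS = len(FACTORS) * ITEMS_PER_FACTOR  # 42


def random_item_factor_map(seed: int) -> dict:
    """Shuffle item numbers 1..42 and deal them out seven per factor."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    items = list(range(1, N_ITEMS + 1))
    rng.shuffle(items)
    return {
        factor: sorted(items[i * ITEMS_PER_FACTOR:(i + 1) * ITEMS_PER_FACTOR])
        for i, factor in enumerate(FACTORS)
    }


# Four random item-factor models, matching the number used in the paper.
random_models = [random_item_factor_map(seed) for seed in range(4)]

for number, model in enumerate(random_models, start=1):
    print(f"random model {number}: {model}")
```

Each allocation would then be fitted as a six-factor model and its fit compared with that of the a priori allocation.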

Post-hoc modelling refinements

Further structural refinements were identified based on consideration of modification indices and a slightly revised model proposed (see results).

Construct validity of the PWB constructs with respect to subsequent mental health

In order to examine the association between scores on the psychological well-being constructs, under our preferred model, and another measure of health (predictive validity), we linked the PWB scores for the women to their responses to the GHQ-28 conducted one year later.


Our analysis sample includes 1,179 respondents who completed at least 85% of PWB items (36 out of 42 questions); 957 had complete data on all items. Descriptive statistics revealed that responses were generally skewed towards the well-being end of the response scales (Table 2). Responses to the most positive category were common (ranging from 12% to 60%) and for just over half of the items this formed the modal category. These results indicated a ceiling effect on measurement of the individual items comprising the well-being scale. For questions including positive item content, responses indicating the lowest levels of well-being were few, often as little as 1–2% of responses to that question.
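The inclusion rule and the ceiling check described above can be expressed compactly. The sketch below assumes that each respondent is a list of 42 responses coded 1 to 6 after reverse scoring, with None marking a missing answer; this data layout and the function names are illustrative assumptions.

```python
N_ITEMS = 42
MIN_ANSWERED = 36   # at least 36 of 42 items, i.e. roughly 85%
TOP_CATEGORY = 6    # assumed code for the most positive response


def is_eligible(responses: list) -> bool:
    """True if the respondent answered at least MIN_ANSWERED of the 42 items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    answered = sum(r is not None for r in responses)
    return answered >= MIN_ANSWERED


def top_category_share(item_responses: list) -> float:
    """Share of non-missing responses to one item in the most positive category."""
    answered = [r for r in item_responses if r is not None]
    if not answered:
        raise ValueError("no non-missing responses for this item")
    return sum(r == TOP_CATEGORY for r in answered) / len(answered)


# Two made-up respondents: one just inside the threshold, one just outside.
inside = [3] * 36 + [None] * 6
outside = [3] * 35 + [None] * 7
print(is_eligible(inside), is_eligible(outside))
```

A high top-category share across many items is the pattern the text describes as a ceiling effect.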

Each model is described in a single line in Table 3. This table includes a model reference number, the modelling components included, and fit statistics/information criteria.

Table 3 Model chi-square statistics (df) and goodness of fit criteria for Ryff 42-item Psychological Well-being Scale, N = 1,179

Models A0-A3

Our first set of models (A0-A3, Table 3) tested the a priori model against a model with random item-factor associations (A0) and a unidimensional model with all 42 items loading on a single latent factor (A1). Here model A2 is the a priori model, and A3 is extended to incorporate a second-order factor (loaded by all six first-order factors).

Model fit was poor for all models in terms of all criteria (CFI <= 0.70; TLI < 0.90; RMSEA > 0.11; WRMR > 2.8) (Table 3). The worst model, in terms of the ssaBIC (highest value) was A0 with random factors. The a priori model (A2) returned a lower ssaBIC value than unidimensional model (A1). The model with a second-order factor (A3) returned a higher ssaBIC value than the a priori model with only first-order factors.

Models B1-B5

Our second set of models (B1-B5) repeated A1-A3 with one or both method factors. Compared to models A0-A3, any model incorporating either or both method factors improved model fit and substantially reduced the ssaBIC, regardless of the number of factors. Even in the model assuming a unidimensional construct of PWB (single first-order factor), but with both method factors, the ssaBIC dropped by more than 2,000 points. In the a priori models with both method factors (B4, B5) RMSEA approached 0.08 and TLI approached 0.94, but CFI remained below 0.80. These two models (B4 a priori and B5 a priori plus second-order) were within 110 ssaBIC points of each other and were indistinguishable on all other indices of fit.

Interpretation of factor loadings from selected models

We report factor loadings from two models (A2 and B4) in columns 2 and 3 of Table 4. Inter-factor correlations for models A2 and B4 are shown as lower and upper diagonal entries in Table 5. In general, four factors were strongly associated (environmental mastery (E), personal growth (G), purpose in life (P), self-acceptance (S)), but autonomy (A) and positive relations (R) were more distinct, correlating <0.6 with these four constructs, and only 0.4 with each other. It is therefore particularly interesting to inspect the magnitude of the factor loadings for these four versus two constructs in the second-order model. In Table 6 we report the second-order factor loadings from these models; the two lowest loadings were for autonomy (A) and positive relations (R); all other loadings were 0.8 or above.

Table 4 CFA Model Estimates (Mplus Estimator = WLSMV) for Ryff 42-item Psychological Well-being Scale, N = 1,179. a) Unstandardised Loadings (SE), b) Standardised Loadings
Table 5 Correlation Coefficients for Ryff 42-item Psychological Well-being Scale, N = 1,179
Table 6 Factor loadings from second-order model, Ryff 42-item Psychological Well-being Scale, N = 1,179.

Post-hoc models

In a final round of modelling (Table 7) we found it useful to drop two items from personal growth (G) that exhibited a complex pattern of cross-loadings. Both of the excluded items, G2 (I don't want to try new ways of doing things – my life is fine the way it is) and G3 (I think it is important to have new experiences that challenge how I think about myself and the world) are complex questions, capturing more than one issue, and include both positive and negative item content. Item E1 (I do not fit very well with the people and the community around me) loaded more highly on positive relations (R) than its designated factor (environmental mastery (E)), reflecting the initial part of the question concerned with relationships with others. We therefore chose to model this item on positive relations (R).

Table 7 Post Hoc Models, Ryff Psychological Well-being Scale, modified models (40-item)

Examination of residuals also suggested potential overlap between two questions from positive relations (R), 'people would describe me as a giving person, willing to share my time for others' and 'most people see me as loving and affectionate', and so we allowed correlated residuals between these two items. These small modifications to the a priori model, together with method factors, improved fit statistics for TLI and RMSEA (Models PH2 & PH3). The CFI, however, still remained below 0.86.

We also tested a six-factor model (PH4) where four constructs (environmental mastery (E), personal growth (G), purpose in life (P), self-acceptance (S)) loaded onto a second-order factor, and autonomy (A) and positive relations (R) remained as first-order factors (freely correlated). This model is drawn as a path diagram in Figure 1. Goodness of fit statistics for this model (PH4) were similar to the modified model (PH3) with all 6 constructs loading on the second-order factor. The distinctiveness of A and R from the four constructs that are most highly related (E,G,P,S) can be seen in the magnitude of the first-order factor inter-correlations from the modified model (PH2; Table 5) and the second-order factor loadings (PH3; Table 6), which were both less than 0.75 (that is, less than about 56% common variance).

Figure 1

Psychological well-being modified 40-item model, with second-order factor. EGPS = general well-being factor comprising four first-order factors, environmental mastery, personal growth, purpose in life and self-acceptance. The model also includes residual correlation between R1 & R6 (not shown).

Construct validity: predictive validity of the PWB for GHQ

The estimated correlation between our second-order PWB factor (model PH3 based on 40 items) and the GHQ-28 second-order factor was -0.45. The correlations between the a priori first-order PWB factors and the GHQ-28 second-order factor were low (ranging from -0.10 to 0.08) except for environmental mastery (E) (-0.52).

Figure 2 shows the corresponding result for our preferred Ryff model (PH4), in which the four first-order factors (E,G,P,S) load on a second-order general well-being factor: the correlation with the second-order GHQ factor was -0.57.

Figure 2

External criterion validation of psychological well-being (modified 40-item model) with second-order GHQ-28. Revised 40-item PWB model (PH4). EGPS = general well-being factor comprising four first-order factors, environmental mastery, personal growth, purpose in life and self-acceptance. Factor loadings for Ryff model are given in figure 1. The model includes residual correlation between R1 & R6. The GHQ first-order factors are comprised of 28 items (seven per sub-scale) (not shown). Correlations between Ryff six first-order constructs and second-order GHQ factor (model not shown): Autonomy -0.06, environmental mastery -0.52, personal growth -0.10, positive relations 0.08, purpose in life 0.08, self-acceptance 0.02.


In this study we provide the first confirmatory test of the factorial validity and structure of Ryff's Psychological Well-being (PWB) scales (42-item version) in the UK. In contrast to previous studies, our sample comes from the UK and comprises only women who are surviving members of a national birth cohort study which began in 1946. This sample completed the Ryff items as part of an annual women's health survey in midlife and also completed a mental health measure one year later.

In our psychometric modelling we evaluated the fit of categorical (ordinal response) factor models with single and six construct factors, first and second-order factors, and method factors, as well as providing a reference comparison to a model with random item-factor associations. Like all previous research we were unable to identify a model that fitted the data well, although a number of modelling components appeared to be useful in improving model fit to the data, and therefore determine our conclusions regarding the factorial validity of Ryff's measures, with reference to her theory, and in regard to these 42 items. Our results indicate the following:

  1) We found conceptual and empirical value (improved model fit) from the addition of both positive and negative method factors to address methodological artefacts. Springer & Hauser [14] suggested the addition of a negative method factor (correlated with the construct factors) to the Ryff PWB model, but we extended this approach to the addition of both positive and negative method factors which were independent of both each other and the measured constructs. Models incorporating a single (either positive or negative) method factor offered an improvement over the same model without this feature, although models incorporating both method factors had greater impact. Method factors introduce additional latent variables and model parameters, but ssaBIC comparisons show that these modelling additions improve fit even after the penalty that the criterion imposes for estimating additional parameters. However, we note that these BIC values are taken from traditional linear factor models (since BIC values are not available for WLS solutions).

  2) Regarding dimensionality of the PWB measure, and empirical associations among the a priori constructs, we found that in our sample four of the six dimensions of well-being (environmental mastery (E), personal growth (G), purpose in life (P), and self-acceptance (S)), as operationalised by these 42 items, were sufficiently highly correlated to warrant introduction of a second-order general well-being factor that explained the association among the first-order constructs. We could not justify the inclusion of the remaining two dimensions (autonomy (A) and positive relations (R)) on this second-order construct since they were more independent of these four factors, of the second-order factor, and of each other. This gives some credence to claims that there are fewer than six dimensions underpinning Ryff's PWB items. However, our interpretation is in terms of the hierarchical organisation of the six factors, which seem to span two conceptual levels [7]. Further replications of this structure are warranted.

  3) Finally, we found a strong negative association between a measure of mental health (severity of psychological distress based on the GHQ-28) and the PWB, which were measured one year apart. The major contribution to this predictive association came from the environmental mastery items. This replicates a finding reported by Ryff & Keyes [7] using cross-sectional data. A possible explanation of this finding from attribution theory is that people who perceive their environment as uncontrollable, i.e. score low on the environmental mastery construct, and attribute this lack of control to some internal cause that is global and stable, feel helpless to prevent future negative outcomes and consequently experience depression [37, 38]. There is also some overlap in item content to do with task-related and role functioning between Ryff's environmental mastery items and some items in the GHQ-28. Validation against more objective measures could be useful, since most data concern other self-report questions.

We tested the fit of random item-factor models in our data. Our random item-factor models differed from the theoretical model by 1,356 ssaBIC points. This indicates to us that there are still some fragile item-factor associations in the six-factor model; otherwise this comparison would yield a much larger reduction in ssaBIC when comparing theoretical to random models.

Previous authors have concluded that the empirical data are not consistent with a six-factor model [11]. We do not reject the six construct factors, but see the value of a more parsimonious model, based on a hierarchical representation of the proposed dimensions. This approach is common in mental health epidemiology and personality research but does not seem to be as frequently adopted in well-being literature.

Our second-order factor model depends for its definition on the item-to-factor mapping established for the first-order factors, since it is the second-order (more general) factor that is proposed as the explanation for the association among environmental mastery (E), personal growth (G), purpose in life (P), and self-acceptance (S). Examination of item content suggests that this second-order factor may encapsulate a motivational aspect of well-being which incorporates notions of goal orientation and self-direction. Our finding that there are three (rather than six) distinct factors – autonomy, positive relations and motivation/self-direction – is reminiscent of the work of Deci and Ryan [39, 40], which postulates that well-being results from the fulfillment of three basic psychological needs: autonomy, relatedness and competence. It could be argued that our second-order factor bears a relationship to Deci and Ryan's concept of competence. However, it should be noted that while there is overlap between the autonomy concepts of Ryff and of Deci and Ryan, the latter focus on the core concept of personal control, whereas Ryff's items include an element of not caring what others think.

A three-factor structure of well-being has also been suggested by Kafka & Kozma [11]. Their factor analyses of the 120-item Ryff PWB (see Table 1 for details), which also included the Satisfaction with Life Scale (SWLS) and the Memorial University of Newfoundland Scale of Happiness (MUNSH), extracted three main factors: the first comprising environmental mastery (E), personal growth (G), purpose in life (P), and self-acceptance (S); the second comprising MUNSH and SWLS; and the third comprising autonomy (A) and positive relations (R). Given these results we advocate a more parsimonious approach: examining antecedents and correlates of general well-being, as defined by the second-order factor, and/or examining the specific antecedents or correlates of the first-order factors. These are scientific questions at different levels of generality, and should be recognized as such.
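
The hierarchical structure discussed here can be sketched in equation form. The notation is illustrative only and is not taken from the paper: $y^{*}_{j}$ denotes the latent continuous response assumed to underlie ordinal item $j$, $\eta_{k(j)}$ the first-order construct factor to which item $j$ is assigned, $M_{m(j)}$ the method factor for that item's positive or negative wording, and $\xi$ the second-order general well-being factor.

```latex
\begin{align}
y^{*}_{j} &= \lambda_{j}\,\eta_{k(j)} + \mu_{j}\,M_{m(j)} + \varepsilon_{j},
  & k(j) &\in \{A, E, G, R, P, S\},\quad m(j) \in \{+,-\} \\
\eta_{k} &= \gamma_{k}\,\xi + \zeta_{k},
  & k &\in \{E, G, P, S\}
\end{align}
```

Under this sketch, autonomy (A) and positive relations (R) remain first-order factors that are free to correlate with $\xi$, while the two method factors are taken to be uncorrelated with the construct factors and with each other.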

Many aspects of our modelling results do suggest that some Ryff items may measure more than one of the six constructs in the theory. This possibility requires further theoretical work that was not undertaken here, but should form an agenda for future research, and for future factorial complexity studies.

What are the implications of our factor analysis results for users of the Ryff PWB scales?

The factor loadings from our preferred model (Figure 1) indicate that item-factor loadings for Ryff PWB items on the six construct factors are generally low, with only 11/40 exceeding 0.70. This suggests that shortening this version of the PWB may not be practical, since the item reliabilities for almost three-quarters of the items are probably too low to allow reliable estimation of construct scores. We note that others have suggested shorter versions, or have argued that they are needed to reduce respondent burden in well-being surveys [12]. Existing short versions, e.g. the 18-item PWB, do not include many items from the 42 analysed here.

Related to this observation, Figure 1 shows that 3 of the 6 first-order factors have only one high factor loading (>0.70). A loading of 0.70 means that the underlying construct explains only about 50% of the variance in the item response, so even these items are far from pure indicators. This brings into question the definition of the constructs in terms of these single high-loading items. These results suggest to us that future studies should continue to examine the internal construct validity of the PWB items. They also indicate that the items in this version, and perhaps in other long versions, are not sufficient to define robust latent constructs: more items with high loadings should increase the stability of the factor solutions recovered across different study samples (we thank an anonymous referee for distinguishing these two suggestions).
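
As a worked illustration of the arithmetic behind the 0.70 threshold, assume standardized loadings and an item that loads on a single construct factor with no method-factor contribution (both are simplifying assumptions made for this sketch). The proportion of item variance explained by the construct is then the squared loading:

```latex
\[
\lambda = 0.70 \;\Rightarrow\; \lambda^{2} = 0.49 \approx 50\%,
\qquad 1 - \lambda^{2} = 0.51 \quad \text{(unique and error variance)}
\]
```

Any item loading below 0.70 therefore has less than half of its variance attributable to the construct it is meant to measure.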

Applied researchers who do not wish to fit complex latent variable models will not be able to distinguish method from construct contributions to variance, and are therefore at a disadvantage in defining and refining conceptually relevant and psychometrically sound analyses of multidimensional scales. In these instances, however, a parsimonious account of associations of other variables with Ryff PWB outcomes may be achieved by summing all items loading on the four factors that define our second-order well-being continuum (EGPS: environmental mastery, personal growth, purpose in life and self-acceptance). In samples similar in composition to ours, researchers might wish to consider using the factor loadings in Figure 1 as weights when forming sum scores. Further insights into the latent structure of the Ryff items will require equally complex models and replications of these results with method factors. Model-based analyses are also central in other areas of multivariate statistics, e.g. missing data modelling using maximum likelihood.
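
A minimal sketch of the loading-weighted sum score suggested here is given below. The item labels, loadings and responses are hypothetical placeholders, not values from Figure 1, and the function name `weighted_sum_score` is introduced only for illustration. Responses are assumed to have been reverse-scored already where items are negatively worded.

```python
from typing import Mapping


def weighted_sum_score(
    responses: Mapping[str, float], weights: Mapping[str, float]
) -> float:
    """Return the weighted sum of item responses over the items in `weights`."""
    missing = [item for item in weights if item not in responses]
    if missing:
        raise KeyError(f"no response recorded for items: {missing}")
    return sum(weights[item] * responses[item] for item in weights)


# Hypothetical item labels and loadings, one item per EGPS factor.
loadings = {"E1": 0.75, "G1": 0.5, "P1": 0.25, "S1": 0.5}
# One respondent's answers, assumed already reverse-scored where needed.
responses = {"E1": 4, "G1": 2, "P1": 4, "S1": 6}

# Loading-weighted score versus a simple unit-weighted sum of the same items.
score = weighted_sum_score(responses, loadings)
unit_score = weighted_sum_score(responses, {item: 1 for item in loadings})
print(score, unit_score)
```

Passing unit weights reproduces the simple sum of EGPS items, so the same helper covers both scoring options mentioned in the text. Whether loadings estimated in one sample transfer to another is an empirical question that this sketch does not address.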

The nature of our data means that we are undertaking a relatively pure test of the structure of the PWB items, since our sample is homogeneous with respect to age and gender (women aged 52). Our study design therefore minimises the impact of socio-demographic characteristics. Although Ryff completers in our sample were more likely to be from higher socio-economic backgrounds than non-completers, comparative studies using the PWB in nationally representative samples, e.g. MIDUS and NSFH [7, 14], do not report details regarding representativeness or non-completion. It is therefore not possible to assess whether the imbalance noted in our sample is likely to be present in other nationally representative samples that administered the PWB by self-completion. Further, it could be argued that our conclusions regarding the latent structure of Ryff PWB items may be unique to this cohort and to the 42-item version of the Ryff PWB, but we believe that our results are similar enough to those of other studies to suggest that our psychometric conclusions and modelling innovations have validity outside this sample.

Future research could apply further psychometric refinement to the Ryff PWB dimensions, by exploring scoring and effective measurement range using item response theory methodology.


Our psychometric analyses of the Ryff 42-item PWB suggest that the addition of two method factors to reflect positive and negative item content improves model fit. A revised model with a single second-order factor, loaded by four of the six first-order factors (environmental mastery, personal growth, purpose in life and self-acceptance), two method factors, and two more distinct first-order factors (autonomy and positive relations) provided the most parsimonious solution in this birth cohort sample. Psychological well-being was negatively associated with dimensions of mental health, but further investigation of precision of measurement across the health continuum is required.



subjective well-being (SWB)
psychological well-being (PWB)
autonomy (A)
environmental mastery (E)
personal growth (G)
positive relations with others (R)
purpose in life (P)
self-acceptance (S)
National Survey of Health and Development (NSHD)
General Health Questionnaire (GHQ)
Root Mean Square Error of Approximation (RMSEA)
Tucker Lewis Index (TLI)
Comparative Fit Index (CFI)
Weighted Least Squares Mean Variance adjusted (WLSMV)
Robust weighted least squares
Sample size adjusted Bayesian information criteria (ssaBIC)
Maximum likelihood (ML)


  1. Kahneman D, Diener E, Schwarz N: Well-Being: The Foundations of Hedonic Psychology. New York, Russell Sage Foundation; 1999.
  2. Snyder CR, Lopez SJ: Handbook of Positive Psychology. USA, Oxford University Press, Inc.; 2001.
  3. Huppert FA, Keverne B, Bayliss N: The Science of well-being, Integrating neurobiology, psychology and social science. Proceedings of Royal Society Scientific Discussion Meeting, 19–20 November 2003. Philosophical Transactions of the Royal Society, Series B May edition. 2004, 358.
  4. Ryff C, Singer B: The Contours of positive human health. Psychological Inquiry 1998, 9: 1–28. 10.1207/s15327965pli0901_1
  5. Diener E, Suh EM, Lucas RE, Smith HL: Subjective well-being: Three decades of progress. Psychological Bulletin 1999, 125(2): 276–302. 10.1037/0033-2909.125.2.276
  6. Ryan R, Deci EL: On happiness and human potentials: a review of research on hedonic and eudaimonic well-being. Annu Rev Psychol 2001, 52: 141–166. 10.1146/annurev.psych.52.1.141
  7. Ryff CD, Keyes CL: The structure of psychological well-being revisited. J Pers Soc Psychol 1995, 69(4): 719–727. 10.1037/0022-3514.69.4.719
  8. Ryff C: Beyond Ponce de Leon and life satisfaction: New directions in quest of successful aging. International Journal of Behavioural Development 1989, 12: 35–55. 10.1177/016502548901200102
  9. Ryff C: Happiness is everything, or is it? Explorations on the meaning of psychological well-being. Journal of Personality and Social Psychology 1989, 57: 1069–1081. 10.1037/0022-3514.57.6.1069
  10. Clarke PJ, Marshall VW, Ryff CD, Wheaton B: Measuring psychological well-being in the Canadian Study of Health and Aging. Int Psychogeriatr 2001, 13 Supp 1: 79–90. 10.1017/S1041610202008013
  11. Kafka GJ, Kozma A: The construct validity of Ryff's scales of psychological well-being (SPWB) and their relationship to measures of subjective well-being. Social Indicators Research 2002, 57: 171–190. 10.1023/A:1014451725204
  12. Van Dierendonck D: The construct validity of Ryff's Scales of Psychological Well-being and its extension with spiritual well-being. Personality and Individual Differences 2004, 36: 629–643. 10.1016/S0191-8869(03)00122-3
  13. Cheng ST, Chan AC: Measuring psychological well-being in the Chinese. Personality and Individual Differences 2005, 38: 1307–1316. 10.1016/j.paid.2004.08.013
  14. Springer KW, Hauser RM: An assessment of the construct validity of Ryff's scales of psychological well-being: method, mode and measurement effects. Social Science Research 2006, 35(4): 1079–1110.
  15. Marsh H: Negative Item bias in ratings scales for preadolescent children: a cognitive-development phenomenon. Dev Psychol 1986, 22(1): 37–49. 10.1037/0012-1649.22.1.37
  16. Mook J, Kleijn W, van der Ploeg H: Symptom-positively and -negatively worded items in two popular self-report inventories of anxiety and depression. Psychological Reports 1991, 69: 551–560. 10.2466/PR0.69.6.551-560
  17. Springer KW, Hauser RM, Freese J: Bad news indeed for Ryff's six-factor model of well-being. Social Science Research 2006, 35(4): 1119–30.
  18. Ryff CD, Singer BH: Best news yet on the six-factor model of well-being. Social Science Research 2006, 35(4): 1102–18. 10.1016/j.ssresearch.2006.01.002
  19. Muthén B: Dichotomous factor analysis of symptom data. In Latent Variable Models for Dichotomous Outcomes: Analysis of Data from the Epidemiological Catchment Area Program. Sociological Methods & Research, 18, 19–65. Edited by: Bohrnstedt E. 1989.
  20. McDonald RP: Test Theory: a unified treatment. Mahwah NJ, Lawrence Erlbaum Associates; 1999.
  21. Embretson SE, Reise SP: Item Response Theory for Psychologists. Mahwah NJ, Erlbaum; 2000.
  22. Skrondal A, Rabe-Hesketh S: Generalized latent variable modeling: multilevel, longitudinal and structural equation models. Boca Raton, FL, Chapman & Hall/CRC; 2004.
  23. Bartholomew DJ, Knott M: Latent variable models and factor analysis. London, United Kingdom, Arnold Publishers; 1999.
  24. Mislevy R: Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics 1986, 11: 3–31.
  25. Bartholomew DJ, Steele F, Moustaki I, Galbraith JI: The Analysis and Interpretation of Multivariate Data for Social Scientists. London, CRC Press; 2003.
  26. Flora DB, Curran PJ: An empirical evaluation of alternative methods of estimation for confirmatory factor analysis with ordinal data. Psychol Methods 2004, 9(4): 466–491. 10.1037/1082-989X.9.4.466
  27. Lindfors P: Positive health in a group of Swedish white-collar workers. Psychological Reports 2002.
  28. Goldberg DP: General Health Questionnaire 12-item version. Windsor, Berkshire, NFER Nelson; 1978.
  29. Goldberg DP, Hillier VF: A scaled version of the General Health Questionnaire. Psychol Med 1979, 9: 139–145.
  30. Wadsworth ME: The imprint of time: childhood, history, and adult life. Oxford, United Kingdom, Clarendon Press; 1991.
  31. Wadsworth ME, Kuh DJ, Richards M, Hardy RJ: Cohort Profile: The 1946 National Birth Cohort (MRC National Survey of Health and Development). International Journal of Epidemiology 2006, 31(1): 49–54.
  32. Wadsworth ME, Butterworth SL, Hardy RJ, Kuh DJ, Richards M, Langenberg C, Hilder WS, Connor M: The life course prospective design: an example of benefits and problems associated with study longevity. Soc Sci Med 2003, 57(11): 2193–2205. 10.1016/S0277-9536(03)00083-2
  33. Huppert FA, Whittington JE: Evidence for the independence of positive and negative well-being: implications for quality of life assessment. British Journal of Health Psychology 2003, 8: 107–122. 10.1348/135910703762879246
  34. Ploubidis GB, Abbott RA, Huppert FA, Kuh D, Wadsworth ME, Croudace TJ: Improvements in positive social functioning reported by a birth cohort in mid-adult life: a person-centred analysis of GHQ-28 social dysfunction items using latent class analysis. Personality and Individual Differences, in press.
  35. Muthén LK, Muthén BO: Mplus User Guide. Third edition. Los Angeles, CA, Muthén & Muthén; 2004.
  36. Quilty LC, Oakman JM, Risko E: Correlates of the Rosenberg Self-Esteem Scale Method Effects. Structural Equation Modelling 2006, 13(1): 99–117.
  37. Abramson LY, Metalsky GI, Alloy LB: Hopelessness depression: A theory-based subtype of depression. Psychol Rev 1989, 96: 358–372. 10.1037/0033-295X.96.2.358
  38. Abramson LY, Seligman ME, Teasdale JD: Learned helplessness in humans: Critique and reformulation. J Abnorm Psychol 1978, 87: 49–74. 10.1037/0021-843X.87.1.49
  39. Deci EL, Ryan RM: Intrinsic motivation and self-determination in human behavior. New York, Plenum Publishing Co; 1985.
  40. Deci EL, Ryan RM: The "what" and "why" of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry 2000, 11: 227–268. 10.1207/S15327965PLI1104_01
  41. Bentler PM: Comparative fit indices in structural models. Psychological Bulletin 1990, 107: 238–246. 10.1037/0033-2909.107.2.238
  42. Tucker LR, Lewis C: A reliability coefficient for maximum likelihood factor analysis. Psychometrika 1973, 38: 1–10. 10.1007/BF02291170
  43. Steiger JH: Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research 1990, 25: 173–180. 10.1207/s15327906mbr2502_4
  44. Yu CU: Evaluating cutoff criteria of model fit indices for latent variable models with binary and continuous outcomes. Los Angeles, University of California; 2002.


This work was supported by a project grant from The Leverhulme Foundation entitled "Developing high quality psychometric measures of positive mental health" (F/09/903/A) and by supplementary funding from the Isaac Newton Trust. TJC is funded by a Career Scientist Award (Public Health) from the UK Department of Health.

Author information

Corresponding author

Correspondence to Rosemary A Abbott.

Additional information

Competing interests

The author(s) declare that they have no competing interests.

Authors' contributions

RAA undertook the statistical modelling, manuscript preparation and revision. TJC planned and advised on the statistical modelling using latent variables and participated in manuscript preparation and revision. GBP was actively involved in the statistical modelling. FAH advised on the conceptual framework and data interpretation. DK and MEW were responsible for the design and acquisition of the data. FAH, GBP, DK and MEW all advised on manuscript preparation.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Abbott, R.A., Ploubidis, G.B., Huppert, F.A. et al. Psychometric evaluation and predictive validity of Ryff's psychological well-being items in a UK birth cohort sample of women. Health Qual Life Outcomes 4, 76 (2006).
