The psychometric properties of the German version of the WHOQOL-OLD in the German population aged 60 and older

Background The WHOQOL-OLD is an instrument for the assessment of subjective quality of life in elderly people. It is based on the WHO definition of quality of life and is available in more than 20 languages. However, in most countries the psychometric properties of the WHOQOL-OLD have been assessed only in small local samples and not in representative studies. In this study, the psychometric properties of the WHOQOL-OLD are evaluated in a representative sample of Germany's elderly population. Methods Face-to-face interviews were conducted with 1133 respondents from the German population aged 60 years and older. Quality of life was assessed with the WHOQOL-BREF, the WHOQOL-OLD and the SF12. In addition, the Geriatric Depression Scale (GDS), the DemTect and the IADL scale were administered to assess depressive symptoms, cognitive capacity and the capacity to carry out instrumental activities of daily living. Psychometric properties of the WHOQOL-OLD were evaluated by means of classical and probabilistic test theory, confirmatory factor analysis and multivariate regression models. Results Cronbach's alpha was above 0.85 for four and above 0.75 for two of the six facets of the WHOQOL-OLD. IRT analyses indicated that all items of the WHOQOL-OLD contribute considerably to the measurement of their associated facets. While the six-facet structure of the WHOQOL-OLD was well supported by the confirmatory factor analysis, a common latent factor for the WHOQOL-OLD total scale could not be identified. Correlations with other quality of life measures and multivariate regression models with the GDS, the IADL and the DemTect indicate good criterion validity of all six WHOQOL-OLD facets. Conclusions The results confirm that the good psychometric properties of the WHOQOL-OLD found in international studies can be replicated in a representative study of the German population.
These results suggest that the WHOQOL-OLD is an instrument that is well suited to identify the needs and the wishes of an aging population.


Introduction
Given the predicted aging of the population, assessing the quality of life (QoL) of older adults is becoming increasingly important. The population of Europe is older than that of any other world region, and older adults are expected to make up 25% of the population in several European countries by 2020 [1]. In the United States, 12% of the population, or 36.3 million people, are over the age of 65 years, and it is projected that by 2050, 21% of the American population will be over 65 [2].
The changing demographics have significant implications for policy makers, as well as for professionals providing health and social services [3]. As a result of higher life expectancy and a trend toward earlier retirement, many people in industrialized societies spend an increasing proportion of their lifetime in the "third age" [4], that is, the stage of life between retirement and the age at which 50% of the age group have died [5]. Variable life courses as well as social, economic and political conditions [4,5] result in a great variety of health states and living conditions. Although QoL measurement is becoming increasingly important, there are specific problems in measuring QoL in older adults. There is a lack of age-specific measures [6], and the appropriateness of QoL instruments designed for younger adults has been questioned [6][7][8].
There have been a number of conceptualizations of QoL for older adults. Drawing on Erikson's theory of life cycles and Maslow's hierarchy of needs [9,10], QoL in later life has been defined as the capacity to satisfy higher-order needs, in particular control, autonomy, pleasure and self-realization. While both approaches appropriately separate QoL from the environmental and intra-personal factors that influence it, they are limited because they ignore subjective experience [11] and use a deductive approach to identifying the dimensions of QoL. Moreover, concepts such as control, autonomy, pleasure and self-realization may be more relevant to Western cultures. Since QoL is regarded as a universal concept reflecting people's subjective experience [12], both individual experience and cultural differences must be taken into consideration. Consequently, the WHOQOL-OLD, an add-on module to the generic WHOQOL instruments for use with older adults, was developed in a cross-cultural study.
In 1991, the World Health Organization Quality of Life (WHOQOL) project was the first attempt to take cultural differences into account during instrument development [13][14][15]. It was based on the following definition of quality of life: "Quality of life is defined as individuals' perceptions of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns". At the center of this definition is the individual's subjective perception and evaluation of his or her living conditions. Furthermore, as a fundamental characteristic of this approach, the concept of quality of life is embedded in an intercultural context [13,14,16,17]. The intercultural comparability of the WHOQOL instruments, it was claimed, was ensured by the participation of research centers from diverse cultural areas in the development of the pilot instrument. This included the definition and operationalization of individual facets (sub-categories) and domains (main categories) of quality of life, the formulation and selection of questionnaire items, the development of response scales for groups of items and the field testing of the instrument in pilot studies [12,18]. The result of the WHOQOL project was the development of two generic instruments for the assessment of quality of life: the WHOQOL-100 and its short form, the WHOQOL-BREF [13][14][15]19]. Today, these two instruments are available in approximately 30 languages [19]. However, it became increasingly obvious that the generic WHOQOL questionnaires did not sufficiently meet the specific requirements of assessing quality of life in old age.
Therefore, a worldwide project, WHOQOL-OLD, was initiated to develop an instrument for the intercultural assessment of quality of life in old age based on the WHOQOL-100. Within the scope of this project, and under the patronage of the WHO, research centers from 22 countries developed an instrument for the assessment of quality of life in old age. In order to determine the dimensional structure of the quality of life concept for older people and to develop facet definitions, focus groups with experts and lay persons were conducted at project baseline. The results showed that older people relate the term quality of life to social, health-related and environmental aspects [20]. Based on these results, items were generated and their psychometric characteristics were evaluated in a pilot study, which led to a reduction in the number of items. The psychometric verification of the questionnaire was then carried out in a survey of the respective target age group. The result of this study was the final version of the WHOQOL-OLD questionnaire for the assessment of quality of life in older people, consisting of six new facets (Figure 1), which can be applied in combination with either the WHOQOL-100 or the WHOQOL-BREF. However, the psychometric characteristics of the final version of the WHOQOL-OLD were calculated on the same data set that was used to develop it. Although the WHOQOL-OLD exists in more than 20 languages, validation of the instrument in general populations is rare. Only recently has a Chinese version been evaluated in the general population of Guangzhou (formerly Canton) [21].
In this article, the psychometric properties of the German version of the WHOQOL-OLD are assessed on the basis of a representative survey of the German population aged 60 years and older.

Data
In 2012, a representative face-to-face survey of respondents aged 60 years and older was conducted in Germany. The sample was drawn using a three-stage random sampling procedure: (1) sample points (regional areas), (2) households, and (3) individuals within the target households. Target households within 129 sample points were determined according to the random route procedure; 105 sample points were located in the area of the old and 24 in the area of the new German "Länder" (the former West and East Germany, respectively). Target persons within households were selected using random digits. For the 129 sample points, a gross sample of N = 5418 was drawn in order to realize a total sample of about 1000 respondents. In a second step, an additional sample was drawn for the age group 80+ in order to increase this subgroup to about 300. Adding this sample of 102 respondents to the first one resulted in a total of 1133 respondents (309 aged 80 and older).
Ethical approval was obtained (University of Leipzig and Ulm University).

Instruments
To identify respondents who would be unable to complete the interview because of severe cognitive impairment, the interview began with the DemTect. The DemTect is a cognitive screening test comprising five tasks (a word list, a number transcoding task, a word fluency task, digit span reverse, and delayed recall of the word list) that supports the diagnosis of mild cognitive impairment (MCI) and early dementia. Its transformed total score is independent of age and education [22].
To assess subjective QoL, the German version of the WHOQOL-BREF (Figure 1) was used [16,19]. It consists of four domains, "physical" (7 items), "psychological" (6 items), "social relationships" (3 items) and "environment" (8 items), plus two items on "overall QoL". Domain scores were transformed to a range between 0 and 100. Internal consistency of the subscales, measured with Cronbach's alpha, ranged between 0.57 and 0.88. To assess facets of quality of life specific to older adults, the 24-item add-on module WHOQOL-OLD was used (Figure 1, Table 1) [23]. It consists of six facets: sensory abilities; autonomy; past, present and future activities; social participation; death and dying; and intimacy. Facet scores were likewise transformed to a range between 0 and 100. Internal consistency of the subscales ranged between 0.75 for autonomy and 0.92 for intimacy.
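The linear transformation of raw scores to a 0 to 100 range can be sketched as follows. This is a minimal illustration only, assuming that each item is scored 1 to 5 after any reverse-coding and that a facet score is the plain sum of its items (24 items over 6 facets gives 4 items per WHOQOL-OLD facet); the function name `to_0_100` is hypothetical, and the official WHOQOL scoring instructions, including their rules for missing items, take precedence.

```python
def to_0_100(raw_score: float, n_items: int, min_resp: int = 1, max_resp: int = 5) -> float:
    """Linearly rescale a raw facet sum to 0-100 (assumes 1-5 item responses)."""
    lo = n_items * min_resp   # lowest possible raw sum
    hi = n_items * max_resp   # highest possible raw sum
    if not lo <= raw_score <= hi:
        raise ValueError("raw score outside the possible range")
    return (raw_score - lo) * 100.0 / (hi - lo)

# A WHOQOL-OLD facet has 4 items, so raw sums run from 4 to 20.
print(to_0_100(4, 4))   # 0.0
print(to_0_100(12, 4))  # 50.0
print(to_0_100(20, 4))  # 100.0
```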
Comorbidity was defined as the number of chronic diseases using the comorbidity list from the Federal Health Survey [24].
The respondents' functioning level concerning instrumental activities of daily living (IADL) was assessed with the Instrumental Activities of Daily Living Scale [25,26].

Assessment of reliability
According to recent developments in psychometric assessment in quality of life research [27][28][29][30], psychometric properties of WHOQOL-OLD were assessed by means of classical and probabilistic test theory [8].
Following the principles of classical test theory, the reliability of the WHOQOL-OLD facets was determined on the basis of internal consistency. Cronbach's alpha was estimated for each of the six facets of the WHOQOL-OLD, and inter-item correlations as well as item-scale correlations were estimated for all items in relation to their respective facets.
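The classical test theory quantities named above can be computed directly from an item-response matrix. The following is a minimal sketch in Python with NumPy, assuming complete data (no missing responses) for the items of a single facet; it mirrors the textbook definitions but is not the STATA `alpha` implementation used in the study, and the function names are illustrative.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha; items is an (n_respondents, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the scale sum
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

def mean_inter_item(items: np.ndarray) -> float:
    """Average of the off-diagonal inter-item correlations."""
    r = np.corrcoef(items, rowvar=False)
    k = r.shape[0]
    return (r.sum() - k) / (k * (k - 1))           # drop the k ones on the diagonal

# Usage with simulated data: four noisy indicators of one latent trait
rng = np.random.default_rng(0)
trait = rng.normal(size=500)
facet = np.column_stack([trait + rng.normal(scale=0.8, size=500) for _ in range(4)])
print(cronbach_alpha(facet), corrected_item_total(facet), mean_inter_item(facet))
```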
To examine reliability by means of probabilistic test theory, a Partial Credit Model (PCM) was employed [31][32][33]. The PCM belongs to the family of IRT (Item Response Theory) models and is an extension of the Rasch model [34,35] to polytomous items with ordered response categories: it models the probability that person p responds in category j of item i as a function of the person's latent "ability" θ_p and the item's threshold parameters δ_il [36]. Both the thresholds and the latent ability are located on the same scale. A threshold parameter marks the point on the latent dimension θ where the category characteristic curves of two adjacent categories intersect (i.e., the point at which the probability of endorsing either of the two adjacent categories is equal). Whether the thresholds are located on the dimension in ascending order is of major concern, because ordered thresholds are not a necessary characteristic of this (ordinal) model. The PCM is suited to model sums of binary responses that are not assumed to be stochastically independent [37].
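Written out, the PCM in its usual (Masters) form gives the probability that person p responds in category j of item i, where the item has ordered categories 0, …, m_i. The second line follows by dividing the probabilities of adjacent categories and shows why a threshold δ_ij is the point where the curves of categories j−1 and j cross: at θ_p = δ_ij the ratio equals one. The symbols m_i and h are our notation.

```latex
P(X_{pi}=j \mid \theta_p)
  = \frac{\exp\!\left(\sum_{l=0}^{j}(\theta_p-\delta_{il})\right)}
         {\sum_{h=0}^{m_i}\exp\!\left(\sum_{l=0}^{h}(\theta_p-\delta_{il})\right)},
  \qquad j = 0,\dots,m_i, \qquad \sum_{l=0}^{0}(\theta_p-\delta_{il}) \equiv 0

\frac{P(X_{pi}=j \mid \theta_p)}{P(X_{pi}=j-1 \mid \theta_p)} = \exp(\theta_p-\delta_{ij}),
  \qquad j = 1,\dots,m_i
```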
To evaluate model fit, two types of fit indices were estimated. First, the INFIT and OUTFIT mean-square statistics, which measure the "randomness" or "determination" of the responses to an item relative to a particular measurement model, were estimated. "Values larger than 1.0 indicate unmodeled noise. Values are on a ratio scale, so that 1.2 indicates 20% excess noise. Values less than 1.0 indicate a lack of stochasticity" [33,38-41]. Since the INFIT is an information-weighted form of the OUTFIT, which "…reduces the influence of less informative, low variance, off-target responses" [38], we focus primarily on this parameter. This leads to the following pragmatic categorization [42]:
- above 2.0: Distorts or degrades the measurement system.
- 1.5 to 2.0: Unproductive for construction of measurement, but not degrading.
- 0.5 to 1.5: Productive for measurement.
- below 0.5: Less productive for measurement, but not degrading. May produce misleadingly good reliabilities and separations.
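The two mean-square statistics differ only in how squared residuals are averaged: OUTFIT is the unweighted mean of squared standardized residuals, whereas INFIT divides the summed squared residuals by the summed model variances, which down-weights off-target responses. A minimal sketch, assuming the model-implied expected values and variances of each person's response have already been obtained from a fitted PCM; the function names are hypothetical, and the boundary handling at exactly 0.5, 1.5 and 2.0 is our choice because the quoted bands overlap there.

```python
import numpy as np

def mean_square_fit(observed, expected, variance):
    """Return (INFIT, OUTFIT) mean squares for one item.

    observed, expected, variance: 1-D arrays over persons, where expected and
    variance are the model-implied mean and variance of each response.
    """
    x = np.asarray(observed, float)
    e = np.asarray(expected, float)
    w = np.asarray(variance, float)
    sq_resid = (x - e) ** 2
    outfit = np.mean(sq_resid / w)       # unweighted mean of squared z-residuals
    infit = sq_resid.sum() / w.sum()     # information (variance) weighted
    return infit, outfit

def classify_mnsq(value):
    """Map a mean-square value to the pragmatic categories quoted above [42]."""
    if value > 2.0:
        return "distorts or degrades the measurement system"
    if value >= 1.5:
        return "unproductive, but not degrading"
    if value >= 0.5:
        return "productive for measurement"
    return "less productive, but not degrading"
```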
Second, the so-called Q-index (also called Person-Separation-Index PSI) [43,44] was estimated. "The Q-index lies between zero (indicating perfect discrimination, i.e., a Guttman pattern) and one (indicating perfect "antidiscrimination"). A value of 0.5 indicates no relationship between the individual parameter and the reaction to the item. The Zq value is a transformation of the Q-index that is approximately normally distributed if the Rasch model holds for the respective item. High positive values indicate that the item discrimination is lower than assumed by the Rasch model (under-fit), negative values indicate higher discrimination than assumed (over-fit)" [45]. Zq values between -1.96 and 1.96 indicate that the Q-index of an item lies within the expected range at the 5% significance level.
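The idea behind the Q-index can be illustrated by comparing the log-likelihood of an item's observed response vector with the best and worst vectors that have the same item score: Q = (L_obs − L_max) / (L_min − L_max), so a Guttman-like pattern gives 0 and a fully reversed pattern gives 1. The sketch below is our simplified reading of that definition, restricted to a dichotomous Rasch item with known person and item parameters; it does not reproduce WINMIRA's polytomous computation, it omits the Zq transformation (which needs the model-implied mean and variance of Q), and the function name is hypothetical.

```python
import numpy as np

def q_index_dichotomous(responses, theta, beta):
    """Simplified Q-index for one dichotomous Rasch item.

    responses: 0/1 array over persons; theta: person parameters;
    beta: item difficulty. Returns Q in [0, 1], or nan if undefined.
    """
    x = np.asarray(responses, int)
    theta = np.asarray(theta, float)
    p = 1.0 / (1.0 + np.exp(-(theta - beta)))   # Rasch probability of a 1

    def loglik(pattern):
        return np.sum(pattern * np.log(p) + (1 - pattern) * np.log(1 - p))

    k = int(x.sum())                  # the item score is held fixed
    order = np.argsort(theta)         # persons from lowest to highest theta
    best = np.zeros_like(x)
    best[order[len(x) - k:]] = 1      # 1s go to the k most able persons
    worst = np.zeros_like(x)
    worst[order[:k]] = 1              # 1s go to the k least able persons

    l_obs, l_max, l_min = loglik(x), loglik(best), loglik(worst)
    if np.isclose(l_max, l_min):
        return float("nan")           # e.g. all 0s or all 1s: no discrimination
    return float((l_obs - l_max) / (l_min - l_max))
```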
The thresholds of the answer categories and the distributions on the latent scale dimensions are presented in person-item maps (PIM). The histograms in the upper part of each PIM represent the distribution of respondents on the latent scale of the facet. The lines in the lower part of the PIM represent the ranges of the latent scales, with the means symbolized as dark dots and the k-1 thresholds between the k answer categories symbolized as circles labeled with the number of the category. As an indicator of high reliability, the thresholds of every item should be in ascending order. The discriminatory power of the items is represented by the range between the thresholds: small ranges represent high discriminatory power and vice versa. Since the PCM is an ordinal scaling model, it does not require equal distances between thresholds.
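The properties read off the person-item maps can be checked numerically once threshold estimates are available. A minimal sketch, assuming the thresholds of one item are given as a list in category order; the threshold values in the usage lines are hypothetical and the function names are ours. It computes PCM category probabilities, checks whether thresholds are ordered, and reports the threshold range used above as an indicator of discriminatory power.

```python
import numpy as np

def pcm_probs(theta, deltas):
    """Category probabilities P(X = 0..m) under the PCM for one item."""
    deltas = np.asarray(deltas, float)
    # cumulative sums of (theta - delta_l); category 0 has an empty sum of 0
    cum = np.concatenate(([0.0], np.cumsum(theta - deltas)))
    num = np.exp(cum - cum.max())       # subtract the max for numerical stability
    return num / num.sum()

def thresholds_ordered(deltas):
    """True if the k-1 thresholds increase strictly along the latent scale."""
    return bool(np.all(np.diff(deltas) > 0))

def threshold_span(deltas):
    """Distance between the lowest and the highest threshold of an item."""
    return float(np.max(deltas) - np.min(deltas))

# Hypothetical thresholds for a five-category item
deltas = [-1.5, -0.2, 0.8, 2.0]
probs = pcm_probs(-0.2, deltas)   # at theta = delta_2, categories 1 and 2 are equally likely
print(probs, thresholds_ordered(deltas), threshold_span(deltas))
```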

Assessment of validity
The construct validity of the WHOQOL-OLD was assessed by means of first-order and second-order confirmatory factor analyses. The first model represents the six-factor structure as a congeneric measurement model [46]. The second model contains an additional second-order factor, which was included to investigate whether a common latent factor representing a WHOQOL-OLD total score underlies the six facets.
Discriminant validity was assessed with multivariate regression models for each of the WHOQOL-OLD facets, with socio-demographic characteristics, the living situation, the GDS, the IADL, the number of chronic diseases and cognitive status measured by the DemTect as independent variables.
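The standardized regression coefficients and the adjusted R² reported for these models can be obtained by fitting ordinary least squares to z-scored variables. A minimal sketch with NumPy, assuming a numeric predictor matrix without missing values (categorical predictors such as sex or living arrangement would first be dummy coded, and whether to standardize dummies is a convention choice); the function name is hypothetical and this is not the software used in the study.

```python
import numpy as np

def standardized_ols(X, y):
    """OLS on z-scored predictors and outcome.

    X: (n, k) predictor matrix without an intercept column; y: (n,) outcome.
    Returns standardized coefficients (beta weights), R^2 and adjusted R^2.
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # z-scored variables have mean 0, so no intercept is needed
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    resid = yz - Xz @ beta
    n, k = X.shape
    r2 = 1.0 - (resid @ resid) / (yz @ yz)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, r2, adj_r2
```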

Software
The CFA was estimated with Mplus 7.11 [47]. Analyses for the PCM were conducted with the R packages eRm [48] and ltm [49]. The Q-index was computed using WINMIRA [45]. Indices from classical test theory were estimated with the "alpha" command in STATA 13 [50].

Results
A total of 1133 people aged 60 to 96 years participated in the study (Table 2). The mean age was 72.3 years (SD 8.7 years). The gender ratio of the sample was approximately balanced. About 50% of the sample were married and living with a spouse, while the other half were separated, divorced, widowed or never married. About 43% of the study population lived alone, while 57% lived with partners, children or other people. About 42% of the study population had completed 10 or more years of formal education, while 58% had completed less than 10 years of school.
Of the study population, 66% had no cognitive impairment, 24% had mild impairment and 9.5% were identified as having severe cognitive impairment according to the DemTect.
The mean Instrumental Activities of Daily Living (IADL) score was 6.7, indicating that the study participants, on average, are able to live largely independently. The mean Geriatric Depression Scale (GDS) score of 3.5 indicates a low level of depressive symptoms. Table 3 shows the reliability parameters for the WHOQOL-OLD. INFIT values between 0.5 and 1.5 indicate that all items are productive for the measurement of the associated facets. The Zq values of the transformed Q-index indicate no significant deviation of the response patterns from those expected under the partial credit model.

Reliability
As indicated by Figures 2, 3, 4, 5, 6 and 7, all facets show ordered answering thresholds for the associated items. The varying threshold ranges within and between the items of each facet indicate considerable differences in the discriminatory power not only of the items but also of the answering categories within the same items.
Frequency distributions on the latent scales indicate negatively skewed distributions for all facets; however, the modal value of the facet death and dying is much lower than those of the other facets. The facet sensory abilities in particular, but also death and dying, show bimodal distributions.

Construct validity
Results of the first-order confirmatory factor analysis (Figure 8) reveal that all items load significantly on their associated WHOQOL-OLD facets, with sufficient standardized loadings above 0.5. The only exception is the small loading of item 19 on the facet past, present and future activities (0.339). R² values indicate sufficient communalities above 0.3 for all items, with the exception of item 19 with 0.431.
As shown in Table 4, correlations between the factors representing the six WHOQOL-OLD facets range from r = 0.180 (between sensory abilities and death and dying) to r = 0.907 (between social participation and past, present and future activities). In particular, the high correlations between the factors representing the facets social participation, autonomy and past, present and future activities suggest that a common latent factor representing a WHOQOL-OLD total score may exist.
To test this assumption, a second-order confirmatory factor model was estimated. For this purpose, the variance of the factor representing the WHOQOL-OLD facet past, present and future activities was fixed to zero. The factor loading structure of this model (Figure 9) reveals sufficient standardized loadings above 0.5 of the common factor on five of the six factors representing the WHOQOL-OLD facets. Only the loading on the factor representing the facet death and dying, at 0.295, is far below the limit of 0.500. Moreover, the R² of 0.087 indicates an insufficiently low communality for the factor representing the facet death and dying. With an estimate of 0.257, this also holds for the factor representing the facet sensory abilities.
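In a completely standardized second-order model, the communality of a first-order factor is simply the square of its second-order loading, which explains how the reported loading of 0.295 for death and dying translates into an R² of 0.087, and why a loading of 0.5 corresponds to 25% explained variance. In the notation below (ours, not the authors'), η_f is a first-order facet factor, ξ the common second-order factor, γ_f its standardized loading and ζ_f the disturbance.

```latex
\eta_f = \gamma_f\,\xi + \zeta_f, \qquad
\operatorname{Var}(\eta_f) = \operatorname{Var}(\xi) = 1
\;\Rightarrow\; R^2_f = \gamma_f^2

R^2_{\text{death and dying}} = 0.295^2 = 0.087025 \approx 0.087, \qquad
\gamma_f = 0.5 \;\Rightarrow\; R^2_f = 0.25
```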
The fit characteristics for both models are presented in Table 5. The χ² values indicate significant deviations from the empirical covariance structure, which is to be expected given the large sample size. The general fit indices CFI and TLI are sufficient for both models; the same is true for the RMSEA and the SRMR. The comparison of the fit indices between the two models reveals no improvement in model fit from adding the second-order common factor. The loadings clearly show that a one-dimensional representation cannot be recommended (Figure 9).

Table 4. Inter-correlations of the factors representing the WHOQOL-OLD facets.
Figure 9. Confirmatory second-order factor model for the six WHOQOL-OLD facets.

Table 6 shows the correlations between the WHOQOL-OLD facets and the criterion variables. With the exception of the death and dying facet, all WHOQOL-OLD facets and the WHOQOL-OLD total score show medium to high positive correlations (between r = 0.363 and r = 0.798) with the WHOQOL-BREF subscales and the WHOQOL-BREF overall score. Medium to high positive correlations were also found between the WHOQOL-OLD facets (except death and dying) and the SF12 subscales "Physical Health Index" and "Mental Health Index". In contrast to all other WHOQOL-OLD facets, the facet death and dying shows much smaller correlations with the generic quality of life scales (between r = 0.185 and r = 0.286).
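The fit indices reported in Table 5 are functions of the model χ², its degrees of freedom, the χ² and degrees of freedom of a baseline (independence) model, and the sample size. A minimal sketch of the common textbook formulas, offered only to make these relationships explicit; software packages differ in details (for example, some use n instead of n − 1 in the RMSEA), so this is not Mplus's exact computation, and the function names are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Point estimate of the RMSEA; conventions differ on n versus n - 1."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index of model m relative to baseline model b."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)      # max of baseline, model and 0
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)
```

Because χ² grows with sample size for a given model discrepancy, a significant χ² in a sample of more than 1000 respondents is compatible with acceptable CFI, TLI and RMSEA values.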

Discriminant validity
Results of the linear regression models are presented in Table 7. As indicated by the standardized regression coefficients, depressive symptoms have the strongest negative effect on all six WHOQOL-OLD facets and on the total WHOQOL-OLD score. The level of cognitive functioning has a positive effect on all facets except death and dying and on the total score. The number of chronic diseases is negatively related to sensory abilities and to death and dying and positively related to intimacy. Socio-demographic characteristics and living arrangements affect only some of the WHOQOL-OLD facets. Higher age is associated with lower sensory abilities but with higher scores on past, present and future activities. Female sex is negatively related to autonomy. In comparison to persons who live alone, those who live with others score higher on the WHOQOL-OLD facets death and dying and intimacy and have a higher WHOQOL-OLD total score. Persons with higher formal education rate their past, present and future activities more favorably than those with a lower educational level.
As indicated by the adjusted R², the model variables explained a considerable amount of variance.

Discussion
This is the first examination of the psychometric properties of the WHOQOL-OLD in a representative sample of the German population aged 60 years and older. Psychometric properties were examined by means of classical test theory and, primarily, probabilistic test theory.
The internal consistency analysis revealed high reliability coefficients and high item-scale and inter-item correlations for four of the six WHOQOL-OLD facets: sensory abilities, social participation, death and dying, and intimacy. The remaining two facets, autonomy and past, present and future activities, show lower but still acceptable internal consistency. Results of the probabilistic test theory approach indicate that all facets of the WHOQOL-OLD can be represented by a partial credit model with ordered thresholds. Fit indices show that all items are productive for measurement. The thresholds of the answer categories are in ascending order for all items, but the varying distances between thresholds indicate that the measurement characteristics of the items and of the answer categories are unequal.
The construct validity of the six-facet model of the WHOQOL-OLD was supported by the first-order confirmatory factor analysis, whereas the second-order model for a WHOQOL-OLD total scale was not supported. Convergent validity of the WHOQOL-OLD facets was well confirmed with regard to the subscales of the generic quality of life measures WHOQOL-BREF and SF12.
Results from the multiple regression models indicate that symptoms of depression are the strongest predictor of all WHOQOL-OLD facets. Nevertheless, cognitive functioning, the ability to carry out daily activities and chronic diseases are also important factors in explaining quality of life.
Our results show that the psychometric properties of the German version of the WHOQOL-OLD are similar to or better than those reported in the international WHOQOL-OLD field study [51] and those of other country versions recently tested in Norway [52], China [21], Brazil [53,54], France [55] and Turkey [56].
Consistent with the findings of Power et al. [51] for the international WHOQOL-OLD data set and of Liu et al. [57] for the Chinese version, our study showed good construct validity of the German version of the WHOQOL-OLD for the six-facet structure but not for the second-order factor model. These results underline that the WHOQOL-OLD represents a multidimensional construct of quality of life in old age that cannot be reduced to a single latent dimension. Nevertheless, efforts have been made to develop a short version of the WHOQOL-OLD [57], and its authors recommend three versions with different selections of six items from the WHOQOL-OLD. However, the reliability of all three short versions is lower than that of the full WHOQOL-OLD.
As in the cross-cultural WHOQOL-OLD studies [51] and in several national studies [21,[58][59][60], depressive symptoms also explained a considerable amount of variance in all facets of the German version. Chachamovich et al. [60] compared the effects of a diagnosis of major depression with those of subclinical depressive symptoms and found that, even in the absence of a major depression diagnosis, subclinical symptoms have a strong negative effect on all facets of the WHOQOL-OLD.
The strong negative effect of depressive symptoms on QoL in the German population corresponds with results from cross-cultural studies on the importance of different domains of QoL, which showed that the presence of positive feelings and the absence of negative feelings were ranked as more important than average in the German sample [61]. The importance of positive feelings for QoL could be related to the high level of economic development in Germany. Economic development has been identified as a major cultural factor explaining the variance in cross-cultural importance rankings: while in developing countries the facets related to physical health were ranked higher than those related to psychological well-being, the opposite was the case in developed countries [61]. Nevertheless, an association between the importance ranking of psychological well-being and economic development does not necessarily imply different effects of depressive symptoms on QoL. In their cross-cultural comparison of QoL in the elderly populations of six European countries, Dragomirecka et al. [62] identified depressive symptoms as the main predictors of most WHOQOL-OLD domains in all countries, independent of the countries' economic wealth. These results support the hypothesis that depressive symptoms are intercultural predictors of quality of life in elderly people. However, since most studies on QoL in elderly people are cross-sectional, the exact relationships between objective living circumstances, cultural factors, depressive symptoms and QoL remain unclear. Longitudinal cross-cultural studies would allow an analysis of whether cultural factors or depressive symptoms act as mediator or moderator variables in this relationship.

Limitations
Due to the cross-sectional design of the study, the test-retest reliability and sensitivity to change of the WHOQOL-OLD could not be assessed. Depressive symptoms were assessed with the self-rated GDS, which does not allow a diagnosis of major depression. Therefore, it was not possible to examine differences between the impact of clinical and subclinical levels of depression.