Development and validation of the Brazilian version of the Attitudes to Aging Questionnaire (AAQ): An example of merging classical psychometric theory and the Rasch measurement model
Health and Quality of Life Outcomes volume 6, Article number: 5 (2008)
Aging has produced a demographic shift in the world, which is considered both a major societal achievement and a challenge. Aging is primarily a subjective experience, shaped by factors such as gender and culture. There is a lack of instruments that adequately assess attitudes to aging. In addition, no instrument has been developed or validated in developing regions, so the particularities of ageing in these areas are not captured by the available measures. This paper aims to develop and validate a reliable attitudes to aging instrument by combining the classical psychometric approach and Rasch analysis.
The pilot study and field trial are described in detail. Statistical analysis included classical psychometric theory (EFA and CFA) and the Rasch measurement model. The latter was applied to examine unidimensionality, the response scale and item fit.
The sample comprised 424 Brazilian older adults and was compared to an international sample (n = 5238). The final instrument shows excellent psychometric performance (discriminant validity, confirmatory factor analysis and Rasch fit statistics). Rasch analysis indicated that modifications in the response scale and item deletions improved the initial solution derived from the classical approach.
Combining classical and modern psychometric theories in a complementary way is fruitful for the development and validation of instruments. A reliable Brazilian Attitudes to Aging Questionnaire is important for assessing cultural specificities of aging from a transcultural perspective, and it can be applied in international cross-cultural investigations with less risk of cultural bias.
The world is experiencing a profound and irreversible demographic shift as older people are living longer and healthier than ever before [1, 2]. The world's older adult population is estimated to show a threefold increase over the next fifty years, from 606 million people today to 2 billion in 2050. In 2002, older people constituted 7 per cent of the world's population and this figure is expected to rise to 17 per cent globally by 2050. The most dramatic increases in proportions of older people are evident in the oldest old section of society (people aged 80 years plus), with a more than fivefold increase from 69 million in 2000 to 377 million in 2050.
The World Health Organisation has described this demographic shift as a major societal achievement, and a challenge. The increase in longevity is being experienced in the developed and the developing world alike, but where the developed world grew rich before it grew old, the developing world is growing old before it has grown rich. While older people are living longer, they are generally also remaining healthier, with an increase in the percentage of life lived in good health. Nonetheless, older people are still seen as net burdens on society rather than net contributors to it [5, 6].
Quantifying the rise in the proportion of older adults in the world population is relevant but insufficient. It is also important to study the quality of this increase. The experience of ageing is primarily subjective and depends on several factors, such as gender, physical condition, environment, behavioural and social determinants, psychological strategies and culture [5, 7–10]. Culture is considered particularly relevant since it shapes the way in which one ages, through its influence on how the elderly are seen in a given context. Moreover, cultural aspects could be understood as a pathway through which external factors impact on ageing experiences.
Some authors state that the vast majority of research and discussion is carried out by young adults, whereas older adults would be best placed to propose adequate ways of doing it [11, 12]. Bowling and Dieppe argue that lay views are important for testing the validity of existing models and measures, since most of the discussion tends to reflect only the academic point of view. Even though the ageing process has been a topic of increasing interest, there is a remarkable lack of well-designed and tested instruments to assess it. The few developed so far either are not specific to older adults' experiences or have been developed exclusively in developed countries. As far as we are aware, no instrument has been developed or validated in developing regions, so the particularities of ageing in these areas are not included in the measures available.
To address this issue, the WHOQOL Group has developed the AAQ instrument under a simultaneous methodology, which ensured the participation of different centres throughout the world (described in detail in Laidlaw et al, 2007). Briefly, the development process included centres from distinct cultural contexts in qualitative item generation, piloting and field testing. The applied methodology followed the one established by the World Health Organization Quality of Life Group [16, 17] for the development and adaptation of quality of life measures and was used for the development of the WHOQOL-OLD module [18, 19].
Regarding the development of new measures or the validation of existing ones, new approaches have been added to the traditional ones in order to examine scale properties beyond reliability and validity. The Rasch model has been adopted since it allows the collected data to be compared with an expected model and allows other important scale features to be tested, such as reversed response thresholds and differential item functioning.
The present paper aims to illustrate the potential combination of classical psychometric theory and Rasch Analysis in the validation of the AAQ instrument in a Brazilian sample of older adults.
The pilot study followed the methodology applied by the WHOQOL Group in developing quality of life measures [16, 17]. This includes translation and back-translation of the items and instructions by distinct professionals, as well as semantic and formal examination by the coordinator centre. Convenience sampling was used. The main purpose of this stage was to collect data about the item performance in order to produce a reduced version after refinement. The combination of classical and modern (item response theory) statistical analyses was used at this point. A set of 44 items were tested in an opportunistic sample of 143 subjects (age range 60–99, 59% female, 55% living alone, and 59% considered themselves subjectively healthy). Patients with dementia, other significant cognitive impairments and/or terminal illness were excluded. Data collected at this stage were sent to the coordinator centre to be merged with other centres' information.
Statistical analyses were carried out to check the items regarding missing values, item response frequency distributions, item and subscale correlations and internal reliability. No missing values were found in any of the 44 items in the Brazilian sample. The analysis of the pooled international data indicated the need for item refinement, which resulted in a 38-item version to be tested in the field trial (see Laidlaw et al (2007) for more details on this refinement stage).
The Brazilian Field Trial was carried out with a non-probabilistic opportunistic sample of 424 older adults recruited from a university hospital, community houses and nursing homes, elderly community groups, and their own homes. Subjects were invited to take part in the study and were asked to indicate other potential participants (snowball strategy). Sampling followed a predefined stratification by subjective perception of health status (50% healthy and 50% unhealthy), gender (50% female) and age (60–69, 70–79 and over 79 years of age). Subjective perception of health status was assessed by the question "In general, you consider yourself healthy or unhealthy?", regardless of the objective health condition. Exclusion criteria followed the ones used in the pilot study. The purpose of stratification was to ensure a minimal representation in each subgroup to make further analyses possible.
This version comprised the 33 items from the Pilot Study plus 5 items added by the Coordinator Centre (Edinburgh) in order to cover areas not sufficiently investigated by the original format. These 5 items were translated and back-translated and re-examined by the coordinator centre. In addition, subjects completed a socio-demographic form and the Geriatric Depression Scale 15-item version.
A combination of classical and modern psychometric approaches was applied. Descriptive data analysis was used to determine item response frequency distributions, missing values, item and subscale correlations and internal reliability. Exploratory and Confirmatory Factor Analyses were performed to assess whether the Brazilian data fit the international pooled model. Finally, an IRT approach, in particular the Rasch model as implemented in the RUMM 2020 program, was used to examine the performance of items in the Brazilian dataset.
Table 1 describes the socio-demographic characteristics of both the Brazilian and the international samples. Note that the international sample is composed of the data collected in all centres apart from Brazil. Chi-Square tests and Independent T-tests were carried out to check for statistical differences between the two samples. Following the detection of differences in gender and educational level distributions, as well as in the mean depression level, an Independent T-test was then run to compare the means of the three original AAQ factor scores (as described in Laidlaw et al, 2007) between the two samples. Briefly, the factor scores were calculated by summing the items included in each factor. Results indicate statistical differences in all three factor scores, as well as in the overall score.
An ANCOVA was then carried out to assess the extent to which the interaction among depression, gender and educational level contributed to the differences in the scores (overall and each factor). Comparisons between both samples were run to rule out the possibility that differences in subsequent factor analyses are due to distinct sample characteristics. Table 2 presents the ANCOVA findings, indicating that the statistical difference in the distribution of these variables between the two samples does not interfere significantly with the score variations.
Summary descriptive statistics for item analyses are shown in Table 3. There is a low frequency of missing values across the items. Comparison of the missing frequencies with the international dataset showed a lower frequency in the Brazilian sample.
Exploratory Factor Analysis
Data were initially examined through Exploratory Factor Analysis (Principal Component Analysis with Varimax Rotation). The extraction strategy included selecting factors with eigenvalues higher than 1 (checked against Monte Carlo Parallel Analysis to control for spurious findings) and scree plot observation [24–26]. The three-factor solution (indicated by the Kaiser Rule plus Parallel Analysis and by the Scree Plot) accounted for 34.45% of the total variance, whereas in the international sample the same structure accounted for 32.74%.
EFA findings were compared to the international ones. The item loadings are very similar to those from the EFA run on the international dataset. Out of 38 items, only five (items 4, 5, 9, 15 and 31) loaded onto different factors across the two datasets. It is important to note that items 4 and 31 were not retained in the final AAQ version since they lowered CFA results in further international analyses.
The item reliability was analyzed through Cronbach's alpha coefficients for the three subscales suggested by the EFA. The Brazilian dataset showed coefficients of .863 for the Subscale I (and .845 for the International dataset), .804 for the Subscale II (.822 for the International sample) and .671 for the Subscale III (.701 for the International subscale).
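For readers less familiar with the coefficient, Cronbach's alpha for a subscale of k items is obtained from the item variances and the variance of the summed subscale score. The following is a minimal Python sketch of that computation, not the code used in the study; the function name `cronbach_alpha` and the toy responses are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of the summed subscale score
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical 5-point Likert responses (4 respondents x 3 items), not study data
toy = [[1, 2, 1], [2, 3, 2], [4, 4, 5], [5, 5, 4]]
alpha = cronbach_alpha(toy)
```

In practice the function would be applied separately to the items of each subscale, which is how the three coefficients above are organised.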
Item-total correlation analysis was then carried out in distinct steps. Firstly, the Brazilian dataset was analyzed to identify correlations below a critical cut-point (r = 0.40). Secondly, the International dataset underwent the same analysis. Thirdly, both sets of findings were compared to identify potential discrepancies. Six items in the Brazilian dataset showed insufficient correlations (items 1, 5, 6, 11, 18 and 19). All six items also showed low coefficients in the International dataset. Of these, only item 18 remained in the final international AAQ version.
The Multi-trait Analysis Program (MAP) was also used to assess scale fit and internal reliability of the three-factor model. Although six items loaded highly on other factors besides the predicted one (items 9, 13, 21, 24, 33 and 34; .40 ≤ r < .52), no items presented higher correlations with an unpredicted factor than with the predicted one. Furthermore, the directions presented by the MAP analysis (correlation coefficients) were in accordance with the EFA loadings.
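The screening step against the 0.40 cut-point can be sketched as follows. This is a hedged illustration only: it assumes the corrected form of the item-total correlation (each item correlated with the sum of the remaining items), which the text does not specify, and the helper names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def low_item_total(responses, cutoff=0.40):
    """Map item index -> corrected item-total correlation for items below the cutoff."""
    k = len(responses[0])
    flagged = {}
    for j in range(k):
        item = [row[j] for row in responses]
        # Sum of the remaining items, so the item is not correlated with itself
        rest = [sum(row) - row[j] for row in responses]
        r = pearson(item, rest)
        if r < cutoff:
            flagged[j] = r
    return flagged


# Hypothetical responses (4 respondents x 3 items); the third item runs against the others
toy = [[1, 1, 2], [2, 2, 1], [3, 3, 2], [4, 4, 1]]
flagged = low_item_total(toy)
```

Running the same function on two datasets and comparing the flagged items mirrors the three-step comparison described above.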
Confirmatory Factor Analysis
CFA was carried out using AMOS 6.0 software. First, the 38-item solution with three correlated factors was tested, showing insufficient fit (χ2 = 1516.60, p < .001, df = 662, CFI = 0.79, RMSEA = 0.05). In order to verify the impact of the correlation among factors, the uncorrelated solution was then tested, showing a further decrease in model fit (χ2 = 1943.63, p < .001, df = 665, CFI = 0.68, RMSEA = 0.06).
Following the steps adopted in the international development of the AAQ, the 31-item three-factor solution was then assessed in order to verify potential improvement in model fit. Similarly to the international findings, this structure showed insufficient improvement (χ2 = 1005.62, p < .001, df = 431, CFI = 0.82, RMSEA = 0.05). Again, allowing the factors to correlate substantially improved model fit.
The final 24-item version was also tested in the Brazilian dataset, according to the structure illustrated in Figure 3.
A slight improvement in model fit was shown (χ2 = 645.19, p < .001, df = 249, CFI = .83, RMSEA = .06). Comparison of these indexes with the international ones indicates that the performance of the Brazilian final version is similar (international findings present CFI = .84 and RMSEA = .05).
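As an aid to reading the fit indexes, the RMSEA point estimate can be recovered from a model's chi-square, its degrees of freedom and the sample size. The sketch below uses the common (N - 1) convention and assumes N = 424 for every model; both are assumptions, since the software may use N instead and the effective sample size is not reported. The results therefore need not match the reported values to the second decimal.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """RMSEA point estimate under the (N - 1) convention (an assumption here)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Chi-square values and degrees of freedom as reported in the text
models = {
    "38 items, correlated factors": (1516.60, 662),
    "38 items, uncorrelated factors": (1943.63, 665),
    "31 items": (1005.62, 431),
    "24 items": (645.19, 249),
}

# Assumed sample size of 424 for all models
estimates = {name: rmsea(chi2, df, 424) for name, (chi2, df) in models.items()}
```

Because the chi-square is divided by the degrees of freedom and the sample size, RMSEA penalises model complexity less severely than the raw chi-square test, which is one reason both are usually reported together.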
To assess the discriminant validity, a correlation between each domain score and the depression levels was performed. It was predicted that depression levels would be negatively correlated to the three factors, and that the physical factor should present a lower coefficient than the two psychological factors. In fact, the correlation results showed coefficients of r = -.59 with psychosocial loss, r = -.59 with psychological growth and r = -.35 with physical change.
Item Response Theory
Responses were tested according to the Rasch model for polytomous scales. Basically, the response patterns observed in the collected data are tested against an expected probabilistic form of the Guttman Scale. Different fit statistics are applied to determine whether the observed data fit the expected model. According to Rasch measurement theory, a scale should perform in the same way independently of the sample being assessed (e.g., age or gender) [20, 21]. Reversed thresholds, an overall Chi-Square test (indicating whether the observed data differ from the expected model), item Chi-Square fit and item fit residuals were tested. In addition to these fit indexes, differential item functioning (DIF, a form of item bias) was verified, since it can decrease model fit and make measurement inappropriate. The Person Separation Index (PSI) was calculated for each factor as an indicator of internal consistency reliability. The PSI gives information comparable to Cronbach's Alpha from classical psychometric theory.
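To make the notion of response thresholds concrete, the sketch below computes category probabilities for a single polytomous item under a Rasch model written in threshold form, and checks whether a set of thresholds is ordered. The parameterisation, the function names and the threshold values are illustrative assumptions; this is not the estimation routine of RUMM 2020.

```python
from math import exp


def category_probabilities(theta, thresholds):
    """Probabilities of scoring 0..m for a person at theta, given m thresholds (logits)."""
    # Cumulative sums of (theta - threshold); the empty sum for category 0 is 0
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    weights = [exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True when each threshold is higher than the previous one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical threshold estimates for two 5-category items
ordered_item = [-2.0, -0.5, 0.5, 2.0]
disordered_item = [-1.0, 0.8, -0.2, 1.5]

probs = category_probabilities(0.0, ordered_item)
```

When thresholds are reversed, as in the second hypothetical item, at least one response category is never the most probable one at any point of the latent continuum, which is the pattern that motivates collapsing categories.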
Table 4 presents the Rasch findings for the 24-item version with the original 5-point Likert response scale. As mentioned above, the Chi-Square (both for the model and for items separately) has the purpose of assessing whether the data collected fit the expected theoretical model. Thus, p values lower than 0.05 (corrected for Bonferroni Multiple Comparisons) indicate that the observed data differ significantly from the model expectations, rejecting the desired similarity. Item residuals (a sum of item and individual person deviations) also permit the assessment of item fit, and values from -2.5 to +2.5 show adequate fit.
Results described in Table 4 show that 6 items (9, 14, 15, 19, 21 and 22) presented high residuals and/or item χ2 scores significantly different from those expected. The model fit for the three subscales also indicated misfit. Furthermore, 15 out of 24 items presented disordered thresholds, which suggests that the response scale is not adequate and therefore contributes to the misfit found at both the model and item levels.
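The two item-level criteria just described can be combined in a simple screening rule. The sketch below assumes that the Bonferroni correction divides 0.05 by the number of items in the subscale, which is a common choice but is not stated explicitly in the text; the item names, residuals and p values are hypothetical.

```python
def flag_misfit(items, alpha=0.05, residual_limit=2.5):
    """Return names of items with a fit residual outside the limit or a significant chi-square."""
    # Bonferroni adjustment: one test per item in the subscale (assumed)
    adjusted_alpha = alpha / len(items)
    return [
        name
        for name, (residual, p_value) in items.items()
        if abs(residual) > residual_limit or p_value < adjusted_alpha
    ]


# Hypothetical (fit residual, chi-square p value) pairs for a 4-item subscale
toy_items = {
    "item_a": (0.4, 0.52),
    "item_b": (-2.9, 0.20),   # residual outside -2.5 to +2.5
    "item_c": (1.1, 0.004),   # p below 0.05 / 4 = 0.0125
    "item_d": (2.1, 0.03),    # significant at 0.05 but not after correction
}
misfitting = flag_misfit(toy_items)
```

The fourth hypothetical item shows why the correction matters: it would be flagged at the unadjusted 0.05 level but is retained once multiple testing is taken into account.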
Thus, items were rescored in order to improve the model. Firstly, the category probability curves were checked for each item. This approach allows the investigator to verify which response categories are disordered and, thus, which specific categories should be collapsed to improve the scale. Factors I and II required categories 2 and 3 to be merged, whereas Factor III required categories 3 and 4 to be collapsed.
Analysis using the new 4-point scale showed that Factors I and III improved remarkably, with no model or item misfit. On the other hand, Factor II presented a slightly improved fit, which was still insufficient (Model χ2 = 87.12, df = 48, p = 0.0004, PSI = .752). The second step was then to delete the items responsible for the remaining misfit, namely items 19 and 22. The final model then showed adequate fit. No reversed thresholds or DIF remained after rescoring and item deletion (Factor II). Person Separation Indexes showed adequate scores for group comparisons (i.e., PSI > .70). Table 5 presents the indexes for the final model.
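Since the collapsing is applied to already collected responses, it amounts to a recoding step at the analysis stage. A minimal sketch is given below, assuming the original categories are coded 1 to 5 and the collapsed scale is renumbered 1 to 4; the constant and function names are illustrative.

```python
# Factors I and II: original categories 2 and 3 are merged
RESCORE_FACTORS_I_II = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4}

# Factor III: original categories 3 and 4 are merged
RESCORE_FACTOR_III = {1: 1, 2: 2, 3: 3, 4: 3, 5: 4}


def rescore(responses, mapping):
    """Recode a list of 5-point responses onto the collapsed 4-point scale."""
    return [mapping[r] for r in responses]


# One hypothetical respondent's answers to five items of the same factor
original = [1, 2, 3, 4, 5]
collapsed_i_ii = rescore(original, RESCORE_FACTORS_I_II)
collapsed_iii = rescore(original, RESCORE_FACTOR_III)
```

Keeping the mappings as explicit tables makes it straightforward to apply the original 5-point questionnaire in the field and rescore only during analysis.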
Local independence of items and unidimensionality (two Rasch assumptions) were assessed for the three final factors through two statistical tests. Item residual correlations were first analysed to check for local dependence (i.e., two items highly correlated in the final model, so that the response to one would be determined by the other). No correlations above 0.300 were found, which indicates local independence. Secondly, the pattern of residuals was analysed through PCA of the residuals. The items were divided into two subsets according to their loadings on the first residual component (the most positive and the most negative loadings). These two subsets were then separately fitted to the Rasch Model and the person estimates were obtained. An Independent T-test was then carried out to detect potential differences between the person estimates from the two subsets, which would indicate the presence of multidimensionality in the model. No significant differences were found for the three factors of the scale (Factor 1, p = 0.051; Factor 2, p = 0.654; Factor 3, p = 0.090).
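The first of these checks, the screening of residual correlations against the 0.30 criterion, can be sketched as follows. The sketch assumes that standardised person-by-item residuals have already been exported from the Rasch software; the data layout, the function names and the toy residuals are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def dependent_pairs(residuals, cutoff=0.30):
    """Return (item, item, r) for every pair whose residual correlation exceeds the cutoff."""
    flagged = []
    for a, b in combinations(residuals, 2):
        r = pearson(residuals[a], residuals[b])
        if r > cutoff:
            flagged.append((a, b, r))
    return flagged


# Hypothetical standardised residuals for five persons on three items
toy_residuals = {
    "item_1": [0.5, -1.0, 0.3, -0.2, 0.4],
    "item_2": [0.4, -0.9, 0.2, -0.1, 0.5],
    "item_3": [-0.3, 0.2, 0.6, -0.5, 0.1],
}
pairs = dependent_pairs(toy_residuals)
```

An empty result for a factor corresponds to the finding reported above, namely that no pair of items showed a residual correlation above the criterion.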
The present paper had two complementary aims. First, it had the goal of presenting a validated Brazilian version of the Attitudes to Aging Questionnaire. This version will allow aging experiences to be assessed in a distinct and poorly investigated population. Furthermore, since aging is a widespread phenomenon and is highly dependent on socio-cultural aspects, it is extremely important that new measures of this construct can be successfully applied in different contexts. This would allow adequate cross-cultural investigations on attitudes to aging to be carried out with a valid and reliable instrument.
Secondly, this article aimed to present a comprehensive approach to validating new measures, which includes both classical psychometric theory and modern methodologies in a complementary way. While the traditional approach provides relevant information regarding discriminant validity, missing values distributions and factor analysis loadings, Rasch analysis represents a powerful tool for assessing item bias, threshold disorders and model fit.
The Attitudes to Aging Questionnaire is a unique measure of perception regarding aging, since it was developed through a well-established international methodology and was based from the outset on focus groups run with older adults [15–17, 33]. Furthermore, it relies on the assumption that the subjective perception of the aging process is the ultimate construct to be measured, rather than objective indicators of physical activity or psychological distress.
Regarding the psychometric performance, the Brazilian version demonstrates good performance on both classical and Rasch approaches. Despite the insufficient goodness-of-fit indexes in CFA (CFI < .90), suitable discriminant validity, and excellent fit indicators from Rasch analysis suggested that the Brazilian version has satisfactory performance and, thus, can be applied in different studies reliably.
Another relevant issue regarding the findings of the AAQ validation is the construct similarity between the international sample and the Brazilian one. The three factors proposed by the international analysis seem to be replicated in the Brazilian dataset. Indeed, Psychosocial Loss, Physical Change and Psychological Growth represented the theoretical ground upon which items were grouped during the factor analysis phase. It could indicate that the perception of aging did not differ significantly between the two samples and raises the question of whether these similarities remain or not in other different cultures. The demonstration of cultural invariance of the core attitudes to aging could lead to the possibility of reliable comparisons, which is needed by both researchers and policy makers.
It is suggested, however, that rescoring and two item deletions could increase the fit and performance of the Brazilian scale. These potential alterations should not require crucial modifications in the scale format, since they can be made during the statistical analysis phase and not necessarily at the data collection stage. Since this is the first psychometric analysis of the Brazilian AAQ version, the authors encourage scale users to verify whether the 22-item version maintains its superiority over the original 24-item format in distinct samples, and then to explicitly choose one format.
The described findings support the hypothesis that the development of a new international instrument according to a simultaneous methodology, which includes an intense qualitative initial phase, is adequate to generate reliable cross-cultural measures. In conclusion, the Brazilian version of the AAQ instrument is a reliable, valid and consistent tool to assess attitudes to aging and can be applied in international cross-cultural investigations running less risk of cultural bias.
Kinsella K, Velkoff VA: US Census Bureau, Series P95/01–1, An Aging World: 2001. Washington D.C.: US Government Printing Office; 2001.
United Nations: World Population Ageing: 1950–2050. Department of Economic and Social Affairs, Population Division. New York: United Nations Publications; 2001.
US Census Bureau: International Population Reports WP/02, Global Population Profile: 2002. U.S. Government Printing Office, Washington, DC; 2004.
United Nations: World Population Prospects: The 2002 revision. United Nations Population Division; New York 2003.
WHO: Active Ageing: A Policy Framework. World Health Organisation Geneva 2002.
WHO: Ageing: Exploding the Myths. World Health Organisation Geneva 1999.
Baltes, Smith: New frontiers in the future of aging: from successful aging to the young old to the dilemmas of the fourth age. Gerontology 2003, 49(2):123–35. 10.1159/000067946
Levy BR, Slade MD, Kunkel SR, Kasl SV: Longevity increased by positive self-perceptions of aging. J Pers Soc Psychol 2002, 83(2):261–270. 10.1037/0022-3514.83.2.261
Knight BG: Psychotherapy with Older adults. 3rd edition. Thousand Oaks: Sage Publications; 2004.
Ebner NC, Freund AM, Baltes PB: Developmental changes in personal goal orientation from young to late adulthood: From striving for gains to maintenance and prevention of losses. Psychology and Aging 2006, 21: 664–678. 10.1037/0882-7974.21.4.664
Duhl LJ: Aging by one who is aging. J Epidemiol Community Health 2005, 59(10):816–7. 10.1136/jech.2005.035675
Boduroglu A, Yoon C, Luo T, Park DC: Age-related stereotypes: A comparison of American and Chinese cultures. Gerontology 2006, 52: 324–333. 10.1159/000094614
Bowling A, Dieppe P: What is successful ageing and who should define it? BMJ 2005, 331: 1548–1551. 10.1136/bmj.331.7531.1548
Laidlaw K, Power MJ, Schmidt S, the WHOQOL Group: The attitudes to ageing questionnaire (AAQ): Development and psychometric properties. Int J Geriatr Psychiatry 2007, 22: 367–379. 10.1002/gps.1683
Bullinger M, Power M, Aaronson NK, Cella DF, Anderson RT: Creating and evaluating cross-cultural instruments. In Quality of life and pharmacoeconomics in clinical trials. Edited by: Spilker B. Hagerstown, MD. Lippincott-Raven; 1996:659–668.
The WHOQOL Group: The World Health Organization quality of life assessment (WHOQOL): development and general psychometric properties. Soc Sci Med 1998, 46: 1569–85. 10.1016/S0277-9536(98)00009-4
The WHOQOL Group: Development of The World Health Organization WHOQOL-BREF Quality of Life Assessment. Psychol Med 1998, 28: 551–558. 10.1017/S0033291798006667
Power MJ, Quinn K, Schmidt S, WHOQOL-OLD Group: Development of the WHOQOL-old module. Qual Life Res 2005, 14(10):2197–214. 10.1007/s11136-005-7380-9
Fleck MP, Chachamovich E, Trentini C: Development and validation of the Portuguese version of the WHOQOL-OLD module. Rev Saude Publica 2006, 40(5):785–91.
Pallant J, Tennant A: An introduction to the Rasch measurement model: An example using the Hospital Anxiety and Depression Scale (HADS). Br J Clin Psychol 2007, 46: 1–18. 10.1348/014466506X96931
Sheikh JI, Yesavage JA: Geriatric Depression Scale (GDS): recent evidence and development of a shorter version. Clin Gerontol 1986, 37: 819–820.
Andrich D, Lyne A, Sheridan B, Luo G: RUMM 2020. Perth: RUMM Laboratory; 2003.
Field A: Discovering Statistics using SPSS. 2nd edition. SAGE, London; 2005.
Kauffman JD, Dunlap WP: Determining the number of factors to retain: a Windows-based FORTRAN-ISL program for parallel analysis. Behav Res Methods Instrum Comput 2000, 32(3):389–95.
Zwick WR, Velicer WF: Comparison of five rules for determining the number of components to retain. Psychol Bulletin 1986, 99: 432–442. 10.1037/0033-2909.99.3.432
O'Connor BP: SPSS and SAS programs for determining the number of components using parallel analysis and Velicer's MAP tests. Behav Res Methods Instrum Comput 2000, 32(3):396–402.
Hays RD, Hayashi T, Carson S, Ware JE: User's guide for the Multitrait Analysis Program (MAP). Santa Monica, CA. The Rand Corporation, N-2786-RC 1988.
Arbuckle JA: Amos 6.0 User's Guide. In Amos Development Corporation. Spring House, PA, USA; 2005.
Andrich D: Rating formulation for ordered response categories. Psychometrika 1978, 43: 561–573. 10.1007/BF02293814
Guttman L: The basis of scalogram analysis. In Measurement and prediction. Edited by: Stouffer SA. Princeton, NJ: Princeton University Press; 1950.
Smith EV Jr: Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals. Journal of Appl Meas 2002, 3(2):205–231.
Bland JM, Altman DG: Multiple significant tests: The Bonferroni Method. BMJ 1995, 310: 170.
Guillemin F: Cross-cultural adaptation and validation of health status measures. Scand J Rheumatol 1995, 24(2):61–3. 10.3109/03009749509099285
This paper was partially supported by CAPES, scholarship number PDEE 3604-06/3
The author(s) declare that they have no competing interests.
EC participated in the study design, data collection, statistical analysis and drafted the manuscript; MPF participated in the study design, statistical analysis and helped to draft the manuscript; CMT participated in the study design and data collection; KL helped to draft the manuscript and took part in the theoretical discussion; MJP participated in the study design, statistical analysis and helped to draft the manuscript. All authors read and approved the final manuscript.
Marcelo P Fleck, Clarissa M Trentini, Ken Laidlaw and Mick J Power contributed equally to this work.
Chachamovich, E., Fleck, M.P., Trentini, C.M. et al. Development and validation of the Brazilian version of the Attitudes to Aging Questionnaire (AAQ): An example of merging classical psychometric theory and the Rasch measurement model. Health Qual Life Outcomes 6, 5 (2008). https://doi.org/10.1186/1477-7525-6-5