
Utilization of the propensity score method: an exploratory comparison of proxy-completed to self-completed responses in the Medicare Health Outcomes Survey



This research examined the use of the propensity score method to compare proxy-completed responses to self-completed responses in the first three baseline cohorts of the Medicare Health Outcomes Survey, administered in 1998, 1999, and 2000, respectively. A proxy is someone other than the respondent who completes the survey for the respondent.


The propensity score method of matched sampling was used to compare proxy and self-completed responses. A propensity score is the estimated probability that a given individual belongs to a treatment group, given the observed background characteristics of that individual. Proxy and self-completed responses were compared on demographics, the SF-36, chronic conditions, activities of daily living, and depression-screening questions. For each individual survey respondent, logistic regression was used to calculate the probability that this individual belonged to the proxy respondent group (the propensity score). Pre- and post-adjustment comparisons were tested by calculating effect sizes.


Differences between self and proxy-completed responses were substantially reduced with the use of the propensity score method. However, differences were still found in the SF-36, several demographics, several impaired activities of daily living, several chronic conditions, and one depression-screening question.


The propensity score method helped to reduce differences between proxy-completed and self-completed survey responses, thereby providing an approximation to a randomized controlled experiment of proxy-completed versus self-completed survey responses.


Surveys such as the Medicare Health Outcomes Survey (HOS) [1] are widely used to assess respondents' physical and mental health status. While survey methods are crucial to the assessment of self-reported health care conditions and outcomes, the use of proxy-completed responses in interviews and surveys may systematically affect responses (a proxy is someone other than the respondent, such as a professional caregiver, friend, family member, or other relative, who completes the survey for the beneficiary). This is a particularly troublesome problem for data collected on an elderly population, since the elderly frequently must rely on a proxy. The propensity score method provides an approach for assessing bias in self-report surveys such as the Medicare HOS. The goal of using the propensity score methodology is to create balance between different groups of subjects [2]. In this research, we apply the propensity score method [3] to three cohorts of self and proxy-completed responses in the Medicare HOS to compare results for physical and mental health status.

Self-Completed and Proxy-Completed Response Differences

Literature exists documenting the differences between self and proxy-completed responses on health status surveys. For example, some research demonstrates that proxy-completed responses tend to more accurately report conditions that are less private and more observable, but tend to underestimate less observable conditions such as emotional and affective states [4, 5]. Additionally, Yip, Wilber, Myrtle, and Grazman found that mean scores were significantly lower for proxy-completed responses compared to self-completed responses on the Physical Functioning, Vitality, and Mental Health scales of the 36-Item Short-Form Health Survey (SF-36)[6].

Other research also has indicated significant disagreement between proxy-completed responses and observers for instrumental activities of daily living (IADLs). In this research, proxy-completed responses underreported IADLs compared to observers who watched subjects engaged in IADLs [7]. Systematic biases were found in the National Health Interview Survey; results indicated that proxy-completed responses underreported disabilities for those aged 18 to 64 years, but overreported disabilities for those 65 and older [8]. Data from the Canadian SF-36 indicated that proxy-completed responses tended to underestimate health status, with poor to moderate agreement between proxy-completed responses and the disabled elderly [9]. In examining data from Canada's National Population Health Survey, Shields [10] found significant differences between self and proxy-completed responses. In general, proxy-completed respondents underestimated the prevalence of certain health conditions. However, disagreement between self-completed and proxy-completed respondents was less likely for conditions that the proxy respondents were more likely not to mislabel, such as diabetes, heart disease, and cancer. It is evident that there are inconsistencies in the literature regarding proxy-completed and self-completed respondents and continued research is necessary to understand these inconsistencies.

Because proxy-completed responses are often necessary in assessing health outcomes for the elderly, it is important that methods be found for examining selection bias. The propensity score is one such method used to reduce selection bias in observational studies. This paper explores the use of the propensity score method in understanding the differences between proxy and self-completed responses, by applying this method to the Medicare HOS data.

Propensity Score Methodology

Donald Rubin [3, 11–13] pioneered the propensity score methodology, which has been used extensively in medical research [e.g. 14–17]. Theoretically, this method is similar to an experimental design, but it is applied to survey or observational data and has the potential to reduce selection bias. Simply put, the propensity score is the probability that an individual belongs to a naturally occurring treatment group, based on the individual's background characteristics (covariates). Since the propensity score summarizes the information on the background characteristics in a single summary score, it has a distinct advantage over standard matching techniques [18–20]. These latter techniques require the investigator to find subjects who are closely matched on each of many individual covariates, an often difficult task. Once the propensity scores have been calculated, the "treatment" and "control" groups (in this case, proxy-completed and self-completed respondents) can each be stratified into similarly matched comparison groups based upon their propensity scores. For each stratum we can then examine two groups of survey respondents, a group of proxy-respondents and a group of self-respondents that have similar propensity scores and who were "randomly" assigned to the groups in the sense of being equally likely to be a proxy or self-respondent [2].

For example, in a 2002 study on nephrology consultation in acute renal failure, Mehta et al. [17] used the propensity score methodology to assess the timing of nephrology consultation and in-hospital mortality. In this example, the authors created the propensity score using the characteristics that differentiated the delayed and early consultation groups. These authors state, "Inclusion of the propensity score as a covariate in a multivariable regression accounts for likelihood of 'treatment' (in this case, timing of consultation), and may adjust for unobserved, confounding, and selection bias, thereby refining regression estimates".

Similarly, Kilborn et al. [14] utilized the propensity score methodology in a nonrandomized study of amiodarone and mortality among acute myocardial infarction patients with atrial fibrillation. Patient characteristics that were associated with prescriptive use of amiodarone were incorporated into a regression analysis through the use of the propensity score methodology.

Assessing the difference between proxy-completed and self-completed responses in survey research is analogous to nonrandomized treatment studies such as the two discussed above. Proxy-completed responses can be conceptualized partly as the results of selection bias, and partly the result of true differences between proxy and self-respondents. Generally, proxy respondents who answer survey items for the respondent occupy a specific role in the respondent's life such as a family member (spouse, child), friend, or professional caregiver. Essentially, these proxy respondents bring a cognitive role set with them when they complete survey items on behalf of the respondent. This role set can bias responses to survey items. For example, if an adult child completes a survey for an elderly parent, the adult child may understate the physical and mental health status of the elderly parent. Indeed, these results were found in analyses conducted on the Canadian SF-36 [9]. Hence, it becomes very important to test for differences between proxy and self-completed respondents. If differences are found, adjustment procedures such as the propensity score should be used to reduce those differences in further analyses on outcomes such as physical and mental health status.

The propensity score is a methodology that has heuristic potential for quality of life research, which generally involves self and proxy-completed responses. We apply this methodology to three cohorts of data that include self-completed and proxy-completed responses from the Medicare HOS in order to determine if differences in physical and mental health status remain after adjustment using the propensity score.


Data collection

The Medicare HOS assesses the physical and mental health status of the Medicare elderly enrolled in managed care in the United States. Beginning in 1998 and continuing annually, a baseline cohort is created from a randomly selected sample of 1,000 Medicare members from each applicable Medicare contract market area. In plans with fewer than 1,000 Medicare members, the sample includes the entire enrolled Medicare population that meets the inclusion criteria. Medicare beneficiaries who are continuously enrolled in the health plans for at least six months are eligible for sampling [21].

The data collection protocol includes a combination of mail and telephone surveys. Multiple mailings, standardized telephone interviews, interviewer training, and methods for maximizing response rates are well established in the Health Plan Employer Data and Information Set (HEDIS) specifications [22]. The Medicare HOS instrument includes the SF-36 health survey, which is a widely used multi-purpose, short-form health survey. Psychometric properties, reliability and validity studies of the SF-36 as well as normative data are available in user manuals [23, 24]. The SF-36 yields an eight-scale profile of scores and is a generic measure as opposed to one that targets a specific age, disease, or treatment group. The eight scales form two distinct higher ordered clusters that are the basis for scoring the physical component summary (PCS) measure and mental component summary (MCS) measure. For this analysis, the SF-36 individual scale scores, as well as the PCS and MCS scores, have been normed to the values for the 1990 general U.S. population, so that a score of fifty represents the national average for a given scale or summary score. Higher scores on the SF-36 represent better physical and/or mental health status.
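As a rough illustration of what norming to a reference population means, the sketch below rescales a raw scale score so that the reference mean maps to fifty. The standard deviation of ten per reference standard deviation is the usual convention for norm-based SF-36 scores but is an assumption here, and the reference mean and standard deviation are hypothetical placeholders, not the 1990 U.S. values.

```python
def norm_based_score(raw, ref_mean, ref_sd):
    """Rescale a raw score so the reference population has mean 50.

    Assumes the common convention of 10 points per reference SD.
    """
    if ref_sd <= 0:
        raise ValueError("ref_sd must be positive")
    z = (raw - ref_mean) / ref_sd
    return 50.0 + 10.0 * z


# Hypothetical reference values, for illustration only.
REF_MEAN, REF_SD = 84.0, 23.0

print(norm_based_score(84.0, REF_MEAN, REF_SD))  # raw score at the reference mean -> 50.0
print(norm_based_score(61.0, REF_MEAN, REF_SD))  # one reference SD below the mean -> 40.0
```

Under this convention, a normed score of 40 reads as one reference standard deviation below the national average, whatever the raw range of the scale.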

The respondents included in this study were beneficiaries in baseline cohorts I, II, and III; the data sets represented survey results for 1998, 1999, and 2000, respectively. The Medicare HOS cohort I baseline consisted of 279,135 Medicare members who were sampled from 269 Medicare+Choice organizations (M+COs) representing 287 contract market areas. Cohort II baseline consisted of 301,184 Medicare members from 283 M+COs in 312 contract market areas, and cohort III baseline consisted of 298,883 Medicare members from 275 M+COs in 306 market areas. Several criteria were met in selecting the final analytic sample. First, all duplicates were removed; i.e., only the first survey was used for any beneficiary. Second, beneficiaries in plans for Program of All-Inclusive Care for the Elderly (PACE), as well as EVERCARE (a program that provides care and care coordination to vulnerable, chronically ill beneficiaries) in cohorts II and III were removed (0 in cohort I; 4,225 PACE and 5,015 EVERCARE beneficiaries in cohort II; 3,267 beneficiaries in PACE programs in cohort III; PACE and EVERCARE beneficiaries have significantly lower PCS and MCS scores and are much more ill than non-PACE and non-EVERCARE beneficiaries in the Medicare HOS). Third, surveys must have had a response for the question, "Who completed this survey form?" and finally, responses must have had a survey for which the PCS and MCS scores were calculable. Based on these criteria, the total sample size of proxy-completed responses was 65,668 and for self-completed responses the total was 457,837.

In addition to the SF-36, demographic data, activities of daily living (ADLs), chronic conditions (angina pectoris, arthritis, cancer, congestive heart failure, Crohn's disease, diabetes, emphysema/asthma/chronic obstructive pulmonary disease [COPD], hypertension, myocardial infarction, other heart conditions, sciatica, and stroke), and three depression-screening questions were examined for differences between self-completed and proxy-completed responses.

Data Analyses

Three steps were necessary in applying the propensity score method to the Medicare HOS data. First, self-completed and proxy-completed responses were examined to establish differences between the groups of respondents (unadjusted comparisons). These two groups were compared on demographic variables, the SF-36 scores, type of chronic condition, type of impaired ADL, and three depression-screening questions.

Second, the propensity score was used to create comparison samples. The propensity score matching process involved developing a stepwise logistic model [12] to determine which demographic, disease, and disability variables affected the likelihood of a proxy response. Based on the values of these predictors, each beneficiary in the data set had an estimated probability of using a proxy, which is the propensity score.
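A minimal sketch of this scoring step, assuming a plain (not stepwise) logistic regression fitted by gradient descent on synthetic data. The covariates `age85` and `adl`, the coefficients used to simulate them, and all function names are hypothetical; the original analysis was written in SAS and is not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=1.0, epochs=500):
    """Fit intercept and weights by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def propensity(w, xi):
    """Estimated probability of a proxy-completed response for covariates xi."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))


# Synthetic respondents: two binary covariates (hypothetical).
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    age85 = 1 if rng.random() < 0.2 else 0  # aged 85 or over
    adl = 1 if rng.random() < 0.3 else 0    # at least one impaired ADL
    true_p = sigmoid(-2.0 + 1.4 * age85 + 1.0 * adl)
    X.append([age85, adl])
    y.append(1 if rng.random() < true_p else 0)  # 1 = proxy, 0 = self

weights = fit_logistic(X, y)
scores = [propensity(weights, xi) for xi in X]  # one propensity score per respondent
```

Each respondent ends up with a single number between 0 and 1, which is what allows later steps to stratify on one score rather than on every covariate at once.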

Third, due to the large sample sizes, a stratified random sample was drawn from each decile of the distribution of the propensity score from the proxy-completed and self-completed respondent groups. Once the sample was drawn, resulting in a risk adjusted sub-sample, comparisons were made to determine whether proxy-completed responses differed from self-completed responses. Given the large sample sizes, effect sizes were used to determine significance. Effect size is defined as "...the degree to which the phenomenon is present in the population...or the degree to which the null hypothesis is false." Cohen [25] operationally defines effect sizes as follows: a small effect size is one that accounts for 2% (0.02) of the variance, a medium effect size accounts for 13% (0.13), and a large effect size accounts for 26% (0.26). Cohen's effect size for proportions (p) was used to calculate the effect sizes for Tables 1 and 2 (h = φ₁ − φ₂, where φ = 2 arcsin √p). Cohen's effect size for means was calculated as:
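The effect-size measures named above can be sketched in a few lines. Cohen's h follows the arcsine formula given in the text. For means, the standard form of Cohen's d (difference in means divided by a pooled standard deviation) is assumed, since the exact variant used in the analysis is not shown here. The function names are illustrative.

```python
import math


def cohen_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    phi1 = 2.0 * math.asin(math.sqrt(p1))
    phi2 = 2.0 * math.asin(math.sqrt(p2))
    return phi1 - phi2


def cohen_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Rounded proportions reported for stroke in the unadjusted comparison:
# about 20% of proxy respondents versus about 7% of self respondents.
h_stroke = cohen_h(0.20, 0.07)
print(round(h_stroke, 2))  # 0.39, close to the reported 0.38 given the rounded inputs
```

Because h is computed on transformed proportions, the same absolute gap in percentages yields a larger effect near 0% or 100% than near 50%.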

Table 1 Demographics: Unadjusted Data from Cohorts I, II, and III
Table 2 SF-36, Proxy and Self-Completed Differences in Mean Scores: Unadjusted Data

Results and Discussion

Unadjusted Self-Completed and Proxy-Completed Response Comparisons


The proxy-completed responses differed from the self-completed responses on most demographic characteristics (table 1). Small effect sizes were found for all variables (with the exception of separated) and many large effects were found (white and Hispanic race; age 65–74 and 85 or over; 8th grade or less, some college, and more than a 4 year degree; homeowner status of owned and owned by someone in the family).

SF-36 Scores

Large effect sizes were found for the PCS, MCS, and all scales. Table 2 indicates that the mean PCS scores between proxy-completed and self-completed responses reflected a seven-point difference, with proxy-completed responses having lower scores than self-completed responses (33.99 and 40.97, respectively). The mean MCS score indicated a strikingly similar situation with a proxy-completed mean score of 46.69 and a self-completed mean score of 52.56.

Chronic Conditions

The proxy-completed responses differed from the self-completed responses on all chronic conditions. Additionally, proxy-completed respondents reported proportionally more of each condition; every condition showed at least a small effect, and some reached medium or large effects. The effect size for stroke was the largest (0.38): about 20% of the proxy respondents reported this condition, compared with approximately 7% of the self respondents (table 3).

Table 3 Chronic Conditions, Activities of Daily Living, and Depression: Unadjusted Data

Impaired ADLs

Table 3 also indicates that large effects were found for all impaired ADLs. Proxy-completed responses had proportionally more impaired ADLs than self-completed responses. For example, a large difference existed in difficulty dressing. Approximately 36% of the proxy respondents reported inability or difficulty dressing, whereas only 10% of the self respondents reported a problem. Bathing was another ADL reflecting extreme differences. Approximately 41% of the proxy respondents had inability or difficulty bathing compared to only 12% of the self respondents. About 59% of the proxy respondents reported inability or difficulty walking and approximately 33% of the self respondents reported a problem walking.


Large effect sizes were also found between proxy-completed and self-completed responses for the three depression-screening questions. Proxy-completed responses had proportionally more affirmative responses to the depression-screening questions compared to self-completed responses. Approximately 40% of the proxy-completed respondents indicated feeling sad/blue for two or more weeks in the past year compared to about 21% of self-completed respondents. About 13% of the self-completed respondents reported feeling depressed or sad much of the time in the past year compared to 31% of the proxy-completed respondents. Similarly, approximately 25% of the proxy-completed respondents reported feeling depressed/sad for two or more years in their life compared to about 14% of the self-completed respondents.

These comparisons established that the proxy-completed respondents were demographically different, varied in type of chronic condition, reported more depression, and reported decreased status in physical and mental health compared to self-completed respondents.

Stepwise Regression

Since significant differences were found between self and proxy-completed responses on the above stated characteristics, these variables were entered in a stepwise logistic regression as independent variables, with the dependent variable coded as 1 for proxy-completed responses and 0 for self-completed responses. The independent variables were: age under 45, age 45 – 54, age 55 – 64, age 75 – 84, age 85 and over (reference group was 65 – 74); Black/African American race, Asian race, other race (reference group was white); Hispanic ethnicity; widowed, never married, divorced or separated (reference group was married); educational attainment of eighth grade or less, educational attainment of some high school, education beyond high school (reference group was high school graduate or GED); homeownership; female; Medicaid enrolled; institutionalized; activities of daily living; chronic conditions; and three depression-screening questions (table 4).

Table 4 Significant Variables for Propensity Score Adjustment

Based on the results of the regression analyses, proxy-completed respondents were about twice as likely to be under 45 years old and approximately four times as likely to be over the age of 84. They were about five times as likely to have an 8th grade education or less and to be institutionalized. They were about one and a quarter times more likely to have congestive heart failure and to be depressed two or more weeks in the past year.
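Statements such as "about four times as likely" are commonly read off a logistic model as odds ratios, that is, the exponential of a fitted coefficient. The text does not state this explicitly, so the sketch below treats it as an assumption. It also shows that an odds ratio is not a ratio of probabilities. The baseline of 0.125 is roughly the overall share of proxy-completed responses in the analytic sample (65,668 of 523,505); the helper names are hypothetical.

```python
import math


def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def apply_odds_ratio(p0, ratio):
    """Probability after multiplying the odds of baseline probability p0 by ratio."""
    odds = p0 / (1.0 - p0) * ratio
    return odds / (1.0 + odds)


# A coefficient of ln(4) corresponds to an odds ratio of 4.
print(round(odds_ratio(math.log(4.0)), 6))        # 4.0

# Quadrupling the odds from a baseline probability of 0.125
# raises the probability to 4/11, not to 0.5.
print(round(apply_odds_ratio(0.125, 4.0), 4))     # 0.3636
```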

The next step in the propensity score process involved creating a distribution from which to randomly sample respondents.

Sampling. Based on the values of the variables listed above, each beneficiary in the three cohorts had an associated probability of having a proxy-completed survey (i.e. the propensity score). The distribution of the propensity score was divided into deciles, and stratified random samples of 400 from the first through the eighth deciles were drawn from both the proxy-completed responses and the self-completed responses (table 5). Due to the large size of decile one, the stratified random sample was selected from the midpoint of that decile. Due to the small size of deciles nine and ten, 200 were drawn from the ninth decile and 50 were drawn from the tenth decile for both groups (see table 5). This methodology had the effect of providing a sample that was reasonably well distributed between the groups with respect to the characteristics that helped to determine proxy-completed responses. Thus, respondents in the proxy-completed and self-completed groups with equal (or nearly equal) propensity scores should have the same (or nearly the same) distributions on the variables included in the logistic regression model [22].
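The sampling step can be sketched as follows, using the quotas stated above (400 for deciles one through eight, 200 for decile nine, 50 for decile ten). Because the deciles are described as differing in size, the sketch assumes that "decile" means ten equal-width intervals of the propensity score rather than equal-count quantiles. The midpoint rule for decile one is not modeled, and the synthetic score distributions and names are hypothetical.

```python
import random

# Quotas per score interval, as described in the text.
QUOTA = {d: 400 for d in range(1, 9)}
QUOTA.update({9: 200, 10: 50})


def score_bin(score):
    """Map a propensity score in [0, 1] to an interval index 1..10."""
    return min(int(score * 10), 9) + 1


def sample_by_bin(records, quota, rng):
    """Draw a stratified random sample of (id, score) records.

    Takes up to quota[d] records from each interval d, without replacement.
    """
    strata = {d: [] for d in quota}
    for rec in records:
        strata[score_bin(rec[1])].append(rec)
    sample = []
    for d in sorted(strata):
        members = strata[d]
        k = min(quota[d], len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Synthetic propensity scores: proxy respondents skew higher than self respondents.
rng = random.Random(42)
proxy = [("p%d" % i, rng.betavariate(2, 4)) for i in range(3000)]
self_resp = [("s%d" % i, rng.betavariate(1, 6)) for i in range(20000)]

proxy_sample = sample_by_bin(proxy, QUOTA, rng)
self_sample = sample_by_bin(self_resp, QUOTA, rng)
```

Applying the same quotas to both groups is what makes the two samples comparable interval by interval, even though the full groups have very different score distributions.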

Table 5 Distribution of Propensity Scores Prior to Matched Sampling and Random Samples from Each Decile

Adjusted Self-Completed and Proxy-Completed Response Comparisons


Despite the propensity score adjustment, small effects in demographics existed within the adjusted self-completed and proxy-completed comparisons, as shown in table 6. Small effect sizes were found for both male and female gender; greater proportions of self-completed responses were male and higher proportions of proxy-completed responses were female. Small effects were found for American Indian/Alaskan Native, with proportionally fewer proxy-completed responses in this category. A small effect was also found for white race (more white proxy-completed responses). Small effects were found for ages 55–64 (more self-completed responses), 65–74 (more self-completed responses), and 85 or over (more proxy-completed responses). Small effects were found for all categories of marital status and a medium effect was found for divorced (more self-completed responses). Small effects were found for all educational levels. More self-completed respondents reported an 8th grade or less education and some high school; more proxy respondents reported an educational level of high school/GED, some college and college graduate; however, more self-completed respondents had more than a four-year degree. Small effects were also found for all income levels, with the exception of $5,000 – $9,999 and $80,000 – $99,999. Small effects were found for all categories of homeowner status and for institutionalization.

Table 6 Demographics: Adjusted Data from Cohorts I, II, and III


Small effects were found for PCS and MCS scores (table 7). Medium effects were found for the Physical Functioning scale, the Vitality scale, the Social Functioning scale, and the Role-Emotional scale. Small effect sizes were found for all other scales.

Table 7 SF-36, Proxy, and Self-Completed Differences in Mean Normed Scores: Adjusted Data

Chronic Conditions

Table 8 indicates that small effect sizes were found for all chronic conditions except any cancer, congestive heart failure, and stroke.

Table 8 Chronic Conditions, Activities of Daily Living, Depression: Adjusted Data from Cohorts I, II, and III

Impaired ADLs

Small effects were found for four impaired ADLs: inability/difficulty toileting, eating, bathing, and dressing. However, inability/difficulty getting in or out of chairs and walking did not meet the small effect size criterion (table 8).


A small effect size was found for the depression screening question, "depressed /sad 2 or more years in life." However, the other two depression screening questions did not meet the small effect size criterion.


The results of this exploratory use of the propensity score method to compare proxy-completed and self-completed responses indicate that differences between the two samples were substantially reduced, although some differences remained after utilizing the propensity score methodology. We believe that three conclusions can be drawn from this research. First, the use of the propensity score method may be quite useful in reducing selection bias between self and proxy respondents in survey research. This methodology provides a unique tool and innovative approach for reducing this bias.

Second, though some differences between self and proxy-completed responses in this research remained after applying the propensity score methodology, the consistent use of this methodology in the literature should result in increased understanding regarding the differences between self and proxy-completed responses. For example, future research should consider the role of the proxy respondent to the self-respondent. Relatives and professional caregivers may systematically overstate or understate a respondent's physical and/or mental health status. Information on the nature of the role relationship between the proxy respondent and the self-respondent may be important to assess in health status surveys and may be a crucial factor in understanding selection bias.

While more research is needed on applying the propensity score method to self and proxy-completed responses, the use of this method in these populations can help researchers understand, and reduce, the differences between self and proxy-completed responses.

Finally, the propensity score method should be examined in the context of the literature on cognitive psychology. Response bias is a phenomenon entirely consistent with the social and cognitive psychological literature on attributional biases. Overall, the findings from dozens of empirical studies indicate that humans are relatively poor processors of information and form biases and inferences that systematically distort perception [26]. Using the National Health Interview Survey on Disability, recent research indicates that conditional likelihood judgments (for example, the likelihood that an individual has a disability given another disability) predicted the number of disabilities for proxy-completed responses but not for self-completed responses [27]. The continuing search for methods to understand how to reduce proxy bias in quality of life research is important since the implications for policy direction may depend on such research.


1. Cooper JK, Kohlmann T, Michael JA, Haffer SC, Stevic M: Health outcomes: new quality measure for Medicare. Int J Qual Health Care 2001, 13: 9–16. 10.1093/intqhc/13.1.9
2. D'Agostino RB: Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 1998, 17: 2265–2281. 10.1002/(SICI)1097-0258(19981015)17:19<2265::AID-SIM918>3.0.CO;2-B
3. Rubin DB: Matching to remove bias in observational studies. Biometrics 1973, 29: 159–183.
4. Neumann PJ, Araki SS, Gutterman EM: The use of proxy-completed responses in studies of older adults: lessons, challenges, and opportunities. J Am Geriatr Soc 2000, 48: 1646–1654.
5. Dorman PJ, Waddel F, Slattery J, Dennis M, Sandercock P: Are proxy assessments of health status after stroke with the EuroQol questionnaire feasible, accurate and unbiased? Stroke 1997, 28: 1883–1887.
6. Yip JY, Wilber KH, Myrtle RC, Grazman DN: Comparison of older adult subject and proxy responses on the SF-36 health-related quality of life instrument. Aging Ment Health 2001, 5: 136–142. 10.1080/13607860120038357
7. Magaziner J, Zimmerman SI, Gruber-Baldini AL, Hebel JR, Fox KM: Proxy reporting in five areas of functional status: comparison with self reports and observations of performance. Am J Epidemiol 1997, 146: 418–428.
8. Todorov A, Kirchner C: Bias in proxies' reports of disability: data from the national health interview survey on disability. Am J Public Health 2000, 90: 1248–1253.
9. Pierre U, Wood-Dauphinee S, Korner-Bitensky N, Gayton D, Hanley J: Proxy use of the Canadian SF-36 in rating health status of the disabled elderly. J Clin Epidemiol 1998, 51: 983–990. 10.1016/S0895-4356(98)00090-0
10. Shields M: Proxy reporting in the national population health survey. Health Rep 2000, 12: 21–39.
11. Rosenbaum PR, Rubin DB: The central role of the propensity score in observational studies for causal effects. Biometrika 1983, 70: 41–55.
12. Rubin DB, Thomas N: Characterizing the effect of matching using linear propensity score methods with normal distributions. Biometrika 1992, 79: 797–809.
13. Rubin DB, Thomas N: Combining propensity score matching with additional adjustments for prognostic covariates. J Am Stat Assoc 2000, 95: 573–585.
14. Kilborn MJ, Rathore SS, Gersh BJ, Oetgen WJ, Solomon AJ: Amiodarone and mortality among elderly patients with acute myocardial infarction with atrial fibrillation. Am Heart J 2002, 144: 1095–1101. 10.1067/mhj.2002.125836
15. Teufelsbauer H, Prusa AM, Wolff K, Polterauer P, Nanobashvili J, Prager M, Holzenbein T, Thurnher S, Lammer J, Schemper M, Kretschmer G, Huk I: Endovascular stent grafting versus open surgical operation in patients with infrarenal aortic aneurysms: a propensity score-adjusted analysis. Circulation 2002, 106: 782–787. 10.1161/01.CIR.0000028603.73287.7D
16. Neugut AI, Fleischauer AT, Sundararajan V, Mitra N, Heitjan DF, Jacobson JS, Grann VR: Use of adjuvant chemotherapy and radiation therapy for rectal cancer among the elderly: a population-based study. J Clin Oncol 2002, 20: 2643–2650. 10.1200/JCO.2002.08.062
17. Mehta RL, McDonald B, Gabbai F, Pahl M, Farkas A, Pascual MTA, Zhuang S, Kaplan RM, Chertow GM: Nephrology consultation in acute renal failure: does timing matter? Am J Med 2002, 113: 456–528. 10.1016/S0002-9343(02)01230-5
18. Drake C: Effects of misspecification of the propensity score on estimators of treatment effect. Biometrics 1993, 49: 1231–1236.
19. Gu XS, Rosenbaum PR: Comparison of multivariate matching methods: structures, distances, and algorithms. J Comp Graph Stat 1993, 2: 405–420.
20. Dehejia RF, Wahba S: Causal effects in nonexperimental studies: reevaluating the evaluation of training programs. J Am Stat Assoc 1999, 94: 1053–1062.
21. Medicare Health Outcomes Survey []
22. National Committee for Quality Assurance: HEDIS® 3.0 Volume 6 Health of Seniors Survey Manual. Washington DC 1998.
23. Ware JE, Snow KK, Kosinski M, Gandek B: SF-36® Health Status Survey Manual and Interpretation Guide. New Haven: The Health Institute, New England Medical Center 1993.
24. Ware JE, Kosinski M: SF-36® Physical and Mental Health Summary Scales: A Manual for Users of Version 1. Second Edition. Lincoln: QualityMetric, Inc 2001.
25. Cohen J: Statistical power analysis for the behavioral sciences. Hillsdale: Lawrence Erlbaum 1988.
26. Markus H, Zajonc RB: The cognitive perspective in social psychology. In The handbook of social psychology. 3 Edition (Edited by: Lindzey G, Aronson E). Hillsdale: Erlbaum 1985, 137–230.
27. Todorov A: Cognitive procedures for correcting proxy-response bias in surveys. Appl Cog Psychol 2002, 17: 215–224. 10.1002/acp.850


The authors acknowledge Samuel C. Haffer, PhD, Patricia Wright-Gaines, and Sonya Bowen, MSW of the Centers for Medicare & Medicaid for their support of the Medicare Health Outcomes Survey and research emanating from the survey. The authors thank Wendy A. Richard, MA for proofing, and also acknowledge Susan Grace, BSN; Max Johnson, MPH; Efthimios Laios, MPH; Barbara Mayl, RN; Rajesh Shrestha, MPH for data cleaning and dissemination of reports to health plans.

Author information



Corresponding author

Correspondence to Beth Hartman Ellis.

Additional information

Authors' Contributions

BHE participated in the interpretation of statistical analyses and co-wrote the manuscript.

WMB conceptualized applying the propensity score method to the proxy and self-respondent Medicare HOS data, wrote the majority of the SAS code, conducted much of the statistical analyses, participated in the interpretation of the statistical analyses, and co-wrote the manuscript.

JKC participated in the statistical analyses and interpretation.

BMF wrote portions of the SAS code for analyses.

EDS wrote portions of SAS code, participated in the interpretation of statistical analyses, and edited the manuscript.

DD participated in the interpretation of statistical analyses and offered valuable critical thought on previous versions of this manuscript.

RWA secured funding.

LAG secured funding.

All authors read and approved the final manuscript.


About this article

Cite this article

Ellis, B.H., Bannister, W.M., Cox, J.K. et al. Utilization of the propensity score method: an exploratory comparison of proxy-completed to self-completed responses in the Medicare Health Outcomes Survey. Health Qual Life Outcomes 1, 47 (2003).



  • Propensity score
  • Medicare Health Outcomes Survey
  • elderly
  • proxy