Using multiple survey vendors to collect health outcomes information: How accurate are the data?
© Haffer; licensee BioMed Central Ltd. 2003
- Received: 5 February 2003
- Accepted: 16 April 2003
- Published: 16 April 2003
To measure and assess health outcomes and quality of life at the national level, large-scale surveys that use multiple vendors to gather health information are becoming the norm. This study evaluates the ability of multiple survey vendors to gather and report data collected as part of the 1998 Medicare Health Outcomes Survey (HOS).
Four hundred randomly sampled completed mailed surveys were chosen from each of six certified vendors (N = 2397) participating in the 1998 HOS. The accuracy of the data gathered from the vendors was measured by creating a "gold standard" record for each survey and comparing it to the final record submitted by the vendor.
Overall rates of agreement were calculated, and ranged from 97.0% to 99.8% across the vendors.
Researchers may be confident that using multiple vendors to gather health outcomes information will yield accurate data.
- Data Entry
- Agreement Rate
- Data Entry Error
- Computer Assisted Telephone Interview
- Health Outcome Survey
With the recognition of patient-based assessments of satisfaction and outcomes as important measures of health care quality and organizational performance, survey research has taken its place among the mainstream methodologies for collecting quality of care data. However, data collection costs are significant and are usually borne by the health care organizations. Users of these data, primarily large purchasers and accreditors, are continually seeking ways to lower the costs of administering these survey efforts without compromising data quality.
One promising approach for large surveys with hundreds of participating health care organizations is to allow multiple survey vendors who meet certain minimum performance standards to compete against each other for business. Since multiple vendors are involved in collecting, entering, and transmitting data, there may be an increased opportunity for data entry errors and systematic bias. The quality of the data gathered through this type of arrangement is unclear.
The purpose of this study is to assess the ability of multiple survey vendors to accurately collect and report data from a large-scale national survey that used a standardized, simple, and clear data collection protocol.
In response to demands by stakeholders and the evolving field of quality measurement, the Centers for Medicare & Medicaid Services (CMS) forged a public-private partnership with the National Committee for Quality Assurance (NCQA), a not-for-profit managed care accreditation organization best known for producing the annual Health Plan Employer Data and Information Set (HEDIS®). The goal of this initiative was to develop and implement the first outcome measure to assess the health status of Medicare beneficiaries in managed care plans. In 1997, NCQA's HEDIS oversight body, the Committee on Performance Measurement, adopted and endorsed the Medicare Health Outcomes Survey (HOS) [formerly the Health of Seniors survey] as a HEDIS 3.0 measure for Medicare plans, in effect mandating its use (http://cms.hhs.gov/surveys/hos). The design of the instrument, survey methodology, and implementation of the HOS in 1998 have been well documented [1, 2].
To summarize, the survey instrument consists of the Short Form-36 (SF-36) health status questionnaire as its core, supplemented with demographic and clinical variables as well as questions that assess Activities of Daily Living (ADLs). The instrument was administered to a random sample of 1000 Medicare beneficiaries continuously enrolled for six months in each managed care plan market area with a Medicare contract in place on or before January 1, 1997. In plans with 1000 or fewer Medicare enrollees, all eligible members were surveyed. The sampling frame included the aged and the disabled, but excluded those eligible for Medicare because of End-Stage Renal Disease. The survey was administered to the cohort at baseline and will be administered again to the same group in two years. A new cohort will be selected each year for baseline measurement.
Wave one data collection began in May 1998. Surveys were mailed to 279,135 beneficiaries in 287 managed care plan market areas. Completed surveys were received from 167,096 beneficiaries, for an overall response rate of 60%. Of the completed surveys, 145,203 (87%) were returned by mail; the remainder were completed using Computer Assisted Telephone Interviewing (CATI) techniques.
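The wave-one rates follow from simple arithmetic on the reported counts. The sketch below is a minimal Python illustration; the helper name `response_summary` is hypothetical, and the counts are the ones given above.

```python
def response_summary(mailed: int, completed: int, by_mail: int) -> dict:
    """Response rate and mode split for one survey wave."""
    return {
        "response_rate": completed / mailed,     # completes / surveys mailed
        "mail_share": by_mail / completed,       # share of completes returned by mail
        "cati_completes": completed - by_mail,   # remainder completed by telephone
    }


summary = response_summary(mailed=279_135, completed=167_096, by_mail=145_203)
print(f"response rate:  {summary['response_rate']:.1%}")  # 59.9%, reported as 60%
print(f"mail share:     {summary['mail_share']:.1%}")     # 86.9%, reported as 87%
print(f"CATI completes: {summary['cati_completes']:,}")   # 21,893
```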
To ensure the validity and reliability of the data, managed care plans were instructed to contract with one of six NCQA-certified vendors to administer the survey. To achieve NCQA certification, a survey vendor had to demonstrate prior experience in administering health status surveys, with particular expertise in administering either the SF-36 or SF-12; undergo intense scrutiny of its internal operations; complete an intensive two-day training program; and submit a comprehensive quality assurance plan against which its performance was judged. As part of the data entry protocol, CMS and NCQA allowed vendors to use either optical character recognition (scanning) technology or manual data entry. Vendors that entered data manually were required to perform 100% double data entry.
During the CATI component of the survey, NCQA-certified vendors were required to demonstrate and document an interviewer monitoring rate of 10%, a measure implemented to verify survey responses. A minimum of 5% was to be "silent" monitoring, with survey supervisors listening to interviewers as they completed the HOS over the telephone with respondents. A minimum 2% callback rate was established for respondents who had completed surveys, to verify survey completion and to assess the quality and professionalism of the interviewer. The remaining 3% could be allocated to either category at the vendor's discretion.
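One way to read the monitoring requirement is as three floors on the number of monitored interviews. The sketch below assumes that every percentage is a share of a vendor's completed CATI interviews (our reading, since 5% + 2% + 3% = 10%); the function names and the 500-interview example are hypothetical.

```python
def pct_ceil(n: int, pct: int) -> int:
    """Smallest whole number of interviews that is at least pct% of n (integer math)."""
    return (n * pct + 99) // 100


def monitoring_minimums(cati_completes: int) -> dict:
    """Minimum monitored interviews implied by the 10% / 5% / 2% CATI rules.

    Assumption: all three percentages are shares of completed CATI interviews.
    The remaining 3% may go to either method, so it has no separate floor.
    """
    return {
        "total": pct_ceil(cati_completes, 10),
        "silent": pct_ceil(cati_completes, 5),
        "callback": pct_ceil(cati_completes, 2),
    }


def plan_is_compliant(cati_completes: int, silent: int, callback: int) -> bool:
    """Check a vendor's monitoring counts against all three floors."""
    m = monitoring_minimums(cati_completes)
    return (silent >= m["silent"]
            and callback >= m["callback"]
            and silent + callback >= m["total"])


print(monitoring_minimums(500))        # {'total': 50, 'silent': 25, 'callback': 10}
print(plan_is_compliant(500, 30, 20))  # True: the extra 3% is split across both methods
print(plan_is_compliant(500, 45, 5))   # False: the callback floor is not met
```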
Using multiple vendors to gather health care data, both health outcomes information and patient satisfaction data, is becoming commonplace. However, this practice introduces a potential source of bias: data entry errors may occur at every vendor, and the chance of error increases with vendor differences in interpretation of data specifications, variations in staff training, and inconsistent implementation of the survey protocol.
To assure the integrity of the data collection process for the Health Outcomes Survey, CMS designed a data consistency validation project utilizing independent data collection experts, the Clinical Data Abstraction Centers (CDACs). CDACs have been used extensively in the collection and validation of clinical data for quality improvement projects [3, 4].
From each vendor, CMS staff randomly selected 400 hard copy surveys from among all mailed surveys coded as completed. By definition, a completed survey was one on which the 3 critical items were answered, as well as 80% of the remaining survey items, excluding a 5-question battery on smoking behavior.
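The completed-survey definition can be expressed as a small predicate. This is a sketch under stated assumptions: the item identifiers (`C1`, `Q1`, `S1`, and so on) are hypothetical, since the text does not name the actual HOS critical items, and "answered" is taken to mean any non-blank value.

```python
from typing import Mapping, Optional, Sequence


def is_complete(
    responses: Mapping[str, Optional[str]],
    all_items: Sequence[str],
    critical_items: Sequence[str],
    smoking_items: Sequence[str],
    min_pct: int = 80,
) -> bool:
    """Apply the completed-survey definition to one returned questionnaire.

    Every critical item must be answered, plus at least `min_pct` percent of
    the remaining items once the critical items and the smoking battery are
    set aside.
    """
    def answered(item: str) -> bool:
        return responses.get(item) not in (None, "")

    if not all(answered(item) for item in critical_items):
        return False
    excluded = set(critical_items) | set(smoking_items)
    remaining = [item for item in all_items if item not in excluded]
    n_answered = sum(answered(item) for item in remaining)
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    return n_answered * 100 >= min_pct * len(remaining)


# Hypothetical layout: 3 critical items, 10 other items, 5 smoking items.
CRITICAL = ["C1", "C2", "C3"]
OTHER = [f"Q{i}" for i in range(1, 11)]
SMOKING = [f"S{i}" for i in range(1, 6)]
ITEMS = CRITICAL + OTHER + SMOKING

survey = {item: "1" for item in CRITICAL + OTHER[:8]}  # 8 of 10 other items answered
print(is_complete(survey, ITEMS, CRITICAL, SMOKING))   # True (exactly 80%)
```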
CDAC data entry staff followed these rules:
- Enter exactly what was documented on the form;
- Enter an "X" for any question where the survey respondent:
  - left the question blank,
  - made a complete mark for more than one answer,
  - marked between 2 answers, or
  - recorded an illegible birth year or name;
- Enter the answer corresponding to the complete mark if the survey respondent made both a partial and a complete mark for two different answers to a question; and
- If two or more answers were marked for "Highest level of education", select the highest level of education marked.
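The CDAC keying rules above can be sketched for a single item. The `ItemMarks` model and `cdac_key` function are hypothetical; two behaviors are our assumptions because the rules do not address them: a lone partial mark is keyed as documented, and several partial marks with no complete mark are flagged. The education rule also assumes that higher response codes mean more schooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ItemMarks:
    """What a keyer sees for one question on the paper form (hypothetical model)."""
    complete: List[int] = field(default_factory=list)  # answers with a complete mark
    partial: List[int] = field(default_factory=list)   # answers with only a partial mark
    between_answers: bool = False                       # a mark sits between two answers
    illegible: bool = False                             # e.g. unreadable birth year or name


def cdac_key(marks: ItemMarks, is_education: bool = False) -> str:
    """Return the value to key for one item; "X" flags a problem item."""
    if marks.illegible or marks.between_answers:
        return "X"
    marked = set(marks.complete) | set(marks.partial)
    if is_education and len(marked) >= 2:
        return str(max(marked))          # assumes higher codes mean more schooling
    if len(marks.complete) > 1:
        return "X"                       # complete marks on more than one answer
    if len(marks.complete) == 1:
        return str(marks.complete[0])    # a complete mark wins over a partial mark
    if len(marks.partial) == 1:
        return str(marks.partial[0])     # assumption: key a lone partial mark as documented
    return "X"                           # blank, or (by assumption) only partial marks


print(cdac_key(ItemMarks(complete=[3], partial=[1])))          # 3
print(cdac_key(ItemMarks(complete=[2, 4])))                     # X
print(cdac_key(ItemMarks(complete=[2, 5]), is_education=True))  # 5
print(cdac_key(ItemMarks()))                                    # X
```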
Data from each survey were entered by two different CDAC data entry staff members. Mismatches between the two entries were adjudicated by a third-party data entry supervisor. This process produced a CDAC data entry standard for each survey.
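The double-entry-plus-adjudication step amounts to merging two keyed records and routing every disagreement to a third party. In this minimal sketch, `build_standard` is a hypothetical name, records are plain dictionaries of item to keyed value, and the supervisor is modeled as a callback that reads the paper form.

```python
from typing import Callable, Dict

Record = Dict[str, str]


def build_standard(entry_a: Record, entry_b: Record,
                   adjudicate: Callable[[str, str, str], str]) -> Record:
    """Merge two keyings of one survey into a single data entry standard.

    Items on which the two keyers agree are accepted; each mismatch is sent
    to `adjudicate(item, value_a, value_b)`, which stands in for the
    supervisor and returns the value to keep.
    """
    standard: Record = {}
    for item in sorted(set(entry_a) | set(entry_b)):
        a, b = entry_a.get(item, ""), entry_b.get(item, "")
        standard[item] = a if a == b else adjudicate(item, a, b)
    return standard


# Hypothetical example: the supervisor settles mismatches by reading the form.
paper_form = {"q1": "2", "q2": "4", "q3": "X"}
keyer_a = {"q1": "2", "q2": "4", "q3": "X"}
keyer_b = {"q1": "2", "q2": "5", "q3": "X"}


def supervisor(item: str, a: str, b: str) -> str:
    return paper_form[item]


print(build_standard(keyer_a, keyer_b, supervisor))  # {'q1': '2', 'q2': '4', 'q3': 'X'}
```

The same merge-and-adjudicate pattern also describes the later comparison of the CDAC standard with each vendor record, where mismatches are resolved against the original hard copy.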
The CDAC data entry standard was compared with the vendor electronic data record using the MedQuest IQC application. To guard against data entry errors by the two CDAC data entry staff, mismatches between the CDAC standard and the vendor record were checked against the original hard copy surveys and adjudicated by a data quality control supervisor. This produced a final "gold standard" for each survey.
Vendor data submitted for each questionnaire were compared to the final "gold standard" developed for each survey by the CDAC. Reliability statistics were calculated for each vendor.
Data Entry Reliability Study Results, 1998 Medicare Health Outcomes Survey

| Vendor (Overall Response Rate) | Number of Cases | Mean Agreement Rate | Mean Agreement Rate Adjusted for Skip Pattern Problems |
|---|---|---|---|
| All Vendors (60%) | 2397 | | 99.2% |
Mean Agreement Rate is the unadjusted overall rate of agreement between each variable in each survey's "gold standard" and the corresponding variable in the vendor data record, across every sampled survey (N = 40,000 variables per vendor). Mean Agreement Rate varied across vendors from 97.0% to 99.8%.
A more detailed review of the mismatches by CDAC staff revealed that, across vendors, questions involving the two skip patterns contributed disproportionately to the overall number of mismatches. A review of the hard copy instruments indicated that vendors did not consistently follow pre-defined data entry rule number 5, which states that "if a question was supposed to have been skipped but was not, the data entry person was to key in the answers exactly as they were written". In many instances, data entry staff chose the answer that was most logical or most consistent with the responses preceding the skip pattern. Our aim is for the data set to reflect exactly what the respondent entered. The following example illustrates the problem with allowing data entry staff to resolve inconsistencies related to skip patterns. A respondent who answers the smoking screening question negatively but completes the remaining battery of smoking questions may be considered a smoker, with the remaining smoking questions considered valid. Alternatively, the respondent may be considered a non-smoker, and the answers to the remaining questions would be ignored. We believe that the choice between these alternatives, or the decision to exclude the case from analysis altogether, should be left to the research team.
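Rule 5 separates keying from interpretation: the record keeps what the respondent wrote, and inconsistencies are flagged rather than repaired. A minimal sketch of that flagging step follows; `skip_violations`, the item names, and the response codes (here "2" meaning "No" on the smoking screener) are all hypothetical.

```python
from typing import List, Mapping, Optional


def skip_violations(record: Mapping[str, Optional[str]], screener: str,
                    skip_value: str, follow_ups: List[str]) -> List[str]:
    """List follow-up items answered although the screener answer said to skip them.

    Nothing is recoded: the keyed values stay exactly as written, and the
    returned flags are left for the research team to resolve.
    """
    if record.get(screener) != skip_value:
        return []
    return [item for item in follow_ups if record.get(item) not in (None, "")]


# Hypothetical item names and codes: "2" = "No" on the smoking screener.
record = {"smk_screen": "2", "smk_now": "1", "smk_years": "", "smk_quit": "3"}
print(skip_violations(record, "smk_screen", "2", ["smk_now", "smk_years", "smk_quit"]))
# ['smk_now', 'smk_quit']
```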
An adjusted mean agreement rate was computed that did not penalize vendors for data entry problems associated with the skip patterns. Adjusted agreement rates ranged from 97.9% to 99.9%, and the mean adjusted agreement rate across all vendors was 99.2%.
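Both agreement rates compare each vendor record with its gold standard, variable by variable. The text does not spell out how the skip-pattern adjustment was made; this sketch assumes the skip-pattern items are simply left out of the calculation. The function name, item names, and two-survey sample are hypothetical.

```python
from typing import AbstractSet, Dict, Iterable, Tuple

Record = Dict[str, str]


def agreement_rate(pairs: Iterable[Tuple[Record, Record]],
                   exclude: AbstractSet[str] = frozenset()) -> float:
    """Share of gold-standard variables that the vendor record reproduces exactly.

    `pairs` yields (gold_standard, vendor_record) for each sampled survey.
    Items in `exclude` are dropped from numerator and denominator alike.
    """
    matches = total = 0
    for gold, vendor in pairs:
        for item, gold_value in gold.items():
            if item in exclude:
                continue
            total += 1
            matches += vendor.get(item) == gold_value
    return matches / total if total else float("nan")


# In survey 1 the respondent answered a skipped item ("1"), but the vendor left it blank.
sample = [
    ({"q1": "1", "q2": "3", "smk_now": "1"}, {"q1": "1", "q2": "3", "smk_now": ""}),
    ({"q1": "2", "q2": "4", "smk_now": "2"}, {"q1": "2", "q2": "4", "smk_now": "2"}),
]
unadjusted = agreement_rate(sample)
adjusted = agreement_rate(sample, exclude={"smk_now"})
print(f"{unadjusted:.1%} {adjusted:.1%}")  # 83.3% 100.0%
```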
Another finding worth noting is that Vendor C, the vendor with the lowest adjusted agreement rate, achieved only 2% agreement on one particular question on which the other vendors averaged 99.5%. Automated data entry edits would have detected an out-of-range value, but this problem was traced to a data coding error: the vendor had misassigned the valid response values for the question. When this was corrected, Vendor C's overall adjusted agreement rate increased by one percentage point, to 98.9%.
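A quick calculation shows why fixing a single question moves the overall rate by about one percentage point. The sketch assumes 100 variables per survey (40,000 variables divided by 400 surveys) and that the corrected question reaches the other vendors' 99.5% average; `overall_change_pp` is a hypothetical helper.

```python
def overall_change_pp(n_items: int, item_rate_before: float, item_rate_after: float) -> float:
    """Percentage-point change in an overall agreement rate when agreement on
    one item changes and every other item is unchanged.

    Each item contributes n_surveys / (n_surveys * n_items) = 1 / n_items of
    the overall rate, so the number of surveys cancels out.
    """
    return 100 * (item_rate_after - item_rate_before) / n_items


# Assumed: 100 variables per survey; the question goes from 2% to 99.5% agreement.
print(round(overall_change_pp(100, 0.02, 0.995), 1))  # 1.0
```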
Evidence presented here indicates that the use of multiple vendors to collect health outcomes information need not compromise the quality of the data collected across vendor sites. As the results of this study seem to suggest, researchers may utilize multiple vendors for data collection using mailed surveys without fear of data quality problems.
This does not mean, however, that health services researchers can ignore data quality when using multiple vendors. Rather, they must establish the parameters (operating guidelines and procedures) within which the vendors will operate during the survey implementation, data collection, and data entry phases of a study. The two guiding principles for ensuring data quality that were rigorously adhered to during the inaugural fielding of the Medicare Health Outcomes Survey were standardization and simplicity.
Standardization covered five areas:
- Systems and processes: minimum system requirements for vendors' survey management systems, CATI systems, and the printing and mailing of survey instruments and related materials;
- Maximizing response rates: minimum expectations for the number and type of resources vendors must use to obtain valid respondent addresses and telephone numbers;
- Data integrity: minimum data quality standards covering staff training, receipt of the data by vendor staff, data entry, decision rules, quality monitoring of data entry by vendor staff, and preparation and submission of data files;
- Beneficiary confidentiality: specific requirements for the confidentiality of person-level data; and
- Project reporting: uniform reporting and definitions that allow rapid and accurate vendor-to-vendor progress comparisons throughout survey administration.
The most recent HOS Quality Assurance Plan, NCQA's 2002 Quality Assurance Plan, is available for download from the NCQA website at http://www.ncqa.org/programs/hedis/hos/2002hosqap.doc.
There is a trade-off between standardization, which requires complex definitions to cover multiple possible responses, and simplicity, which requires brief, easy-to-understand instructions. In implementing the Medicare Health Outcomes Survey with the largest sample of Medicare beneficiaries ever surveyed, CMS and NCQA recognized that technical complexity increases the likelihood of systematic error, especially when standardization must be ensured across multiple vendors.
To achieve simplicity in instrument design, the survey was limited in scope to only those items necessary to validly and reliably measure health status over time, to case-mix adjust the results, and to provide actionable information that clinicians and plans can use to improve the quality of care provided to patients. In addition, questions requiring skip patterns were minimized because they tend to confuse respondents (and, as the results above demonstrate, data entry staff).
Data file creation and data transfer were obvious potential sources of error. To reduce the risk of corrupt files populated with poor-quality data, CMS and NCQA required vendors to output data as ASCII fixed-width text files and to transmit each plan's data on a separate 3.5-inch diskette. All vendors were familiar with this file type, had a great deal of experience producing it, and needed no exceptional technological sophistication or explanation to do so.
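Part of the appeal of a fixed-width ASCII file is that writing and checking it takes very little code. The field names and widths below are a hypothetical layout, not the actual HOS file specification; the sketch pads each field to its width, rejects values that do not fit or are not ASCII, and parses a line back for a round-trip check.

```python
from typing import Dict, List, Tuple

# Hypothetical layout (field name, width in characters); the real HOS
# file specification is not reproduced here.
LAYOUT: List[Tuple[str, int]] = [
    ("plan_id", 5),
    ("member_id", 10),
    ("q1", 1),
    ("q2", 1),
    ("birth_year", 4),
]


def to_fixed_width(record: Dict[str, str]) -> str:
    """Render one record as a fixed-width ASCII line, space-padded on the right."""
    parts = []
    for name, width in LAYOUT:
        value = record.get(name, "")
        if len(value) > width:
            raise ValueError(f"{name}: {value!r} is wider than {width} characters")
        parts.append(value.ljust(width))
    line = "".join(parts)
    line.encode("ascii")  # raises UnicodeEncodeError if a non-ASCII character slipped in
    return line


def from_fixed_width(line: str) -> Dict[str, str]:
    """Split a fixed-width line back into fields, trimming the padding."""
    record, pos = {}, 0
    for name, width in LAYOUT:
        record[name] = line[pos:pos + width].rstrip()
        pos += width
    return record


row = {"plan_id": "H0001", "member_id": "12345", "q1": "3", "q2": "X", "birth_year": "1931"}
line = to_fixed_width(row)
print(repr(line))  # 'H000112345     3X1931'
```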
The nationwide implementation of the Medicare Health Outcomes Survey in managed care has demonstrated the feasibility of gathering high-quality, clinically useful information at low cost (approximately US$25 per completed survey), using multiple third-party survey vendors to implement the survey administration protocol. We believe the analyses presented here indicate that by employing the principles of standardization and simplicity, researchers and project managers can maximize the probability of collecting high-quality data when using multiple survey vendors to obtain health outcomes information.
The pre-defined data entry rules for vendors were as follows:
1. If a respondent marks between two choices and the mark is obviously closer to one choice than the other, select the answer choice closest to the mark; the data entry person should initial the edit.
2. If a respondent's mark falls equidistant between two adjacent choices on a five-point scale, flip a coin if the mark is between 1 and 2, choose 3 if the mark is between 2 and 3, choose 3 if the mark is between 3 and 4, and flip a coin if the mark is between 4 and 5. If a coin is flipped, heads equals the value to the right and tails equals the value to the left. Each time a determination is made, the data entry person should mark the corrected box on the paper copy, initial the entry, and indicate "H" if heads occurred or "T" if tails occurred.
3. If a value is missing, leave the value blank unless the respondent is called back to ascertain the response.
4. When multiple values are checked inappropriately for a response category:
   - if it is a critical item, call the respondent to obtain a valid response;
   - for all other questions, if the marks are NOT adjacent, flip a coin. If the marks are adjacent and 1 and 2 are marked, flip a coin. If 2 and 3 are marked, choose 3. If 3 and 4 are marked, choose 4. If 4 and 5 are marked, flip a coin. If a coin is flipped, heads equals the value to the right and tails equals the value to the left. Each time a determination is made, the data entry person should mark the corrected box on the paper copy, initial the entry, and indicate "H" if heads occurred or "T" if tails occurred.
5. If a question was supposed to have been skipped but was not, the data entry person was to key in the answers exactly as they were written.
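Rules 2 and 4 are mechanical enough to encode. The sketch below follows them as stated, including their different treatment of the 3/4 case. The function names are hypothetical, and three points are assumptions: response boxes run left to right from code 1 to code 5 (so heads selects the higher value), a non-adjacent coin flip is between the two marked values, and rule 4 is modeled for exactly two marks. Rules 1, 3, and 5 rely on human judgment or involve no recoding, so they are not modeled.

```python
import random
from typing import Optional, Tuple


def flip(low: int, high: int, rng: random.Random) -> Tuple[int, str]:
    """Coin flip: heads = value to the right (assumed higher), tails = value to the left."""
    return (high, "H") if rng.random() < 0.5 else (low, "T")


def resolve_equidistant(left: int, rng: random.Random) -> Tuple[int, str]:
    """Rule 2: a mark equidistant between choices `left` and `left + 1` (five-point scale).

    Returns (value, note); the note is "H" or "T" when a coin was flipped.
    """
    if left not in (1, 2, 3, 4):
        raise ValueError("left must be 1-4 on a five-point scale")
    if left in (2, 3):
        return 3, ""                   # between 2 and 3, or 3 and 4: choose 3
    return flip(left, left + 1, rng)   # between 1 and 2, or 4 and 5: flip a coin


def resolve_multiple(marks: Tuple[int, int], critical: bool,
                     rng: random.Random) -> Tuple[Optional[int], str]:
    """Rule 4 for two marked values; returns (value, note)."""
    if critical:
        return None, "CALLBACK"        # critical item: call the respondent
    low, high = sorted(marks)
    if (low, high) == (2, 3):
        return 3, ""
    if (low, high) == (3, 4):
        return 4, ""
    return flip(low, high, rng)        # non-adjacent marks, 1 and 2, or 4 and 5


rng = random.Random(1998)  # seeded only so the example is repeatable
print(resolve_equidistant(3, rng))           # (3, '')
print(resolve_multiple((4, 3), False, rng))  # (4, '')
print(resolve_multiple((1, 2), True, rng))   # (None, 'CALLBACK')
print(resolve_multiple((1, 5), False, rng))  # either (1, 'T') or (5, 'H')
```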
The views expressed here are those of the author and do not necessarily represent the position of the Centers for Medicare & Medicaid Services or the U.S. Department of Health and Human Services.
Sonya Bowen, CMS; James McCarthy, Oanh Vuong, Kristin Spector, and Lori Andersen, National Committee for Quality Assurance; Maria Caschetta, FMAS Corporation; and Susan Guiswite and Sean Warner, DynKePRO, contributed to this effort.
- National Committee for Quality Assurance: Health of Seniors Manual. HEDIS 3.0/1998, Volume 6. Washington, DC; 1998.
- Bierman AS, Lawrence WF, Haffer SC, Clancy CM: Functional Health Outcomes as a Measure of Health Care Quality for Medicare Beneficiaries. Health Services Research 2001, 35(6 Part II):90–108.
- Marciniak TA, Ellerbeck EF, Radford MJ, Kresowik TF, Gold JA, Krumholz HM, Kiefe CI, Allman RM, Vogel RA, Jencks SF: Improving the quality of care for Medicare patients with acute myocardial infarction: Results from the Cooperative Cardiovascular Project pilot. Journal of the American Medical Association 1998, 279:1351–1357. doi:10.1001/jama.279.17.1351
- Marciniak TA, Mosedale L, Ellerbeck EF: Quality Improvement at the National Level: Lessons from the Cooperative Cardiovascular Project. Evaluation & The Health Professions 1998, 21(4):525–536.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.