Using multiple survey vendors to collect health outcomes information: How accurate are the data?

Background To measure and assess health outcomes and quality of life at the national level, the use of large-scale surveys that rely on multiple vendors to gather health information is becoming the norm. This study evaluates the ability of multiple survey vendors to gather and report data collected as part of the 1998 Medicare Health Outcomes Survey (HOS). Method Four hundred randomly sampled completed mailed surveys were chosen from each of six certified vendors (N = 2397) participating in the 1998 HOS. The accuracy of the data gathered from the vendors was measured by creating a "gold standard" record for each survey and comparing it to the final record submitted by the vendor. Results Overall rates of agreement were calculated, and ranged from 97.0% to 99.8% across the vendors. Conclusion Researchers may be confident that using multiple vendors to gather health outcomes information will yield accurate data.


Introduction
With the recognition of patient-based assessments of satisfaction and outcomes as important measures of health care quality and organizational performance, survey research has taken its place among the mainstream methodologies for collecting quality of care data. However, data collection costs are significant and are usually borne by the health care organizations. Users of these data, primarily large purchasers and accreditors, are continually seeking ways to lower the costs of administering these survey efforts without compromising data quality.
One promising approach for large surveys with hundreds of participating health care organizations is to allow multiple survey vendors who meet certain minimum performance standards to compete against each other for business. Since multiple vendors are involved in collecting, entering, and transmitting data, there may be an increased opportunity for data entry errors and systematic bias. The quality of the data gathered through this type of arrangement is unclear.
The purpose of this study is to assess the ability of multiple survey vendors to accurately collect and report data collected as part of a large-scale national survey using a standardized approach to data collection with a protocol that is simple and clear.

Background
In response to demands by stakeholders and the evolving field of quality measurement, the Centers for Medicare & Medicaid Services (CMS) forged a public-private partnership with the National Committee for Quality Assurance (NCQA), a not-for-profit managed care accreditation organization best known for its work in producing the annual Health Plan Employer Data and Information Set (HEDIS®). The goal of this initiative was to develop and implement the first outcome measure that would assess the health status of Medicare beneficiaries in managed care plans. In 1997 NCQA's HEDIS oversight body, the Committee on Performance Measurement, adopted and endorsed the Medicare Health Outcomes Survey (HOS) [formerly the Health of Seniors survey] as a HEDIS 3.0 measure for Medicare plans, in effect mandating its use (http://cms.hhs.gov/surveys/hos). The design of the instrument, survey methodology, and implementation of the HOS in 1998 have been well documented [1,2].
To summarize, the survey instrument consists of the Short Form-36 (SF-36) health status questionnaire as its core, supplemented with demographic and clinical variables as well as questions that assess Activities of Daily Living (ADLs). The instrument was administered to a random sample of 1000 Medicare beneficiaries continuously enrolled for six months in each managed care plan market area with a Medicare contract in place on or before January 1, 1997. In plans with 1000 or fewer Medicare enrollees, all eligible members were surveyed. The sampling frame included the aged and the disabled, but excluded those eligible for Medicare because of End-Stage Renal Disease. The survey was administered to the cohort at baseline and will be administered again to the same group in two years. A new cohort will be selected each year for baseline measurement.
Wave one data collection began in May 1998. Surveys were mailed to 279,135 beneficiaries in 287 managed care plan market areas. Completed surveys were received from 167,096 beneficiaries, for an overall response rate of 60%. Of the completed surveys, 145,203 (87%) were returned by mail. The remainder were completed using Computer Assisted Telephone Interviewing (CATI) techniques.
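The percentages above follow directly from the reported counts. The short check below is only an illustration of that arithmetic; the variable names are ours and the number of CATI completions is derived by subtraction rather than reported in the text.

```python
# Counts reported in the text for wave one of the 1998 HOS.
mailed = 279_135             # surveys mailed to beneficiaries
completed = 167_096          # completed surveys received (mail and CATI)
completed_by_mail = 145_203  # completed surveys returned by mail

# Overall response rate: completed surveys as a share of surveys mailed.
response_rate = completed / mailed

# Share of completed surveys that came back by mail.
mail_share = completed_by_mail / completed

# Derived, not reported: completed surveys obtained by telephone (CATI).
completed_by_cati = completed - completed_by_mail

print(f"Overall response rate: {response_rate:.1%}")
print(f"Share of completes returned by mail: {mail_share:.1%}")
print(f"Completes obtained by CATI: {completed_by_cati:,}")
```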
To ensure the validity and reliability of the data, managed care plans were instructed to contract with one of six NCQA-certified vendors for administration of the survey. To achieve NCQA certification, a survey vendor had to demonstrate prior experience in administering health status surveys with particular expertise in administering either the SF-36 or SF-12, undergo intense scrutiny of its internal operations, complete an intensive two-day training program, and submit a comprehensive quality assurance plan against which its performance was judged [1].
As part of the data entry protocol, CMS and NCQA allowed vendors to use either optical character recognition (scanning) technology or manual data entry. If manually entering the data, CMS and NCQA required vendors to perform 100% double data entry.
During the computer assisted telephone interviewing (CATI) component of the survey, NCQA-certified vendors were required to demonstrate and document an interviewer monitoring rate of 10%. This measure was implemented to verify survey responses. A minimum of 5% of all survey monitoring was to be "silent" with survey supervisors listening to interviewers as they completed the HOS over the telephone with respondents. A minimum 2% call back rate was established for respondents who had completed surveys to verify survey completion and to ascertain the quality and professionalism of the interviewer. The remaining 3% were distributed between either of the two categories at the vendor's discretion.
Using multiple vendors to gather health care data is becoming quite commonplace. Multiple vendors are used in gathering health outcomes information as well as patient satisfaction data. A potential source of bias exists, however, in the possibility of data entry errors across vendors. This increased chance of error may result from vendor differences in interpretation of data specifications, variations in staff training, and inconsistent implementation of the survey protocol.
To assure the integrity of the data collection process for the Health Outcomes Survey, CMS designed a data consistency validation project utilizing independent data collection experts, the Clinical Data Abstraction Centers (CDACs). CDACs have been used extensively in the collection and validation of clinical data for quality improvement projects [3,4].

Methods
Four hundred randomly sampled hard copy surveys were selected by CMS staff from all mailed surveys coded as being completed surveys. By definition a completed survey was one on which 3 critical items were answered as well as 80% of the remaining survey items excluding a 5 question battery on smoking behavior [1].
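The completion definition above can be expressed as a small check. This is a minimal sketch under stated assumptions: the item identifiers are hypothetical (the text does not name the three critical items), a response dictionary is assumed to contain every item on the instrument with `None` for unanswered items, and "80%" is read as "at least 80%".

```python
def is_completed(responses, critical_items, smoking_items, threshold=0.80):
    """Return True if a survey meets the completion definition in the text.

    responses      -- dict mapping every item on the instrument to its keyed
                      value, with None (or "") for unanswered items
    critical_items -- identifiers of the critical items (hypothetical names)
    smoking_items  -- identifiers of the smoking battery, which is excluded
    threshold      -- assumed to mean "at least 80%" of remaining items
    """
    def answered(item):
        return responses.get(item) not in (None, "")

    # Every critical item must be answered.
    if not all(answered(item) for item in critical_items):
        return False

    # Remaining items: everything except critical items and smoking battery.
    excluded = set(critical_items) | set(smoking_items)
    remaining = [item for item in responses if item not in excluded]
    if not remaining:
        return True

    share_answered = sum(answered(item) for item in remaining) / len(remaining)
    return share_answered >= threshold


# Hypothetical instrument: 3 critical items, 5 smoking items, 10 other items.
critical = ["c1", "c2", "c3"]
smoking = ["s1", "s2", "s3", "s4", "s5"]
others = [f"q{i}" for i in range(1, 11)]

survey = {item: 1 for item in critical + others}
survey.update({item: None for item in smoking})  # smoking battery left blank
survey["q9"] = None
survey["q10"] = None                             # 8 of 10 other items answered

print(is_completed(survey, critical, smoking))
```

Leaving the smoking battery blank does not affect the outcome, since those items are excluded from the denominator.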
A request for photocopies of each of the surveys selected for the reliability study was forwarded by CMS through NCQA to the six vendors. Vendors then mailed photocopies of the instruments directly to the CDACs (Figure 1). CMS contracting requirements require that CDAC workload be distributed equally between CDACs depending on the geographic location of the vendor. Therefore, cases from vendors on the Atlantic or Pacific Coasts of the U.S. were assigned to one CDAC (N = 1197) and cases from vendors in the central U.S. were assigned to the other (N = 1200). Simultaneously, vendor electronic data files, containing the final data records submitted by the vendor to NCQA for each of the sampled cases, were provided by CMS to the CDAC and loaded into the MedQuest (http://cms.hhs.gov/medquest/) internal quality control database module.
Using the MedQuest data entry and management tool, CDAC staff designed a data entry module mirroring the hard copy HOS instrument. CDAC data entry staff were trained in the use of the data entry tool and in the decision rules created by CMS and NCQA and used by the vendors to address data entry anomalies (see Appendix A). Data entry personnel were instructed to: 3. Enter the answer corresponding to a complete mark, if the survey respondent made both a partial and complete mark for two different answers to a question; and, 4. If two or more answers were marked for "Highest level of education", select the highest level of education marked.
Data from each survey were entered by two different CDAC data entry staff. Mismatches between data entry staff were adjudicated by a third party data entry supervisor. This process produced a CDAC data entry standard for each survey.
The CDAC data entry standard was compared with the vendor electronic data record using the MedQuest IQC application. To minimize potential data entry errors by the two CDAC data entry staff, mismatches between the CDAC standard and the vendor record were compared to the original hard copy surveys and adjudicated by a data quality control supervisor. A final "gold standard" for each survey was created.
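The two-stage process described above (double entry with adjudication, then comparison against the vendor record with a second adjudication) can be sketched as follows. This is an illustration only and is not the MedQuest implementation: the function names, the item identifiers, and the use of a dictionary to stand in for the supervisor's reading of the hard copy are all assumptions.

```python
def find_mismatches(record_a, record_b):
    """Return the sorted items on which two keyed records disagree."""
    items = set(record_a) | set(record_b)
    return sorted(i for i in items if record_a.get(i) != record_b.get(i))


def reconcile(record_a, record_b, adjudicate):
    """Merge two records, calling adjudicate(item, a, b) on each mismatch."""
    merged = {}
    for item in sorted(set(record_a) | set(record_b)):
        a, b = record_a.get(item), record_b.get(item)
        merged[item] = a if a == b else adjudicate(item, a, b)
    return merged


# Stand-in for a supervisor consulting the original hard copy survey.
hard_copy = {"q1": 2, "q2": 4, "q3": 1}

def consult_hard_copy(item, value_a, value_b):
    return hard_copy[item]

# Stage 1: two independent CDAC entries, mismatches adjudicated.
entry_1 = {"q1": 2, "q2": 4, "q3": 1}
entry_2 = {"q1": 2, "q2": 5, "q3": 1}   # keying error on q2
cdac_standard = reconcile(entry_1, entry_2, consult_hard_copy)

# Stage 2: CDAC standard vs. vendor record, mismatches checked on paper.
vendor_record = {"q1": 2, "q2": 4, "q3": 3}
print(find_mismatches(cdac_standard, vendor_record))
gold_standard = reconcile(cdac_standard, vendor_record, consult_hard_copy)
print(gold_standard)
```

Checking stage 2 mismatches against the hard copy, rather than assuming the CDAC standard is right, is what protects the gold standard from errors shared by both CDAC keyers.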
Vendor data submitted for each questionnaire were compared to the final "gold standard" developed for each survey by the CDAC. Reliability statistics were calculated for each vendor. A more detailed review of the mismatches by CDAC staff revealed that, across vendors, questions involving the two skip patterns contributed disproportionately to the overall number of mismatches. A review of the hard copy instruments indicated that vendors did not consistently follow pre-defined data entry rule number 5, which states that "if a question was supposed to have been skipped but was not, the data entry person was to key in the answers exactly as they were written". In many instances data entry staff chose the answer that was most logical or consistent with the responses preceding the skip pattern. Our intent is for the data set to reflect exactly what the respondent entered.
The following example illustrates the problem inherent in allowing data entry staff to attempt to rectify data inconsistencies related to skip patterns. A respondent who answers negatively to the smoking screening question, but completes the remaining battery of smoking questions, may be considered a smoker, with the remaining smoking questions considered valid. Alternatively, the respondent may be considered a non-smoker, and answers to the remaining questions would be ignored. We believe that the decision about which of these alternatives is correct, or whether to exclude the case from analysis altogether, should be left to the research team.

Results
An adjusted mean agreement rate was computed which did not penalize the vendors for data entry problems associated with the skip patterns. These adjusted agreement rates ranged from 97.9% to 99.9%. The mean adjusted agreement rate for all vendors was 99.2%.
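To make the idea of an adjusted agreement rate concrete, the sketch below computes item-level agreement between vendor records and gold standard records, with and without the skip-pattern items. It rests on assumptions: the text does not specify the exact adjustment formula, so simply excluding the skip-pattern items is one plausible reading, and all item names and data are hypothetical.

```python
def agreement_rate(vendor_records, gold_records, exclude_items=()):
    """Share of item-level comparisons where the vendor matches the gold standard.

    vendor_records, gold_records -- dicts of survey_id -> {item: value}
    exclude_items                -- items left out of the comparison
                                    (here, the skip-pattern items)
    """
    excluded = set(exclude_items)
    matches = 0
    compared = 0
    for survey_id, gold in gold_records.items():
        vendor = vendor_records[survey_id]
        for item, gold_value in gold.items():
            if item in excluded:
                continue
            compared += 1
            matches += vendor.get(item) == gold_value
    return matches / compared


# Hypothetical data: two surveys, two ordinary items, two skip-pattern items.
gold = {
    "A1": {"q1": 1, "q2": 3, "smoke_1": 2, "smoke_2": 1},
    "A2": {"q1": 2, "q2": 5, "smoke_1": 1, "smoke_2": 3},
}
vendor = {
    "A1": {"q1": 1, "q2": 3, "smoke_1": 2, "smoke_2": None},  # answer blanked
    "A2": {"q1": 2, "q2": 5, "smoke_1": 1, "smoke_2": 3},
}
skip_items = ["smoke_1", "smoke_2"]

raw = agreement_rate(vendor, gold)
adjusted = agreement_rate(vendor, gold, exclude_items=skip_items)
print(f"Unadjusted agreement: {raw:.1%}")
print(f"Adjusted agreement:   {adjusted:.1%}")
```

In this toy example the only mismatch sits in a skip-pattern item, so the adjusted rate is higher than the unadjusted one, which mirrors the direction of the adjustment reported above.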
Another noteworthy finding is that the vendor with the lowest adjusted agreement score, Vendor C, was found to have only a 2% agreement on one particular question where the other vendors achieved an average agreement rate of 99.5% on the same question. Automated data entry edits would have detected an out-of-range value, but this problem was traced to a data coding error: the vendor had misassigned the valid response values for the question. When corrected, the overall average adjusted agreement rate for Vendor C increased by one percentage point to 98.9%.
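A coding error of this kind passes a range check because every keyed value is still a valid code. One way to surface it is to compute agreement question by question and flag outliers. The sketch below is a hypothetical illustration, not the procedure used in the study: the reversed 1 to 5 coding, the item names, and the 90% flagging threshold are all assumptions.

```python
from collections import defaultdict


def per_item_agreement(vendor_records, gold_records):
    """Return {item: agreement rate} across all surveys in gold_records."""
    matches = defaultdict(int)
    totals = defaultdict(int)
    for survey_id, gold in gold_records.items():
        vendor = vendor_records.get(survey_id, {})
        for item, gold_value in gold.items():
            totals[item] += 1
            matches[item] += vendor.get(item) == gold_value
    return {item: matches[item] / totals[item] for item in totals}


def flag_low_agreement(rates, threshold=0.90):
    """Return items whose agreement falls below an (assumed) threshold."""
    return sorted(item for item, rate in rates.items() if rate < threshold)


# Hypothetical gold standard: q2 uses response codes 1 to 5.
gold = {
    "S1": {"q1": 1, "q2": 1},
    "S2": {"q1": 2, "q2": 2},
    "S3": {"q1": 1, "q2": 4},
    "S4": {"q1": 2, "q2": 5},
}
# Vendor keyed q2 with the code values reversed (6 - value): still in range.
vendor = {sid: {"q1": rec["q1"], "q2": 6 - rec["q2"]} for sid, rec in gold.items()}

in_range = all(1 <= rec["q2"] <= 5 for rec in vendor.values())
rates = per_item_agreement(vendor, gold)
print("All q2 values pass the range check:", in_range)
print("Items flagged for review:", flag_low_agreement(rates))
```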

Discussion
Evidence presented here indicates that the use of multiple vendors to collect health outcomes information need not compromise the quality of the data collected across vendor sites. As the results of this study seem to suggest, researchers may utilize multiple vendors for data collection using mailed surveys without fear of data quality problems. This does not mean, however, that health services researchers should not concern themselves with data quality when using multiple vendors. On the contrary, they must establish the parameters (operating guidelines and procedures) within which the vendors will operate during the survey implementation, data collection, and data entry phases of a study. The two guiding principles for ensuring data quality that were vigorously adhered to during the inaugural fielding of the Medicare Health Outcomes Survey were standardization and simplicity.

Standardization
In addition to using a standardized survey administration protocol and instrument to minimize the potential for systematic inter-vendor bias, CMS and NCQA developed data entry rules [Appendix A] and a quality assurance plan. The quality assurance plan focuses attention on standardizing five key areas of vendor performance:
1. Systems and processes: establishes minimum system requirements for vendors' survey management systems, CATI systems, and the printing and mailing of survey instruments and related materials.
2. Maximizing response rates: sets minimum expectations on the number and type of resources to be utilized by the vendors to obtain information that provides valid respondent addresses and telephone numbers.
3. Data integrity: institutes minimum standards for quality of the data including staff training, receipt of the data by vendor staff, data entry, decision rules, quality monitoring of data entry by vendor staff, and preparation and submission of data files.
4. Beneficiary confidentiality: mandates specific requirements for confidentiality of person-level data.
5. Project reporting: establishes uniform reporting and definitions that allow rapid and accurate vendor-to-vendor progress comparisons throughout the survey administration.

Simplicity
There is a trade-off between standardization, which requires complex definitions to cover multiple possible responses, and simplicity, which requires brief, easy to understand instructions. In implementing the Medicare Health Outcomes Survey with the largest sample of Medicare beneficiaries ever surveyed, CMS and NCQA realized that technical complexity increases the likelihood of systematic error, especially when standardization must be ensured across multiple vendors.
To realize the goal of simplicity in instrument design, the survey was limited in scope to only those items necessary to validly and reliably measure health status over time, case-mix adjust the results, and provide actionable information for clinicians and plans to use in improving the quality of care provided to patients. In addition, questions requiring skip patterns were minimized, as they tend to confuse respondents (and data entry staff, as the above results demonstrate).
An obvious source of potential error was the data file creation and data transfer processes. To reduce the risk of corrupt files populated with poor quality data, CMS and NCQA required vendors to output data into an ASCII fixed-width text file and transmit each plan's data on a separate 3.5 inch diskette. This file type was one with which all vendors were familiar, had a great deal of experience in producing, and required no exceptional technological sophistication or explanation.
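For readers unfamiliar with the format, the sketch below shows what writing and reading an ASCII fixed-width record involves. The field names and widths are hypothetical; the actual HOS file layout is not given in this article.

```python
# Hypothetical layout: (field name, width in characters). Not the HOS layout.
LAYOUT = [("survey_id", 8), ("plan_id", 5), ("q1", 1), ("q2", 1)]


def format_record(record):
    """Render one record as a fixed-width ASCII line; missing values are blank."""
    parts = []
    for name, width in LAYOUT:
        value = "" if record.get(name) is None else str(record[name])
        if len(value) > width:
            raise ValueError(f"{name!r} value {value!r} exceeds width {width}")
        parts.append(value.ljust(width))
    line = "".join(parts)
    line.encode("ascii")  # raises UnicodeEncodeError on non-ASCII characters
    return line


def parse_record(line):
    """Split a fixed-width line back into fields; blanks become None."""
    record = {}
    position = 0
    for name, width in LAYOUT:
        raw = line[position:position + width].strip()
        record[name] = raw or None
        position += width
    return record


line = format_record({"survey_id": "00012345", "plan_id": "H1234", "q1": 3, "q2": None})
print(repr(line))
print(parse_record(line))
```

Because every field sits at a known column, a receiving party can validate line length and field positions without any delimiter or quoting rules, which is part of what makes the format easy to standardize across vendors. Note that parsed values come back as strings.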
The nationwide implementation of the Medicare Health Outcomes Survey in managed care has demonstrated the feasibility of gathering high quality, clinically useful information at low cost (approximately US$25 per completed survey). This was achieved using multiple third party survey vendors to implement the survey administration protocol. We believe the analyses presented here indicate that by employing the principles of standardization and simplicity researchers and project managers can maximize the probability of collecting high quality data when utilizing multiple survey vendors in obtaining health outcomes information.
Appendix A
initial the entry and indicate "H" if heads occurred or "T" if tails occurred.
3. If a value is missing, leave the value blank unless the respondent is called back to ascertain the response.
5. If a question was supposed to have been skipped but was not, the data entry person was to key in the answers exactly as they were written.