Data source and cohorts
Study data were from the population-based Winnipeg Regional Health Authority (WRHA) Joint Replacement Registry. The WRHA is the largest health region in the central Canadian province of Manitoba, which has a population of approximately 1.2 million residents. The Registry captures more than 90% of all hip and knee replacement surgeries performed within the health region and approximately 75% of all replacement surgeries conducted in the province.
Information contained in the Registry includes age, sex, medical conditions, implant details, complications, and both general and condition-specific PROs. These data are collected via self-report and chart abstraction from medical records [24, 25]. Much of the information is collected at a pre-operative assessment conducted approximately one month prior to surgery, with additional data collection at one year following surgery.
The study cohort included all patients in the Registry who had total knee arthroplasty (TKA) between April 1, 2009 and March 31, 2015. Patients with inaccurate data on sex and BMI were excluded. The simulation cohort was comprised of patients from the study cohort who had complete information on PRO measures, demographic, and health status measures.
Study measures
Generic and condition-specific PROs in the WRHA Joint Replacement Registry are collected via self-report questionnaires completed in the pre-operative assessment clinic and via mailed self-report questionnaires completed one year following surgery. We limited our attention in this study to the generic Short Form Survey version 2 (SF-12v2), a 12-item generic measure of physical and mental well-being. It produces Physical Component Summary (PCS) and Mental Component Summary (MCS) scores, which can range in value from 0 (worst) to 100 (best). Scores are normalized so that values above or below 50 are better or worse, respectively, than their corresponding values in the general population [26].
Demographic information on patient age and sex are extracted from patients’ medical records and included in the Joint Replacement Registry. Age was defined at the time of the pre-operative assessment. Body mass index (BMI) was calculated from self-reported weight and height at the pre-operative assessment. Information about comorbid health conditions, such as heart disease, were also obtained via self-report at the pre-operative assessment.
Missing data methods
ML and MI methods were selected for use in this study because of their efficient computational requirements and recommendations in the literature for their adoption in practice [11]. The ML method chooses parameter values that assign the maximum possible probability or probability density to the observed data under a well-defined family of parametric probability models. The probability or probability density of the realized data is the likelihood function [11]. The missing values are removed from the likelihood by a process of summation or integration. These likelihood functions have a complicated form that requires special computational techniques such as expectation maximization [27]. Estimates obtained using this method are unbiased if the missing data are MAR and the statistical model has been correctly specified.
For the MI method, each missing value is replaced by M > 1 imputed values. Each value is a Bayesian draw from the conditional distribution of the missing observation given the observed data. The imputations are expected to represent the information about the missing values that is contained in the observed data for the chosen imputation model. MI involves three distinct tasks: (a) missing values are filled in M times to generate M complete data sets, (b) the complete data sets are analyzed using standard procedures, and (c) results from the M analyses are combined into a single inference estimate. The efficiency of the estimate is dependent on the number of imputations and fraction of missing information [28, 29]. Similar to the ML approach, the MI procedure also relies on the assumption that data are MAR. However, the process of handling missing data differs. In MI, missing values are treated in a step that is completely separate from the analysis. This separation has both positive and negative consequences. On the negative side, there is the possibility that a researcher may proceed to analyze the imputed data without considering how the imputations were generated. For example, a model based on a multivariate normal distribution allows pairwise associations among variables but not interactions. Therefore, imputed data set may tend to exhibit interactions that are weaker than those found in the population. On the positive side, the model is flexible and a straightforward process to include auxiliary variables. It is also not necessary to include these auxiliary variables in subsequent analyses, as their full effect is taken into account during the MI process, and is automatically carried forward into subsequent analyses of the imputed data [20, 30].
Statistical analysis
Descriptive statistics including means, standard deviations (SD), frequencies, and percentages were used to describe the cohorts at the baseline (i.e., pre-operative) measurement occasion. Patterns of missing data were described for the study cohort using percentages.
A linear mixed-effects model was used to estimate change in SF-12v2 PCS and MCS scores between pre- and post-operative occasions; the choice of models and covariates was based on previous research with these data [31]. Specifically, the model included a random intercept and multiple fixed covariates, including time, age, sex (male [reference], female), and body mass index (BMI < 24.9, 25.0-29.9, 30.0+ [reference], comorbid chronic conditions (including heart disease, depression, high blood pressure, diabetes and back pain (No [reference], Yes)). The two-way interaction of sex and time was included in the model based on preliminary assessments of model fit using penalized likelihood-based fit statistics (e.g., Akaike Information Criterion).
Mixed-effects regression models based on CCA, ML, and MI methods were applied to the study cohort data; separate analyses were conducted for the PCS and MCS. CCA was conducted for the subset of patients who had no missing observations on any variables at either the pre- or post-operative occasions. For the MI method, Markov Chain Monte Carlo (MCMC) sampling of the full predictive distribution was adopted; it assumes a multivariate normal distribution for the imputations. This assumption was descriptively assessed using quantile-quantile plots of the observed values. Ten imputations were conducted, as this number has been shown to be sufficient for achieving a reasonable efficiency for high proportions of missing observations [29].
All analysis were carried out in R using the lme function [32] and multiple imputation by chained equations [33].
Simulation study
The simulation study was conducted next. In our analysis of the simulation cohort data, we used all variables previously described for the study cohort in addition to a single hypothetical auxiliary variable, Z, which was generated from a bivariate normal distribution. Specifically, Z was correlated with both the pre- (Y1) and post-operative (Y2) scores with ρ = corr (Y, Z) = 0.2, 0.5 and 0.8, where Y = (Y1, Y2).
Random samples of size n = 1000 were selected from the simulation cohort; mixed-effects models, as specified previously, were applied to PCS scores. Pre-specified amounts (10%, 25% and 50%) of data were removed from the outcome variable via MCAR, MAR, and MNAR mechanisms by modeling the probability of the missing indicator conditional on the outcome variable using a logistic regression model. The ML, MI and MI-Aux (i.e., multiple imputation with Z included the imputation model) methods were used to address missingness.
A total of 1000 replications were conducted for each of the 27 simulation conditions, which were obtained by crossing all possible combinations of types and amounts of missingness with the magnitude of correlation of the hypothetical auxiliary variable with the outcome measure. We evaluated bias and error in the regression parameter estimates including the intercept (β0), which is the estimated average PRO score at the pre-operative occasion, change (βT) between the pre- and post-operative occasions, and time-sex interaction (βTS). Specifically, we computed standardized bias, root mean squared error (RMSE), 95% confidence interval (CI) coverage, and the average width of the 95% CI for each regression parameter mentioned above. Standardized bias was the ratio of the bias, the difference between the estimates obtained from the model applied to the random sample with n = 1000 obervations, and all data in the simulation cohort, and the SD of the estimates expressed as a percent; smaller values indicate less bias. The RMSE was calculated from the sum of squared bias and variance; smaller values indicate less error. Coverage was calculated as the proportion of the replications for which the 95% CI contained the true value of the parameter of interest; good performance is evident when the actual coverage is approximately equal to the nominal coverage rate of 95%. The average width of the 95% CI was the difference between the upper and lower limits of the interval averaged over the number of replications. Shorter intervals imply greater precision and higher power, provided the 95% CI coverage is high.