Mulugeta Gebregziabher, Yumin Zhao, Neal Axon, Gregory E. Gilbert, Carrae Echols and Leonard E. Egede
Abstract

Background: Missing race data is a ubiquitous problem in studies using large administrative datasets such as those of the Veterans Health Administration and other sources. The most common approach to this problem has been Complete Case Analysis (CCA), which analyzes only those records with complete data. CCA requires the assumption that data are Missing Completely At Random (MCAR) and can lead to biased estimates with inflated standard errors.

Objective: To examine the performance of a new imputation approach, Latent Class Multiple Imputation (LCMI), for imputing missing race data, and to compare it with CCA, Multiple Imputation (MI) and Log-Linear Multiple Imputation (LLMI).

Design/Participants: LCMI was compared empirically with CCA, MI and LLMI using simulated data, and the methods were then applied to data from a sample of 13,705 veterans with type 2 diabetes, 23% of whom had unknown/missing race information.

Results: Our simulation study shows that under Missing At Random (MAR), LCMI leads to lower bias and lower standard error estimates than CCA, MI and LLMI. Similarly, in our data example, which does not conform to MCAR because subjects with missing race information had lower rates of medical comorbidities than those with race information, LCMI outperformed MI and LLMI, providing lower standard errors, especially when a relatively larger number of latent classes was assumed for the latent class imputation model.

Conclusions: Our results show that LCMI is a valid statistical technique for imputing missing categorical covariate data, particularly missing race data, and that it offers advantages with respect to precision of estimates.
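The concern about CCA raised in the Background can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the binary race indicator, the comorbidity probabilities, and the missingness probabilities are all hypothetical values chosen for illustration. It assumes that race is more often missing for subjects without a comorbidity, mirroring the pattern reported for the data example, and shows how restricting to complete cases shifts an estimate.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (minority, comorbid, race_missing) records.

    All probabilities are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        # Hypothetical binary race indicator.
        minority = rng.random() < 0.3
        # Comorbidity depends on race.
        comorbid = rng.random() < (0.6 if minority else 0.4)
        # Race is missing more often when there is no comorbidity,
        # so missingness depends on an observed variable (not MCAR).
        race_missing = rng.random() < (0.1 if comorbid else 0.4)
        records.append((minority, comorbid, race_missing))
    return records


def comorbidity_rate(records):
    """Proportion of records with a comorbidity."""
    return sum(comorbid for _, comorbid, _ in records) / len(records)


records = simulate()
# Complete Case Analysis keeps only records with race observed.
complete_cases = [r for r in records if not r[2]]

full_rate = comorbidity_rate(records)
cca_rate = comorbidity_rate(complete_cases)

print(f"comorbidity rate, all records:    {full_rate:.3f}")
print(f"comorbidity rate, complete cases: {cca_rate:.3f}")
```

Under these assumed probabilities the true comorbidity rate is 0.46, while the expected rate among complete cases is about 0.56, so the complete-case estimate is biased upward. Imputation methods such as MI, LLMI and LCMI aim to avoid this by retaining all records.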
Journal of Biometrics & Biostatistics received 3254 citations as per Google Scholar report