James A Hanley and Erica EM Moodie
The sample size formulae given in elementary biostatistics textbooks deal only with simple situations: estimation of one, or a comparison of at most two, mean(s) or proportion(s). While many specialized textbooks give sample size formulae and tables for analyses involving odds ratios and rate ratios, few deal explicitly with statistical considerations for slopes (regression coefficients), for analyses involving confounding variables, or with the fact that most analyses rely on some type of generalized linear model. Thus, the investigator is typically forced to use “black-box” computer programs or tables, or to borrow from tables in the social sciences, where the emphasis is on correlation coefficients. In the modules or standalone software programs, which are usually very separate from one another, the concern is more with user-friendly input and output. The emphasis on numerical exactness is particularly unfortunate, given the rough, prospective, and thus uncertain nature of the exercise, and given that different textbooks and software may give different sample sizes for the same design. In addition, some programs focus on the required number per group, others on an overall number. We present users with a single universal (though sometimes approximate) formula that explicitly isolates the impacts of the various factors from one another and gives some insight into the determinants of each factor. Equally important, it shows how seemingly very different types of analyses, from the elementary to the complex, can be accommodated within a common framework by viewing them as special cases of the generalized linear model.
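To make the idea of a single formula with separable factors concrete, the following is a minimal sketch using the standard textbook approximation for the total sample size needed to detect a regression slope in a linear model. It is not taken from the article and should not be read as the authors' formula; the function name `total_n_for_slope` and all parameter names are hypothetical. The sketch assumes a two-sided test, a normal approximation, and a variance inflation factor of 1/(1 − R²) to account for correlation between the exposure and the confounders.

```python
import math
from statistics import NormalDist


def total_n_for_slope(delta, sigma, var_x, alpha=0.05, power=0.80, r2_confounders=0.0):
    """Approximate total sample size to detect a slope of size `delta`.

    Illustrative textbook approximation (an assumption, not the article's formula):
        n = (z_{1-alpha/2} + z_{power})^2 * sigma^2
            / (delta^2 * var_x * (1 - r2_confounders))

    delta          : slope (difference in mean response per unit of X) to detect
    sigma          : residual standard deviation of the response
    var_x          : variance of the exposure X (0.25 for a 50:50 binary X)
    r2_confounders : squared multiple correlation of X with the confounders
    """
    if not 0.0 <= r2_confounders < 1.0:
        raise ValueError("r2_confounders must be in [0, 1)")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = std_normal.inv_cdf(power)           # desired power
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (
        delta ** 2 * var_x * (1 - r2_confounders)
    )
    return math.ceil(n)


# Two equal-sized groups are the special case of a binary X with variance 0.25.
n_two_groups = total_n_for_slope(delta=0.5, sigma=1.0, var_x=0.25)

# The same comparison when half the variance of X is explained by confounders.
n_confounded = total_n_for_slope(delta=0.5, sigma=1.0, var_x=0.25, r2_confounders=0.5)
```

Each factor enters separately: the error rates through the two normal quantiles, the noise through `sigma`, the effect size through `delta`, the spread of the exposure through `var_x`, and confounding through the inflation term. With a 50:50 binary exposure the expression reduces to the familiar two-group comparison of means.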
Journal of Biometrics & Biostatistics received 3254 citations as per Google Scholar report