Michal Brylinski and Wei P. Feinstein
eThread, a meta-threading and machine learning-based approach, is designed to effectively identify structural templates for use in protein structure and function modeling from genomic data. This methodology is essential for high-throughput structural bioinformatics and critical for systems biology, where extensive knowledge of protein structures and functions at the systems level is a prerequisite. Because eThread integrates a diverse collection of algorithms, its deployment on a large multi-core system requires comprehensive profiling to ensure optimal utilization of available resources. Resource profiling of eThread and its single-threading component algorithms indicates a wide range of demands with respect to wall clock time and host memory. Depending on the threading algorithm used, modeling a single protein sequence of up to 600 residues in length takes minutes to hours. Full meta-threading of one gene product from the E. coli proteome requires ~12 h on average on a single state-of-the-art computing core. Depending on the target sequence length, the subsequent three-dimensional structure modeling using eThread/Modeller and eThread/TASSER-Lite takes an additional 1-3 days of computing time. Using the entire proteome of E. coli, we demonstrate that parallel computing on a multi-core system follows Gustafson-Barsis' law and can significantly reduce the production time of eThread. Furthermore, graphics processing units can speed up portions of the calculations; however, fully utilizing this technology in protein threading requires substantial code development. eThread is freely available to the academic and non-commercial community as a user-friendly web service at http://www.brylinski.org/ethread. We also provide source code and step-by-step instructions for local software installation, as well as a case study demonstrating the complete procedure for protein structure modeling.
We hope that genome-wide high-throughput structural bioinformatics using eThread will significantly expand our knowledge of protein structures and their molecular functions and contribute to the thriving area of systems biology.
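As a rough illustration of the scaling behavior mentioned above, Gustafson-Barsis' law gives the scaled speedup on N cores as S(N) = N - s(N - 1), where s is the fraction of the parallel run time spent in serial work. The sketch below is not part of eThread; the function names, the serial fraction, and the number of targets are hypothetical values chosen only to show the arithmetic, with the ~12 h per target figure taken from the abstract.

```python
def gustafson_speedup(n_cores: int, serial_fraction: float) -> float:
    """Scaled speedup S(N) = N - s * (N - 1) under Gustafson-Barsis' law."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return n_cores - serial_fraction * (n_cores - 1)


def estimated_wall_hours(n_targets: int, hours_per_target: float,
                         n_cores: int, serial_fraction: float) -> float:
    """Rough wall-clock estimate: single-core time divided by the scaled speedup.

    All inputs are illustrative assumptions, not measurements from the paper.
    """
    single_core_hours = n_targets * hours_per_target
    return single_core_hours / gustafson_speedup(n_cores, serial_fraction)


if __name__ == "__main__":
    # Hypothetical batch: 100 targets at ~12 h each, 16 cores, 5% serial work.
    print(gustafson_speedup(16, 0.05))
    print(estimated_wall_hours(100, 12.0, 16, 0.05))
```

Because each target sequence can be threaded independently, the serial fraction for a proteome-scale batch would be expected to be small, which is the regime where the scaled speedup stays close to the core count.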
Journal of Computer Science & Systems Biology received 2279 citations as per Google Scholar report