DOI: 10.4172/jcsb.1000014
Entropy plays a critical role in the long-range structure of biopolymers. To model the coarse-grained chain entropy of the residues in biopolymers, either the lattice model or the Gaussian polymer chain (GPC) model is typically used. Both models use the concept of a random walk to find the conformations of an unstructured polymer. However, the entropy of the lattice model is a function of the coordination number, whereas the entropy of the GPC is a function of the root-mean-square separation distance between the ends of the polymer. This can lead to inconsistent predictions for the coarse-grained entropy. Here we show that the GPC model and the lattice model are both consistent under transformations using the cross-linking entropy (CLE) model, and that the CLE model generates a family of equations that includes these two models at important limits. We show that the CLE model is a unifying approach to the thermodynamics of biopolymers that links these incompatible models into a single framework, makes their similarities and differences explicit, and extends beyond them, allowing calculation of variable flexibility and incorporating important corrections such as the worm-like-chain model. The CLE model is also consistent with the contact-order model and, when combined with existing local pairing potentials, can predict correct structures at the minimum free energy.
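For orientation, the two entropies being reconciled have standard textbook forms (written here in our own notation, which the abstract does not specify): for a lattice random walk of N residues with coordination number z, and for a Gaussian chain of Kuhn length b whose ends are separated by r,

```latex
S_{\text{lattice}} = k_B \ln z^{N} = N k_B \ln z ,
\qquad
S_{\text{GPC}}(r) = S_0 - \frac{3 k_B r^{2}}{2 N b^{2}} .
```

The first depends only on the connectivity of the lattice, the second only on the end-to-end distance, which is why the two models can disagree until a framework such as the CLE model relates them.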
Alessandro Ambrosi and Clelia Di Serio
DOI: 10.4172/jcsb.1000023
In gene therapy, integration of the viral DNA genome into the host cell genome is a necessary step for stable delivery of the therapeutic gene. Until a few years ago, retrovirus integration was believed to be random, and the chance of accidentally activating a gene was considered remote. It is now known that this process is not random and that different viruses may show different preferences for integrating in specific areas of the genome. Tumorigenesis associated with some gene therapy studies is suspected to be caused by the insertion process. Depending on whether the provirus integrates into or in the vicinity of genes (Transcription Start Sites, TSS), normal transcription can be enhanced or disrupted, thus inducing oncogenic mutations; this is called “insertional mutagenesis”. Investigating whether an area of the genome is favoured by retrovirus integration is therefore a crucial aspect of gene therapy. Such areas are called “Common Integration Sites” (CIS) or “hotspots”. In this paper we stress the importance of developing statistical procedures leading to a unique definition of CIS rather than a problem-specific one. We propose some statistical solutions for the search of hotspots based on the “peaks height distribution”, which accounts, within the null hypothesis, for the possibly non-random behaviour of the integrations.
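The abstract does not spell out the test itself; purely as an illustrative sketch (window width, genome length, and counts are invented, and the null here is uniform, whereas the paper's null also allows for non-random integration), a Monte Carlo peak-height test might look like this:

```python
# Illustrative Monte Carlo test for integration hotspots ("peaks").
# This is a generic sketch, not the authors' exact "peaks height
# distribution" procedure; all sizes and counts are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
GENOME_LEN = 3_000_000        # toy genome length (assumption)
WINDOW = 50_000               # sliding-window width (assumption)

def tallest_peak(sites, window=WINDOW):
    """Max number of integrations falling in any window of fixed width."""
    sites = np.sort(sites)
    # For each site, count sites within [site, site + window).
    right = np.searchsorted(sites, sites + window, side="left")
    left = np.arange(len(sites))
    return (right - left).max()

# Observed integration positions (toy data with a planted hotspot).
observed = np.concatenate([
    rng.integers(0, GENOME_LEN, 200),          # background insertions
    rng.integers(1_000_000, 1_010_000, 15),    # planted cluster
])
obs_peak = tallest_peak(observed)

# Null distribution of the tallest peak under uniform random integration.
null_peaks = [tallest_peak(rng.integers(0, GENOME_LEN, len(observed)))
              for _ in range(2000)]
p_value = (1 + sum(h >= obs_peak for h in null_peaks)) / (1 + len(null_peaks))
print(f"tallest peak = {obs_peak} insertions, p = {p_value:.4f}")
```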
DOI: 10.4172/jcsb.1000022
Modern science mainly treats the biochemical basis of sequencing in biomacromolecules and of processes in biochemistry. One can ask whether the language of biochemistry is an adequate scientific language to explain the phenomena in that science. Is there perhaps some other language, outside biochemistry, that determines how the biochemical processes will function and what the structure and organization of living systems will be? The research results provide some answers to these questions. They reveal that the process of sequencing in biomacromolecules is conditioned and determined not only by biochemical principles, but also by cybernetic and information principles.
Hanuman Thota, Raghava Naidu Miriyala, Siva Prasad Akula, K. Mrithyunjaya Rao, Chandra Sekhar Vellanki, Allam Appa Rao and Srinubabu Gedela
DOI: 10.4172/jcsb.1000021
Classification is one of the most common data mining tasks, used frequently for data categorization and analysis in industry and research. Real-world data mining often has to deal with noisy information sources: inaccuracies in data collection, device limitations, data transmission and discretization errors, or man-made perturbations frequently result in imprecise or vague values, called noisy data. Such noise can degrade the performance of any classification algorithm. This paper examines the performance of different classification algorithms and the impact of a feature selection algorithm on the Logistic Regression classifier: how it controls the False Discovery Rate (FDR) and thus improves the classifier's efficiency.
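As a minimal sketch of the kind of pipeline discussed, assuming scikit-learn and synthetic noisy data (the paper's actual data sets and algorithm settings are not given here), FDR-controlling univariate feature selection can be placed in front of a logistic regression classifier:

```python
# Sketch: FDR-based feature selection feeding logistic regression on
# noisy data. The specific pipeline is an assumption for illustration,
# not the paper's exact configuration.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFdr, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data: 200 features, only 10 informative, 10% label noise.
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, flip_y=0.1,
                           random_state=0)

plain = LogisticRegression(max_iter=1000)
fdr_lr = make_pipeline(SelectFdr(f_classif, alpha=0.05),
                       LogisticRegression(max_iter=1000))

print("plain LR :", cross_val_score(plain, X, y, cv=5).mean())
print("FDR + LR :", cross_val_score(fdr_lr, X, y, cv=5).mean())
```

Discarding features whose association with the class does not survive the FDR threshold removes much of the noise before the classifier ever sees it, which is the effect the paper attributes to feature selection.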
Itaraju J. B. Brum, Daniel Martins-de-Souza, Marcus B. Smolka, José C. Novello and Eduardo Galembeck
DOI: 10.4172/jcsb.1000020
Genome projects have provided a vast amount of information that still needs to be analyzed and interpreted. This would be impossible without well-adapted computational tools to help analyze the data collected so far. To address the need for proteome analysis, we developed a tool, implemented through CGI, that can simulate two-dimensional electrophoresis for a whole genome.
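The computational core of such a virtual 2-D gel is mapping each predicted protein to a (pI, molecular weight) point. A minimal sketch using Biopython follows; the sequences are placeholders and the CGI front end is omitted, so this is not the authors' implementation:

```python
# Each predicted protein is reduced to a point on the virtual gel:
# x-axis = isoelectric point (pI), y-axis = molecular weight.
from Bio.SeqUtils.ProtParam import ProteinAnalysis

proteome = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",   # toy sequences
    "protB": "MDDDIAALVVDNGSGMCKAGFAGDDAPRAVFPS",
}

for name, seq in proteome.items():
    pa = ProteinAnalysis(seq)
    print(f"{name}: pI = {pa.isoelectric_point():.2f}, "
          f"MW = {pa.molecular_weight() / 1000:.1f} kDa")
```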
Fabrice Armougom and Didier Raoult
DOI: 10.4172/jcsb.1000019
As a result of advancements in high-throughput technology, the sequencing of the pioneering 16S rRNA gene marker is gradually shedding light on the taxonomic characterization of the spectacular microbial diversity that inhabits the earth. 16S rRNA-based investigations of microbial environmental niches are currently conducted using several technologies, including large-scale clonal Sanger sequencing, oligonucleotide microarrays, and, particularly, 454 pyrosequencing that targets specific regions or is linked to barcoding strategies. Interestingly, the short read length produced by next-generation sequencing technology has led to new computational efforts in the taxonomic sequence assignment process. From a medical perspective, the characterization of the microbial composition of the skin surface, oral cavity, and gut in both healthy and diseased people enables a comparison of microbial community profiles and also contributes to the understanding of the potential impact of a particular microbial community.
Vijai Singh, Indramani, Dharmendra Kumar Chaudhary and Pallavi Somvanshi
DOI: 10.4172/jcsb.1000018
Hepatitis B is a human infectious disease caused by the hepatitis B virus, whose genome size is 3.215 kb. Immunoinformatics tools were used to predict epitopes from seven putative proteins, viz. polymerase, large-S and middle-S protein, S and X protein, precore/core protein, core and e antigen. In total, 50 epitopes were predicted for MHC class I molecules and 55 for MHC class II molecules. These epitopes showed the highest binding scores at the optimum threshold. The epitopes may be used as antigens for diagnosis and might also be helpful for designing a peptide-based subunit vaccine against the hepatitis B virus.
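Matrix-based epitope prediction of this kind typically slides a fixed-length window over each protein and keeps peptides scoring above a threshold. The sketch below illustrates the idea only: the 9-mer length, the random scoring matrix, and the threshold are stand-ins, not the MHC binding matrices used in the study.

```python
# Toy epitope scan: score every 9-mer against a position-specific
# scoring matrix (PSSM) and keep peptides above a threshold.
import random

random.seed(1)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical 9-position PSSM: a score per (position, residue).
PSSM = [{aa: random.uniform(-1, 1) for aa in AMINO_ACIDS} for _ in range(9)]

def score(peptide):
    return sum(PSSM[i][aa] for i, aa in enumerate(peptide))

protein = "MENITSGFLGPLLVLQAGFFLLTRILTIPQSLDSWWTSLNFLGG"  # toy sequence
THRESHOLD = 3.0  # in practice tuned to an "optimum threshold"

for i in range(len(protein) - 8):
    pep = protein[i:i + 9]
    if score(pep) > THRESHOLD:
        print(f"position {i}: {pep} (score {score(pep):.2f})")
```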
Shanthi V, Ramanathan K and Rao Sethumadhavan
DOI: 10.4172/jcsb.1000017
The cation-π interaction is an important, general force for molecular recognition in biological receptors. In this study, we have analyzed the energy contribution resulting from cation-π interactions in a set of therapeutic proteins. The contribution of cation-π interacting residues to secondary structure, solvent accessibility, stabilization centers, stabilizing residues and conservation score has been evaluated. The secondary-structure preferences of the cation-π residues show that Arg and Lys prefer to be in strands. Among the π residues, Phe prefers to be in coils, Tyr prefers to be in strands and Trp prefers to be in helices. Among the cation-π interacting residues, Arg and Lys were in exposed regions, Phe and Tyr were in partially buried regions and Trp was in fully buried regions. The stabilization centers of these proteins showed that all five residues found in cation-π interactions are important in locating one or more such centers. The contribution of stabilizing residues to cation-π interactions was also analyzed. Further, the study shows that 43 percent of the amino acid residues involved in cation-π interactions might be conserved in therapeutic proteins. The comparison between conventional and non-conventional interactions in the data set clearly depicts the significance of the cation-π interaction in the stability of therapeutic proteins. On the whole, the results presented in this work will be very useful for understanding the contribution of the cation-π interaction to the stability of therapeutic proteins.
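Cation-π interactions are commonly identified geometrically, e.g. by the distance between the charged group of Arg/Lys and the aromatic ring centroid of Phe/Tyr/Trp. A sketch using Biopython with a 6 Å centroid cutoff follows; this is a common screening criterion, not necessarily the energy-based one used in this study, and the PDB filename is a placeholder:

```python
# Geometric screen for cation-pi contacts: flag pairs where the Lys NZ
# or Arg CZ atom lies within 6 A of an aromatic ring centroid.
import numpy as np
from Bio.PDB import PDBParser

RING_ATOMS = {"PHE": ["CG", "CD1", "CD2", "CE1", "CE2", "CZ"],
              "TYR": ["CG", "CD1", "CD2", "CE1", "CE2", "CZ"],
              "TRP": ["CD2", "CE2", "CE3", "CZ2", "CZ3", "CH2"]}
CATION_ATOM = {"LYS": "NZ", "ARG": "CZ"}

structure = PDBParser(QUIET=True).get_structure("x", "protein.pdb")

cations, rings = [], []
for res in structure.get_residues():
    name = res.get_resname()
    if name in CATION_ATOM and CATION_ATOM[name] in res:
        cations.append((res, res[CATION_ATOM[name]].coord))
    elif name in RING_ATOMS and all(a in res for a in RING_ATOMS[name]):
        centroid = np.mean([res[a].coord for a in RING_ATOMS[name]], axis=0)
        rings.append((res, centroid))

for cres, cpos in cations:
    for rres, rpos in rings:
        d = np.linalg.norm(cpos - rpos)
        if d <= 6.0:
            print(f"{cres.get_resname()}{cres.id[1]} .. "
                  f"{rres.get_resname()}{rres.id[1]}  {d:.2f} A")
```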
DOI: 10.4172/jcsb.1000016
Using only the transcription network structure, a probabilistic model was developed that computes the probabilities with which a pair of genes responds simultaneously (SR) or differentially (DR) to a random network perturbation. Study of yeast's transcription regulatory network in association with gene expression profiles shows that SR and DR probabilities are significantly associated with the distribution of strong co-expression. It is 100-fold more probable to observe co-expression when P(SR) ≈ 0.5 for a random perturbation of 3 transcription factors (TFs), allowing the perturbation to spread to a depth of 3 connections in the regulatory network. The model also predicts that positive co-expression enhancement is related to the proportion of common TFs (the number of TFs that regulate both genes in a pair divided by the total number of TFs that regulate at least one gene in the pair), not to the absolute number. The relationship between the model-derived probabilities and other graph-theoretic measures used to analyse biological networks is discussed.
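A Monte Carlo reading of the SR probability, sketched on a toy random network (the yeast network and the model's analytical form are not reproduced here): perturb 3 random TFs, let the perturbation spread to depth 3, and count how often both genes of a pair are hit.

```python
# SR = both genes affected by the same perturbation; DR would be one
# affected and the other not. Network, genes, and depth are toy choices.
import random
import networkx as nx

random.seed(0)
G = nx.gnp_random_graph(60, 0.05, seed=0, directed=True)  # toy network
TFS = [n for n in G if G.out_degree(n) > 0]  # nodes that regulate others

def affected(seeds, depth=3):
    """All nodes reachable from the perturbed TFs within `depth` steps."""
    hit, frontier = set(seeds), set(seeds)
    for _ in range(depth):
        frontier = {v for u in frontier for v in G.successors(u)} - hit
        hit |= frontier
    return hit

def p_sr(gene_a, gene_b, trials=5000):
    both = 0
    for _ in range(trials):
        hit = affected(random.sample(TFS, 3))
        both += (gene_a in hit and gene_b in hit)
    return both / trials

print("P(SR) for genes 10, 11:", p_sr(10, 11))
```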
Yao Li and Rongling Wu
DOI: 10.4172/jcsb.1000015
Cancer susceptibility may be controlled not only by host genes and mutated genes in cancer cells, but also by the epistatic interactions between genes from the host and cancer genomes. We derive a novel statistical model for cancer gene identification by integrating the gene mutation hypothesis of cancer formation into the mixture-model framework. Within this framework, genetic interactions of DNA sequences (or haplotypes) between host and cancer genes responsible for cancer risk are defined in terms of quantitative genetic principles. Our model is founded on a commonly used genetic association design in which a random sample of patients is drawn from a natural human population. Each patient is typed for single nucleotide polymorphisms (SNPs) in normal and cancer cells and measured for cancer susceptibility. The model is formulated within the maximum likelihood context and implemented with the EM algorithm, allowing the estimation of both population and quantitative genetic parameters. The model provides a general procedure for testing the distribution of haplotypes constructed from SNPs in host and cancer genes and the linkage disequilibria of different orders among the SNPs. It also formulates a series of testable hypotheses about the effects of host genes, cancer genes, and their interactions on cancer susceptibility. We carried out simulation studies to examine the statistical properties of the model. The implications of this model for cancer gene identification are discussed.
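As a heavily simplified illustration of the mixture-model/EM machinery (not the paper's model, which also estimates haplotype frequencies and linkage disequilibria), here is the EM core for a two-component normal mixture with known unit variances, where the components stand in for composite host-by-cancer genotype classes:

```python
# Minimal EM for a two-component normal mixture; data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0, 1, 300), rng.normal(2, 1, 200)])

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

pi, mu = 0.5, np.array([-1.0, 1.0])   # initial guesses
for _ in range(100):
    # E step: posterior probability each observation is from component 1.
    w1 = pi * normal_pdf(y, mu[1], 1.0)
    w0 = (1 - pi) * normal_pdf(y, mu[0], 1.0)
    r = w1 / (w0 + w1)
    # M step: update mixing proportion and component means.
    pi = r.mean()
    mu = np.array([np.sum((1 - r) * y) / np.sum(1 - r),
                   np.sum(r * y) / np.sum(r)])
print(f"estimated proportion = {pi:.3f}, means = {mu.round(3)}")
```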