How Codon Usage Biases Affect Our Ability to Recover the Tree of Life
Journal of Phylogenetics & Evolutionary Biology

ISSN: 2329-9002

Open Access

Review - (2021) Volume 9, Issue 1

Justin B. Miller1, Michael F. Whiting1,2, John S.K. Kauwe1 and Perry G. Ridge1*
*Correspondence: Perry G. Ridge, Department of Biology, Brigham Young University, Provo, UT 84602, USA, Email:
1Department of Biology, Brigham Young University, Provo, UT 84602, USA
2M.L. Bean Museum, Brigham Young University, Provo, UT 84602, USA

Received: 17-Nov-2020 Published: 05-Jan-2021, DOI: 10.37421/2329-9002.2021.9.211
Citation: Justin B. Miller, Michael F. Whiting, John S.K. Kauwe and Perry G. Ridge. How Codon Usage Biases Affect Our Ability to Recover the Tree of Life. J Phylogenetics Evol Biol 8 (2021) doi: 10.37421/jpgeb.2021.8.211
Copyright: © 2021 Ridge PG, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

Many common phylogenomic algorithms that were well-adapted to classify limited numbers of species have become increasingly intractable as large whole-genome sequencing datasets have emerged. Various novel approaches use characteristics of DNA sequences, including variations in codon usage biases, to establish the phylogenetic relatedness of species. Codon choice affects transcription and translational efficiencies, which can lead to differential protein expression and phenotypic variation that may be a target of selection. Several functional biases exist within genes, including the number of codons that are used, the position of the codons, and the overall nucleotide composition of the genome. Although recent algorithms capitalize on specific codon usage biases to improve phylogenetic tree inference, the phylogenies produced by these algorithms vary significantly and indicate different evolutionary histories. Therefore, we propose that gene-specific analyses of the phylogenetic signal of specific codon usage biases are required to best incorporate these biases in phylogenomic models.

Keywords

Codon usage bias • Phylogenetics • Ortholog • Codon aversion • Codon pairing • Ramp sequence • Phylogenomics

The Continued Importance of Phylogenetic Systematics

Phylogenetic systematics explores the historical and hierarchical relationships among genes, individuals, populations, and taxa. Phylogenies allow biologists to infer similar characteristics in closely related species and provide an evolutionary framework for analyzing biological patterns [1]. Furthermore, phylogenies are statements of homology and are used to organize shared structures or patterns between species [2]. Originally, phylogenies were recovered using only morphological data. However, with the increased availability of molecular data, a combined approach using morphology and genetic markers is typically used in phylogenetic analyses [3]. Although genetic data provide researchers with access to more species, the datasets typically require significant data cleaning (e.g., alignment and annotation) before they become useful. Some of the greatest difficulties in recovering phylogenetic trees from molecular data (e.g., multiple substitutions at the same position between ancient terminal branches or no substitutions in a gene between short internal tree branches) are explored by Philippe, Brinkmann [4]. These issues have recently become more pertinent as sequencing costs have decreased and genomic data now largely span the Tree of Life.

Codon Usage Biases Span the Tree of Life

Codon usage biases are present throughout molecular datasets. There are 61 canonical codons plus three stop codons that indicate the incorporation of 20 amino acids and the stop signal [5]. Since there are more codons than amino acids, the term synonymous codon describes multiple codons that encode the same amino acid and were long presumed to be identical in function. However, an unequal distribution of synonymous codons occurs within genomes, and highly expressed genes have especially prominent biases that suggest synonymous codons might play different roles in species fitness [6]. Furthermore, the distribution of tRNA anticodons that directly pair with codons is unequal and varies between species, leading to the wobble hypothesis: tRNA anticodons do not need to pair strictly with all three codon nucleotides during translation [7]. Codon usage is highly associated with the most abundant tRNA present in the cell [8], and codon usage patterns affect gene expression [9]. Some phylogenetic differences in synonymous codon usage biases may be explained by non-random mutations or selection for phenotypic differences caused by differential gene expression. Although codon usage directly affects phenotypes by altering gene expression, common phylogenomic approaches typically ignore the subtle influences of codon usage biases when recovering a phylogeny. Common phylogenomic approaches are described below.
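To make the notion of unequal synonymous codon usage concrete, the following minimal sketch counts in-frame codons in a coding sequence and reports relative synonymous codon usage (RSCU), defined here as the observed count of a codon divided by the count expected if all codons for that amino acid were used equally. RSCU is a commonly used summary that is not named in the text above, and the function names and the use of the standard genetic code are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def rscu(cds):
    """Return RSCU per sense codon for amino acids present in the sequence."""
    counts = codon_counts(cds)
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids that never occur
        for c in codons:
            # Observed count over the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values
```

A value above 1 indicates a codon used more often than expected under equal usage, and a value of 0 indicates a synonymous codon that is never used in the sequence.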

Overview of Common Phylogenomic Techniques that do not Utilize Codon Usage Biases

Homologous characters are often identified by aligning orthologous gene sequences and identifying character state changes of amino acid residues or nucleotides that are then used to recover a tree topology. This multi-step process is time-consuming and requires significant data preprocessing (e.g., orthologous gene annotations). Non-homologous sequence comparisons have also been explored in alignment-free methods and will subsequently be discussed.

Ortholog identification

Orthologs are genes within two or more species that usually share the same function because they are derived from the same ancestral gene in the most recent common ancestor [10]. In contrast, paralogs and xenologs may share the same function, but arise from gene duplication and horizontal gene transfer, respectively. Paralogs may not be under the same evolutionary pressures and should not be compared in a direct positional alignment because these comparisons are often a poor indicator of phylogenetic relationships [10]. An in-depth evaluation of ortholog identification techniques is presented by Tekaia [11]. Once an ortholog is identified, phylogenetic studies typically require a multiple sequence alignment to align homologous characters. Reviews of some common multiple sequence aligners such as T-Coffee [12], MUSCLE [13], Clustal [14], Clustal Omega [15], and MAFFT [16] can be examined elsewhere [17,18].

Recovering the phylogenetic tree

Maximum parsimony: Maximum parsimony assumes that each character is equally important and minimizes the number of character state changes to recover the relatedness of species. Proponents of parsimony point to its explanatory power and ability to minimize ad hoc hypotheses [19]. However, parsimony can be misleading if unequal evolutionary rates between lineages exist because longer evolutionary branches have a tendency to form monophyletic groups even if the species have different phylogenetic histories [20]. PAUP [21] and TNT [22] are two popular software packages to identify phylogenies based on parsimony.
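The idea of minimizing character state changes can be illustrated with Fitch's small-parsimony algorithm, which counts the minimum number of changes a fixed tree requires for each character. This is a minimal sketch and not how PAUP or TNT are implemented; the nested-tuple tree representation and the function names are assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    `tree` is a rooted binary tree written as nested 2-tuples of leaf names,
    and `states` maps each leaf name to its character state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state, so no change is needed.
        return shared, left_cost + right_cost
    # The children disagree, so one state change is required at this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over every column, weighting characters equally."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(length)
    )
```

Under this sketch, the preferred topology among a set of candidates is the one with the lowest score; searching the space of topologies is a separate problem that is not shown here.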

Maximum likelihood: Maximum likelihood requires specific models of evolution that show the probability of character state changes and can be used in the likelihood function. Maximum likelihood calculates the probability of obtaining the data given the model and tree topology. One of the main reasons that maximum likelihood estimates have gained traction is the mathematical property of consistency, which states that as more data (i.e., phylogenetically informative characters) are added, the likelihood function will converge to the correct tree, assuming the underlying model is correct [23,24]. Furthermore, maximum likelihood takes into account more complex modeling of datasets, and the modeling has become more computationally tractable through faster algorithmic design and faster computer processors [25]. However, in contrast to maximum parsimony, maximum likelihood is more likely to separate highly divergent species, leading to long branch repulsion [26]. MEGA X [27], RAxML [28], IQ-TREE [29] and PHYLIP [30] are commonly used to recover phylogenies using maximum likelihood.
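As a small worked instance of computing the probability of the data given a model, the sketch below evaluates the log-likelihood of two aligned sequences separated by a single branch under the Jukes-Cantor (JC69) substitution model. JC69 is chosen here only because it is the simplest nucleotide model; it is an assumption for illustration and not a model singled out by the text, and real software evaluates far richer models over full trees.

```python
from math import exp, log


def jc69_log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of two aligned sequences at branch length t (t > 0).

    Under JC69 a site stays the same with probability 1/4 + 3/4 * e^(-4t/3)
    and becomes any one specific other base with 1/4 - 1/4 * e^(-4t/3).
    Base frequencies are all 1/4.
    """
    decay = exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    total = 0.0
    for x, y in zip(seq_a, seq_b):
        total += log(0.25 * (p_same if x == y else p_diff))
    return total


def jc69_distance(seq_a, seq_b):
    """Closed-form maximum likelihood branch length under JC69.

    Valid only when the proportion of differing sites is below 0.75.
    """
    p = sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)
    return -0.75 * log(1.0 - 4.0 * p / 3.0)
```

For two sequences, the branch length returned by `jc69_distance` is the value of `t` at which `jc69_log_likelihood` peaks, which gives a quick way to check the two functions against each other.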

Bayesian inference: Bayesian phylogenetic estimates use posterior probabilities of a distribution of trees calculated with Markov Chain Monte Carlo (MCMC) techniques to evaluate tree probabilities. Bayesian inference adds statistical support to phylogenies and produces more accurate trees in simulations. However, Bayesian inference is highly sensitive to prior probabilities [31]. How Bayesian techniques compare to other phylogenetic methods is addressed by Yang and Rannala [32], and popular Bayesian techniques are implemented in MrBayes [33,34] and BEAST2 [35].

Distance-based and alignment-free: Distance-based phylogenies use techniques such as neighbor-joining to quickly produce relatively good trees that are often used as a starting point for phylogenetic analyses using other methods. Neighbor-joining decomposes a star tree by taking the two closest taxa based on the number of character changes between them, pairing the taxa together to form a new node, recalculating weights based on the shortest distance between the new node and all other species (or nodes), and repeating this process until all taxa are paired. Although this technique is computationally fast, compressing the sequences into distances loses information and phylogenetic reliability is difficult to ascertain from highly divergent sequences [36]. However, distance-based methods are frequently used when sequence alignments are not available or in whole genome comparisons. Since genome assembly and multiple sequence alignments affect phylogenies more than the algorithm used to recover the phylogeny, alignment-free methods attempt to recover shared phylogenetic history without an alignment by comparing basic characteristics of genomes (e.g., GC content, k-mer counts, and codon usages) [37]. Broadly, alignment-free approaches can be classified into three main groups. The first group analyzes the frequency of words with a certain length (e.g., FFP [38, 39] and CV Tree [40]). The second group matches lengths of overlapping sequences (e.g., ACS [41], KMACS [42], and Kr [43]). The last group calculates informational content between sequences (e.g., Co-phylog [44], FSWM [45], andi [46], CAM [47], and codon pairing [48]). These techniques are still being developed, and new software packages are updated to recover more robust trees.
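The first group of alignment-free approaches, which compares word frequencies, can be sketched in a few lines. The example below builds a normalized k-mer profile for each sequence and compares profiles with a Euclidean distance. It is not the method implemented by FFP or CV Tree; the choice of k, the distance measure, and the function names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Return the relative frequency of every overlapping k-mer in a sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def profile_distance(p, q):
    """Euclidean distance between two k-mer frequency profiles."""
    keys = set(p) | set(q)
    return sqrt(sum((p.get(x, 0.0) - q.get(x, 0.0)) ** 2 for x in keys))


def distance_matrix(genomes, k=3):
    """Pairwise profile distances, keyed by (name, name)."""
    profiles = {name: kmer_profile(seq, k) for name, seq in genomes.items()}
    names = sorted(profiles)
    return {
        (a, b): profile_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }
```

The resulting matrix could be passed to any distance-based tree builder, such as neighbor-joining, without ever producing an alignment.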

Assessing the phylogenetic tree: Bootstrapping is a common technique to assess the robustness of a phylogeny by randomly sampling characters with replacement and determining the extent to which the recovered phylogenetic tree changes. Proponents of bootstrapping point to its ability to uncover the phylogenetic signal under the noise of phylogenetically uninformative characters. Bootstrapping also has statistical properties that allow a confidence value to be placed on clades [49]. On the other hand, critics of bootstrapping (and phylogenomic algorithms in general) point to the statistical assumptions that are violated in DNA characters because DNA characters cannot be considered independently and identically distributed [49]. Furthermore, a bootstrap proportion is generally unbiased but highly imprecise, meaning the bootstrap number can give high confidence that the data support a clade even if the clade is not real [50].
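The resampling step behind bootstrapping can be sketched as follows. Alignment columns are drawn with replacement, a tree is rebuilt from each pseudo-replicate, and support for a clade is the fraction of replicates that recover it. The `build_tree` argument is a hypothetical callable returning a set of clades, and `closest_pair_clade` is only a toy stand-in for real tree inference; both are assumptions made for illustration.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in picks) for taxon, seq in alignment.items()}


def closest_pair_clade(alignment):
    """Toy tree builder: report the two taxa with the fewest mismatches as a clade."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]

    def mismatches(pair):
        return sum(x != y for x, y in zip(alignment[pair[0]], alignment[pair[1]]))

    return {frozenset(min(pairs, key=mismatches))}


def bootstrap_support(alignment, build_tree, clade, replicates=100, seed=0):
    """Fraction of bootstrap replicates in which `clade` is recovered."""
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = 0
    for _ in range(replicates):
        hits += target in build_tree(resample_columns(alignment, rng))
    return hits / replicates
```

Because every column is resampled independently, this procedure inherits the independence assumption that critics of bootstrapping point to.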

Biological Construct of Codon Usage Bias

Phylogenomic studies have recently used codon usage biases to recover species relationships with or without ortholog annotations. Various codon usage biases appear to track speciation events and can cause gene expression to either increase or decrease [51]. Furthermore, codon usage biases affect protein and RNA folding, which impacts transcription and translational efficiency, as well as gene expression. Although genetic drift drives global codon usages, the majority of codon usage biases within individual genes is influenced by translational selection [52]. Figure 1 outlines how codon biases affect protein levels.

Figure 1. How Codon Usage Biases Affect Protein Levels. Many types of codon usage biases directly affect DNA, RNA, and protein secondary structure. They also affect transcription and translational efficiency. The mechanisms by which ramp sequences, codon pairing, tRNA competition, and the GC nucleotide composition affect protein levels are depicted.

Codon usage metrics

Originally, the Codon Adaptation Index was used to compare the relative codon usage of the most commonly used codons within highly expressed genes [6]. This metric was soon replaced by the effective number of codons, which quantified the difference in codon usage versus the expected usage if all synonymous codons were used equally [53]. Because of their simplicity, the effective number of codons and codon adaptation index are still widely used techniques. However, those methods oversimplify the dynamics of codon usage. The tRNA adaptation index (tAI) takes into account the complex relationship between tRNA and codons by using tRNA copy number, gene length, number of codons, and the preponderance of tRNA wobble to determine codon optimality [54,55]. Building on tAI, the normalized translational efficiency (nTE) measurement balances tRNA supply and demand on codon usage and considers cellular tRNA dynamics. A codon is considered "optimal" if the relative supply of its cognate tRNAs exceeds the codon's usage [56]. Unfortunately, tAI and nTE require data that are not always available in a species and can vary between individuals and cell types, limiting their use across the Tree of Life.
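A minimal sketch of the Codon Adaptation Index follows. Each codon receives a relative adaptiveness weight equal to its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon, and the index for a gene is the geometric mean of the weights of its codons. The pseudocount of 0.5 for codons absent from the reference set, the exclusion of single-codon amino acids, and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a coding sequence into in-frame codons, dropping a partial tail."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight each codon by its usage relative to the top synonymous codon."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    weights = {}
    for aa, codons in families.items():
        if aa == "*" or len(codons) == 1:
            continue  # stop codons, Met, and Trp offer no synonymous choice
        adjusted = {c: counts[c] or pseudocount for c in codons}
        top = max(adjusted.values())
        for c in codons:
            weights[c] = adjusted[c] / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of the scored codons in a gene."""
    logs = [log(weights[c]) for c in split_codons(cds) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")
```

A gene built only from the most used codon of each amino acid scores 1, and the score falls as rarer synonymous codons are substituted.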

Biological implications of codon usage bias

Selection toward decreased translational efficiency: Occasionally, suboptimal codons are beneficial to cells because they slow the ribosome (or polymerase) and allow for more precise, deliberate gene translation (or transcription). Codon usage biases affect mRNA secondary structure so strongly that local mRNA secondary structure can be used to predict codon usage in highly expressed genes [57]. Highly expressed genes also have a ramp of 30-50 slowly-translated, rare codons at the 5' end of most protein coding sequences [58] that serves to evenly space ribosomes [59] and reduce mRNA secondary structure [60] at translation initiation. These ramp sequences are population-specific and can also have disease implications [61]. A comprehensive analysis of ramp sequences from all domains of life, as well as a method to extract ramp sequences from individual genes is presented in Miller, Brase [62].

Additionally, the cell cycle impacts codon choice for suboptimal codons. Since tRNA expression levels are highest during the G2 phase, suboptimal codon usage for genes expressed during this phase is also highest. The G1 phase has the lowest tRNA expression, and genes expressed during G1 have a tendency toward optimal codon usage [63].

Codon usage biases in various bacteria are associated with species lifestyle [64,65]. For cyanobacteria (photosynthetic bacteria), selection toward sub-optimal codon usage produces the circadian clock conditionality, where the circadian clock is expressed only under certain environmental conditions where cyanobacteria are not intrinsically robust [66]. Similarly, the pathogenicity and habitat of Actinobacteria (High GC gram positive bacteria important for soil systems) also influence codon usage, where aerobic species vary significantly from anaerobic species, and pathogenic species vary significantly from non-pathogenic species [67]. In each case, codon usage alone explains bacterial adaptation to their environment.

Selection toward increased translational efficiency: Highly expressed genes tend to use more optimal codons after the ramp sequence to increase overall gene expression because once ribosomes (or polymerases) are evenly spaced, they can translate optimal codons more efficiently [51]. Faster translation is due to decreased wobble interactions, increased optimal tRNA composition, and decreased competition from synonymous codons within a gene [68]. Selective pressures for protein expression also act on mRNA sequences to optimize co-translational folding within polypeptides in over 90% of high expression genes and about 80% of low expression genes [56]. Furthermore, gene body methylation is strongly correlated with codon usage bias and appears to systematically replace CpG bearing codons, potentially influencing optimal codon establishment [69].

Recharging a tRNA while the ribosome is still attached to the mRNA strand is another strategy used to increase translational efficiency and decrease overall resource utilization. Co-tRNA codon pairing occurs when two non-identical codons that encode the same amino acid are located in close proximity to each other in a gene. Identical codon pairing occurs when identical codons are located in close proximity in a gene sequence. Co-tRNA and identical codon pairing are mechanisms to reuse a tRNA by recharging the tRNA with an amino acid before it diffuses from the ribosome, increasing translational speed by approximately 30% [70]. Although co-tRNA codon pairing occurs more prominently in eukaryotes and identical codon pairing occurs prominently in bacteria [71] and archaea [72], both co-tRNA and identical codon pairing are phylogenetically conserved in all domains of life [48].
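Following the definitions above, identical and co-tRNA codon pairing can be detected by scanning a gene for codons that recur, or for synonymous partners that occur, within a fixed number of downstream codons. This is a minimal sketch: the window of nine codons stands in for "close proximity" and is an assumption, as are the function name and the use of the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_pairing(cds, window=9):
    """Return (codons with identical pairing, amino acids with co-tRNA pairing)."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    identical, co_trna = set(), set()
    for i, first in enumerate(codons):
        aa = CODON_TABLE.get(first)
        if aa is None or aa == "*":
            continue  # skip ambiguous codons and stop codons
        for second in codons[i + 1:i + 1 + window]:
            if second == first:
                identical.add(first)  # the same codon is reused nearby
            elif CODON_TABLE.get(second) == aa:
                co_trna.add(aa)  # a different codon for the same amino acid
    return identical, co_trna
```

The two returned sets can be treated as presence or absence characters for a gene, which is one way such pairing could feed a parsimony or alignment-free comparison.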

Other systematic biases also influence codon choice. Background dinucleotide substitution biases from GC to AT and AT to GC often coincide with shifts in optimal codons [73]. Even under sustained selective pressure, GC content at the third codon position is highly correlated with overall GC content in a gene, suggesting that optimal codons are affected by genomic GC content [73]. In an analysis of 65 eukaryotes and prokaryotes, GC content accounted for 76.7% of amino acid variation [74]. A summary of mechanisms that affect codon usage bias is shown in Table 1.

Table 1: Mechanisms affecting codon usage biases.

Name | Location/Domain | Description
Ramp Sequence | First 30-50 codons downstream of the start codon | The ramp sequence consists of rare, slowly translated codons that increase ribosomal spacing, reduce mRNA secondary structure, and slow initial translation.
Co-tRNA Codon Pairing | More prominent in eukaryotes. Phylogenetically conserved in all domains of life | tRNA are recharged with amino acids for synonymous codon translation when synonymous codons are in close proximity to each other. Recharging allows the tRNA to stay attached to the ribosome and significantly increases translation efficiency.
Identical Codon Pairing | All domains of life | tRNA are recharged with amino acids for identical codon translation when identical codons are in close proximity to each other. Recharging allows the tRNA to stay attached to the ribosome and significantly increases translation efficiency.
tRNA Competition | Eukarya, bacteria, and archaea | Cognate, near-cognate, and non-cognate tRNA may attempt to bind to an mRNA codon. If relatively few cognate tRNA are available, translation will slow because other tRNA attempt to bind to the same codon. This process is essential for translation elongation, efficiency, and accuracy [75].
GC Content | All domains of life | Overall GC content in a gene is highly correlated with GC content at the third codon position. GC content influences over two-thirds of codon variation.
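The two quantities compared in the GC content row, overall GC content and GC content at the third codon position (often abbreviated GC3), can be computed per gene as in this minimal sketch. The function names are assumptions made for illustration, and the sequence is assumed to be an in-frame coding sequence.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def gc3_fraction(cds):
    """GC fraction restricted to the third position of each complete codon."""
    # Slicing from index 2 in steps of 3 visits only third codon positions.
    return gc_fraction(cds.upper()[2::3])
```

Computing both values for every gene in a genome and correlating them is one simple way to examine the relationship described in the table.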

Codon Usage Bias in Phylogenetic Systematics

Codon usage biases are less likely to be affected by random mutations than expected based on genomic mutation rates because codons often reside in conserved genomic regions [76]. Therefore, random mutations appear to play less of a role in phenotypic variation caused by codon usage, and the extent to which codon usage can be used in phylogenomics is currently being explored.

Codon usage in maximum likelihood

Limited codon substitution models have been used for decades in maximum likelihood estimates. However, until recently, a full 61 x 61 codon matrix was too computationally intensive to apply to more than a few species and genes [77]. Somewhat surprisingly, after a 61 x 61 codon matrix became computationally viable, it was determined that the full matrix is not always optimal because models that use a fixed codon mutation rate for phylogenetic tree reconstruction fit the data better than a variable codon substitution rate. The apparent variation in codon substitution is actually caused by variable selection against amino acid substitutions in the regions used to develop the model, specifically mitochondria, chloroplast, and hemagglutinin proteins [78]. Maximum likelihood estimates that use codon models outperform a parsimony analysis only when codon usage is highly skewed and is not affected by asymmetry in substitution rates (approach validated using Drosophila) [79].

Because full codon models are computationally intensive and do not always elucidate more information than simpler models, common likelihood approaches use nonsynonymous to synonymous mutation rates per site (dN/dS) instead of the complete codon model. If the codon usage bias is strongly conserved, then dS will decrease and dN/dS will increase within a population. The dN/dS ratio was used in Drosophila lineages, and helped determine that the Notch locus had evolved to include suboptimal codons [80]. Using 158 orthologous genes, maximum likelihood also detected a strong shift from suboptimal to optimal codons in two lineages of Populus [81]. Detecting the cause of such shifts in codon usage is important for determining the biological significance of mutations. SCUMBLE (Synonymous Codon Usage Bias Maximum Likelihood Estimation) uses a model inspired by statistical physics to identify different sources of codon bias including selection and mutation [82]. SCUMBLE is also used as a filter to identify regions with insufficient information for analysis. This technique helped determine that natural selection shaped codon biases in Strongylocentrotus purpuratus (purple sea urchin) by limiting the analysis to only regions with sufficient support [83]. Shifts in mutation and selection rates allow the evolutionary history of species to be recovered using this method.
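The raw ingredient of dN/dS is the distinction between codon differences that change the encoded amino acid and those that do not. The sketch below classifies differing codons between two aligned, gap-free coding sequences. It is deliberately not a dN/dS estimator, because it does not normalize by the number of synonymous and nonsynonymous sites or correct for multiple substitutions; the function name and the use of the standard genetic code are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(cds_a, cds_b):
    """Count differing codons as (synonymous, nonsynonymous)."""
    synonymous = nonsynonymous = 0
    shared_length = min(len(cds_a), len(cds_b))
    for i in range(0, shared_length - 2, 3):
        a = cds_a[i:i + 3].upper()
        b = cds_b[i:i + 3].upper()
        if a == b or a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # identical or ambiguous codons are not counted
        if CODON_TABLE[a] == CODON_TABLE[b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous
```

Strong conservation of a codon usage bias would appear here as fewer synonymous differences than expected, which is the pattern that lowers dS.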

Violations of maximum likelihood statistical properties in a codon model

Many assumptions of the statistical properties in maximum likelihood are violated by a codon model. For instance, species are constrained to taxon-specific pools of tRNA, and triplets in coding sequences are not independent. Algorithms with statistical properties that require character independence, such as maximum likelihood, violate that rule for genetic data [84]. Furthermore, the codon model assumption of homogeneity of codon composition leads to seriously biased phylogenetic estimations when that assumption is violated [85].

Horizontal gene transfer is another important mechanism in evolution and complicates phylogenetic analyses in bacteria because 81±15% of genes have been laterally transferred among bacteria at some point in their evolutionary history [86]. Common transposable elements in eukaryotes also arose from horizontal gene transfer, with over 50% of some mammalian genomes originally arising from horizontal gene transfer [87]. Detecting horizontal gene transfer has been challenging, and codon bias is a poor indicator of horizontal transmission, normally underestimating the effects of lateral transfer [88-90]. However, codon composition is an excellent indicator of whether a gene will become fixed in a species after a lateral transfer event [90]. The concept of horizontal gene transfer not only complicates a general phylogenetic analysis, but suggests that a standard bifurcating tree might not be the best choice in analyses of bacteria or archaea [91]. Although it is known that codons (and DNA in general) do not strictly follow many of the assumptions of phylogenetic analyses, the bifurcating tree is still the most widely used phylogenetic representation, and generally depicts statements of homology even when some assumptions are violated.

Codon usage in viruses

Phylogenies have also been used to predict the pathogenicity of viruses and viral interactions with their hosts. Bee-infecting viruses have strong correlations in their codon usages with their hosts, and the infected insects' codon usage similarity follows the insect phylogeny [92]. Furthermore, human-host viruses tend to share the same codon usages as proteins expressed in tissues that the viruses infect [93]. More specifically, the key determinants of codon patterns within herpes viruses were the overall GC content, GC content at the third codon position, and gene length [94]. In contrast, mutation played a larger role in Zika viruses, with higher frequencies of A-ending codons [95]. However, evidence of natural selection in Zika viruses also suggests that they evolved host- and vector-specific codon usage patterns to successfully replicate in various hosts and vectors [96]. In hepatitis C, preferred codon usages did not always match the phylogenetic histories of the viruses as determined by sequence similarity, indicating that codon usage might provide additional information not identified by common phylogenomic approaches [97].

Successful implementations of codon usage bias in phylogenetics

Beyond analyzing pathogenicity, phylogenetic inferences using codon usage biases from all domains of life have successfully uncovered several interesting biological principles. One study found compositional differences in codon usage between monocots (i.e., flowering plants whose seeds contain one embryonic leaf) and dicots (i.e., flowering plants whose seeds contain two embryonic leaves), where monocots had lower DNA background compositional bias, but higher codon usage bias than dicots [98]. Another technique used a distance-based clustering method of codon usage weighted by nucleotide base bias per position (i.e., the frequency of a codon over the product of the frequency of the nucleotide at the first, second, and third positions) to recover the phylogeny of closely related Ectocarpales (brown algae) [99]. The phylogenetic signal of codon usage was not limited to nuclear DNA, and mitochondrial synonymous codon usage in plants was associated with intron number that mirrored species evolution [100].
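The weighting described in the parenthetical above, the frequency of a codon divided by the product of the frequencies of its nucleotides at the first, second, and third codon positions, can be written out directly. This minimal sketch implements only that ratio for a single coding sequence and is not a reconstruction of the clustering method used in the cited study; the function name is an assumption made for illustration.

```python
from collections import Counter


def position_weighted_usage(cds):
    """Codon frequency over the product of its per-position base frequencies."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n = len(codons)
    codon_counts = Counter(codons)
    # One counter per codon position: which base occurs there, and how often.
    position_counts = [Counter(codon[i] for codon in codons) for i in range(3)]
    usage = {}
    for codon, count in codon_counts.items():
        expected = 1.0
        for i in range(3):
            expected *= position_counts[i][codon[i]] / n
        usage[codon] = (count / n) / expected
    return usage
```

A ratio above 1 indicates a codon that is more common than its nucleotide composition alone would predict, which is the signal a distance-based clustering step could then compare between species.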

Creative attempts at analyzing codon usage have also proven fruitful. A binary representation of codon aversion (i.e., creating a character matrix based on codons which are not used in an ortholog) successfully recovered the phylogeny of various tetrapods, showing that complete codon aversion is also conserved [101]. That study also found that stop codon usage had the highest phylogenetic signal [101], meaning a codon matrix of 64 x 64 (the probability of all codons including the stop codons transitioning to all other codons) might be better than the traditional 61 x 61 codon matrix in a likelihood framework. Codon aversion has also been used in an alignment-free context by comparing sets of codon tuples found in a genome, where each tuple is a list of codons not used in a gene [47]. A similar technique found that codon pairing (i.e., the same codon being used within a ribosomal window) is phylogenetically informative under both alignment-free and parsimony frameworks [48].
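The binary representation of codon aversion described above can be sketched as a character matrix with one row per species and one column per codon, where 1 marks a codon that is never used in that species' copy of an ortholog. This is an illustration of the encoding only and not the cited studies' software; the function names and the inclusion of all 64 codons are assumptions made for illustration.

```python
from itertools import product

ALL_CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def averted_codons(cds):
    """Return the codons that never appear in frame in a coding sequence."""
    cds = cds.upper()
    used = {cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)}
    return frozenset(codon for codon in ALL_CODONS if codon not in used)


def aversion_matrix(orthologs):
    """Map each species to a 64-column row of 0/1 codon aversion characters."""
    matrix = {}
    for species, cds in orthologs.items():
        averted = averted_codons(cds)
        matrix[species] = [int(codon in averted) for codon in ALL_CODONS]
    return matrix
```

Each column of the matrix can then be treated as a binary character in a parsimony analysis, and the frozen set returned by `averted_codons` corresponds to the kind of per-gene tuple used in the alignment-free comparison.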

Other studies map codon usage in a particular gene across a reference phylogeny. This technique can produce meaningful representations of codon transitions across genes. Mapping the codon usage bias of a gene tree to a species tree revealed purifying selection among the actin-depolymerizing factor/cofilin (ADF/CFL) gene family [102]. This technique also showed that codon usage is significantly correlated with gene age within metazoan genomes [103]. Codon aversion in all domains of life was also mapped to the Open Tree of Life (OTL) [104] and showed that codon aversion follows established species relationships more closely than expected by random chance [105].

Contradictory Signals

At times, codon usage dynamics have contradictory signals that indicate different evolutionary histories. For instance, three studies by Miller, McKinnon, and colleagues [47,48,105] used the same dataset to conclude, respectively, that codon aversion can be used in an alignment-free algorithm, that codon pairing can recover phylogenies using either parsimony or alignment-free techniques, and that codon aversion is largely conserved within orthologs across the Tree of Life. However, the reported trees from those three studies vary significantly from each other (Figure 2), indicating codon aversion and codon pairing do not have the same evolutionary constraints. Even using the same codon usage bias, the model used to recover the phylogeny produced contradictory results, with recovered phylogenies differing by 10-45%. Therefore, gene selection appears to play a pivotal role in recovering the species tree, and more work needs to be done to identify which genes have the highest phylogenetic signal under each codon model. Perhaps a combination of different codon usage biases, or using certain biases in only highly expressed genes, may more adequately track speciation.
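One simple way to quantify how much two recovered phylogenies agree is to compare the clades they contain. The sketch below reports the percentage of non-root clades shared by two rooted trees over the same taxa. The exact definition of percent branch overlap used for Figure 2 is not given in the text, so this definition, the nested-tuple tree representation, and the function names are all assumptions made for illustration.

```python
def clades(tree, found=None):
    """Return (leaves under this node, set of clades at all internal nodes)."""
    if found is None:
        found = set()
    if isinstance(tree, str):
        return frozenset([tree]), found
    leaves = frozenset().union(*(clades(child, found)[0] for child in tree))
    found.add(leaves)
    return leaves, found


def percent_branch_overlap(tree_a, tree_b):
    """Percentage of non-root clades that appear in both trees."""
    root_a, clades_a = clades(tree_a)
    root_b, clades_b = clades(tree_b)
    # The root clade holds every taxon, so it is shared trivially and dropped.
    clades_a = clades_a - {root_a}
    clades_b = clades_b - {root_b}
    union = clades_a | clades_b
    return 100.0 * len(clades_a & clades_b) / len(union) if union else 100.0
```

Under this definition, two identical topologies score 100, and the score drops as the trees disagree about which taxa group together.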

Figure 2. Difference in Percent Branch Overlap of Seven Phylogenies Recovered Using Codon Pairing or Codon Aversion. Mean differences in percent branch overlap are marked with an 'X', median differences for each taxonomic group are marked with a horizontal line, and outliers are individually displayed on the box plot.

Conclusion

Codon usage biases continue to be widely studied in a phylogenetic construct. However, their application in phylogenomics remains limited by their incorporation in current phylogenomic techniques. While some applications attempt to include codon usage biases either as a singular character state in parsimony or in combination with the overall maximum likelihood model, many key attributes of codon biases remain unexplored. For instance, the cause of differing phylogenetic signals between codon aversion and codon pairing has yet to be identified. Additionally, although it is known that tRNA supply and demand is correlated to codon usage, a model does not currently exist to assess tRNA supply and demand in a maximum likelihood framework. Future codon analyses will necessitate more complete datasets with accurate tRNA expression values in different tissues and species. A more robust dataset of tRNA expression values would also facilitate more precise codon modeling. Furthermore, since codons are used to regulate gene translational efficiency, codon models might require gene expression data in addition to the full (or reduced) codon matrix, and some codon usage biases may track speciation only within certain genes.

Codon usage bias is an exciting biological principle that has not been fully utilized in phylogenetic systematics. Few likelihood methods incorporate specific codon usage biases in their models beyond nucleotide substitution rates, and many aspects of the ramp sequence, co-tRNA codon pairing, gene expression, and tRNA expression remain unknown. Although codon usage biases have been shown to be phylogenetically conserved, many of the biological principles surrounding codon usage bias have yet to be fully utilized in phylogenomics. Therefore, including specific codon usage biases in phylogenomic algorithms and identifying the gene-specific biological implications of each codon usage bias will enable future phylogenomic studies to identify more robust phylogenetic trees and aid in understanding nuanced phylogenetically conserved mechanisms affecting gene expression and overall species fitness.

References
