Nucleotide composition is considered one of the most important factors shaping codon usage among genes and genomes, with GC-content reflecting the overall trend of codon mutation. Across the 8,620 T. multiceps genes investigated, the average GC-content was 49.27% (slightly below the average AT content), while the average GC3 content was slightly higher at 51.43%. These results are consistent with the GC and AT contents of Giardia lamblia and T. saginata.
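As an illustration of the two quantities reported above, the following minimal sketch computes overall GC-content and GC3 for a single coding sequence. The function names and the toy sequence are hypothetical, and the sketch counts every third position; the GC3s variant, which excludes Met, Trp and stop codons, is not implemented here.

```python
def gc_content(seq):
    """Fraction of G and C bases over the whole coding sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(seq):
    """Fraction of G and C bases at third codon positions (all codons counted)."""
    third = seq.upper()[2::3]  # every third base, starting at codon position 3
    return sum(base in "GC" for base in third) / len(third)


# Toy in-frame sequence (hypothetical): ATG GCC AAG TTC
cds = "ATGGCCAAGTTC"
print(gc_content(cds), gc3_content(cds))
```

Averaging these per-gene values over all genes would give figures comparable to the 49.27% and 51.43% quoted above, assuming the same definitions were used.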
The average effective number of codons (ENc) among the T. multiceps genes was 56.68, with only two genes showing a strong CUB (ENc < 36). This indicates near-random codon usage in T. multiceps, with no strong codon bias, in accordance with the pattern in B. mori. Furthermore, more than half of the high-frequency codons ended with G/C (18/32); this phenomenon has been found in many other GC-rich organisms, including bacteria, archaea, fungi, wheat and rice [40–43].
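For readers unfamiliar with ENc, the sketch below follows Wright's commonly used formulation, in which a homozygosity value F is computed for each amino acid and averaged within each degeneracy class. It is a simplified illustration, not necessarily the procedure used to obtain the values above: the standard genetic code is assumed, amino acids seen fewer than twice are skipped, and a missing degeneracy class raises an error instead of being estimated.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G); '*' marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group sense codons into synonymous families keyed by amino acid.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def effective_number_of_codons(seq):
    """ENc of one in-frame coding sequence (simplified Wright estimator)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    homozygosity = defaultdict(list)  # degeneracy class -> F values
    for codons in FAMILIES.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp give the constant 2; F is undefined for n < 2
        f = (n * sum((counts[c] / n) ** 2 for c in codons) - 1) / (n - 1)
        if f > 0:
            homozygosity[k].append(f)
    enc = 2.0  # contribution of the two single-codon amino acids
    for k, n_families in ((2, 9), (3, 1), (4, 5), (6, 3)):
        values = homozygosity.get(k)
        if not values:
            raise ValueError(f"no usable {k}-fold amino acids in sequence")
        enc += n_families / (sum(values) / len(values))
    return min(enc, 61.0)  # values above 61 are conventionally capped


# Extreme bias: every amino acid always uses a single codon.
biased = "".join(codons[0] * 3 for codons in FAMILIES.values())
print(effective_number_of_codons(biased))  # 20.0
```

The estimator ranges from 20 (one codon per amino acid) to 61 (all synonymous codons used equally), which is why an average of 56.68 is read as weak bias.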
CUB is a complex evolutionary phenomenon known to exist in a wide variety of organisms, including prokaryotes, as well as unicellular and multicellular eukaryotes. Numerous hypotheses have been proposed to explain this phenomenon, including the neutral theory and the selection-mutation-drift balance model. The number of factors reported to affect CUB is increasing, with gene length, GC-content [46, 47], recombination rate [46, 48–50], and gene expression level [45, 48, 51] shown to exert influences. Other studies have shown that RNA and protein structure [29, 52–54], intron length, population size, evolutionary age of the genes, and environmental stress, in addition to the hydrophobicity and the aromaticity of the encoded proteins [59, 60], are influencing factors. In this study, various factors such as gene compositional constraints, mutation pressure, gene expression level and, in particular, natural selection were all found to contribute to shaping the codon usage of T. multiceps. Other factors, such as hydrophobicity and aromaticity of the encoded proteins, were implicated in generating the CUB of T. multiceps, while our analysis indicated that amino-acid selection also affects translational efficiency of T. multiceps.
Base changes in the first and second positions of the codon lead to changes in the encoded amino-acid sequence, while changes at the third codon position rarely induce such sequence variation. It is generally acknowledged that the third codon position is subject to lower selection pressure than the first and second codon positions. Thus, ENc-GC3s correlation analysis, PR2 bias plot analyses and neutrality plot analysis based on GC3 or GC3s are vitally important for elucidation of the CUB patterns in many organisms.
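Two of the third-position analyses named above reduce to short formulas, sketched here under stated assumptions. `expected_enc` is the standard null curve of an ENc-GC3s plot, giving the ENc expected when codon choice depends on GC3s alone. `pr2_coordinates` returns the two axes of a PR2 plot; it assumes the caller has already extracted the third-position bases, usually from four-fold degenerate codon families. Both function names are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENc when codon usage is determined by GC3s alone."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def pr2_coordinates(third_bases):
    """Return (G3/(G3+C3), A3/(A3+T3)) for a string of third-position bases.

    Assumes both G+C and A+T are non-zero in the input.
    """
    s = third_bases.upper()
    g, c, a, t = (s.count(base) for base in "GCAT")
    return g / (g + c), a / (a + t)


print(expected_enc(0.5))            # 60.5
print(pr2_coordinates("GGCAATTT"))  # G3 vs C3 and A3 vs T3 imbalance
```

Genes falling well below the null curve, or away from the (0.5, 0.5) centre of the PR2 plot, are the ones usually interpreted as shaped by factors other than mutation alone.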
ENc-GC3s correlation analysis showed that mutation plays a minor role in shaping CUB in T. multiceps, while other factors, such as natural selection, exert significant effects on CUB in this species. Additionally, correlation analysis indicated that the CUB of ribosomal genes was shaped mainly by mutations, while essential genes were affected mainly by natural selection. Further evidence in support of this conclusion was provided by the PR2 bias plot analyses, which also indicated that selection is the major factor that shapes CUB in T. multiceps. ENc plots provide a method of quantifying the CUB of synonymous codons; however, this analysis alone is insufficient for determining the exact contributions of natural selection and mutational pressure to CUB within a species [35, 61]. In this study, we generated a neutrality plot to provide more precise information on this issue. According to the neutrality plot, directional mutation pressure accounts for only 11.04% of the effect, while other factors, such as natural selection, account for 88.96% [34, 35]. Therefore, natural selection appears to be the major factor affecting the codon usage variation in T. multiceps. These results are similar to those obtained in investigations of B. mori.
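In a neutrality plot, GC12 (the mean GC-content of the first and second codon positions) is regressed on GC3 across genes, and the slope is conventionally read as the relative contribution of mutation pressure. Assuming the percentages above were obtained this way, a slope of 0.1104 would correspond to the reported 11.04%, with the remaining 88.96% attributed to other factors. The sketch below fits that regression by ordinary least squares; the function name and the toy data are hypothetical and are not the study's values.

```python
def neutrality_regression(gc3, gc12):
    """Least-squares slope and intercept of GC12 regressed on GC3."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy per-gene values (hypothetical), constructed to lie on a line of slope 0.1
gc3_values = [0.30, 0.40, 0.50, 0.60, 0.70]
gc12_values = [0.44, 0.45, 0.46, 0.47, 0.48]
slope, intercept = neutrality_regression(gc3_values, gc12_values)
print(f"mutation pressure ~ {slope:.1%}, other factors ~ {1 - slope:.1%}")
```

A slope near 1 would indicate that all three codon positions drift together under mutation pressure, whereas a slope near 0 indicates that GC12 is constrained independently of GC3.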
Natural selection can enhance the efficiency of transcription/translation through preferential usage of alternative synonymous codons. Studies of Drosophila and Caenorhabditis revealed that significant codon usage bias existed in highly expressed genes, owing to the increased efficiency and accuracy of translation conferred by preferential usage of optimal synonymous codons [45, 62]. Since synonymous mutations do not change the final protein product, selection for optimal codons is thought to be fairly weak. This explains the possible relation between natural selection and the overall low levels of CUB in T. multiceps.
Previous studies have revealed that CUB in mammals is not correlated with gene expression levels. However, in Arabidopsis thaliana, Oryza sativa, C. elegans, B. mori and T. saginata, genes expressed at relatively high levels exhibited a greater degree of CUB. Various analyses can be used to assess gene expression levels, including EST (expressed sequence tag) counting, CAI values [10, 45] and ENc values. In this study, calculation of CAI values was adopted to evaluate the levels of expression of T. multiceps genes. CAI and ENc values showed a significant negative correlation with PC1, suggesting that gene expression levels influence CUB in T. multiceps, with stronger CUB in highly expressed genes.
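The CAI used above as an expression proxy is, in its usual definition, the geometric mean of the relative adaptiveness w of each codon in a gene, where w is a codon's frequency in a reference set divided by that of the most frequent synonymous codon. The sketch below illustrates that definition only. The reference counts are invented, codons absent from the table are skipped, and replacing zero counts with 0.5 is an assumed convention, not something stated in this study.

```python
import math


def relative_adaptiveness(reference_counts):
    """Map each codon to w = count / max count within its synonymous family."""
    w = {}
    for family in reference_counts.values():
        # Assumed convention: avoid w = 0 by flooring zero counts at 0.5.
        adjusted = {codon: max(count, 0.5) for codon, count in family.items()}
        top = max(adjusted.values())
        for codon, count in adjusted.items():
            w[codon] = count / top
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that appear in w."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set of highly expressed genes (two families only).
REFERENCE = {"K": {"AAA": 10, "AAG": 30}, "F": {"TTT": 5, "TTC": 20}}
W = relative_adaptiveness(REFERENCE)
print(cai("AAGTTC", W))  # 1.0: only the most frequent codon of each family is used
print(cai("AAATTT", W))  # lower: only the rarer codons are used
```

Because CAI depends on the chosen reference set, it measures similarity to that set's codon usage and is only an indirect indicator of expression.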
For various organisms, such as Populus tremula, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Silene latifolia and T. saginata, significant negative correlations were found between gene length and CUB. To account for this phenomenon, Moriyama and Powell proposed that selection constraints tend to reduce the length of highly expressed genes to generate shorter proteins that perform functions similar to those of longer proteins, thus reducing the energy expenditure required to generate a protein with a specific function. In T. multiceps, however, gene length was found to be irrelevant in shaping CUB, although it was positively correlated with the gene expression level. This finding is inconsistent with that obtained in studies of T. saginata, and further investigations are required to explore the mechanisms of this phenomenon.
Identification of optimal codons could provide valuable information for use in molecular genetics studies of evolutionary and rational rearrangement (transformation) of codon usage [71–73]. Under normal circumstances, the optimal codons tend to reflect the GC and AT content of the genomes [43, 74], such as those of bacteria, archaebacteria and fungi. In the present study, the GC-content of codons in the T. multiceps transcriptome was lower than the AT content (GC:AT, 0.97:1), although 21 optimal codons were found to be GC-rich (AU:GC, 41:22), with most ending in G/C. The same phenomenon has been reported in other organisms, such as Populus tremula (average GC-content, 45%), Drosophila (average GC-content, 35%), T. pisiformis (average GC-content, 49.48%), and T. saginata (average GC-content, 43.61%), with most favored codons being GC-rich or ending with G and/or C. In Triticum aestivum, Hordeum vulgare, Oryza sativa and Zea mays, the average GC-content is 55.6%, 59.3%, 56.8% and 60%, respectively, with optimal codons being AT-rich or ending in G or C.
Correspondence analysis (COA) is widely used to elucidate the variation in synonymous codon usage among genes. However, COA based on RSCU can be affected by biases such as amino acid biases. Principal Component Analysis (PCA) using relative adaptiveness or within-block correspondence analysis can avoid these biases. Thus, in this paper, PCA using relative adaptiveness was adopted for multivariate analysis rather than correspondence analysis.
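To make the two codon-level measures concrete, the sketch below computes RSCU and relative adaptiveness for a single synonymous family from raw codon counts. It only illustrates the standard definitions with invented counts; it does not reproduce the PCA itself. One difference visible here is that RSCU scales with family size while relative adaptiveness is bounded by 1. Whether this is the specific bias addressed by the cited methods is not established by this example.

```python
def rscu(counts):
    """Observed count divided by the count expected under equal synonymous usage."""
    total = sum(counts.values())
    k = len(counts)  # number of synonymous codons in the family
    return {codon: x * k / total for codon, x in counts.items()}


def relative_adaptiveness(counts):
    """Observed count divided by the count of the most used synonymous codon."""
    top = max(counts.values())
    return {codon: x / top for codon, x in counts.items()}


# Hypothetical codon counts for a two-fold (Lys) and a six-fold (Leu) family.
lys = {"AAA": 1, "AAG": 3}
leu = {"CTT": 1, "CTC": 1, "CTA": 0, "CTG": 4, "TTA": 0, "TTG": 0}

for name, family in (("Lys", lys), ("Leu", leu)):
    print(name, rscu(family), relative_adaptiveness(family))
```

In a multivariate analysis, each gene would contribute one row of such values across all codons, and the resulting gene-by-codon matrix would be the input to the PCA.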