Sequencing, assembly and alignment of orthologs
For G. dybowskii, the Illumina MiSeq and HiSeq2000 sequencing platforms respectively generated 11,550,142 and 75,226,074 paired-end reads (Table 1 and Additional file 2: Table S1). For S. pseudaksaiensis, the corresponding procedures generated 11,769,716 and 76,383,862 reads. After quality control, more than 74 M paired-end reads totaling more than 10 G nt for both of the species were retained and subjected to subsequent de novo assembly. Based on the combined data generated from different sequencing platforms, a total of 139,382 and 114,997 contigs with N50 lengths of 2407 nt and 2283 nt were obtained for G. dybowskii and S. pseudaksaiensis, respectively. In previous projects sequencing the transcriptomes of polyploid cyprinids, the amounts of data generated for the two species were larger than that for G. pachycheilus  but were smaller than those for S. angustiporus and G. przewalskii [32, 33]. However, we hypothesized that the longer length of short reads, especially of those generated by the MiSeq platforms, may lead to some compensation.
For G. p. ganzihonensis, short reads from the gills and kidneys were combined, producing a total of 210,085 contigs with an N50 length of 1447 nt. As expected, when these short reads with different origins were assembled separately, a much larger number of contigs with a shorter N50 length was generated . For S. angustiporus, short reads from the whole eye and brain were combined and assembled, which generated 100,762 contigs with an N50 length of 621 nt. However, based on the same data, the number of contigs generated by Meng et al. was 156,118, with an N50 length of 534 nt . The different numbers of contigs and different N50 lengths may be due to our use of different quality control tools and different versions of de novo assembly software.
A total of 5894 orthologous groups with an alignment length longer than 90 nt after removing gaps was obtained and subject to subsequent evolutionary analyses. Most of the alignment lengths were shorter than 400 nt, and the average and median lengths of the alignments were 219 and 192 nt, respectively (Additional file 3: Figure S2). The relatively short alignment lengths may have resulted from the incomplete sequencing of transcripts. We expect that a much greater number of alignments could be obtained if more tissues were sampled and longer reads were generated, given that all fish species that were subjected to this analysis were polyploid.
Fast-evolving gene categories
The average Ka/Ks value of the branch leading to the Highly Specialized group was significantly lower than that of the branch leading to the Specialized group (p-value < 2.2e-16, Binomial test). By contrast, the average values of Ka and Ks were significantly larger in branches leading to the Specialized group (p-value < 2.2e-16, Binomial test). The same pattern was found when comparing the Ka, Ks and Ka/Ks values between terminal branches leading to G. dybowskii and G. p. ganzihonensis (p-value < 2.2e-16, Binomial test). The somewhat unexpected larger Ks and smaller Ka/Ks values for G. p. ganzihonensis may be due to the following three reasons: the intensive ultraviolet radiation may accelerate the mutation rates of the entire genome; divergence of morphological traits may lag behind sequence divergences; and shorter read lengths generated by Illumina HiSeq sequencing platforms may result in more paralog assemblies.
Several gene categories with the same GO term or KEGG pathway assigned were identified as fast-evolving gene categories along branches II and III, as shown in Fig. 1. There were 117 and 15 fast-evolving gene categories for branches II and III, respectively (fdr < 0.05, Binomial test, Additional file 4: Table S2). Unexpectedly, the evolutionary rates of the GO categories, which were probably relevant to adaptation to high altitudes, such as “response to hypoxia”, “ATP metabolic process” and “response to UV” , were not significantly accelerated in our analyses. This phenomenon likely results from at least three causes: in the waters of the Tibetan Plateau, the ultraviolet intensity may be lower than that of air, and the water may contain higher levels of oxygen; the evolutionary rates of these genes probably accelerated during the adaptation to the first phase of plateau uplift; and many of these genes may also be expressed in other organs, such as the skin, or may have been lost when performing ortholog assignments. Moreover, we considered that more fast-evolving gene categories were represented in branch II and may be the result of relaxed selection, given that branch III is believed to undergo stronger selective pressure. Further investigation of recruiting transcripts from various tissues of more representative species of the three main groups of schizothoracins and outgroups should be used to test these hypotheses.
Candidate genes probably subject to positive selection
Likelihood ratio tests indicated that 40, 36, and 55 genes were probably subject to positive selection along branches I, II, and III, respectively (fdr
: Table S3). Interestingly, several genes were subject to continuous positive selection during plateau uplift (Fig.
). Specifically, two of these genes were subject to positive selection along the branches leading to all of the Primitive, Specialized, and Highly Specialized groups (hereafter termed PSH for convenience), and the numbers of genes subject to positive selection along the branches leading to the Primitive and Specialized (PS), Specialized and Highly Specialized (SH), and Primitive and Highly Specialized (PH) groups were 9, 9, and 4, respectively. The numbers of lineage-specific genes subject to positive selection along the branches leading to the Primitive (P), Specialized (S) and Highly Specialized (H) groups were 25, 16, and 40, respectively.
Venn diagram of the overlap between the candidate genes subject to positive selection along the branches leading to the Primitive, Specialized and Highly Specialized groups of Schizothoracins (branchesI, II and III in Fig. 1)
One of the two PSH genes was highly similar to “ecotropic viral integration site 5b” (evi5b) in zebrafish, which harbors a coiled-coil region that resembles that of structural maintenance of chromosomes (SMC) proteins. These proteins are core components of the condensing and cohesin complexes responsible for regulating chromosomal condensation, pairing and segregation in eukaryotes . Given that the fish that were subjected to analyses in this report were all polyploid, we assumed that evi5b probably played an important role in their diploidization. Another gene, annotated as “supervillin b,” seems to be involved in cytokinesis during cell division . However, why it was subject to positive selection throughout the course of plateau-adaptation in schizothoracins is presently unknown.
Two of the PS genes, “protein jagunal homolog 1-A” (jagn1) and “polypyrimidine tract-binding protein 2” (ptbp2), were subject to positive selection, which is not unexpected, given that the former is an immune gene that can regulate neutrophil function in microbial pathogenesis  and the latter is a reproductive protein that may function specifically in the male germline ; these two kinds of genes usually undergo more adaptive evolution than other genes [38, 39]. Notably, two PS genes, “mycophenolic acid acyl-glucuronide” and “solute carrier organic anion transporter family member 3a1” (slco3a1), which are related to the transport of organic anions [40, 41], were also subject to positive selection, but the reasons for that selection remain unknown.
The light intensity increases with the elevation of the plateau, and one of the SH genes, “guanine nucleotide-binding protein beta-1” (gnb1), which is involved in the pathway “Phototransduction”, was subject to positive selection, likely resulting from adaptation to the greater light intensity. Moreover, intense ultraviolet light can cause DNA damage and even worse, skin cancer . Thus, the “dab2 interacting protein” (dab2ip) gene, which suppresses the development of many cancer types , likely appeared among the SH genes for this reason. Meanwhile, some genes relevant to energy metabolism, for example, the “2-oxoglutarate dehydrogenase” (ogdh)  and atp6v0a1 genes, participate in the “oxidative phosphorylation” pathway and probably contributed to adaptation to the reduced water temperature.
P genes may have been important for adaptation to the first phase of plateau uplift. These genes include several relevant to adaptation to hypoxia, low temperature, and high-intensity ultraviolet light. First, hypoxia can reduce the activity of “arylsulfatase b” (arsb) , but it is a tumor suppressor and is presumably associated with carcinogenesis at low levels [46–48]; therefore, arsb has evolved to adapt to the low-oxygen environment. The formation and maintenance of the cristae structure of the inner mitochondrial membrane is largely based on the mitochondrial contact site and cristae-organizing system (MICOS), and “MICOS complex subunit MIC19” (mic19) may have adapted to maintain the normal mitochondrial structure and function during hypoxia . Mic19, together with “cytochrome c-type heme lyase” (cchl), which is involved in electron transfer processes , and “creg1”, which promotes cardiomyogenesis , may be indirectly relevant to hypoxia. Second, two other P genes, bag1 and a member of the immunoglobulin superfamily that is homologous to several cell adhesion molecules, “cell surface glycoprotein MUC18” (mcam), may have evolved to adapt to high-intensity ultraviolet light, which can cause DNA damage and induce cancer. The reason for this adaptation may be that the products of bag1 can form complexes with the anti-apoptotic protein bcl-2 and can thereby contribute to rendering cells more resistant to apoptosis, and altered bag1 expression levels can be detected in some cancerous cells , whereas mcam cannot be detected in normal melanocytes but can be detected in primary melanomas and is highly expressed in metastatic melanoma cells . Moreover, “bromodomain-containing 2” (brd2) plays a critical role in adipogenesis  and may be adapted to low temperatures.
The third phase of uplift may have been the fastest, and the height has increased by 3000 m since approximately 2.6 Myr ago . During this process, a number of genes were supposed to be under positive selection. The “endoplasmic reticulum transmembrane prolyl 4-hydroxylase” (p4h-tm) gene can hydroxylate the subunit of hypoxia-inducible factor (hif) . In normoxia, hydroxylation of 2 critical proline residues can rapidly degrade hif because the hif-subunit isoforms hif-1 and hif-2 are synthesized constitutively. However, in hypoxia, hydroxylation is inhibited, allowing hif to escape degradation and regulate hypoxia-related genes . Moreover, p4h-tm can regulate erythropoietin production, hepcidin expression, and erythropoiesis . Therefore, we suggest that the modification of p4h-tm may contribute to the adaptation to hypoxic waters in the Qinghai-Tibet Plateau. The protein “nibrin” plays an important role in the initial recognition of DNA damage during the recruitment of proteins responsible for DNA double-strand breaks repair . “Tumor protein d54” (tpd52l2) is a member of the D52-like family, and these proteins have been predicted to interact with each other to regulate cell proliferation . “Tumor protein D52” (tpd52) is frequently overexpressed in several carcinomas and may be the co-expressed tpd52l2 . The two aforementioned cancer-related genes were positively selected, perhaps as the result of adaptation to the increasingly high-intensity ultraviolet light associated with the uplift of the plateau. Notably, the H genes included several genes related to the physiological function of the nervous system, such as “neurexin-1β”, which binds with the postsynaptic membrane protein neuroligin-1 and plays a central role in the formation of synapses in the central nervous system ; “kinesin-like protein kif1a” (kif1a), a unique monomeric motor for the anterograde axonal transport of synaptic vesicle precursors ; “wnt-7a”, which preferentially stimulates excitatory synapse formation and function ; “rest corepressor 2” (rcor1), which is co-expressed with zmynd8 and interacts to form a complex that might be involved in the regulation of neural differentiation in the nervous system ; and “alsin”, which is particularly abundant in motor neurons and is involved in the development of axons and dendrites . Potassium channels are found in most cell types and control a wide variety of cell functions, including modulating neuronal signaling in the brain and the peripheral nervous system, as well as regulating cell volume and the flow of salts across epithelia . “Voltage-gated potassium channel subunit beta-1” (kcnab1) may markedly influence the gating mode of potassium channels , along with “ammonium transporter rh type b” (rhbg), which may be a major ammonium transporter in vertebrates , were also included as H genes. The modification of these genes relevant to ion transport and physiological function of the nervous system may have resulted from adaptation to the changed environment.