In the last few years, dozens of gene fragments associated with apomixis were identified in Paspalum spp. through mapping analysis [26, 27, 29, 42, 64, 65], BAC sequencing [40, 41], differential display [45, 48] or cDNA-AFLP. However, these approaches were compromised either by the complex structural organization of the apomixis locus or by the relatively short size of the candidate fragments, which made it difficult to recover reliable sequence information for further annotation and functional characterization. Therefore, the main goal of this study was to generate a global, annotated reference database for the reproductive transcriptome of Paspalum. To achieve this, the use of a long-read RNAseq technique that allows the establishment of robust assemblies was mandatory.
Here, the apomictic and sexual 454/Roche libraries were compared to discover genes involved in the switch between the two reproductive modes. A total of 35,430, 37,124 and 48,842 different genes (isogroups) were identified in the sexual, the apomictic and the global assembly, respectively. The higher number of genes (isogroups) detected in the global assembly probably reflects both a deeper coverage of the transcriptome (by random sequencing of a biological duplicate) and the presence of genes that are specifically expressed in only one of the samples, which originated from plants with different genotypes and reproductive modes. In addition, the number of alleles/splice variants (isotigs) detected in the sexual genotype C4-4× (43,888) was lower than that observed in the apomictic genotype Q4117 (47,569). This result agrees with the biological origin of the plants: C4-4× is a dihaploid generated by colchicine duplication of a natural diploid; by contrast, Q4117 is a natural, highly heterozygous tetraploid genotype. Therefore, lower genetic variation could be anticipated in C4-4× than in Q4117. Validation of our assemblies by comparison with physically supported sequences indicated that both databases extensively cover the P. notatum floral transcriptome. Moreover, alignment scores suggested their high potential for full-length transcript identification. The database may allow the recovery of allele/splice variants corresponding to dozens of apomixis and ploidy response candidate genes that had been identified and verified in previous work through PCR (polymerase chain reaction)-based approaches [45, 48, 53, 67]. It can also be used to generate EST-SSR (Expressed Sequence Tag-Simple Sequence Repeat) markers covering the Paspalum spp. genome for mapping experiments and breeding.
We conducted GO analysis on the whole reference assembly in order to reveal the location and nature of the biological processes operating in florets. The apomictic and sexual reads were mapped onto this general annotated assembly as a common reference. The GO annotation and the subsequent analysis of major classes provided a general view of gene activity in reproductive organs. Moreover, its use might greatly simplify the comparison of the molecular routes involved in species displaying different apomixis mechanisms. According to our results, in P. notatum flowers, the products of protein-coding transcripts are distributed in at least 772 different cellular locations, perform 2,102 molecular functions and participate in 3,375 biological processes. Note that these numbers correspond to different reproductive developmental stages, and also include the two alternative modes of reproduction via seeds (apomixis and sexuality). Moreover, the biological samples used consisted of whole flowers (spikelets), which comprise the rachis, glumes, lemma, palea, ovary and anthers. Therefore, the whole set of transcripts characterized here derives from a variety of cell types, including somatic cells and male and female reproductive cells, from premeiosis to anthesis. Accordingly, the database will be very useful to identify any transcript expressed in Paspalum flowers at detectable levels. However, the spatial and temporal specificity of expression will need to be assessed through additional experiments based on in situ hybridization, promoter-driven marker expression and/or tissue- or cell-type-specific qPCR.
One of the major weaknesses of molecular reproductive research in Paspalum was the need to carry out laborious RACE (Rapid Amplification of cDNA Ends) experiments in order to isolate the full sequences of candidate genes expressed in flowers. RACE amplification experiments were in fact conducted successfully for several candidates [55–57], but the recovery of full genic sequences turned out to be very difficult and time-consuming, especially for complex or long transcripts. Moreover, the characterization of all allelic/splice variants expressed in flowers was virtually impracticable. Here, through the use of next-generation sequencing, we successfully recovered the full cDNA sequences of 24 differential display (DD) fragments, including several detectable alleles/splice variants, thereby validating the value of our database for the detailed characterization of specific gene family members. Of the 65 DD sequence segments used as queries, only 24 matched the apomictic and sexual 454 isotigs. The lack of detection of the remaining sequences could be explained by the emergence of false positives in the DD analysis and/or by poor 454 sequencing coverage. However, 20 of the sequence segments used as queries showed no BLAST hits in the sequence databases of plant species, and several of them were amplified from internal parts of the target transcripts (displaying two random decamers located at the edges). Therefore, they might correspond to different parts of the same transcript, which would lead to an overestimation of the rate of sequences undetected in the 454/Roche database.
The public availability of a global database of transcripts expressed during reproductive development will also be of great benefit for harnessing important target traits in Paspalum breeding research. The use of apomixis is currently having a direct impact on the breeding of natural Paspalum species. Within the genus, P. notatum and P. dilatatum are the most widely cultivated forage grasses. The specific breeding objectives are the enhancement of cold tolerance and cool-season growth, seed yield, grazing/biotic stress resistance and nutritive value. Advanced breeding programs were conducted under two different approaches: 1) germplasm collection, evaluation, selection, multiplication of the best ecotypes, and release of elite genotypes as new apomictic cultivars; and 2) hybridization using sexual mother plants and apomictic male progenitors, followed by the selection of superior fully apomictic progeny hybrids, which breed true due to their clonal reproductive mode. The availability of the sequence database reported here would make possible the characterization of numerous genes responsible for important metabolic/biological pathways and their transfer to different genetic backgrounds by traditional breeding or genetic engineering. Moreover, we recently established a biolistic transformation platform for tetraploid P. notatum in our laboratory, a tool that will certainly benefit from the sequence data generated here to engineer the expression of genes related to reproduction and seed yield. Further characterization of leaf and root transcriptomes would also be desirable in order to provide additional useful information.
While the apomictic, sexual and global 454 reference libraries are useful to rescue the full sequences of a considerable number of candidate genes, their use as a tool to reveal differential expression is more limited, because contrasting representation can be masked by heterochronic expression in sexual and apomictic samples and/or by differential expression being restricted to a very particular developmental stage and/or to individual cells. In order to achieve an accurate assessment of differential expression, deeper-coverage approaches such as Illumina sequencing should be used. However, the construction of a reference transcriptome is a prerequisite for Illumina (short-read) sequencing, in order to reach a sound assembly in these complex non-model polyploid systems. Therefore, although the estimation of differential expression presented here is considered preliminary and needs further validation, it revealed a number of candidate genes and cluster networks that are potentially altered during apomictic development. Many of the top-ranked genes in the differential expression list are included in protein-protein interaction clusters related to abiotic and biotic stress response, growth, development, cell death and senescence. In particular, the detection of numerous candidates related to the first category supports previous hypotheses pointing to the participation of stress response pathways in meiosis initiation and in the early preparatory events ahead of the apomeiotic transition, as well as the influence of environmental factors and polyploidization genomic shocks on the expressivity of facultative apomixis. Once particular pathways associated with apomixis are identified, scrutiny of the correlations within these networks and of the physical location of particular candidates within the ACL has the potential to reveal the nature of the genes controlling both the apomeiotic transition and parthenogenesis.
Therefore, the systematic use of the information provided in this report will help to accelerate the discovery of the triggers of apomixis and the future harnessing of the trait.