Genome-wide survey of miRNAs and their evolutionary history in the ascidian, Halocynthia roretzi


61 miRBase metazoan miRNAs are conserved in Halocynthia and approximately half of them may be ascidian or tunicate-specific

To survey the repertoire of miRNAs in H. roretzi, we first carried out a BLASTN similarity search (word size 7, E-value …) against the H. roretzi genome, using as input all known mature metazoan miRNAs deposited in miRBase (28,645 entries) [36]. We then selected miRNA candidates whose flanking genomic sequences passed our filtering criteria for stem-loop structure and minimum folding free energy (MFE) (see Methods section for details). This identified 61 candidate H. roretzi miRNA precursors, belonging to 49 known miRNA families (Fig. 1, Table 1 and Additional file 1). Figure 2 shows the stem-loops formed by the genomic sequences flanking a selection of predicted H. roretzi miRNAs.

Fig. 1

Phylogenetic survey of the conserved H. roretzi miRNAs in other species. Red squares indicate that the miRNA exists in the corresponding species; light blue squares indicate that the miRNA has not been reported in that species. Thirteen highly conserved families (miR-1, miR-15, miR-78, miR-101, miR-125, miR-132, miR-141 (miR-8), miR-155, miR-181, miR-199, miR-375, miR-672 and miR-1473) that were detected in C. robusta were not found in H. roretzi by our method. These miRNAs may exist in H. roretzi but fail the stringent filtering criteria we applied. Alternatively, they may have been lost in the H. roretzi lineage. See Additional file 2 for details of species names

Table 1

Conserved miRNAs in H. roretzi

| Genomic location (scaffold_start-end:strand) | miRNA family | Predicted mature sequence(s) | Precursor length (nt) | MFE (kcal/mol) |
|---|---|---|---|---|
| S648_2373-2295:- | let-7 | UGAGGUAGUAGGUUGUAUAGUUU | 79 | -30.4 |
| S93_67805-67726:- | let-7 | UGAGGUAGUGGAUUAUGCAGUU | 80 | -30.1 |
| S93_68361-68283:- | let-7 | UGAGGUAGUAGGUUAUAUCAGU | 79 | -22.6 |
| S93_68526-68445:- | let-7 | UGAGGUAGUAGGUUAUGUAGUG | 82 | -24.6 |
| S93_68698-68609:- | let-7 | CUGAGGUAGUAGGUUAUGCAGUU | 90 | -31.2 |
| S24_284667-284752:+ | miR-7 | UGGAAGACUAGUGAUUUUGUUGUUC | 86 | -19.5 |
| S159_72225-72288:+ | miR-9 | UCUUUGGUUAUCUAGUUUUGUG | 64 | -20.3 |
| S93_128856-128791:- | miR-29 | UAGCACCAUUGGAAAUCGGUC | 66 | -19.8 |
| S65_9585-9666:+ | miR-31 | UAUGGCAAGAUGUUGGCAUAGCUGC | 82 | -34 |
| S298_83663-83598:- | miR-33 | AAGUGCAUUGUAGUUGCAUUGCACA | 66 | -20.8 |
| S160_170587-170662:+ | miR-34 | AGGCAGUGUAGUUAGCUAGUUG | 76 | -19.3 |
| S27_318389-318451:+ | miR-92 | UAUUGCACUUGUCCCGGCCU | 63 | -18.8 |
| S57_155850-155929:+ | miR-92 | UAUUGCACUCGUCCCGGUCUAU | 80 | -21.5 |
| S181_161093-161016:- | miR-96 | UUUGGCACUAGCACAUUAUU | 78 | -25.2 |
| S375_52432-52351:- | miR-124 | CGUGUUCACUGCAGACCUU, CAUUAAGGCACGCGGUGAAUGCUAU | 82 | -30 |
| S375_52641-52557:- | miR-124 | AAUUAAGGCACGCGGUGAAUGCCAGA | 85 | -37.6 |
| S375_60909-60992:+ | miR-124 | UAUUAAGGCACGCGGUGAAUGCCAAG | 84 | -37 |
| S60_145419-145497:+ | miR-126 | CCUUGUUACUUACGGUACC, AUCUCGUACCGUGAGUAAUAAAGCU | 79 | -38.2 |
| S6_131727-131818:+ | miR-133 | GCUGGUCAACCGGAACCAAAUC, UUUGGUCCCCUUCAACCAGCUGUU | 92 | -29.5 |
| S56_118756-118667:- | miR-135 | UAUGGCUUUAUUUAUUCCUGUGUGA | 90 | -32.9 |
| S67_157861-157774:- | miR-153 | GUCAUUUUUGUAUUAUGCAA, UUGCAUAGUAACAAAAGUGAUCAU | 88 | -40.3 |
| S181_160894-160804:- | miR-182 | CUUGGCAAAAUAUAGAACUC | 91 | -36.1 |
| S181_165741-165631:- | miR-183 | UAUGGCACUAGUAGAAUUCACUGC | 111 | -36.3 |
| S602_41811-41890:+ | miR-184 | UGGACGGAGAAUUGAUAAGGAA | 80 | -30.4 |
| S54_165680-165756:+ | miR-196 | UAGGUAGUUACAAGUUGUGG | 77 | -25.6 |
| S353_78598-78518:- | miR-200 | UAAUACUGCUUGGUAAUGAUGAU | 81 | -24.5 |
| S244_68891-68803:- | miR-216 | UAAUCUCAGCUGGCAAUCUGUGA | 89 | -35.3 |
| S244_68624-68517:- | miR-217 | AUACUGCAUUAGGAACUGAUUGGU | 108 | -30.3 |
| S248_20812-20705:- | miR-218 | UAUGUGCUUUGAUCUAACCAUGU | 108 | -34.9 |
| S21_373535-373467:- | miR-219 | UGAUUGUCCAAACGCAAUUCGCG | 69 | -19.3 |
| S11_93983-94061:+ | miR-281 | UGUCAUGGAGUUGCUCUCUUAUU | 79 | -24.3 |
| S99_176620-176695:+ | miR-367 | UAUUGCACAUUGUAAUGGUA | 76 | -29.3 |
| S291_113322-113388:+ | miR-1497 | UUGAAGAAUUGCAGGUGGUAGGU | 67 | -23.2 |
| S290_113331-113406:+ | miR-1502 | UUGAACUUUCUAAGGAAUAG | 76 | -30.2 |
| S19_110094-110150:+ | miR-3182 | GCUUUUGUAGUUUAGUC | 57 | -20.4 |
| S210_48343-48421:+ | miR-3598 | UCACAGUGGUUGUAUACUGC | 79 | -42.1 |
| S176_129704-129781:+ | miR-3876 | GUUUUGUUUUAACACUUAC | 78 | -22.4 |
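The location identifiers in Table 1 appear to follow the pattern scaffold_start-end:strand, with the precursor length equal to the inclusive span between the two coordinates. The following is a minimal sketch of how such an identifier could be parsed and checked against the listed length. The pattern is inferred from the table rows, not from a documented format, and the names `Locus` and `parse_locus` are hypothetical.

```python
import re
from typing import NamedTuple


class Locus(NamedTuple):
    """One precursor location, as inferred from identifiers such as 'S648_2373-2295:-'."""
    scaffold: str
    start: int
    end: int
    strand: str

    @property
    def length(self) -> int:
        # Inclusive span; minus-strand entries list the larger coordinate first.
        return abs(self.end - self.start) + 1


# Assumed pattern: scaffold name, underscore, two coordinates, colon, strand.
_LOCUS_RE = re.compile(r"^(S\d+)_(\d+)-(\d+):([+-])$")


def parse_locus(identifier: str) -> Locus:
    """Split a Table 1 style identifier into its parts."""
    match = _LOCUS_RE.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised locus identifier: {identifier!r}")
    scaffold, start, end, strand = match.groups()
    return Locus(scaffold, int(start), int(end), strand)


if __name__ == "__main__":
    for ident in ("S648_2373-2295:-", "S24_284667-284752:+"):
        locus = parse_locus(ident)
        print(ident, locus.scaffold, locus.strand, locus.length)
```

For the two rows used here, the computed lengths (79 and 86) agree with the precursor lengths listed in the table.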

Fig. 2

Stem-loop structure of some conserved known miRNA genes of H. roretzi. Red upper-case letters indicate mature miRNA sequences

The phylogenetic distribution in miRBase of these 49 H. roretzi miRNA families was examined next (Fig. 1 and Additional file 2). Twenty-five families were highly conserved across metazoa, including let-7 and miR-7 to miR-367 (Fig. 1). Of these, 18 families were found in both deuterostomes and protostomes and may thus represent ancestral metazoan miRNAs. Seven families were found exclusively in deuterostomes, either only in chordates (six families) or in both chordates and ambulacraria (one family). We attribute the absence of miR-218 from Ciona and Oikopleura to the possibly restricted expression of this miRNA, which may have precluded its identification by miRNA-seq. Interestingly, an ancestral metazoan miRNA, miR-281, appears to have been specifically lost from the vertebrate lineage, as it is present in all surveyed tunicates, amphioxus, and protostomes. Whether it has also been lost in echinoderms is unclear, since only three echinoderm species were surveyed.

Twenty-one families were found in the distantly related ascidians C. robusta and H. roretzi but not in other animals (labeled in green in Fig. 1), and may thus correspond to ascidian- or tunicate-specific miRNAs. These miRNAs are all represented by at least 10 reads in the Ciona small RNA sequencing dataset [32] (BLASTN, word size of 15, and E-value ≤1000), suggesting that they are expressed during C. robusta embryogenesis and are therefore likely to be genuine miRNAs. Finally, three miRNA families, miR-3182, miR-3876 and miR-7238, were found only in H. roretzi and a single non-tunicate species (Fig. 1, rightmost three columns). Our confidence in the predictions of these miRNAs is more limited.

To test whether the small number of candidate Halocynthia miRNAs conserved across metazoa reflected a low sensitivity of our identification method or an overall weak conservation of ascidian miRNAs, we examined the situation in C. robusta. A total of 348 C. robusta miRNAs, belonging to 285 families, were previously identified and deposited in miRBase. These miRNAs were predicted from miRNA-seq data collected from Ciona embryos at the gastrula and larval stages, using the miRTRAP computational program [32], a method that makes no assumption about the evolutionary conservation of the candidates. Of these 348 miRNAs, only 47, belonging to 36 families, were widely conserved in many non-tunicate metazoan species (Additional file 3). An additional six miRNAs (Cirobu-mir-1473, Cirobu-mir-1497, Cirobu-mir-1502a, Cirobu-mir-1502b, Cirobu-mir-1502c, Cirobu-mir-1502d), belonging to three families, were found in at least one tunicate species other than C. robusta (miRNA data have so far been described in three tunicate species: C. robusta [2932]; Ciona savignyi, with 43 miRNAs in miRBase [32, 36, 37]; and Oikopleura dioica, with 69 miRNAs reported [36, 37]). The miRNA candidate Cirobu-mir-3575 was also found in Rattus norvegicus. The remaining 294 C. robusta candidate miRNAs (belonging to 239 families) appeared to be specific to C. robusta.

The evolutionary analysis of these Ciona miRNAs, together with the small number of Halocynthia candidate miRNAs detected by conservation with miRBase entries, suggests that a majority of ascidian miRNAs may be either ascidian- or tunicate-specific. Discriminating between these two possibilities is currently difficult, as the number of miRNAs reported so far in the non-ascidian tunicate Oikopleura dioica (n = 69) is small, suggesting that the list could be far from complete. Similarly, the current repertoire of Ciona savignyi miRNAs (n = 43) is incomplete, which explains the small number of miRNAs listed for this species in Fig. 1.

De novo miRNA prediction

To extend our study of the Halocynthia miRNA repertoire, we next used srnaloop [38] to detect potential miRNA precursors solely on the basis of the presence of a canonical stem-loop structure [16, 39]. Genomic sequences that met the minimum folding free energy (MFE) and stem-loop structure filtering criteria were considered potential novel miRNA precursors. Clustering of these candidate sequences with CD-HIT (sequence identity threshold: 0.9) [40] identified some clusters with high similarity. Some of these clustered sequences mapped to repeated sequences, which can form palindromes and confound srnaloop; these sequences were excluded from the final predictions. This approach predicted 268 miRNA candidates, including 42 that were found in the previous section (Fig. 3, Additional file 1, miR-5000 to -5257). The novel candidates showed no significant BLASTX hit (E-value ≤ 1e-3) against proteins in the nr and UniProtKB/Swiss-Prot databases, indicating that they are non-coding RNAs.

Fig. 3

Numbers of miRNAs predicted by three different methods. Numbers of miRNAs that were found in this study are presented in the Venn diagram. There is significant overlap between miRNAs that were found by the three kinds of approaches

miRNA precursors are transcribed from only one strand of the double-stranded DNA template. The strand from which each miRNA precursor originated was inferred in 291 cases from published transcript information (see Methods section). Predicting the mature miRNA sequences of de novo predicted miRNAs is difficult and only moderately reliable, as the precise mechanism by which Dicer cuts miRNAs from the hairpin structure remains unknown. Li et al. [7] reported a method for inferring mature miRNA locations that assumes Dicer cuts the mature miRNA precisely at the loop-stem junction. We note, however, that the majority of miRBase miRNAs do not obey this rule. Furthermore, 5p and 3p miRNAs are located on the two arms of the hairpin structure and are not exactly complementary in sequence (see Fig. 2). We therefore tentatively predicted the positions of 5p and 3p miRNAs by detecting paired segments using patscan, although we do not know whether the 5p, the 3p, or both are the actual miRNA [13, 41, 42]. For precursors in which patscan failed to find paired sequences, a modified version of Li's method was used to find the 5p and 3p segments. The sequences of predicted mature miRNAs are presented in Fig. 4a and b.
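The idea that the 5p and 3p segments pair with each other without being exactly complementary can be illustrated with a short sketch. The code below is not patscan and is not the procedure used in this study. It is a simplified, ungapped comparison that assumes the two arms are aligned end to end in antiparallel orientation and that G-U wobble pairs count as paired. The function name `paired_fraction` and the toy sequences are hypothetical.

```python
# Watson-Crick pairs plus G-U wobble pairs, which are common in RNA stems.
PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def paired_fraction(arm5: str, arm3: str) -> float:
    """Fraction of positions that can base-pair between two hairpin arms.

    The 3' arm is reversed so that position i of the 5' arm faces the
    base it would pair with in a hairpin. No gaps or bulges are modelled,
    which is a strong simplification of real precursors.
    """
    n = min(len(arm5), len(arm3))
    if n == 0:
        raise ValueError("both arms must be non-empty")
    reversed_arm3 = arm3[::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(arm5[:n], reversed_arm3[:n]))
    return paired / n


if __name__ == "__main__":
    # Toy arms: fully paired, then with one unpaired position.
    print(paired_fraction("GGCAU", "AUGCC"))
    print(paired_fraction("GGCAU", "AUGCA"))
```

A real precursor contains bulges and internal loops, so a gapped search of the kind performed by pattern-matching tools is needed in practice; the sketch only shows why a paired fraction below 1 is expected for genuine 5p/3p segments.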

Fig. 4

Stem-loop structure of some novel miRNA genes. a miRNAs detected with high abundance (≥50 reads) in C. robusta small RNA-Seq data. b miRNAs observed with low abundance (≤10 reads) in small RNA-Seq data. Red upper-case letters indicate inferred mature miRNA sequences

Analysis of C. robusta small RNA-Seq data confirms de novo miRNA predictions

As a first test of our de novo predictions of Halocynthia miRNAs, we made use of a previously generated set of C. robusta small RNA-Seq reads [32]. These reads, which are enriched in miRNAs but could also include other classes of small non-coding RNAs [43], were mapped onto the H. roretzi reference genome using BLASTN (see Methods), and hits flanked by sequences whose minimum folding free energy (MFE) and stem-loop structure passed our filtering criteria were selected for further analysis (see Methods section for details). This identified 268 candidate miRNAs. Remarkably, this set included 236 of the 268 candidate miRNAs predicted by our de novo approach, of which 38 were also detected using miRBase data (Fig. 3; Additional file 1; ID numbers of the novel H. roretzi miRNAs that were detected in these ways run from 5000 to 5229). Figure 4a shows the stem-loop structures of a selection of these novel miRNA genes with strong small RNA-Seq support (≥50 reads). Figure 4b shows the stem-loop structures of miRNA genes with weak small RNA-Seq support (≤10 reads), which represent low-confidence predictions. The mature sequences of these H. roretzi miRNAs were inferred from the matched positions of C. robusta small RNA-Seq reads (Fig. 4, red letters).

Interestingly, mature miRNA sequences appear to have diverged significantly between Ciona and Halocynthia. Of the 230 novel predicted mature H. roretzi miRNAs (268 hits minus the 38 identified in Fig. 3), Harore-miR-5008 was the only one with fewer than two mismatches to the 458 C. robusta mature miRNAs predicted by Keshavan et al. [31]. When four mismatches were allowed, only five hits were detected: Harore-miR-5008 (Cirobu-miR-13a), Harore-miR-5046 (Cirobu-miR-200b), Harore-miR-5154_5p (Cirobu-miR-244), Harore-miR-5165 (Cirobu-miR-246), and Harore-miR-5214 (Cirobu-miR-246).

In addition to the 458 C. robusta mature miRNAs, we examined whether the 230 novel Halocynthia miRNAs have homologs in the entire C. robusta genome (http://ghost.zool.kyoto-u.ac.jp/download_kh.html). When two mismatches were allowed in the mature sequences, 168 H. roretzi miRNAs were found to have a homolog in the C. robusta genome (BLASTN top 500 hits, word size of 7, and an alignment length of ≥20). When four mismatches were allowed, however, 225 out of 230 (97.8%) H. roretzi miRNAs had a homolog in the C. robusta genome, and 66 (29.3%) of these 225 precursors could form canonical stem-loops when the temperature parameter of RNAfold was set to 18 °C [31] (Additional file 1, rightmost column). We also analyzed the positions of the mismatches: 24.5% occurred in the seed region (bases 2–7) and 75.5% outside it. Because the seed region accounts for only six of the 20–24 nucleotides of a mature miRNA (25–30%), this suggests that mismatches are distributed roughly evenly over the entire miRNA. These results suggest that, although the sequences of mature ascidian miRNAs may diverge rapidly, C. robusta and H. roretzi may share more homologous miRNAs than expected from the results of the previous section.
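The seed-region comparison above can be made concrete with a short sketch. It assumes an ungapped, position-by-position comparison of two equal-length mature sequences and takes the seed as bases 2–7 (1-based), as defined in the text. The function names and the example pair of sequences are hypothetical and are not taken from the study's data.

```python
from typing import List, Tuple

# Seed region as defined in the text: bases 2-7, counted from 1.
SEED_POSITIONS = range(2, 8)


def mismatch_positions(seq_a: str, seq_b: str) -> List[int]:
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (ungapped comparison)")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


def split_by_seed(positions: List[int]) -> Tuple[int, int]:
    """Count mismatches inside and outside the seed region."""
    in_seed = sum(1 for p in positions if p in SEED_POSITIONS)
    return in_seed, len(positions) - in_seed


def expected_seed_fraction(mirna_length: int) -> float:
    """Share of mismatches expected in the seed if they fell uniformly along the miRNA."""
    return len(SEED_POSITIONS) / mirna_length


if __name__ == "__main__":
    # Hypothetical 12-nt fragments differing at positions 5 and 12.
    positions = mismatch_positions("UGAGGUAGUAGG", "UGAGCUAGUAGC")
    print(positions, split_by_seed(positions))
    # Uniform expectation for 20-nt and 24-nt miRNAs.
    print(expected_seed_fraction(20), expected_seed_fraction(24))
```

Under a uniform distribution the seed would receive 25–30% of mismatches for miRNAs of 24 to 20 nucleotides, which is the range against which the observed 24.5% is compared.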

The union of the miRNAs predicted by all approaches consists of 319 candidate miRNAs in the Halocynthia genome: 61 well-conserved miRNAs, 226 de novo predicted miRNAs and 32 additionally predicted from Ciona small RNA-Seq data. These predictions largely overlap (Fig. 3).
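The total of 319 follows from the overlaps reported in the preceding sections. The sketch below reproduces that arithmetic from the counts given in the text. It assumes that the small RNA-Seq candidates not shared with the de novo set are also absent from the conserved set, which is what the stated breakdown implies. The function and parameter names are hypothetical.

```python
def union_size(
    conserved: int = 61,              # miRBase-conserved candidates
    de_novo_total: int = 268,         # candidates from the de novo approach
    de_novo_in_conserved: int = 42,   # de novo candidates already in the conserved set
    rnaseq_total: int = 268,          # candidates supported by C. robusta small RNA-Seq reads
    rnaseq_in_de_novo: int = 236,     # small RNA-Seq candidates shared with the de novo set
) -> int:
    """Size of the union of the three prediction sets, from the reported overlaps."""
    de_novo_new = de_novo_total - de_novo_in_conserved   # 268 - 42 = 226
    # Assumption: these remaining candidates overlap with neither other set.
    rnaseq_new = rnaseq_total - rnaseq_in_de_novo         # 268 - 236 = 32
    return conserved + de_novo_new + rnaseq_new


if __name__ == "__main__":
    print(union_size())
```

With the default counts the function returns 61 + 226 + 32 = 319, matching the breakdown given above.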

Validation of miRNA predictions

Expression of some of the predicted miRNAs was validated by RT-PCR using embryonic RNA purified from mixed stages of embryogenesis. The primers used are listed in Additional file 4. In total, 20 miRNA candidates were tested, and bands of the expected size were detected for 10 miRNAs (Fig. 5). Sequencing of these 10 PCR bands confirmed their identity. No bands were amplified in RT-negative samples. Among the 10 well-conserved miRNAs (Table 1; most of them are also shown in Fig. 2), miR-92a, miR-153, miR-96 and miR-367 showed clear bands of ~40 bp (Fig. 5a). We also examined 5 potential miRNAs that were predicted de novo and from C. robusta small RNA-Seq reads (Fig. 5b; some of them are shown in Fig. 4a). miR-5001, miR-5004 and miR-5008 showed a clear band at ~40 bp. In addition, we found that the faint bands of miR-5003 and miR-5014 in Fig. 5b also contained the expected sequences. Three miRNAs in miRBase that are conserved only between H. roretzi and C. robusta (miR-4034, miR-4029 and miR-4123) gave clear bands (Fig. 5c). These results indicate embryonic expression of at least half of the tested miRNA candidates. The remaining candidates may be artefactual predictions, may not have been amplified because an erroneous predicted position of the mature sequence led to an inadequate primer choice, or may not be expressed during embryogenesis.

Fig. 5

RT-PCR validation of miRNAs. a RT-PCR results for some well-conserved miRNAs listed in Table 1. b RT-PCR validation of some potential miRNAs that were predicted de novo and by using C. robusta small RNA-Seq reads. c Ascidian- or tunicate-specific miRNAs (miR-4034, miR-4029 and miR-4123). ‘+’ indicates PCR with reverse transcriptase; ‘-’ indicates PCR without reverse transcriptase. The expected size of the amplified fragment is ~40 bp (arrowheads), because the 19 bp universal primer is amplified together with the ~20 bp miRNA-specific primer. Other bands are presumably non-specific

Potential target prediction of the miRNAs

To gain insight into the functions of the miRNAs, 3′ UTR sequences were extracted from each gene model in Aniseed [19] (http://www.aniseed.cnrs.fr/), and targets were tentatively predicted. A total of 3451 possible target sites in putative 3′ UTR sequences, located in 2734 genes (17% of the approximately 16,000 gene models), were detected for 275 miRNAs. Among these, 285 target genes of 140 miRNAs have gene ontology (GO) terms associated with various developmental processes. Although functional validation of these targets is beyond the scope of this article, their identities are listed in Additional file 5 and may be useful for future studies of miRNA function.
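The principle behind 3′ UTR target prediction can be sketched as a search for sites complementary to the miRNA seed. This is a generic illustration only: the prediction procedure actually used in this study is described in its Methods section and is not reproduced here. The sketch assumes exact matching to the seed (bases 2–7), RNA sequences written 5′ to 3′, and a made-up UTR string; all function names are hypothetical.

```python
from typing import List

# Complement table for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_of(mirna: str, first: int = 2, last: int = 7) -> str:
    """Return the seed of a mature miRNA (1-based, inclusive positions)."""
    return mirna[first - 1:last]


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return rna.translate(_COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, utr: str) -> List[int]:
    """0-based start positions in the UTR of exact matches to the seed complement."""
    site = reverse_complement(seed_of(mirna))
    hits = []
    index = utr.find(site)
    while index != -1:
        hits.append(index)
        index = utr.find(site, index + 1)  # allow overlapping sites
    return hits


if __name__ == "__main__":
    let7 = "UGAGGUAGUAGGUUGUAUAGUUU"       # first let-7 entry in Table 1
    toy_utr = "AAAUACCUCAAAGGGUACCUCUU"    # invented sequence for illustration
    print(seed_of(let7), reverse_complement(seed_of(let7)))
    print(find_seed_sites(let7, toy_utr))
```

Exact six-base seed matching produces many chance hits in real 3′ UTRs, which is one reason predictions of this kind are treated as tentative until validated functionally.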
