Supplementary Data for "Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome"
Understanding RNA structure is a key step toward understanding RNA functions and mechanisms of action. In particular, non-coding RNAs are thought to exert their functions through specific secondary structures, but an efficient large-scale annotation of these structures is still missing.
Using a novel high-throughput method named chemical inference of RNA structures (CIRS-seq), which uses dimethyl sulfate (DMS) and N-cyclohexyl-N'-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT) to modify RNA residues in single-stranded conformation within native RNA secondary structures, we investigated the structural features of mouse embryonic stem cell (ESC) transcripts. Our analysis revealed unexpectedly higher structuring of the 5' and 3' untranslated regions (UTRs) compared to the coding regions, reduced structuring at the Kozak sequence and stop codon, and a three-nucleotide periodicity across the coding region of messenger RNAs. We also observed that non-coding RNAs (ncRNAs) exhibit a higher degree of structuring than protein-coding transcripts. Moreover, we found that the RNA-binding protein Lin28a binds selectively to RNA motifs with a strong preference for a single-stranded conformation.
This work defines for the first time the complete RNA structurome of mouse embryonic stem cells, revealing a highly intricate RNA structural landscape. These results demonstrate that CIRS-seq is an important tool for the identification of native RNA structures.
Supplementary analysis data/tools:
| File | Format | Checksum (MD5) |
|---|---|---|
| CIRS Genes Annotation (Ensembl 65, mm9) | .bed | e2f1f6ad2cbd7a8a18e56825776d9142 |
| CIRS Genes Annotation (Ensembl 65, mm9) | .gtf | 760c40fed1a896094cbcaf003e458d4e |
| CIRS E14 Genes Sequences | .fa | 34ba2fb2d3e4e9e6400a58730e29b014 |
| CIRS E14 Bowtie v1 Index | .ebwt | c05d61cfaa85dcf344d540d18c77fdc7 |
| CIRS E14 IGV Genome | .genome | b1e59833f33cd1de15cf653a1bbe4f1e |

Raw and processed data files are available through the Gene Expression Omnibus database (accession GSE54106).
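Downloaded files can be checked against the checksums listed above. The following is a minimal sketch, assuming the 32-character hexadecimal values are MD5 digests. The local file names in `EXPECTED_MD5` are hypothetical placeholders, since the actual download names are not given on this page, and should be replaced with the names of the files as saved.

```python
import hashlib
from pathlib import Path

# Checksums copied from the table above, assumed to be MD5 digests.
# The keys are hypothetical local file names; rename them to match your downloads.
EXPECTED_MD5 = {
    "cirs_genes.bed": "e2f1f6ad2cbd7a8a18e56825776d9142",
    "cirs_genes.gtf": "760c40fed1a896094cbcaf003e458d4e",
    "cirs_e14_genes.fa": "34ba2fb2d3e4e9e6400a58730e29b014",
    "cirs_e14_bowtie1.ebwt": "c05d61cfaa85dcf344d540d18c77fdc7",
    "cirs_e14_igv.genome": "b1e59833f33cd1de15cf653a1bbe4f1e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Check the files in the current working directory.
    for name, status in verify(".").items():
        print(f"{status:9s}{name}")
```

Reading the files in chunks keeps memory use low for the larger files, such as the sequence and index files.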