Chemical Inference of RNA Structures

Supplementary Data for "Genome-wide profiling of mouse RNA secondary structures reveals key features of the mammalian transcriptome"


Understanding RNA structure is key to understanding RNA functions and mechanisms of action. In particular, non-coding RNAs are thought to exert their functions through specific secondary structures, but an efficient large-scale annotation of these structures is still missing.

Using a novel high-throughput method named chemical inference of RNA structures (CIRS-seq), we investigated the structural features of mouse embryonic stem cell (ESC) transcripts. CIRS-seq uses dimethyl sulfate (DMS) and N-cyclohexyl-N'-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT) to modify RNA residues in single-stranded conformation within native RNA secondary structures. Our analysis revealed unexpectedly higher structuring of the 5' and 3' untranslated regions (UTRs) compared with the coding regions, reduced structuring at the Kozak sequence and stop codon, and a three-nucleotide periodicity across the coding region of messenger RNAs. We also observed that non-coding RNAs (ncRNAs) exhibit a higher degree of structuring than protein-coding transcripts. Moreover, we found that the RNA-binding protein Lin28a binds selectively to RNA motifs with a strong preference for a single-stranded conformation.

This work defines, for the first time, the complete RNA structurome of mouse embryonic stem cells, revealing an extremely articulated RNA structural landscape. These results demonstrate that CIRS-seq is an important tool for identifying native RNA structures.

Supplementary analysis data/tools:

File                                      Format    Checksum (MD5)
CIRS Genes Annotation (Ensembl 65, mm9)   .bed      e2f1f6ad2cbd7a8a18e56825776d9142
CIRS Genes Annotation (Ensembl 65, mm9)   .gtf      760c40fed1a896094cbcaf003e458d4e
CIRS E14 Genes Sequences                  .fa       34ba2fb2d3e4e9e6400a58730e29b014
CIRS E14 Bowtie v1 Index                  .ebwt     c05d61cfaa85dcf344d540d18c77fdc7
CIRS E14 IGV Genome                       .genome   b1e59833f33cd1de15cf653a1bbe4f1e
Analysis scripts                          .zip      3d1f401aaa58cf5e0d9ab6202d10a357
Raw and processed data files are available through the Gene Expression Omnibus database (GSE54106).
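
The MD5 checksums listed in the table above can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, not part of the distributed analysis scripts. The local file names in EXPECTED_MD5 are hypothetical placeholders (this page does not state the actual names of the downloaded files), and only two entries are shown; the checksum values are copied from the table.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the table above.
# The file names are HYPOTHETICAL: replace them with the names of your downloads.
EXPECTED_MD5 = {
    "cirs_genes.bed": "e2f1f6ad2cbd7a8a18e56825776d9142",
    "cirs_genes.gtf": "760c40fed1a896094cbcaf003e458d4e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.strip().lower()


def check_all(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'MISMATCH', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.exists():
            results[name] = "missing"
        else:
            results[name] = "ok" if verify(path, checksum) else "MISMATCH"
    return results
```

Calling check_all(".") in the download directory returns a status for each listed file. A file reported as MISMATCH was most likely truncated or altered in transit and should be downloaded again.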