miRkwood small RNA-seq


This page is a user manual for the miRkwood small RNA-seq web application.

If this is your first time using miRkwood, we suggest reading our quick start guide first.



  1. Input form
    1. Upload your set of reads
    2. Select an assembly
    3. Parameters
      1. Parameters for the processing of the read data
      2. Parameters for the secondary structure of the hairpin precursor
    4. Submit the job
  2. Results page
    1. Overview
    2. Known miRNAs
    3. Novel miRNAs
  3. Export
    1. GFF format
    2. FASTA format
    3. Dot-bracket format
    4. Tabular format (CSV)
    5. Full report in ORG format
    6. Full report in PDF
    7. Reads cloud
  4. HTML report
    1. HTML report for known miRNAs
    2. HTML report for novel miRNAs

Input form

Upload your set of reads

The input is a set of reads produced by deep sequencing of small RNAs and then mapped to a reference genome. The mapped reads must be provided as a BED file. See the detailed instructions on how to build this file, or grab a sample file below.

You will find more information on this sample file in our quick start guide.

Select an assembly

miRkwood currently offers 7 assemblies, which are listed below.

Each assembly is supplemented by two GFF files. The first one contains the genome coordinates of annotated CDS, tRNAs, rRNAs and snoRNAs, and is used to apply the masking options described in Section 1.3 - Parameters. The source of this file is indicated above, case by case. The other GFF file compiles all miRNAs and miRNA precursors available in miRBase 21 and is used to detect known miRNAs that are expressed in the sequencing data.

Parameters

miRkwood comes with a series of options that let you customize the search and refine the results. These options fall into two main types: parameters concerning the processing of the sequence reads, and parameters concerning the secondary structure of the miRNA precursor. Note that they only apply to novel miRNAs. For known miRNAs, users can rely on the additional information provided by miRBase and draw their own conclusions.

Parameters for the processing of the read data

The first step of miRkwood is to locate signals in the set of mapped reads. This is done by scanning the set of reads and detecting statistically significant clusters of reads. To this end, it is advisable to filter the data beforehand with the following options.

Remove multiply mapped reads: All reads that are mapped to more than 5 loci on the reference sequence are discarded. This avoids spurious predictions due to transposons. Default: checked.
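
The filtering rule above can be sketched in a few lines of Python. This is only an illustration: the tuple layout of an alignment and the function name remove_multiply_mapped are assumptions made for this example, not miRkwood's internal representation.

```python
from collections import defaultdict

MAX_LOCI = 5  # threshold stated in the manual


def remove_multiply_mapped(alignments, max_loci=MAX_LOCI):
    """Keep alignments whose read maps to at most max_loci distinct loci.

    alignments: iterable of (read_sequence, chrom, start, end, strand)
    tuples. This layout is hypothetical and chosen for the example.
    """
    alignments = list(alignments)
    loci = defaultdict(set)
    # Collect the distinct loci at which each read sequence is aligned.
    for seq, chrom, start, end, strand in alignments:
        loci[seq].add((chrom, start, end, strand))
    # A read mapped to more than max_loci loci is discarded everywhere.
    return [a for a in alignments if len(loci[a[0]]) <= max_loci]
```

A read aligned at exactly 5 loci is kept, since the manual discards only reads mapped to more than 5 loci.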

Filter out tRNA/rRNA/snoRNA: When this option is checked, products from tRNA, rRNA and snoRNA degradation are filtered out from the input reads. This task is performed based on the existing annotation provided in the GFF annotation file. All reads that intersect a tRNA, rRNA, or snoRNA feature are removed. Default: checked.

Mask coding regions: This option keeps only reads that are aligned to non-coding sequences. The selection is performed with the GFF annotation file. All reads that intersect a CDS feature are removed. Default: checked.
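
Both annotation-based filters rely on the same test: does a read intersect a feature of a given type? A minimal sketch of that test follows. It assumes reads and features are expressed in the same coordinate convention with inclusive ends, and it ignores the strand; the manual does not state how miRkwood handles strands, so that choice is an assumption of this example, as are the function names.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def filter_reads(reads, features, masked_types):
    """Drop reads that intersect a feature whose type is in masked_types.

    reads:    list of (chrom, start, end)
    features: list of (chrom, feature_type, start, end)
    Both layouts are hypothetical and chosen for the example.
    """
    masked = [f for f in features if f[1] in masked_types]
    kept = []
    for chrom, start, end in reads:
        hit = any(
            f_chrom == chrom and overlaps(start, end, f_start, f_end)
            for f_chrom, _, f_start, f_end in masked
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept
```

Calling filter_reads with masked_types={"CDS"} mimics Mask coding regions, and with {"tRNA", "rRNA", "snoRNA"} it mimics Filter out tRNA/rRNA/snoRNA. The linear scan is fine for small inputs; a real pipeline would likely use an interval index.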

Parameters for the secondary structure of the hairpin precursor

After cluster detection, miRkwood aims at determining which sequences can fold into a stem-loop structure. This gives a set of candidate precursors of miRNAs. For each candidate, it is possible to calculate additional criteria that help to bring further evidence to the quality of the prediction and to distinguish accurate miRNA precursors from pseudo-hairpins.

Select only sequences with MFEI < -0.6: MFEI is the minimum folding free energy index, and expresses the thermodynamic stability of the precursor. It is calculated by the following equation:

MFEI = [MFE / sequence length x 100] / (G+C%)

where MFE (minimum free energy) denotes the (negative) folding free energy of the secondary structure, and is calculated using the Matthews-Turner nearest neighbor model implemented in RNAeval. When checked, this option removes all candidate pre-miRNAs with an MFEI greater than or equal to -0.6. Indeed, more than 96% of miRBase precursors have an MFEI smaller than -0.6, whereas pseudo-hairpins show significantly larger MFEI values. Default: checked.
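
As a worked illustration of the MFEI equation, the sketch below computes the index from an MFE value and a sequence. The MFE is taken as an input (in miRkwood it comes from RNAeval); the function names are chosen for this example only.

```python
def gc_percent(seq):
    """Percentage of G and C nucleotides in the sequence (0 to 100)."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def mfei(mfe, seq):
    """MFEI = [MFE / sequence length x 100] / (G+C%).

    Raises ZeroDivisionError if the sequence contains no G or C.
    """
    return (mfe / len(seq) * 100.0) / gc_percent(seq)


def passes_mfei_filter(mfe, seq, threshold=-0.6):
    """True if the candidate is kept, i.e. its MFEI is below the threshold."""
    return mfei(mfe, seq) < threshold
```

For example, an 8-nucleotide sequence with 50% G+C and an MFE of -4.0 gives (-4.0 / 8 x 100) / 50 = -1.0, which passes the -0.6 filter.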

Compute thermodynamic stability: The significance of the stability of the sequence can also be measured by comparison with other equivalent sequences. Bonnet et al. have established that the majority of pre-miRNA sequences exhibit an MFE that is lower than that of shuffled sequences. We compute the probability that, for a given sequence, the MFE of the secondary structure is different from a distribution of MFE computed with 300 random sequences with the same length and the same dinucleotide frequency.

Flag conserved mature miRNAs: This option checks whether the predicted miRNA belongs to a known miRNA family. To do so, we compare the sequence of the precursor with the database of mature plant (Viridiplantae) miRNAs deposited in miRBase (Release 20). We select alignments that have at most three errors (mismatch, deletion or insertion) against the full-length mature miRNA and that occur in one of the two arms of the stem-loop. Moreover, this alignment makes it possible to infer a putative location for the miRNA within the precursor. This location is then validated with miRdup, which assesses the stability of the miRNA:miRNA* duplex. Here, it was trained on miRBase Viridiplantae V20.
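
The "at most three errors" criterion can be illustrated with a standard approximate string matching recurrence (edit distance in which the match may start anywhere in the precursor). This is a generic textbook sketch, not the alignment method miRkwood actually uses, and it does not check that the match lies within one arm of the stem-loop; the function names are hypothetical.

```python
def best_match(pattern, text):
    """Return (errors, end) for the lowest-error occurrence of pattern in text.

    errors counts mismatches, insertions and deletions over the full
    pattern; end is the 0-based index in text just past the match.
    """
    m = len(pattern)
    # Column for the empty text prefix: matching pattern[:i] costs i.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, c in enumerate(text, start=1):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or mismatch
                         prev[i] + 1,         # extra character in text
                         cur[i - 1] + 1)      # pattern character missing in text
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


def is_conserved(mature, precursor, max_errors=3):
    """True if mature aligns somewhere in precursor with at most max_errors."""
    return best_match(mature, precursor)[0] <= max_errors
```

The end position returned by best_match gives a putative location of the mature miRNA in the precursor, which is the kind of information the manual says is then passed to miRdup for validation.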

Submission

Each job is automatically assigned an ID.

Job title. You can give the job a name to make its results easier to identify.

Email Address. You can enter your email address to be notified when the job is finished. The email contains a link that gives access to the results for 2 weeks.

Results page

Results overview

This page has two main parts. The first one (Options summary) is simply a summary of your job parameters. The other one (Results summary) provides the detailed results.

Total number of reads (unique reads): This is the total number of reads in your initial file. The number of unique reads (obtained after merging identical reads) is indicated in parentheses.

CoDing Sequences: This is the number of reads that have been discarded by the option Mask coding regions. You can list them by clicking on the download link.

rRNA/tRNA/snoRNA: This is the number of reads that have been discarded by the option Filter out tRNA/rRNA/snoRNA. You can list them by clicking on the download link.

Multiply mapped reads: This is the number of reads that have been discarded by the option Remove multiply mapped reads. You can list them by clicking on the download link.

Orphan cluster of reads: A cluster of reads is a short region in the genome that has been enriched with aligned reads. Here we report the number of reads that are not classified as miRNA by miRkwood, but that nevertheless occur in a cluster. You can obtain the list of orphan clusters by clicking on the download link (BED file).

Unclassified reads: Unclassified reads are isolated reads, that do not belong to any cluster, or do not fall in any annotated region.

Known miRNAs: This is the number of loci annotated as microRNA precursors in miRBase that intersect with reads from the BED file. You can display detailed results by clicking on the link see results. See Section 2.2.

Novel miRNAs: This is the number of miRNAs found by miRkwood that have not been previously reported in miRBase. You can display detailed results by clicking on the link see results. See Section 2.3.

Known miRNAs

Known miRNAs are miRNAs that are already present in the miRBase database (version 21). We consider that a known microRNA is found in the data as soon as there is at least one read on the precursor sequence. The quality score helps to determine which are the best candidates.


results table

The list of all known miRNAs found is displayed in a two-way table. Each row corresponds to a pre-miRNA, and each column to a feature. By default, results are sorted by sequence and then by position. It is possible to have them sorted by quality (see definition below). You can view all information related to a given prediction by clicking on the row (see section HTML Report).

Chr: Number of the chromosome.

Position: Start and end positions of the miRNA precursor, as documented in miRBase.

+/- : Strand, forward (+) or reverse (-).

miRNA: Sequence of the miRNA.

Length: Length of the miRNA.

Reads: Number of reads included in the locus.

Quality: This score measures the consistency between the distribution of reads along the locus and the annotation provided in miRBase. It ranges between 0 and 2 stars, and is calculated as follows.

miRBase name: miRBase identifier.

2D structure: You can drag the mouse over the zoom icon to visualize the stem-loop structure of the pre-miRNA. The image is generated with Varna.

Novel miRNAs

Novel miRNAs are miRNAs that are not reported in miRBase. The prediction is supported by the presence of a stem-loop secondary structure, a significant read coverage and the distribution of the reads.


results table

Each row corresponds to a miRNA precursor, and each column to a feature. By default, results are sorted by sequence and then by position. They can also be sorted by quality. The quality is the sum of three values: the existence of a miRNA sequence, the score of the reads distribution, and the value of the MFEI (< -0.8) (see definitions below). You can view all information related to a given prediction by clicking on the row (see section HTML Report).

Chr: Number of the chromosome.

Position: Start and end positions of the putative miRNA precursor in the original sequence, in 1-based notation (consistent with the GFF format).
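
Since the input reads are supplied as a BED file while positions are reported in 1-based notation, it can help to spell out the conversion. The sketch below assumes the standard BED convention (0-based start, end not included) and the standard GFF convention (1-based start, end included); the function names are chosen for this example.

```python
def bed_to_gff_interval(bed_start, bed_end):
    """Convert a BED interval (0-based, end exclusive) to GFF (1-based, end inclusive)."""
    if bed_start < 0 or bed_end <= bed_start:
        raise ValueError("invalid BED interval")
    return bed_start + 1, bed_end


def gff_to_bed_interval(gff_start, gff_end):
    """Convert a GFF interval (1-based, end inclusive) to BED (0-based, end exclusive)."""
    if gff_start < 1 or gff_end < gff_start:
        raise ValueError("invalid GFF interval")
    return gff_start - 1, gff_end
```

Only the start coordinate changes; the interval length is end - start in BED and end - start + 1 in GFF.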

+/-: Strand, forward (+) or reverse (-).

miRNA: Sequence of the miRNA. It is the sequence of the most common read, with a frequency of at least 33%.

Length: Length of the miRNA.

Weight: It is the depth of the read labelled as the miRNA, divided by the number of places in the genome where this read is aligned.
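
The miRNA and Weight columns can be illustrated as follows. This sketch assumes that the "frequency" of a read is its depth divided by the total depth of all reads in the locus; the manual does not define it further, so treat that as an assumption, along with the function names and the dictionary layout.

```python
def pick_mirna(read_depths, min_fraction=0.33):
    """Return the most common read of a locus, or None if it is too rare.

    read_depths: dict mapping read sequence -> depth within the locus
    (hypothetical layout). The read is returned only if it accounts for
    at least min_fraction of the total depth.
    """
    total = sum(read_depths.values())
    if total == 0:
        return None
    seq, depth = max(read_depths.items(), key=lambda kv: kv[1])
    return seq if depth / total >= min_fraction else None


def weight(depth, n_loci):
    """Depth of the miRNA read divided by the number of loci where it aligns."""
    return depth / n_loci
```

For instance, a read of depth 16 that aligns at 2 places in the genome has a weight of 8.0.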

Reads: The total number of reads included in the locus.

Reads distribution: This score, ranging from 0 to 3 stars, characterizes the pattern of reads mapping to a putative microRNA precursor. It aims to determine whether the distribution of reads presents a typical two-peak profile, corresponding to the guide miRNA and the miRNA* respectively.

Each criterion contributes equally to the overall ranking, and adds one star.

MFEI: This is the minimum folding free energy index. This value expresses the thermodynamic stability of the precursor and is calculated by the following equation:

MFEI = [MFE / sequence length x 100] / (G+C%)

where MFE is the minimum free energy of the secondary structure (computed with RNAeval). When the MFEI is < -0.8, the value is displayed in purple, indicating a significantly stable hairpin. This MFEI threshold covers 83% of miRBase miRNA precursors, whereas it is observed in less than 13% of pseudo-hairpins.

Shuffles (option): Proportion of shuffled sequences whose MFE is lower than the MFE of the candidate miRNA precursor (see Compute thermodynamic stability). This value ranges between 0 and 1. The smaller it is, the more significant the MFE. We report the value for pre-miRNA stem-loops when it is smaller than 0.01, which covers more than 89% of miRBase sequences. Otherwise, the P-value is considered not significant and no value is reported.
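
The Shuffles value described above is an empirical proportion, which can be sketched as follows. The MFE values of the shuffled sequences are taken as given (generating dinucleotide-preserving shuffles and folding them is outside the scope of this example), and the function names are hypothetical.

```python
def shuffle_proportion(candidate_mfe, shuffled_mfes):
    """Proportion of shuffled sequences whose MFE is lower than the candidate's."""
    lower = sum(1 for m in shuffled_mfes if m < candidate_mfe)
    return lower / len(shuffled_mfes)


def reported_value(candidate_mfe, shuffled_mfes, cutoff=0.01):
    """Return the proportion if it is below the cutoff, otherwise None."""
    p = shuffle_proportion(candidate_mfe, shuffled_mfes)
    return p if p < cutoff else None
```

With 300 shuffled sequences, the value is a multiple of 1/300, so it falls below 0.01 only when at most 2 shuffled sequences are more stable than the candidate.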

Conserved miRNA (option): This cell is marked with one at sign (@) when an alignment between the candidate sequence and miRBase is found (see Flag conserved mature miRNAs). It is marked with two (@@) when the location of the candidate mature miRNA is validated by miRdup. The alignments are visible in the HTML report.

2D structure: You can drag the mouse over the zoom icon to visualize the stem-loop structure of the pre-miRNA. The image is generated with Varna.

Export

Results, or a selection of them, can be exported to a variety of formats, and saved to a local folder for further analyses.

GFF: General Feature Format, an annotation format that lists the positions of the pre-miRNAs found (see the Ensembl documentation for more explanation).

FASTA: This is the compilation of all pre-miRNA sequences found.

Dot-bracket notation: This is the compilation of all pre-miRNA sequences found, together with their predicted secondary structures. The secondary structure is given as a set of matching parentheses (see the Vienna website for more explanation).
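
To make the dot-bracket notation concrete, here is a small sketch that recovers the list of base pairs from a structure string: each opening parenthesis pairs with its matching closing parenthesis, and a dot is an unpaired position. The function name is chosen for this example, and only the three characters "(", ")" and "." are handled.

```python
def base_pairs(structure):
    """Return the sorted list of (i, j) base pairs (0-based) in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For the toy structure "((..))", positions 0 and 5 are paired, as are positions 1 and 4, while positions 2 and 3 form the loop.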

CSV (comma-separated values): It contains the same information as the result table, plus the FASTA sequences and the dot-bracket secondary structures. This tabular format is supported by spreadsheet applications such as Excel.

ORG: This is an equivalent of the HTML report, and contains the full report of the predictions.

PDF: This is an equivalent of the ORG report.

Reads cloud: This archive is a compilation of all reads clouds. Each reads cloud is a text file that summarizes all information available for a potential precursor: positions, sequence, secondary structure, existence of an alignment with miRBase, distribution of mapped reads. It can easily be parsed.

HTML report

The HTML report contains all information related to a given predicted pre-miRNA.

HTML report for known miRNAs

>  1:234009-234092,-, stem-loop structure
GAAAUGAUGCGCAAAUGCGGAUAUCAAUGUAAAUCAGGGAGAAGGCAUGAUAUACCUUUAUAUCCGCAUUUGCGCAUCAUCUCU
((.(((((((((((((((((((((.((.(((.((((.(.......).)))).))).)).))))))))))))))))))))).)).

Reads

Locus  : 1:234009-234092
Strand : -

GAAAUGAUGCGCAAAUGCGGAUAUCAAUGUAAAUCAGGGAGAAGGCAUGAUAUACCUUUAUAUCCGCAUUUGCGCAUCAUCUCU
((.(((((((((((((((((((((.((.(((.((((.(.......).)))).))).)).))))))))))))))))))))).)).
         <------miRBase------>                          <------miRBase------>
*********************............................................................... length=21 depth=5
.......*********************........................................................ length=21 depth=2
........................................................*********************....... length=21 depth=1
............................................................*********************... length=21 depth=16
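
The read lines of the display above follow a regular pattern: dots for uncovered positions, a run of stars for the read, then length and depth fields. A minimal parsing sketch is given below; it is based only on the sample shown here, so the regular expression and the function name are assumptions rather than a specification of the format.

```python
import re

# Leading dots, a run of stars, trailing dots, then "length=N depth=N".
READ_LINE = re.compile(r"^(\.*)(\*+)\.*\s+length=(\d+)\s+depth=(\d+)\s*$")


def parse_read_line(line):
    """Parse one read line; return None for lines that are not read lines.

    start and end are 0-based offsets within the precursor, end exclusive.
    """
    m = READ_LINE.match(line)
    if m is None:
        return None
    start = len(m.group(1))
    return {
        "start": start,
        "end": start + len(m.group(2)),
        "length": int(m.group(3)),
        "depth": int(m.group(4)),
    }
```

Summing the depth field over all parsed lines gives the total number of reads displayed for the locus (24 in the sample above).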

HTML report for novel miRNAs

>  1:234009-234092,-, stem-loop structure
GAAAUGAUGCGCAAAUGCGGAUAUCAAUGUAAAUCAGGGAGAAGGCAUGAUAUACCUUUAUAUCCGCAUUUGCGCAUCAUCUCU
((.(((((((((((((((((((((.((.(((.((((.(.......).)))).))).)).))))))))))))))))))))).)).

The stem-loop structure of the miRNA precursor is also displayed with Varna.

Varna image

Reads

Locus  : 1:234009-234092
Strand : -

GAAAUGAUGCGCAAAUGCGGAUAUCAAUGUAAAUCAGGGAGAAGGCAUGAUAUACCUUUAUAUCCGCAUUUGCGCAUCAUCUCU
((.(((((((((((((((((((((.((.(((.((((.(.......).)))).))).)).))))))))))))))))))))).)).
         <------miRBase------>                          <------miRBase------>
*********************............................................................... length=21 depth=5
.......*********************........................................................ length=21 depth=2
........................................................*********************....... length=21 depth=1
............................................................*********************... length=21 depth=16

Thermodynamic stability

Conservation of the mature miRNA

All alignments with miRBase are reported and gathered according to their positions.

alignment

query is the user sequence, and miRBase designates the mature miRNA found in miRBase. You can access the corresponding miRBase entry by clicking on the link under the alignment. The report also indicates whether the location is validated with miRdup. Finally, we provide an ASCII representation of the putative miRNA within the stem-loop precursor.

hairpin with mature