CG-seq: Comparative Genomics-seq

CG-seq is a software pipeline that identifies noncoding RNAs in a genomic sequence through comparative analysis across multiple species. It takes as input a genomic sequence (the target sequence) and a set of sequences from a variety of other species, which are compared against the target sequence.

The algorithm of CG-seq proceeds in four steps.

  1. Preprocessing (optional). Sequences are preprocessed to mask CDSs or to remove redundancy between strains of the same species.
  2. Alignment. The target sequence is compared to all other sequences to detect similar sequences across species.
  3. Conserved regions. Pairwise alignments are combined into clusters of significantly conserved regions.
  4. RNA structure. Conserved regions are examined for evolutionary patterns in order to select sequences that exhibit a conserved consensus secondary structure.
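
To make steps 1 and 3 more concrete, here is a minimal Python sketch of the general ideas: masking annotated regions in a sequence, and merging overlapping pairwise hits on the target into clusters. It is an illustration only, not CG-seq's actual implementation. The function names (mask_regions, cluster_hits), the half-open coordinate convention, and the min_species threshold are all assumptions made for this example; in particular, counting supporting species is a simple stand-in for whatever significance criterion CG-seq applies.

```python
from typing import List, Tuple

Interval = Tuple[int, int]   # half-open [start, end) on the target sequence
Hit = Tuple[str, int, int]   # (species, start, end) of one pairwise alignment


def mask_regions(sequence: str, regions: List[Interval], mask_char: str = "N") -> str:
    """Replace every annotated region (e.g. a CDS) with mask characters."""
    chars = list(sequence)
    for start, end in regions:
        # Clamp to the sequence bounds so out-of-range annotations are harmless.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


def cluster_hits(hits: List[Hit], min_species: int = 2) -> List[Tuple[int, int, int]]:
    """Merge overlapping hits on the target and keep well-supported clusters.

    Returns (start, end, number_of_species) for each cluster supported by
    at least min_species distinct species (a hypothetical criterion).
    """
    clusters = []
    for species, start, end in sorted(hits, key=lambda h: (h[1], h[2])):
        if clusters and start < clusters[-1]["end"]:
            # Overlaps the current cluster: extend it and record the species.
            clusters[-1]["end"] = max(clusters[-1]["end"], end)
            clusters[-1]["species"].add(species)
        else:
            clusters.append({"start": start, "end": end, "species": {species}})
    return [
        (c["start"], c["end"], len(c["species"]))
        for c in clusters
        if len(c["species"]) >= min_species
    ]


if __name__ == "__main__":
    print(mask_regions("ACGTACGTAC", [(2, 5)]))
    toy_hits = [("A", 10, 50), ("B", 40, 80), ("C", 200, 250), ("A", 60, 90)]
    print(cluster_hits(toy_hits))
```

In the toy data, the three overlapping hits from species A and B merge into a single cluster, while the isolated hit from species C is dropped because only one species supports it.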

Downloading and installation

Linux, Mac OS X: CG-seq_linux.tar.gz. CG-seq is distributed under the GPL license.


The full CG-seq documentation is available here.

  1. Introduction (this page)
  2. Getting started
    1. Command Line Interface
    2. Graphical User Interface
    3. Sample data
  3. Load data
    1. Load the target sequence
    2. Load other sequences
    3. Provide an output directory
    4. Sequence parameters
  4. Run analysis
    1. Preprocessing sequences
    2. Alignments
    3. Conserved regions
    4. RNA structures
  5. Viewing results
  6. References