mreps

What is mreps

mreps is flexible and efficient software for identifying serial repeats (usually called tandem repeats) in DNA sequences. It was developed at LORIA by the Adage group and is currently maintained at LIFL by the Sequoia team.

See the mreps mini-tutorial for more explanation of what mreps is looking for.
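A tandem repeat is usually described by two numbers: its period (the size of the repeated unit, called "pattern size" further below) and its exponent (total length divided by the period). The short Python sketch below computes both for a given stretch of sequence by brute force. It is only an illustration of these two notions, not mreps code: the function names are hypothetical, and the criterion "exponent at least 2" is an assumption standing in for mreps's own definition of a reportable repeat, which is given in [1].

```python
from fractions import Fraction


def smallest_period(s: str) -> int:
    """Smallest p >= 1 with s[i] == s[i + p] for every valid i (brute force)."""
    if not s:
        raise ValueError("empty sequence")
    for p in range(1, len(s) + 1):
        if all(s[i] == s[i + p] for i in range(len(s) - p)):
            return p
    return len(s)  # not reached: p == len(s) always passes the test above


def exponent(s: str) -> Fraction:
    """Length divided by smallest period, kept exact as a fraction."""
    return Fraction(len(s), smallest_period(s))


def is_tandem_repeat(s: str) -> bool:
    """Hypothetical criterion: at least two full copies of the repeated unit."""
    return exponent(s) >= 2
```

For example, ACGACGACGA has smallest period 3 and exponent 10/3 (three full copies of ACG plus one extra base), whereas ACGTT has period 5 and exponent 1 and is therefore not a tandem repeat.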

The following paper describes mreps 2.5 as well as some case examples of its application to genomic studies. Please cite this paper when referring to mreps.

[1] R. Kolpakov, G. Bana, and G. Kucherov, mreps: efficient and flexible detection of tandem repeats in DNA, Nucleic Acids Research, 31(13), July 1, 2003, pp. 3672-3678.

Combinatorial algorithms implemented in mreps have been presented in the following publications.

[2] R. Kolpakov and G. Kucherov, Finding maximal repetitions in a word in linear time, 1999 Symposium on Foundations of Computer Science (FOCS), New York (USA), IEEE Computer Society, pp. 596-604.

[3] R. Kolpakov and G. Kucherov, Finding approximate repetitions under Hamming distance, Theoretical Computer Science, vol. 303(1), 2003, pp. 135-156. An extended abstract appeared in the 9th European Symposium on Algorithms (ESA 2001), Aarhus, Denmark, 2001.

Current version


The current version is mreps 2.5 (binaries are available for Linux, Windows, and Mac OS X).
An older distribution, mreps 2.1, is still available; this version has an option for handling a general ASCII alphabet and can therefore still be useful.

Some features of mreps 2.5


Mixed combinatorial/heuristic approach
mreps 2.5 is based on a mixed combinatorial/heuristic paradigm. Its core consists of exhaustive combinatorial algorithms (described in [2,3]) that find all repeats satisfying certain mathematical properties, which guarantees that the search is exhaustive. These repeats are then subjected to heuristic processing in order to obtain a more biologically relevant representation. A description of mreps 2.5 can be found in [1].
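To give a concrete feel for what "finding all repeats" means in the exact (error-free) case, the Python sketch below enumerates every maximal exact repetition of exponent at least 2 by brute force over all periods: for each period p it collects maximal stretches where s[i] == s[i + p], and keeps a stretch only if p is its smallest period, so each repetition is reported once. This is a simplified, hypothetical illustration: it runs in roughly quadratic time, it is not the linear-time algorithm of [2], it ignores errors entirely, and it does not perform mreps's heuristic post-processing. The function names are illustrative only.

```python
def smallest_period(s: str) -> int:
    """Smallest period of a non-empty string: len(s) minus its longest proper border (KMP)."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return len(s) - pi[-1]


def naive_runs(s: str) -> list[tuple[int, int, int]]:
    """Return (start, end, period) for each maximal exact repetition with exponent >= 2.

    end is exclusive. Brute force over every period p, not the linear-time method of [2].
    """
    n = len(s)
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k < n - p:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # extend while the character p positions ahead matches
            while k < n - p and s[k] == s[k + p]:
                k += 1
            # s[start:k + p] has period p and cannot be extended left or right with it
            end = k + p
            # keep it only if it spans two full periods and p is its smallest period
            if end - start >= 2 * p and smallest_period(s[start:end]) == p:
                runs.append((start, end, p))
    return sorted(runs)
```

For example, naive_runs("AACGCGCGT") returns [(0, 2, 1), (2, 8, 2)], that is, AA with period 1 and CGCGCG with period 2. Because every period from 1 to n/2 is scanned, repeats of all pattern sizes come out of one call, which mirrors (in a much slower way) the single-run behavior described under "Efficiency".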

Identifying "fuzzy" repeats
mreps 2.5 has a resolution parameter that makes it possible to compute "fuzzy" repeats. Metaphorically, this parameter acts as a "magnifying glass" that lets the user "zoom out" of the genomic sequence in order to detect looser repeats.
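The exact meaning of the resolution parameter is defined in [1] and is not reproduced here. As a hypothetical stand-in, the Python sketch below uses a mismatch budget k: a stretch counts as a fuzzy repeat of period p if every window of p consecutive comparisons s[i] vs s[i + p] contains at most k mismatches. This rule is an assumption loosely inspired by the Hamming-distance repeats of [3]; it only shows the general effect of "zooming out", namely that a larger tolerance accepts looser repeats. The names mismatch_profile, is_k_mismatch_repeat, and k are illustrative, not mreps options.

```python
def mismatch_profile(s: str, p: int) -> list[int]:
    """1 where s[i] differs from s[i + p], else 0, for every comparable position i."""
    return [int(s[i] != s[i + p]) for i in range(len(s) - p)]


def is_k_mismatch_repeat(s: str, p: int, k: int) -> bool:
    """Hypothetical rule: len(s) >= 2p and every window of p comparisons has <= k mismatches."""
    if p < 1 or len(s) < 2 * p:
        return False
    m = mismatch_profile(s, p)
    window = sum(m[:p])  # mismatches in the first window of p comparisons
    if window > k:
        return False
    for i in range(p, len(m)):
        # slide the window one position to the right
        window += m[i] - m[i - p]
        if window > k:
            return False
    return True
```

For instance, ACGTACGAACGT (three copies of ACGT, with one substitution in the second copy) is rejected as an exact repeat of period 4 (k = 0) but accepted once one mismatch per window is tolerated (k = 1).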

Efficiency
mreps places no limitation whatsoever on the pattern size (the size of the repeated unit) of the computed repeats: repeats of all possible pattern sizes can be computed within a single program run. Moreover, depending on the resolution parameter, this run can be very fast: for low resolution values, processing sequences of tens of millions of bases takes only several seconds on a regular PC.

Limitations
The mreps algorithm does not handle indels (insertions/deletions of nucleotides), only substitutions. As a result, indels are treated only indirectly, and certain repeats containing indels may be missed.
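Why indels are harder than substitutions for a substitution-only method can be seen on a toy sequence. If repeats are scored by comparing each base with the base p positions further on, a single substitution disturbs at most two comparisons (the substituted base against the previous and the next copy), whereas an indel shifts the alignment of everything after it. The Python sketch below counts these mismatches and, for contrast, computes the standard Levenshtein edit distance, which charges an indel as a single operation. Both helpers are hypothetical illustrations and are not part of mreps.

```python
def hamming_mismatches(s: str, p: int) -> int:
    """Substitution-only view: count positions i where s[i] != s[i + p]."""
    return sum(s[i] != s[i + p] for i in range(len(s) - p))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (substitutions, insertions, deletions), O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,             # delete ca
                         cur[j - 1] + 1,          # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


exact = "ACGTACGTACGTACGT"        # four copies of ACGT
substituted = "ACGTACGAACGTACGT"  # one T replaced by A
deleted = "ACGTACGACGTACGT"       # the same T deleted instead
```

Here hamming_mismatches(substituted, 4) is 2 while hamming_mismatches(deleted, 4) is 4, even though each variant is only one edit away from the exact repeat (edit_distance gives 1 in both cases). A substitution-only scorer therefore sees the deletion as a much larger disruption.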

Download and use mreps

Credits

The following people contributed to mreps: Ghizlane Bana, Mathieu Giraud, Liliana Ibanescu, Roman Kolpakov, Gregory Kucherov, Ralph Rabbat.