-- HUGO --
Description
HUGO stands for Hierarchical Union of Genes from Operons. HUGO is a computer program
that detects conserved clusters of genes in prokaryotic species. It specializes
in detecting über-operons: sets of genes formed by the union of conserved, similar
operons.
The detected clusters form sets of genes sharing the same function.
The main characteristic of HUGO is that it can find gene clusters
common to only a subset of the compared genomes: a cluster need not be present in
all the compared genomes. Another important characteristic is that HUGO
allows gene duplications.
As an example, HUGO is able to detect the following hierarchical cluster of
genes:
What is it?
Genes are organized in operon structures in prokaryotic genomes, but comparing operons from different species does not reveal shared sets of genes: the operons of one species are usually broken up in another. The usual methods designed to discover such motifs are based on gene order. Consequently, they cannot compare many genomes at once, and can handle only very few phylogenetically distant ones. Their main pitfalls are the stringency of their motif definition and of their genome representation. Each genome is organized in operons, yet no common structures can be found. To address this paradox, we study another motif definition: über-operons, a higher level of operon conservation.
We propose HUGO, a new formal definition and implementation of über-operons. Binaries and sources can be downloaded. A graphical user interface that projects über-operons onto genomes will be available soon.
In prokaryotes, genes tend to be organized in clusters controlled from a single regulatory site. Such a cluster of genes, known as an operon, usually encodes products with related functions. Operons are known not to be conserved across all prokaryotic species, but ancestral traces of them remain in many genomes.
Two orthologous genes may be present in two operons that otherwise contain different genes. The main idea of über-operons is that there is a set of operons that share the same set of genes. Genes from the same operon have related functions. Rearrangements break existing operons, but they also lead to new ones, whose genes again have related functions. Thus the union of all the operons sharing genes is a set of genes with related functions.
An über-operon is a set of genes formed by a maximal set of operons that share orthologous and/or paralogous genes. Each set of orthologous genes is represented by an integer. Consider two operons in one species, identified by their sets of genes: o1,1={1,2,3} and o1,2={4,5,6}. Suppose another species has the two operons o2,1={1,5,3} and o2,2={4,2,6}. No operon is shared between these two species, but there is a common motif: the set of operons. This set is an über-operon whose genes are {1,2,3,4,5,6}.
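The example above can be sketched in a few lines of Python. This is only an illustration of the definition (operons linked by shared gene families, then merged), not HUGO's actual implementation, which is written in C and also involves parameters such as alpha that are ignored here. The function name uber_operons and the union-find approach are assumptions made for this sketch.

```python
from collections import defaultdict

def uber_operons(operons):
    """Merge operons that share at least one gene family.

    `operons` is a list of sets of integers, one integer per set of
    orthologous genes. Returns the sorted gene list of each merged group.
    Hypothetical helper: a simplification of the definition, not HUGO's code.
    """
    parent = list(range(len(operons)))

    def find(i):
        # Follow parent links to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Link every operon to the first operon seen with the same gene family.
    first_seen = {}
    for idx, genes in enumerate(operons):
        for g in genes:
            if g in first_seen:
                union(first_seen[g], idx)
            else:
                first_seen[g] = idx

    # The genes of a group are the union of the genes of its operons.
    groups = defaultdict(set)
    for idx, genes in enumerate(operons):
        groups[find(idx)] |= set(genes)
    return sorted(sorted(g) for g in groups.values())

# o1,1 and o1,2 from the first species, o2,1 and o2,2 from the second.
operons = [{1, 2, 3}, {4, 5, 6}, {1, 5, 3}, {4, 2, 6}]
print(uber_operons(operons))  # [[1, 2, 3, 4, 5, 6]]
```

No single operon is common to both species, yet the four operons are chained together by shared gene families, so they collapse into one group containing the six genes.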
Synopsis
-------HUGO-HELP-----------
Detecting clusters program, version 0.1b
usage : ./HUGO {OPTION}
Where OPTION is one of :
-s genome_file [File containing the list of species to load]
-w cog_file [COG file]
-r reference_species [reference species for name of genes (like 'eco')]
-R directory [species directory]
-a edges [alpha parameter / cutting edges]
-d delta
-g gamma
-m [meta-clusters algorithm]
-h gene [hierarchical representation for gene 'gene']
-v [verbose]
-V [really verbose]
---------------------------
-R option :
specifies the directory containing the species files (each an ordered list
of genes).
-r option :
specifies the reference species, i.e. the species whose gene names
are used to name genes in the other species. The default is 'eco'.
-w option :
the file containing the COGs (Clusters of Orthologous Genes). This file provides
cross-annotation between many species.
-s option :
the file containing the list of species to be loaded.
-a option :
the value of the edges not to be pruned (the alpha value).
-h option :
hierarchical representation of über-operons for the given gene.
-m option :
switches from über-operons to meta-clusters.
-d option :
(specific to meta-clusters) authorized number of genes between two
'adjacent' genes.
-g option :
(specific to meta-clusters) maximum length of pathways.
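To make the -d description more concrete, here is a minimal Python sketch of one plausible reading of delta: two genes count as 'adjacent' when at most delta other genes lie between them in the ordered gene list of a species. The function adjacent_within_delta, the linear (non-circular) genome, and the gene names are all assumptions made for illustration; HUGO's own handling may differ.

```python
def adjacent_within_delta(genome, a, b, delta):
    """Return True if some occurrence of gene `a` and some occurrence of
    gene `b` have at most `delta` genes between them.

    `genome` is an ordered list of gene identifiers, treated as linear.
    Duplicated genes are handled by checking every pair of positions.
    Hypothetical helper: one interpretation of the delta parameter.
    """
    pos_a = [i for i, g in enumerate(genome) if g == a]
    pos_b = [i for i, g in enumerate(genome) if g == b]
    return any(
        abs(i - j) - 1 <= delta
        for i in pos_a
        for j in pos_b
        if i != j
    )

# Made-up gene order: two genes separate "g2" from "g5".
genome = ["g1", "g2", "g3", "g4", "g5"]
print(adjacent_within_delta(genome, "g2", "g5", 2))  # True
print(adjacent_within_delta(genome, "g2", "g5", 1))  # False
```

With delta set to 0 this reduces to strict adjacency; larger values tolerate insertions between the two genes.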
Download
| Linux i386 | download |
| Windows (DOS interface) | download |
| C code (tgz) | download |
| C code (zip) | download |
| species (tgz) | download |
| species (zip) | download |
Address
- LIFL - bâtiment M3
Cité Scientifique
Universite des Sciences et Technologies de Lille
59655 Villeneuve d'Ascq Cedex FRANCE
- Martin Figeac
martin (dot) figeac (at) lifl (dot) fr
http://www.lifl.fr/~figeac