About EGPRED Server

Combinatorial approaches that use both homology search and pattern recognition are a fruitful route to more accurate and more comprehensive gene prediction. This is based on the fact that different algorithms take advantage of different measures to score partial features of genes. Because programs consider gene features in different ways, predictions for the same sequence by different programs are often not identical. Even a predictor that performs worst overall may be the best predictor for certain situations or sequences. It has been suggested that the accuracy of gene prediction can be improved if the outputs of several programs are combined in a proper way (Murakami and Takagi, 1998).

Murakami and Takagi (1998) proposed five different ways of combining gene prediction programs. These are:

  1. AND- method
  2. OR- method
  3. HIGHEST- method
  4. RULE- method, and
  5. BOUNDARY- method

They used these methods to combine predictions from four programs: GENSCAN, FEXH, GeneParser, and GRAIL. Using these combinatorial methods, Murakami and Takagi (1998) developed a server program (Shirokane system) and a client program (GeneScope), available at http://gf.genome.ad.jp/.

More recently, Rogic et al. (2002) tried another approach in which they integrated the 'AND-' and 'OR-' methods of combination, using the exon scores given by the programs. Instead of combining many programs as Murakami and Takagi had done, they applied their combination only to the two gene prediction programs that had performed best in their earlier evaluation (Rogic et al., 2001). They developed a server, GeneComber, implementing three methods for combining gene predictions from two ab initio programs: the EUI (Exon Union-Intersection) method, the EUI-frame (Exon Union-Intersection with Reading Frame Consistency) method, and the GI (Gene Intersection) method. All three methods increased sensitivity and specificity at the nucleotide level of gene-finding accuracy. In addition, a significant improvement in exon-level accuracy was noted with these methods (Rogic et al., 2002).
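As an illustration of what nucleotide-level sensitivity and specificity measure, the following minimal Python sketch compares predicted coding positions with annotated ones. It assumes the convention commonly used in the gene-finding literature, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP). The function name and the representation of exons as inclusive (start, end) pairs are hypothetical choices made for this example and are not taken from GeneComber or EGPRED.

```python
def nucleotide_accuracy(true_exons, predicted_exons):
    """Return (sensitivity, specificity) at the nucleotide level.

    Exons are inclusive (start, end) coordinate pairs (an assumption
    made for this sketch). Specificity follows the gene-finding
    convention TP / (TP + FP).
    """
    true_positions = set()
    for start, end in true_exons:
        true_positions.update(range(start, end + 1))

    predicted_positions = set()
    for start, end in predicted_exons:
        predicted_positions.update(range(start, end + 1))

    tp = len(true_positions & predicted_positions)   # coding, predicted coding
    fn = len(true_positions - predicted_positions)   # coding, missed
    fp = len(predicted_positions - true_positions)   # non-coding, predicted coding

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


# Illustrative coordinates only: half of the annotated exon is recovered,
# and half of the predicted exon is correct.
sn, sp = nucleotide_accuracy([(1, 10)], [(6, 15)])
```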


Rogic et al. (2002) developed the following three methods for combining the results from the two most successful gene prediction programs, GENSCAN and HMMgene.

  1. EUI (Exon Union-Intersection method):

    This method uses both the AND- and OR- methods described by Murakami and Takagi (1998). All exons whose exon probability score is above a given threshold are accepted using the OR- method. Exons scoring below the threshold are accepted only if they are predicted by both programs. Consequently, a GENSCAN or HMMgene exon that does not overlap any exon predicted by the other program is accepted if its exon probability is greater than or equal to the threshold score and rejected otherwise. An additional criterion included in the method is the initial exon rule: if GENSCAN's internal exon has the same right boundary as HMMgene's initial exon and both score above the threshold, HMMgene's prediction is chosen as the EUI prediction.

  2. EUI-frame (Exon Union-Intersection with Reading Frame Consistency method):

    This method applies the EUI method described above while maintaining reading frame consistency. Gene boundaries are determined for each program's prediction, and each gene is assigned a probability score equal to the average exon probability score of all exons in that gene. For each predicted exon, the acceptor and donor sites are determined. Where predicted genes overlap, the one with the higher gene probability is chosen to impose the reading frame. The EUI method is then applied to the selected genes, accepting exons only if they are in the chosen reading frame.

  3. GI (Gene Intersection):

    This approach is intended for identifying genes in long genomic regions, so that only exons predicted by both programs are considered. The first step is to select the regions predicted as genes by both programs (the GI genes); the next is to apply the EUI method to those exons that lie completely within GI genes.
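The EUI acceptance rules in item 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GeneComber implementation. The Exon record, the eui function, the exon-type labels, and the threshold value 0.75 are hypothetical. "Predicted by both programs" is assumed here to mean identical start and end coordinates, and conflicts between overlapping but non-identical exons that both pass the threshold are left unresolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int    # left boundary
    end: int      # right boundary
    kind: str     # hypothetical labels: "initial", "internal", "terminal"
    prob: float   # exon probability score reported by the program
    source: str   # "GENSCAN" or "HMMgene"


def eui(genscan, hmmgene, threshold):
    """Combine two exon lists using the EUI rules (sketch)."""
    genscan_coords = {(e.start, e.end) for e in genscan}
    hmmgene_coords = {(e.start, e.end) for e in hmmgene}

    # Initial exon rule: a GENSCAN internal exon sharing its right boundary
    # with an HMMgene initial exon, both scoring at or above the threshold,
    # gives way to the HMMgene prediction.
    dropped = set()
    for g in genscan:
        for h in hmmgene:
            if (g.kind == "internal" and h.kind == "initial"
                    and g.end == h.end
                    and g.prob >= threshold and h.prob >= threshold):
                dropped.add(g)

    candidates = [(e, hmmgene_coords) for e in genscan if e not in dropped]
    candidates += [(e, genscan_coords) for e in hmmgene]

    accepted, seen = [], set()
    for exon, other_coords in candidates:
        key = (exon.start, exon.end)
        if key in seen:
            continue  # identical exon already accepted from the other program
        # OR- rule: score passes the threshold.
        # AND- rule: low score, but the other program predicts the same exon.
        if exon.prob >= threshold or key in other_coords:
            accepted.append(exon)
            seen.add(key)
    return sorted(accepted, key=lambda e: (e.start, e.end))


# Illustrative input only; coordinates and scores are made up.
genscan = [
    Exon(100, 200, "internal", 0.90, "GENSCAN"),  # high score, accepted
    Exon(300, 400, "internal", 0.40, "GENSCAN"),  # low score, confirmed by HMMgene
    Exon(500, 600, "internal", 0.50, "GENSCAN"),  # low score, unconfirmed, rejected
]
hmmgene = [Exon(300, 400, "internal", 0.30, "HMMgene")]
combined = eui(genscan, hmmgene, threshold=0.75)
```

The EUI-frame and GI methods would add a filtering step around this function (a reading-frame check, or a restriction to exons inside GI genes); those steps are not shown here.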



Supplementary Information

Evaluation of combinatorial gene-finding methods was carried out using the HMR195 dataset. Details of the evaluation are reported elsewhere.