Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. September 2005, Volume 4, Number 9, 613-624
Finding Protein Coding Genes in the Yeast Genome Based on the Characteristic Sequences
Ping-an He, Chun Li, and Jun Wang
Abstract:
Because DNA sequence data in public databases are growing rapidly,
accurate gene prediction algorithms are of great importance. This
paper proposes a numerical characterization algorithm for predicting
protein-coding genes in the yeast genome. The characteristic
sequences of a DNA sequence are a group of (0,1) sequences. Each is
a reduced representation of the given DNA sequence, and two of them
suffice to reconstruct the sequence uniquely. Based on a numerical
description of these characteristic sequences, a protein-coding
gene finding algorithm specific to the yeast genome is proposed.
Its prediction accuracy exceeds 95%. Applying the algorithm, the
total number of protein-coding genes in the yeast genome is
estimated at 5897, consistent with the widely accepted figure of
5800-6000. The putative non-coding ORFs are listed by name. These
results show that the new method is a useful gene prediction
algorithm and can be extended to find genes with more complicated
structures.
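The idea of characteristic sequences can be illustrated with a minimal Python sketch. The abstract does not spell out how the (0,1) sequences are defined, so this sketch assumes they come from the three standard binary nucleotide classifications: purine/pyrimidine, amino/keto, and weak/strong hydrogen bonding. The class names, the 0/1 convention, and the functions `characteristic_sequences` and `reconstruct` are hypothetical, and the paper's numerical descriptors for gene finding are not reproduced here. Under this assumption, every pair of classifications assigns the four bases four distinct bit pairs, which is why two characteristic sequences are enough to rebuild the original sequence.

```python
from __future__ import annotations

# Hypothetical encoding (an assumption, not taken from the paper):
# a base maps to 1 if it belongs to the named class, otherwise 0.
CLASSES = {
    "purine": {"A", "G"},  # purine (R) vs pyrimidine (Y)
    "amino": {"A", "C"},   # amino (M) vs keto (K)
    "weak": {"A", "T"},    # weak H-bond (W) vs strong H-bond (S)
}


def characteristic_sequences(dna: str) -> dict[str, list[int]]:
    """Return one (0,1) sequence per classification scheme."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return {name: [1 if base in members else 0 for base in dna]
            for name, members in CLASSES.items()}


def reconstruct(seqs: dict[str, list[int]]) -> str:
    """Rebuild the DNA sequence from exactly two characteristic sequences."""
    if len(seqs) != 2:
        raise ValueError("provide exactly two characteristic sequences")
    (name1, bits1), (name2, bits2) = seqs.items()
    # Record which pair of bits each base produces under the two schemes.
    lookup = {}
    for base in "ACGT":
        key = (int(base in CLASSES[name1]), int(base in CLASSES[name2]))
        lookup[key] = base
    # Four distinct keys means the bit pair identifies the base uniquely.
    if len(lookup) != 4:
        raise ValueError("these two schemes cannot distinguish all bases")
    return "".join(lookup[(b1, b2)] for b1, b2 in zip(bits1, bits2))
```

For example, `characteristic_sequences("ATGGCATTAG")["purine"]` gives `[1, 0, 1, 1, 0, 1, 0, 0, 1, 1]`, and passing any two of the three sequences to `reconstruct` returns `"ATGGCATTAG"`. Each single sequence discards information (it only records one binary property per base), which is what makes it a reduced representation.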