Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. August 2003, Volume 2, Number 8, 527-538
Artificial Neural Network Method for Predicting Protein Coding
Genes in the Yeast Genome
Chun Li, Ping-an He, and Jun Wang
Internet Electron. J. Mol. Des. 2003, 2, 527-538
Abstract:
The rapid growth of DNA sequence data in various DNA
databanks has made analyzing these sequences, and especially
finding the genes in them, very important; determining the
number of genes is an even more pressing task at present. The
aim of this paper is to propose an artificial neural network
method specifically for predicting protein-coding genes in the yeast
genome. We first obtain a 12-dimensional vector from a DNA
primary sequence, and then construct a 12×21×1 three-layer
feedforward neural network. After supervised training with the
error back-propagation algorithm on a sufficient number of
samples, the network is evaluated by a cross-validation test. The
resulting average absolute error δ and average variance σ² are
0.0084 and 0.0077, respectively, and the prediction accuracy
is better than 96%. Based on this network, we find that the
numbers of coding ORFs in the 2nd-6th classes are 393, 189,
803, 924 and 229, respectively. The total number of protein-coding
genes in the yeast genome is thus estimated to be 5930, consistent
with the widely accepted range of 5800-6000. The names of putative
non-coding ORFs are listed in detail. The results suggest that the
present artificial neural network method is a useful computational
technique for predicting protein-coding genes and can be
extended to find genes with more complicated structures.
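
To make the architecture described above concrete, the following is a minimal sketch of a 12×21×1 three-layer feedforward network trained by error back-propagation. Only the layer sizes and the training algorithm come from the abstract. The sigmoid activations, squared-error loss, learning rate, weight initialization, and the synthetic toy data are all assumptions made for illustration; the names TinyMLP, make_toy_data, and train are hypothetical, and the 12-dimensional feature vector derived from a DNA sequence is not reproduced here.

```python
import math
import random

N_IN, N_HID, N_OUT = 12, 21, 1  # layer sizes from the abstract (12x21x1)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """Three-layer feedforward network (assumed sigmoid units, squared error)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # One row per hidden unit; the last entry of each row is the bias.
        self.w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN + 1)]
                      for _ in range(N_HID)]
        # Weights of the single output unit; the last entry is the bias.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HID + 1)]

    def forward(self, x):
        # zip stops after the 12 inputs, so the bias is added separately.
        hid = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
               for row in self.w_hid]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hid))
                      + self.w_out[-1])
        return hid, out

    def train_step(self, x, target, lr=0.5):
        hid, out = self.forward(x)
        # Output delta: derivative of 0.5*(out - target)^2 through the sigmoid.
        d_out = (out - target) * out * (1.0 - out)
        # Hidden deltas: propagate the output delta back through w_out
        # (computed before w_out is updated).
        d_hid = [d_out * self.w_out[j] * hid[j] * (1.0 - hid[j])
                 for j in range(N_HID)]
        for j in range(N_HID):
            self.w_out[j] -= lr * d_out * hid[j]
        self.w_out[-1] -= lr * d_out
        for j in range(N_HID):
            for i in range(N_IN):
                self.w_hid[j][i] -= lr * d_hid[j] * x[i]
            self.w_hid[j][-1] -= lr * d_hid[j]
        return 0.5 * (out - target) ** 2


def make_toy_data(n, seed=1):
    """Synthetic stand-in for (feature vector, coding/non-coding label) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(N_IN)]
        # Assumed toy rule, not the paper's: label 1 if the first six
        # components sum to more than the last six.
        label = 1.0 if sum(x[:6]) > sum(x[6:]) else 0.0
        data.append((x, label))
    return data


def train(net, data, epochs=20, lr=0.5):
    """Run per-sample back-propagation; return the mean loss of each epoch."""
    history = []
    for _ in range(epochs):
        total = sum(net.train_step(x, t, lr) for x, t in data)
        history.append(total / len(data))
    return history
```

A typical use would be to build the network with TinyMLP(), generate samples with make_toy_data(100), and call train(net, data) to watch the mean loss fall over the epochs. Replacing the toy data with real 12-dimensional sequence features and adding a held-out split would be needed before any comparison with the reported error figures.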