Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. June 2003, Volume 2, Number 6, 392-402
Support Vector Machines for Predicting Protein Homo-Oligomers
by Incorporating Pseudo-Amino Acid Composition
Shao-Wu Zhang, Quan Pan, Hong-Cai Zhang, Yong-Hong Wu, and Jian-Yu Shi
Internet Electron. J. Mol. Des. 2003, 2, 392-402
Abstract:
Following the success of the Human Genome Project, the gap
between the sharply increasing number of protein sequences
entering the data banks and the slow accumulation of known
structures is widening. Developing a fast and accurate
method to predict protein properties from primary
sequences has therefore become indispensable. In general, the performance
of a predictive system can be improved by selecting an
appropriate algorithm and a suitable feature extraction
method. We therefore introduce a new feature extraction method (the
weighted pseudo-amino acid composition) for predicting
protein homo-oligomers. It combines a set of weighted discrete sequence
correlation factors, computed from an amino acid index profile,
with the 20 components of the conventional amino acid
composition. We extract four attribute parameter datasets
(COMP, PLIV, FAUJ and MAXF) from the primary sequences
as examples to investigate this problem. The COMP attribute
dataset consists of the amino acid composition alone, whereas the PLIV,
FAUJ and MAXF attribute datasets each combine the amino
acid composition with a set of weighted discrete sequence
correlation factors computed from the corresponding amino acid residue index.
The total accuracies of PLIV, FAUJ and MAXF using the support
vector machine (SVM) algorithm are 80.36%, 79.34% and
79.02%, respectively, in the 10-fold cross-validation (10CV) test,
which are 4.59%, 3.57% and 3.25% higher, respectively, than
that of COMP. On the same COMP and PLIV attribute
datasets, the total accuracies of SVM are 33.87% and 18.05%
higher, respectively, than those of the covariant discriminant
algorithm in the jackknife test. These results show that the
proposed method of extracting features from protein sequences is
effective and feasible for predicting homo-oligomers,
imply that the primary sequences of homo-oligomeric
proteins contain quaternary structure information, and
indicate that SVM outperforms the
covariant discriminant algorithm in classifying protein homo-oligomers.
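
The feature construction described above can be illustrated with a minimal sketch. It follows the general pseudo-amino acid composition form: the 20 composition frequencies plus sequence-order correlation factors computed from one residue index, jointly normalized. It is not the authors' implementation, and their exact weighting scheme may differ. The function names, the lag count `max_lag`, the weight value `weight`, and the use of the Kyte-Doolittle hydropathy scale as a stand-in index (not one of the PLIV, FAUJ or MAXF indices) are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as a stand-in index.
# The abstract's PLIV, FAUJ and MAXF indices are NOT reproduced here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(index):
    """Rescale index values to zero mean and unit variance over the 20 residues."""
    values = [index[a] for a in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {a: (index[a] - mu) / sigma for a in AMINO_ACIDS}


def correlation_factors(seq, index, max_lag):
    """Mean squared index difference between residues k positions apart, k = 1..max_lag."""
    h = [index[r] for r in seq]
    n = len(h)
    return [
        sum((h[i] - h[i + k]) ** 2 for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def pseudo_aa_composition(seq, index=KYTE_DOOLITTLE, max_lag=5, weight=0.05):
    """Return 20 composition terms followed by max_lag weighted correlation terms."""
    seq = seq.upper()
    if any(r not in AMINO_ACIDS for r in seq):
        raise ValueError("sequence contains non-standard residues")
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    counts = Counter(seq)
    freqs = [counts[a] / len(seq) for a in AMINO_ACIDS]
    thetas = correlation_factors(seq, standardize(index), max_lag)
    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from the paper's dataset.
    features = pseudo_aa_composition("MKTAYIAKQRQISFVKSHFSRQ", max_lag=3)
    print(len(features))
```

With `weight=0` the correlation terms vanish and the vector reduces to the plain amino acid composition, which corresponds to the COMP-style representation; a nonzero weight adds sequence-order information in the manner of the PLIV, FAUJ and MAXF datasets. The resulting vectors could then be passed to any SVM implementation for classification.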