Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. December 2005, Volume 4, Number 12, 835-849
Support Vector Machines for Prediction of Mechanism of Toxic Action from
Multivariate Classification of Phenols Based on MEDV Descriptors
Zhong-Sheng Yi and Shu-Shen Liu
Internet Electron. J. Mol. Des. 2005, 4, 835-849
Abstract:
Phenols are widely used in agriculture as biocides and disinfectants
and in various industries. Most synthetic phenolic compounds are
toxic and are classified as hazardous pollutants. Their mechanism
of toxic action (MOA) classes are usually predicted by quantitative
structure-activity relationship (QSAR) models. In this study, we
report a support vector machine (SVM) model for identifying
four MOA classes of phenols. The structures of 221 phenols were
described by the molecular electronegativity distance vector
(MEDV). The SVM algorithm with the one-against-one multi-class
classification method was used to construct the QSAR models for
four MOA classes (polar narcotics, weak acid respiratory
uncouplers, precursors to soft electrophiles, and soft electrophiles).
The predictive power of each model was estimated by the leave-one-out
(LOO) cross-validation method. To find MOA
classifiers with high predictive power, we investigated 345
SVM models generated from two SVM methods and two kernels,
linear and radial basis function (RBF). The key factors
affecting the quality of SVM models are the kernel type, the
corresponding parameters that control the kernel shape, and the
capacity parameter C. We selected an RBF kernel with γ = 0.0004 and a
capacity parameter C = 128, which gave the highest accuracy index
in leave-one-out cross-validation. The accuracy index for all 221
compounds (with 13 compounds misclassified) is 94.1%. To test
the stability of this SVM model, we uniformly chose 155
of the 221 compounds for training and placed the remaining
66 compounds in the test set. The training set was used
to construct a new SVM model with the same parameters, γ = 0.0004
and C = 128. In total, 16 compounds (8 in the
training set and 8 in the test set) were misclassified, which gives an
overall accuracy index of 92.8%. These results show that the SVM model
performs well in predicting the aquatic toxicity mechanism of
new chemical compounds when appropriate SVM parameters and
molecular descriptors are used. The SVM method based on MEDV
descriptors allows satisfactory classification of phenols with
respect to four MOA classes, which are based on experimental toxicity to
the ciliate Tetrahymena pyriformis. This approach can be used to
predict the aquatic toxicity mechanism and to select the
appropriate QSAR model based on MEDV descriptors for new
phenolic compounds.
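
The workflow summarized in the abstract (an RBF-kernel SVM with one-against-one multi-class classification, scored by leave-one-out cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, whose SVC class uses a one-against-one scheme for multi-class problems, and it substitutes synthetic placeholder data for the MEDV descriptors and MOA labels, which are not given here. The helper name accuracy_index is hypothetical; it simply reproduces the arithmetic behind the reported 94.1% and 92.8%.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVC


def accuracy_index(n_total, n_misclassified):
    """Percentage of correctly classified compounds (hypothetical helper)."""
    return 100.0 * (n_total - n_misclassified) / n_total


# Placeholder data: 221 samples in 4 classes, standing in for the
# 221 phenols and 4 MOA classes. These are NOT MEDV descriptors.
X, y = make_classification(
    n_samples=221, n_features=20, n_informative=8,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# RBF kernel with the parameters quoted in the abstract.
# SVC handles the multi-class case with one-against-one binary classifiers.
model = SVC(kernel="rbf", gamma=0.0004, C=128)

# Leave-one-out: each sample is predicted by a model fitted on the other 220.
y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())

n_wrong = int(np.sum(y_pred != y))
print(f"placeholder data: {n_wrong} misclassified, "
      f"accuracy index {accuracy_index(len(y), n_wrong):.1f}%")

# Arithmetic behind the figures reported in the abstract.
print(f"LOO, 13 of 221 misclassified: {accuracy_index(221, 13):.1f}%")
print(f"Train/test, 16 of 221 misclassified: {accuracy_index(221, 16):.1f}%")
```

The score obtained on the placeholder data says nothing about the phenol data set; only the last two lines correspond to values stated in the abstract. With real descriptors, γ and C would normally be chosen by comparing the cross-validated accuracy index across candidate values, as the abstract describes for its 345 models.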