Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. May 2002, Volume 1, Number 5, 269-284
Structure-Odor Relationships for Pyrazines with Support Vector Machines
Ovidiu Ivanciuc
Internet Electron. J. Mol. Des. 2002, 1, 269-284
Abstract:
The flavor class prediction of chemical compounds can be efficiently
performed with structure-odor relationships (SOR), leading to a better
understanding of the mechanism of odor perception. SOR models for
various odor classes were developed with a wide variety of structural
descriptors and statistical equations. We have investigated the
application of support vector machines (SVM) for the classification of
98 tetra-substituted pyrazines representing three odor classes, namely
32 green, 23 nutty, and 43 bell-pepper. The chemical structure of the
pyrazines was encoded by five theoretical descriptors, namely the sum
of electrotopological indices, the number of carbon atoms of the
substituent R2, the charge on the first atom of the substituent R4
computed with an ab initio method (Hartree-Fock with a 3-21G basis
set), and the molecular surface of the substituents R1 and R3. Three
sets of SVM experiments were performed for the classification of
pyrazines, each one considering the classification of one class of
compounds against the compounds from the remaining two classes.
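
The one-class-against-the-rest setup described above can be sketched as follows. This is only an illustration: the class sizes come from the abstract, while the compound ordering, the +1/-1 label convention, and the helper name one_vs_rest_labels are assumptions made for the example.

```python
# Class sizes as reported in the abstract (98 pyrazines in total).
CLASS_SIZES = {"green": 32, "nutty": 23, "bell-pepper": 43}


def one_vs_rest_labels(classes, target):
    """Return +1 for compounds in the target class and -1 for all others.

    The +1/-1 coding is the usual SVM convention and is assumed here.
    """
    return [1 if c == target else -1 for c in classes]


# Hypothetical ordering: compounds are simply listed class by class.
classes = [name for name, n in CLASS_SIZES.items() for _ in range(n)]

# One binary labelling per odor class gives the three sets of experiments.
experiments = {name: one_vs_rest_labels(classes, name) for name in CLASS_SIZES}

for name, labels in experiments.items():
    print(name, labels.count(1), "vs", labels.count(-1))
```

Each of the three label vectors would then be used to train a separate two-class SVM on the same five descriptors.
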
The SVM models were computed with the dot (linear), polynomial, radial
basis function, neural, and anova kernels. The leave-10%-out
cross-validation results serve as the main criterion for selecting the
SVM model with the highest predictive power. The results
obtained demonstrate that the SVM classification of pyrazines into
aroma classes depends strongly on the kernel type and on the various
parameters that control the kernel shape. In general, the neural kernel
gives the worst results. The best predictions were obtained with the
polynomial kernel of degree 2 for the green and bell-pepper classes,
and with the anova kernel (γ = 0.5 and d = 1) for the nutty pyrazines.
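
For orientation, the five kernel families named above can be written as plain functions. The abstract does not give the formulas, so the forms below are commonly used textbook definitions and should be read as assumptions; in particular, the +1 offset in the polynomial kernel, the parameters a and b of the neural (tanh) kernel, and the exact anova form may differ from the ones used in the paper. The example vectors are hypothetical.

```python
import math


def dot_kernel(x, y):
    """Linear kernel: the plain dot product of two descriptor vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2):
    """Polynomial kernel; the +1 offset is an assumed, common convention."""
    return (dot_kernel(x, y) + 1.0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel, exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def neural_kernel(x, y, a=1.0, b=0.0):
    """Neural (sigmoid) kernel, tanh(a * <x, y> + b); a and b are assumed."""
    return math.tanh(a * dot_kernel(x, y) + b)


def anova_kernel(x, y, gamma=0.5, d=1):
    """Anova kernel in one common form: (sum_i exp(-gamma * (x_i - y_i)^2))^d."""
    return sum(math.exp(-gamma * (a - b) ** 2) for a, b in zip(x, y)) ** d


# Two hypothetical, already scaled five-descriptor vectors.
x = [0.1, 0.5, -0.2, 0.3, 0.7]
y = [0.0, 0.4, -0.1, 0.6, 0.2]

for kernel in (dot_kernel, polynomial_kernel, rbf_kernel, neural_kernel, anova_kernel):
    print(kernel.__name__, kernel(x, y))
```

Parameters such as the polynomial degree, γ, and d change the shape of the kernel, which is why each kernel family has to be evaluated over a range of settings.
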
The classification of chemical compounds into odor classes with SOR
models can be performed efficiently with support vector machines. The
solution of the SVM model is a unique hyperplane that guarantees
maximum separation between two classes of chemical compounds.
This hyperplane is the solution of a quadratic programming problem and
can be computed very quickly, but the classification results
depend on the kernel type and the structural descriptors. Identifying
the optimum predictive kernel and eliminating overfitted
SVM models require extensive cross-validation experiments.
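
A leave-10%-out split of the 98 compounds might be generated as in the sketch below. The abstract does not say how the held-out groups were formed, so the random shuffling, the seed, and the function name leave_10_percent_out_folds are assumptions. Training and scoring an SVM on each split is deliberately left out.

```python
import random


def leave_10_percent_out_folds(n_samples, n_folds=10, seed=0):
    """Yield (train, held_out) index lists, each holding out about 10%.

    Every sample is held out exactly once across the n_folds splits.
    Random assignment to folds is an assumption of this sketch.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into n_folds groups of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, sorted(held_out)


splits = list(leave_10_percent_out_folds(98))
for train, held_out in splits:
    print(len(train), "training compounds,", len(held_out), "held out")
```

Repeating this for every kernel and parameter setting, and comparing predictions on the held-out compounds only, is what allows overfitted models to be detected.
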