 Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
  
ABSTRACT - Internet Electron. J. Mol. Des. April 2002, Volume 1, Number 4, 203-218
 Support Vector Machine Classification of the Carcinogenic Activity of Polycyclic
 Aromatic Hydrocarbons
 
Ovidiu Ivanciuc 
Internet Electron. J. Mol. Des. 2002, 1, 203-218 
Abstract: 
Structure-activity relationships (SAR) can be efficiently used to predict the carcinogenic
 hazard of new chemicals, before producing them on a large scale or even before
 synthesizing them. SAR models that detect potential carcinogens can also supplement
 short-term tests of genotoxicity, long-term tests of carcinogenicity in rodents, or
 epidemiological evidence in humans. The support vector machine (SVM) is an efficient
 classification algorithm that can provide highly predictive SAR models for the
 carcinogenic hazard. We have applied the SVM model to identify the carcinogenic
 activity of 46 methylated and 32 non-methylated polycyclic aromatic hydrocarbons
 (PAH). The PAH chemical structure was encoded by four theoretical descriptors
 computed with PM3, namely the energy of the highest occupied molecular orbital E_HOMO,
 the energy of the lowest unoccupied molecular orbital E_LUMO, the hardness HD, and the
 difference between E_HOMO and E_HOMO-1. A wide range of SVM experiments were
 performed using the dot, polynomial, radial basis function, neural, and anova kernels.
 The results obtained for the classification of PAH carcinogenicity demonstrate that the
 performance of SVM depends strongly on the kernel type and on the parameters that
 control the kernel shape. The best prediction results were obtained with the radial basis
 function kernel with γ = 0.5, the anova kernel with γ = 0.5 and d = 1, and the anova
 kernel with γ = 0.5 and d = 2. With the first of these (the radial basis function
 kernel), 28 of the 34 carcinogenic compounds and 40 of the 44 non-carcinogenic
 compounds were correctly classified. SAR models for predicting the carcinogenic hazard can benefit from the use of
 support vector machines, which determine a maximum-margin separating hyperplane between
 carcinogenic and non-carcinogenic compounds. The solution of the SVM model is a
 unique hyperplane which can be computed very quickly, but the classification results depend
 heavily on the kernel type and structural descriptors. Extensive cross-validation tests
 should be performed to find the kernel with the optimum predictive power.
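
The kernels and counts mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The radial basis function kernel uses the standard Gaussian form. The anova kernel form shown here is an assumption (one commonly cited definition) and may differ from the one used in the paper. The function names and the placeholder descriptor vector are hypothetical. Only the counts 28 of 34 and 40 of 44 come from the abstract.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def anova_kernel(x, y, gamma=0.5, d=1):
    """Assumed anova form: (sum_i exp(-gamma * (x_i - y_i)^2)) ** d."""
    s = sum(math.exp(-gamma * (a - b) ** 2) for a, b in zip(x, y))
    return s ** d


def classification_rates(tp, n_pos, tn, n_neg):
    """Return (sensitivity, specificity, accuracy) from raw counts."""
    sensitivity = tp / n_pos
    specificity = tn / n_neg
    accuracy = (tp + tn) / (n_pos + n_neg)
    return sensitivity, specificity, accuracy


# Placeholder 4-descriptor vector (not real PM3 values), compared with itself.
x = (0.0, 0.0, 0.0, 0.0)
print(rbf_kernel(x, x))            # identical inputs give the kernel maximum
print(anova_kernel(x, x, d=2))     # four descriptors, each term equal to 1

# Counts reported in the abstract for the RBF kernel with gamma = 0.5.
sens, spec, acc = classification_rates(tp=28, n_pos=34, tn=40, n_neg=44)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With the reported counts, this gives a sensitivity of about 0.824, a specificity of about 0.909, and an overall accuracy of about 0.872 (68 of 78 compounds).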
 
  