Bio Chem Press  Internet Electronic Journal of Molecular Design is a refereed journal for scientific papers regarding all applications of molecular design

Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. December 2005, Volume 4, Number 12, 882-910

Highly Correlating Distance-Connectivity Based Topological Indices 3: PCR and PC-ANN Based Prediction of the Octanol-Water Partition Coefficient of Diverse Organic Molecules
Mojtaba Shamsipur, Raoof Ghavami, Bahram Hemmateenejad, and Hashem Sharghi
Internet Electron. J. Mol. Des. 2005, 4, 882-910

Abstract:
Recently, we proposed a set of new topological indices (Shamsipur, or Sh, indices) based on the distance sum and connectivity of a molecular graph for use in QSAR/QSPR studies. The aim of this study is to examine the ability of the proposed Sh indices to model the n-octanol/water partition coefficients (logP) of a diverse set of organic compounds by means of principal component regression (PCR) and principal component-artificial neural network (PC-ANN) modeling, combined with two factor selection procedures: eigenvalue ranking (EV) and correlation ranking (CR). Experimental values of the partition coefficient, ranging from -0.66 (methanol) to 8.16 (2,2',3,3',4,5,5',6,6'-PCB), were collected from the literature for 379 organic compounds with a wide variety of functional groups containing C, H, N, O, and all halogens. Ten different Sh indices (Sh1 through Sh10) were calculated for each molecule from different combinations of the connectivity and distance sum vectors. The Sh topological descriptor data matrix was subjected to principal component analysis to reduce its dimensionality, and the most significant factors, or principal components (PCs), were extracted. Both linear and nonlinear modeling methods were employed to predict the logP of an extensive set of organic compounds comprising several structurally diverse groups (alkanes, alkenes, alkynes, cycloalkanes, cycloalkenes, aliphatic alcohols, ethers, esters, aldehydes, ketones, carboxylic acids, amines, aromatic hydrocarbons, halogenated hydrocarbons, and some polychlorinated biphenyls (PCBs)). PCR and PC-ANN were used as the linear and nonlinear modeling methods, respectively. Principal component analysis of the Sh data matrix showed that seven PCs explain 99.97% of the variance in the Sh data matrix. The extracted PCs were used as the predictor variables (inputs) for the PCR and PC-ANN models.
The ANN model explained 97.98% of the variance in the logP data, whereas the PCR procedure explained 80.76%. For comparison, linear (MLR) and nonlinear (MLR-ANN) models were also built directly from the original Sh indices. The squared correlation coefficients of prediction obtained by MLR, PCR, MLR-ANN, and PC-ANN are 0.7431, 0.7857, 0.9377, and 0.9626, respectively, and the corresponding standard errors are 0.783, 0.689, 0.361, and 0.281. The newly proposed topological indices (Sh indices) have thus been applied to predict the partition coefficient of a large set of organic compounds. The results show that factor selection by correlation ranking gives superior results relative to eigenvalue ranking. PCR analysis showed that the proposed Sh indices explain about 80% of the variation in the logP data, while ANN modeling explained more than 96%. These results confirm the suitability of the indices for QSPR analysis of lipophilicity data. The Sh indices are simple and fast to calculate and, in comparison with some previously reported QSPR models, produced better results.
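
The two factor selection procedures named in the abstract, eigenvalue ranking (EV) and correlation ranking (CR), can be illustrated with a minimal PCR sketch. This is not the authors' implementation: the descriptor matrix below is synthetic random data standing in for the 379 x 10 Sh matrix, the response is a made-up linear function rather than measured logP, and the function names (pca_scores, rank_factors, pcr_r2) and the choice of k = 3 factors are assumptions made only for illustration.

```python
import numpy as np

def pca_scores(X):
    """Autoscale X and return (PC scores, eigenvalues), largest eigenvalue first."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                        # columns are mutually orthogonal
    eigvals = s**2 / (X.shape[0] - 1)     # eigenvalues of the correlation matrix
    return scores, eigvals

def rank_factors(scores, eigvals, y, method):
    """Return PC indices ordered by eigenvalue (EV) or by |correlation with y| (CR)."""
    if method == "EV":
        return np.argsort(-eigvals)
    if method == "CR":
        yc = y - y.mean()
        r = scores.T @ yc / (np.linalg.norm(scores, axis=0) * np.linalg.norm(yc))
        return np.argsort(-np.abs(r))
    raise ValueError(f"unknown ranking method: {method}")

def pcr_r2(scores, y, selected):
    """Fit y on the selected PC scores by least squares and return the fitted R^2."""
    T = np.column_stack([np.ones(len(y)), scores[:, selected]])
    coef, *_ = np.linalg.lstsq(T, y, rcond=None)
    resid = y - T @ coef
    yc = y - y.mean()
    return 1.0 - (resid @ resid) / (yc @ yc)

# Synthetic stand-in data (assumption): 60 "molecules", 10 correlated "descriptors".
rng = np.random.default_rng(0)
n, p = 60, 10
latent = rng.normal(size=(n, 3))
X = latent @ rng.normal(size=(3, p)) + 0.1 * rng.normal(size=(n, p))
y = 2.0 * latent[:, 2] + X[:, 3] - X[:, 4] + 0.2 * rng.normal(size=n)

scores, eigvals = pca_scores(X)
k = 3  # number of factors kept (arbitrary choice for the demo)
r2_ev = pcr_r2(scores, y, rank_factors(scores, eigvals, y, "EV")[:k])
r2_cr = pcr_r2(scores, y, rank_factors(scores, eigvals, y, "CR")[:k])
print(f"fitted R^2 with {k} factors: EV = {r2_ev:.4f}, CR = {r2_cr:.4f}")
```

Because the PC scores are orthogonal, the fitted R^2 of a PCR model equals the sum of the squared correlations between the response and the retained PCs, so for a fixed number of factors the CR selection can never fit worse than the EV selection on the training data. Whether that advantage carries over to prediction must be checked on held-out compounds, which this sketch does not do.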

Last changes: January 5, 2006
http://www.biochempress.com/
Copyright © 2001-2006 Ovidiu Ivanciuc