Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. August 2002, Volume 1, Number 8, 374-387
Tailored Similarity Spaces for the Prediction of Physicochemical Properties
Brian D. Gute, Subhash C. Basak, Denise Mills, and Douglas M. Hawkins
Internet Electron. J. Mol. Des. 2002, 1, 374-387
Abstract:
In the past, molecular similarity spaces have been developed from
arbitrary sets of molecular properties or theoretical descriptors, and
the results of property estimation based on these methods have
always been inferior to those of SAR and QSAR models. Tailored
quantitative molecular similarity analysis (QMSA) methods attempt
to create similarity spaces specific to a property
of interest, rather than being purely arbitrary spaces characterizing
the general aspects of all chemicals within the space or intuitively
selected structure spaces whose elements are chosen subjectively.
To this end, we have created three similarity spaces, two tailored
and one non-tailored, for a set of 166 chemicals for which we have
both log P and normal boiling point (BP) data. The tailored spaces
were each tailored to one of the properties, while the other
similarity space was developed using standard QMSA methods.
Ridge regression was used to determine which of the available
molecular descriptors were most useful in modeling each of the
available properties. Fifteen topological descriptors were selected
for use as dimensions within each of the tailored similarity spaces.
The same number of principal components was derived using
principal component analysis for the arbitrary similarity space.
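
The construction described above can be illustrated with a minimal
sketch. It is not the authors' implementation. It assumes that
descriptors are ranked by the absolute size of their ridge
coefficients on standardized data, and that a property is estimated
as the mean over the k nearest neighbors in the space; the abstract
specifies neither detail. All function names, the penalty alpha, and
the value of k are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center each descriptor column and scale it to unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns at zero after centering
    return (X - mu) / sd

def ridge_coefficients(Z, y, alpha=1.0):
    """Closed-form ridge solution (Z'Z + alpha*I)^-1 Z'(y - mean(y))."""
    p = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + alpha * np.eye(p), Z.T @ (y - y.mean()))

def tailored_space(X, y, n_dims=15, alpha=1.0):
    """Keep the n_dims descriptors with the largest |ridge coefficient|.

    Assumption: ranking by coefficient magnitude stands in for whatever
    selection criterion the paper actually applies.
    """
    Z = standardize(X)
    coef = ridge_coefficients(Z, y, alpha)
    keep = np.argsort(-np.abs(coef))[:n_dims]
    return Z[:, keep], keep

def arbitrary_space(X, n_dims=15):
    """Project standardized descriptors onto the first n_dims principal components."""
    Z = standardize(X)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_dims].T

def knn_loo_estimate(S, y, k=5):
    """Leave-one-out estimate: mean property of the k nearest neighbors in S."""
    d = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a chemical may not be its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]
    return y[nn].mean(axis=1)
```

Calling tailored_space once per property (log P, BP) and
arbitrary_space once on the same descriptor matrix gives three
spaces that can be compared by passing each to knn_loo_estimate.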
The log P tailored similarity space was superior to both the
arbitrary structure space and the BP tailored space for the
estimation of log P. Also, the BP tailored similarity space was
superior to the arbitrary structure space for the estimation of BP.
Interestingly, the space tailored to model log P performed as well at
modeling BP as did the BP tailored space. This unexpected result is
explained by the degree of overlap between the indices used in the
two tailored spaces and by the presence of connectivity indices
related to BP in the log P model.
Based on the results of this study and of a previous study, the
tailored similarity method is a promising approach to creating
property-specific similarity spaces derived from structural
descriptors. Further work is necessary to determine the true utility
of this method with large, diverse data sets.