Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. March 2004, Volume 3, Number 3, 143-149
New Diversity Criterion and Database Compression Method
Bing Liu, Aijun Lu, Lei Zhang, Haibo Liu, Zhenming Liu, and Jiaju Zhou
Internet Electron. J. Mol. Des. 2004, 3, 143-149
Abstract:
Based on the topological scaffold classification approach to cluster
a structural database, we propose a new criterion to evaluate the
diversity of a chemical structural database. This criterion is defined
as the ratio of scaffold number to total structure number in the
database. Six databases have been evaluated by this criterion. To
reduce the size of a database with minimal loss of structural
diversity, a novel and effective database compression method has been
developed. The number of structures selected from each group of
compounds sharing a common scaffold is determined by the empirical
K-4-5 rules, and the selected structures are those with the highest
drug-like value (DLV). The method was validated by adding 200
new, randomly chosen, nonoverlapping natural products to NCI3D;
the loss percentage for that test data set is about 10.5%. The results
show that this method reduces the size of NCI3D, MNPD (marine natural
products database), and TCMD (traditional Chinese medical database)
by 68.7%, 60.3%, and 54.4%, respectively.
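
The diversity criterion and the selection step described above can be illustrated with a minimal Python sketch. It assumes each structure has already been assigned a scaffold label and a DLV score; the function names, the toy data, and the `quota` callback are hypothetical. In particular, the abstract does not spell out the K-4-5 rules, so `quota` is only a placeholder for them, not an implementation of them.

```python
from collections import defaultdict


def scaffold_diversity(scaffold_of):
    """Ratio of distinct scaffolds to total structures in the database."""
    if not scaffold_of:
        return 0.0
    return len(set(scaffold_of.values())) / len(scaffold_of)


def compress(scaffold_of, dlv, quota):
    """Keep the highest-DLV structures from each common-scaffold group.

    quota(group_size) returns how many structures to keep per group.
    It stands in for the K-4-5 rules, which are not given in the abstract.
    """
    groups = defaultdict(list)
    for structure_id, scaffold in scaffold_of.items():
        groups[scaffold].append(structure_id)

    kept = []
    for members in groups.values():
        n_keep = min(quota(len(members)), len(members))
        ranked = sorted(members, key=lambda s: dlv[s], reverse=True)
        kept.extend(ranked[:n_keep])
    return kept


# Toy data (hypothetical): five structures spread over three scaffolds.
scaffold_of = {"m1": "A", "m2": "A", "m3": "A", "m4": "B", "m5": "C"}
dlv = {"m1": 0.2, "m2": 0.9, "m3": 0.5, "m4": 0.4, "m5": 0.7}

diversity = scaffold_diversity(scaffold_of)              # 3 scaffolds / 5 structures
kept = compress(scaffold_of, dlv, quota=lambda size: 1)  # placeholder quota
reduction = 1 - len(kept) / len(scaffold_of)             # fraction of structures removed
```

With the placeholder quota of one structure per scaffold, every scaffold remains represented in the compressed set, which is the sense in which structural diversity is preserved while the database shrinks.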