Internet Electronic Journal of Molecular Design - IEJMD, ISSN 1538-6414, CODEN IEJMAT
ABSTRACT - Internet Electron. J. Mol. Des. January 2007, Volume 6, Number 1, 1-12
Similarity Analysis of DNA Sequences based on the LZ Complexity
Jia Wen and Chun Li
Internet Electron. J. Mol. Des. 2007, 6, 1-12
Abstract:
Most methods for similarity analysis and phylogenetic inference are based on
either multiple alignment of sequences or sequence invariants. The former is
not suitable for all types of data, e.g. whole-genome comparisons, while the
latter involves complex calculations. The motivation of this paper is to
introduce a new approach to the similarity analysis of DNA sequences.
We propose a relative distance measure for (0,1)-sequences based on the LZ
complexity to quantify the degree of similarity between two binary sequences. By
transforming a DNA sequence into three binary sequences in terms of
classifications of nucleic acid bases, we can obtain the relative distances between
the corresponding characteristic sequences of any two DNA sequences. The distance
matrices thus obtained reflect the similarities among the DNA sequences. A
similarity comparison of 24 complete coronavirus genomes is made to show
the utility of our method. We find that the 24 genomes can be classified
broadly into four groups. In particular, SARS-CoVs
are not closely related to any of the previously characterized coronaviruses
and form a distinct group within the genus Coronavirus. This result is consistent
with those of previous analyses. On the basis of these findings, we conclude that the
present method captures important features of the DNA sequences
considered and is useful for similarity analysis of DNA sequences.
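
The pipeline summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the three base classifications, say which class is coded as 1, or give the distance formula. The sketch assumes the common purine/pyrimidine, amino/keto and weak/strong classifications, counts components of the Lempel-Ziv (1976) exhaustive history as the LZ complexity, and uses a stand-in normalized distance in the style of Otu and Sayood. All function names are hypothetical.

```python
from typing import Dict

# Assumed 0/1 coding: the bases listed for each classification map to 1,
# every other character maps to 0. The abstract does not specify this.
CLASSIFICATIONS: Dict[str, str] = {
    "purine/pyrimidine": "AG",  # A, G -> 1; C, T -> 0
    "amino/keto": "AC",         # A, C -> 1; G, T -> 0
    "weak/strong": "AT",        # A, T -> 1; C, G -> 0
}


def characteristic_sequences(dna: str) -> Dict[str, str]:
    """Map a DNA string to three binary strings, one per classification."""
    dna = dna.upper()
    return {
        name: "".join("1" if base in ones else "0" for base in dna)
        for name, ones in CLASSIFICATIONS.items()
    }


def lz_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) exhaustive history of s.

    Each component is the shortest substring starting at position i that
    cannot be copied from the text preceding its own last character.
    This simple version is quadratic or worse; it is meant for clarity.
    """
    n = len(s)
    i = 0
    count = 0
    while i < n:
        length = 1
        # Grow the component while it is still reproducible from the prefix.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def stand_in_distance(s: str, t: str) -> float:
    """Stand-in normalized LZ distance (an assumption, not the paper's formula).

    Measures how many extra components are needed to build one sequence
    after the other, relative to the larger individual complexity.
    """
    cs, ct = lz_complexity(s), lz_complexity(t)
    if max(cs, ct) == 0:
        return 0.0
    extra = max(lz_complexity(s + t) - cs, lz_complexity(t + s) - ct)
    return extra / max(cs, ct)


if __name__ == "__main__":
    # Classic example: 0 . 001 . 10 . 100 . 1000 . 101
    print(lz_complexity("0001101001000101"))  # 6
    a = characteristic_sequences("ATGGTGCACCTGACT")
    b = characteristic_sequences("ATGCTGACTCCTGAG")
    for name in CLASSIFICATIONS:
        print(name, stand_in_distance(a[name], b[name]))
```

Applying stand_in_distance to every pair of genomes, once per classification, would give three distance matrices of the kind the abstract describes. The toy sequences in the usage example are arbitrary and are not taken from the paper's data.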