Generalization of entropy based divergence measures for symbolic sequence analysis
Date
2014
Publisher
Public Library of Science
Abstract
Entropy-based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of
Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because it
shares properties with families of other divergence measures and is interpretable in different domains, including
statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise
from a number of attributes, including generalization to any number of probability distributions and association of
weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical
frameworks, such as non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations
and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this
generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD
generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including those of E. coli, S.
enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced
improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult
to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the
Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. The
proposed Tsallis-Markovian generalization, in turn, yielded more pronounced improvements relative to the Tsallis and
Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal
organisms.
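
The weighted, multi-distribution form of JSD mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the standard definition (entropy of the weighted mixture minus the weighted sum of the individual entropies) and, for the Tsallis case, simply substitutes the Tsallis entropy S_q into that same expression. That substitution is one common form and is an assumption here; the paper's own Tsallis and Markovian formulations may weight terms differently and are not reproduced. All function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(p):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def tsallis_entropy(p, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1).

    At q = 1 this falls back to the Shannon entropy in nats (the q -> 1 limit).
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def jensen_divergence(dists, weights, entropy):
    """Entropy of the weighted mixture minus the weighted mean entropy.

    With entropy=shannon_entropy this is the weighted JSD; passing a Tsallis
    entropy instead is an assumed, illustrative generalization.
    """
    size = len(dists[0])
    mixture = [sum(w * d[k] for w, d in zip(weights, dists)) for k in range(size)]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))


def base_frequencies(seq, alphabet="ACGT"):
    """Relative frequency of each alphabet symbol in a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in alphabet)
    return [counts[a] / total for a in alphabet]


# Toy sequences (made up for illustration), weighted by relative length.
left, right = "AACGTTAACG", "GGCCGCGGCA"
n = len(left) + len(right)
dists = [base_frequencies(left), base_frequencies(right)]
weights = [len(left) / n, len(right) / n]

jsd = jensen_divergence(dists, weights, shannon_entropy)
jtd = jensen_divergence(dists, weights, lambda p: tsallis_entropy(p, 2.0))
```

Passing the entropy function as an argument keeps the divergence itself unchanged across statistical frameworks, which mirrors the abstract's point that the entropic formulation is what makes these generalizations possible.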
Keywords
Entropy, Symbolic Sequence Analysis
Citation
PLoS ONE
Creative Commons license
Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess