Generalization of entropy based divergence measures for symbolic sequence analysis

dc.creator: Ré, Miguel A.
dc.creator: Azad, Rajeev K.
dc.date.accessioned: 2025-10-08T18:17:50Z
dc.date.issued: 2014
dc.description.abstract: Entropy based measures have been frequently used in symbolic sequence analysis. A symmetrized and smoothed form of Kullback-Leibler divergence or relative entropy, the Jensen-Shannon divergence (JSD), is of particular interest because of its sharing properties with families of other divergence measures and its interpretability in different domains including statistical physics, information theory and mathematical statistics. The uniqueness and versatility of this measure arise because of a number of attributes including generalization to any number of probability distributions and association of weights to the distributions. Furthermore, its entropic formulation allows its generalization in different statistical frameworks, such as non-extensive Tsallis statistics and higher order Markovian statistics. We revisit these generalizations and propose a new generalization of JSD in the integrated Tsallis and Markovian statistical framework. We show that this generalization can be interpreted in terms of mutual information. We also investigate the performance of different JSD generalizations in deconstructing chimeric DNA sequences assembled from bacterial genomes including those of E. coli, S. enterica typhi, Y. pestis and H. influenzae. Our results show that the JSD generalizations bring in more pronounced improvements when the sequences being compared are from phylogenetically proximal organisms, which are often difficult to distinguish because of their compositional similarity. While small but noticeable improvements were observed with the Tsallis statistical JSD generalization, relatively large improvements were observed with the Markovian generalization. In contrast, the proposed Tsallis-Markovian generalization yielded more pronounced improvements relative to the Tsallis and Markovian generalizations, specifically when the sequences being compared arose from phylogenetically proximal organisms.
dc.description.affiliation: Fil: Ré, Miguel A. Universidad Tecnológica Nacional. Facultad Regional Córdoba. Departamento Ciencias Básicas. Centro de Investigación en Informática para la Ingeniería; Argentina.
dc.description.affiliation: Fil: Azad, Rajeev K. University of North Texas. Department of Biological Sciences. Department of Mathematics; United States of America.
dc.description.affiliation: Ré, Miguel A. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina.
dc.description.peerreviewed: Peer Reviewed
dc.format: pdf
dc.identifier.citation: PLoS ONE
dc.identifier.doi: https://doi.org/10.1371/journal.pone.0093532
dc.identifier.uri: https://hdl.handle.net/20.500.12272/13924
dc.language.iso: en
dc.publisher: Public Library of Science
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.holder: Ré, Miguel; Azad, Rajeev K.
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.rights.use: https://creativecommons.org/licenses/by/4.0/
dc.source: PLoS ONE 9(4): 1-11 (2014).
dc.subject: Entropy
dc.subject: Symbolic Sequence Analysis
dc.title: Generalization of entropy based divergence measures for symbolic sequence analysis
dc.type: info:eu-repo/semantics/article
dc.type.version: publisherVersion
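The abstract above describes the Jensen-Shannon divergence and its generalization under non-extensive Tsallis statistics. As a minimal sketch of the two-distribution, equal-order-0 case (the paper's Markovian and multi-distribution generalizations are not reproduced here, and all function names are illustrative), the Tsallis-entropy form of the JSD can be written as the entropy of the weighted mixture minus the weighted entropies, recovering the standard Shannon JSD in the limit q → 1:

```python
from collections import Counter
import math


def symbol_distribution(seq):
    """Empirical symbol frequencies of a symbolic sequence, e.g. a DNA string."""
    counts = Counter(seq)
    n = len(seq)
    return {s: c / n for s, c in counts.items()}


def tsallis_entropy(p, q):
    """Tsallis entropy H_q(P) = (1 - sum_i p_i^q) / (q - 1); Shannon entropy at q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(x * math.log(x) for x in p.values() if x > 0)
    return (1.0 - sum(x ** q for x in p.values())) / (q - 1.0)


def jsd_q(p1, p2, q=1.0, w1=0.5, w2=0.5):
    """Weighted two-distribution JSD built from Tsallis entropy.

    JSD_q(P1, P2) = H_q(w1*P1 + w2*P2) - (w1*H_q(P1) + w2*H_q(P2)).
    At q = 1 this is the classical Jensen-Shannon divergence (in nats).
    """
    symbols = set(p1) | set(p2)
    mix = {s: w1 * p1.get(s, 0.0) + w2 * p2.get(s, 0.0) for s in symbols}
    return tsallis_entropy(mix, q) - (w1 * tsallis_entropy(p1, q) + w2 * tsallis_entropy(p2, q))
```

For identical distributions the divergence is zero, and for two fully disjoint symbol sets at q = 1 it reaches its maximum of ln 2 (with equal weights), which makes it convenient for comparing windows of a DNA sequence against each other.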

Files

Original bundle

Name: Generalization of Entropy Based Divergence Measures for Symbolic Sequence Analysis.pdf
Size: 837.23 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.63 KB
Description: Item-specific license agreed upon submission