Information Decomposition Diagrams Applied beyond Shannon Entropy: A Generalization of Hu's Theorem

Abstract

In information theory, one major goal is to find useful functions that summarize the amount of information contained in the interaction of several random variables. Specifically, one can ask how the classical Shannon entropy, mutual information, and higher-order interaction information functions relate to each other. This is formally answered by Hu’s theorem, widely known in the form of information diagrams: it relates disjoint unions of regions in a Venn diagram to summation rules of information functions, thereby establishing a bridge from set theory to information theory. While a proof of this theorem is known, the generality in which it can be established has, to date, not been analyzed in detail. In this work, we view random variables together with the joint operation as a monoid that acts on information functions by conditioning, and entropy as the unique function satisfying the chain rule of information. This allows us to abstract away from Shannon’s theory and to prove a generalization of Hu’s theorem that applies to the Shannon entropy of countably infinite discrete random variables, Kolmogorov complexity, Tsallis entropy, (Tsallis) Kullback-Leibler divergence, cross-entropy, submodular information functions, and the generalization error in machine learning. For Chaitin’s prefix-free Kolmogorov complexity, our result implies that the higher-order interaction complexities of all degrees are, in expectation, close to Shannon interaction information. For well-behaved probability distributions on increasing sequence lengths, this shows that asymptotically the expected interaction complexity and interaction information per bit coincide, establishing a strong bridge between algorithmic and classical information theory.
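
As an illustration of the kind of identity the theorem captures (stated here in standard Shannon notation rather than the paper's generalized setting), the two-variable case identifies the three disjoint regions of a two-set Venn diagram with conditional entropies and the mutual information, all following from the chain rule:

$$
H(X, Y) = H(X) + H(Y \mid X), \qquad I(X; Y) = H(X) + H(Y) - H(X, Y),
$$

so that

$$
H(X, Y) = H(X \mid Y) + I(X; Y) + H(Y \mid X),
$$

mirroring the decomposition of the union of two sets into the two set differences and their intersection.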

Publication
arXiv e-prints