Locally maximizing the Rényi entropies

August 25, 2018*

*Last modified 22-Mar-19

Tags: entropies, visualization

As I was rewriting my website, I found some visualizations I had stored on my old website to show a collaborator, and I figured it was worth writing a little to have a more proper place to put them; hence this post 😊.

A probability distribution on three letters consists of just three non-negative numbers that add up to 1, which we can view as a vector in \(\mathbb{R}^3\). The set of all such distributions forms a simplex, which looks like a 2D triangle lying in \(\mathbb{R}^3\):

The simplex of probability distributions on three letters from two perspectives.

We can parametrize such distributions by their \(x\)- and \(y\)-coordinates alone, since the \(z\)-coordinate is then determined by \(z=1-x-y\). This lets us plot any function \(f\) of probability distributions on three letters as a surface in \(\mathbb{R}^3\): for each valid choice of \((x,y)\) (that is, \(x, y \geq 0\) and \(x + y \leq 1\)), we plot the number \(f(x,y,1-x-y)\) above the point \((x,y)\).

One particular function of probability distributions that I’m interested in is the \(\alpha\)-Rényi entropy. For probability distributions on three letters, it’s given by \[ H_\alpha( \vec p) = \frac{1}{1-\alpha}\log(x^\alpha + y^\alpha + z^\alpha) \] where \(\vec p = (x,y,z)\), and \(\alpha \in (0,1)\cup(1,\infty)\) is a parameter.

From this function, we can define another function of probability distributions, \[ \Delta_\varepsilon(\vec p) = \max_{ \vec q \in B_\varepsilon( \vec p) } H_\alpha (\vec q) - H_\alpha(\vec p), \] where \(B_\varepsilon(\vec p)\) is the \(\varepsilon\)-ball around \(\vec p\): the set of all probability distributions that are \(\varepsilon\)-close to \(\vec p\) in total variation distance. For example, if \(\vec r = (0.21, 0.24, 0.55)\), then \(B_\varepsilon(\vec r)\) is the filled purple hexagon in Figure 2:

\(B_\varepsilon(\vec r)\) is the purple hexagon.
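To make the definitions concrete, here is a minimal Python sketch of \(H_\alpha\), the total variation distance, and the corners of the ball \(B_\varepsilon(\vec r)\). It is an illustration, not the code behind the figures. The function names are hypothetical, the logarithm is assumed to be natural, and \(\varepsilon = 0.1\) is an assumed value for the example. It uses the standard fact that the total variation distance is half the \(\ell_1\) distance, so when every entry of \(\vec p\) is at least \(\varepsilon\), each corner of the ball comes from moving mass \(\varepsilon\) from one letter to another.

```python
import math
from itertools import permutations

def renyi_entropy(p, alpha):
    """alpha-Rényi entropy of p (natural log); needs alpha > 0 and alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must lie in (0, 1) or (1, inf)")
    # Zero entries contribute nothing to the sum, so skip them.
    return math.log(sum(t ** alpha for t in p if t > 0)) / (1 - alpha)

def tv_distance(p, q):
    """Total variation distance: half the l1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def ball_vertices(p, eps):
    """Corners of B_eps(p), assuming every entry of p is at least eps.

    Each corner moves mass eps from one letter j to another letter i.
    """
    if min(p) < eps:
        raise ValueError("ball is cut off by the simplex boundary")
    corners = []
    for i, j in permutations(range(len(p)), 2):
        q = list(p)
        q[i] += eps
        q[j] -= eps
        corners.append(tuple(q))
    return corners

r = (0.21, 0.24, 0.55)
corners = ball_vertices(r, eps=0.1)  # eps = 0.1 is an assumed value
```

For three letters there are six ordered pairs \((i, j)\), which is why the ball is a hexagon whenever it does not touch the boundary of the simplex.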

It turns out that this maximum is achieved at a unique point (see arxiv/1706.02212); for the example above, it’s shown here:

The maximizer of \(H_\alpha\) over the ball \(B_\varepsilon(\vec r)\) is the unlabelled black point at the bottom of the hexagon.

We can also write down an explicit form for the maximizer. This allows us to plot the value of \(\Delta_\varepsilon\) as it varies over the set of probability distributions, for a given \(\varepsilon\) and \(\alpha\). The quantity \(\Delta_\varepsilon\) is useful for proving continuity bounds (see arxiv/1707.04249). I’ve included some plots of it below.
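For anyone who wants to reproduce a rough version of such a plot without the closed-form maximizer, here is a brute-force Python sketch. It does not use the formula from arxiv/1706.02212. Instead it approximates the maximum over \(B_\varepsilon(\vec p)\) by scanning a grid of candidate points, so the values are lower bounds on \(\Delta_\varepsilon\) that improve as the grid gets finer. The function names and grid sizes are hypothetical choices, and the logarithm is assumed to be natural.

```python
import math

def renyi_entropy(p, alpha):
    # H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), natural log, alpha != 1
    return math.log(sum(t ** alpha for t in p if t > 0)) / (1 - alpha)

def simplex_grid(n):
    # All points (x, y, 1 - x - y) with x and y multiples of 1/n
    return [(i / n, j / n, (n - i - j) / n)
            for i in range(n + 1) for j in range(n + 1 - i)]

def delta_eps(p, alpha, eps, candidates):
    # Brute-force stand-in for the max over B_eps(p): scan candidate points q
    base = renyi_entropy(p, alpha)
    best = base  # p itself is in the ball, so the result is never negative
    for q in candidates:
        tv = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
        if tv <= eps + 1e-12:  # small tolerance for floating-point rounding
            best = max(best, renyi_entropy(q, alpha))
    return best - base

def delta_table(alpha, eps, n_outer=10, n_inner=60):
    # Map (x, y) -> approximate Delta_eps(x, y, 1 - x - y), ready for plotting
    candidates = simplex_grid(n_inner)
    return {(p[0], p[1]): delta_eps(p, alpha, eps, candidates)
            for p in simplex_grid(n_outer)}

table = delta_table(alpha=2.0, eps=0.1)
```

The keys of `table` are the \((x, y)\) coordinates used in the parametrization above, so the dictionary can be passed to any surface or heatmap plotting routine.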

\(\Delta_\varepsilon\) with \(\varepsilon = 0.1\), for the \(\alpha\)-Rényi entropy with \(\alpha = 0.5\).


\(\Delta_\varepsilon\) with \(\varepsilon = 0.1\), for the \(\alpha\)-Rényi entropy with \(\alpha = 1.5\).


\(\Delta_\varepsilon\) with \(\varepsilon = 0.1\), for the \(\alpha\)-Rényi entropy with \(\alpha = 2.0\).


\(\Delta_\varepsilon\) with \(\varepsilon = 0.1\), for the \(\alpha\)-Rényi entropy with \(\alpha = 3.0\).