Locally maximizing the Rényi entropies

August 25, 2018*

*Last modified 22-Mar-19

Tags: entropies, visualization

As I was rewriting my website, I found some visualizations I had stored on my old website to show a collaborator, and I figured it was worth writing a little to have a more proper place to put them; hence this post 😊.

Probability distributions on three letters consist of just three non-negative numbers that add up to 1, which we can view as a vector in $\mathbb{R}^3$. The set of all such distributions forms a simplex, which looks like a 2D triangle lying in $\mathbb{R}^3$:

The simplex of probability distributions on three letters from two perspectives.

We can parametrize such distributions just by their $x$- and $y$-coordinates, since their $z$-coordinate is given by $z = 1 - x - y$. This allows us to plot functions $f$ that vary over the set of probability distributions on three letters in $\mathbb{R}^3$: for each valid choice of $(x,y)$ coordinates, we plot the number $f(x,y,1-x-y)$ at the point $(x,y)$. One particular function of probability distributions that I’m interested in is the $\alpha$-Rényi entropy. When considering probability distributions on three letters, it’s given by

$$ H_\alpha(\vec p) = \frac{1}{1-\alpha}\log(x^\alpha + y^\alpha + z^\alpha) $$

where $\vec p = (x,y,z)$, and $\alpha \in (0,1)\cup(1,\infty)$ is a parameter. From this function, we can define another function of probability measures,

$$ \Delta_\varepsilon(\vec p) = \max_{\vec q \in B_\varepsilon(\vec p)} H_\alpha(\vec q) - H_\alpha(\vec p), $$

where $B_\varepsilon(\vec p)$ is called the $\varepsilon$-ball around $\vec p$, and consists of all probability measures which are $\varepsilon$-close to $\vec p$ in total variation distance. For example, if $\vec r = (0.21, 0.24, 0.55)$, then $B_\varepsilon(\vec r)$ is given by the filled purple hexagon in Figure 2:

$B_\varepsilon(\vec r)$ is the purple hexagon.
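To make the definitions above concrete, here is a minimal Python sketch of the $(x,y)$ parametrization, the $\alpha$-Rényi entropy, and membership in the $\varepsilon$-ball. The function names are my own, the logarithm is assumed to be the natural log, and total variation distance is assumed to follow the convention $\tfrac12\sum_i |p_i - q_i|$; none of these choices are fixed by the text above.

```python
import math

def on_simplex(x, y):
    """Lift (x, y) to (x, y, 1 - x - y); return None if the point is not a distribution."""
    z = 1.0 - x - y
    if min(x, y, z) < 0:
        return None
    return (x, y, z)

def renyi_entropy(p, alpha):
    """alpha-Renyi entropy of p, using the natural log (an assumption)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must lie in (0, 1) or (1, inf)")
    return math.log(sum(t ** alpha for t in p)) / (1.0 - alpha)

def tv_distance(p, q):
    """Total variation distance, assuming the 1/2 * L1-norm convention."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def in_ball(q, p, eps):
    """True if q lies in the eps-ball around p (both assumed to be distributions)."""
    return tv_distance(p, q) <= eps

r = (0.21, 0.24, 0.55)
print(renyi_entropy(r, 2.0))               # about 0.906 (natural log)
print(in_ball((0.26, 0.24, 0.50), r, 0.1))  # TV distance is 0.05, so inside the ball
```

The uniform distribution gives $\log 3$ for every $\alpha$, which is a quick sanity check on the formula.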

It turns out that this maximum is achieved at a unique point (see arXiv:1706.02212); for the example above, it’s shown here:

The maximizer of $H_\alpha$ over the ball $B_\varepsilon(\vec r)$ is the unlabelled black point at the bottom of the hexagon.

We can also write down a form for the maximizer. This allows us to plot the value of $\Delta_\varepsilon$ as it varies over the set of probability distributions, for a given $\varepsilon$ and $\alpha$. The quantity $\Delta_\varepsilon$ is useful for proving continuity bounds (see arXiv:1707.04249). I’ve included some plots of it below.

$\Delta_\varepsilon$ with $\varepsilon = 0.1$, for the $\alpha$-Rényi entropy with $\alpha = 0.5$.


$\Delta_\varepsilon$ with $\varepsilon = 0.1$, for the $\alpha$-Rényi entropy with $\alpha = 1.5$.


$\Delta_\varepsilon$ with $\varepsilon = 0.1$, for the $\alpha$-Rényi entropy with $\alpha = 2.0$.


$\Delta_\varepsilon$ with $\varepsilon = 0.1$, for the $\alpha$-Rényi entropy with $\alpha = 3.0$.
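For anyone who wants to sanity-check values like these without the form of the maximizer from the paper, a crude alternative is to brute-force the maximum over a grid on the simplex. The sketch below is not the method used for the plots; the function names, the grid resolution `n`, the natural log, and the $\tfrac12\sum_i|p_i - q_i|$ convention for total variation distance are all my own assumptions, and the answer is only accurate up to the grid spacing.

```python
import numpy as np

def renyi_entropy(q, alpha):
    """alpha-Renyi entropy (natural log, an assumption) along the last axis of q."""
    q = np.asarray(q, dtype=float)
    return np.log(np.sum(q ** alpha, axis=-1)) / (1.0 - alpha)

def delta_eps_brute_force(p, eps, alpha, n=300):
    """Approximate Delta_eps(p) by searching a grid of spacing 1/n on the simplex.

    Returns (approximate Delta_eps, approximate maximizer). This is a grid
    search, not the closed-form maximizer referred to in the post.
    """
    p = np.asarray(p, dtype=float)
    # Integer triples (i, j, k) with i + j + k = n give grid points (i, j, k) / n.
    i, j = np.meshgrid(np.arange(n + 1), np.arange(n + 1), indexing="ij")
    keep = i + j <= n
    i, j = i[keep], j[keep]
    grid = np.stack([i, j, n - i - j], axis=1) / n
    # Keep grid points in the TV ball (small tolerance for rounding error).
    tv = 0.5 * np.abs(grid - p).sum(axis=1)
    # Include p itself so the result is never negative.
    candidates = np.vstack([grid[tv <= eps + 1e-12], p])
    values = renyi_entropy(candidates, alpha)
    best = np.argmax(values)
    return values[best] - renyi_entropy(p, alpha), candidates[best]

r = (0.21, 0.24, 0.55)
delta, q_star = delta_eps_brute_force(r, eps=0.1, alpha=2.0)
print(delta, q_star)
```

Evaluating this at every point of a second, coarser grid over $(x,y)$ would give a rough version of the surfaces above, at a cost that grows quickly with the resolution.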