Carathéodory's theorem and the Holevo capacity

December 18, 2018*

*Last modified 10-Nov-19

Tags: QIT, physics, convex, teaching

Let \(\Lambda: \mathcal{B}(\mathcal{H})\to \mathcal{B}(\mathcal{H})\) be a quantum channelcompletely positive and trace-preserving map

on a finite-dimensional Hilbert space \(\mathcal{H}\), and let \(d:= \dim \mathcal{H}\). The Holevo capacity of \(\Lambda\) is defined as \[ \chi(\Lambda) := \sup_{\{p_k, \rho_k\}} \left[ S\Big(\sum_k p_k \Lambda(\rho_k)\Big) - \sum_k p_k S( \Lambda(\rho_k)) \right] \] where supremum is over ensembles of quantum states \(\rho_k\) and probability distributions \(\{p_k\}\). The HSW theorem (Holevo 1998; Schumacher and Westmoreland 1997) states that the product state classical capacity of a quantum channel \(\Lambda\) is given by its Holevo capacity, thus providing the quantity with an operational meaning. It is well known that that the supremum may be restricted to ensembles of at most \(d^2\) pure states; in fact, this is exercise 12.11 in the classic textbook (Nielsen and Chuang 2009).

To me, it’s not obvious how to prove this fact. Before discussing further, let us write the expression in a slightly different way. We can split up the supremum over all ensembles by first optimizing over the “average state” \(\rho = \sum_k p_k \rho_k\), and then over the ensemble of distributions yielding that fixed average state, by writing \[ \chi(\Lambda) = \sup_{\rho \in \mathcal{D}(\mathcal{H})} \sup_{\{p_k, \rho_k\} \to \rho} \Big[\tilde S(\rho) - \sum_k p_k \tilde S (\rho_k)\Big] \] where \(\mathcal{D}(\mathcal{H})\) is the set of density matrices on \(\mathcal{H}\), the notation \(\{p_k, \rho_k\} \to \rho\) means \(\rho = \sum_k p_k \rho_k\), and we write \(\tilde S(\rho) := S(\Lambda(\rho))\) for convenience. Since the first term only depends on the average state, we have \[ \chi(\Lambda) = \sup_{\rho \in \mathcal{D}(\mathcal{H})} \left[ \tilde S(\rho) - \inf_{\{p_k, \rho_k\} \to \rho} \sum_k p_k \tilde S(\rho_k)\right]. \] Moreover, since \(\tilde S\) is concave, we can restrict the infimum to pure state ensembles. However, not all pure state ensembles are the same; \(\Lambda\) may act very differently on one pure state compared to on another. To this point, note that the eigendecomposition of the average state \(\rho\) gives a pure state ensemble of size \(d\), but in the end we will only find a restriction to pure state ensembles of size \(d^2\), not \(d\) (so it’s not because smaller ensembles don’t exist!). That the statement involves \(d^2\) states suggests a use of Carathéodory’s theorem; after all, we can see self-adjoint operators on \(\mathcal{H}\) as a \(d^2\)-dimensional real vector space, and density matrices as an affine hyperplane in the set of self-adjoint operators (with dimension \(d^2-1\) as an affine space). Carathéodory’s theorem then tells us that any density matrix can be written as a convex combination of \(d^2-1+1 = d^2\) pure states, since the density matrices are the convex hull of the pure states. But the task remains to show that such a decomposition achieves the infimum in \[ \inf_{\{p_k, \rho_k\} \to \rho} \sum_k p_k \tilde S(\rho_k). \] It turns out the relevant property of this quantity is that it is the convex roof of a continuous function (namely \(\tilde S\)). Let us sketch the proof of (Uhlmann 2010) (see also (Uhlmann 1997)) showing that indeed, the above quantity is maximized on some pure state ensemble of size at most \(d^2\).

Define \[ R(\rho) := \inf_{\{q_k, \psi_k\} \to \rho} \sum_k q_k \tilde S(\psi_k) \qquad(1)\] where the infimum is taken over pure state ensembles of any size. To show that the supremum in the Holevo capacity can be restricted to pure state ensembles of size at most \(d^2\), it suffices to show that this infimum can be achieved by a ensemble with only \(d^2\) terms, i.e. \[ R(\rho) = \min_{ \{q_k, \psi_k\}_{k=1}^{d^2} \to \rho} \sum_{k=1}^{d^2} q_k \tilde S(\psi_k). \qquad(2)\]

Recall the following facts and definitions about a compact convex set \(C\). Here, (Simon 2011, chap. 8)which can also be found here

is a useful reference.

  1. A face \(F\) of \(C\) is a closed subset of \(C\) such that if \(x,y\in C\) and \(t\in (0,1)\) are such that \(tx + (1-t)y\in F\), then \(x,y\in F\) too. A proper face is one which is not equal to \(C\) itself.
  2. Extreme points of \(C\) are faces that are singletons.
  3. The extreme points of a face \(F\) of \(C\) are simply the extreme points of \(C\) that also lie in \(F\).
  4. For any point \(x\in \partial C\), the relative boundary, \(x\in F\) for some proper face \(F\).
  5. An affine subspace of \(\mathbb{R}^n\) is a set of the form \(x_0 + V\) where \(V\) is a vector subspace of \(\mathbb{R}^n\) and \(x_0 \in \mathbb{R}^n\) is a vector. The dimension of an affine subspace is the dimension of its associated vector space. Every convex set \(C\) in \(\mathbb{R}^n\) has a unique affine subspace \(W\) of \(R^n\) so that as a subset of \(W\), \(C\) has nonempty interior. The dimension of a convex set is the dimension of its associated affine subspace.
  6. The dimension of a proper face \(F\) of \(C\) is strictly smaller than the dimension of \(C\).
  7. Minkowski-Carathéodory theorem: if \(C\) is compact and convex with dimension \(n\), then any point in \(C\) is a convex combination of at most \(n+1\) extreme points.

With these facts in hand, we proceed to sketch the proof of (eq. 2), following (Uhlmann 2010). In the following, \(\mathcal{B}_{\text{s.a.}}(\mathcal{H})\) is the set of self-adjoint operators on \(\mathcal{H}\), and \(\mathcal{D}_\text{pure}(\mathcal{H})\) the set of pure states on \(\mathcal{H}\).

Proof sketch:

  1. We recall \(\mathcal{D}(\mathcal{H})\) is a compact, convex subset of \(\mathcal{B}_{\text{s.a.}}(\mathcal{H})\), which is a real vector space of dimension \(d^2\). As a convex set, \(\mathcal{D}(\mathcal{H})\) is of dimension \(d^2-1\).
  2. We enlarge \(\mathcal{B}_{\text{s.a.}}(\mathcal{H})\) by one dimension, yielding \(\mathcal{B}_{\text{s.a.}}(\mathcal{H}) \oplus \mathbb{R}\), and define \[ E = \left\{ (\psi , \tilde S(\psi) ): \psi \in \mathcal{D}_\text{pure}\right\} \subset \mathcal{B}_{\text{s.a.}}(\mathcal{H}) \oplus \mathbb{R} \] \(E\) is compact, using that \(\mathcal{D}_\text{pure}(\mathcal{H})\) is compact, and that \(\tilde S\) is continuous on \(\mathcal{D}_\text{pure}(\mathcal{H})\).
  3. Then \[ \Xi := \operatorname{conv}(E) = \left\{(\sum_j p_j \psi_j, \sum_j p_j \tilde S(\psi_j) : \{ p_j,\psi_j \} \text{ is a pure state ensemble}\right\} \] is a compact convex set of dimension \(d^2\), whose extreme points are \(E\).
  4. By construction, for \(\omega\in \mathcal{D}(\mathcal{H})\), we have \[ R(\omega) = \inf \{ \lambda \in \mathbb{R}: (\omega, \lambda) \in \Xi\}. \]
  5. Thus \((\omega, R(\omega)) \in \partial \Xi\), the relative boundary of \(\Xi\).
  6. Since \(\Xi\) is a convex, a point on the boundary of \(\Xi\) lies on one of it’s (proper) faces, \(F\), which is a convex set of dimension at most \(d^2-1\).
  7. Since \((\omega, R(\omega)) \in F\), it can be written as a convex combination of \(d^2\) elements of \(E\), using the Minkowski-Carathéodory theorem (and that the extreme points of \(F\) are also extreme points of \(\Xi\)). That is, \[ (\omega, R(\omega)) = \sum_{j=1}^{d^2} p_j ( \psi_j, \tilde S (\psi_j)) = \left( \sum_{j=1}^{d^2} p_j\psi_j, \sum_{j=1}^{d^2} p_j \tilde S(\psi_j)\right) . \] Comparing each component, we see \[ \omega = \sum_{j=1}^{d^2} p_j\psi_j, \qquad R(\omega) = \sum_{j=1}^{d^2} p_j \tilde S(\psi_j), \] and thus we have identified a decomposition of \(\omega\) with \(d^2\) terms such that \(R(\omega) = \sum_{j=1}^{d^2} p_j R(\psi_j)\).

Note: in (1997), Uhlmann takes a slightly different approach; instead of explicitly adding a dimension, in that work he encodes the same information in the “tracial dimension” by fixing a reference state \(\tau\) and considering \[ \psi + \tilde S(\psi) \tau \in \mathcal{B}_\text{s.a.}(\mathcal{H}) \] instead of the pair \((\psi, \tilde S(\psi))\). I chose his later approach (that of (2010)), since it accomplishes the same task more explicitly.


Holevo, A. S. 1998. “The Capacity of the Quantum Channel with General Signal States.” IEEE Transactions on Information Theory 44 (1): 269–73.

Nielsen, Michael A., and Isaac L. Chuang. 2009. Quantum Computation and Quantum Information. Cambridge University Press.

Schumacher, Benjamin, and Michael D. Westmoreland. 1997. “Sending Classical Information via Noisy Quantum Channels.” Physical Review A 56 (1): 131–38.

Simon, Barry. 2011. Convexity: An Analytic Viewpoint. Cambridge Tracts in Mathematics 187. Cambridge, UK ; New York: Cambridge University Press.

Uhlmann, Armin. 1997. “Optimizing Entropy Relative to a Channel or a Subalgebra.” arXiv:Quant-Ph/9701014, January.

———. 2010. “Roofs and Convexity.” Entropy 12 (7): 1799–1832.