Carathéodory's theorem and the Holevo capacity

December 18, 2018*

*Last modified 10-Nov-19

Tags: QIT, physics, convex, teaching

Let Λ:B(H)B(H)\Lambda: \mathcal{B}(\mathcal{H})\to \mathcal{B}(\mathcal{H}) be a quantum channelcompletely positive and trace-preserving map

on a finite-dimensional Hilbert space H\mathcal{H}, and let d:=dimHd:= \dim \mathcal{H}. The Holevo capacity of Λ\Lambda is defined as χ(Λ):=sup{pk,ρk}[S(kpkΛ(ρk))kpkS(Λ(ρk))] \chi(\Lambda) := \sup_{\{p_k, \rho_k\}} \left[ S\Big(\sum_k p_k \Lambda(\rho_k)\Big) - \sum_k p_k S( \Lambda(\rho_k)) \right] where supremum is over ensembles of quantum states ρk\rho_k and probability distributions {pk}\{p_k\}. The HSW theorem (Holevo 1998; Schumacher and Westmoreland 1997) states that the product state classical capacity of a quantum channel Λ\Lambda is given by its Holevo capacity, thus providing the quantity with an operational meaning. It is well known that that the supremum may be restricted to ensembles of at most d2d^2 pure states; in fact, this is exercise 12.11 in the classic textbook (Nielsen and Chuang 2009).

To me, it’s not obvious how to prove this fact. Before discussing further, let us write the expression in a slightly different way. We can split up the supremum over all ensembles by first optimizing over the “average state” ρ=kpkρk\rho = \sum_k p_k \rho_k, and then over the ensemble of distributions yielding that fixed average state, by writing χ(Λ)=supρD(H)sup{pk,ρk}ρ[S~(ρ)kpkS~(ρk)] \chi(\Lambda) = \sup_{\rho \in \mathcal{D}(\mathcal{H})} \sup_{\{p_k, \rho_k\} \to \rho} \Big[\tilde S(\rho) - \sum_k p_k \tilde S (\rho_k)\Big] where D(H)\mathcal{D}(\mathcal{H}) is the set of density matrices on H\mathcal{H}, the notation {pk,ρk}ρ\{p_k, \rho_k\} \to \rho means ρ=kpkρk\rho = \sum_k p_k \rho_k, and we write S~(ρ):=S(Λ(ρ))\tilde S(\rho) := S(\Lambda(\rho)) for convenience. Since the first term only depends on the average state, we have χ(Λ)=supρD(H)[S~(ρ)inf{pk,ρk}ρkpkS~(ρk)]. \chi(\Lambda) = \sup_{\rho \in \mathcal{D}(\mathcal{H})} \left[ \tilde S(\rho) - \inf_{\{p_k, \rho_k\} \to \rho} \sum_k p_k \tilde S(\rho_k)\right]. Moreover, since S~\tilde S is concave, we can restrict the infimum to pure state ensembles. However, not all pure state ensembles are the same; Λ\Lambda may act very differently on one pure state compared to on another. To this point, note that the eigendecomposition of the average state ρ\rho gives a pure state ensemble of size dd, but in the end we will only find a restriction to pure state ensembles of size d2d^2, not dd (so it’s not because smaller ensembles don’t exist!). That the statement involves d2d^2 states suggests a use of Carathéodory’s theorem; after all, we can see self-adjoint operators on H\mathcal{H} as a d2d^2-dimensional real vector space, and density matrices as an affine hyperplane in the set of self-adjoint operators (with dimension d21d^2-1 as an affine space). Carathéodory’s theorem then tells us that any density matrix can be written as a convex combination of d21+1=d2d^2-1+1 = d^2 pure states, since the density matrices are the convex hull of the pure states. But the task remains to show that such a decomposition achieves the infimum in inf{pk,ρk}ρkpkS~(ρk). \inf_{\{p_k, \rho_k\} \to \rho} \sum_k p_k \tilde S(\rho_k). It turns out the relevant property of this quantity is that it is the convex roof of a continuous function (namely S~\tilde S). Let us sketch the proof of (Uhlmann 2010) (see also (Uhlmann 1997)) showing that indeed, the above quantity is maximized on some pure state ensemble of size at most d2d^2.

Define R(ρ):=inf{qk,ψk}ρkqkS~(ψk)(1) R(\rho) := \inf_{\{q_k, \psi_k\} \to \rho} \sum_k q_k \tilde S(\psi_k) \qquad(1) where the infimum is taken over pure state ensembles of any size. To show that the supremum in the Holevo capacity can be restricted to pure state ensembles of size at most d2d^2, it suffices to show that this infimum can be achieved by a ensemble with only d2d^2 terms, i.e. R(ρ)=min{qk,ψk}k=1d2ρk=1d2qkS~(ψk).(2) R(\rho) = \min_{ \{q_k, \psi_k\}_{k=1}^{d^2} \to \rho} \sum_{k=1}^{d^2} q_k \tilde S(\psi_k). \qquad(2)

Recall the following facts and definitions about a compact convex set CC. Here, (Simon 2011, chap. 8)which can also be found here

is a useful reference.

  1. A face FF of CC is a closed subset of CC such that if x,yCx,y\in C and t(0,1)t\in (0,1) are such that tx+(1t)yFtx + (1-t)y\in F, then x,yFx,y\in F too. A proper face is one which is not equal to CC itself.
  2. Extreme points of CC are faces that are singletons.
  3. The extreme points of a face FF of CC are simply the extreme points of CC that also lie in FF.
  4. For any point xCx\in \partial C, the relative boundary, xFx\in F for some proper face FF.
  5. An affine subspace of Rn\mathbb{R}^n is a set of the form x0+Vx_0 + V where VV is a vector subspace of Rn\mathbb{R}^n and x0Rnx_0 \in \mathbb{R}^n is a vector. The dimension of an affine subspace is the dimension of its associated vector space. Every convex set CC in Rn\mathbb{R}^n has a unique affine subspace WW of RnR^n so that as a subset of WW, CC has nonempty interior. The dimension of a convex set is the dimension of its associated affine subspace.
  6. The dimension of a proper face FF of CC is strictly smaller than the dimension of CC.
  7. Minkowski-Carathéodory theorem: if CC is compact and convex with dimension nn, then any point in CC is a convex combination of at most n+1n+1 extreme points.

With these facts in hand, we proceed to sketch the proof of (eq. 2), following (Uhlmann 2010). In the following, Bs.a.(H)\mathcal{B}_{\text{s.a.}}(\mathcal{H}) is the set of self-adjoint operators on H\mathcal{H}, and Dpure(H)\mathcal{D}_\text{pure}(\mathcal{H}) the set of pure states on H\mathcal{H}.

Proof sketch:

  1. We recall D(H)\mathcal{D}(\mathcal{H}) is a compact, convex subset of Bs.a.(H)\mathcal{B}_{\text{s.a.}}(\mathcal{H}), which is a real vector space of dimension d2d^2. As a convex set, D(H)\mathcal{D}(\mathcal{H}) is of dimension d21d^2-1.
  2. We enlarge Bs.a.(H)\mathcal{B}_{\text{s.a.}}(\mathcal{H}) by one dimension, yielding Bs.a.(H)R\mathcal{B}_{\text{s.a.}}(\mathcal{H}) \oplus \mathbb{R}, and define E={(ψ,S~(ψ)):ψDpure}Bs.a.(H)R E = \left\{ (\psi , \tilde S(\psi) ): \psi \in \mathcal{D}_\text{pure}\right\} \subset \mathcal{B}_{\text{s.a.}}(\mathcal{H}) \oplus \mathbb{R} EE is compact, using that Dpure(H)\mathcal{D}_\text{pure}(\mathcal{H}) is compact, and that S~\tilde S is continuous on Dpure(H)\mathcal{D}_\text{pure}(\mathcal{H}).
  3. Then Ξ:=conv(E)={(jpjψj,jpjS~(ψj):{pj,ψj} is a pure state ensemble} \Xi := \operatorname{conv}(E) = \left\{(\sum_j p_j \psi_j, \sum_j p_j \tilde S(\psi_j) : \{ p_j,\psi_j \} \text{ is a pure state ensemble}\right\} is a compact convex set of dimension d2d^2, whose extreme points are EE.
  4. By construction, for ωD(H)\omega\in \mathcal{D}(\mathcal{H}), we have R(ω)=inf{λR:(ω,λ)Ξ}. R(\omega) = \inf \{ \lambda \in \mathbb{R}: (\omega, \lambda) \in \Xi\}.
  5. Thus (ω,R(ω))Ξ(\omega, R(\omega)) \in \partial \Xi, the relative boundary of Ξ\Xi.
  6. Since Ξ\Xi is a convex, a point on the boundary of Ξ\Xi lies on one of it’s (proper) faces, FF, which is a convex set of dimension at most d21d^2-1.
  7. Since (ω,R(ω))F(\omega, R(\omega)) \in F, it can be written as a convex combination of d2d^2 elements of EE, using the Minkowski-Carathéodory theorem (and that the extreme points of FF are also extreme points of Ξ\Xi). That is, (ω,R(ω))=j=1d2pj(ψj,S~(ψj))=(j=1d2pjψj,j=1d2pjS~(ψj)). (\omega, R(\omega)) = \sum_{j=1}^{d^2} p_j ( \psi_j, \tilde S (\psi_j)) = \left( \sum_{j=1}^{d^2} p_j\psi_j, \sum_{j=1}^{d^2} p_j \tilde S(\psi_j)\right) . Comparing each component, we see ω=j=1d2pjψj,R(ω)=j=1d2pjS~(ψj), \omega = \sum_{j=1}^{d^2} p_j\psi_j, \qquad R(\omega) = \sum_{j=1}^{d^2} p_j \tilde S(\psi_j), and thus we have identified a decomposition of ω\omega with d2d^2 terms such that R(ω)=j=1d2pjR(ψj)R(\omega) = \sum_{j=1}^{d^2} p_j R(\psi_j).

Note: in (1997), Uhlmann takes a slightly different approach; instead of explicitly adding a dimension, in that work he encodes the same information in the “tracial dimension” by fixing a reference state τ\tau and considering ψ+S~(ψ)τBs.a.(H) \psi + \tilde S(\psi) \tau \in \mathcal{B}_\text{s.a.}(\mathcal{H}) instead of the pair (ψ,S~(ψ))(\psi, \tilde S(\psi)). I chose his later approach (that of (2010)), since it accomplishes the same task more explicitly.

References

Holevo, A. S. 1998. “The Capacity of the Quantum Channel with General Signal States.” IEEE Transactions on Information Theory 44 (1): 269–73. https://doi.org/10.1109/18.651037.

Nielsen, Michael A., and Isaac L. Chuang. 2009. Quantum Computation and Quantum Information. Cambridge University Press. https://doi.org/10.1017/cbo9780511976667.

Schumacher, Benjamin, and Michael D. Westmoreland. 1997. “Sending Classical Information via Noisy Quantum Channels.” Physical Review A 56 (1): 131–38. https://doi.org/10.1103/PhysRevA.56.131.

Simon, Barry. 2011. Convexity: An Analytic Viewpoint. Cambridge Tracts in Mathematics 187. Cambridge, UK ; New York: Cambridge University Press.

Uhlmann, Armin. 1997. “Optimizing Entropy Relative to a Channel or a Subalgebra.” arXiv:Quant-Ph/9701014, January. http://arxiv.org/abs/quant-ph/9701014.

———. 2010. “Roofs and Convexity.” Entropy 12 (7): 1799–1832. https://doi.org/10.3390/e12071799.