Week 6.2
This might be a recap: The goal now is to map from $$\rho_{XB}$$ to $$\rho_{ZB} \equiv \rho_{f(X)B}$$, where $$B$$ is a (possibly correlated) side-information subsystem, so as to extract randomness that is independent of $$B$$.
Explicitly, $$\rho_{XB} = \sum_x p_x |x\rangle{}\langle{}x|\otimes{} \rho_B^x$$
We have the initial guarantee that $$H_{\text{min}}(X|B) \ge{} \ldots{}$$
In order to make the process independent of $$\rho_{XB}$$, we use random functions. Given a seed $$S$$ with some pmf $$p_S$$ (usually taken to be uniform), we apply the function $$f_S$$ to $$X$$.
The goal is to have $$\Delta{}(\rho_{ZBS}, \Pi_Z \otimes{}\rho_B \otimes{} \rho_S) = E_S [\Delta{}(\rho_{f_S(X)B}, \Pi_Z\otimes{} \rho_B)]\le{} \epsilon{}$$.
$$\rho_{XBS} = \rho_{XB} \otimes{} \sum_s p_s |s\rangle{}\langle{}s|$$
$$\rho_{ZBS} = \rho_{f_S(X)BS} = \sum_{x,s} p_x p_s |f_s(x)\rangle\langle{}f_s(x)|_Z \otimes{} \rho_B^x \otimes{} |s\rangle\langle{}s|$$
Directly outputting $$S$$ as the new randomness would be cheating, so we instead account for it explicitly as the seed of the random function.
Definition: a family of functions $$f_S: X \rightarrow Z$$ together with a seed distribution $$p_S$$ is two-universal if the collision probability is small, $$Pr_S[f_S(x) = f_S(x')] \le{} d_Z^{-1} \quad{}\forall{} x \ne x'$$.
Using this, we can choose from a much smaller family of random functions (so that the constraint on the size of the seed can be relaxed a little).
Consider the finite field $$\mathbf{F}_{2^n}$$ and non-zero $$s$$ from the field. We can construct $$f_s: \{0,1\}^n \rightarrow \{0,1\}^l, \quad{} l\le{}n$$.
In other words, $$x \mapsto{}$$ the first $$l$$ bits of $$x\cdot{}s$$ (multiplication in $$\mathbf{F}_{2^n}$$), so a collision $$f_s(x) = f_s(x')$$ means $$(x-x')s$$ is one of the $$2^{n-l}-1$$ non-zero field elements whose first $$l$$ bits vanish.
As $$s$$ ranges over all non-zero field elements, so does $$(x-x')s$$ (since $$x \ne x'$$), so the collision condition is satisfied exactly $$(2^{n-l} - 1)$$ times.
Therefore, $$Pr[f_S(x)=f_S(x')] = \frac{2^{n-l} - 1}{2^n - 1} \le{} 2^{-l}$$
For comparison, the family of all functions from $$\{0,1\}^n$$ to $$\{0,1\}^l$$ has size $$(2^l)^{2^n}$$, so a uniform seed over it would be exponential in $$n$$; this construction needs only $$|S| = 2^n - 1$$, i.e. $$\text{log}|S| \approx{} n$$.
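A toy numerical check of this construction (a sketch; the parameters $$n=4$$, $$l=2$$ and the irreducible polynomial $$x^4+x+1$$ are illustrative choices, not from the lecture):

```python
# Two-universal hashing f_s(x) = first l bits of x*s in GF(2^n).
# Illustrative parameters: n = 4, l = 2, modulus x^4 + x + 1 (0b10011).

def gf_mult(a, b, n=4, poly=0b10011):
    """Carry-less multiplication of a, b modulo an irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << n):   # degree reached n: reduce
            a ^= poly
    return result

def f(s, x, n=4, l=2):
    """Hash output: keep the top l bits of the field product x*s."""
    return gf_mult(x, s, n) >> (n - l)

n, l = 4, 2
seeds = list(range(1, 2 ** n))   # all non-zero field elements
collision = max(
    sum(f(s, x) == f(s, xp) for s in seeds) / len(seeds)
    for x in range(2 ** n) for xp in range(2 ** n) if x != xp
)
print(collision)   # (2^{n-l}-1)/(2^n-1) = 0.2 <= 2^{-l} = 0.25
```

Every pair $$x \ne x'$$ attains exactly the same collision probability $$(2^{n-l}-1)/(2^n-1)$$, comfortably below $$2^{-l}$$.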
Using the Hölder inequality, we have another inequality $$||ABC||_1 \le{} ||A||_4 ||B||_2 ||C||_4$$. This allows us to bound the trace distance:
\begin{align*}
2\Delta{}(\rho_{f_S(X)B}, \Pi_Z\otimes{} \rho_B)
&= || (\rho_B^{1/4}\rho_B^{-1/4}) (\rho_{f_S(X)B} - \Pi_Z \otimes{} \rho_B)
(\rho_B^{-1/4}\rho_B^{1/4})||_1 \\
&\le{} ||I_Z \otimes{} \rho_B^{1/4} ||_4 ||\rho_B^{-1/4} (\rho_{f_S(X)B} - \Pi_Z \otimes{} \rho_B) \rho_B^{-1/4} ||_2 \cdots{} \\
&= \sqrt{d_Z tr(\rho_B^{-1/2} (\rho_{f_S(X)B} - \Pi_Z \otimes{} \rho_B)^2 \rho_B^{-1/2} )} \\
&= \sqrt{d_Z tr(\rho_B^{-1/2} \rho_{f_S(X)B} \rho_B^{-1/2} \rho_{f_S(X)B} ) - 1}
\end{align*}
Some notes:
Expanding the square gives four terms: the $$\rho_{f_S(X)B}^2$$ term stays, the two cross terms each contribute $$-1/d_Z$$ and the $$(\Pi_Z\otimes{}\rho_B)^2$$ term contributes $$+1/d_Z$$ (using $$tr_Z \rho_{f_S(X)B} = \rho_B$$ and $$\Pi_Z = I_Z/d_Z$$); the net $$-1/d_Z$$ inside the trace becomes the $$-1$$ outside once multiplied by $$d_Z$$.
Again, $$\rho_{f_S(X)B} = \sum_z \sum_x \mathbf{1}\{f_S(x)=z\}\,p_x |z\rangle\langle{}z| \otimes{} \rho_B^x$$
Taking the trace forces the two $$z$$ indices to coincide.
\begin{align*}
&E_S[tr(\rho_B^{-1/2} \rho_{f_S(X)B} \rho_B^{-1/2} \rho_{f_S(X)B} ) ] \\
&= E_S[\sum_z \sum_{x,x'} p_x p_{x'} \mathbf{1}\{ f_S(x) = z \}\mathbf{1}\{ f_S(x') = z \} tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x'})] \\
&= E_S[\sum_{x,x'} p_x p_{x'} \mathbf{1}\{ f_S(x) = f_S(x') \}tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x'})] \\
&= \sum_{x} p_x^2 tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x}) + \sum_{x\ne{}x'} p_x p_{x'} E_S[\mathbf{1}\{f_S(x) = f_S(x')\}] tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x'}) \\
&\le{} \sum_{x} p_x^2 tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x}) + \frac{1}{d_Z} \sum_{x\ne{}x'} p_x p_{x'} tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x'}) \\
&\le{} tr(\rho_B^{-1/2}\rho_{XB}\rho_B^{-1/2}\rho_{XB}) + \frac{1}{d_Z} \\
&\le{} 2^{-H_\text{min}(X|B)_\rho} + \frac{1}{d_Z}
\end{align*}
We initially split into the diagonal ($$x=x'$$) and off-diagonal terms. The off-diagonal sum can be completed to the full sum because every term $$tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x'})$$ is non-negative (trace of a product of psd operators), and by definition of the marginal $$\rho_B = \sum_x p_x \rho_B^x$$ the full sum evaluates to $$\sum_{x,x'} p_x p_{x'} tr(\rho_B^{-1/2}\rho_B^x \rho_B^{-1/2}\rho_B^{x'}) = tr(\rho_B^{-1/2}\rho_B \rho_B^{-1/2}\rho_B) = 1$$.
\begin{align*}
E_S[2\Delta{}(\rho_{f_S(X)B}, \Pi_Z\otimes{} \rho_B)]
&\le{} \sqrt{d_Z E_S[tr(\rho_B^{-1/2} \rho_{f_S(X)B} \rho_B^{-1/2} \rho_{f_S(X)B} )] - 1} \\
&\le{} \sqrt{d_Z 2^{-H_\text{min}(X|B)_\rho}}
\end{align*}
Interpretation: we want $$\frac{1}{2}\sqrt{d_Z 2^{-H_\text{min}(X|B)}} \le{} \epsilon{}$$.
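Rearranging this condition gives the extractable output length (a quick sketch; the numbers are illustrative):

```python
# From (1/2) * sqrt(2^l * 2^(-Hmin)) <= eps with d_Z = 2^l, solve for l:
# l <= Hmin - 2*log2(1/eps) + 2.
import math

def extractable_bits(h_min, eps):
    """Largest integer output length satisfying the error bound."""
    return math.floor(h_min - 2 * math.log2(1 / eps) + 2)

l = extractable_bits(100, 1e-6)
print(l)                                          # roughly Hmin - 2*log2(1/eps)
print(0.5 * math.sqrt(2 ** l * 2 ** -100))        # error, indeed below 1e-6
```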
We use the smooth-min entropy $$H_\text{min}^\epsilon{}(A|B)_\rho = \max_{\tilde{\rho} \approx{}_\epsilon{} \rho} H_\text{min} (A|B)_{\tilde{\rho}}$$
Here $$\tilde{\rho} \approx{}_\epsilon{} \rho$$ means $$P(\tilde{\rho},\rho)\le{} \epsilon{}$$ (purified distance).
In the context of QKD, the term on the right is usually not calculated exactly, because we need to find a bound without knowing what state Eve could have.
\begin{align*}
E_S[2\Delta{}(\rho_{f_S(X)B}, \Pi_Z\otimes{} \rho_B)]
&\le{} \sqrt{d_Z 2^{-H_\text{min}(X|B)_\rho}} \\
&\le{} 2\epsilon{} + \sqrt{d_Z 2^{-H^\epsilon{}_\text{min}(X|B)_\rho}}
\end{align*}
Week 6.1
Week 5.2
Week 5.1
Surprisal
We define surprisal $$s(p_x)$$ with the following desirable properties, given a random variable $$X \in \mathcal{X}$$ and pmf $$p_x = P_X(x)$$.
$$s(1) = 0$$, monotonically decreasing with $$p_x$$ and additivity $$s(p_xp_y) = s(p_x) + s(p_y)$$
This naturally leads to a formulation of Shannon entropy as $$H(X) = E[s(p_X(X))] = \sum_x p_x \log \frac{1}{p_x}$$
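A minimal numerical companion (the pmfs are my own examples):

```python
# Shannon entropy as expected surprisal: H(X) = sum_x p_x log2(1/p_x).
import math

def surprisal(p):
    return math.log2(1 / p)

def shannon_entropy(pmf):
    # 0 log(1/0) = 0 by convention, hence the p > 0 filter.
    return sum(p * surprisal(p) for p in pmf if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0: one fair coin flip
print(shannon_entropy([0.25] * 4))   # 2.0: two independent fair flips
# Additivity of surprisal: s(p*q) = s(p) + s(q).
print(math.isclose(surprisal(0.5 * 0.25), surprisal(0.5) + surprisal(0.25)))
```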
Entropy
The quantum entropy generalizes Shannon entropy for classical states, and should hold unitary invariance $$H(\rho) = H(U\rho{}U^\dagger{})$$.
We can work in the eigenbasis (where the eigenvalues play the role of probabilities) $$\rho = \sum_i \lambda_i |i\rangle\langle{}i|$$ (and if not, rotate into it). It then follows that $$H(\rho) = \sum_i \lambda_i \log \frac{1}{\lambda_i}$$ as well.
Monotone under (unital) noise channels $$\epsilon(I) = I$$, e.g. depolarizing channels, i.e. $$H(\rho) \le{} H(\epsilon(\rho))$$
If one adds noise to a state, the uncertainty about it increases.
Amplitude-damping channels do not fall under this, since they push the state towards $$|0\rangle{}$$ instead of the identity.
It is convenient to represent the entropy of the (sub-)system rather than specific states itself: $$H(A)_\rho \equiv H(\rho_A)$$
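A numpy sketch of these properties (the states and the Hadamard rotation are my own examples):

```python
# Von Neumann entropy H(rho) = -sum_i lambda_i log2(lambda_i), computed in
# the eigenbasis, plus a check of unitary invariance.
import numpy as np

def von_neumann_entropy(rho):
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]          # 0 log 0 = 0 by convention
    return float(-np.sum(eigs * np.log2(eigs)))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])     # |0><0|
mixed = np.eye(2) / 2                         # maximally mixed qubit
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard (a unitary)

print(von_neumann_entropy(pure))              # 0.0
print(von_neumann_entropy(mixed))             # 1.0
print(von_neumann_entropy(H @ pure @ H.T))    # 0.0: unitary invariance
```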
There are von Neumann measurements, i.e. (Rank-1) projective measurements:
Interesting point: in classical systems, the entropy increases with the number of systems, e.g. $$H(XY) \ge{} H(X)$$, whereas in the case of purification, the larger system (of a singular pure state) has an entropy of 0, even though the smaller system can have non-zero entropy.
Given pure state $$|\rho\rangle{}_{AB}$$ with $$\rho_A,\rho_B$$ marginals, we have $$H(A) = H(B) \equiv H(\rho_B)$$ and $$H(AB) = 0$$.
Conditional entropy
We define the conditional entropy $$H(A|B) := H(AB) - H(B)$$,
$$H(A|B)$$ can be negative! So the conditional entropy is better described as the amount of entropy one must add to $$H(B)$$ to obtain the entropy of the joint system.
This can also be seen as a measure of uncertainty about $$A$$ from the perspective of an observer with access to $$B$$.
The data-processing inequality (DPI) is the statement: $$H(A'|B')_{(\epsilon\otimes{}\mathcal{F})(\rho)} \ge{} H(A|B)_\rho$$, for a unital quantum channel $$\epsilon_{A\rightarrow{}A'}$$ and quantum channel $$\mathcal{F}_{B\rightarrow{}B'}$$.
DPI represents monotonicity, with the former taking a more operational perspective.
Applying a quantum channel may gradually make the state more separable, pushing $$H(A|B)$$ towards non-negative values (it is negative only for entangled states).
We also have $$H(A|BC) \le{} H(A|B)$$, which is DPI on $$tr_C$$.
Another perspective: If we drop some side-information on $$A$$, our uncertainty on $$A$$ increases.
Can also be interpreted as strong sub-additivity (joint entropy is smaller than the sum of its parts) and is equivalent (although this was rather hard to prove formally):
\begin{align*}
H(A|BC) &\le{} H(A|B) \\
H(ABC) - H(BC) &\le{} H(AB) - H(B) \\
H(AC|B) &\le{} H(A|B) + H(C|B)
\end{align*}
If $$\rho_{ABC}$$ is pure, we get the duality $$H(A|B) = -H(A|C)$$. The proof uses the idea that $$H(AB) = H(C)$$ since both parts complete the joint state:
$$H(A|B) = H(AB) - H(B) = H(C) - H(AC) = -H(A|C)$$.
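The negativity of conditional entropy can be checked numerically: for a Bell state, $$H(AB)=0$$ while $$H(B)=1$$, so $$H(A|B)=-1$$ (a numpy sketch with helper names of my choosing):

```python
# H(A|B) = H(AB) - H(B) is negative for a Bell state.
import numpy as np

def entropy(rho):
    eigs = np.linalg.eigvalsh(rho)
    eigs = eigs[eigs > 1e-12]
    return float(-np.sum(eigs * np.log2(eigs)))

def trace_out_first(rho_ab, dA, dB):
    """Partial trace over the first subsystem of a (dA*dB)-dim state."""
    return np.trace(rho_ab.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)           # (|00> + |11>) / sqrt(2)
rho_ab = np.outer(bell, bell)
rho_b = trace_out_first(rho_ab, 2, 2)

print(entropy(rho_ab) - entropy(rho_b))      # H(A|B) = -1.0
```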
We also define mutual information $$I(A:B) = H(A) - H(A|B)$$, which is a measure of correlation (applies to both classical and quantum).
Bound: $$0\le{} I(A:B) \le{} 2\log d$$
We also have another DPI, where correlations decrease after applying local operations $$\epsilon_{A\rightarrow{}A'}, \mathcal{F}_{B\rightarrow{}B'}$$: $$I(A':B')_{\epsilon\otimes{}\mathcal{F}(\rho)} \le{} I(A:B)_\rho$$
Relative entropy
For two states $$\rho$$ and $$\sigma$$, we define the relative entropy $$D(\rho||\sigma) = tr(\rho(\log \rho - \log \sigma))$$.
More generally, only $$\sigma \ge{}0$$ needs to hold.
This allows us to derive all the data-processing inequalities: $$D(\rho||\sigma) \ge{} D(\epsilon(\rho)||\epsilon(\sigma))$$ for any quantum channel $$\epsilon$$
Thoughts
Most of the difficulty grappling with these concepts are:
Week 4.2
CHSH game
CHSH quantum strategy: $$\rho_{A'B'} = |\Psi_0\rangle{}\langle{}\Psi_0|$$, $$M^0 = \{|0\rangle{}\langle{}0|, |1\rangle{}\langle{}1|\}, M^1 = \{|+\rangle{}\langle{}+|,|-\rangle{}\langle{}-|\}$$, $$N^0 = \{|\pi/8\rangle{}\langle{}\pi/8|, |5\pi/8\rangle{}\langle{}5\pi/8|\}, N^1 = \{|-\pi/8\rangle{}\langle{}-\pi/8|,|3\pi/8\rangle{}\langle{}3\pi/8|\}$$
Given Alice $$\alpha$$ and Bob $$\beta$$, $$tr(|\alpha\rangle\langle{}\alpha|\otimes|\beta\rangle\langle{}\beta| \cdot{} |\Psi_0\rangle\langle{}\Psi_0|) = \frac{1}{2}|\langle{}\alpha|\beta\rangle|^2 = \frac{1}{2}\cos^2(\alpha-\beta)$$
This can be solved simply by parameterizing all of the terms in the computational basis, or alternatively by writing the Bell state in the $$\{|\alpha,\alpha\rangle{},|\alpha+\pi/2,\alpha+\pi/2\rangle{}\}$$ basis.
An even slicker way of solving this: move $$|\beta\rangle{}\langle{}\beta|$$ over to Alice's side via the transpose trick, so the trace becomes $$\frac{1}{2}tr(|\alpha\rangle{}\langle{}\alpha|\beta\rangle{}\langle{}\beta|) = \frac{1}{2}|\langle{}\alpha|\beta\rangle{}|^2$$.
This makes intuitive sense: if both angles are the same, then the two are perfectly correlated (and perfectly anti-correlated with a $$\pi/2$$ difference).
$$p_{\text{win}}^q(\text{CHSH}) = \cos^2(\pi/8) = \frac{1}{2}+\frac{1}{2\sqrt{2}} \approx{} 0.85\ldots{}$$
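Plugging the measurement angles above into $$P(a,b|x,y) = \frac{1}{2}\cos^2(\alpha-\beta)$$ can be verified numerically (a numpy sketch; the angle bookkeeping is my own):

```python
# CHSH quantum strategy: Alice's bases at angles {0, pi/4}, Bob's at
# {pi/8, -pi/8}, on the Bell state (|00> + |11>)/sqrt(2).
import numpy as np

def proj(theta):
    """Rank-1 projector onto the real qubit state at angle theta."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)
rho = np.outer(bell, bell)

alice = {0: 0.0, 1: np.pi / 4}            # basis angle for input x
bob = {0: np.pi / 8, 1: -np.pi / 8}       # basis angle for input y

p_win = 0.0
for x in (0, 1):
    for y in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                if (a + b) % 2 != (x * y):
                    continue              # losing outcome: a + b != xy (mod 2)
                Ma = proj(alice[x] + a * np.pi / 2)   # outcome 1: orthogonal state
                Nb = proj(bob[y] + b * np.pi / 2)
                p_win += 0.25 * np.trace(np.kron(Ma, Nb) @ rho)
print(p_win)   # cos^2(pi/8) = 1/2 + 1/(2 sqrt(2)) ~ 0.8536
```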
We can quickly show that projective measurements suffice using Naimark dilation: Given a POVM $$\{M_x\}_x$$ on A, there exists a projective measurement (PVM) $$\{P_x\}_x$$ and an isometry $$U:A\rightarrow{}AX$$ such that $$\langle{}M_x,\rho\rangle = \langle{}P_x,U\rho{}U^\dagger{}\rangle{}$$.
The isometry takes the state into a larger space.
One such construction: $$P_x = I_A\otimes{}|x\rangle{}\langle{}x|$$ and $$U= \sum_x \sqrt{M_x} \otimes{} |x\rangle{}$$.
We easily see $$P_x$$ is a projector. We also verify the isometry $$U^\dagger{}U = \sum_{x,x'} \sqrt{M_x}\sqrt{M_{x'}}\langle{}x'|x\rangle{} = \sum_x M_x = I$$.
Writing the inner product, $$\langle{}P_x,U\rho{}U^\dagger{}\rangle{} = \langle{}U^\dagger{}P_xU,\rho\rangle{} = \langle{}M_x,\rho\rangle{}$$.
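The construction can be verified numerically for a concrete POVM (a sketch; the qubit trine POVM and the helper `psd_sqrt` are my illustrative choices):

```python
# Naimark dilation check: P_x = I (x) |x><x|, U = sum_x sqrt(M_x) (x) |x>.
# Example POVM: the qubit "trine", M_k = (2/3)|t_k><t_k|.
import numpy as np

def psd_sqrt(M):
    """Square root of a Hermitian psd matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

kets = [np.array([np.cos(k * np.pi / 3), np.sin(k * np.pi / 3)]) for k in range(3)]
povm = [2 / 3 * np.outer(v, v) for v in kets]   # three effects summing to I

U = sum(np.kron(psd_sqrt(M), np.eye(3)[:, [x]]) for x, M in enumerate(povm))
print(np.allclose(U.conj().T @ U, np.eye(2)))   # U is an isometry

rho = np.array([[0.7, 0.2], [0.2, 0.3]])        # an arbitrary test state
ok = all(
    np.isclose(np.trace(M @ rho),
               np.trace(np.kron(np.eye(2), np.outer(np.eye(3)[x], np.eye(3)[x]))
                        @ U @ rho @ U.conj().T))
    for x, M in enumerate(povm)
)
print(ok)   # <M_x, rho> and <P_x, U rho U+> agree for every outcome
```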
This implies that the CHSH game has additional structure: $$p_\text{win}^q$$ can be achieved with pure states and projective measurements alone, so in searching for the optimum we need not enumerate all possible strategies.
The upper bound can be derived and is given by Tsirelson's bound, using $$P(a,b|x,y) = \langle{}\phi|P_{a}^x\otimes{}Q_{b}^y|\phi\rangle{}$$:
In the case of $$x=y=0$$, the winning probability is $$\sum_{a,b}\mathbf{1}\{a=b\}P(a,b|0,0) = \frac{1}{2}+\frac{1}{2}\sum_{a,b} (-1)^{a+b} P(a,b|0,0) \equiv \frac{1}{2} + \frac{1}{2}\langle{}\phi|C^0 \otimes{}D^0|\phi\rangle{}$$, where $$C^x = \sum_a (-1)^a P_a^x$$ and $$D^y = \sum_b (-1)^b Q_b^y$$ are $$\pm{}1$$-valued observables.
The winning probability then becomes:
\begin{align*}
p_\text{win}^q
&= \frac{1}{2} + \frac{1}{8}\langle{}\psi|C^0\otimes{}D^0 + C^0\otimes{}D^1 + C^1 \otimes{}D^0 - C^1 \otimes{}D^1|\psi\rangle \\
&\equiv{} \frac{1}{2} + \frac{1}{8}\langle{}\psi|B|\psi\rangle \\
&\le{} \frac{1}{2} + \frac{1}{8}||B||_\infty{} \\
&= \frac{1}{2} + \frac{1}{8}\sqrt{||B^2||_\infty{}}
\end{align*}
B is called the Bell observable.
$$B^2 = 4I - [C^0,C^1]\otimes{}[D^0,D^1]$$
Using the triangle inequality and submultiplicativity ($$||AB||_\infty{} \le{} ||A||_\infty{}||B||_\infty{}$$) gives us $$||[C^0,C^1]||_\infty{} \le{} 2||C^0||_\infty{}||C^1||_\infty{} = 2$$
Using the fact $$||A\otimes{}B||_\infty{} = ||A||_\infty{} ||B||_\infty$$, we finally get $$||B^2||_\infty{}\le{} 4 + 4 = 8$$
This also gives the probability $$p_\text{win}^q \le{} \frac{1}{2} + \frac{1}{8}\sqrt{||B^2||_\infty} \le{} \frac{1}{2} + \frac{1}{2\sqrt{2}}$$.
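The bound is saturated by the quantum strategy; a small numpy check (the choice $$C^0=Z$$, $$C^1=X$$, $$D^{0,1}=(Z\pm{}X)/\sqrt{2}$$ is a standard optimal instantiation, my concrete example):

```python
# Tsirelson's bound is tight: with optimal +-1 observables, ||B||_inf = 2*sqrt(2).
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=float)
Z = np.array([[1, 0], [0, -1]], dtype=float)
C0, C1 = Z, X
D0, D1 = (Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)

# Bell observable B = C0 D0 + C0 D1 + C1 D0 - C1 D1 (tensor products).
B = (np.kron(C0, D0) + np.kron(C0, D1)
     + np.kron(C1, D0) - np.kron(C1, D1))

norm_B = np.linalg.norm(B, 2)      # spectral (infinity-Schatten) norm
print(norm_B)                      # 2*sqrt(2): the operator-norm bound is met
print(0.5 + norm_B / 8)            # p_win = 1/2 + 1/(2*sqrt(2))
```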
Mermin magic square game
The game follows $$X=Y=\{1,2,3\}, A=B = \{-1,1\}^{\times{}3}$$ and $$V(a,b,x,y) = \mathbf{1}\{a_1a_2a_3=1, b_1b_2b_3=-1, a_y = b_x\}$$. This is better visualized as a 3x3 square, where Alice fills in a row and Bob a column such that their entries agree at the intersection.
The optimal classical strategy is to fill the square:
x \ y | 1 | 2 | 3 |
1 | 1 | 1 | 1 |
2 | -1 | 1 | -1 |
3 | 1 | -1 | ? |
where the last ? cannot be agreed on between Alice and Bob. This thus gives $$p_\text{win}^{\text{cl.}} = \frac{8}{9}$$.
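The classical value $$8/9$$ can be confirmed by brute force over deterministic strategies (a sketch):

```python
# Brute-force the classical value of the magic square game: Alice answers
# with rows of parity +1, Bob with columns of parity -1; they win when the
# entries agree at the intersection (x, y).
from itertools import product

rows = [r for r in product([1, -1], repeat=3) if r[0] * r[1] * r[2] == 1]
cols = [c for c in product([1, -1], repeat=3) if c[0] * c[1] * c[2] == -1]

best = 0
for A in product(rows, repeat=3):        # A[x] = row Alice answers on input x
    for B in product(cols, repeat=3):    # B[y] = column Bob answers on input y
        wins = sum(A[x][y] == B[y][x] for x in range(3) for y in range(3))
        best = max(best, wins)
print(best / 9)   # 8/9 ~ 0.888...
```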
For the quantum game, we rely on:
For commuting (Hermitian) observables $$H,K$$ with $$[H,K]=0$$, there exists an ONB $$\{|\psi_i\rangle{}\}_i$$ with $$H|\psi_i\rangle = \lambda_i|\psi_i\rangle{}$$ and $$K|\psi_i\rangle = \mu_i|\psi_i\rangle{}$$.
The idea is that the entries within each row should commute with each other. The operators $$I\otimes{}Z$$, $$Z\otimes{}I$$ and $$Z\otimes{}Z$$ all commute, and their product is $$I$$.
This means whichever state we measure, the product of the observables along the row will always give $$I$$ (the parity is always 1).
Filling the rest of the square gives the following, where the parity of each row is $$I$$ and column is $$-I$$:
x \ y | 1 | 2 | 3 |
1 | $$I\otimes{}Z$$ | $$Z\otimes{}I$$ | $$Z\otimes{}Z$$ |
2 | $$X\otimes{}I$$ | $$I\otimes{}X$$ | $$X\otimes{}X$$ |
3 | $$-X\otimes{}Z$$ | $$-Z\otimes{}X$$ | $$Y\otimes{}Y$$ |
The joint eigenbasis consists of computational-basis states, e.g. $$|\theta_0\rangle{} = |00\rangle{}$$. To win the game, Alice and Bob need to share two maximally-entangled pairs. Let $$O_{xy}$$ represent the observable in cell (x,y), and the shared state be $$|\psi\rangle{} = |\Psi_0\rangle{}\otimes{}|\Psi_0\rangle{}$$; the winning probability is thus,
\begin{align*}
p_\text{win}^q
&= \frac{1}{2}(1 + \langle{}O_{xy}\otimes{}O_{xy},|\psi\rangle{}\langle{}\psi|\rangle{}) \\
&= \frac{1}{2}(1 + \langle{}O_{xy}O_{xy}^T\otimes{}I,|\psi\rangle{}\langle{}\psi|\rangle{}) \\
&= \frac{1}{2}(1 + \frac{1}{4}tr(O_{xy}O_{xy}^T)) \\
&= 1
\end{align*}
Experimental realization of this has not been performed.
Week 4.1
POVM: $$\{M_x\}_x$$
Instrument: $$\mathcal{M}_{A\rightarrow{}AX}(\cdot{}) = \sum_x |x\rangle{}\langle{}x| \otimes \sqrt{M_x}(\cdot{})\sqrt{M_x} $$
Measurement: $$tr_A \circ{} \mathcal{M}_{A\rightarrow{}AX} \equiv \mathcal{M}_{A\rightarrow{}X}$$
cq-state and post-measurement state
Quantum correlations and games
A two-party game $$G=(X,Y,A,B,P_{XY},V)$$, where $$X,Y,A,B$$ are discrete and finite alphabets (input/output), $$P_{XY}$$ is the pmf and $$V:A\times{}B\times{}X\times{}Y \rightarrow \{0,1\}$$.
Alice gets input $$x\in{}X$$ and outputs $$a\in{}A$$, and correspondingly for Bob. They win if $$V(a,b,x,y) = 1$$.
Example: CHSH game, where $$X=Y=A=B=\mathbf{F}_2$$ and $$P_{XY}(x,y)=\frac{1}{4} \forall{} x,y$$, and $$V(a,b,x,y) = 1 \,\text{if}\,a+b=xy \,\text{else}\,0$$.
A classical strategy $$S$$ for Alice and Bob is given by $$Q_{X'Y'}$$ and $$f_A:X\times{}X' \rightarrow A$$ and $$f_B:Y\times{}Y'\rightarrow{} B$$.
\begin{align*}
P_{\text{win}}^{\text{cl.}}(G,S)
&= Pr[V(f_A, f_B, X, Y) = 1] \\
&= \sum_{x,x',y,y'} P_{XY}(x,y)Q_{X'Y'}(x',y') \mathbf{1}\{V(\cdots{})=1\} \\
&= \sum_{x,y,a,b} P_{XY}(x,y)S(a,b|x,y) \mathbf{1}\{V(a,b,x,y)=1\}
\end{align*}
$$\mathbf{1}$$ is the indicator function, which is 1 only if the condition specified is satisfied. Similar to the delta function.
$$S(a,b|x,y) = \sum_{x',y'} Q_{X'Y'}(x',y') \cdot{} \mathbf{1}\{f_A(x,x')=a\} \cdot{} \mathbf{1}\{f_B(y,y')=b\}$$
A strategy is deterministic if $$f_A(x,x') = f_A(x)\ \forall{} x,x'$$ (and similarly for $$f_B$$), i.e. it makes no use of the shared randomness.
The classical winning probability of a game is $$p_\text{win}^\text{cl.} (G) = \sup_S p_\text{win}^\text{cl.} (G,S)$$
Lemma: The maximum is achieved by a deterministic strategy. We rely on the fact that an expectation is always upper-bounded by the maximum.
Proof: $$p_\text{win}(G,S) = E_{x',y'}[\sum_{x,y} P_{XY}(x,y) \mathbf{1}\{V(f_A(x,x'), f_B(y,y'),x,y)=1\}] \le{} \max_{x',y'} \sum_{x,y} P_{XY}(x,y) \mathbf{1}\{V(f_A(x,x'), f_B(y,y'),x,y)=1\}$$; fixing the maximizing pair $$(x^*,y^*)$$ yields a deterministic strategy that does at least as well.
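The lemma can be sanity-checked on CHSH by brute force over deterministic strategies (a small sketch; the classical value $$3/4$$ is the well-known result):

```python
# Brute-force the classical CHSH value: enumerate all deterministic
# strategies f_A, f_B : {0,1} -> {0,1} and count wins (a + b = xy mod 2)
# over the four uniformly chosen input pairs.
from itertools import product

best = 0
for fa in product([0, 1], repeat=2):       # fa[x] = Alice's answer on input x
    for fb in product([0, 1], repeat=2):   # fb[y] = Bob's answer on input y
        wins = sum((fa[x] + fb[y]) % 2 == x * y
                   for x in (0, 1) for y in (0, 1))
        best = max(best, wins)
print(best / 4)   # 0.75
```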
A quantum strategy $$S_q$$ is given by a state $$\rho_{A',B'}$$ and a collection of POVMs $$\{M^x_a\}_a$$ on A' and $$\{N_b^y\}_b$$ on B' for all x,y.
Recall: $$\sum_{a\in{}A} M_a^x = I_A\;\forall{}x$$.
The strategy is now given as $$S_q(a,b|x,y) = tr((M_a^x \otimes{} N_b^y) \rho_{A'B'})$$, with the winning probability $$p_\text{win}^q(G) = \sup_{S_q} p_\text{win}(G,S_q)$$.
This is the most general case.
Even if we locally apply channels on the state, we see that $$tr[(M_a^x \otimes{} N_b^y)(\epsilon_{A'\rightarrow{}A'}^x\otimes{} \mathcal{I})(\rho_{A'B'})] = tr[(((\epsilon_{A'\rightarrow{}A'}^x)^\dagger{}(M_a^x)) \otimes{} N_b^y) \rho_{A'B'}]$$.
We can show that $$\{(\epsilon_{A'\rightarrow{}A'}^x)^\dagger{}(M_a^x)\}$$ is also a POVM, by noting that (although cptp maps are not necessarily unital) the adjoint of a cptp map is unital (i.e. preserves the identity, $$\epsilon^\dagger{}(I_A) = I_A$$), so we see that $$\sum_a (\epsilon_{A'\rightarrow{}A'}^x)^\dagger{} (M_a^x) = (\epsilon_{A'\rightarrow{}A'}^x)^\dagger{} (I_A) = I_A$$.
Proof for unital adjoint: $$\langle{}\epsilon^\dagger{}I, \rho{}\rangle{} = \langle{}I,\epsilon(\rho)\rangle{} = tr(\epsilon(\rho)) = tr(\rho) = \langle{}I,\rho{}\rangle{}$$
We note that $$p_\text{win}^q(G) \ge{} p_\text{win}^{\text{cl.}}(G)$$.
Current thoughts
Intuitively things seem to make sense, but the theoretical treatment requires considering many different small lemmas that actually need proving...
Week 3.1
Composite systems
A state $$\rho_{AB}$$ is separable if it can be written as $$\rho_{AB} = \sum_i p_i \rho_A^i \otimes{} \rho_B^i$$, otherwise we say $$\rho_{AB}$$ is entangled.
Note that $$p_i$$ needs to be positive.
Examples of entangled states include Bell states, e.g. $$|\psi_i\rangle{} = (P_i \otimes{} \mathbf{1})(\frac{1}{\sqrt{2}} (|0\rangle{} \otimes{}|0\rangle{} + |1\rangle{}\otimes{}|1\rangle{}))$$
Classical-quantum (cq) states are of the form $$\rho_{XB} = \sum_x p_x | x\rangle{}\langle{}x|_X \otimes{} \rho_B^x$$.
Taking the partial trace over B gives us the classical state.
Convention to use X, Y, Z... for classical states, and these are called registers.
Channels
We look at the isomorphism from $$L(V,W)$$ to $$V\otimes{}W$$, or in more explicit terms, $$L(L(V),L(W))$$ to $$L(V)\otimes{}L(W) \sim L(V\otimes{}W)$$. This is the Choi-Jamiołkowski isomorphism.
We can vectorize channels: $$\text{vec}(\epsilon_{V\rightarrow{}W}) = (\epsilon_{V\rightarrow{}W} \otimes{} \mathcal{I}) (|\Omega{}\rangle{}\langle{}\Omega{}|) = \sum_{i,j} \epsilon(|i\rangle{}\langle{}j|) \otimes{} |i\rangle{}\langle{}j|$$. This is called a Choi state.
Reminder that $$|i\rangle{}\langle{}j|$$ can be seen as a matrix with the (i,j)-entry as 1, and the rest 0.
Idea is to transform a channel into a linear operator that can be properly characterized.
$$|\Omega{}\rangle{}$$ is an (unnormalized) maximally-entangled state, similar to a Bell state: $$|\Omega{}\rangle{} = \sum_{i=1}^{d_V} |i\rangle{}_V\otimes{}|i\rangle{}_{V'}$$
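Following this definition, the Choi state of a concrete channel can be built block by block (a numpy sketch; the depolarizing channel with $$p=0.5$$ is my example):

```python
# Choi state vec(E) = (E (x) I)(|Omega><Omega|) = sum_ij E(|i><j|) (x) |i><j|.
import numpy as np

def depolarize(rho, p=0.5):
    """Qubit depolarizing channel (illustrative parameter p = 0.5)."""
    return (1 - p) * rho + p * np.trace(rho) * np.eye(2) / 2

def choi(channel, d=2):
    """Apply the channel to each matrix unit |i><j| and assemble the blocks."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d))
            Eij[i, j] = 1
            C += np.kron(channel(Eij), Eij)
    return C

C = choi(depolarize)
print(np.allclose(C, C.conj().T))                  # Hermitian
print(bool(np.all(np.linalg.eigvalsh(C) >= -1e-9)))  # psd <=> channel is CP
print(np.isclose(np.trace(C), 2))                  # trace-preserving: tr C = d
```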
Theorems:
Examples of other channels:
Norms and metrics on states
Canonical norm: $$||X|| = \sqrt{\langle{}X,X\rangle{}} = \sqrt{\sum_j \langle{}Xv_j,Xv_j\rangle{}} = \sqrt{\sum_j s_j^2} \equiv ||X||_2$$. This is called the 2-norm, or Frobenius norm, or Schatten-2 norm.
We write $$X = \sum_i s_i |w_i\rangle{}\langle{}v_i|$$.
The 2-norm is unitarily invariant (i.e. $$||X||_2 = ||UXV^\dagger{}||_2$$ for any isometries $$U,V$$).
The p-norm has the same type of invariance properties and is defined $$||X||_p = (\sum_i s_i^p)^{1/p}$$.
Recall that a metric has the properties $$\Delta{}(\rho,\sigma) \ge{} 0$$ with equality iff $$\rho=\sigma$$, satisfies the triangle inequality, and is symmetric. Some metrics that we can define:
Trace distance $$\Delta{}(\rho,\sigma{}) \equiv{} \frac{1}{2}||\rho-\sigma||_1$$
Fidelity $$F(\rho,\sigma) = ||\sqrt{\rho}\sqrt{\sigma}||^2_1 = (tr|\sqrt{\rho{}}\sqrt{\sigma{}}|)^2 = (tr \sqrt{\sqrt{\rho}\sigma\sqrt{\rho}})^2$$.
With a pure state, we see simply that $$F(\rho,|\psi\rangle{}\langle{}\psi|) = \langle{}\psi|\rho|\psi\rangle{}$$
$$F(\rho,\sigma)\in [0,1]$$
Can be seen as the maximal overlap between purifications of the states (Uhlmann's theorem).
Purified distance: $$P(\rho,\sigma) = \sqrt{1 - F(\rho,\sigma)}$$, the square root of the infidelity $$1-F$$.
These are related by the Fuchs-van de Graaf inequality $$1 - \sqrt{F} \le{} \Delta \le \sqrt{1-F}$$, with $$\Delta = \sqrt{1-F}$$ if $$\rho,\sigma$$ are pure.
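Both quantities and the inequality can be checked numerically (a numpy sketch; the two qubit states are my own examples):

```python
# Trace distance, fidelity, and the Fuchs-van de Graaf sandwich
# 1 - sqrt(F) <= Delta <= sqrt(1 - F).
import numpy as np

def psd_sqrt(M):
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.conj().T

def trace_distance(rho, sigma):
    # (1/2) * sum of |eigenvalues| of the Hermitian difference.
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

def fidelity(rho, sigma):
    # F = (tr sqrt( sqrt(rho) sigma sqrt(rho) ))^2
    s = psd_sqrt(rho)
    return np.trace(psd_sqrt(s @ sigma @ s)).real ** 2

rho = np.array([[0.8, 0.1], [0.1, 0.2]])
sigma = np.array([[0.4, 0.0], [0.0, 0.6]])
D, F = trace_distance(rho, sigma), fidelity(rho, sigma)
print(1 - np.sqrt(F) <= D <= np.sqrt(1 - F))   # True
```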
Some relations:
Given a cptp map $$\epsilon$$, applying it to both states can only make them closer (e.g. adding noise makes them more difficult to distinguish). This is embodied in these data-processing inequalities (DPI):
$$\Delta(\epsilon(\rho),\epsilon(\sigma)) \le{} \Delta(\rho,\sigma)$$
$$F(\epsilon(\rho),\epsilon(\sigma)) \ge{} F(\rho,\sigma)$$
Uhlmann's theorem: for purifications $$|\rho\rangle, |\sigma\rangle$$, we have $$F(\rho,\sigma) = \max_{|\rho\rangle,|\sigma\rangle} F(|\rho\rangle\langle\rho|,|\sigma\rangle\langle\sigma|)$$.
Proof sketch:
Using DPI for the partial trace, we have $$F(\rho,\sigma) \ge{} F(|\rho\rangle\langle\rho|,|\sigma\rangle\langle\sigma|)$$.
For the other direction, we use the purification $$|\rho\rangle = \text{vec}(\sqrt{\rho}) = \sum_i \sqrt{\rho}|i\rangle\otimes{}|i\rangle{}$$ and $$|\sigma{}_u\rangle{} = (I\otimes{}U)\text{vec}(\sqrt{\sigma}) = \text{vec}(\sqrt{\sigma}U^T)$$. This gives the inner product after the isometry, expanded below.
This relies on $$|\Omega{}\rangle{} = \sum_i |ii\rangle{}$$ and $$X\otimes{}I|\Omega{}\rangle = I\otimes{}X^T|\Omega{}\rangle$$
\begin{align*}
\max_U |\langle{}\rho|\sigma_U\rangle{}|
&= \max_U |\langle{}\sqrt{\rho},\sqrt{\sigma}U^T\rangle{}| \\
&= \max_U | tr(\sqrt{\rho}\sqrt{\sigma} U^T) |\\
&= \max_U |tr(|\sqrt{\rho}\sqrt{\sigma}| U^T V)| \\
&\ge{} tr|\sqrt{\rho}\sqrt{\sigma}| \\
&= \sqrt{F(\rho,\sigma)}
\end{align*}
Here $$V$$ comes from the polar decomposition $$\sqrt{\rho}\sqrt{\sigma} = V|\sqrt{\rho}\sqrt{\sigma}|$$, and the inequality follows by choosing $$U^T = V^\dagger{}$$.
Week 2.2
Tensor spaces
Given V, W Hilbert spaces on F with inner products and ONBs, we can define a tensor space $$V\otimes{} W = \text{span}\{v_i\otimes{} w_j\}_{i,j}$$.
Their corresponding inner product between each basis vector is as expected: $$\langle{}v_i\otimes{} w_j , v_{i'} \otimes{} w_{j'} \rangle{} = \langle{} v_i,v_{i'}\rangle{}_v \langle{} w_j,w_{j'}\rangle{}_w = \delta_{ii'} \delta_{jj'} $$.
We can also define general vectors in this new basis: $$u = \sum_{i,j} \alpha_{ij} v_i \otimes{} w_j $$, as well as the inner product $$\langle{}u,u'\rangle{} = \sum_{i,j} \overline{\alpha_{ij}} \alpha_{ij}' $$
Tensor spaces retain additive distributivity and multiplicative associativity.
Tensors of linear operators: $$ (L\otimes{} K) (v\otimes{} w) = (Lv)\otimes{} (Kw) $$, so we can define $$ L(V) \otimes{} L(W) \sim{} L(V\otimes{} W)$$
Composite systems
Notation:
We denote $$A\otimes{}B\otimes{}C$$ as $$ABC$$.
$$L(AB)$$ are linear operators on $$AB$$, and $$\text{Herm}(AB)$$ are self-adjoint elements of $$L(AB)$$.
$$\rho_{AB}$$ is a state in $$\text{Herm}(AB)$$ (as usual, $$\rho_{AB}\ge{}0,\,tr[\rho_{AB}] = 1 $$)
$$A'\sim{} A$$ implies isomorphism, i.e. they share the same dimensions, and their basis vectors can be mapped between each other.
Marginal state is given by $$\rho_A = \sum_i (I_A \otimes{} \langle{}i|_B) \rho_{AB} (I_A \otimes{} |i\rangle{}_B) \equiv{} (I_A\otimes{}\text{tr}_B)(\rho_{AB}) \equiv{} \text{tr}_B(\rho_{AB})$$, given any ONB. $$\text{tr}_B$$ is called the partial trace, and can be written as $$tr_B(X) = \sum_i \langle{}i|X|i\rangle{}$$.
Given a state $$\rho_A = \sum_i \lambda{}_i |\psi_i\rangle{}\langle{}\psi_i| $$.
We define $$|\rho\rangle{}_{AA'} = \sum_i \sqrt{\lambda_i} |\psi_i\rangle{}_A \otimes{} |i\rangle{}_{A'}$$. Note that we can use any arbitrary ONB (which has to span $$A'$$ that is isomorphic to $$A$$) on the right side - we use the computational basis here for simplicity.
We compute the partial trace of the state over $$A'$$:
\begin{align*}
tr_{A'} (\rho_{AA'}) &\equiv tr_{A'}(|\rho\rangle{}\langle{}\rho|_{AA'}) \\
&= tr_{A'} (\sum_{i,j} \sqrt{\lambda{}_i\lambda_j} |\psi_i\rangle{}\langle{}\psi_j| \otimes | i\rangle{}\langle{}j|) \\
&= \sum_{i,j} \sqrt{\lambda_i\lambda_j} \sum_k |\psi_i\rangle{}\langle{}\psi_j| \cdot{} \langle{}k|i\rangle{}\langle{}j|k\rangle{} \\
&= \sum_k \lambda_k |\psi_k\rangle{}\langle{}\psi_k| \\
&= \rho_A
\end{align*}
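The same computation in numpy (the example state $$\rho_A$$ is my own choice):

```python
# Tracing out the purifying system recovers the original state:
# |rho> = sum_i sqrt(lambda_i) |psi_i> (x) |i>  =>  tr_{A'} |rho><rho| = rho_A.
import numpy as np

rho_A = np.array([[0.7, 0.2], [0.2, 0.3]])    # an example qubit state
lam, psi = np.linalg.eigh(rho_A)

# Purification on A (x) A', computational basis on A'.
purif = sum(np.sqrt(lam[i]) * np.kron(psi[:, i], np.eye(2)[i]) for i in range(2))
rho_AA = np.outer(purif, purif)

# Partial trace over A' (the second tensor factor).
rho_back = np.trace(rho_AA.reshape(2, 2, 2, 2), axis1=1, axis2=3)
print(np.allclose(rho_back, rho_A))   # True
```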
Channels
Channels are linear maps that map between states: $$\epsilon_{A\rightarrow{}B} \in L(\text{Herm}(A),\text{Herm}(B))$$.
Some properties we would like it to have:
$$\epsilon_{A\rightarrow{}B}$$ must preserve Hermiticity and trace, i.e. $$tr(\epsilon_{A\rightarrow{}B}(H_A)) = tr(H_A)$$ for all $$H_A \in \text{Herm}(A)$$.
$$\epsilon_{A\rightarrow{}B}$$ must preserve positivity, i.e. $$H_A\ge{}0 \Rightarrow{} \epsilon_{A\rightarrow{}B} (H_A) \ge{} 0 $$
$$\epsilon_{A\rightarrow{}B}$$ must be completely positive, i.e. $$\epsilon_{A\rightarrow{}B}\otimes{} \mathcal{I}_C$$ is positive for all $$C$$ (the "for all $$C$$" part is the important strengthening).
Channels are therefore called cptp maps (completely positive trace-preserving).
Examples:
Unitaries are actually unitary channels (without noise), e.g. $$\mathcal{U}_{A\rightarrow{}A}(H_A) = U_A H_A U_A^\dagger{} $$.
Pauli channels: $$ \mathcal{P}_{p_i} (H) = \sum_{i=0}^3 p_i P_i HP_i$$ where $$\sum_i p_i = 1$$
Trace (and partial trace) is also a channel: $$tr_A: H (\in \text{Herm}(A))\mapsto{} tr(H) (\in \mathbb{R}) $$
Channels have adjoints, e.g. the adjoint of trace is $$(tr_B)^\dagger{}: H_A \mapsto{} H_A\otimes{} I_B$$. This adjoint is actually pretty conceptually important.
We can derive the adjoint of the partial trace: $$\langle{}H_A, tr_B(M_{AB})\rangle = tr(H_A \cdot{} tr_B(M_{AB})) = tr((H_A \otimes{} I_B) M_{AB}) = \langle{} H_A \otimes{} I_B, M_{AB}\rangle{}$$
$$Pr[X=i] = \langle{}M^i_A\otimes{}I_B,\rho_{AB}\rangle{} = \langle{}M^i_A,tr_B(\rho_{AB})\rangle{} = \langle{}M^i_A,\rho_A\rangle{} $$
i.e. it is sufficient to look at the marginals since we only look at the outcome of Alice's measurement on state (Bob cannot affect Alice's outcome).
Usually we want to look at the correlations between Alice and Bob's outcomes, so the probability of Alice's outcome conditioned on Bob's outcome will instead be different (and entanglement can strengthen these correlations).
Week 2.1
Every $$ X\in{} L(V,W)$$ can be written in the form $$ X = \sum_{i=0}^{r-1} s_i |w_i\rangle{}\langle{}v_i| $$, where $$ \{|v_i\rangle{}\}_i$$ is orthonormal in V and $$\{|w_i\rangle{}\}_i$$ is orthonormal in W. $$s_i>0$$ are called singular values.
Does not need to span the entire space (i.e. form a basis)
Terminology:
Rank $$r \le{} \text{min}\{d_{V},d_{W}\}$$
$$\text{span}\{v_i\}_i$$ is called support (whose orthogonal complement is called kernel), while that for $$w$$ is the image.
We can order the singular values, e.g. $$s_0 \ge{} s_1 \ge{} \cdots{} \ge{} s_{r-1} > 0$$
This matrix picture might make more sense: $$X = U\Sigma{}V^\dagger{}$$
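The matrix picture and the dyad form agree; a numpy sketch (the 3x2 matrix is my own example):

```python
# SVD in numpy matches X = sum_i s_i |w_i><v_i|, with ordered singular values.
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # an example map V -> W
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Rebuild X one rank-1 dyad s_i |w_i><v_i| at a time.
recon = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(len(s)))
print(np.allclose(recon, X))       # True
print(bool(np.all(s[:-1] >= s[1:])))  # singular values come ordered, s_0 >= s_1
```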
For Hermitian operators, we have the spectral decomposition (eigenvalue decomposition), which boils down to the even more familiar $$H=\sum_i \lambda_i |v_i\rangle{}\langle{}v_i|$$ for $$H\in{} L(V)$$.
As usual, $$\lambda_i \in R$$
This arises from the SVD by assigning $$s_i = |\lambda{}_i|$$ (eigenvalues may be zero, so the number of non-zero singular values — the rank — can be less than the dimension).
We can write in the SVD form by moving any negative signs from the eigenvalue to the corresponding image vector, e.g. $$|v_i\rangle{} \rightarrow{} |-v_i\rangle{} \equiv{} |w_i\rangle{}$$, to satisfy the positive singular value condition.
A Hermitian $$H\in{}L(V)$$ is positive semidefinite (psd) matrix if $$\langle{}v,Hv\rangle{}\ge{} 0 \forall{} v\in{}V$$.
We denote $$H\ge{} 0$$ if H is psd (but this does not mean the entries of H are necessarily non-negative!)
Lemma: $$M\ge{}0$$ implies $$X^\dagger{}MX\ge{}0$$ (since $$\langle{}v,X^\dagger{}MXv\rangle{} = \langle{}Xv,M(Xv)\rangle{} \ge{} 0$$).
Projectors are psd operators $$P$$ with $$P^2 = P$$.
For a function $$f:(0,\infty{}) \rightarrow{} R$$, we define $$f(H)$$ as $$\sum_{i=0,\lambda_i>0}^{r-1} f(\lambda_i)|v_i\rangle{}\langle{}v_i|$$ for any psd $$H$$.
Note the non-zero eigenvalues is a convenience thing.
Example using $$f:t\mapsto{}1/t$$ gives the Moore-Penrose generalized inverse (the same one used in least-squares fitting), which then gives us $$H^{-1}H = \sum_{i=0,\lambda_i>0}^{r-1} |v_i\rangle{}\langle{}v_i|$$ as the projector onto the support of H.
We can define a partial order on Hermitian operators: $$ A\ge{}B$$ iff $$A-B\ge{}0$$, e.g. $$1\ge{} M \ge{} 0$$ if the eigenvalues of M lie in [0,1].
Quantum states (or density operators) are psd operators $$\rho,\sigma,\omega,\pi$$ with unit trace.
These can be seen as a generalization of probability mass functions (since $$\sum_i \lambda_i = 1$$).
Pure states are states with rank 1, i.e. $$\Psi = |\psi\rangle{}\langle{}\psi{}|$$ with $$\lambda\in{}\{0,1\}$$, so these are projectors as well.
The Hadamard $$H$$ and Pauli operators $$\sigma_i$$ / $$P_i$$ are Hermitian (and unitary), but not psd — their eigenvalues are $$\pm{}1$$.
A positive operator-valued measure (POVM) is a set $$\{M_x\}_{x\in{}X}$$ such that $$M_x\ge{}0$$ and $$\sum_x M_x = \bf{1}$$.
Terminology:
We say it is projective if the effects are also projective, i.e. $$M_x = M_x^2$$.
We use Born's rule to associate probabilities to states/effects:
Using the formalism (this involves multiple perspectives!),
\begin{align*}
p_x &= \langle{} \Psi{}, M_x \rangle{} \\
&= \text{tr} (\Psi{}M_x) = \text{tr} (|\psi{}\rangle{}\langle{}\psi| M_x)\\
&= \langle{}\psi|M_x|\psi{}\rangle{} \\
&= \langle{} \psi{}, M_x\psi{} \rangle{}
\end{align*}
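These equivalent forms agree numerically (a numpy sketch; the $$|+\rangle{}$$ state and computational-basis effect are my own example):

```python
# Born's rule in its equivalent forms, for |+> measured in the computational basis.
import numpy as np

plus = np.array([1.0, 1.0]) / np.sqrt(2)
Psi = np.outer(plus, plus)                 # pure-state projector |psi><psi|
M0 = np.array([[1.0, 0.0], [0.0, 0.0]])    # effect |0><0|

p_hs = np.trace(Psi.conj().T @ M0).real    # <Psi, M0> (Hilbert-Schmidt)
p_tr = np.trace(Psi @ M0).real             # tr(Psi M0)
p_br = (plus.conj() @ M0 @ plus).real      # <psi|M0|psi>
print(p_hs, p_tr, p_br)                    # all equal 0.5
```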
Week 1.2 Linear algebra recap
Generally a recap of linear algebra basics.
Vector spaces
We denote a field $$F$$ (either the real field $$R$$ or complex field $$C$$). A vector space on F is a set V that is closed under addition $$V\times V \rightarrow V$$ and scalar multiplication $$F \times V \rightarrow V$$. This has the zero element 0.
A Hilbert space is a (finite dimensional) vector space with inner product $$\langle{}\cdot{},\cdot{}\rangle{}: V \times V \rightarrow F$$.
This inner product is linear in the second argument (Physics convention), i.e. $$\langle{}w,\alpha{}v\rangle{} = \alpha{}\langle{}w,v\rangle{} $$
Has conjugate $$ \langle{}w,v\rangle{} = \overline{\langle{}v,w\rangle{}} $$
Positive definite: $$\langle{}v,v\rangle{} \ge{} 0$$ with equality iff v=0
Canonical norm: $$||v|| = \sqrt{\langle{}v,v\rangle{}} $$
Cauchy-Schwarz inequality: $$|\langle{}v,w\rangle{}| \le ||v||\cdot{}||w||$$
Triangle inequality: $$||v+w|| \le ||v|| + ||w||$$
$$v$$ and $$w$$ are orthogonal if $$\langle{}v,w\rangle{} = 0$$
$$\{v_i\}_i$$ is orthonormal if $$\langle{}v_i,v_j\rangle{} = \delta{}_{ij}$$
Orthonormal basis (ONB) of V if $$V = \text{span}\{v_i\} = \{v: v=\sum_i a_i v_i, a_i \in F\}$$
Dimension of V is number of elements in ONB.
One example are pure states $$|\psi{}\rangle{} \in H$$, with the computational basis defined as $$\{|i\rangle{}\}_{i=0}^{d-1}$$. The inner product is defined $$\langle{}\theta{}|\phi{}\rangle{} \equiv{} \langle{}|\theta{}\rangle{},|\phi{}\rangle{}\rangle{} = \sum_i \overline{\theta}_i \cdot{} \phi_i $$. This is analogous to the matrix representation of dot product between column and row vectors.
Linear operators
Linear operators $$L(V,W)$$ contains all linear maps from V to W.
$$L(V) := L(V,V)$$
$$L(V,W)$$ is a vector space over same field (for convenience) as V and W, e.g. $$v\in V: (a_1X_1 + a_2X_2)v = a_1X_1v + a_2X_2v \in W$$.
Adjoint of $$ X\in L(V,W)$$ is $$ X^\dagger{} \in L(W,V) $$ satisfying $$\langle{}w,Xv\rangle{} = \langle{}X^\dagger{}w, v\rangle{} $$, where $$ (X^\dagger{})_{ij} = \overline{X_{ji}}$$.
Identity operator on V is $$1_V \in L(V)$$, where $$1_V: v\mapsto v$$, or for any general ONB $$1_V: v \mapsto \sum_{i=0}^{d-1} v_i \langle{} v_i,v\rangle{}$$.
An isometry $$U\in L(V,W) $$ satisfies $$U^\dagger{}U = 1_V$$.
A unitary $$U\in L(V)$$ is an isometry whose adjoint is also its inverse, i.e. $$U^\dagger{}U = UU^\dagger{} = 1_V$$.
Isometries preserve inner product: $$\langle{}Uv,Uw\rangle = \langle{} v,U^\dagger{}Uw\rangle{} = \langle{} v,w\rangle{} $$.
Unitaries are maps that map from one ONB to another, i.e. $$ U: v\mapsto{} \sum_i w_i \langle{} v_i,v\rangle{} $$, for a map from ONB $$\{v_i\}_i$$ to ONB $$\{w_i\}_i$$, or in other words $$Uv_i = w_i \;\forall{} i$$
An operator H is self-adjoint or Hermitian if $$H^\dagger{} = H$$, e.g. density operators, states, effects
We can also define an inner product on $$L(V,W)$$, where $$\langle{} X,Y\rangle{} = \sum_i \langle{} Xv_i, Yv_i\rangle{}$$ taking any ONB of V.
This is also defined as the Hilbert-Schmidt inner product $$\langle{}X,Y\rangle{} \sim \sum_i \langle{} X|i\rangle{} ,Y|i\rangle{} \rangle{} = \sum_i \langle{} i|X^\dagger{}Y|i\rangle{} = \text{tr}(X^\dagger{}Y)$$, in the matrix picture.
The trace can be evaluated in any ONB, not just the computational basis; one useful consequence is the lemma $$\langle{} X,Y\rangle{} = \langle{}Y^\dagger{},X^\dagger{}\rangle{}$$ (proof in handwritten lecture notes).
Can be shown to be positive definite by considering $$\langle{}X,X\rangle{} \ge{} 0$$ where equality iff $$X=0$$.
Using a unitary $$U$$, we can verify this basis-independence explicitly:
\begin{align*}
\langle{}X,Y\rangle{} &= \sum_i \langle{}Xw_i,Yw_i\rangle{} \\
&= \sum_i \langle{}XUw_i,YUw_i\rangle{} \\
&= \langle{}XU,YU\rangle{} \\
&= \langle{}U^\dagger{}Y^\dagger{},U^\dagger{}X^\dagger{}\rangle{} \\
&= \langle{}Y^\dagger{},UU^\dagger{}X^\dagger{}\rangle{} \\
&= \langle{}Y^\dagger{},X^\dagger{}\rangle{}
\end{align*}
A small note: vectors can be seen also as maps from the field to the vector space, e.g. $$ V\sim{} L(C,V)$$. We can generalize even further to channels, which map from some Hilbert space / set of linear operators to itself or another, e.g. $$ \langle{}\epsilon^\dagger{}(X),Y\rangle = \langle{} X,\epsilon{}(Y)\rangle{}$$, which can be seen as analogous to the Heisenberg vs Schrödinger picture.
Additional notes: to be pedantic, in $$|\psi{}\rangle{}\alpha = \alpha{}|\psi\rangle{}$$, the left-hand $$|\psi\rangle{} \in L(\bf{C},V)$$ is composed with $$\alpha \in{} L(\bf{C})$$, while the right-hand side is the vector $$\alpha|\psi{}\rangle{} \in V$$ scaled by $$\alpha \in{} \bf{C}$$.
Week 1.1 Administrative
A list of names of students in the class. Scribing for lecture notes and a final exam.