Quantum Machine Learning Tutorial

A Hands-on Tutorial for Machine Learning Practitioners and Researchers

Chapter 2.1 From Classical Bits to Quantum Bits

In this section, we define quantum bits (qubits) and present the mathematical tools used to describe quantum states. We begin by discussing classical bits and then transition to their quantum counterparts.

Classical bits

In classical computing, a bit is the basic unit of information, which can exist in one of two distinct states: $0$ or $1$. Each bit holds a definite value at any given time. When multiple classical bits are used together, they can represent more complex information. For instance, a set of three bits can represent $2^3=8$ distinct states, ranging from $000$ to $111$.

Quantum bits (Qubits)

Analogous to the role of bit in classical computation, the basic element in quantum computation is the quantum bit (qubit). We start by introducing the representation of single-qubit states and then extend this to two-qubit and multi-qubit states.

Single-qubit state. A single-qubit state can be represented by a two-dimensional vector with unit length. Mathematically, a qubit state can be written as $$\bm{a} = \begin{bmatrix} \bm{a}_1 \\ \bm{a}_2 \end{bmatrix}\in \mathbb{C}^2~,$$ where $|\bm{a}_1|^2+|\bm{a}_2|^2=1$ satisfies the normalization constraint. Following conventions in quantum theory, we use Dirac notation to represent vectors, i.e., $\bm{a}$ is denoted by $\ket{\bm{a}}$ (named ket) with $$\ket{\bm{a}} =\bm{a}_1\ket{0}+\bm{a}_2\ket{1}~,$$

where $\ket{0}\equiv \bm{e}_0\equiv \begin{bmatrix} 1 \ 0 \end{bmatrix}$ and $\ket{1}\equiv \bm{e}_1\equiv \begin{bmatrix} 0 \ 1 \end{bmatrix}$ are two computational (unit) basis states. In this representation, the coefficients $\bm{a}_1$ and $\bm{a}_2$ are referred to as amplitudes. The probabilities of obtaining the outcomes $0$ or $1$ upon measurement of the qubit are given by $|\bm{a}_1|^2$ and $|\bm{a}_2|^2$, respectively. The normalization constraint ensures that these probabilities always sum to one, as required by the probabilistic nature of quantum mechanics. In addition, the conjugated transpose of $\bm{a}$, i.e. $\bm{a}^{\dagger}$, is denoted by $\bra{\bm{a}}$ (named bra) with

$$\bra{\bm{a}} = \bm{a}_1^*\bra{0}+\bm{a}_2^*\bra{1} \in \mathbb{C}^2~,$$

where $\bra{0}\equiv \bm{e}_0^\top \equiv [1, 0]$ and $\bra{1}\equiv \bm{e}_1^\top\equiv [0, 1]$.

The physical interpretation of coefficients ${\bm{a}_i}$ is probability amplitudes. Namely, when we intend to extract information from the qubit state $\ket{\bm{a}}$ into the classical form, quantum measurements are applied to this state, where the probability of sampling the basis $\ket{0}$ ($\ket{1}$) is $|\bm{a}_1|^2$ $(|\bm{a}_2|^2)$. Recall that the classical bit only permits the deterministic status with $0$ or $1$, while the qubit state is the superposition of the two status $\ket{0}$ and $\ket{1}$.

::: tcolorbox The quantum superposition leads to a distinct power between quantum and classical computation, where the former can accomplish certain tasks with provable advantages. :::

Two-qubit state. The two qubits obey the tensor product rule, i.e.,

$$\left[\begin{matrix} \bm{x}_1 \\ \bm{x}_2 \end{matrix} \right] \otimes \left[\begin{matrix} \bm{y}_1 \\ \bm{y}_2 \end{matrix} \right] = \left[\begin{matrix} \bm{x}_1 \left[\begin{matrix} \bm{y}_1 \\ \bm{y}_2 \end{matrix} \right] \\ \bm{x}_2 \left[\begin{matrix} \bm{y}_1 \\ \bm{y}_2 \end{matrix} \right] \end{matrix} \right] = \left[\begin{matrix} \bm{x}_1 \bm{y}_1 \\ \bm{x}_1 \bm{y}_2 \\ \bm{x}_2 \bm{y}_1 \\ \bm{y}_2 \bm{y}_2 \end{matrix} \right],$$

which differs from the classical bits yielding the Cartesian product rule.

For instance, let the first qubit is $\ket{\bm{a}}$ and the second qubit state be $\ket{\bm{b}}=\bm{b}_1\ket{0}+\bm{b}_2\ket{1}$ with $|\bm{b}_1|^2+|\bm{b}_2|^2=1$. The two-qubit state formed by $\ket{\bm{a}}$ and $\ket{\bm{b}}$ is defined as $$\ket{\bm{a}}\otimes \ket{\bm{b}} = \bm{a}_1\bm{b}_1 \ket{0}\otimes\ket{0} + \bm{a}_1\bm{b}_2 \ket{0}\otimes\ket{1} + \bm{a}_2\bm{b}_1 \ket{1}\otimes\ket{0} + \bm{a}_2\bm{b}_2 \ket{1}\otimes\ket{1} \in \mathbb{C}^4~,$$

where the computational basis follows $\ket{0}\otimes\ket{0}\equiv \left[\begin{smallmatrix} 1 \ 0 \ 0 \ 0 \end{smallmatrix} \right]$, $\ket{0}\otimes\ket{1}\equiv \left[\begin{smallmatrix} 0 \ 1 \ 0 \ 0 \end{smallmatrix} \right]$ , $\ket{1}\otimes\ket{0}\equiv \left[\begin{smallmatrix} 0 \ 0 \ 1 \ 0 \end{smallmatrix} \right]$ , $\ket{1}\otimes\ket{1}\equiv \left[\begin{smallmatrix} 0 \ 0 \ 0 \ 1 \end{smallmatrix} \right]$, and the coefficients satisfy $\sum_{i=1}^2\sum_{j=1}^2 |\bm{a}_i\bm{b}_j|^2 = 1$.

For ease of notations, the state $\ket{\bm{a}}\otimes \ket{\bm{b}}$ can be simplified as $\ket{\bm{a}\bm{b}}$, $\ket{\bm{a},\bm{b}}$, or $\ket{\bm{a}}\ket{\bm{b}}$. We will interchangeably use these notations throughout the tutorial.

A typical example of a two-qubit state is the Bell state, which represents a maximally entangled quantum state of two qubits. There are four types of Bell states, expressed as: $$\begin{aligned} \ket{\phi^+} &= \frac{1}{\sqrt{2}} \left( \ket{00} + \ket{11} \right), \nonumber \ \ket{\phi^-} &= \frac{1}{\sqrt{2}} \left( \ket{00} - \ket{11} \right), \nonumber \ \ket{\psi^+} &= \frac{1}{\sqrt{2}} \left( \ket{01} + \ket{10} \right), \nonumber \ \ket{\psi^-} &= \frac{1}{\sqrt{2}} \left( \ket{01} - \ket{10} \right). \end{aligned}$$ Each Bell state is a superposition of two computational basis states in the four-dimensional Hilbert space.

Multi-qubit state. We now generalize the above two-qubit case to the $N$-qubit case with $N>2$. In particular, an $N$-qubit state $\ket{\psi}$ is a $2^N$-dimensional vector with $$\ket{\psi} = \sum_{i=1}^{2^N} \bm{c}_i \ket{i}\in \mathbb{C}^{2^N} ~,$$

where the coefficients satisfy the normalization constraint $\sum_{i=1}^{2^N} |\bm{c}_i|^2 = 1$ and the symbol ‘$i$’ of the computational basis $\ket{i}$ refers to a bit-string with $i\in {0, 1}^{N}$. As with the single-qubit case, the physical interpretation of coefficients ${\bm{c}_i}$ is probability amplitudes, where the probability to sample the bit-string ‘$i$’ is $|\bm{c}_i|^2$. When the number of nonzero entries in $\bm{c}$ is larger than one, which implies that different bit-strings are coexisting coherently, the state $\ket{\psi}$ is called in superposition.

In quantum computing, a basis state $\ket{i}$ refers to a computational basis state in the Hilbert space of a quantum system. For an $N$-qubit system, the computational basis states are represented as $\ket{i}\in {\ket{0\cdots 0},\ket{0\cdots 1},…,\ket{1\cdots 1}}$, where $i$ is the binary representation of the state index. These states form an orthonormal basis of the $2^N$-dimensional Hilbert space, satisfying $$\braket{i|j}=\delta_{ij}, \forall i,j\in [2^N].$$ These basis states are fundamental for representing and analyzing quantum states, as any arbitrary quantum state can be expressed as a linear combination of these basis states.

Moreover, the size of $\bm{c}$ exponentially scales with the number of qubits $N$, attributed to the tensor product rule. This exponential dependence is an indispensable factor to achieve quantum supremacy, since it is extremely expensive and even intractable to record all information of $\bm{c}$ by classical devices for the modest number of qubits, e.g., $N>100$.

Entangled multi-qubit state. A fundamental phenomenon in multi-qubit quantum systems is entanglement, which represents a non-classical correlation between quantum systems that cannot be explained by classical physics. As proved in Ref. [@jozsa2003role], quantum entanglement is an indispensable component to offer an exponential speed-up over classical computation. A representative example is Shor’s algorithm, which utilizes entanglement to attain an exponential speed-up over any classical factoring algorithm. In an entangled quantum state, the state of one qubit cannot be fully described independently of the other qubits, even if they are spatially separated. The formal definition of entanglement for states in Dirac notation is as follows:

Definition of Entanglement for States in Dirac Notation: A quantum state $\ket{\psi}\in \mathbb{C}^{2^N}$ is entangled if it cannot be expressed as the tensor product of states of its subsystems $A$ and $B$: $$\ket{\psi} \neq \ket{\psi_a} \otimes \ket{\psi_b}, \quad \forall \ket{\psi_a} \in \mathbb{C}^{2^{N_A}}, \ket{\psi_b} \in \mathbb{C}^{2^{N_B}}, N_A+N_B=N.$$ If the state can be expressed in this form, it is referred to as seperable.

A typical example of an entangled $N$-qubit state is the Greenberger-Horne-Zeilinger (GHZ) state, which is a generalization of the two-qubit Bell state to a maximally entangled $N$-qubit state. The general form of an $N$-qubit GHZ state is:

$$\ket{\text{GHZ}_N} = \frac{1}{\sqrt{2}} \left( \ket{0}^{\otimes N} + \ket{1}^{\otimes N} \right).$$

For $N = 3$, the GHZ state is:

$$\ket{\text{GHZ}_3} = \frac{1}{\sqrt{2}} \left( \ket{000} + \ket{111} \right).$$

A key property of the entangled states (e.g., Bell states and GHZ states) is that measuring one qubit determines the outcome of measuring the other qubit, reflecting their strong quantum correlation.

Density matrix

Another description of quantum states is through density matrix or density operators. The reason for establishing density operators instead of Dirac notations arises from the imperfection of physical systems. Specifically, Dirac notations are used to describe ‘ideal’ quantum states (i.e., pure states), where the operated qubits are isolated from the environment. Alternatively, when the operated qubits interact with the environment unavoidably, the density operators are employed to describe the behavior of quantum states living in this open system. As such, density operators describe more general quantum states.

Mathematically, an $N$-qubit density operator, denoted by $\rho\in\mathbb{C}^{2^N\times 2^N}$, presents a mixture of $m$ quantum pure states $\ket{\psi_i}\in\mathbb{C}^{2^N}$ with probability $p_i\in [0,1]$ and $\sum_{i=1}^m p_i =1$, i.e., $$\rho = \sum_{i=1}^m p_i\rho_i~,$$ where $\rho_i =\ket{\psi_i}\bra{\psi_i}\in\mathbb{C}^{2^N\times 2^N}$ is the outer product of the pure state $\ket{\psi_i}$. The outer product of two vectors $\ket{u},\ket{v}\in\mathbb{C}^n$ is expressed as $$\ket{u}\bra{v} = \begin{bmatrix} u_1 \ u_2 \ \vdots \ u_n \end{bmatrix} \begin{bmatrix} v_1^* & v_2^* & \cdots & v_n^* \end{bmatrix} = \begin{bmatrix} u_1 v_1^* & u_1 v_2^* & \cdots & u_1 v_n^* \ u_2 v_1^* & u_2 v_2^* & \cdots & u_2 v_n^* \ \vdots & \vdots & \ddots & \vdots \ u_n v_1^* & u_n v_2^* & \cdots & u_n v_n^* \end{bmatrix},$$ where $u_i$ and $v_i^*$ are the element of $\ket{u}$ and the conjugate transpose $\bra{v}$, respectively.

From the perspective of computer science, the density operator $\rho$ is just a positive semi-definite matrix with trace-preserving, i.e., $\bm{0}\preceq \rho$ and $\text{Tr}(\rho)=1$.

Definition of Positive semi-definite matrix: A matrix $A \in \mathbb{C}^{n \times n}$ is positive semi-definite (PSD) if it satisfies the following conditions:

  1. $A$ is Hermitian: $A = A^\dagger$

  2. For any nonzero vector $\ket{v} \in \mathbb{C}^n$, $\bra{v}A\ket{v} \geq 0$, where $\bra{v}A\ket{v}$ represents the quadratic form of $A$ with respect to $\ket{v}$.

When $m=1$, the density operator $\rho$ amounts to a pure state with $\rho = \ket{\psi_1}\bra{\psi_1}$. When $m>1$, the density operator $\rho$ describes a ‘mixed’ quantum state, where the rank of $\rho$ is larger than $1$. A simple criterion to discriminate the pure states with the mixed states is as follows: the pure state with $m=1$ yields $\text{Tr}(\rho^n)=\text{Tr}(\rho)=1$ for any $n>0$; the mixed state with $m>1$ satisfies $\text{Tr}(\rho^n)<\text{Tr}(\rho)=1$ for any $n\in \mathbb{N}_+\setminus {1}$. Similar to the Definition for entanglement of pure states, we can define the entanglement of mixed states.

Definition of Entanglement for Mixed States: Let $\rho$ be a density operator acting on a composite Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B$. The state $\rho$ is said to be entangled if it cannot be expressed as:

$$\rho = \sum_{i} p_i , \rho_A^{(i)} \otimes \rho_B^{(i)},$$

where $p_i \geq 0$, $\sum_{i} p_i = 1$, and $\rho_A^{(i)}$ and $\rho_B^{(i)}$ are density operators on $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively. If $\rho$ can be written in this form, it is called separable.

Example of Density matrix representations.
(i). Consider the single-qubit pure state $\ket{\psi} = \frac{1}{\sqrt{2}}(\ket{0} + \ket{1})$. The corresponding density operator is: $$\rho = \ket{\psi}\bra{\psi} = \frac{1}{2} \begin{bmatrix} 1 & 1 \ 1 & 1 \end{bmatrix}.$$ Here, $\text{Tr}(\rho^2) = \text{Tr}(\rho) = 1$, confirming that it is a pure state.
(ii). Consider the classical probabilistic mixture of $\ket{0}$ and $\ket{1}$, each with equal probability $p = 0.5$. The density operator is: $$\rho = 0.5 \ket{0}\bra{0} + 0.5 \ket{1}\bra{1} = \frac{1}{2} \begin{bmatrix} 1 & 0 \ 0 & 1 \end{bmatrix}.$$ In this case, $\text{Tr}(\rho^2) = 0.5 < \text{Tr}(\rho) = 1$, indicating it is a mixed state.