# Using Coalton to Implement a Quantum Compiler

## Introduction: Coalton and the quilc compiler

Quilc is a state-of-the-art optimizing compiler for quantum computers written in Common Lisp. It is capable of taking arbitrary quantum programs written in Quil, and compiling and optimizing them into code that conforms to the majority of quantum computing architectures that exist today.

Quilc and its related tooling comprise around 50,000 lines of code, and though the project has good test coverage, it falls prey to problems that frequently show up in dynamically typed programming languages, two of which are

1. type errors showing up at runtime, usually as a result of the code following a less probable control path, and
2. not having certain useful abstractions that can only be enabled by having a static type system (e.g., certain kinds of polymorphism).

Coalton, being a strictly typed language within Common Lisp, addresses these two problems in principle. Since it’s not practical to rewrite an entire compiler, we opted to implement a significant new feature of quilc in Coalton called discrete compilation. In this post, we’ll walk through what discrete compilation is, how Coalton made it simpler to implement (compared to Common Lisp), and how we tested that such a complicated feature actually works.

## Towards a discrete set of operations for quantum computation

A typical quantum program is comprised of a sequence of operations (sometimes called instructions, operators, gates, or matrices) that can be expressed mathematically as square matrices of complex numbers. The matrices are unitary—which means the matrices can never stretch or shrink a vector they multiply onto—and they have to have a power-of-two size. Just like classical computers, quantum computers have a set of operations they can natively perform. Quantum computers typically have only a small handful. For example, the set of native operations of an IBM quantum computer is:

\begin{aligned} \mathrm{RX}_{\pi/2} &:= \begin{pmatrix} \frac{1}{\sqrt{2}} & i\frac{1}{\sqrt{2}} \\ i\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\\ \mathrm{RX}_{-\pi/2} &:= \begin{pmatrix} \frac{1}{\sqrt{2}} & -i\frac{1}{\sqrt{2}} \\ -i\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}\\ \mathrm{RZ}_{\theta} &:= \begin{pmatrix} \cos\frac{\theta}{2}-i\sin\frac{\theta}{2} & 0 \\ 0 & \cos\frac{\theta}{2}+i\sin\frac{\theta}{2} \end{pmatrix}\\ \mathrm{CNOT} &:= \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 \end{pmatrix} \end{aligned}

In this case, $\theta$ is actually parametric and can be any value between $0$ and $2\pi$.
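To make these definitions concrete, here is a small Python sketch (our own illustration, not quilc code) that constructs $\mathrm{RZ}_\theta$ and the $\mathrm{RX}$ matrices above and checks that they are unitary, following the sign conventions shown in the matrices:

```python
import cmath
import math

def rz(theta):
    """RZ_theta: diag(cos(t/2) - i sin(t/2), cos(t/2) + i sin(t/2))."""
    return [[cmath.exp(-1j * theta / 2), 0],
            [0, cmath.exp(1j * theta / 2)]]

def rx(theta):
    """RX_theta in the convention above: RX_{pi/2} has +i/sqrt(2) off the diagonal."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, 1j * s],
            [1j * s, c]]

def dagger(m):
    """Conjugate transpose."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_unitary(m, tol=1e-12):
    """Check m times dagger(m) is the identity, i.e. m neither stretches nor shrinks."""
    n = len(m)
    p = matmul(m, dagger(m))
    return all(abs(p[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))
```

Note that $\mathrm{RX}_{\pi/2}$ and $\mathrm{RX}_{-\pi/2}$ are inverses of each other, which the sketch confirms numerically.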

It is a surprising fact that you can build any $2^n\times 2^n$-size complex unitary matrix out of the above matrices by way of matrix multiplications and Kronecker products. This means the operation set is computationally universal, not unlike the concept of universality of Boolean logic in ordinary computing. For example, consider the following unitary matrix:

$$M := \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & \frac{\sqrt{3}}{2} & \frac{i}{2} & 0\\ 0 & \frac{i}{2} & \frac{\sqrt{3}}{2} & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.$$
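We can sanity-check unitarity numerically. This quick Python sketch (not part of the compiler) verifies that $M\cdot M^\dagger$ is the identity:

```python
import math

# M as defined above; 0.5j is the entry i/2
s = math.sqrt(3) / 2
M = [[1, 0, 0, 0],
     [0, s, 0.5j, 0],
     [0, 0.5j, s, 0],
     [0, 0, 0, 1]]

def mm_dagger(m):
    """Compute m times its own conjugate transpose."""
    n = len(m)
    return [[sum(m[i][k] * m[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]
```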

This matrix $M$ can be written1 as the following product:

\begin{aligned} m^{\prime} &:= (\mathrm{RZ}_{\pi/2}\cdot\mathrm{RX}_{\pi/2}\cdot\mathrm{RZ}_{-5\pi/4})\otimes \mathrm{RZ}_{-\pi/4} \\ m^{\prime\prime} &:= (\mathrm{RX}_{-\pi/2}\cdot\mathrm{RZ}_{\pi/6}\cdot\mathrm{RX}_{\pi/2})\otimes(\mathrm{RX}_{-\pi/2}\cdot\mathrm{RZ}_{\pi/6}\cdot\mathrm{RX}_{\pi/2})\\ m^{\prime\prime\prime} &:= (\mathrm{RZ}_{\pi/4}\cdot\mathrm{RX}_{\pi/2}\cdot\mathrm{RZ}_{\pi/2})\otimes\mathrm{RZ}_{\pi/4}\\ M &\hphantom{:}= m^{\prime\prime\prime}\cdot \mathrm{CNOT} \cdot m^{\prime\prime} \cdot \mathrm{CNOT} \cdot m^{\prime} \end{aligned}

The intermediate $m$’s are just there for abbreviation. Notice how only the aforementioned list of native operations are used, and how they’re combined using multiplication (syntactically: $A\cdot B$) and Kronecker products (syntactically: $A\otimes B$). While it would be appealing to describe Kronecker products in more detail in this blog post, for the sake of brevity, we’ll just consider them a separate kind of matrix multiplication.
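For concreteness, here is a short Python sketch (with illustrative names of our choosing) of the Kronecker product, together with a check of the "mixed-product" identity $(A\otimes B)\cdot(C\otimes D) = (A\cdot C)\otimes(B\cdot D)$, which is what lets products like the one above be assembled blockwise:

```python
def kron(a, b):
    """Kronecker product A (x) B: a block matrix whose (i, j) block is a[i][j] * b."""
    m = len(b)
    n = len(a) * m
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

The mixed-product identity holds exactly for integer matrices, so the check below needs no floating-point tolerance.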

One of the primary tasks of a quantum compiler is to break down arbitrary unitary matrices, usually given by the quantum programmer, into a sequence of native ones. Using conventions from the quantum computing field, the above matrix is usually specified as $\mathrm{XY}(\pi/3)$. Quilc can recover the decomposition of this matrix like so:

```
$ echo 'XY(pi/3) 1 0' | ./quilc --isa ibmqx5
RZ(-5*pi/4) 1
RX(pi/2) 1
RZ(pi/2) 1
RZ(-pi/4) 0
CNOT 1 0
RX(pi/2) 1
RZ(pi/6) 1
RX(-pi/2) 1
RX(pi/2) 0
RZ(pi/6) 0
RX(-pi/2) 0
CNOT 1 0
RZ(pi/4) 0
RZ(pi/2) 1
RX(pi/2) 1
RZ(pi/4) 1
```

Using software engineering terminology, we say we’ve compiled the XY(pi/3) matrix into the native operations of an IBM quantum computer. In quilc’s output, the numbers 0 and 1 denote how to do the Kronecker products. For example, the syntax

```
A 1
B 0
```

represents the Kronecker product $A\otimes B$. Different quantum computers each have a different set of native operations, so quilc must be a retargetable compiler.

This mathematical puzzle is interesting and already quite difficult, but lurking is also an engineering problem of great concern. Almost every quantum computer in use today has some sort of continuous operation, possibly many, like the $\mathrm{RZ}_\theta$ above. These continuous operations represent the analog nature of these quantum computers. Analog devices have their merits, but one thing analog hardware usually isn’t good at is extreme precision. While I might request the quantum computer perform an $\mathrm{RZ}_{0.12345}$, due to the computer’s physical nature, it might only accomplish something between an $\mathrm{RZ}_{0.11}$ and an $\mathrm{RZ}_{0.13}$. Quantum hardware engineers around the world, every day, are putting effort into improving the precision of the available native operations, but it’ll never be feasible to have infinite precision, simply due to physical limitations. In practice, we will always have some amount of noise.

Can there instead be some set of discrete native operations while still being able to perform any quantum computation we’d like? And if we have such a set, will it be easy and efficient to compile a given matrix? These two questions represent the problem of discrete compilation of quantum programs.
Fortunately, the answer to both questions is a resounding yes, with a small but reasonable caveat: it is not possible to find an exact sequence of native operations to reconstruct a given matrix. Instead, we can only get arbitrarily close, at the expense of running more native operations2.

There is a 20+-year history of many different algorithms for doing discrete compilation, each with their own advantages and disadvantages3. We will focus on one by Neil Ross and Peter Selinger. Ross and Selinger proved that if we use a specific native operation set, we can accomplish nearly optimal-length decompositions4 of one-qubit matrices. From there, we can use other results to decompose matrices of any number of qubits.

In the next section, we give an overview of the Ross-Selinger algorithm. It is somewhat mathematically intensive, so readers uninterested in those details may skip ahead. (We duly recap the main takeaways when we return to discussing Coalton.)

### An approach to discrete compilation by Ross and Selinger

In order to have a set of discrete operations, we must be able to discretize the parametric operation $\mathrm{RZ}_\theta$, which is a $2\times2$ matrix with entries depending on $\theta$. Ross and Selinger consider the following native operations:

\begin{aligned} \mathrm{H} &:= \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}\\ \mathrm{S} &:= \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}\\ \mathrm{T} &:= \begin{pmatrix} 1 & 0 \\ 0 & \frac{1}{\sqrt{2}}+i\frac{1}{\sqrt{2}} \end{pmatrix} \end{aligned}

This is called the Clifford+T set. These operations have mathematical significance because

1. arbitrary combinations of $\mathrm{H}$ and $\mathrm{S}$ form a special algebraic space called the one-qubit Clifford group,
2. $\mathrm{S}$ equals $\mathrm{T}^2$, and
3. arbitrary products of these operations form a dense set of the unitary matrices.
The third point means that this set of operations can be used to approximate any $2\times 2$ unitary matrix to arbitrary precision, though Ross and Selinger still need to devise an algorithm to actually do it.

Next, Ross and Selinger turn to a result by Kliuchnikov, Maslov, and Mosca which says a given $2\times2$ matrix can be written precisely as a product of Clifford+T elements if and only if the matrix elements are all members of the number ring $R := \mathbb{Z}[\frac{1}{\sqrt 2}, i]$. So Ross and Selinger set up the following goal: try to write the problematic parametric operation $\mathrm{RZ}_\theta$ as a matrix

$$\begin{pmatrix} a & -b^* \\ b & a^* \end{pmatrix}$$

with user-selectable precision, where $a$ and $b$ are elements of $R$, and $z^*$ represents the complex conjugate of $z$. If we can write this matrix, then we can use Kliuchnikov-Maslov-Mosca to write it in terms of Clifford+T. And if we can do that, then we can rewrite any program with parametric $\mathrm{RZ}_\theta$ operations into an equivalent one (up to user-specified precision, at least) using only discrete operations. Ross and Selinger succeeded at solving this problem by turning the matrix problem into a Diophantine equation over a specific number ring, and coming up with an algorithm to solve it. Since the two-qubit operations of usual interest (like CNOT, CZ, etc.) are already discrete, this gives a fully discretized native operation set, so long as quantum computer implementers can supply the Clifford+T set as native operations.

Already, even without the gory details of how to find the approximating matrix, we see we’re in for quite a ride. There’s a lot of machinery we’ll need, but one piece that sticks out is the need to do arithmetic over the ring $\mathbb{Z}[\frac{1}{\sqrt{2}},i]$.
If we let $\omega:=(1+i)/\sqrt{2}$, then with a bit of exercise, we can see that elements of $\mathbb{Z}[\frac{1}{\sqrt{2}},i]$ all take the following canonical form:

$$\frac{a_0}{2^{n_0}}+\frac{a_1}{2^{n_1}}\omega+\frac{a_2}{2^{n_2}}\omega^2+\frac{a_3}{2^{n_3}}\omega^3,$$

where the $a_\bullet$ are integers and the $n_\bullet$ are non-negative integers. If we take two elements of this form and add or multiply them, we’ll always end back up in the same ring. For the Ross-Selinger algorithm, it turns out we also need to work in other rings, like the cyclotomic integers of degree 8, the quadratic integers of $\sqrt{2}$, and about a half-dozen others.

## Coalton’s strength in implementing math

One of the implementation difficulties of the Ross-Selinger algorithm is working with a bunch of different-but-interoperable5 number types. How do we implement these mathematical objects in a program? At least in principle, Common Lisp would have no trouble representing elements of any of these rings: just define some new classes, perhaps some new generic functions like `ring+` and `ring*`, and you’re off to the races. The trouble is that it’s cumbersome. First, there’s no way to integrate with the existing numerical operators; there is no way to “overload” the standard operator `cl:+` to work with different rings. Second, as explained in a previous blog post, there’s no way to uniformly treat additive and multiplicative identities in a convenient fashion. Third, it gets very messy, with lots of casts, upconversions, downconversions, etc. It’s very difficult to build a new numerical tower atop or alongside Common Lisp’s existing one, though Common Lisp’s multiple-dispatch mechanism at least eases the pain a bit.

These difficulties presented a second opportunity for Coalton. Coalton builds its fundamental abstractions from a different starting point, making this kind of mathematics easier and safer to express.
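To see concretely that this canonical form is closed under multiplication, here is a hedged Python sketch (our own illustration; quilc's actual representation may differ): an element is stored as four dyadic-rational coefficients of $1, \omega, \omega^2, \omega^3$, and products are reduced using $\omega^4 = -1$, since $\omega$ is a primitive 8th root of unity.

```python
from fractions import Fraction
import cmath
import math

OMEGA = cmath.exp(1j * math.pi / 4)  # omega = (1 + i)/sqrt(2)

def ring_mul(x, y):
    """Multiply elements of Z[1/sqrt(2), i] in the canonical form above,
    given as 4-tuples of dyadic rationals (coefficients of 1, w, w^2, w^3),
    reducing powers with w^4 = -1."""
    out = [Fraction(0)] * 4
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            if i + j < 4:
                out[i + j] += xi * yj
            else:
                out[i + j - 4] -= xi * yj  # w^4 = -1
    return out

def to_complex(x):
    """Numeric value of a canonical-form element."""
    return sum(float(c) * OMEGA ** k for k, c in enumerate(x))
```

A quick check: squaring $\omega$ yields $\omega^2 = i$, and products of elements with power-of-two denominators again have power-of-two denominators, so we stay in the ring.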
As such, we used this as a testing ground to implement new functionality of quilc in Coalton.

Coalton has a system for ad hoc polymorphism called type classes. Type classes allow one to extend existing functionality of a program, like that of the `+` or `*` operators, in a composable and statically typed manner. For example, this ring of algebraic numbers

$$\mathbb{Z}[\sqrt{2}] = \left\{a_1 + a_2\sqrt{2} : a_1,a_2\in \mathbb{Z}\right\}$$

can be modeled as pairs of integers $[a_1; a_2]$ that obey certain laws. We can derive one such law, a multiplication law, with a little bit of algebra:

\begin{aligned} [a_1; a_2]\cdot [b_1; b_2] &= (a_1 + a_2\sqrt{2})(b_1 + b_2\sqrt{2})\\ &= a_1b_1 + a_1(b_2\sqrt{2})+ (a_2\sqrt{2})b_1+(a_2\sqrt{2})(b_2\sqrt{2}) \\ &= (a_1b_1 + 2a_2b_2) + (a_1b_2+a_2b_1)\sqrt{2}\\ &= [a_1b_1+2a_2b_2; a_1b_2+a_2b_1]. \end{aligned}

These algebraic numbers can be implemented as a new type in Coalton, which we’ll call `Alg`.

```
(define-type Alg
  "Represents the algebraic number a1 + a2 * sqrt(2)."
  (Alg Integer Integer))
```

The `Alg` type can implement the `Eq` type class (which demands we implement equality) and the `Num` type class (which demands we implement addition, subtraction, multiplication, and some way to convert an integer into our new type).

```
(define-instance (Eq Alg)
  (define (== a b)
    (match (Tuple a b)
      ((Tuple (Alg a1 a2) (Alg b1 b2))
       (and (== a1 b1)
            (== a2 b2))))))

(define-instance (Num Alg)
  (define (+ a b)
    (match (Tuple a b)
      ((Tuple (Alg a1 a2) (Alg b1 b2))
       (Alg (+ a1 b1) (+ a2 b2)))))
  (define (- a b)
    (match (Tuple a b)
      ((Tuple (Alg a1 a2) (Alg b1 b2))
       (Alg (- a1 b1) (- a2 b2)))))
  (define (* a b)
    (match (Tuple a b)
      ((Tuple (Alg a1 a2) (Alg b1 b2))
       (Alg (+ (* a1 b1) (* 2 (* a2 b2)))
            (+ (* a1 b2) (* a2 b1))))))
  (define (fromInt n)
    (Alg n 0)))
```

We can verify it works by seeing that $(1-2\sqrt{2})^2 = 9-4\sqrt{2}$:

```
> (coalton (* (Alg 1 -2) (Alg 1 -2)))
#.(ALG 9 -4)
```

How would we compute $2(-\sqrt{2})$?
One way is to write $2$ explicitly as $2+0\sqrt{2}$, and $-\sqrt{2}$ explicitly as $0-1\sqrt{2}$, like so:

```
> (coalton (* (Alg 2 0) (Alg 0 -1)))
#.(ALG 0 -2)
```

Coalton, however, provides us a shorthand for `Num`-typed values. When Coalton sees a plain integer like `2` in the source syntax, it will actually interpret it as `(fromInt 2)`. One of two things is possible:

1. From type inference, Coalton concludes that `(fromInt 2)` should result in a value of a specific type, such as `Alg`. In this case, it will use `(Num Alg)`’s `fromInt` method.
2. Otherwise, Coalton concludes that the type of `(fromInt 2)` is ambiguous—we just know it must be a `Num` type and nothing else—in which case it assumes it is an `Integer`.

Since `Alg` implements `fromInt`, the integer syntax $n$ in a place where an `Alg` is expected will automatically be interpreted as $n+0\sqrt{2}$. So we can rewrite the previous prompt more simply as:

```
> (coalton (* 2 (Alg 0 -1)))
#.(ALG 0 -2)
```

As is to be expected at this point, the types of functions on `Alg` are inferred appropriately. For example, consider this function which conjugates a number, i.e., flips the sign of the $\sqrt{2}$ part:

```
(define (algebraic-conjugate x)
  (match x
    ((Alg a b)
     (Alg a (negate b)))))
```

We can see that Coalton has properly inferred the type:

```
> (type-of 'algebraic-conjugate)
(ALG → ALG)
```

Modeling these algebras works out quite well, especially when we have more of them, matrices of them, vectors of them, polynomials of them, and so on. They compose well. We can even make our own `Ring` class so that we can write code that is generic over any ring.

```
(define-class (Ring :t)
  (add-id (:t))             ; additive identity
  (mul-id (:t))             ; multiplicative identity
  (ring-+ (:t -> :t -> :t))
  (ring-* (:t -> :t -> :t)))
```

One can imagine how we could do the same for vector spaces, inner product spaces, and so on. These aren’t just theoretical benefits; they’re routinely used in practice.
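For readers more at home in Python than Coalton, here is roughly the same arithmetic sketched with operator overloading (the class and method names are ours). Note what's lost relative to type classes: Python resolves `+` and `*` dynamically per object, so nothing checks ahead of time that `2 * Alg(0, -1)` is well-typed; we must call `from_int` ourselves.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alg:
    """The algebraic number a1 + a2 * sqrt(2), as a pair of integers."""
    a1: int
    a2: int

    def __add__(self, other):
        return Alg(self.a1 + other.a1, self.a2 + other.a2)

    def __sub__(self, other):
        return Alg(self.a1 - other.a1, self.a2 - other.a2)

    def __mul__(self, other):
        # the multiplication law derived above:
        # [a1; a2] * [b1; b2] = [a1 b1 + 2 a2 b2; a1 b2 + a2 b1]
        return Alg(self.a1 * other.a1 + 2 * self.a2 * other.a2,
                   self.a1 * other.a2 + self.a2 * other.a1)

    @staticmethod
    def from_int(n):
        # analogue of fromInt: n becomes n + 0*sqrt(2)
        return Alg(n, 0)
```

The same check as before: squaring $1-2\sqrt{2}$ gives $9-4\sqrt{2}$.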
We’ve implemented one number system, but we can endow Coalton further with an understanding of how to convert out of this representation. As a simple example, we can instruct Coalton on how to convert, when possible, an `Alg` into a `Double-Float`:

```
(define-instance (TryInto Alg Double-Float)
  (define (tryInto x)
    (match x
      ((Alg a b)
       ;; An Integer may fail to convert to a Double-Float
       (match (Tuple (tryInto a) (tryInto b))
         ((Tuple (Ok a) (Ok b))
          (Ok (+ a (* b (math:sqrt 2.0d0)))))
         (_ (Err "Can't represent an Alg as a Double-Float")))))))
```

Whenever we need a double-float, we can safely get one by calling `tryInto` on our `Alg` object, and matching the `Ok` case if it succeeded, or the `Err` case if it failed. In this first example, we successfully convert $1+2\sqrt{2}$ into a float $3.828\ldots$ without otherwise being explicit about the conversion. (It figured it out since the other branch returns a `Double-Float`-related value, and both branches must have types that unify.)

```
COALTON-USER> (coalton
                (match (tryInto (Alg 1 2))
                  ((Ok f) f)
                  ((Err _) (the Double-Float math:nan))))
3.8284271247461903d0
```

Contrast with this example, where $1+10^{1000}\sqrt{2}$ has no cogent representation as a floating-point number, so the code instead returns a “not a number” value.

```
COALTON-USER> (coalton
                (match (tryInto (Alg 1 (^ 10 1000)))
                  ((Ok f) f)
                  ((Err _) (the Double-Float math:nan))))
#<DOUBLE-FLOAT quiet NaN>
```

As we can see, Coalton makes working within and between arithmetic systems convenient, explicit, and safe.

## Discrete compilation in quilc

Though perhaps not as effective as mathematics itself, Coalton’s facilities handle the Ross-Selinger Clifford+T decomposition algorithm handsomely. This led to the introduction of a first-of-its-kind feature in a general-purpose quantum compiler: the ability to do discrete compilation of arbitrary quantum programs. We are happy to introduce full-featured discrete compilation into quilc.
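The same fallible-conversion pattern can be sketched in Python (the function name is ours); the conversion genuinely fails for an integer like $10^{1000}$ that exceeds double-float range, mirroring the `Err` case above:

```python
import math

def alg_to_float(a1, a2):
    """Try to convert a1 + a2*sqrt(2) to a float.
    Returns (True, value) on success or (False, message) on failure."""
    try:
        # float() raises OverflowError for ints too large for a double
        return (True, float(a1) + float(a2) * math.sqrt(2.0))
    except OverflowError:
        return (False, "Can't represent an Alg as a Double-Float")
```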
With quilc, it is now possible to target backends that support the Clifford+T set. There is nothing special the programmer has to do to enable this feature; just build a quilc ISA that only has Clifford+T operations, and it’ll just work. Discrete compilation is implemented as a part of the `cl-quil/discrete` contrib module to quilc. If you’re a Lisp developer, all you need to do is load the `cl-quil/discrete` system, and it will Just Work.

We can repeat the compilation of the $M$ operation from before, this time for a made-up chip of two qubits that only supports the Clifford+T set. This chip is called 4Q-cliffordt and is in essence a virtual “test chip”. (The output below is reformatted into columns to fit the page.)

```
$ echo 'PRAGMA TOLERANCE "0.001"; XY(pi/3) 1 0' | ./quilc --isa 4Q-cliffordt
T 0       H 0       T 0       H 0       H 1       H 1       H 1       S 1
S 0       T 0       S 0       T 0       T 1       T 1       T 1       H 1
S 0       H 0       H 0       H 0       H 1       H 1       H 1       T 1
S 0       T 0       T 0       T 0       T 1       T 1       T 1       H 1
H 0       S 0       S 0       S 0       S 1       S 1       S 1       T 1
T 1       H 0       H 0       H 0       H 1       H 1       H 1       H 1
CNOT 0 1  T 0       T 0       T 0       T 1       T 1       T 1       T 1
S 0       H 0       S 0       S 0       S 1       S 1       H 1       S 1
H 0       T 0       H 0       H 0       H 1       H 1       T 1       H 1
S 0       S 0       T 0       T 0       T 1       T 1       H 1       T 1
H 0       H 0       H 0       S 0       S 1       S 1       T 1       S 1
T 0       T 0       T 0       H 0       H 1       H 1       S 1       H 1
H 0       H 0       H 0       T 0       T 1       T 1       H 1       T 1
T 0       T 0       T 0       H 0       S 1       S 1       T 1       S 1
S 0       S 0       S 0       T 0       H 1       H 1       S 1       H 1
H 0       H 0       H 0       H 0       T 1       T 1       H 1       T 1
T 0       T 0       T 0       T 0       S 1       S 1       T 1       H 1
S 0       S 0       H 0       S 0       H 1       H 1       H 1       S 1
H 0       H 0       T 0       H 0       T 1       T 1       T 1       S 1
T 0       T 0       H 0       T 0       H 1       S 1       S 1       S 1
S 0       S 0       T 0       S 0       T 1       H 1       H 1       H 1
H 0       H 0       S 0       H 0       S 1       T 1       T 1       S 1
T 0       T 0       H 0       T 0       H 1       S 1       S 1       S 1
S 0       S 0       T 0       S 0       T 1       H 1       H 1       S 1
H 0       H 0       S 0       H 0       S 1       T 1       T 1       CNOT 0 1
T 0       T 0       H 0       T 0       H 1       S 1       H 1       H 0
S 0       S 0       T 0       H 0       T 1       H 1       T 1       T 0
H 0       H 0       H 0       S 0       H 1       T 1       H 1       T 1
T 0       T 0       T 0       S 0       T 1       S 1       T 1       S 1
H 0       S 0       S 0       S 0       H 1       H 1       H 1       S 1
T 0       H 0       H 0       H 0       T 1       T 1       T 1       S 1
S 0       T 0       T 0       S 0       S 1       S 1       S 1
H 0       S 0       S 0       S 0       H 1       H 1       H 1
T 0       H 0       H 0       S 0       T 1       T 1       T 1
S 0       T 0       T 0       S 1       H 1       S 1       S 1
H 0       S 0       H 0       H 1       T 1       H 1       H 1
T 0       H 0       T 0       S 1       S 1       T 1       T 1
```

Notice how every operation in the output is either H, S, T, or CNOT. But also notice how there are a lot of operations—290 of them to be exact. (To recount, there were 16 operations for the compilation before, but again, that had to use continuous operations.) There are only two differences in how we run the compiler between this and the previous IBM-chip compilation:

1. We had to specify the chip 4Q-cliffordt. (Quilc will be happy to construct a chip of any requested shape, size, and native operations; 4Q-cliffordt is just a convenient built-in for testing the discrete compilation machinery.)
2. We had to specify a tolerance: here, each approximation made must be accurate to within 0.1%.

If we request more precision, we notice an increase in operation count.
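How fast does the count grow? A commonly cited figure from the Ross-Selinger analysis is a T-count of roughly $3\log_2(1/\epsilon)$ per approximated rotation; take this sketch as a back-of-the-envelope estimate, not quilc's exact accounting:

```python
import math

def approx_t_count(epsilon):
    """Rough estimate of T gates needed to approximate one RZ to tolerance epsilon."""
    return 3 * math.log2(1 / epsilon)

for eps in (1e-3, 1e-6, 1e-9):
    print(f"tolerance {eps:g}: ~{approx_t_count(eps):.0f} T gates per rotation")
```

Under this estimate, tightening the tolerance tenfold costs only about ten extra T gates per rotation, which is why getting arbitrarily close remains practical.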