An Image Classification Use Case with Quantum Transfer Learning
Andi Sama — CIO, Sinergi Wahana Gemilang and Cahyati S. Sangaji with Agung Trisetyarso, PhD
*** Supporting files (Python Notebook, images) for this article are available on GitHub.
*** Articles giving a general introduction to Quantum Computing are available: "Meneropong Masa Depan: Quantum Computing" (Indonesian), "The Race in Achieving Quantum Supremacy & Quantum Advantage" (English), "Hello Many Worlds in Quantum Computer" (English), "Quantum Teleportation" (English), and "Quantum Random Number Generator" (English).
#IBM #IBMCloud #QuantumExperience #Qiskit #QuantumSimulator #HybridMachineLearning #QML #QuantumMachineLearning #Google #Cirq #Xanadu #Pennylane #IBMPOWER #IBMPOWER-AC922
Quantum Machine Learning has great potential to solve computationally intensive Machine Learning problems that are not practical on a Classical Computer. One approach is to combine Quantum Computing with Machine Learning, creating a Hybrid Classical-Quantum Machine Learning model. Image Classification with Machine Learning has been around for years; however, the idea of introducing a Quantum Computer into the modeling process is still relatively new.
The hybrid approach is applied to an Image Classification task with the pre-trained ResNet-18 neural network architecture, classifying faces as either wearing a mask or not wearing a mask. The last layer of ResNet-18 is replaced with a quantum layer through transfer learning and then retrained on local quantum simulators. The dataset has 410 images (randomly chosen from Google Image Search) in a 50%:50% proportion between faces with masks and faces without masks, distributed as 240:50:120 images for training:validation:test.
We demonstrate the transfer learning approach based on PennyLane's Quantum Machine Learning framework, still a work in progress at the time of this article's publication. The framework acts as an integrator to various backend quantum simulators and real quantum computers through their respective application programming interfaces. Training accuracies achieved on quantum simulators (at the 30th epoch) are 97.0833% (Qiskit, IBM), 97.5000% (Cirq, Google), and 97.5000% (PennyLane's own simulator), among others; Q# from Microsoft is said to be supported but could not be tested. The results are meant to show that Hybrid Classical-Quantum Machine Learning is possible through transfer learning, not to achieve a State of The Art (SOTA) result by pushing the algorithms toward the best training accuracy.
At this time, it is impractical to train on a cloud-based real quantum computer or even a cloud-based quantum simulator. The hundreds of calls per iteration (each with 1024 shots, for example) to cloud-based backend quantum computers (or simulators) can easily stretch training into weeks for multiple epochs, considering the queue delay for each call.
“When we talk about quantum computers, we usually mean fault-tolerant devices.” (James Wootton, 2018). James continues, “We’ll know that devices can do things that classical computers can’t, but they won’t be big enough to provide fault-tolerant implementations of the algorithms we know about. John Preskill coined the term NISQ to describe this era. Noisy because we don’t have enough qubits to spare for error correction, and so we’ll need to directly use the imperfect qubits at the physical layer. And ‘Intermediate-Scale’ because of their small (but not too small) qubit number.”
Quantum Machine Learning (QML) may still be a topic for early discussion among data scientists, as the development of a scalable Quantum Computer (QC) itself is still progressing.
In this article, we discuss Hybrid Classical-Quantum Machine Learning through Transfer Learning. This is achieved by replacing the last layer of a pre-trained neural network with a "quantum layer" and retraining that layer's parameters on a quantum computer. The pre-trained neural network itself has been previously trained on a Classical Computer (CC).
For the real quantum computer, we use a NISQ Quantum Computer on IBM Quantum Experience on the cloud. NISQ refers to Noisy Intermediate-Scale Quantum (John Preskill, 2018) and is seen as the transition towards the future Fault-Tolerant Quantum Computer, when better hardware is available.
1. The Experiments — Hybrid Classical-Quantum Machine Learning
The illustration above briefly shows how we do Hybrid Classical-Quantum Machine Learning on an Image Classification task. The following two tables and four illustrations (from runs executed in July 2020) show the experiments using a Classical Computer combined with Quantum Computer Simulators (running on the local classical computer).
While there has been an attempt to experiment with a real quantum computer on the cloud, completing the experiment is not practical with the current access profile of the IBM Quantum Experience. Based on initial experiments with the cloud-based IBM 32-qubit Quantum Simulator (which has significantly less queue delay), each epoch on a real cloud-based quantum computer (with much longer queue delays per job execution) is estimated to take up to a week, so just the first three epochs could take a few weeks.
Experiments are done using an x86 laptop, representing a classical computer, combined with quantum simulators running on that same machine (through the Google Cirq, IBM Qiskit, and Xanadu PennyLane quantum simulators). The x86 laptop is a Lenovo T480-series, Intel i7 CPU, 16GB RAM (without GPU), running Windows 10 with Anaconda.
A. Dataset
We choose to do Image Classification by transfer learning from a pre-trained ResNet-18 model (trained on the ImageNet dataset, on a classical computer). Transfer learning is done by keeping all pre-trained parameters fixed and retraining only the last layer of ResNet-18 (the fully connected layer), which has been replaced by a quantum layer.
The new dataset consists of two classes: a person with a mask and a person with no mask. A total of 410 images (across both classes) is acquired through a random Google Image Search. Downloaded images are split into three subsets for training, validation, and test, in the proportion of 240:50:120 images (roughly 58%:12%:30%).
While 240 images for training may not seem like much, this is still acceptable since we apply transfer learning rather than training the model from scratch.
B. The Experiment — (x86 laptop + Local Quantum Simulators)
In this experiment, we run 30 epochs for each backend local quantum simulator. The local quantum simulators (Cirq by Google, Qiskit by IBM, and PennyLane's own) run on a local classical computer, accessed through their respective APIs within the PennyLane quantum machine learning framework.
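As a minimal sketch of what "acting as an integrator" looks like in practice, swapping backends is a matter of changing a single qml.device call. The plugin device names below assume the pennylane-qiskit and pennylane-cirq plugins are installed; exact names may differ across versions.

# Selecting a backend quantum simulator through PennyLane (a minimal sketch).
# Assumes: pip install pennylane pennylane-qiskit pennylane-cirq
import pennylane as qml

n_qubits = 4  # the 4-qubit quantum layer used in this article

# PennyLane's own built-in simulator:
dev = qml.device("default.qubit", wires=n_qubits)

# Alternatively, IBM's local Aer simulator (pennylane-qiskit plugin):
# dev = qml.device("qiskit.aer", wires=n_qubits, shots=1024)

# Or Google's Cirq simulator (pennylane-cirq plugin):
# dev = qml.device("cirq.simulator", wires=n_qubits)

The quantum circuit defined later (quantum_net) is bound to whichever device object is active, so the same training code runs against any of the three backends.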
Results of training accuracies and training losses for the first experiment are summarized in the following table.
As this is transfer learning (based on a pre-trained model), we observe quite a high accuracy in just a few epochs; we only retrain the last layer with the new dataset. This Image Classification task shows how classical-quantum machine learning through transfer learning is possible, without aiming for the highest accuracy possible.
While the training time varies across the three backend quantum simulators, all backends achieve 91.2% training accuracy in just 3 epochs. At the end of epoch 30, training accuracies are 97.5%, 97.1%, and 97.5% for the Google, IBM, and PennyLane backend quantum simulators, respectively.
The illustration below shows the training accuracy plots of all three backends from epoch 1 to epoch 30.
Then, the training loss plots are shown below.
The following table summarizes the first experiment's result at the last epoch (30) for the IBM local quantum simulator.
Then, the following illustrations show further details on training and validation accuracies.
C. The Quantum Layer
The 4-qubit quantum circuit executed at each job on the cloud-based IBM Quantum Simulator is shown in the illustrations below. The histogram (for job id: 5f07fae2bd5ed5001b79ff54) shows the distribution of quantum measurements across all 4 qubits for this part of the experiment (1024 shots).
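As a side note, such a histogram can be reproduced from a finished cloud job. The sketch below assumes a saved IBM Quantum Experience account and the qiskit IBMQ provider of that era; the backend name is an assumption.

# Retrieving a finished job's measurement histogram (a sketch; assumes an
# IBM Quantum Experience account saved earlier with IBMQ.save_account()).
from qiskit import IBMQ
from qiskit.visualization import plot_histogram

provider = IBMQ.load_account()
backend = provider.get_backend("ibmq_qasm_simulator")  # assumed backend name

job = backend.retrieve_job("5f07fae2bd5ed5001b79ff54")  # job id from this experiment
counts = job.result().get_counts()
plot_histogram(counts)  # distribution over the 4-qubit measurement outcomes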
Note that there are hundreds of job executions within one iteration during the quantum layer's training process. This means thousands of job executions (each with 1024 shots) to complete one epoch; in this experiment, we have 31 iterations within one epoch.
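To get a feel for the numbers, here is a back-of-the-envelope estimate. It assumes a 4-qubit, depth-6 variational circuit trained with the parameter-shift rule on mini-batches of 8 images; these values are illustrative assumptions, not the exact configuration of the run.

# Back-of-the-envelope job count per epoch (illustrative assumptions only).
n_qubits, q_depth = 4, 6          # assumed variational circuit size
batch_size = 8                    # assumed mini-batch size
iterations_per_epoch = 31         # as reported in this experiment

n_params = n_qubits * q_depth                 # 24 trainable rotation angles
circuits_per_sample = 1 + 2 * n_params        # 1 forward pass + parameter-shift gradients
jobs_per_iteration = batch_size * circuits_per_sample   # ~392 job executions
jobs_per_epoch = iterations_per_epoch * jobs_per_iteration

print(jobs_per_iteration, "jobs/iteration ->", jobs_per_epoch, "jobs/epoch")
# ~392 jobs/iteration -> ~12,152 jobs/epoch, each with 1024 shots. With even a
# one-minute queue delay per job, a single epoch would take over a week.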
2. Building a Hybrid Classical-Quantum Machine Learning Model
How do we build a Hybrid Classical-Quantum Machine Learning neural network model? In general, we take a pre-trained deep neural network model (an Image Classification model, previously trained on a classical computer). Then we change only the last layer, retraining that layer's parameters using a quantum computer (note that a quantum layer has replaced the last layer before retraining).
a. Prepare - Do a Transfer Learning from a Pre-Trained neural network model
- Get a pre-trained deep neural network model that performs a generic task (previously trained with the ImageNet dataset).
- Do transfer learning by freezing (making no changes to) all parameters except those of the last layer.
- Prepare a new dataset (faces with masks, faces with no masks).
b. Retrain - Update/retrain the last layer with the new dataset to perform a new specific task.
- Retrain the parameters of the last layer using either a quantum simulator or a quantum computer.
- Generate a modified model.
c. Predict - Predict with the newly generated model.
The following table illustrates the possible combinations between a Classical Computer and a Quantum Computer in processing datasets and algorithms. CC is the common combination today, in which both datasets and algorithms are processed on a classical computer. The QQ combination is a long way off, in which both datasets and algorithms are quantum, such as processing data in the form of photons on a quantum computer.
The combination that we discuss here is CQ: a classical computer processes the data, then part of the algorithm is processed by a quantum computer. We suggest the readers refer to the previous article (Andi Sama, 2020c).
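A compact reconstruction of the combinations just described (the QC quadrant, quantum data processed by a classical algorithm, is inferred from the naming convention rather than spelled out in the text):

                         Algorithm: Classical     Algorithm: Quantum
Data: Classical          CC (common today)        CQ (this article)
Data: Quantum            QC                       QQ (a long way to go)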
A. Prepare - Dataset & Pre-Trained Model
We choose to do Image Classification on a new dataset using a pre-trained deep neural network model (Mₚ) based on the ResNet-18 neural network architecture (Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, 2015). The model itself was previously trained with the ImageNet dataset (Dₐ) to do Image Classification as a generic task (Tₐ).
With this pre-trained model, we do Transfer Learning by making no changes to any of the parameters (A₁) except those of the last layer (A₂). The FC-layer (fully connected layer), as the last layer, will be retrained using a quantum computer. The resulting model (Mᵣₜ), trained with the new dataset (Dᵦ), will be able to perform a specific Image Classification task (Tᵦ).
The new dataset for this specific task covers Image Classification into two classes: a person wearing a mask and a person not wearing a mask. A total of 410 images (across both classes) is acquired through a random Google Image Search. Downloaded images are split into three subsets for training, validation, and test, in the proportion of 240:50:120 images (roughly 58%:12%:30%).
While 240 images for training may not seem like much, this is still acceptable as we apply transfer learning rather than training the model from scratch.
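As an illustration of how such a split dataset could be loaded for PyTorch training, the sketch below assumes an ImageFolder-style directory layout ("dataset/train" and "dataset/validation", with one subfolder per class); the directory names and batch size are assumptions, not the article's actual paths.

# Loading the mask/no-mask dataset (a sketch; directory names are assumed).
import torch
from torchvision import datasets, transforms

data_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    # ImageNet normalization, matching the pre-trained ResNet-18
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image_datasets = {
    phase: datasets.ImageFolder("dataset/" + phase, data_transform)
    for phase in ["train", "validation"]
}
dataloaders = {
    phase: torch.utils.data.DataLoader(image_datasets[phase], batch_size=8, shuffle=True)
    for phase in ["train", "validation"]
}
print({phase: len(ds) for phase, ds in image_datasets.items()})
# expected sizes: {'train': 240, 'validation': 50}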
B. Retrain – Hyperparameters of Quantum Layer (Last Layer)
The parameters of the last layer A₂ (FC-layer) will be trained using a quantum computer with the new dataset (Dᵦ). This is done by leaving all parameters of the previous layers (A₁) unchanged (A₁ → A₁) and replacing the FC-layer (the trainable layer) with the quantum layer to be updated (A₂ → A₂').
See the following code for preparing the quantum gates, constructing the quantum layer, and retraining the quantum layer.
The first part of the code shows how to prepare the base quantum gates (Hadamard, Rᵧ) and then construct the entangling layer as part of the quantum layer, a replacement for the last layer of the pre-trained ResNet-18 neural network architecture.
# 1st - Prepare the Quantum Gates
import pennylane as qml

# Assumed setup (values follow PennyLane's quantum transfer learning tutorial):
n_qubits = 4    # number of qubits in the quantum layer (4-qubit circuit)
q_depth = 6     # depth of the variational circuit (assumption)
dev = qml.device("default.qubit", wires=n_qubits)  # or "qiskit.aer" / "cirq.simulator"

def H_layer(nqubits):
    """Layer of single-qubit Hadamard gates."""
    for idx in range(nqubits):
        qml.Hadamard(wires=idx)

def RY_layer(w):
    """Layer of parametrized qubit rotations around the y axis."""
    for idx, element in enumerate(w):
        qml.RY(element, wires=idx)

def entangling_layer(nqubits):
    """Layer of CNOTs followed by another, shifted layer of CNOTs.

    In other words, it applies something like:
      CNOT CNOT CNOT CNOT ... CNOT
        CNOT CNOT CNOT ... CNOT
    """
    for i in range(0, nqubits - 1, 2):  # loop over even indices: i=0,2,...,N-2
        qml.CNOT(wires=[i, i + 1])
    for i in range(1, nqubits - 1, 2):  # loop over odd indices: i=1,3,...,N-3
        qml.CNOT(wires=[i, i + 1])

@qml.qnode(dev, interface="torch")
def quantum_net(q_input_features, q_weights_flat):
    """The variational quantum circuit."""
    # Reshape weights
    q_weights = q_weights_flat.reshape(q_depth, n_qubits)

    # Start from state |+>, unbiased w.r.t. |0> and |1>
    H_layer(n_qubits)

    # Embed features in the quantum node
    RY_layer(q_input_features)

    # Sequence of trainable variational layers
    for k in range(q_depth):
        entangling_layer(n_qubits)
        RY_layer(q_weights[k])

    # Expectation values in the Z basis
    exp_vals = [qml.expval(qml.PauliZ(position)) for position in range(n_qubits)]
    return tuple(exp_vals)
Expectation values are returned as the result of submitting a job to a backend quantum computer. Next, we prepare a replacement quantum layer (as illustrated below) and replace the last layer (fully connected layer) of the pre-trained ResNet-18 neural network architecture with this quantum layer. Finally, we retrain the quantum layer with the new dataset for 30 epochs (using the train & validation datasets).
# 2nd - Prepare the Replacement Quantum Layer
# (torch, np, nn, torchvision, device, q_delta, myprint, and log_file_name
#  are defined earlier in the notebook)
class DressedQuantumNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.pre_net = nn.Linear(512, n_qubits)
        self.q_params = nn.Parameter(q_delta * torch.randn(q_depth * n_qubits))
        self.post_net = nn.Linear(n_qubits, 2)

    def forward(self, input_features):
        # Obtain the input features for the quantum circuit
        # by reducing the feature dimension from 512 to 4
        pre_out = self.pre_net(input_features)
        q_in = torch.tanh(pre_out) * np.pi / 2.0

        # Apply the quantum circuit to each element of the batch and append to q_out
        q_out = torch.Tensor(0, n_qubits)
        q_out = q_out.to(device)
        for elem in q_in:
            q_out_elem = quantum_net(elem, self.q_params).float().unsqueeze(0)
            q_out = torch.cat((q_out, q_out_elem))

        # Return the two-dimensional prediction from the post-processing layer
        return self.post_net(q_out)

# 3rd - Replace the last layer of ResNet-18 with the defined quantum layer
message = "* Loading pre-trained ResNet-18..."
myprint(log_file_name, "append", message, screen=True)
model_hybrid = torchvision.models.resnet18(pretrained=True)

for param in model_hybrid.parameters():
    param.requires_grad = False

# Notice that model_hybrid.fc is the last layer of ResNet-18
message = " - Replacing last layer (fc-layer) with Quantum Layer..."
myprint(log_file_name, "append", message, screen=True)
model_hybrid.fc = DressedQuantumNet()

# Use CUDA or CPU according to the "device" object
model_hybrid = model_hybrid.to(device)
An updated model with the retrained last layer (training accuracy = 97.083333%) is saved to the file ‘swgCQ_simIBMQLocal(30)-08072020085403.pth’ (as shown below).
* (START re-training the Quantum Layer)
=> Training Started.
> Phase: train Epoch: 1/30 Loss: 0.6145 Acc: 0.6875
> Phase: validation Epoch: 1/30 Loss: 0.4807 Acc: 0.8400
* Saving interim model while training: swgCQ_simIBMQLocal-at-epoch-1(30)-14072020172453.pth
[[1. 0.6875 0.6144937 0.84 0.48068776]]
> Phase: train Epoch: 2/30 Loss: 0.4623 Acc: 0.8125
> Phase: validation Epoch: 2/30 Loss: 0.3722 Acc: 0.9200
* Saving interim model while training: swgCQ_simIBMQLocal-at-epoch-2(30)-14072020172453.pth
[[2. 0.8125 0.4623062 0.92 0.37219827]]
> Phase: train Epoch: 3/30 Loss: 0.3748 Acc: 0.9125
> Phase: validation Epoch: 3/30 Loss: 0.2874 Acc: 0.9800
* Saving interim model while training: swgCQ_simIBMQLocal-at-epoch-3(30)-14072020172453.pth
[[3. 0.9125 0.37478789 0.98 0.28741294]]
> Phase: train Epoch: 4/30 Loss: 0.3427 Acc: 0.9042
> Phase: validation Epoch: 4/30 Loss: 0.2684 Acc: 0.9800
* Saving interim model while training: swgCQ_simIBMQLocal-at-epoch-4(30)-14072020172453.pth
...
...
...
> Phase: train Epoch: 29/30 Loss: 0.2568 Acc: 0.9292
> Phase: validation Epoch: 29/30 Loss: 0.1558 Acc: 1.0000
* Saving interim model while training: swgCQ_simIBMQLocal-at-epoch-29(30)-14072020172453.pth
[[29. 0.97083333 0.18603694 1. 0.14358226]]
> Phase: train Epoch: 30/30 Loss: 0.1930 Acc: 0.9708
> Phase: validation Epoch: 30/30 Loss: 0.1495 Acc: 1.0000
* Saving interim model while training: swgCQ_simIBMQLocal-at-epoch-30(30)-14072020172453.pth
[[30. 0.97083333 0.18603694 1. 0.14358226]]
* Saving train, val results (all epochs): 'epoch, best_acc_train, best_loss_train, best_acc, best_loss'
=> Training completed in 219m 52s
=> Best test loss: 0.1436 | Best test accuracy: 1.0000
* (FINISH re-training the Quantum Layer).
Once the training is done, a new model is produced (Mᵣₜ) to perform a specific Image Classification task (Tᵦ).
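For completeness, persisting such a retrained model in PyTorch is a one-liner; the filename follows the article's naming pattern and is illustrative.

# Saving the retrained hybrid model's weights (illustrative filename).
torch.save(model_hybrid.state_dict(), "swgCQ_simIBMQLocal(30)-08072020085403.pth")

The saved state dict is what the prediction code below loads back with load_state_dict().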
C. Predict
We can test the generated Hybrid Classical-Quantum Machine Learning model (‘swgCQ_simIBMQLocal(30)-08072020085403.pth’) by doing Image Classification with a test dataset that is not part of the training & validation dataset.
First, we load a trained hybrid classical-quantum machine learning model ‘swgCQ_simIBMQLocal(30)-08072020085403.pth’ (as shown in the following code),
hybrid_model_name = "swgCQ_simIBMQLocal(30)-08072020085403.pth"
my_hybrid_model_name = hybrid_model_name

my_model_hybrid = torchvision.models.resnet18(pretrained=True)
for param in my_model_hybrid.parameters():
    param.requires_grad = False

my_model_hybrid.fc = DressedQuantumNet()
my_model_hybrid = my_model_hybrid.to(device)
my_model = my_model_hybrid
my_model.load_state_dict(torch.load(my_hybrid_model_name))

# Predict a single image
from PIL import Image

data_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def predict_single_image(model, image):
    PIL_img = Image.open(image)
    img = data_transforms(PIL_img)

    # Add 1 dimension at position 0 (batch dimension)
    img_input = img.unsqueeze(0)
    print("img shape", img_input.shape)
    print(type(img_input))

    model.eval()
    with torch.no_grad():  # inferencing
        outputs = model(img_input)
    print("output from model:", outputs)

    base_labels = (("mask", outputs[0, 0]), ("no_mask", outputs[0, 1]))
    expvals, preds = torch.max(outputs, 1)
    expvals_min, preds_min = torch.min(outputs, 1)
    print("base_labels (model output): ", base_labels)
    print("predicted as: ", expvals)

    if expvals == base_labels[0][1]:
        labels = base_labels[0][0]
    else:
        labels = base_labels[1][0]

    ax = plt.subplot()
    ax.axis("off")
    title = "Detected as <" + labels + ">, Expectation Value: " + str(expvals) + " (" + str(expvals_min) + ")"
    ax.set_title("[{}]".format(title))
    imshow(img)

# Example call (hypothetical image path):
# predict_single_image(my_model, "dataset/test/no_mask/image001.jpg")
then run inference on the test dataset (illustrated below). Note that predictions are not 100% accurate; one image of a person without a mask is shown predicted as a person with a mask.
As we are using a relatively small dataset, this is something that may occur. We may be able to enrich the dataset with image augmentation, for example: the same set of images can be flipped horizontally or vertically, blurred, sharpened, enlarged, shrunken, rotated, or color-adjusted (such as turning images black & white or adjusting hue), among others.
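As an illustration, the training transform could be extended with torchvision's built-in augmentations; the sketch below is a minimal example, and the specific parameter values are assumptions rather than tuned settings.

# Augmenting the training set with torchvision transforms (illustrative values).
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomHorizontalFlip(p=0.5),               # random horizontal flips
    transforms.RandomRotation(degrees=15),                # small random rotations
    transforms.ColorJitter(brightness=0.2, contrast=0.2, hue=0.05),  # color variations
    transforms.RandomGrayscale(p=0.1),                    # occasional black & white
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # enlarge/shrink via cropping
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])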
3. What’s Next?
A Quantum Computer requires a paradigm shift. It will eventually be implemented in practical applications, solving some of the world's hard problems that are tough to solve, or unsolvable, with current Classical Computers, even the most sophisticated classical supercomputer.
As scientists in other disciplines often dedicate their lives to pursuing things in their research area, we should focus on keeping up to date and striving to improve on the state of the art. It may seem almost impossible to achieve the defined goals at the beginning of the journey; however, with strong persistence and much patience, at the end of the road, although not always, all of the efforts that we have been making will be worth it.
Becoming handy with, and keeping up-to-date knowledge and experience in, practically available technologies and open-source tools is invaluable for staying relevant in the future. Available tools for Data Science & Machine Learning include the Python programming language and the PyTorch Deep Learning framework. Recent tools for Quantum Computing are also available, including PennyLane.ai as a quantum machine learning framework and Qiskit, which provides convenient access to either the IBM Quantum Simulator or a real IBM Quantum Computer on the cloud.
In previous articles, we have demonstrated a random number generator (Andi Sama, 2020a), quantum information teleportation (Andi Sama, 2020b), and a simple quantum “Hello Many Worlds” application (Andi Sama, 2020c). All were developed and presented using Qiskit to create the simplest program using either a Quantum Computer Simulator or a real Quantum Computer on IBM Quantum Experience in IBM Cloud.
This article demonstrates one of many ways to do machine learning with a simulated quantum computer. A simple Image Classification use case has been shown: a transfer learning approach derives the model from a pre-trained neural network model (pre-trained on a classical computer) and retrains the parameters of its last layer with a quantum computer. To some extent, we are doing hybrid quantum machine learning.
Later in 2021, as a follow-on to this article (in addition to the x86-based system), we plan to do further experiments with an IBM POWER AC922 as the Classical Computer part. The AC922 (Accelerated Computing on IBM POWER9) is an IBM POWER AI-optimized server with 2x NVidia V100 16/32GB GPUs, starting from 128GB RAM, running Ubuntu with IBM Watson Machine Learning Community Edition (WML-CE).
Well, many more applications are possible with hybrid quantum machine learning. Sometime later, we may come to the point of building a machine learning model from scratch, without transfer learning.
Going up to a higher level of real-world applications, many will benefit from the potential power of Quantum Computing. Examples include financial simulation, traffic optimization, weather forecasting & climate change simulation, drug discovery, development of new batteries, CO₂ capture, and Artificial Intelligence, to name just a few.
Well, let's get started by doing something. And the right time is now!
References
- Andi Sama, 2020a, “Quantum Random Number Generator (QRNG).”
- Andi Sama, 2020b, “Quantum Teleportation: Demonstrate Quantum Information Teleportation with Qiskit on IBM Q.”
- Andi Sama, 2020c, “Hello Many Worlds in Quantum Computer — Demonstrate 2-Qubits Entanglement with Qiskit on IBM Q.”
- Andi Sama, 2020d, “The Race in Achieving Quantum Supremacy & Quantum Advantage.”
- Andi Sama, 2019, “Meneropong Masa Depan: Quantum Computing.”
- IBM, 2020b, “IBM Quantum Experience.”
- IBM, 2020c, “Qiskit, An Open-Source Quantum Computing Software Development Framework.”
- IBM, 2020d, “Qiskit Global Summer School,” July 20–31, 2020.
- IBM Research, 2020, “Illustration of Quantum Information Processing.”
- James Wootton, 2018, “What is meant by ‘Noisy Intermediate-Scale Quantum’ (NISQ) technology?”, IBM Research.
- John Preskill, 2018, “Quantum Computing in the NISQ era and beyond,” Cornell University.
- PennyLane, 2020, “Quantum Transfer Learning,” Xanadu.
- Qiskit Community Team, 2020a, “Learn Quantum Computation using Qiskit,” Qiskit.org.
- Qiskit Community Team, 2020b, “Qiskit Ignis: Understanding and mitigating noise in quantum systems,” Qiskit.org.
- Qiskit YouTube Channel, 2020, “Quantum Information Science Kit,” Qiskit.org.
- Quantum Analyst, 2020, “IBM now has 18 Quantum Computers on the IBM Cloud,” May 2020.