Qarnot Technical Team
Engineers
HPC platform
Launch compute tasks in a few lines of code or a few clicks on Tasq, our HPC platform.

Auto-sklearn on Qarnot Cloud - documentation

October 19, 2021 - HPC discovery, Documentation, Machine Learning / AI

Introduction

AutoML is a technology that automates the most time-consuming tasks of a machine learning project, so that data scientists can spend more time on business problems and practical scenarios. It also makes machine learning accessible to everyone rather than to a small group of specialists.

Here is a quick step-by-step walkthrough showing how to train Auto-sklearn, an AutoML framework, on Qarnot. Follow along!

Version

Release year    Version
2021            v0.12.5

If you are interested in another version, please send us an email at qlab@qarnot.com.

Prerequisites

Before starting a calculation with the Python SDK, a few steps are required:

  • Retrieve the authentication token (here)
  • Install Qarnot’s Python SDK (here)

Note: in addition to the Python SDK, Qarnot provides C# and Node.js SDKs and a command-line interface.

Test Case

The data showcased in this tutorial is the electricity data set. It was collected from the Australian New South Wales Electricity Market, where electricity prices are set every five minutes based on supply and demand. Given this historical data, the goal is to predict whether the electricity price will go up or down. This is a binary classification problem, and the class labels are UP and DOWN.
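Before training, it helps to get a feel for the data. The sketch below counts how many rows carry each label. It also computes the accuracy of always predicting the most frequent class, which is a useful floor for judging the trained model.

It assumes the CSV has a header row with the label in a column named `class`, as in the OpenML version of this data set. The helper names and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def class_balance(csv_text, label_column="class"):
    """Count how many rows carry each label (e.g. UP / DOWN)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)

def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent label."""
    total = sum(counts.values())
    return max(counts.values()) / total if total else 0.0

# Hypothetical rows in the assumed column layout (values are made up).
SAMPLE = """date,day,period,nswprice,nswdemand,vicprice,vicdemand,transfer,class
0.0,2,0.00,0.05,0.44,0.003,0.42,0.41,UP
0.0,2,0.02,0.05,0.42,0.003,0.42,0.41,UP
0.0,2,0.04,0.05,0.39,0.003,0.42,0.41,DOWN
"""

if __name__ == "__main__":
    counts = class_balance(SAMPLE)
    print(dict(counts))
    print(majority_baseline(counts))
```

On the real data, read input/electricity-normalized.csv into a string and pass it to `class_balance`. A trained model is only interesting if it clearly beats `majority_baseline`.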

We want to build the best possible model with Auto-sklearn within a given time frame. A good way to do so is to train multiple models in parallel, which increases the chances of finding a strong one. This approach is well suited to Qarnot’s HPC service, which we will use to parallelize Auto-sklearn’s computation across multiple nodes in a cluster. This test case shows how to train Auto-sklearn for 15 minutes on a 4-node Qarnot cluster. The cluster’s workflow can be summarized with the following diagram.
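Why does parallelism help within a fixed time frame? The toy model below hands candidate model trainings to whichever worker frees up first. It then counts how many trainings finish inside a 15-minute (900 s) budget. This is not how Auto-sklearn or Qarnot actually schedule work.

The job durations are hypothetical. The point is only that four workers evaluate roughly four times as many candidates as one in the same wall-clock time.

```python
import heapq

def models_within_budget(durations, n_workers, budget_s):
    """Toy model: dispatch training jobs in order to the worker that becomes
    free first, and count the jobs that finish before the time budget ends."""
    # Each heap entry is the time (in seconds) at which a worker becomes free.
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    finished = 0
    for duration in durations:
        start = heapq.heappop(free_at)
        end = start + duration
        if end > budget_s:
            # Too long for the earliest-free worker, hence for every worker:
            # skip this job and keep the worker available for shorter ones.
            heapq.heappush(free_at, start)
            continue
        finished += 1
        heapq.heappush(free_at, end)
    return finished

if __name__ == "__main__":
    jobs = [60] * 100  # hypothetical: 100 candidate models of 60 s each
    for workers in (1, 4):
        print(workers, "worker(s):", models_within_budget(jobs, workers, 900))
```

With these hypothetical numbers, one worker completes 15 trainings and four workers complete 60.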

The necessary input data needed for this tutorial can be downloaded here.

Before moving forward, set up your working environment so that it contains the following files:

  • input
    • electricity-normalized.csv: training data
  • auto-sklearn.py: script to start the task (found below)

Launching the test case

Once your working environment is set up correctly, you are almost ready to start. Be sure to paste your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) so that the task can be launched on Qarnot.
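If you want the script to fail early when the placeholder was left in, a small guard like the one below could resolve the token first. It can also read the token from an environment variable, which keeps it out of the file. This is a sketch. The `QARNOT_TOKEN` variable name and the `resolve_token` helper are hypothetical and are not part of Qarnot's SDK. The returned string is what you would hand to the SDK in place of the literal token.

```python
import os

PLACEHOLDER = "<<<MY_SECRET_TOKEN>>>"

def resolve_token(hardcoded=PLACEHOLDER, env_var="QARNOT_TOKEN"):
    """Prefer a token from the (hypothetical) environment variable; otherwise
    use the value pasted into the script, refusing the placeholder."""
    token = os.environ.get(env_var) or hardcoded
    if not token or token == PLACEHOLDER:
        raise ValueError(
            f"No Qarnot token: set {env_var} or replace {PLACEHOLDER} in the script"
        )
    return token
```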

   

To launch the task, copy the code above into a Python script named auto-sklearn.py and run the following command in your terminal: python3 auto-sklearn.py & (the trailing & runs it in the background).

Results

At any given time, you can monitor the status of your task on Tasq.
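Tasq is the simplest way to follow the task. If you would rather have a Python script block until the task finishes, a generic polling loop like this sketch could wrap any zero-argument function that returns the task's state as a string. How you read that state through the SDK is left to the SDK reference.

The terminal state names below are assumptions, and `wait_until_done` is a hypothetical helper.

```python
import time

# Assumed terminal states; check the Qarnot SDK reference for the exact names.
TERMINAL_STATES = {"Success", "Failure", "Cancelled"}

def wait_until_done(get_state, interval_s=30.0, timeout_s=None, sleep=time.sleep):
    """Call get_state() every interval_s seconds until it returns a terminal
    state, then return that state. Raise TimeoutError once timeout_s elapses."""
    start = time.monotonic()
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if timeout_s is not None and time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"task still in state {state!r}")
        sleep(interval_s)
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.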

Once training is done, the task state turns green. You can then check the task’s output bucket, auto-sklearn-out, which contains several files: a training log, the saved model, the prediction confusion matrix and a graph of accuracy over time.

Confusion matrix
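To read a confusion matrix, it helps to see how one is built from true and predicted labels. The stdlib sketch below puts true labels in rows and predicted labels in columns, which is the same convention as scikit-learn's `confusion_matrix`. Accuracy is then the share of counts on the diagonal.

The exact layout of the plot in the output bucket may differ, and the sample labels are made up.

```python
from collections import Counter

LABELS = ("UP", "DOWN")

def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Share of predictions on the diagonal (true label == predicted label)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

if __name__ == "__main__":
    y_true = ["UP", "UP", "DOWN", "DOWN", "DOWN"]  # made-up labels
    y_pred = ["UP", "DOWN", "DOWN", "DOWN", "UP"]
    m = confusion_matrix(y_true, y_pred)
    print(m)            # [[1, 1], [1, 2]]
    print(accuracy(m))  # 0.6
```

Off-diagonal cells show the two kinds of mistakes: the top-right cell counts UP moves predicted as DOWN, and the bottom-left cell counts DOWN moves predicted as UP.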

Accuracy over time graph
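AutoML "accuracy over time" curves usually plot the best score found so far against elapsed time, which is why they only go up or stay flat. The page does not state how this particular graph is defined, so treat that as an assumption. The sketch below turns hypothetical (elapsed seconds, accuracy) pairs for evaluated models into such a best-so-far curve.

```python
from itertools import accumulate

def best_so_far(evaluations):
    """Given (elapsed_seconds, accuracy) pairs for evaluated models, return
    (elapsed_seconds, best_accuracy_up_to_then) pairs sorted by time."""
    ordered = sorted(evaluations)
    times = [t for t, _ in ordered]
    best = accumulate((acc for _, acc in ordered), max)
    return list(zip(times, best))

if __name__ == "__main__":
    # Hypothetical evaluation records, not results from this tutorial.
    evals = [(120, 0.81), (30, 0.74), (300, 0.79), (600, 0.86)]
    for t, acc in best_so_far(evals):
        print(f"{t:>4} s  best accuracy {acc:.2f}")
```

The model evaluated at 300 s is worse than the one found at 120 s, so the curve stays flat there instead of dipping.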

Wrapping up

That’s it! If you have any questions, please contact qlab@qarnot.com and we will be happy to help!

If you are curious and would like to learn more about this use case and others, check out our blog article, which goes into much more detail.

 
