AutoML is a technology that automates the most time-consuming tasks of a machine learning project so that data scientists can spend more time on business problems in practical scenarios. It also makes machine learning usable by everyone rather than by a small group of specialists.
Here is a quick step-by-step walkthrough of how to train Auto-sklearn, an AutoML framework, on Qarnot. Follow along!
If you are interested in another version, please send us an email at email@example.com.
Before starting a calculation with the Python SDK, a few steps are required:
The data showcased in this tutorial is the electricity data set. This data was collected from the Australian New South Wales Electricity Market, where electricity prices are set every five minutes based on supply and demand. Given this historical data, we have to predict whether the electricity price will go up or down. This is a binary classification problem, and our class labels are UP and DOWN.
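To make the binary classification setup concrete, here is a minimal sketch of reading such a CSV and turning the UP/DOWN labels into 0/1 targets. The column names (`nswprice`, `nswdemand`, `class`) and the two sample rows are assumptions made up for illustration; check them against the header of your own copy of the file.

```python
import csv
import io

# Assumed mapping from the text labels to numeric targets.
LABELS = {"DOWN": 0, "UP": 1}


def load_electricity(source, label_column="class"):
    """Read rows from an open CSV file and split features from 0/1 labels."""
    reader = csv.DictReader(source)
    features, targets = [], []
    for row in reader:
        label = row.pop(label_column).strip()
        targets.append(LABELS[label])
        # Every remaining column is treated as a numeric feature.
        features.append({name: float(value) for name, value in row.items()})
    return features, targets


# Made-up sample rows standing in for electricity-normalized.csv.
sample = io.StringIO(
    "nswprice,nswdemand,class\n"
    "0.056443,0.439155,UP\n"
    "0.051699,0.415055,DOWN\n"
)
X, y = load_electricity(sample)
print(y)  # [1, 0]
```

With the real file, you would pass `open("electricity-normalized.csv", newline="")` instead of the in-memory sample.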
We want to build the best possible model using Auto-sklearn in a given time frame. The best way to do so is to train multiple models in parallel, which increases the chances of finding a strong one. This is well suited to Qarnot’s HPC service, which we will use to parallelize Auto-sklearn’s computation across multiple nodes in a cluster. This test case will showcase how to train Auto-sklearn for 15 minutes on a 4-node Qarnot cluster. The cluster’s workflow can be summarized with the following diagram.
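As a rough illustration of the time budget described above, the sketch below builds the keyword arguments for `AutoSklearnClassifier`. The helper names and the one-tenth per-model limit are assumptions, not the settings of the tutorial script. Note that `n_jobs` parallelizes on a single machine; spreading the work over several nodes would go through Auto-sklearn's `dask_client` argument instead, and how the Qarnot script wires this up is not shown here.

```python
def build_automl_settings(total_minutes=15, workers=4, per_run_divisor=10):
    """Return keyword arguments for AutoSklearnClassifier (hypothetical helper)."""
    total_seconds = total_minutes * 60
    return {
        # Overall search budget, in seconds.
        "time_left_for_this_task": total_seconds,
        # Cap on a single model fit, so one slow model cannot use up the budget.
        "per_run_time_limit": max(1, total_seconds // per_run_divisor),
        # Number of models trained in parallel on one machine.
        "n_jobs": workers,
    }


def train(X_train, y_train, settings):
    """Fit Auto-sklearn with the given settings and return the fitted object."""
    # Imported lazily so the settings helper works without auto-sklearn installed.
    from autosklearn.classification import AutoSklearnClassifier

    automl = AutoSklearnClassifier(**settings)
    automl.fit(X_train, y_train)
    return automl


settings = build_automl_settings()
print(settings)
```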
The input data needed for this tutorial can be downloaded here.
Before moving forward, you should set up your working environment so that it contains the following files:
electricity-normalized.csv: training data
auto-sklearn.py: script to start the task (found below)
Launching the test case
Once your working environment is set up correctly, you are almost ready to start. Be sure to copy your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) to be able to launch the task on Qarnot.
To launch this script, simply copy the code above into a Python script and execute the following command in your terminal:
python3 auto-sklearn.py &
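If you would rather not paste the token into the script itself, one option is to read it from an environment variable. The sketch below assumes a variable named `QARNOT_TOKEN`, which is a hypothetical name chosen for this example and not something the SDK defines.

```python
import os


def read_token(variable="QARNOT_TOKEN", environ=os.environ):
    """Return the authentication token stored in an environment variable."""
    token = environ.get(variable, "").strip()
    # Refuse an empty value or the untouched placeholder from the script.
    if not token or token == "<<<MY_SECRET_TOKEN>>>":
        raise RuntimeError(f"Set {variable} to your Qarnot authentication token.")
    return token
```

You would then export the variable in your terminal before launching the script and use `read_token()` wherever the script currently holds the placeholder.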
At any given time, you can monitor the status of your task on Tasq.
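The status can also be followed from Python. Here is a minimal sketch, assuming `task` is a task object from the Qarnot Python SDK, which exposes a `state` property and a `wait(timeout)` method that returns `True` once the task has finished. The function name and the polling interval are illustrative and not taken from the tutorial script.

```python
def follow_task(task, poll_seconds=5, echo=print):
    """Report each state change of a task until it finishes; return the final state."""
    last_state = None
    done = False
    while not done:
        state = task.state
        if state != last_state:
            # Only report when the state actually changes.
            echo(f"** {state}")
            last_state = state
        # Blocks for up to poll_seconds; True means the task has completed.
        done = task.wait(poll_seconds)
    return task.state
```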
Once the training is done, the task state will turn green. You can then check out the task’s output bucket, auto-sklearn-out. There you will find files such as a training log, the saved model, the prediction confusion matrix and an accuracy-over-time graph.
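For readers unfamiliar with the confusion matrix, the following sketch shows how it and the accuracy are computed from UP/DOWN predictions. The labels below are made up for illustration and are not results from this run.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=("DOWN", "UP")):
    """Rows are actual classes, columns are predicted classes, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def accuracy(matrix):
    """Share of predictions on the diagonal, i.e. predicted class equals actual class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Made-up labels, for illustration only.
y_true = ["UP", "DOWN", "UP", "UP", "DOWN"]
y_pred = ["UP", "DOWN", "DOWN", "UP", "UP"]

matrix = confusion_matrix(y_true, y_pred)
print(matrix)            # [[1, 1], [1, 2]]
print(accuracy(matrix))  # 0.6
```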
Accuracy over time graph
That’s it! If you have any questions, please contact firstname.lastname@example.org and we will be happy to help!
If you are curious and would like to learn more about this particular use case and others, you can check out our blog article which goes into much more detail.