Qarnot Technical Team
Engineers

Train your own neural network

January 14, 2021 - Machine Learning / AI

With the development of digital computers in the mid-fifties came the idea that it might be possible for a computer to develop a form of intelligence and carry out some human-like tasks. To do so, it would have to learn by itself and constantly improve its reasoning, as we do throughout our lives. From this idea, and with the help of statistics, Artificial Intelligence (AI) was born: a domain that encompasses machine learning and deep learning models.

From the simple perceptron to complex neural networks

In deep learning, brain-inspired algorithms called neural networks are trained to perform certain tasks through a prediction process. In recent years, deep learning has had a huge impact on many fields, and novel model architectures steadily achieve better results, at the cost of ever more computing power - nowadays costly GPUs. Let’s see how the first neural architecture, the perceptron, and the neural networks that followed work.

Operation of a simple perceptron

Invented in the 50s, the perceptron algorithm is a binary linear classifier first intended for image recognition through supervised learning (a form of training that guides the model towards the desired outcome by using labeled data). It consists of several elements: n input values $x_i$, weights $w_i$, a bias $b$, a sum, and an activation function $f$ that maps the output to the required range (often [0,1] for probabilities, or [-1,1]). The following equations summarize the operations of one iteration:

$ \forall i \in [1,n], \quad y_{i} = w_{i}x_{i} $

$ y = \cfrac{\displaystyle\sum_{i=1}^{n} y_{i}}{\displaystyle\sum_{i=1}^{n} w_{i}} + b $

$ y_{p} = f(y) $

These equations are executed a certain number of times, called epochs. In each epoch, the parameters of the model, i.e. the weights and the bias, are updated through a gradient optimization process that minimizes a cost function between the output of the model (the predicted label $y_p$) and the expected one (the true label $y_t$).
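As an illustration, here is a minimal sketch of a few perceptron iterations following the equations above (the sigmoid activation and the simple error-driven update rule are assumptions chosen for the example):

```python
import numpy as np

def sigmoid(y):
    # Activation function f, mapping the weighted sum into [0, 1]
    return 1.0 / (1.0 + np.exp(-y))

def perceptron_iteration(x, w, b, y_t, lr=0.1):
    """One forward pass and one parameter update for a single sample."""
    # y_i = w_i * x_i, then the normalized sum plus the bias (equations above)
    y = np.sum(w * x) / np.sum(w) + b
    y_p = sigmoid(y)                 # predicted label y_p = f(y)
    error = y_t - y_p                # gap with the true label y_t
    # Simple error-driven update of the parameters (illustrative only)
    w = w + lr * error * x
    b = b + lr * error
    return y_p, w, b

# Toy example: 3 input values, repeated over a few epochs
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, 0.2])
b = 0.0
for epoch in range(10):
    y_p, w, b = perceptron_iteration(x, w, b, y_t=1.0)
print(y_p)  # moves towards the true label as the epochs go by
```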

Since this algorithm is linear, it could only predict a few classes with limited accuracy. A few decades later, it was shown that combining multiple perceptrons into a multi-layered perceptron could overcome this limitation.

Evolution of the multi-layered perceptron

The models developed following this discovery are able to model non-linear phenomena, which allows them to learn more complex tasks. The multi-layered perceptron that emerged in the eighties uses backpropagation to compute the gradients in the optimization stage: starting from the output layer, it traces back to the input layer, calculating the contribution of each perceptron to the output and adjusting its weights accordingly. Nowadays, backpropagation is usually handled automatically and combined with an optimizer, such as the Adam optimizer, which simplifies the training of neural networks.
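With a modern framework such as PyTorch, backpropagation and the Adam update take only a few calls; a minimal sketch on made-up data could look like this:

```python
import torch
import torch.nn as nn

# A tiny two-layer perceptron, purely for illustration
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

x = torch.randn(16, 4)               # a random batch of 16 samples
y_true = torch.randint(0, 2, (16,))  # their (random) class labels

optimizer.zero_grad()                # reset the gradients
loss = criterion(model(x), y_true)   # cost between prediction and true label
loss.backward()                      # backpropagation: compute all gradients
optimizer.step()                     # Adam updates the weights and biases
```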

In the 21st century, deep neural networks started to become really popular and new architectures kept being invented every few years. Among them are the well-known Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN).
CNNs use convolution operations to connect each neuron to a small subset of the inputs or previous neurons located in a receptive field, instead of being fully connected to all items in the previous layer. This greatly reduces the number of parameters to train and can be applied to any grid-like topological structure, such as images.
RNNs were created to adapt feed-forward neural networks to sequence data of variable length, such as sentences, by processing words one after the other with shared weights, thus capturing context information.
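The parameter savings brought by convolutions can be checked in a few lines of PyTorch (the layer sizes below are arbitrary and only serve as an illustration):

```python
import torch.nn as nn

def n_params(layer):
    return sum(p.numel() for p in layer.parameters())

# Fully connected layer mapping a 28x28 image to 32 feature maps of 28x28
fc = nn.Linear(28 * 28, 32 * 28 * 28)
# Convolutional layer producing 32 feature maps with a 3x3 receptive field
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)

print(n_params(fc))    # ~19.7 million parameters
print(n_params(conv))  # 320 parameters (32 * 3*3*1 weights + 32 biases)
```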

Many libraries can be used to implement such models on Qarnot, such as TensorFlow, Scikit-Learn, PyTorch, etc. In this use case, we focus on the PyTorch library, an open source machine learning framework popular in industry for its ease of deployment in production. We will now use it to train a deep learning model on the Qarnot Platform.

How to run a PyTorch simulation on Qarnot

Use case

The case presented here uses the weather dataset extracted from Rain in Australia, which can be downloaded here. It contains 142,193 samples of daily weather data from several regions of Australia. Each sample is described by 24 features, including one indicating whether it rained the day after the data was collected. This dataset can be used to create a neural network that predicts the probability of rainfall on the day following a given day, from a set of spatial and atmospheric data for that day. The problem is treated as a binary classification, by transforming the feature that describes whether or not it rains the next day into a binary label, 0 indicating that it does not rain and 1 indicating that it does.
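As an example, assuming the dataset is available as a CSV file with a RainTomorrow column (as in the Kaggle version of Rain in Australia), the binary label can be built as follows:

```python
import pandas as pd

# Load the daily weather observations (the file name is an assumption)
df = pd.read_csv("weatherAUS.csv")

# Turn the "rain the next day" feature into a binary classification target:
# 1 if it rains the day after, 0 otherwise
df["RainTomorrow"] = (df["RainTomorrow"] == "Yes").astype(int)

print(df["RainTomorrow"].value_counts(normalize=True))  # class balance
```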

The chosen dataset requires some preprocessing before use, since it contains missing values and categorical variables that need to be converted into numerical data.
For pedagogical reasons, a small and simple multi-layered perceptron consisting of three linear layers is built, but a more complex one could also be created. It is then trained with the classic Adam optimizer, briefly mentioned above, and a cross-entropy loss - well suited to a classification case - on 75% of the dataset, the remaining 25% being kept for testing.
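As an illustration, here is a sketch of what such a model and training setup could look like in PyTorch; the layer sizes, learning rate and stand-in data are assumptions made for the example, and the actual script used in this use case may differ:

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split

class RainMLP(nn.Module):
    """A small multi-layered perceptron with three linear layers."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),        # two outputs: no rain / rain
        )

    def forward(self, x):
        return self.net(x)

# Stand-in for the preprocessed features and binary labels
# (the number of columns is arbitrary here)
X = np.random.rand(1000, 23).astype(np.float32)
y = np.random.randint(0, 2, size=1000)

# 75% of the dataset for training, the remaining 25% kept for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = RainMLP(n_features=X_train.shape[1])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()    # well suited to a classification case

inputs = torch.from_numpy(X_train)
labels = torch.from_numpy(y_train)

for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)   # cost between y_p and y_t
    loss.backward()                           # backpropagation
    optimizer.step()                          # Adam parameter update
```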

Training a neural network on Qarnot

The first step is to create a Qarnot account. We offer 15€ worth of computation when you sign up, which amply covers the running cost of this example.

Next, create a PyTorch folder, and inside it an input folder. In the input folder, place the dataset downloaded here and the following Python script, which pre-processes the data, creates the model architecture and trains it.

Now that you have all the necessary elements for the simulation, let’s use the Qarnot Python SDK to launch the calculation. You can save the following Python script under the name pytorch.py in the PyTorch folder. You only need to insert the Qarnot token from your newly created account (you will find it here) in the corresponding slot. This script executes the training on a CPU, but it can also be run on one or more GPUs by modifying the profile specified when creating the task.

QARNOT SCRIPT
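For reference, here is a minimal sketch of what such a submission script could look like with the Qarnot Python SDK; the profile, bucket names, Docker image and command below are illustrative assumptions, so refer to the actual script above and to the Qarnot documentation for the exact values:

```python
import qarnot

# Connect to the Qarnot platform with the token from your account
conn = qarnot.Connection(client_token="YOUR_QARNOT_TOKEN")

# Create a CPU task; a GPU profile could be chosen instead for faster training
task = conn.create_task("pytorch-rain-training", "docker-batch", 1)

# Upload the input folder (dataset + training script) as an input bucket
input_bucket = conn.create_bucket("pytorch-input")
input_bucket.sync_directory("input")
task.resources.append(input_bucket)

# Results (trained model, predictions, log) will be written to this bucket
task.results = conn.create_bucket("pytorch-output")

# Docker image and command to run inside the task (illustrative values)
task.constants["DOCKER_REPO"] = "pytorch/pytorch"
task.constants["DOCKER_CMD"] = "python train.py"

task.submit()
task.wait()                      # block until the task is finished
task.download_results("output")  # retrieve the results locally
```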

The only thing left to do is to follow these steps to set up a Python virtual environment in the PyTorch folder, and then run the Python script by typing "python pytorch.py" in a terminal.

You can then view the task’s details on your own console or on Qarnot’s console, Tasq, by clicking on your task. Once the task is completed, the results (i.e. the trained model, the predictions, and a log) will be downloaded to your computer.

Results

After 1000 training epochs, the trained model and predictions are retrieved and several metrics are analyzed. A final accuracy of 86% and an F1-score of 78% are obtained. These rather satisfactory results, given the simplicity of the model and the parameters used, should however be put into perspective: the initial dataset is very unbalanced, with nearly 78% of the samples belonging to class 0 (no rain the day after) and only 22% to the opposite class (rain the day after). This disparity is simply explained by the fact that the dataset contains weather observations from Australia, where rainy days are less frequent.
This results in a large number of false negatives, as shown in the following confusion matrix, which gives the percentage of samples predicted in each class given their true label. The neural network indeed tends to focus on correctly predicting class 0 samples, since they are in the majority, in order to increase its accuracy.
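These metrics can for instance be recomputed from the saved predictions with scikit-learn; a small sketch with stand-in arrays in place of the real predictions:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

# Stand-in arrays; in practice these come from the downloaded predictions
y_test = np.array([0, 0, 0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 0, 1, 0, 0, 0, 0])

print("accuracy:", accuracy_score(y_test, y_pred))
print("F1-score:", f1_score(y_test, y_pred))

# Rows are true labels, columns predicted labels; normalizing each row gives
# the proportion of samples of each true class predicted in each class
print(confusion_matrix(y_test, y_pred, normalize="true"))
```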

To counterbalance this and see whether better results could be obtained, several classical techniques could be tested: under-sampling, over-sampling, or taking class weights into account during training. The neural network architecture could also be further improved to better fit this use case.
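For example, class weights can be passed directly to the loss function so that errors on the minority class are penalized more during training (the weight values below are illustrative, roughly the inverse of the class frequencies):

```python
import torch
import torch.nn as nn

# Weight class 1 (rain) more heavily, roughly the inverse of the 78% / 22% split
class_weights = torch.tensor([1.0, 3.5])
criterion = nn.CrossEntropyLoss(weight=class_weights)
```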

Going further?

Other methods could have been used to make this meteorological prediction. Staying within the AI domain, other machine learning algorithms such as SVMs could have given similar results, although they would have taken longer to train.
As far as meteorological forecasts go, significant progress has been made in the last few years, and some deep learning models, like Google’s MetNet+, are now able to predict the weather reliably up to 12 hours into the future. For longer forecast horizons, compute-hungry physics-based dynamical models are necessary, as discussed in this other Qarnot article.

To go further with machine learning, you can also read one of our other articles, which uses a scikit-learn example to recognize handwritten digits.

We hope you enjoyed this tutorial! Should you have any questions, or if you wish to use our platform for heavier computations (we can provide state-of-the-art resources on demand), don’t hesitate to contact us.

 

written by Zoé Berenger and Nébié Guillaume Lalé