In order to start this tutorial, make sure you have MATLAB 2017a or MATLAB 2016b downloaded and installed. Click here to download MATLAB if you do not have it installed.
In finance, MonteCarlo methods are ideal for pricing options with complicated features. Based on repeated computation and random sampling, the convergence rate of MonteCarlo method is rather slow and accurate results soon require large sample size. In this article, we’ll describe how to reduce MonteCarlo computational time using MATLAB^{®} Parallel Computing Toolbox.
Problem description
This article focuses on the parallelization of the MonteCarlo algorithm described in the article Stock options pricing using Python: an introduction.
As a reminder, a call option is a financial contract between two parties giving the buyer of the call the right, but not the obligation, to buy a particular financial instrument (Underlying asset $S$) by a certain date (Maturity time $T$) for a certain price (Strike price $K$). If we assume the underlying asset follows a geometric Brownian motion and has a lognormal distribution, basic financial theory gives the price for the underlying at the maturity time:
$$S_T = S_0 exp \left(\left(\mu – \dfrac{\sigma^2}{2}\right) T + \sigma \sqrt{T} \mathcal{N}(0,1) \right)$$
where $\mathcal{N}(0,1)$ is the standard Gaussian distribution (with zero mean and unit variance).
Since the price of the option depends directly on the price of the underlying at the date T, it is possible to estimate the value of the option. MonteCarlo simulation is a widely used and quite simple technique for solving this problem. The price of an option is calculated using MonteCarlo simulation by performing the following four steps:
 Generating several thousand random price paths for the underlying.
 Calculating the payoff for each of the potential underlying price paths.
 Averaging these payoffs…
 …and discounting the price back to today.
The following MATLAB code implements the Monte Carlo method.
Since the accuracy of the result depends on the number of trials $N$, accurate results require considerable amount of time. The slow convergence rate of MonteCarlo method ($o\left(\dfrac{1}{ \sqrt{N}}\right)$) makes actually such approach difficult to use for large numbers within a reasonable amount of time. Using parallel processing can improve drastically computational time and becomes necessary for large numbers of trials.
Improve performance with Parallel processing
Traditionally, Matlab programs have been written for serial computation which means that instructions are executed sequentially one after another on a single processor. Parallel Computing Toolbox™ lets you solve computationally problems using multicore processors and computer clusters to boost your performance. Parallel computing is a type of computation in which many calculations are carried out simultaneously. Large problems are divided into smaller ones, which are executed concomitantly on different processors reducing then computational time.
Before performing computations in parallel, it is necessary to define a cluster and a parallel pool. A cluster is a set of cores or connected computers that work together which can be viewed as a single logical unit. The cluster controls the job queue, schedules and distributes tasks to workers for computation. A parallel pool is a set of MATLAB^{®} workers on a compute cluster. Note that your parallel computation will be only performed on your parallel pool workers and not on all your cluster workers.
The Qarnot cluster is a set of several Central Processing Units (CPU) each composed of 4 physical cores. Regardless of the number of workers requested, the cluster uses 2 physical cores to schedule and distribute tasks to the workers.
If you have a multicore processor, you should be able to carry out calculations in parallel and might see speedup for large numbers using the Parallel Computing Toolbox™. To use parallel computation on your device, execute the following commands:
 Open Matlab.
 Ensure that the default cluster is the local cluster in Parallel > Manage Cluster Profiles.
 Start a pool of workers using the local profile with the following command :
The parpool function starts a parallel pool, creates a special job on a pool of workers and connects the MATLAB client to the parallel pool.
 Replace then the statement
by
to split the execution of forloop iterations over the workers in your parallel pool.
You are now able to run parallel simulations on your computer and might observe that parallel computing is faster than sequential computing for large numbers of trials !
Boost your performance with the Qarnot Cluster service
Now, let’s use Qarnot cluster to launch the distributed MonteCarlo simulation.

First, you’ll need to download the Qarnot Matlab SDK and store it in your MATLAB^{®} current folder. Then, download latest OpenVPN version.

To get access to the Qarnot cluster, you’ll need to have a Qarnot account. If you don’t already have an account, click here to register. Execute the following sets of commands on your MATLAB^{®} command window to run a new cluster after replacing with the token you obtained by registering at Qarnot computing.
After a few minutes, the command will return your OpenVPN Static key.

On a Linux system, open a new terminal window and execute the following command.
If you use a Windows machine, execute the following command in an admin shell.
You can also right click on the file untitled “matlabmdcsmycluster.jobs.qarnot.com.cmd” and choose “start as administrator”.

Your device may be already connected to the Qarnot Cluster. If you want to be sure that your MATLAB client session can access the cluster, open the Cluster Profile Manager by selecting Parallel > Manage Cluster Profiles and select validate.
You are now ready to perform computation on the Qarnot Cluster !
Optimizing parallel pool algorithms
Choice of the number of workers
The parforloop takes some time to start and make the code available to the workers. The more workers there are, the more time it takes to the workers to be ready to compute. That’s why, for small numbers of trials, using few workers is faster than using several workers. Thus, the optimal number of workers we should use depends on the number of trials we want to compute.
For each number of workers $w$, the computational time cost of $N$ trials can be approximated by
$$T_w(N) = a_w * N+T0_w$$
where $a_w$ and $T0_w$ depend only on the number of workers $w$. We can compute $a_w$ and $T0_w$ for each number of workers thanks to a linear regression. Minimizing the computational time will give us the optimal number of workers we should use for each number of paths.
Blocking
If we want to get a better estimation of the option value, we will have to repeat random sampling. The error decreases with the square root of the number of trials meaning that if we want to reduce the error by a factor 10, we need 100 times more points for the average. Thus it’s necessary to work on large numbers of trials which raises several issues.
 If you don’t split instructions into several blocks, the size of matrices will hit the computer memory limitations.
 If you split instructions into small blocks, it increases the time required for data transfer and reduces the effectiveness of the matrix calculation.
That’s why there is an optimal combination (Number of blocks, Number of paths per block) which minimizes the execution time.
Plotting the execution time for large numbers of paths as a function of the number of iterations per block gives us a minimum for 8000 iterations per block. Using 8 workers instead of one and a number of iterations per block of 8000 can reduce the computational time by more than 5 times.
Some Code
The optimal value of the number of iterations per block allows us to get the lowest computational time for each number of trials.
Next Steps
The previous article explains how to perform parallel computation to price a simple call option.
It’s possible to increase the MonteCarlo rate of convergence with lowdiscrepancy sequences ie quasirandom sequences such as the Halton or Sobol sequences. Nevertheless, the advantages of this method can breakdown for large dimension. That’s why, the parallelization of the MonteCarlo Method should be more useful when performed with highdimensional pathdependent options such as American or Barrier options.
Moreover, the optimization carried out previously has been done empirically. We didn’t explain how to theoretically determine the optimal number of workers and number of iterations.