Rémi Bouzel
Scientific Director
HPC platform
Launch compute tasks in a few lines of code or a few clicks on Tasq, our HPC platform.

Molecular docking and cloud computing

December 3, 2019 - Biotech

 

In this article, we will go through a general description of molecular docking with a few examples, and see how we can run such algorithms on Qarnot platform.

What is molecular docking?

Molecular docking is a method to predict the binding of two molecules:

  • Will they bind? For this to happen, the complex formed by the two molecules must have an energy lower than the energy of both molecules separately. A docking software will look for energy minimums of binding complexes.
  • What shape will have the molecular complex once they are bound?

Why is it important?

It is essential in drug design: a lab will analyze the binding of thousands of molecules (called ligands, usually small molecules) with another molecule (called the receptor, usually a protein, with thousands of atoms) and check the best matching molecule, and its configuration. Let’s take a practical example with enzymes: enzymes accelerate chemical reaction when they bind to a specific substrate. To prevent this to happen, a drug can be designed to modify the shape of the enzyme, and therefore prevent the binding of the substrate.

A famous example is the penicillin (used in antibiotics).  A bacteria is surrounded by a cell wall. The penicillin binds to the bacteria molecule in charge of maintaining this wall (called DD-transpeptidase), modifies it, and therefore prevents it from building the wall. After some time, the bacteria finally bursts and dies.

Why do we need High Performance Computing to solve this kind of problems?

  • A rough computation answers this question. Chemists usually have a good idea of the active zone to check on the protein, i.e. in our penicillin example, where our drug should locate to inhibit the DD-transpeptidase. A general estimation of this zone would be a cube with a 50A° side (1A° = 0,1nm). If we consider a granularity of 1A° (i.e. we check the molecule configuration for every Angstrom), we have to study $50^3 approx 10^5$ positions. The identification of this pocket is critical in the pipeline as it can drastically reduce the number of configurations to check.
  • Each of these positions can rotate in 3 dimensions. If we consider 100 ranges for each rotation (each range representing a 3,6° rotation), we now have much more options: $10^5 * 100^3 = 10^{11}$ configurations (position + orientation)
  • A molecule can also have local rotations called torsions (a piece of the molecule can rotate compared to another). If we consider we have 10 of them, and 20 ranges for each torsion (each range representing a 18° rotation), we multiply again the number of possibilities: $10^{11} * 20^{10} approx 10^{24}$ (position + orientation + torsion)
  • Finally, the team working on this project has 1000 ligands they’d like to test, so we now reach $10^{27}$ possibilities. Going through each of them is extremely long, it won’t run on a normal computer. To solve this issue, two solutions need to be used simultaneously:
    • Optimization: we won’t go through each of these $10^{27}$ solutions one by one, instead we use specific algorithms to decrease the number of solutions to study
    • HPC: Most of these computations are still too heavy to run on regular computers, the team working on drug design will use servers with more capacity to go faster and parallelize the computation

Optimization

  • If every configuration takes 1us to run, it will take $10^{21}$ seconds to go through all of them sequentially, which represents billions of years. Thanks to smart optimization, this time can be reduced down to a few hours (around 3 minutes per ligand)
  • Different softwares use different algorithms. For our example, we will use Autodock Vina as it is one of the most robust open-source docking softwares
  • 2 different algorithms are involved to find the best docking position:
    • In order to evaluate the fitness of a potential solution, we need a scoring algorithm that will associate a solution (the complex ligand-protein with a specific position, orientation and conformation of the ligand) with its energy.
    • The search algorithm reduces drastically the number of configurations to explore and still has a very high probability of finding the best configuration. There are various ways to do so, but let’s go through the one used by Autodock Vina: the Iterate local Search global optimizer. Autodock vina performs several tasks following the same steps:
      • Step 1: generation of a random starting point for the ligand: orientation (3 dimensions), rotation (3 dimensions) and torsion (if the  ligand has 10 rotatable bonds, 10 dimensions)
      • Step 2:
        • Mutation: one of the argument’s value (position, orientation, torsion) is randomly changed
        • Local Optimization: we check the energy of this “mutated” conformation, and do a local optimization: small modification of the arguments (position, orientation, torsion) in order to reach a local energy minimum.
        • After a mutation and a local optimization, we reach a local minimum of energy, but as we look for the global minimum, we go through the same step several times
      • Step 3:
        • Mutation
        • Local Optimization
      • Step ...
      • Step n: we have our best guess for the global minimum, for this task

 

Two comments on this algorithm:

  • With Autodock Vina, the number of steps is automatically defined by the software, depending on the complexity of the problem
  • Because step 1 is a random choice, tasks can be parallelized. Let’s see how to run a simple example with Qarnot platform

How to run this computation with Qarnot?

The following Vina configuration file can be used to set this computation:

 

Here we specify the receptor, the ligand, the center and size of the grid, and the number of cores  on which the computation will run. We see below the protein and ligand we chose for our example. We also displayed the Gaussian surface of the protein to get a better idea.

These visualizations and the ones below were obtained thanks to SAMSON visualization tool. You can then run the following python script on Qarnot’s sustainable HPC platform: it will create buckets to gather the inputs (configuration files, and the two molecules), and will launch the task to one of our servers. Once the task is done, it will put its result in the output bucket.

You can see in the following picture the complex assembled: the protein (Gaussian surface) and the ligand (stick and ball display). We see here that the energy minimum configuration for the ligand is almost nested in the protein.

If you want to run this example by yourself:

  • Create a Qarnot Account.
  • Copy your client token (find it here) and paste it in the run.py script.
  • On your computer, in your working directory, follow these steps to set up a Python virtual environment
  • Copy the configuration file (config.txt) and the following molecules files (*.pdbqt) in a directory called resources_vina_qarnot, located at the same path as your python script
  • Execute the run.py (you might need to allow run.py to execute by typing "chmod +x run.py")
  • A task will appear on Tasq, when it finishes, download your results from the output bucket

Molecule 1:

Molecule 2:

 

These molecules come from the Autodock Vina tutorial   We hope you enjoyed this tutorial! Should you have any question(s) or if you wish to use our platform for heavier computations (we can provide top of the art resources on demand), don’t hesitate to contact us.

Share on networks