Tiphaine Bonniot
HPC Innovation Engineer
Engineer at Qarnot, specialized in HPC, environmental footprint and operations research

BlastN on Qarnot Cloud - documentation

October 20, 2021 - Biotech, Documentation

Introduction

Basic Local Alignment Search Tool (BLAST) was originally a web-based tool for finding regions of similarity between biological sequences. The program compares nucleotide sequences to sequence databases and computes the statistical significance of the matches. Different tools exist for different types of sequence data. In this article, we focus on the alignment of nucleotide sequences and therefore on the usage of BlastN.

Versions

The test case uses BLAST 2.10.1.

Release year    Version
2020            2.10.1

If you are interested in another version, please send us an email at qlab@qarnot.com.

Prerequisites

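Before launching a computation task, please make sure you meet the requirements that the steps below rely on: a Qarnot account with its authentication token, and the Qarnot Python SDK installed on your computer.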

Test case

This test case is a simple example of BLAST use, and more particularly of the BlastN tool, on Qarnot Cloud, using the Python SDK. We will align a list of query DNA sequences against a list of reference DNA sequences. Download the dataset containing the two local sequence files, unzip them and place them both in a folder named dataset-blastn. The headers of the two sequence files are shown below:


NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
NM_005514.8 Homo sapiens major histocompatibility complex, class I, B (HLA-B), mRNA
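To check that the unzipped files are the expected ones, their headers can be read with a few lines of Python. This is only a minimal sketch: the helper name fasta_headers is ours, and it assumes the files are plain-text FASTA in which every header line starts with ">".

```python
from pathlib import Path


def fasta_headers(path):
    """Return the header lines of a FASTA file, without the leading '>'."""
    headers = []
    with open(path, encoding="utf-8") as handle:
        # Read line by line so that large files (a whole chromosome) stay cheap
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers


if __name__ == "__main__":
    # File names are the ones used in this article's dataset-blastn folder
    for name in ("chr6.fna", "hla-b.fsa"):
        fasta = Path("dataset-blastn") / name
        if fasta.exists():
            print(name, "->", fasta_headers(fasta))
        else:
            print(name, "not found in dataset-blastn/")
```

Each file should yield one of the two headers listed above.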

In that same dataset-blastn folder, create a run_blastn.sh file and copy the following code into it.

run_blastn.sh: https://raw.githubusercontent.com/qarnot/blog-samples/main/blastn/run_blastn.sh

This code contains instructions to:

  • Build a database based on a reference sequence (for more information, see the BLAST documentation),
  • Align a sequence (query) from a FASTA file to the previously built database.
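The two steps above can be sketched in Python as follows. This is not the content of run_blastn.sh, only an illustration of the same two stages using the standard BLAST+ command-line tools makeblastdb and blastn. The function names are ours, and naming the database after the reference file is an assumption made for simplicity.

```python
import subprocess


def makeblastdb_cmd(reference, db_name):
    """Command that builds a nucleotide BLAST database from a FASTA file."""
    return ["makeblastdb", "-in", reference, "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query, db_name, output):
    """Command that aligns the query FASTA file against the database."""
    return ["blastn", "-query", query, "-db", db_name, "-out", output]


def run_pipeline(reference="chr6.fna", query="hla-b.fsa", output="results.out"):
    """Run both steps in order; requires BLAST+ to be installed locally."""
    # Assumption: the database is simply named after the reference file
    steps = (
        makeblastdb_cmd(reference, reference),
        blastn_cmd(query, reference, output),
    )
    for cmd in steps:
        # check=True stops the pipeline if a step fails
        subprocess.run(cmd, check=True)
```

Calling run_pipeline() from inside the dataset-blastn folder would run both steps locally, which can be a convenient way to test on small inputs before launching on Qarnot.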

Launching the case

Copy the following code into a Python script and save it next to the dataset-blastn folder under the name blastn.py.

blastn.py: https://raw.githubusercontent.com/qarnot/blog-samples/main/blastn/blastnOnTasq.py

Make sure you have copied your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) so that the task can be launched on Qarnot. Also make sure that all the input files mentioned above (1 fna file, 1 fsa file, 1 sh file) are in the same folder, named dataset-blastn. Your working directory should look like this:

  • dataset-blastn/
    • chr6.fna : Homo sapiens chromosome 6
    • hla-b.fsa : Homo sapiens major histocompatibility complex
    • run_blastn.sh : script to run the alignment using BlastN
  • blastn.py : Python script to run the computation on Qarnot
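Before launching, the layout above can be verified with a short check. This is a minimal sketch: the helper name missing_files is ours, and the expected paths are simply the ones listed above.

```python
from pathlib import Path

# Paths expected in the working directory, as listed in this article
EXPECTED = [
    "dataset-blastn/chr6.fna",
    "dataset-blastn/hla-b.fsa",
    "dataset-blastn/run_blastn.sh",
    "blastn.py",
]


def missing_files(workdir="."):
    """Return the expected files that are absent from the working directory."""
    root = Path(workdir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Working directory is ready.")
```

An empty result means that all four files are in place.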

To launch this script, open a terminal in your working directory and execute python3 blastn.py.
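For readers who want an idea of what such a launch script does, here is a rough sketch. It is not the blastnOnTasq.py script linked above. The SDK calls follow the Qarnot Python SDK as we understand it and should be checked against the SDK documentation. The Docker image name, the input bucket name and the task name are hypothetical placeholders; the output bucket, the output folder and the input folder are the ones named in this article.

```python
def task_settings(token):
    """Gather the task parameters in one place (no network access here)."""
    return {
        "token": token,
        "task_name": "blastn-demo",              # hypothetical name
        "input_dir": "dataset-blastn",           # folder used in this article
        "input_bucket": "blastn-demo-input",     # hypothetical name
        "output_bucket": "blastn-demo-output",   # bucket named in this article
        "output_dir": "output",                  # local folder named in this article
        "docker_repo": "my-registry/blast",      # hypothetical image, replace it
        "docker_tag": "2.10.1",                  # BLAST version used in this article
        "command": "bash run_blastn.sh",
    }


def launch(settings):
    """Submit the task and wait for it; needs the qarnot package and a valid token."""
    import qarnot  # third-party SDK, imported lazily so the rest runs without it

    conn = qarnot.connection.Connection(client_token=settings["token"])

    # Upload the local input folder to an input bucket
    input_bucket = conn.create_bucket(settings["input_bucket"])
    input_bucket.sync_directory(settings["input_dir"])

    # One instance running the chosen Docker image
    task = conn.create_task(settings["task_name"], "docker-batch", 1)
    task.resources.append(input_bucket)
    task.results = conn.create_bucket(settings["output_bucket"])
    task.constants["DOCKER_REPO"] = settings["docker_repo"]
    task.constants["DOCKER_TAG"] = settings["docker_tag"]
    task.constants["DOCKER_CMD"] = settings["command"]

    # Blocks until the task ends, then downloads the results locally
    task.run(output_dir=settings["output_dir"])
```

Keeping the settings separate from the launch call makes it easy to review the bucket and image names before anything is sent to Qarnot.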

Results

At any given time, you can monitor the status of your task on Tasq.

 

Successful BlastN execution on Tasq

You should now have an output folder in your working directory on your computer and a blastn-demo-output bucket on Tasq containing all output files (the built database and the sequence alignment scores in the results.out file, see below).

results.out


BLASTN 2.10.1+

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: chr6.fna
           1 sequences; 170,805,979 total letters



Query= NM_005514.8 Homo sapiens major histocompatibility complex, class I,
B (HLA-B), mRNA

Length=1536
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly   784     0.0  


>NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
Length=170805979

 Score = 784 bits (424),  Expect = 0.0
 Identities = 424/424 (100%), Gaps = 0/424 (0%)
 Strand=Plus/Minus

Query  1113      AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  1172
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354298  AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  31354239

Query  1173      GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  1232
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354238  GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  31354179

Query  1233      TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  1292
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354178  TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  31354119

Query  1293      TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  1352
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354118  TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  31354059

Query  1353      GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  1412
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354058  GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  31353999

Query  1413      CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  1472
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353998  CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  31353939

Query  1473      GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  1532
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353938  GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  31353879

Query  1533      TCCA  1536
                 ||||
Sbjct  31353878  TCCA  31353875

...
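The summary lines of each alignment (score, E-value, identities, gaps, strand) can be extracted from results.out with a regular expression. This is a minimal sketch that assumes the default pairwise text format shown above; the function name parse_hsps is ours, and the sample text is copied from the output above.

```python
import re

# Summary lines copied from the results.out excerpt in this article
SAMPLE = """\
 Score = 784 bits (424),  Expect = 0.0
 Identities = 424/424 (100%), Gaps = 0/424 (0%)
 Strand=Plus/Minus
"""

HSP_PATTERN = re.compile(
    r"Score = (?P<bits>[\d.]+) bits \((?P<raw>\d+)\),\s+Expect = (?P<evalue>\S+)\s+"
    r"Identities = (?P<identities>\d+)/(?P<length>\d+) \(\d+%\), "
    r"Gaps = (?P<gaps>\d+)/\d+ \(\d+%\)\s+"
    r"Strand=(?P<strand>\S+)"
)


def parse_hsps(text):
    """Return one dict per alignment summary found in the BLAST text output."""
    hsps = []
    for match in HSP_PATTERN.finditer(text):
        hsps.append({
            "bits": float(match["bits"]),
            "raw_score": int(match["raw"]),
            "evalue": float(match["evalue"]),
            "identities": int(match["identities"]),
            "length": int(match["length"]),
            "gaps": int(match["gaps"]),
            "strand": match["strand"],
        })
    return hsps


if __name__ == "__main__":
    for hsp in parse_hsps(SAMPLE):
        print(hsp)
```

To process a real run, pass the content of the results.out file from the output folder to parse_hsps instead of SAMPLE.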

Wrapping up

That’s it! If you have any questions, please contact qlab@qarnot.com and it will be our pleasure to help you!
You can read a more detailed article on this use case: Nucleotide sequence alignment with BlastN – Qarnot Blog, or discover another biotechnology use case on Qarnot: Molecular docking and cloud computing – Qarnot Blog.
