Computing Power to the People

The Official Qarnot Blog

BlastN on Qarnot – documentation


by Tiphaine Bonniot - October 20, 2021 - Biotech

Introduction

The Basic Local Alignment Search Tool (BLAST) was initially an online, web-based tool for finding regions of similarity between biological sequences. The program compares nucleotide sequences to sequence databases and computes the statistical significance of the matches. Different specific tools exist for different types of sequencing data. In this article, we focus on the alignment of nucleotide sequences, and thus on the usage of BlastN. The test case uses BLAST 2.10.1.

Versions

Release year Version
2020 2.10.1

If you are interested in another version, please send us an email at qlab@qarnot.com.

Prerequisites

Before launching the case, please ensure that the following prerequisites have been met.

Test case

This test case is a simple example of using BLAST, and more specifically the BlastN tool, on Qarnot with the Python SDK. We will align a list of query DNA sequences against a list of reference DNA sequences. You can download the data containing two local sequences here. You need to unzip the archive before you can launch the computation on Qarnot. The headers of the two local sequence files are shown below:

 

NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
NM_005514.8 Homo sapiens major histocompatibility complex, class I, B (HLA-B), mRNA

Copy the following code into a shell script and place it in the dataset-blastn folder you unzipped previously. It contains instructions to:

  • Build a database based on a reference sequence (for more information, see BLAST documentation),
  • Align a sequence (query) from a FASTA file to the previously built database.

run_blastn.sh
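
The script body is not reproduced here. As a rough illustration of the two steps listed above, and not the original script, the Python sketch below assembles the corresponding BLAST+ command lines and renders them as a small shell script. The flags used (makeblastdb -in/-dbtype, blastn -query/-db/-out) are standard BLAST+ options; the exact options of the original run_blastn.sh are an assumption, and the file names are taken from the working directory described in this article.

```python
import shlex


def build_blastn_commands(reference="chr6.fna", query="hla-b.fsa",
                          output="results.out"):
    """Return the two BLAST+ invocations as argument lists.

    Assumption: the database is built in place from the reference FASTA
    file, so it keeps the reference file name as its database name.
    """
    # Step 1: build a nucleotide database from the reference sequence.
    make_db = ["makeblastdb", "-in", reference, "-dbtype", "nucl"]
    # Step 2: align the query against that database, writing to a file.
    align = ["blastn", "-query", query, "-db", reference, "-out", output]
    return [make_db, align]


def as_shell_script(commands):
    """Render argument lists as a POSIX shell script (stop on first error)."""
    lines = ["#!/bin/sh", "set -e"]
    lines += [shlex.join(cmd) for cmd in commands]
    return "\n".join(lines) + "\n"


print(as_shell_script(build_blastn_commands()))
```

Printing the script rather than running it keeps the sketch usable on a machine without BLAST+ installed; the printed text could be saved as a starting point and adapted as needed.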

Launching the case

Copy the following code into a Python script and save it in the same directory as the dataset-blastn folder.

blastn.py
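
The script body is not reproduced here. The sketch below is a hypothetical outline of what such a launcher might look like with the Qarnot Python SDK, not the original blastn.py. It assumes the generic docker-batch profile and the public ncbi/blast Docker image with a 2.10.1 tag; the task name and the blastn-in bucket name are also made up for illustration. Only the blastn-out bucket, the dataset-blastn folder, the output folder and the token placeholder come from this article.

```python
def task_settings():
    """Settings used by the sketch. Apart from 'blastn-out', all values
    are assumptions and may differ from the original script."""
    return {
        "task_name": "blastn-demo",      # hypothetical name
        "profile": "docker-batch",       # assumed profile
        "instance_count": 1,
        "input_bucket": "blastn-in",     # hypothetical name
        "output_bucket": "blastn-out",   # bucket named in this article
        "constants": {
            "DOCKER_REPO": "ncbi/blast",       # assumed image
            "DOCKER_TAG": "2.10.1",            # assumed tag
            "DOCKER_CMD": "sh run_blastn.sh",  # runs the script from the inputs
        },
    }


def submit_blastn(token, input_dir="dataset-blastn", output_dir="output"):
    """Upload the inputs, run the task and download the results."""
    # Imported here so the settings above can be inspected without the SDK.
    import qarnot

    cfg = task_settings()
    conn = qarnot.connection.Connection(client_token=token)
    task = conn.create_task(cfg["task_name"], cfg["profile"],
                            cfg["instance_count"])

    # Send the local dataset folder to an input bucket.
    input_bucket = conn.create_bucket(cfg["input_bucket"])
    input_bucket.sync_directory(input_dir)
    task.resources.append(input_bucket)

    # Collect everything the task writes into the output bucket.
    task.results = conn.create_bucket(cfg["output_bucket"])

    for key, value in cfg["constants"].items():
        task.constants[key] = value

    # Blocks until the task ends, then downloads results to output_dir.
    task.run(output_dir=output_dir)
    return task
```

Calling submit_blastn("<<<MY_SECRET_TOKEN>>>") with a real token in place of the placeholder would launch the task. Keeping the settings in a separate function makes it easy to review the assumed values before anything is sent to the platform.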

Make sure you have copied your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) so that the task can be launched on Qarnot.

Make sure that all the input files mentioned above (one .fna file, one .fsa file and one .sh file) are in the same folder, named dataset-blastn. Your working directory should look like this:

  • dataset-blastn/
    • chr6.fna : Homo sapiens chromosome 6
    • hla-b.fsa : Homo sapiens major histocompatibility complex
    • run_blastn.sh : script to run the alignment using BlastN
  • blastn.py : Python script to run the computation on Qarnot
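
Before launching, the layout above can be checked with a few lines of Python. This is an optional convenience sketch, not part of the original tutorial; the helper name missing_inputs is hypothetical, while the file names are the ones listed above.

```python
from pathlib import Path

# Input files expected inside the dataset folder, as listed above.
EXPECTED_FILES = ("chr6.fna", "hla-b.fsa", "run_blastn.sh")


def missing_inputs(folder="dataset-blastn", expected=EXPECTED_FILES):
    """Return the names of expected input files not found in folder."""
    root = Path(folder)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_inputs()
if missing:
    print("Missing input files:", ", ".join(missing))
else:
    print("All input files found.")
```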

To launch this script, open a terminal in your working directory and execute python3 blastn.py &.

Results

At any given time, you can monitor the status of your task on the Console.

Successful BlastN execution on Qarnot’s console

You should now have an output folder in your working directory on your computer and a blastn-out bucket on Qarnot’s Console. Both contain all the output files: the built database and the sequence alignment scores in the results.out file (see below).

results.out
 

BLASTN 2.10.1+

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.



Database: chr6.fna
           1 sequences; 170,805,979 total letters



Query= NM_005514.8 Homo sapiens major histocompatibility complex, class I,
B (HLA-B), mRNA

Length=1536
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly   784     0.0  


>NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
Length=170805979

 Score = 784 bits (424),  Expect = 0.0
 Identities = 424/424 (100%), Gaps = 0/424 (0%)
 Strand=Plus/Minus

Query  1113      AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  1172
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354298  AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  31354239

Query  1173      GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  1232
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354238  GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  31354179

Query  1233      TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  1292
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354178  TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  31354119

Query  1293      TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  1352
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354118  TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  31354059

Query  1353      GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  1412
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354058  GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  31353999

Query  1413      CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  1472
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353998  CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  31353939

Query  1473      GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  1532
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353938  GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  31353879

Query  1533      TCCA  1536
                 ||||
Sbjct  31353878  TCCA  31353875

...
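
In the first alignment shown above, the query positions 1113 to 1536 span 424 bases, which matches the Identities = 424/424 line. Strand=Plus/Minus means the hit lies on the reverse strand of the subject, which is why the Sbjct coordinates decrease. To pull these summary values out of results.out programmatically, a minimal parsing sketch could look like the following. It assumes the default pairwise text format shown above and is not a general BLAST parser; the function name parse_hsps is hypothetical.

```python
import re

# Patterns for the three summary lines that precede each alignment block.
SCORE = re.compile(
    r"Score = (?P<bits>[\d.]+) bits \((?P<raw>\d+)\),\s+Expect = (?P<evalue>\S+)")
IDENT = re.compile(
    r"Identities = (?P<match>\d+)/(?P<length>\d+) \(\d+%\), "
    r"Gaps = (?P<gaps>\d+)/\d+")
STRAND = re.compile(r"Strand=(?P<query>\w+)/(?P<subject>\w+)")


def parse_hsps(text):
    """Return one dict per alignment found in BLAST pairwise text output."""
    hsps = []
    for line in text.splitlines():
        if m := SCORE.search(line):
            # A Score line starts a new alignment.
            hsps.append({"bits": float(m["bits"]),
                         "raw_score": int(m["raw"]),
                         "evalue": float(m["evalue"])})
        elif hsps and (m := IDENT.search(line)):
            hsps[-1].update(identical=int(m["match"]),
                            length=int(m["length"]),
                            gaps=int(m["gaps"]))
        elif hsps and (m := STRAND.search(line)):
            hsps[-1]["strand"] = f'{m["query"]}/{m["subject"]}'
    return hsps


sample = """
 Score = 784 bits (424),  Expect = 0.0
 Identities = 424/424 (100%), Gaps = 0/424 (0%)
 Strand=Plus/Minus
"""
print(parse_hsps(sample))
```

For anything beyond a quick look, requesting a tabular output format from blastn is usually easier to parse than the pairwise text.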

Wrapping up

That’s it! If you have any questions, please contact qlab@qarnot.com and we will be happy to help. You can read a more detailed article on this use case here: Nucleotide sequence alignment with BlastN – Qarnot Blog, or discover another biotechnology use case on Qarnot here: Molecular docking and cloud computing – Qarnot Blog.
