Introduction
Basic Local Alignment Search Tool (BLAST) was originally an online, web-based tool for finding regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and computes the statistical significance of the matches. BLAST provides different programs depending on the type of sequence data (for example, BlastN for nucleotide sequences and BlastP for protein sequences). In this article, we focus on the alignment of nucleotide sequences, and therefore on BlastN.
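To give an idea of how statistical significance is expressed, BLAST reports an E-value: the number of alignments at least as good that would be expected by chance in a search of that size. A common approximation relates it to the bit score S' by E ≈ m × n × 2^(−S'), where m is the query length and n the total database length. The Python sketch below applies this relation to the numbers from the results.out example later in this article. The helper name approx_evalue is ours, and BLAST+ itself uses edge-corrected (effective) lengths, so its exact values differ.

```python
def approx_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score using E = m * n * 2**(-S')."""
    # m * n is the search space: query length times total database length.
    # BLAST+ applies edge corrections (effective lengths), so this is only
    # an approximation of the value printed in the report.
    return query_len * db_len * 2.0 ** (-bit_score)


# Numbers from the results.out example in this article:
# query length 1536, database length 170,805,979 letters, bit score 784.
evalue = approx_evalue(784, 1536, 170_805_979)
print(f"Approximate E-value: {evalue:.1e}")
```

The result is on the order of 10^-225, far below any practical threshold, which is consistent with the Expect = 0.0 shown in the report.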
Versions
The test case uses BLAST 2.10.1.
Release year | Version |
---|---|
2020 | 2.10.1 |
If you are interested in another version, please send us an email at qlab@qarnot.com.
Prerequisites
Before launching the case, please ensure that the following prerequisites have been met.
- Create an account (here)
- Retrieve your authentication token (here)
- Install one of Qarnot’s SDKs: Python, Node.js, C# or the command line interface (CLI)
Test case
This test case is a simple example of using BLAST, and more specifically the BlastN tool, on Qarnot Cloud with the Python SDK. We will align query DNA sequences against reference DNA sequences (here, each file contains one sequence). You can download the data containing the two local sequence files here. Unzip the archive before launching the computation on Qarnot. The headers of the two local sequence files are shown below:
NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
NM_005514.8 Homo sapiens major histocompatibility complex, class I, B (HLA-B), mRNA
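These headers are the FASTA description lines of each file, that is, the lines starting with ">". As a minimal sketch, assuming the unzipped dataset-blastn folder is in the current directory, they can be printed with a few lines of Python; fasta_headers is a hypothetical helper name.

```python
from pathlib import Path
from typing import Iterable, List


def fasta_headers(lines: Iterable[str]) -> List[str]:
    """Return the FASTA header lines (those starting with '>'), without the '>'."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


# File names from the dataset-blastn folder used in this article.
for name in ("chr6.fna", "hla-b.fsa"):
    path = Path("dataset-blastn") / name
    if path.exists():
        # Iterating over the open file streams it line by line, so the
        # large chromosome file is never loaded into memory at once.
        with open(path, encoding="utf-8") as handle:
            print(f"{name}: {fasta_headers(handle)}")
```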
Copy the following code into a shell script named run_blastn.sh and place it in the dataset-blastn folder you unzipped previously. It contains instructions to:
- Build a database based on a reference sequence (for more information, see BLAST documentation),
- Align a sequence (query) from a FASTA file to the previously built database.
run_blastn.sh
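As an illustration of the two steps listed above (not a copy of run_blastn.sh), they map onto the BLAST+ command-line programs makeblastdb and blastn. The Python sketch below builds those two command lines and runs them with subprocess only when BLAST+ and the input files are available locally. The helper name blast_commands and the exact options are assumptions and may differ from the original script.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Tuple


def blast_commands(reference: str, query: str, report: str) -> Tuple[List[str], List[str]]:
    """Return the makeblastdb and blastn command lines for one reference/query pair."""
    # Step 1: index the reference FASTA as a nucleotide database named after it.
    makeblastdb = ["makeblastdb", "-in", reference, "-dbtype", "nucl", "-out", reference]
    # Step 2: align the query FASTA against that database; blastn's default
    # output is the pairwise text report.
    blastn = ["blastn", "-query", query, "-db", reference, "-out", report]
    return makeblastdb, blastn


# Hypothetical local run, using the file names from this article.
commands = blast_commands("chr6.fna", "hla-b.fsa", "results.out")
tools_found = all(shutil.which(tool) for tool in ("makeblastdb", "blastn"))
inputs_found = Path("chr6.fna").exists() and Path("hla-b.fsa").exists()

if tools_found and inputs_found:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
else:
    # Without BLAST+ or the input files, just show what would be executed.
    for cmd in commands:
        print(" ".join(cmd))
```

Naming the database after the reference file keeps the blastn -db argument identical to the FASTA name, which is consistent with the "Database: chr6.fna" line of the results.out report.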
Launching the case
Copy the following code into a Python script named blastn.py and save it next to the dataset-blastn folder (in the same parent directory).
blastn.py
Be sure to paste your authentication token into the script (in place of <<<MY_SECRET_TOKEN>>>) so that the task can be launched on Qarnot. Make sure that all the input files mentioned above (one .fna file, one .fsa file and one .sh file) are in the same folder, named dataset-blastn. Your working directory should look like this:
dataset-blastn/
    chr6.fna: Homo sapiens chromosome 6
    hla-b.fsa: Homo sapiens major histocompatibility complex
    run_blastn.sh: script to run the alignment using BlastN
blastn.py: Python script to run the computation on Qarnot
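Before launching, you can optionally check this layout. The following sketch uses a hypothetical helper that only looks at file extensions: it verifies that dataset-blastn contains exactly one .fna, one .fsa and one .sh file, as required above.

```python
from pathlib import Path
from typing import Iterable, List

# Expected content of dataset-blastn according to this article:
# one reference (.fna), one query (.fsa) and one run script (.sh).
EXPECTED_SUFFIXES = (".fna", ".fsa", ".sh")


def check_inputs(file_names: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the folder looks ready."""
    names = list(file_names)
    problems = []
    for suffix in EXPECTED_SUFFIXES:
        count = sum(1 for name in names if name.endswith(suffix))
        if count != 1:
            problems.append(f"expected 1 {suffix} file, found {count}")
    return problems


folder = Path("dataset-blastn")
if folder.is_dir():
    issues = check_inputs(p.name for p in folder.iterdir() if p.is_file())
    print("dataset-blastn looks ready" if not issues else "; ".join(issues))
else:
    print(f"No dataset-blastn folder in {Path.cwd()}")
```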
To launch the script, open a terminal in your working directory and run python3 blastn.py & (the trailing & runs it in the background).
Results
At any given time, you can monitor the status of your task on Tasq.
Successful BlastN execution on Tasq
You should now have an output folder in your working directory on your computer and a blastn-out bucket on Tasq containing all output files (the built database and the alignment scores in the results.out file, shown below).
results.out
BLASTN 2.10.1+
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: chr6.fna
1 sequences; 170,805,979 total letters
Query= NM_005514.8 Homo sapiens major histocompatibility complex, class I,
B (HLA-B), mRNA
Length=1536
                                                                     Score    E
Sequences producing significant alignments:                          (Bits)   Value
NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly  784      0.0
>NC_000006.12 Homo sapiens chromosome 6, GRCh38.p13 Primary Assembly
Length=170805979
Score = 784 bits (424), Expect = 0.0
Identities = 424/424 (100%), Gaps = 0/424 (0%)
Strand=Plus/Minus

Query  1113      AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  1172
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354298  AGCCTGAGACAGCTGTCTTGTGAGGGACTGAGATGCAGGATTTCTTCACGCCTCCCCTTT  31354239

Query  1173      GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  1232
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354238  GTGACTTCAAGAGCCTCTGGCATCTCTTTCTGCAAAGGCACCTGAATGTGTCTGCGTCCC  31354179

Query  1233      TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  1292
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354178  TGTTAGCATAATGTGAGGAGGTGGAGAGACAGCCCACCCTTGTGTCCACTGTGACCCCTG  31354119

Query  1293      TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  1352
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354118  TTCCCATGCTGACCTGTGTTTCCTCCCCAGTCATCTTTCTTGTTCCAGAGAGGTGGGGCT  31354059

Query  1353      GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  1412
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31354058  GGATGTCTCCATCTCTGTCTCAACTTTACGTGCACTGAGCTGCAACTTCTTACTTCCCTA  31353999

Query  1413      CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  1472
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353998  CTGAAAATAAGAATCTGAATATAAATTTGTTTTCTCAAATATTTGCTATGAGAGGTTGAT  31353939

Query  1473      GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  1532
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  31353938  GGATTAATTAAATAAGTCAATTCCTGGAATTTGAGAGAGCAAATAAAGACCTGAGAACCT  31353879

Query  1533      TCCA  1536
                 ||||
Sbjct  31353878  TCCA  31353875
...
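To reuse these results programmatically, the summary lines of each alignment (Score, Expect, Identities, Strand) can be extracted from the pairwise report with regular expressions. The sketch below is a minimal parser for this default text format only; it assumes results.out was downloaded into the output folder, and parse_hsps is a hypothetical name. For larger analyses, BLAST's tabular output (-outfmt 6) is easier to parse than the pairwise report.

```python
import re
from pathlib import Path
from typing import Dict, List

# Patterns for the per-alignment summary lines of a default (pairwise) BLAST report.
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\),\s*Expect(?:\(\d+\))?\s*=\s*([^\s,]+)"
)
IDENT_RE = re.compile(r"Identities\s*=\s*(\d+)/(\d+)")
STRAND_RE = re.compile(r"Strand\s*=\s*(\w+)/(\w+)")


def parse_hsps(report: str) -> List[Dict]:
    """Collect bit score, raw score, E-value, identity and strand for each alignment."""
    hsps: List[Dict] = []
    for line in report.splitlines():
        score = SCORE_RE.search(line)
        if score:
            evalue = score.group(3)
            # Some BLAST versions omit the mantissa of tiny values, e.g. "e-180".
            if evalue.startswith("e"):
                evalue = "1" + evalue
            hsps.append({"bits": float(score.group(1)),
                         "raw": int(score.group(2)),
                         "evalue": float(evalue)})
            continue
        ident = IDENT_RE.search(line)
        if ident and hsps:
            # Fraction of identical positions in the aligned region.
            hsps[-1]["identity"] = int(ident.group(1)) / int(ident.group(2))
            continue
        strand = STRAND_RE.search(line)
        if strand and hsps:
            hsps[-1]["strand"] = (strand.group(1), strand.group(2))
    return hsps


# Assumed location of the downloaded report (the output folder described above).
report_path = Path("output") / "results.out"
if report_path.exists():
    for hsp in parse_hsps(report_path.read_text(encoding="utf-8")):
        print(hsp)
```

Here, Strand=Plus/Minus means that the query aligns with the reverse complement of chromosome 6, which is why the Sbjct coordinates decrease from one line to the next.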
Wrapping up
That’s it! If you have any questions, please contact qlab@qarnot.com and we will be happy to help. You can read a more detailed article on this use case here: Nucleotide sequence alignment with BlastN – Qarnot Blog, or discover another biotechnology use case on Qarnot here: Molecular docking and cloud computing – Qarnot Blog.