Bioinformatics Questions and Answers Part-20

1. The BLAST program was developed in _______
a) 1992
b) 1995
c) 1990
d) 1991

Answer: c
Explanation: The BLAST program was developed by Stephen Altschul of NCBI in 1990 and has since become one of the most popular programs for sequence analysis. BLAST uses heuristics to align a query sequence with all sequences in a database.

2. In sequence alignment by BLAST, the second step is to search a sequence database for the occurrence of these words.
a) true
b) false

Answer: a
Explanation: This step identifies database sequences that contain the matching words. Each word match is scored with a given substitution matrix, and a word is considered a match only if its score is above a threshold.
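
The word-matching step can be sketched in a few lines of Python. This is a minimal illustration, not BLAST's actual implementation: the `word_score` scoring (+4 for identity, -1 otherwise) is a hypothetical stand-in for a real substitution matrix, the threshold value is chosen only for the example, a hit is counted when the score reaches the threshold, and the names `word_score` and `find_word_hits` are invented here.

```python
def word_score(a, b, match=4, mismatch=-1):
    """Score two equal-length words position by position.

    Assumption: a toy identity scoring stands in for a real
    substitution matrix.
    """
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def find_word_hits(query, subject, k=3, threshold=7):
    """Return (query_pos, subject_pos, score) for every pair of
    length-k words whose score reaches the threshold."""
    hits = []
    for i in range(len(query) - k + 1):
        q_word = query[i:i + k]
        for j in range(len(subject) - k + 1):
            score = word_score(q_word, subject[j:j + k])
            if score >= threshold:
                hits.append((i, j, score))
    return hits


hits = find_word_hits("MKVL", "AKVLT")
print(hits)
```

With these toy values, a three-residue word qualifies when at least two of its three positions are identical. A real search would index the words rather than compare every pair, which is where much of the heuristic speed-up comes from.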

3. In sequence alignment by BLAST, each word from query sequence is typically _______ residues for protein sequences and _______ residues for DNA sequences.
a) ten, eleven
b) three, three
c) three, eleven
d) three, ten

Answer: c
Explanation: The first step is to create a list of words from the query sequence. Each word is typically three residues for protein sequences and eleven residues for DNA sequences. The list includes every possible word extracted from the query sequence. This step is also called seeding.
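
As a rough sketch of seeding, the word list is simply every overlapping window of the chosen length. The function name `make_words` and the example sequences are made up for illustration.

```python
def make_words(query, k):
    """Return every overlapping word of length k in the query."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


# Word length 3 for a protein query, 11 for a DNA query.
protein_words = make_words("MKVLAT", 3)
dna_words = make_words("ACGTACGTACGTAC", 11)

print(protein_words)
print(len(dna_words))
```

A query of length n yields n - k + 1 words, so a six-residue protein gives four words of length three.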

4. The final step involves pairwise alignment by extending from the words in both directions while counting the alignment score using the same substitution matrix.
a) true
b) false

Answer: a
Explanation: The extension continues until the score of the alignment drops below a threshold due to mismatches (the drop threshold is twenty-two for proteins and twenty for DNA). The resulting contiguous aligned segment pair without gaps is called a high-scoring segment pair (HSP). In the original version of BLAST, the highest-scoring HSPs are presented as the final report. They are also called maximum scoring pairs.
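
The ungapped extension can be sketched as follows. This is an illustrative reading, not BLAST's source code: it assumes the drop rule means "stop once the running score has fallen more than `drop` below the best score seen so far", it uses a toy +5/-4 scoring in place of a substitution matrix, and the function names are hypothetical.

```python
def extend_one_direction(query, subject, q_start, s_start, step, score_fn, drop):
    """Extend without gaps from (q_start, s_start), moving by step
    (+1 or -1). Returns (best score gain, length of best extension)."""
    best = running = 0
    best_len = 0
    i, j, n = q_start, s_start, 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        running += score_fn(query[i], subject[j])
        n += 1
        if running > best:
            best, best_len = running, n
        elif best - running > drop:
            break  # score has dropped too far; stop extending
        i += step
        j += step
    return best, best_len


def extend_seed(query, subject, q_pos, s_pos, k, score_fn, drop):
    """Grow a length-k word hit in both directions into an ungapped
    segment pair. Returns (score, query_segment, subject_segment)."""
    seed = sum(score_fn(query[q_pos + t], subject[s_pos + t]) for t in range(k))
    right, r_len = extend_one_direction(
        query, subject, q_pos + k, s_pos + k, +1, score_fn, drop)
    left, l_len = extend_one_direction(
        query, subject, q_pos - 1, s_pos - 1, -1, score_fn, drop)
    length = l_len + k + r_len
    q_lo, s_lo = q_pos - l_len, s_pos - l_len
    return (seed + left + right,
            query[q_lo:q_lo + length],
            subject[s_lo:s_lo + length])


def toy_score(x, y):
    # Assumption: simple match/mismatch scoring for the example only.
    return 5 if x == y else -4


hsp = extend_seed("TTACGTACGGA", "CCACGTACGTC", 3, 3, 4, toy_score, drop=8)
print(hsp)
```

The extension is trimmed back to the point where the score peaked, so trailing mismatches are not included in the reported segment pair.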

5. A recent improvement in the implementation of BLAST is the ability to provide gapped alignment.
a) true
b) false

Answer: a
Explanation: In gapped BLAST, the highest-scoring segment is chosen to be extended in both directions using dynamic programming, where gaps may be introduced. The extension continues as long as the alignment score stays above a certain threshold; otherwise it is terminated. However, the overall score may dip below the threshold temporarily, provided it rises again to above-threshold values. Final trimming of terminal regions is needed before producing a report of the final alignment.
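
To illustrate how dynamic programming lets gaps enter an alignment, here is a small score-only local alignment in the style of Smith-Waterman. It is a swapped-in teaching example and not the gapped BLAST procedure: it fills the whole matrix instead of extending outward from a seed with a score threshold, and the match, mismatch, and linear gap values are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=5, mismatch=-4, gap=-6):
    """Best local alignment score between a and b, gaps allowed.

    Assumption: toy match/mismatch scores and a linear gap penalty.
    Only two rows of the matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 restarts the alignment; the other terms extend it
            # with a substitution or a gap in either sequence.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGTTACGT", "ACGTACGT"))
```

In this example the best ungapped segment scores 25, while allowing a single gap joins two four-base matches for a score of 34, which is the kind of gain gapped extension is meant to capture.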

6. BLASTX uses protein sequences as queries to search against a protein sequence database.
a) true
b) false

Answer: b
Explanation: BLASTP, and not BLASTX, uses protein sequences as queries to search against a protein sequence database. BLASTX uses nucleotide sequences as queries and translates them in all six reading frames to produce translated protein sequences, which are used to query a protein sequence database.
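
The six-frame translation that BLASTX applies to a nucleotide query can be sketched like this. The sketch assumes the standard genetic code and an unambiguous uppercase or lowercase ACGT sequence; the function names and the frame labels (+1 to +3, -1 to -3) are conventions chosen for the example.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate three forward and three reverse-complement frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for name, protein in six_frame_translations("ATGGCCTAA").items():
    print(name, protein)
```

Each of the six translated sequences would then be compared against the protein database, which is why translated searches cost more than a plain BLASTP run.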

7. Which of the following is not a variant of BLAST?
a) BLASTN
b) BLASTP
c) BLASTX
d) TBLASTNX

Answer: d
Explanation: BLAST is a family of programs that includes BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX; there is no program called TBLASTNX. BLASTN queries nucleotide sequences with a nucleotide sequence database. For the programs that compare sequences at the protein level, the alignment scoring is based by default on the BLOSUM62 matrix.

8. TBLASTX queries protein sequences against a nucleotide sequence database with the sequences translated in all six reading frames.
a) true
b) false

Answer: b
Explanation: It is TBLASTN, not TBLASTX, that queries protein sequences against a nucleotide sequence database with the database sequences translated in all six reading frames. TBLASTX uses nucleotide sequences, which are translated in all six frames, to search against a nucleotide sequence database that has all the sequences translated in six frames. In addition, there is also a bl2seq program that performs local alignment of two user-provided input sequences. The graphical output includes horizontal bars and a diagonal in a two-dimensional diagram showing the overall extent of matching between the two sequences.
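
The five programs differ only in the type of query and the type of database, which can be summarized in a small lookup table. This is a study aid rather than any part of BLAST itself; the type labels and the `choose_program` helper are hypothetical.

```python
# (query type, database type) -> program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "BLASTN",
    ("protein", "protein"): "BLASTP",
    ("translated nucleotide", "protein"): "BLASTX",
    ("protein", "translated nucleotide"): "TBLASTN",
    ("translated nucleotide", "translated nucleotide"): "TBLASTX",
}


def choose_program(query_type, database_type):
    """Look up the BLAST program for a query/database combination."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"no BLAST program for {query_type!r} vs {database_type!r}"
        ) from None


print(choose_program("protein", "translated nucleotide"))
```

Here "translated nucleotide" means a nucleotide sequence that the program translates in all six reading frames before comparison.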

9. If one is looking for protein homologs encoded in newly sequenced genomes, one may use TBLASTN, which translates nucleotide database sequences in all six reading frames.
a) true
b) false

Answer: a
Explanation: This may help to identify protein coding genes that have not yet been annotated. If a DNA sequence is to be used as the query, a protein-level comparison can be done with TBLASTX. However, both programs are very computationally intensive and the search process can be very slow.

10. Which of the following is not correct about BLAST?
a) The BLAST web server has been designed in such a way as to simplify the task of program selection
b) The programs are organized based on the type of query sequences
c) The programs are grouped by whether the query is a protein sequence, a nucleotide sequence, or a nucleotide sequence to be translated
d) BLAST is not based on heuristic searching methods

Answer: d
Explanation: BLAST and FASTA are based on heuristic searching methods. In addition, programs for special purposes are grouped separately; for example, bl2seq, immunoglobulin BLAST, and VecScreen, a program for removing contaminating vector sequences.