The BioPerl-run package (a separate installation, available from the main BioPerl site) provides wrappers around many of the more useful bioinformatics tools. These include both programs installed on the local system and remote, internet-based web services. We will look at two of the tools this package provides.
Blast
There are two ways to run the BLAST suite of sequence analysis programs from BioPerl: remotely at NCBI, or locally on your own machine.
Bio::Tools::Run::RemoteBlast
You can run BLAST through the web service that NCBI makes available over the internet. This module uses the NCBI URL-based BLAST API:
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/remote_blastdblist.html
The following script takes a FASTA file, a BLAST program name, a database, and an E-value cutoff as arguments. It submits the query to NCBI and prints the hit names and HSP scores it retrieves.
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::Run::RemoteBlast;

my $usage = "usage: run_remote_blast.pl fasta_file blast_program blast_db e_value\n\n"
          . "  e_value can be written as 1 or 1e-X (for example 1e-10)\n";

my $fasta_file = shift or die $usage;   # e.g. nc.fa
my $prog       = shift or die $usage;   # e.g. 'blastn'
my $db         = shift or die $usage;   # e.g. 'nr'
my $e_val      = shift or die $usage;   # e.g. 1 or '1e-10'

my @params = ( '-prog'       => $prog,
               '-data'       => $db,
               '-expect'     => $e_val,
               '-readmethod' => 'xml' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

### THIS IS IMPORTANT
### SearchIO currently cannot parse the normal BLAST text output, because NCBI changes it too frequently.
### Their XML output is more stable, and should be used instead of the default.
### This next line changes the retrieval type to XML.
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';

### Uncomment this to see the warnings the BioPerl module generates
### during the entire process
#$factory->verbose(1);

# Submit the query; NCBI assigns a request ID (RID) to each submitted search
$factory->submit_blast($fasta_file);

# Poll NCBI until every RID has either been retrieved or has failed
while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if ( !ref($rc) ) {
            # No report object yet: a negative value means an error, otherwise the search is still running
            if ( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            sleep 5;
        } else {
            # $rc is a Bio::SearchIO object holding the finished report
            my $result = $rc->next_result();
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                print "\thit name is ", $hit->name, "\n";
                while ( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}
Bio::Tools::Run::StandAloneBlast
Some users install BLAST locally, along with some or all of the sequence databases (such as those derived from GenBank) that it searches against. This module lets you run the standalone BLAST programs from within a Perl program.
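The following is a minimal sketch of a local counterpart to the remote script above. It assumes the legacy NCBI blastall executables are installed and on your PATH, and that the database you name has already been formatted with formatdb. It also assumes that StandAloneBlast accepts -program and -database along with blastall's single-letter switches (here -e for the E-value cutoff), and that blastall() returns a Bio::SearchIO object. The local_blast_params and run_local_blast helpers are hypothetical names used only for illustration. Newer BioPerl-run releases drive BLAST+ through a separate module, Bio::Tools::Run::StandAloneBlastPlus, which has a different interface.

```perl
#!/usr/bin/perl
# Sketch: run a local BLAST search with Bio::Tools::Run::StandAloneBlast.
# Assumption: legacy blastall is installed and the database is formatted.
use strict;
use warnings;

# Hypothetical helper: build the factory arguments.
# Assumption: -program/-database are accepted, and -e is blastall's E-value switch.
sub local_blast_params {
    my ($program, $db, $e_val) = @_;
    return (-program => $program, -database => $db, -e => $e_val);
}

# Hypothetical helper: run the search and print each query with its hit names
sub run_local_blast {
    my ($fasta_file, @params) = @_;
    require Bio::Tools::Run::StandAloneBlast;    # loaded only when a search is actually run
    my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
    # blastall takes a FASTA file name (or Bio::Seq objects) and returns a Bio::SearchIO object
    my $report = $factory->blastall($fasta_file);
    while (my $result = $report->next_result) {
        print "Query Name: ", $result->query_name, "\n";
        while (my $hit = $result->next_hit) {
            print "\thit name is ", $hit->name, "\n";
        }
    }
}

if (@ARGV == 4) {
    my ($fasta_file, $program, $db, $e_val) = @ARGV;
    run_local_blast($fasta_file, local_blast_params($program, $db, $e_val));
}
```

Run it as, for example, `perl run_local_blast.pl nc.fa blastn nt 1e-10`. The BioPerl module is only loaded when all four arguments are given, so the helpers can be reused on their own.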
Parsing Blast Reports
Both of these tools use the same BLAST parser, Bio::SearchIO, from the BioPerl core package (bioperl-core). You can also use it directly to parse BLAST reports that you generated yourself with the blast executables or on the NCBI website. But SearchIO is much more than a BLAST parser. Having learned about BioPerl's naming convention, the IO in its name should alert you to its design. It is a factory object that takes an argument (-format) naming which of many popular analysis programs' output it is reading or writing. It then presents the information from any of those formats through a uniform system of objects.
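To see the factory design in action, here is a sketch that counts the hits for each query in a report file you already have on disk. The loop itself never mentions BLAST. Only the -format value passed to Bio::SearchIO->new changes between, say, a plain-text BLAST report ('blast'), BLAST XML ('blastxml'), or FASTA search output ('fasta'). The hits_per_query helper is a hypothetical name used for illustration.

```perl
#!/usr/bin/perl
# Sketch: the same reading loop works for any format Bio::SearchIO supports;
# only the -format argument changes.
use strict;
use warnings;

# Hypothetical helper: count hits per query for any object that behaves like
# a Bio::SearchIO stream (next_result, then query_name/next_hit on each result)
sub hits_per_query {
    my ($searchio) = @_;
    my @counts;
    while (my $result = $searchio->next_result) {
        my $n = 0;
        $n++ while $result->next_hit;    # drain the hit iterator, counting as we go
        push @counts, [ $result->query_name, $n ];
    }
    return @counts;
}

if (@ARGV == 2) {
    my ($report_file, $format) = @ARGV;    # e.g. report.xml blastxml
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-file => $report_file, -format => $format);
    printf "%s\t%d hits\n", @$_ for hits_per_query($in);
}
```

Switching from a BLAST XML report to a FASTA report is then only a matter of running `perl count_hits.pl search.out fasta` instead of `perl count_hits.pl report.xml blastxml`.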
For a full tour, see the SearchIO HowTo on the BioPerl website.
In short, the SearchIO system takes a report generated by one of its supported programs and parses it into zero or more objects implementing Bio::Search::Result::ResultI, which you access with the next_result iterator method. Each Bio::Search::Result::ResultI object in turn provides zero or more Bio::Search::Hit::HitI objects through the next_hit iterator method. Each Bio::Search::Hit::HitI object then provides zero or more Bio::Search::HSP::HSPI objects (high-scoring segment pairs) through the next_hsp iterator method. Besides these iterators, each of these objects offers a variety of attributes and other methods for getting at the information in the report. For example, every object implementing Bio::Search::HSP::HSPI provides a Bio::SimpleAlign object through its get_aln method. That alignment object can then be used in many ways, such as being written out in any of the well-known alignment formats supported by the Bio::AlignIO system.
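The three-level iteration described above can be sketched as follows. The sketch assumes an existing report file and its SearchIO format name are given on the command line. The hypothetical each_hsp helper walks results, hits, and HSPs and hands each HSP to a callback. The callback prints a summary line and passes the Bio::SimpleAlign from get_aln to Bio::AlignIO, which writes it in ClustalW format here. Any other AlignIO output format could be substituted.

```perl
#!/usr/bin/perl
# Sketch: walk Result -> Hit -> HSP and write each HSP alignment with Bio::AlignIO.
use strict;
use warnings;

# Hypothetical helper: call $callback->($result, $hit, $hsp) for every HSP
# of every hit of every result; returns the number of HSPs visited
sub each_hsp {
    my ($searchio, $callback) = @_;
    my $count = 0;
    while (my $result = $searchio->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                $callback->($result, $hit, $hsp);
                $count++;
            }
        }
    }
    return $count;
}

if (@ARGV == 2) {
    my ($report_file, $format) = @ARGV;    # e.g. report.xml blastxml
    require Bio::SearchIO;
    require Bio::AlignIO;
    my $in  = Bio::SearchIO->new(-file => $report_file, -format => $format);
    my $out = Bio::AlignIO->new(-fh => \*STDOUT, -format => 'clustalw');
    each_hsp($in, sub {
        my ($result, $hit, $hsp) = @_;
        print join("\t", $result->query_name, $hit->name, $hsp->score, $hsp->evalue), "\n";
        $out->write_aln($hsp->get_aln);    # get_aln returns a Bio::SimpleAlign
    });
}
```

Passing a callback keeps the traversal in one place. This matters because the iterators are consumed as they go: a second walk over the same stream would find nothing.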