perlsOfLondon

 

from_cutandpaste_to_packages

Page history last edited by Darin 1 yr ago

 Using Packages to Share Access to Important Subroutines Among Scripts

 

We have seen how subroutines make it easier to reuse important code in multiple parts of the same script.  This does not, by itself, make it any easier to reuse code from one script in a totally different script.  However, now that we have gone to the trouble of putting this code into a subroutine, it is easy to move it into a package and allow other scripts to use it by importing it from the package.

 

First, we have to review a bit about Perl Packages.

 

Packages

Perl allows you to create packages with names.  These can be defined in any code, but it is usually best to place package code into its own source file, named for the package.  All packages have the same structure.

 

1. Package  Definition

The code defining the package must begin with the 'package' declaration, which is a keyword built into perl rather than a function.

 

package myPackage;

 

# code goes here

 

 

1;

 

Also, a package file must end with an expression that evaluates to true, because perl checks the value the file returns when it is loaded with 'use' or 'require'.  This is accomplished with the 1; line above.  This code should be placed in a file called myPackage.pm (the file name must end with the .pm extension for 'use' to find it).

 

You can, and probably should, define your packages within a more limited namespace.  This makes it possible to use your package alongside another package with the same name, so long as they are in different namespaces.  To do this, you include the namespace in the 'package' line, and place the .pm file in a subdirectory structure which matches the namespace structure.

 

MyNamespace/MyPackage.pm

 

package MyNamespace::MyPackage;

 

MySuper/MySub/MyDomain/MyPackage.pm

 

package MySuper::MySub::MyDomain::MyPackage;

 

To use your package in any code, the directory containing the .pm file or, in the case of a package with a namespace, the directory containing the top-level directory of the namespace, must be listed in perl's @INC array.  @INC already contains the default system perl library directories.  The easiest way to add your own is to put each such directory in your PERL5LIB environment variable.

 

If MyPackage.pm is in /home/user/lib

then

export PERL5LIB="/home/user/lib"

or, if PERL5LIB is already defined

export PERL5LIB="${PERL5LIB}:/home/user/lib"

 

To provide access to MySuper::MySub::MyDomain::MyPackage, which is in

/home/user/lib/MySuper/MySub/MyDomain/MyPackage.pm

 

export PERL5LIB="/home/user/lib"
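
The lookup described above can be tried out in a single script.  The following is only a sketch under some assumptions: a temporary directory stands in for /home/user/lib, the where_am_i subroutine is a made-up name for illustration, and the directory is added to @INC at run time instead of through PERL5LIB.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical library root, standing in for /home/user/lib.
my $lib_root = tempdir(CLEANUP => 1);

# MySuper::MyPackage must live at <root>/MySuper/MyPackage.pm.
my $pkg_dir = File::Spec->catdir($lib_root, 'MySuper');
make_path($pkg_dir);
my $pm_file = File::Spec->catfile($pkg_dir, 'MyPackage.pm');

# Write a tiny package file; where_am_i is a made-up subroutine.
open(my $out, '>', $pm_file) or die "Could not write $pm_file: $!\n";
print {$out} <<'END_PM';
package MySuper::MyPackage;
use strict;
use warnings;
sub where_am_i { return __PACKAGE__ }
1;
END_PM
close($out) or die "Could not close $pm_file: $!\n";

# Add the root (not the MySuper subdirectory) to the search path.
# This is the run-time counterpart of listing it in PERL5LIB.
unshift @INC, $lib_root;

# 'require' turns MySuper::MyPackage into MySuper/MyPackage.pm
# and searches each directory in @INC for that relative path.
require MySuper::MyPackage;

# %INC records where each loaded file was found.
my $found_at = $INC{'MySuper/MyPackage.pm'};
my $answer   = MySuper::MyPackage::where_am_i();
print "Loaded $answer from $found_at\n";
```

If the MySuper subdirectory itself were added to @INC instead of its parent, the require would fail, because perl would then look for MySuper/MySuper/MyPackage.pm.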

 

 

2. Package Variables

Variables that are declared inside packages behave just like variables declared everywhere else in scripts.  In fact, you should get used to the idea that the code that runs inside a typical perl script is actually given its own package, named main.  You can declare any type of variable in a package that you can declare in main, e.g. scalars, arrays, hashes, and references.  You can, and should, declare your variables with scope declarations.  Remember, 'my' means private (the variable is visible only in the enclosing block or file), and 'our' means global (the variable belongs to the package, and other code can reach it through the package name).  If you use a variable without a scope declaration, it becomes a package variable, which, for most purposes, behaves like one declared with 'our'.  However, if you use the strict pragma in the package file, perl will refuse to compile any variable there that lacks an explicit scope declaration or a fully qualified name, and any script that uses the package will die.  The strict pragma applies only to the file or block in which it appears, so strict in the calling script does not check the code in your package file.  It is always best to use the following standard when declaring variables in your packages.

 

a. When in doubt, make variables private using 'my'.  This maximizes the security of the data in this variable, by preventing any other package (including main) from accessing or changing its contents.  You will be happier using my.

b.  If you absolutely need calling scripts or other packages to be able to access or change a variable, then, and only then, use 'our', and document its intended use in the package documentation.

c.  Never declare a variable in a package without an explicit scope declaration.
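
The difference between 'my' and 'our' can be seen in a short script.  This is only a sketch: the package name Hypothetical::Settings and its contents are invented for illustration, and the package is written inside a block in the same file so that its 'my' variable is limited to that block, the way it would be limited to its own .pm file.

```perl
use strict;
use warnings;

{
    package Hypothetical::Settings;   # made-up name, for illustration only

    my  $secret = 'hidden';    # lexical: exists only inside this block
    our $shared = 'visible';   # package variable: $Hypothetical::Settings::shared

    # Code in the same scope can still read the lexical.
    sub secret { return $secret }
}

package main;

# The package variable is reachable through its fully qualified name.
my $seen_shared = $Hypothetical::Settings::shared;

# The same syntax does not reach the lexical.  It names an unrelated
# package variable that was never assigned, so the result is undef.
my $seen_secret = do {
    no warnings 'once';
    $Hypothetical::Settings::secret;
};

print "shared: $seen_shared\n";
print "secret via package name: ",
    (defined $seen_secret ? $seen_secret : 'undef'), "\n";
print "secret via subroutine: ", Hypothetical::Settings::secret(), "\n";
```

The only way for outside code to see the private value is through a subroutine that the package chooses to provide, which is why rule a. is the safe default.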

 

3.  Using Packages in your code

In order to use a package in your code, whether it is the main script or another package, you have to either 'require' or 'use' it.  In almost all cases you will 'use' a package.  Once you use a package, you can access any subroutine, and any global (package) variable within it, by referring to it with the full package name, including its namespace.  For instance, if you use MySuper::MyPackage, which has a global $test scalar variable and a print_greeting subroutine, you could do the following:

 

use MySuper::MyPackage;

 

$MySuper::MyPackage::test = 'hello test';

print $MySuper::MyPackage::test, "\n";

 

MySuper::MyPackage::print_greeting();

 

4. Exporter

Most people who create packages do not want to require the calling code to use the package name when invoking the shared resources they provide.  They simply want to 'export' certain resources to the calling code so that they are available to that code as if they were defined locally.  This is accomplished using the Exporter perl package, and making your package an extension of Exporter with the following code in the package:

 

package MyPackage;

use vars qw/@EXPORT @ISA/; # this makes use strict happy when you use @EXPORT and @ISA without scope declarations

use Exporter; # use the Exporter Package

@ISA = qw/Exporter/; # this makes your package an extension of Exporter

 

You then need to specify which resources you want to export to the calling code.  You do this by filling one of two package arrays that Exporter looks for in your package:

 

* @EXPORT:  Anything placed into this array is automatically exported to the calling script.

* @EXPORT_OK: Anything placed into this array is only exported to the calling script if explicitly asked for in a list after the package name in the 'use' statement, e.g.

 

use MyPackage qw/something/;

 

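This imports only something into the calling script, and nothing else; once you supply a list, even the names in @EXPORT are left out unless you list them too.  This keeps the calling script's namespace uncluttered, but adds a little complexity to the interface to the package.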
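
The two arrays can be compared side by side in one script.  The following is a sketch under some assumptions: the package names Hypothetical::Greeter, Caller::Default and Caller::Explicit are invented, the packages live in one file, and import is called directly because 'use' expects a separate .pm file.  It also borrows Exporter's import method with use Exporter 'import', an alternative to the @ISA approach shown above.

```perl
use strict;
use warnings;

{
    package Hypothetical::Greeter;    # made-up name, for illustration only
    use Exporter 'import';            # borrow Exporter's import method

    our @EXPORT    = qw(hello);       # exported by default
    our @EXPORT_OK = qw(goodbye);     # exported only on request

    sub hello   { return "Hello $_[0]" }
    sub goodbye { return "Goodbye $_[0]" }
}

{
    package Caller::Default;
    # Same effect as 'use Hypothetical::Greeter;' for a package in its own file.
    Hypothetical::Greeter->import;
}

{
    package Caller::Explicit;
    # Same effect as 'use Hypothetical::Greeter qw/goodbye/;'.
    Hypothetical::Greeter->import('goodbye');
}

package main;

# Report which names ended up in each calling package.
for my $caller (qw(Caller::Default Caller::Explicit)) {
    for my $name (qw(hello goodbye)) {
        printf "%-16s %-8s %s\n", $caller, $name,
            $caller->can($name) ? 'imported' : 'not imported';
    }
}
```

The package that asks for nothing receives only the @EXPORT default, and the package that asks for goodbye receives only goodbye.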

 

In addition, in order to export variables to the calling script, you must make them global using 'our'.  Private variables cannot be accessed or changed by calling code, even if they are exported in @EXPORT or @EXPORT_OK.

The following package exports a single subroutine and a single scalar variable.  The exported subroutine uses a private internal subroutine and the exported scalar variable.

 

package MyPackage;

use vars qw/@EXPORT @ISA/;

 

use Exporter;

@ISA = qw/Exporter/;

@EXPORT = qw/print_greeting $greeting/;

 

our $greeting = 'Hello %s';

 

sub print_greeting {

  my $input = shift;

  &_print_it($input);

}

 

sub _print_it {

  my $it = shift;

  printf $greeting, $it;

}

1;

 

It can be used as follows:

 

use MyPackage;

 

&print_greeting("Dave");

$greeting = "Goodbye %s";

&print_greeting("Dave");

exit;

 

 

Example: IGSP::FastaSeq

Moving our next_seq FASTA fetching subroutine into a package is extremely easy, due to the time we spent developing it in a black-box style.  We just have to use Exporter and export the next_seq subroutine to the calling code.  Then we can use that code anywhere we want to process a fasta file record by record.  We will give our package a namespace, called IGSP, and a name, FastaSeq.  This will require that we create a directory called IGSP somewhere, and place our FastaSeq.pm file in it with the code it requires from the original script (including the $last_id_line).

 

package IGSP::FastaSeq;

use vars qw/@EXPORT @ISA/;

 

use Exporter;

@EXPORT = qw/next_seq/;

@ISA = qw/Exporter/;

 

my $last_id_line;

 

sub next_seq {

    my $fh = shift;

    my $current_seq;  # we want this to be undef, so it can be passed back undef at EOF                                        

 

    # this will only pull in the next line on the first call to the subroutine                                                 

    # and the call after the last sequence in the file                                                                         

    unless ($last_id_line) {

        $last_id_line = <$fh>;

        if ($last_id_line && !($last_id_line =~ m/^>/)) {

            die "Non Fasta File Error\n";

        }

 

        chomp $last_id_line if ($last_id_line);

    }

    return unless $last_id_line; # this returns at EOF                                                                         

 

    my ($id, $acc, $description) = split /\|/, $last_id_line;

    $current_seq = {

        'id' => $id,

        'accession' => $acc,

        'description' => $description,

    };

    undef $last_id_line; # this sets up the ability to return undef on EOF                                                     

 

    SEQLINE: while (my $line = <$fh>) {

        chomp $line;

        if ($line =~ m/^>/) {

            $last_id_line = $line;

            last SEQLINE;

        }

        else {

            $current_seq->{'sequence'} .= $line;

        }

    }

 

    if ($current_seq->{'id'} && !$current_seq->{'sequence'}) {

        die "Missing Sequence Error\n";

    }

 

    return $current_seq;

}

1; # this is critical

 

Then we can use this code in any script like the following:

 

use IGSP::FastaSeq;

 

my $file = shift or die "fasta_processor_sub.pl <path_to_fasta_file>\n";

open(my $fh, '<', $file) or die "Couldn't open $file: $!\n";  # three-argument open keeps the mode separate from the file name

 

my @seqs;

while (my $current_seq = &next_seq($fh)) {

    push @seqs, $current_seq;

}

 

foreach my $seq (@seqs) {

  printf "ID: %s\nACC: %s\nDescription: %s\nSeq:\n%s\n",

         $seq->{'id'},

         $seq->{'accession'},

         $seq->{'description'},

         $seq->{'sequence'};

}

exit;

Note, it is important that the IGSP directory be a subdirectory of some directory found in the perl @INC.  Older versions of perl also searched the current working directory, but since perl 5.26 the current directory is no longer in @INC by default, so you should not rely on it.  I usually create a 'lib' directory in my IGSP home directory, and add this to my PERL5LIB by editing my .bashrc file with the following line:

export PERL5LIB='/home/londo003/lib'

 

Of course, substitute your userid for mine.  This makes it available every time I log into any server that uses my home directory and bash.  I then make all of my packages available within lib.

 

 

Resource Clashes

Sometimes it is inevitable that a subroutine exported by one package will have the same name as a subroutine defined in the calling code, or exported by another package used by the calling code.  For example, you might have a subroutine called 'copy' within your script.  If you use the File::Copy package to copy files from one place to another, the copy subroutine it exports by default will clash with your own copy subroutine, which may have a completely different function.  This problem is avoided by loading File::Copy with an empty import list, so that nothing is exported into your script, and by calling File::Copy::copy with its namespace explicitly.  The call can also be placed into a block with a different, local, throwaway package name, although the fully qualified name is what actually selects the right subroutine, like so:

 

use File::Copy ();  # the empty list stops File::Copy from exporting its copy

 

my $file = 'blah.txt';

 

{

  package Fooby;  # this is a throw away package name

  File::Copy::copy($file, '/somewhere/else/');

}

 

sub copy {

  my $original = shift;  # avoid $a and $b, which perl reserves for sort

  my $duplicate = $original;

  return $duplicate;

}

 

You may not ever have to worry about this, but it is possible.
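
One way to see the two subroutines coexist is the following sketch.  It assumes scratch files in a temporary directory (the file names are made up for the demonstration) and a local copy subroutine that simply duplicates a list; File::Copy is loaded with an empty import list so that it exports nothing into the script.

```perl
use strict;
use warnings;
use File::Copy ();                 # load the module, import nothing
use File::Temp qw(tempdir);
use File::Spec;

# Our own, unrelated 'copy': duplicates a list.
sub copy {
    my @items = @_;
    return @items;
}

# Hypothetical scratch files, created only for this demonstration.
my $dir    = tempdir(CLEANUP => 1);
my $source = File::Spec->catfile($dir, 'blah.txt');
my $target = File::Spec->catfile($dir, 'blah_copy.txt');

open(my $out, '>', $source) or die "Could not write $source: $!\n";
print {$out} "some text\n";
close($out) or die "Could not close $source: $!\n";

# The fully qualified name always reaches File::Copy's subroutine.
File::Copy::copy($source, $target) or die "Copy failed: $!\n";

# The unqualified name reaches the subroutine defined in this script.
my @duplicate = copy(1, 2, 3);

print "file copied: ", (-e $target ? 'yes' : 'no'), "\n";
print "local copy returned: @duplicate\n";
```

Because nothing was imported, there is no name for perl to confuse, and each call goes where its spelling says it should.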

 

 

 
