perlsOfLondon

 

References

Last edited by dmlond 2 yrs ago

  • What are references: references are scalars which, instead of holding data directly, hold a pointer to another part of memory. That memory may be a named variable or an un-named (anonymous) value, and it may contain either the data or yet another pointer (ad infinitum). This is a very memory-efficient way of holding and passing around data: you can have as many scalars, array elements, or hash elements as you need (in many parts of the code) referring to the data, while the actual data lives in only one place in the computer's memory. References are also the foundation for multi-dimensional hashes/arrays, arrays of hashes, hashes of arrays, etc., and they are essential for passing more than one list to a subroutine (a plain argument list flattens all arrays into one). These benefits come with some costs. You cannot access the data through a reference the same way as through an ordinary variable; you must dereference it first (see below). You must also remember that the data really lives in only one place, so every reference to it can potentially change it. If you hold references to the same data in two different places (say, within two objects) and an operation through one of them changes the data, the change is visible everywhere that reference is used. Another challenge is that perl keeps referenced data alive as long as at least one reference to it still exists. This opens the door to memory leaks through circular references: when two structures refer to each other, neither is ever freed, even after the rest of the program can no longer reach them. Circular references usually only become a concern when you are working with nested objects or multi-dimensional hashes/arrays. And, as we will see below, this behavior is actually a feature, absolutely essential to the creation of closures.
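
    The two central points above (one copy of the data shared by every reference, and references keeping lists separate when passed to a subroutine) can be sketched as follows. The subroutine count_both and the sample arrays are hypothetical, made up for illustration:

```perl
use strict;
use warnings;

# Two references to the SAME anonymous array: there is only one copy of the data.
my $first  = [ "A", "C", "G" ];
my $second = $first;            # copies the reference, not the array

push @$second, "T";             # modify the data through the second reference
print "@$first\n";              # the change is visible through the first: A C G T

# Passing two lists to a subroutine: plain arrays would be flattened into @_,
# but references keep them separate. count_both is a hypothetical helper.
sub count_both {
    my ($list_a, $list_b) = @_;  # two array references
    return (scalar(@$list_a), scalar(@$list_b));
}

my @purines     = ("A", "G");
my @pyrimidines = ("C", "T", "U");
my ($n_pur, $n_pyr) = count_both(\@purines, \@pyrimidines);
print "$n_pur purines, $n_pyr pyrimidines\n";
```

    Had count_both been called as count_both(@purines, @pyrimidines), @_ would hold five elements with no way to tell where the first list ended.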
  • Creating References in Perl
    • scalar references: scalar references are created by applying the backslash (\) operator to an existing scalar.

      my $orig = "HELLO";
      my $ref = \$orig;
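
      A minimal sketch of what that reference can then be used for: the built-in ref() reports what kind of thing is referenced, and assigning through the reference changes the original variable:

```perl
use strict;
use warnings;

my $orig = "HELLO";
my $ref  = \$orig;        # reference to the existing scalar

print ref($ref), "\n";    # ref() reports the referenced type: SCALAR
$$ref = "GOODBYE";        # assign through the reference...
print "$orig\n";          # ...and the original variable changes too
```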

    • array references: There are multiple ways of creating array references. You can use the backslash operator to create a reference to an existing array:

      my $ref = \@existing;

      Or you can create a reference to a new, empty anonymous array by using [] instead of (), and then fill it as needed by dereferencing it as an lvalue (note that the assignment uses only one of the several dereferencing syntaxes perl makes available, as seen next). The same brackets can also create an anonymous array with its contents already in place:

      my $arrayref = [];
      $$arrayref[0] = "first element";

      my $aref2 = [ "A","B","C","D"];
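
      Once created, an anonymous array is used like any other array after dereferencing. This sketch (the variable names are illustrative) grows it, counts it, and finds its last element and last index:

```perl
use strict;
use warnings;

my $aref = [ "A", "B", "C", "D" ];

push @$aref, "E";                   # grow the anonymous array
my $count      = scalar @$aref;     # number of elements
my $last       = $$aref[-1];        # negative index counts from the end
my $last_index = $#$aref;           # index of the last element

print "$count elements, last is $last at index $last_index\n";
```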

    • hash references: Hashes can also be referenced using the backslash operator. And, like arrays, they can be created as anonymous references, using {} instead of (), and filled in by dereferencing them as an lvalue:

      my $href = {};
      $$href{"key"} = "value";

      my $href2 = { "key" => "value", "key2" => "value2" };
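
      A short sketch of working with a whole hash through its reference: %$href dereferences the entire hash for keys, and exists/delete work on individual entries just as they do on an ordinary hash:

```perl
use strict;
use warnings;

my $href = { "key" => "value", "key2" => "value2" };
$$href{"key3"} = "value3";               # add an entry through the reference

# keys %$href dereferences the whole hash; sort gives a stable order
foreach my $k (sort keys %$href) {
    print "$k => $$href{$k}\n";
}

my $has_key2 = exists $$href{"key2"};    # true
delete $$href{"key2"};                   # remove an entry through the reference
my $n_keys = scalar keys %$href;
```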

  • Dereferencing: Perl offers multiple ways of dereferencing references (TMTOWTDI). You can simply put the sigil of the type you want back ($ for a scalar or single element, @ for an array or slice, % for a hash) in front of the reference variable, giving a "double sigil":

    print $$scalar_ref;
    my @matches = grep { $_ eq "A" } @$array_ref;
    print $$array_ref[0];
    # an array_ref dereferenced and used as a slice
    my @slice_matches = grep { $_ eq "A" } @$array_ref[0,3,5];
    foreach my $el (@$array_ref) { ... }

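    A slightly better, more readable way is to put the sigil before a block containing the reference. The braces are required whenever the reference is more than a simple scalar variable, such as a reference stored inside a hash:

    foreach my $el (@{$array_ref}) { ... }
    foreach my $el (@{$hashref->{"key"}}) { ... }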

    Finally, with arrays and hashes, perl gives you the arrow operator '->' to dereference individual elements:

    $aref->[0] = "first element";
    $hashref->{"key"} = \@array;

    This is the most readable way to play with array/hash reference elements.
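
    To make the equivalence concrete, this sketch reads the same element with all three syntaxes and shows the block form on a reference stored inside a hash. The data is made up for illustration:

```perl
use strict;
use warnings;

my $array_ref = [ "A", "B", "C", "D", "E", "F" ];
my $hash_ref  = { "list" => $array_ref };

# Three spellings of the same element access
my $x = $$array_ref[2];      # double sigil
my $y = ${$array_ref}[2];    # sigil before a block
my $z = $array_ref->[2];     # arrow operator
print "$x $y $z\n";

# The block form is required when the reference is itself an expression
my @all   = @{ $hash_ref->{"list"} };
my @slice = @{$array_ref}[0, 3, 5];          # array slice
my $n_a   = grep { $_ eq "A" } @$array_ref;  # grep in scalar context counts matches
```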
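  • multidimensional hashes/arrays: Perl does not provide true multi-dimensional arrays, hashes, arrays of hashes, or hashes of arrays the way some other programming languages do. However, using references, you can create very efficient multi-dimensional structures, so you will hardly notice that perl lacks this critical functionality :). You do not even need to create the inner arrays yourself: the first time you assign through an element such as $md_array1[0]->[0], perl creates the anonymous inner array for you (this is called autovivification):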

    my @md_array1 = ();
    $md_array1[0]->[0] = "first array, first element";
    $md_array1[0]->[1] = "first array, second element";
    $md_array1[1]->[0] = "second array, first element";
    $md_array1[1]->[1] = "second array, second element";

    Of course, it is often more convenient to use references the entire way through (it just takes one extra arrow operator). This is especially useful for structures that you want to pass around to methods and subroutines, since a single scalar carries the whole structure:

    my $md_array1 = [];
    $md_array1->[0]->[0] = "first array, first element";
    $md_array1->[0]->[1] = "first array, second element";
    $md_array1->[1]->[0] = "second array, first element";
    $md_array1->[1]->[1] = "second array, second element";
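
    Autovivification is what lets a table be filled with nothing but nested loops. This sketch (a small multiplication table, chosen only for illustration) never creates a row explicitly. Between two subscripts the arrow is optional, so $table->[$i][$j] means the same as $table->[$i]->[$j]:

```perl
use strict;
use warnings;

# Build a 3x3 multiplication table; each row's anonymous array is
# autovivified the first time we assign through $table->[$i].
my $table = [];
for my $i (0 .. 2) {
    for my $j (0 .. 2) {
        $table->[$i][$j] = ($i + 1) * ($j + 1);
    }
}

my $rows = scalar @$table;             # number of rows
my $cols = scalar @{ $table->[0] };    # number of columns in the first row

# Print each row; @$row dereferences one inner array
foreach my $row (@$table) {
    print join(" ", @$row), "\n";
}
```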

    You can also write these data structures out literally, by nesting anonymous array [] and hash {} constructors:

    my $href_of_arrays = {
      "array1" => ["a","b","c"],
      "array2" => ["d","e","f"]
    };
    print join(" ", @{$href_of_arrays->{"array1"}})."\n";
    print $href_of_arrays->{"array2"}->[2];

    Fun with multi-dimensional data structures (these really start to look a lot like objects, hint, hint ;)):

    if ($seq->{"seqlength"} > 100) {
      foreach my $bp (@{$seq->{"base_pairs"}}) { print "$bp\n"; }
    }
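
    A common way such a hash of arrays gets built is by pushing onto @{ ... } for each key: push autovivifies the inner array the first time a key is seen. In this sketch the (accession, base) input pairs and the name $bases_by_seq are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical input: (accession, base) pairs to group by accession
my @records = ( ["seq1", "A"], ["seq2", "G"], ["seq1", "T"], ["seq1", "C"] );

my $bases_by_seq = {};
foreach my $rec (@records) {
    my ($acc, $base) = @$rec;
    # push autovivifies a new anonymous array the first time an accession is seen
    push @{ $bases_by_seq->{$acc} }, $base;
}

foreach my $acc (sort keys %$bases_by_seq) {
    my $bases = $bases_by_seq->{$acc};
    print "$acc: ", scalar(@$bases), " bases (", join("", @$bases), ")\n";
}
```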

  • coderefs: Perl also allows you to create references to subroutines, both named and (as is most often the case) anonymous. This is an extremely useful feature of the more powerful programming languages such as perl. A reference to a named subroutine is stored in a scalar by putting the backslash operator in front of the subroutine name with its & sigil (\&named). An anonymous subroutine just needs the sub keyword written before the un-named code block in the assignment. Coderefs are dereferenced (called) using the arrow operator '->' followed by the parenthesized list of arguments (which may be empty):

    sub named { print "GOODBYE\n"; }

    my $sc_named_coderef = \&named; $sc_named_coderef->();
    my $sc_anon_coderef = sub { print "HELLO\n" }; $sc_anon_coderef->();
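
    Coderefs become really useful once they take arguments, return values, and are handed to other subroutines as callbacks. In this sketch, apply_to_all is a hypothetical helper, and the anonymous sub computes the reverse complement of a DNA string:

```perl
use strict;
use warnings;

# A hypothetical helper that applies any coderef to every element of a list
sub apply_to_all {
    my ($code, @items) = @_;
    return map { $code->($_) } @items;
}

my $revcomp = sub {
    my $seq = reverse shift;        # scalar context: reverse the string
    $seq =~ tr/ACGT/TGCA/;          # complement each base
    return $seq;
};

my @results = apply_to_all($revcomp, "AACG", "GATTACA");
print "@results\n";
```

    Because apply_to_all only sees a coderef, the same helper works unchanged with any other per-item operation.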

    One thing people coming from other languages notice is missing from perl is the classic switch statement. This statement lets programmers write a compact block that changes the flow of the program based on the value of a particular variable, without long if/elsif/elsif/else chains. Perl does not provide a stable built-in switch (the given/when construct added in Perl 5.10 is experimental). However, it does let you build a hash of coderefs (often called a dispatch table) which accomplishes the same thing:


    my %switch = (
      'STRT' => sub { print "NEW SEQUENCE\n"; },
      'ACC:' => sub { my $line = shift; print "HERE IS THE ACCESSION $line\n"; },
      'LEN:' => sub { my $line = shift; print "HERE IS THE LENGTH $line\n"; },
      'TYP:' => sub { my $line = shift; print "HERE IS THE TYPE $line\n"; },
      'SEQ:' => sub { my $line = shift; print "HERE IS THE SEQ $line\n"; }
    );

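    open(my $in, '<', 'seq.test') or die "Cannot open seq.test: $!";
    while (my $line = <$in>) {
      chomp $line;
      my $start = substr($line, 0, 4);
      my $rest = length($line) > 5 ? substr($line, 5) : "";
      # skip lines whose prefix has no handler instead of dying on an undefined coderef
      $switch{$start}->($rest) if exists $switch{$start};
    }

    close $in;
    exit;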
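
    A switch statement usually also has a default branch. This sketch adds one by falling back to a separate coderef when a tag is not in the table. It uses an in-memory list of hypothetical lines in the same "TAG: value" layout instead of reading seq.test, and collects results in @seen rather than printing so they are easy to check:

```perl
use strict;
use warnings;

# Hypothetical input lines in the same "TAG: value" layout as seq.test
my @lines = ("STRT", "ACC: X12345", "LEN: 42", "FOO: unknown tag");

my @seen;
my %dispatch = (
    'STRT' => sub { push @seen, "start" },
    'ACC:' => sub { push @seen, "accession=$_[0]" },
    'LEN:' => sub { push @seen, "length=$_[0]" },
);
my $default = sub { push @seen, "unhandled" };   # fallback, like a switch's default

foreach my $line (@lines) {
    my $tag  = substr($line, 0, 4);
    my $rest = length($line) > 5 ? substr($line, 5) : "";
    # Look the handler up; fall back to $default when the tag is not in the table
    my $handler = $dispatch{$tag} || $default;
    $handler->($rest);
}
print join("\n", @seen), "\n";
```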

    Sometimes you want to iterate over a piece of data and apply several processing steps to it. For a small number of steps, individual named subroutines can be enough, but if you want a flexible structure which can easily be extended later without changing code in many other places, try using an array of coderefs:

    perl -e 'my @codeFactory = ( sub { my $line = shift; print "IT IS ".length($line)." bytes long\n"; }, sub { my $line = shift; my $first_five = substr($line, 0, 5); print "THE FIRST FIVE CHARS ARE $first_five\n"; } ); open(IN, "<", "seq.test") or die "Cannot open seq.test: $!"; while (<IN>) { chomp; foreach my $widget (@codeFactory) { $widget->($_); } } close IN;'
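
    The same idea written as a script rather than a one-liner might look like this sketch. The third step (counting G bases) is a hypothetical addition showing that extending the pipeline means appending one more coderef, and the input lines are stand-ins for the contents of seq.test:

```perl
use strict;
use warnings;

# Each widget is one processing step; add a step by appending another coderef
my @code_factory = (
    sub { my $line = shift; return "IT IS " . length($line) . " bytes long" },
    sub { my $line = shift; return "THE FIRST FIVE CHARS ARE " . substr($line, 0, 5) },
    sub { my $line = shift; return "IT HAS " . ($line =~ tr/G//) . " G's" },  # hypothetical extra step
);

my @lines = ("GATTACAGG", "ACGTACGTACGT");   # stand-ins for lines read from seq.test
my @report;
foreach my $line (@lines) {
    foreach my $widget (@code_factory) {
        push @report, $widget->($line);
    }
}
print "$_\n" for @report;
```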

  • closures: The 'problem' that makes circular references possible (referenced data stays alive as long as something still refers to it) is really a feature. It enables one of the cooler structures in the programming world: closures. A closure is an anonymous subroutine (i.e. a coderef) that refers to lexical (my) variables from the scope in which it was created; typically an outer subroutine creates and returns it. Most importantly, the inner subroutine keeps references to the lexical variables and filehandles created in the outer subroutine. Thus, the data in those variables does not go away as long as the coderef exists, but it can only be accessed through calls to the coderef. The easiest way to get your head around this is to see an example (from The Perl Cookbook). Here is a useful closure that can be used to measure how much time it takes to do something in a program (time returns whole seconds):

    sub timer { my $otime = time; return sub { return time - $otime; }}

    # create a timer for the entire program
    my $etime = timer();
    # do some things
    # create another timer for a sub-process within the entire program
    my $stimer = timer();
    # do the sub-process things

    print "SUB PROCESS TOOK ".$stimer->()." seconds\n";
    print $etime->()." seconds have passed since the program began\n";
    # continue processing the entire program

    print "ENTIRE PROGRAM TOOK ".$etime->()." seconds\n";

    Remember, the inner subroutines can take arguments and do all sorts of complex processing, using any or all of the lexical variables, filehandles, etc. that were created in the scope of the outer subroutine, and new, independent instances of the same closure can be created for multiple tasks. Closures also give object-oriented programmers the only way to create REALLY private variables (ordinary object data, whether stored in the object's hash or in package variables, can in theory be reached from other parts of the program, e.g. through the object reference or the symbol table), though this is usually overkill.
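
    Both points (independent instances, and truly private data) show up in this sketch of a counter. make_counter is a hypothetical name; it returns a hash of two closures that share one $count, and that variable is reachable only through those two coderefs:

```perl
use strict;
use warnings;

# Returns a pair of closures that share one private $count.
# The variable is reachable ONLY through these two coderefs.
sub make_counter {
    my $count = @_ ? shift : 0;      # starting value (defaults to 0)
    my %methods = (
        increment => sub { return ++$count },
        current   => sub { return $count },
    );
    return \%methods;
}

my $reads  = make_counter();
my $writes = make_counter(10);       # a second, independent instance

$reads->{increment}->() for 1 .. 3;
$writes->{increment}->();

print "reads: ",  $reads->{current}->(),  "\n";
print "writes: ", $writes->{current}->(), "\n";
```

    Each call to make_counter creates a fresh $count, which is why the two counters do not interfere with each other.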
  • filehandle globs: perl also provides another way of storing pointers from one piece of memory to another: typeglobs. Most typeglob uses are fairly advanced and will not be covered here (see the Camel Book for more details). One of the more useful ones is creating references to handles (file handles, network handles, etc.). This allows you to store handles in scalars, arrays, or (most useful) hashes to be accessed programmatically. Remember, though, that every system limits the number of filehandles any one program can have open at the same time; modules such as the core FileCache module can work around this limit.

    open(FH, "<", "file_to_be_read") or die "Cannot open file_to_be_read: $!";
    my $fh = \*FH; # a reference to the FH typeglob, usable everywhere FH is used
    while (<$fh>) { print; } # each line already ends in "\n"
    close $fh;

    perl -le 'open(FHA, ">", "a_files") or die $!; open(FB, ">", "b_files") or die $!; my $fh1 = \*FHA; my $fh2 = \*FB; my $writers = { "A" => $fh1, "B" => $fh2 }; my @things = qw(abra bollo ante brea); foreach my $thing (@things) { print { $writers->{uc(substr($thing, 0, 1))} } $thing; } close FHA; close FB;'
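
    In modern perl you can skip the bareword typeglob entirely: open(my $fh, ...) stores a reference to an anonymous glob in $fh, which can go straight into a hash. This sketch repeats the A/B example that way. To run without touching the disk, it writes to in-memory "files" (scalars opened with a reference as the file name), which stand in for a_files and b_files:

```perl
use strict;
use warnings;

# Lexical filehandles: open() stores a reference to an anonymous glob in $fh_a/$fh_b.
# They write to in-memory "files" (scalars) instead of a_files and b_files.
my ($a_buffer, $b_buffer) = ("", "");
open(my $fh_a, '>', \$a_buffer) or die "Cannot open buffer A: $!";
open(my $fh_b, '>', \$b_buffer) or die "Cannot open buffer B: $!";

my %writers = ( "A" => $fh_a, "B" => $fh_b );

foreach my $thing (qw(abra bollo ante brea)) {
    my $fh = $writers{ uc substr($thing, 0, 1) };
    print {$fh} "$thing\n";          # braces mark the expression as the filehandle
}

close $_ for values %writers;
print "A got: $a_buffer";
print "B got: $b_buffer";
```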
