
Perl Pitfalls and Precautions

Abstract: Perl is a powerful, terse, versatile, and multi-faceted programming language. It can be used for tasks from as trivial as changing file extensions, to as complex as managing secure databases and servers. This article is directed at novice-to-intermediate Perl programmers, working alone and without the proximal support of other Perl programmers, who might encounter the same errors that I have had to grapple with, in trying to use Perl to automate certain non-trivial text processing and software installation tasks on a desktop PC running GNU/Linux.

Some pitfalls to avoid and precautions to follow are outlined in this article. The first recommendation is the use of pragmata to assist the solitary programmer in becoming productive. The fundamental differences between scalar and list context are then examined through several detailed example scripts. The use of references, both as input arguments to subroutines and as return values from them is considered next. The interaction of Perl with the shell is often treated cursorily, if at all, in tutorial material. Here, it is reviewed briefly with emphasis on the backtick operator and the system command. A brief look at Perl one-liners concludes this survey.

Delightful though it is, Perl can and does trap the unwary. It is hoped that this article will assist others, especially those without much experience and working alone, in trying to use Perl for tasks of low-to-medium complexity.

1  Introduction

Perl is a formal programming language that is both delightful and frustrating at the same time. It is predicated on the three virtues of a programmer being laziness, impatience, and hubris [1]. Perl is very expressive because “There’s More Than One Way To Do It”, abbreviated as TMTOWTDI, and usually pronounced as tim-toady [1]. This variety of choice lends great charm and elegant variation to Perl programs. It is perhaps the only programming language that has sites devoted to poetry [2] as well as to obfuscated code [3]. In this article, I have documented some pitfalls to avoid as well as precautions and recommended practices, while writing Perl programs. I hope that it helps those working alone, and without much experience with Perl, attempting tasks of low-to-medium difficulty.

2  First things first

Perl programs can be run by typing them in directly on the command line on a terminal, or from script files. The perl interpreter parses the program at compile time and executes it at run time. This two-stage process is transparent to the user unless the compiler chokes on errors of syntax and precludes execution.

2.1  Use pragmata

Judicious use of compiler directives, or pragmata (singular: pragma), can help catch most syntactical errors. I start all my Perl scripts like so:

#!/usr/bin/perl
use warnings;
use diagnostics;
use strict;

The first line is magical on most systems and starts with a shebang symbol #! composed of a sharp for the # sign and a bang for the ! exclamation mark. The rest of the line instructs the command interpreter or shell to invoke the perl interpreter.

The next three lines, each starting with use, are pragmata. The first produces warnings, the second, verbose warning diagnostics, and the third imposes a strictness that restricts unsafe constructs. This pragmatic triad allows the novice programmer to self-correct many common errors of syntax, and become productive, even if working alone.
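As a minimal sketch of what the strict pragma catches, the snippet below compiles a deliberately misspelt variable name inside a string eval, so that the compile-time error can be captured and printed instead of aborting the script. The names $count and $coutn are invented for this illustration.

```perl
use warnings;
use strict;

# A small piece of code with a misspelt variable name ($coutn for $count).
# The string eval traps the compile-time error so that we can display it.
my $code = 'use strict; my $count = 1; $coutn = 2; 1;';
my $ok   = eval $code;

my $error = $ok ? "" : $@;
print "strict refused to compile the code:\n$error" unless $ok;
```

Without use strict, the misspelt name would quietly create a new global variable and the mistake would surface only much later, if at all.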

2.2  Syntax highlighting while editing

I also recommend that you use a text editor capable of syntax highlighting for Perl. In such an editor, comments, keywords, arguments, non-interpolated strings, interpolated strings, regular expressions, matching braces, parentheses, etc., each have their own colour. Unmatched, incomplete, or forgotten symbols like ", or ;, or } will immediately change text colours on your editing console. The resulting visual feedback, as you edit your file, can help trap many typographical errors before they have had a chance to burgeon into syntax errors.

2.3  Syntax checking

To check the syntax of a program, type perl -wc [filename] on the command line in a terminal and examine the output. The -c switch causes the program to be compiled but not executed, and the -w switch generates warnings in case of problems. The above is an example of invoking perl on the command line. We will look at other such invocations later in this article.

2.4  Getting help

To get help on Perl, type perldoc perldoc at the command line on a terminal and follow through from there. HTML documentation is also available in most standard perl installations. The standard trilogy of books for the Perl beginner comprises Learning Perl (also known as the Llama book) [4], Programming Perl (also known as the Camel book) [1], and The Perl Cookbook [5].

There is a wealth of assistance on the Web at perldoc.perl.org [6], The Perl Directory [7], Perl Mongers [8], use Perl [9], Perl.com [10], PerlMonks [11], Planet Perl [12], and perlmeme.org [13]. There is also a friendly newsgroup for perl beginners [14] where experienced users generously help newbies on Perl problems. There is even a book, available both in printed form and online as a free download, called Learning Perl the Hard Way [15] written for those who have experience with other programming languages, but not Perl.

3  The importance of context

Perl has three built-in data types, scalars, arrays, and hashes, denoted by their respective sigils:

Data type   Sigil   Example
scalar      $       $foo
array       @       @foo
hash        %       %foo

Perl has two major contexts: scalar and list, corresponding respectively to the singular and plural in everyday parlance. Whenever an assignment is made or a statement is executed, the involved variables are assigned or evaluated either in scalar context or list context. Not distinguishing between the two, and using them incorrectly, can cause much perplexity and anguish, not to mention poor productivity.

3.1  Arrays and Scalars

The first example script, context.pl, illustrates the behaviour of arrays and scalars in scalar and list context. Download and execute it so that you can better follow the discussion below.

@fruits = ("oranges", "apples", "peaches", "bananas", "pineapples");
print @fruits, "\n"; # @fruits is not interpolated
print "@fruits\n"; # @fruits is interpolated within double quotes

The first line above assigns the comma-separated list of quoted words to the array @fruits. If the array is passed as an argument to a print statement, without being double-quoted, there are no spaces between the elements of the array: the individual fruit names are concatenated into one long string and printed out. If instead, the array name is interpolated within double quotes, a space is inserted between the individual fruit names when printed out. This behaviour is very useful when the contents of an array need to be passed to a function as a list of space-separated values. The special variable $" confers this behaviour implicitly and is assigned space as its default value.
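The role of the list separator variable can be seen in a short sketch. Here the separator is changed temporarily with local inside a bare block; the three-element array and the variable names $default and $custom are chosen only for this illustration.

```perl
use strict;
use warnings;

my @fruits = ("oranges", "apples", "peaches");

my $default = "@fruits";    # joined with the default separator, a space

my $custom;
    {
    local $" = ", ";        # temporary change, undone at the end of the block
    $custom = "@fruits";
    }

print "$default\n";         # oranges apples peaches
print "$custom\n";          # oranges, apples, peaches
```

Using local confines the change to the enclosing block, so interpolation elsewhere in the script keeps the default behaviour.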

$fruits = @fruits; # assigned in scalar context
print $fruits, "\n";

In the two lines above, the array @fruits is assigned to the scalar $fruits in scalar context. In all such cases, the value being assigned is the number of elements in the array, which in this case happens to be the number 5. It is no surprise that this is precisely the value of $fruits when it is printed out.

($fruits) = @fruits; # assigned in list context
print $fruits, "\n";

Here, the array @fruits is assigned to the scalar $fruits in list context when the latter is enclosed in parentheses on the left hand side as ($fruits). In this case, the output is dramatically different. We get the name of a fruit after all, but only the first. The elements in @fruits are pushed out, in order from the left, to the variables within the parentheses on the left hand side. Since there is only one variable in the list, namely $fruits, it gets the first fruit while all the others are silently discarded.

# $fruits = @fruits[3]; # uncomment this line to see a warning
$fruits = $fruits[3];
print $fruits, "\n";

The assignment $fruits = @fruits[3]; generates a warning and is commented out here. Uncomment and execute it to see why it should not be used. Note that Perl arrays begin their indexing at 0 just as in C. Also note the use of brackets to denote individual array elements. The next statement, $fruits = $fruits[3]; assigns to $fruits the fourth element of the array @fruits. There are two noteworthy points here: (a) we use a $ sigil rather than an @ sigil to denote a single scalar element of the array @fruits using an index; and (b) the use of the $ sigil on the right hand side does not in any way affect the meaning of the $ sigil on the left hand side. There is no namespace clash or confusion about what is being referred to on either side of the = sign. They are two unrelated entities that happen to share the expression $fruits in their names.

$fruits = ("oranges", "apples", "peaches", "bananas", "pineapples")[3];
print $fruits, "\n";

Here, we have something that is subtly different from what has gone before. Instead of assigning the fourth element of the array @fruits to $fruits, we assign the fourth element of the list of comma-separated literals (which was originally assigned to the array @fruits) to the scalar $fruits. The result is identical, as we would hope, and all is well.

$fruits = shift @fruits;
print $fruits, "\n";
$fruits = shift @fruits;
print $fruits, "\n";
print "@fruits\n";

We could use the shift operator to assign values from the array @fruits to the scalar $fruits. It is also useful for reading in command line arguments when a script is invoked with them. Notice that the shift operation alters the underlying array by removing from it the array element that was shifted out. This property is useful in some contexts. If a command line is being read wherein one does not know beforehand the number of arguments, or if a line with an unknown number of words is being read in, one can repeatedly shift words until the result is undefined, when the line is exhausted.
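The idea of shifting until the result is undefined can be sketched as a loop. The word list below stands in for a line or argument list of unknown length, and @seen is a name introduced only for this example.

```perl
use strict;
use warnings;

my @words = qw(the quick brown fox);
my @seen;

# shift returns undef once the array is empty, which ends the loop
while (defined(my $word = shift @words))
    {
    push @seen, $word;
    }

print scalar(@seen), " words read; ", scalar(@words), " left\n";
```

Testing with defined, rather than testing the value itself, matters because a perfectly valid word such as 0 is false in Perl and would otherwise end the loop early.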

($f1, $f2) = @fruits;
print $f1, "\n";
print $f2, "\n";

As a variation of the assignment ($fruits) = @fruits;, if @fruits is assigned to the two scalars $f1 and $f2 in list context, these two scalars assume the values of the two leftmost array elements, the last array element being unassigned in this case.
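The assignments discussed so far can be placed side by side in one short sketch. It also shows the complementary case, not covered above, in which there are more variables on the left than elements on the right; the extra variable, here called $f4, is simply left undefined.

```perl
use strict;
use warnings;

my @fruits = ("peaches", "bananas", "pineapples");

my $count   = @fruits;                # scalar context: number of elements
my ($first) = @fruits;                # list context: first element only
my ($f1, $f2, $f3, $f4) = @fruits;    # more variables than elements

print "count = $count, first = $first\n";
print "f4 is ", (defined $f4 ? "defined" : "undefined"), "\n";
```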

@new_fruits = @fruits;
print "@new_fruits\n";
print "@fruits\n";

If the array @fruits is assigned to another array, @new_fruits, each element in the old array is assigned to a new element in the new array in list context, and the two arrays have identical contents, as would be expected.

The block of code below has been commented out because it generates a warning when used with the use warnings; and use diagnostics; pragmata.

# $fruits = ("oranges", "apples", "peaches", "bananas", "pineapples");
# print $fruits, "\n";

If you uncomment the block and execute it, you will find that the above assignment ends up with $fruits having the rightmost value in the list, pineapples. This happens because, in scalar context, the comma operator evaluates and discards each operand on its left and returns the one on its right. Contrast this behaviour with $fruits having the leftmost array value oranges when @fruits was assigned to ($fruits) in list context. This is an instance where assigning a list to a scalar, and assigning an array to a scalar, either in scalar or list context, give different results in each of the three cases. The use warnings; pragma helps avoid this syntax and any attendant confusion. From the foregoing, can you guess what would result from:

($fruits) = ("oranges", "apples", "peaches", "bananas", "pineapples");

Uncomment it and execute it to check your guess.

$fruits = ["oranges", "apples", "peaches", "bananas", "pineapples"];
print $fruits, "\n";
print "@$fruits\n";

The above assignment is tricky because an anonymous array, associated with the comma-separated list of literals within brackets, is assigned to the scalar $fruits. The latter does not hold a number or string, but rather has become a reference to an array. The contents of $fruits on my machine after one particular run was ARRAY(0x81b7d9c). To display the contents of the anonymous array, we need to dereference $fruits by inserting an @ sigil in front of the scalar to give @$fruits. This last code snippet concludes the first example script and leads rather neatly to our next topic: arrays, hashes, references, and subroutines.
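A few common ways of reaching into an anonymous array through its reference are sketched below. The arrow notation is not used elsewhere in this article; it is shown here only as an equivalent spelling of the same element access.

```perl
use strict;
use warnings;

my $fruits = ["oranges", "apples", "peaches", "bananas", "pineapples"];

my $count  = @$fruits;        # dereferenced array in scalar context
my $fourth = $$fruits[3];     # single element through the reference
my $same   = $fruits->[3];    # arrow notation for the same element

print "$count fruits; the fourth is $fourth ($same)\n";
```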

4  Arrays, hashes, references, and subroutines

Elements within a Perl array are accessed by the index, which is the number denoting the offset of an element from the first element within the array. Hashes, on the other hand are selected using strings called keys associated with values. The elements of an array are stored in the order in which they are assigned. The elements of a hash are stored as key-value pairs, but not necessarily in the order in which they were assigned.

4.1  Arrays and hashes as subroutine arguments

You should download the second program, at_errors.pl, and execute it to follow the discussion below.

@greek = qw/alpha beta gamma/; # note no commas
@numbers = (0, 1, "pi", "i", "e"); # list assigned to array
%polygons = (
            triangle => 3, # keys are on the left; values on the right
            rectangle => 4,
            pentagon => 5,
            hexagon => 6,
            heptagon => 7, # note the terminal comma
            );

There are many ways to initialize arrays and hashes. The first line uses the qw or quote words operator within delimiters, chosen here to be a pair of forward slashes. Take care not to insert commas between the words within the delimiters. The second method is to assign a list to an array, as we have seen before. Note that this particular list is a mixture of numbers and quoted literals. A hash can likewise be assigned in different ways; we have here used the => arrow-like symbol to relate the keys on the left to the values on the right. The keys need not be quoted as long as they are simple identifiers made up of letters, digits, and underscores, because => quotes such a word on its left; the values, however, need to be quoted if they are literal strings. Two other points are noteworthy: (a) the key-value pairs are separated by commas, unlike in the qw case; and (b) the last hash assignment can be terminated by a comma so that errors do not arise due to omission of a comma when a new key-value pair is appended later.

print '@greek is: ', "@greek\n";
print '@numbers is:  ', "@numbers\n";
print '%polygons is stored in this order: ', "\n";
while (($key, $value) = each %polygons)
    {
    print $key, ' => ', $value, "\n";
    }

Printing the contents of arrays is undemanding, but printing hashes as key-value pairs requires some effort, as shown above. Examine the screen output to convince yourself that array elements are stored in the order in which they were assigned because the index carries positional significance. A hash, however, does not need to satisfy this requirement, and its key-value pairs are generally not stored in the assignment order. The integrity of the key-value relationship is preserved for hash elements, though.
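When a predictable order is wanted in the printout, one option is to sort the keys before looking up the values. The sketch below is an alternative to the each loop above rather than a replacement for it; @lines is a name introduced only for this example.

```perl
use strict;
use warnings;

my %polygons = (
    triangle  => 3,
    rectangle => 4,
    pentagon  => 5,
    hexagon   => 6,
    heptagon  => 7,
    );

# Sort the keys alphabetically so that the output order is repeatable
my @lines;
for my $key (sort keys %polygons)
    {
    push @lines, "$key => $polygons{$key}";
    }

print "$_\n" for @lines;
```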

sub at_errors
    {
    my $arg1 = @_;          # case 1
    my ($arg2, $arg3) = @_; # case 2
    my @arg1 = @_;          # case 3
    my (@arg2, @arg3) = @_; # case 4
    ....
    }

Arguments are extracted from within subroutines in Perl via the array @_. The subroutine at_errors is designed to illustrate inadvertent errors that can arise when sufficient care is not taken while assigning the subroutine argument array @_. It is important to realize that even with a single scalar argument, @_ is still an array and behaves as one, and not as a scalar. Note that for the subroutine as written, there is nothing to restrict the number or type of arguments with which at_errors is called. The subroutine at_errors is designed to be invoked with three different arguments so:

at_errors(@greek); # (a) @_ : one array of 3 elements
at_errors(@numbers, @greek); # (b) @_ : 2 arrays with 8 elements in all
at_errors(%polygons); # (c) @_ : hash with 5 key-value pairs or 10 elements

The output for each of these cases should be clear and self-explanatory on program execution. We discuss this output below, sorted by the way in which @_ was assigned.

  1. $arg1 = @_;. This is a scalar context assignment. In all three cases, therefore, $arg1 is set to the number of elements in @_.
    1. @_ = @greek;. The latter has 3 elements. So $arg1 = 3.
    2. @_ = (@numbers, @greek);. The number of elements in @_ equals the sum of the number of elements in @numbers, which is 5, and the number of elements in @greek, which is 3, giving a total of 8. So, $arg1 = 8.
    3. @_ = %polygons;. The latter has 5 key-value pairs. Since each key-value pair is composed of two elements, there are 10 elements in all. The number of elements in @_ is therefore 10 and this is the value assigned to $arg1.
    Most of the time, the assignment $arg1 = @_; is made in error when what was meant was ($arg1) = @_;. Even when @_ holds only a single scalar, it has to be assigned to $arg1 in list context using parentheses so: ($arg1). This is because @_ is an array—even if it holds a single scalar element—and not a scalar.
  2. ($arg2, $arg3) = @_;. This is a list context assignment. The leftmost scalar elements in the flattened input argument list @_ are assigned individually to the scalar elements on the left hand side, starting with the leftmost.
    1. @_ = @greek;. The first element of @greek is alpha and this is assigned to $arg2. The second element of @greek is beta and it is assigned to $arg3.
    2. @_ = (@numbers, @greek);. The input argument array @_ is a flattened list consisting of the elements of @numbers followed by the elements of @greek. The two leftmost elements of @_ are the first two elements of @numbers, which are 0 and 1 respectively. So, $arg2 = 0 and $arg3 = 1.
    3. @_ = %polygons;. The flattened list @_ contains the key-value pairs in %polygons in whatever order they are stored when the program is executed. So $arg2 is assigned the first key and $arg3 is assigned the first value in %polygons. The order in which the hash elements are stored is shown at the head of the program output.
  3. @arg1 = @_;. The elements in @_ are assigned in list context to the elements in the array @arg1. The latter consumes all the elements in @_.
    1. @_ = @greek;. This case is tantamount to setting @arg1 = @greek; so that @arg1 is a copy of @greek.
    2. @_ = (@numbers, @greek);. Here the flattened list @_ may be thought of as a concatenation of the array @numbers followed by the array @greek. @arg1 contains all the elements of @numbers followed by all the elements of @greek. Note that it is not possible to recover either @numbers or @greek from @arg1 without independently knowing the elements of either array beforehand. The two input arrays lose their individual integrities when they commingle into the single flattened list @_.
    3. @_ = %polygons;. This case is equivalent to the assignment @arg1 = %polygons and it illustrates what happens when a hash is assigned to an array. As noted before, a hash of size 5 becomes a flattened list of 5 key-value pairs or a total of 10 elements. The array @arg1 contains the consecutive key-value pairs of the hash in the order in which they were stored, and as printed out at the head of the output.
  4. (@arg2, @arg3) = @_;. Here the input argument list is assigned to two arrays in list context. We have already seen the ability of a single assigned array to gobble up the entire argument list. So, one could hazard a guess that the leftmost array @arg2 will consume the whole of whatever is in @_ and that the array @arg3 will be empty. This is indeed what happens in all three cases.
    1. @_ = @greek;. In this case, @arg2 equals @greek and @arg3 is empty.
    2. @_ = (@numbers, @greek);. Here @arg2 contains all the elements of @numbers followed by all the elements of @greek, leaving @arg3 empty.
    3. @_ = %polygons;. Here again, @arg2 contains all the key-value pairs in %polygons and @arg3 is empty.
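The four cases can be condensed into one runnable sketch for invocation (b). The subroutine at_errors_sketch is a hypothetical stand-in written for this illustration, not the at_errors of the downloadable script; it returns what each assignment received instead of printing it.

```perl
use strict;
use warnings;

my @greek   = qw/alpha beta gamma/;
my @numbers = (0, 1, "pi", "i", "e");

# Hypothetical stand-in for at_errors: reports what each assignment receives
sub at_errors_sketch
    {
    my $arg1          = @_;    # case 1: number of elements in @_
    my ($arg2, $arg3) = @_;    # case 2: the two leftmost elements
    my @arg1          = @_;    # case 3: every element
    my (@arg2, @arg3) = @_;    # case 4: @arg2 takes everything, @arg3 nothing
    return ($arg1, $arg2, $arg3, scalar(@arg1), scalar(@arg2), scalar(@arg3));
    }

my @result = at_errors_sketch(@numbers, @greek);
print "@result\n";    # 8 0 1 8 8 0
```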

The conclusion to be drawn from this exercise is that the input argument list of a subroutine is flattened into a single list in @_. If the input arguments are a list of scalars, it is possible to extract and identify each of them uniquely. If the input argument is a single array or a single hash, it is possible to extract and identify it uniquely from @_. But any combination of scalars and/or arrays and/or hashes, cannot be extracted and uniquely identified from within the subroutine. Because the inputs to and outputs from a subroutine are flattened lists, it is not possible to pass more than one array or one hash as an argument to a subroutine or as the return value from it. The ideal solution to this is to somehow map each array or hash to a unique scalar and pass that scalar in the argument list to a subroutine; this is considered in the next subsection.

4.2  References as subroutine arguments and return values

References are scalars holding the addresses of other Perl entities like hashes, arrays, scalars, subroutines, filehandles, and other references. We are most concerned here with passing array and hash references as scalar arguments to subroutines.
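As a minimal sketch of the idea, the subroutine below receives two array references and can still tell the two arrays apart. The subroutine name sizes and its variable names are invented for this illustration.

```perl
use strict;
use warnings;

my @greek   = qw/alpha beta gamma/;
my @numbers = (0, 1, "pi", "i", "e");

# Hypothetical helper: receives two array references and reports each size
sub sizes
    {
    my ($first_aref, $second_aref) = @_;
    return (scalar(@$first_aref), scalar(@$second_aref));
    }

my ($n_numbers, $n_greek) = sizes(\@numbers, \@greek);
print "numbers: $n_numbers, greek: $n_greek\n";    # numbers: 5, greek: 3
```

Because each reference is a single scalar, @_ holds exactly two elements here, and the boundary between the two arrays is preserved.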

4.3  References of different types

The next code example, references.pl, shows how an indefinite number of input arguments, consisting of references, may be passed to a subroutine and identified and printed out from within the subroutine by shifting them out of the @_ array. Download and execute the program and examine its output carefully. Pay careful attention to the definition of the subroutine print_all, which should be self-explanatory.

use strict; # disallow symbolic references but allow hard references
.....
$answer = 42; # courtesy of Douglas Adams
@numbers = (0, 1, "pi", "i", "e"); # mixture of numbers and quoted literals
# Ensure that there is an even number of elements in the hash
# especially when written like this
%capitals = qw(UK London France Paris Australia Canberra Iceland Reykjavik);

$answer_sref = \$answer; # reference to a scalar
$answer_sref_ref = \$answer_sref; # reference to another reference
$numbers_aref = \@numbers; # reference to an array
$capitals_href = \%capitals; # reference to a hash
$print_all_cref = \&print_all; # reference to a subroutine/code
$handle_gref = \*STDOUT; # reference to a filehandle

# Invoke subroutine with different types of references
# including numbers and quoted literals
print_all($capitals_href, $numbers_aref, $answer_sref,
$answer_sref_ref, $print_all_cref, $handle_gref, 1.414, "home alone");
.....

The subroutine print_all can be invoked with an indefinite number of arguments, each of which can be identified and printed out. The invocation above includes all standard reference types, as well as numbers and quoted literals. The output demonstrates that references to arrays and hashes can be passed as subroutine arguments, and the underlying arrays and hashes can be retrieved from within the subroutine without loss of identity and integrity.
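One way to identify each argument is the built-in ref function, which returns the type of a reference as a string, or an empty string for a non-reference. The subroutine describe_args below is a hypothetical sketch written for this illustration; it is not the print_all of the downloadable script, and it labels the arguments rather than printing their contents.

```perl
use strict;
use warnings;

my $answer   = 42;
my @numbers  = (0, 1, "pi");
my %capitals = (UK => "London");

# Hypothetical sketch, not the author's print_all: label each argument by type
sub describe_args
    {
    my @labels;
    while (@_)
        {
        my $arg  = shift;
        my $type = ref $arg;    # empty string for a non-reference
        push @labels, ($type eq "" ? "plain scalar" : $type);
        }
    return @labels;
    }

my $answer_sref = \$answer;
my @labels = describe_args(\%capitals, \@numbers, $answer_sref, \$answer_sref,
                           \&describe_args, \*STDOUT, 1.414, "home alone");
print join(", ", @labels), "\n";
```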

4.4  Sorting an array and inverting a hash

A subroutine implicitly returns the last evaluated expression. It can also explicitly return a scalar or list. For the same reasons as avoiding arrays and hashes as subroutine input arguments, namely that they lose their identities and are flattened into a single list, we also avoid them in return values. If more than one value is to be returned, a list of scalars, which includes references, is used. The next example, sort_invert.pl, deals with sorting an array and inverting a hash.

sub sort_array
    {
    my ($aref) = @_; # called with an array reference as the argument
    my @array = sort(@$aref);
    return (\@array); # return a new array reference
    }

The above subroutine takes in an array reference, sorts the underlying array, assigns the result to a new array, and passes a reference to the new array in its return value. The first pitfall to avoid here is assigning @_ in scalar context rather than in list context. The second point to note is that sort returns the sorted elements as a list, which is assigned to the lexical array @array. Normally, @array would disappear once we have exited the scope of the subroutine. However, because the subroutine returns a reference to this array, it persists after the subroutine block is exited. The upshot is that there will exist in memory two arrays: the original unsorted one, and the new sorted one. One question that arises is “Can this sorting be done in place, and the results returned in the original array itself?”. The answer is that it can. The next code segment shows how.

sub sort_array_in_place
    {
    my ($aref) = @_; # called with an array reference as the argument
    @$aref = sort(@$aref); # sort original array and return it into itself
    return ($aref); # return original input array reference
    }

The terseness with which this is accomplished is testimony to the fulfilment of the design specification that “Perl is designed to make easy jobs easy, without making the hard jobs impossible.” [1]. A similar observation applies to the two hash inversion subroutines invert_hash and invert_hash_in_place, the latter being shown below:

sub invert_hash_in_place
    {
    my ($href) = @_; # called with a hash reference as the argument
    %$href = reverse %$href; # reverse original hash; assign it into itself
    return ($href); # return original input hash reference
    }

For hash inversions, the reverse function accomplishes key-value exchange most efficiently, although there are other ways to do it. The output from this example also prints out the values of the references to highlight the distinction between the in-place and non-in-place versions.
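The distinction between the two sorting subroutines can be checked by comparing references numerically. The sketch below repeats the two subroutines from above so that it runs on its own; the flag variables $copy_is_new and $in_place_is_same are introduced only for this illustration.

```perl
use strict;
use warnings;

sub sort_array
    {
    my ($aref) = @_;
    my @array = sort(@$aref);
    return (\@array);             # reference to a new array
    }

sub sort_array_in_place
    {
    my ($aref) = @_;
    @$aref = sort(@$aref);
    return ($aref);               # the reference that was passed in
    }

my @fruits = ("oranges", "apples", "peaches");

my $copy_ref    = sort_array(\@fruits);              # @fruits is untouched
my $copy_is_new = ($copy_ref != \@fruits) ? 1 : 0;

my $same_ref         = sort_array_in_place(\@fruits);    # @fruits is now sorted
my $in_place_is_same = ($same_ref == \@fruits) ? 1 : 0;

print "new array: $copy_is_new, same array: $in_place_is_same\n";
print "@fruits\n";
```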

5  Interaction with the system

The standard shell in many GNU/Linux installations is the bash shell. The shell itself contains many builtin commands. The usage of any specific command is displayed by typing help [command name] on the command line. There are other system commands that are not shell builtins, but whose usage is revealed with the traditional man [command name].

It is often necessary to interact with the system from within a Perl script. Examples include a script for automatically installing a custom software package, or a script to create a time-stamped archive of a directory, etc. It could be argued that such tasks could be executed as shell scripts. But if one is accomplishing a long tasklist in which text filtering, in-place text substitution, etc., are also involved, it would be ideal if tasks specific to the shell script could also be accomplished from within Perl.

There are several commands in Perl to enable this close interaction. We focus here on two of them: the `` or qx\\ or backticks command and the system command. We look at each in turn.

5.1  The backticks command

The backticks command allows the output from a shell builtin or system command to be assigned to a variable within a Perl script. That output could subsequently be processed further to extract useful information for later use. Let us assume that we want to create a hyphen-separated directory name consisting of the username, current month abbreviated with three letters, and the current year in four digits. Typically, such a directory name would be user-mon-nnnn. The task is then to get the three fields of information from system commands, string them together in a literal, and create a directory with that name. The steps to do this are in the next script, backticks.pl.

# On my system, the locale defines the default format for date
# and typical output is 
# Wed Jan 16 16:07:52 WST 2008
$date = `date`;
$date =~ m/\w{3}\s(\w{3}).*(\d{4})$/; # do not forget the ~
$month = $1;
$year = $2;
print "$month-$year\n";

The first precaution is to use the same punctuation symbol for the backticks. It is all too easy to use ` for the opening symbol and ' for the closing symbol. If you cannot locate the correct ` symbol, perhaps you should use the qx\\ version instead.

The second point to note is that the output is a single space-separated string ending with a newline, and not a series of lines, each ending with a newline. It is proper here to capture the output in a string. This is doubly appropriate because the string matching that follows can only be done on a string. The two captured substring matches are then combined to generate the required month-year string as evidenced by the printed output.
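A further precaution, sketched below, is to use $1 and $2 only after confirming that the match succeeded, since a failed match leaves them holding whatever an earlier match captured. The sample string is copied from the comment in the script above so that the sketch gives the same result on any machine; a real run would use the string captured from the date command.

```perl
use strict;
use warnings;

# Sample output copied from the comment in the script above
my $date = "Wed Jan 16 16:07:52 WST 2008\n";

my $month_year;
if ($date =~ m/\w{3}\s(\w{3}).*(\d{4})$/)
    {
    $month_year = "$1-$2";    # safe: the match is known to have succeeded
    }
else
    {
    die "Unexpected date format: $date";
    }

print "$month_year\n";        # Jan-2008
```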

# Alternatively, one could define the format to suit
$month_year = qx\date +%b-%Y\;
chomp $month_year; # chomp because this string is newline terminated
print "$month_year\n";

Because locales and their default settings differ, as an alternative method, you might want to customize the format of the output from the date command and capture it directly in a string as is done with $month_year above. Because the string $month_year is the only output from the above command, it is terminated by a newline. To use this string to create a directory, we should strip the terminal newline, or input record separator, with the chomp function. Failure to chomp output strings captured from other commands can lead to cryptic warnings and frustrating errors. It pays to be attentive to this, and perhaps to chomp such strings as a precaution, even if they did not have terminal newlines.
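The effect of chomp can be seen in a small sketch that captures the output of echo, assuming a Unix-like system on which that command is available. The directory-style name user-Jan-2008 is a made-up example value.

```perl
use strict;
use warnings;

my $raw  = `echo user-Jan-2008`;    # echo appends a newline to its output
my $name = $raw;

my $removed = chomp $name;    # chomp returns the number of characters removed
my $again   = chomp $name;    # nothing left to remove this time

print "removed $removed, then $again\n";
print "[$name]\n";
```

The second call shows why chomping as a precaution is harmless: when there is no trailing newline, the string is left unchanged.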

# @export = `export -p`; # does not work for a shell builtin
@export = `sh -c 'export -p'`; # quoted so that -p is passed to export, not to sh
($user) = grep /USER/, @export; # use list mode for assignment
$user =~ m/.*\W+(\w+)\W+$/;
$user = $1;
print "$user\n";

The next segment of code relies on the shell builtin, export, and cannot be executed as a system command alone. Its syntax is displayed by typing help export on the command line in a terminal.

The first line in this code segment is commented out. Uncomment it in your downloaded script to see what sort of warning you get. You might not be able to guess from that warning that export is a shell builtin and cannot be executed as a system command in non-interactive mode unless it is prefixed by sh -c as shown above. The second pitfall to avoid is capturing the output in a scalar variable. This builtin gives multiple lines of newline-terminated output and must therefore be captured in list context, as by an array here. The result of the grep function should also be captured in list context; in scalar context, grep returns the number of matching lines rather than the lines themselves. The username is the last word within single or double quotes on the grepped line and is captured in the $user variable and extracted from it.
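The difference between capturing multi-line output in scalar and in list context can be sketched with three echo commands, assuming a Unix-like shell. The semicolons cause the command string to be handed to the shell, which runs the three commands in turn.

```perl
use strict;
use warnings;

my $all   = `echo one; echo two; echo three`;    # scalar context: one string
my @lines = `echo one; echo two; echo three`;    # list context: one element per line

my $newlines = ($all =~ tr/\n//);    # count the newlines inside the scalar
chomp @lines;                        # chomp acts on every element of an array

print "scalar holds $newlines lines in one string\n";
print scalar(@lines), " elements: @lines\n";
```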

# create desired directory within home directory 
# without error checking of any kind
# uncomment to execute if desired
# chdir; # go to home directory
# mkdir $dir; # create directory, assuming it does not exist

The final block of code is commented out in the script to avoid interfering with your home directory. You could uncomment and execute it to confirm that the desired directory has indeed been created. The created directory can easily be removed by typing rmdir followed by the directory name on the command line in the user’s home directory.

5.2  The system command

The system command allows a Perl script to execute system commands mid-way and to resume execution of the script on completion of those commands. The system command can be invoked with a list argument or a scalar argument and the behaviour is most clearly explained by the following quote from the documentation obtained by typing perldoc -f system on the command line on a terminal:

Note that argument processing varies depending on the number of arguments.

If there is more than one argument in LIST, or if LIST is an array with more than one value, [system] starts the program given by the first element of the list with arguments given by the rest of the list.

If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system’s command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient.
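The list form described in the documentation quoted above can be sketched as follows, assuming a Unix-like system with an echo command. The text is passed as a separate list element, so it reaches echo untouched by any shell.

```perl
use strict;
use warnings;

my $text = '<?php phpinfo();?>';

# List form: echo is started directly, so no shell parses the metacharacters
my $status = system("echo", $text);

print "echo failed with status $status\n" if $status != 0;
```

Had the same text been interpolated into a single string argument, the metacharacters in it would have caused the whole command to be handed to the shell for parsing.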

A contextual example will drive home this difference in behaviour. Suppose we are installing apache2 and php5 on a machine and would like to create a PHP test page in a file called /var/www/test.php that is writable only by the superuser. Also, let the content of the test.php file simply be the two lines shown below:

# test.php
<?php phpinfo();?>

Our quest is to create this file non-interactively, and it poses a challenge because characters like { } < > ( ) ; * ? are shell metacharacters and are not treated as literals by the shell. The # symbol starts a comment, and ? is both a wildcard and the name of the special parameter $? in bash. The fact that our test file text is peppered with some of these special symbols means that if the arguments to system were given as a single double-quoted string, that string would be passed to the shell for parsing. The symbols would then be treated as metacharacters and not literals, and our intent would be thwarted.

It is instructive to look at how to set up the test.php file using a shell script.

#!/bin/sh
# Set up /var/www/test.php for Apache 2 server
sudo sh -c "echo '# test.php' > /var/www/test.php";
sudo sh -c "echo '<?php phpinfo();?>' >> /var/www/test.php";

Because the target file resides in a system directory to which only the superuser has write permission, the command is invoked with the system sudo command. Double quotes are used to pass the first line, which includes the echo command and the shell redirection operator >. Note, however, that # test.php is within single quotes to allow it to be parsed literally and not be treated as a comment. The first line is echoed and written into the file. In the second line, the single quotes are again used to literalize the troublesome characters and this time the shell append operator >> appends this line to the file. The use of > and >> within the shell simplifies creating and appending to files. There is no need to open filehandles, and close them, or to write to temporary files. If we can accomplish these same actions, but from within a Perl script, that would be very concise and elegant. The next example does just this.

#!/usr/bin/perl
use strict;
use diagnostics;
use warnings;
# Create /var/www/test.php for Apache 2
system "sudo", "sh", "-c", "echo '# test.php' > /var/www/test.php";
system "sudo", "sh", "-c", "echo '<?php phpinfo();?>' >> /var/www/test.php";

Note that we are here using a list with more than one argument and therefore the sudo command is executed directly, bypassing the shell. However, it is followed by precisely those arguments, "sh" and "-c", that invoke the shell. The purpose in splitting the arguments to sudo is to avoid long double-quoted strings in which either Perl or the shell could encounter metacharacters, and to pass those metacharacters within single quotes so that the shell treats them as literals. Perl itself interpolates only $ and @ within a double-quoted string, and neither occurs here. The reason for invoking the shell is to use its ability to write and append to files through the redirection operators > and >>. The double-quoted string beginning with echo should embody the redirection operators within itself, just as the shell script does. And the text to be treated literally within this string is single-quoted, again as in the shell script.
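The same quoting technique can be tried without superuser privileges. In the sketch below, a scratch file created by File::Temp is a hypothetical stand-in for /var/www/test.php, and sudo is left out, so that the result can be read back and inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a scratch file stands in for /var/www/test.php
my ($fh, $file) = tempfile(UNLINK => 1);
close $fh or die "Cannot close $file: $!\n";

# List form: sh is run directly and parses only the final argument.
# The text to be written is single-quoted for the shell's benefit.
system 'sh', '-c', qq{echo '# test.php' > "$file"};
system 'sh', '-c', qq{echo '<?php phpinfo();?>' >> "$file"};

# Read the file back to confirm that both lines arrived literally
open my $in, '<', $file or die "Cannot read $file: $!\n";
my @written = <$in>;
close $in;
print @written;
```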

It bears noting that when the Perl system command completes successfully, it returns zero, which is the exit status of the command, rather than a true value as most Perl functions do. A test of the form system(...) or die is therefore inverted; the return value should be compared with zero instead. Finally, the reason for doing this in a Perl script rather than a shell script is that this file creation could be integrated with other more recondite tasks involved with setting up the same software that are better accomplished from within Perl.
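One way of handling the return value, following the conventions described in perldoc -f system, is sketched below. The subroutine name run_and_report is hypothetical, and the two sh -c commands exist only to produce known exit codes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command in list form and return its exit code; die if it could
# not be started or was killed by a signal.
# (run_and_report is a hypothetical helper.)
sub run_and_report {
    my @cmd    = @_;
    my $status = system @cmd;
    die "Failed to start '@cmd': $!\n" if $status == -1;
    die sprintf("'%s' died from signal %d\n", "@cmd", $status & 127)
        if $status & 127;
    return $status >> 8;    # the exit code is in the high byte
}

my $ok  = run_and_report('sh', '-c', 'exit 0');
my $bad = run_and_report('sh', '-c', 'exit 3');
print "first command exited with $ok, second with $bad\n";
```

For the common case where any failure is fatal, the shorter idiom system(@cmd) == 0 or die "system @cmd failed: $?" serves the same purpose.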

6  Perl one-liners

Because Perl was inspired by such programs as grep [16], sed [17], and awk [18], there is a tradition of replacing these traditional UNIX/Linux utilities with Perl one-liners. These are terse and efficient ways to perform a multiplicity of tasks on a succession of files using Perl. Perl one-liners are profoundly useful to know. One well recommended monograph that gives a gentle but thorough introduction to Perl one-liners is Minimal Perl [19]. It provides side-by-side comparisons with such UNIX/Linux mainstays as grep, sed, and awk. Most of all, it explains many of Perl’s command line switches and the magic they embody: something that is tucked away out of sight, until you explore it by hitting perldoc perlrun on the terminal. Three online articles treating of Perl one-liners are Perl One Liners [20], Cultured Perl: One-liners 101 [21], and Perl One-liners [22].

6.1  The -n, -p, -l, -e, and -i switches

When perl is invoked with the -n or -p switch, it performs an implicit while(<>) loop operation, iterating over filename arguments, as explained in Programming Perl. One little publicized fact is that the label for the implicit loop is LINE. The difference between the -p and -n switches is that the former prints all lines by default, whereas the latter does not. The -l switch automatically chomps the line terminator when used in conjunction with -n or -p, and adds it back on output. It also accepts an optional octal number specifying the output line terminator, while the input line terminator can be specified by yet another switch, -0.
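The implicit loop can be written out by hand to see what the -n switch supplies. The sketch below is an approximation under stated assumptions: a scratch file with three sample lines takes the place of the filename arguments, and the lines are collected in an array so that the effect is easy to inspect.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a scratch file with sample content stands in for the
# files that would normally be named on the command line
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "alpha\n", "# a comment\n", "beta\n";
close $fh or die "Cannot close $file: $!\n";

my @kept;
@ARGV = ($file);    # what the command line would have supplied

# This is roughly the loop that -n wraps around the -e code
LINE:
while (<>) {
    next LINE if /^#/;    # the label lets the code skip a line
    push @kept, $_;       # under -n nothing is printed unless asked for
}
# Under -p the loop would also print $_ after each iteration

print @kept;
```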

When perl is invoked with the -e switch and single quotes, all text within the single quotes, including interpolated double-quoted strings, is processed as if it were within a script. When the -e switch is combined with either the -n or -p switch, one can perform many useful functions that rely on textual matching, substitution, filtering, etc., on entire files, directly from the command line.

The -i switch allows for in-place editing of files with the option of backing up the original file. Together, the above switches allow for in-place search and replacements from the command line using Perl one-liners.
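The effect of -i can also be obtained inside a script through the special variable $^I, which holds the backup suffix. The following is a minimal sketch; the file name sample.txt, its contents, the substitution, and the slurp helper are all hypothetical and serve only to make the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the whole content of a file as one string
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!\n";
    local $/;    # undefine the record separator to read everything
    return scalar <$in>;
}

# Assumption: a scratch file with sample content
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.txt";
open my $out, '>', $file or die "Cannot write $file: $!\n";
print {$out} "colour\n", "flavour\n";
close $out or die "Cannot close $file: $!\n";

{
    local $^I   = '.bak';     # same effect as -i.bak
    local @ARGV = ($file);    # files to be edited in place
    while (<>) {
        s/our$/or/;           # hypothetical edit
        print;                # output goes to the rewritten file
    }
}

print "edited: ", slurp($file);
print "backup: ", slurp("$file.bak");
```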

An example will illustrate what is possible. Suppose we want to change all occurrences of uppercase HTML tags to lowercase ones in an HTML file. A Perl script to do that is shown and explained below:

#!/usr/bin/perl -i.bak -wpl
# Lowercase HTML tags in-place

# assemble regex for substitution
# note that delimiters are not /// but :::
s:
</?\w+| # look for < and </ followed by one or more word characters; or
\b\w+=| # look for any word terminating at an = sign; or
\b\w+\s\w+= # look for any word pair terminating at an = sign
:\L$&:gx; # lowercase all matched strings globally and allow comments
# The corresponding Perl one-liner is
# 
# perl -i.bak -wpl -e 's:</?\w+|\b\w+=|\b\w+\s\w+=:\L$&:gx;'
#
# to be invoked with [htmlfilename.htm[l]] at the end

If this script is stored in the file lchtml_script.pl, it is invoked as

lchtml_script.pl [htmlfilename.htm[l]]

Note that with the options in the perl command line, we do not explicitly open a file but rather enter directly and implicitly into the while (<>) loop that is executed after the file is opened, and the filehandle assigned. The regular expression being matched for substitution is explained in the code above. The -i.bak combination allows in-place editing with the original file being backed up with the suffix .bak. The -w switch enables warnings; the -p and -l switches function as explained before. The g modifier processes the file globally for all occurrences and the x modifier allows expanded mode for commenting the code as we have done. The above is still a script, though. If one were to type it as a one-liner at the command line in a terminal, one would uncomment and type the appropriate line in the code segment above, and end it with the filename. Neither the script nor the one-liner checks for correct usage or gives usage messages, though.
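Before running either form with -i on a real file, the substitution can be tried on a string. The sketch below wraps the same regular expression in a subroutine; the name lowercase_tags and the sample markup are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the substitution from the script above to a copy of a string.
# (lowercase_tags is a hypothetical helper for experimentation.)
sub lowercase_tags {
    my ($html) = @_;
    $html =~ s:</?\w+|\b\w+=|\b\w+\s\w+=:\L$&:gx;
    return $html;
}

my $sample = '<A HREF="Index.html">Home</A>';
my $result = lowercase_tags($sample);
print "$result\n";    # prints <a href="Index.html">Home</a>
```

The attribute value Index.html keeps its capital letter because it is not followed by an = sign and so matches none of the three alternatives.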

7  Conclusions

Perl is powerful, versatile, multi-faceted, and syntactically so rich that it is never boring to use as a programming language. The fact that it does not straitjacket programming approaches or styles can also be a trap for the unwary. In this article, I have surveyed some potential pitfalls that can ensnare the novice-to-intermediate Perl programmer and also suggested precautions to avoid them. In the spirit of Perl, I have tried to summarize this article in the form of my first Perl poem, poem.pl. It should be compiled, without pragmata for once, as perl -c poem.pl and it should execute without any errors (or output for that matter)!

pragmata;
scalar (@ARGV) or list; 
chomp;
sub ref{}
backtick;
system and shell;
s/perl -pi -e one-liners are as easy as pie?/y : n/;

References

[1]
L. Wall, T. Christiansen, and J. Orwant, Programming Perl, 3rd ed. Sebastopol, CA, USA: O’Reilly Media, Inc., Jul. 2000, (also known as the Camel book).
[2]
“Perl Poetry.” [Online]. Available: http://www.perlmonks.org/?node=Perl%20Poetry
[3]
“Obfuscated Code.” [Online]. Available: http://www.perlmonks.org/?node=Obfuscated%20Code
[4]
R. L. Schwartz, T. Phoenix, and brian d foy, Learning Perl, 4th ed. Sebastopol, CA, USA: O’Reilly Media, Inc., Jul. 2005, (also known as the Llama book).
[5]
T. Christiansen and N. Torkington, Perl Cookbook, 2nd ed. Sebastopol, CA, USA: O’Reilly Media, Inc., Aug. 2003.
[6]
“Tutorials - perldoc.perl.org.” [Online]. Available: http://perldoc.perl.org/index-tutorials.html
[7]
“The Perl Directory - perl.org.” [Online]. Available: http://www.perl.org/
[8]
“Perl Mongers.” [Online]. Available: http://www.pm.org/
[9]
“use Perl: All the Perl that’s Practical to Extract and Report.” [Online]. Available: http://use.perl.org/
[10]
“Perl.com: The Source for Perl – perl development, conferences.” [Online]. Available: http://www.perl.com/
[11]
“PerlMonks - The Monastery Gates.” [Online]. Available: http://www.perlmonks.org/
[12]
“Planet Perl - an aggregation of Perl blogs.” [Online]. Available: http://planet.perl.org/
[13]
“Welcome to perlmeme.org.” [Online]. Available: http://www.perlmeme.org/
[14]
“perl.beginners archive - nntp.perl.org.” [Online]. Available: http://www.nntp.perl.org/group/perl.beginners/
[15]
A. B. Downey, “Learning Perl the Hard Way.” [Online]. Available: http://www.greenteapress.com/perl/perl.pdf
[16]
“grep - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Grep
[17]
“sed - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Sed
[18]
“AWK (programming language) - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Awk
[19]
T. Maher, Minimal Perl: For UNIX and Linux People. Greenwich, CT, USA: Manning Publications, Oct. 2006.
[20]
J. Mates, “Perl One Liners.” [Online]. Available: http://sial.org/howto/perl/one-liner/
[21]
T. Zlatanov, “Cultured Perl: One-liners 101.” [Online]. Available: http://www.ibm.com/developerworks/linux/library/l-p101/
[22]
J. Bay, “Perl One-liners,” The Perl Review, vol. 0, no. 1, pp. 1–8, Mar. 2002. [Online]. Available: http://www.theperlreview.com/Issues/The_Perl_Review_0_1.pdf


Please email me your comments and corrections.

© R (Chandra) Chandrasekhar, January 2008


This document was translated from LATEX by HEVEA.

Last generated on Sat Sep 20 18:54:24 WST 2008
