Perl Pitfalls and Precautions
Abstract: Perl is a powerful, terse, versatile, and multi-faceted programming language. It can be used for tasks ranging from the trivial, such as changing file extensions, to the complex, such as managing secure databases and servers. This article is directed at novice-to-intermediate Perl programmers, working alone and without the proximal support of other Perl programmers, who might encounter the same errors that I have had to grapple with in trying to use Perl to automate certain non-trivial text processing and software installation tasks on a desktop PC running GNU/Linux.
Some pitfalls to avoid and precautions to follow are outlined in this article. The first recommendation is the use of pragmata to assist the solitary programmer in becoming productive. The fundamental differences between scalar and list context are then examined through several detailed example scripts. The use of references, both as input arguments to subroutines and as return values from them, is considered next. The interaction of Perl with the shell is often treated cursorily, if at all, in tutorial material. Here, it is reviewed briefly with emphasis on the backtick operator and the system command. A brief look at Perl one-liners concludes this survey.
Delightful though it is, Perl can and does trap the unwary. It is hoped that this article will assist others, especially those without much experience and working alone, in trying to use Perl for tasks of low-to-medium complexity.
1 Introduction
Perl is a formal programming language that is both delightful and frustrating at the same time. It is predicated on the three virtues of a programmer being laziness, impatience, and hubris [1]. Perl is very expressive because “There’s More Than One Way To Do It”, abbreviated as TMTOWTDI, and usually pronounced as tim-toady [1]. This variety of choice lends great charm and elegant variation to Perl programs. It is perhaps the only programming language that has sites devoted to poetry [2] as well as to obfuscated code [3]. In this article, I have documented some pitfalls to avoid as well as precautions and recommended practices, while writing Perl programs. I hope that it helps those working alone, and without much experience with Perl, attempting tasks of low-to-medium difficulty.
2 First things first
Perl programs can be run by typing them in directly on the command line on a
terminal, or from script files. The perl interpreter parses the program
at compile time and executes it at run time. This two-stage process is
transparent to the user unless the compiler chokes on errors of syntax and
precludes execution.
2.1 Use pragmata
Judicious use of compiler directives, or pragmata, (singular pragma), can help catch most syntactical errors. I start all my Perl scripts like so:
#!/usr/bin/perl
use warnings;
use diagnostics;
use strict;
The first line is magical on most systems and starts with a shebang
symbol #! composed of a sharp for the # sign and a
bang for the ! exclamation mark. The rest of the line instructs
the command interpreter or shell to invoke the perl interpreter.
The next three lines, each starting with use, are pragmata. The
first produces warnings, the second, verbose warning diagnostics, and the
third imposes a strictness that restricts unsafe constructs. This pragmatic
triad allows the novice programmer to self-correct many common
errors of syntax, and become productive, even if working alone.
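As a small illustration of what the warnings pragma reports at run time, the sketch below traps a warning with a `$SIG{__WARN__}` handler so that it can be printed under the script's own control. The variable names and the handler are illustrative assumptions and are not taken from the scripts discussed in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect warnings in an array instead of letting them go straight to STDERR.
my @caught;
local $SIG{__WARN__} = sub { push @caught, $_[0] };

my $count;                 # declared, but never given a value
my $total = $count + 1;    # triggers an "uninitialized value" warning

print "total = $total\n";
print "caught: $_" for @caught;
```

Without `use warnings;` the addition would silently treat the undefined value as 0, which is exactly the kind of slip that is hard to spot when working alone.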
2.2 Syntax highlighting while editing
I also recommend that you use a text editor capable of syntax
highlighting for Perl. In such an editor, comments, keywords, arguments,
non-interpolated strings, interpolated strings, regular expressions, matching
braces, parentheses, etc., each have their own colour. Unmatched, incomplete, or
forgotten symbols like ", or ;, or } will immediately
change text colours on your editing console. The resulting visual feedback,
as you edit your file, can help trap many typographical errors before
they have had a chance to burgeon into syntax errors.
2.3 Syntax checking
To check the syntax of a program, you should type perl -wc [filename] on
the command line in a terminal and examine the output. The -c switch
causes the program to be compiled but not executed, and the -w switch
generates warnings in case of problems. The above is an example of invoking
perl on the command line. We will look at some Perl one-liners later in this
article.
2.4 Getting help
To get help on Perl, type perldoc perldoc at the command line on a
terminal and follow through from there. HTML documentation is also available in
most standard perl installations. The standard trilogy of books for the
Perl beginner is Learning Perl (also known as the Llama book) [4],
Programming Perl (also known as the Camel book) [1], and
The Perl Cookbook [5].
There is a wealth of assistance on the Web at perldoc.perl.org [6], The Perl Directory [7], Perl Mongers [8], use Perl [9], Perl.com [10], PerlMonks [11], Planet Perl [12], and perlmeme.org [13]. There is also a friendly newsgroup for perl beginners [14] where experienced users generously help newbies on Perl problems. There is even a book, available both in printed form and online as a free download, called Learning Perl the Hard Way [15] written for those who have experience with other programming languages, but not Perl.
3 The importance of context
Perl has three built-in data types, scalars, arrays, and hashes, denoted by their respective sigils:
| Data type | Sigil | Example |
| --- | --- | --- |
| scalar | $ | $foo |
| array | @ | @foo |
| hash | % | %foo |
Perl has two major contexts: scalar and list, corresponding respectively to the singular and plural in everyday parlance. Whenever an assignment is made or a statement is executed, the involved variables are assigned or evaluated either in scalar context or list context. Not distinguishing between the two, and using them incorrectly, can cause much perplexity and anguish, not to mention poor productivity.
3.1 Arrays and Scalars
The first example script, context.pl, illustrates the behaviour of arrays and scalars in scalar and list context. Download and execute it so that you can better follow the discussion below.
@fruits = ("oranges", "apples", "peaches", "bananas", "pineapples");
print @fruits, "\n"; # @fruits is not interpolated
print "@fruits\n"; # @fruits is interpolated within double quotes
The first line above assigns the comma-separated list of quoted words to the
array @fruits. If the array is passed as an argument to a print
statement, without being double-quoted, there are no spaces between the elements
of the array: the individual fruit names are concatenated into one long string
and printed out. If instead, the array name is interpolated within
double quotes, a space is inserted between the individual fruit names when
printed out. This behaviour is very useful when the contents of an array need to
be passed to a function as a list of space-separated values. The special
variable $" confers this behaviour implicitly and is assigned space as
its default value.
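The effect of the `$"` variable can be seen by changing it temporarily. The following is a minimal sketch with an illustrative three-element array; `local` restores the default separator when the enclosing block ends.

```perl
use strict;
use warnings;

my @fruits = ("oranges", "apples", "peaches");

my $spaced = "@fruits";      # default separator: a single space

my $listed;
{
    local $" = ', ';         # change the separator inside this block only
    $listed = "@fruits";
}

print "$spaced\n";
print "$listed\n";
```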
$fruits = @fruits; # assigned in scalar context
print $fruits, "\n";
In the two lines above, the array @fruits is assigned to the scalar
$fruits in scalar context. In all such cases, the value being assigned
is the number of elements in the array, which in this case happens to
be the number 5. It is no surprise that this is precisely the value of
$fruits when it is printed out.
($fruits) = @fruits; # assigned in list context
print $fruits, "\n";
Here, the array @fruits is assigned to the scalar $fruits in
list context, because the latter is enclosed in parentheses on the left
hand side as ($fruits). In this case, the output is dramatically
different. We get the name of a fruit after all, but only the first. The
elements in @fruits are handed out, in order from the left, to the
variables within the parentheses on the left hand side. Since there is only one
variable in the list, namely $fruits, it gets the first fruit while all
the others are silently discarded.
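The two kinds of assignment can be placed side by side in a strict-clean sketch. The variable names `$count`, `$first`, `$one`, `$two`, and `@rest` are chosen here for illustration only.

```perl
use strict;
use warnings;

my @fruits = ("oranges", "apples", "peaches", "bananas", "pineapples");

my $count   = @fruits;               # scalar context: number of elements
my ($first) = @fruits;               # list context: first element only
my ($one, $two, @rest) = @fruits;    # a trailing array takes whatever is left

print "count = $count\n";
print "first = $first\n";
print "rest  = @rest\n";
```

The only visible difference between the first two assignments is a pair of parentheses, which is why this mistake is so easy to make and so hard to see.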
# $fruits = @fruits[3]; # uncomment this line to see a warning
$fruits = $fruits[3];
print $fruits, "\n";
The assignment $fruits = @fruits[3]; generates a warning and is commented
out here. Uncomment and execute it to see why it should not be used. Note that
Perl arrays begin their indexing at 0 just as in C. Also note the use of
brackets to denote individual array elements. The next statement, $fruits = $fruits[3]; assigns to $fruits the fourth element of the array @fruits. There
are two noteworthy points here: (a) we use a $ sigil rather than an
@ sigil to denote a single scalar element of the array @fruits
using an index; and (b) the use of the $ sigil on the right hand side
does not in any way affect the meaning of the $ sigil on the left hand
side. There is no namespace clash or confusion about what is being referred to
on either side of the = sign. They are two unrelated entities that happen
to share the expression $fruits in their names.
$fruits = ("oranges", "apples", "peaches", "bananas", "pineapples")[3];
print $fruits, "\n";
Here, we have something that is subtly different from what has gone before.
Instead of assigning the fourth element of the array @fruits to
$fruits, we assign the fourth element of the list of
comma-separated literals (which was originally assigned to the array
@fruits) to the scalar $fruits. The result is identical, as we
would hope, and all is well.
$fruits = shift @fruits;
print $fruits, "\n";
$fruits = shift @fruits;
print $fruits, "\n";
print "@fruits\n";
We could use the shift operator to assign values from the array
@fruits to the scalar $fruits. It is also useful for
reading in command line arguments when a script is invoked with them. Notice
that the shift operation alters the underlying array by removing
from it the array element that was shifted out. This property is useful in some
contexts. If a command line is being read wherein one does not know beforehand
the number of arguments, or if a line with an unknown number of words is being
read in, one can repeatedly shift words until the result is undefined, when the
line is exhausted.
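A loop of this kind might look like the sketch below. The array `@words` is a stand-in for `@ARGV` or for the words of an input line. Testing with `defined` rather than for truth is a deliberate choice, so that a word such as "0" does not end the loop early.

```perl
use strict;
use warnings;

# Stand-in for @ARGV; note the "0" in the middle.
my @words = ("tar", "-czf", "backup.tgz", "0", "docs");
my @seen;

# shift returns undef once the array is empty, which ends the loop.
while (defined(my $word = shift @words)) {
    push @seen, $word;
    print "got: $word\n";
}

print "left over: ", scalar(@words), "\n";
```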
($f1, $f2) = @fruits;
print $f1, "\n";
print $f2, "\n";
As a variation of the assignment ($fruits) = @fruits;, if @fruits
is assigned to the two scalars $f1 and $f2 in list context, these
two scalars assume the values of the two leftmost array elements. After the two
shift operations above, @fruits holds only three elements, so the last array
element is left unassigned in this case.
@new_fruits = @fruits;
print "@new_fruits\n";
print "@fruits\n";
If the array @fruits is assigned to another array, @new_fruits,
each element in the old array is copied to a new element in the new array, in
list context, and the two arrays have identical contents, as would be expected.
The block of code below has been commented out because it generates a warning
when used with the use warnings; and use diagnostics; pragmata.
# $fruits = ("oranges", "apples", "peaches", "bananas", "pineapples");
# print $fruits, "\n";
If you uncomment the block and execute it, you will find that the above
assignment ends up with $fruits having the rightmost list value
pineapples. In scalar context, the commas act as the comma operator, which
evaluates and discards each operand in turn and yields only the last one.
Contrast this behaviour with $fruits having the leftmost array value oranges
when @fruits was assigned to ($fruits) in list context, and with $fruits
holding the element count when @fruits was assigned to it in scalar context.
Assigning a list to a scalar, assigning an array to a scalar in scalar context,
and assigning an array to a scalar in list context thus give three different
results. The use warnings; pragma helps avoid this syntax and any attendant
confusion. From the foregoing, can you guess what would result from:
($fruits) = ("oranges", "apples", "peaches", "bananas", "pineapples");
Uncomment it and execute it to check your guess.
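The difference between a list and an array in scalar context can also be observed through a subroutine that returns a plain list. In the sketch below, `fruit_list` is a hypothetical helper introduced only for this illustration. The last assignment uses the empty-list idiom `() =` to count the items of a list.

```perl
use strict;
use warnings;

# Hypothetical helper that returns a plain list, not an array.
sub fruit_list {
    return ("oranges", "apples", "peaches", "bananas", "pineapples");
}

my $last    = fruit_list();         # scalar context: the last item of the list
my ($first) = fruit_list();         # list context: the first item
my $count   = () = fruit_list();    # list assignment in scalar context: item count

print "last = $last, first = $first, count = $count\n";
```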
$fruits = ["oranges", "apples", "peaches", "bananas", "pineapples"];
print $fruits, "\n";
print "@$fruits\n";
The above assignment is tricky because an anonymous array, associated
with the comma-separated list of literals within brackets, is assigned to the
scalar $fruits. The latter does not hold a number or string, but rather
has become a reference to an array. The contents of $fruits on
my machine after one particular run was ARRAY(0x81b7d9c). To display the
contents of the anonymous array, we need to dereference $fruits by
inserting an @ sigil in front of the scalar to give @$fruits. This
last code snippet concludes the first example script and leads rather neatly to
our next topic: arrays, hashes, references, and subroutines.
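A few further ways of working with such an array reference are sketched below: `ref` reports the kind of thing referred to, and the arrow operator reaches a single element. The variable `@pair` is an illustrative name.

```perl
use strict;
use warnings;

my $fruits = ["oranges", "apples", "peaches", "bananas", "pineapples"];

print ref($fruits), "\n";          # kind of referent, here ARRAY
print scalar(@$fruits), "\n";      # number of elements in the anonymous array
print $fruits->[3], "\n";          # a single element via the arrow operator

my @pair = @{$fruits}[0, 1];       # a slice of the underlying array
print "@pair\n";
```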
4 Arrays, hashes, references, and subroutines
Elements within a Perl array are accessed by the index, which is the number denoting the offset of an element from the first element within the array. Elements of a hash, on the other hand, are selected using strings called keys, each of which is associated with a value. The elements of an array are stored in the order in which they are assigned. The elements of a hash are stored as key-value pairs, but not necessarily in the order in which they were assigned.
4.1 Arrays and hashes as subroutine arguments
You should download the second program, at_errors.pl, and execute it to follow the discussion below.
@greek = qw/alpha beta gamma/; # note no commas
@numbers = (0, 1, "pi", "i", "e"); # list assigned to array
%polygons = (
triangle => 3, # keys are on the left; values on the right
rectangle => 4,
pentagon => 5,
hexagon => 6,
heptagon => 7, # note the terminal comma
);
There are many ways to initialize arrays and hashes. The first line uses the
qw or quote words operator within delimiters, chosen here to be a pair of
forward slashes. Take care not to insert commas between the words within the
delimiters. The second method is to assign a list to an array, as we have seen
before. Note that this particular list is a mixture of numbers and quoted
literals. A hash can likewise be assigned in different ways; we have here used
the => arrow-like symbol to relate the keys on the left to the values on
the right. The keys need not be quoted as long as they are simple identifiers
made up of letters, digits, and underscores, but the values need to be quoted
if they are literal strings. Two other
points are noteworthy: (a) the key-value pairs are separated by commas, unlike
in the qw case; and (b) the last hash assignment can be terminated by a
comma so that errors do not arise due to omission of a comma when a new
key-value pair is appended later.
print '@greek is: ', "@greek\n";
print '@numbers is: ', "@numbers\n";
print '%polygons is stored in this order: ', "\n";
while (($key, $value) = each %polygons)
{
print $key, ' => ', $value, "\n";
}
Printing the contents of arrays is undemanding, but printing hashes as key-value pairs requires some effort, as shown above. Examine the screen output to convince yourself that array elements are stored in the order in which they were assigned because the index carries positional significance. A hash, however, does not need to satisfy this requirement, and its key-value pairs are generally not stored in the assignment order. The integrity of the key-value relationship is preserved for hash elements, though.
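When a predictable order is wanted, the keys can be sorted before printing. The sketch below sorts the same hash once by key and once by value; the array names `@by_name` and `@by_sides` are illustrative.

```perl
use strict;
use warnings;

my %polygons = (
    triangle  => 3,
    rectangle => 4,
    pentagon  => 5,
    hexagon   => 6,
    heptagon  => 7,
);

# Keys in alphabetical order.
my @by_name = sort keys %polygons;

# Keys ordered by their numeric values.
my @by_sides = sort { $polygons{$a} <=> $polygons{$b} } keys %polygons;

for my $name (@by_sides) {
    printf "%-10s => %d\n", $name, $polygons{$name};
}
```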
sub at_errors
{
my $arg1 = @_; # case 1
my ($arg2, $arg3) = @_; # case 2
my @arg1 = @_; # case 3
my (@arg2, @arg3) = @_; # case 4
....
}
Arguments are extracted from within subroutines in Perl via the array @_.
The subroutine at_errors is designed to illustrate inadvertent errors
that can arise when sufficient care is not taken while assigning the subroutine
argument array @_. It is important to realize that even with a single
scalar argument, @_ is still an array and behaves as one, and not as a
scalar. Note that for the subroutine as written, there is nothing to restrict
the number or type of arguments with which at_errors is called. The
subroutine at_errors is designed to be invoked with three different
arguments so:
at_errors(@greek); # (a) @_ : one array of 3 elements
at_errors(@numbers, @greek); # (b) @_ : 2 arrays with 8 elements in all
at_errors(%polygons); # (c) @_ : hash with 5 key-value pairs or 10 elements
The output for each of these cases should be clear and self-explanatory on
program execution. We discuss this output below, sorted by the way in which
@_ was assigned.
- $arg1 = @_;. This is a scalar context assignment. In all three cases, therefore, $arg1 is set to the number of elements in @_.
  - @_ = @greek;. The latter has 3 elements. So $arg1 = 3.
  - @_ = (@numbers, @greek);. The number of elements in @_ equals the sum of the number of elements in @numbers, which is 5, and the number of elements in @greek, which is 3, giving a total of 8. So, $arg1 = 8.
  - @_ = %polygons;. The latter has 5 key-value pairs. Since each key-value pair is composed of two elements, there are 10 elements in all. The number of elements in @_ is therefore 10 and this is the value assigned to $arg1.

  The assignment $arg1 = @_; is made in error when what was meant was ($arg1) = @_;. Even when @_ holds only a single scalar, it has to be assigned to $arg1 in list context using parentheses so: ($arg1). This is because @_ is an array, even if it holds a single scalar element, and not a scalar.
- ($arg2, $arg3) = @_;. This is a list context assignment. The leftmost scalar elements in the flattened input argument list @_ are assigned individually to the scalar elements on the left hand side, starting with the leftmost.
  - @_ = @greek;. The first element of @greek is alpha and this is assigned to $arg2. The second element of @greek is beta and it is assigned to $arg3.
  - @_ = (@numbers, @greek);. The input argument array @_ is a flattened list consisting of the elements of @numbers followed by the elements of @greek. The two leftmost elements of @_ are the first two elements of @numbers, which are 0 and 1 respectively. So, $arg2 = 0 and $arg3 = 1.
  - @_ = %polygons;. The flattened list @_ contains the key-value pairs in %polygons in whatever order they are stored when the program is executed. So $arg2 is assigned the first key and $arg3 is assigned the first value in %polygons. The order in which the hash elements are stored is shown at the head of the program output.
- @arg1 = @_;. The elements in @_ are assigned in list context to the elements in the array @arg1. The latter consumes all the elements in @_.
  - @_ = @greek;. This case is tantamount to setting @arg1 = @greek; so that @arg1 is a copy of @greek.
  - @_ = (@numbers, @greek);. Here the flattened list @_ may be thought of as a concatenation of the array @numbers followed by the array @greek. @arg1 contains all the elements of @numbers followed by all the elements of @greek. Note that it is not possible to recover either @numbers or @greek from @arg1 without independently knowing the elements of either array beforehand. The two input arrays lose their individual integrities when they commingle into the single flattened list @_.
  - @_ = %polygons;. This case is equivalent to the assignment @arg1 = %polygons and it illustrates what happens when a hash is assigned to an array. As noted before, a hash of size 5 becomes a flattened list of 5 key-value pairs or a total of 10 elements. The array @arg1 contains the consecutive key-value pairs of the hash in the order in which they were stored, and as printed out at the head of the output.
- (@arg2, @arg3) = @_;. Here the input argument list is assigned to two arrays in list context. We have already seen the ability of a single assigned array to gobble up the entire argument list. So, one could hazard a guess that the leftmost array @arg2 will consume the whole of whatever is in @_ and that the array @arg3 will be empty. This is indeed what happens in all three cases.
  - @_ = @greek;. In this case, @arg2 equals @greek and @arg3 is empty.
  - @_ = (@numbers, @greek);. Here @arg2 contains all the elements of @numbers followed by all the elements of @greek, leaving @arg3 empty.
  - @_ = %polygons;. Here again, @arg2 contains all the key-value pairs in %polygons and @arg3 is empty.
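The four assignments can be tried side by side in a compact, strict-clean sketch. The subroutine `show_assignments` is a hypothetical stand-in for at_errors: instead of printing, it returns what each assignment produced for invocation (b).

```perl
use strict;
use warnings;

my @greek   = qw/alpha beta gamma/;
my @numbers = (0, 1, "pi", "i", "e");

# Hypothetical stand-in for at_errors, reporting the result of each case.
sub show_assignments {
    my $arg1          = @_;    # case 1: number of elements in @_
    my ($arg2, $arg3) = @_;    # case 2: first two elements of @_
    my @arg1          = @_;    # case 3: a copy of everything in @_
    my (@arg2, @arg3) = @_;    # case 4: @arg2 takes everything, @arg3 stays empty
    return ($arg1, $arg2, $arg3,
            scalar(@arg1), scalar(@arg2), scalar(@arg3));
}

my @result = show_assignments(@numbers, @greek);
print "@result\n";
```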
The conclusion to be drawn from this exercise is that the input argument list of
a subroutine is flattened into a single list in @_. If the input
arguments are a list of scalars, it is possible to extract and identify each of
them uniquely. If the input argument is a single array or a single hash, it is
possible to extract and identify it uniquely from @_. A list of scalars followed
by a single array or hash can also be unpacked, provided the array or hash comes
last. But any combination that contains more than one array or hash cannot be
separated into its original parts from within the subroutine. Because the inputs
to and outputs from a subroutine are flattened lists, it is not possible to pass
more than one array or hash as an argument to a subroutine, or as the return
value from it, without losing their separate identities. The ideal solution to
this is to somehow map each array or hash to a unique scalar and pass that
scalar in the argument list to a subroutine; this
is considered in the next subsection.
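A minimal sketch of the idea follows, with the backslash operator producing one scalar reference per array. The subroutine `sizes` and its variable names are hypothetical and serve only to show that the two arrays stay apart.

```perl
use strict;
use warnings;

my @greek   = qw/alpha beta gamma/;
my @numbers = (0, 1, "pi", "i", "e");

# Hypothetical subroutine: receives two array references, so @_ holds
# exactly two scalars and the arrays keep their identities.
sub sizes {
    my ($first_aref, $second_aref) = @_;
    return (scalar(@$first_aref), scalar(@$second_aref));
}

my ($n_numbers, $n_greek) = sizes(\@numbers, \@greek);
print "numbers: $n_numbers, greek: $n_greek\n";
```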
4.2 References as subroutine arguments and return values
References are scalars that hold the addresses of other Perl entities such as hashes, arrays, scalars, subroutines, filehandles, and other references. We are most concerned here with passing array and hash references as scalar arguments to subroutines.
4.3 References of different types
The next code example, references.pl, shows
how an indefinite number of input arguments, consisting of references, may be
passed to a subroutine and identified and printed out from within the subroutine
by shifting them out of the @_ array. Download and execute the program
and examine its output carefully. Pay careful attention to the definition of the
subroutine print_all, which should be self-explanatory.
use strict; # disallow symbolic references but allow hard references
.....
$answer = 42; # courtesy of Douglas Adams
@numbers = (0, 1, "pi", "i", "e"); # mixture of numbers and quoted literals
# Ensure that there is an even number of elements in the hash
# especially when written like this
%capitals = qw(UK London France Paris Australia Canberra Iceland Reykjavik);
$answer_sref = \$answer; # reference to a scalar
$answer_sref_ref = \$answer_sref; # reference to another reference
$numbers_aref = \@numbers; # reference to an array
$capitals_href = \%capitals; # reference to a hash
$print_all_cref = \&print_all; # reference to a subroutine/code
$handle_gref = \*STDOUT; # reference to a filehandle
# Invoke subroutine with different types of references
# including numbers and quoted literals
print_all($capitals_href, $numbers_aref, $answer_sref, $answer_sref_ref,
$print_all_cref, $handle_gref, 1.414, "home alone");
.....
The subroutine print_all can be invoked with an indefinite number of
arguments, each of which can be identified and printed out. The invocation above
includes all standard reference types, as well as numbers and quoted literals.
The output demonstrates that references to arrays and hashes can be passed as
subroutine arguments, and the underlying arrays and hashes can be retrieved from
within the subroutine without loss of identity and integrity.
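One way such identification might be done is with the built-in `ref` function, which returns a string naming the kind of referent, or an empty string for a plain value. The subroutine `describe` below is a hypothetical sketch written for this illustration; it is not the print_all subroutine from references.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one description per argument.
sub describe {
    my @descriptions;
    for my $arg (@_) {
        my $type = ref($arg);
        if ($type eq 'ARRAY') {
            push @descriptions, "array: @$arg";
        }
        elsif ($type eq 'HASH') {
            push @descriptions, "hash with " . scalar(keys %$arg) . " keys";
        }
        elsif ($type eq 'SCALAR') {
            push @descriptions, "scalar: $$arg";
        }
        elsif ($type eq 'CODE') {
            push @descriptions, "code";
        }
        elsif ($type) {
            push @descriptions, "other reference: $type";
        }
        else {
            push @descriptions, "plain value: $arg";
        }
    }
    return @descriptions;
}

my $answer   = 42;
my @numbers  = (0, 1, "pi");
my %capitals = (UK => "London", France => "Paris");

print "$_\n" for describe(\%capitals, \@numbers, \$answer, \&describe, 1.414);
```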
4.4 Sorting an array and inverting a hash
A subroutine implicitly returns the last evaluated expression. It can also explicitly return a scalar or list. For the same reasons as avoiding arrays and hashes as subroutine input arguments, namely that they lose their identities and are flattened into a single list, we also avoid them in return values. If more than one value is to be returned, a list of scalars, which includes references, is used. The next example, sort_invert.pl, deals with sorting an array and inverting a hash.
sub sort_array
{
my ($aref) = @_; # called with an array reference as the argument
my @array = sort(@$aref);
return (\@array); # return a new array reference
}
The above subroutine takes in an array reference, sorts the underlying array,
assigns the result to a new array, and passes a reference to the new array in
its return value. The first pitfall to avoid here is assigning @_ in scalar
context; it must be assigned in list context, as ($aref). The second point to
note is that sort returns the sorted elements as a list, which is assigned to
the lexical array @array. Normally, @array would disappear once we have exited the
scope of the subroutine. However, because the subroutine returns a reference to
this array, it will not disappear once the subroutine block is exited. The
upshot is that there will exist in memory, two arrays: the original unsorted
one, and the new sorted one. One question that arises is “Can this sorting be
done in place, and the results returned in the original array itself?”. The
answer is that it can. The next code segment shows how.
sub sort_array_in_place
{
my ($aref) = @_; # called with an array reference as the argument
@$aref = sort(@$aref); # sort original array and return it into itself
return ($aref); # return original input array reference
}
The terseness with which this is accomplished is testimony to the fulfilment of
the design specification that “Perl is designed to make easy jobs easy, without
making the hard jobs impossible.” [1]. A similar
observation applies to the two hash inversion subroutines invert_hash and
invert_hash_in_place, the latter being shown below:
sub invert_hash_in_place
{
my ($href) = @_; # called with a hash reference as the argument
%$href = reverse %$href; # reverse original hash; assign it into itself
return ($href); # return original input hash reference
}
For hash inversions, the reverse function accomplishes key-value exchange
most efficiently, although there are other ways to do it. The output from this
example also prints out the values of the references to highlight the
distinction between the in-place and non-in-place versions.
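The distinction can be checked by comparing references numerically, as in the sketch below. The two sort subroutines follow the listings above; `invert_hash_copy` is a hypothetical non-in-place inversion written for this illustration, since invert_hash itself is not listed here. Two caveats are worth keeping in mind: sort without a comparison block compares its elements as strings, and inverting a hash with reverse keeps only one key for each distinct value.

```perl
use strict;
use warnings;

sub sort_array {
    my ($aref) = @_;
    my @array = sort(@$aref);
    return \@array;                 # reference to a new array
}

sub sort_array_in_place {
    my ($aref) = @_;
    @$aref = sort(@$aref);
    return $aref;                   # the same reference that came in
}

# Hypothetical non-in-place inversion: builds and returns a new hash.
sub invert_hash_copy {
    my ($href) = @_;
    my %inverted = reverse %$href;
    return \%inverted;
}

my @greek = qw/gamma alpha beta/;

my $copy_ref  = sort_array(\@greek);
my $untouched = "@greek";           # the original order is still intact here
my $same_ref  = sort_array_in_place(\@greek);

print "copy is a different array\n"      if $copy_ref != \@greek;
print "in-place returns the same array\n" if $same_ref == \@greek;
print "before: $untouched, after: @greek\n";

# Duplicate values collapse on inversion: only one key survives.
my %sides     = (square => 4, rectangle => 4);
my $collapsed = invert_hash_copy(\%sides);
print scalar(keys %$collapsed), " key left after inversion\n";
```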
5 Interaction with the system
The standard shell in many GNU/Linux installations is the
bash shell. The shell itself
contains many builtin commands. The usage of any specific command is displayed
by typing help [command name] on the command line. There are other system
commands that are not shell builtins, but whose usage is revealed with the
traditional man [command name].
It is often necessary to interact with the system from within a Perl script. Examples include a script for automatically installing a custom software package, or a script to create a time-stamped archive of a directory, etc. It could be argued that such tasks could be executed as shell scripts. But if one is accomplishing a long tasklist in which text filtering, in-place text substitution, etc., are also involved, it would be ideal if tasks specific to the shell script could also be accomplished from within Perl.
There are several commands in Perl to enable this close interaction. We focus
here on two of them: the backticks command, written `` or qx//, and the
system command. We look at each in turn.
5.1 The backticks command
The backticks command allows the output from a shell builtin or system command
to be assigned to a variable within a Perl script. That output could
subsequently be processed further to extract useful information for later use.
Let us assume that we want to create a hyphen-separated directory name
consisting of the username, current month abbreviated with three letters, and
the current year in four digits. Typically, such a directory name would be
user-mon-nnnn. The task is then to get the three fields of information
from system commands, string them together in a literal, and create a
directory with that name. The steps to do this are in the next script,
backticks.pl.
# On my system, the locale defines the default format for date
# and typical output is
# Wed Jan 16 16:07:52 WST 2008
$date = `date`;
$date =~ m/\w{3}\s(\w{3}).*(\d{4})$/; # do not forget the ~
$month = $1;
$year = $2;
print "$month-$year\n";
The first precaution is to use the same punctuation symbol for the
backticks. It is all too easy to use ` for the opening symbol and
' for the closing symbol. If you cannot locate the correct `
symbol, perhaps you should use the qx// version instead.
The second point to note is that the output is a single space-separated string ending with a newline, and not a series of lines, each ending with a newline. It is proper here to capture the output in a string. This is doubly appropriate because the string matching that follows can only be done on a string. The two captured substring matches are then combined to generate the required month-year string as evidenced by the printed output.
# Alternatively, one could define the format to suit
$month_year = qx/date +%b-%Y/;
chomp $month_year; # chomp because this string is newline terminated
print "$month_year\n";
Because locales and their default settings differ, as an alternative method, you
might want to customize the format of the output from the date command
and capture it directly in a string as is done with $month_year above.
Because the string $month_year is the only output from the above command,
it is terminated by a newline. To use this string to create a directory, we
should strip the terminal newline, or input record separator, with the
chomp function. Failure to chomp output strings captured from
other commands can lead to cryptic warnings and frustrating errors. It pays to
be attentive to this, and perhaps to chomp such strings as a precaution, even if
they did not have terminal newlines.
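The sketch below shows backticks in scalar and in list context, each followed by chomp, together with a check of the exit status in `$?`. It assumes a Unix-like system on which the shell provides echo and printf; the commands themselves are illustrative.

```perl
use strict;
use warnings;

# Scalar context: the whole output arrives as one string, newline included.
my $line = `echo hello`;
chomp $line;

# List context: one array element per line of output.
my @lines = `printf 'one\\ntwo\\n'`;
die "command failed with status $?" if $? != 0;
chomp @lines;                       # chomp works on every element of an array

print "$line: @lines\n";
```

Calling chomp on a string that has no trailing newline does nothing, so chomping as a precaution is harmless.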
# @export = `export -p`; # does not work for a shell builtin
@export = `sh -c 'export -p'`; # quote the command so that -p reaches export
($user) = grep /USER/, @export; # use list mode for assignment
$user =~ m/.*\W+(\w+)\W+$/;
$user = $1;
print "$user\n";
The next segment of code relies on the shell builtin, export, and cannot
be executed as a system command alone. Its syntax is displayed by typing help export on the command line in a terminal.
The first line in this code segment is commented out. Uncomment it from your
downloaded script to see what sort of warning you get. You might not be able to
guess from that warning that export is a shell builtin and cannot be
executed as a system command in non-interactive mode unless it is prefixed by
sh -c as shown above. The second pitfall to avoid is to capture the
output in a scalar variable. This builtin gives multiple lines of
newline-terminated output and must therefore be captured in list context, as by
an array here. The result of the grep command should also be captured in
list mode. The username is the last word within single or double quotes on the
grepped line and is captured in the $user variable and extracted from it.
# create desired directory within home directory
# without error checking of any kind
# uncomment to execute if desired
# chdir; # go to home directory
# mkdir $dir; # create directory, assuming it does not exist
The final block of code is commented out in the script to avoid interfering with
your home directory. You could uncomment and execute it to confirm that the
desired directory has indeed been created. The created directory can easily be
removed by typing rmdir followed by the directory name on the command
line in the user’s home directory.
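For comparison, the same directory name can be assembled without calling any external command. This is a different technique from the one in backticks.pl: the sketch below uses strftime from the standard POSIX module and the built-in getpwuid, and it adds the error checking that the script omits. Creating the directory inside a scratch directory made with File::Temp is an assumption made here so that the home directory is left alone.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use File::Temp qw(tempdir);

# User name from the environment, falling back to the password database.
my $user = $ENV{USER} // scalar(getpwuid($<)) // "unknown";

# Abbreviated month and four-digit year; there is no newline to chomp.
my $month_year = strftime("%b-%Y", localtime);
my $dir = "$user-$month_year";

# Scratch location that is removed when the script exits.
my $base = tempdir(CLEANUP => 1);
mkdir "$base/$dir" or die "cannot create $base/$dir: $!";

print "created $dir\n";
```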
5.2 The system command
The system command allows a Perl script to execute system commands
mid-way and to resume execution of the script on completion of those
commands. The system command can be invoked with a list argument or a
scalar argument and the behaviour is most clearly explained by the
following quote from the documentation obtained by typing perldoc -f system on the command line on a terminal:
Note that argument processing varies depending on the number of arguments.
If there is more than one argument in LIST, or if LIST is an array with more than one value, [system] starts the program given by the first element of the list with arguments given by the rest of the list.
If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system’s command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp, which is more efficient.
A contextual example will drive home this difference in behaviour. Suppose we
are installing apache2 and php5 on a machine and would like to
create a PHP test page in a file called /var/www/test.php that is
writable only by the superuser. Also, let the content of the test.php
file simply be the two lines shown below:
# test.php
<?php phpinfo();?>
Our quest is to create this file non-interactively, and it poses a challenge
because characters like { } < > ( ) ; * ? are shell
metacharacters and are not treated as literals by the shell. The #
symbol introduces a comment, and ? is a wildcard that also names the special
parameter $? in bash. The fact that our test file text is peppered with some of
these special symbols means that if the argument to system were given as a
single double-quoted string, it would be passed to the shell for parsing. The
symbols would then be treated as metacharacters and not literals and our intent
would be thwarted.
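The difference can be observed with a harmless command. The following is a minimal sketch, assuming an echo program is available on the PATH; the variable names are illustrative. It passes the second line of test.php to echo, first in list form and then as a single string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The second line of test.php, full of shell metacharacters.
my $text = '<?php phpinfo();?>';

# List form: echo is started directly and receives $text as a single,
# untouched argument, so the line is printed literally.
my $list_status = system 'echo', $text;

# Single-string form: Perl sees metacharacters and hands the whole string
# to the shell, which tries to interpret <, (, ) and ; for itself.
# The shell is expected to complain rather than echo the text.
my $string_status = system "echo $text";

print "list form status:   $list_status\n";
print "string form status: $string_status\n";
```

The list form should report a status of zero, while the single-string form should report a non-zero status accompanied by a complaint from the shell on standard error.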
It is instructive to look at how to set up the test.php file using a
shell script.
    #!/bin/sh
    # Set up /var/www/test.php for Apache 2 server
    sudo sh -c "echo '# test.php' > /var/www/test.php";
    sudo sh -c "echo '<?php phpinfo();?>' >> /var/www/test.php";
Because the target file resides in a system directory to which only the
superuser has write permission, the command is invoked with the system
sudo command. Double quotes are used to pass the first line, which
includes the echo command and the shell redirection operator >.
Note, however, that # test.php is within single quotes to allow it to be
parsed literally and not be treated as a comment. The first line is echoed
and written into the file. In the second line, the single quotes are again used
to literalize the troublesome characters and this time the shell append operator
>> appends this line to the file. The use of > and >>
within the shell simplifies creating and appending to files. There is no need to
open filehandles, and close them, or to write to temporary files. If we can
accomplish these same actions, but from within a Perl script, that would be very
concise and elegant. The next example does just this.
    #!/usr/bin/perl
    use strict;
    use diagnostics;
    use warnings;
    # Create /var/www/test.php for Apache 2
    system "sudo", "sh", "-c", "echo '# test.php' > /var/www/test.php";
    system "sudo", "sh", "-c", "echo '<?php phpinfo();?>' >> /var/www/test.php";
Note that we are here using a list with more than one argument and therefore,
the sudo command is directly executed, bypassing the shell.
However, it is followed by precisely those arguments that invoke the shell,
"sh", "-c". The purpose in splitting the input arguments to
sudo is to avoid one long double-quoted string that Perl would hand to the
shell in its entirety, and to pass the metacharacters within single quotes so
that the shell invoked by sudo treats them as literals. (Perl itself still
interpolates $ and @ inside a double-quoted string, whatever quotes appear
within it; the strings here contain neither.) The
reason for invoking the shell is to use its ability to write and append to files
through the redirection operators > and >>. The double-quoted
string beginning with echo should embody the redirection operators within
itself, just as the shell script does. And the text to be treated literally
within this string is single-quoted, again as in the shell script.
It bears noting that when the Perl system command completes successfully,
it returns zero, the exit status of the command, whereas most Perl functions
return a true (non-zero) value on success. A test of the form
system(...) or die therefore has the wrong sense; the return value should be
compared with zero instead.
Finally, the reason for doing this in a Perl script rather than a shell
script is that this file creation could be integrated with other more recondite
tasks involved with setting up the same software that are better accomplished
from within Perl.
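Because system returns zero on success, its result is best tested explicitly. The following is a minimal sketch of one way to do so; the helper name ran_ok is hypothetical, and $^X (the running perl itself) is used only as a harmless command to run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form and return true only
# if it was started and exited with status zero.
sub ran_ok {
    my @command = @_;
    my $status  = system @command;

    if ($status == -1) {                      # command could not be started
        warn "Failed to run $command[0]: $!\n";
        return 0;
    }
    if ($status & 127) {                      # command was killed by a signal
        warn sprintf "%s died with signal %d\n", $command[0], $status & 127;
        return 0;
    }
    my $exit_code = $status >> 8;             # exit code is in the high byte
    warn "$command[0] exited with code $exit_code\n" if $exit_code != 0;
    return $exit_code == 0;
}

# $^X is the running perl, used here only as a harmless test command.
print "success detected\n" if     ran_ok($^X, '-e', 'exit 0');
print "failure detected\n" unless ran_ok($^X, '-e', 'exit 3');
```

The shift by eight bits follows the description in perldoc -f system: the value returned is the wait status, with the exit code of the command in its high byte.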
6 Perl one-liners
Because Perl was inspired by such programs as grep [16], sed [17], and
awk [18], there is a tradition of replacing these UNIX/Linux utilities with
Perl one-liners. These are terse and efficient ways to perform a multiplicity of
tasks on a succession of files using Perl. Perl one-liners are profoundly useful
to know. One well-recommended monograph that gives a gentle but thorough
introduction to Perl one-liners is Minimal Perl [19]. It provides side-by-side
comparisons with such UNIX/Linux mainstays as grep, sed, and awk. Most of
all, it explains many of Perl’s command line switches and the magic they embody:
something that is tucked away out of sight, until you explore it by typing
perldoc perlrun on the terminal. Three online articles on Perl one-liners are
Perl One Liners [20], Cultured Perl: One-liners 101 [21], and
Perl One-liners [22].
6.1 The -n, -p, -l, -e, and -i switches
When perl is invoked with the -n or -p switch, it performs
an implicit while(<>) loop operation, iterating over filename arguments,
as explained in Programming Perl. One little
publicized fact is that the label for the implicit loop is LINE. The
difference between the -p and -n switches is that the former
prints all lines by default, whereas the latter does not. The -l switch
automatically chomps the line terminator from each input line when used in
conjunction with -n or -p, and adds it back when the line is printed. It takes
an optional octal argument specifying the output line terminator; the input
line terminator can be changed with the -0 switch.
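The implicit loops can be written out in full to see what the switches do. The following is a minimal sketch modelled on the description in perldoc perlrun; the sub names n_loop and p_loop and the patterns inside them are illustrative, and the subs take filenames as arguments in place of the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what  perl -ne 'print if /perl/i' FILES  does.
sub n_loop {
    local @ARGV = @_;          # stand-in for filenames on the command line
    local $_;
  LINE:
    while (<>) {
        next LINE unless /perl/i;   # skip lines that do not match
        print;                      # with -n, nothing is printed unless asked
    }
    return;
}

# Roughly what  perl -pe 's/perl/Perl/g' FILES  does.
sub p_loop {
    local @ARGV = @_;
    local $_;
  LINE:
    while (<>) {
        s/perl/Perl/g;              # the one-liner's code goes here
    }
    continue {
        print or die "-p destination: $!\n";   # with -p, every line is printed
    }
    return;
}

# Example calls (filename is illustrative):
# n_loop('notes.txt');
# p_loop('notes.txt');
```

The LINE label is what allows a one-liner to say next LINE or last LINE, and the continue block explains why -p still prints a line after next LINE has been called on it.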
The -e switch supplies the program text on the command line. When that text is
enclosed in single quotes, the shell passes it to perl untouched, and everything
within the quotes, including any double-quoted strings that Perl interpolates,
is processed as if it were within a script. When the -e switch is combined
with either the -n or -p switch, one can perform many useful
functions that rely on textual matching, substitution, filtering etc., on entire
files, directly from the command line.
The -i switch allows for in-place editing of files with the option of
backing up the original file. Together, the above switches allow for in-place
search and replacements from the command line using Perl one-liners.
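The effect of -i can also be obtained inside a script by setting the special variable $^I before reading files with <>. The following is a minimal sketch of that idiom; the helper name replace_in_place and its argument order are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace $from with $to in the named files,
# keeping each original under its name plus $suffix (for example .bak).
sub replace_in_place {
    my ($from, $to, $suffix, @files) = @_;

    local $^I   = $suffix;     # what -i.bak sets on the command line
    local @ARGV = @files;      # files that <> will read
    local $_;
    while (<>) {
        s/\Q$from\E/$to/g;     # \Q...\E treats $from as literal text
        print;                 # goes to the file being rewritten, not STDOUT
    }
    return;
}

# Example call (filename is illustrative):
# replace_in_place('colour', 'color', '.bak', 'notes.txt');
```

While $^I is set, Perl renames each input file to its backup name, opens a new file under the original name, and makes that file the default destination for print.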
An example will illustrate what is possible. Suppose we want to change all occurrences of uppercase HTML tags to lowercase ones in an HTML file. A Perl script to do that is shown and explained below:
    #!/usr/bin/perl -i.bak -wpl
    # Lowercase HTML tags in-place
    # assemble regex for substitution
    # note that delimiters are not /// but :::
    s:
      </?\w+|        # look for < and </ followed by one or more word characters; or
      \b\w+=|        # look for any word terminating at an = sign; or
      \b\w+\s\w+=    # look for any word pair terminating at an = sign
    :\L$&:gx;        # lowercase all matched strings globally and allow comments
    # The corresponding Perl one-liner is
    #
    # perl -i.bak -wpl -e 's:</?\w+|\b\w+=|\b\w+\s\w+=:\L$&:gx;'
    #
    # to be invoked with [htmlfilename.htm[l]] at the end
If this script is stored in the file lchtml_script.pl, it is invoked as
lchtml_script.pl [htmlfilename.htm[l]]
Note that with the options in the perl command line, we do not explicitly
open a file but rather enter directly and implicitly into the while (<>)
loop that is executed after the file is opened, and the filehandle assigned. The
regular expression being matched for substitution is explained in the code
above. The -i.bak combination allows in-place editing with the original
file being backed up with the suffix .bak. The -w switch enables
warnings; the -p and -l switches function as explained before. The
g modifier replaces every occurrence on each line rather than only the first,
and the x modifier allows expanded mode for commenting the code as we have done.
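To see what the substitution does without touching any file, it can be applied to a string. The following is a minimal sketch; the helper name lowercase_tags and the sample input are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the substitution from the script above
# to a single string and return the result.
sub lowercase_tags {
    my ($html) = @_;
    $html =~ s:</?\w+|\b\w+=|\b\w+\s\w+=:\L$&:g;
    return $html;
}

# Sample input is made up for illustration.
print lowercase_tags('<A HREF="x.html">Link</A>'), "\n";
# prints <a href="x.html">Link</a>
```

Tag and attribute names are lowercased, while the attribute value and the text between the tags are left alone, because neither is followed by an = sign or preceded by < or </.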
The above is still a script, though. If one were to type it as a one-liner at
the command line in a terminal, one would uncomment and type the appropriate
line in the code segment above, and end it with the filename. Neither the script
nor the one-liner checks for correct usage or gives usage messages, though.
7 Conclusions
Perl is powerful, versatile, multi-faceted, and syntactically so rich that it is
never boring to use as a programming language. The fact that it does not
straitjacket programming approaches or styles can also be a trap for the unwary.
In this article, I have surveyed some potential pitfalls that can ensnare the
novice-to-intermediate Perl programmer and also suggested precautions to avoid
them. In the spirit of Perl, I have tried to summarize this article in the form
of my first Perl poem, poem.pl. It should be compiled, without
pragmata for once, as perl -c poem.pl and it should execute without any
errors (or output for that matter)!
pragmata;
scalar (@ARGV) or list;
chomp;
sub ref{}
backtick;
system and shell;
s/perl -pi -e one-liners are as easy as pie?/y : n/;
References
- [1]
- L. Wall, T. Christiansen, and J. Orwant, Programming Perl, 3rd ed. Sebastopol, CA, USA: O’Reilly Media, Inc., Jul. 2000, (also known as the Camel book).
- [2]
- “Perl Poetry.” [Online]. Available: http://www.perlmonks.org/?node=Perl%20Poetry
- [3]
- “Obfuscated Code.” [Online]. Available: http://www.perlmonks.org/?node=Obfuscated%20Code
- [4]
- R. L. Schwartz, T. Phoenix, and brian d foy, Learning Perl, 4th ed. Sebastopol, CA, USA: O’Reilly Media, Inc., Jul. 2005, (also known as the Llama book).
- [5]
- T. Christiansen and N. Torkington, Perl Cookbook, 2nd ed. Sebastopol, CA, USA: O’Reilly Media, Inc., Aug. 2003.
- [6]
- “Tutorials - perldoc.perl.org.” [Online]. Available: http://perldoc.perl.org/index-tutorials.html
- [7]
- “The Perl Directory - perl.org.” [Online]. Available: http://www.perl.org/
- [8]
- “Perl Mongers.” [Online]. Available: http://www.pm.org/
- [9]
- “use Perl: All the Perl that’s Practical to Extract and Report.” [Online]. Available: http://use.perl.org/
- [10]
- “Perl.com: The Source for Perl – perl development, conferences.” [Online]. Available: http://www.perl.com/
- [11]
- “PerlMonks - The Monastery Gates.” [Online]. Available: http://www.perlmonks.org/
- [12]
- “Planet Perl - an aggregation of Perl blogs.” [Online]. Available: http://planet.perl.org/
- [13]
- “Welcome to perlmeme.org.” [Online]. Available: http://www.perlmeme.org/
- [14]
- “perl.beginners archive - nntp.perl.org.” [Online]. Available: http://www.nntp.perl.org/group/perl.beginners/
- [15]
- A. B. Downey, “Learning Perl the Hard Way.” [Online]. Available: http://www.greenteapress.com/perl/perl.pdf
- [16]
- “grep - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Grep
- [17]
- “sed - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Sed
- [18]
- “AWK (programming language) - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Awk
- [19]
- T. Maher, Minimal Perl: For UNIX and Linux People. Greenwich, CT, USA: Manning Publications, Oct. 2006.
- [20]
- J. Mates, “Perl One Liners.” [Online]. Available: http://sial.org/howto/perl/one-liner/
- [21]
- T. Zlatanov, “Cultured Perl: One-liners 101.” [Online]. Available: http://www.ibm.com/developerworks/linux/library/l-p101/
- [22]
- J. Bay, “Perl One-liners,” The Perl Review, vol. 0, no. 1, pp. 1–8, Mar. 2002. [Online]. Available: http://www.theperlreview.com/Issues/The_Perl_Review_0_1.pdf
Please email me your comments and corrections.
© R (Chandra) Chandrasekhar, January 2008
This document was translated from LaTeX by HEVEA.
Last generated on Sat Sep 20 18:54:24 WST 2008