Efficiency in Perl – regular expressions

Regular expressions in Perl are awesome. If you write Perl, you know this. But most programmers don’t think about how to use their power correctly.

 

 

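1) Don’t use a regular expression for very simple string comparisons

Using

if ( $somestring eq "ok" )

is much better than

if ( $somestring =~ m/\Aok\z/ )

Note that the unanchored m/ok/ is not even equivalent to the comparison: it matches any string that merely contains "ok".

Likewise, don’t use regular expressions for problems that the built-in grep or map functions already solve natively.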
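To make the difference concrete, here is a minimal sketch. The helper names (is_ok_eq, is_ok_loose, is_ok_anchored) and the sample inputs are made up for illustration only.

```perl
use strict;
use warnings;

# Three ways to ask "is this string ok?" (helper names are hypothetical).
sub is_ok_eq       { return $_[0] eq 'ok'      ? 1 : 0 }   # plain string equality
sub is_ok_loose    { return $_[0] =~ m/ok/     ? 1 : 0 }   # true for anything containing "ok"
sub is_ok_anchored { return $_[0] =~ m/\Aok\z/ ? 1 : 0 }   # regex equivalent of eq

for my $input ( 'ok', 'look', 'not ok' ) {
    printf "%-8s eq=%d loose=%d anchored=%d\n",
        $input, is_ok_eq($input), is_ok_loose($input), is_ok_anchored($input);
}

# A plain comparison inside grep needs no regex at all.
my @exact = grep { $_ eq 'ok' } ( 'ok', 'look', 'not ok' );
print scalar(@exact), " exact match(es)\n";
```

Only the first input passes the eq and anchored checks, while the loose regex accepts all three.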

 

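2) Efficiency suffers when you capture matches

Don’t capture if you don’t need the captured text and performance is your priority. Every capturing group makes the regex engine record what it matched so that $1 and friends can be filled in. Use the non-capturing group

if ( $somestring =~ m/abc(?:def)/ )

instead of

if ( $somestring =~ m/abc(def)/ )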
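A small sketch of what the two forms return in list context may help. The variable names and the sample string are assumptions for illustration.

```perl
use strict;
use warnings;

my $text = 'abcdef';    # hypothetical sample input

# With a capturing group, a list-context match returns the captured text.
my @with_capture = $text =~ m/abc(def)/;

# With (?:...) there is nothing to capture, so a successful match
# returns just the list (1).
my @without_capture = $text =~ m/abc(?:def)/;

print "capturing:     @with_capture\n";
print "non-capturing: @without_capture\n";
```

Both patterns match the same strings. The second one simply skips the bookkeeping needed to hand back "def".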

 

 

3) Less code usually means worse efficiency

If efficiency is your goal, you must know this rule.

if ( $somestring =~ m/[0-9][0-9][0-9][0-9][0-9]/ )

can be faster than

if ( $somestring =~ m/[0-9]{5}/ )
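The difference depends on the Perl version and the input, so it is worth measuring. Below is a rough sketch using the core Benchmark module. The sample string and the sub names are assumptions, not part of the original test.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = 'order id 12345 shipped';    # hypothetical input

# Five explicit character classes in a row.
sub repeated   { return $sample =~ m/[0-9][0-9][0-9][0-9][0-9]/ ? 1 : 0 }

# The same pattern written with a quantifier.
sub quantified { return $sample =~ m/[0-9]{5}/ ? 1 : 0 }

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, { repeated => \&repeated, quantified => \&quantified } );
```

Both subs accept exactly the same strings, so any gap in the table comes from how the engine executes the two spellings.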

 

Another extreme example (from this brilliant book):

print "1\n";
print "2\n";
print "3\n";
print "4\n";
print "5\n";
print "6\n";
[...]
print "100\n";

is complete rubbish as code, but of course it’s faster than

for ( my $i = 1; $i <= 100; $i++ )
{
 print "$i\n";
}

Why, you ask? The simplest explanation is that the loop does more work than the task itself requires. On every iteration it has to compare and increment the counter and interpolate it into a string, while the plain list of print statements for the numbers 1 to 100 has no such overhead: this “program” doesn’t use any additional functions, loops or algorithms.
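One rough way to measure the loop overhead is sketched below. It makes two assumptions that differ from the example above: both versions append to a string instead of printing, so the benchmark does not flood the terminal, and the unrolled statements are generated with a string eval purely to avoid typing 100 lines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Loop version: appends each number to a string instead of printing it.
sub looped {
    my $out = q{};
    for ( my $i = 1; $i <= 100; $i++ ) {
        $out .= "$i\n";
    }
    return $out;
}

# Unrolled version: 100 separate statements, generated as source text
# and compiled once with a string eval (an illustration shortcut only).
my $body = q{};
$body .= qq{\$out .= "$_\\n";\n} for 1 .. 100;
my $unrolled = eval "sub { my \$out = q{}; $body return \$out; }"
    or die "Could not compile unrolled code: $@";

# Run each version for at least one CPU second and compare the rates.
cmpthese( -1, { looped => \&looped, unrolled => $unrolled } );
```

Both subs build the same 100-line string, so the table compares only the cost of the loop machinery against straight-line code.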

4) Don’t use .* (ever, if you can avoid it)

Take a look at the following code (comparison):

use strict;
use warnings;
use Tie::File;

my $file = "./test-speed.html";
my $file_data = q{};
my $i;

tie my @array, 'Tie::File', $file or die "Cannot open: $file";

foreach my $line ( @array )
{
 $file_data .= $line;
}

### $file_data contains a lot of html code
untie @array;

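sub first
{
 $i = 0;
 # Greedy .* first consumes as much as it can, then backtracks to find the last </span>
 if ( $file_data =~ m/\<span[^\>]*\>(.*)\<\/span\>/g )
 {
  $i++ if $1;
 }
}

sub second
{
 $i = 0;
 # The negated character class cannot run past a tag boundary, so it stops at the first </span>
 if ( $file_data =~ m/\<span[^\>]*\>([^\<\>]*)\<\/span\>/g )
 {
  $i++ if $1;
 }
}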

And the results:

            Rate  first second
first     2208/s     --  -100%
second 1092267/s 49366%     --
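A comparison of this kind can be sketched with the core Benchmark module. The version below is self-contained and uses a synthetic string in place of the HTML file, so its numbers will not match the table above. The sub names greedy and negated are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic stand-in for the HTML file: 2000 identical snippets, no newlines.
my $html = q{<div><span class="c">item</span></div>} x 2000;

# Greedy .* grabs everything up to the LAST </span> in the data.
sub greedy {
    return $html =~ m{<span[^>]*>(.*)</span>} ? $1 : undef;
}

# The negated class cannot cross a tag boundary, so it stops at the FIRST </span>.
sub negated {
    return $html =~ m{<span[^>]*>([^<>]*)</span>} ? $1 : undef;
}

printf "greedy captured %d characters, negated captured %d\n",
    length( greedy() ), length( negated() );

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, { greedy => \&greedy, negated => \&negated } );
```

Besides the speed difference, the printed lengths show that the two patterns do not capture the same text: the greedy one swallows nearly the whole document.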

In conclusion: a smart programmer doesn’t use .* in his regular expressions.
