
How I Use Regular Expressions as a Web Developer

When I first started dabbling as a developer at this Utah web development company, I had no idea regular expressions existed, let alone any need to use them. As my development skills grew and I started using PHP, editing .htaccess files frequently, and exploring more advanced editors and IDEs, I got curious and read up on some regular expression basics. Here is a quick rundown of how I use regular expressions in my workflow.

Disclaimer: I am by no means a regular expression expert, and there are probably more efficient ways to write some of these examples. There are plenty of regular expression tutorials out there, so I won’t go over any syntax in depth. This article assumes you have basic knowledge of regular expression syntax. Also note that the syntax can vary between languages and tools. For example, Python uses \1, \2, etc. for captured groups in a replacement, while PHP (typically) uses $1, $2, etc.

.htaccess and 301 redirects

When launching a website, there are often 301 redirects that need to be set up because the old URL structure differs from the new one. Regular expressions can help you consolidate multiple redirect lines into one pattern to rule them all.

Let’s start with a basic example. Make sure you have RewriteEngine set to On. When the site was originally launched somebody typo’d, and there are URLs out there like this:

[code language="php"]
http://domain.com/locatoins/somecity
[/code]

We want to change them to:

[code language="php"]
http://domain.com/locations/somecity
[/code]

You could redirect all of the old URLs one at a time. What if there are a hundred? A thousand? A hundred thousand? It sure would be nice to write just one rule.

[code language="php"]
RewriteRule ^locatoins/(.*)$ http://domain.com/locations/$1 [R=301,L]
[/code]

“^locatoins/(.*)$” is the first part of our redirect, the pattern for the source URL. The caret (^) means the start of the line (in the case of .htaccess, the start of the URI path). “locatoins/” is literal. The parentheses capture the contents of their match for later use. The period matches any character, and the asterisk extends it to match zero or more of any character. The dollar sign means the end of the line. Altogether, this part says: match any domain.com/locatoins/ANYTHING. The “$1” in the destination URL is the result of the capture, in this case everything after domain.com/locatoins/, which is the rest of the URI.
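
If you want to sanity-check a pattern like this before it goes into .htaccess, you can try it in PHP first. This is only a rough sketch: Apache evaluates the real rule, and the locationsRedirect() helper is a made-up name for illustration. Note that PHP needs delimiters around the pattern (# here), while .htaccess does not.

[code language="php"]
<?php
// Hypothetical helper: returns the redirect target for a request path,
// or null when the rule would not apply.
function locationsRedirect($path) {
    // Same pattern as the RewriteRule, wrapped in # delimiters for PHP.
    if (preg_match('#^locatoins/(.*)$#', $path, $matches)) {
        // $matches[1] plays the role of $1 in the RewriteRule.
        return 'http://domain.com/locations/' . $matches[1];
    }
    return null;
}

var_dump(locationsRedirect('locatoins/somecity')); // http://domain.com/locations/somecity
var_dump(locationsRedirect('locations/somecity')); // NULL, the rule does not apply
?>
[/code]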

For the next example, let’s say the old URL structure of our site was something like this:

[code language="php"]
http://domain.com/books.php?isbn=1234567890
[/code]

And the new structure is like this:

[code language="php"]
http://domain.com/books/1234567890
[/code]

The RewriteCond and RewriteRule:

[code language="php"]
    RewriteEngine On
    RewriteCond %{QUERY_STRING} ^isbn=([0-9]+)$
    RewriteRule ^books\.php$ http://domain.com/books/%1? [R=301,L]
[/code]

For this you need mod_rewrite enabled (mod_alias is only required if you also use Redirect or RedirectMatch directives). The first line turns the RewriteEngine on; if you have already declared it, you can omit this line. The second line sets a condition on the query string. The pattern is “^isbn=([0-9]+)$”. The caret (^) means the start of the string. “isbn=” is literal. The parentheses capture the contents of their match for later use. “[0-9]+” matches one or more digits. The dollar sign means the end of the string. Altogether, this condition says: match a query string that consists of isbn= followed by digits only. In the RewriteRule pattern, the period in “books\.php” is escaped with a backslash so that it matches a literal period rather than any character. The “%1” in the destination URL is our match from the RewriteCond (note that RewriteCond captures are referenced with %1, while RewriteRule captures use $1), the ISBN in this case. The question mark at the end removes the original query string when redirecting.
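
The same logic can be sketched in PHP to see which requests would be redirected. This is an approximation for testing the patterns, not how Apache works internally, and bookRedirect() is a hypothetical helper name. It checks the RewriteRule pattern against the path first and then the RewriteCond pattern against the query string.

[code language="php"]
<?php
// Hypothetical helper mirroring the RewriteCond/RewriteRule pair.
function bookRedirect($path, $queryString) {
    // RewriteRule pattern: the path must be exactly books.php.
    if (!preg_match('#^books\.php$#', $path)) {
        return null;
    }
    // RewriteCond pattern: the whole query string must be isbn=<digits>.
    if (!preg_match('/^isbn=([0-9]+)$/', $queryString, $matches)) {
        return null;
    }
    // $matches[1] plays the role of %1; the old query string is dropped.
    return 'http://domain.com/books/' . $matches[1];
}

var_dump(bookRedirect('books.php', 'isbn=1234567890'));       // http://domain.com/books/1234567890
var_dump(bookRedirect('books.php', 'isbn=1234567890&ref=x')); // NULL
?>
[/code]

The second call returns NULL because the pattern is anchored at both ends, so a query string with extra parameters does not match. If old URLs like that exist, the condition would need to be loosened.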

Find/Replace with regular expressions in your favorite editor/IDE

Ever written a function, called it a bunch of times with some variables, only to find out a variable you chose is already in use somewhere else in the project? I have.

[code language="php"]
<?php
    function myExampleFunction($name,$phone,$existingClient=false){
        /*some code here*/
    } /* end myExampleFunction */
    if( $i > 0 ){
        myExampleFunction($name,$phone,true);
    } else {
        myExampleFunction($name,$phone);
    }
?>
[/code]

We need to replace “$name” with “$clientName”. We don’t want to replace the variable everywhere, only where it is used with our function, since it is in use elsewhere in the project. The function is called many times, sometimes with the optional argument set and sometimes without, so the string is not always the same. That rules out a standard find/replace, but we can write a regular expression pattern to tackle it.

Find:

[code language="php"]
myExampleFunction\(\$name,\$phone(.*)\);
[/code]

Replace:

[code language="php"]
myExampleFunction($clientName,$phone\1);
[/code]

Let’s break down the find. “myExampleFunction” is literal. The opening parenthesis and the dollar sign are escaped with a backslash, making them literal. “name,” is literal. The second dollar sign is escaped with a backslash, making it literal, and “phone” is literal. “(.*)”, which you might remember from earlier, is a pattern that matches anything (or nothing). The period matches any character, and the asterisk repeats the preceding item zero or more times; since the preceding item here is the wildcard period, it matches any run of characters. This is important because it will match the third, optional argument if it exists but will still match if it does not. The closing parenthesis is escaped with a backslash, making it literal. The semicolon is literal as well. I included the semicolon so that we only match calls to the function and not its declaration, where the closing parenthesis is followed by an opening curly bracket.

The replace is not a regex pattern, so we don’t need to escape regex symbols. $name is changed to $clientName. The “\1” is the result of the first capture, which will be the third, optional argument if it exists. (My IDE of choice is Komodo, which is Python-based; your syntax for a capture may differ.)
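
To check a find/replace like this before running it across a project, the same pattern can be tried in PHP on a small sample. This is a sketch under a few assumptions: renameFirstArgument() is a made-up helper name, the sample text is modeled on the example above, and the replacement uses PHP’s $1 (preg_replace also accepts \1).

[code language="php"]
<?php
// Hypothetical helper: applies the editor find/replace to a string of source code.
function renameFirstArgument($source) {
    // Single quotes keep PHP from treating $name and $phone as variables.
    $find    = '/myExampleFunction\(\$name,\$phone(.*)\);/';
    // $1 is the captured optional argument (possibly empty).
    $replace = 'myExampleFunction($clientName,$phone$1);';
    return preg_replace($find, $replace, $source);
}

// Sample text: one declaration and two calls.
$sample = <<<'CODE'
function myExampleFunction($name,$phone,$existingClient=false){
myExampleFunction($name,$phone,true);
myExampleFunction($name,$phone);
CODE;

echo renameFirstArgument($sample);
?>
[/code]

Only the two calls change. The declaration line has no “);” after “$phone”, so the pattern leaves it alone.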

Regular Expressions in PHP

PHP’s preg_match and preg_replace functions work very similarly to find/replace in an editor.

You could, for example, use preg_match to pull an email address out of a variable, use preg_replace in a function to dynamically search page content for links to an image and add rel="prettyPhoto" for a lightbox, and so much more. How you use them depends on the case, but you can see the potential. Don’t be afraid of regular expressions: once you get over the learning curve, they will save you a lot of time.
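
Here is a rough sketch of those two ideas. Everything in it is an assumption for illustration: the helper names extractEmail() and addLightboxRel() are made up, the email pattern is deliberately simple and will not cover every valid address, and the link pattern assumes href is the first attribute of the tag. Regular expressions are fragile on arbitrary HTML, so treat the second helper as a quick fix for markup you control.

[code language="php"]
<?php
// Hypothetical helper: returns the first email-like string in $text, or null.
function extractEmail($text) {
    if (preg_match('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $m)) {
        return $m[0];
    }
    return null;
}

// Hypothetical helper: adds rel="prettyPhoto" to links that point at an image.
function addLightboxRel($html) {
    // $1 is the whole href attribute; the i flag makes the extension case-insensitive.
    return preg_replace(
        '/<a (href="[^"]+\.(?:jpg|jpeg|png|gif)")/i',
        '<a rel="prettyPhoto" $1',
        $html
    );
}

echo extractEmail('Questions? Write to jane.doe@example.com and we will reply.') . "\n";
echo addLightboxRel('<a href="/images/photo.jpg">Photo</a> <a href="/about">About</a>') . "\n";
?>
[/code]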

Update: PHP example

I have been working with the SalesForce API lately, and it only accepts phone numbers in the format (123) 456-7890. I have a form with a phone number field that captures data to send to SalesForce. I added some JavaScript validation for the phone number but wanted to go a step further. Here is a great example of regular expression use in PHP.

First up is preg_replace(). It works much like str_replace(), except that it takes a regular expression instead of a plain string as the term to find. The pattern “[^0-9]” matches any character that is not a digit. We replace each match with an empty string, and they are gone.

[code language="php"]
<?php
$phoneNumber = preg_replace("/[^0-9]/","",$phoneNumber);
?>
[/code]

Now $phoneNumber is a string of digits, regardless of the format the user entered their phone number in. So “(123) 456-7890” and “123-456-7890” will both become “1234567890”.

We then use preg_match() to split the number into 3 pieces and store those pieces in the $phoneMatches variable. The pattern is relatively simple: “^(\d{3})(\d{3})(\d{4})$”. The first group is “\d{3}”, which matches 3 digits. The next group is the same. The last group is “\d{4}”, which matches 4 digits. We then set $phoneNumber to these 3 pieces in the format we want, and we are done.

[code language="php"]
<?php
// \d matches a digit; the three groups land in $phoneMatches[1] to [3].
if ( preg_match( '/^(\d{3})(\d{3})(\d{4})$/', $phoneNumber, $phoneMatches ) ) {
    $phoneNumber = '(' . $phoneMatches[1] . ') ' . $phoneMatches[2] . '-' . $phoneMatches[3];
}
?>
[/code]
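
The two steps can also be wrapped into one function. This is a sketch, and the name formatPhoneNumber() and the choice to return null for anything that is not exactly 10 digits are my assumptions. The snippets above would instead leave $phoneNumber as the bare digits when the pattern does not match.

[code language="php"]
<?php
// Hypothetical helper combining the two steps above.
function formatPhoneNumber($input) {
    // Step 1: strip everything that is not a digit.
    $digits = preg_replace('/[^0-9]/', '', $input);
    // Step 2: split exactly 10 digits into 3-3-4 groups.
    if (preg_match('/^(\d{3})(\d{3})(\d{4})$/', $digits, $m)) {
        return '(' . $m[1] . ') ' . $m[2] . '-' . $m[3];
    }
    return null; // not a 10-digit number
}

var_dump(formatPhoneNumber('123-456-7890')); // (123) 456-7890
var_dump(formatPhoneNumber('456-7890'));     // NULL
?>
[/code]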