Sage Against The Machine

.htaccess Regular Expressions Rewriting Glossary

I am teaching a class tomorrow on SEO Tech Tips, and part of the class is .htaccess rewriting. I think one of the hardest parts of this is knowing what all the different symbols stand for in what are called regular expressions. I won’t list them all, but I do want to collect the ones most commonly used in .htaccess rewriting here for easy access:

  • ^ The caret signifies the start of the URL path, relative to the current directory. This directory is whatever directory the .htaccess file is in. You’ll start almost all matches with a caret.
  • $ The dollar sign, $, signifies the end of the string to be matched. You should add this in to stop your rules matching the first part of longer URLs.
  • . The period or dot matches any one character, except a newline.
  • \. The backslash is an escape telling the Apache server to treat the . as a literal period instead of ‘any one character.’
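  • [] The parts in square brackets are called ranges, or character classes. A range matches any one of the characters listed inside it, so [0-9] matches a single digit and [a-z] matches a single lowercase letter.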
  • () Parentheses encase a part of the regular expression when you want to store whatever value was found there for later use, for example to send that value to a PHP page as an argument. Once you have a value in parentheses you can use it through what’s called a back-reference. Each of the parts you’ve placed in parentheses is given an index, starting with one. So, the first back-reference is $1, the third is $3, etc.
  • + The plus modifier changes whatever comes directly before it, by saying ‘one or more of the preceding character or range.’
  • * The asterisk means ‘zero or more of the preceding character or range.’
  • ? The question mark means ‘zero or only one of the preceding character or range.’
  • [R] R forces an external redirect of the URL. With no status code given, it is a 302 temporary redirect.
  • [R,L] Flags can be combined with a comma. R forces a redirect of the URL, while L says this is the last rule, so don’t bother checking any more.
  • [R=301] makes the redirect a 301 permanent redirect.
  • [NC] Since matching is case sensitive by default, the [NC] (no case) flag makes the pattern match case insensitive.
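
To see how these pieces fit together, here is a minimal sketch of an .htaccess file that uses most of the symbols and flags above. It assumes mod_rewrite is enabled and that the .htaccess file sits in the site’s document root. Every path and file name in it (old-page.html, new-page.html, product.php, blog.php) is made up for illustration, so swap in your own.

```apache
# Hypothetical example: all URLs and PHP file names below are placeholders.
RewriteEngine On

# ^ and $ anchor the start and end of the path; \. matches a literal dot.
# Permanently redirect old-page.html to new-page.html, ignoring case,
# and stop processing further rules.
RewriteRule ^old-page\.html$ /new-page.html [R=301,NC,L]

# [0-9]+ matches one or more digits; the parentheses store them as $1.
# /? allows zero or one trailing slash.
# A request for products/42 is served internally by /product.php?id=42.
RewriteRule ^products/([0-9]+)/?$ /product.php?id=$1 [L]

# Two sets of parentheses give two back-references, $1 and $2.
# A request for blog/2010/my-post is served by /blog.php?year=2010&slug=my-post.
RewriteRule ^blog/([0-9]+)/([a-z-]+)$ /blog.php?year=$1&slug=$2 [NC,L]
```

The first rule has an R flag, so the visitor’s browser is sent to the new address. The other two have no R flag, so the rewrite happens quietly on the server and the visitor keeps seeing the friendly URL.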