title | updated | permalink
---|---|---
Regex - Regular Expressions in PHP | February 26, 2017 | /articles/regular-expressions-in-php/
Regular expressions (abbreviated regex) are sequences of characters that form search patterns. They are mainly used in pattern matching with strings.
- 1940s-60s: it started with lots of smart people talking about regular expressions
- 1970s: g/re/p
- 1980s: Perl and Henry Spencer
- 1997: PCRE (Perl Compatible Regular Expressions). That's where it really took off, and when we talk about regex today, that's usually what we're talking about. PCRE has libraries for almost every language, it looks the same everywhere, and it is very useful.
PHP has three main PCRE regular expression functions: preg_match, preg_match_all and preg_replace.
preg_match returns 1 if a match is found, 0 if not, and false if an error occurs:
int preg_match (
string $pattern,
string $subject [,
array &$matches [,
int $flags = 0 [,
int $offset = 0
]]])
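As a rough illustration of those three return values, here is a minimal sketch. The sample string, the date pattern and the variable names are made up for this example and are not part of the original article.

```php
<?php
// Hypothetical sample data, used only for illustration.
$subject = 'Order #1234 shipped on 2017-02-26';

// Capture a four-digit year, a two-digit month and a two-digit day.
$result = preg_match('/([0-9]{4})-([0-9]{2})-([0-9]{2})/', $subject, $matches);

if ($result === 1) {
    // $matches[0] is the full match, $matches[1] to $matches[3] are the groups.
    echo "Found date {$matches[0]} (year {$matches[1]})\n";
} elseif ($result === 0) {
    echo "No date found\n";
} else {
    // false signals an error; preg_last_error() returns the error code.
    echo 'Regex error code: ' . preg_last_error() . "\n";
}
```

Comparing with `===` matters here, because both 0 and false are falsy and a loose check would hide errors.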
preg_match_all returns the number of full pattern matches found:
int preg_match_all (
string $pattern,
string $subject [,
array &$matches [,
int $flags = PREG_PATTERN_ORDER [,
int $offset = 0
]]])
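The shape of `$matches` depends on the `$flags` argument, which is easy to get wrong. The sketch below compares the default `PREG_PATTERN_ORDER` with `PREG_SET_ORDER`; the input string and variable names are assumptions chosen for the example.

```php
<?php
// Hypothetical input with three name=value pairs.
$text = 'red=1, green=22, blue=333';

// Default PREG_PATTERN_ORDER: $byPattern[0] holds all full matches,
// $byPattern[1] all first groups, $byPattern[2] all second groups.
$count = preg_match_all('/(\w+)=([0-9]+)/', $text, $byPattern);

// PREG_SET_ORDER: one array per match, each holding the full match and its groups.
preg_match_all('/(\w+)=([0-9]+)/', $text, $bySet, PREG_SET_ORDER);

echo $count, "\n";       // number of full matches
print_r($byPattern[1]);  // all names
print_r($bySet[0]);      // first match together with its groups
```

`PREG_SET_ORDER` tends to be more convenient when each match is processed in a loop.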
preg_replace returns the replaced string or array (depending on the type of $subject):
mixed preg_replace (
mixed $pattern,
mixed $replacement,
mixed $subject [,
int $limit = -1 [,
int $count
]])
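A short sketch of the optional `$limit` and `$count` arguments, plus backreferences in the replacement string. The inputs are invented for illustration.

```php
<?php
// Hypothetical input with irregular spacing.
$input = 'a  b   c    d';

// Collapse every run of whitespace into a single space.
$clean = preg_replace('/\s+/', ' ', $input);

// Replace at most 2 matches; $count receives the number of replacements made.
$limited = preg_replace('/\s+/', '-', $input, 2, $count);

// $1 and $2 in the replacement refer to the capture groups of the pattern.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', 'hello world');

echo $clean, "\n", $limited, "\n", $swapped, "\n";
```

The replacement strings are single-quoted so that PHP does not try to interpolate `$1` and `$2` as variables.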
For comparison, regular expressions in JavaScript look pretty much the same as in PHP.
Returns an array of matches or null if no matches were found:
string.match(RegExp);
Returns the string with the replacements performed:
string.replace(RegExp, replacement);
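The main differences in JavaScript (as of this writing; ES2018 later added the s flag and lookbehind assertions):
- No "single-line" or DOTALL mode. (The dot never matches a new line.)
- No lookbehind support
- Same methods for regex and non-regex matching and replacing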
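For contrast, PCRE in PHP supports both DOTALL mode and lookbehind. The following is a minimal sketch; the sample strings are made up for the example.

```php
<?php
// Hypothetical markup that spans two lines.
$html = "<p>first\nsecond</p>";

// Without the s modifier the dot stops at the newline, so there is no match.
$plain = preg_match('/<p>(.*)<\/p>/', $html, $m1);

// With the s (DOTALL) modifier the dot also matches newlines.
$dotall = preg_match('/<p>(.*)<\/p>/s', $html, $m2);

// Lookbehind: match digits only when they directly follow a dollar sign.
preg_match_all('/(?<=\$)[0-9]+/', 'cost $15, qty 3, fee $7', $m3);

var_dump($plain, $dotall);
print_r($m3[0]);
```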
Let's take a look at an example that finds email addresses in a codebase.
Our goal: /[\w.+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*/i
Regular expressions are built from two types of characters:
- special characters:
.\[]?*+{}()^$/
- literals
Imagine your input strings as bolts and your pattern as a set of sockets (in order).
Let's take a look at what special characters do:
- Backslash \
  Escapes another special character in a regular expression.
- The dot . and \w
  The dot matches everything but new lines. If you want to match a dot and only a dot, escape it as \. (a backslash followed by a dot). \w matches letters, numbers, and the underscore.
- Square brackets []
  Match any one of the characters inside the brackets. Ranges are supported. Some examples:
  - [abc] matches any of a, b or c
  - [a-z] matches lowercase letters
  - [0-9] matches any single digit
  - [a-zA-Z] matches any lower or uppercase alphabetic character
- Optional ?
  The ? matches 0 or 1 of the preceding item.
- The star *
  The star matches 0 or more of the preceding item.
- The plus +
  The plus matches 1 or more of the preceding item.
- Curly brackets {}
  Min and max ranges for the preceding item. Some examples:
  - {1,} at least 1
  - {1,3} 1 through 3
  - {1,64} 1 through 64
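The special characters above can be tried out one at a time. In this sketch, `firstMatch` is a hypothetical helper written for the example (it is not a PHP built-in), and the subjects are invented.

```php
<?php
// Hypothetical helper: returns the first substring matched by $pattern, or null.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

var_dump(firstMatch('/a.c/', 'abc'));            // "abc": the dot matches any one character
var_dump(firstMatch('/a\.c/', 'abc a.c'));       // "a.c": the escaped dot matches only a dot
var_dump(firstMatch('/\w+/', 'foo_1-bar'));      // "foo_1": \w stops at the dash
var_dump(firstMatch('/[a-c]+/', 'abcd'));        // "abc": d is outside the range
var_dump(firstMatch('/colou?r/', 'color'));      // "color": the u is optional
var_dump(firstMatch('/ab*/', 'a'));              // "a": zero b's are allowed
var_dump(firstMatch('/ab+/', 'a'));              // NULL: at least one b is required
var_dump(firstMatch('/[0-9]{1,3}/', '12345'));   // "123": at most three digits
```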
Let's put all this together to get a regex for email addresses:
/[\w.+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*/i
How this looks in PHP:
preg_match_all(
"/[\w.+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*/i",
$input_lines,
$output_array
);
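To see what ends up in `$output_array`, here is a small runnable sketch of the same call. The input text and its addresses are placeholders invented for the example.

```php
<?php
// Hypothetical input; both addresses are placeholders.
$input_lines = "Contact alice@example.com or Bob.Smith+news@mail.example.org.\n"
             . "No address on this line.";

preg_match_all(
    '/[\w.+-]+@[a-z0-9-]+(\.[a-z0-9-]+)*/i',
    $input_lines,
    $output_array
);

// $output_array[0] holds the full matches.
// $output_array[1] holds only the last repetition of the group (e.g. ".com").
print_r($output_array[0]);
```

The sentence-ending dot after the second address is not captured, because the group requires at least one character after each dot.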
Problem: make sure input is what we expect
Goal 1: /[^\[\]\w$.]/
Goal 2: /^[0-9]{1,2}[dwmy]$/
Regex is great at finding things but you need to know what you're looking for. When you validate you get to determine exactly what you want.
Many cases are better handled with PHP's filter_var function. For example, validating emails should be done with PHP's built-in filters:
filter_var(
'user@example.com',
FILTER_VALIDATE_EMAIL
);
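filter_var returns the filtered value on success and false on failure, so the result is best checked with a strict comparison. A minimal sketch follows; the sample values are placeholders invented for the example.

```php
<?php
// Hypothetical candidate values.
$candidates = ['user@example.com', 'not-an-email', 'user@@example.com'];

$results = [];
foreach ($candidates as $candidate) {
    // Returns the address itself when valid, false otherwise.
    $results[$candidate] = filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}
print_r($results);

// Other built-in filters take options, for example an integer range check.
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 1, 'max_range' => 99],
]);
var_dump($age);
```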
To match the start and end of the string, you use anchors:
- ^ the hat, which indicates the start of the string
- $ the dollar sign, which indicates the end of the string
if (!preg_match("%^[0-9]{1,2}[dwmy]$%", $_POST["subscription_frequency"])) {
$isError = true;
}
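The effect of the anchors is easiest to see by running the same pattern with and without them. This is a sketch with made-up inputs. The D modifier shown at the end is an addition not covered in the text above: by default, $ also matches just before a trailing newline, and D turns that off.

```php
<?php
$anchored   = '/^[0-9]{1,2}[dwmy]$/';
$unanchored = '/[0-9]{1,2}[dwmy]/';

// Without anchors the pattern only has to match somewhere inside the input.
$loose  = preg_match($unanchored, 'every 123d please');

// With anchors the whole input must fit the pattern.
$strict = preg_match($anchored, 'every 123d please');
$valid  = preg_match($anchored, '12m');

// By default $ also matches before a final newline; the D modifier prevents that.
$withNewline  = preg_match($anchored, "2w\n");
$dollarEndOnly = preg_match('/^[0-9]{1,2}[dwmy]$/D', "2w\n");

var_dump($loose, $strict, $valid, $withNewline, $dollarEndOnly);
```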
Negated character classes:
- [^abc] matches anything except a, b, or c, including new lines.
An example that ensures the input contains only alphanumeric characters, dashes, dots, and underscores (the dash goes last in the class so it cannot be read as a range):
if (preg_match("/[^0-9a-z_.-]/i", $productCode)) {
$isError = true;
}
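The same check can be wrapped in a function and exercised with a few inputs. `hasInvalidProductCodeChars` is a hypothetical name chosen for this sketch, and the product codes are invented.

```php
<?php
// Hypothetical helper: true when $productCode contains a character
// other than letters, digits, dash, dot or underscore.
function hasInvalidProductCodeChars(string $productCode): bool
{
    // The dash is placed last in the class so it is taken literally.
    return preg_match('/[^0-9a-z_.-]/i', $productCode) === 1;
}

var_dump(hasInvalidProductCodeChars('ABC-123_v1.0')); // no forbidden characters
var_dump(hasInvalidProductCodeChars('ABC 123'));      // the space is forbidden
var_dump(hasInvalidProductCodeChars('abc$'));         // the dollar sign is forbidden
var_dump(hasInvalidProductCodeChars(''));             // empty input has nothing to reject
```

A negated class only rejects bad characters, so an empty string passes; a separate length check is needed if empty input should be refused.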
Problem: Link @mentions and #tags
Goal: /\B@([\w]{2,})/i
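One way to apply this goal pattern is with preg_replace and a backreference. The sketch below is only an illustration: the function name, the /users/ and /tags/ URL paths, and the sample text are all assumptions, and the # pattern is simply the @ pattern with the first character swapped.

```php
<?php
// Hypothetical helper; the URL paths are placeholders, not a real routing scheme.
function linkMentionsAndTags(string $text): string
{
    // \B requires a non-word-boundary before @, so "bob@example.com" is left alone.
    $text = preg_replace('/\B@([\w]{2,})/i', '<a href="/users/$1">@$1</a>', $text);

    // Same idea for #tags.
    $text = preg_replace('/\B#([\w]{2,})/i', '<a href="/tags/$1">#$1</a>', $text);

    return $text;
}

echo linkMentionsAndTags('Thanks @alice for #php tips, mail bob@example.com'), "\n";
```

This sketch does no HTML escaping, so it should only be applied to text that is already safe to output.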
- PHP.net resources:
- Regex online tools:
- Debuggex - Online regex visualization tool.
- PHP Live Regex - Live regular expression tester for PHP.
- regexper - Regular expression visualizer using railroad diagrams.
- RegExr - Learn, build and test Regular Expressions.
- Regex101 - Create, debug, test and have your expressions explained for PHP, PCRE, JavaScript and Python. The website also features a community where you can share useful expressions.
- Tutorials:
- awesome-regex - A curated collection of awesome regex libraries, tools, frameworks and software.
- RegexOne - Learn Regular Expressions with simple, interactive exercises.