Elements
Elements are the building blocks that make up a regex. Building a regex with RegexBuilder
broadly consists of the following:

    Pattern regex = new RegexBuilder()
        // add elements here
        .buildRegex();

All the element methods return a reference to the RegexBuilder
object, so they can be called in a fluent chained style.
With the exception of the anchor methods, all the methods below take an optional RegexQuantifier
parameter which is used to define how many instances of the element should be matched. Without a quantifier parameter, each method matches the element exactly once. Read more about quantifiers in Quantifiers.
All elements may be added to a group: see Groups for more details on those.
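Because each element method has a raw regex equivalent (listed in the tables below), a chain of element methods can be read as a concatenation of those fragments. The following is a minimal sketch of that idea using only `java.util.regex`, so it runs without RegexToolbox on the classpath. The class name `ElementChainSketch` and its helper are hypothetical, and the exact pattern string that `RegexBuilder` emits for this chain is an assumption based on the tables.

```java
import java.util.regex.Pattern;

class ElementChainSketch {
    // Raw fragments taken from the "Raw regex equivalent" column on this page.
    static final String LETTER = "\\p{L}";
    static final String DIGIT = "[0-9]";

    // Assumed to mirror this RegexBuilder chain (each element matched exactly once):
    //   new RegexBuilder().letter().digit().buildRegex()
    static final Pattern LETTER_THEN_DIGIT = Pattern.compile(LETTER + DIGIT);

    // True when the whole input is one letter followed by one digit.
    static boolean matchesLetterThenDigit(String input) {
        return LETTER_THEN_DIGIT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesLetterThenDigit("a1")); // true
        System.out.println(matchesLetterThenDigit("1a")); // false
    }
}
```

Since no quantifier is involved, each fragment consumes exactly one character, which is why `"ab1"` would not match in full.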
Method | Matches | Raw regex equivalent |
---|---|---|
letter() | Any uppercase or lowercase Unicode letter | \p{L} |
lowercaseLetter() | Any lowercase Unicode letter | \p{Ll} |
uppercaseLetter() | Any uppercase Unicode letter | \p{Lu} |
nonLetter() | Any character that is not a Unicode letter (including white space and control characters) | \P{L} |
digit() | Any decimal digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) | [0-9] |
nonDigit() | Any character that is not a decimal digit (including white space and control characters) | [^0-9] |
letterOrDigit() | Any Unicode letter (uppercase or lowercase) or digit | [\p{L}0-9] |
nonLetterOrDigit() | Any character that is not a Unicode letter or digit (including white space and control characters) | [^\p{L}0-9] |
hexDigit() | Any hexadecimal digit (uppercase or lowercase letters) | [a-fA-F0-9] |
lowercaseHexDigit() | Any hexadecimal digit (lowercase letters only) | [a-f0-9] |
uppercaseHexDigit() | Any hexadecimal digit (uppercase letters only) | [A-F0-9] |
nonHexDigit() | Any character that is not a hexadecimal digit | [^a-fA-F0-9] |
anyCharacter() | Any character at all, including white space and control characters | . |
whitespace() | Any white space character (space, tab, newline or carriage return) | \s |
nonWhitespace() | Any non-white space character (including control characters) | \S |
space() | A space character | (a single literal space) |
tab() | A tab character | \t |
lineFeed() | A line feed character | \n |
carriageReturn() | A carriage return character | \r |
wordCharacter() | Any Unicode letter, decimal digit or underscore | [\p{L}0-9_] |
nonWordCharacter() | Any character that is not a Unicode letter, decimal digit or underscore (including white space and control characters) | [^\p{L}0-9_] |
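To make the difference between the Unicode-aware and ASCII-only classes concrete, here is a small sketch that compiles a few of the raw equivalents from the table with `java.util.regex`. It does not call RegexToolbox; the class `CharacterClassSketch` and its method names are hypothetical and exist only for illustration.

```java
import java.util.regex.Pattern;

class CharacterClassSketch {
    // Raw equivalents copied from the table above.
    static final Pattern LETTER = Pattern.compile("\\p{L}");                // letter()
    static final Pattern DIGIT = Pattern.compile("[0-9]");                  // digit()
    static final Pattern HEX_DIGIT = Pattern.compile("[a-fA-F0-9]");        // hexDigit()
    static final Pattern WORD_CHARACTER = Pattern.compile("[\\p{L}0-9_]");  // wordCharacter()

    // Each helper returns true when the whole input is a single matching character.
    static boolean isLetter(String s) { return LETTER.matcher(s).matches(); }
    static boolean isDigit(String s) { return DIGIT.matcher(s).matches(); }
    static boolean isHexDigit(String s) { return HEX_DIGIT.matcher(s).matches(); }
    static boolean isWordCharacter(String s) { return WORD_CHARACTER.matcher(s).matches(); }

    public static void main(String[] args) {
        System.out.println(isLetter("\u00e9"));    // e with acute accent: a Unicode letter
        System.out.println(isDigit("\u0663"));     // Arabic-Indic digit three: outside [0-9]
        System.out.println(isHexDigit("F"));
        System.out.println(isWordCharacter("_"));
    }
}
```

The letter classes accept letters from any script, while the digit classes are limited to the ten ASCII digits listed in the table.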
Method | Matches |
---|---|
text(String text) | Any arbitrary text. If the string passed in contains reserved regex characters, they will be escaped to avoid the regex doing unexpected things. For example, if you pass the string ":)", it will be escaped to ":\)". |
regexText(String text) | Raw regex text. Reserved regex characters are not escaped, so this is only for tinkerers who know what they're doing. |
anyCharacterFrom(String characters) | Any one of the characters in the supplied string. For example, anyCharacterFrom("abc") will match "a", "b" or "c". |
anyCharacterExcept(String characters) | Any character not in the supplied string (including white space and control characters). For example, anyCharacterExcept("abc") will match "1", "d" or "&" but not "a". |
anyOf(String... strings) | Any of the strings supplied, in their entirety. For example, anyOf("Mr", "Mrs", "Ms") will match "Mr", "Mrs" or "Ms" but not "M". |
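The examples in the table above can be approximated in plain `java.util.regex`, which also shows why `text()` escapes its input: the unescaped string `":)"` is not a valid regex on its own. This is a sketch only. The table does not list raw equivalents for these methods, so the patterns `[abc]`, `[^abc]` and `(?:Mr|Mrs|Ms)` are assumptions about what the methods correspond to, and `CustomElementSketch` is a hypothetical name.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class CustomElementSketch {
    // True if the string compiles as a regex exactly as written (no escaping applied).
    static boolean compilesUnescaped(String raw) {
        try {
            Pattern.compile(raw);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // text(":)") is described as escaping to ":\)".
    static boolean matchesSmiley(String s) {
        return Pattern.compile(":\\)").matcher(s).matches();
    }

    // Assumed equivalent of anyCharacterFrom("abc").
    static boolean isFromAbc(String s) {
        return Pattern.compile("[abc]").matcher(s).matches();
    }

    // Assumed equivalent of anyCharacterExcept("abc").
    static boolean isExceptAbc(String s) {
        return Pattern.compile("[^abc]").matcher(s).matches();
    }

    // Assumed equivalent of anyOf("Mr", "Mrs", "Ms").
    static boolean isTitle(String s) {
        return Pattern.compile("(?:Mr|Mrs|Ms)").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(compilesUnescaped(":)")); // false: unmatched closing parenthesis
        System.out.println(matchesSmiley(":)"));     // true
        System.out.println(isTitle("Mrs"));          // true
        System.out.println(isTitle("M"));            // false
    }
}
```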
Anchors (known in the regex world as "zero-width assertions") match a position in a string rather than a character (hence "zero-width"). They're useful for crafting regexes that match text occurring at a particular position within a string, rather than just anywhere.
Method | Matches | Raw regex equivalent |
---|---|---|
startOfString() | The start of the string. | ^ |
endOfString() | The end of the string. | $ |
wordBoundary() | The boundary between a word character (letter, digit or underscore) and a non-word character. | \b |
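The effect of each anchor is easiest to see by searching within a longer string. The sketch below uses the raw equivalents from the table directly with `java.util.regex`; `AnchorSketch` and its method names are hypothetical, and the RegexBuilder chains named in the comments are assumptions about how the same patterns might be assembled.

```java
import java.util.regex.Pattern;

class AnchorSketch {
    // Roughly startOfString() followed by digit().
    static final Pattern STARTS_WITH_DIGIT = Pattern.compile("^[0-9]");
    // Roughly digit() followed by endOfString().
    static final Pattern ENDS_WITH_DIGIT = Pattern.compile("[0-9]$");
    // Roughly wordBoundary(), text("cat"), wordBoundary().
    static final Pattern WHOLE_WORD_CAT = Pattern.compile("\\bcat\\b");

    // find() searches anywhere in the input, so only the anchors restrict the position.
    static boolean startsWithDigit(String s) { return STARTS_WITH_DIGIT.matcher(s).find(); }
    static boolean endsWithDigit(String s) { return ENDS_WITH_DIGIT.matcher(s).find(); }
    static boolean containsWordCat(String s) { return WHOLE_WORD_CAT.matcher(s).find(); }

    public static void main(String[] args) {
        System.out.println(startsWithDigit("1abc"));        // true
        System.out.println(startsWithDigit("abc1"));        // false
        System.out.println(endsWithDigit("abc1"));          // true
        System.out.println(containsWordCat("a cat sat"));   // true
        System.out.println(containsWordCat("concatenate")); // false
    }
}
```

Without the anchors, `[0-9]` would be found in both `"1abc"` and `"abc1"`, and `cat` would be found inside `"concatenate"`.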