A regular expression (RegEx) consists of a pattern and optional flags. It's used to search through text.
In JavaScript there are 2 ways to create a RegEx:
1. Literal Syntax
let regexp = /pattern/flags;
Literal syntax is shorter, but it doesn't allow variables or expressions to be inserted into the pattern (the way ${...} works in template literals).
So it's only usable when you know the pattern beforehand, which is usually the case.
2. RegExp object
let regexp = new RegExp("pattern", "flags");
The RegExp object is used when you need to create a RegEx “on the fly” from a dynamically generated string.
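A minimal sketch of building a pattern at runtime. The variable names (searchTerm, flags, year) are hypothetical stand-ins for values that would only be known while the program runs; the source and flags properties show what the RegExp object ended up with.

```javascript
// Hypothetical runtime values, e.g. typed in by a user
const searchTerm = "rock";
const flags = "gi";

// A literal /searchTerm/ would look for the text "searchTerm",
// so the constructor is needed to use the variable's value
const regexp = new RegExp(searchTerm, flags);

console.log(regexp.source); // rock
console.log(regexp.flags);  // gi
console.log("We will rock you, ROCK you".match(regexp)); // [ 'rock', 'ROCK' ]

// A template literal can assemble the pattern string as well
const year = 2024;
const yearRegexp = new RegExp(`in ${year}`);
console.log(yearRegexp.test("Released in 2024")); // true
```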
Flags
Flags are modifiers that affect the search.
i - case-insensitive search
g - global search (without it, only the first match is returned)
m - multiline mode
s - “dotall” mode (allows the dot to match the newline character \n)
u - full Unicode support
y - “sticky” mode (matches only at the exact position regexp.lastIndex in the text)
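A quick sketch of how five of these flags change the result on the same two-line string (u is left out because its effect only shows up with characters outside the basic range). The sample text is made up for the example.

```javascript
const text = "Line one\nline two";

// i: case-insensitive
console.log(/LINE/i.test(text)); // true

// g: return every match instead of only the first
console.log(text.match(/line/gi)); // [ 'Line', 'line' ]

// m: ^ also matches right after each \n
console.log(text.match(/^\w+/gm)); // [ 'Line', 'line' ]

// s: the dot also matches \n
console.log(/one.line/.test(text));  // false
console.log(/one.line/s.test(text)); // true

// y: match only at the position stored in lastIndex
const sticky = /\w+/y;
sticky.lastIndex = 5;
console.log(sticky.exec(text)[0]); // one
sticky.lastIndex = 4;
console.log(sticky.exec(text)); // null (position 4 is a space)
```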
Matching
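Syntax: str.match(regexp)
With the g flag, returns an array of all matches of regexp in the string str.
Without g, returns only the first match (as an array that also has index and input properties).
If nothing matches, returns null.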
let str = "We will, we will rock you";
str.match(/we/gi) => [We,we]
str.match(/we/i) => [We]
str.match(/xo/g) => null
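A small sketch of the difference between the two return shapes, plus a common way to avoid errors when match returns null (falling back to an empty array is just one option).

```javascript
const str = "We will, we will rock you";

// Without g: only the first match, plus extra info about it
const first = str.match(/we/i);
console.log(first[0]);    // We
console.log(first.index); // 0

// With g: a plain array of all matched strings
console.log(str.match(/we/gi)); // [ 'We', 'we' ]

// No match gives null, so guard before using the result
const found = str.match(/xo/g) || [];
console.log(found.length); // 0
```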
Testing
Syntax: regexp.test(str)
Returns a boolean. True if there's at least one match, otherwise false.
let str = "We will, we will rock you";
let regexp = /Rock/i;
regexp.test(str) => true
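A short sketch of test() in use. The second half shows a well-known pitfall: with the g flag, test() updates regexp.lastIndex, so calling it repeatedly on the same string can alternate between true and false.

```javascript
const hasRock = /Rock/i;
console.log(hasRock.test("We will rock you"));     // true
console.log(hasRock.test("We are the champions")); // false

// With the g flag, test() remembers where the last match ended (lastIndex)
const globalRock = /rock/gi;
console.log(globalRock.test("rock")); // true, lastIndex is now 4
console.log(globalRock.test("rock")); // false, search started at index 4
console.log(globalRock.test("rock")); // true again, lastIndex was reset to 0
```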
Replacing
Syntax: str.replace(regexp, replacement)
Replaces matches of regexp in the string str with replacement. Without the g flag, only the first match is replaced.
let str = "We will, we will rock you";
str.replace(/we/i, "I") => I will, we will rock you
str.replace(/we/ig, "I") => I will, I will rock you
You can use special character combinations in the replacement string to insert fragments of the match:
$& - inserts the whole match
$` - inserts the part of the string before the match
$' - inserts the part of the string after the match
$n - if n is a 1-2 digit number, inserts the contents of the n-th parentheses
$<name> - inserts the contents of the parentheses with the given name
$$ - inserts the character $
let str = "I love HTML";
str.replace(/HTML/, "$& and JavaScript") => I love HTML and JavaScript
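A sketch exercising each replacement pattern from the list. The "John Smith" and "Price: 5" strings are made-up inputs; the parentheses in the patterns are capturing groups, which the $n and $<name> patterns refer to.

```javascript
const str = "I love HTML";

console.log(str.replace(/HTML/, "$& and JavaScript")); // I love HTML and JavaScript
console.log(str.replace(/love/, "[$`]"));              // I [I ] HTML
console.log(str.replace(/love/, "[$']"));              // I [ HTML] HTML

// $1, $2 refer to numbered parentheses, $<name> to named ones
console.log("John Smith".replace(/(\w+) (\w+)/, "$2, $1")); // Smith, John
console.log("John Smith".replace(/(?<first>\w+) (?<last>\w+)/, "$<last>, $<first>")); // Smith, John

// $$ inserts a literal $, then $& inserts the match
console.log("Price: 5".replace(/\d+/, "$$$&")); // Price: $5
```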
Character classes
A character class is a special notation that matches any symbol from a certain set.
let str = "+7(903)-123-45-67";
let regexp = /\d/;
str.match(regexp) => 7
Without the flag g, the regular expression only looks for the first match, that is the first digit \d.
Let’s add the g flag to find all digits:
let str = "+7(903)-123-45-67";
let regexp = /\d/g;
str.match(regexp) => [7,9,0,3,1,2,3,4,5,6,7]
str.match(regexp).join('') => 79031234567
Most used character classes:
\d - digit: a character from 0 to 9
\s - space character: includes spaces, tabs \t, newlines \n, and also \v, \f, \r
\w - “wordly” character: a letter of the Latin alphabet, a digit or an underscore _
\d\s\w means a “digit” followed by a “space character” followed by a single “wordly character”
A regexp may contain both regular symbols and character classes.
let str = "Is there CSS4?";
let regexp = /CSS\d/;
str.match(regexp) => CSS4
We can also use several character classes in one pattern:
let str = "I love HTML5!";
str.match(/\s\w\w\w\w\d/) => " HTML5" (the match includes the space before HTML5, matched by \s)
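A few runnable checks of the character classes above. The sample strings ("Room 4 a_b", "tab\there\nnew line", "é") are made up to show what each class does and does not match.

```javascript
const phone = "+7(903)-123-45-67";
console.log(phone.match(/\d/g).join("")); // 79031234567

// \d\s\w: a digit, then a space character, then ONE wordly character
console.log("Room 4 a_b".match(/\d\s\w/)[0]); // 4 a

// \s covers tabs and newlines, not only the regular space
console.log("tab\there\nnew line".match(/\s/g).length); // 3

// \w is Latin-only: an accented letter is not matched
console.log(/\w/.test("é")); // false

// Without \s at the front, the match has no leading space
console.log("I love HTML5!".match(/\w\w\w\w\d/)[0]); // HTML5
```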
Inverse classes
For every character class there exists an “inverse class”, denoted with the same letter, but uppercased.
The “inverse” means that it matches all other characters, for instance:
\D - Non-digit: any character except \d, for instance a letter
\S - Non-space: any character except \s, for instance a letter
\W - Non-wordly character: anything but \w, e.g. a non-Latin letter or a space.
let str = "+7(903)-123-45-67";
str.replace(/\D/g, "") => 79031234567
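A sketch of the three inverse classes on small made-up strings: \D to strip formatting, \S to collect visible characters, \W to find punctuation and spaces.

```javascript
const phone = "+7(903)-123-45-67";
console.log(phone.replace(/\D/g, "")); // 79031234567

console.log("a b\tc".match(/\S/g));     // [ 'a', 'b', 'c' ]
console.log("hi, there!".match(/\W/g)); // [ ',', ' ', '!' ]
```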
Dot
A dot . works as a placeholder for “any character”:
it's a special character class that matches any character except a newline.
let str = "Z";
str.match(/./) => Z
let regexp = /CS.4/;
"CSS4".match(regexp) => CSS4
"CS4".match(/CS.4/) => null
No match because there's no character between S and 4
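A sketch running /CS.4/ against a handful of made-up candidates: the dot accepts exactly one character of any kind, except a newline.

```javascript
const regexp = /CS.4/;
for (const candidate of ["CSS4", "CS-4", "CS 4", "CS4", "CS\n4"]) {
  // JSON.stringify makes the \n visible in the output
  console.log(JSON.stringify(candidate), regexp.test(candidate));
}
// "CSS4" true
// "CS-4" true
// "CS 4" true
// "CS4" false
// "CS\n4" false
```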
Dot with “s” flag
By default, a dot doesn’t match the newline character \n.
For instance, the regexp A.B matches A, then any character except a newline \n, then B:
"A\nB".match(/A.B/) => null
"A\nB".match(/A.B/s) => A\nB
Matching any character
"A\nB".match(/A[\s\S]B/) => A\nB
The pattern [\s\S] means “a space character OR not a space character”.
In other words: anything. You can use any pair of complementary classes, such as [\d\D]
You can also use [^], which means “any character except nothing”, i.e. any character.
This trick is also useful when you need both kinds of “dots” in the same pattern:
the actual dot . behaving the regular way (not matching a newline),
and a way to match “any character” with [\s\S] or alike.
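A sketch comparing the "any character" options on "A\nB", followed by a made-up multi-line string where the plain dot stops at the first line but [\s\S] runs to the end.

```javascript
const text = "A\nB";
console.log(/A.B/.test(text));      // false
console.log(/A.B/s.test(text));     // true
console.log(/A[\s\S]B/.test(text)); // true
console.log(/A[\d\D]B/.test(text)); // true
console.log(/A[^]B/.test(text));    // true

// Both kinds of "dot" on the same text
const doc = "Title\nline 1\nline 2";
console.log(doc.match(/T.+/)[0]);              // Title (stops at the first \n)
console.log(doc.match(/T[\s\S]+/)[0] === doc); // true (runs to the end)
```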
Spaces matter!
"1 - 5".match(/\d-\d/) => null
"1 - 5".match(/\d - \d/) => 1 - 5
"1 - 5".match(/\d\s-\s\d/) => 1 - 5
Anchors
Anchors are tests. They do not match a character, but rather force the regexp engine to check the condition.
^ matches at the beginning of the text
$ matches at the end of the text
let str = "Mary had a little lamb";
/^Mary/.test(str) => true
str.startsWith("Mary") => true
The pattern ^Mary means: “string start and then Mary”.
let str = "its fleece was white as snow";
/snow$/.test(str) => true
str.endsWith("snow") => true
The pattern snow$ means: “snow, then string end”.
Testing for a full match
Both anchors together ^...$ are often used to test whether or not a string fully matches the pattern.
let goodInput = "12:34";
let badInput = "12:345";
let regexp = /^\d\d:\d\d$/;
regexp.test(goodInput) => true
regexp.test(badInput) => false
If there’s any deviation or extra characters between the anchors, the result is false.
An empty string is the only match for /^$/ (it starts and immediately ends).
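A minimal validation sketch built on ^...$. The isTime helper is hypothetical and only checks the shape HH:MM, so values like 99:99 would still pass; range checks would need extra logic.

```javascript
// Hypothetical helper: does the whole string look like HH:MM?
function isTime(input) {
  return /^\d\d:\d\d$/.test(input);
}

console.log(isTime("12:34"));  // true
console.log(isTime("12:345")); // false (extra digit before the end)
console.log(isTime(" 12:34")); // false (space after the start)

// Without anchors, a partial match is enough
console.log(/\d\d:\d\d/.test("12:345")); // true

// /^$/ only matches the empty string
console.log(/^$/.test(""));  // true
console.log(/^$/.test(" ")); // false
```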
Multiline
The multiline mode is enabled by the flag m.
It only affects the behavior of ^ and $.
“Start of a line”
- text start/beginning
- text immediately after a newline \n
let str = `1st place: Winnie
2nd place: Piglet
3rd place: Eeyore`;
str.match(/^\d/g) => 1
str.match(/^\d/gm) => [1, 2, 3]
“End of a line”
- text end
- text immediately followed by a newline \n
let str = `Winnie: 1
Piglet: 2
Eeyore: 3`;
str.match(/\d$/g) => 3
str.match(/\d$/gm) => [1,2,3]
Searching for \n
To find a newline, you can also use the newline character \n itself:
str.match(/\d\n/gm) => [1\n,2\n]
There are 2 matches instead of 3, because there’s no newline after 3.
Every match includes a newline character \n.
\n in the pattern is used when you need newline characters in the result,
while anchors are used to find something at the beginning/end of a line.
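A sketch on the same three-line string, contrasting $ with and without the m flag, ^ in multiline mode, and \n consumed as part of the match.

```javascript
const str = "Winnie: 1\nPiglet: 2\nEeyore: 3";

console.log(str.match(/\d$/g));   // [ '3' ]            $ = end of the text only
console.log(str.match(/\d$/gm));  // [ '1', '2', '3' ]  $ = end of every line
console.log(str.match(/^\w+/gm)); // [ 'Winnie', 'Piglet', 'Eeyore' ]

// \n is consumed and becomes part of each match
console.log(str.match(/\d\n/g)); // [ '1\n', '2\n' ]
```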
Word boundary \b
A word boundary \b is a test, just like ^ and $.
There are three different positions that qualify as word boundaries:
- At string start, if the first string character is \w
- Between 2 characters, where one is \w and the other is not.
- At string end, if the last string character is \w.
So for \bJava\b to match, “Java” must be surrounded by characters that are not \w,
such as spaces or punctuation (or the text start/end).
"Hello, Java!".match(/\bJava\b/) => Java
"Hello, JavaScript!".match(/\bJava\b/) => null
let str= "Hello, Java!";
str.match(/\bHello\b/) => Hello
str.match(/\bJava\b/) => Java
str.match(/\bHell\b/) => null
str.match(/\bJava!\b/) => null
Note: For convenience, treat the string/text start and end as non-wordly characters.
By that definition, a boundary only matches when on one side of \b there's
a wordly character and on the other side there's a non-wordly character.
\bJava!\b doesn't match because ! is non-wordly and the string end is also non-wordly.
"1 23 456 78".match(/\b\d\d\b/g) => [23,78]
"12,34,56".match(/\b\d\d\b/g) => [12,34,56]
Note:
Word boundary \b doesn’t work for non-Latin alphabets, because it's based on \w, which only matches Latin letters, digits and _.
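A sketch checking \bJava\b against a few made-up strings, repeating the two-digit example, and confirming the non-Latin limitation with a Cyrillic word.

```javascript
const java = /\bJava\b/;
for (const s of ["Hello, Java!", "Hello, JavaScript!", "Java", "(Java)"]) {
  console.log(s, java.test(s));
}
// Hello, Java! true
// Hello, JavaScript! false
// Java true
// (Java) true

console.log("1 23 456 78".match(/\b\d\d\b/g)); // [ '23', '78' ]

// \b relies on \w, so it fails for non-Latin words
console.log(/\bпривет\b/.test("привет")); // false
```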
Special characters
Special characters are characters that have a special meaning in a RegEx.
These are the special characters:
[ \ ^ $ . | ? * + ( )
Escaping
Special characters need to be escaped when included in a pattern.
Escaping turns a special character into a literal character; it's done by prepending a backslash \ to it.
"Chapter 5.1".match(/\d\.\d/) => 5.1
"Chapter 511".match(/\d\.\d/) => null
511 is not a match because the escaped dot in the pattern is a literal dot:
\. means "." and not "any character".
"function g()".match(/g\(\)/) => g()
Forward slash
"/".match(/\//) => /
A forward slash needs to be escaped in literal RegEx notation, because / marks the start and end of the literal.
"/".match(new RegExp("/")) => /
A forward slash doesn't need to be escaped when using the RegExp object.
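A sketch of the escaping rules so far: the escaped dot versus the unescaped one, escaped parentheses, and the slash in literal versus constructor form. "a/b" is a made-up input.

```javascript
console.log("Chapter 5.1".match(/\d\.\d/)[0]); // 5.1
console.log("Chapter 511".match(/\d\.\d/));    // null
console.log("Chapter 511".match(/\d.\d/)[0]);  // 511 (unescaped dot = any character)

console.log("function g()".match(/g\(\)/)[0]); // g()

// The slash ends a literal, so it is escaped there but not in a string
console.log("a/b".match(/\//)[0]);            // /
console.log("a/b".match(new RegExp("/"))[0]); // /
```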
Backslash
let regexp = new RegExp("\d\.\d");
"Chapter 5.1".match(regexp) => null
Backslashes are “consumed” by string quotes, so they need to be escaped when used in RegExp object
"\d\.\d" is perceived as d.d
"\\d\\.\\d" is perceived as \d\.\d
let regStr= "\\d\\.\\d";
let regexp = new RegExp(regStr);
"Chapter 5.1".match(regexp) => 5.1
To match a literal backslash, escape it in the pattern as \\.
In the test string it must be written as \\ too, because string quotes consume a single backslash:
"1\\2".match(/\\/) => \
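A sketch of why doubling matters with the RegExp constructor, plus a hypothetical escapeRegExp helper (written for this example, not a built-in the notes rely on) that backslash-escapes every special character so arbitrary text can be matched literally. The userInput value is made up.

```javascript
// Without doubling, the string loses its backslashes before RegExp sees it
console.log(new RegExp("\d\.\d").source);    // d.d
console.log(new RegExp("\\d\\.\\d").source); // \d\.\d

// Hypothetical helper: prefix each special character with a backslash
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "5.1"; // e.g. typed into a search box
const regexp = new RegExp(escapeRegExp(userInput));
console.log(regexp.source);              // 5\.1
console.log(regexp.test("Chapter 5.1")); // true
console.log(regexp.test("Chapter 511")); // false

// A literal backslash: \\ in the pattern, and \\ in the string literal
console.log("1\\2".match(/\\/)[0]); // \
```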
Sets
Sets are groups of characters inside [square brackets].
They can be used in RegEx along with regular characters.
let str= "Mop top";
str.match(/[tm]op/gi) => [Mop,top]
A set means “match exactly one of the characters in the brackets”.
let str= "Voila";
str.match(/V[oi]la/) => null
Voila is not a match because the pattern has room for only one character (either o or i) between V and l.
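A sketch showing that a set stands for exactly one character; "Vola" and "Vila" are made-up inputs that fit the single slot.

```javascript
console.log("Mop top".match(/[tm]op/gi)); // [ 'Mop', 'top' ]

// [oi] stands for exactly one character: o OR i
console.log(/V[oi]la/.test("Vola"));  // true
console.log(/V[oi]la/.test("Vila"));  // true
console.log(/V[oi]la/.test("Voila")); // false (two characters between V and l)
```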
Ranges
Square brackets may also contain character ranges.
[a-z] is any letter from a to z
[0-5] is any digit from 0 to 5.
Multiple character ranges can be used inside one pair of square brackets.
For example: [0-9A-Fa-f]
let str= "Exception 0xAF";
str.match(/x[0-9A-F][0-9A-F]/g) => xAF
[0-9A-F] contains two ranges, meaning:
a digit from 0 to 9 OR a letter from A to F.
Character classes can also be used in [square brackets].
E.g. [\s\d] means “a space character or a digit”.
Character classes are shorthands for certain character sets
\d – is the same as [0-9]
\w – is the same as [a-zA-Z0-9_]
\s – is the same as [\t\n\v\f\r ], plus a few rare Unicode space characters
Excluding ranges
Besides normal ranges, there are “excluding” ranges.
They are denoted by a ^ at the start and match any character except the given ones.
[^aeyo] – any character except 'a', 'e', 'y' or 'o'
[^0-9] – any character except a digit (same as \D)
[^\s] – any non-space character (same as \S)
let str= "alice15@gmail.com";
str.match(/[^\d\sA-Z]/gi) => [@,.]
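A sketch of excluding ranges on a few made-up strings: keeping what is not in the set, or deleting everything that is not a digit.

```javascript
console.log("alice15@gmail.com".match(/[^\d\sA-Z]/gi)); // [ '@', '.' ]
console.log("keyboard".match(/[^aeyo]/g).join(""));      // kbrd
console.log("Tel: 555-0199".replace(/[^0-9]/g, ""));     // 5550199
```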
Escaping in […]
] must ALWAYS be escaped
- only needs escaping when it's between other characters (where it would form a range)
^ only needs escaping when it's at the start (where it would mean exclusion)
. + ( ) NEVER need escaping
let regexp = /[-().^+]/g;
let str= "1 + 2 - 3";
str.match(regexp) => [+, -]
Characters that don't need to be escaped can always be escaped
when you're not entirely sure. Escaping them won't have any effect.
let regexp = /[\-\(\)\.\^\+]/g;
let str= "1 + 2 - 3";
str.match(regexp) => [+, -]
let regexp= /Java[^script]/;
let str1= "Java";
let str2= "JavaScript";
str1.match(regexp) => null
str2.match(regexp) => JavaS
Looking for "Java" followed by 1 character that's not s,c,r,i,p OR t
In str1 there’s a string end after Java, so no match (5th spot is empty)
In str2 there’s a (capital) S after Java, so there's a match.
let regexp = /\d\d[-:]\d\d/g;
let str= "Breakfast at 09:00. Dinner at 21-30";
str.match(regexp) => [09:00,21-30]
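A sketch of the escaping rules inside [...]: an escaped ], a hyphen that forms a range versus an escaped one, and ^ as exclusion versus a literal caret. The short test strings are made up.

```javascript
console.log("1 + 2 - 3".match(/[-().^+]/g)); // [ '+', '-' ]

// ] must be escaped to be matched inside a set
console.log("a]b".match(/[\]]/)[0]); // ]

// A hyphen between characters makes a range; escape it to mean "-"
console.log("a-b".match(/[a-z]/g));  // [ 'a', 'b' ]
console.log("a-b".match(/[a\-z]/g)); // [ 'a', '-' ]

// ^ at the start excludes; a second ^ is a literal caret
console.log("x^y".match(/[^^]/g)); // [ 'x', 'y' ]

console.log("Breakfast at 09:00. Dinner at 21-30".match(/\d\d[-:]\d\d/g)); // [ '09:00', '21-30' ]
```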
Quantifiers
Quantifiers are symbols placed after a character (or a group of characters).
They specify how many of it to look for (the quantity).
The most common ones are +, *, ? and {n}.
Quantity {n}
The simplest quantifier is a number in curly braces: {n}.
Exact quantity: {n}
Looks for exactly n of the preceding character.
let str= "I'm 12345678 years old";
str.match(/\d{5}/) => [12345]
str.match(/\b\d{5}\b/) => null
Boundaries \b are added to exclude results from longer numbers
Quantity range: {n,m}
Looks for minimum n and maximum m characters.
let str= "I'm 12345678 years old";
str.match(/\d{3,5}/) => [12345]
Minimum quantity: {n,}
Looks for n characters or more.
let str= "I'm not 12, but 345678 years old";
str.match(/\d{3,}/) => [345678]
let str = "+7(903)-123-45-67";
str.match(/\d{1,}/g) => [7,903,123,45,67]
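A sketch of the curly-brace quantifiers, ending with a hypothetical isPin check that combines {n} with the ^...$ anchors to demand exactly four digits and nothing else.

```javascript
const age = "I'm 12345678 years old";
console.log(age.match(/\d{5}/)[0]);    // 12345
console.log(age.match(/\b\d{5}\b/));   // null
console.log(age.match(/\d{3,5}/)[0]);  // 12345 (takes as many as allowed)
console.log("I'm not 12, but 345678 years old".match(/\d{3,}/)[0]); // 345678

// Hypothetical validator: exactly four digits and nothing else
const isPin = (value) => /^\d{4}$/.test(value);
console.log(isPin("0420"));  // true
console.log(isPin("04201")); // false
```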
Shorthands
There are shorthands for most used quantifiers:
? means “zero or one”, the same as {0,1}
* means “zero or more”, the same as {0,}
+ means “one or more”, the same as {1,}
let str = "Should I write color or colour?";
str.match(/colou?r/g) => [color, colour]
? makes the character before it optional.
let str = "+7(903)-123-45-67";
str.match(/\d+/g) => [7,903,123,45,67]
let str= "100 10 1";
str.match(/\d0*/g) => [100, 10, 1]
str.match(/\d0+/g) => [100, 10]
1 not matched because 0+ requires at least one zero
let str= "0 1 12.345 7890";
str.match(/\d+\.\d+/g) => 12.345
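A sketch of the ?, * and + shorthands. The last two lines show a consequence of “zero or more”: * can match an empty string, so a global search also returns empty matches where + would not.

```javascript
console.log("Should I write color or colour?".match(/colou?r/g)); // [ 'color', 'colour' ]
console.log("+7(903)-123-45-67".match(/\d+/g)); // [ '7', '903', '123', '45', '67' ]
console.log("100 10 1".match(/\d0*/g));        // [ '100', '10', '1' ]
console.log("100 10 1".match(/\d0+/g));        // [ '100', '10' ]
console.log("0 1 12.345 7890".match(/\d+\.\d+/g)); // [ '12.345' ]

// * allows zero repetitions, so it can match an empty string
console.log("a1".match(/\d*/g)); // [ '', '1', '' ]
console.log("a1".match(/\d+/g)); // [ '1' ]
```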
let str = "<body> ... </body>";
str.match(/<[a-z]+>/gi) => <body>
The regex looks for '<' followed by one or more Latin letters and ending with '>'.
let regexp= /<[a-z][a-z0-9]*>/gi;
let str= "<h1>Hi!</h1>";
str.match(regexp) => <h1>
let regexp= /<\/?[a-z][a-z0-9]*>/gi;
let str= "<h1>Hi!</h1>";
str.match(regexp) => [<h1>,</h1>]
let regexp = /\.{3,}/g;
let str= "Hello!... How goes?.....";
str.match(regexp) => [...,.....]
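A sketch reusing the tag pattern, including a simple tag-stripping replace. This is only an illustration for simple tags without attributes; it is not a substitute for a real HTML parser.

```javascript
// Only handles simple tags like <h1> and </h1>
const tagRegexp = /<\/?[a-z][a-z0-9]*>/gi;
console.log("<h1>Hi!</h1>".match(tagRegexp));           // [ '<h1>', '</h1>' ]
console.log("<body> ... </body>".match(tagRegexp));     // [ '<body>', '</body>' ]
console.log("<b>bold</b> text".replace(tagRegexp, "")); // bold text

console.log("Hello!... How goes?.....".match(/\.{3,}/g)); // [ '...', '.....' ]
```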
let regexp = /#[a-f0-9]{6}/gi;
let str1= "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2 #12345678";
let str2= "#12345678";
str2.match(regexp) => #123456
str1.match(regexp) => [#121212,#AA00ef,#123456]
The problem: it also matches the first six digits of longer values such as #12345678. To fix that, add \b to the end:
let regexp = /#[a-f0-9]{6}\b/gi;
let str1 = "#123456";
let str2 = "#12345678";
str2.match(regexp) => null
str1.match(regexp) => #123456
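A sketch applying the fixed pattern to the longer CSS-like string from above, plus a hypothetical isHexColor check that anchors the pattern with ^...$ so the whole string must be a single 6-digit color.

```javascript
const colorRegexp = /#[a-f0-9]{6}\b/gi;
const css = "color:#121212; background-color:#AA00ef bad-colors:f#fddee #fd2 #12345678";
console.log(css.match(colorRegexp)); // [ '#121212', '#AA00ef' ]

// Hypothetical validator: the whole string must be one 6-digit color
const isHexColor = (value) => /^#[a-f0-9]{6}$/i.test(value);
console.log(isHexColor("#AA00ef"));   // true
console.log(isHexColor("#12345678")); // false
console.log(isHexColor("#fd2"));      // false
```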