• A regex pattern is applied to the input String from left to right, and once a source character has been consumed by a match it can't be reused.
  • For example, the regex “121” matches “31212142121” only twice, as “_121____121”.
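The non-overlapping behaviour described above can be checked with a small sketch. The class name NonOverlappingMatchDemo and the helper countMatches are hypothetical names chosen for illustration; only Pattern, Matcher and find() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonOverlappingMatchDemo {
    // Hypothetical helper: counts successful find() calls.
    // Each find() resumes after the end of the previous match,
    // so characters already consumed are never matched again.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Matches start at index 1 and index 8; the overlapping
        // candidate starting at index 3 is skipped.
        System.out.println(countMatches("121", "31212142121")); // 2
    }
}
```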
System.out.println("Using String matches method: " + str.matches(".bb"));
System.out.println("Using Pattern matches method: " + Pattern.matches(".bb", str));
Pattern.matches("[a-e1-3].", "d#"); // true
import java.util.regex.*;

public class PatternExample {
    public static void main(String[] args) {
        try {
            Pattern pattern = Pattern.compile(".xx.");
            Matcher matcher = pattern.matcher("MxxY");
            System.out.println("Input String matches regex - " + matcher.matches());
            pattern = Pattern.compile("*xx*"); // bad regular expression
        } catch (PatternSyntaxException pse) {
            System.out.println(pse.getMessage());
        }
    }
}

Capturing Groups

  • Parentheses () in a regex treat multiple characters as a single unit.
  • The portion of the input that matches a capturing group is saved in memory and can be recalled using a backreference.
  • The matcher.groupCount() method returns the number of capturing groups in the pattern.
  • For example, ((a)(bc)) contains 3 capturing groups: ((a)(bc)), (a) and (bc).
  • You can use a backreference in a regular expression with a backslash (\) followed by the number of the group to be recalled.
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2"));         	//true  \1 is a2
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2"));         	//false \1 is a2
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB"));	//true  \1 is AB
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB"));	//false \2 is B2
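As a rough illustration of how groups are numbered, the sketch below reads back each group of ((a)(bc)) after a successful match. The class CapturingGroupDemo and the helper groupsOf are hypothetical; groupCount() and group(int) are the standard Matcher methods.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupDemo {
    // Hypothetical helper: returns the text captured by groups 1..n,
    // or an empty array if the whole input does not match the regex.
    static String[] groupsOf(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] groups = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            groups[i - 1] = matcher.group(i); // group(0) is the whole match
        }
        return groups;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(Arrays.toString(groupsOf("((a)(bc))", "abc"))); // [abc, a, bc]
    }
}
```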

RegEx Keywords

Regex Basics Description
^ The start of a string
$ The end of a string
. Wildcard which matches any character, except newline (\n).
| Alternation: matches the expression on either side (e.g. a|b matches a or b)
\ Used to escape a special character
a The character “a”
ab The string “ab”
Quantifiers Description
* Matches 0 or more of the previous (e.g. xy*z could match “xz”, “xyz”, “xyyz”, etc.)
? Matches 0 or 1 of the previous
+ Matches 1 or more of the previous
{5} Matches exactly 5
{5,} Matches 5 or more.
{5,10} Matches between 5 and 10 of the previous
Character Classes Description
\s Matches a whitespace character
\S Matches a non-whitespace character
\w Matches a word character
\W Matches a non-word character
\d Matches one digit
\D Matches one non-digit
[\b] A backspace character
\cX The control character corresponding to X
Special Characters Description
\n Matches a newline
\t Matches a tab
\r Matches a carriage return
\ZZZ Matches octal character ZZZ
\xZZ Matches hex character ZZ
\0 A null character
\v A vertical tab
Groups Description
(xyz) Grouping of characters
(?:xyz) Non-capturing group of characters
[xyz] Matches any one of the listed characters (x or y or z)
[^xyz] Matches a character other than x or y or z
[a-q] Matches a character from within a specified range
[0-7] Matches a digit from within a specified range
String Replacements Description
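$` Insert the portion of the string before the match
$' Insert the portion of the string after the match
$+ Insert the last captured group
$& Insert the entire match
$n Insert the nth captured group (Java's Matcher.replaceAll() supports only $n and ${name}; the other forms come from other regex flavors)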
Assertions Description
(?=xyz) Positive lookahead
(?!xyz) Negative lookahead
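(?<=xyz) Positive lookbehind
(?<!xyz) Negative lookbehind
\b Word boundary (usually a position between \w and \W)
(?#...) Comment (not supported by java.util.regex, which uses the Pattern.COMMENTS flag instead)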
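A few of the constructs in the tables above can be tried out with a short sketch. The class CheatSheetDemo, the helper firstMatch and the sample strings are all hypothetical and only meant to illustrate a quantifier, a lookahead, a lookbehind and a $n replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CheatSheetDemo {
    // Hypothetical helper: returns the first substring matching the regex, or null.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    // Reorders a yyyy-MM-dd date using the $n replacement syntax.
    static String toDayFirst(String isoDate) {
        return isoDate.replaceAll("(\\d{4})-(\\d{2})-(\\d{2})", "$3/$2/$1");
    }

    public static void main(String[] args) {
        // Quantifier: y* allows zero or more 'y' characters.
        System.out.println(Pattern.matches("xy*z", "xz"));               // true
        // Positive lookahead: digits only when followed by " USD".
        System.out.println(firstMatch("\\d+(?= USD)", "200 EUR, 100 USD")); // 100
        // Positive lookbehind: word characters only when preceded by '#'.
        System.out.println(firstMatch("(?<=#)\\w+", "see #regex tips"));    // regex
        // Replacement with captured groups.
        System.out.println(toDayFirst("2024-01-15"));                    // 15/01/2024
    }
}
```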

Pattern and Matcher

  • A Pattern object can be compiled with flags; for example, Pattern.CASE_INSENSITIVE enables case-insensitive matching.
  • The Pattern class also provides a split(String) method that is similar to the String class split() method.
  • The Pattern class toString() method returns the regex String from which the pattern was compiled.
  • The Matcher class has start() and end() methods that show precisely where the match was found in the input string.
  • The Matcher class also provides replaceAll(String replacement) and replaceFirst(String replacement).
package com.journaldev.util;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexExamples {
    public static void main(String[] args) {
        // using pattern with flags
        Pattern pattern = Pattern.compile("ab", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher("ABcabdAb");
        // using Matcher find(), group(), start() and end() methods
        while (matcher.find()) {
            System.out.println("Found the text \"" + matcher.group() +
                "\" starting at " + matcher.start() +
                " index and ending at index " + matcher.end());
        }
        // using Pattern split() method
        pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one@two#three:four$five");
        for (String s: words) {
            System.out.println("Split using Pattern.split(): " + s);
        }
        // using Matcher.replaceFirst() and replaceAll() methods
        pattern = Pattern.compile("1*2");
        matcher = pattern.matcher("11234512678");
        System.out.println("Using replaceAll: " + matcher.replaceAll("_"));
        System.out.println("Using replaceFirst: " + matcher.replaceFirst("_"));
    }
}
The output of the above Java regex example program is:
Found the text "AB" starting at 0 index and ending at index 2
Found the text "ab" starting at 3 index and ending at index 5
Found the text "Ab" starting at 6 index and ending at index 8
Split using Pattern.split(): one
Split using Pattern.split(): two
Split using Pattern.split(): three
Split using Pattern.split(): four
Split using Pattern.split(): five
Using replaceAll: _345_678
Using replaceFirst: _34512678

Common Matchings

Matching an Email Address

/^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,5})$/
  • Group 1 ([a-z0-9_.-]+)
    • In this section of the expression, we match one or more lowercase letters between a-z, numbers between 0-9, underscores, periods, and hyphens. The expression is then followed by an @ sign.
  • Group 2 ([\da-z.-]+)
    • Next, the domain name must be matched; it can contain one or more digits, letters between a-z, periods, and hyphens. The domain name is then followed by a period (\.).
  • Group 3 ([a-z.]{2,5})
    • Lastly, the third group matches the top level domain. This section looks for any group of letters or dots that are 2-5 characters long. This can also account for region-specific top-level domains.
  • Therefore, the regex above matches many commonly used email addresses, such as firstname.lastname@domain.com.
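One way the email expression might be used from Java is sketched below. The surrounding / delimiters are dropped because java.util.regex takes the pattern as a plain String, and each backslash is doubled for the Java string literal. The class EmailRegexDemo and the helper parseEmail are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailRegexDemo {
    // The expression from the text, written as a Java string literal.
    private static final Pattern EMAIL =
        Pattern.compile("^([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,5})$");

    // Hypothetical helper: returns {local part, domain, top-level domain},
    // or null when the input does not match.
    static String[] parseEmail(String input) {
        Matcher matcher = EMAIL.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseEmail("firstname.lastname@domain.com")));
        // [firstname.lastname, domain, com]
        System.out.println(parseEmail("no-at-sign.com")); // null
    }
}
```

Note that the pattern only accepts lowercase letters; compiling it with Pattern.CASE_INSENSITIVE would be one option if mixed-case addresses need to match.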

Matching a Phone Number

/^\b\d{3}[-.]?\d{3}[-.]?\d{4}\b$/
  • Section 1 \b\d{3}
    • This section begins with a word boundary (\b), which here asserts that the match starts at a word character. It then matches 3 digits (0-9) followed by a hyphen, a period, or nothing ([-.]?).
  • Section 2 \d{3}
    • The second section is quite similar to the first: it matches 3 digits (0-9) followed by another hyphen, period, or nothing ([-.]?).
  • Section 3 \d{4}\b
    • Lastly, this section is slightly different in that it matches 4 digits instead of 3. The word boundary assertion is also used at the end of the expression. Finally, the end of the string is defined by the $.
  • Therefore, the regex above identifies a phone number in the format 123-123-1234, 123.123.1234, or 1231231234.
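The phone number expression can be exercised in Java along the same lines. This is a minimal sketch: the / delimiters are omitted, backslashes are doubled for the string literal, and the class PhoneRegexDemo and the helper isPhoneNumber are hypothetical names.

```java
import java.util.regex.Pattern;

class PhoneRegexDemo {
    // The expression from the text, written as a Java string literal.
    private static final Pattern PHONE =
        Pattern.compile("^\\b\\d{3}[-.]?\\d{3}[-.]?\\d{4}\\b$");

    // Hypothetical helper: true when the whole input is a 3-3-4 digit number
    // with optional hyphen or period separators.
    static boolean isPhoneNumber(String input) {
        return PHONE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPhoneNumber("123-123-1234"));  // true
        System.out.println(isPhoneNumber("123.123.1234"));  // true
        System.out.println(isPhoneNumber("1231231234"));    // true
        System.out.println(isPhoneNumber("123-123-12345")); // false: one digit too many
    }
}
```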