Class Regex

java.lang.Object
com.norconex.commons.lang.text.Regex

public class Regex extends Object

Builder and utility methods that make it easier to construct and use regular expressions. In addition, you can obtain a Matcher with support for empty or null values.

Empty and null values

Since 3.0.0, you can force null and empty strings to be considered a positive match, regardless of the specified pattern. To do so, set matchEmpty to true. To also have blank values (containing only white spaces) considered positive matches, set trim to true as well. When matching empty values, a replacement performed on a null value behaves as if the value were an empty string.
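The following is a minimal sketch, using only java.util.regex, of the empty-value semantics described above. EmptyAwareMatch and its matches method are hypothetical illustrations, not part of this library, and the sketch assumes the caller wants a full match (Matcher.matches()) rather than a partial one (Matcher.find()).

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper illustrating the documented "match empty" behavior:
 * null/empty input (and, with trimming, blank input) is a positive match
 * only when matchEmpty is enabled, regardless of the pattern.
 */
final class EmptyAwareMatch {

    private EmptyAwareMatch() {
    }

    static boolean matches(String regex, String value, boolean matchEmpty, boolean trim) {
        // Optionally trim first, as per String.trim()
        String text = (value != null && trim) ? value.trim() : value;
        if (text == null || text.isEmpty()) {
            // Null/empty text never matches unless matchEmpty is true
            return matchEmpty;
        }
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("\\d+", null, true, false));   // true
        System.out.println(matches("\\d+", "   ", true, false));  // false: blank, not trimmed
        System.out.println(matches("\\d+", "   ", true, true));   // true: trimmed to empty
        System.out.println(matches("\\d+", "", false, false));    // false
        System.out.println(matches("\\d+", " 42 ", false, true)); // true
    }
}
```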

Since:
2.0.0
  • Field Details

    • UNICODE_MARK_INSENSTIVE_FLAG

      public static final int UNICODE_MARK_INSENSTIVE_FLAG
      Flag that ignores diacritical marks (e.g. accents) when matching or replacing. This flag is not supported by Java's Pattern class and only works when used with this class.
    • UNICODE_CASE_INSENSTIVE_FLAG

      public static final int UNICODE_CASE_INSENSTIVE_FLAG
      Convenience flag that combines Pattern.UNICODE_CASE and Pattern.CASE_INSENSITIVE.
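To illustrate what the combined case flag changes, here is a plain java.util.regex sketch. UnicodeCaseDemo and its local constant are hypothetical; the library's own constant is spelled UNICODE_CASE_INSENSTIVE_FLAG. With CASE_INSENSITIVE alone, Java only folds US-ASCII letters, so an accented lowercase letter does not match its uppercase form; adding UNICODE_CASE enables Unicode-aware case folding.

```java
import java.util.regex.Pattern;

/** Plain java.util.regex illustration of UNICODE_CASE combined with CASE_INSENSITIVE. */
final class UnicodeCaseDemo {

    // Hypothetical local constant holding the same combination as the documented flag
    static final int UNICODE_CASE_INSENSITIVE = Pattern.UNICODE_CASE | Pattern.CASE_INSENSITIVE;

    private UnicodeCaseDemo() {
    }

    static boolean matches(String regex, String text, int flags) {
        return Pattern.compile(regex, flags).matcher(text).matches();
    }

    public static void main(String[] args) {
        String lower = "caf\u00e9"; // ends with lowercase e-acute
        String upper = "CAF\u00c9"; // ends with uppercase E-acute
        // ASCII-only folding: the accented letters are not treated as equal
        System.out.println(matches(lower, upper, Pattern.CASE_INSENSITIVE)); // false
        // Unicode-aware folding: they are
        System.out.println(matches(lower, upper, UNICODE_CASE_INSENSITIVE)); // true
    }
}
```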
  • Constructor Details

    • Regex

      public Regex()
    • Regex

      public Regex(String pattern)
    • Regex

      public Regex(String pattern, int... flags)
  • Method Details

    • dotAll

      public Regex dotAll()
    • setDotAll

      public Regex setDotAll(boolean dotAll)
    • isDotAll

      public boolean isDotAll()
    • ignoreCase

      public Regex ignoreCase()
    • setIgnoreCase

      public Regex setIgnoreCase(boolean ignoreCase)
    • isIgnoreCase

      public boolean isIgnoreCase()
    • unixLines

      public Regex unixLines()
    • setUnixLines

      public Regex setUnixLines(boolean unixLines)
    • isUnixLines

      public boolean isUnixLines()
    • literal

      public Regex literal()
    • setLiteral

      public Regex setLiteral(boolean literal)
    • isLiteral

      public boolean isLiteral()
    • comments

      public Regex comments()
    • setComments

      public Regex setComments(boolean comments)
    • isComments

      public boolean isComments()
    • multiline

      public Regex multiline()
    • setMultiline

      public Regex setMultiline(boolean multiline)
    • isMultiline

      public boolean isMultiline()
    • canonEq

      public Regex canonEq()
    • setCanonEq

      public Regex setCanonEq(boolean canonEq)
    • isCanonEq

      public boolean isCanonEq()
    • unicodeCase

      public Regex unicodeCase()
    • setUnicodeCase

      public Regex setUnicodeCase(boolean unicode)
    • isUnicodeCase

      public boolean isUnicodeCase()
    • unicodeCharacterClass

      public Regex unicodeCharacterClass()
    • setUnicodeCharacterClass

      public Regex setUnicodeCharacterClass(boolean unicode)
    • isUnicodeCharacterClass

      public boolean isUnicodeCharacterClass()
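The toggles above follow a shortcut/setter/getter triplet in which every mutator returns this instance, so calls can be chained. The sketch below is a hypothetical, trimmed-down builder (MiniRegex) showing that pattern for two of the toggles. It assumes, based only on the method names, that dotAll and ignoreCase correspond to Pattern.DOTALL and Pattern.CASE_INSENSITIVE; this page does not state the mapping.

```java
import java.util.regex.Pattern;

/** Hypothetical builder mirroring the shortcut/setter/getter triplets. */
final class MiniRegex {

    private final String pattern;
    private boolean dotAll;
    private boolean ignoreCase;

    MiniRegex(String pattern) {
        this.pattern = pattern;
    }

    // Shortcut: same as setDotAll(true)
    MiniRegex dotAll() {
        return setDotAll(true);
    }

    MiniRegex setDotAll(boolean dotAll) {
        this.dotAll = dotAll;
        return this; // returning this enables chaining
    }

    boolean isDotAll() {
        return dotAll;
    }

    // Shortcut: same as setIgnoreCase(true)
    MiniRegex ignoreCase() {
        return setIgnoreCase(true);
    }

    MiniRegex setIgnoreCase(boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        return this;
    }

    boolean isIgnoreCase() {
        return ignoreCase;
    }

    // Assumed mapping from boolean toggles to java.util.regex.Pattern flag bits
    Pattern compile() {
        int flags = 0;
        if (dotAll) {
            flags |= Pattern.DOTALL;
        }
        if (ignoreCase) {
            flags |= Pattern.CASE_INSENSITIVE;
        }
        return Pattern.compile(pattern, flags);
    }

    public static void main(String[] args) {
        Pattern p = new MiniRegex("start.*END").dotAll().ignoreCase().compile();
        System.out.println(p.matcher("Start\nend").matches()); // true
    }
}
```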
    • ignoreDiacritic

      public Regex ignoreDiacritic()
      Ignores diacritical marks when matching or replacing (e.g. accents).
      Returns:
      this instance
    • setIgnoreDiacritic

      public Regex setIgnoreDiacritic(boolean ignoreDiacritic)
    • isIgnoreDiacritic

      public boolean isIgnoreDiacritic()
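This page does not describe how diacritic insensitivity is implemented. One common technique, sketched below as an assumption rather than this class's actual code, is to decompose both the pattern and the text with java.text.Normalizer (NFD) and strip the resulting combining marks (\p{M}) before matching. DiacriticInsensitive is a hypothetical helper.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

/** Hypothetical helper: diacritic-insensitive matching by stripping combining marks. */
final class DiacriticInsensitive {

    private DiacriticInsensitive() {
    }

    // Decompose (NFD) so accents become separate combining marks, then drop them
    static String stripMarks(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    static boolean matches(String regex, String text) {
        return Pattern.compile(stripMarks(regex)).matcher(stripMarks(text)).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("cafe", "caf\u00e9"));          // true
        System.out.println(matches("r\u00e9sum\u00e9", "resume")); // true
    }
}
```

Match offsets in the stripped text may not line up with the original text once marks are removed, so an implementation that also supports replacement needs extra bookkeeping.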
    • isMatchEmpty

      public boolean isMatchEmpty()
      Gets whether null or empty strings should be considered a positive match.
      Returns:
      true if null and empty strings are considered a match
      Since:
      3.0.0
    • setMatchEmpty

      public Regex setMatchEmpty(boolean matchEmpty)
      Sets whether null or empty strings should be considered a positive match. To also consider blank values as positive matches, use setTrim(boolean).
      Parameters:
      matchEmpty - true to have null and empty strings considered a match.
      Returns:
      this instance
      Since:
      3.0.0
    • matchEmpty

      public Regex matchEmpty()
      Specifies that null or empty strings should be considered a positive match. Same as invoking setMatchEmpty(boolean) with true.
      Returns:
      this instance
      Since:
      3.0.0
    • isTrim

      public boolean isTrim()
      Gets whether values should be trimmed before being evaluated (as per String.trim()).
      Returns:
      true if values are trimmed before evaluation
      Since:
      3.0.0
    • setTrim

      public Regex setTrim(boolean trim)
      Sets whether values should be trimmed before being evaluated (as per String.trim()).
      Parameters:
      trim - true to trim values before evaluation
      Returns:
      this instance
      Since:
      3.0.0
    • trim

      public Regex trim()
      Specifies that values should be trimmed before being evaluated (as per String.trim()). Same as invoking setTrim(boolean) with true.
      Returns:
      this instance
      Since:
      3.0.0
    • setFlags

      public Regex setFlags(int... flags)
    • getFlags

      public Set<Integer> getFlags()
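getFlags() exposes the flags as a Set of integers, while Pattern.compile(String, int) expects a single bitmask. The hypothetical FlagSet helper below sketches how such a set can be built from varargs and folded back into a bitmask by OR-ing its members. It does not handle the library-specific UNICODE_MARK_INSENSTIVE_FLAG, which, per its description, Java's Pattern does not support and which presumably has to be handled separately before compiling.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper converting between varargs flags, a Set, and a Pattern bitmask. */
final class FlagSet {

    private FlagSet() {
    }

    // Duplicates collapse naturally in a Set
    static Set<Integer> toSet(int... flags) {
        Set<Integer> set = new LinkedHashSet<>();
        for (int f : flags) {
            set.add(f);
        }
        return set;
    }

    // OR every flag into the single int that Pattern.compile expects
    static int toBitmask(Set<Integer> flags) {
        int mask = 0;
        for (int f : flags) {
            mask |= f;
        }
        return mask;
    }

    public static void main(String[] args) {
        Set<Integer> flags = toSet(Pattern.MULTILINE, Pattern.CASE_INSENSITIVE);
        Pattern p = Pattern.compile("^abc$", toBitmask(flags));
        System.out.println(p.matcher("x\nABC\ny").find()); // true
    }
}
```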
    • setPattern

      public Regex setPattern(String pattern)
    • getPattern

      public String getPattern()
    • compile

      public Pattern compile()

      Compiles a previously set pattern.

      For text matching with diacritical-mark insensitivity enabled, or with trim() and matchEmpty() support, use matcher(CharSequence) instead.

      Returns:
      compiled pattern
    • compile

      public Pattern compile(String pattern)

      Compiles the given pattern without assigning it to this object.

      For text matching with diacritical-mark insensitivity enabled, or with trim() and matchEmpty() support, use matcher(String, CharSequence) instead.

      Parameters:
      pattern - the pattern to compile
      Returns:
      compiled pattern
      Throws:
      IllegalArgumentException - if pattern is null
    • compileDotAll

      public static Pattern compileDotAll(String regex, boolean ignoreCase)
      Compiles a "dotall" pattern (the dot matches any character, including line terminators) with optional case insensitivity.
      Parameters:
      regex - regular expression
      ignoreCase - true to ignore character case.
      Returns:
      compiled pattern
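As a rough equivalent in plain java.util.regex, assuming compileDotAll simply combines Pattern.DOTALL with Pattern.CASE_INSENSITIVE when ignoreCase is true (the exact flags the library uses are not stated here), consider the hypothetical DotAllDemo below. Without DOTALL, .* stops at the first line terminator, so a full match across lines fails.

```java
import java.util.regex.Pattern;

/** Hypothetical plain-Java approximation of compileDotAll(String, boolean). */
final class DotAllDemo {

    private DotAllDemo() {
    }

    static Pattern compileDotAll(String regex, boolean ignoreCase) {
        // DOTALL: '.' also matches line terminators
        int flags = Pattern.DOTALL;
        if (ignoreCase) {
            // Assumed: plain CASE_INSENSITIVE (US-ASCII folding only)
            flags |= Pattern.CASE_INSENSITIVE;
        }
        return Pattern.compile(regex, flags);
    }

    public static void main(String[] args) {
        String html = "<p>line one\nline two</p>";
        System.out.println(compileDotAll("<P>.*</P>", true).matcher(html).matches()); // true
        System.out.println(Pattern.compile("<p>.*</p>").matcher(html).matches());     // false
    }
}
```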
    • escape

      public static String escape(String pattern)
      Escapes special characters with a backslash (\) in a regular expression. This is an alternative to Pattern.quote(String) for when you do not want the whole string treated as a single quoted literal (Pattern.quote(String) wraps it between \Q and \E).
      Parameters:
      pattern - the pattern to escape
      Returns:
      escaped pattern
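The sketch below shows the backslash-escaping approach next to Pattern.quote(String). The set of characters treated as special (\ ^ $ . | ? * + ( ) [ ] { }) is an assumption for illustration; the library's escape method may cover a different set. EscapeDemo is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of backslash-escaping regex metacharacters. */
final class EscapeDemo {

    // Assumed set of characters with special meaning in java.util.regex
    private static final String SPECIAL = "\\^$.|?*+()[]{}";

    private EscapeDemo() {
    }

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "1+1=2?";
        System.out.println(escape(input));        // 1\+1=2\?
        System.out.println(Pattern.quote(input)); // \Q1+1=2?\E
        Pattern p = Pattern.compile("^" + escape(input) + "\\s*$");
        System.out.println(p.matcher("1+1=2?  ").matches()); // true
    }
}
```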
    • matcher

      public Matcher matcher(CharSequence text)
      Matches the previously set pattern against the given text.
      Parameters:
      text - the text to match
      Returns:
      matcher
    • matcher

      public Matcher matcher(String pattern, CharSequence text)
      Matches the given pattern against the given text without assigning the pattern to this object. Since 3.0.0, null or empty text generates no match unless isMatchEmpty() is true, in which case it matches positively.
      Parameters:
      pattern - the pattern to match
      text - the text to match
      Returns:
      matcher
    • createKeyValueExtractor

      public RegexFieldValueExtractor createKeyValueExtractor()
    • createKeyValueExtractor

      public RegexFieldValueExtractor createKeyValueExtractor(String key)
    • createKeyValueExtractor

      public RegexFieldValueExtractor createKeyValueExtractor(String key, int valueGroup)
    • createKeyValueExtractor

      public RegexFieldValueExtractor createKeyValueExtractor(int keyGroup, int valueGroup)
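RegexFieldValueExtractor is not documented on this page. As a hypothetical illustration of what extracting keys and values by capturing group can look like, KeyValueDemo below collects, for each match, the text of valueGroup under either the text of keyGroup or a fixed key. Its names and return types are assumptions, not this library's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical key/value extraction from regex capturing groups. */
final class KeyValueDemo {

    private KeyValueDemo() {
    }

    // Key taken from keyGroup, value from valueGroup, for every match found
    static Map<String, List<String>> extract(Pattern pattern, CharSequence text, int keyGroup, int valueGroup) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            String key = m.group(keyGroup);
            String value = m.group(valueGroup);
            if (key != null && value != null) {
                result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }
        return result;
    }

    // Fixed key, value from valueGroup, for every match found
    static Map<String, List<String>> extract(Pattern pattern, CharSequence text, String key, int valueGroup) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            String value = m.group(valueGroup);
            if (value != null) {
                result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\w+)=(\\w+)");
        System.out.println(extract(p, "color=red size=XL color=blue", 1, 2));
        // {color=[red, blue], size=[XL]}
    }
}
```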
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • canEqual

      protected boolean canEqual(Object other)
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object