Class CharMatcher

  • All Implemented Interfaces:
    com.google.common.base.Predicate<Character>, java.util.function.Predicate<Character>

    @GwtCompatible(emulated=true)
    public abstract class CharMatcher
    extends Object
    implements Predicate<Character>
    Determines a true or false value for any Java char value, just as Predicate does for any Object. Also offers basic text processing methods based on this function. Implementations are strongly encouraged to be side-effect-free and immutable.

    Throughout the documentation of this class, the phrase "matching character" is used to mean "any char value c for which this.matches(c) returns true".

    Warning: This class deals only with char values, that is, BMP characters. It does not understand supplementary Unicode code points in the range 0x10000 to 0x10FFFF, which includes the majority of assigned characters, including important CJK characters and emoji.

    Supplementary characters are encoded into a String using surrogate pairs, and a CharMatcher treats these just as two separate characters. countIn(java.lang.CharSequence) counts each supplementary character as 2 chars.
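
    To make the two-chars-per-supplementary-character behavior concrete, here is a small sketch that uses only java.lang and no Guava. The class SurrogateDemo and the helper countNonSpaceChars are illustrative names, not part of this API; the helper simply walks the sequence one char at a time, as a char-based matcher would.

```java
final class SurrogateDemo {
    // Counts chars (UTF-16 code units) other than the ASCII space, one char at a time,
    // mimicking how a char-based matcher walks a sequence.
    static int countNonSpaceChars(CharSequence sequence) {
        int count = 0;
        for (int i = 0; i < sequence.length(); i++) {
            if (sequence.charAt(i) != ' ') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // U+1F600 is a supplementary code point, stored as a surrogate pair.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
        System.out.println(countNonSpaceChars("a " + emoji));        // 3
    }
}
```

    The single emoji contributes two to the count because each half of its surrogate pair is examined separately.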

    For up-to-date Unicode character properties (digit, letter, etc.) and support for supplementary code points, use ICU4J UCharacter and UnicodeSet (freeze() after building). For basic text processing based on UnicodeSet use the ICU4J UnicodeSetSpanner.

    Example usages:

       String trimmed = whitespace().trimFrom(userInput);
       if (ascii().matchesAllOf(s)) { ... }

    See the Guava User Guide article on CharMatcher.

    Since:
    1.0
    Author:
    Kevin Bourrillion
    • Constructor Summary

      Constructors 
      Modifier Constructor Description
      protected CharMatcher()
      Constructor for use by subclasses.
    • Method Summary

      Modifier and Type Method Description
      CharMatcher and(CharMatcher other)
      Returns a matcher that matches any character matched by both this matcher and other.
      static CharMatcher any()
      Matches any character.
      static CharMatcher anyOf(CharSequence sequence)
      Returns a char matcher that matches any BMP character present in the given character sequence.
      boolean apply(Character character)
      Deprecated.
      Provided only to satisfy the Predicate interface; use matches(char) instead.
      static CharMatcher ascii()
      Determines whether a character is ASCII, meaning that its code point is less than 128.
      static CharMatcher breakingWhitespace()
      Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes).
      String collapseFrom(CharSequence sequence, char replacement)
      Returns a string copy of the input character sequence, with each group of consecutive matching BMP characters replaced by a single replacement character.
      int countIn(CharSequence sequence)
      Returns the number of matching chars found in a character sequence.
      static CharMatcher digit()
      Deprecated.
      Many digits are supplementary characters; see the class documentation.
      static CharMatcher forPredicate(Predicate<? super Character> predicate)
      Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.
      int indexIn(CharSequence sequence)
      Returns the index of the first matching BMP character in a character sequence, or -1 if no matching character is present.
      int indexIn(CharSequence sequence, int start)
      Returns the index of the first matching BMP character in a character sequence, starting from a given position, or -1 if no character matches after that position.
      static CharMatcher inRange(char startInclusive, char endInclusive)
      Returns a char matcher that matches any character in a given BMP range (both endpoints are inclusive).
      static CharMatcher invisible()
      Deprecated.
      Most invisible characters are supplementary characters; see the class documentation.
      static CharMatcher is(char match)
      Returns a char matcher that matches only one specified BMP character.
      static CharMatcher isNot(char match)
      Returns a char matcher that matches any character except the BMP character specified.
      static CharMatcher javaDigit()
      Deprecated.
      Many digits are supplementary characters; see the class documentation.
      static CharMatcher javaIsoControl()
      Determines whether a character is an ISO control character as specified by Character.isISOControl(char).
      static CharMatcher javaLetter()
      Deprecated.
      Most letters are supplementary characters; see the class documentation.
      static CharMatcher javaLetterOrDigit()
      Deprecated.
      Most letters and digits are supplementary characters; see the class documentation.
      static CharMatcher javaLowerCase()
      Deprecated.
      Some lowercase characters are supplementary characters; see the class documentation.
      static CharMatcher javaUpperCase()
      Deprecated.
      Some uppercase characters are supplementary characters; see the class documentation.
      int lastIndexIn(CharSequence sequence)
      Returns the index of the last matching BMP character in a character sequence, or -1 if no matching character is present.
      abstract boolean matches(char c)
      Determines a true or false value for the given character.
      boolean matchesAllOf(CharSequence sequence)
      Returns true if a character sequence contains only matching BMP characters.
      boolean matchesAnyOf(CharSequence sequence)
      Returns true if a character sequence contains at least one matching BMP character.
      boolean matchesNoneOf(CharSequence sequence)
      Returns true if a character sequence contains no matching BMP characters.
      CharMatcher negate()
      Returns a matcher that matches any character not matched by this matcher.
      static CharMatcher none()
      Matches no characters.
      static CharMatcher noneOf(CharSequence sequence)
      Returns a char matcher that matches any BMP character not present in the given character sequence.
      CharMatcher or(CharMatcher other)
      Returns a matcher that matches any character matched by either this matcher or other.
      CharMatcher precomputed()
      Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary.
      String removeFrom(CharSequence sequence)
      Returns a string containing all non-matching characters of a character sequence, in order.
      String replaceFrom(CharSequence sequence, char replacement)
      Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement character.
      String replaceFrom(CharSequence sequence, CharSequence replacement)
      Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement sequence.
      String retainFrom(CharSequence sequence)
      Returns a string containing all matching BMP characters of a character sequence, in order.
      static CharMatcher singleWidth()
      Deprecated.
      Many such characters are supplementary characters; see the class documentation.
      String toString()
      Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT).
      String trimAndCollapseFrom(CharSequence sequence, char replacement)
      Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching BMP characters at the start or end of the sequence are removed without replacement.
      String trimFrom(CharSequence sequence)
      Returns a substring of the input character sequence that omits all matching BMP characters from the beginning and from the end of the string.
      String trimLeadingFrom(CharSequence sequence)
      Returns a substring of the input character sequence that omits all matching BMP characters from the beginning of the string.
      String trimTrailingFrom(CharSequence sequence)
      Returns a substring of the input character sequence that omits all matching BMP characters from the end of the string.
      static CharMatcher whitespace()
      Determines whether a character is whitespace according to the latest Unicode standard.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
      • Methods inherited from interface java.util.function.Predicate

        and, or
    • Constructor Detail

      • CharMatcher

        protected CharMatcher()
        Constructor for use by subclasses. When subclassing, you may want to override toString() to provide a useful description.
    • Method Detail

      • any

        public static CharMatcher any()
        Matches any character.
        Since:
        19.0 (since 1.0 as constant ANY)
      • none

        public static CharMatcher none()
        Matches no characters.
        Since:
        19.0 (since 1.0 as constant NONE)
      • whitespace

        public static CharMatcher whitespace()
        Determines whether a character is whitespace according to the latest Unicode standard. This is not the same definition used by other Java APIs.

        All Unicode White_Space characters are on the BMP and thus supported by this API.

        Note: as the Unicode definition evolves, we will modify this matcher to keep it up to date.

        Since:
        19.0 (since 1.0 as constant WHITESPACE)
      • breakingWhitespace

        public static CharMatcher breakingWhitespace()
        Determines whether a character is a breaking whitespace (that is, a whitespace which can be interpreted as a break between words for formatting purposes). See whitespace() for a discussion of that term.
        Since:
        19.0 (since 2.0 as constant BREAKING_WHITESPACE)
      • ascii

        public static CharMatcher ascii()
        Determines whether a character is ASCII, meaning that its code point is less than 128.
        Since:
        19.0 (since 1.0 as constant ASCII)
      • digit

        public static CharMatcher digit()
        Deprecated.
        Many digits are supplementary characters; see the class documentation.
        Determines whether a character is a BMP digit according to Unicode. If you only care to match ASCII digits, you can use inRange('0', '9').
        Since:
        19.0 (since 1.0 as constant DIGIT)
      • javaDigit

        public static CharMatcher javaDigit()
        Deprecated.
        Many digits are supplementary characters; see the class documentation.
        Determines whether a character is a BMP digit according to Java's definition (Character.isDigit(char)). If you only care to match ASCII digits, you can use inRange('0', '9').
        Since:
        19.0 (since 1.0 as constant JAVA_DIGIT)
      • javaLetter

        public static CharMatcher javaLetter()
        Deprecated.
        Most letters are supplementary characters; see the class documentation.
        Determines whether a character is a BMP letter according to Java's definition (Character.isLetter(char)). If you only care to match letters of the Latin alphabet, you can use inRange('a', 'z').or(inRange('A', 'Z')).
        Since:
        19.0 (since 1.0 as constant JAVA_LETTER)
      • javaLetterOrDigit

        public static CharMatcher javaLetterOrDigit()
        Deprecated.
        Most letters and digits are supplementary characters; see the class documentation.
        Determines whether a character is a BMP letter or digit according to Java's definition (Character.isLetterOrDigit(char)).
        Since:
        19.0 (since 1.0 as constant JAVA_LETTER_OR_DIGIT)
      • javaUpperCase

        public static CharMatcher javaUpperCase()
        Deprecated.
        Some uppercase characters are supplementary characters; see the class documentation.
        Determines whether a BMP character is upper case according to Java's definition (Character.isUpperCase(char)).
        Since:
        19.0 (since 1.0 as constant JAVA_UPPER_CASE)
      • javaLowerCase

        public static CharMatcher javaLowerCase()
        Deprecated.
        Some lowercase characters are supplementary characters; see the class documentation.
        Determines whether a BMP character is lower case according to Java's definition (Character.isLowerCase(char)).
        Since:
        19.0 (since 1.0 as constant JAVA_LOWER_CASE)
      • javaIsoControl

        public static CharMatcher javaIsoControl()
        Determines whether a character is an ISO control character as specified by Character.isISOControl(char).

        All ISO control codes are on the BMP and thus supported by this API.

        Since:
        19.0 (since 1.0 as constant JAVA_ISO_CONTROL)
      • invisible

        public static CharMatcher invisible()
        Deprecated.
        Most invisible characters are supplementary characters; see the class documentation.
        Determines whether a character is invisible; that is, if its Unicode category is any of SPACE_SEPARATOR, LINE_SEPARATOR, PARAGRAPH_SEPARATOR, CONTROL, FORMAT, SURROGATE, and PRIVATE_USE according to ICU4J.

        See also the Unicode Default_Ignorable_Code_Point property (available via ICU).

        Since:
        19.0 (since 1.0 as constant INVISIBLE)
      • singleWidth

        public static CharMatcher singleWidth()
        Deprecated.
        Many such characters are supplementary characters; see the class documentation.
        Determines whether a character is single-width (not double-width). When in doubt, this matcher errs on the side of returning false (that is, it tends to assume a character is double-width).

        Note: as the reference file evolves, we will modify this matcher to keep it up to date.

        See also UAX #11 East Asian Width.

        Since:
        19.0 (since 1.0 as constant SINGLE_WIDTH)
      • is

        public static CharMatcher is(char match)
        Returns a char matcher that matches only one specified BMP character.
      • isNot

        public static CharMatcher isNot(char match)
        Returns a char matcher that matches any character except the BMP character specified.

        To negate another CharMatcher, use negate().

      • anyOf

        public static CharMatcher anyOf(CharSequence sequence)
        Returns a char matcher that matches any BMP character present in the given character sequence. Returns a bogus matcher if the sequence contains supplementary characters.
      • noneOf

        public static CharMatcher noneOf(CharSequence sequence)
        Returns a char matcher that matches any BMP character not present in the given character sequence. Returns a bogus matcher if the sequence contains supplementary characters.
      • inRange

        public static CharMatcher inRange(char startInclusive,
                                          char endInclusive)
        Returns a char matcher that matches any character in a given BMP range (both endpoints are inclusive). For example, to match any lowercase letter of the English alphabet, use CharMatcher.inRange('a', 'z').
        Throws:
        IllegalArgumentException - if endInclusive < startInclusive
      • forPredicate

        public static CharMatcher forPredicate(Predicate<? super Character> predicate)
        Returns a matcher with identical behavior to the given Character-based predicate, but which operates on primitive char instances instead.
      • matches

        public abstract boolean matches(char c)
        Determines a true or false value for the given character.
      • negate

        public CharMatcher negate()
        Returns a matcher that matches any character not matched by this matcher.
        Specified by:
        negate in interface java.util.function.Predicate<Character>
        Returns:
        a predicate that represents the logical negation of this predicate
      • and

        public CharMatcher and(CharMatcher other)
        Returns a matcher that matches any character matched by both this matcher and other.
      • or

        public CharMatcher or(CharMatcher other)
        Returns a matcher that matches any character matched by either this matcher or other.
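
        As a rough illustration of how negate(), and(), and or() compose, the sketch below rebuilds the same three combinators on a hypothetical CharTest interface. CharTest, CombinatorSketch, and the constants are stand-ins invented for this example; they are not Guava types and say nothing about how CharMatcher is implemented internally.

```java
final class CombinatorSketch {
    // Hypothetical stand-in for a char predicate; not Guava's CharMatcher.
    @FunctionalInterface
    interface CharTest {
        boolean matches(char c);

        default CharTest negate() {
            return c -> !matches(c);
        }

        default CharTest and(CharTest other) {
            return c -> matches(c) && other.matches(c);
        }

        default CharTest or(CharTest other) {
            return c -> matches(c) || other.matches(c);
        }
    }

    // Matches any char in the inclusive range, like the inRange description above.
    static CharTest inRange(char startInclusive, char endInclusive) {
        return c -> startInclusive <= c && c <= endInclusive;
    }

    static final CharTest LETTER = inRange('a', 'z').or(inRange('A', 'Z'));
    static final CharTest VOWEL = c -> "aeiouAEIOU".indexOf(c) >= 0;
    // A consonant is a Latin letter that is not a vowel.
    static final CharTest CONSONANT = LETTER.and(VOWEL.negate());

    public static void main(String[] args) {
        System.out.println(CONSONANT.matches('b')); // true
        System.out.println(CONSONANT.matches('a')); // false
    }
}
```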
      • precomputed

        public CharMatcher precomputed()
        Returns a char matcher functionally equivalent to this one, but which may be faster to query than the original; your mileage may vary. Precomputation takes time and is likely to be worthwhile only if the precomputed matcher is queried many thousands of times.

        This method has no effect (returns this) when called in GWT: it's unclear whether a precomputed matcher is faster, but it certainly consumes more memory, which doesn't seem like a worthwhile tradeoff in a browser.
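
        The time-for-memory tradeoff described above can be pictured with a lookup table. The following is only a sketch of the general idea, assuming a hypothetical CharTest interface and a java.util.BitSet table; it is not a description of what precomputed() actually does.

```java
import java.util.BitSet;

final class PrecomputeSketch {
    // Hypothetical stand-in for a char predicate; not Guava's CharMatcher.
    interface CharTest {
        boolean matches(char c);
    }

    // Evaluates the predicate once for every char value and stores the answers
    // in a BitSet, so each later query is a single bit lookup.
    static CharTest precompute(CharTest original) {
        BitSet table = new BitSet(Character.MAX_VALUE + 1);
        // An int loop variable avoids char overflow at Character.MAX_VALUE.
        for (int c = Character.MIN_VALUE; c <= Character.MAX_VALUE; c++) {
            if (original.matches((char) c)) {
                table.set(c);
            }
        }
        return c -> table.get(c);
    }
}
```

        Building the table costs 65,536 predicate calls up front and, for a matcher that matches high char values, roughly 8 KiB of bits, which is why this kind of precomputation pays off only for heavily queried matchers.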

      • matchesAnyOf

        public boolean matchesAnyOf(CharSequence sequence)
        Returns true if a character sequence contains at least one matching BMP character. Equivalent to !matchesNoneOf(sequence).

        The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.

        Parameters:
        sequence - the character sequence to examine, possibly empty
        Returns:
        true if this matcher matches at least one character in the sequence
        Since:
        8.0
      • matchesAllOf

        public boolean matchesAllOf(CharSequence sequence)
        Returns true if a character sequence contains only matching BMP characters.

        The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns false or the end is reached.

        Parameters:
        sequence - the character sequence to examine, possibly empty
        Returns:
        true if this matcher matches every character in the sequence, including when the sequence is empty
      • matchesNoneOf

        public boolean matchesNoneOf(CharSequence sequence)
        Returns true if a character sequence contains no matching BMP characters. Equivalent to !matchesAnyOf(sequence).

        The default implementation iterates over the sequence, invoking matches(char) for each character, until this returns true or the end is reached.

        Parameters:
        sequence - the character sequence to examine, possibly empty
        Returns:
        true if this matcher matches no characters in the sequence, including when the sequence is empty
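
        The three default loops described for matchesAnyOf, matchesAllOf, and matchesNoneOf can be sketched as follows. CharTest and MatchLoopSketch are hypothetical names used only for this illustration; the code follows the prose above rather than the library source.

```java
final class MatchLoopSketch {
    // Hypothetical stand-in for a char predicate; not Guava's CharMatcher.
    interface CharTest {
        boolean matches(char c);
    }

    // Stops at the first matching char.
    static boolean matchesAnyOf(CharTest matcher, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (matcher.matches(sequence.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Stops at the first non-matching char; an empty sequence yields true.
    static boolean matchesAllOf(CharTest matcher, CharSequence sequence) {
        for (int i = 0; i < sequence.length(); i++) {
            if (!matcher.matches(sequence.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Defined as the negation of matchesAnyOf; an empty sequence yields true.
    static boolean matchesNoneOf(CharTest matcher, CharSequence sequence) {
        return !matchesAnyOf(matcher, sequence);
    }
}
```

        Note that for an empty sequence both matchesAllOf and matchesNoneOf return true, while matchesAnyOf returns false.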
      • indexIn

        public int indexIn(CharSequence sequence)
        Returns the index of the first matching BMP character in a character sequence, or -1 if no matching character is present.

        The default implementation iterates over the sequence in forward order calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to examine from the beginning
        Returns:
        an index, or -1 if no character matches
      • indexIn

        public int indexIn(CharSequence sequence,
                           int start)
        Returns the index of the first matching BMP character in a character sequence, starting from a given position, or -1 if no character matches after that position.

        The default implementation iterates over the sequence in forward order, beginning at start, calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to examine
        start - the first index to examine; must be nonnegative and no greater than sequence.length()
        Returns:
        the index of the first matching character, guaranteed to be no less than start, or -1 if no character matches
        Throws:
        IndexOutOfBoundsException - if start is negative or greater than sequence.length()
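
        The contract for the start parameter is easy to get wrong at the boundary, so here is a minimal sketch of the forward scan as described above. CharTest and IndexInSketch are hypothetical names, and the exception message format is an assumption made for this example.

```java
final class IndexInSketch {
    // Hypothetical stand-in for a char predicate; not Guava's CharMatcher.
    interface CharTest {
        boolean matches(char c);
    }

    static int indexIn(CharTest matcher, CharSequence sequence, int start) {
        int length = sequence.length();
        // start == length is allowed: nothing is scanned and -1 is returned.
        if (start < 0 || start > length) {
            throw new IndexOutOfBoundsException("start: " + start + ", length: " + length);
        }
        for (int i = start; i < length; i++) {
            if (matcher.matches(sequence.charAt(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

        With a matcher for 'a' and the input "bazaar", scanning from 0 finds index 1, scanning from 2 finds index 3, and scanning from 6 (the length) returns -1 without throwing.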
      • lastIndexIn

        public int lastIndexIn(CharSequence sequence)
        Returns the index of the last matching BMP character in a character sequence, or -1 if no matching character is present.

        The default implementation iterates over the sequence in reverse order calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to examine from the end
        Returns:
        an index, or -1 if no character matches
      • countIn

        public int countIn(CharSequence sequence)
        Returns the number of matching chars found in a character sequence.

        Counts 2 per supplementary character, such as for whitespace().negate().

      • removeFrom

        public String removeFrom(CharSequence sequence)
        Returns a string containing all non-matching characters of a character sequence, in order. For example:
        
         CharMatcher.is('a').removeFrom("bazaar")
         
        ... returns "bzr".
      • retainFrom

        public String retainFrom(CharSequence sequence)
        Returns a string containing all matching BMP characters of a character sequence, in order. For example:
        
         CharMatcher.is('a').retainFrom("bazaar")
         
        ... returns "aaa".
      • replaceFrom

        public String replaceFrom(CharSequence sequence,
                                  char replacement)
        Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement character. For example:
        
         CharMatcher.is('a').replaceFrom("radar", 'o')
         
        ... returns "rodor".

        The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to replace matching characters in
        replacement - the character to append to the result string in place of each matching character in sequence
        Returns:
        the new string
      • replaceFrom

        public String replaceFrom(CharSequence sequence,
                                  CharSequence replacement)
        Returns a string copy of the input character sequence, with each matching BMP character replaced by a given replacement sequence. For example:
        
         CharMatcher.is('a').replaceFrom("yaha", "oo")
         
        ... returns "yoohoo".

        Note: If the replacement is a fixed string with only one character, you are better off calling replaceFrom(CharSequence, char) directly.

        Parameters:
        sequence - the character sequence to replace matching characters in
        replacement - the characters to append to the result string in place of each matching character in sequence
        Returns:
        the new string
      • trimFrom

        public String trimFrom(CharSequence sequence)
        Returns a substring of the input character sequence that omits all matching BMP characters from the beginning and from the end of the string. For example:
        
         CharMatcher.anyOf("ab").trimFrom("abacatbab")
         
        ... returns "cat".

        Note that:

        
         CharMatcher.inRange('\0', ' ').trimFrom(str)
         
        ... is equivalent to String.trim().
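
        The equivalence with String.trim() can be checked without Guava: String.trim() strips leading and trailing chars whose code is at most U+0020, which is exactly the range '\0' to ' '. The sketch below (TrimSketch and trimControlAndSpace are hypothetical names) trims that range by hand for comparison.

```java
final class TrimSketch {
    // Trims chars in the inclusive range '\0'..' ' from both ends, one char at a time.
    static String trimControlAndSpace(CharSequence sequence) {
        int first = 0;
        int last = sequence.length() - 1;
        while (first <= last && sequence.charAt(first) <= ' ') {
            first++;
        }
        while (last >= first && sequence.charAt(last) <= ' ') {
            last--;
        }
        return sequence.subSequence(first, last + 1).toString();
    }

    public static void main(String[] args) {
        String input = "\t hi there \n";
        System.out.println(trimControlAndSpace(input).equals(input.trim())); // true
    }
}
```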
      • trimLeadingFrom

        public String trimLeadingFrom(CharSequence sequence)
        Returns a substring of the input character sequence that omits all matching BMP characters from the beginning of the string. For example:
        
         CharMatcher.anyOf("ab").trimLeadingFrom("abacatbab")
         
        ... returns "catbab".
      • trimTrailingFrom

        public String trimTrailingFrom(CharSequence sequence)
        Returns a substring of the input character sequence that omits all matching BMP characters from the end of the string. For example:
        
         CharMatcher.anyOf("ab").trimTrailingFrom("abacatbab")
         
        ... returns "abacat".
      • collapseFrom

        public String collapseFrom(CharSequence sequence,
                                   char replacement)
        Returns a string copy of the input character sequence, with each group of consecutive matching BMP characters replaced by a single replacement character. For example:
        
         CharMatcher.anyOf("eko").collapseFrom("bookkeeper", '-')
         
        ... returns "b-p-r".

        The default implementation uses indexIn(CharSequence) to find the first matching character, then iterates the remainder of the sequence calling matches(char) for each character.

        Parameters:
        sequence - the character sequence to replace matching groups of characters in
        replacement - the character to append to the result string in place of each group of matching characters in sequence
        Returns:
        the new string
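
        The collapsing behavior can be sketched in a single pass that remembers whether the previous char matched. This is a simplified illustration, assuming a hypothetical CharTest interface; it does not reproduce the indexIn-then-iterate structure of the default implementation described above.

```java
final class CollapseSketch {
    // Hypothetical stand-in for a char predicate; not Guava's CharMatcher.
    interface CharTest {
        boolean matches(char c);
    }

    static String collapseFrom(CharTest matcher, CharSequence sequence, char replacement) {
        StringBuilder out = new StringBuilder(sequence.length());
        boolean inMatchingGroup = false;
        for (int i = 0; i < sequence.length(); i++) {
            char c = sequence.charAt(i);
            if (matcher.matches(c)) {
                // Emit the replacement only for the first char of each matching group.
                if (!inMatchingGroup) {
                    out.append(replacement);
                    inMatchingGroup = true;
                }
            } else {
                out.append(c);
                inMatchingGroup = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharTest eko = c -> "eko".indexOf(c) >= 0;
        System.out.println(collapseFrom(eko, "bookkeeper", '-')); // b-p-r
    }
}
```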
      • trimAndCollapseFrom

        public String trimAndCollapseFrom(CharSequence sequence,
                                          char replacement)
        Collapses groups of matching characters exactly as collapseFrom(java.lang.CharSequence, char) does, except that groups of matching BMP characters at the start or end of the sequence are removed without replacement.
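
        A typical use is normalizing whitespace: leading and trailing runs disappear, and interior runs shrink to one replacement char. The sketch below illustrates that behavior with a hypothetical CharTest interface; it is an assumption-laden illustration, not the library's implementation.

```java
final class TrimAndCollapseSketch {
    // Hypothetical stand-in for a char predicate; not Guava's CharMatcher.
    interface CharTest {
        boolean matches(char c);
    }

    static String trimAndCollapseFrom(CharTest matcher, CharSequence sequence, char replacement) {
        // Step 1: skip matching chars at both ends (removed without replacement).
        int first = 0;
        int last = sequence.length() - 1;
        while (first <= last && matcher.matches(sequence.charAt(first))) {
            first++;
        }
        while (last >= first && matcher.matches(sequence.charAt(last))) {
            last--;
        }
        // Step 2: collapse each interior group of matching chars to one replacement.
        StringBuilder out = new StringBuilder();
        boolean inMatchingGroup = false;
        for (int i = first; i <= last; i++) {
            char c = sequence.charAt(i);
            if (matcher.matches(c)) {
                if (!inMatchingGroup) {
                    out.append(replacement);
                    inMatchingGroup = true;
                }
            } else {
                out.append(c);
                inMatchingGroup = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CharTest space = c -> c == ' ';
        System.out.println(trimAndCollapseFrom(space, "  hello   world  ", '_')); // hello_world
    }
}
```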
      • apply

        public boolean apply(Character character)
        Deprecated.
        Provided only to satisfy the Predicate interface; use matches(char) instead.
        Description copied from interface: Predicate
        Returns the result of applying this predicate to input (Java 8 users, see notes in the class documentation above). This method is generally expected, but not absolutely required, to have the following properties:
        • Its execution does not cause any observable side effects.
        • The computation is consistent with equals; that is, Objects.equal(a, b) implies that predicate.apply(a) == predicate.apply(b).
        Specified by:
        apply in interface Predicate<Character>
      • toString

        public String toString()
        Returns a string representation of this CharMatcher, such as CharMatcher.or(WHITESPACE, JAVA_DIGIT).
        Overrides:
        toString in class Object
        Returns:
        a string representation of the object.