Description
The naming of the matcher implies it would give consistent results with String.isBlank()
. However, it does not. The matcher uses a regex with pattern \\s*
which will only match ASCII whitespace characters. String.isBlank()
eventually delegates to Character.isWhitespace()
which checks for Unicode whitespace as well. See https://stackoverflow.com/a/57998178 for a detailed breakdown, summarized below:
But further, there are different definitions of white-space
Character.isWhitespace:
Determines if the specified character (Unicode code point) is white space according to Java. A character is a Java whitespace character if and only if it satisfies one of the following criteria:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+001F UNIT SEPARATOR.
the \s pattern (by default):\s A whitespace character: [ \t\n\x0B\f\r]
I discovered this when using JUnit Quickcheck with an assumption to prevent handling whitespace strings of the form:
Assume.assumeThat(string, not(blankString()));
The assumption passed, and my code would check the string via String.isBlank()
and fail the test.