Exercise 1

In LEVEL 1, we could rely on Details and simply search for the function that Details mentioned. That is far from a real bug-hunting scenario, so in LEVEL 2 we do the same as in LEVEL 1, but without the help of Details.

CVE-2021-21128

I suggest you don't search for any reports about it, so that you don't get too much information, such as the patch.

Details

In level 2, we do it without the help of Details

For more info click me! But you'd better not do this

https://bugs.chromium.org/p/chromium/issues/detail?id=1138877

Set environment

After you fetch Chromium:

git reset --hard 04fe9cc9bf0b67233b9f7f80b9a914499a431fa4

Do it

Do this exercise by yourself. If you find something wrong in my answer, please correct it.

My answer

The function IsWholeWordMatch looks buggy.

static bool IsWholeWordMatch(const UChar* text,
                             int text_length,
                             MatchResultICU& result) {
  DCHECK_LE((int)(result.start + result.length), text_length);
  UChar32 first_character;
  U16_GET(text, 0, result.start, result.length, first_character);  [1]

  // Chinese and Japanese lack word boundary marks, and there is no clear
  // agreement on what constitutes a word, so treat the position before any CJK
  // character as a word start.
  if (Character::IsCJKIdeographOrSymbol(first_character))
    return true;

  wtf_size_t word_break_search_start = result.start + result.length;
  while (word_break_search_start > result.start) {
    word_break_search_start =
        FindNextWordBackward(text, text_length, word_break_search_start);
  }
  if (word_break_search_start != result.start)
    return false;
  return static_cast<int>(result.start + result.length) ==
         FindWordEndBoundary(text, text_length, word_break_search_start);
}
==========================================================
#define CHECK_LE(val1, val2) CHECK_OP(<=, val1, val2)

[1] U16_GET is called right after DCHECK_LE. That check only asserts that result.start + result.length is less than or equal to text_length. Let's look at U16_GET:

/**
* Get a code point from a string at a random-access offset,
* without changing the offset.
* "Safe" macro, handles unpaired surrogates and checks for string boundaries.
*
* The offset may point to either the lead or trail surrogate unit
* for a supplementary code point, in which case the macro will read
* the adjacent matching surrogate as well.
*
* The length can be negative for a NUL-terminated string.
*
* If the offset points to a single, unpaired surrogate, then
* c is set to that unpaired surrogate.
* Iteration through a string is more efficient with U16_NEXT_UNSAFE or U16_NEXT.
*
* @param s const UChar * string
* @param start starting string offset (usually 0)
* @param i string offset, must be start<=i<length
* @param length string length
* @param c output UChar32 variable
* @see U16_GET_UNSAFE
* @stable ICU 2.4
*/
#define U16_GET(s, start, i, length, c) UPRV_BLOCK_MACRO_BEGIN { \
  (c)=(s)[i]; \
  if(U16_IS_SURROGATE(c)) { \
      uint16_t __c2; \
      if(U16_IS_SURROGATE_LEAD(c)) { \
          if((i)+1!=(length) && U16_IS_TRAIL(__c2=(s)[(i)+1])) { \ [2]
              (c)=U16_GET_SUPPLEMENTARY((c), __c2); \
          } \
      } else { \
          if((i)>(start) && U16_IS_LEAD(__c2=(s)[(i)-1])) { \
              (c)=U16_GET_SUPPLEMENTARY(__c2, (c)); \
          } \
      } \
  } \
} UPRV_BLOCK_MACRO_END

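The fourth argument passed at [1], result.length, is the length of the matched string, not of the text being searched. For example, when finding xy in xyd, it is 2. But [2] puzzles me: the check (i)+1!=(length) treats the length parameter as the end index of the whole string s (xyd), while the caller passes the length of the match (xy). The doc comment "@param length string length" supports this reading, and "@param i" requires start<=i<length. If the offset is not below the length that was passed, for example i = 2 and length = 2, then (i)+1!=(length) holds and __c2=(s)[(i)+1] can read out of bounds. We can check this answer against the Details.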
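Before looking at the Details, the guard at [2] can be examined in isolation. The following is a minimal sketch, not ICU code: GuardAllowsTrailRead and CheckGuardExamples are hypothetical helpers that mirror only the comparison (i)+1!=(length), to show which (i, length) pairs let the read of (s)[(i)+1] through.

```cpp
#include <cassert>

// Hypothetical helper, not part of ICU: mirrors only the comparison
// `(i)+1 != (length)` from the lead-surrogate branch of U16_GET.
// Returns true when the macro would go on to read s[i + 1].
constexpr bool GuardAllowsTrailRead(int i, int length) {
  return i + 1 != length;
}

// Runs the cases discussed in the text.
inline void CheckGuardExamples() {
  // length used as an end index: i is the last unit, so nothing is read.
  assert(!GuardAllowsTrailRead(/*i=*/1, /*length=*/2));
  // length is a match length equal to the offset: the guard passes and
  // s[2] would be read.
  assert(GuardAllowsTrailRead(/*i=*/1, /*length=*/1));
  // i == length, the case from the paragraph above: s[3] would be read.
  assert(GuardAllowsTrailRead(/*i=*/2, /*length=*/2));
}
```

Calling CheckGuardExamples() from a main function passes all three assertions. The guard only stops the read when i + 1 equals length exactly, so it protects nothing once i is at or beyond the value passed as length.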

This patch changes |IsWholeWordMatch()| to use |U16_GET()| with valid parameters to avoid reading out of bounds data.

In case of search "\uDB00" (broken surrogate pair) in "\u0022\uDB00", we call |U16_GET(text, start, index, length, u32)| with start=1, index=1, length=1, where text = "\u0022\uDB00", then |U16_GET()| reads text[2] for surrogate tail.

After this patch, we call |U16_GET()| with length=2==end of match, to make |U16_GET()| not to read text[2].
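The example from the patch description can be replayed without touching real memory. This is a rough model under stated assumptions, not the ICU macro and not the actual patch: TrailIndexRead and ReadsOutOfBounds are hypothetical names, and the model returns the index that the lead-surrogate branch would read instead of dereferencing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the lead-surrogate branch of U16_GET (not ICU code).
// Returns the index of the extra code unit the macro would read, or -1 when
// no extra read happens. `length` is whatever the caller passes as the
// fourth macro argument.
int TrailIndexRead(const std::vector<char16_t>& text, int i, int length) {
  const char16_t c = text.at(static_cast<std::size_t>(i));
  const bool is_lead = (c & 0xFC00) == 0xD800;  // lead surrogate range
  if (is_lead && i + 1 != length)
    return i + 1;
  return -1;
}

// True when the index the model would read lies outside the buffer.
bool ReadsOutOfBounds(const std::vector<char16_t>& text, int i, int length) {
  const int index = TrailIndexRead(text, i, length);
  return index >= 0 && index >= static_cast<int>(text.size());
}
```

With text = {0x0022, 0xDB00} and a match at index 1 of length 1, passing length=1 (the match length) makes the model report a read at index 2, which is past the two-unit buffer. Passing length=2 (start plus match length, the end of the match) makes it report no extra read, which is consistent with the patch description.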
