Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase test coverage, coerce inputs to string and fix encoding inconsistencies #101

Merged
merged 5 commits into from
Sep 16, 2023

Conversation

mudge
Copy link
Owner

@mudge mudge commented Sep 15, 2023

Where possible, coerce inputs to strings with StringValue rather than raising a TypeError. This particularly impacts the relatively recent RE2::Set API which was excessively strict about its arguments.

Add test coverage to all parts of the API, better covering edge cases including how encoding is handled based on the encoding of the RE2 pattern.

@mudge mudge force-pushed the test-coverage branch 2 times, most recently from 279621d to 029e019 Compare September 15, 2023 13:25
@mudge mudge marked this pull request as ready for review September 15, 2023 13:59
Copy link
Owner Author

@mudge mudge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will close #74 and #73.

spec/re2_spec.rb Outdated Show resolved Hide resolved
spec/re2_spec.rb Outdated Show resolved Hide resolved
ext/re2/re2.cc Show resolved Hide resolved
@mudge mudge changed the title Increase test coverage and coerce inputs to string Increase test coverage, coerce inputs to string and fix encoding inconsistencies Sep 15, 2023
Where possible, coerce inputs to strings with StringValue rather than
raising a TypeError. This particularly impacts the relatively recent
RE2::Set API which was excessively strict about its arguments.

Add test coverage to all parts of the API, better covering edge cases
including how encoding is handled based on the encoding of the given
pattern.
When replacing with a string pattern, we will implicitly create an RE2
pattern from the string with the default options meaning it will assume
and produce UTF-8 results.

This could potentially be a breaking change for users who rely on the
string pattern's encoding but the behaviour has been misleading (and
passing anything except ISO-8859-1 or UTF-8 to RE2 is undefined).
As RE2::Regexp#match will only return true or false when given a pattern
with no capturing groups (rather than returning an RE2::MatchData),
document and test this edge case.
To distinguish "re2" the gem and "RE2" the C++ library, try to be
consistent with capitalisation.

Add a dedicated "Encoding" section to the README as well as adding the
same documentation to every method that returns strings from RE2.
@mudge mudge merged commit dbf90f6 into main Sep 16, 2023
@mudge mudge deleted the test-coverage branch September 16, 2023 11:22
ext/re2/re2.cc Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants