Expose more of the matching interface in Ruby #119

Closed
mudge opened this issue Nov 27, 2023 · 2 comments

mudge commented Nov 27, 2023

When I first wrote this gem about a decade(!) ago, I naïvely intended it to be a drop-in replacement for Ruby’s built-in Regexp class. However, RE2 not only has a different syntax from Ruby’s regular expressions but also has its own unique capabilities, which we fail to take advantage of by hiding them behind a restrictive Ruby API.

RE2::Regexp#match already has some poorly documented functionality that is unique to RE2: the ability to specify the exact number of submatches to extract when performing a match, which has a significant effect on performance. This should not only be better explained but also become a core part of the API, along with the other arguments to Match: startpos, endpos (not available in all versions of RE2) and anchor.

This would also create a natural opportunity to introduce the higher-level FullMatch and PartialMatch APIs.

mudge added the feature label Nov 27, 2023
mudge self-assigned this Nov 27, 2023

mudge commented Nov 28, 2023

It's also worth noting that Google's own documentation for RE2 focusses only on FullMatch and PartialMatch and doesn't mention other parts of the API, so we should at least offer Ruby analogues.

mudge added a commit that referenced this issue Nov 30, 2023
GitHub: #119

Add new options to `RE2::Regexp#match` that expose the underlying
capabilities of RE2's Match function:

* anchor: specifying whether a match should be unanchored (the default),
  anchored to the start of the text or anchored to both ends
* startpos: the offset at which to start matching (defaults to the start
  of the text)
* submatches: the number of submatches to extract (defaults to the
  number of capturing groups in the pattern)

We keep compatibility with the previous API by still accepting a number
of submatches as the second argument to match.

With these new options in place, we can now offer a higher-level
`RE2::Regexp#full_match` and `RE2::Regexp#partial_match` API to match
RE2's own. Note we don't actually use the underlying `FullMatchN` or
`PartialMatchN` functions as we need to use `Match`'s behaviour of
returning the overall match first before any extracted submatches.

The plan is to then heavily promote these two methods over the
lower-level `match`.
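
To make the semantics of these options concrete, here is a minimal sketch that emulates them with Ruby's built-in Regexp class rather than the re2 gem. The helpers `sketch_match`, `sketch_full_match` and `sketch_partial_match` are hypothetical names invented for illustration, and the sketch uses character offsets and Ruby regular expression syntax, so it should be read as an approximation of the behaviour described above, not as the gem's implementation.

```ruby
# Emulates the anchor, startpos and submatches options with Ruby's own Regexp.
# All method names here are hypothetical; none of this is the re2 gem's code.
def sketch_match(pattern, text, anchor: :unanchored, startpos: 0, submatches: nil)
  source =
    case anchor
    when :unanchored   then pattern
    when :anchor_start then "\\A(?:#{pattern})"
    when :anchor_both  then "\\A(?:#{pattern})\\z"
    else raise ArgumentError, "unknown anchor: #{anchor.inspect}"
    end

  # Matching begins at startpos, so anchors apply relative to that offset.
  window = text[startpos..]
  match = Regexp.new(source).match(window)

  # Asking for zero submatches only answers "did it match?".
  return !match.nil? if submatches == 0
  return nil if match.nil?

  # The overall match comes first, followed by the requested submatches.
  wanted = submatches || match.captures.size
  [match[0], *match.captures.first(wanted)]
end

# Higher-level helpers layered on top of the lower-level match.
def sketch_full_match(pattern, text, **options)
  sketch_match(pattern, text, **options, anchor: :anchor_both)
end

def sketch_partial_match(pattern, text, **options)
  sketch_match(pattern, text, **options, anchor: :unanchored)
end

sketch_match('(\d+)-(\d+)', "id 12-34 end")
# => ["12-34", "12", "34"]
sketch_match('(\d+)-(\d+)', "id 12-34 end", anchor: :anchor_start)
# => nil
sketch_match('(\d+)-(\d+)', "id 12-34 end", anchor: :anchor_start, startpos: 3)
# => ["12-34", "12", "34"]
sketch_match('(\d+)-(\d+)', "id 12-34 end", submatches: 1)
# => ["12-34", "12"]
sketch_full_match('(\d+)-(\d+)', "12-34")
# => ["12-34", "12", "34"]
```

Building the two higher-level helpers on the lower-level one mirrors the commit's note that the overall match should always be returned first, ahead of any extracted submatches.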
mudge added a commit that referenced this issue Dec 1, 2023
GitHub: #119

Expose RE2::Match()'s endpos argument in Ruby so users can specify an
offset at which to stop matching.

Note that old versions of RE2 don't accept an endpos argument when
matching so we explicitly detect this and raise an exception when
attempting to pass it to a version that doesn't support it.
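
The "detect and raise" behaviour described in this commit can be sketched in plain Ruby as follows. `SketchMatcher`, its `endpos_supported:` flag and the `UnsupportedOption` error are all hypothetical stand-ins: the flag plays the role of whatever capability check the gem performs against the installed RE2, and the matching itself uses Ruby's built-in Regexp with character offsets rather than RE2.

```ruby
# Hypothetical illustration of rejecting endpos when the underlying library
# cannot honour it. None of these names come from the re2 gem.
class SketchMatcher
  class UnsupportedOption < StandardError; end

  def initialize(pattern, endpos_supported:)
    @regexp = Regexp.new(pattern)
    # Stand-in for a capability check against the installed library version.
    @endpos_supported = endpos_supported
  end

  def match(text, startpos: 0, endpos: nil)
    # Fail loudly rather than silently ignoring an option we cannot honour.
    if endpos && !@endpos_supported
      raise UnsupportedOption, "endpos is not supported by this version"
    end

    # Only the text between startpos and endpos is considered.
    window = text[startpos...(endpos || text.length)]
    found = @regexp.match(window)
    found && found[0]
  end
end

modern = SketchMatcher.new('\d+', endpos_supported: true)
modern.match("abc 12345")            # => "12345"
modern.match("abc 12345", endpos: 6) # => "12"

legacy = SketchMatcher.new('\d+', endpos_supported: false)
legacy.match("abc 12345")            # => "12345"
# legacy.match("abc 12345", endpos: 6) would raise SketchMatcher::UnsupportedOption
```

Raising an exception keeps callers from assuming the match stopped at endpos when the option was in fact ignored.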

mudge commented Dec 5, 2023

Version 2.5.0 now exposes the full underlying Match interface upon which the new full_match and partial_match APIs are built.

mudge closed this as completed Dec 5, 2023