Define most of Pathname in Ruby (redo) #57
base: master
* This is just before methods started to be moved from Ruby code to the C extension.
* BTW, in the ruby/pathname repository there was no pathname.rb before that commit.
(cherry picked from commit 16e97a5)
* This means it's only additions in lib/pathname.rb and zero removals.
(cherry picked from commit 3736eab)
(cherry picked from commit 955186c)
* The `<=>` implementation in the extension is much faster, so it is kept.
* The other methods are actually faster in Ruby than in C, because `rb_funcall()` and `rb_ivar_get()` in C code have no inline cache, whereas method calls and `@path` reads have inline caches in Ruby code. https://railsatscale.com/2023-08-29-ruby-outperforms-c/ is an explanation of that (though it was known well before that).
(cherry picked from commit c8c2210)
(cherry picked from commit a15c1f5)
(cherry picked from commit fe027ae)
* Avoids a MatchData allocation.
(cherry picked from commit 643585a)
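As a rough illustration of the "avoids a MatchData allocation" point: the commit's diff is not shown here, so the regexp and the two predicates below are hypothetical. `String#match?` only returns `true`/`false` and, unlike `=~` or `String#match`, neither allocates a `MatchData` nor updates `$~`.

```ruby
# Hypothetical regexp and predicates (not taken from the commit):
# does a path string end with "/"?
TRAILING_SEPARATOR = %r{/\z}

# `=~` allocates a MatchData when it matches and sets $~ in this method's frame.
def ends_with_separator_via_match(path)
  !(path =~ TRAILING_SEPARATOR).nil?
end

# `match?` only returns true/false: no MatchData is allocated and $~ is untouched.
def ends_with_separator?(path)
  path.match?(TRAILING_SEPARATOR)
end

p ends_with_separator_via_match("usr/lib/") # => true
p ends_with_separator?("usr/lib/")          # => true
p ends_with_separator?("usr/lib")           # => false
```

The trade-off is that callers of the `match?` variant cannot inspect the match afterwards, which is fine for a pure predicate.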
(cherry picked from commit 177a86d)
(cherry picked from commit f8e0cae)
(cherry picked from commit aa4d4c6)
(cherry picked from commit c96b559)
Looks mostly fine to me besides a few nitpicks, and I'm very much in favor of migrating things to pure Ruby when it makes sense.
Not sure how this works with Pathname having been made a core class though.
```ruby
begin
  old = Thread.current[:pathname_sub_matchdata]
  Thread.current[:pathname_sub_matchdata] = $~
  eval("$~ = Thread.current[:pathname_sub_matchdata]", block.binding)
ensure
  Thread.current[:pathname_sub_matchdata] = old
end
```
I don't get what this does, but it's super scary. Perhaps it's outdated and no longer necessary?
It sets `$~` in the block; removing it causes this test to fail:

`pathname/test/pathname/test_pathname.rb`, lines 573 to 580 in 593f030:

```ruby
def test_sub_matchdata
  result = Pathname("abc.gif").sub(/\..*/) {
    assert_not_nil($~)
    assert_equal(".gif", $~[0])
    ".png"
  }
  assert_equal("abc.png", result.to_s)
end
```
It's a bit hacky but that's the original code and I don't see a better solution.
I could restore the C version of `#sub` if desired, but I'd need to keep the pure-Ruby version for e.g. JRuby. (I didn't know one can set `$~`, TIL.)
…assed too
* Core methods regularly gain new keyword arguments, so this is more future-proof.
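The commit message above is truncated, so exactly which method it changes is not visible here. As a hedged sketch of the general idea, assuming it is about forwarding all arguments to the underlying core method: Ruby's `...` parameter forwards positional arguments, keyword arguments and the block, so a wrapper keeps working when the core method later gains a new keyword. `NumericText` and `to_integer` are hypothetical names; `Kernel#Integer` and its `exception:` keyword are real. Leading arguments before `...` need Ruby 3.0 or later.

```ruby
# Hypothetical wrapper in the style of Pathname: holds a string in an ivar
# and delegates to a core method.
class NumericText
  def initialize(text)
    @text = text
  end

  # `...` forwards positional args, keyword args and any block unchanged,
  # so keywords added to Kernel#Integer later need no change here.
  def to_integer(...)
    Integer(@text, ...)
  end
end

p NumericText.new("ff").to_integer(16)                 # => 255
p NumericText.new("oops").to_integer(exception: false) # => nil
```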
Branch updated from 37ebd64 to 834cc54.
Co-authored-by: Jean Boussier <[email protected]>
@hsbt Let's discuss your concerns and suggestions here.
Can you make a concrete suggestion of what you mean by small PRs for this change? I could make a PR with fewer commits, but every commit up to "Handle Windows NTFS edge case in Pathname#sub_ext" is strictly necessary; otherwise CI doesn't pass. If you are asking for a smaller diff in general, I think that is not feasible: e.g. making a PR per method would take months of work and still give the exact same end result. The approach here, as detailed in the first commit message ("Restore lib/pathname.rb from ext/pathname/lib/pathname.rb at ed9270a"), is to use the Ruby code of pathname.rb from before the translation to C. There is no meaningful way to break that into smaller changes. And that code has already been reviewed: it was exactly the code in Pathname before the translation to C started. Please take the time to read the commit messages; they should make it very clear what I did and what needs deeper review (e.g. code imported as-is from the gem doesn't).
Same as #53, but that was reverted in 593f030.
It should be reviewed commit-by-commit; that makes it much clearer which parts of the code are new and which come from the original pathname.rb before the translation to C began.
Please review it.
If there is no review within a week I'll assume everyone agrees with the PR in the current state.
I cherry-picked the commits to make it easier to review.
Description from the original PR, reordered to have the most important first:
Once upon a time, Pathname was pure-Ruby: https://github.com/ruby/ruby/blob/95bc02237635d3fe42532bfe53038257575cee75/lib/pathname.rb
This PR goes back to that, but keeps the C extension implementation of `<=>`, as that one is significantly faster. The other Pathname methods are actually faster in Ruby than in C, because all these methods just do `rb_funcall()` and `rb_ivar_get()`, which have no inline cache in C code, whereas the corresponding method calls and `@path` reads have inline caches in Ruby code. https://railsatscale.com/2023-08-29-ruby-outperforms-c/ is an explanation of that (though it was known well before that).
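To make the argument concrete: in pure Ruby, most of these methods are thin wrappers that read `@path`, call a core method and wrap the result, and both the ivar read and the call site are inline-cached by the VM. The class below is a hypothetical, heavily reduced stand-in (`MiniPathname` is not the real class, and its methods only approximate Pathname's behavior); it shows the shape of such methods, not the PR's actual code.

```ruby
# Hypothetical, reduced stand-in for Pathname, only to show the shape of
# pure-Ruby methods: read @path, call a core method, wrap the result.
# Each `@path` read and each call site here gets an inline cache in the VM.
class MiniPathname
  def initialize(path)
    @path = path.to_s.dup
  end

  def to_s
    @path.dup
  end

  def basename(*args)
    self.class.new(File.basename(@path, *args))
  end

  def dirname
    self.class.new(File.dirname(@path))
  end

  def extname
    File.extname(@path)
  end
end

path = MiniPathname.new("/usr/lib/libruby.so")
p path.basename(".so").to_s # => "libruby"
p path.dirname.to_s         # => "/usr/lib"
p path.extname              # => ".so"
```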
I have discussed this with @akr several times (notably in https://bugs.ruby-lang.org/issues/17473) and the last time he said it was OK to do this change.
The main goals are:
I worked to make the diff really clean: it only adds lines in `lib/pathname.rb` and only removes lines in `ext/pathname/pathname.c`. That way it should be easy to review. I restored the Ruby implementation of the methods from ed9270a, the commit just before methods started being migrated to the C extension.
I then fixed things to make the test suite pass and implemented the few missing methods based on their C definitions.
The individual commits and their messages make it clear exactly what happened, so I would recommend reviewing commit-by-commit.
From my discussions with @akr, IIRC, the original motivation to rewrite pathname.rb in C, besides the optimization for `<=>`, was apparently to use `*at` functions like `openat` (see `man openat`, "Rationale for openat() and other directory file descriptor APIs"), but these are not portable, so it did not happen, and they are only useful in very rare edge cases. The Ruby `Dir` class could potentially support some of that, but it seems it has never been important enough for someone to implement it. The API of Pathname would also need to change to take advantage of a working directory different from the process CWD, e.g. Pathname methods would need to take an extra "Pathname to use as working directory" argument (because if one just uses `Pathname("relative/path").open(...)`, there is no point in using `*at()` functions).

It's significantly faster with this PR (first line is this branch, second line is `master`):