Add fold_mut alternative to Iterator trait #76746
Closed
---

I think the default should be implemented in terms of `fold`, so it automatically takes advantage of all the iterators that already specialize this path. Even better if that isolates the closure generics like #62429, so it only depends on `Self::Item` rather than all of `Self` (plus `B` and `F`, of course).
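A minimal sketch of that suggestion, written as a free function rather than the PR's actual trait method: a `fold_mut` built directly on `fold`, so it rides along with any iterator's specialized `fold`.

```rust
/// Hypothetical sketch: `fold_mut` defaulted in terms of `fold`, so it
/// automatically uses iterator-specific `fold` specializations.
fn fold_mut<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(&mut B, I::Item),
{
    iter.fold(init, move |mut acc, item| {
        f(&mut acc, item); // mutate the accumulator in place...
        acc                // ...then hand it back to `fold`
    })
}

fn main() {
    let sum = fold_mut(1..=4, 0, |acc, x| *acc += x);
    assert_eq!(sum, 10);
}
```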
---

Try with inputs like `Chain` or `FlatMap` to see the performance difference of custom `fold`.
---

I like this idea! I tried it here and the results seem to indicate that this brought back the performance "issue" (assuming the benchmark is built right, Criterion is doing its thing correctly, and I'm interpreting the results correctly):

Should I push forward with that? I was originally making `fold_mut` for the performance, but we could instead make it about the shape of the closure and just wait on improvements to `fold` if we think that's valuable.

Good call, I'll whip up a few of these when I get the chance!
---

Eureka! I added some benchmarks (with terrible names and with terrible duplication that should be replaced with a macro) in 2921101, and it appears that `fold_mut` is much slower when using `chain` and `flat_map`!

Unfortunately they're a little bit out of order - I will definitely pick better names if we decide to go through with this PR - but basically:

- For `chain` and `flat_map`, `fold_mut` is roughly 50%-100% slower.
- For a non-chain-non-flat-map fold on a hashmap, they're about the same (except `fold_mut` has a huge variance, so I'm not sure about that one).
- For a non-chain-non-flat-map fold on a number, `fold_mut` is slower now (interesting that this has switched; I guess that means they're basically "the same within precision"?).
- For a non-chain-non-flat-map fold on a `Vec`, `fold_mut` is faster by some suspiciously large margin.

I think maybe I'll port these benchmarks back to my other repo so I can use Criterion for (maybe) more consistent data? Or should we just pull the plug on this? Or redirect the effort towards the ergonomics and not worry about performance?
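For context, a `while let` shape like the one being benchmarked might look like this (a sketch assuming the PR's signature, not its exact code); every item goes through `next()`, which is why `Chain`/`FlatMap`'s specialized `fold` never kicks in.

```rust
/// Sketch of a `while let`-based `fold_mut` (assumed shape). Each
/// item is pulled via `next()`, bypassing specialized `fold` paths.
fn fold_mut_while<I, B, F>(mut iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(&mut B, I::Item),
{
    let mut acc = init;
    while let Some(item) = iter.next() {
        f(&mut acc, item);
    }
    acc
}

fn main() {
    let sum = fold_mut_while((0..10).chain(10..20), 0, |acc, x| *acc += x);
    assert_eq!(sum, 190);
}
```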
---

That's expected because of the `while let` (essentially a `for` loop), as cuviper said/implied. If you just replace the (conceptual) for loop with a `.for_each(`, I suspect you'd recover the performance:

(insert comment about #62429 here and how that shouldn't be the implementation in `core`, but it'd be fine for benchmarking)
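A sketch of that `for_each`-based shape (hypothetical free function, names assumed): since the default `for_each` is itself a `fold`, iterators with a specialized `fold` keep their fast path.

```rust
/// Sketch: drive `fold_mut` with `for_each` instead of `while let`,
/// so specialized `fold`/`for_each` implementations are preserved.
fn fold_mut_each<I, B, F>(iter: I, init: B, mut f: F) -> B
where
    I: Iterator,
    F: FnMut(&mut B, I::Item),
{
    let mut acc = init;
    iter.for_each(|item| f(&mut acc, item));
    acc
}

fn main() {
    // Sum of the even numbers over a chained range, mirroring the
    // benchmarks discussed in this thread.
    let sum = fold_mut_each((0i64..100).chain(0..100), 0, |acc, x| {
        if x % 2 == 0 {
            *acc += x;
        }
    });
    assert_eq!(sum, 4900);
}
```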
---

Hmm, that went a little over my head - let me see if I understand:

- Rust specializes some iterator methods (`fold` among them) for certain iterators. This reduced the number of types the iterator closure was generic over.
- With a `fold_mut` that doesn't use `fold`, `fold_mut` misses out on the specific `fold` implementations for those iterators.
- With `for_each`, `fold_mut` may recover the performance because, I assume, there are similarly specialized `for_each` implementations for several iterators.

Interested to know if any of those are in the right ballpark!

And for expedience (maybe), assuming those points are in the correct ballpark, may I ask: I was originally after a `fold` that was "closer" to a "zero cost abstraction". It's seeming more and more like that isn't super possible (except with maybe the `for_each` construction above?). Should I bail on the "performance" of `fold_mut` and double down on the ergonomics of the closure by defining `fold_mut` in terms of `fold`, and then `fold_mut` will maybe become more of a zero cost abstraction as `fold` is specialized on other iterators like `Vec::IntoIter` or something?
---

The concerns of #62429 are mostly separate from the concerns of `fold` specialization. The overlap is just that using `fold` or `for_each` often introduces a generic closure.

The point of #62429 is that closures inherit all of the generic type parameters of the surrounding scope. In your example that would be `B`, `F`, and `Self` -- but the closure doesn't really need the whole iterator type, just `Self::Item`. So you can end up generating the same closure for `<B, F, vec::IntoIter<i32>>`, `<B, F, Cloned<hash_set::Iter<'_, i32>>>`, `<B, F, Map<Chain<Range<i32>, Once<i32>>, map-closure{...}>>`, etc., when our closure could be just `<B, F, Item = i32>`. This is a compile-time concern for excessive monomorphizations, and it can also exceed type-length limits. There has been some compiler work to trim unused parameters, but it still doesn't know how to reduce that `Self` to `Self::Item`. So the trick in #62429 is to use a "closure factory" function with tighter control over the generic parameters.

The `fold` specialization is more about runtime performance, which is why it changes your benchmarks. For example, `Chain::next()` has to check its state every time it is called to decide whether to pull an item from its first or second iterator, whereas `Chain::fold()` can just fold all of the first iterator and then all of the second. The default `for_each` is just a `fold` with an empty `()` accumulator, which is why it also benefits here.
---

Ahh, thank you for that explanation!

Awesome, this makes a ton of sense.

Woah, did not see that coming! So I originally read that thinking, "Great, I'll rewrite `fold_mut` using `for_each` as mentioned above and I'll get the benefits of `fold` improvements, plus maybe Rust can tell in `for_each` that moving and reassigning `()` is a no-op and we'll keep the performance improvements on 'simple' iterators".

I'm running out of benchmark steam (is there a better way to do this than running `./x.py bench -i library/core/ --test-args mark_bench_fold`? - it takes about a half hour on my machine), but I gave this a shot. I defined three methods:

- `fold_mut`, which uses `while let`
- `fold_mut_fold`, which uses `fold` under the hood
- `fold_mut_each`, which uses `for_each` under the hood

And ran 8 benchmarks. 4 were operating on `(0..100000).map(black_box)` (named `_simple`) and 4 were operating on `(0i64..1000000).chain(0..1000000).map(black_box)` (named `_chain`). Each test was (hopefully) calculating the sum of all the even numbers:

So I think, based on all these shifty benchmarks moving around so much... maybe they're all within statistical uncertainty (combined with whatever my computer is doing at any given time). This was super fun to play around with, but I think I'm going to bow out of the "maybe `fold_mut` will be faster!" argument :-D
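The benchmark body described above can be sketched as follows (sizes shrunk, and written with a plain `fold` for self-containment; the thread's versions used the three `fold_mut` variants):

```rust
use std::hint::black_box;

/// Sketch of the benchmark computation: sum the even numbers of a
/// chained, `black_box`-mapped range. `black_box` keeps the optimizer
/// from constant-folding the whole pipeline away.
fn even_sum_chain(n: i64) -> i64 {
    (0i64..n)
        .chain(0..n)
        .map(black_box)
        .fold(0, |acc, x| if x % 2 == 0 { acc + x } else { acc })
}

fn main() {
    // Even numbers in 0..1000 sum to 249500; the range is chained
    // with itself, so the total doubles.
    assert_eq!(even_sum_chain(1000), 499_000);
}
```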
---

It's a zero-sized type (ZST), so there's literally nothing to do in codegen.

I'm really surprised that it did poorly in your benchmark, but my guess is that there was some unlucky inlining (or lack thereof), especially if parts of that benchmark got split across codegen-units.
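That "nothing to do" claim is easy to check: `()` really does occupy zero bytes.

```rust
/// `()` is a zero-sized type: an accumulator of type `()` takes no
/// memory, so moving or reassigning it compiles away entirely.
fn main() {
    assert_eq!(std::mem::size_of::<()>(), 0);
}
```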