-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Scan is overly specialized #68371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
We cannot change the existing definition of |
Isn't |
@KrishnaSannasi
This is exactly the part of the existing You can certainly implement the standard In my case I'm laying out lines of text with word wrapping so I need to keep track of the y-offset of the end of the previous line and yield the character offset of where to split each line. The wikipedia article I linked lists other applications. |
Building on the naming in the wikipedia article, maybe we could add |
If we look at the different map and fold methods that exist, we can infer how scan methods should behave and be named. Mapping methodsRegular mapping : fn map<B, F>(self, f: F) -> Map<Self, F>
where
F: FnMut(Self::Item) -> B Filter mapping (F returns an fn filter_map<B, F>(self, f: F) -> FilterMap<Self, F>
where
F: FnMut(Self::Item) -> Option<B> While Mapping (F returns an fn map_while<B, P>(self, predicate: P) -> MapWhile<Self, P>
where
P: FnMut(Self::Item) -> Option<B> Folding methodsSeeded folding: fn fold<B, F>(self, init: B, f: F) -> B
where
F: FnMut(B, Self::Item) -> B First-seeded folding: fn fold_first<F>(self, f: F) -> Option<Self::Item>
where
F: FnMut(Self::Item, Self::Item) -> Self::Item A scan is literally meant to map each step of a folding operation. As it's somehow a combination of both operations, the concept of "scanning" might be any combination of their variants. If we stick to consistency in the naming, the possible scanning methods would be :
(note that I'm not 100% certain that the We can see from this table that the current behavior of scan is the behavior we could expect of It is indeed very hard to change the current definition of scan as it would completely break existing code. Changing the naming convention could be the way to get around the problem (and indeed depreciating fold_seed() // currently: fold()
fold_first() // currently: fold_first(), no change here
scan_seed() // no method in rust
scan_first() // no method in rust
filter_scan_seed() // no method in rust
filter_scan_first() // no method in rust
scan_seed_while() // currently: scan()
scan_first_while() // no method in rust Obviously, not all of them are not needed. Some of them can be easily expressed using other ones (see the Wikipedia article above for details). Some of them have very few practical use and might not need to be implemented in Rust. Though, I strongly agree with the original concern of the issue : having as the only available implementation of scan the one we currently have is quite terrible : more often than not, the "end scanning when F returns PS : I'm not sure how "seeded" and "first-seeded" relate to "inclusive" and "exclusive" scanning/folding. It seems to me that they're different concepts, and the Wikipedia page seems to disagree with me but in a very unclear way, so I chose not to confuse the two. They might be the same thing, IDK |
@Mayt38 great points! |
I was trying to implement a function which accepts an iterator and reduces adjacent repeating elements into only one element. Say we have |
Besides the fact that the current The proposed let mut state = init;
iter.map(move |x| f(state, x)); This is at least as simple and pretty as Let's consider Haskell's First try: let mut state = init;
iter.map(move |x| {
state = f(state, x);
state
})) ...but this does only work for copy types and it differs from Haskell's My next try, now passing the state to let mut state = init;
iter.map(move |x| {
let new_state = f(&state, x);
std::mem::replace(&mut state, new_state)
}) ...but now the last item is missing and we cannot simply get it out of the closure's state. Unless anyone else has a better idea, you would have to create your own iterator struct to emulate the behaviour of Haskell. Haskell is a well-thought-out language and I think that |
@Johannesd3 I didn't even think about how odd it is to have the state being mutable. Good point. |
I disagree with this, iterators are about being declarative instead of imperative, mutability is an orthogonal concern. Rust doesn't shy away from mutability like in more functional languages, because sometimes mutability is required for performance or making the algorithm easier to understand. Limiting mutability artificially makes apis harder to use in Rust. For example: All that said, I also agree that the current implementation of
scanl in Rust-- from prelude
scanl :: (b -> a -> b) -> b -> [a] -> [b] is equivalent to mut_scanl :: (a -> State b ()) -> b -> [a] -> [b]
mut_scanl mut_f b as = scanl f b as
where f b a = next_b
where ((), next_b) = runState (mut_f a) b is equivalent to the following Rust fn scanl<B, F, I>(f: F, mut b: B, iter: I) -> impl Iterator<Item = B>
where
I: Iterator,
// in Haskell everything is `Clone`, so this is equivalent
B: Clone,
// 1. may as well provide `FnMut` because we can
// 2. `&mut B` is equivalent to `State B ()` in Haskell
F: FnMut(I::Item, &mut B),
{
iter.map(move |a| {
f(a, &mut b);
b.clone()
})
} We could take it a step further: fn scanl<B, F, I>(f: F, mut b: B, iter: I) -> impl Iterator<Item = C>
where
I: Iterator,
F: FnMut(I::Item, &mut B) -> C,
{
iter.map(move |a| f(a, &mut b))
} But what does this offer over I can see a use case for the first implementation of We transform this back to Haskell too mut_scanl :: (a -> State b c) -> b -> [a] -> [c]
mut_scanl mut_f b = join . map pick_c . scanl f (b, Nothing)
where f (b, _) x = (next_b, Just next_c)
where (next_c, next_b) = runState (mut_f x) b
pick_c (_, c) = maybeToList c (my Haskell isn't great, I'm just trying so the actual implementations aren't the focus, just the signatures) |
I wasn't claiming that One example: Creating an iterator that yields factorials using Using iter::once(one()).chain((1..).scan(one(), |state: &mut BigUint, x: usize| {
*state *= x;
Some(state.clone())
})) Using let mut fac = one();
(0..).map(move |n: usize| {
if n > 0 {
fac *= n;
}
fac.clone()
}) And using a Haskell-like myscan(1.., one(), |state: &BigUint, x: usize| x * state) Link to playground, containg the implementation of But the last one has two disadvantages: 1. The next factorial is calculated one step too early. 2. It is noticeably slower, but I am not sure why. So at least there seems to be a reason why the way with the mutable reference was chosen. (However, I am curious what exactly makes it so much slower, even though it works without EDIT: Wrong link |
@RustyYato the more general scan is useful in rust when you want to chain iterator functions like (actual example from my code):
In this example using
I'm not against mutability, but I do think that separating mutable operations from non-mutable operations improves readability because expectations are more clear. If I wanted to both transform each element and update a collection I would just use a Whether this is done conventionally or enforced by the standard library definitions is, however, a different question. Maybe there is a case to be made that people ought to be allowed to mutate at will for performance or other reasons. |
@Johannesd3 This is because the former can reuse the allocation, while the latter cannot. This means that
If every return of the codefont.layout(todo, *scale, point)
.enumerate()
.map({ let row_offset = 0.0; |(index, item)| {
let item_offset = item
.pixel_bounding_box()
.map_or_else(|| item.position().x, |x| x.max.x as f32);
if index == 0 {
Some(0)
} else if item_offset - row_offset <= (WIDTH as f32) {
None
} else {
row_offset += item_offset;
Some(index)
}
}})
.flatten()
.chain(once(todo.len()))
.collect::<Vec<usize>>()
.windows(2)
.map(|indices| &todo[indices[0]..indices[1]])
.collect::<Vec<&'a str>>() I use this pattern when spawning threads, which is another place where you want some stuff stuffed into the closure, but not the containing scope. There are a number of ways to format this, I chose this way because it most closely mirrors
Sure, retain itself is a mutable operation, but most of the time you don't need to mutate the elements of the I would argue that scan is a mutable operation, you're providing a state that is packaged with the iterator that will be mutated on each iteration. I can see |
@RustyYato Instead of using |
There was recently an internals thread on exactly this topic here. There it was also shown that I don't know how you can change |
I wasn't aware of that topic, but it is quite related. I'm not sure whether to continue the disscussion here or there. However, one interesting point from the discussion: When I talk about "replacing" |
I would just continue here
Another option is to add a combinator that removes the
Ahh, that makes more sense. |
There was some suggestions about AFAIK Let's look at let a = [1, 2, 3];
let mut iter = a.iter().scan(1, |state, &x| {
// each iteration, we'll multiply the state by the element
*state = *state * x;
// then, we'll yield the negation of the state
Some(-*state)
});
assert_eq!(iter.next(), Some(-1));
assert_eq!(iter.next(), Some(-2));
assert_eq!(iter.next(), Some(-6));
assert_eq!(iter.next(), None); Observations:
I think:
So, I suggest these new methods to replace Example: let a = [1, 2, 3];
assert_eq!(a.iter().state_before(None, |state, &item| state * item).eq(&[(1, 1), (2, 2), (6, 3)]));
assert_eq!(a.iter().state_before(Some(4), |state, &item| state * item).eq(&[(4, 1), (8, 2), (24, 3)]));
assert_eq!(a.iter().state_after(None, |state, &item| state * item).eq(&[(1, 1), (1, 2), (2, 3)]));
assert_eq!(a.iter().state_after(Some(4), |state, &item| state * item).eq(&[(4, 1), (4, 2), (8, 3)]));
What do you think? I used let max1 = r.iter().scan_after(0, |acc, x| acc + *x).map(|(x, _)| x).max().unwrap(); Mostly it's useful for something like creating or searching in prefix sum/max arrays. |
@optozorax IMHO using integers in your example is also a bit specialized. The abstraction provided by What's annoying to me is that since |
Main problem with current scan or map with external state is that the mutation of state usually returns |
I just want an auxiliary function with a signature as simple as |
Prefix sums, or more generally Haskell's |
Here is the existing type signature for std::iter::scan:
fn scan<St, B, F>(self, initial_state: St, f: F) -> Scan<Self, St, F> where F: FnMut(&mut St, Self::Item) -> Option<B>,
If I were to use a verbose name for
scan
as implemented now, I would call itmap_with_state_until
, this means that if I just wantmap_with_state
, things can get really awkward using the existing scan if I want to return an option which doesn't terminate the Iterator forcollect
etc. e.g.vecter.iter().scan( |state, x| if x > 1 then Some(None) else Some(Some(x)) ).flatten()
If we instead have
F
just turnB
instead ofOption<B>
like:fn scan<St, B, F>(self, initial_state: St, f: F) -> Scan<Self, St, F> where F: FnMut(&mut St, Self::Item) -> B,
then that becomes a lot more simple:vecter.iter().scan( |state, x| if x > 1 then None else Some(x) )
and achieving the existing
scan
behavior is trivially achieved by simply addingfuse()
.This allows
scan
to have behavior more in-line with similar functions likemap
andfold
:and also brings it more in-line with how other languages define
scan
:(a -> b -> a) -> a -> [b] -> [a]
(Haskell),('State -> 'T -> 'State) -> 'State -> seq<'T> -> seq<'State>
(F#) etc.I think this also gives a clear way forward to addressing other issues which previously ended up unresolved like:
#14425
So I think there ought to be
scan
as defined above, andscan_until
which is the existing implementation.References:
http://zvon.org/other/haskell/Outputprelude/scanl_f.html
https://msdn.microsoft.com/en-us/visualfsharpdocs/conceptual/seq.scan%5b't,'state%5d-function-%5bfsharp%5d
The text was updated successfully, but these errors were encountered: