Skip to content

Added Range iterator #447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 13 commits into from
Closed

Conversation

TrolledWoods
Copy link
Contributor

I thought an iterator that iterates over a range of values would be pretty neat to have. Use if you want to!

Copy link
Member

@phimuemue phimuemue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi there. Surprising to see that there is apparently no existing functionality like this. Could polish it a bit and remove the (probably accidental) formatting changes?

@me22
Copy link

me22 commented Jun 5, 2020

One potential reason it doesn't exist is that it can be accomplished with skip-take: If I'm reading things correctly here, .range(a..b) is equivalent to .skip(a).take(b.saturating_sub(a)).

It may still be worth keeping the range method, but consider whether the new iterator struct is actually necessary -- reusing Skip and Take would mean things are automatically double-ended and such where possible. (And I'm not sure whether supporting .range(1..) and .range(..10) are actually valuable, since they're equivalent to .skip(1) and .take(10).)

@jswrenn
Copy link
Member

jswrenn commented Jun 5, 2020

This is very similar to Vec::drain. Should it use the same name?

@TrolledWoods
Copy link
Contributor Author

@jswrenn I don't think it should be called drain, because it consumes the whole iterator, not just a part of a vector, like drain does. But yeah, the formatting changes were accidental, and reusing skip and take may be a better idea. The problem with those though is that they wouldn't work too well with unlimited iterators. range(5..) only corresponds to the call skip(5) and doesn't need a take, but that changes the type of the iterator. This could potentially be fixed by using an enum, though.

@scottmcm
Copy link
Contributor

scottmcm commented Jun 5, 2020

I feel pretty strongly that it shouldn't be drain because the important part to me about drain is that it modifies the receiver even if the return value is not used. To me this is far more like Vec::get.

@TrolledWoods One could also use an associated type for the return type so that .range(5..) returns Skip<_> and .range(..5) returns Take<_> and .range(5..10) return Skip<Take<_>>.

@TrolledWoods
Copy link
Contributor Author

That's a great idea @scottmcm

@TrolledWoods
Copy link
Contributor Author

Yeah, this is definitely better, now we can have range(..) be the same as a noop, so we keep ALL the functionality of the underlying iterator!

@TrolledWoods
Copy link
Contributor Author

I was thinking maybe the range standalone function should use IntoIterator instead of Iterator, since it seems like most other standalone functions do that in itertools

@TrolledWoods
Copy link
Contributor Author

And do we really want the access to the range function to be via itertools::range::range? I wanted to avoid the name clash that pub use range::range; could cause, but maybe I should rename the range.rs file to something like range_traits.rs

src/range.rs Outdated
/// Limits an iterator to a range. See [`Itertools::range`]
/// for more information.
pub fn range<I, R>(iter: I, range: R)
-> R::IterTo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of exploiting existing iterator implementations. Three questions:

  • Should we introduce a type alias RangeIterator<R, I> so that our public interface does not expose that the return type is realized as an associated type?

  • I guess at some point users want to write:

    fn doit(r: impl std::ops::RangeBounds) {
     give_me_usize_iterator().range(r); // somewhere within the function
    }
    

    This - as far as I can see - does not work because range expects an IntoRangeIter, so the user probably goes ahead and tries:

    fn doit(r: impl IntoRangeIter) {
     give_me_usize_iterator().range(r); // somewhere within the function
    }
    

    But if I am not mistaken this will not work either, because IntoRangeIter requires to specify an iterator upfront. Is it possible to implement IntoRangeIter in a way that does not require the user to specify the iterator?

  • Should we try to inform the type system that each IntoRangeIter also satisfies std::ops::RangeBounds so that users can write

    fn doit(r: impl IntoRangeIter<(possibly iter type)> {
      give_me_iter().range(r.clone());
      vec![1,2,3,4,5].drain(r);
    }
    

    without having to specify the (imho opinion) implied RangeBounds trait bound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that the TraitBounds -> IntoRangeIter conversion can work, because IntoRangeIter type is implement specifically for each TraitBounds type, so implementing it for all TraitBounds types would be a conflict. I'm not sure if rust team is working on prioritized implementations, but if it's possible to somehow say "use these implementations, or this other even more generic one if the specific ones are not possible".

IntoIterRange implementing RangeBounds is definitely something we should add.

And for some reason I had the idea that trait methods couldn't have generics, but that doesn't seem to be the case, so we shouldn't need to pass around the iterator information in the trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'll see if I can figure out that type alias thing. I am not too used to working with some of these things, I have stuck in my little rust box and not gone into other places too much, but it's great that I am now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that maybe we should rename "IntoRangeIter" to "IterRangeBounds", because I feel like that makes more sense. The bounds aren't the thing that turn into an iterator after all, they just are a RangeBounds that work on an iterator

@TrolledWoods
Copy link
Contributor Author

I'm going to be honest though, it feels like the associated type version has a few more drawbacks than the previous approach. It adds a new trait, it changes its return type based on the input, it cannot take a generic RangeBounds type, and(giving it some more thought I realized this) you have to pass around the iterator with the range as a type parameter until generic trait types are stable. The current approach is sure a lot easier to implement and less error prone, but I'm not so sure anymore it's the right approach

@jswrenn
Copy link
Member

jswrenn commented Jun 6, 2020

The trait approach is very close to how the core library implements slice indexing: it defines a trait, SliceIndex, and then implements ops::Index for all indices implementing that trait. In practice, the burden of this extra trait is small; I think most end-users are completely unaware that SliceIndex exists!

Likewise, I don't think implementing iterator indexing with an “IteratorIndex” trait would be terribly unergonomic. The symmetry with SliceIndex is a point in its favor, too.

Here's what an implementation mirroring SliceIndex as closely as possible looks like: https://play.rust-lang.org/?edition=2018&gist=2d908fb28c16fb69070ea857c5aa1143

I'm curious if this is a candidate for standard library inclusion.

@TrolledWoods
Copy link
Contributor Author

Then, making it a trait seems to be very nicely consistant with the standard library! I have really terrible internet at the moment, so it's kindof hard for me to push changes at the moment.

@TrolledWoods
Copy link
Contributor Author

One thing though, since range is so closely mapped by slice.get(), maybe it should just be called get, and support usize indexing as well? So .range(4) (or .get(4)), is just the same as .nth(4). This range thing is for convenience to begin with, so I don't think that this addition would be strange

@jswrenn
Copy link
Member

jswrenn commented Jun 8, 2020

Agree completely. Both of those changes are already incorporated into the playground URL I linked. :-)

@TrolledWoods
Copy link
Contributor Author

Welp, I don't know about the naming, but I think it's starting to come together quite nicely now

@jswrenn
Copy link
Member

jswrenn commented Jun 8, 2020

Let's stick to mirroring SliceIndex as closely as possible:

  • IterIndexIteratorIndex
  • IterIndex::getIteratorIndex::index

Could you also just make sure that indentation consistently uses four spaces? Right now it's a mix of tabs and spaces.

@TrolledWoods
Copy link
Contributor Author

Oh sorry, I prefer tabs for my personal projects, so my editor uses them. I'll rename the things and fix the tabs

@TrolledWoods
Copy link
Contributor Author

Should I rename the module to iterator_index too, or just keep it as iter_index?

@jswrenn
Copy link
Member

jswrenn commented Jun 8, 2020

iter_index is fine, as it's just an internal implementation detail and not publically exposed.

@TrolledWoods
Copy link
Contributor Author

Are there any more tweaks that I should do?

@TrolledWoods
Copy link
Contributor Author

I could limit the IteratorIndex::Output to be an IntoIterator, which could allow people to write some nice generic code with the get if they wanted to. Another option would be to limit the Output to Iterator, but then the .get(5) case would have to return the Option iterator, which I think could be kinda clunky for users

@jswrenn
Copy link
Member

jswrenn commented Jun 8, 2020

Limit IteratorIndex to Iterator (that way it'll play nice with method chaining).

The usize indexing case should be this:

impl<I> IteratorIndex<I> for usize
where
    I: Iterator,
{
    type Output = <Option<I::Item> as IntoIterator>::IntoIter;

    fn index(self, mut iter: I) -> Self::Output {
        iter.nth(self).into_iter()
    }
}

@TrolledWoods
Copy link
Contributor Author

Done!

@jswrenn
Copy link
Member

jswrenn commented Jun 18, 2020

I'm excited to merge this, but I want to first be sure we won't eventually be in conflict with the core library. I've opened an issue on @rust-lang/rust asking whether this PR would be a good core library addition: rust-lang/rust#73482

@cuviper
Copy link
Contributor

cuviper commented Jun 18, 2020

Did you consider using dropping instead of skip? That changes the laziness, but you did use nth in IteratorIndex for usize, which has the same immediate effect.

edit: I guess you'd need to fuse though, in case dropping exhausts the iterator.

@TrolledWoods
Copy link
Contributor Author

That's a good point. nth isn't consistant with the rest of the things. I think however the better approach is to make IteratorIndex for usize lazy instead of making everything else pre-evaluated. I think having it be lazy makes more sense. Not sure what you think, though

@cuviper
Copy link
Contributor

cuviper commented Jun 20, 2020

Another consideration -- slice get(range) returns new slices, but get(usize) just returns the element, whereas you have IteratorIndex returning new iterators in all cases. But if iterator get(usize) were to return the element directly, that's literally just nth.

It's probably more consistent for iterators to be lazy all around.

@jswrenn
Copy link
Member

jswrenn commented Jun 26, 2020

I'm of two minds.

Approach 1: It's just a shorthand

We could have a get method that's just a shorthand for nth, take and skip. In this possibility, we have get(usize) produce an Option<Item>. Ergonomics are maybe slightly diminished, but the explanation of get is simpler: get is just a shorthand for nth, take and skip.

If an end-user wants to lazily get the nth element, they still can: they just index on n..=n.

I like the ease of explanation, ease of implementation, and flexibility of this approach.

The downside of this approach is get(usize) consumes the underlying iterator, but the other index types do not. This is weird for two reasons:

  1. First, when using Iterator::get in a context that's generic over the indexing type, it's weird that certain instantiations of that type would mutate the consume iterator, but others would not. I don't anticipate IteratorIndex to be used by end-users often, so I'm not too worried about this scenario.
  2. Second, we're clearly drawing an analogy to Slice::get, but this lazy/eager discrepancy weakens the analogy: Slice::get never consumes the underlying slice.

Approach 2: Consistency above all

If we keep the Iterator bound on Output, then we probably ought to ensure that get(usize) lazily consumes the underlying iterator. However, there's no lazy nth in the standard library, so we'd either need to use a combination of skip and take for this case, or introduce a new lazy_nth construct.

I like the semantic consistency of this approach. However, it's not as easy to implement, and (IMO) it's not as easy to explain. It's a lot of work just for a lazy_nth. But is there any demand for a lazy_nth construct? I struggle to think of a case where I would use it, and a cursory glance at the issues here and on rust-lang/rust doesn't turn up any requests for it.


In the absence of demonstrated demand for a lazy_nth construct, I have a slight preference towards the first approach, but I'm not really convinced either way.

If this issue proves to be major roadblock, let's just remove the IteratorIndex implementation for usize (and change the bound on IteratorIndex::Output to IntoIterator).

Copy link
Member

@jswrenn jswrenn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TrolledWoods, could you:

  1. make the requested changes
  2. rustfmt src/index_iter.rs
  3. fix the merge conflict

After that, I'd be delighted to merge this!

@TrolledWoods
Copy link
Contributor Author

I'm terribly sorry it took so long, I completely forgot about this and it seems github is not very adamant about telling you new messages have appeared. If there's anything more you need do tell!

Copy link
Member

@Philippe-Cholet Philippe-Cholet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jswrenn @phimuemue
Hi everyone, I had a mental draft for this. This seems quite promising and I agree to not include .get(usize).
I'm not sure what is the blocker here? Are we waiting for rust-lang/rust#73482 (RFC2845 would fix the painpoint, see #702 (comment))? Or maybe it just got lost in the limbs?! 👻

It would have to be rebased (to run CI) and (thankfully) formatted (for CI to pass).

///
/// Works similarly to [`slice::get`](https://doc.rust-lang.org/std/primitive.slice.html#method.get).
///
/// It's a generalisation of [`take`], [`skip`] and [`nth`], and uses these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[`Iterator::take`], [`Iterator::skip`] should be enough.
Remove nth since it is not used anymore.

Comment on lines +527 to +529
/// [`take`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.take
/// [`skip`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.skip
/// [`nth`]: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.nth
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove links.

Comment on lines +491 to +492
/// Returns an element at a specific location, or returns an iterator
/// over a subsection of the iterator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove "an element at a specific location, or returns".

///
/// range = vec.iter().get(..).copied().collect();
/// assert_eq!(range, vec);
/// ```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test about RangeToInclusive (such as ..=2) yet, add one or codecov will rightfully complain.

@jswrenn
Copy link
Member

jswrenn commented Feb 29, 2024

I'd love to see this merged! It just needs a bit of attention and rebasing.

@Philippe-Cholet
Copy link
Member

Nice.

@TrolledWoods Are you around to finish this awesome feature of yours?

@Philippe-Cholet Philippe-Cholet mentioned this pull request Mar 1, 2024
@TrolledWoods
Copy link
Contributor Author

I will try to do the changes! Turns out formatting and then un-formatting everything makes it hard for git to merge though, will have to try to clean these commits up somehow...

@TrolledWoods
Copy link
Contributor Author

Oh, sorry didn't see that it was being worked on already. Nice work!

@Philippe-Cholet
Copy link
Member

Philippe-Cholet commented Mar 17, 2024

@TrolledWoods
When I realized it was gonna be painful for anyone to rebase on master, I copied change diffs, pasted on master, formatted, committed with you as co-author (as I could not remove myself from it) and opened #891. There are still small changes to do (like corner cases) that I'll probably do next week.

Thanks for this! I guess this was delayed partially because maintainers were busy elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants