Skip to content
This repository was archived by the owner on Sep 13, 2023. It is now read-only.

Thoughts on the current implementation of iterators #25

Closed
ncloudioj opened this issue Jan 18, 2019 · 6 comments
Closed

Thoughts on the current implementation of iterators #25

ncloudioj opened this issue Jan 18, 2019 · 6 comments
Labels
ARCHIVED CLOSED at time of archiving enhancement New feature or request

Comments

@ncloudioj
Copy link
Member

ncloudioj commented Jan 18, 2019

While trying to seek the solution to this potential optimization, I am more inclined to believe that the current iterator implementation on Cursor is problematic.

Firstly, as @mykmelez pointed out here, it's fairly easy to run into panics if the consumer ignores the fact that iterators are derived from the underlying cursors, hence iterators can't outlive its cursors, whereas the current implementation can not enforce that. Although the iterators in rkv are implemented by bundling the iterator and the cursor together, since rkv exposes it via pub use lmdb::Cursor, it remains as a risk there.

Secondly, allowing consumers to create iterators on an in-flight cursor could cause invalid (EINVAL) cursor states, which also complicate the error handling code. Imagine the consumer performs this on an empty store,

cursor.get(op=MDB_NEXT);
cursor.iter().next();

One way to fix this is to have iterators own the underlying cursors, and let the Store(s) be responsible for creating those iterators. This also requires a change on the dup iterators, which currently return a regular iterator on each next call. If we'd like to keep this flexibility other than flatten everything, then it will need some kind of sharing of that cursor between iterators.

Would love to hear your thoughts? @mykmelez

@mykmelez
Copy link

One way to fix this is to have iterators own the underlying cursors, and let the Store(s) be responsible for creating those iterators.

I like this idea, especially for rkv, which shouldn't be exposing footguns like lmdb::Cursor. I'm less sure about lmdb-rs itself, where it seems useful to expose the full functionality of LMDB, including cursor operations, even if we can't make them entirely safe.

This also requires a change on the dup iterators, which currently return a regular iterator on each next call. If we'd like to keep this flexibility other than flatten everything, then it will need some kind of sharing of that cursor between iterators.

If we can, I'd preserve the current semantics of IterDup, which seem intuitive and ergonomic. There's discussion in danburkert#32 about using a StreamingIterator, which might work.

@ncloudioj
Copy link
Member Author

If we can, I'd preserve the current semantics of IterDup, which seem intuitive and ergonomic. There's discussion in danburkert#32 about using a StreamingIterator, which might work.

Sounds good.

I'd like to put this proposal on hold for following reasons:

  • Rkv has already partially fixed it by bundling up Cursor and Iter/DupIter, we can defuse the footgun by concealing those low level modules in rkv
  • This change would further diverge this fork against the upstream, which could be undesired at the moment, we can re-visit it in the future if necessary

@ncloudioj ncloudioj added the enhancement New feature or request label Jan 23, 2019
@mykmelez
Copy link

Ok, sounds good, let's put this on hold for now!

@oersted
Copy link

oersted commented Sep 21, 2022

I faced issues with this when trying to return an Iterator wrapping Iter. From my tests, it's enough to keep Transaction alive along with Iter, letting the Cursor drop doesn't seem to cause panics (but may be unsafe?).

The best solution I came up with is to use ouroboros to define a self-referential struct. Like this:

#[self_referencing]
pub struct Wrapper<'a> {
    txn: RoTransaction<'a>,
    #[borrows(txn)]
    iter: Iter<'a>,
}

impl<'a> Answers<'a> {
    fn iter(key: &[u8], txn: RoTransaction<'a>, db: Database) -> lmdb::Result<Self> {
        Wrapper::try_new(txn, |txn| {
            let mut cursor = txn.open_ro_cursor(db)?;
            let iter = cursor.iter_dup_of(&key);
            Ok(iter)
        })
    }
}

Is this correct? Am I taking any risks here? Is there a better way?

Regardless, it took me a while to arrive at this solution, for what I expect is a relatively common use case. This rather diminishes the "idiomatic and safe API" headline for the crate. It would be good to document this or at least give an example.

I assume there's a good reason for not doing it, but it would be natural to have an API that takes the Transaction and Cursor as self, such that Iter owns its Transaction and has the same lifetime. I understand this would prevent performing any other operations afterwards within the Transaction though.

If Iter at least kept a reference to the Transaction, the compiler wouldn't allow a user to keep one alive without the other.

There's discussion in danburkert#32 about using a StreamingIterator, which might work.

By the way, as far as I understand GATs have recently been stabilized in nightly, so a clean StreamingIterator is now possible. Might be good to look at this again.

@oersted
Copy link

oersted commented Sep 21, 2022

I've been attempting to include the Cursor in the struct so that it's not dropped, but I'm running into all kinds of trouble with ouroboros, I can't seem to make something like this work.

#[self_referencing]
pub struct Wrapper<'a> {
    txn: RoTransaction<'a>,
    #[borrows(txn)]
    cursor: RoCursor<'a>,
    #[borrows(mut cursor)]
    iter: Iter<'a>,
}

I'd welcome any suggestions, there must be an easier way to do this than to use self-referential types? I've tried a few variants with Arc too with no success.

@oersted
Copy link

oersted commented Sep 22, 2022

Yeah I am getting STATUS_ACCESS_VIOLATION in some cases (iterating on multiple values for the same key, with DUP_SORT on). So much for the API being safe... I don't want to sound indignant, let me know what I should do to help. But these limitations should be at least documented and marked as unsafe methods until this is sorted.

@cknowles-admin cknowles-admin added the ARCHIVED CLOSED at time of archiving label Sep 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
ARCHIVED CLOSED at time of archiving enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants