add constant time Series/Dataframe.count(), trivial implementations of iloc(integer) #176

Open
nevion opened this issue Dec 26, 2024 · 7 comments

Comments

@nevion

nevion commented Dec 26, 2024

count() is O(N), and at(offset), the most frequent type of index access, does hashmap lookups.

Consider that an O(N) for loop becomes O(N^2), and the performance impact a loop like this could have.... In practice there are many places where one needs the shape of a dataframe, and getting it shouldn't incur computation or heavy operations.

function visitEveryRow(visitor) {
    // count() is re-evaluated on every iteration
    for (let i = 0; i < df.count(); ++i) {
        visitor(df.iloc(i));
    }
}
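To make the cost concrete, here is a minimal sketch that only models the count() side of the problem. `WalkingFrame`, `rowsWalked`, and `visitEveryRowHoisted` are hypothetical names invented for illustration, not the library's classes, and the linear walk inside `count()` is an assumption standing in for whatever the real implementation does.

```typescript
// Hypothetical stand-in for a frame whose count() walks every row.
// Class and field names are illustrative, not the library's internals.
class WalkingFrame<T> {
    private readonly rows: T[];
    // Total rows touched by count() calls so far.
    public rowsWalked = 0;

    constructor(rows: T[]) {
        this.rows = rows;
    }

    // Models an O(N) count: touches every row once per call.
    count(): number {
        let n = 0;
        for (let i = 0; i < this.rows.length; ++i) {
            this.rowsWalked++;
            n++;
        }
        return n;
    }

    // Plain positional access, kept cheap so only count() is measured.
    at(index: number): T | undefined {
        return this.rows[index];
    }
}

// Same shape as the loop in the report: count() runs on every iteration.
function visitEveryRow<T>(df: WalkingFrame<T>, visitor: (row: T | undefined) => void): void {
    for (let i = 0; i < df.count(); ++i) {
        visitor(df.at(i));
    }
}

// For comparison only: evaluate count() once before the loop.
function visitEveryRowHoisted<T>(df: WalkingFrame<T>, visitor: (row: T | undefined) => void): void {
    const n = df.count();
    for (let i = 0; i < n; ++i) {
        visitor(df.at(i));
    }
}
```

With 4 rows, `visitEveryRow` walks 20 rows inside count() (5 calls of 4 rows each, i.e. N(N + 1)), while the hoisted variant walks 4. Hoisting is a caller-side workaround; the request in this issue is for count() itself to be cheap so callers don't need it.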
@nevion changed the title from "add constant time Series/Dataframe.count(), trivial implementations of iat(integer)" to "add constant time Series/Dataframe.count(), trivial implementations of iloc(integer)" on Dec 26, 2024
@ashleydavis
Member

@nevion thanks for your feedback. Are you able to propose a solution?

@nevion
Author

nevion commented Dec 29, 2024

count(): return this.getContent().values.length, and similar for Series.

iloc(index: number): return this.getContent().values[index]

I also noticed dtypes aren't stored as part of the context. These should be preserved for 0-length dataframes.
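The count() and iloc() proposal above could look roughly like the following sketch. It assumes the values are already held in a plain array; `EagerFrame` and its `values` field are hypothetical names for illustration and do not reflect how the library's `getContent()` is actually structured.

```typescript
// Hypothetical array-backed frame; names are illustrative only.
class EagerFrame<T> {
    private readonly values: T[];

    constructor(values: T[]) {
        this.values = values;
    }

    // Constant time: reads the array length instead of iterating.
    count(): number {
        return this.values.length;
    }

    // Constant time positional access; undefined when out of range.
    iloc(index: number): T | undefined {
        return this.values[index];
    }
}
```

Both methods reduce to a single property read or array index, which is the "trivial implementation" being asked for.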

@ashleydavis
Member

Thanks @nevion, are you up for getting that in a PR?

@nevion
Author

nevion commented Jan 2, 2025

I think one thing that needs evaluation is removing support for deferred values at construction. pandas, for instance, doesn't do this, and the same need can be met by another strategy: the user holds a promise around a dataframe or uses an async expression, rather than having the dataframe deal with that internally. This would remove the checks for deferred values inside the dataframe and allow the trivial implementations above, which optimize well and both simplify and speed up the common case. Wdyt?
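As a rough illustration of "the user holds a promise around a dataframe", the sketch below keeps the frame fully synchronous and moves the waiting to the caller. `SyncFrame`, `loadFrame`, and `fetchRows` are hypothetical names made up for this example, not part of the library.

```typescript
// Hypothetical, always-synchronous frame: it only ever holds a plain array.
class SyncFrame<T> {
    private readonly values: T[];

    constructor(values: T[]) {
        this.values = values;
    }

    count(): number {
        return this.values.length;
    }
}

// The caller owns the asynchrony: the promise wraps the frame, and the
// frame never sees a deferred value. `fetchRows` is a placeholder for
// whatever async source the user has.
async function loadFrame<T>(fetchRows: () => Promise<T[]>): Promise<SyncFrame<T>> {
    const rows = await fetchRows();
    return new SyncFrame<T>(rows);
}
```

A caller would write something like `const df = await loadFrame(fetchRows)` and then work with an ordinary frame, so none of the frame's methods need to check whether values have been resolved yet.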

@ashleydavis
Member

I think it could be a good idea, but I'm worried removing something could break existing code. I suppose you could implement it so that the deferred values are evaluated immediately - would that still result in the desired simplification?

@nevion
Author

nevion commented Jan 2, 2025

Yeah, I think so: test whether it's a generator or an array, and if it's a generator, evaluate it in the constructor.
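A minimal sketch of that idea, under the assumption that the input is either an array or some other iterable such as a generator. `MaterializedFrame` is a hypothetical name, and the real constructor accepts more input shapes than this.

```typescript
// Hypothetical frame that evaluates lazy input once, at construction.
class MaterializedFrame<T> {
    private readonly values: T[];

    constructor(source: Iterable<T>) {
        // Arrays are kept as-is; anything else (e.g. a generator) is
        // drained into an array immediately.
        this.values = Array.isArray(source) ? source : Array.from(source);
    }

    // After construction these no longer need to check for deferred values.
    count(): number {
        return this.values.length;
    }

    iloc(index: number): T | undefined {
        return this.values[index];
    }
}

// Example lazy input used to exercise the constructor.
function* firstSquares(n: number): Generator<number> {
    for (let i = 0; i < n; ++i) {
        yield i * i;
    }
}
```

`new MaterializedFrame(firstSquares(4))` pays the evaluation cost once in the constructor, and existing callers can keep passing lazy input. One trade-off to weigh: the array branch keeps a reference to the caller's array rather than copying it.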

@ashleydavis
Member

Sounds like it shouldn't be too hard. Do you want to try it and submit a PR? The first step (evaluating the lazy argument in the constructor) shouldn't require any new tests... you just need to keep the existing tests passing and try to make sure the interface to the user stays the same.

No hurry though... take your time with it.
