add constant time Series/Dataframe.count(), trivial implementations of iloc(integer) #176

nevion · 2024-12-26T15:01:45Z

count() is O(N), and at(offset), the most frequent type of indexing, does hashmap lookups.

Consider that an O(N) for loop becomes O(N^2), and the performance impact this sort of loop could have.... In practice the shape of a dataframe is needed in many scattered places, and getting it shouldn't incur computation or heavy operations.

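function visitEveryRow(df, visitor) {
    // count() is re-evaluated on every iteration, so an O(N) count() makes this loop O(N^2)
    for (let i = 0; i < df.count(); ++i) {
        visitor(df.iloc(i)); // iloc(i) is the positional accessor proposed in this issue
    }
}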
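
To make the cost concrete, here is a rough sketch, assuming a lazily evaluated container whose count() has to walk its source each time. LazyFrame and its steps counter are hypothetical stand-ins invented for illustration, not the data-forge API. It compares calling count() in the loop condition against hoisting it out of the loop.

```typescript
// Hypothetical stand-in for a lazily evaluated frame; NOT the data-forge API.
class LazyFrame<T> {
    // Total number of elements walked by all count() calls so far.
    public steps = 0;

    constructor(private readonly source: () => Iterable<T>) {}

    // O(N): walks the whole source every time it is called.
    count(): number {
        let n = 0;
        for (const _ of this.source()) {
            ++n;
            ++this.steps;
        }
        return n;
    }
}

const rows = [10, 20, 30, 40];
const frame = new LazyFrame(() => rows);

// count() in the loop condition runs once per iteration, plus the final failing check.
for (let i = 0; i < frame.count(); ++i) {
    // visit row i
}
const stepsInCondition = frame.steps;

// Hoisting count() out of the loop walks the source only once.
frame.steps = 0;
const n = frame.count();
for (let i = 0; i < n; ++i) {
    // visit row i
}
const stepsHoisted = frame.steps;

console.log({ stepsInCondition, stepsHoisted });
```

Hoisting is only a caller-side workaround; the point of the issue is that callers shouldn't need it.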


ashleydavis · 2024-12-29T05:37:47Z

@nevion thanks for your feedback. Are you able to propose a solution?

nevion · 2024-12-29T09:51:18Z

count(): return this.getContent().values.length, similar for Series.

iloc(index: number) : this.getContent().values[index]

also noticed dtypes aren't stored as part of the context - these should be preserved for 0 length data frames
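
The count and iloc one-liners above could look roughly like this once the values are held in a plain array. This is a minimal sketch under that assumption: EagerFrame and its values field are hypothetical names for illustration, and the real change would go through getContent() as written above.

```typescript
// Hypothetical simplified frame backed by a plain array; not data-forge internals.
class EagerFrame<T> {
    private readonly values: T[];

    constructor(values: T[]) {
        // Copy so later mutation of the caller's array cannot change the frame.
        this.values = values.slice();
    }

    // O(1): the array already knows its length.
    count(): number {
        return this.values.length;
    }

    // O(1) positional access; returns undefined when the index is out of range.
    iloc(index: number): T | undefined {
        return this.values[index];
    }
}

const frame = new EagerFrame(["a", "b", "c"]);
console.log(frame.count()); // 3
console.log(frame.iloc(1)); // "b"
```

Returning undefined for an out-of-range index is a choice made for this sketch; throwing would be equally defensible.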

ashleydavis · 2025-01-02T00:18:37Z

Thanks @nevion, are you up for getting that in a PR?

nevion · 2025-01-02T00:34:10Z

I think one thing that needs evaluating is removing support for deferred values at construction. pandas, for instance, doesn't allow this, and the same need can be met by another strategy: the user holds a promise around a dataframe or uses an async expression, rather than having the dataframe deal with that internally. This would remove the checks for deferred values inside the dataframe and allow the trivial implementations above, which optimize well and both simplify and speed up the common case. Wdyt?

ashleydavis · 2025-01-02T00:36:41Z

I think it could be a good idea, but I'm worried removing something could break existing code. I suppose you could implement it so that the deferred values are evaluated immediately - would that still result in the desired simplification?

nevion · 2025-01-02T00:49:46Z

Yeah, I think so: test whether it's a generator or an array, and if it's a generator, evaluate it in the constructor.
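
A rough sketch of that check, assuming the constructor receives either an array or some other iterable such as a generator object. materializeValues and naturals are hypothetical helpers invented here, not existing data-forge functions.

```typescript
// Hypothetical helper: returns an array, draining the input if it is lazy.
function materializeValues<T>(input: T[] | Iterable<T>): T[] {
    if (Array.isArray(input)) {
        // Already materialized: use it directly.
        return input;
    }
    // Generator or other iterable: evaluate it immediately, once.
    return Array.from(input);
}

// Illustrative generator standing in for a deferred source of values.
function* naturals(limit: number): Generator<number> {
    for (let i = 0; i < limit; ++i) {
        yield i;
    }
}

const fromGenerator = materializeValues(naturals(3));
const original = ["x", "y"];
const fromArray = materializeValues(original);

console.log(fromGenerator.length, fromArray.length); // 3 2
```

After this step every frame holds a plain array, so length and positional lookups need no deferred-value checks.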

ashleydavis · 2025-01-02T01:37:52Z

Sounds like it shouldn't be too hard. Do you want to try it and submit a PR? The first step (evaluating the lazy argument in the constructor) shouldn't require any new tests... just need to keep existing tests passing and try to make sure the interface to the user is the same.

No hurry though... take your time with it.

nevion changed the title ~~add constant time Series/Dataframe.count(), trivial implementations of iat(integer)~~ add constant time Series/Dataframe.count(), trivial implementations of iloc(integer) Dec 26, 2024
