This repository has been archived by the owner on Mar 31, 2019. It is now read-only.

nested types in pandas #9

Open
martindurant opened this issue Aug 7, 2018 · 2 comments

Comments

@martindurant

How difficult would it be to have nested oamap structures as a column in pandas using the extension types interface? I could see that being a win both ways: normal pandas tabular analysis, plus descending into the nested structures with fast numba-jit code.

As a side issue, has there been any string functionality? I.e., if a leaf node type is string, is there anything that you can do with that within a numba function?

@jpivarski
Member

Thanks for pointing this out. I didn't know that Pandas has extension types, and it would definitely be a good idea to make awkward-array aware of them. (The arrays should be both Numba extensions and Pandas extensions.) The development will be in the awkward-array repo, though.

As for strings, I've been representing them as jagged arrays of characters (in awkward array; OAMap's terminology is a List(Primitive(uint8))).
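To make the "jagged array of characters" layout concrete, here is a minimal sketch of how a list of strings can be stored as one flat uint8 buffer plus an offsets buffer. The helper names `encode_strings` and `decode_string` are hypothetical and are not part of awkward-array or OAMap; the sketch uses only the Python standard library, whereas in practice the two buffers would be NumPy arrays.

```python
from array import array

def encode_strings(strings):
    """Pack strings into a flat uint8 content buffer plus an offsets buffer.

    Hypothetical helper for illustration; not an awkward-array or OAMap API.
    """
    content = array("B")        # UTF-8 bytes of all strings, concatenated
    offsets = array("q", [0])   # string i occupies content[offsets[i]:offsets[i + 1]]
    for s in strings:
        content.frombytes(s.encode("utf-8"))
        offsets.append(len(content))
    return offsets, content

def decode_string(offsets, content, i):
    """Recover string i by slicing the content buffer between two offsets."""
    return content[offsets[i]:offsets[i + 1]].tobytes().decode("utf-8")

offsets, content = encode_strings(["pandas", "", "oamap"])
print(offsets.tolist())                    # [0, 6, 6, 11]
print(decode_string(offsets, content, 2))  # oamap
```

An empty string simply shows up as two equal consecutive offsets, so no sentinel or padding byte is needed.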

@martindurant
Author

Yes, I didn't know where things stood with awkward versus oamap. Actually operating on the strings may be problematic, however, given numba limitations. The Rust string (UTF-8) API is surprisingly complete and may be the best thing to leverage for functions like startswith, replace, or find, but then you need to care about creating new arrays within the jit function, rather than just applying logic and aggregating.
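A rough sketch of the distinction, assuming strings are stored as an offsets buffer plus a flat uint8 content buffer: counting the strings that start with a prefix only reads the buffers and aggregates, while an operation that changes string lengths has to allocate new offsets and content. The functions `has_prefix`, `count_startswith`, and `strip_prefix` are hypothetical names written as plain Python loops in the style one might hand to a jit compiler; they have not been tested under numba and do not use any Rust API.

```python
from array import array

def has_prefix(offsets, content, i, prefix):
    """True if string i begins with the bytes in `prefix`."""
    start, stop = offsets[i], offsets[i + 1]
    if stop - start < len(prefix):
        return False
    for j in range(len(prefix)):
        if content[start + j] != prefix[j]:
            return False
    return True

def count_startswith(offsets, content, prefix):
    """Logic plus aggregation: reads the buffers, allocates nothing."""
    count = 0
    for i in range(len(offsets) - 1):
        if has_prefix(offsets, content, i, prefix):
            count += 1
    return count

def strip_prefix(offsets, content, prefix):
    """Length-changing operation: must build new offsets and content."""
    n = len(offsets) - 1
    # Pass 1: compute the output offsets so the output size is known up front.
    new_offsets = array("q", [0] * (n + 1))
    for i in range(n):
        length = offsets[i + 1] - offsets[i]
        if has_prefix(offsets, content, i, prefix):
            length -= len(prefix)
        new_offsets[i + 1] = new_offsets[i] + length
    # Pass 2: allocate the output content once, then copy each string's tail.
    new_content = array("B", bytes(new_offsets[n]))
    for i in range(n):
        length = new_offsets[i + 1] - new_offsets[i]
        src = offsets[i + 1] - length
        for k in range(length):
            new_content[new_offsets[i] + k] = content[src + k]
    return new_offsets, new_content

# "numba", "numpy", "pandas" packed into one content buffer
content = array("B", b"numbanumpypandas")
offsets = array("q", [0, 5, 10, 16])

print(count_startswith(offsets, content, b"num"))  # 2
new_offsets, new_content = strip_prefix(offsets, content, b"num")
print(new_offsets.tolist(), new_content.tobytes())  # [0, 2, 4, 10] b'bapypandas'
```

The two-pass shape (size first, then fill) is one common way to avoid growing arrays inside a compiled loop; it is a design choice for this sketch, not something the thread prescribes.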
