Allow retaining JSON-LD column names #47

Open
bmschmidt opened this issue Apr 25, 2022 · 2 comments

Comments

@bmschmidt
Contributor

I have a personal preference here for using the LOD versions of the column names. For example, use htBibUrl rather than ht_bib_url. The rationale is that as of EF 2.0, these are LOD identifiers, not just column names.

It would be nice to have an option, or even a default, to use those.

A related question is whether the serial number within books should be called seq (as in the JSON-LD) or renamed page (as in HTRCFR).
I hear that the Google METS data may be leaving Hathi, which opens up the possibility that actual page numbers (like, the numbers on the corners of the book) might get out at some point.

I have no idea why seq is a string like '000000001' instead of an integer.

One option would be to use the original LOD names internally, and to move the renaming from the json parsing to the last handoff. This way old pandas code would keep working, but raw representations could use the LOD names.
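A minimal sketch of that last option, assuming the legacy names can be derived mechanically from the LOD names. That will not hold for every column (seq versus page is a rename, not a case change), so a real version would likely need an explicit mapping. The function names and the `keep_lod_names` flag are hypothetical, not part of the library:

```python
import re

def lod_to_snake(name: str) -> str:
    """Convert a camelCase LOD name such as 'htBibUrl' to 'ht_bib_url'."""
    # Insert an underscore at each lowercase/digit -> uppercase boundary.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

def hand_off(records, keep_lod_names=False):
    """Rename keys only at the final handoff (hypothetical helper).

    Internally, records keep their original LOD keys; the PEP 8 style
    names are applied here so existing downstream code keeps working.
    """
    if keep_lod_names:
        return [dict(r) for r in records]
    return [{lod_to_snake(k): v for k, v in r.items()} for r in records]

# Placeholder record for illustration only, not real EF data.
raw = [{"htBibUrl": "placeholder", "seq": "000000001"}]
legacy = hand_off(raw)                      # keys: ht_bib_url, seq
lod = hand_off(raw, keep_lod_names=True)    # keys: htBibUrl, seq
```

With a pandas frame, the same late renaming could be done with `DataFrame.rename(columns=...)` just before returning the frame.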

@organisciak
Collaborator

organisciak commented Apr 25, 2022

The intent of this library is Python scaffolding for working with EF. Part of that is following Python convention, including PEP 8, which expects lower_case_with_underscores for method names and variables. https://peps.python.org/pep-0008/#method-names-and-instance-variables

I understand your motivation for camelCase, but it doesn't seem like a strong enough case to justify the work and potential compatibility issues associated with a deviation from the original design decisions.

Regarding seq, that's a question for @borice. I often cast to int, but there was some reason lost to my memory as to why it's a string to begin with.
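For reference, the cast mentioned above is a one-liner in plain Python. This is only a sketch; the helper names are hypothetical, and the padding width is taken from the input rather than assumed:

```python
def seq_to_int(seq: str) -> int:
    """Parse a zero-padded sequence string; int() ignores leading zeros."""
    return int(seq)

def int_to_seq(n: int, width: int) -> str:
    """Rebuild the zero-padded string form at a caller-supplied width."""
    return str(n).zfill(width)

original = "000000001"
as_int = seq_to_int(original)
round_trip = int_to_seq(as_int, len(original))
```

One property of the zero-padded form is that it sorts lexically in numeric order; whether that motivated the string type is not something this thread establishes.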

@bmschmidt
Contributor Author

Agreed on full PEP 8 compliance, though my thought is that these aren't actually method names or variables. Or is your thinking that because pandas columns are often accessed with syntax like df.ht_bib_url, the PEP 8 rules should apply? (I assume there must be some pandas-specific conventions out there.)

I am thinking that if there are underpowered arrow methods, those would return the original linked data names, while pandas frames would return the PEP compliant names preserving back-compatibility. This of course would make it slightly harder to turn old pandas code into new arrow code.
