I have a personal preference here for using the LOD versions of the column names. For example, use htBibUrl rather than ht_bib_url. The rationale is that as of EF 2.0, these are LOD identifiers, not just column names.
It would be nice to have an option, or even a default, to use those.
Related is whether the serial number within books should be called seq (as in the json-ld) or renamed page (as in HTRCFR).
I hear that the Google METS data may be leaving Hathi, which opens up the possibility that actual page numbers (like, the numbers on the corners of the book) might get out at some point.
I have no idea why seq is a string like '000000001' instead of an integer.
One option would be to use the original LOD names internally, and to move the renaming from the json parsing to the last handoff. This way old pandas code would keep working, but raw representations could use the LOD names.
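A minimal sketch of the "rename at the last handoff" idea, assuming records are kept under their LOD names internally and only translated when handed to legacy callers. The mapping holds just the two pairs mentioned in this issue; `LOD_TO_LEGACY`, `to_legacy_names`, and the `tokenCount` field are hypothetical names for illustration, not part of the library.

```python
# Hypothetical mapping from LOD identifiers to the legacy column names.
# Only the two pairs discussed in this issue are included.
LOD_TO_LEGACY = {
    "htBibUrl": "ht_bib_url",
    "seq": "page",
}


def to_legacy_names(records, mapping=LOD_TO_LEGACY):
    """Return copies of the records with keys renamed for legacy callers.

    Keys that are not in the mapping pass through unchanged, and the
    input records are not modified.
    """
    return [
        {mapping.get(key, key): value for key, value in record.items()}
        for record in records
    ]


# Internal representation keeps the LOD names.
raw = [{"htBibUrl": "https://example.org/bib/1", "seq": "000000001", "tokenCount": 120}]

# Renaming happens only at the final handoff.
legacy = to_legacy_names(raw)
```

With a pandas frame, the same handoff step could be a single `df.rename(columns=LOD_TO_LEGACY)` call, so existing code that expects `ht_bib_url` would keep working.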
I understand your motivation for camelCase, but it doesn't seem like a strong enough case to justify the work and potential compatibility issues associated with a deviation from the original design decisions.
Regarding seq, that's a question for @borice. I often cast it to int, but the reason it's a string to begin with is lost to my memory.
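For reference, the cast could look like the sketch below. It assumes seq values are zero-padded decimal strings like the '000000001' example above; `seq_to_int` and `int_to_seq` are hypothetical helper names, and the padding width is taken from the original string rather than assumed.

```python
def seq_to_int(seq):
    """Convert a zero-padded seq string such as '000000001' to an int."""
    return int(seq)


def int_to_seq(number, width):
    """Convert an int back to a zero-padded string of the given width."""
    return str(number).zfill(width)


original = "000000001"
as_int = seq_to_int(original)
round_trip = int_to_seq(as_int, width=len(original))
```

Keeping the width explicit means the string form can be rebuilt when it is needed to match the identifiers in the json-ld.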
Agreed on full PEP compliance: my thought though is that these aren't actually method names or variables. Or is your thinking that because pandas columns are often accessed with syntax like df.ht_bib_url, the PEP 8 rules should apply? (I assume that there must be some pandas-specific conventions out there).
I am thinking that if there are underpowered arrow methods, those would return the original linked data names, while pandas frames would return the PEP compliant names preserving back-compatibility. This of course would make it slightly harder to turn old pandas code into new arrow code.