add support for schema.org #263
base: master
Nice work, Tom.
Write support
I get the impression you didn't check your implementation on https://validator.schema.org/, because it still has quite a few validation issues; see below.
Currently the Dataset type is not detected by https://validator.schema.org/.
When using the validator, make sure to embed the JSON in
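The markup after "embed the JSON in" was lost in this copy. Assuming it referred to a `<script type="application/ld+json">` element (the standard way to embed JSON-LD in an HTML page), here is a minimal sketch of wrapping a record before pasting it into the validator. `wrap_for_validator` is a hypothetical helper, not part of pygeometa, and the record values are placeholders; the explicit `@context` and `@type: Dataset` are what the validator needs in order to detect the Dataset type.

```python
import json


def wrap_for_validator(record: dict) -> str:
    """Embed a JSON-LD record in an HTML script element (hypothetical helper)."""
    # ensure_ascii=False keeps non-ASCII text readable in the pasted HTML
    payload = json.dumps(record, indent=2, ensure_ascii=False)
    # A literal "</script>" inside a string value would close the element early;
    # "<\/" is an equivalent, still-valid JSON escape for "</".
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder record: @context plus @type "Dataset" let the validator
# recognise the type.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "Placeholder description",
}

print(wrap_for_validator(record))
```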
I wonder if we should use some of this work inside pycsw/pygeoapi...
Noticed this on distribution:
should be @type: 'schema:DataDownload'
format or encoding can be used for the MIME type
it seems the validator treats 'type' as '@type'
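For reference, a minimal sketch of a distribution entry along the lines suggested above: `DataDownload` is the schema.org type expected by the `distribution` property of a `Dataset`, `contentUrl` carries the download link, and `encodingFormat` carries the MIME type. `make_distribution` is a hypothetical helper rather than the PR's code, and the URL, format and name are placeholders. Writing the JSON-LD keyword `@type` explicitly avoids depending on how the validator interprets a bare `type`.

```python
import json
from typing import Optional


def make_distribution(url: str, mimetype: str, name: Optional[str] = None) -> dict:
    """Build one schema.org DataDownload entry (hypothetical helper)."""
    dist = {
        "@type": "DataDownload",     # explicit JSON-LD keyword, not a bare "type"
        "contentUrl": url,           # where the data can be downloaded
        "encodingFormat": mimetype,  # MIME type of the download
    }
    if name:
        dist["name"] = name
    return dist


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset",
    "distribution": [
        make_distribution("https://example.org/data.csv", "text/csv", "CSV export"),
    ],
}
print(json.dumps(dataset, indent=2))
```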
Read support
I noticed you also added read support; I tried it with
pygeometa metadata import schema-org.json --schema schema-org -v DEBUG
and got:
WARNING:pygeometa.core:Import failed: list indices must be integers or slices, not str
null
...
When debugging:
from pygeometa.schemas.schema_org import SchemaOrgOutputSchema
sos = SchemaOrgOutputSchema()
with open("./schema-org.json") as fh:
    mcf = sos.import_(fh.read())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/geopython/lib/python3.10/site-packages/pygeometa-0.17.dev1-py3.10.egg/pygeometa/schemas/schema_org/__init__.py", line 116, in import_
geo = md['spatialCoverage']['geo']
TypeError: list indices must be integers or slices, not str
It would be nice if this error were reported by the command-line client.
schema-org.zip
The interesting part here is that RDF typically allows either a single item or a list of items as the content of an element. That brings us to the next topic: this implementation seems to expect a JSON-LD serialisation of RDF, which is indeed the most common form of schema.org. However, quite a few schema.org implementations use RDFa or microdata, and in theory schema.org can also be serialised as Turtle or RDF/XML. To support those cases, rdflib can be used to read the RDF and serialise it to JSON-LD before parsing.
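A minimal sketch of tolerating both forms when reading `spatialCoverage`, which is what triggered the `TypeError` above (`md['spatialCoverage']['geo']` fails when the value is a list). `as_list` and `first_geo` are hypothetical helpers, not pygeometa functions, and returning only the first `geo` found is an assumption about how multiple coverages should be handled.

```python
from typing import Any, Optional


def as_list(value: Any) -> list:
    """Normalise a JSON-LD value that may be absent, a single item, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def first_geo(md: dict) -> Optional[dict]:
    """Return the first 'geo' object found in spatialCoverage, if any."""
    for place in as_list(md.get("spatialCoverage")):
        if isinstance(place, dict):
            for geo in as_list(place.get("geo")):
                if isinstance(geo, dict):
                    return geo
    return None


# The same coverage expressed as a single object and as a one-element list
single = {"spatialCoverage": {"@type": "Place",
                              "geo": {"@type": "GeoShape",
                                      "box": "39.3 -104.8 40.1 -104.6"}}}
listed = {"spatialCoverage": [single["spatialCoverage"]]}
print(first_geo(single) == first_geo(listed))  # True
```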
After fixing spatialCoverage, the next error was:
File "/geopython/lib/python3.10/site-packages/pygeometa-0.17.dev1-py3.10.egg/pygeometa/schemas/schema_org/__init__.py", line 123, in import_
mcf['spatial']['datatype'] = 'vector'
KeyError: 'spatial'
It seems mcf['spatial']['datatype'] is set before mcf['spatial'] is initialized.
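One way to avoid that ordering problem is to create the nested section before writing into it, for example with `dict.setdefault`. A minimal sketch: the `spatial`/`datatype` keys mirror the traceback, while `set_spatial_datatype` is a hypothetical helper and the `geomtype` value in the usage check is only illustrative.

```python
def set_spatial_datatype(mcf: dict, datatype: str) -> dict:
    """Set mcf['spatial']['datatype'], creating mcf['spatial'] if it is missing."""
    # setdefault returns the existing 'spatial' dict, or inserts and returns a new one,
    # so existing keys under 'spatial' are preserved.
    mcf.setdefault("spatial", {})["datatype"] = datatype
    return mcf


mcf = {"metadata": {}}
set_spatial_datatype(mcf, "vector")
print(mcf["spatial"])  # {'datatype': 'vector'}
```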
Seconded: good work, but testing through https://validator.schema.org/ is critical (I wish there were an API available for validation, instead of testing manually through the validator).
The validator is not available as a service, but a SHACL-oriented test is available at https://github.com/google/schemarama/blob/main/core/test/shacl-test.js
* fix export to schema-org
* Update __init__.py
* Update __init__.py
* Update __init__.py

---------

Co-authored-by: Tom Kralidis <[email protected]>
575a1f3 to 55be744
@jmckenna @pvgenuchten I dusted off this PR and pushed some updates:
* import: successful import from schema-org/JSON-LD (sample) to MCF, as well as examples (thanks @pvgenuchten) in:
* export: successful export of
Fixes #231. Also adds an early exit to autodetection (it stops at the first schema found).
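An early exit here means autodetection stops at the first schema that can read the input instead of trying every one. A minimal sketch of that idea; `detect_schema`, the `importers` mapping and the two toy readers are hypothetical stand-ins, not pygeometa's actual autodetection code, and treating any exception as "not this schema" is an assumption.

```python
import json
from typing import Callable, Dict, Optional, Tuple

Importer = Callable[[str], dict]


def detect_schema(content: str,
                  importers: Dict[str, Importer]) -> Optional[Tuple[str, dict]]:
    """Return (schema name, parsed result) for the first importer that succeeds."""
    for name, import_ in importers.items():  # dicts preserve insertion order
        try:
            result = import_(content)
        except Exception:
            continue  # this schema cannot read the input; try the next one
        if result:
            return name, result  # early exit: first schema found wins
    return None


def read_schema_org(content: str) -> dict:
    """Toy reader: accept JSON-LD whose @type is Dataset."""
    doc = json.loads(content)
    if doc.get("@type") != "Dataset":
        raise ValueError("not a schema.org Dataset")
    return {"identification": {"title": doc.get("name")}}


def read_never(content: str) -> dict:
    """Toy reader that always fails, standing in for a non-matching schema."""
    raise ValueError("unsupported")


importers = {"iso19139": read_never, "schema-org": read_schema_org}
print(detect_schema('{"@type": "Dataset", "name": "demo"}', importers))
```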