tomkralidis
Member

Fixes #231. Also adds early out for autodetection (first schema found).
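
As an illustration of the "early out" idea only (this is not the code in the PR): the sketch below returns as soon as the first detector accepts the input, instead of trying every schema. The names `detect_schema`, `looks_like_schema_org`, `looks_like_xml` and the `DETECTORS` table are hypothetical and do not come from pygeometa.

```python
import json


def looks_like_schema_org(text):
    """Hypothetical check: a JSON object whose @context mentions schema.org."""
    try:
        doc = json.loads(text)
    except ValueError:
        return False
    if not isinstance(doc, dict):
        return False
    return 'schema.org' in str(doc.get('@context', ''))


def looks_like_xml(text):
    """Hypothetical check: the content starts with an XML/HTML tag."""
    return text.lstrip().startswith('<')


# Hypothetical detector table; dict order defines the order of the checks.
DETECTORS = {
    'schema-org': looks_like_schema_org,
    'xml': looks_like_xml,
}


def detect_schema(text, detectors=DETECTORS):
    """Return the name of the first schema whose detector accepts the text."""
    for name, detector in detectors.items():
        if detector(text):
            return name  # early out: stop at the first schema found
    return None
```

The trade-off of stopping at the first match is that detector order matters, so the more specific checks should come first.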

@tomkralidis tomkralidis requested a review from pvgenuchten March 25, 2025 14:34
Contributor

@pvgenuchten pvgenuchten left a comment

Nice work Tom,

Write support

I get the impression you didn't check your implementation on https://validator.schema.org/, because it still has quite a few validation issues (see below).

Currently the Dataset type is not detected by https://validator.schema.org/.
When using the validator, make sure to embed the JSON in

<script type="application/ld+json">{}</script>
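
A minimal sketch of producing that wrapper from a JSON-LD document, so the result can be pasted into the validator's code snippet tab. The helper name `wrap_jsonld` is hypothetical and is not part of pygeometa.

```python
import json


def wrap_jsonld(doc):
    """Wrap a JSON-LD document in the script element the validator expects."""
    # Escape '</' so a string value cannot close the script element early;
    # '\/' is a valid JSON escape for '/', so the payload still parses.
    body = json.dumps(doc, indent=2).replace('</', '<\\/')
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == '__main__':
    print(wrap_jsonld({
        '@context': 'https://schema.org/',
        '@type': 'Dataset',
        'name': 'Example dataset',  # placeholder value
    }))
```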

I wonder if we should use some of this work inside pycsw/pygeoapi...

I noticed this on distribution (screenshot omitted):
it should be '@type': 'schema:DataDownload'
format or encoding can be used for the MIME type
it seems the validator treats 'type' as '@type'
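
A hedged sketch of what a distribution entry could look like with these points applied. schema.org type names are case-sensitive and the type is spelled `DataDownload`; `contentUrl` and `encodingFormat` are schema.org properties. The helper `build_distribution` and the example values are hypothetical, and the unprefixed type name assumes the document's `@context` is the schema.org vocabulary.

```python
def build_distribution(url, media_type, name=None):
    """Build a schema.org distribution entry as a plain dict."""
    dist = {
        '@type': 'DataDownload',       # explicit '@type' rather than 'type'
        'contentUrl': url,
        'encodingFormat': media_type,  # carries the MIME type
    }
    if name:
        dist['name'] = name
    return dist


if __name__ == '__main__':
    # Placeholder URL and media type, for illustration only.
    print(build_distribution('https://example.org/data.csv', 'text/csv'))
```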

Read support

I notice you also added read support. I tried it with

pygeometa metadata import schema-org.json --schema schema-org -v DEBUG

and got a

WARNING:pygeometa.core:Import failed: list indices must be integers or slices, not str
null
...

when debugging

from pygeometa.schemas.schema_org import SchemaOrgOutputSchema
sos = SchemaOrgOutputSchema()
f = open("./schema-org.json", "r")
f2 = sos.import_(f.read())

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/geopython/lib/python3.10/site-packages/pygeometa-0.17.dev1-py3.10.egg/pygeometa/schemas/schema_org/__init__.py", line 116, in import_
    geo = md['spatialCoverage']['geo']
TypeError: list indices must be integers or slices, not str

It would be nice if this error were reported by the command-line client.
schema-org.zip
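
The traceback is consistent with `spatialCoverage` arriving as a JSON array rather than a single object. One way to tolerate both shapes is to normalise values to a list before indexing. This is a sketch only: `as_list` and `first_geo` are hypothetical helpers, not functions from pygeometa.

```python
def as_list(value):
    """Return value as a list: [] for None, the list itself, else [value]."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def first_geo(md):
    """Return the first 'geo' found under spatialCoverage, or None."""
    for coverage in as_list(md.get('spatialCoverage')):
        if isinstance(coverage, dict) and 'geo' in coverage:
            return coverage['geo']
    return None
```

Taking only the first `geo` is a simplification; a record with several coverages would need a decision on how to merge them.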

The interesting part here is that RDF typically allows either a single item or a list of items as the content of an element. That brings us to the next topic: this implementation seems to expect a JSON-LD serialisation of RDF, which is indeed the most common form of schema.org. However, quite a few implementations of schema.org use RDFa or microdata, and in theory one can also serialise schema.org as Turtle or RDF/XML. To support those cases, rdflib can be used to read the RDF and serialise it to JSON-LD before parsing.
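
A rough sketch of that rdflib round trip, assuming rdflib 6 or later (where JSON-LD support is built in) and Turtle input. The function name `rdf_to_jsonld` is hypothetical, and which parsers are available for other serialisations depends on the installed rdflib version.

```python
import json


def rdf_to_jsonld(data, source_format='turtle'):
    """Parse RDF text with rdflib and return it as parsed JSON-LD."""
    # Imported lazily because rdflib is a third-party, optional dependency.
    from rdflib import Graph

    graph = Graph()
    graph.parse(data=data, format=source_format)
    # Passing a context asks the serializer to compact terms against the
    # schema.org vocabulary instead of emitting full IRIs.
    out = graph.serialize(format='json-ld',
                          context={'@vocab': 'https://schema.org/'})
    if isinstance(out, bytes):  # older rdflib versions return bytes
        out = out.decode('utf-8')
    return json.loads(out)
```

The result may be a single object or a list of nodes depending on the graph, so the single-or-list caveat still applies to whatever consumes it.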

After fixing the spatialCoverage handling, the next error:

  File "/geopython/lib/python3.10/site-packages/pygeometa-0.17.dev1-py3.10.egg/pygeometa/schemas/schema_org/__init__.py", line 123, in import_
    mcf['spatial']['datatype'] = 'vector'
KeyError: 'spatial'

It seems the datatype is set before mcf['spatial'] is initialized.
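
One possible way to avoid the KeyError, sketched with a hypothetical helper (`set_spatial_datatype` is not a pygeometa function), is to create the `spatial` section on demand before writing to it:

```python
def set_spatial_datatype(mcf, datatype='vector'):
    """Set mcf['spatial']['datatype'], creating 'spatial' if it is missing."""
    spatial = mcf.setdefault('spatial', {})
    spatial['datatype'] = datatype
    return mcf
```

`setdefault` leaves an existing `spatial` section untouched, so keys that were already set there are preserved.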

@jmckenna
Member

seconded, good work, but testing through https://validator.schema.org/ is critical (I wish there was an API available to validate, instead of manually testing through the validator).

@pvgenuchten
Contributor

The validator is not available as a service, but a SHACL-oriented test is available at https://github.com/google/schemarama/blob/main/core/test/shacl-test.js

@tomkralidis
Member Author

@jmckenna @pvgenuchten I dusted off this PR and pushed some updates:

import: successful import from schema-org/JSON-LD (sample) to MCF, as well as examples (thanks @pvgenuchten) in:

export: successful export of sample.yml and validated against https://validator.schema.org/ (0 errors, 0 warnings).

Development

Successfully merging this pull request may close these issues.

add schema.org schema

3 participants