|
| 1 | + |
| 2 | +semantic_standardization |
| 3 | +========================== |
| 4 | + |
| 5 | +This project is currently a proof of concept (POC) exploring the standardization of terms using the vocabulary `Istat-Classificazione-08-Territorio` and the ontology `CLV-AP_IT`. |
| 6 | +The components are currently designed to use in-memory storage for only that ontology and vocabulary, but they can be extended to work the same way for other use cases and ontology/vocabulary pairs. |
| 7 | + |
| 8 | +Two endpoints are provided: |
| 9 | + |
| 10 | +1. the first one retrieves a flat representation of a vocabulary (conceptually similar to a CSV, but in JSON), using an ad-hoc SPARQL query. |
| 11 | +2. the second one exposes the list of ontology properties actually used in a vocabulary, returning the "local" hierarchy for each property. |
| 12 | + |
| 13 | +The idea is that each endpoint (and its configured queries) serves a very specific domain, so future versions could introduce new vocabularies and ontologies, but they will need ad-hoc SPARQL queries to retrieve the required information. |
| 14 | + |
| 15 | +## semantic annotation in DAF ingestion |
| 16 | + |
| 17 | +The [DAF](https://github.com/italia/daf) `semantic_annotation` currently has the following structure: `{ontology}.{concept}.{property}`. |
| 18 | +During the ingestion phase of datasets in the DAF platform, a `semantic_annotation` is used to relate a column of a dataset to the most appropriate property of an existing concept from the controlled vocabularies. |
| 19 | + |
| 20 | +**Note** that while the annotation is used to relate cells to vocabularies, it does not explicitly save a reference to the vocabularies used. A reference to a concept from an ontology is used instead. |
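To make the `{ontology}.{concept}.{property}` structure concrete, here is a minimal Python sketch that splits a tag into its three parts. The function and class names are hypothetical and are not part of this project; the field names simply mirror the `ontology_id`, `concept_id` and `property_id` keys of the lookup response shown in the examples. It assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class SemanticAnnotation(NamedTuple):
    """The three parts of a semantic_annotation tag (hypothetical helper type)."""
    ontology_id: str
    concept_id: str
    property_id: str


def parse_semantic_annotation(tag: str) -> SemanticAnnotation:
    """Split a tag of the form {ontology}.{concept}.{property}.

    Assumption: the separator is a dot and no part contains one.
    """
    parts = tag.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected ontology.concept.property, got {tag!r}")
    return SemanticAnnotation(*parts)


annotation = parse_semantic_annotation(
    "POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier"
)
print(annotation.ontology_id)   # POI-AP_IT
print(annotation.concept_id)    # PointOfInterestCategory
print(annotation.property_id)   # POIcategoryIdentifier
```

A tag parsed this way identifies only the ontology, concept and property; the vocabulary still has to be resolved through the lookup endpoint.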
| 21 | + |
| 22 | + |
| 23 | +## examples |
| 24 | + |
| 25 | + |
| 26 | +### example: sequence of calls |
| 27 | + |
| 28 | +1. retrieve the (vocabulary, ontology) reference from the `semantic_annotation` tag |
| 29 | +``` |
| 30 | +curl -X GET "http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation=POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier" -H "accept: application/json" -H "content-type: application/json" |
| 31 | +``` |
| 32 | + |
| 33 | +2. retrieve the hierarchies for the properties used in the vocabulary |
| 34 | +``` |
| 35 | +curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=POICategoryClassification&ontology_name=poiapit&lang=it" -H "accept: application/json" -H "content-type: application/json" |
| 36 | +``` |
| 37 | + |
| 38 | +3. retrieve the dataset values for a given vocabulary |
| 39 | +``` |
| 40 | +curl -X GET "http://localhost:9000/kb/v1/vocabularies/POICategoryClassification?lang=it" -H "accept: application/json" -H "content-type: application/json" |
| 41 | +``` |
| 41 | +``` |
| 42 | + |
| 43 | +---- |
| 44 | + |
| 45 | +### example: retrieving information from the semantic_annotation tag |
| 46 | +With this endpoint we can retrieve information about the vocabulary/ontology pair related to a given `semantic_annotation` tag: |
| 47 | + |
| 48 | +``` |
| 49 | +curl -X GET "http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation={semantic_annotation}" \ |
| 50 | +-H "accept: application/json" -H "content-type: application/json" |
| 51 | +``` |
| 52 | + |
| 53 | +For example, for the Point Of Interest vocabulary: |
| 54 | + |
| 55 | +``` |
| 56 | +curl -X GET 'http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation=POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier' \ |
| 57 | +-H "accept: application/json" -H "content-type: application/json" |
| 58 | +``` |
| 59 | + |
| 60 | +This will return a data structure similar to the following one for each tag: |
| 61 | + |
| 62 | +``` |
| 63 | +[ |
| 64 | + { |
| 65 | + "vocabulary_id": "POICategoryClassification", |
| 66 | + "vocabulary": "http://dati.gov.it/onto/controlledvocabulary/POICategoryClassification", |
| 67 | + "ontology": "http://dati.gov.it/onto/poiapit", |
| 68 | + "semantic_annotation": "POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier", |
| 69 | + "property_id": "POIcategoryIdentifier", |
| 70 | + "concept_id": "PointOfInterestCategory", |
| 71 | + "ontology_prefix": "poiapit", |
| 72 | + "ontology_id": "POI-AP_IT", |
| 73 | + "concept": "http://dati.gov.it/onto/poiapit#PointOfInterestCategory", |
| 74 | + "property": "http://dati.gov.it/onto/poiapit#POIcategoryIdentifier" |
| 75 | + } |
| 76 | +] |
| 77 | +``` |
| 78 | + |
| 79 | +The idea is to have as much information as possible, in order to eventually relate the annotation to ontologies and vocabularies. |
| 80 | + |
| 81 | + |
| 82 | +### example: retrieving a vocabulary dataset |
| 83 | + |
| 84 | +We can obtain a de-normalized, tabular version of the vocabulary `Istat-Classificazione-08-Territorio` using the curl call: |
| 85 | + |
| 86 | +``` |
| 87 | +curl -X GET "http://localhost:9000/kb/v1/vocabularies/{vocabulary_name}?lang={lang}" \ |
| 88 | +-H "accept: application/json" -H "content-type: application/json" |
| 89 | +``` |
| 90 | + |
| 91 | +A `SPARQL` query is used to create a proper tabular representation of the data. |
| 92 | + |
| 93 | +#### example: PointOfInterest / POI-AP_IT |
| 94 | + |
| 95 | +``` |
| 96 | +curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=POICategoryClassification&ontology_name=poiapit&lang=it" -H "accept: application/json" -H "content-type: application/json" |
| 97 | +``` |
| 98 | + |
| 99 | +This will return a data structure similar to the following one: |
| 100 | + |
| 101 | +``` |
| 102 | +[ |
| 103 | + { |
| 104 | + "vocabulary": "POI-AP_IT", |
| 105 | + "path": "POI-AP_IT.PointOfInterestCategory.definition", |
| 106 | + "hierarchy_flat": "PointOfInterestCategory", |
| 107 | + "hierarchy": [ |
| 108 | + { |
| 109 | + "class": "PointOfInterestCategory", |
| 110 | + "level": 0 |
| 111 | + } |
| 112 | + ] |
| 113 | + }, |
| 114 | + ... |
| 115 | +] |
| 116 | +``` |
| 117 | + |
| 118 | + |
| 119 | +#### example: Luoghi Istat / CLV-AP_IT |
| 120 | +``` |
| 121 | +$ curl -X GET "http://localhost:9000/kb/v1/vocabularies/Istat-Classificazione-08-Territorio?lang=it" -H "accept: application/json" -H "content-type: application/json" |
| 122 | +``` |
| 123 | +This will return a result structure similar to the following one: |
| 124 | + |
| 125 | +``` |
| 126 | +[ |
| 127 | + [ |
| 128 | + { "key": "CLV-AP_IT_Country_name", "value": "Italia"}, |
| 129 | + {"key": "CLV-AP_IT_City_name", "value": "Abano Terme"}, |
| 130 | + {"key": "CLV-AP_IT_Province_name", "value": "Padova"}, |
| 131 | + {"key": "CLV-AP_IT_Region_name", "value": "Veneto"} |
| 132 | + ], |
| 133 | + [ |
| 134 | + {"key":"CLV-AP_IT_Province_name", "value": "Lodi"}, |
| 135 | + {"key":"CLV-AP_IT_City_name", "value": "Abbadia Cerreto"}, |
| 136 | + {"key": "CLV-AP_IT_Country_name", "value": "Italia"}, |
| 137 | + {"key": "CLV-AP_IT_Region_name", "value": "Lombardia"} |
| 138 | + ], |
| 139 | + ... |
| 140 | +] |
| 141 | +``` |
| 142 | + |
| 143 | +For technical reasons, a key such as `CLV-AP_IT_Region_name` is currently used in place of `CLV-AP_IT.Region.name`. |
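A client that prefers the dotted `{ontology}.{concept}.{property}` form can convert the underscore-separated keys back. The Python sketch below is only an illustration with hypothetical function names. It relies on one assumption that the source does not state: concept and property names never contain an underscore, so the last two underscores are the separators (the ontology id itself, `CLV-AP_IT`, does contain one).

```python
from typing import Dict, List


def dotted_key(flat_key: str) -> str:
    """Turn 'CLV-AP_IT_Region_name' into 'CLV-AP_IT.Region.name'.

    Assumption: only the last two underscores are separators, because the
    ontology id (e.g. CLV-AP_IT) contains an underscore of its own.
    """
    ontology_id, concept, prop = flat_key.rsplit("_", 2)
    return f"{ontology_id}.{concept}.{prop}"


def row_to_record(row: List[Dict[str, str]]) -> Dict[str, str]:
    """Collapse one row of {"key": ..., "value": ...} cells into a single dict."""
    return {dotted_key(cell["key"]): cell["value"] for cell in row}


# One row copied from the response shown above.
row = [
    {"key": "CLV-AP_IT_Country_name", "value": "Italia"},
    {"key": "CLV-AP_IT_City_name", "value": "Abano Terme"},
    {"key": "CLV-AP_IT_Province_name", "value": "Padova"},
    {"key": "CLV-AP_IT_Region_name", "value": "Veneto"},
]
record = row_to_record(row)
print(record["CLV-AP_IT.City.name"])  # Abano Terme
```

Splitting from the right keeps the conversion independent of how many underscores or hyphens the ontology id contains.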
| 144 | + |
| 145 | +### example: retrieve the hierarchies for the properties used |
| 146 | + |
| 147 | +If we have the example vocabulary `Istat-Classificazione-08-Territorio`, which uses terms from the ontology `clvapit`, we can retrieve the local hierarchy associated with each property using the curl command: |
| 148 | + |
| 149 | +``` |
| 150 | +$ curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name={vocabulary_name}&ontology_name={ontology_prefix}&lang={lang}" \ |
| 151 | +-H "accept: application/json" -H "content-type: application/json" |
| 152 | +``` |
| 153 | + |
| 154 | +#### example: POI / POI-AP_IT |
| 155 | + |
| 156 | +``` |
| 157 | +curl -X GET "http://localhost:9000/kb/v1/vocabularies/POICategoryClassification?lang=it" \ |
| 158 | +-H "accept: application/json" -H "content-type: application/json" |
| 159 | +``` |
| 160 | + |
| 161 | +which will return the results: |
| 162 | + |
| 163 | +``` |
| 164 | +[ |
| 165 | + [ |
| 166 | + { |
| 167 | + "key": "POI-AP_IT_PointOfInterestCategory_definition", |
| 168 | + "value": "Rientrano in questa categoria tutti i punti di interesse connessi all'intrattenimento come zoo, discoteche, pub, teatri, acquari, stadi, casino, parchi divertimenti, ecc." |
| 169 | + }, |
| 170 | + { |
| 171 | + "key": "POI-AP_IT_PointOfInterestCategory_POICategoryName", |
| 172 | + "value": "Settore intrattenimento" |
| 173 | + }, |
| 174 | + { |
| 175 | + "key": "POI-AP_IT_PointOfInterestCategory_POICategoryIdentifier", |
| 176 | + "value": "cat_1" |
| 177 | + } |
| 178 | + ], |
| 179 | + ... |
| 180 | +] |
| 181 | +``` |
| 182 | + |
| 183 | + |
| 184 | +#### example: Luoghi Istat / CLV-AP_IT |
| 185 | + |
| 186 | +``` |
| 187 | +$ curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=Istat-Classificazione-08-Territorio&ontology_name=clvapit&lang=it" \ |
| 188 | +-H "accept: application/json" -H "content-type: application/json" |
| 189 | +``` |
| 190 | + |
| 191 | +which will return the results: |
| 192 | + |
| 193 | +``` |
| 194 | +[ |
| 195 | + { |
| 196 | + "vocabulary": "CLV-AP_IT", |
| 197 | + "path": "CLV-AP_IT.Country.name", |
| 198 | + "hierarchy_flat": "Country", |
| 199 | + "hierarchy": "hierarchy" |
| 200 | + }, |
| 201 | + { |
| 202 | + "vocabulary": "CLV-AP_IT", |
| 203 | + "path": "CLV-AP_IT.City.name", |
| 204 | + "hierarchy_flat": "Country.Region.Province.City", |
| 205 | + "hierarchy": "hierarchy" |
| 206 | + }, |
| 207 | + ... |
| 208 | +] |
| 209 | +``` |
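The dotted `hierarchy_flat` string can be expanded on the client side into the list-of-levels shape that the POI response uses for its `hierarchy` field. The Python sketch below uses a hypothetical function name and assumes that the leftmost class is the root at level 0 and that each further class adds one level. That numbering is inferred from the single-level POI example and is not confirmed for deeper hierarchies.

```python
from typing import Dict, List, Union


def expand_hierarchy(hierarchy_flat: str) -> List[Dict[str, Union[str, int]]]:
    """Expand 'Country.Region.Province.City' into a list of class/level entries.

    Assumption: the leftmost class is the root (level 0) and levels grow
    by one for every following class.
    """
    return [
        {"class": name, "level": level}
        for level, name in enumerate(hierarchy_flat.split("."))
    ]


for entry in expand_hierarchy("Country.Region.Province.City"):
    print(entry["level"], entry["class"])
```

For a single-class value such as `PointOfInterestCategory` this yields one entry at level 0, matching the POI response shown earlier.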
| 210 | + |
| 211 | + |
| 212 | +### example configurations |
| 213 | + |
| 214 | +An example configuration for working with a vocabulary (VocabularyAPI): |
| 215 | + |
| 216 | +``` |
| 217 | +"data_dir": "./data" |
| 218 | + |
| 219 | +"Istat-Classificazione-08-Territorio" { |
| 220 | + |
| 221 | + vocabulary.name: "Istat-Classificazione-08-Territorio" |
| 222 | + |
| 223 | + vocabulary.ontology.name: "CLV-AP_IT" |
| 224 | + |
| 225 | + vocabulary.ontology.prefix: "clvapit" |
| 226 | + |
| 227 | + vocabulary.file: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio.ttl" |
| 228 | + |
| 229 | + vocabulary.contexts: [ "http://dati.gov.it/onto/clvapit#" ] |
| 230 | + |
| 231 | + vocabulary.query.csv: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio#dataset.csv.sparql" |
| 232 | + |
| 233 | +} |
| 234 | +``` |
| 235 | + |
| 236 | +The `vocabulary.query.csv` is a reference to a SPARQL query designed to produce a flat representation of the vocabulary information. |
| 237 | + |
| 238 | + |
| 239 | +An example configuration for working with an ontology (OntologyAPI) could be similar to the following one: |
| 240 | + |
| 241 | +``` |
| 242 | +clvapit { |
| 243 | + |
| 244 | + ontology.name: "CLV-AP_IT" |
| 245 | + ontology.prefix: "clvapit" |
| 246 | + |
| 247 | + ontology.file: ${data_dir}"/ontologies/agid/CLV-AP_IT/CLV-AP_IT.ttl" |
| 248 | + |
| 249 | + ontology.contexts: [ "http://dati.gov.it/onto/clvapit#" ] |
| 250 | + |
| 251 | + ontology.query.hierarchy: ${data_dir}"/ontologies/agid/CLV-AP_IT/CLV-AP_IT.hierarchy.sparql" |
| 252 | +} |
| 253 | +``` |
| 254 | +The `ontology.query.hierarchy` is a reference to a SPARQL query designed to retrieve the hierarchy of the ontology properties. |
| 255 | + |
| 256 | + |
| 257 | +* * * |
| 258 | + |
| 259 | +**Note** that `${data_dir}` can be replaced with a specific root path on disk: at this stage of development this will be a relative folder (for example: `/dist/data` for the sbt project). |
| 260 | + |
| 261 | +Eventually, pre-loading ontologies and vocabularies from disk can be replaced with an import from a central datastore (dedicated to maintaining the latest version of the ontologies), where they are already loaded under conventional paths/names. This way we will be able to switch from tiny in-memory repositories (one for each ontology/vocabulary) to a central RDF/SPARQL repository containing all the pre-loaded ontologies and vocabularies. |
| 262 | + |
| 263 | + |
| 264 | +---- |
| 265 | + |
| 266 | +## TODO |
| 267 | + |
| 268 | ++ more documentation / comments |
| 269 | ++ more proper tests |
| 270 | ++ remove the redundant classes for RDFRepository, importing the external kb-core dependency instead |
| 271 | + |
| 272 | + |
| 273 | +## known ISSUES |
| 274 | + |
| 275 | +... |