Commit 0efaa22

Merge commit '89cb8ab2623eb3e7fad06c1eda45ffa5a0d950ba' as 'semantic_standardization'
2 parents bb3c0ea + 89cb8ab commit 0efaa22

55 files changed, +136506 -0 lines changed

semantic_standardization/.gitignore

+50
@@ -0,0 +1,50 @@

.idea
.DS_Store
.history

# Eclipse #
.classpath
.project
.settings/
target


# Play! #
logs
.swagger-codegen-ignore
client/.gitignore
client/.swagger-codegen-ignore
client/build.gradle
client/build.sbt
client/git_push.sh
client/gradle.properties
client/gradle/
client/gradlew
client/gradlew.bat
client/pom.xml
client/settings.gradle
client/src/

# Play! #
bin/
/db
.eclipse
/lib/
/logs/
/modules
/project/project
/project/target
/target
tmp/
test-result
server.pid
*.eml
#/dist/
.cache
.cache-main
.cache-tests

# extra #
NO__lib

semantic_standardization/README.md

+275
@@ -0,0 +1,275 @@

semantic_standardization
==========================

This project is currently a proof of concept (POC) exploring the standardization of terms using the vocabulary `Istat-Classificazione-08-Territorio` and the ontology `CLV-AP_IT`.
The component is currently designed to keep only this ontology and vocabulary in an in-memory store, but it can be extended to work in a similar way for other use cases and other ontology/vocabulary pairs.

Two endpoints are provided:

1. the first one retrieves a flat representation of a vocabulary (conceptually similar to a CSV, but in JSON), using an ad-hoc SPARQL query.
2. the second one exposes the list of ontology properties actually used in a vocabulary, returning the "local" hierarchy for each property.

The idea is that each endpoint (and its configured queries) serves a very specific domain: future versions could introduce new vocabularies and ontologies, but each of them will need ad-hoc SPARQL queries to retrieve the required information.

## semantic annotation in DAF ingestion

The [DAF](https://github.com/italia/daf) `semantic_annotation` currently has the following structure: `{ontology}.{concept}.{property}`.
During the ingestion of a dataset into the DAF platform, a `semantic_annotation` is used to relate a column of the dataset to the most appropriate property of an existing concept from the controlled vocabularies.

**Note** that while the annotation is used to relate cells to vocabularies, it does not explicitly store a reference to the vocabularies used: a reference to a concept from an ontology is stored instead.


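As a minimal sketch of the `{ontology}.{concept}.{property}` structure described above, the following shell function splits a tag into its three parts. The function name `parse_annotation` and its output format are hypothetical and not part of the project, and the sketch assumes that none of the three parts contains a dot.

```shell
# Hypothetical helper (not part of the project): split a semantic_annotation
# tag of the form {ontology}.{concept}.{property} into its three parts.
# Assumption: none of the parts contains a dot.
# Note: in plain POSIX sh the variables set by read are global.
parse_annotation() {
  # IFS=. makes read split on dots; "extra" collects any unexpected segment.
  IFS=. read -r ontology concept property extra <<EOF
$1
EOF
  if [ -z "$ontology" ] || [ -z "$concept" ] || [ -z "$property" ] || [ -n "$extra" ]; then
    printf 'invalid semantic_annotation: %s\n' "$1" >&2
    return 1
  fi
  printf 'ontology=%s concept=%s property=%s\n' "$ontology" "$concept" "$property"
}
```

For example, `parse_annotation 'POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier'` prints the three parts on one line, while a tag with fewer or more than three segments is rejected with a non-zero exit status.
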
## examples


### example: sequence of calls

1. retrieve the (vocabulary, ontology) reference from the `semantic_annotation` tag
```
curl -X GET "http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation=POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier" -H "accept: application/json" -H "content-type: application/json"
```

2. retrieve the hierarchies for a given property
```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=POICategoryClassification&ontology_name=poiapit&lang=it" -H "accept: application/json" -H "content-type: application/json"
```

3. retrieve the dataset values for a given vocabulary
```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/POICategoryClassification?lang=it" -H "accept: application/json" -H "content-type: application/json"
```

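The three calls above can be wrapped in small shell helpers that build each URL as a single string, so that it can be passed to curl inside double quotes and the shell never interprets `?` or `&`. This is only a sketch: the function names and the `KB_BASE` variable are hypothetical, and the arguments are assumed to be URL-safe (no encoding is performed).

```shell
# Hypothetical helpers (not part of the project) that build the endpoint URLs.
# KB_BASE is an assumed variable; it defaults to the base URL used in the examples.
KB_BASE="${KB_BASE:-http://localhost:9000/kb/v1}"

# 1. (vocabulary, ontology) lookup for a semantic_annotation tag
lookup_url() {
  printf '%s/daf/annotation/lookup?semantic_annotation=%s\n' "$KB_BASE" "$1"
}

# 2. hierarchies of the properties: vocabulary name, ontology prefix, language
hierarchies_url() {
  printf '%s/hierarchies/properties?vocabulary_name=%s&ontology_name=%s&lang=%s\n' \
    "$KB_BASE" "$1" "$2" "$3"
}

# 3. flat dataset of a vocabulary: vocabulary name, language
vocabulary_url() {
  printf '%s/vocabularies/%s?lang=%s\n' "$KB_BASE" "$1" "$2"
}
```

A call then looks like `curl -X GET "$(vocabulary_url POICategoryClassification it)" -H "accept: application/json"`, with the command substitution kept inside double quotes.
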
----

### example: retrieve information from the semantic_annotation tag
With this endpoint we can retrieve information about the vocabulary/ontology pair related to a given `semantic_annotation` tag:

```
curl -X GET "http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation={semantic_annotation}" \
  -H "accept: application/json" -H "content-type: application/json"
```

for example, for the Point Of Interest vocabulary:

```
curl -X GET 'http://localhost:9000/kb/v1/daf/annotation/lookup?semantic_annotation=POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier' \
  -H "accept: application/json" -H "content-type: application/json"
```

This will return a data structure similar to the following one for each tag:

```
[
  {
    "vocabulary_id": "POICategoryClassification",
    "vocabulary": "http://dati.gov.it/onto/controlledvocabulary/POICategoryClassification",
    "ontology": "http://dati.gov.it/onto/poiapit",
    "semantic_annotation": "POI-AP_IT.PointOfInterestCategory.POIcategoryIdentifier",
    "property_id": "POIcategoryIdentifier",
    "concept_id": "PointOfInterestCategory",
    "ontology_prefix": "poiapit",
    "ontology_id": "POI-AP_IT",
    "concept": "http://dati.gov.it/onto/poiapit#PointOfInterestCategory",
    "property": "http://dati.gov.it/onto/poiapit#POIcategoryIdentifier"
  }
]
```

The idea is to provide as much information as possible, so that the annotation can eventually be related to ontologies and vocabularies.


### example: retrieving a vocabulary dataset

We can obtain a de-normalized, tabular version of a vocabulary (for example `Istat-Classificazione-08-Territorio`) using the curl call:

```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/{vocabulary_name}?lang={lang}" \
  -H "accept: application/json" -H "content-type: application/json"
```

A `SPARQL` query is used to create a proper tabular representation of the data.

#### example: PointOfInterest / POI-AP_IT

```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/POICategoryClassification?lang=it" \
  -H "accept: application/json" -H "content-type: application/json"
```

this will return a data structure:

```
[
  [
    {
      "key": "POI-AP_IT_PointOfInterestCategory_definition",
      "value": "Rientrano in questa categoria tutti i punti di interesse connessi all'intrattenimento come zoo, discoteche, pub, teatri, acquari, stadi, casino, parchi divertimenti, ecc."
    },
    {
      "key": "POI-AP_IT_PointOfInterestCategory_POICategoryName",
      "value": "Settore intrattenimento"
    },
    {
      "key": "POI-AP_IT_PointOfInterestCategory_POICategoryIdentifier",
      "value": "cat_1"
    }
  ],
  ...
]
```


#### example: Luoghi Istat / CLV-AP_IT
```
curl -X GET "http://localhost:9000/kb/v1/vocabularies/Istat-Classificazione-08-Territorio?lang=it" -H "accept: application/json" -H "content-type: application/json"
```
this will return a result structure similar to the following one:

```
[
  [
    {"key": "CLV-AP_IT_Country_name", "value": "Italia"},
    {"key": "CLV-AP_IT_City_name", "value": "Abano Terme"},
    {"key": "CLV-AP_IT_Province_name", "value": "Padova"},
    {"key": "CLV-AP_IT_Region_name", "value": "Veneto"}
  ],
  [
    {"key": "CLV-AP_IT_Province_name", "value": "Lodi"},
    {"key": "CLV-AP_IT_City_name", "value": "Abbadia Cerreto"},
    {"key": "CLV-AP_IT_Country_name", "value": "Italia"},
    {"key": "CLV-AP_IT_Region_name", "value": "Lombardia"}
  ]
  ...
]
```

For technical reasons, a key such as `CLV-AP_IT_Region_name` is currently used in place of `CLV-AP_IT.Region.name`.

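The relation between the dotted path and the underscore key can be sketched with two small shell functions. Both names are hypothetical and not part of the project. Because the ontology id itself contains an underscore (`CLV-AP_IT`), the reverse mapping is ambiguous unless the id is known, so the sketch takes it as an explicit argument and assumes that concept and property names contain no underscore.

```shell
# Hypothetical helpers (not part of the project).

# Turn a dotted annotation path into the underscore key used in the results.
path_to_key() {
  printf '%s\n' "$1" | tr '.' '_'
}

# Reverse mapping: $1 is the key, $2 the ontology id (e.g. CLV-AP_IT).
# Assumption: concept and property names contain no underscore.
key_to_path() {
  prefix="$2"_
  case $1 in
    "$prefix"*) ;;
    *) printf 'key %s does not start with %s\n' "$1" "$prefix" >&2; return 1 ;;
  esac
  rest=${1#"$prefix"}   # drop the leading "<ontology id>_"
  printf '%s.%s\n' "$2" "$(printf '%s' "$rest" | tr '_' '.')"
}
```

With these definitions, `path_to_key CLV-AP_IT.Region.name` yields `CLV-AP_IT_Region_name`, and `key_to_path CLV-AP_IT_Region_name CLV-AP_IT` maps it back.
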
### example: retrieve the hierarchies for the properties used

If we have the example vocabulary `Istat-Classificazione-08-Territorio`, which uses terms from the ontology `clvapit`, we can retrieve the local hierarchy associated with each property with the curl command:

```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name={vocabulary_name}&ontology_name={ontology_prefix}&lang={lang}" \
  -H "accept: application/json" -H "content-type: application/json"
```

#### example: POI / POI-AP_IT

```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=POICategoryClassification&ontology_name=poiapit&lang=it" \
  -H "accept: application/json" -H "content-type: application/json"
```

which will return results:

```
[
  {
    "vocabulary": "POI-AP_IT",
    "path": "POI-AP_IT.PointOfInterestCategory.definition",
    "hierarchy_flat": "PointOfInterestCategory",
    "hierarchy": [
      {
        "class": "PointOfInterestCategory",
        "level": 0
      }
    ]
  },
  ...
]
```


#### example: Luoghi Istat / CLV-AP_IT

```
curl -X GET "http://localhost:9000/kb/v1/hierarchies/properties?vocabulary_name=Istat-Classificazione-08-Territorio&ontology_name=clvapit&lang=it" \
  -H "accept: application/json" -H "content-type: application/json"
```

which will return the results:

```
[
  {
    "vocabulary": "CLV-AP_IT",
    "path": "CLV-AP_IT.Country.name",
    "hierarchy_flat": "Country",
    "hierarchy": "hierarchy"
  },
  {
    "vocabulary": "CLV-AP_IT",
    "path": "CLV-AP_IT.City.name",
    "hierarchy_flat": "Country.Region.Province.City",
    "hierarchy": "hierarchy"
  }
  ...
]
```


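The `hierarchy_flat` value reads as a dot-separated chain of classes. As a sketch, the function below expands such a value into one line per class with a numeric level, in the spirit of the `class`/`level` pairs that appear in the POI result. The function name is hypothetical, and numbering the leftmost class as level 0 is an assumption, since the examples only show a single-element hierarchy.

```shell
# Hypothetical helper (not part of the project): expand a hierarchy_flat value
# into "<level> <class>" lines.
# Assumption: the leftmost class is level 0 and levels grow to the right.
hierarchy_levels() {
  printf '%s\n' "$1" | tr '.' '\n' | awk '{ printf "%d %s\n", NR - 1, $0 }'
}
```

Calling `hierarchy_levels Country.Region.Province.City` prints four lines, from `0 Country` to `3 City`.
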
### example configurations

An example configuration for working with a vocabulary (VocabularyAPI):

```
"data_dir": "./data"

"Istat-Classificazione-08-Territorio" {

  vocabulary.name: "Istat-Classificazione-08-Territorio"

  vocabulary.ontology.name: "CLV-AP_IT"

  vocabulary.ontology.prefix: "clvapit"

  vocabulary.file: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio.ttl"

  vocabulary.contexts: [ "http://dati.gov.it/onto/clvapit#" ]

  vocabulary.query.csv: ${data_dir}"/vocabularies/Istat-Classificazione-08-Territorio#dataset.csv.sparql"

}
```

The `vocabulary.query.csv` is a reference to a SPARQL query designed to produce a flat representation of the vocabulary information.


An example configuration for working with an ontology (OntologyAPI) could be similar to the following one:

```
clvapit {

  ontology.name: "CLV-AP_IT"
  ontology.prefix: "clvapit"

  ontology.file: ${data_dir}"/ontologies/agid/CLV-AP_IT/CLV-AP_IT.ttl"

  ontology.contexts: [ "http://dati.gov.it/onto/clvapit#" ]

  ontology.query.hierarchy: ${data_dir}"/ontologies/agid/CLV-AP_IT/CLV-AP_IT.hierarchy.sparql"
}
```
The `ontology.query.hierarchy` is a reference to a SPARQL query designed to produce a flat representation of the ontology's hierarchy information.


* * *

**Note** that `${data_dir}` can be replaced with a specific root path on disk: at this stage of development it is a relative folder (for example: `/dist/data` for the sbt project).

Eventually, the idea of pre-loading ontologies and vocabularies from disk can be replaced with an import from a central datastore (dedicated to maintaining the latest version of the ontologies), where they are already loaded under conventional paths/names. This way we will be able to switch from a tiny in-memory repository (one for each ontology/vocabulary) to a central RDF/SPARQL repository containing all the pre-loaded ontologies and vocabularies.


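Since both example configurations resolve their files relative to `data_dir`, it can be handy to check that the referenced files exist before starting the service. The following is only a sketch: `check_data_dir` is a hypothetical helper, and the four relative paths are copied from the example configurations above. Note that the vocabulary query file name contains a `#`, so it has to be quoted in the shell.

```shell
# Hypothetical helper (not part of the project): verify that the files
# referenced by the example configurations exist under the given data_dir.
check_data_dir() {
  dir=${1:-./data}   # same default as the "data_dir" entry in the example
  missing=0
  for rel in \
    "vocabularies/Istat-Classificazione-08-Territorio.ttl" \
    "vocabularies/Istat-Classificazione-08-Territorio#dataset.csv.sparql" \
    "ontologies/agid/CLV-AP_IT/CLV-AP_IT.ttl" \
    "ontologies/agid/CLV-AP_IT/CLV-AP_IT.hierarchy.sparql"
  do
    if [ ! -f "$dir/$rel" ]; then
      printf 'missing: %s\n' "$dir/$rel" >&2
      missing=1
    fi
  done
  # 0 when every file was found, 1 otherwise
  return "$missing"
}
```

Running `check_data_dir ./data` lists any missing file on standard error and returns a non-zero status if at least one is absent.
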
----

## TODO

+ more documentation / comments
+ more proper tests
+ remove the redundant RDFRepository classes, importing the external kb-core dependency instead


## known ISSUES

...
@@ -0,0 +1,58 @@
/*
 * Copyright 2017 TEAM PER LA TRASFORMAZIONE DIGITALE
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import javax.inject._

import play.api.http.DefaultHttpErrorHandler
import play.api._
import play.api.mvc._
import play.api.mvc.Results._
import play.api.routing.Router

import scala.concurrent.Future

import de.zalando.play.controllers.PlayBodyParsing

/**
 * The purpose of this ErrorHandler is to override default play's error reporting with application/json content type.
 */
class ErrorHandler @Inject() (
    env: Environment,
    config: Configuration,
    sourceMapper: OptionalSourceMapper,
    router: Provider[Router]
) extends DefaultHttpErrorHandler(env, config, sourceMapper, router) {

  private def contentType(request: RequestHeader): String =
    request.acceptedTypes.map(_.toString).filterNot(_ == "text/html").headOption.getOrElse("application/json")

  override def onProdServerError(request: RequestHeader, exception: UsefulException) = {
    implicit val writer = PlayBodyParsing.anyToWritable[Throwable](contentType(request))
    Future.successful(InternalServerError(exception))
  }

  // called when a route is found, but it was not possible to bind the request parameters
  override def onBadRequest(request: RequestHeader, error: String): Future[Result] = {
    implicit val writer = PlayBodyParsing.anyToWritable[String](contentType(request))
    Future.successful(BadRequest("Bad Request: " + error))
  }

  // 404 - page not found error
  override def onNotFound(request: RequestHeader, message: String): Future[Result] = {
    implicit val writer = PlayBodyParsing.anyToWritable[String](contentType(request))
    Future.successful(NotFound(request.path))
  }
}
+10
@@ -0,0 +1,10 @@
/**
 * Created by ale on 06/06/17.
 */
import javax.inject.Inject

import play.api.http.DefaultHttpFilters
import play.filters.cors.CORSFilter

class Filters @Inject() (corsFilter: CORSFilter)
  extends DefaultHttpFilters(corsFilter)
