Tutorial
Have you ever wondered what's behind the closed doors of the museum? What's hidden away in the storerooms? Where all the objects come from and what they're made of? With only 10% of the Museum’s objects on display, how can we answer these broader questions about the collection?
In a landmark Future Museum project we have opened up our catalogue so that you can begin to explore and investigate 163 years of your Museum’s collecting history.
Auckland Museum’s Collections Online celebrated its first anniversary last year. Now, researchers, educators and the public have free and open access to around one million collection records. We have released the collection using the principles of Linked Open Data, which means that, not only is it available on our museum website, it can also be searched via any number of specialist aggregators and applications.
For the past 160 years, our Curators, Collection Managers, and volunteers have been creating descriptions, classifications and taxonomies — meaning we have amassed a huge amount of data about the objects in our care.
One of the aims of Future Museum is to open up our collections and engage with online communities. The public API, which allows detailed open access to collection data, is a major part of our 'open-first' approach that fulfils this aim.
An Application Programming Interface (API) is essentially a set of instructions that tells two pieces of software how to communicate with one another, allowing you to build an app or integrate your website into another, for example.
If you have used Collections Online on our website, then you've already seen the API in action. When you put in keywords and hit enter, your browser makes a request to the API, which delivers the data as search results. With the API, you can bypass the website and send requests directly to the database that runs the website. This is particularly useful if you're a researcher or app developer who needs to work with the raw data.
Our API provides responses using the JSON data standard. In order to view the data that the API sends back, you'll need some way to look at JSON-formatted data. Most modern browsers (e.g. Chrome, Safari, and Firefox) have extensions that can do this for you.
The most basic form of search API is the empty search, which doesn’t specify any query but simply returns all records: http://api.aucklandmuseum.com/search/_search
We could also specify which index we want to search over:
`api.aucklandmuseum.com/search/{index}/_search`
Use the `collectionsonline` index to perform searches over all Collections data.
http://api.aucklandmuseum.com/search/collectionsonline/_search
Use the `cenotaph` index to perform searches over just Cenotaph data.
http://api.aucklandmuseum.com/search/cenotaph/_search
By default, a search will only return the first 10 results. The Pagination section explains how to change this.
Next, let’s try searching all text fields for the word "cat". To do this, we’ll use a lightweight search method that is easy to use. This method is often referred to as a query-string search, since we pass the search as a URL query-string parameter.
We use the same `_search` endpoint in the path, and we add the query itself in a `q=` parameter.
api.aucklandmuseum.com/search/_search?q=
http://api.aucklandmuseum.com/search/_search?q=cat
or with a specified index
api.aucklandmuseum.com/search/{index}/_search?q=
http://api.aucklandmuseum.com/search/collectionsonline/_search?q=cat
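The URL patterns above can be assembled programmatically. The following is a minimal Python sketch, not part of the museum's own tooling: `build_search_url` is a hypothetical helper name, and it only builds the URL string without sending a request.

```python
from urllib.parse import quote

BASE_URL = "http://api.aucklandmuseum.com/search"


def build_search_url(query=None, index=None):
    """Return a _search URL, optionally scoped to an index and a q= query."""
    path = f"{BASE_URL}/{index}/_search" if index else f"{BASE_URL}/_search"
    if query is None:
        return path  # empty search: no query specified
    # quote() percent-encodes spaces and reserved characters; ":" is kept
    # readable so field searches such as dc_description:dog stay legible.
    return f"{path}?q={quote(query, safe=':')}"


print(build_search_url())
print(build_search_url("cat"))
print(build_search_url("cat", index="collectionsonline"))
```

The three calls reproduce the empty search, the all-index query-string search, and the index-specific search shown above.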
#Decoding the results
At the top of your results you see that the query was successful: the `hits` section shows the total number of records that matched our search query. Each record is also given a relevance `_score`, which is a measure of how well the document matches the query. By default, results are returned with the most relevant documents first. The `max_score` is the highest `_score` of any document that matches our query.
},
"hits": {
"total": 416,
"max_score": 4.5924325,
"hits": [
Below this section are the first 10 hits
"_index": "collectionsonline-2016-10-18-1",
"_type": "ecrm:E20_Biological_Object",
"_id": "http://api.aucklandmuseum.com/id/naturalsciences/object/261552",
"_score": 1.0,
"_source": {
The `_index` specifies which index the result is from; the `_type` is the high-level categorisation of the object. We have seven top-level categories:
_Type | usage |
---|---|
ecrm:E20_Biological_Object | Objects from the Natural Science Collection |
ecrm:E22_Man-Made_Object | 3D Objects from the Human History Collection |
ecrm:E84_Information_Carrier | 2D Objects from the Documentary Heritage Collection |
am:MilitaryPerson | Online Cenotaph Records - military records of New Zealanders |
ecrm:E21_Person | Non Cenotaph Person records (Field Collectors, Artists, Creators etc.) |
am:Corporation | Non-Cenotaph Corporation Records |
am:vessel | Ships and transport vessels associated with Cenotaph |
The `_id` is the unique reference for the object - you can follow these links to view the full record page. http://api.aucklandmuseum.com/id/naturalsciences/object/261552
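Once a response has been parsed from JSON, the fields described above can be read like any nested dictionary. The sketch below is illustrative only: `sample_response` is stitched together from the fragments shown in this section (with an empty `_source` as a placeholder), and `summarise_hits` is a hypothetical helper name.

```python
# Sample assembled from the fragments shown above; "_source" is left empty
# here as a placeholder for the record's actual fields.
sample_response = {
    "hits": {
        "total": 416,
        "max_score": 4.5924325,
        "hits": [
            {
                "_index": "collectionsonline-2016-10-18-1",
                "_type": "ecrm:E20_Biological_Object",
                "_id": "http://api.aucklandmuseum.com/id/naturalsciences/object/261552",
                "_score": 1.0,
                "_source": {},
            }
        ],
    }
}


def summarise_hits(response):
    """Return (total, max_score, [(id, type, score), ...]) from a search response."""
    hits = response["hits"]
    records = [(hit["_id"], hit["_type"], hit["_score"]) for hit in hits["hits"]]
    return hits["total"], hits["max_score"], records


total, max_score, records = summarise_hits(sample_response)
print(f"{total} matches, best score {max_score}")
for record_id, record_type, score in records:
    print(record_type, record_id, score)
```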
#Query string syntax
syntax | usage | Example | link |
---|---|---|---|
- | Must not be present | Search for ice axe and not Hillary | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=ice axe -hillary |
+ | Must be present | Search for Hillary, Nepal must be present | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=hillary +nepal |
OR | Contains either search term | Search for either Hillary or Tenzing | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=hillary OR tenzing |
_missing_ | Field has no value | Results missing a Title field | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=_missing_:dc_title |
_exists_ | Field must have a value | Results must contain the language field | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=_exists_:language |
Due to the number of Person records in the system you may wish to add the following to the end of any general queries:
-type:am_MilitaryPerson -type:ecrm_E21
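When these operators are sent from code rather than typed into a browser, the query should be percent-encoded. In particular, many servers decode a raw `+` in a URL query string as a space, which would silently drop the "must be present" operator. The sketch below uses Python's standard `urllib.parse`; `encode_query` is a hypothetical helper name, and how the museum's server treats an unencoded `+` is an assumption.

```python
from urllib.parse import quote, unquote_plus


def encode_query(query):
    """Percent-encode a query-string search so operators survive the URL."""
    # safe="" means nothing reserved is left raw: "+" becomes %2B, " " becomes %20.
    return quote(query, safe="")


encoded = encode_query("hillary +nepal")
print(encoded)

# Decoding the encoded form the way form-style decoders do restores the "+".
print(unquote_plus(encoded))

# Decoding the raw, unencoded query the same way turns "+" into a space.
print(unquote_plus("hillary +nepal"))
```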
#Wildcards and Fuzziness
syntax | usage | Example |
---|---|---|
? | replace a single character | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=auc?land |
* | replace zero or more characters | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=auck* |
~ | search for terms that are similar to, but not exactly like our search terms | http://api.aucklandmuseum.com/search/collectionsonline/_search?q=aptery~1 |
#Simple Range Searches
Inclusive ranges are specified with square brackets [min TO max]
dc_date:[2012-01-01 TO 2012-12-31]
http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_date:[1950-01-01 TO 2012-01-01]
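An inclusive range clause can be generated from date values rather than typed by hand. This is a small Python sketch under the assumption that the field holds ISO-formatted dates, as in the `dc_date` example; `date_range_clause` is a hypothetical helper name.

```python
from datetime import date


def date_range_clause(field, start, end):
    """Build an inclusive range clause in the form field:[min TO max]."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{field}:[{start.isoformat()} TO {end.isoformat()}]"


clause = date_range_clause("dc_date", date(2012, 1, 1), date(2012, 12, 31))
print(clause)
```

The resulting clause can be passed as the `q=` value, percent-encoded like any other query.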
#Boosting
We can use the boost operator `^` to make one term more relevant than another.
In this example we want all vases, but we are particularly interested in vases with flowers on them:
http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_title:vase +flower^2
The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance.
Boosts can also be applied to phrases or to groups
#Fields available for query-string searches
Instead of an all-text-field search we can also specify certain fields in a query-string search. If we only wanted to search the description field we would use:
http://api.aucklandmuseum.com/search/collectionsonline/_search?q=dc_description:dog
We can specify the address of the record — the collections area, object or library sub-collection, and the ID of the record. Using those three pieces of information, we can return the original JSON document:
eg. http://api.aucklandmuseum.com/id/library/ephemera/19176
naturalsciences/object/{ID}
library/photography/{ID}
library/manuscriptsandarchives/{ID}
library/paintinganddrawings/{ID}
library/catalogq40/{ID}
library/ephemera/{ID}
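The three-part address can be built and taken apart with a few lines of code. This is a minimal Python sketch; `record_url` and `parse_record_url` are hypothetical helper names, and the URL layout is taken from the examples in this section.

```python
ID_BASE = "http://api.aucklandmuseum.com/id"


def record_url(area, sub_collection, record_id):
    """Build a record address from collections area, sub-collection and ID."""
    return f"{ID_BASE}/{area}/{sub_collection}/{record_id}"


def parse_record_url(url):
    """Split a record address back into (area, sub_collection, record_id)."""
    prefix = ID_BASE + "/"
    if not url.startswith(prefix):
        raise ValueError(f"not a record URL: {url}")
    area, sub_collection, record_id = url[len(prefix):].split("/")
    return area, sub_collection, record_id


print(record_url("library", "ephemera", 19176))
print(parse_record_url("http://api.aucklandmuseum.com/id/naturalsciences/object/261552"))
```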
Query-string search is handy for ad hoc searches, but it has limitations. We can use the Elasticsearch domain-specific language (DSL) to build much more complicated, robust queries. For these searches we will POST data to the API. To do this you will need an API client: software that allows you to interact with our API. For Chrome you can use Postman or Advanced REST Client.
We have instructions how to set these up here
The search URL remains the same:
http://api.aucklandmuseum.com/search/collectionsonline/_search
The basic all-text setup looks like this:
{
"query": {
"query_string": {
"query": "Cat"
}
}
}
You can use all the standard search operators in the query:
{
"query": {
"query_string": {
"query": "Cat +dog -kitten"
}
}
}
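As an alternative to a graphical API client, the same POST can be made from a script. The sketch below uses only Python's standard library; `build_search_request` and `run_search` are hypothetical helper names, and sending a `Content-Type: application/json` header is an assumption about what the server expects. Nothing is sent until `run_search` is called.

```python
import json
from urllib.request import Request, urlopen

SEARCH_URL = "http://api.aucklandmuseum.com/search/collectionsonline/_search"


def build_search_request(body):
    """Wrap a query DSL body (a dict) in a POST request object."""
    data = json.dumps(body).encode("utf-8")
    return Request(
        SEARCH_URL,
        data=data,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def run_search(body, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urlopen(build_search_request(body), timeout=timeout) as response:
        return json.load(response)


request = build_search_request(
    {"query": {"query_string": {"query": "Cat +dog -kitten"}}}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Because the body is ordinary JSON, the `+` and `-` operators need no URL encoding here.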
Let's try searching just the free text field "dc_description" for references to the Victoria Cross.
{
"query" : {
"match" : {
"dc_description" : "Victoria Cross"
}
}
}
#Boolean Searches
{
"query" : {
"bool": {
"must": {
"match" : {
"firstName" : "Edmund"
}
},
"filter": {
"range" : {
"_score" : { "gt" : 8 }
}
}
}
}
}
{
"query": {
"bool": {
"should": [
{ "match": { "am_documentNotes": "Vol1" }},
{ "match": { "am_embarkationBody.rdf_value": "6th*" }}
]
}
}
}
#Selecting a list of records
If you have a list of known record IDs, you can fetch them in one request with an ids query. POST the body below to:
http://api.aucklandmuseum.com/search/collectionsonline/_search
{
"query" :{
"ids": {
"values": [
"http://api.aucklandmuseum.com/id/humanhistory/object/65211",
"http://api.aucklandmuseum.com/id/humanhistory/object/657199"
]
}
}
}
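When the record numbers come from a spreadsheet or another system, the body above can be generated instead of written by hand. This is a minimal Python sketch; `humanhistory_ids_query` is a hypothetical helper name, and it assumes every number refers to a `humanhistory/object` record as in the example.

```python
import json

ID_BASE = "http://api.aucklandmuseum.com/id"


def humanhistory_ids_query(object_numbers):
    """Build an ids query body for a list of Human History object numbers."""
    values = [f"{ID_BASE}/humanhistory/object/{number}" for number in object_numbers]
    return {"query": {"ids": {"values": values}}}


body = humanhistory_ids_query([65211, 657199])
print(json.dumps(body, indent=2))
```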
A simple Boolean search with a single must clause:
{
"query" : {
"bool": {
"must": {
"match" : {
"firstName" : "Edmund"
}
}
}
}
}
#Source Filtering (select which fields are returned)
{
"_source": {
"includes": [ "dc_title", "dc_description", "displayLocation" ]
}
}
#To find all CC BY records and return only 8 selected fields:
{
"fields" : ["lastModifiedOn", "_id", "copyright", "dc_contributor", "dc_description", "dc_identifier", "primaryRepresentation","appellation.Primary Title"],
"query": {
"match": {
"copyright": "CC BY"
}
}
}
Elasticsearch has functionality called aggregations, which allows you to generate sophisticated analytics over the data.
Aggregations can be run over all the available fields. We set the `size` to zero because we don’t need the actual search results, just the summary. Returning zero hits also speeds up the query.
{
"size":0,
"aggs": {
"Format": {
"terms": { "field": "dc_format" }
}
}
}
Or the most common surname:
{
"size":0,
"aggs": {
"FamilyName": {
"terms": { "field": "familyName" }
}
}
}
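A terms aggregation returns its summary as a list of buckets, each with a `key` and a `doc_count`, under an `aggregations` section of the response. The Python sketch below builds the request body and reads the buckets back; `terms_aggregation` and `top_terms` are hypothetical helper names, and the surnames and counts in `sample_response` are invented purely for illustration.

```python
def terms_aggregation(name, field, size=10):
    """Build a summary-only body with a single terms aggregation."""
    return {"size": 0, "aggs": {name: {"terms": {"field": field, "size": size}}}}


def top_terms(response, name):
    """Return [(key, doc_count), ...] for the named terms aggregation."""
    buckets = response["aggregations"][name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Invented sample data in the standard Elasticsearch response shape.
sample_response = {
    "aggregations": {
        "FamilyName": {
            "buckets": [
                {"key": "smith", "doc_count": 3},
                {"key": "brown", "doc_count": 2},
            ]
        }
    }
}

print(terms_aggregation("FamilyName", "familyName"))
for surname, count in top_terms(sample_response, "FamilyName"):
    print(surname, count)
```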
#Complex Facets
{
"size" : 0,
"aggs": {
"LastModified": {
"date_histogram": {
"field": "lastModifiedOn",
"interval": "day",
"format": "yyyy-MM-dd"
}
}
}
}
#Nested Fields
{
"size" : 0,
"aggs": {
"accession date": {
"nested": {
"path": "period"
},
"aggs": {
"by_month": {
"date_histogram": {
"field": "period.accession.end",
"interval": "month",
"format": "yyyy-MM"
}
}
}
}
}
}
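This query doesn't include any months with 0 results. If you require this data (for creating graphs etc.), the Elasticsearch `date_histogram` aggregation accepts a `min_doc_count` of 0, optionally combined with `extended_bounds`, so that empty buckets are returned too.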
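In Elasticsearch, empty buckets in a `date_histogram` are normally requested with `min_doc_count` set to 0. The Python sketch below rebuilds the nested aggregation above with that option; `monthly_histogram` is a hypothetical helper name, and whether the museum's deployment honours the parameter is an assumption to verify against a live response.

```python
import json


def monthly_histogram(field, include_empty=False):
    """Build a month-interval date_histogram, optionally keeping empty months."""
    histogram = {"field": field, "interval": "month", "format": "yyyy-MM"}
    if include_empty:
        # min_doc_count=0 asks Elasticsearch to return buckets with no documents.
        histogram["min_doc_count"] = 0
    return {"date_histogram": histogram}


body = {
    "size": 0,
    "aggs": {
        "accession date": {
            "nested": {"path": "period"},
            "aggs": {
                "by_month": monthly_histogram(
                    "period.accession.end", include_empty=True
                )
            },
        }
    },
}

print(json.dumps(body, indent=2))
```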
#Geographical Distance Search
The `geo_distance` filter draws a circle around the specified location and finds all documents that have a geo-point within that circle:
{
"query": {
"filtered": {
"filter": {
"geo_distance": {
"distance": "100km",
"geopos": {
"lat": -36.5,
"lon": 175
}
}
}
}
}
}
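To make the "circle" concrete, the sketch below computes great-circle distance with the haversine formula and checks whether a point falls within a given radius. It is an illustration of the idea only: Elasticsearch's own distance calculation may differ slightly, and `haversine_km` and `within_distance` are hypothetical helper names.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_distance(center, point, distance_km):
    """True if point (lat, lon) lies inside the circle around center."""
    return haversine_km(*center, *point) <= distance_km


# One degree of latitude is a little over 111 km, so it falls outside 100 km.
print(within_distance((0.0, 0.0), (1.0, 0.0), 100))
print(within_distance((0.0, 0.0), (0.5, 0.0), 100))
```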
#Errors
200 search results found
400 bad request
404 not found
#Pagination
By default, a search will only return the top 10 results. You can use `size` to change how many results are returned and `from` to set the offset of the first result.
Beware of paging too deep or requesting too many results at once. Results are sorted before being returned, and large requests may result in a timeout error.
http://api.aucklandmuseum.com/search/_search?q=dc_description:cat&size=100
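A result set larger than one page can be walked by stepping `from` in increments of `size`. The Python sketch below only computes the parameter pairs; `page_params` is a hypothetical helper name, and the `max_results` cap is an arbitrary illustrative guard against paging too deep, not a documented API limit.

```python
def page_params(total, page_size=100, max_results=1000):
    """Return the from/size pairs needed to fetch up to max_results hits."""
    limit = min(total, max_results)
    return [
        {"from": start, "size": min(page_size, limit - start)}
        for start in range(0, limit, page_size)
    ]


# 416 is the total reported in the earlier "cat" example.
for params in page_params(416):
    print(params)
```

Each pair can be appended to a query-string search as `&from=...&size=...`.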
#Appendix
List of Departments
{
"size":0,
"aggs": {
"Format": {
"terms": { "field": "department", "size": 50 }
}
}
}
Name |
---|
botany |
entomology |
photography |
marine |
publication |
ethnology |
pacific |
history |
applied arts |
land vertebrates |
ephemera |
archaeology |
birds |
geology |
manuscripts |
world ethnology |
amphibians |
maori ethnology |
paintings |