API
An experimental RESTful API is available for accessing DMI-TCAT.
It is experimental, because not all DMI-TCAT functionality is available. In fact, other than getting some information about query bins, only the function to purge tweets has been implemented.
A Representational State Transfer (REST) interface is characterised by three main features:
- Resources, identified by URIs.
- Operations, performed on the resources.
- Representations, of the resources.
The representation is controlled by the standard HTTP "Accept" header in the HTTP request.
All API operations support JSON, HTML and plain text representations. Some API operations may support other representations too.
Normally, the API is invoked by a computer program, which will want the machine-readable JSON representation. JSON is also the default representation when the HTTP request has no "Accept" header.
For example, using curl as the client, the following lists all the query bins in JSON:
curl -H "Accept: application/json" -u admin:«password» http://«hostname»/api/querybin.php
Note: explicitly specifying the Accept header for JSON is unnecessary with curl. By default curl sends "Accept: */*", which expresses no preference for a representation, so the default JSON representation is returned.
For interactive use, the functions can be accessed via a Web browser, in which case the HTML representation is returned. Note: this is because Web browsers indicate in the HTTP "Accept" header that they would like to receive "text/html".
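The same content negotiation can be sketched in Python using only the standard library. This is a minimal illustration, not part of DMI-TCAT: the function name `build_request`, the hostname `tcat.example.org` and the password `secret` are placeholders, and the sketch assumes HTTP Basic authentication, as implied by the `-u` option in the curl example.

```python
import base64
import urllib.request


def build_request(hostname, password, accept="application/json", user="admin"):
    """Build (but do not send) a GET request that lists the query bins."""
    url = f"http://{hostname}/api/querybin.php"
    request = urllib.request.Request(url, method="GET")
    # The Accept header selects the representation (JSON, HTML or plain text).
    request.add_header("Accept", accept)
    # HTTP Basic authentication, the equivalent of curl's -u option.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Placeholder hostname and password; replace with real values.
    req = build_request("tcat.example.org", "secret", accept="text/plain")
    print(req.get_method(), req.full_url)
    print(req.get_header("Accept"))
    # To actually call the API: urllib.request.urlopen(req).read()
```

Changing only the `accept` argument switches between representations; the URL stays the same.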
For example, visit in a Web browser:
http://«hostname»/api/querybin.php
A human-readable plain text representation is also available.
For example,
curl -H "Accept: text/plain" -u admin:«password» http://«hostname»/api/querybin.php
In addition to implementing access to the API via HTTP, the API scripts can also be invoked from the command line.
For example, on the machine where DMI-TCAT is installed, run:
php /var/www/dmi-tcat/api/querybin.php --help
The "--help" command line option provides a description of the available command line options.
In most cases, the output from the command line is the same as the plain text representation.
Show the version of the API.
Resource: http://«hostname»/api/
Operation: GET
Note: the trailing slash is mandatory.
List the names of all query bins in the deployment of DMI-TCAT.
Resource: http://«hostname»/api/querybin.php
Operation: GET
Shows some basic information about a specific query bin.
Resource: http://«hostname»/api/querybin.php/«binName»
Operation: GET
Warning: the information returned is different between JSON/HTML and CSV/TSV representations. This design "flaw" might be fixed in the future.
In JSON or HTML representations, shows the number of tweets in the selected time period (or all tweets, if no time period is specified).
In CSV or TSV, exports the actual tweets in the selected time period. Note: this is currently implemented as an HTTP redirection to the URL of the existing export function in DMI-TCAT.
Resource: http://«hostname»/api/querybin.php/«binName»/tweets
Operation: GET
Query parameters:
- startdate: tweets before this timestamp are not included. If this parameter is not specified, it is as if the timestamp of the earliest tweet is specified. See timestamp syntax below.
- enddate: tweets after this timestamp are not included. If this parameter is not specified, it is as if the timestamp of the latest tweet is specified. See timestamp syntax below.
- export: optional query parameter to set the representation to either 'csv' or 'tsv'. Useful when using a Web browser, where the HTTP Accept header cannot be set.
The time period specified by startdate and/or enddate is inclusive: tweets with timestamps equal to those times are included. If only startdate is specified, tweets at or after that time are included. If only enddate is specified, tweets from the beginning of capture up to and including that time are included. If neither startdate nor enddate is specified, all captured tweets are included.
Other values for HTTP Accept header:
- text/csv - export selected tweets in Comma Separated Values format.
- text/tab-separated-values - export selected tweets in TSV format.
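As a rough sketch of how a client might assemble the resource URL with these query parameters, the following Python function builds the tweets URL. The function name `tweets_url` and the example host and bin name are hypothetical. Percent-encoding matters here: a literal "+" in a timezone offset would otherwise be decoded as a space in a query string.

```python
from urllib.parse import quote, urlencode


def tweets_url(hostname, bin_name, startdate=None, enddate=None, export=None):
    """Build the URL of the tweets resource of a query bin."""
    params = {}
    if startdate is not None:
        params["startdate"] = startdate
    if enddate is not None:
        params["enddate"] = enddate
    if export is not None:
        # Only the two export values named in the documentation are allowed.
        if export not in ("csv", "tsv"):
            raise ValueError("export must be 'csv' or 'tsv'")
        params["export"] = export
    url = f"http://{hostname}/api/querybin.php/{quote(bin_name, safe='')}/tweets"
    if params:
        # urlencode percent-encodes ':' and '+' and turns spaces into '+'.
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    # Placeholder hostname and bin name.
    print(tweets_url("tcat.example.org", "mybin",
                     startdate="2016-03-14T09Z", export="csv"))
    print(tweets_url("tcat.example.org", "mybin",
                     enddate="2016-02-28 17:10:00 +10:00"))
```

Omitting both dates yields the bare resource URL, which selects all captured tweets.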
Deletes tweets from the selected time period.
Note: this does not reduce the space occupied by the database on disk, since it is not compacted. But additional captures will not increase the size of the database on disk until the freed up space has been reused.
Resource: http://«hostname»/api/querybin.php/«binName»/tweets
Operation: DELETE
Note: since standard Web browsers do not support the DELETE method, a POST request with the action=tweet-purge query parameter can be used instead.
Query parameters:
- startdate: tweets before this timestamp are not deleted. If this parameter is not specified, it is as if the timestamp of the earliest tweet is specified. See timestamp syntax below.
- enddate: tweets after this timestamp are not deleted. If this parameter is not specified, it is as if the timestamp of the latest tweet is specified. See timestamp syntax below.
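A purge request could be prepared along the following lines. This is only a sketch: `purge_request` is a hypothetical helper, authentication headers are omitted for brevity, and the request is built but never sent, because sending it deletes data. It assumes that action=tweet-purge is passed in the query string, as described above.

```python
import urllib.request
from urllib.parse import quote, urlencode


def purge_request(hostname, bin_name, startdate=None, enddate=None, use_post=False):
    """Build (but do not send) a request that purges tweets from a query bin."""
    params = {}
    if use_post:
        # Fallback for clients that cannot issue DELETE requests.
        params["action"] = "tweet-purge"
    if startdate is not None:
        params["startdate"] = startdate
    if enddate is not None:
        params["enddate"] = enddate
    url = f"http://{hostname}/api/querybin.php/{quote(bin_name, safe='')}/tweets"
    if params:
        url += "?" + urlencode(params)
    method = "POST" if use_post else "DELETE"
    return urllib.request.Request(url, method=method)


if __name__ == "__main__":
    # Placeholder hostname and bin name; purges all of March 2016 (UTC).
    req = purge_request("tcat.example.org", "mybin",
                        startdate="2016-03Z", enddate="2016-03Z")
    print(req.get_method(), req.full_url)
```

Using the same partial timestamp for both parameters selects that whole period, because a partial startdate is taken as the beginning of the period and a partial enddate as its end.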
Known limitations:
The tcat_* tables are not modified when tweets are purged, so the original capture periods remain. After purging tweets, it will appear as if the capture(s) were performed but no tweets appeared during the purged time period.
A possible enhancement could be an option to modify the tcat_* tables so that it appears as if capturing was not performed during the purged time period.
Timestamps must be in the form of "YYYY-MM-DD HH:MM:SS TZ". The letter "T" (with no whitespace around it) can also be used to separate the date from the time. The whitespace before the timezone is optional.
The timezone can be "Z", "UTC" or an offset. The format of a timezone offset is [+-]HH(:MM). That is, a mandatory plus or minus sign, followed by a mandatory number of hours, optionally followed by a colon and a number of minutes.
For example,
- 2016-02-28 17:10:00 Z
- 2016-02-28T17:10:00UTC
- 2016-02-28 17:10:00 +10:00
- 2016-02-28 17:10:00-08:00
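To make the syntax concrete, here is a small Python sketch that parses complete timestamps of the kind shown in the examples above. It is not the parser used by DMI-TCAT. It assumes year-month-day order, exactly two digits for the offset hours, and one or more spaces where whitespace is allowed; the names `FULL_TIMESTAMP` and `parse_timestamp` are made up for this illustration.

```python
import re
from datetime import datetime, timedelta, timezone

FULL_TIMESTAMP = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})"        # date: year, month, day
    r"(?:T| +)"                       # "T" or whitespace separator
    r"(\d{2}):(\d{2}):(\d{2})"        # time: hours, minutes, seconds
    r" *"                             # optional whitespace before the timezone
    r"(Z|UTC|[+-]\d{2}(?::\d{2})?)"   # "Z", "UTC" or an offset [+-]HH(:MM)
)


def parse_timestamp(text):
    """Parse a complete timestamp into a timezone-aware datetime."""
    match = FULL_TIMESTAMP.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid timestamp: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in match.groups()[:6])
    tz_text = match.group(7)
    if tz_text in ("Z", "UTC"):
        tz = timezone.utc
    else:
        sign = -1 if tz_text[0] == "-" else 1
        hours = int(tz_text[1:3])
        minutes = int(tz_text[4:6]) if len(tz_text) > 3 else 0
        tz = timezone(sign * timedelta(hours=hours, minutes=minutes))
    return datetime(year, month, day, hour, minute, second, tzinfo=tz)


if __name__ == "__main__":
    for example in ("2016-02-28 17:10:00 Z", "2016-02-28T17:10:00UTC",
                    "2016-02-28 17:10:00 +10:00", "2016-02-28 17:10:00-08:00"):
        print(example, "->", parse_timestamp(example).isoformat())
```

A timestamp with no timezone is rejected by this sketch, which corresponds to an API with no default timezone configured.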
An API default timezone can be configured in the api/lib/common.php file. If configured, timestamps without an explicit timezone are interpreted in the API default timezone. If there is no API default timezone, timestamps without an explicit timezone are invalid.
Partial timestamps can be specified by omitting the least significant components. For example, specifying everything up to the hour, but omitting the minutes and seconds.
Partial timestamps are interpreted as the beginning of the period for startdates, and as the end of the period for enddates.
For example, as the startdate:
- 2016-03-14T09:15Z is 2016-03-14T09:15:00+00:00
- 2016-03-14T09Z is 2016-03-14T09:00:00+00:00
- 2016-03-14Z is 2016-03-14T00:00:00+00:00
- 2016-03Z is 2016-03-01T00:00:00+00:00
- 2016Z is 2016-01-01T00:00:00+00:00
For example, as the enddate:
- 2016-03-14T09:15Z is 2016-03-14T09:15:59+00:00
- 2016-03-14T09Z is 2016-03-14T09:59:59+00:00
- 2016-03-14Z is 2016-03-14T23:59:59+00:00
- 2016-03Z is 2016-03-31T23:59:59+00:00
- 2016Z is 2016-12-31T23:59:59+00:00
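The expansion rules illustrated above can be sketched in Python as follows. This is a simplified illustration rather than the API's own code: it only handles the "T" separator and the "Z" timezone used in the examples, and the names `PARTIAL` and `expand` are hypothetical.

```python
import calendar
import re
from datetime import datetime, timezone

# Each component is optional only if every less significant one is also omitted.
PARTIAL = re.compile(
    r"(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?Z"
)


def expand(text, is_end):
    """Expand a partial UTC timestamp to the start or end of its period."""
    match = PARTIAL.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid partial timestamp: {text!r}")
    year, month, day, hour, minute, second = (
        None if g is None else int(g) for g in match.groups()
    )
    if month is None:
        month = 12 if is_end else 1
    if day is None:
        # The last day of the month depends on the month and on leap years.
        day = calendar.monthrange(year, month)[1] if is_end else 1
    if hour is None:
        hour = 23 if is_end else 0
    if minute is None:
        minute = 59 if is_end else 0
    if second is None:
        second = 59 if is_end else 0
    moment = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
    return moment.isoformat()


if __name__ == "__main__":
    for partial in ("2016-03-14T09:15Z", "2016-03-14T09Z", "2016-03-14Z",
                    "2016-03Z", "2016Z"):
        print(partial, expand(partial, is_end=False), expand(partial, is_end=True))
```

The month-length lookup is the only non-trivial step: as an enddate, 2016-02Z expands to 29 February because 2016 is a leap year.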