Jim Amsden edited this page Jun 26, 2018 · 4 revisions

Overall Design

ldp-service is an Express middleware component that adds LDP capabilities to any Node/Express web application. As with any other Express middleware, routes direct HTTP requests to ldp-service, which is mounted on a context URI that serves as the root LDPC (LDP container) for all accessible resources.

Configuration

ldp-service gets its configuration from the app in which it is used.

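var express = require('express');
var ldpService = require('ldp-service');
var env = require('./env.js'); // configuration object, see below

var app = express();
app.use(ldpService(env));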

The configuration information is minimally:

{
    "scheme": "http",
    "host": "localhost",
    "port": 3000,
    "context": "/r",
    "mongoURL": "mongodb://localhost:27017/ldp"
}

LDP Storage Abstraction

Current LDP persistence API

The original MongoDB implementation of ldp-service is not truly generic: it has no abstract storage service. Instead, the ldp-service CRUD methods are tightly coupled to the db in the following ways:

  • db.get() document is the MongoDB representation, not generic triples. We could use rdflib.js in the storage implementations to parse whatever resource representation is available from the DB, and then serialize it as requested.
  • addHeaders() checks whether the MongoDB document is a container in order to set the Link headers.
  • Serialization works on the MongoDB representation and sets the 406 status code if the Accept header isn’t Turtle or JSON-LD. turtle.js and jsonld.js are serializers written specifically for the MongoDB triple format.
  • insertCalculatedTriples inserts containment triples that are not explicitly stored in the MongoDB representation before the document is serialized.
  • Why are jsonld.js, media.js, and turtle.js included in ldp-service-mongodb and ldp-service-jena? They are in ldp-service.js to provide serializers for JSON-LD and Turtle, and can be moved to ldp-service-mongodb since they are specific to the MongoDB representation of triples. Other service implementations should use the rdflib serializers.
  • Should the jsonld and turtle implementations be replaced by rdflib? No, these parsers/serializers are specific to the MongoDB representation of a graph that Sam and Steve designed. Move them from ldp-service to ldp-service-mongodb.

All this led Neil to develop completely separate, full implementations of ldp-service in ldp-service-mongodb and ldp-service-jena instead of trying to factor out a storage service. This was Neil’s original implementation of ldp-service-jena.

However, this would result in a lot of duplicated LDP code that would be difficult to maintain.

  • need to decide how to move forward
  • ldp-service will be built on an abstract storage.js service
  • var ldpService = require('ldp-service')(env) instantiates the ldp-service on the desired storage service
  • all storage service methods use rdflib IndexedFormula as the representation of a resource.
  • content negotiation is handled by ldp-service

New LDP persistence API

This is captured in abstract module ldp-service/storage.js (which used to be db.js).

The architecture is a container of resources that represent RDF graphs

Methods are implemented on a db that is expected to be provided by the concrete implementation.

Abstract methods simply throw a not-implemented exception.

Abstract Methods:

  • init - initializes the database
  • drop - drops the database
  • reserveURI - reserves a URI for future use
  • releaseURI - releases a URI that is no longer needed
  • get - get a resource
  • put - update a resource
  • remove - delete a resource
  • findContainer - query for a container resource
  • getContainment
  • createMembershipResource
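The abstract module described above could be sketched roughly as follows. This is a minimal illustration, not the actual storage.js: the method names come from the list above, while the notImplemented helper and the error message are assumptions made for the example.

```javascript
// storage.js (sketch): every method throws until a concrete
// implementation overrides it. notImplemented is a hypothetical helper.
function notImplemented(name) {
  return function () {
    throw new Error('storage method not implemented: ' + name);
  };
}

var storage = {};
[
  'init', 'drop', 'reserveURI', 'releaseURI', 'get', 'put',
  'remove', 'findContainer', 'getContainment', 'createMembershipResource'
].forEach(function (name) {
  storage[name] = notImplemented(name);
});

module.exports = storage;
```

A concrete implementation then replaces these properties with working functions, so any method it forgets to override fails loudly rather than silently doing nothing.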

Configuring the Storage Service

The application (usually app.js or server.js) establishes the ldp-service through the Express middleware.

ldp-app does this:

var ldpService = require('ldp-service');
var env = require('./env.js');

// initialize database and set up LDP services and viz when ready
app.use(ldpService(env));

ldp-service/service.js exports a function(env) that uses information from the environment to configure the database for the LDP service.

  • Decide on what information should be included in the env in order to configure the ldp-service database. This is used in the db.init function.
  • Jena
    • jenaURL
  • MongoDb
    • mongoURL
  • use that in the module.exports = function(env) function to instantiate the proper storage concrete implementation
	var db = require('./storage.js') // the abstract storage services
	require('ldp-service-mongodb')(db)  // instantiate using a MongoDB implementation

env.storageImpl

The db.init() function is called here with the environment.
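One way the selection by env.storageImpl and the db.init() call might fit together is sketched below. The createStorage function, its load parameter, and the init(env, callback) signature are assumptions made for this example; load defaults to require and exists only so the sketch can be exercised without the real packages installed.

```javascript
// Sketch: choose the storage implementation named by env.storageImpl.
// createStorage and the `load` parameter are hypothetical.
function createStorage(env, abstractStorage, load) {
  load = load || require;
  if (!env.storageImpl) {
    throw new Error('env.storageImpl is not set');
  }
  // Each implementation module is assumed to export function(db),
  // which overrides the abstract methods on db in place.
  load(env.storageImpl)(abstractStorage);
  // Assumed signature: init(env, callback(err)).
  abstractStorage.init(env, function (err) {
    if (err) { console.error('storage init failed: ' + err.message); }
  });
  return abstractStorage;
}
```

With env.storageImpl set to "ldp-service-jena", this would be equivalent to the explicit require('ldp-service-jena')(db) form, with the implementation name moved into configuration.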

ldp-service get() calls db.get() to actually read the resource.

db.get() currently returns an array of triples defined by the MongoDB stored representation! ldp-service then uses req.accepts() on the Accept header to determine whether Turtle or JSON-LD is requested, and creates a serializer using jsonld.js or turtle.js.

The serializer then serializes the MongoDB triples into JSON-LD or Turtle and returns that source document.

The question is, where should the accept header and serialization be handled:

  1. in the storage service where the resource is actually read or
      • keeps the internal representation variability in the storage service
      • minimizes changes to the MongoDB implementation
      • doesn’t rely on the somewhat inefficient rdflib JSON-LD serializer
      • results in redundant code for the serializers/deserializers
      • doesn’t provide a common internal resource representation for in-memory access
  2. √ in ldp-service using a common triple representation returned by the storage service
      • uses common rdflib internal representation for in memory access
      • pushes the variability management into the storage service and provides the IndexedFormula as an easier way to deal with it than source documents, or different internal representations
      • allows all content negotiation to be done directly in ldp-service in one place
      • Supported content types are no longer storage service dependent
      • support for JSON-LD will be inefficient until this is fixed in rdflib (because of the incompatible internal representations of imported parsers)
      • results in more changes to ldp-service-mongodb to convert from the MongoDB representation of triples into rdflib.
      • results in an extra conversion between RDF documents and IndexedFormula. For example, GET for the Jena implementation will use SPARQL endpoint to get a graph resource, parse it, then serialize it. Instead, it could be possible to send the content type into the storage service and use it directly on the SPARQL endpoint to avoid the extra parse/serialize step. However, this would couple ldp-service to the content types supported by the SPARQL end point, not those supported by rdflib. This extra parse/serialize decouples the storage formats supported by the persistence layer from the serialization formats supported by ldp-service.
  • Choose # 2

So the rather large change we need to fix this is:

  • storage methods return the rdflib IndexedFormula
  • ldp-service does content negotiation and uses rdflib serializers to produce the desired RDF resource format.
  • move jsonld.js and turtle.js to ldp-service-mongodb - these are specific to this service provider, not any service provider. These can be changed to convert to/from rdflib IndexedFormula

Some assumptions:

  1. The underlying storage service shouldn’t be concerned with content negotiation, just persistence.
  2. ldp-service can return 406 Not Acceptable if rdflib doesn’t support the desired content type.
  3. Since ldp-service is generally responding to HTTP GET requests for linked-data resources, it will always be responding with a standard RDF resource representation (application/xml, application/rdf+xml, text/turtle, application/ld+json should all be supported)
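The assumptions above imply a single negotiation step in ldp-service. The following is a simplified, dependency-free sketch of that step; the SUPPORTED list mirrors the media types named in assumption 3, and the negotiate function is hypothetical. In an Express handler, req.accepts(SUPPORTED) would normally do this job.

```javascript
// Media types offered, in server preference order (from assumption 3).
var SUPPORTED = [
  'text/turtle',
  'application/ld+json',
  'application/rdf+xml',
  'application/xml'
];

// Simplified Accept parsing: honors q-values and */*, but ignores
// type/* ranges and other media-type parameters.
function negotiate(acceptHeader) {
  if (!acceptHeader) { return SUPPORTED[0]; }
  var ranges = acceptHeader.split(',').map(function (part) {
    var fields = part.trim().split(';');
    var q = 1;
    fields.slice(1).forEach(function (p) {
      var m = /^\s*q=([0-9.]+)\s*$/.exec(p);
      if (m) { q = parseFloat(m[1]); }
    });
    return { type: fields[0].trim().toLowerCase(), q: q };
  }).filter(function (r) { return r.q > 0; })
    .sort(function (a, b) { return b.q - a.q; });

  for (var i = 0; i < ranges.length; i++) {
    if (ranges[i].type === '*/*') { return SUPPORTED[0]; }
    if (SUPPORTED.indexOf(ranges[i].type) !== -1) { return ranges[i].type; }
  }
  return null; // caller responds with 406 Not Acceptable
}
```

A handler would call negotiate(req.headers.accept) and send 406 when the result is null, before asking the storage service for anything.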

Instantiating a concrete ldp-service storage implementation

ldpService = require('ldp-service') creates the ldp-service instance
ldpService(env) returns the express routes and instantiates the storage service as:
	var db = require('./storage.js') // the abstract storage services
	require('ldp-service-jena')(db)  // instantiate using a Jena implementation

ldp-service-jena/storage.js simply overrides all the methods in ldp-service/storage.js
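To illustrate the override pattern, here is a hypothetical in-memory implementation written in the same shape as require('ldp-service-jena')(db). It is not the Jena code; the method signatures (url, callback) and the use of a Map are assumptions chosen to keep the example small.

```javascript
// Hypothetical in-memory storage: a function that replaces the abstract
// methods on db, as a concrete implementation module would export.
function memoryStorage(db) {
  var resources = new Map(); // url -> stored resource

  db.init = function (env, callback) { callback(null); };

  db.put = function (url, resource, callback) {
    resources.set(url, resource);
    callback(null);
  };

  db.get = function (url, callback) {
    // null signals "no such resource" in this sketch
    callback(null, resources.get(url) || null);
  };

  db.remove = function (url, callback) {
    resources.delete(url);
    callback(null);
  };

  return db;
}
```

Methods the function does not assign (findContainer, getContainment, and so on) would keep their abstract, throwing versions.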

Do these abstract methods need to be implemented? Neil didn’t implement these. Looks like he didn’t worry about containers and membership in his tests. These do need to be implemented.

  • findContainer
  • getContainment
  • createMembershipResource
  • drop

Support multiple database implementations using different ldp-service subApp routes

All the LDP repos are cloned in Users/jamsden/Developer/LDP

The current ldp-service implementation uses abstract storage.js which in ldp-app is instantiated on the ldp-service-mongodb storage.js implementation. This works, but the ldp-service and abstract storage.js are tightly coupled with the ldp-service-mongodb internal representation of a MongoDB RDF resource. This needs to be replaced by rdflib.js IndexedFormula.

But first, let’s document the overall architecture of ldp-service by following a GET operation from start to finish.

ldp-service is an express middleware component that has an express() subApp for handling LDP requests.

It exports a function(env) that returns the subApp as a route instantiated on a storage implementation.

The Express route has an all(function(req, res, next)) handler that sets the common LDP Link headers.

Then the route has functions to handle HTTP requests for LDP resources.

GET: function get(req, res, includeBody) for example…

includeBody is a boolean that determines whether the body should be returned. This should be HEAD, not a flag on GET: resource.head calls get(req, res, false) and resource.get calls get(req, res, true) to share the common code. The problem is that HEAD still causes the resource to be read and serialized, even though it isn’t sent. This is inefficient, but may be necessary in order to calculate the common headers and set the proper Link headers for LDP containment.

delegates the GET to the db.get(url, function(err, document))

examines err to set status codes appropriately, does res.sendStatus() and returns if it can’t continue.
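A possible shape for this status check is sketched below. The statusFor function and the exact 500/404/410 mapping are assumptions for illustration; the deleted flag mirrors how the MongoDB remove operation marks a resource as deleted, and other storage implementations may never set it.

```javascript
// Sketch: map the result of db.get(url, callback) to an HTTP status.
function statusFor(err, document) {
  if (err) { return 500; }               // storage failure
  if (!document) { return 404; }         // no such resource
  if (document.deleted) { return 410; }  // existed, then removed
  return 200;
}
```

The handler would call res.sendStatus(status) and return whenever the status is not 200.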

examines the request accept header to determine what serializer to use. Note the db may use its own request and accept header to interact with the underlying database, but that is storage.js implementation specific and should not be exposed at this level

adds common headers (GET, POST, OPTIONS):

  • sets Allow header to GET, HEAD, DELETE, OPTIONS
  • if the document is a container,
    • sets response links type to the document interaction model
    • adds POST to the allow header
    • sets Accept-Post allowed media types (turtle, jsonld, json)
  • if it is not a container,
    • adds PUT to the allow header
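The header rules above can be expressed as a small pure function. This is a sketch under assumptions: commonHeaders is a hypothetical name, it returns a plain object rather than writing to the response, and the interaction model is passed in as an LDP type URI.

```javascript
var LDP = 'http://www.w3.org/ns/ldp#';

// interactionModel is the LDP type URI of the resource, or null/undefined
// for a non-container resource.
function commonHeaders(interactionModel) {
  var allow = ['GET', 'HEAD', 'DELETE', 'OPTIONS'];
  var headers = {};
  var isContainer = interactionModel === LDP + 'BasicContainer' ||
                    interactionModel === LDP + 'DirectContainer';
  if (isContainer) {
    // advertise the interaction model as a rel="type" link
    headers['Link'] = '<' + interactionModel + '>; rel="type"';
    allow.push('POST');
    headers['Accept-Post'] = 'text/turtle, application/ld+json, application/json';
  } else {
    allow.push('PUT');
  }
  headers['Allow'] = allow.join(',');
  return headers;
}
```

Keeping this separate from the response object makes the rules easy to check without starting a server.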

Inserts some calculated triples that aren’t stored in the document:

  • based on preferences, determines how members of a container should be handled. This information is not necessarily stored in the resource, and the membership could be returned in multiple ways based on request preferences in the Prefer header.
  • handles the LDP container membership predicates for ldp:contains, or the LDP membership predicate
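As a rough illustration of the containment case only, the sketch below adds ldp:contains triples unless the Prefer header asks for a minimal container or asks to omit containment. Plain objects stand in for rdflib statements, the function signature is an assumption, and membership triples for direct containers are left out.

```javascript
var LDP = 'http://www.w3.org/ns/ldp#';

// Extract the URIs listed in an include="..." or omit="..." parameter
// of a Prefer header value.
function preferParam(prefer, name) {
  var m = new RegExp(name + '="([^"]*)"').exec(prefer || '');
  return m ? m[1].split(/\s+/).filter(Boolean) : [];
}

// Returns a new triple list; the input array is not modified.
function insertCalculatedTriples(triples, containerURL, containedURLs, prefer) {
  var include = preferParam(prefer, 'include');
  var omit = preferParam(prefer, 'omit');
  var minimal = include.indexOf(LDP + 'PreferMinimalContainer') !== -1;
  var skip = minimal || omit.indexOf(LDP + 'PreferContainment') !== -1;
  if (skip) { return triples.slice(); }
  return triples.concat(containedURLs.map(function (url) {
    return { subject: containerURL, predicate: LDP + 'contains', object: url };
  }));
}
```

When a preference is honored, the response should also carry a Preference-Applied header, which this sketch does not set.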

serializes the document (document is MongoDB specific) based on the content type. uses jsonld.js or turtle.js serializers for this.

handles Preference-Applied header

generates an eTag and writes the ETag and Content-Type headers.

if includeBody is true, does res.end(new Buffer(content), 'utf-8') to send the response body. Otherwise just does res.end().
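The last steps of the handler might look like the sketch below. The etagFor and finish names and the MD5-based weak ETag are assumptions for illustration, and Buffer.from is used here because the new Buffer() constructor is deprecated in current Node versions.

```javascript
var crypto = require('crypto');

// Weak ETag from an MD5 digest of the serialized body (assumed scheme).
function etagFor(content) {
  return 'W/"' + crypto.createHash('md5').update(content).digest('hex') + '"';
}

// res is an Express/Node response; only setHeader() and end() are used.
function finish(res, content, contentType, includeBody) {
  res.setHeader('ETag', etagFor(content));
  res.setHeader('Content-Type', contentType);
  if (includeBody) {
    res.end(Buffer.from(content, 'utf-8')); // GET: headers and body
  } else {
    res.end();                              // HEAD: same headers, no body
  }
}
```

Because the ETag is computed from the serialized content, HEAD still needs the serialization step even though the body is never sent.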

Changes:

  1. db.get(url, function(err, document)), document should be an IndexedFormula
  2. there may not be a way to determine whether the document existed but was deleted. That relies on a database implementation that marks resources as deleted rather than actually deleting them. The MongoDB db.remove operation sets the triples to [] and marks the resource as deleted.
  3. uses rdflib.js to serialize the KB based on the Accept header.

Overall Implementation

https://www.w3.org/2012/ldp/hg/ldp-primer/ldp-primer.html

  • Implement all the ldp-service methods on an rdflib.js IndexedFormula. storage.js is abstract in this repo, and sets the API.

    • is fullURL() still needed - yes, req.url is the relative URL, but rdflib IndexedFormula needs absolute URLs
  • Implement ldp-service-jena to support the storage.js implementation on Jena, returning all the IndexedFormula instances ldp-service needs. rdflib.js has ways of making requests that should be used in the storage.js implementations if they are appropriate. This might work directly for Jena and not require direct use of the Jena SPARQL endpoint and request.
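The fullURL() helper mentioned above could be as small as the following sketch. The function bodies are assumptions built from the env fields shown in the configuration section; req.originalUrl is the Express property that keeps the mount point, whereas req.url inside a mounted sub-app has it stripped.

```javascript
// Root of the application, e.g. "http://localhost:3000".
function appBase(env) {
  return env.scheme + '://' + env.host + ':' + env.port;
}

// Absolute URL of the requested resource, suitable as an rdflib node URI.
function fullURL(env, req) {
  return appBase(env) + req.originalUrl;
}
```

For a request to /r/example/spc with the configuration shown earlier, this yields http://localhost:3000/r/example/spc.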

Note however, that ldp-service is an HTTP server, and it will need to be able to set appropriate headers that might need to be set from the storage.js response.

  • Test ldp-app with the updated ldp-service

  • OPTIONAL: reimplement ldp-service-mongodb to support the storage.js implementation on MongoDB, returning all the IndexedFormula instances ldp-service needs. Not sure this is all that valuable as a deliverable, but it would be useful to ensure ldp-service and implementations of storage.js are sufficient to support adapters.

jsonld.js and turtle.js are specific to the old MongoDB implementation and stored data format, and would need to be re-implemented on IndexedFormula instead of RDF source. Use rdflib to parse into IndexedFormula, then use a converter to convert to the MongoDB internal representation.

Alternatively, implement storage-services-mongodb (note the rename from ldp-service-mongodb) using JSON-LD as the document storage format in MongoDB.

Implement storage service init

Currently just gets the jenaURL from the env parameter. This implementation expects the fuseki database to already exist and to be the one the fuseki server was started on. It does not support adding and dropping fuseki databases, but rather relies on fuseki server startup to do that directly.

Implement storage service drop

Not needed for the Jena storage. This is all handled by fuseki outside the application. See init.

Implement GET/HEAD storage service read

express is going to call resource.get, which calls internal get method with a flag to determine if the body should be provided or not. This implements LDP GET and HEAD.

  • addHeaders needs to determine if the document from the storage service is a container and its InteractionModel

  • isContainer: a document is a container if its interactionModel is ldp.BasicContainer or ldp.DirectContainer, that is, it contains {url rdf:type ldp:BasicContainer} or {url rdf:type ldp:DirectContainer}

  • insertCalculatedTriples - these are based on the Prefer header and how the containers should be represented. This might be why the MongoDB implementation didn’t store membership triples in the resource. Instead, these could be added based on the Prefer header.

  • determine a simple way to test ldp-service using Postman: tests/test-app.js. This is just a simple server that uses:

env.scheme = "http"
env.host = "localhost"
env.port = 3000
env.context = "/r"
env.storageImpl = "ldp-service-jena"
env.jenaURL = "http://localhost:3030/mrm"

  • Get fuseki running on the mrm database and make sure there’s some data in it:

cd ~/bin/apache-jena-fuseki-3.7.0
./fuseki-server --update --loc=../mrm /mrm &

Provides these endpoints:

http://localhost:3000/r/example/spc is a resource I can GET from the mrm repository.

This GET is working and proves the overall architecture and implementation outlined above.

var $rdf = require('rdflib')

var store = $rdf.graph()
var timeout = 5000 // 5000 ms timeout
var fetcher = new $rdf.Fetcher(store, timeout)

// url is the absolute URL of the resource to fetch
fetcher.nowOrWhenFetched(url, function(ok, message, response) {
    if (!ok) {
        console.log("Oops, something happened and couldn't fetch data: " + message)
        console.log("HTTP Status: " + response.status)
    } else {
        // do something with the data that was just added to the store
    }
})

Note: Fetcher appears to deal with LDP containers. This needs more study. See src/fetcher.js.

There’s not much documentation on Fetcher:

I tried to get Fetcher to work but it got into an infinite loop. I didn't try to debug further because the current implementation seems to be sufficient for now.