RDS/WIP 2.0 Plan

Goals, design, resourcing and scheduling to implement the RDS/WIP 2.0.

External To Do

This section lists actions external to the RDS/WIP 2.0 project that need to occur in the near term.

  • Steering Committee Review
  • Resource Allocation

Goals

  • Replace Brutus, Tarcus and RDS2RDF triplestore.
  • Not reliant on definition of ISO 15926 itself.
  • Supports and does not restrict ISO 15926 OWL/RDF in any way.
  • Supports raw RDF and other OWL too.
  • Can be theoretically extended to operate as an ISO 15926 façade for ISO 15926 submittals.
  • Simple, fast to build.
  • Iteratively developed - early review and testing.

Architecture

The RDS/WIP 2.0 will not be built as a demonstration of a façade; rather, it will be built as a generic RDF store with support for the administration and bulk submission features that IDS-ADI needs.

Overview

The RDS/WIP 2.0 SPARQL service will be built as a simple triplestore with a single primitive operation for transactionally modifying the contents of that triplestore, according to the strictures laid out in this document.

This approach treats the triplestore as vanilla RDF and does not ascribe or interpret anything specific to ISO 15926 or OWL.

This is important in several regards:

  • because it does not interpret ISO 15926, it allows ISO 15926 to change, without the need for the administration and submission tools to change;
  • it allows information outside of ISO 15926 (other OWL and raw RDF) to be represented in the triplestore; and
  • it does not require entailment or any other special technology in order to evaluate security constraints - a raw RDF store with SPARQL support is enough.

Platform & Componentry

Initial platform:

  • 1GB 1800MHz Pentium 4 Class Dedicated Server
  • Debian Linux 4.0
  • Java 6 Development Kit
  • Joseki 3.1 augmented with SOAP as implemented by NRX and tested by Bentley.
  • Jena as provided with Joseki 3.1.
  • mysql 5.0
  • Internet address.
  • Domain name mapping.
  • SSL and HTTP access.
  • SSL certificate and asymmetric key pair.

Notes:

  • May substitute OWLgres with Pellet for mysql, licensing permitting.

Deployment & Resources

POSC Caesar to provide dedicated hosting including SSL certificate and asymmetric key pair.

Supported by DNV staff in the IRM group.

No service level agreement - leave the RDS/WIP 1.0 in place to satisfy ISO requirements for now.

Protocol

Overview:

  • All operations described in OWL/RDF.
  • Operations implemented as POST with command verb and payload.
  • Operations also implemented as SOAP with WSDL descriptor.
  • OWL/RDF to be treated as opaque payload at SOAP layer.

Details:

Namespaces

There is an RDS/WIP Administration namespace that is treated specially in terms of content - users cannot directly create content that addresses it in any manner. We refer to this namespace as "rwa", and use that as the prefix to represent it in examples.
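
As a rough illustration of how user-supplied content that addresses the rwa namespace might be dropped before it reaches the store, here is a minimal Python sketch. The namespace URI `RWA_NS` is hypothetical (the real namespace is still to be defined), triples are modelled as plain string tuples, and the sketch assumes that elision happens per triple rather than per payload.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URI: the actual rwa namespace has not been defined yet.
RWA_NS = "http://example.org/rwa#"


def references_rwa(triple: Triple) -> bool:
    """True if the subject, predicate or object lies in the rwa namespace."""
    return any(term.startswith(RWA_NS) for term in triple)


def elide_rwa(content: Iterable[Triple]) -> List[Triple]:
    """Silently drop every triple that addresses the rwa namespace."""
    return [triple for triple in content if not references_rwa(triple)]
```

Only the service itself would then write rwa statements, for example when it records submittal state.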

Operations

  • There is one operation - alter.
  • It is a sequence of actions to be completed as a single transaction.
  • Each action consists of verb and parameters.
  • The verbs are INSERT, REMOVE, MOVE, CREATE and DELETE.
  • INSERT takes a target model and content.
  • REMOVE takes a target model and content.
  • MOVE takes a source model, a target model and content.
  • CREATE creates a target model.
  • DELETE deletes a target model.
  • Content is a sequence of triples that may be defined in two ways:
    • As the result of a SPARQL CONSTRUCT query.
    • As an included sequence of triples.
  • Any content containing references to the rwa namespace will be silently elided.
  • The result of an operation is a sequence of statuses.
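
The list above can be sketched in Python as a sequence of actions applied to a working copy of the dataset, which is only swapped in if every action succeeds. This is an in-memory illustration, not the Jena/Joseki implementation: the names `Action`, `ActionError` and `alter` are hypothetical, content is assumed to be already resolved to triples, and the failure rules (for example, CREATE on an existing model fails, MOVE only moves triples present in the source) are assumptions that this plan does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Dataset = Dict[str, Set[Triple]]  # model name -> triples in that model


class ActionError(Exception):
    """Raised when a single action cannot be applied."""


@dataclass
class Action:
    verb: str                      # INSERT, REMOVE, MOVE, CREATE or DELETE
    target: str                    # target model
    source: Optional[str] = None   # source model (MOVE only)
    content: List[Triple] = field(default_factory=list)


def apply_action(work: Dataset, action: Action) -> None:
    if action.verb == "CREATE":
        if action.target in work:
            raise ActionError(f"model exists: {action.target}")
        work[action.target] = set()
        return
    if action.target not in work:
        raise ActionError(f"no such model: {action.target}")
    if action.verb == "DELETE":
        del work[action.target]
    elif action.verb == "INSERT":
        work[action.target].update(action.content)
    elif action.verb == "REMOVE":
        work[action.target].difference_update(action.content)
    elif action.verb == "MOVE":
        if action.source not in work:
            raise ActionError(f"no such model: {action.source}")
        moved = work[action.source].intersection(action.content)
        work[action.source].difference_update(moved)
        work[action.target].update(moved)
    else:
        raise ActionError(f"unknown verb: {action.verb}")


def alter(dataset: Dataset, actions: List[Action]) -> List[str]:
    """Apply all actions in order; on any failure the dataset is left untouched."""
    work = {name: set(triples) for name, triples in dataset.items()}
    statuses: List[str] = []
    for action in actions:
        try:
            apply_action(work, action)
            statuses.append("OK")
        except ActionError as exc:
            statuses.append(f"FAILED: {exc}")
            return statuses          # abort without committing
    dataset.clear()
    dataset.update(work)             # commit the working copy
    return statuses
```

In a real deployment the working copy would be replaced by a database transaction; the copy-then-swap here only stands in for that behaviour.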

Security Model

Authentication:

  • Authentication subject to container norms (HTTP Basic as a minimum).
  • SSL used for all authenticated connections (HTTPS).

Authorization:

  • Declarative, in the same triplestore dataset.

Whenever an operation is started, submittal state is recorded using properties and structures in the rwa namespace. The submittal itself is provided with a unique identifier. The kind of information recorded against the submittal identifier includes:

  • the date/time
  • the security principal (authenticated user)

As each action is executed, further submittal records are created for the action, in the rwa namespace.

Whenever any content triple is added, it is added with an additional statement about it (using reification), linking the statement to its generating action.

Thus, for any triple with a non-rwa property, its contributing security principal can be discovered via this path:

triple => action -> operation -> principal

Where => is a link by reification, and -> is a link by id.
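
To make the path concrete, the following Python sketch records submittal state in plain dictionaries and then walks triple => action -> operation -> principal. It is an illustration only: in the design described here these records would live as rwa statements in the triplestore, and the class and method names (`SubmittalLog`, `principal_for`, and so on) are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]


class SubmittalLog:
    """In-memory stand-in for the submittal records kept in the rwa namespace."""

    def __init__(self) -> None:
        self.operations: Dict[str, dict] = {}         # operation id -> record
        self.action_operation: Dict[str, str] = {}    # action id -> operation id
        self.triple_action: Dict[Triple, str] = {}    # reified triple -> action id

    def start_operation(self, operation_id: str, principal: str) -> None:
        self.operations[operation_id] = {
            "principal": principal,
            "started": datetime.now(timezone.utc),
        }

    def start_action(self, action_id: str, operation_id: str) -> None:
        self.action_operation[action_id] = operation_id

    def record_triple(self, triple: Triple, action_id: str) -> None:
        self.triple_action[triple] = action_id

    def principal_for(self, triple: Triple) -> Optional[str]:
        """Follow triple => action -> operation -> principal, or return None."""
        action_id = self.triple_action.get(triple)
        operation_id = self.action_operation.get(action_id)
        record = self.operations.get(operation_id)
        return record["principal"] if record else None
```

A privilege check or an audit report could then ask `principal_for` who contributed a given statement.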

The security model above uses the submittal data for tracking provenance of triples, but the submittal data also may need to be linked to data in other ways. Note however, that this is model dependent.

So at the outset it is clear that some models will need rules for attaching submittal data that link data to submittals in redundant ways. This implies that there needs to be some rule system, configurable on a per-endpoint basis, for the generation of submittal linkage in the schema of the model.

For example, OWL and ISO 15926 might use OWL properties for the (semantic) attachment of submittal information to a class or property definition.

ID Generation

There is often a need to be able to create a new identifier, particularly in the case of triples being inserted into the store from an external source.

So when an insert specifies subjects that match a given regular expression, those subjects are marked for ID generation. Blank nodes are excluded from this and are written into the target model as is.

Because multiple models may be affected by a single operation, and there may be correspondences between triples in different actions in the same operation, ID substitution is dealt with across the operation as a whole. Any element (subject, predicate, object) of a triple that corresponds to an identifier marked for substitution will be substituted accordingly.
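
One way to picture this is a two-pass process over all actions in the operation: first collect the subjects that match the pattern, then rewrite every position of every triple. The Python sketch below assumes triples are string tuples, blank nodes are written with a `_:` prefix, and the whole subject must match the pattern; the function names and the `urn:` identifiers in the usage note are hypothetical.

```python
import itertools
import re
from typing import Callable, Dict, List, Pattern, Tuple

Triple = Tuple[str, str, str]


def make_minter(prefix: str) -> Callable[[], str]:
    """Return a function that mints prefix1, prefix2, ... (illustrative only)."""
    counter = itertools.count(1)
    return lambda: f"{prefix}{next(counter)}"


def substitute_ids(
    contents: List[List[Triple]],     # one list of triples per action
    pattern: Pattern[str],
    mint: Callable[[], str],
) -> Tuple[List[List[Triple]], Dict[str, str]]:
    mapping: Dict[str, str] = {}
    # Pass 1: mark matching subjects across the whole operation.
    for content in contents:
        for subject, _, _ in content:
            if subject.startswith("_:"):
                continue              # blank nodes are never substituted
            if pattern.fullmatch(subject) and subject not in mapping:
                mapping[subject] = mint()
    # Pass 2: substitute in subject, predicate and object positions.
    rewritten = [
        [tuple(mapping.get(term, term) for term in triple) for triple in content]
        for content in contents
    ]
    return rewritten, mapping
```

For example, calling `substitute_ids(contents, re.compile(r"urn:temp:.+"), make_minter("urn:id:"))` would replace a placeholder subject in one action and any reference to it in another action with the same minted identifier; the returned mapping is what would be reported back to the caller.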

Strategy

The operation is streamed into a temporary model in the service dataset in the triplestore. Because the operation is a sequence of actions, there is a strict order to execution.

Next each action is executed from the temporary model in turn. In the case of any action that deals with content, the content is written into a new temporary model and the

Once the operation is committed, the ID correspondences are streamed back to the caller, in no particular order.

Blank Node Deletion

When all the references to a blank node disappear, the blank node itself should disappear too. Thus, when deletes remove triples that reference blank nodes, those blank node identifiers are marked for delete checking.

Delete checking involves cycling through the set of marked blank node identifiers and deleting any triples with a subject of an unreferenced blank node, until a cycle removes no more triples.

This process - blank node deletion - is applied at the end of every operation that could result in the removal of a triple.
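
The delete-checking loop can be sketched in Python as follows. This is a minimal in-memory version, assuming triples are string tuples and blank nodes carry a `_:` prefix; the function name is hypothetical. It also assumes that blank nodes referenced by a deleted triple become marked in turn, so that chains of blank nodes are cleaned up.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def delete_unreferenced_blank_nodes(triples: Set[Triple], marked: Set[str]) -> None:
    """Remove, in place, all triples whose subject is a marked, unreferenced blank node."""
    marked = set(marked)  # work on a copy of the caller's set
    while True:
        # Blank nodes that still appear as the object of some triple.
        referenced = {o for _, _, o in triples if is_blank(o)}
        orphans = {b for b in marked if b not in referenced}
        doomed = {t for t in triples if t[0] in orphans}
        if not doomed:
            break  # a full cycle removed nothing, so we are done
        triples.difference_update(doomed)
        # Blank nodes that the removed triples pointed at need checking too.
        marked.update(o for _, _, o in doomed if is_blank(o))
```

Note that this simple loop does not collect blank nodes that only reference each other in a cycle, since each still counts as referenced.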

Backups

The mysql database should be backed up according to a regular schedule - local hot backups to the same system daily, remote copy of a hot backup to another system weekly.

Privileges

Security principals are restricted on the action verbs that they can use on specific models. There are no other privileges.
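
Since authorization is declarative and held in the same dataset, a privilege could be expressed as a triple and checked with a simple lookup. The Python sketch below assumes a hypothetical vocabulary of the form `rwa:mayInsert`, `rwa:mayMove` and so on, which this plan does not define, and assumes that MOVE requires the privilege on both the source and the target model.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]


def grant_property(verb: str) -> str:
    """Map a verb such as INSERT to a hypothetical property such as rwa:mayInsert."""
    return f"rwa:may{verb.capitalize()}"


def is_permitted(
    grants: Set[Triple],
    principal: str,
    verb: str,
    target: str,
    source: Optional[str] = None,
) -> bool:
    """True if the principal may use the verb on every model the action touches."""
    models = [target] if source is None else [source, target]
    prop = grant_property(verb)
    return all((principal, prop, model) in grants for model in models)
```

A check like this would run for each action before it is executed, so that a single refused action fails the whole operation.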

Administration Client

Probably a thin-client web-app, using the POST submission API.

Submission Client

Probably a thin-client web-app, using the POST submission API.

Query Client

For ad-hoc queries and to expose ISO 15926 structures specifically - possibly built around the Joseki SPARQL XSLT API.

Discrete Editor

There is a requirement from some quarters for a discrete editor implemented specifically as a thin-client solution on the same server as the RDS/WIP. The technology for this has not yet been selected; one candidate to consider is the Google Web Toolkit or something similar.

Internal To Do

Because this project will be developed iteratively, some of the technology will be developed, tested and either accepted or rejected as we go along. Part of that process also involves looking at and experimenting with existing or proposed 3rd party schemes that address our problem set.

  • Look at existing triplestore access control schemes, such as the University of Leipzig work by Sebastian Dietzold and Sören Auer.
  • Look at existing triplestore management software such as OpenLink Virtuoso.
  • Define the "rwa" namespace.
  • Test submission and administration.
  • Solve and implement ID substitution.
  • Solve and implement blank node deletion.
