POSC-Caesar FIATECH IDS-ADI Projects

Intelligent Data Sets Accelerating Deployment of ISO15926

Realizing Open Information Interoperability


RDS/WIP 1.0 Process

This page describes the process by which RDS/WIP 1.0 is used to achieve the goals of reference data publishing and distribution.

The described process is an interim one, to be used while RDS/WIP 2.0 and the U-RDL-I are prepared to automate and formalize much of what is manual and informal here.

Audience

This page is intended for people who have accepted the basic value proposition offered by ISO 15926 and the RDS/WIP, and who understand that in order to properly utilize the RDS/WIP, their organizations must be prepared to participate in defining reference data content, publishing it in the RDS/WIP and integrating the run-time query of the RDS/WIP into their systems.

Purpose

This page (once completed) should cover:

  • technical issues:
    • supported input formats
    • supported distribution mechanism
  • legal issues:
    • acceptable submission terms of use
    • operator liability limitations
  • business issues:
    • who can submit what
    • who can accept submissions
    • how to interact with standards tracks
  • frequently asked questions
    • who do I contact to become a submitter?
    • how do I make a submission?
    • what are the requirements of a submission?
    • how do I test a submission (what will it look like)?

Overview

The RDS/WIP provides a means of publishing reference data for ISO 15926, and an environment providing access to that reference data. The process works like this:

  • Analyzing your data:
    • Manually consult the RDS/WIP to find definitions for your purpose.
    • If the found definitions are exactly applicable, use them.
    • If the definitions are inexact, then create or derive new definitions.
    • If no definitions are found, then create new definitions.
    • If new definitions should be published, continue to Publishing ...
  • Publishing (this document):
    • Engage with IDS-ADI to discuss your intent.
    • Choose a license for your submissions.
    • Agree with the IDS-ADI terms of use.
    • Prepare your new definitions in an acceptable input format.
    • Coordinate submission with nominated submission acceptor personnel from IDS-ADI.
  • Distribution (this document):
    • by SPARQL over HTTP (with or without SOAP); and
    • by HTML over HTTP;
    • with RSS feeds for new items.
  • Utilization:
    • Use RSS feed readers to keep up with changes.
    • Use HTML pages for manual access to information.
    • Use SPARQL for systems integration.

Publishing Overview

The current publishing process requires ISO 15926 class and template definitions to be submitted via electronic means to specific people.

A submission is reviewed, and if considered appropriate, it is published via a number of transformation and processing steps, which are not yet fully automated.

Published items may be marked as candidates for various different standards tracks using the same process.

A standards track may result in a published item receiving additional marks indicating acceptance, rejection or any other criteria that the standards track has been constructed to assess. Such marks are again published using the same process.

Distribution Overview

Distribution of reference data (all data that can be published via the above mechanism is deemed reference data) is effected by inserting submission items into an RDF triplestore. Different graphs in the triplestore are bound to different URI endpoints. Each endpoint allows access to information in its bound graph.

Thus, each submitted item has an endpoint in which it resides. The identifier of an item is a URI that includes the endpoint in a derivable fashion. Each endpoint is accessible for query by machines using SPARQL and for query by humans using web pages generated from the triplestore.

The effect of this system is that each item has at least one identifier which is a URI that allows both machines and humans to immediately resolve information about the item.
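
As a minimal sketch of the "derivable" relationship between an item identifier and its endpoint, the following Python assumes that the endpoint is simply the item URI with its fragment removed. That assumption, the function name and the example URI are illustrative only and are not taken from the RDS/WIP itself.

```python
from urllib.parse import urldefrag


def split_item_uri(item_uri):
    """Split an item URI into (endpoint, fragment id).

    Assumption: the endpoint is the item URI without its fragment.
    """
    endpoint, fragment = urldefrag(item_uri)
    if not fragment:
        raise ValueError("item URI has no fragment identifier: %r" % item_uri)
    return endpoint, fragment


# Hypothetical identifier, for illustration only.
endpoint, item_id = split_item_uri("http://example.org/endpoint#R123")
```

Under this assumption a client holding only an item identifier knows which endpoint to query, without consulting a separate registry.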

Publishing Detail

The publication process for the RDS/WIP 1.0 is as follows:

  • coordinate with IDS-ADI to become a vetted submitter
  • assemble and send a submission to an assessor
  • await publication or change recommendation

While not ideal, this process is workable in the interim. For the RDS/WIP 2.0, it is anticipated that submitters will participate in a certification programme, rather than informal vetting, and there will be no manual assessment stage, simply automated publishing.

Submitter Requirements

The RDS/WIP will not accept unsolicited submissions, that is, submissions made without prior arrangement. Because of the flexibility offered in terms of input, and because IDS-ADI (or another nominated body) is responsible for the published outcome, submitters must be vetted in some way.

Vetting Submitters

For the RDS/WIP 1.0, rather than a certification scheme (which is planned for RDS/WIP 2.0) we will have an informal system of vetting submitters.

The goal is to ensure that IDS-ADI as a group are confident that the submitter will participate in good faith and understands the level of process maturity and the time constraints of assessor personnel. In short, IDS-ADI need to be convinced that the submitter will be a constructive, rather than destructive, influence on the RDS/WIP in its nascent stage.

Access Credentials

Once a submitter has been vetted, access credentials will be prepared for them on the ids-adi.org infrastructure and distributed to them via email. The credentials will be for an individual, not for an organization.

Access Rights

Access credentials will provide a submitter with access to the ID generation and allocation infrastructure, and with a formal measure of recognition as a submitter for administrative purposes of the RDS/WIP. Access rights do not mean that submissions are automatically accepted for publishing under RDS/WIP 1.0, nor do they guarantee any particular response time by the assessors.

Submission Requirements

Since submissions are published with the intention of allowing further storage and reproduction by anyone, certain requirements apply to them, including syntactic form, acceptable language, acceptable copyright terms, and semantic form.

Syntactic Form

Syntactic form is negotiable. Supported forms are QMXF, QXF, RDF/XML, N3 or N-TRIPLE.

Character set encoding of any XML form (QMXF, QXF, RDF/XML) must be formally stated in the XML header, unless it is UTF-8 - this is consistent with XML processing rules, but is restated here for emphasis.

The character set encoding of N3 or N-TRIPLE must be UTF-8.

All ISO 10646 code points expressible as 16 bit integers (Unicode) are acceptable in textual data.
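
The encoding rules above can be checked mechanically. The sketch below is one possible way to do so in Python and is not the validation used by the assessors. The function names are hypothetical, and the XML check is simplified: it only recognizes an encoding declaration in an ASCII-compatible byte stream and does not handle byte order marks or UTF-16.

```python
import re

# Matches the encoding pseudo-attribute of an ASCII-compatible XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def decode_xml_submission(data):
    """Decode an XML submission (QMXF, QXF, RDF/XML).

    Uses the declared encoding, or UTF-8 when none is declared.
    Raises UnicodeDecodeError or LookupError if decoding fails.
    """
    match = _XML_DECL.match(data)
    encoding = match.group(1).decode("ascii") if match else "utf-8"
    return data.decode(encoding)


def decode_rdf_text_submission(data):
    """Decode an N3 or N-TRIPLE submission, which must be UTF-8."""
    return data.decode("utf-8")


def outside_16_bit_range(text):
    """Return the characters whose code points do not fit in 16 bits."""
    return [ch for ch in text if ord(ch) > 0xFFFF]
```

A submission would pass this check if decoding succeeds and `outside_16_bit_range` returns an empty list for its textual data.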

RDF identifier rules do apply to fragment identifiers, regardless of format: these rules are the same as those used for XML namespace prefix tokens (reference will be added here on request if needed).

Acceptable Text

A submission must not contain text intended to cause alarm or offense. Submissions found to do so may be candidates for deletion even after publication (one of the few cases of allowed deletion).

Each submission item must provide a copyright statement that allows free use of the content and any implied patents. Copyright need not be transferred from the original party, however, anyone must be guaranteed free and unencumbered use of the information, with the exception of the responsibility to convey the copyright terms.

Semantic Form

A submission should:

  • have a self-consistent semantic form (i.e. not contradict itself);
  • not be directly comparable to an existing entry.

Semantic Content

A submission consists of a number of distinct items. These items may be of six basic types:

  • class definition
  • class qualification (subclass/specialization)
  • template definition
  • template qualification (subclass/specialization)
  • license definition
  • application of status to an item

Status application is restricted: a submitter must have permission to apply a status to an item within a given context, and may also be restricted in which statuses they may apply within that context.

Submission Format Options

Three different submission formats are supported, each providing different levels of flexibility. The more flexible, the greater the responsibility on the submitter for ensuring correctness.

Because only a few people are currently tasked with handling submissions, submitters may be asked to provide submissions in specific formats that are easier to validate. In the future, submissions will be validated automatically, and this necessity may disappear.

QMXF

QMXF is an XML format that allows representation of template and class definitions. QMXF has several important features in relation to the reference data submission process:

  • it constrains the data that can be submitted to class or template definitions;
  • it requires the representation of specific meta-data;
  • it provides reasonable defaults for many options;
  • it insulates the definition from the precise template instantiation;
  • it insulates the definition from the precise OWL/RDF;
  • it has a published XSL transform into template instances.

QMXF is useful in that it provides the submitter with distance from instantiated templates and OWL/RDF. This reduces the number of formats and types of technology with which a submitter must be familiar.

Also, since the exact mapping to template instances is remote from the QMXF content, it allows both the template instancing process and the OWL/RDF representation to change, without the need to redefine the input class and template definitions. This feature is seen as being important until ISO 15926 parts 7, 8 and 9 are published.

QXF

QXF is an XML format that allows representation of only template instances. Since all ISO 15926 data (including reference data) is representable as instances of templates, this is sufficient to describe all reference data. QXF has several important features with respect to the reference data publication:

  • it constrains submitted data to template instances;
  • any template instance has only one consistent form;
  • it is flat and non-nested;
  • it insulates template instances from the precise OWL/RDF.

These properties allow QXF to be used for easy verification by eye of template instance construction, since there is only one representation for any specific template instance.

Also, since the exact mapping to OWL/RDF is remote from the QXF content, it allows the OWL/RDF representation to change, without the need to redefine input template instances. This feature is seen as being important until ISO 15926 parts 7, 8 and 9 are published.

OWL/RDF

OWL/RDF is an ontology language (OWL) expressed in terms of a kind of binary relation abstraction (RDF). Given that ISO 15926 part 2 is an ontology expressed in terms of binary relations, OWL/RDF is a useful means of communicating ISO 15926 information. Similarly, OWL/RDF usage has conventions for the representation of n-ary relationships, which fit well with template instances and ISO 15926 part 7.

In the context of reference data publishing, RDF has several important features:

  • it identifies concepts primarily with full URIs;
  • it has well-supported, freely-available tools on several platforms;
  • it has a well-supported, web-oriented query language, SPARQL;
  • it is the subject of a robust standards process in the W3C.

Together, these features make RDF a potentially useful means for publishing and distributing reference data for any model reducible to relationships.

In the context of reference data publishing for ISO 15926, OWL has several important features:

  • it is based on RDF;
  • it is founded in formal logics;
  • it is the subject of a robust standards process in the W3C;
  • it is leveraged by the proposed ISO 15926 parts 8 and 9.

These features together make it a natural part of the reference data landscape for ISO 15926.

RDF has several different exchange representations: XML, N3 and N-TRIPLE. All of these formats are acceptable inputs to the submission process. OWL has an exchange representation of its own: OWLX - this is not currently supported as an acceptable input to the submission process.

Distribution Detail

Submission Archive

Whatever the source of the original submission, the submitted source is archived on the server machine (backed up offsite nightly) into an individual, sequenced directory. Each variant of the source, including all step points in transformations through to OWL/RDF is archived in the same place. This allows reconstruction of the submission from any step in the chain of transformations.
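
One way to picture the "individual, sequenced directory" is the sketch below. The zero-padded numeric naming scheme and the function name are assumptions made for illustration; the actual archive layout is not described on this page.

```python
from pathlib import Path


def next_submission_dir(archive_root):
    """Create and return the next sequenced directory under archive_root.

    Assumption: directories are named with zero-padded sequence numbers
    (000001, 000002, ...). The real naming scheme may differ.
    """
    root = Path(archive_root)
    root.mkdir(parents=True, exist_ok=True)
    existing = [
        int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()
    ]
    new_dir = root / ("%06d" % (max(existing, default=0) + 1))
    new_dir.mkdir()
    return new_dir
```

The original source and each intermediate transformation output would then be written into the returned directory, so that any step in the chain can be reproduced later.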

Submission Allocation

In preparation for insertion into the nominated endpoint, non-resolvable identifiers (apart from blank nodes) in the OWL/RDF source are replaced with allocations from the ID generator (again, for the nominated endpoint).
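
The allocation step can be sketched as follows. This is an illustration under stated assumptions, not the IDS-ADI implementation: triples are held as N-TRIPLE style strings, a URI counts as "resolvable" if it starts with one of a supplied list of prefixes, and the shape of an allocated identifier (endpoint URI plus "#R" and a number) is hypothetical. A simple counter stands in for the real ID generator.

```python
import itertools


def allocate_identifiers(triples, endpoint, resolvable_prefixes, start=1):
    """Replace non-resolvable URIs with identifiers allocated for endpoint.

    triples: iterable of (subject, predicate, object) strings, with URIs
    written as <...>, blank nodes as _:name and literals starting with ".
    Returns (rewritten triples, mapping from original URI to allocated URI).
    """
    counter = itertools.count(start)
    allocated = {}  # original URI -> allocated URI, so repeats stay consistent

    def rewrite(term):
        if not (term.startswith("<") and term.endswith(">")):
            return term  # blank nodes and literals are left alone
        uri = term[1:-1]
        if uri.startswith(tuple(resolvable_prefixes)):
            return term  # already resolvable (by assumption)
        if uri not in allocated:
            # Hypothetical identifier shape, for illustration only.
            allocated[uri] = "%s#R%d" % (endpoint, next(counter))
        return "<%s>" % allocated[uri]

    rewritten = [tuple(rewrite(t) for t in triple) for triple in triples]
    return rewritten, allocated


triples = [
    ("<urn:local:pump>", "<http://www.w3.org/2000/01/rdf-schema#label>", '"pump"'),
    ("_:b0", "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>", "<urn:local:pump>"),
]
rewritten, mapping = allocate_identifiers(
    triples, "http://example.org/endpoint", ["http://www.w3.org/"]
)
```

Keeping the mapping alongside the rewritten triples makes it possible to record, in the submission archive, which submitted identifier became which published identifier.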

The resulting OWL/RDF submission file is stored in the submission archive with the original submission materials.

Submission Insertion

The prepared file with the replaced identifiers is inserted into the specified graph that matches the nominated endpoint (termed a model in a Jena triplestore). This step makes the information available to the public.

Query

Any names or designations in the source will be transformed to kinds of rdfs:label in the specified graph. Because of the additional indexing and the configuration of cursor-based search in the triplestore, this allows searching for any term by which the information in the submission is known.

Query Technical

Each endpoint supports SPARQL query over HTTP, using both CGI and SOAP according to the relevant W3C standards. Result sets for SPARQL select queries are returned to the client encoded in XML according to the relevant W3C standards.

This technology allows applications to dynamically query the endpoint at the machine to machine level. While an understanding of the RDF assembly of the content is required, SPARQL itself requires no construction of RDF, and RDF parsers are not required to read the SPARQL result set. This means that relatively simple CGI and XML parsing technology can be used to interoperate with the RDS/WIP.
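
To illustrate how little is needed on the client side, the sketch below builds a CGI-style query URL and reads a SPARQL XML result set using only the Python standard library. The endpoint URL, the query and the sample result are made up for illustration; the "query" parameter name and the result namespace come from the W3C SPARQL Protocol and SPARQL Query Results XML Format. No network request is made here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SPARQL_RESULTS_NS = "{http://www.w3.org/2005/sparql-results#}"


def build_query_url(endpoint, sparql):
    """Encode a SPARQL query as an HTTP GET URL for the given endpoint."""
    return endpoint + "?" + urlencode({"query": sparql})


def parse_select_results(xml_text):
    """Turn a SPARQL XML result set into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.iter(SPARQL_RESULTS_NS + "result"):
        row = {}
        for binding in result.findall(SPARQL_RESULTS_NS + "binding"):
            value = binding[0]  # a <uri>, <literal> or <bnode> element
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


# Hypothetical endpoint and query, for illustration only.
query = (
    "SELECT ?item ?label WHERE "
    "{ ?item <http://www.w3.org/2000/01/rdf-schema#label> ?label } LIMIT 10"
)
url = build_query_url("http://example.org/endpoint", query)

# A made-up result set in the W3C XML format, as an endpoint might return it.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="item"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="item"><uri>http://example.org/endpoint#R1</uri></binding>
      <binding name="label"><literal>pump</literal></binding>
    </result>
  </results>
</sparql>"""
rows = parse_select_results(SAMPLE)
```

In a real integration the URL would be fetched over HTTP and the response body passed to `parse_select_results`. No RDF parser is involved at any point.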

Presentation

Once the submission item in question is located (a URI with fragment is provided to identify the submission item), it can be displayed in a web browser simply by clicking on the URI (or pasting it into the address bar).

From here, the server will identify that the client is a browser, and redirect the client to a CGI-encoded query that includes the fragment id, rather than the bare URI.

This CGI query results in the generation on the server of a page of content that aggregates information about the submission item. This single page is returned to the client web browser.

Presentation Technology

The presentation is interpreted via a scheme developed by IDS-ADI named "infofilter", which is a means of mixing XSLT and SPARQL together with a namespace that allows specification of staged filters built on them.

Since this system utilizes open standards and a manifestly simple filter staging system, it can be edited by those with no knowledge of the specific implementation: SPARQL and XSLT are all that is required to generate the most complex parts of the filter, and by convention in the design of the specific filter stages used, only XSLT is needed for the final transformation into XHTML for presentation.

RSS Feeds

The intention at this stage is to produce RSS feeds for each new RDF identifier that appears in any of the IDS-ADI controlled SPARQL endpoints, so that anyone can monitor and observe changes in the content of the RDS/WIP.
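
As a rough illustration of what such a feed could contain, the sketch below emits one feed item per new identifier. The RSS schema for the RDS/WIP has not yet been decided (see the outstanding action points), so the choice of plain RSS 2.0, the element contents and the function name are all assumptions.

```python
import xml.etree.ElementTree as ET


def build_feed(channel_title, channel_link, new_item_uris):
    """Build an RSS 2.0 document with one <item> per new RDF identifier."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = (
        "New identifiers in " + channel_link
    )
    for uri in new_item_uris:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = uri
        ET.SubElement(item, "link").text = uri  # the identifier resolves in a browser
        ET.SubElement(item, "guid", isPermaLink="true").text = uri
    return ET.tostring(rss, encoding="unicode")


# Hypothetical endpoint and identifiers, for illustration only.
feed = build_feed(
    "Example endpoint",
    "http://example.org/endpoint",
    ["http://example.org/endpoint#R1", "http://example.org/endpoint#R2"],
)
```

Because each item identifier is itself a resolvable URI, it can serve as both the link and the permanent identifier of the feed entry.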

Completed Action Points

(none as yet)

Outstanding Action Points

This is the list of issues that we need to attend to in order to meet the goals of this process.

Identify Assessors

Nominate Julian and Martin as assessors - Julian and Martin should work as a team to examine submissions for acceptability. If in any doubt, Julian and Martin should consult with Onno.

Identify Acceptable Licenses

Nominate BSD 2 Clause license as an acceptable license for publishing content - the owner should be stated in the license, and may optionally be indicated by a URI.

Extend Submission Format for Licensing Terms

Define raw template instances and extend QMXF to account for licenses as per acceptable licenses above.

Alter Presentation to Show Licensing Terms

Alter the presentation scripts to clearly show the licensing terms of each item, including those items owned by PCA and USPI. Items created by IDS-ADI should be licensed by IDS-ADI.

Licence and Licensing Reflexivity

Do licences need to be licensed? What about the act of licensing - do we just mandate that public licensing is inferred for licences and licensing?

RSS Schema Definition

Need to figure out an RSS schema for reporting - should it be U-RDL-I level or RDS/WIP level? What are the specific RSS compliant RDF predicates?

RSS Schema Implementation

Automation of RSS channel data distribution. Need to figure out how to build summaries, what the summary lifespan is (how long a summary statement covers), etc.

Submitter Vetting

Talk to SC about the plan to vet submitters.

Preferred or Recommended Submission Format

Is the preferred or recommended submission format QMXF?

