OData not only offers a RESTful interface to any datastore to perform CRUD operations, it also provides a very powerful query interface to the underlying datastore making it a candidate for the ‘universal’ query language. However the query capability is often neglected. Why?

Simple OData Query Example

It is always easier to start simply. So the question I want answered from a Northwind datastore is

Find customers located in France

Given an OData endpoint, the query is simply answered with the following URL:

http://localhost:8080/odata2sparql.v4/NW/

Customer?

$filter=contains(customerCountry,’France’)

The elements of this URL are as follows:

  1. The first line identifies the OData endpoint, in this case it is a local OData2SPARQL (http://inova8.com/bg_inova8.com/offerings/odata2sparql/) endpoint that is publishing a RDF/SPARQL triplestore hosting an RDF version of Northwind (https://github.com/peterjohnlawrence/com.inova8.northwind).

Alternatively you could use a publicly hosted endpoint such as http://services.odata.org/V4/Northwind/Northwind.svc

  1. The next line specifies the entity or entityset, in this case the Customer entitySet that is the start of the query
  2. The final line specifies a filter condition to be applied to the property values of each entity in the entitySet

The partial results are shown below in JSON format, although OData supports other formats

But that is not much more than any custom RESTful API would provide, right? Actually OData querying is far more powerful as illustrated in the next section.

Complex OData Query Example

The question I now want answered from a Northwind datastore is

Get product unit prices for any product that is part of an order placed since 1996 made by customers located in France

One thing is that the terminology used by OData is different than that of RDBMS/SQL, RDF/RDFS/OWL/SPARQL, Graph, POJO, etc. The following table shows the corresponding terms that will be used in this description

OData Terminology Mapping to Relational, RDF, and Graph

Mapped OData Entity Relational Entity RDF/RDFS/OWL Graph
Schema Namespace, EntityContainer Name Model Name owl:Ontology Graph
EntityType, EntitySet Table/View rdfs:Class

owl:Class

Node
EntityType’s Properties Table Column owl:DatatypeProperty Attribute
EntityType’s Key Properties Primary Key URI Node ID
Navigation Property on EntityType, Association, AssosiationSet Foreign Key owl:ObjectProperty Edge
FunctionImport Procedure spin:Query  

Given an OData endpoint, the query is answered with the following URL:

http://localhost:8080/odata2sparql.v4/NW/

Customer?

$filter=contains(customerCountry,’France’)&

$select=customerCompanyName,customerAddress,customerContactName&

$expand=hasPlacedOrder(

$filter=orderDate gt 1996-07-05;

$select=shipAddress,orderDate;

$expand=hasOrderDetail(

$select=quantity;

$expand=product(

$select=productUnitPrice

)

)

)

The elements of this URL are much the same as before, but now we nest some queryOptions as we navigate from one related entity to another, just as we would when navigating through a graph or object structure.

  1. Customer entitySet is specified as the start of the query
  2. The $filter specifies the same filter condition as before since we want only customers with an address in France.
  3. Next we include a $select to limit the properties (aka RDBMS column, or OWL DatatypeProperty) of a Customer entity that are returned.
  4. Then the $expand is a query option to specify that we want to move along a navigationProperty (aka RDBMS foreign key, Graph edge or OWL ObjectProperty). In this case the navigationProperty takes us to all orders placed by that customer.

Now we are at a new Order entity we can add queryOptions to this entity

  1. The $filter specifies that only orders after a certain date should be included.
  2. The $select limits the properties of an Order entity that are returned.
  3. The $expand further navigates the query to the order details (aka line items) of this order.

Yet again we are at another entity so we can add queryOptions to this Order_Detail entity:

  1. The $select limits to only the quantity ordered
  2. The $expand yet again navigates the query to the product entity referred to in the Order_Detail

Finally we can add queryOptions to this product entity:

  1. The $select limits only the productUnitPrice to be returned.

The partial results are shown below:

You can test the same query at the public Northwind endpoint, the only difference being the modified names of entitySets, properties, and navigationProperties.

http://services.odata.org

/V4/Northwind/Northwind.svc/Customers?

$filter=contains(Country, ‘France’)&

$select=CompanyName,Address,ContactName&

$expand=Orders(

$filter=OrderDate gt 1996-07-05;

$select=ShipAddress,OrderDate;

$expand=Order_Details(

$select=Quantity;

$expand=Product(

$select=UnitPrice

)

)

)

Structure of an OData Query

The approximate BNF of an OData query is shown below. This is only illustrative, not pure BNF. The accurate version is available here (http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part2-url-conventions.html)

ODataRequest              := entity|entitySet?queryOptions

entity                            a single entity

                                    := Customer(‘NWD~Customer-GREAL’)

                                    := Customer(‘NWD~Customer-GREAL’)/hasPlacedOrder(‘NWD~Order-10528’)

entitySet                       a set of entities

                                    := Customer

                                    := Customer(‘NWD~Customer-GREAL’)/hasPlacedOrder

queryOptions               The optional conditions placed on the current entity

                                    := filter ; select ; expand

filter                             Specify a filter to limit what entities should be included

                                    := $filter = {filterCondition}*

                                    := $filter = contains(customerCountry,’France’)

select                            Specify what properties of the current entity should be returned

                                    := $select = {property}*

                                    := $select = customerCompanyName,customerAddress,customerContactName

expand                        Specify the navigationProperty along which the query should proceed

                                    := $expand = {navigationProperty(queryOptions)}*

.                                   := $expand = hasPlacedOrder($expand=..)

Perhaps this can be illustrated more clearly (for some!):

Using the same diagram notation the complex query can be seen as traversing nodes in a graph:

Other Notes

  1. OData publishes the underlying model via a $metamodel document, so any application can be truly model-driven rather than hard-coded. This means that OData query builders allow one to access any OData endpoint and start constructing complex queries. Examples include:
  2. The examples in this description use OData V4 syntax ( http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part2-url-conventions.html ). The same queries can be expressed in the older OData V2 syntax (http://www.odata.org/documentation/odata-version-2-0/uri-conventions/), but V4 has changed the syntax to be far more query friendly.
  3. What about really, really complex queries: do I have to define them always via the URL? Of course not because each underlying datastore has a way of composing ‘views’ of the data that can then be published as pseudo-entityTypes via OData:
    • RDBMS/SQL has user defined views
    • RDF/RDFS/OWL/SPARQL has SPIN queries (http://spinrdf.org/) that can be added as pseudo EntityTypes via OData2SPARQL
  4. By providing a standardized RESTful interface that also is capable of complex queries we can isolate application development and developers from the underlying datastore:
    • The development could start with a POJO database, migrate through an RDBMS, and then., realizing the supremacy of RDF, end up with a triplestore without having to change any of the application code. OData can be a true Janus-point in the development

Conclusions

OData is much more than JDBC/ODBC for the 21st century: it is also a candidate for the universal query language that allows development and developers to be completely isolated from the underlying datastore technology..

Additional Resources

Odata2SPARQL (http://inova8.com/bg_inova8.com/offerings/odata2sparql/)

Lens2Odata (http://inova8.com/bg_inova8.com/offerings/lens2odata/)

SQL2RDFincremental materialization uses a stream of source data changes (DML) and transforms them to the corresponding changes (inserts, deletes) in the triple store ensuring concurrency between RDBMS and triplestore. SQL2RDFincremental materialization is based on the same R2RML mapping used for virtualization (Ontop), and (Capsenta), and materialization (In4mium, and R2RML Parser) thus does not incur an additional configuration maintenance problem.

Mapping an existing RDBMS to RDF has become an increasingly popular way of accessing data when the system-of-record is not, or cannot be, in a RDF database. Despite its attractiveness, virtualization is not always possible for various reasons such as performance, the need for full SPARQL 1.1 support, the need to reason over the virtualized as well as other materialized data, etc. However materialization of an existing RDBMS to RDF is also not an ideal alternative. Reasons include the inevitable lack of concurrency between the system-of-record and the RDF.

Thus incremental materialization provides an alternative between virtualization and materialization, offering:

  • Easy ETL of source RDB as no additional configuration required other than the same R2RML required for virtualization or bulk materialization.
  • Improved concurrency of data compared with materialization.
  • Significantly less computational overhead than a full materialization of the source data.
  • Improved query performance compared with virtualization especially when reasoning is required.
  • Compatibility with R2RML thus reducing configuration maintenance.
  • Supports insert, update, and delete changes on the source RDBMS.
  • Source transactions can be batched, and the changes to the triplestore are part of a single transaction that can be rolled back in the event of a problem.
  • Supports change logging so that committed changes can be rolled back.

Contact  inova8 if you would like to try SQL2RDF

When a subject-matter-expert (SME) describes their data domain they use English language. In response we give them visually beautiful, but often undecipherable, diagrams. Here we propose an alternative: a ‘word-model’ that describes the model in structured English without loss of accuracy or completeness.

Creating an ontological model invariably involves a subject-matter-expert (SME) who has an in-depth knowledge of the domain to be modelled working together with a data modeler or ontologist who will translate the domain into an ontological model.

When a subject-matter-expert (SME) is in discussion with a data modeler, they will be describing their information using English language:

“I have customers who are assigned to regions, and they create orders with a salesperson who prepares the order-details and assigns a shipper that operates in that region”

In response a data modeler will offer the subject-matter-expert (SME) an E-R diagram or an ontology graph!

Figure 1: Geek Model Diagrams

No wonder we are regarded as geeks! Although visually appealing, they are difficult for an untrained user to verify the model that the diagram is describing.

Instead we could document the model in a more easily consumed way with a ‘word-model’. This captures the same details of the model that are present in the various diagrams, but uses structured English and careful formatting as shown below.

aCustomer

  • places anOrder
    • made by anEmployee
      • who reports to anotherEmployee
      • and who operates in aTerritory
    • and contains anOrderDetail
      • refers to aProduct
        • which is supplied by aSupplier
        • and which categorized as aCategory
      • and is shipped by aShipper
      • and is assigned to aRegion
        • which has aTerritory
      • and belongs to aRegion

The basic principle is that we start at any entity, in this case ‘Customer’ and traverse the edges of the graph that describe that entity. At each new node we indent. Each line is a combination of the predicate and range class. No rules regarding order in which we visit the nodes and edges. No rules regarding depth. The only rule should be that, if we want a complete word-model, the description traverses all edges in one direction or the other.

These word-models are useful in the model design phase, as they are easy to document using any editor. Since the word-model is in ‘plain English’, it is easy for a SME to verify the accuracy of the model, and mark up with comments and edits. However word-models are also easy to generate from an RDF/OWL model.

Enhancements to Word-Model

We can refine the contents of the word-model as we develop the model with the SME. We can also enhance the readability by decorating the word-model text with the following fonts:

Word-Model Legend
Italics indicate an instance of a class, a thing
Bold indicates a class
Underline is a predicate or property that relates instances of classes
BoldItalic is used for cardinality expressions

Level 1a: Add the categorization of entities

Rather than using an example of an entity, we can qualify with the class to which the sample entity belongs.

aCustomer, a Customer

  • places anOrder, an Order
    • made by anEmployee, an Employee
      • who reports to anotherEmployee, an Employee
      • and who operates in aTerritory, a Territory
    • and contains anOrderDetail, an OrderDetail
      • and refers to aProduct, a Product
        • which is supplied by aSupplier, a Supplier
        • and which is categorized as aCategory, a Category
      • and is shipped by aShipper, a Shipper
      • and is assigned to aRegion, a Region
        • which has aTerritory, a Territory
      • and belongs to aRegion, a Region

Level 1b: Add cardinality of predicates

We can also add the cardinality as a modifier between the predicate and the entity:

aCustomer

  • who places zero, one or more orders
    • each made by one anEmployee
      • who reports to zero or one anotherEmployee
      • and who operates in one or more aTerritorys
    • and each contains one or more anOrderDetails
      • which refers to one aProduct
        • which is supplied by one aSupplier
        • and which is categorized as one aCategory
      • and is shipped by one aShipper
      • and is assigned to one aRegion
        • which has one or more aTerritorys
      • and belongs to one aRegion

Level 2: Add categorization and cardinality

Of course we can combine these extensions into a single word-modol as shown below.

aCustomer, a Customer

  • who places zero, one or more orders, each an Order
    • each made by one anEmployee, an Employee
      • who reports to zero or one anotherEmployee, an Employee
      • and who operates in one or more aTerritorys, each a Territory
    • and each contains one or more anOrderDetails, each an OrderDetail
      • which refers to one aProduct, a Product
        • which is supplied by one aSupplier, a Supplier
        • and which is categorized as one aCategory, a Category
      • and is shipped by one aShipper, a Shipper
      • and is assigned to one aRegion, a Region
        • which has one or more aTerritorys, each a Territory
      • and belongs to one aRegion, a Region

Despite the completeness of what is being described by this word-model, it is still easy to read by SMEs.

Auto-generation of Word-Model

Once the word-model has been formally documented in RDF/OWL, we can still use a word-model to document the RDF/OWL by auto-generating a word-model from the underlying RDF/OWL ontology as shown below.

Figure 2: Word-Model

This was generated using a SPIN magic-property as follows:

select ?modelDefinition
{
   (model:Customer 4 false) :modelDefinition ?modelDefinition .
}

This auto-generation can go further by including the datatype properties associated with each entity as shown below:

Figure 3: Word-Model including datatype properties

This was generated using a SPIN magic-property as follows:

select ?modelDefinition
{
   (model:Customer 4 true) :modelDefinition ?modelDefinition .
}

Appendix

SPIN Magic Properties:

The following SPIN properties were defined for auto generation of the word-model in HTML format:

classProperties

SELECT ?classProperties
WHERE {
{SELECT  ?arg1   (    IF(?arg2CONCAT("(",?properties,")"),"") as ?classProperties) WHERE
    {
        SELECT ?arg1 ((GROUP_CONCAT(CONCAT("<i>", ?dataPropertyLabel, "</i>"); SEPARATOR=', ')) AS ?properties)
        WHERE {
           # BIND (model:Product AS ?arg1) .
            ?arg1 a ?class .
            FILTER (?class IN (rdfs:Class, owl:Class)) .
            ?arg1 rdfs:label ?classLabel .
            ?dataProperty a owl:DatatypeProperty .
            ?dataProperty rdfs:domain ?arg1 .
            ?dataProperty rdfs:label ?dataPropertyLabel .
        }
        GROUP BY ?arg1
    }
}
}

classDefinition

SELECT ?classDefinition ?priorPath
WHERE {
    {
        SELECT ?arg1 ?arg2  ((GROUP_CONCAT( ?definition; SEPARATOR='<br/>and that ')) AS ?classDefinition)  ((GROUP_CONCAT( ?pastPath; SEPARATOR='\t')) AS ?priorPath)
        WHERE {
           ?arg1 a ?class . FILTER( ?class in (rdfs:Class, owl:Class ))
            ?arg1 rdfs:label ?classLabel .
            ?objectProperty a owl:ObjectProperty .
            {
                        ?objectProperty rdfs:domain ?arg1 .
                        ?objectProperty rdfs:label ?objectPropertyLabel .
                        ?objectProperty rdfs:range ?nextClass .
                        ?nextClass rdfs:label ?nextClassLabel .   BIND(?objectProperty as ?property)
            }UNION{
                        ?objectProperty  owl:inverseOf ?inverseObjectProperty .
                        ?objectProperty rdfs:domain  ?nextClass.
                        ?inverseObjectProperty rdfs:label ?objectPropertyLabel .
                        ?objectProperty rdfs:range ?arg1 .
                        ?nextClass rdfs:label ?nextClassLabel .    BIND(?inverseObjectProperty as ?property)
            }UNION{
                        ?inverseObjectProperty  owl:inverseOf ?objectProperty .
                        ?objectProperty rdfs:domain  ?nextClass.
                        ?inverseObjectProperty rdfs:label ?objectPropertyLabel .
                        ?objectProperty rdfs:range  ?arg1 .
                        ?nextClass rdfs:label ?nextClassLabel .    BIND(?inverseObjectProperty as ?property)
            }
#Stop from going too deep
            BIND(?arg2 -1 as ?span) FILTER(?span>0). 
            ?nextClass a ?nextClassClass. FILTER( ?nextClassClass in (rdfs:Class, owl:Class ))
#,  odata4sparql:Operation))  .
#Do not process an already processed arc (objectProperty)         
            BIND(CONCAT(?arg4,"\t",?objectPropertyLabel) as ?forwardPath) FILTER( !CONTAINS(?arg4, ?objectPropertyLabel ))  
            (?nextClass ?span ?arg3 ?forwardPath ) :classDefinition (?nextClassDefinition  ?nextPath).           
#Do not include if arc (objectProperty) appears already   
            FILTER(  !CONTAINS(?nextPath, ?objectPropertyLabel )) BIND(CONCAT( ?objectPropertyLabel, IF(?nextPath="","",CONCAT("\t",?nextPath))) as ?pastPath)
                                    (?nextClass ?arg3) :classProperties ?nextClassProperties .
            BIND (CONCAT("<u>",?objectPropertyLabel , "</u> <b>", ?nextClassLabel, "</b>",  ?nextClassProperties, IF ((?nextClassDefinition!=""), CONCAT("<br/><blockquote>that ",  ?nextClassDefinition, "</blockquote>"), "")  ) as ?definition)
        }
        GROUP BY ?arg1 ?arg2
    } .
}

modelDefintion

SELECT ?modelDefinition
WHERE {
    {
        SELECT ?arg1 ?arg2 ?arg3 ((CONCAT("<b>", ?classLabel, "</b>", ?nextClassProperties, "<blockquote>that ", ?classDefinition, "</blockquote>")) AS ?modelDefinition)
        WHERE {
           # BIND (model:Order AS ?arg1) .  BIND (4 AS ?arg2) . BIND (false AS ?arg3) .
            ( ?arg1 ?arg2 ?arg3 "") :classDefinition (?classDefinition "") .
            ( ?arg1 ?arg3 ) :classProperties ?nextClassProperties .
            ?arg1 rdfs:label ?classLabel .
        }
    } .
}

Usage

The following will return the HTML description of the model, starting with model:Customer, stopping at a depth of 4 in any direction, and not including the datatypeproperty definition.

select ?modelDefinition
{
   (model:Customer 4 false) :modelDefinition ?modelDefinition .
}

Let’s face it, RDF Graph datastores have not become the go-to database for application development like MySQL, MongoDB, and others have. Why? It is not that they cannot scale to handle the volume and velocity of data, nor the variety of structured and unstructured types.

Perhaps it is the lack of application development frameworks integrated with RDF. After all any application needs to not only store and query the data but provide users with the means to interact with that data, whether it be data entry forms, charts, graphs, visualizations and more.

However application development frameworks target popular back-ends accessible via JDBC and, now we are in the 21century, OData. RDF and SPARQL are not on their radar … that is unless we wrap RDF with OData so that the world of these application development environments is opened up to RDF Graph datastores.

OData2SPARQL provides that Janus-inflexion point, hiding the nuances of the RDF Graph behind an OData service which can then be consumed by powerful development environments such as OpenUI5 and WebIDE.

This article shows how an RDF Graph CRUD application can be rapidly developed, yet without losing the flexibility that HTML5/JavaScript offers, from which it can be concluded that there is no reason preventing the use of RDF Graphs as the backend for production-capable applications.

A video of this demo can be found here: https://youtu.be/QFhcsS8Bx-U

Figure 1: OData2SPARQL: the Janus-Point between RDF data stores and application development

Rapid Application Development Environments

There are a great number of superb application development frameworks that allow one to create cross platform (desktop, web, iOS, and Android), rich (large selection of components such as grids, charts, forms etc) applications. Most of these are based on the MVC or MVVM model both of which require a systematic and complete (CRUD) access to the back-end data via a RESTful API. Now that OData has been adopted by OASIS, the number of companies offering explicit support for OData is increasing, ranging from Microsoft, IBM, and SAP. Similarly there are a number of frameworks, one of which is SAPUI5 which has an open source version OpenUI5.

OpenUI5

OpenUI5 is an open source JavaScript UI library, maintained by SAP and available under the Apache 2.0 license. OpenUI5 lets you build enterprise-ready web applications, responsive to all devices, running on almost any browser of your choice. It’s based on JavaScript, using JQuery as its foundation, and follows web standards. It eases your development with a client-side HTML5 rendering library including a rich set of controls, and supports data binding to different models (JSON, XML and OData).

With its extensive support for OData, combining OpenUI5 with OData2SPARQL releases the potential of RDF Graph datasources for web application development.

Web IDE

SAP Web IDE is a powerful, extensible, web-based integrated development tool that simplifies end-to-end application development. Since it is built around using OData datasources as its provider, then WebIDE can be used as a semantic application IDE when the RDF Graph data is published via OData2SPARQL.

WebIDE runs either as a cloud based service supplied free by SAP, or can be downloaded as an Eclipse ORION application. Since the development is probably against a local OData endpoint, then the latter is more convenient.

RDF Graph application in 5 steps:

  1. Deploy OData2SPARQL endpoint for chosen RDF Graph store

The Odata2SPARQL war is available here, together with instructions for configuring the endpoints: odata2sparql.v2

The endpoint is this example is against an RDF-ized version of the ubiquitous Northwind databases. This RDF graph version can be downloaded here: Northwind

  1. Install WebIDE

Instructions for installing the Web IDE Personal edition can be found here: SAP Web IDE Personal Edition

  1. Add OData service definition

Once installed an OData service definition file (for example NorthwindRDF) can be added to the SAPWebIDE\config_master\service.destinations\destinations folder, as follows

Description=Northwind

Type=HTTP

TrustAll=true

Authentication=NoAuthentication

Name=NorthwindRDF

ProxyType=Internet

URL=http://localhost:8080

WebIDEUsage=odata_gen

WebIDESystem=NorthwindRDF

WebIDEEnabled=true

  1. Create new application from template

An application can be built using a template which is accessed via File/New/Project from Template. In this example the “CRUD Master-Detail Application” was selected.

The template wizard needs a Data Connection: choose Service URL, select the data service (NorthwindRDF) and enter the path of the particular endpoint (/odata2sparql/2.0/NW/).

Figure 2: WEB IDE Data Connection definition

At this stage the template wizard allows you to browse the endpoint’s entityTypes and properties, or classes and properties in RDF graph-speak.

Since the Web IDE and OpenUI5 is truly model driven, the IDE creates a master-detail given the entities that you define in the next screen.

The template wizard will ask you for the ‘object’ entityType which in this example is Category. Additionally you should enter the ‘line item’ but in this case there is only one navigation property (aka objectproperty in RDF graph-speak) which is the products that belong to the category.

Note that this template allows other fields to be defined, so titls and productUnitprice were selected.

Figure 3: WEB IDE Template Customization

  1. Run the application

The application is now complete and can be launched from the IDE. Right-click the application and select Run/Run as/As web application:

Figure 4: Application

Even this templated application is not limited to browsing data: it allows full editing of the categories. Note that even the labels are derived from the OData endpoint, but of course they can be changed by editing the application.

Figure 5: Edit Category

Alternatively a new category can be added:

Figure 6: Create New Category

Next Steps

That’s it: a fully functional RDF Graph web application built without a single line of code being written. However we can chose to use this just as a starting point:

  1. Publish data views, the SPARQL equivalent of a SQL view on the data, to the OData2SPARQL endpoint.
    • These are particularly useful when publishing reports in OpenUI5, Excel PowerQuery, Spotfire, etc.
  2. Modify the model that is published via the OData2SPARQL endpoint
    • The model that is published is extracted from the schema defined in the endpoint configuration. Thus the schema can be changed to suit what one wants in the endpoint metamodel.
  3. Edit the templated application
    • The templated application is just a starting point for adaptation: the code that is created is nothing more than standard HTML5/JavaScript using the OpenUI5 libraries.
  4. Build the application from first-principles
    • A template is not always the best starting point, so an application can always be built from first-principles. However the OpenUI5 libraries make this much easier by, for example, making the complete metamodel of the OData2SPARQL endpoint available within the application simply by defining that endpoint as the datasource.

Integration Problem to be solved

  • Data in different databases, even with Linked Open data sources.
  • Misaligned models, different datasets have different meanings for classes and predicates that need to be aligned.
  • Misaligned names for the same concepts.
  • Replication is problematical.
  • Query definition and scope of querying difficult to define in advance.
  • Provence of data necessary.
  • Cannot depend on inferences being available in advance
  • Scalable architecture requires that all queries are stateless

Data Cathedrals versus Information Shopping Bazaars

Linked Open Data has been growing since 2007 from a few (12) interconnected datasets to 295 as of 2011, and it continues to grow. To quote “Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” (Linked Data, n.d.) 

Figure 1: Growth of the Linked Data ‘Cloud’

As impressive as the growth of interconnected datasets is, what is more important is the value of that interconnected data. A corollary of Metcalf’s law suggests that the benefit gained from integrated information grows geometrically[1] with the number of data communities that are integrated.

Many organizations have their own icebergs of information: operations, sales, marketing, personnel, legal, finance, research, maintenance, CRM, document vaults etc. (Lawrence, 2012) Over the years there have been various attempts to melt the boundaries between these icebergs including the creation of the mother-of-all databases that houses (or replicates) all information or the replacement of disparate applications with their own database with a mother-of-all application that eliminates the separate databases. Neither of these has really succeeded in unifying any or all data within an organization. (Lawrence, Data cathedrals versus information bazaars?, 2012). The result is a ‘Data Cathedral’ through which users have no way to navigate to find the information that will answer their questions.

Figure 2: Users have no way to navigate through the Enterprise’s Data Cathedral

Remediator at the heart of Linked Enterprise Data

Can we create an information shopping bazaar for users to answer their questions without committing heresy in the Data Cathedral?  Can we create the same information shopping bazaar as Linked Data within the Enterprise: Linked Enterprise Data (LED). That is the objective of Remediator.

First of all we must recognize that the enterprise will have many structured, aggregated, and unstructured data stores already in place:

Figure 3: Enterprise Structured, Aggregated, and Unstructured Data Icebergs

One of the keys to the ability of Linked Data to interlink 300+ datasets is that they are all are expressed as RDF. The enterprise does not have the luxury of replicating all existing data into RDF datasets. However that is not necessary (although still sometimes desirable) because there are adapters that can make any existing dataset look as if it contains RDF, and can be accessed via a SPARQLEndpoint. Examples are listed below

  1. D2RQ: (D2RQ: Accessing Relational Databases as Virtual RDF Graphs )
  2. Ultrawrap:(Research in Bioinformatics and Semantic Web/Ultrawrap)
  3. Ontop:(-ontop- is a platform to query databases as Virtual RDF Graphs using SPARQ)

Attaching these adapters to existing data-stores, or replicating existing data into a triple store, takes us one step further to the Linked Enterprise Data:

Figure 4: Enterprise Data Cloud, the first step to integration

Of course now that we have harmonized the data all as RDF accessible via a SPARQLEndpoint we can view this as an extension of the Linked Data cloud in which we provide enterprises users access to both enterprise and public data:

Figure 5: Enterprise Data Cloud and Linked Data cloud

We are now closer to the information shopping bazaar, since users would, given appropriate discovery and searching user interfaces, be able to navigate their own way through this data cloud.  However, despite the harmonization of the data into RDF, we still have not provided a means for users ask new questions:

What Company (and their fiscal structure) are we working with that have a Business Practise of type Maintenance for the target industry of Oil and Gas with a supporting technology based on Vendor-Application and this Application is not similar to any of our Application?

Such questions require pulling information from many different sources within an organization. Even with the Enterprise Data Cloud one has provided the capability to discover such answers. Would it not be better to allow a user to ask such a question, and let the Linked Enterprise Data determine from where it should pull partial answers which it can then aggregate into the complete answer to the question. It is like asking a team of people to answer a complex question, each contributing their own, and then assembling the overall answer rather than relying on a single guru.  Remediator has the role of that team, taking parts of the questions and asking that part of the question of the data-sources.

Figure 6: Remediator as the Common Entry Point to Linked Enterprise Data (LED)

Thus our question can become:

  1. What Business Practise of type Maintenance for the target industry of Oil and Gas?
  2. What Company are we working with?
  3. What Company have a Business Practise of type Maintenance?
  4. What Business Practise with a supporting technology based on Vendor- Application?
  5. What Company (and their fiscal structure)?
  6. What Vendor-Application and this Application is not similar to any of our Application?

This decomposition of a question into sub-questions relevant to each dataset is automated by Remediator:

Figure 7: Sub-Questions distributed to datasets for answers

Requirements for a Linked Enterprise Data Architecture

  • Keep it simple
  • Do not re-invent that which already exists.
  • Eliminate replication where possible.
  • Avoid the need for prior inferencing.
  • Efficient query performance.
  • Provide provenance of results.
  • Provide optional caching for further slicing and dicing of result-set.
  • Use Void only Void and nothing but Void to drive the query

[1]  If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, where K is some constant. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc), then the benefits increase to 10 * K * 2. If I integrate in threes, (operations + accounting + maintenance, accounting + payroll + receiving, etc), then the benefits increase four-fold (a corollary of Metcalf’s law) to 10 * K * 4. For quad-wise integration my benefits would be 10 * K * 8 and so on. Now it might not be 8 fold but the point is there is a geometric, not linear, growth in benefits as I integrate all of my information across my organization.


OData2SPARQL is an OData proxy protocol convertor for any SPARQL/RDF triplestore. To compare SPARQL with OData is somewhat misleading. After all SPARQL has its roots as a very powerful query language for RDF data, but is not intended as a RESTful protocol. Similarly OData has its roots as an abstract interface to any type of datastore, not as a specification of that datastore. Some have said “OData is the equivalent of ODBC for the Web”.
The data management strengths of SPARQL/RDF can be combined with the application development strengths of OData with a protocol proxy: OData2SPARQL, a Janus-point between the application development world and the semantic information world.

Figure 1: OData2SPARQL Proxy between Semantic data and Application consumers

What is OData?

OData is a standardized protocol for creating and consuming data APIs. OData builds on core protocols like HTTP and commonly accepted methodologies like REST. The result is a uniform way to expose full-featured data APIs (Odata.org).  Version 4.0 has been standardized at OASIS, and was released in March 2014.

OData RESTful APIs are easy to consume. The OData metadata, a machine-readable description of the data model of the APIs, enables the creation of powerful generic client proxies and tools. Some have said “OData is the equivalent of ODBC for the Web” (OASIS Approves OData 4.0 Standards for an Open, Programmable Web, 2014). Thus a comprehensive ecosystem of applications, and development tools, has emerged a few of which are listed below:

  • LINQpad: LINQPad is a tool for building OData queries interactively.
  • OpenUI5is an open source JavaScript UI library, maintained by SAP and available under the Apache 2.0 license. OpenUI5 lets you build enterprise-ready web applications, responsive to all devices, running on almost any browser of your choice. It’s based on JavaScript, using JQuery as its foundation, and follows web standards. It eases your development with a client-side HTML5 rendering library including a rich set of controls, and supports data binding to different models including OData.
  • Power Query for Excel is a free Excel add-in that enhances the self-service Business Intelligence experience in Excel by simplifying data discovery, access and collaboration.
  • Tableau – an excellent client-side analytics tool – can now consume OData feeds
  • Teiid allows you to connect to and consume sources (odata services, relational data, web services, file, etc.) and deploy a single source that is available as an OData service out-of-the-box.
  • Telerik not only provides native support for the OData protocol in its products, but also offers several applications and services which expose their data using the OData protocol.
  • TIBCO Spotfire is a visual data discovery tool which can connect to OData feeds.
  • Sharepoint: Any data you’ve got on SharePoint as of version 2010 can be manipulated via the OData protocol, which makes the SharePoint developer API considerably simpler.
  • XOData is a generic web-based OData Service visualization & exploration tool that will assist in rapid design, prototype, verification, testing and documentation of OData Services. 

OData vs SPARQL/RDF

To compare SPARQL with OData is somewhat misleading. After all SPARQL has its roots as a very powerful query language for RDF data, and is not intended as a RESTful protocol. Similarly OData has its roots as an abstract interface to any type of datastore, not as a specification of that datastore.  Recently JSON-LD has emerged (Manu Sporny, 2014), providing a method of transporting Linked Data using JSON. However JSON-LD focusses on the serialization of linked data (RDF) as JSON rather than defining the protocol for a RESTful CRUD interface. Thus it is largely an alternative to, say, Turtle or RDF/XML serialization format.

OData and SPARQL/RDF: Contradictory or Complimentary?
  OData SPARQL/RDF
Strengths ·   Schema discovery

·   OData provides a data source neutral web service interface which means application components can be developed independently of the back end datasource.

·   Supports CRUD

·   Not limited to any particular physical data storage

·   Client tooling support

·   Easy to use from JavaScript

·   Growing set of OData productivity tools such as Excel, SharePoint, Tableau and BusinessObjects.

·   Growing set of  OData frameworks such as SAPUI5, OpenUI5, and KendoUI

·   Growing set of independent development tools such as LINQPad, and XOdata

·   Based on open (OASIS) standards after being initiated by Microsoft

·   Strong commercial support from Microsoft, IBM, and SAP.

·   OData is not limited to traditional RDBMS applications. Vendors of real-time data such as OSI are publishing their data as an OData endpoint.

 

·   Extremely flexible schema that can change over time.

·   Vendor independent.

·   Portability of data between triple stores.

·   Federation over multiple, disparate, data-sources is inherent in the intent of RDF/SPARQL.

·   Increasingly standard format for publishing open data.

·   Linked Open Data expanding.

·   Identities globally defined.

·   Inferencing allows deduction of additional facts not originally asserted which can be queried via SPARQL.

·   Based on open (W3C) standards

Weaknesses ·   Was perceived as vendor (Microsoft) protocol

·   Built around the use of a static data-model (RDBMS, JPA, etc)

·   No concept of federation of data-sources

·   Identities defined with respect to the server.

·   Inferencing limited to sub-classes of objects

 

·   Application development frameworks that are aligned with RDF/SPARQL limited.

·   Difficult to access from de-facto standard BI tools such as Excel.

·   Difficult to report using popular reporting tools

 

Table 1: ODATA AND SPARQL/RDF: CONTRADICTORY OR COMPLIMENTARY?

OData2SPARQL: OData complementing RDF/SPARQL

The data management strengths of SPARQL/RDF can be combined with the application development strengths of OData with a protocol proxy: OData4SPARQL. OData4SPARQL is the Janus-point between the application development world and the semantic information world.

  • Brings together the strength of a ubiquitous RESTful interface standard (OData) with the flexibility, federation ability of RDF/SPARQL.
  • SPARQL/OData Interop proposed W3C interoperation proxy between OData and SPARQL (Kal Ahmed, 2013)
  • Opens up many popular user-interface development frameworks and tools such as OpneUI5.
  • Acts as a Janus-point between application development and data-sources.
  • User interface developers are not, and do not want to be, database developers. Therefore they want to use a standardized interface that abstracts away the database, even to the extent of what type of database: RDBMS, NoSQL, or RDF/SPARQL
  • By providing an OData4SPARQL server, it opens up any SPARQL data-source to the C#/LINQ development world.
  • Opens up many productivity tools such as Excel/PowerQuery, and SharePoint to be consumers of SPARQL data such as Dbpedia, Chembl, Chebi, BioPax and any of the Linked Open Data endpoints!
  • Microsoft has been joined by IBM and SAP using OData as their primary interface method which means there will many application developers familiar with OData as the means to communicate with a backend data source.

Consuming RDF via OData: OData2SPARQL

All of the following tools are demonstrated accessing an RDF triple store via the OData2SPARQL porotocol proxy.

Development Tools

XOData

A new online OData development is XOData from (PragmatiQa, n.d.). Unlike other OData tools, XOData renders very useful relationship diagrams. The Northwind RDFD model published via OData4SPARQL endpoint is shown below:

 Figure 2: Browsing the EDM model Published by ODaTa2SPARQL using XOData

XOData also allows the construct of queries as shown below:

Figure 3: Querying The OData2SPARQL Endpoints Using XODATA

LINQPad

(LINQPad, n.d.) is a free development tool for interactively querying databases using C#/LINQ. Thus it supports Object, SQL, EntityFramework, WCF Data Services, and, most importantly for OData4SPARQWL, OData services. Since LINQPad is centered on the Microsoft frameworks, WCF, WPF etc, this illustrates how the use of OData can bridge between the Java worlds of many semantic tools, and the Microsoft worlds of corporate applications such as SharePoint and Excel.

LINQPad shows the contents of the EDM model as a tree. One can then select an entity within that tree, and then create a LINQ or Lambda query. The results of executing that query are then presented below in a grid.

Figure 4:  Browsing and Querying the OData2SPARQL Endpoints Using LINQPad

LINQPad and XOData are good for testing out queries against any datasource. Therefore this also demonstrates using the DBpedia SPARQL endpoint as shown below:

 Figure 5: Browsing DBPedia SPARQLEndpoint Using LINQPad via OData2SPARQL

Browsing Data

One of the primary motivations for the creation of OData2SPARQL is to allow access to Linked Open Data and other SPARQLEndpoints from the ubiquitous enterprise and desktop tools such as SharePoint and Excel.

Excel/PowerQuery

“Power Query is a free add-in for Excel 2010 and up that provide users an easy way to discover, combine and refine data all within the familiar Excel interface.” (Introduction to Microsoft Power Query for Excel, 2014)

PowerQuery allows a user to build their personal data-mart from external data, such as that published by OData2SPARQL. The user can fetch data from the datasource, add filters to that data, navigate through that data to other entities, and so on with PowerQuery recording the steps taken along the way. Once the data-mart is created it can be used within Excel as a PivotTable or a simple list within a sheet. PowerQuery caches this data, but since the steps to create the data have been recorded, it can be refreshed automatically by allowing PowerQuery to follow the same processing steps. This feature resolves the issue of concurrency in which the data-sources are continuously being updated with new data yet one cannot afford to always query the source of the data. These features are illustrated below using the Northwind.rdf endpoint published via OData2SPARQL:

Figure 6: Browsing the OData4SPARQL Endpoint model with PowerQuery

Choosing an entity set allows one to start filtering and navigating through the data, as shown in the ‘Applied Steps’ frame on the right.

Note that the selected source is showing all values as ‘List’ since each value can have zero, one, or more values as is allowed for RDF DatatypeProperties.

Figure 7: Setting Up Initial Source of Data in PowerQuery

As we expand the data, such as the companyProperty, we see that the Applied Steps records the steps take so that they can be repeated.

Figure 8: Expanding Details in PowerQuery

The above example expanded a DatatypeProperty collection. Alternatively we may navigate through a navigation property such as Customer_orders, the orders that are related to the selected customer:

Figure 9: Navigating through related data with PowerQuery

 Once complete the data is imported into Excel:

Figure 10: Importing data from OData2SPARQL with PowerQuery

Unlike conventional importing of data into Excel, the personal data-mart that was created in the process of selecting the data is still available.

Application Development Frameworks

There are a great number of superb application development frameworks that allow one to create cross platform (desktop, web, iOS, and Android), rich (large selection of components such as grids, charts, forms etc) applications. Most of these are based on the MVC or MVVM model both of which require a systematic and complete (CRUD) access to the back-end data via a RESTful API. Now that OData has been adopted by OASIS, the number of companies offering explicit support for OData is increasing, ranging from Microsoft, IBM, and SAP to real-time database vendors such as OSI. Similarly there are a number of frameworks, one of which is SAPUI5 (UI Development Toolkit for HTML5 Developer Center , n.d.) which has an open source version OpenUI5 (OpenUI5, n.d.).

SAPUI5

SAPUI5 is an impressive framework which makes MVVC/MVVM application development easy via the Eclipse-based IDE. Given that OData4SPARQL publishes any SPARQLEndpoint as an OData endpoint, it means that this development environment is immediately available for an semantic application development.  The following illustrates a master-detail example against the Northwind.rdf SPARQL endpoint via OData4SPARQL.

Figure 11: SAPUI5 Application using OData4SPARQL endpoint

Yes we could have cheated and used the Northwind OData endpoint directly, but the Qnames of the Customer ID and Order Number reveals that the origin of the data is really RDF.

Handling Contradictions between OData and RDF/SPARQL

RDF is an extremely powerful way of expressing data, so a natural question to ask is what could be lost when that data is published via an OData service. The answer is very little! The following table lists the potential issues and their mitigation:

Issue Description Mitigation
OData 3NF versus RDF 1NF RDF inherently supports multiple values for a property, whereas OData up to V2 only supported scalar values Odata V3+ supports collections of property values, which are supported by OData4SPARQL proxy server
RDF Language tagging RDF supports language tagging of strings OData supports complex types, which are used to map a language tagged string to a complex type with separate language tag, and string value.
DatatypeProperties versus object-attributes OWL DatatypeProperties are concepts independent of the class, bound to a class by a domain, range or OWL restriction. Conversely OData treats such properties bound to the class. In OData4SPARQL The OWL DatatypeProperty is converted to an OData EntityType property for each of the DatatypeProperty domains.
Multiple inheritance Odata only supports single inheritance via the OData baseType declaration within an EntityType definition.  
Multiple domain properties An OWL ObjectProperty will be mapped to an OData Association.  An Association can be between only one FromRole and one ToRole and the Association must be unique within the namespace. OData Associations are created for each domain. The OData4SPARQL names these associations {Domain}_{ObjectProperty}, ensuring uniqueness.
Cardinality The capabilities of OData V3 allow all DatatypeProperties to be OData collections. However the ontology might have further restrictions on the cardinality. OData4SPARQL assumes cardinality will be managed by the backend triple store. However in future versions, if the cardinality is restricted to one or less, then the EntityType property can be declared as a scalar rather than a collection.

Table 2: Contradictions between OData and RDF/SPARQL

Availability of OData2SPARQL

Two versions of OData2SPARQL are freely available as listed below:

  1. inova8.odata2sparql.v2 : OData V2 based on the Olingo.V2 library supporting OData Version 2
  2. inova8.odata2sparql.v4 : OData V4 based on the Olingo.V4 library supporting OData V4 (in progress)

 


Answering complex queries with easy-to-use graphical interface

The objectives of lens2odata are to provide a simple method of OData query construction driven by the metadata provide by OData services

  • Provides metamodel-driven OData query construction
    • Eliminates any configuration required to expose any OData service to lens2odata
  • Allows searches to be saved and rerun
    • Allows ease of use by casual users
  • Allows queries to be pinned to ‘Lens’ dashboard panels
    • Provides simple-to-use dashboard
  • Searches can be parameterized
    • Allows for easy configuration of queries
  • Compatible with odata2sparql, a service that exposes any triple store as an OData service
    • Provides a Query-Answering-over-Linked-Data (QALD) interface to any linked data.

Concept of Operation

Lens2odata consists of 3 primary pages with which users interact:

  1. Query: is the page in which users can
    1. add new OData services,
    2. compose queries,
    3. save those queries for reuse, and
    4. pin the queries as result fragments on a Lens
  2. Search: is the page in which users can
    1. select an existing query, and execute that query to explore the results
    2. from where they can navigate to Lens pages for specific entities or collections of entities
  3. Lens: are the pages, composed by users, which display fragments of details, optionally grouped into tabs, about a specific entity or collections of entities. Fragments can either be forms or tables. Other fragment layouts are being added.

Navigation between these pages are shown in the diagram below. Specifically these navigation paths are:

  1. Toggle between Search and Query to explore how the results would appear to a casual user
  2. Navigate to a concept’s Lens from Query preview hyperlinks
  3. Navigate to a concept’s Lens from Search results’ hyperlinks

Figure 1: lens2odata Navigation

Quick Start

Login to lens2odata

  1. Navigate to the Url provided by your administrator for lens2odata, http://<server>/lens2odata
  2. Enter users name and password
  3. Since no service has been previously setup, you will be prompted to enter the service display name and Url of that service. Check to use default proxy if not a local service
  4. After ‘save’, as long as your service was validated, you will immediately enter the Query page with a new query initialized with the first collection found in the service
  5. Execute ‘Preview’ to populate the Results Preview with a few values from the collection:
  6. You are now ready to:
    1. View the query via Search
    2. Explore the results further via Lens
    3. Expand the query with more values and filters

Let’s explore Search first of all

  1. Click on at the top-left of the Query page to navigate to the search page:

Search is the page that general users will access. From this page they can select a predefined query, and execute that query to start their discovery journey.

  1. Click on to populate the results form:

 

This shows more details than the Query page because, in the absence of any specific definition about what details of the instances of collections should be displayed, search will display whatever it can find.

  1. Click on the Url “Categories(1)” to navigate to the Lens for that type of instance.

The Lens page is an information dashboard that can be constructed for any type of instance that is discovered. In this case, since no specific lens page has been setup for ‘Category’ types of instance, a default page has been used with a single tab.

  1. There is another Uri on this page “Products”. Navigating this link will take you to a similar lens page, but one for a collection of instances:

 

  1. This Lens for Products shows a list of instances of products. The first column is a Uri to the lens page for the individual product. Click one to navigate to its Lens:
  2. You are now discovering information by navigating through the data, having started by querying a collection. Next steps would be:
  • The original query was not particularly specific. Lens2odata allows that query to be further refined by specifying which attributes should be displayed, adding navigation properties to other entities, adding filters to the query to restrict results or even parameterizing the query so that a user can execute the saved query, modifying the results just by supplying parameter values.
  • The Lens dashboard page can be further refined by adding more fragments of information on a tab, or adding more tabs to the lens page to logically group the information.

 

Availability of Lens2OData

A versions of Lens2OData  is available on GitHub at com.inova8.lens2odata


SKOS, the Simple Knowledge Organization System, offers an easy to understand schema for vocabularies and taxonomies. However modeling precision is lost when skos:semanticRelation predicates are introduced.

Combining SKOS with RDFS/OWL allows both the precision of owl:ObjectProperty to be combined with the flexibility of SKOS. However clarity is then lost as the number of core concepts (aka owl:Class) grow.
Many models are not just documenting the ‘state’ of an entity. Instead they are often tracking the actions performed on entities by agents at locations. Thus aligning the core concepts to the Activity, Entity, Agent, and Location classes of the PROV ontology provides a generic upper-ontology within which to organize the model details.

Vehicle Manufacturing Example

This examples captures information about vehicle manufacturing. Following

  1. Manufacturers: the manufacturer of models of cars in various production lines sited at plants
  2. Models: the models that the manufacturer produces
  3. ProductionLines: the production lines set up to produce models of vehicles on behalf of a manufacturer
  4. Plants: the plants that house the production lines

In addition there are different ‘styles’ of manufacturing that occur for various models and various sites:

  1. Manufacturing: the use of a ProductionLine for a particular Model

SKOS Modeling

If we follow a pure SKOS model we proceed as follows by creating a VehicleManufacturingScheme  skos:ConceptScheme

s:VehicleManufacturingScheme

rdf:type skos:ConceptScheme

.

Then we create skos:topConceptOf Manufacturer, Model, Plant, and Production as follows:

 

s:Manufacturer

rdf:type owl:Class ;

rdfs:subClassOf skos:Concept ;

skos:topConceptOf s:VehicleManufacturingScheme

.

s:Model

rdf:type owl:Class ;

rdfs:subClassOf skos:Concept ;

skos:topConceptOf s:VehicleManufacturingScheme

.

These top-level concepts are being created of type owl:Class and a subClassOf skos:Concept.  This is the pattern recommended in (Bechhofer, et al.)

Finally we can create skos:broader concepts as follows:

s:Ford

rdf:type s:Manufacturer ;

skos:broader s:Manufacturer ;

skos:inScheme s:VehicleManufacturingScheme

.

s:Fusion

rdf:type s:Model ;

skos:broader s:Model ;

skos:inScheme s:VehicleManufacturingScheme

.

The resultant SKOS taxonomy of the VehicleManufacturingScheme  skos:ConceptScheme then appears as follows:

Figure 1: SKOS taxonomy

Why?

By starting with a pure SKOS model we provide access to the underling concepts in a more accessible style for the less proficient user, as illustrated by the SKOS Taxonomy above. Yet we have not sacrificed the ontological precision of owl:Classes.

Thus we can ask questions about all concepts:

SELECT * WHERE

{

       ?myConcepts rdfs:subClassOf+ skos:Concept .

}

Or we can get a list of anything broader than one of these concepts:

SELECT * WHERE

{

       ?myBroaderConcepts skos:broader s:Model .

}

SKOS+OWL Modeling

Although skos:semanticRelation allows one to link concepts together, this predicate is often too broad when trying to create an ontology that documents specific relations between specific types of concept.

In our VehicleManufacturingScheme we might want to know the following:

  1. isManufacturedBy: which manufacturer manufactures a particular model
  2. operatedBy: which manufacturer operates a particular production facility
  3. performedAt: which plant is the location of a production facility
  4. wasManufacturedAt: which production facility was used to manufacture a particular model

Figure 2: SKOS+OWL model of Relations

 

These predicates can be defined using RDFS as follows:

so:isManufacturerBy

 rdf:type owl:ObjectProperty ;

rdfs:domain s:Model ;

rdfs:range s:Manufacturer ;

rdfs:subPropertyOf skos:semanticRelation

.

so:operatedBy

rdf:type owl:ObjectProperty ;

rdfs:domain s:Production ;

rdfs:range s:Manufacturer ;

rdfs:subPropertyOf skos:semanticRelation

.

Note that the definition of Model, Manufacturer etc. as subClassOf skos:Concept allows us to precisely define the domain and range.

s:Fusion

so:isManufacturerBy s:Ford ;

so:wasManufacturedAt s:Halewood-SmallVehicle ;

.

s:Dagenham-Truck

so:operatedBy s:Ford ;

so:performedAt s:Dagenham ;

.Thus we have used the flexibility of SKOS with the greater modeling precision of RDFS/OWL.

Why?

By building upon the SKOS model, one can ask an expansive question such as what concepts are semantically related to, say, the concept s:Fusion with a simple query:

SELECT * WHERE

{

       s:Fusion  ?p ?y .

       ?p rdfs:subPropertyOf* skos:semanticRelation

}

Yet with the same model we can ask a specific question about a relationship of a specific instance:

SELECT * WHERE

{

       s:Camry so:isManufacturerBy   ?o .

}

SKOS+OWL+PROV Modeling

One of the attractions of SKOS is that a taxonomy can grow organically. One of the problems of SKOS is that a taxonomy can grow organically!

As the taxonomy grows it can be useful to add another layer of structure beyond a catalog of concepts. Many models are not just documenting the ‘state’ of an entity. Instead they are often tracking the actions performed on entities by agents at locations. Thus aligning the core concepts to the Activity, Entity, Agent, and Location classes of the PROV ontology (Lebo, et al.) provides a generic upper-ontology within which to organize the model details.

Figure 3: PROV model

Thus our VehicleManufacturingScheme has each core PROV concept:

  1. Manufacturers: the Agents who manufacture models, and operate plants
  2. Models: the Entities
  3. ProductionLines: the Activities that produce Models on behalf of Manufacturers.
  4. Plants: the Location at which Activities take place, and Agents and Entities are located.

Figure 4: PROV Model

 

s:Production

rdfs:subClassOf prov:Activity ;

.

s:Model

rdfs:subClassOf prov:Entity ;

.

s:Manufacturer

rdfs:subClassOf prov:Organization ;

.

s:Plant

rdfs:subClassOf prov:Location ;

.

Similarly we can cast our predicates into the same PROV model as follows:

so:isManufacturerBy

rdfs:subPropertyOf prov:wasAttributedTo ;

.

so:operatedBy

rdfs:subPropertyOf prov:wasAssociatedWith ;

.

so:performedAt

rdfs:subPropertyOf prov:atLocation ;

.

so:wasManufacturedAt

rdfs:subPropertyOf prov:wasGeneratedBy ;

.

Why?

The PROV model is closer to the requirements of most enterprise models, that are trying to ‘model the business’, than a simple E-R model. The latter concentrates on capturing the attributes of an entity that record the current state of that entity. Often those attributes focus on documenting the process by which the entity gained its current state:

  • The agent that created the entity
  • The activity used to create the entity
  • The location when things were performed
  • The data of the activity, etc

Superimposing the PROV model formalizes this model, and thus allows a structure within which a more casual user can navigate, rather than a sea of entities.

By building upon the PROV model, one can ask an expansive question such as what entities behave as Agents and in which entities are they involved:

SELECT * WHERE

{

?organization a ?Agent .

?Agent rdfs:subClassOf* prov:Agent .

?entity ?predicate ?organization

}

SKOS+OWL+PROV-Qualified Modeling

Within the structure of PROV, predicates define the relationships between Activities, Entities, Agents, and Locations. However it is sometimes necessary to qualify these relationships. For example, the so:wasManufacturedAt predicate defines that a s:Production facility was used to manufacture a s:Model. When? How was it used? Why?

To extend the model, PROV adds the concept of a qualified influence, which allows the relationship to be further defined.

Figure 5: Qualified PROV for some predicates

We do this first of all by creating sopq:Manufacturing:

sopq:Manufacturing

rdf:type owl:Class ;

rdfs:subClassOf skos:Concept ;

rdfs:subClassOf prov:Generation ;

skos:topConceptOf s:VehicleManufacturingScheme ;

.

Note that this is a rdfs:subClassOf prov:Generation, the reification of the predicate prov:wasGeneratedBy

We then add two predicates, one (sopq:wasManufacturedUsing) from the prov:Entity to the prov:Generation, and one (sopq:production) from the prov:Generation to the prov:Activity as follows:

sopq:wasManufacturedUsing

rdf:type owl:ObjectProperty ;

rdfs:domain s:Model ;

rdfs:range sopq:Manufacturing ;

rdfs:subPropertyOf skos:semanticRelation ;

rdfs:subPropertyOf prov:qualifiedGeneration ;

.

sopq:production

rdf:type owl:ObjectProperty ;

rdfs:subPropertyOf skos:semanticRelation ;

rdfs:subPropertyOf prov:activity ;

.

Finally we can create a Manufacturing qualified generation concept as follows:

sopq:L-450H_at_Swindon-Hybrid

rdf:type sopq:Manufacturing ;

sopq:production s:Swindon-Hybrid ;

 skos:broader sopq:Manufacturing ;

.

s:L-450H

sopq:wasManufacturedUsing sopq:L-450H_at_Swindon-Hybrid ;

.

In the figure below we can see that these qualified actions simply extend the SKPOS taxonomy:

Figure 6: Taxonomy extended with Qualified Actions

Why?

Using qualifiedActions provides a systematic, rather than ad-hoc, way to provide more precision to a model.

Remaining Issues

  1. The PROV structure does not manifest itself within the taxonomy. Should Activity, Entity, Agent, and Location therefore be ConceptSchemes?

Model

The model files used in this example are included here: model2rdf

  • skos.ttl
  • skos+owl.ttl
  • skos+owl+prov.ttl
  • skos+owl+provqualified.ttl

References

Bechhofer, Sean and Miles, Alistair. Using OWL and SKOS. [Online] W3C. https://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html.

Lebo, Timothy, Satya, Sahoo and McGuinness, Deborah. PROV-O: The PROV Ontology. W3C. [Online] https://www.w3.org/TR/prov-o/.Reaming Issues


A large community exists “that sees Linked Data, let alone the full Semantic Web, as an unnecessarily complicated technology”, Phil Archer. However early adopters are always known for zealotry rather than pragmatism: mainframe to mini, mini-to-pc, pc-to-web and many more are examples where the zealots often threw the baby out with the bath-water.

Rather than abandoning the semantic web perhaps we should take a pragmatic view and identify its strengths, together with some soul-searching to be honest about its weaknesses:

SPARQL is unlikely ever to be end-user friendly:

However one can say exactly the same of SQL (which, believe it or not, used to be called User-Friendly-Interface, or UFI for short). SQL is now hidden behind much more developer-friendly facades so that they can deliver user-friendly experiences such as Hibernate/JPA, C#+LINQ, Odata, and many more. Even so SQL remains the language of choice for manipulating and querying RDBMS.

So we still need SPARQL for manipulating semantic data, but we need developer- and user-friendly facades that make the semantic information more accessible. Some initiatives are JPA for RDF, LINQtoRDF, Odata-SPARQL but the effort is fragmented with few standard initiatives.

RDF/OWL is unlikely to supplant Entity-Relational, JSON-Object databases:

However we have all learnt the lesson that the data model is often at the core of any application. Furthermore any changes to that data model can be costly, especially in the later stages of the development lifecycle. Who has not created an object-attribute data model deployed in an RDBMS to retain design flexibility without the need to change the underlying database schema? Are we not reinventing the semantic data model?

So the principles and flexibility of semantic (RDF, RDFS, and OWL) data modeling of data could be adopted as a data modeling paradigm representing the evolution that started with Entity-Relationship (ER), went through Object-Role Modeling (ORM or NIAM), to Semantic Data Modeling (SDM). Used in combination with dynamic REST/JSON, such as Odata, one could truly respond to the Dynamic Business Application Imperative (Forrester).

Triple-stores are unlikely to supplant Big-Data:

However, of volume, velocity, and variety, many big data solutions are weaker when handling variety, especially when the variety of data is changing over time as it does with evolving user needs. Most of the time the variety at the data source will be solved by Extract-Transform-Load (ETL) into a big-data store. Is this any different than data warehousing which has matured over the last 20 years, except for the use of different data storage technology? Like any goods in transit, data gets damaged and contaminated when it is moved. It is far better to use the data in-situ if at all possible, but this has been the unachievable Holy Grail of data integration for many years.

So the normalization of any and all data into triples might not be the best way to store data but can be the way to mediate variable information from a variety of data-sources: Ontology Based Data Access (OBDA). The ontology acts a semantic layer between the user and the data. The semantics of the ontology are used to enrich the information on the sources and/or cope with incomplete information in them. 

In summary …

Semantic technologies will see more success if it pursues the more pragmatic approach of the solving those problems that are not satisfactorily solved elsewhere such as querying and reasoning, dynamic data models, and data source mediation; instead of resolving the already solved.  

References

Phil Archer http://semanticweb.com/tag/phil-archer

LINQtoRDF: https://code.google.com/p/linqtordf/

JPA/RDF: https://github.com/mhgrove/Empire

ODATA/SPARQL: https://github.com/brightstardb/odata-sparql

Dynamic Business Imperative: http://www.forrester.com/The+Dynamic+Business+Applications+Imperative/fulltext/-/E-RES41397?objectid=RES41397

Ontology Based Data Access: http://obda.inf.unibz.it/