IntelligentGraph embeds calculation and analysis capability within RDF knowledge graphs, rather than forcing analysis to be undertaken by exporting query results to external analysis applications such as Excel.

IntelligentGraph achieves this by embedding scripts into the RDF knowledge graph which are evaluated when queried with SPARQL.

Scripts, written in a variety of languages such as JavaScript, Java, and Python, access the underlying graph using simple pathPatternQL navigation.

IntelligentGraph=KnowledgeGraph+Embedded Analysis.pdf

Why IntelligentGraph?

At present calculations over stored data are either delivered by custom code or exporting the stored data to spreadsheets. The data behind these tools is inevitably tabular. In fact, so dominant are spreadsheets for analysis that the spreadsheet itself becomes the ‘database’ with the inherent difficulties of syncing that data with the source system of record.

The real-world is better represented as a network or graph of interconnected things Therefore a knowledge graph is a far better storage organization than tables or objects. However, there is still the need to perform ad hoc numerical analysis over this data. 

RDF DataCube can help organize data for analysis, but still the analysis has to be performed externally. Confronted with this dilemma, knowledge graph data would typically be exported in tabular form to a datamart or directly into, yet again, a spreadsheet where the analysis could be performed.

IntelligentGraph turns this approach on its head by embedding the calculations as scripts within the knowledge graph. These scripts are evaluated on query, and utilise the data in situ: no concurrency issues.. This allows the calculations to have knowledge of its neighbouring nodes and edges, just like Excel cells can access other cells in the spreadsheet. Access to other nodes within the graph uses pathPatternQL navigation.

Example Data and Analysis

An Industrial Internet of Things (IIoT) application is connecting all the measurements about a process plant, such as an oil refinery, into a knowledge graph that relates the measurements to the material flows through the process equipment.

Although there is an abundance of measurements and laboratory analyses available, the values required for operating and performance monitoring are not (and mostly cannot) be directly measured. 

For example:

  • Stream Mass-Flow: direct mass flow measurements are rare. Instead, a volume flow measurement is used in conjunction with a measured material density to calculate the mass-flow
  • Unit Mass Flow Throughput: this is calculated by summing either all feed stream mass flows or product stream mass flows.
  • Unit Mass Balance: this is calculated by differencing the feed from product mass flows
  • Product Stream Yield: this is the ratio of a stream’s mass-flow to the unit to which the stream is connected throughput.

Figure 1: Typical Process Flow Sheet

These are simple examples; however, they show the reliance on the knowledge graph structure to perform the analysis.

Solving data analysis, the traditional way

Data is in the database, analysis is done by the analysis engine (aka Excel), right?

Figure 2: Data analysis the traditional way

In this scenario, the local power user sets up a query to export data from the database and converts to a format that can be imported into Excel. Ever increasingly complex formulae are then written to wrangle the data into the results that are required.

Why is the spreadsheet approach risky?

  • The analysis is now separated from the data. Data changes will not be reflected in the analysis. Worse still, changes to the analysis might not be propagated to all the spawned copies of the spreadsheet.
  • The data is separated from the analysis. The analysis results are rarely re-imported into the data store where data vs analysis could be performed. Instead, even more data is extracted into the spreadsheet.
  • The difficultly of managing the separation of data from analysis becomes so great that in many cases the database is dispensed with entirely and the spreadsheet becomes the de-facto database.

Solving data analysis with an IntelligentGraph

The beauty of Excel is that a cell can contain either a value or a formula that can reference other cell’s values. Why not do the same with a graph: a node can have edges that terminate with a literal value, or a formula that can reference other node’s values.

This is illustrated in the diagram below:

  • The :massFlow property is not measured directly, so a formula is used for its value instead. This formula references $this, the node to which the calculation is attached, and uses the method getFact() to retrieve related values. The argument of getFact() is a pathPatternQL expression.
  • The :totalProduction property is not measured directly, so a formula is used instead which iterates over all of the ‘stream out’ nodes, retrieving the value of the :massFlow for each stream. The :massFlow value is, of course, in turn a calculation.

Figure 3: Intelligent Graph Data Analysis

Why is the IntelligentGraph approach so advantageous?

  • There is no separation between data and analysis, removing the risk of stale and inaccurate data and calculations.
  • The calculations embedded within the graph can take advantage of the knowledge that is contained within that graph. This makes the calculations far simpler than those that need to be embedded in spreadsheets.
  • The calculations will automatically utilize on the fly the changing knowledge.

How does IntelligentGraph Work?

Analysis is embedded in an IntelligentGraph simply by adding script literals as object values of subjects with datatype of the scripting language (groovy, javascript, python etc).

The IntelligentGraph engine is provided as an RDF4J Stackable SAIL. This means that its capabilities can be combined with any other RDF4J capabilities. The choice of RDF storage remains the same as for any other RDF4J compliant framework.

Modeling with Scripts

Typically, a graph node will have associated attributes with values, such as a stream with volumeFlow and density values:

Stream Attributes:

:Stream_1
   :density ".36"^^xsd:float ;
   :volumeFlow "40"^^xsd:float .

Of course, in the ‘real-world’ these measured values are sourced from outside the KnowledgeGraph and change over time. IntelligentGraph can deal with both of these requirements.

The ‘model’ of the streams can be captured as edges associated with the Unit:

:Unit_1
   :hasProductStream :Stream_1 ;
   :hasProductStream :Stream_2 ;
.

Calculate Mass Flow

The calculations are declared as literals[1] with a datatype whose local name corresponds to one of the installed script languages:

:Stream_1 :massFlow     
    "_this.getFact(':density')*    
     _this.getFact(':volumeFlow');"^^:groovy .

Calculate Total Production

A typical performance metric is to understand the total production from a unit, which is not of course directly measured. However, it can be easily expressed using existing calculated values:

:Unit_1   :totalProduction
    "var totalProduction =0.0;
    for(Resource stream : _this.getFacts(':hasProductStream'))
    {
        totalProduction += stream.getFact(':massFlow');
    }
    return totalProduction; "^^:groovy .

Instead of returning the object literal value (aka the script), the IntelligentGraph will return the result value for the script.

We can write this script even more succinctly using the expressive power PathQL:

:Unit_1  :totalProduction  
    "return _this.getFacts(':hasProductStream/:massFlow').total(); "^^:groovy

However, IntelligentGraph allows us to build upon existing calculations to simply express what would normally be difficult-to-calculate metrics, such as product yield or mass balance.

Calculate Mass Yield

Any production unit has different valued products. So a key metric is the yield of individual streams. This can easily be calculated as follows, using values that are themselves calculations.

var result= _this.getFact(":massFlow").floatValue()/ 
_this.getFact("^:hasStream/:totalProduction").floatValue();  
result;

Calculate Mass Balance

Measurements are not perfect, nor is the operation of a unit. One of the first indicators of a problem is when the mass flow in does not match the mass flow out. This can be expressed as another calculated property of a Unit:

return  _this.getFacts(":hasFeedStream/massFlow").total() -_this.getFacts(":totalProduction").total();

Querying Results

Access to the calculated values is via standard-SPARQL. However instead of returning the script literal, IntelligentGraph will invoke the script engine, 

Thus to access the :massFlow calculated value, the SPARQL is simply:

select ?massFlow
{
 :Stream_1 :massFlow ?massFlow 
}

If the script literal is required then the object variable can be postfixed with _SCRIPT:

select ?massFlow ?massFlow_SCRIPT 
{
 :Stream_1 :massFlow ?massFlow, ?massFlow_SCRIPT 
}

If a full trace of the calculation, including tracing calls to other scripts, is required then the object variable can be postfixed with _TRACE:

select ?massFlow ?massFlow_TRACE 
{
 :Stream_1 :massFlow ?massFlow, ?massFlow_TRACE 
}

How to Write IntelligentGraph Scripts?

Script Languages

Any Java 9 supported language can be used simply by making the corresponding language JAR available. 

By default, JavaScript, Groovy, Python JAR are installed. The complete list of compliant languages is as follows

AWK, BeanShell, ejs, FreeMarker, Groovy, Jaskell, Java, JavaScript, JavaScript (Web Browser), Jelly, JEP, Jexl, jst, JudoScript, JUEL, OGNL, Pnuts, Python, Ruby, Scheme, Sleep, Tcl, Velocity, XPath, XSLT, JavaFX Script, ABCL, AppleScript, Bex script, OCaml Scripting Project, PHP, Python, Smalltalk, CajuScript, MathEclipse

Script Context Variables

In addition, each script has access to the following predefined variables that allow the script to access the context within which it is being run.

  • _this, a Thing corresponding to the subject of the triples for which the script is the object.  Since this available, helper functions are provided to navigate edges to or from this ‘thing’ below:
  • _property, a Thing corresponding to the predicate or property of the triples for which the script is the object.
  • _customQueryOptions, a HashMap<String, Value> of name/value pairs corresponding to the pairs of additional arguments to the SPARQL extension function. These are useful for passing application-specific parameters.
  • _builder, a RDF4J graph builder object allowing a graph to be constructed (and manipulated) within the script. A graph cannot be returned from a SPARQL function. However the IRI of the graph can be returned, and any graph created by a script will be persisted.
  • _tripleSource, the RDF4J TripleSource to which the subject, predicate, triple belongs.

Fact and Path Functions

The spreadsheets’ secret sauce is the ability of a cell formula to access values of other cells, either individually or as a set. The IntelligentGraph provides this functionality with several methods associated with Thing, which are applicable to the _this Thing initiated for each script with the subject Thing.

Thing.getFact(String pathPattern) returns Value

Returns the value of node referenced by the pathPattern, for example “:volumeFlow” returns the object value of the :volumeFlow edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getFacts(String pathPattern) returns Values

Returns the values of nodes referenced by the pathPattern, for example “:hasProductStream” returns an iterator for all object values of the :hasProductStream edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getPath(String pathQL) returns Path

Returns the first (shortest)  path referenced by the pathQL, for example “:parent{1..5}” returns the path to the first ancestor of _this node. The pathQL allows for more complex path navigation.

Thing.getPaths(String pathQL) returns PathResults

Returns all paths referenced by the pathQL, for example :parent{1..5}” returns an iterator, starting with the shortest path,  for all paths to the ancestors of _this node. The pathQL allows for more complex path navigation.

Path Patterns

Spreadsheets are not limited to accessing just adjacent cells; neither is the IntelligentGraph. PathPatterns provide a powerful way of navigating from one Thing node to another. PathPatterns are inspired by SPARQL and propertyPaths, but a richer, more expressive, PathQL was required for the IntelligentGraph.

Examples

Examples of PathQL patterns are as follows:

_this.getFact(“:hasParent”)

will return the first parent of $this.

_this.getFact(“^:hasParent”)

will return the first child of $this.

_this.getFacts(“:hasParent/:hasParent”)

will return the grandparents of $this.

_this.getFacts(“:hasParent/^:hasParent”)

will return the siblings of $this.

_this.getFacts(“:hasParent[:gender :female]/:hasParent”)

will return the maternal grandparents of $this

_this.getFacts(“:hasParent[:gender :female]/:hasParent[:gender :male]”)

will return the maternal grandfather of $this.

_this.getFacts(“:hasParent[:gender [ rdfs:label “female”]]”)

will return the mother of $this but using the label instead of the IRI.

_this.getFacts(“:hasParent[eq :Peter]/:hasParent[:gender :male]”)

will return the grandfather of $this, who is the parent of :Peter.

_this.getFacts(“:hasParent[ne :Peter]/:hasParent[:gender :male]”)

will return grandfathers of $this, who are not the parent of :Peter.

The following diagram visualizes a path through a genealogical graph, from _this to the find the parents of a maternal grandfather born in Maidstone:

_this.getFacts(“/:parent[:gender :female]/:parent[:gender :male, :birthplace [rdfs:label ‘Maidstone’]]/:parent”)

 

Figure 4: PathPatternQL Example

How Is Performance?

IntelligentGraph takes the following actions to improve performance:

  1. All intermediate calculation results are cached, keyed by the subjectNode, predicate, and customQueryOptions.
  2. Cache can be cleared using the SPARQL function ClearCache.
  3. The SPARQL function ObjectValue takes as its argument the subject, predicate and objectValue. If the objectValue supplied is not of script datatype, the function will immediately return the objectValue.
  4. Circular functions, in which A calls B calls A, are detected and rejected.

Can I Debug?

Since IntelligentGraph combines calculations with the knowledge graph, it is inevitable that any evaluation will involve calls to values of other nodes which are in turn calculations. For this reason,  IntelligenrtGraph supports tracing and debugging:

 

Figure 4: Tracing Calculation

How Do I Add Intelligence to my RDFGraph?

Download

The project is located in Github, from where the intelligentgraph.jar can be downloaded from there:

The intelligentgraph.jar does not include all of the scripting etc language dependencies, so to use it you would have to be certain all dependencies are already available.

Install

IntelligentGraphwill work only with RDF4J version 3.3.0 and above. 

Copy intelligentgraph.jar

To  /usr/local/tomcat/webapps/rdf4j-server/WEB-INF/lib/intelligentgraph.jar

The RDF4J server will need to be restarted for it to recognize this new JAR and initiate the scripting engine.

[1] In this case the script uses Groovy, but any Java 9 compliant scripting language can be used, such as JavaScript, Python, Ruby, and many more.

Providing answers to users’ analysis, searching, visualizing or other questions of their own data

Creating an overall solution that presents data in a useful way can be challenging, but OData2SPARQL and Lens2OData solves this.

  • RDF-Graph: Data + Model = Information, allowing us to combine you raw data with an adaptable model to create meaningful information
  • OData2SPARQL: Information + Rules = Knowledge, provides you the ability to access that information combine with additional rules (SPIN and SHACL) to deliver useful knowledge that can be consumed by applications and BI tools.
  • Lens2OData: Knowledge + Action = Results, allows users to easily navigate, search, explore, and visualize this knowledge in such a way that it is easy to take action and produce results.

To see this all in action we have prepared a demonstrator that can be downloaded, and the following videos which illustrate the capabilities of this demonstrator.

  • Explore provenance of data sources which is retained by RDF- graph and OData2SPARQL rather than losing that provenance with typical ETL processing.

  • Explore the Transport For London train lines, stations and zones, illustrating how easy it is to transform any dataset to RDF_Graph and immediately get the benefits of OData access, and Lens UI/UX

To download and run this demonstrator go toDocker hub here

SQL2RDFincremental materialization uses a stream of source data changes (DML) and transforms them to the corresponding changes (inserts, deletes) in the triple store ensuring concurrency between RDBMS and triplestore. SQL2RDFincremental materialization is based on the same R2RML mapping used for virtualization (Ontop), and (Capsenta), and materialization (In4mium, and R2RML Parser) thus does not incur an additional configuration maintenance problem.

Mapping an existing RDBMS to RDF has become an increasingly popular way of accessing data when the system-of-record is not, or cannot be, in a RDF database. Despite its attractiveness, virtualization is not always possible for various reasons such as performance, the need for full SPARQL 1.1 support, the need to reason over the virtualized as well as other materialized data, etc. However materialization of an existing RDBMS to RDF is also not an ideal alternative. Reasons include the inevitable lack of concurrency between the system-of-record and the RDF.

Thus incremental materialization provides an alternative between virtualization and materialization, offering:

  • Easy ETL of source RDB as no additional configuration required other than the same R2RML required for virtualization or bulk materialization.
  • Improved concurrency of data compared with materialization.
  • Significantly less computational overhead than a full materialization of the source data.
  • Improved query performance compared with virtualization especially when reasoning is required.
  • Compatibility with R2RML thus reducing configuration maintenance.
  • Supports insert, update, and delete changes on the source RDBMS.
  • Source transactions can be batched, and the changes to the triplestore are part of a single transaction that can be rolled back in the event of a problem.
  • Supports change logging so that committed changes can be rolled back.

Contact  inova8 if you would like to try SQL2RDF

Integration Problem to be solved

  • Data in different databases, even with Linked Open data sources.
  • Misaligned models, different datasets have different meanings for classes and predicates that need to be aligned.
  • Misaligned names for the same concepts.
  • Replication is problematical.
  • Query definition and scope of querying difficult to define in advance.
  • Provence of data necessary.
  • Cannot depend on inferences being available in advance
  • Scalable architecture requires that all queries are stateless

Data Cathedrals versus Information Shopping Bazaars

Linked Open Data has been growing since 2007 from a few (12) interconnected datasets to 295 as of 2011, and it continues to grow. To quote “Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” (Linked Data, n.d.) 

Figure 1: Growth of the Linked Data ‘Cloud’

As impressive as the growth of interconnected datasets is, what is more important is the value of that interconnected data. A corollary of Metcalf’s law suggests that the benefit gained from integrated information grows geometrically[1] with the number of data communities that are integrated.

Many organizations have their own icebergs of information: operations, sales, marketing, personnel, legal, finance, research, maintenance, CRM, document vaults etc. (Lawrence, 2012) Over the years there have been various attempts to melt the boundaries between these icebergs including the creation of the mother-of-all databases that houses (or replicates) all information or the replacement of disparate applications with their own database with a mother-of-all application that eliminates the separate databases. Neither of these has really succeeded in unifying any or all data within an organization. (Lawrence, Data cathedrals versus information bazaars?, 2012). The result is a ‘Data Cathedral’ through which users have no way to navigate to find the information that will answer their questions.

Figure 2: Users have no way to navigate through the Enterprise’s Data Cathedral

Remediator at the heart of Linked Enterprise Data

Can we create an information shopping bazaar for users to answer their questions without committing heresy in the Data Cathedral?  Can we create the same information shopping bazaar as Linked Data within the Enterprise: Linked Enterprise Data (LED). That is the objective of Remediator.

First of all we must recognize that the enterprise will have many structured, aggregated, and unstructured data stores already in place:

Figure 3: Enterprise Structured, Aggregated, and Unstructured Data Icebergs

One of the keys to the ability of Linked Data to interlink 300+ datasets is that they are all are expressed as RDF. The enterprise does not have the luxury of replicating all existing data into RDF datasets. However that is not necessary (although still sometimes desirable) because there are adapters that can make any existing dataset look as if it contains RDF, and can be accessed via a SPARQLEndpoint. Examples are listed below

  1. D2RQ: (D2RQ: Accessing Relational Databases as Virtual RDF Graphs )
  2. Ultrawrap:(Research in Bioinformatics and Semantic Web/Ultrawrap)
  3. Ontop:(-ontop- is a platform to query databases as Virtual RDF Graphs using SPARQ)

Attaching these adapters to existing data-stores, or replicating existing data into a triple store, takes us one step further to the Linked Enterprise Data:

Figure 4: Enterprise Data Cloud, the first step to integration

Of course now that we have harmonized the data all as RDF accessible via a SPARQLEndpoint we can view this as an extension of the Linked Data cloud in which we provide enterprises users access to both enterprise and public data:

Figure 5: Enterprise Data Cloud and Linked Data cloud

We are now closer to the information shopping bazaar, since users would, given appropriate discovery and searching user interfaces, be able to navigate their own way through this data cloud.  However, despite the harmonization of the data into RDF, we still have not provided a means for users ask new questions:

What Company (and their fiscal structure) are we working with that have a Business Practise of type Maintenance for the target industry of Oil and Gas with a supporting technology based on Vendor-Application and this Application is not similar to any of our Application?

Such questions require pulling information from many different sources within an organization. Even with the Enterprise Data Cloud one has provided the capability to discover such answers. Would it not be better to allow a user to ask such a question, and let the Linked Enterprise Data determine from where it should pull partial answers which it can then aggregate into the complete answer to the question. It is like asking a team of people to answer a complex question, each contributing their own, and then assembling the overall answer rather than relying on a single guru.  Remediator has the role of that team, taking parts of the questions and asking that part of the question of the data-sources.

Figure 6: Remediator as the Common Entry Point to Linked Enterprise Data (LED)

Thus our question can become:

  1. What Business Practise of type Maintenance for the target industry of Oil and Gas?
  2. What Company are we working with?
  3. What Company have a Business Practise of type Maintenance?
  4. What Business Practise with a supporting technology based on Vendor- Application?
  5. What Company (and their fiscal structure)?
  6. What Vendor-Application and this Application is not similar to any of our Application?

This decomposition of a question into sub-questions relevant to each dataset is automated by Remediator:

Figure 7: Sub-Questions distributed to datasets for answers

Requirements for a Linked Enterprise Data Architecture

  • Keep it simple
  • Do not re-invent that which already exists.
  • Eliminate replication where possible.
  • Avoid the need for prior inferencing.
  • Efficient query performance.
  • Provide provenance of results.
  • Provide optional caching for further slicing and dicing of result-set.
  • Use Void only Void and nothing but Void to drive the query

[1]  If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, where K is some constant. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc), then the benefits increase to 10 * K * 2. If I integrate in threes, (operations + accounting + maintenance, accounting + payroll + receiving, etc), then the benefits increase four-fold (a corollary of Metcalf’s law) to 10 * K * 4. For quad-wise integration my benefits would be 10 * K * 8 and so on. Now it might not be 8 fold but the point is there is a geometric, not linear, growth in benefits as I integrate all of my information across my organization.


OData2SPARQL is an OData proxy protocol convertor for any SPARQL/RDF triplestore. To compare SPARQL with OData is somewhat misleading. After all SPARQL has its roots as a very powerful query language for RDF data, but is not intended as a RESTful protocol. Similarly OData has its roots as an abstract interface to any type of datastore, not as a specification of that datastore. Some have said “OData is the equivalent of ODBC for the Web”.
The data management strengths of SPARQL/RDF can be combined with the application development strengths of OData with a protocol proxy: OData2SPARQL, a Janus-point between the application development world and the semantic information world.

Figure 1: OData2SPARQL Proxy between Semantic data and Application consumers

What is OData?

OData is a standardized protocol for creating and consuming data APIs. OData builds on core protocols like HTTP and commonly accepted methodologies like REST. The result is a uniform way to expose full-featured data APIs (Odata.org).  Version 4.0 has been standardized at OASIS, and was released in March 2014.

OData RESTful APIs are easy to consume. The OData metadata, a machine-readable description of the data model of the APIs, enables the creation of powerful generic client proxies and tools. Some have said “OData is the equivalent of ODBC for the Web” (OASIS Approves OData 4.0 Standards for an Open, Programmable Web, 2014). Thus a comprehensive ecosystem of applications, and development tools, has emerged a few of which are listed below:

  • LINQpad: LINQPad is a tool for building OData queries interactively.
  • OpenUI5is an open source JavaScript UI library, maintained by SAP and available under the Apache 2.0 license. OpenUI5 lets you build enterprise-ready web applications, responsive to all devices, running on almost any browser of your choice. It’s based on JavaScript, using JQuery as its foundation, and follows web standards. It eases your development with a client-side HTML5 rendering library including a rich set of controls, and supports data binding to different models including OData.
  • Power Query for Excel is a free Excel add-in that enhances the self-service Business Intelligence experience in Excel by simplifying data discovery, access and collaboration.
  • Tableau – an excellent client-side analytics tool – can now consume OData feeds
  • Teiid allows you to connect to and consume sources (odata services, relational data, web services, file, etc.) and deploy a single source that is available as an OData service out-of-the-box.
  • Telerik not only provides native support for the OData protocol in its products, but also offers several applications and services which expose their data using the OData protocol.
  • TIBCO Spotfire is a visual data discovery tool which can connect to OData feeds.
  • Sharepoint: Any data you’ve got on SharePoint as of version 2010 can be manipulated via the OData protocol, which makes the SharePoint developer API considerably simpler.
  • XOData is a generic web-based OData Service visualization & exploration tool that will assist in rapid design, prototype, verification, testing and documentation of OData Services. 

OData vs SPARQL/RDF

To compare SPARQL with OData is somewhat misleading. After all SPARQL has its roots as a very powerful query language for RDF data, and is not intended as a RESTful protocol. Similarly OData has its roots as an abstract interface to any type of datastore, not as a specification of that datastore.  Recently JSON-LD has emerged (Manu Sporny, 2014), providing a method of transporting Linked Data using JSON. However JSON-LD focusses on the serialization of linked data (RDF) as JSON rather than defining the protocol for a RESTful CRUD interface. Thus it is largely an alternative to, say, Turtle or RDF/XML serialization format.

OData and SPARQL/RDF: Contradictory or Complimentary?
  OData SPARQL/RDF
Strengths ·   Schema discovery

·   OData provides a data source neutral web service interface which means application components can be developed independently of the back end datasource.

·   Supports CRUD

·   Not limited to any particular physical data storage

·   Client tooling support

·   Easy to use from JavaScript

·   Growing set of OData productivity tools such as Excel, SharePoint, Tableau and BusinessObjects.

·   Growing set of  OData frameworks such as SAPUI5, OpenUI5, and KendoUI

·   Growing set of independent development tools such as LINQPad, and XOdata

·   Based on open (OASIS) standards after being initiated by Microsoft

·   Strong commercial support from Microsoft, IBM, and SAP.

·   OData is not limited to traditional RDBMS applications. Vendors of real-time data such as OSI are publishing their data as an OData endpoint.

 

·   Extremely flexible schema that can change over time.

·   Vendor independent.

·   Portability of data between triple stores.

·   Federation over multiple, disparate, data-sources is inherent in the intent of RDF/SPARQL.

·   Increasingly standard format for publishing open data.

·   Linked Open Data expanding.

·   Identities globally defined.

·   Inferencing allows deduction of additional facts not originally asserted which can be queried via SPARQL.

·   Based on open (W3C) standards

Weaknesses ·   Was perceived as vendor (Microsoft) protocol

·   Built around the use of a static data-model (RDBMS, JPA, etc)

·   No concept of federation of data-sources

·   Identities defined with respect to the server.

·   Inferencing limited to sub-classes of objects

 

·   Application development frameworks that are aligned with RDF/SPARQL limited.

·   Difficult to access from de-facto standard BI tools such as Excel.

·   Difficult to report using popular reporting tools

 

Table 1: ODATA AND SPARQL/RDF: CONTRADICTORY OR COMPLIMENTARY?

OData2SPARQL: OData complementing RDF/SPARQL

The data management strengths of SPARQL/RDF can be combined with the application development strengths of OData with a protocol proxy: OData4SPARQL. OData4SPARQL is the Janus-point between the application development world and the semantic information world.

  • Brings together the strength of a ubiquitous RESTful interface standard (OData) with the flexibility, federation ability of RDF/SPARQL.
  • SPARQL/OData Interop proposed W3C interoperation proxy between OData and SPARQL (Kal Ahmed, 2013)
  • Opens up many popular user-interface development frameworks and tools such as OpneUI5.
  • Acts as a Janus-point between application development and data-sources.
  • User interface developers are not, and do not want to be, database developers. Therefore they want to use a standardized interface that abstracts away the database, even to the extent of what type of database: RDBMS, NoSQL, or RDF/SPARQL
  • By providing an OData4SPARQL server, it opens up any SPARQL data-source to the C#/LINQ development world.
  • Opens up many productivity tools such as Excel/PowerQuery, and SharePoint to be consumers of SPARQL data such as Dbpedia, Chembl, Chebi, BioPax and any of the Linked Open Data endpoints!
  • Microsoft has been joined by IBM and SAP using OData as their primary interface method which means there will many application developers familiar with OData as the means to communicate with a backend data source.

Consuming RDF via OData: OData2SPARQL

All of the following tools are demonstrated accessing an RDF triple store via the OData2SPARQL porotocol proxy.

Development Tools

XOData

A new online OData development is XOData from (PragmatiQa, n.d.). Unlike other OData tools, XOData renders very useful relationship diagrams. The Northwind RDFD model published via OData4SPARQL endpoint is shown below:

 Figure 2: Browsing the EDM model Published by ODaTa2SPARQL using XOData

XOData also allows the construct of queries as shown below:

Figure 3: Querying The OData2SPARQL Endpoints Using XODATA

LINQPad

(LINQPad, n.d.) is a free development tool for interactively querying databases using C#/LINQ. Thus it supports Object, SQL, EntityFramework, WCF Data Services, and, most importantly for OData4SPARQWL, OData services. Since LINQPad is centered on the Microsoft frameworks, WCF, WPF etc, this illustrates how the use of OData can bridge between the Java worlds of many semantic tools, and the Microsoft worlds of corporate applications such as SharePoint and Excel.

LINQPad shows the contents of the EDM model as a tree. One can then select an entity within that tree, and then create a LINQ or Lambda query. The results of executing that query are then presented below in a grid.

Figure 4:  Browsing and Querying the OData2SPARQL Endpoints Using LINQPad

LINQPad and XOData are good for testing out queries against any datasource. Therefore this also demonstrates using the DBpedia SPARQL endpoint as shown below:

 Figure 5: Browsing DBPedia SPARQLEndpoint Using LINQPad via OData2SPARQL

Browsing Data

One of the primary motivations for the creation of OData2SPARQL is to allow access to Linked Open Data and other SPARQLEndpoints from the ubiquitous enterprise and desktop tools such as SharePoint and Excel.

Excel/PowerQuery

“Power Query is a free add-in for Excel 2010 and up that provide users an easy way to discover, combine and refine data all within the familiar Excel interface.” (Introduction to Microsoft Power Query for Excel, 2014)

PowerQuery allows a user to build their personal data-mart from external data, such as that published by OData2SPARQL. The user can fetch data from the datasource, add filters to that data, navigate through that data to other entities, and so on with PowerQuery recording the steps taken along the way. Once the data-mart is created it can be used within Excel as a PivotTable or a simple list within a sheet. PowerQuery caches this data, but since the steps to create the data have been recorded, it can be refreshed automatically by allowing PowerQuery to follow the same processing steps. This feature resolves the issue of concurrency in which the data-sources are continuously being updated with new data yet one cannot afford to always query the source of the data. These features are illustrated below using the Northwind.rdf endpoint published via OData2SPARQL:

Figure 6: Browsing the OData4SPARQL Endpoint model with PowerQuery

Choosing an entity set allows one to start filtering and navigating through the data, as shown in the ‘Applied Steps’ frame on the right.

Note that the selected source is showing all values as ‘List’ since each value can have zero, one, or more values as is allowed for RDF DatatypeProperties.

Figure 7: Setting Up Initial Source of Data in PowerQuery

As we expand the data, such as the companyProperty, we see that the Applied Steps records the steps take so that they can be repeated.

Figure 8: Expanding Details in PowerQuery

The above example expanded a DatatypeProperty collection. Alternatively we may navigate through a navigation property such as Customer_orders, the orders that are related to the selected customer:

Figure 9: Navigating through related data with PowerQuery

 Once complete the data is imported into Excel:

Figure 10: Importing data from OData2SPARQL with PowerQuery

Unlike conventional importing of data into Excel, the personal data-mart that was created in the process of selecting the data is still available.

Application Development Frameworks

There are a great number of superb application development frameworks that allow one to create cross platform (desktop, web, iOS, and Android), rich (large selection of components such as grids, charts, forms etc) applications. Most of these are based on the MVC or MVVM model both of which require a systematic and complete (CRUD) access to the back-end data via a RESTful API. Now that OData has been adopted by OASIS, the number of companies offering explicit support for OData is increasing, ranging from Microsoft, IBM, and SAP to real-time database vendors such as OSI. Similarly there are a number of frameworks, one of which is SAPUI5 (UI Development Toolkit for HTML5 Developer Center , n.d.) which has an open source version OpenUI5 (OpenUI5, n.d.).

SAPUI5

SAPUI5 is an impressive framework which makes MVVC/MVVM application development easy via the Eclipse-based IDE. Given that OData4SPARQL publishes any SPARQLEndpoint as an OData endpoint, it means that this development environment is immediately available for an semantic application development.  The following illustrates a master-detail example against the Northwind.rdf SPARQL endpoint via OData4SPARQL.

Figure 11: SAPUI5 Application using OData4SPARQL endpoint

Yes we could have cheated and used the Northwind OData endpoint directly, but the Qnames of the Customer ID and Order Number reveals that the origin of the data is really RDF.

Handling Contradictions between OData and RDF/SPARQL

RDF is an extremely powerful way of expressing data, so a natural question to ask is what could be lost when that data is published via an OData service. The answer is very little! The following table lists the potential issues and their mitigation:

Issue Description Mitigation
OData 3NF versus RDF 1NF RDF inherently supports multiple values for a property, whereas OData up to V2 only supported scalar values Odata V3+ supports collections of property values, which are supported by OData4SPARQL proxy server
RDF Language tagging RDF supports language tagging of strings OData supports complex types, which are used to map a language tagged string to a complex type with separate language tag, and string value.
DatatypeProperties versus object-attributes OWL DatatypeProperties are concepts independent of the class, bound to a class by a domain, range or OWL restriction. Conversely OData treats such properties bound to the class. In OData4SPARQL The OWL DatatypeProperty is converted to an OData EntityType property for each of the DatatypeProperty domains.
Multiple inheritance Odata only supports single inheritance via the OData baseType declaration within an EntityType definition.  
Multiple domain properties An OWL ObjectProperty will be mapped to an OData Association.  An Association can be between only one FromRole and one ToRole and the Association must be unique within the namespace. OData Associations are created for each domain. The OData4SPARQL names these associations {Domain}_{ObjectProperty}, ensuring uniqueness.
Cardinality The capabilities of OData V3 allow all DatatypeProperties to be OData collections. However the ontology might have further restrictions on the cardinality. OData4SPARQL assumes cardinality will be managed by the backend triple store. However in future versions, if the cardinality is restricted to one or less, then the EntityType property can be declared as a scalar rather than a collection.

Table 2: Contradictions between OData and RDF/SPARQL

Availability of OData2SPARQL

Two versions of OData2SPARQL are freely available as listed below:

  1. inova8.odata2sparql.v2 : OData V2 based on the Olingo.V2 library supporting OData Version 2
  2. inova8.odata2sparql.v4 : OData V4 based on the Olingo.V4 library supporting OData V4 (in progress)

 


Answering complex queries with easy-to-use graphical interface

The objectives of lens2odata are to provide a simple method of OData query construction driven by the metadata provide by OData services

  • Provides metamodel-driven OData query construction
    • Eliminates any configuration required to expose any OData service to lens2odata
  • Allows searches to be saved and rerun
    • Allows ease of use by casual users
  • Allows queries to be pinned to ‘Lens’ dashboard panels
    • Provides simple-to-use dashboard
  • Searches can be parameterized
    • Allows for easy configuration of queries
  • Compatible with odata2sparql, a service that exposes any triple store as an OData service
    • Provides a Query-Answering-over-Linked-Data (QALD) interface to any linked data.

Concept of Operation

Lens2odata consists of 3 primary pages with which users interact:

  1. Query: is the page in which users can
    1. add new OData services,
    2. compose queries,
    3. save those queries for reuse, and
    4. pin the queries as result fragments on a Lens
  2. Search: is the page in which users can
    1. select an existing query, and execute that query to explore the results
    2. from where they can navigate to Lens pages for specific entities or collections of entities
  3. Lens: are the pages, composed by users, which display fragments of details, optionally grouped into tabs, about a specific entity or collections of entities. Fragments can either be forms or tables. Other fragment layouts are being added.

Navigation between these pages are shown in the diagram below. Specifically these navigation paths are:

  1. Toggle between Search and Query to explore how the results would appear to a casual user
  2. Navigate to a concept’s Lens from Query preview hyperlinks
  3. Navigate to a concept’s Lens from Search results’ hyperlinks

Figure 1: lens2odata Navigation

Quick Start

Login to lens2odata

  1. Navigate to the Url provided by your administrator for lens2odata, http://<server>/lens2odata
  2. Enter users name and password
  3. Since no service has been previously setup, you will be prompted to enter the service display name and Url of that service. Check to use default proxy if not a local service
  4. After ‘save’, as long as your service was validated, you will immediately enter the Query page with a new query initialized with the first collection found in the service
  5. Execute ‘Preview’ to populate the Results Preview with a few values from the collection:
  6. You are now ready to:
    1. View the query via Search
    2. Explore the results further via Lens
    3. Expand the query with more values and filters

Let’s explore Search first of all

  1. Click on at the top-left of the Query page to navigate to the search page:

Search is the page that general users will access. From this page they can select a predefined query, and execute that query to start their discovery journey.

  1. Click on to populate the results form:

 

This shows more details than the Query page because, in the absence of any specific definition about what details of the instances of collections should be displayed, search will display whatever it can find.

  1. Click on the Url “Categories(1)” to navigate to the Lens for that type of instance.

The Lens page is an information dashboard that can be constructed for any type of instance that is discovered. In this case, since no specific lens page has been setup for ‘Category’ types of instance, a default page has been used with a single tab.

  1. There is another Uri on this page “Products”. Navigating this link will take you to a similar lens page, but one for a collection of instances:

 

  1. This Lens for Products shows a list of instances of products. The first column is a Uri to the lens page for the individual product. Click one to navigate to its Lens:
  2. You are now discovering information by navigating through the data, having started by querying a collection. Next steps would be:
  • The original query was not particularly specific. Lens2odata allows that query to be further refined by specifying which attributes should be displayed, adding navigation properties to other entities, adding filters to the query to restrict results or even parameterizing the query so that a user can execute the saved query, modifying the results just by supplying parameter values.
  • The Lens dashboard page can be further refined by adding more fragments of information on a tab, or adding more tabs to the lens page to logically group the information.

 

Availability of Lens2OData

A versions of Lens2OData  is available on GitHub at com.inova8.lens2odata


SKOS, the Simple Knowledge Organization System, offers an easy to understand schema for vocabularies and taxonomies. However modeling precision is lost when skos:semanticRelation predicates are introduced.

Combining SKOS with RDFS/OWL allows both the precision of owl:ObjectProperty to be combined with the flexibility of SKOS. However clarity is then lost as the number of core concepts (aka owl:Class) grow.
Many models are not just documenting the ‘state’ of an entity. Instead they are often tracking the actions performed on entities by agents at locations. Thus aligning the core concepts to the Activity, Entity, Agent, and Location classes of the PROV ontology provides a generic upper-ontology within which to organize the model details.

Vehicle Manufacturing Example

This examples captures information about vehicle manufacturing. Following

  1. Manufacturers: the manufacturer of models of cars in various production lines sited at plants
  2. Models: the models that the manufacturer produces
  3. ProductionLines: the production lines set up to produce models of vehicles on behalf of a manufacturer
  4. Plants: the plants that house the production lines

In addition there are different ‘styles’ of manufacturing that occur for various models and various sites:

  1. Manufacturing: the use of a ProductionLine for a particular Model

SKOS Modeling

If we follow a pure SKOS model we proceed as follows by creating a VehicleManufacturingScheme  skos:ConceptScheme

s:VehicleManufacturingScheme

rdf:type skos:ConceptScheme

.

Then we create skos:topConceptOf Manufacturer, Model, Plant, and Production as follows:

 

s:Manufacturer

rdf:type owl:Class ;

rdfs:subClassOf skos:Concept ;

skos:topConceptOf s:VehicleManufacturingScheme

.

s:Model

rdf:type owl:Class ;

rdfs:subClassOf skos:Concept ;

skos:topConceptOf s:VehicleManufacturingScheme

.

These top-level concepts are being created of type owl:Class and a subClassOf skos:Concept.  This is the pattern recommended in (Bechhofer, et al.)

Finally we can create skos:broader concepts as follows:

s:Ford

rdf:type s:Manufacturer ;

skos:broader s:Manufacturer ;

skos:inScheme s:VehicleManufacturingScheme

.

s:Fusion

rdf:type s:Model ;

skos:broader s:Model ;

skos:inScheme s:VehicleManufacturingScheme

.

The resultant SKOS taxonomy of the VehicleManufacturingScheme  skos:ConceptScheme then appears as follows:

Figure 1: SKOS taxonomy

Why?

By starting with a pure SKOS model we provide access to the underling concepts in a more accessible style for the less proficient user, as illustrated by the SKOS Taxonomy above. Yet we have not sacrificed the ontological precision of owl:Classes.

Thus we can ask questions about all concepts:

SELECT * WHERE

{

       ?myConcepts rdfs:subClassOf+ skos:Concept .

}

Or we can get a list of anything broader than one of these concepts:

SELECT * WHERE

{

       ?myBroaderConcepts skos:broader s:Model .

}

SKOS+OWL Modeling

Although skos:semanticRelation allows one to link concepts together, this predicate is often too broad when trying to create an ontology that documents specific relations between specific types of concept.

In our VehicleManufacturingScheme we might want to know the following:

  1. isManufacturedBy: which manufacturer manufactures a particular model
  2. operatedBy: which manufacturer operates a particular production facility
  3. performedAt: which plant is the location of a production facility
  4. wasManufacturedAt: which production facility was used to manufacture a particular model

Figure 2: SKOS+OWL model of Relations

 

These predicates can be defined using RDFS as follows:

so:isManufacturerBy

 rdf:type owl:ObjectProperty ;

rdfs:domain s:Model ;

rdfs:range s:Manufacturer ;

rdfs:subPropertyOf skos:semanticRelation

.

so:operatedBy

rdf:type owl:ObjectProperty ;

rdfs:domain s:Production ;

rdfs:range s:Manufacturer ;

rdfs:subPropertyOf skos:semanticRelation

.

Note that the definition of Model, Manufacturer etc. as subClassOf skos:Concept allows us to precisely define the domain and range.

s:Fusion

so:isManufacturerBy s:Ford ;

so:wasManufacturedAt s:Halewood-SmallVehicle ;

.

s:Dagenham-Truck

so:operatedBy s:Ford ;

so:performedAt s:Dagenham ;

.Thus we have used the flexibility of SKOS with the greater modeling precision of RDFS/OWL.

Why?

By building upon the SKOS model, one can ask an expansive question such as what concepts are semantically related to, say, the concept s:Fusion with a simple query:

SELECT * WHERE

{

       s:Fusion  ?p ?y .

       ?p rdfs:subPropertyOf* skos:semanticRelation

}

Yet with the same model we can ask a specific question about a relationship of a specific instance:

SELECT * WHERE

{

       s:Camry so:isManufacturerBy   ?o .

}

SKOS+OWL+PROV Modeling

One of the attractions of SKOS is that a taxonomy can grow organically. One of the problems of SKOS is that a taxonomy can grow organically!

As the taxonomy grows it can be useful to add another layer of structure beyond a catalog of concepts. Many models are not just documenting the ‘state’ of an entity. Instead they are often tracking the actions performed on entities by agents at locations. Thus aligning the core concepts to the Activity, Entity, Agent, and Location classes of the PROV ontology (Lebo, et al.) provides a generic upper-ontology within which to organize the model details.

Figure 3: PROV model

Thus our VehicleManufacturingScheme has each core PROV concept:

  1. Manufacturers: the Agents who manufacture models, and operate plants
  2. Models: the Entities
  3. ProductionLines: the Activities that produce Models on behalf of Manufacturers.
  4. Plants: the Location at which Activities take place, and Agents and Entities are located.

Figure 4: PROV Model

 

s:Production

rdfs:subClassOf prov:Activity ;

.

s:Model

rdfs:subClassOf prov:Entity ;

.

s:Manufacturer

rdfs:subClassOf prov:Organization ;

.

s:Plant

rdfs:subClassOf prov:Location ;

.

Similarly we can cast our predicates into the same PROV model as follows:

so:isManufacturerBy

rdfs:subPropertyOf prov:wasAttributedTo ;

.

so:operatedBy

rdfs:subPropertyOf prov:wasAssociatedWith ;

.

so:performedAt

rdfs:subPropertyOf prov:atLocation ;

.

so:wasManufacturedAt

rdfs:subPropertyOf prov:wasGeneratedBy ;

.

Why?

The PROV model is closer to the requirements of most enterprise models, that are trying to ‘model the business’, than a simple E-R model. The latter concentrates on capturing the attributes of an entity that record the current state of that entity. Often those attributes focus on documenting the process by which the entity gained its current state:

  • The agent that created the entity
  • The activity used to create the entity
  • The location when things were performed
  • The data of the activity, etc

Superimposing the PROV model formalizes this model, and thus allows a structure within which a more casual user can navigate, rather than a sea of entities.

By building upon the PROV model, one can ask an expansive question such as what entities behave as Agents and in which entities are they involved:

SELECT * WHERE

{

?organization a ?Agent .

?Agent rdfs:subClassOf* prov:Agent .

?entity ?predicate ?organization

}

SKOS+OWL+PROV-Qualified Modeling

Within the structure of PROV, predicates define the relationships between Activities, Entities, Agents, and Locations. However it is sometimes necessary to qualify these relationships. For example, the so:wasManufacturedAt predicate defines that a s:Production facility was used to manufacture a s:Model. When? How was it used? Why?

To extend the model, PROV adds the concept of a qualified influence, which allows the relationship to be further defined.

Figure 5: Qualified PROV for some predicates

We do this first of all by creating sopq:Manufacturing:

sopq:Manufacturing

rdf:type owl:Class ;

rdfs:subClassOf skos:Concept ;

rdfs:subClassOf prov:Generation ;

skos:topConceptOf s:VehicleManufacturingScheme ;

.

Note that this is a rdfs:subClassOf prov:Generation, the reification of the predicate prov:wasGeneratedBy

We then add two predicates, one (sopq:wasManufacturedUsing) from the prov:Entity to the prov:Generation, and one (sopq:production) from the prov:Generation to the prov:Activity as follows:

sopq:wasManufacturedUsing

rdf:type owl:ObjectProperty ;

rdfs:domain s:Model ;

rdfs:range sopq:Manufacturing ;

rdfs:subPropertyOf skos:semanticRelation ;

rdfs:subPropertyOf prov:qualifiedGeneration ;

.

sopq:production

rdf:type owl:ObjectProperty ;

rdfs:subPropertyOf skos:semanticRelation ;

rdfs:subPropertyOf prov:activity ;

.

Finally we can create a Manufacturing qualified generation concept as follows:

sopq:L-450H_at_Swindon-Hybrid

rdf:type sopq:Manufacturing ;

sopq:production s:Swindon-Hybrid ;

 skos:broader sopq:Manufacturing ;

.

s:L-450H

sopq:wasManufacturedUsing sopq:L-450H_at_Swindon-Hybrid ;

.

In the figure below we can see that these qualified actions simply extend the SKPOS taxonomy:

Figure 6: Taxonomy extended with Qualified Actions

Why?

Using qualifiedActions provides a systematic, rather than ad-hoc, way to provide more precision to a model.

Remaining Issues

  1. The PROV structure does not manifest itself within the taxonomy. Should Activity, Entity, Agent, and Location therefore be ConceptSchemes?

Model

The model files used in this example are included here: model2rdf

  • skos.ttl
  • skos+owl.ttl
  • skos+owl+prov.ttl
  • skos+owl+provqualified.ttl

References

Bechhofer, Sean and Miles, Alistair. Using OWL and SKOS. [Online] W3C. https://www.w3.org/2006/07/SWD/SKOS/skos-and-owl/master.html.

Lebo, Timothy, Satya, Sahoo and McGuinness, Deborah. PROV-O: The PROV Ontology. W3C. [Online] https://www.w3.org/TR/prov-o/.Reaming Issues