The IntelligentGraph-way or the Groundhog-way to efficient data analytics

Data is rather like poor red wine: it neither travels nor ages well. IntelligentGraph avoids moving data by bringing the analysis into the knowledge graph rather than moving the data to the analysis engine, making the Groundhog way of analysis obsolete.

Solving data analysis, the IntelligentGraph-way

Data is streamed to the IntelligentGraph datastore, and analysis/calculation nodes are then added to that graph, which is accessible to all users and applications.

The IntelligentGraph-way of data-analysis is to:

  • Extract-Load (losslessly) the source data into an Intelligent Knowledge Graph
  • Add analysis/calculation nodes to the KnowledgeGraph which calculate additional analysis values, aggregations, etc.
  • Report results, using any standard reporting tool
  • Add more analysis/calculation nodes as additional requests come through. These new calculations can refer to the existing results
  • … relax, you’ve become the star data analyst :-)

Solving data analysis, the Groundhog-way

Data sits in the operational data-sources, is staged in a data-warehouse/mart/lake, and then the analysis is done by the analysis engine (aka Excel), right? And, like poor red wine, constantly moving data damages it.

The Groundhog-way of data analysis is to:

  • Extract-Transform (aka probably damage the data as well)-Load the source data into a data-warehouse/mart/lake just to make it ‘easier’ to access.
  • Realize that the required analytical results are not in the data-warehouse/mart/lake, so extract some data into Excel/PowerBI/BI-tool-of-choice where you can write your analysis calculations, aggregations, etc.
  • Report analysis results, but forget to or cannot put the analysis results back into the data-warehouse/mart/lake.
  • Repeat the same process every time there is a similar (or identical) analysis required.
  • … don’t relax, another analysis request shortly follows 🙁

IntelligentGraph Benefits

IntelligentGraph moves analysis into the knowledge graph rather than moving data to the analysis engine, avoiding the groundhog-analysis-way:

  • Improves analyst performance and efficiency
    • Eliminates the need for analysts to create ELT to move data to the analysis engine. 
  • Simplifies complex calculations and aggregations
    • PathQL language greatly simplifies navigating and aggregating throughout the graph.
  • Ensures calculation and KPI concurrency
    • Calculations are performed in-situ with the data, so no need to re-export data to the analysis engine to view updated results.
  • Uses familiar scripting language
    • Scripts can be expressed in any of multiple scripting languages, including Python, JavaScript, Groovy, and Java.
  • Improves analysis performance and efficiency
    • Time-to-answer is reduced or eliminated, as analysis is equivalent to reporting.
  • Ensures analysis effort is shared with all
    • Analysis results become part of the graph which can be used by others ensuring consistency.
  • Self-documenting analysis path to raw data
    • The IntelligentGraph contains calculation scripts that define which calculations will be performed on what data (or other calculation results).
  • Improves analysis accuracy by providing provenance of all calculations
    • Trace of any analysis through to raw data is automatically available.
  • Simplifies reporting
    • Reporting tools can be used that focus on report appearance rather than calculation capability since the latter is performed in the IntelligentGraph. 
  • Highly scalable
    • IntelligentGraph is built upon RDF4J, the de-facto standard RDF framework for Java, allowing the use of any RDF4J-compliant datastore.
  • Standard support
    • Access to IntelligentGraph for querying, reporting, and more is unchanged from any RDF-based KnowledgeGraph.
  • Evolutionary, not revolutionary modeling and analysis
    • Graph-based models offer the ability to evolve as data analysis needs grow, such as adding new dimensions to the data, unlike a ‘traditional’ data mart or warehouse which usually require a rebuild.
  • Creates the Intelligent Internet of Things
    • Scripts can access external data, such as IoT, on-demand allowing IoT-based calculations and analysis to be performed in-situ.
  • Eliminates spreadsheet-hell
    • All spreadsheet calculations and aggregations can be moved into the graph, leaving the spreadsheet as a presentation tool. This eliminates the problem of undocumented calculations and inconsistent calculations in different spreadsheets.

PathQL simplifies finding paths through the maze of facts within a KnowledgeGraph. Used within IntelligentGraph scripts, it allows data analysis to be embedded within the graph, rather than requiring graph data to be exported to an analysis engine. Used with IntelligentGraph Jupyter Notebooks, it provides powerful data analytics.

I would suggest that Google does not have its own intelligence. If I search for, say, ‘Arnold Schwarzenegger and Harvard’, Google will only suggest documents that contain BOTH Arnold Schwarzenegger and Harvard. I might be lucky that someone has digested these facts and produced a single web page with the knowledge I want. I might, however, just as easily find a page of fake knowledge relating Arnold to Harvard.

It is undoubtedly true that Google can provide individual facts such as:

  1. Arnold married to Shriver
  2. Shriver daughter Joseph
  3. Joseph alma mater Harvard

However, intelligence is the ability to connect individual facts into a knowledge path.

  • KnowledgeGraph models can provide the facts to answer these questions.
  • PathQL provides an easy way to discover knowledge by describing paths and connections through these facts.
  • IntelligentGraph embeds that intelligence into any KnowledgeGraph as scripts. 
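The idea of connecting individual facts into a knowledge path can be sketched in a few lines of plain Python. This is a toy illustration only, not IntelligentGraph code; the `facts` list holds the three triples above, and the `find_path` helper is invented for this sketch:

```python
# A toy fact store holding the three triples listed above.
from collections import deque

facts = [
    ("Arnold", "marriedTo", "Shriver"),
    ("Shriver", "daughter", "Joseph"),
    ("Joseph", "almaMater", "Harvard"),
]

def find_path(start, goal, facts):
    """Breadth-first search for a chain of facts linking start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in facts:
            if s == node and o not in visited:
                visited.add(o)
                queue.append((o, path + [(s, p, o)]))
    return None

# Three hops connect Arnold to Harvard.
print(find_path("Arnold", "Harvard", facts))
```

Each individual fact is unremarkable on its own; it is the discovered chain of facts that constitutes knowledge, which is exactly what PathQL expresses declaratively.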

IntelligentGraph: PathQL and Scripting

Genealogical Example

Genealogy is the grandfather of graphs; it is therefore natural to organize family trees as a knowledge graph. A typical PathQL question to ask would then be: who are the parents of a male ancestor, born in Maidstone, of this individual, and what is that relationship?

Industrial Internet of Things (IIoT) Example

The Industrial Internet of Things (IIoT) is best modeled as an interconnected graph of ‘thing’ nodes. These things might be sensors producing measurements, the equipment to which the sensors are attached, or how the equipment is interconnected to form a functioning plant. However, the ‘intelligence’ about why the plant is interconnected is what an experienced (aka intelligent, knowledgeable) process engineer offers. To support such intelligence with a knowledge graph requires answering PathQL questions such as:

  1. If the V101 bottoms pump stops how does this affect the product flow from this production unit?
  2. If the FI101 instrument fails how does this affect the boiler feed temperature?
  3. What upstream could possibly be affecting this stream’s quality?
  4. … and so on.

Why PathQL?

SPARQL is a superb graph pattern query language, so why create another?

PathQL started out of the need to traverse the nodes and edges in a triplestore, both without the benefit of SPARQL and from within an IntelligentGraph script. IntelligentGraph works by embedding the calculations within the graph. Therefore, just as a spreadsheet calculation can access other ‘cells’ within its spreadsheet, IntelligentGraph needed a way of traversing the graph through interconnected nodes and edges to other nodes from which relevant values can be retrieved.

I didn’t want to create a new language, but it was essential that IntelligentGraph provided a very easy way to navigate a path through a graph. It then became clear that, as powerful as SPARQL is for graph pattern matching, it can be verbose for matching path patterns. PathQL was born, but not without positive prodding from my colleague Phil Ashworth.

Adding Intelligence to Graphs with Scripts

Typically, a graph node will have associated attributes with values, such as a stream with volumeFlow and density values. These values might have been imported from some external system or other:

Stream Attributes:

:Stream_1
    :density ".36"^^xsd:float ;
    :volumeFlow "40"^^xsd:float .
:Stream_2 ....

The ‘model’ of the streams can be captured as edges associated with the Unit:

:Unit_1
    :hasProductStream :Stream_1 ;
    :hasProductStream :Stream_2 .

However, most ‘attributes’ that we want to see about a thing are not measured directly. Instead, they need to be calculated from other values. This is why we end up with spreadsheet-hell: importing the raw data from the data sources into a spreadsheet simply so we can add calculated columns, the values of which are rarely exported back to the source-databases. 

IntelligentGraph allows these calculations to be embedded in the graph as literals[1] with a datatype whose local name corresponds to one of the installed script languages:

:Stream_1
    :massFlow
        "_this.getFact(':density')*
        _this.getFact(':volumeFlow');"^^:groovy .

:Unit_1
    :totalProduction
        "var totalProduction = 0.0;
        for(Resource stream : _this.getFacts(':hasProductStream'))
        {
            totalProduction += stream.getFact(':massFlow').doubleValue();
        }
        return totalProduction;"^^:groovy .

Instead of returning the object literal value (aka the script), the IntelligentGraph will return the result value for the script.

We can write this script even more succinctly using the expressive power of PathQL:

:Unit_1  :totalProduction  "return _this.getFacts(':hasProductStream/:massFlow').total();"^^:groovy .

PathQL

Spreadsheets are not limited to accessing just adjacent cells; neither is the IntelligentGraph. PathQL provides a powerful way of navigating from one Thing node to others. PathQL was inspired by SPARQL property paths, but a richer, more expressive path search was required for the IntelligentGraph.

Examples

Genealogy Example Graph

Examples of PathQL that explore this genealogy are as follows:

  • _this.getFact(":parent")
    • will return the first parent of _this.
  • _this.getFact("^:parent")
    • will return the first child of _this.
  • _this.getFacts(":parent/:parent")
    • will return the grandparents of _this.
  • _this.getFacts(":parent/^:parent")
    • will return the siblings of _this.
  • _this.getFacts(":parent[:gender :female]/:parent")
    • will return the maternal grandparents of _this.
  • _this.getFacts(":parent[:gender :female]/:parent[:gender :male]")
    • will return the maternal grandfather of _this.
  • _this.getFacts(":parent[:gender [rdfs:label 'female']]")
    • will return the mother of _this, using the label instead of the IRI.
  • _this.getFacts(":parent[eq :Peter]/:parent[:gender :male]")
    • will return the grandfather of _this who is the parent of :Peter.
  • _this.getFacts(":parent[ne :Peter]/:parent[:gender :male]")
    • will return the grandfathers of _this who are not the parent of :Peter.
  • _this.getFacts(":parent{0,4}/:parent[:hasLocation :maidstone]")
    • will return all ancestors whose parent was born in the location :maidstone.
  • _this.getPath(":parent{0,4}/:parent[:hasLocation :maidstone]")
    • will return the path to the most recent ancestor whose parent was born in the location :maidstone.
  • _this.getFacts(":parent{0,4}/:parent[:hasLocation [rdfs:label 'Maidstone']]")
    • will return all ancestors whose parent was born in a location named "Maidstone".
  • _this.getPaths(":connectedTo{1,10}[eq :BakerStreet]")
    • will find all routes, starting with the shortest, between _this and :BakerStreet with a maximum of 10 connections, thus all on the same line.
  • _this.getPaths(":connectedTo{1,5}/:changeTo{0,2}/:connectedTo{1,5}[eq :BakerStreet]")
    • will find routes, starting with the shortest, between _this and :BakerStreet with a maximum of two changes.
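To make the semantics of these examples concrete, here is a toy evaluator for a tiny subset of PathQL: sequences with `/`, the inverse modifier `^`, and a single `[predicate object]` filter, evaluated over an in-memory triple set. It is a sketch only, not the real PathQL engine; all names and data are invented:

```python
# A toy evaluator for a tiny subset of PathQL (sequence '/', inverse '^',
# and a single [pred obj] filter). A sketch only, not the real engine.
triples = {
    (":Joe", ":parent", ":Mary"), (":Joe", ":parent", ":Peter"),
    (":Ann", ":parent", ":Mary"), (":Ann", ":parent", ":Peter"),
    (":Mary", ":gender", ":female"), (":Peter", ":gender", ":male"),
    (":Mary", ":parent", ":Grace"),
}

def step(nodes, element):
    """Apply one path element, e.g. ':parent', '^:parent', ':parent[:gender :male]'."""
    pred, _, filt = element.partition("[")
    inverse = pred.startswith("^")
    pred = pred.lstrip("^")
    if inverse:  # navigate object -> subject
        out = {s for (s, p, o) in triples if p == pred and o in nodes}
    else:        # navigate subject -> object
        out = {o for (s, p, o) in triples if p == pred and s in nodes}
    if filt:     # keep only nodes that satisfy the [pred obj] filter
        fpred, fobj = filt.rstrip("]").split()
        out = {n for n in out if (n, fpred, fobj) in triples}
    return out

def get_facts(start, path):
    nodes = {start}
    for element in path.split("/"):
        nodes = step(nodes, element)
    return nodes

print(get_facts(":Joe", ":parent/:parent"))        # grandparents: {':Grace'}
print(get_facts(":Joe", ":parent/^:parent"))       # siblings (incl. self)
print(get_facts(":Joe", ":parent[:gender :male]")) # {':Peter'}
```

Real PathQL adds alternatives, cardinality ranges, reification, nested filters, and comparison operators, as the formal syntax later in this article describes.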

PathQL Formal Syntax

The parts of a PathPattern are defined below. The formal syntax is given in the PathQL BNF section at the end of this article.

IRIRef:

The simplest pathPattern is an IRI of the predicate, property, or edge:

:parent

An unprefixed qname using the default namespace.

ft:parent

A prefixed qname using the namespace model.

<http://inova8.com/ft/hasParent>

A full IRI.

PathAlternative:

A pathPattern can consist of a set of alternative edges:

:parent|:hasChild

Alternative edges to the ‘close relatives’ of the :subjectThing.

PathSequence:

A pathPattern can consist of a sequence of edges: 

:parent/:hasChild

A sequence of edges to the set of siblings of the start thing.

Inverse Modifier:

A modifier prefix to a predicate indicating that it should be navigated in the reverse direction (object→subject) instead of subject→object:

:parent/^:parent

A sequence of edges to the set of siblings of the start thing, since ^:parent is equivalent to :hasChild.

Reified Modifier:

A modifier prefix to a predicate indicating that it should be assumed that the subject-predicate-object is reified.

@:marriedTo

navigates from the :subjectThing to the :objectThing when the edge has been reified as: 

[]  rdf:subject :subjectThing ;
    rdf:predicate :marriedTo ;
    rdf:object :objectThing .

Inverse modifier can also be applied to navigate from the :objectThing to :subjectThing:

^@:marriedTo

navigates from the :objectThing to the :subjectThing

Extended Reification Modifier:

The reification type and the predicate of an extended reification:

:Marriage@:civilPartnership

navigates from the :subjectThing to the :objectThing when the edge has been reified as a class that is a :Marriage, which is rdfs:subClassOf rdf:Statement with a predicate of :civilPartnership. For example: 

 [] a :Marriage ;
    :partner1 :subjectThing ;
    :marriageType :civilPartnership ;
    :partner2 :objectThing .

:Marriage rdfs:subClassOf rdf:Statement .
:partner1 rdfs:subPropertyOf rdf:subject .
:marriageType rdfs:subPropertyOf rdf:predicate .
:partner2 rdfs:subPropertyOf rdf:object .

An inverse modifier can also be applied to navigate from the :objectThing to :subjectThing 

^:Marriage@:marriedTo

navigates from the :objectThing to the :subjectThing in the extended reification.

Dereification Modifier:

Instead of navigating to the objectThing of a reification, the dereification operator navigates to the reification thing: 

@:marriedTo#

navigates from the :subjectThing to the :marriage object.

@:marriedTo#/:at

navigates from the :subjectThing to the location :at which the marriage took place.

@:marriedTo#/:when

navigates from the :subjectThing to the date :when the marriage took place.

Path Filter:

A path filter can be applied at any point in a pathPattern to limit the subsequent paths. A path filter is like a SPARQL PropertyListNotEmpty graph pattern; however, it also includes comparison operators such as lt, gt, eq, and ne.

:parent[:gender :male]

Navigates to the male parent.

:parent[:gender :male]/:parent[:gender :female]

Navigates to the paternal grandmother.

:volumeFlow[gt "50"]

Navigates only if the value is greater than “50”.

:appearsOn[eq :calc2graph1]

Navigates only if the objectNode value is :calc2graph1.

:appearsOn[ rdfs:label "Calc2Graph1"]

Navigates only if the objectNode has a rdfs:label with value “Calc2Graph1”.

:appearsOn[eq [ rdfs:label "Calc2Graph1"]]

Navigates only if the objectNode value is a node whose label is “Calc2Graph1”.

Cardinality:

Repeats the pathPattern between the minimum and maximum cardinality.

:parent{1,5}

Finds the 1st through 5th generation ancestors of the reference node.

:parent[:gender :male]{1,5}

Finds the 1st through 5th generation male ancestors, via the male line, of the reference node.
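A cardinality range can be understood as the union of fixed-length repetitions of the same path element. The following toy sketch (invented data and helper names, not the real engine) expands :parent{lo,hi} that way:

```python
# Sketch: cardinality {lo,hi} as a union of fixed-length sequences.
# Toy data and helpers, invented for this illustration.
triples = {(":Joe", ":parent", ":Mary"), (":Mary", ":parent", ":Grace"),
           (":Grace", ":parent", ":Rose")}

def step(nodes, pred):
    return {o for (s, p, o) in triples if p == pred and s in nodes}

def repeat(start, pred, lo, hi):
    """Collect nodes reached by pred repeated between lo and hi times."""
    results, nodes = set(), {start}
    for depth in range(1, hi + 1):
        nodes = step(nodes, pred)
        if depth >= lo:
            results |= nodes
    return results

print(sorted(repeat(":Joe", ":parent", 1, 3)))  # [':Grace', ':Mary', ':Rose']
```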

Further Example Scripts

The following illustrates the IntelligentGraph scripts used to perform the plant analysis examples:

Return Scalar Value

return 40;

Get Related Property Value

return _this.getFact(":testProperty2").doubleValue()

Calculate Stream Mass Flow

var result= _this.getFact(":volumeFlow").floatValue()* _this.getFact(":density").floatValue();  
return result;

Calculate Unit Mass Throughput

return _this.getFacts(":hasProductStream/:massFlow").total();

Calculate Stream Mass Yield

var result= _this.getFact(":massFlow").floatValue()/ _this.getFact("^:hasStream/:massThroughput").floatValue();  
return result;

Calculate Unit Mass Balance

return _this.getFacts(":hasFeedStream/:massFlow").total() 
- _this.getFacts(":hasProductStream/:massFlow").total();
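The same calculations can be sketched in plain Python over dicts, which may help when reading the Groovy scripts above. Stream_1 carries the values shown earlier; Stream_2 and the feedOf/productOf edge names are invented for this illustration:

```python
# Plain-Python sketch of the plant calculations above, using dicts in
# place of graph nodes. Stream_2 and the feedOf/productOf names are
# invented illustrative values.
streams = {
    ":Stream_1": {"volumeFlow": 40.0, "density": 0.36, "feedOf": ":Unit_1"},
    ":Stream_2": {"volumeFlow": 50.0, "density": 0.40, "productOf": ":Unit_1"},
}

def mass_flow(stream):
    # massFlow = volumeFlow * density, as in the Groovy script
    return stream["volumeFlow"] * stream["density"]

def mass_throughput(unit, role):
    # total massFlow over all streams attached to the unit in the given role
    return sum(mass_flow(s) for s in streams.values() if s.get(role) == unit)

def mass_balance(unit):
    # feed total minus product total
    return mass_throughput(unit, "feedOf") - mass_throughput(unit, "productOf")

print(mass_flow(streams[":Stream_1"]))  # ~14.4
print(mass_balance(":Unit_1"))          # ~-5.6
```

The point of IntelligentGraph is that such calculations live in the graph itself rather than in an external script like this one, so every consumer sees the same, current results.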

Path Navigation Functions

A spreadsheet’s secret sauce is the ability of a cell formula to access the values of other cells, either individually or as a set. The IntelligentGraph provides this functionality with several methods associated with Thing, which are applicable to the _this Thing initialized for each script with the subject Thing.

Thing.getFact(String pathQL) returns Resource

Returns the value of the node referenced by the pathQL, for example :volumeFlow returns the object value of the :volumeFlow edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getFacts(String pathQL) returns ResourceResults

Returns the values of nodes referenced by the pathQL, for example “:hasProductStream” returns an iterator for all object values of the :hasProductStream edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getPath(String pathQL) returns Path

Returns the first (shortest) path referenced by the pathQL, for example “:parent{1,5}” returns the path to the first ancestor of _this node. The pathQL allows for more complex path navigation.

Thing.getPaths(String pathQL) returns PathResults

Returns all paths referenced by the pathQL, for example “:parent{1,5}” returns an iterator, starting with the shortest path, for all paths to the ancestors of _this node. The pathQL allows for more complex path navigation.

Graph.getThing(String subjectIRI) returns Thing

Returns a node as defined by the IRI.

Script Context Variables

Each script has access to the following predefined variables that allow the script to access the context within which it is being run.

_this, a Thing corresponding to the subject of the triple for which the script is the object. Since this is available, the path navigation functions described above can be used to navigate edges to or from this ‘thing’.

_property, a Thing corresponding to the predicate or property of the triples for which the script is the object.

_customQueryOptions, a HashMap<String, Value> of name/value pairs corresponding to the pairs of additional arguments to the SPARQL extension function. These are useful for passing application-specific parameters.

_builder, an RDF4J graph builder object allowing a graph to be constructed (and manipulated) within the script. A graph cannot be returned from a SPARQL function; however, the IRI of the graph can be returned, and any graph created by a script will be persisted.

_tripleSource, the RDF4J TripleSource to which the subject-predicate-object triple belongs.

PathQL BNF

The formal syntax of the PathPattern is defined as follows using ANTLR4 BNF:

grammar PathPattern;

// PARSER RULES
queryString     : pathPattern queryOptions? EOF ;
queryOptions    : ( queryOption )+;
queryOption     : KEY '=' literal ('^^' type )?;
type            : qname;
pathPattern     : binding ('/'|'>') pathPatterns  #boundPattern
                | binding  #matchOnlyPattern
                | pathPatterns  #pathOnlyPattern;
binding         : factFilterPattern  ;
pathPatterns    : pathEltOrInverse cardinality?  #Path  
                | pathPatterns '|'  pathPatterns  #PathAlternative  
                | pathPatterns ('/'|'>')  pathPatterns  #PathSequence
                | negation? '(' pathPatterns ')'  cardinality? #PathParentheses;
cardinality     : '{'  INTEGER (',' ( INTEGER )? )?  '}'  ;
negation        : '!';
pathEltOrInverse: negation? INVERSE? predicate  ;
predicate       : ( reifiedPredicate 
                |  predicateRef 
                |  rdfType 
                |  anyPredicate ) factFilterPattern? ;
anyPredicate    : ANYPREDICATE ;
reifiedPredicate: iriRef? REIFIER predicateRef  factFilterPattern?  dereifier? ;
predicateRef    : IRI_REF  | rdfType  |  qname | pname_ns ;
iriRef          : IRI_REF |  qname | pname_ns ;  
dereifier       : DEREIFIER ;
factFilterPattern: '['  propertyListNotEmpty   ']';
propertyListNotEmpty: verbObjectList ( ';' ( verbObjectList )? )* ;  
verbObjectList  : verb objectList;
verb            : operator | pathEltOrInverse ;
objectList      : object ( ',' object )*;
object          : iriRef  | literal | factFilterPattern | BINDVARIABLE ;
qname           : PNAME_NS PN_LOCAL; 
pname_ns        : PNAME_NS ;   
literal         : (DQLITERAL | SQLITERAL) ('^^' (IRI_REF |  qname) )? ;  
operator        : OPERATOR ;
rdfType         : RDFTYPE ;

// LEXER RULES
KEY             : '&' [a-zA-Z]+ ;  
INTEGER         : DIGIT+ ; 
BINDVARIABLE    : '%' DIGIT+ ;
fragment
DIGIT           : [0-9] ;  
INVERSE         : '^';
REIFIER         : '@';
DEREIFIER       : '#';
RDFTYPE         : 'a';
ANYPREDICATE    : '*' ;
OPERATOR        : 'lt'|'gt'|'le'|'ge'|'eq'|'ne'|'like'|'query'|'property';
DQLITERAL       : '"' (~('"' | '\\' | '\r' | '\n') | '\\' ('"' | '\\'))* '"';
SQLITERAL       : '\'' (~('\'' | '\\' | '\r' | '\n') | '\\' ('\'' | '\\'))* '\'';
IRI_REF         : '<' ( ~('<' | '>' | '"' | '{' | '}' | '|' | '^' | '\\' | '`') | (PN_CHARS))* '>' ;      
PNAME_NS        : PN_PREFIX? (':'|'~')  ;   
VARNAME         : '?' [a-zA-Z]+ ;
fragment
PN_CHARS_U      : PN_CHARS_BASE | '_'  ;
fragment   
PN_CHARS        : PN_CHARS_U
                | '-'
                | DIGIT  ;
fragment
PN_PREFIX       : PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)? ;
PN_LOCAL        : ( PN_CHARS_U | DIGIT ) ((PN_CHARS|'.')* PN_CHARS)? ;
fragment
PN_CHARS_BASE   : 'A'..'Z'
                | 'a'..'z'
                | '\u00C0'..'\u00D6'
                | '\u00D8'..'\u00F6'
                | '\u00F8'..'\u02FF'
                | '\u0370'..'\u037D'
                | '\u037F'..'\u1FFF'
                | '\u200C'..'\u200D'
                | '\u2070'..'\u218F'
                | '\u2C00'..'\u2FEF'
                | '\u3001'..'\uD7FF'
                | '\uF900'..'\uFDCF'
                | '\uFDF0'..'\uFFFD' ;
WS  : [ \t\r\n]+ -> skip ; 


[1] Script Languages

In this case, the script uses Groovy, but any Java 9 compliant scripting language can be used, such as JavaScript, Python, Ruby, and many more.

By default, the JavaScript, Groovy, and Python script engines are installed. The complete list of compliant languages is as follows:

AWK, BeanShell, ejs, FreeMarker, Groovy, Jaskell, Java, JavaScript, JavaScript (Web Browser), Jelly, JEP, Jexl, jst, JudoScript, JUEL, OGNL, Pnuts, Python, Ruby, Scheme, Sleep, Tcl, Velocity, XPath, XSLT, JavaFX Script, ABCL, AppleScript, Bex script, OCaml Scripting Project, PHP, Python, Smalltalk, CajuScript, MathEclipse

 

 

Providing answers to users’ analysis, searching, visualizing or other questions of their own data

Creating an overall solution that presents data in a useful way can be challenging, but OData2SPARQL and Lens2OData solve this.

  • RDF-Graph: Data + Model = Information, allowing us to combine your raw data with an adaptable model to create meaningful information.
  • OData2SPARQL: Information + Rules = Knowledge, providing the ability to access that information combined with additional rules (SPIN and SHACL) to deliver useful knowledge that can be consumed by applications and BI tools.
  • Lens2OData: Knowledge + Action = Results, allowing users to easily navigate, search, explore, and visualize this knowledge in such a way that it is easy to take action and produce results.

To see this all in action we have prepared a demonstrator that can be downloaded, and the following videos which illustrate the capabilities of this demonstrator.

  • Explore the provenance of data sources, which is retained by the RDF-graph and OData2SPARQL, rather than lost through typical ETL processing.

  • Explore the Transport for London train lines, stations, and zones, illustrating how easy it is to transform any dataset to an RDF-graph and immediately get the benefits of OData access and the Lens UI/UX.

To download and run this demonstrator, go to Docker Hub.

The OData2SPARQL V4 endpoint now publishes any SHACL nodeShapes defined in the model, mapping them to OData complexTypes, alongside its existing capability of publishing RDFS and OWL classes and SPIN queries.

RDF graphs provide the best (IMHO) way to store and retrieve information. However, creating user applications is complicated by the lack of standard RESTful access to the graph, certainly not one that is supported by mainstream UI frameworks. OData2SPARQL solves that by providing a model-driven RESTful API that can be deployed against any RDF graph.

The addition of support for SHACL allows the RESTful services published by OData2SPARQL to not only reflect the underlying model structure, but also match the business requirements as defined by different information shapes.

Since OData is used by many applications and UI/UX frameworks as the 21st century replacement to ODBC/JDBC, the addition of SHACL support means that user interfaces can be automatically generated that match with the SHACL shapes metadata published by OData2SPARQL.

SHACL Northwind Model Example

The ubiquitous Northwind model contains sample data of employees, customers, products, orders, order-details, and other related information. This was originally a sample SQL database, but is also available in many other formats including RDF.

OData2SPARQL using RDFS+ model

Using OData2SPARQL to map the Northwind RDF-graph will create entity-sets of Employees, Customers, Orders, OrderDetails, and so on.

OData2SPARQL maps RDFS+

  • Any rdfs:Class/owl:Class to an OData EntityType and EntitySet
  • Any OWL DatatypeProperty to a property of an OData EntityType
  • Any OWL ObjectProperty to an OData navigationProperty

Within RDF it is possible to create an Order-type of thing, without that order having a customer, or any order-details. Note that this is not a disadvantage of RDF; in fact it is one of the many advantages of RDF as it allows an order to be registered before all of its other details are available.

However, when querying an RDF-graph we are likely to ask what orders exist for a customer made by an employee, and with at least one order line-item (order-detail) that includes the product, quantity and discount. We could say that this is a qualified-order.

If we were to request the description of a particular order using OData2SPARQL:

OData2SPARQL Request:
 …/odata2sparql/northwind/Order('NWD~Order-10248')?format=json
OData2SPARQL Response:
{
       @odata.context: "http://localhost:8080/odata2sparql/northwind/$metadata#Order/$entity",
       customerId: "NWD~Customer-VINET",
       employeeId: "NWD~Employee-5",
       freight: "32.380001",
       label: "Order-10248",
       lat: 49.2559582,
       long: 4.1547448,
       orderDate: "2016-07-03T23:00:00Z",
       …
}

Now the above response includes every OData property (aka RDF datatypeProperty) we know about Order('NWD~Order-10248'). Not all are shown above, for brevity.

However, we might want to include related information: that which is connected via an OData navigation property (aka RDF objectProperty). To include this related information, we simply augment the request with $select=* and $expand=* as follows:

OData2SPARQL Request: 
…/odata2sparql/northwind/Order('NWD~Order-10248')?$select=*&$expand=*&$format=json
OData2SPARQL Response: 
{
       @odata.context: "http://localhost:8080/odata2sparql/northwind/$metadata#Order(*)/$entity",
       freight: "32.380001",
       lat: 49.2559582,
       long: 4.1547448,
       orderDate: "2016-07-03T23:00:00Z",
       subjectId: "NWD~Order-10248",
       …
       employee: {
              birthDate: "1975-03-04",
              employeeAddress: "14 Garrett Hill",
              employeeCity: "London",
              employeeCountry: "UK",
              subjectId: "NWD~Employee-5",
              …
       },
       orderRegion: null,
       shipVia: {
              shipperCompanyName: "Federal Shipping",
              shipperPhone: "(503) 555-9931",
              subjectId: "NWD~Shipper-3"
       },
       customer: {
              customerAddress: "59 rue de l'Abbaye",
              subjectId: "NWD~Customer-VINET",
              …
       },
       hasOrderDetail: [{
              discount: 0,
              orderDetailUnitPrice: 14,
              orderId: "NWD~Order-10248",
              productId: "NWD~Product-11",
              quantity: 12,
              subjectId: "NWD~OrderDetail-10248-11",
              …
       },
       {
              discount: 0,
              orderDetailUnitPrice: 9.8,
              orderId: "NWD~Order-10248",
              productId: "NWD~Product-42",
              quantity: 10,
              subjectId: "NWD~OrderDetail-10248-42",
              …
       },
       {
              discount: 0,
              orderDetailUnitPrice: 34.8,
              orderId: "NWD~Order-10248",
              productId: "NWD~Product-72",
              quantity: 5,
              subjectId: "NWD~OrderDetail-10248-72",
              …
       }],
       …
}
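Once the expanded response is in hand, a downstream application can traverse the nested JSON directly. As a toy illustration (not part of OData2SPARQL; the line-total formula is the conventional Northwind unit price × quantity × (1 − discount)), an order total can be computed from the hasOrderDetail array:

```python
# Toy traversal of the $expand=* response above: total an order from its
# line items. The dict below holds the line-item values from the response.
order = {
    "hasOrderDetail": [
        {"orderDetailUnitPrice": 14.0, "quantity": 12, "discount": 0},
        {"orderDetailUnitPrice": 9.8, "quantity": 10, "discount": 0},
        {"orderDetailUnitPrice": 34.8, "quantity": 5, "discount": 0},
    ]
}

def order_total(order):
    """Sum unitPrice * quantity * (1 - discount) over the line items."""
    total = sum(
        d["orderDetailUnitPrice"] * d["quantity"] * (1 - d["discount"])
        for d in order["hasOrderDetail"]
    )
    return round(total, 2)

print(order_total(order))  # 440.0
```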

OData2SPARQL with SHACL Shapes

Well, we got what we asked for: everything (a lot of the information returned has been hidden above for clarity). However, this might be a little overwhelming, as the actual question we wanted answered was:

Give me any qualified orders that have a salesperson and customer, with at least one line-item that has product, quantity, and discount defined.

The beauty of RDF is that it is based on an open-world assumption: anything can be said. Unfortunately, the information-transaction world is steeped in referential integrity, so it sees the looseness of RDF as anarchic. SHACL bridges that gap by defining patterns to which the RDF-graphs should adhere if they want to be classified as a particular shape. However, SHACL still allows the open world to co-exist.

Closed or Open World? We can have the best of both worlds:

  • The open-world assumption behind RDF-graphs, in which anyone can say anything about anything, combined with
  • The closed-world assumption behind referential integrity, which limits information to predefined patterns.

Let’s shape what is a qualified order using SHACL. Using this shapes constraint language we can say that:

  • A QualifiedOrder comprises
    • An Order, with
      • One and only one Customer
      • One and only one Employee (salesperson)
      • At least one OrderDetail, each with
        • Optionally one discount
        • One and only one product
        • One and only one quantity

OK, this is not overly complex, but the intention is not to befuddle with complexity; rather, to illustrate with a simple yet meaningful example.

The above ‘word-model’ can be expressed using the SHACL vocabulary as the following, which can be included with any existing schema/model of the information:

shapes:QualifiedOrder
  rdf:type sh:NodeShape ;
  sh:name "QualifiedOrder" ;
  sh:targetClass model:Order ;
  sh:property [
      rdf:type sh:PropertyShape ;
      skos:prefLabel "" ;
      sh:maxCount 1 ;
      sh:minCount 1 ;
      sh:name "mustHaveCustomer" ;
      sh:path model:customer ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:maxCount 1 ;
      sh:minCount 1 ;
      sh:name "mustHaveSalesperson" ;
      sh:path model:employee ;
    ] ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:path [ sh:inversePath model:order ] ;
      sh:minCount 1 ;
      sh:name "mustHaveOrderDetails" ;
      sh:node [
          rdf:type sh:NodeShape ;
          sh:name "QualifiedOrderDetail" ;
          sh:property [
              rdf:type sh:PropertyShape ;
              sh:maxCount 1 ;
              sh:minCount 0 ;
              sh:name "mayHaveOrderDetailDiscount" ;
              sh:path model:discount ;
            ] ;
          sh:property [
              rdf:type sh:PropertyShape ;
              sh:maxCount 1 ;
              sh:minCount 1 ;
              sh:name "mustHaveOrderDetailProduct" ;
              sh:path model:product ;
            ] ;
          sh:property [
              rdf:type sh:PropertyShape ;
              sh:maxCount 1 ;
              sh:minCount 1 ;
              sh:name "mustHaveOrderDetailQuantity" ;
              sh:path model:quantity ;
            ] ;
          sh:targetClass model:OrderDetail ;
        ] ;
    ] ;
.
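
The cardinality rules in this shape can be sketched in plain Python against a hand-made, in-memory stand-in for the graph. The dict layout and the function name below are illustrative only, not part of OData2SPARQL:

```python
# Hypothetical in-memory representation of an Order; the real data lives in RDF.
def is_qualified_order(order: dict) -> bool:
    """Check the QualifiedOrder cardinalities from the shape above:
    exactly one customer, exactly one employee, and at least one order
    detail, each with exactly one product and quantity and at most one
    discount."""
    if len(order.get("customer", [])) != 1:
        return False
    if len(order.get("employee", [])) != 1:
        return False
    details = order.get("orderDetails", [])
    if len(details) < 1:
        return False
    for d in details:
        if len(d.get("product", [])) != 1 or len(d.get("quantity", [])) != 1:
            return False
        if len(d.get("discount", [])) > 1:
            return False
    return True

order = {
    "customer": ["NWD~Customer-VINET"],
    "employee": ["NWD~Employee-5"],
    "orderDetails": [
        {"product": ["NWD~Product-11"], "quantity": [12], "discount": [0]},
    ],
}
print(is_qualified_order(order))  # True
```

In the real system these checks are not run in application code at all: OData2SPARQL compiles them into the SPARQL query, as described below.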

OData2SPARQL has been extended to extract not only the RDFS+ model and any SPIN operations, but now also any SHACL shapes. OData2SPARQL maps SHACL as follows:

  • any SHACL nodeShape to an OData ComplexType, and an OData EntityType and EntitySet
  • any SHACL propertyShape to an OData property with the same restrictions
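
Judging from the example URLs below (shapes:QualifiedOrder is requested as shapes_QualifiedOrder), the published EntitySet name appears to replace the prefix separator with an underscore. A one-line sketch of that assumed convention:

```python
def odata_entityset_name(qname: str) -> str:
    """Map a prefixed SHACL nodeShape name to the EntitySet name that
    OData2SPARQL appears to publish: the ':' becomes '_'. This is an
    assumption inferred from the example URLs (shapes:QualifiedOrder
    -> shapes_QualifiedOrder), not a documented rule."""
    return qname.replace(":", "_")

print(odata_entityset_name("shapes:QualifiedOrder"))  # shapes_QualifiedOrder
```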

An OData2SPARQL request for an EntityType derived from a SHACL shape will construct the SPARQL query adhering to the shapes restrictions as shown in the example request below:

OData2SPARQL Request: 

…/odata2sparql/northwind/shapes_QualifiedOrder('NWD~Order-10248')?$select=*&$expand=*&$format=json
OData2SPARQL Response: 

{
    "@odata.context": "http://localhost:8080/odata2sparql/northwind/$metadata#shapes_QualifiedOrder(*)/$entity",
    "QualifiedOrder": {
        "hasOrderDetail": [
            {
                "discount": 0,
                "quantity": 12,
                "product": {
                    "subjectId": "NWD~Product-11",
                    …
                }
            },
            {
                "discount": 0,
                "quantity": 10,
                "product": {
                    "subjectId": "NWD~Product-42",
                    …
                }
            },
            {
                "discount": 0,
                "quantity": 5,
                "product": {
                    "subjectId": "NWD~Product-72",
                    …
                }
            }
        ],
        "customer": {
            "subjectId": "NWD~Customer-VINET"
        },
        "employee": {
            "subjectId": "NWD~Employee-5"
        }
    },
    "subjectId": "NWD~Order-10248",
    …
}
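
The response can be consumed like any other JSON document. The sketch below uses a trimmed copy of the response above (the elided “…” fields dropped) to total the ordered quantities:

```python
import json

# A trimmed copy of the QualifiedOrder response shown above.
response = json.loads("""
{
  "QualifiedOrder": {
    "hasOrderDetail": [
      {"discount": 0, "quantity": 12, "product": {"subjectId": "NWD~Product-11"}},
      {"discount": 0, "quantity": 10, "product": {"subjectId": "NWD~Product-42"}},
      {"discount": 0, "quantity": 5,  "product": {"subjectId": "NWD~Product-72"}}
    ],
    "customer": {"subjectId": "NWD~Customer-VINET"},
    "employee": {"subjectId": "NWD~Employee-5"}
  },
  "subjectId": "NWD~Order-10248"
}
""")

details = response["QualifiedOrder"]["hasOrderDetail"]
total_quantity = sum(d["quantity"] for d in details)
products = [d["product"]["subjectId"] for d in details]
print(total_quantity)  # 27
print(products)
```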

In OData2SPARQL terms this means the following request:

OData2SPARQL Request: 

…/odata2sparql/northwind/shapes_QualifiedOrder?$format=json
OData2SPARQL Response: 

(Returns all orders that match the shape)

The shape does not imply that all orders have to follow the same rules. There could still be orders without, for example, any line-items. These are still valid; they simply do not match this shape restriction.

Extending a SHACL shape

SHACL allows shapes to be derived from other shapes. For example, we might further qualify an order as one that has a shipper specified: a ShippingOrder.

This can be expressed in SHACL as follows:

shapes:ShippingOrder
  rdf:type sh:NodeShape ;
  sh:name "ShippingOrder" ;
  sh:node shapes:QualifiedOrder ;
  sh:property [
      rdf:type sh:PropertyShape ;
      sh:maxCount 1 ;
      sh:minCount 1 ;
      sh:path model:shipVia ;
    ];
.
OData2SPARQL Request: 

…/odata2sparql/northwind/shapes_ShippingOrder('NWD~Order-10248')?$select=*&$expand=*&$format=json
OData2SPARQL Response:

(This is the same structure as the QualifiedOrder with the addition of the Shipper details.)

SHACL Northwind User Interface Example

One of the motivations for using OData2SPARQL to publish RDF is that it brings together the strength of a ubiquitous RESTful interface standard (OData) with the flexibility and federation capability of RDF/SPARQL. This opens up many popular user-interface development frameworks and tools, such as OpenUI5 and SAP Web IDE, greatly simplifying the creation of user-interface applications.

OpenUI5 already makes it very easy to create a user interface over, say, Orders: OpenUI5 uses the metadata published by OData (and hence the RDF schema published by OData2SPARQL), as illustrated in Really Rapid RDF Graph Application Development.

With the addition of SHACL shapes to OData2SPARQL, we can create a UI/UX derived from the OData metadata, including these SHACL shapes. For example, the QualifiedOrder shape is what we would expect of a typical master-detail UI: the Order is the ‘master’ and the line-items of the order the ‘detail’. With OpenUI5 it is as simple to publish a SHACL shape as it is to publish any OData EntitySet based on an rdfs:Class.

Requesting all QualifiedOrders will show a list of any (not all) orders that satisfy the restrictions in the SHACL nodeShape. OData2SPARQL does this by constructing a SPARQL query aligned with the nodeShape and containing the same restrictions.

Figure 1: A Grid displaying all QualifiedOrders derived directly from the shape definition

Similarly requesting everything about a particular QualifiedOrder will show a master-detail.

  • The master displays required salesperson and customer
  • The detail contains all line items of the order (the shape specifies at least one)
  • Each line item displays product, quantity, and discount

Figure 2: A Master-Detail Derived Directly from the QualifiedOrder Shape

The benefits of OData2SPARQL+SHACL

The addition of SHACL support to OData2SPARQL enables the model-centric approach of RDF-graphs to drive both a model-centric RESTful interface and model-centric user-interface.

OData not only offers a RESTful interface to any datastore for CRUD operations, it also provides a very powerful query interface to the underlying datastore, making it a candidate for the ‘universal’ query language. However, this query capability is often neglected. Why?

Simple OData Query Example

It is always easier to start simply. So the question I want answered from a Northwind datastore is

Find customers located in France

Given an OData endpoint, the query is simply answered with the following URL:

http://localhost:8080/odata2sparql.v4/NW/

Customer?

$filter=contains(customerCountry,'France')

The elements of this URL are as follows:

  1. The first line identifies the OData endpoint; in this case it is a local OData2SPARQL (http://inova8.com/bg_inova8.com/offerings/odata2sparql/) endpoint that is publishing an RDF/SPARQL triplestore hosting an RDF version of Northwind (https://github.com/peterjohnlawrence/com.inova8.northwind). Alternatively you could use a publicly hosted endpoint such as http://services.odata.org/V4/Northwind/Northwind.svc
  2. The next line specifies the entity or entitySet, in this case the Customer entitySet that is the start of the query.
  3. The final line specifies a filter condition to be applied to the property values of each entity in the entitySet.

The partial results are shown below in JSON format, although OData supports other formats.
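
Since the request is just a URL, it can be composed with any HTTP-capable language. A minimal Python sketch, using the local endpoint and property names shown above:

```python
from urllib.parse import quote

base = "http://localhost:8080/odata2sparql.v4/NW"  # the local endpoint above
entity_set = "Customer"
filter_expr = "contains(customerCountry,'France')"

# Percent-encode the filter, keeping the characters OData expects literally.
encoded = quote(filter_expr, safe="(),'")
url = f"{base}/{entity_set}?$filter={encoded}"
print(url)
# http://localhost:8080/odata2sparql.v4/NW/Customer?$filter=contains(customerCountry,'France')
```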

But that is not much more than any custom RESTful API would provide, right? Actually OData querying is far more powerful as illustrated in the next section.

Complex OData Query Example

The question I now want answered from a Northwind datastore is

Get product unit prices for any product that is part of an order placed since 1996 made by customers located in France

One difference is that the terminology used by OData differs from that of RDBMS/SQL, RDF/RDFS/OWL/SPARQL, graph, POJO, etc. The following table shows the corresponding terms used in this description.

OData Terminology Mapping to Relational, RDF, and Graph

OData                                            | Relational   | RDF/RDFS/OWL          | Graph
Schema Namespace, EntityContainer Name           | Model Name   | owl:Ontology          | Graph
EntityType, EntitySet                            | Table/View   | rdfs:Class, owl:Class | Node
EntityType’s Properties                          | Table Column | owl:DatatypeProperty  | Attribute
EntityType’s Key Properties                      | Primary Key  | URI                   | Node ID
NavigationProperty, Association, AssociationSet  | Foreign Key  | owl:ObjectProperty    | Edge
FunctionImport                                   | Procedure    | spin:Query            | –

Given an OData endpoint, the query is answered with the following URL:

http://localhost:8080/odata2sparql.v4/NW/

Customer?

$filter=contains(customerCountry,'France')&

$select=customerCompanyName,customerAddress,customerContactName&

$expand=hasPlacedOrder(

$filter=orderDate gt 1996-07-05;

$select=shipAddress,orderDate;

$expand=hasOrderDetail(

$select=quantity;

$expand=product(

$select=productUnitPrice

)

)

)

The elements of this URL are much the same as before, but now we nest some queryOptions as we navigate from one related entity to another, just as we would when navigating through a graph or object structure.

  1. Customer entitySet is specified as the start of the query
  2. The $filter specifies the same filter condition as before since we want only customers with an address in France.
  3. Next we include a $select to limit the properties (aka RDBMS column, or OWL DatatypeProperty) of a Customer entity that are returned.
  4. Then the $expand is a query option to specify that we want to move along a navigationProperty (aka RDBMS foreign key, Graph edge or OWL ObjectProperty). In this case the navigationProperty takes us to all orders placed by that customer.

Now that we are at a new Order entity, we can add queryOptions to this entity:

  1. The $filter specifies that only orders after a certain date should be included.
  2. The $select limits the properties of an Order entity that are returned.
  3. The $expand further navigates the query to the order details (aka line items) of this order.

Yet again we are at another entity so we can add queryOptions to this Order_Detail entity:

  1. The $select limits to only the quantity ordered
  2. The $expand yet again navigates the query to the product entity referred to in the Order_Detail

Finally we can add queryOptions to this product entity:

  1. The $select limits only the productUnitPrice to be returned.
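
The nesting pattern described above — queryOptions separated by ';' inside a parenthesized $expand — can be sketched as a small URL builder. The function name and tuple layout are illustrative only:

```python
def query_options(filter_expr=None, select=None, expand=None, sep="&"):
    """Compose OData queryOptions. Top-level options are joined with '&';
    options nested inside an $expand are joined with ';' and wrapped in
    parentheses, exactly the nesting pattern used above."""
    parts = []
    if filter_expr:
        parts.append(f"$filter={filter_expr}")
    if select:
        parts.append("$select=" + ",".join(select))
    if expand:
        parts.append("$expand=" + ",".join(
            f"{nav}({nested})" if nested else nav for nav, nested in expand))
    return sep.join(parts)

# Build the nested query from the inside out:
product = query_options(select=["productUnitPrice"], sep=";")
detail = query_options(select=["quantity"], expand=[("product", product)], sep=";")
order = query_options(filter_expr="orderDate gt 1996-07-05",
                      select=["shipAddress", "orderDate"],
                      expand=[("hasOrderDetail", detail)], sep=";")
url = ("http://localhost:8080/odata2sparql.v4/NW/Customer?"
       + query_options(filter_expr="contains(customerCountry,'France')",
                       select=["customerCompanyName", "customerAddress",
                               "customerContactName"],
                       expand=[("hasPlacedOrder", order)]))
print(url)
```

The printed URL reproduces the query shown earlier in this section (before percent-encoding).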

The partial results are shown below:

You can test the same query at the public Northwind endpoint, the only difference being the modified names of entitySets, properties, and navigationProperties.

http://services.odata.org

/V4/Northwind/Northwind.svc/Customers?

$filter=contains(Country, 'France')&

$select=CompanyName,Address,ContactName&

$expand=Orders(

$filter=OrderDate gt 1996-07-05;

$select=ShipAddress,OrderDate;

$expand=Order_Details(

$select=Quantity;

$expand=Product(

$select=UnitPrice

)

)

)

Structure of an OData Query

The approximate BNF of an OData query is shown below. This is only illustrative, not pure BNF; the accurate version is available here (http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part2-url-conventions.html).

ODataRequest    := entity|entitySet?queryOptions

entity          a single entity
                := Customer('NWD~Customer-GREAL')
                := Customer('NWD~Customer-GREAL')/hasPlacedOrder('NWD~Order-10528')

entitySet       a set of entities
                := Customer
                := Customer('NWD~Customer-GREAL')/hasPlacedOrder

queryOptions    the optional conditions placed on the current entity
                := filter ; select ; expand

filter          specify a filter to limit which entities should be included
                := $filter = {filterCondition}*
                := $filter = contains(customerCountry,'France')

select          specify which properties of the current entity should be returned
                := $select = {property}*
                := $select = customerCompanyName,customerAddress,customerContactName

expand          specify the navigationProperty along which the query should proceed
                := $expand = {navigationProperty(queryOptions)}*
                := $expand = hasPlacedOrder($expand=..)

Perhaps this can be illustrated more clearly (for some!):

Using the same diagram notation the complex query can be seen as traversing nodes in a graph:

Other Notes

  1. OData publishes the underlying model via a $metadata document, so any application can be truly model-driven rather than hard-coded. This means that OData query builders allow one to access any OData endpoint and start constructing complex queries. Examples include:
  2. The examples in this description use OData V4 syntax (http://docs.oasis-open.org/odata/odata/v4.0/odata-v4.0-part2-url-conventions.html). The same queries can be expressed in the older OData V2 syntax (http://www.odata.org/documentation/odata-version-2-0/uri-conventions/), but V4 has changed the syntax to be far more query-friendly.
  3. What about really, really complex queries: do I always have to define them via the URL? Of course not, because each underlying datastore has a way of composing ‘views’ of the data that can then be published as pseudo-entityTypes via OData:
    • RDBMS/SQL has user defined views
    • RDF/RDFS/OWL/SPARQL has SPIN queries (http://spinrdf.org/) that can be added as pseudo EntityTypes via OData2SPARQL
  4. By providing a standardized RESTful interface that also is capable of complex queries we can isolate application development and developers from the underlying datastore:
    • The development could start with a POJO database, migrate to an RDBMS, and then, realizing the supremacy of RDF, end up with a triplestore without having to change any of the application code. OData can be a true Janus-point in the development.

Conclusions

OData is much more than JDBC/ODBC for the 21st century: it is also a candidate for the universal query language that allows development and developers to be completely isolated from the underlying datastore technology.

Additional Resources

Odata2SPARQL (http://inova8.com/bg_inova8.com/offerings/odata2sparql/)

Lens2Odata (http://inova8.com/bg_inova8.com/offerings/lens2odata/)

SQL2RDF incremental materialization uses a stream of source data changes (DML) and transforms them into the corresponding changes (inserts, deletes) in the triplestore, ensuring concurrency between RDBMS and triplestore. SQL2RDF incremental materialization is based on the same R2RML mapping used for virtualization (Ontop, Capsenta) and materialization (In4mium, R2RML Parser), and thus does not incur an additional configuration-maintenance burden.

Mapping an existing RDBMS to RDF has become an increasingly popular way of accessing data when the system-of-record is not, or cannot be, in an RDF database. Despite its attractiveness, virtualization is not always possible for various reasons such as performance, the need for full SPARQL 1.1 support, or the need to reason over the virtualized as well as other materialized data. However, materialization of an existing RDBMS to RDF is also not an ideal alternative, since it inevitably lacks concurrency between the system-of-record and the RDF.

Thus incremental materialization provides an alternative between virtualization and materialization, offering:

  • Easy ETL of the source RDB, as no configuration is required beyond the same R2RML used for virtualization or bulk materialization.
  • Improved concurrency of data compared with materialization.
  • Significantly less computational overhead than a full materialization of the source data.
  • Improved query performance compared with virtualization especially when reasoning is required.
  • Compatibility with R2RML thus reducing configuration maintenance.
  • Supports insert, update, and delete changes on the source RDBMS.
  • Source transactions can be batched, and the changes to the triplestore are part of a single transaction that can be rolled back in the event of a problem.
  • Supports change logging so that committed changes can be rolled back.
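
The idea behind the bullets above can be sketched in a few lines: a row-level change from the RDBMS is turned into only the affected triple deletes and inserts, rather than re-materializing everything. The mapping below is a hand-made stand-in for R2RML; the table, column, and predicate names are illustrative:

```python
# Simplified stand-in for an R2RML mapping of a Customers table.
mapping = {
    "subject": lambda row: f"NWD~Customer-{row['CustomerID']}",
    "predicates": {"CompanyName": "model:customerCompanyName",
                   "Country": "model:customerCountry"},
}

def triples(row):
    """Materialize the triples for one source row under the mapping."""
    s = mapping["subject"](row)
    return {(s, p, row[col]) for col, p in mapping["predicates"].items()}

def incremental_update(old_row, new_row):
    """Return the (deletes, inserts) needed to keep the triplestore
    concurrent with the source; pass old_row=None for an INSERT and
    new_row=None for a DELETE."""
    old = triples(old_row) if old_row else set()
    new = triples(new_row) if new_row else set()
    return old - new, new - old

deletes, inserts = incremental_update(
    {"CustomerID": "VINET", "CompanyName": "Vins et alcools", "Country": "Belgium"},
    {"CustomerID": "VINET", "CompanyName": "Vins et alcools", "Country": "France"},
)
print(deletes)  # only the changed Country triple is deleted…
print(inserts)  # …and its replacement inserted
```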

Contact inova8 if you would like to try SQL2RDF.

When a subject-matter-expert (SME) describes their data domain, they use the English language. In response we give them visually beautiful, but often undecipherable, diagrams. Here we propose an alternative: a ‘word-model’ that describes the model in structured English without loss of accuracy or completeness.

Creating an ontological model invariably involves a subject-matter-expert (SME) who has an in-depth knowledge of the domain to be modelled working together with a data modeler or ontologist who will translate the domain into an ontological model.

When a subject-matter-expert (SME) is in discussion with a data modeler, they will be describing their information using English language:

“I have customers who are assigned to regions, and they create orders with a salesperson who prepares the order-details and assigns a shipper that operates in that region”

In response a data modeler will offer the subject-matter-expert (SME) an E-R diagram or an ontology graph!

Figure 1: Geek Model Diagrams

No wonder we are regarded as geeks! Although visually appealing, such diagrams make it difficult for an untrained user to verify the model they describe.

Instead we could document the model in a more easily consumed way with a ‘word-model’. This captures the same details of the model that are present in the various diagrams, but uses structured English and careful formatting as shown below.

aCustomer

  • places anOrder
    • made by anEmployee
      • who reports to anotherEmployee
      • and who operates in aTerritory
    • and contains anOrderDetail
      • refers to aProduct
        • which is supplied by aSupplier
        • and which is categorized as aCategory
      • and is shipped by aShipper
      • and is assigned to aRegion
        • which has aTerritory
      • and belongs to aRegion

The basic principle is that we start at any entity, in this case ‘Customer’, and traverse the edges of the graph that describe that entity. At each new node we indent. Each line is a combination of the predicate and range class. There are no rules regarding the order in which we visit the nodes and edges, nor regarding depth. The only rule is that, if we want a complete word-model, the description traverses every edge in one direction or the other.
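
The traversal rule above can be sketched as a short recursive function over a toy excerpt of the model. The node and predicate names are taken from the example; the function itself is illustrative:

```python
# A hand-made excerpt of the Northwind word-model as adjacency lists.
graph = {
    "aCustomer": [("places", "anOrder")],
    "anOrder": [("made by", "anEmployee"), ("contains", "anOrderDetail")],
    "anEmployee": [("reports to", "anotherEmployee")],
    "anOrderDetail": [("refers to", "aProduct")],
}

def word_model(node, depth=0, seen=None):
    """Depth-first traversal: indent at each new node, emit one line per
    predicate+target, and traverse each edge only once."""
    seen = seen if seen is not None else set()
    lines = [] if depth else [node]
    for predicate, target in graph.get(node, []):
        if (node, predicate) in seen:
            continue
        seen.add((node, predicate))
        lines.append("  " * (depth + 1) + f"• {predicate} {target}")
        lines.extend(word_model(target, depth + 1, seen))
    return lines

print("\n".join(word_model("aCustomer")))
```

The output mirrors the indented bullet structure shown above, starting at aCustomer.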

These word-models are useful in the model-design phase, as they are easy to document using any editor. Since the word-model is in ‘plain English’, it is easy for an SME to verify the accuracy of the model, and to mark it up with comments and edits. However, word-models are also easy to generate from an RDF/OWL model.

Enhancements to Word-Model

We can refine the contents of the word-model as we develop the model with the SME. We can also enhance the readability by decorating the word-model text with the following fonts:

Word-Model Legend
Italics indicate an instance of a class, a thing
Bold indicates a class
Underline is a predicate or property that relates instances of classes
BoldItalic is used for cardinality expressions

Level 1a: Add the categorization of entities

Rather than using an example of an entity, we can qualify with the class to which the sample entity belongs.

aCustomer, a Customer

  • places anOrder, an Order
    • made by anEmployee, an Employee
      • who reports to anotherEmployee, an Employee
      • and who operates in aTerritory, a Territory
    • and contains anOrderDetail, an OrderDetail
      • and refers to aProduct, a Product
        • which is supplied by aSupplier, a Supplier
        • and which is categorized as aCategory, a Category
      • and is shipped by aShipper, a Shipper
      • and is assigned to aRegion, a Region
        • which has aTerritory, a Territory
      • and belongs to aRegion, a Region

Level 1b: Add cardinality of predicates

We can also add the cardinality as a modifier between the predicate and the entity:

aCustomer

  • who places zero, one or more orders
    • each made by one anEmployee
      • who reports to zero or one anotherEmployee
      • and who operates in one or more aTerritorys
    • and each contains one or more anOrderDetails
      • which refers to one aProduct
        • which is supplied by one aSupplier
        • and which is categorized as one aCategory
      • and is shipped by one aShipper
      • and is assigned to one aRegion
        • which has one or more aTerritorys
      • and belongs to one aRegion
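
The cardinality phrases used above follow directly from (minCount, maxCount) pairs, and a generator could render them as follows. This is a sketch; the phrase wording simply matches the examples above:

```python
def cardinality_phrase(min_count, max_count=None):
    """Render a (minCount, maxCount) pair as a word-model phrase;
    max_count=None means unbounded."""
    if max_count is None:
        return "one or more" if min_count >= 1 else "zero, one or more"
    if (min_count, max_count) == (0, 1):
        return "zero or one"
    if (min_count, max_count) == (1, 1):
        return "one"
    return f"between {min_count} and {max_count}"

print(cardinality_phrase(0))      # zero, one or more
print(cardinality_phrase(1, 1))   # one
print(cardinality_phrase(0, 1))   # zero or one
print(cardinality_phrase(1))      # one or more
```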

Level 2: Add categorization and cardinality

Of course we can combine these extensions into a single word-model as shown below.

aCustomer, a Customer

  • who places zero, one or more orders, each an Order
    • each made by one anEmployee, an Employee
      • who reports to zero or one anotherEmployee, an Employee
      • and who operates in one or more aTerritorys, each a Territory
    • and each contains one or more anOrderDetails, each an OrderDetail
      • which refers to one aProduct, a Product
        • which is supplied by one aSupplier, a Supplier
        • and which is categorized as one aCategory, a Category
      • and is shipped by one aShipper, a Shipper
      • and is assigned to one aRegion, a Region
        • which has one or more aTerritorys, each a Territory
      • and belongs to one aRegion, a Region

Despite the completeness of what is being described by this word-model, it is still easy to read by SMEs.

Auto-generation of Word-Model

Once the word-model has been formally documented in RDF/OWL, we can still use a word-model to document the RDF/OWL by auto-generating a word-model from the underlying RDF/OWL ontology as shown below.

Figure 2: Word-Model

This was generated using a SPIN magic-property as follows:

select ?modelDefinition
{
   (model:Customer 4 false) :modelDefinition ?modelDefinition .
}

This auto-generation can go further by including the datatype properties associated with each entity as shown below:

Figure 3: Word-Model including datatype properties

This was generated using a SPIN magic-property as follows:

select ?modelDefinition
{
   (model:Customer 4 true) :modelDefinition ?modelDefinition .
}

Appendix

SPIN Magic Properties:

The following SPIN properties were defined for auto generation of the word-model in HTML format:

classProperties

SELECT ?classProperties
WHERE {
{SELECT ?arg1 (IF(?arg2, CONCAT("(",?properties,")"), "") AS ?classProperties) WHERE
    {
        SELECT ?arg1 ((GROUP_CONCAT(CONCAT("<i>", ?dataPropertyLabel, "</i>"); SEPARATOR=', ')) AS ?properties)
        WHERE {
           # BIND (model:Product AS ?arg1) .
            ?arg1 a ?class .
            FILTER (?class IN (rdfs:Class, owl:Class)) .
            ?arg1 rdfs:label ?classLabel .
            ?dataProperty a owl:DatatypeProperty .
            ?dataProperty rdfs:domain ?arg1 .
            ?dataProperty rdfs:label ?dataPropertyLabel .
        }
        GROUP BY ?arg1
    }
}
}

classDefinition

SELECT ?classDefinition ?priorPath
WHERE {
    {
        SELECT ?arg1 ?arg2  ((GROUP_CONCAT( ?definition; SEPARATOR='<br/>and that ')) AS ?classDefinition)  ((GROUP_CONCAT( ?pastPath; SEPARATOR='\t')) AS ?priorPath)
        WHERE {
           ?arg1 a ?class . FILTER( ?class in (rdfs:Class, owl:Class ))
            ?arg1 rdfs:label ?classLabel .
            ?objectProperty a owl:ObjectProperty .
            {
                        ?objectProperty rdfs:domain ?arg1 .
                        ?objectProperty rdfs:label ?objectPropertyLabel .
                        ?objectProperty rdfs:range ?nextClass .
                        ?nextClass rdfs:label ?nextClassLabel .   BIND(?objectProperty as ?property)
            }UNION{
                        ?objectProperty  owl:inverseOf ?inverseObjectProperty .
                        ?objectProperty rdfs:domain  ?nextClass.
                        ?inverseObjectProperty rdfs:label ?objectPropertyLabel .
                        ?objectProperty rdfs:range ?arg1 .
                        ?nextClass rdfs:label ?nextClassLabel .    BIND(?inverseObjectProperty as ?property)
            }UNION{
                        ?inverseObjectProperty  owl:inverseOf ?objectProperty .
                        ?objectProperty rdfs:domain  ?nextClass.
                        ?inverseObjectProperty rdfs:label ?objectPropertyLabel .
                        ?objectProperty rdfs:range  ?arg1 .
                        ?nextClass rdfs:label ?nextClassLabel .    BIND(?inverseObjectProperty as ?property)
            }
#Stop from going too deep
            BIND(?arg2 -1 as ?span) FILTER(?span>0). 
            ?nextClass a ?nextClassClass. FILTER( ?nextClassClass in (rdfs:Class, owl:Class ))
#,  odata4sparql:Operation))  .
#Do not process an already processed arc (objectProperty)         
            BIND(CONCAT(?arg4,"\t",?objectPropertyLabel) as ?forwardPath) FILTER( !CONTAINS(?arg4, ?objectPropertyLabel ))  
            (?nextClass ?span ?arg3 ?forwardPath ) :classDefinition (?nextClassDefinition  ?nextPath).           
#Do not include if arc (objectProperty) appears already   
            FILTER(  !CONTAINS(?nextPath, ?objectPropertyLabel )) BIND(CONCAT( ?objectPropertyLabel, IF(?nextPath="","",CONCAT("\t",?nextPath))) as ?pastPath)
                                    (?nextClass ?arg3) :classProperties ?nextClassProperties .
            BIND (CONCAT("<u>",?objectPropertyLabel , "</u> <b>", ?nextClassLabel, "</b>",  ?nextClassProperties, IF ((?nextClassDefinition!=""), CONCAT("<br/><blockquote>that ",  ?nextClassDefinition, "</blockquote>"), "")  ) as ?definition)
        }
        GROUP BY ?arg1 ?arg2
    } .
}

modelDefinition

SELECT ?modelDefinition
WHERE {
    {
        SELECT ?arg1 ?arg2 ?arg3 ((CONCAT("<b>", ?classLabel, "</b>", ?nextClassProperties, "<blockquote>that ", ?classDefinition, "</blockquote>")) AS ?modelDefinition)
        WHERE {
           # BIND (model:Order AS ?arg1) .  BIND (4 AS ?arg2) . BIND (false AS ?arg3) .
            ( ?arg1 ?arg2 ?arg3 "") :classDefinition (?classDefinition "") .
            ( ?arg1 ?arg3 ) :classProperties ?nextClassProperties .
            ?arg1 rdfs:label ?classLabel .
        }
    } .
}

Usage

The following will return the HTML description of the model, starting with model:Customer, stopping at a depth of 4 in any direction, and not including the datatype property definitions.

select ?modelDefinition
{
   (model:Customer 4 false) :modelDefinition ?modelDefinition .
}

Let’s face it: RDF graph datastores have not become the go-to database for application development in the way that MySQL, MongoDB, and others have. Why? It is not that they cannot scale to handle the volume and velocity of data, nor the variety of structured and unstructured types.

Perhaps it is the lack of application development frameworks integrated with RDF. After all any application needs to not only store and query the data but provide users with the means to interact with that data, whether it be data entry forms, charts, graphs, visualizations and more.

However, application development frameworks target popular back-ends accessible via JDBC and, now that we are in the 21st century, OData. RDF and SPARQL are not on their radar … that is, unless we wrap RDF with OData so that the world of these application development environments is opened up to RDF graph datastores.

OData2SPARQL provides that Janus-inflexion point, hiding the nuances of the RDF Graph behind an OData service which can then be consumed by powerful development environments such as OpenUI5 and WebIDE.

This article shows how an RDF Graph CRUD application can be rapidly developed, yet without losing the flexibility that HTML5/JavaScript offers, from which it can be concluded that there is no reason preventing the use of RDF Graphs as the backend for production-capable applications.

A video of this demo can be found here: https://youtu.be/QFhcsS8Bx-U

Figure 1: OData2SPARQL: the Janus-Point between RDF data stores and application development

Rapid Application Development Environments

There are a great number of superb application development frameworks that allow one to create cross-platform (desktop, web, iOS, and Android), rich (with a large selection of components such as grids, charts, forms, etc.) applications. Most of these are based on the MVC or MVVM model, both of which require systematic and complete (CRUD) access to the back-end data via a RESTful API. Now that OData has been adopted by OASIS, the number of companies offering explicit support for OData is increasing, including Microsoft, IBM, and SAP. Similarly there are a number of frameworks, one of which is SAPUI5, which has an open-source version, OpenUI5.

OpenUI5

OpenUI5 is an open source JavaScript UI library, maintained by SAP and available under the Apache 2.0 license. OpenUI5 lets you build enterprise-ready web applications that are responsive on all devices and run on almost any browser of your choice. It is based on JavaScript, using jQuery as its foundation, and follows web standards. It eases your development with a client-side HTML5 rendering library including a rich set of controls, and supports data binding to different models (JSON, XML and OData).

With its extensive support for OData, combining OpenUI5 with OData2SPARQL releases the potential of RDF Graph datasources for web application development.

Web IDE

SAP Web IDE is a powerful, extensible, web-based integrated development tool that simplifies end-to-end application development. Since it is built around using OData datasources as its provider, Web IDE can be used as a semantic application IDE when the RDF graph data is published via OData2SPARQL.

Web IDE runs either as a cloud-based service supplied free by SAP, or can be downloaded as an Eclipse Orion application. Since development is probably against a local OData endpoint, the latter is more convenient.

RDF Graph application in 5 steps:

  1. Deploy OData2SPARQL endpoint for chosen RDF Graph store

The Odata2SPARQL war is available here, together with instructions for configuring the endpoints: odata2sparql.v2

The endpoint in this example is against an RDF-ized version of the ubiquitous Northwind database. This RDF graph version can be downloaded here: Northwind

  2. Install WebIDE

Instructions for installing the Web IDE Personal edition can be found here: SAP Web IDE Personal Edition

  3. Add OData service definition

Once installed, an OData service definition file (for example NorthwindRDF) can be added to the SAPWebIDE\config_master\service.destinations\destinations folder, as follows:

Description=Northwind
Type=HTTP
TrustAll=true
Authentication=NoAuthentication
Name=NorthwindRDF
ProxyType=Internet
URL=http://localhost:8080
WebIDEUsage=odata_gen
WebIDESystem=NorthwindRDF
WebIDEEnabled=true

  4. Create new application from template

An application can be built using a template which is accessed via File/New/Project from Template. In this example the “CRUD Master-Detail Application” was selected.

The template wizard needs a Data Connection: choose Service URL, select the data service (NorthwindRDF) and enter the path of the particular endpoint (/odata2sparql/2.0/NW/).

Figure 2: WEB IDE Data Connection definition

At this stage the template wizard allows you to browse the endpoint’s entityTypes and properties, or classes and properties in RDF graph-speak.

Since the Web IDE and OpenUI5 are truly model-driven, the IDE creates a master-detail application given the entities that you define in the next screen.

The template wizard will ask you for the ‘object’ entityType, which in this example is Category. Additionally you should enter the ‘line item’; in this case there is only one navigation property (aka objectProperty in RDF graph-speak), which is the products that belong to the category.

Note that this template allows other fields to be defined, so title and productUnitprice were selected.

Figure 3: WEB IDE Template Customization

  5. Run the application

The application is now complete and can be launched from the IDE. Right-click the application and select Run/Run as/As web application:

Figure 4: Application

Even this templated application is not limited to browsing data: it allows full editing of the categories. Note that even the labels are derived from the OData endpoint, but of course they can be changed by editing the application.

Figure 5: Edit Category

Alternatively a new category can be added:

Figure 6: Create New Category

Next Steps

That’s it: a fully functional RDF Graph web application built without a single line of code being written. However, we can choose to use this as just a starting point:

  1. Publish data views, the SPARQL equivalent of a SQL view on the data, to the OData2SPARQL endpoint.
    • These are particularly useful when publishing reports in OpenUI5, Excel PowerQuery, Spotfire, etc.
  2. Modify the model that is published via the OData2SPARQL endpoint
    • The model that is published is extracted from the schema defined in the endpoint configuration. Thus the schema can be changed to suit what one wants in the endpoint metamodel.
  3. Edit the templated application
    • The templated application is just a starting point for adaptation: the code that is created is nothing more than standard HTML5/JavaScript using the OpenUI5 libraries.
  4. Build the application from first-principles
    • A template is not always the best starting point, so an application can always be built from first-principles. However the OpenUI5 libraries make this much easier by, for example, making the complete metamodel of the OData2SPARQL endpoint available within the application simply by defining that endpoint as the datasource.

Integration Problem to be solved

  • Data resides in different databases, as well as in Linked Open Data sources.
  • Misaligned models: different datasets have different meanings for classes and predicates that need to be aligned.
  • Misaligned names for the same concepts.
  • Replication is problematic.
  • Query definition and scope of querying are difficult to define in advance.
  • Provenance of data is necessary.
  • Cannot depend on inferences being available in advance.
  • A scalable architecture requires that all queries are stateless.

Data Cathedrals versus Information Shopping Bazaars

Linked Open Data has been growing since 2007 from a few (12) interconnected datasets to 295 as of 2011, and it continues to grow. To quote “Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” (Linked Data, n.d.) 

Figure 1: Growth of the Linked Data ‘Cloud’

As impressive as the growth of interconnected datasets is, what is more important is the value of that interconnected data. A corollary of Metcalfe’s law suggests that the benefit gained from integrated information grows geometrically[1] with the number of data communities that are integrated.

Many organizations have their own icebergs of information: operations, sales, marketing, personnel, legal, finance, research, maintenance, CRM, document vaults etc. (Lawrence, 2012) Over the years there have been various attempts to melt the boundaries between these icebergs including the creation of the mother-of-all databases that houses (or replicates) all information or the replacement of disparate applications with their own database with a mother-of-all application that eliminates the separate databases. Neither of these has really succeeded in unifying any or all data within an organization. (Lawrence, Data cathedrals versus information bazaars?, 2012). The result is a ‘Data Cathedral’ through which users have no way to navigate to find the information that will answer their questions.

Figure 2: Users have no way to navigate through the Enterprise’s Data Cathedral

Remediator at the heart of Linked Enterprise Data

Can we create an information shopping bazaar for users to answer their questions without committing heresy in the Data Cathedral? Can we create the same information shopping bazaar as Linked Data within the enterprise: Linked Enterprise Data (LED)? That is the objective of Remediator.

First of all we must recognize that the enterprise will have many structured, aggregated, and unstructured data stores already in place:

Figure 3: Enterprise Structured, Aggregated, and Unstructured Data Icebergs

One of the keys to the ability of Linked Data to interlink 300+ datasets is that they are all expressed as RDF. The enterprise does not have the luxury of replicating all existing data into RDF datasets. However, that is not necessary (although still sometimes desirable), because there are adapters that can make any existing dataset look as if it contains RDF, accessible via a SPARQLEndpoint. Examples are listed below:

  1. D2RQ: (D2RQ: Accessing Relational Databases as Virtual RDF Graphs )
  2. Ultrawrap:(Research in Bioinformatics and Semantic Web/Ultrawrap)
  3. Ontop: (-ontop- is a platform to query databases as Virtual RDF Graphs using SPARQL)

Attaching these adapters to existing data-stores, or replicating existing data into a triple store, takes us one step further to the Linked Enterprise Data:

Figure 4: Enterprise Data Cloud, the first step to integration

Of course now that we have harmonized the data all as RDF accessible via a SPARQLEndpoint we can view this as an extension of the Linked Data cloud in which we provide enterprises users access to both enterprise and public data:

Figure 5: Enterprise Data Cloud and Linked Data cloud

We are now closer to the information shopping bazaar, since users would, given appropriate discovery and searching user interfaces, be able to navigate their own way through this data cloud. However, despite the harmonization of the data into RDF, we still have not provided a means for users to ask new questions:

What Company (and their fiscal structure) are we working with that have a Business Practise of type Maintenance for the target industry of Oil and Gas with a supporting technology based on Vendor-Application and this Application is not similar to any of our Application?

Such questions require pulling information from many different sources within an organization. Even with the Enterprise Data Cloud, one has only provided the capability to discover such answers. Would it not be better to allow a user to ask such a question, and let the Linked Enterprise Data determine from where it should pull partial answers, which it can then aggregate into the complete answer to the question? It is like asking a team of people to answer a complex question, each contributing their own part, and then assembling the overall answer, rather than relying on a single guru. Remediator has the role of that team, taking parts of the question and asking each part of the relevant data-sources.

Figure 6: Remediator as the Common Entry Point to Linked Enterprise Data (LED)

Thus our question can become:

  1. What Business Practise of type Maintenance for the target industry of Oil and Gas?
  2. What Company are we working with?
  3. What Company have a Business Practise of type Maintenance?
  4. What Business Practise with a supporting technology based on Vendor-Application?
  5. What Company (and their fiscal structure)?
  6. What Vendor-Application and this Application is not similar to any of our Application?

This decomposition of a question into sub-questions relevant to each dataset is automated by Remediator:

Figure 7: Sub-Questions distributed to datasets for answers
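The routing idea behind this decomposition can be sketched in a few lines. This is an illustrative toy, not Remediator’s actual algorithm: each dataset advertises the predicates it can answer (much as a VoID description would), and each sub-question is sent to every dataset that can contribute. The dataset and predicate names are hypothetical:

```python
# Toy illustration of Remediator-style question decomposition (not the actual
# algorithm): route each sub-question to the datasets whose advertised
# predicates can answer it. All names below are hypothetical.
def route_subquestions(subquestions, dataset_predicates):
    """Map each sub-question to the list of datasets able to answer it.

    subquestions: dict of sub-question text -> predicate it needs
    dataset_predicates: dict of dataset name -> set of predicates it holds
    """
    return {
        question: [ds for ds, preds in dataset_predicates.items()
                   if predicate in preds]
        for question, predicate in subquestions.items()
    }

datasets = {"CRM": {"workingWith"},
            "PracticeCatalog": {"hasPractice", "targetIndustry"}}
subquestions = {"What Company are we working with?": "workingWith"}
print(route_subquestions(subquestions, datasets))
# {'What Company are we working with?': ['CRM']}
```

In a real deployment the advertised predicates would come from each dataset’s VoID description, and the partial answers would still need to be joined into the overall result.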

Requirements for a Linked Enterprise Data Architecture

  • Keep it simple
  • Do not re-invent that which already exists.
  • Eliminate replication where possible.
  • Avoid the need for prior inferencing.
  • Efficient query performance.
  • Provide provenance of results.
  • Provide optional caching for further slicing and dicing of result-set.
  • Use VoID, only VoID, and nothing but VoID to drive the query.

[1]  If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, where K is some constant. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc.), then the benefits increase to 10 * K * 2. If I integrate in threes (operations + accounting + maintenance, accounting + payroll + receiving, etc.), then the benefits increase four-fold (a corollary of Metcalfe’s law) to 10 * K * 4. For quad-wise integration my benefits would be 10 * K * 8, and so on. Now it might not be 8-fold, but the point is there is a geometric, not linear, growth in benefits as I integrate all of my information across my organization.
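The footnote’s arithmetic can be expressed compactly: integrating N systems in groups of size g multiplies the baseline benefit N * K by 2^(g-1). A quick check in Python:

```python
# The footnote's corollary of Metcalfe's law: integrating n_systems databases
# in groups of group_size multiplies the baseline benefit (n_systems * k)
# by 2 ** (group_size - 1).
def integration_benefit(n_systems, group_size, k=1.0):
    return n_systems * k * 2 ** (group_size - 1)

print(integration_benefit(10, 1))  # 10.0  (disconnected)
print(integration_benefit(10, 2))  # 20.0  (pair-wise)
print(integration_benefit(10, 3))  # 40.0  (three-wise)
print(integration_benefit(10, 4))  # 80.0  (quad-wise)
```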


OData2SPARQL is an OData proxy protocol converter for any SPARQL/RDF triplestore. To compare SPARQL with OData is somewhat misleading. After all SPARQL has its roots as a very powerful query language for RDF data, but is not intended as a RESTful protocol. Similarly OData has its roots as an abstract interface to any type of datastore, not as a specification of that datastore. Some have said “OData is the equivalent of ODBC for the Web”.
The data management strengths of SPARQL/RDF can be combined with the application development strengths of OData with a protocol proxy: OData2SPARQL, a Janus-point between the application development world and the semantic information world.

Figure 1: OData2SPARQL Proxy between Semantic data and Application consumers

What is OData?

OData is a standardized protocol for creating and consuming data APIs. OData builds on core protocols like HTTP and commonly accepted methodologies like REST. The result is a uniform way to expose full-featured data APIs (Odata.org).  Version 4.0 has been standardized at OASIS, and was released in March 2014.

OData RESTful APIs are easy to consume. The OData metadata, a machine-readable description of the data model of the APIs, enables the creation of powerful generic client proxies and tools. Some have said “OData is the equivalent of ODBC for the Web” (OASIS Approves OData 4.0 Standards for an Open, Programmable Web, 2014). Thus a comprehensive ecosystem of applications and development tools has emerged, a few of which are listed below:

  • LINQPad: a tool for building OData queries interactively.
  • OpenUI5 is an open source JavaScript UI library, maintained by SAP and available under the Apache 2.0 license. OpenUI5 lets you build enterprise-ready web applications, responsive to all devices, running on almost any browser of your choice. It’s based on JavaScript, using jQuery as its foundation, and follows web standards. It eases your development with a client-side HTML5 rendering library including a rich set of controls, and supports data binding to different models including OData.
  • Power Query for Excel is a free Excel add-in that enhances the self-service Business Intelligence experience in Excel by simplifying data discovery, access and collaboration.
  • Tableau – an excellent client-side analytics tool – can now consume OData feeds
  • Teiid allows you to connect to and consume sources (OData services, relational data, web services, files, etc.) and deploy a single source that is available as an OData service out-of-the-box.
  • Telerik not only provides native support for the OData protocol in its products, but also offers several applications and services which expose their data using the OData protocol.
  • TIBCO Spotfire is a visual data discovery tool which can connect to OData feeds.
  • Sharepoint: Any data you’ve got on SharePoint as of version 2010 can be manipulated via the OData protocol, which makes the SharePoint developer API considerably simpler.
  • XOData is a generic web-based OData Service visualization & exploration tool that will assist in rapid design, prototype, verification, testing and documentation of OData Services. 

OData vs SPARQL/RDF

To compare SPARQL with OData is somewhat misleading. After all SPARQL has its roots as a very powerful query language for RDF data, and is not intended as a RESTful protocol. Similarly OData has its roots as an abstract interface to any type of datastore, not as a specification of that datastore. Recently JSON-LD has emerged (Manu Sporny, 2014), providing a method of transporting Linked Data using JSON. However JSON-LD focuses on the serialization of linked data (RDF) as JSON rather than defining the protocol for a RESTful CRUD interface. Thus it is largely an alternative to, say, the Turtle or RDF/XML serialization formats.

OData and SPARQL/RDF: Contradictory or Complementary?

OData strengths:

  • Schema discovery.
  • OData provides a data source neutral web service interface, which means application components can be developed independently of the back-end datasource.
  • Supports CRUD.
  • Not limited to any particular physical data storage.
  • Client tooling support.
  • Easy to use from JavaScript.
  • Growing set of OData productivity tools such as Excel, SharePoint, Tableau, and BusinessObjects.
  • Growing set of OData frameworks such as SAPUI5, OpenUI5, and KendoUI.
  • Growing set of independent development tools such as LINQPad and XOData.
  • Based on open (OASIS) standards after being initiated by Microsoft.
  • Strong commercial support from Microsoft, IBM, and SAP.
  • OData is not limited to traditional RDBMS applications. Vendors of real-time data such as OSI are publishing their data as OData endpoints.

SPARQL/RDF strengths:

  • Extremely flexible schema that can change over time.
  • Vendor independent.
  • Portability of data between triple stores.
  • Federation over multiple, disparate data-sources is inherent in the intent of RDF/SPARQL.
  • Increasingly standard format for publishing open data.
  • Linked Open Data expanding.
  • Identities globally defined.
  • Inferencing allows deduction of additional facts not originally asserted, which can be queried via SPARQL.
  • Based on open (W3C) standards.

OData weaknesses:

  • Was perceived as a vendor (Microsoft) protocol.
  • Built around the use of a static data-model (RDBMS, JPA, etc.).
  • No concept of federation of data-sources.
  • Identities defined with respect to the server.
  • Inferencing limited to sub-classes of objects.

SPARQL/RDF weaknesses:

  • Application development frameworks aligned with RDF/SPARQL are limited.
  • Difficult to access from de-facto standard BI tools such as Excel.
  • Difficult to report using popular reporting tools.

Table 1: OData and SPARQL/RDF: Contradictory or Complementary?

OData2SPARQL: OData complementing RDF/SPARQL

The data management strengths of SPARQL/RDF can be combined with the application development strengths of OData with a protocol proxy: OData2SPARQL. OData2SPARQL is the Janus-point between the application development world and the semantic information world.

  • Brings together the strength of a ubiquitous RESTful interface standard (OData) with the flexibility, federation ability of RDF/SPARQL.
  • SPARQL/OData Interop proposed W3C interoperation proxy between OData and SPARQL (Kal Ahmed, 2013)
  • Opens up many popular user-interface development frameworks and tools such as OpenUI5.
  • Acts as a Janus-point between application development and data-sources.
  • User interface developers are not, and do not want to be, database developers. Therefore they want to use a standardized interface that abstracts away the database, even to the extent of what type of database: RDBMS, NoSQL, or RDF/SPARQL
  • By providing an OData2SPARQL server, it opens up any SPARQL data-source to the C#/LINQ development world.
  • Opens up many productivity tools such as Excel/PowerQuery, and SharePoint to be consumers of SPARQL data such as Dbpedia, Chembl, Chebi, BioPax and any of the Linked Open Data endpoints!
  • Microsoft has been joined by IBM and SAP in using OData as their primary interface method, which means there will be many application developers familiar with OData as the means to communicate with a backend data source.

Consuming RDF via OData: OData2SPARQL

All of the following tools are demonstrated accessing an RDF triple store via the OData2SPARQL protocol proxy.

Development Tools

XOData

A new online OData development tool is XOData from (PragmatiQa, n.d.). Unlike other OData tools, XOData renders very useful relationship diagrams. The Northwind RDF model published via the OData2SPARQL endpoint is shown below:

Figure 2: Browsing the EDM Model Published by OData2SPARQL Using XOData

XOData also allows the construction of queries, as shown below:

Figure 3: Querying the OData2SPARQL Endpoint Using XOData

LINQPad

LINQPad (LINQPad, n.d.) is a free development tool for interactively querying databases using C#/LINQ. Thus it supports Object, SQL, EntityFramework, WCF Data Services, and, most importantly for OData2SPARQL, OData services. Since LINQPad is centered on the Microsoft frameworks (WCF, WPF, etc.), this illustrates how the use of OData can bridge between the Java world of many semantic tools and the Microsoft world of corporate applications such as SharePoint and Excel.

LINQPad shows the contents of the EDM model as a tree. One can then select an entity within that tree and create a LINQ or Lambda query. The results of executing that query are then presented in a grid below.

Figure 4:  Browsing and Querying the OData2SPARQL Endpoints Using LINQPad

LINQPad and XOData are good for testing out queries against any datasource. This also demonstrates using the DBpedia SPARQL endpoint, as shown below:

 Figure 5: Browsing DBPedia SPARQLEndpoint Using LINQPad via OData2SPARQL

Browsing Data

One of the primary motivations for the creation of OData2SPARQL is to allow access to Linked Open Data and other SPARQLEndpoints from the ubiquitous enterprise and desktop tools such as SharePoint and Excel.

Excel/PowerQuery

“Power Query is a free add-in for Excel 2010 and up that provides users an easy way to discover, combine and refine data all within the familiar Excel interface.” (Introduction to Microsoft Power Query for Excel, 2014)

PowerQuery allows a user to build their personal data-mart from external data, such as that published by OData2SPARQL. The user can fetch data from the datasource, add filters to that data, navigate through that data to other entities, and so on with PowerQuery recording the steps taken along the way. Once the data-mart is created it can be used within Excel as a PivotTable or a simple list within a sheet. PowerQuery caches this data, but since the steps to create the data have been recorded, it can be refreshed automatically by allowing PowerQuery to follow the same processing steps. This feature resolves the issue of concurrency in which the data-sources are continuously being updated with new data yet one cannot afford to always query the source of the data. These features are illustrated below using the Northwind.rdf endpoint published via OData2SPARQL:

Figure 6: Browsing the OData2SPARQL Endpoint model with PowerQuery

Choosing an entity set allows one to start filtering and navigating through the data, as shown in the ‘Applied Steps’ frame on the right.

Note that the selected source is showing all values as ‘List’ since each value can have zero, one, or more values as is allowed for RDF DatatypeProperties.

Figure 7: Setting Up Initial Source of Data in PowerQuery

As we expand the data, such as the companyProperty, we see that the Applied Steps frame records the steps taken so that they can be repeated.

Figure 8: Expanding Details in PowerQuery

The above example expanded a DatatypeProperty collection. Alternatively we may navigate through a navigation property such as Customer_orders, the orders that are related to the selected customer:

Figure 9: Navigating through related data with PowerQuery
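The filter-and-expand steps that PowerQuery records correspond to ordinary OData query options. A rough Python sketch of the kind of URL such steps translate into, using the Northwind entity and navigation-property names from this example (the filter expression itself is hypothetical):

```python
# Compose (but do not send) the OData query corresponding to recorded
# PowerQuery steps: filter an entity set, then expand a navigation property.
# The filter expression is hypothetical; Customer and Customer_orders are the
# Northwind names used in this example.
from urllib.parse import quote

def odata_query(service_root, entity_set, filter_expr=None, expand=None):
    """Build an OData query URL from filter/expand steps."""
    options = []
    if filter_expr:
        options.append("$filter=" + quote(filter_expr))
    if expand:
        options.append("$expand=" + quote(expand))
    query = ("?" + "&".join(options)) if options else ""
    return f"{service_root.rstrip('/')}/{entity_set}{query}"

url = odata_query("http://localhost:8080/odata2sparql/2.0/NW",
                  "Customer",
                  filter_expr="country eq 'Germany'",
                  expand="Customer_orders")
print(url)
```

Because the steps are recorded as data rather than as a cached result, re-running them against the live endpoint is what makes PowerQuery’s refresh possible.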

Once complete, the data is imported into Excel:

Figure 10: Importing data from OData2SPARQL with PowerQuery

Unlike conventional importing of data into Excel, the personal data-mart that was created in the process of selecting the data is still available.

Application Development Frameworks

There are a great number of superb application development frameworks that allow one to create cross-platform (desktop, web, iOS, and Android), rich (large selection of components such as grids, charts, forms, etc.) applications. Most of these are based on the MVC or MVVM model, both of which require systematic and complete (CRUD) access to the back-end data via a RESTful API. Now that OData has been adopted by OASIS, the number of companies offering explicit support for OData is increasing, ranging from Microsoft, IBM, and SAP to real-time database vendors such as OSI. Similarly there are a number of frameworks, one of which is SAPUI5 (UI Development Toolkit for HTML5 Developer Center, n.d.), which has an open source version, OpenUI5 (OpenUI5, n.d.).
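The CRUD access such frameworks expect maps onto plain HTTP verbs against the OData endpoint. A hedged sketch, composing (but not sending) a create request for a new Category; the service URL and property name follow the Northwind example and may differ in a real deployment:

```python
# Compose (but do not send) an OData 'create' request for a new Category.
# The service root and property name follow the Northwind example used in
# this article; a real deployment may differ.
import json
from urllib.request import Request

def create_entity_request(service_root, entity_set, properties):
    """Build the POST request an MVC/MVVM framework would issue to create an entity."""
    return Request(
        url=f"{service_root.rstrip('/')}/{entity_set}",
        data=json.dumps(properties).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

req = create_entity_request("http://localhost:8080/odata2sparql/2.0/NW",
                            "Category", {"categoryName": "Beverages"})
print(req.get_method(), req.full_url)
# POST http://localhost:8080/odata2sparql/2.0/NW/Category
```

Reads, updates, and deletes follow the same pattern with GET, PUT/PATCH, and DELETE, which is why a model-driven framework can generate the whole data layer from the endpoint metadata.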

SAPUI5

SAPUI5 is an impressive framework which makes MVC/MVVM application development easy via the Eclipse-based IDE. Given that OData2SPARQL publishes any SPARQLEndpoint as an OData endpoint, this development environment is immediately available for semantic application development. The following illustrates a master-detail example against the Northwind.rdf SPARQL endpoint via OData2SPARQL.

Figure 11: SAPUI5 Application using OData2SPARQL endpoint

Yes, we could have cheated and used the Northwind OData endpoint directly, but the QNames of the Customer ID and Order Number reveal that the origin of the data is really RDF.

Handling Contradictions between OData and RDF/SPARQL

RDF is an extremely powerful way of expressing data, so a natural question to ask is what could be lost when that data is published via an OData service. The answer is very little! The following table lists the potential issues and their mitigation:

Issue: OData 3NF versus RDF 1NF
Description: RDF inherently supports multiple values for a property, whereas OData up to V2 only supported scalar values.
Mitigation: OData V3+ supports collections of property values, which are supported by the OData2SPARQL proxy server.

Issue: RDF language tagging
Description: RDF supports language tagging of strings.
Mitigation: OData supports complex types, which are used to map a language-tagged string to a complex type with separate language tag and string value.

Issue: DatatypeProperties versus object-attributes
Description: OWL DatatypeProperties are concepts independent of the class, bound to a class by a domain, range, or OWL restriction. Conversely, OData treats such properties as bound to the class.
Mitigation: In OData2SPARQL the OWL DatatypeProperty is converted to an OData EntityType property for each of the DatatypeProperty domains.

Issue: Multiple inheritance
Description: OData only supports single inheritance, via the OData baseType declaration within an EntityType definition.

Issue: Multiple domain properties
Description: An OWL ObjectProperty will be mapped to an OData Association. An Association can be between only one FromRole and one ToRole, and the Association must be unique within the namespace.
Mitigation: OData Associations are created for each domain. OData2SPARQL names these associations {Domain}_{ObjectProperty}, ensuring uniqueness.

Issue: Cardinality
Description: The capabilities of OData V3 allow all DatatypeProperties to be OData collections. However, the ontology might have further restrictions on the cardinality.
Mitigation: OData2SPARQL assumes cardinality will be managed by the backend triple store. However, in future versions, if the cardinality is restricted to one or less, the EntityType property can be declared as a scalar rather than a collection.

Table 2: Contradictions between OData and RDF/SPARQL
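The language-tagging mitigation in Table 2 is easy to picture in code. A minimal sketch of the mapping, with illustrative field names (the proxy’s actual complex-type property names may differ):

```python
# Map an RDF literal (optionally language-tagged) to the complex-type-like
# value described in Table 2: separate string value and language tag.
# Field names here are illustrative, not the proxy's exact names.
def literal_to_complex(value, lang=None):
    return {"value": value, "lang": lang or ""}

print(literal_to_complex("Getränke", lang="de"))  # {'value': 'Getränke', 'lang': 'de'}
print(literal_to_complex("Beverages", lang="en"))
print(literal_to_complex("42"))                   # untagged literal -> empty lang
```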

Availability of OData2SPARQL

Two versions of OData2SPARQL are freely available as listed below:

  1. inova8.odata2sparql.v2: based on the Olingo V2 library, supporting OData V2
  2. inova8.odata2sparql.v4: based on the Olingo V4 library, supporting OData V4 (in progress)