IntelligentGraph embeds calculation and analysis capability within RDF knowledge graphs, rather than forcing analysis to be undertaken by exporting query results to external analysis applications such as Excel.

IntelligentGraph achieves this by embedding scripts into the RDF knowledge graph which are evaluated when queried with SPARQL.

Scripts, written in a variety of languages such as JavaScript, Java, and Python, access the underlying graph using simple pathPatternQL navigation.

Why IntelligentGraph?

At present, calculations over stored data are delivered either by custom code or by exporting the stored data to spreadsheets. The data behind these tools is inevitably tabular. In fact, spreadsheets are so dominant for analysis that the spreadsheet itself becomes the ‘database’, with the inherent difficulty of keeping that data synchronized with the source system of record.

The real world is better represented as a network, or graph, of interconnected things. A knowledge graph is therefore a far better way to organize storage than tables or objects. However, there is still a need to perform ad hoc numerical analysis over this data.

The RDF Data Cube vocabulary can help organize data for analysis, but the analysis itself still has to be performed externally. Confronted with this dilemma, users would typically export knowledge graph data in tabular form to a datamart or, yet again, directly into a spreadsheet where the analysis could be performed.

IntelligentGraph turns this approach on its head by embedding the calculations as scripts within the knowledge graph. These scripts are evaluated when queried and use the data in situ, so there is no separate copy of the data to keep synchronized. This allows the calculations to have knowledge of their neighbouring nodes and edges, just as Excel cells can access other cells in the spreadsheet. Access to other nodes within the graph uses pathPatternQL navigation.

Example Data and Analysis

Consider an Industrial Internet of Things (IIoT) application that connects all the measurements about a process plant, such as an oil refinery, into a knowledge graph that relates the measurements to the material flows through the process equipment.

Although there is an abundance of measurements and laboratory analyses available, the values required for operating and performance monitoring are not (and mostly cannot be) measured directly.

For example:

  • Stream Mass Flow: direct mass flow measurements are rare. Instead, a volume flow measurement is used in conjunction with a measured material density to calculate the mass flow.
  • Unit Mass Flow Throughput: this is calculated by summing either all the feed stream mass flows or all the product stream mass flows.
  • Unit Mass Balance: this is calculated as the difference between the feed and product mass flows.
  • Product Stream Yield: this is the ratio of a stream’s mass flow to the throughput of the unit to which the stream is connected.
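The arithmetic behind these four quantities can be sketched in a few lines of Python. The stream names and numbers are made up for illustration, and the yield is taken against product throughput (one of the two throughput options above); in IntelligentGraph the same logic would live in scripts attached to the graph rather than in standalone code.

```python
"""Hypothetical worked example of the derived process-plant quantities."""

# Hypothetical measurements per stream: (volume flow, density).
feed_streams = {"Feed_1": (100.0, 0.5)}
product_streams = {"Stream_1": (40.0, 0.36), "Stream_2": (70.0, 0.5)}


def mass_flow(volume_flow: float, density: float) -> float:
    """Stream mass flow: volume flow multiplied by density."""
    return volume_flow * density


def throughput(streams: dict) -> float:
    """Unit throughput: sum of the mass flows of the given streams."""
    return sum(mass_flow(v, d) for v, d in streams.values())


feed_total = throughput(feed_streams)
product_total = throughput(product_streams)

# Unit mass balance: feed mass flow minus product mass flow.
mass_balance = feed_total - product_total

# Product stream yield: stream mass flow divided by the unit's product throughput.
yields = {
    name: mass_flow(v, d) / product_total
    for name, (v, d) in product_streams.items()
}

print(f"feed={feed_total:.1f} product={product_total:.1f} balance={mass_balance:.1f}")
for name, y in yields.items():
    print(f"{name} yield={y:.3f}")
```

Note that each quantity depends on the structure (which streams feed or leave which unit), not just on stored numbers.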

Figure 1: Typical Process Flow Sheet

These are simple examples; however, they show the reliance on the knowledge graph structure to perform the analysis.

Solving data analysis the traditional way

Data is in the database, analysis is done by the analysis engine (aka Excel), right?

Figure 2: Data analysis the traditional way

In this scenario, the local power user sets up a query to export data from the database and converts it to a format that can be imported into Excel. Increasingly complex formulae are then written to wrangle the data into the required results.

Why is the spreadsheet approach risky?

  • The analysis is now separated from the data. Data changes will not be reflected in the analysis. Worse still, changes to the analysis might not be propagated to all the spawned copies of the spreadsheet.
  • The data is separated from the analysis. The analysis results are rarely re-imported into the data store, where they could be compared against the data. Instead, even more data is extracted into the spreadsheet.
  • The difficulty of managing the separation of data from analysis becomes so great that in many cases the database is dispensed with entirely and the spreadsheet becomes the de facto database.

Solving data analysis with an IntelligentGraph

The beauty of Excel is that a cell can contain either a value or a formula that references other cells’ values. Why not do the same with a graph? A node can have edges that terminate either in a literal value or in a formula that references other nodes’ values.

This is illustrated in the diagram below:

  • The :massFlow property is not measured directly, so a formula is used for its value instead. This formula references $this, the node to which the calculation is attached, and uses the method getFact() to retrieve related values. The argument of getFact() is a pathPatternQL expression.
  • The :totalProduction property is not measured directly, so a formula is used instead which iterates over all of the ‘stream out’ nodes, retrieving the value of the :massFlow for each stream. The :massFlow value is, of course, in turn a calculation.

Figure 3: Intelligent Graph Data Analysis
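The cell-or-formula idea can be illustrated outside any triplestore. The following Python sketch is a hypothetical toy model, not the IntelligentGraph implementation: each edge holds either a literal, another node, or a formula (a Python callable), and get_fact evaluates formulas on demand. As in the figure, :totalProduction pulls in each stream's :massFlow, which is itself a formula. The values for :Stream_2 are made up.

```python
"""Toy model (hypothetical): graph edges hold literal values or formulas."""
from typing import Any, Dict, List


class Thing:
    """A node whose edges point to literals, other Things, or formulas."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.edges: Dict[str, List[Any]] = {}

    def add(self, predicate: str, obj: Any) -> "Thing":
        self.edges.setdefault(predicate, []).append(obj)
        return self  # allow chaining

    def get_facts(self, predicate: str) -> List[Any]:
        """All objects of the edge; formulas are evaluated with this node as $this."""
        return [o(self) if callable(o) else o for o in self.edges.get(predicate, [])]

    def get_fact(self, predicate: str) -> Any:
        """The first object of the edge, like a single cell reference."""
        return self.get_facts(predicate)[0]


# Formulas receive the node they are attached to, playing the role of $this.
def mass_flow(this: Thing) -> float:
    return this.get_fact(":density") * this.get_fact(":volumeFlow")


def total_production(this: Thing) -> float:
    return sum(s.get_fact(":massFlow") for s in this.get_facts(":hasProductStream"))


stream_1 = Thing(":Stream_1").add(":density", 0.36).add(":volumeFlow", 40.0).add(":massFlow", mass_flow)
stream_2 = Thing(":Stream_2").add(":density", 0.5).add(":volumeFlow", 70.0).add(":massFlow", mass_flow)
unit_1 = (
    Thing(":Unit_1")
    .add(":hasProductStream", stream_1)
    .add(":hasProductStream", stream_2)
    .add(":totalProduction", total_production)
)

print(unit_1.get_fact(":totalProduction"))
```

Changing a stored value (for example a stream's :density) changes every dependent result on the next evaluation, which is the property the bullets above rely on.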

Why is the IntelligentGraph approach so advantageous?

  • There is no separation between data and analysis, removing the risk of stale and inaccurate data and calculations.
  • The calculations embedded within the graph can take advantage of the knowledge that is contained within that graph. This makes the calculations far simpler than those that need to be embedded in spreadsheets.
  • The calculations automatically use the current state of the knowledge graph each time they are evaluated, so changes to the knowledge are picked up on the fly.

How does IntelligentGraph Work?

IntelligentGraph can be deployed and used with these simple steps:

  1.     Add the SPARQL extension functions to the triplestore.
  2.     Add script literals as the object values of subjects, with the datatype set to the scripting language (groovy, javascript, python, etc.).
  3.     Modify any SPARQL query to use the SPARQL extension functions to evaluate the scripts.

Installation of SPARQL Extension functions

These are available for any RDF4J-compliant triplestore, and can be downloaded and installed as described in the Install section below.

Modelling with Scripts

Typically, a graph node will have associated attributes with values, such as a stream with volumeFlow and density values:

Stream Attributes:

:Stream_1
   :density ".36"^^xsd:float ;
   :volumeFlow "40"^^xsd:float .

The ‘model’ of the streams can be captured as edges associated with the Unit:

:Unit_1
   :hasProductStream :Stream_1 ;
   :hasProductStream :Stream_2 ;
.

The calculations are declared as literals[1] with a datatype whose local name corresponds to one of the installed script languages:

:Stream_1
   :massFlow "$this.getFact(':density').floatValue() * $this.getFact(':volumeFlow').floatValue();"^^:groovy ;
.
:Unit_1
   :totalProduction
      """var totalProduction = 0.0;
      for (stream in $this.getFacts(':hasProductStream'))
      {
         totalProduction += stream.getFact(':massFlow').doubleValue();
      }
      return totalProduction;"""^^:groovy ;
.

SPARQL Queries

Using SPARQL, this can be queried as follows:

select ?massFlow
{
    :Stream_1 :massFlow ?o .
    BIND(:objectValue(:Stream_1, :massFlow, ?o) as ?massFlow )
}

 

Here, :objectValue is a custom SPARQL function, as allowed by the SPARQL standard and supported by RDF frameworks such as RDF4J (see RDF4J custom SPARQL functions).

The simplest of SPARQL queries is:

Select *
{ ?s ?p ?o }

This can be modified to become:

Select *
{ ?s ?p ?o . BIND(:objectValue(?s, ?p, ?o) as ?_o) }

Or

Select *
{ ?s ?p ?o . BIND(:factValue(?s, ?p) as ?_o) }

How to Write IntelligentGraph Scripts

SPARQL Extension Functions

The core of the IntelligentGraph is provisioned by the following SPARQL extension functions:

ObjectValue (IRI Subject, IRI Predicate, Value value, customQueryOptions … args)

Returns: the evaluation of the script if the object value has a script datatype; otherwise, the literal value itself.

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     value: the Value (IRI or literal) that is the object of the (subject, predicate, object) triple
  4.     args…: optional name/value pairs, for example:
           • start, "…"^^xsd:dateTime
           • end, "…"^^xsd:dateTime
           • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

select ?massFlow
{
   VALUES (?s ?p) { (:Stream_1 :massFlow) }
   ?s ?p ?o .
   BIND(:objectValue(?s, ?p, ?o) as ?massFlow )
}

ObjectProvenance (IRI Subject, IRI Predicate, Value literalValue, customQueryOptions … args)

Returns: The HTML formatted trace of the evaluation of the property of the subject

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     value: the Value (IRI or literal) that is the object of the (subject, predicate, object) triple
  4.     args…: optional name/value pairs, for example:
           • start, "…"^^xsd:dateTime
           • end, "…"^^xsd:dateTime
           • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

select ?traceHTML
{
   VALUES (?s ?p) { (:Stream_1 :massFlow) }
   ?s ?p ?o . BIND(:objectProvenance(?s, ?p, ?o) as ?traceHTML )
}

FactValue (IRI Subject, IRI Predicate, customQueryOptions … args)

Returns: The evaluation of the property of the subject

Evaluates the script stored as the object value of a :subject :predicate.

The difference between factValue and objectValue is that objectValue takes, as an argument, the object value already retrieved from the triplestore and only processes it as a script if it has a script datatype, whereas factValue always retrieves the object value from the triplestore itself and then determines whether it needs to be interpreted as a script or returned as a value. Thus, objectValue is more efficient when the query has already matched the triple, because it avoids a second lookup.
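The difference can be sketched with a toy in-memory store. Everything here is hypothetical (the dict-based store, the Literal class, the SCRIPT_DATATYPE marker, and the fake_evaluate stand-in for a script engine); the sketch only shows why objectValue can skip the store lookup when the query has already bound the object, while factValue always reads it.

```python
"""Hypothetical contrast between objectValue and factValue semantics."""
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

SCRIPT_DATATYPE = "script:groovy"  # hypothetical script datatype marker


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str


class Store:
    """Toy triple store keyed by (subject, predicate)."""

    def __init__(self, triples: Dict[Tuple[str, str], Literal]) -> None:
        self.triples = triples
        self.lookups = 0  # how many times the store has been read

    def get_object(self, s: str, p: str) -> Literal:
        self.lookups += 1
        return self.triples[(s, p)]


Evaluate = Callable[[str, str], object]


def object_value(s: str, p: str, value: Literal, evaluate: Evaluate) -> object:
    """The object is already bound by the query, so no store lookup is needed.

    p is unused here; it only mirrors the SPARQL function's signature.
    """
    if value.datatype != SCRIPT_DATATYPE:
        return value.lexical  # plain literal: return it unchanged
    return evaluate(s, value.lexical)  # script: run it with s as $this


def fact_value(store: Store, s: str, p: str, evaluate: Evaluate) -> object:
    """Always reads the object from the store first, then decides."""
    return object_value(s, p, store.get_object(s, p), evaluate)


def fake_evaluate(subject: str, script: str) -> str:
    """Stand-in for a script engine."""
    return f"evaluated {script!r} for {subject}"


store = Store({
    (":Stream_1", ":density"): Literal("0.36", "xsd:float"),
    (":Stream_1", ":massFlow"): Literal("density * volumeFlow", SCRIPT_DATATYPE),
})
bound = store.triples[(":Stream_1", ":density")]  # as if matched by ?s ?p ?o
print(object_value(":Stream_1", ":density", bound, fake_evaluate), store.lookups)
print(fact_value(store, ":Stream_1", ":massFlow", fake_evaluate), store.lookups)
```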

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     args…: optional name/value pairs, for example:
           • start, "…"^^xsd:dateTime
           • end, "…"^^xsd:dateTime
           • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL:

PREFIX olgap: <http://inova8.com/olgap/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT  ?massFlow
WHERE {
   BIND( 
      olgap:factValue(
         <http://inova8.com/calc2graph/id/BatteryLimit1> , <http://inova8.com/calc2graph/def/density>
         ,'aggregate','Instant'
         ,'start','2010-08-01T00:00:00.000000000+00:00'^^xsd:dateTime
      ) 
      as ?massFlow)
}

Note that the additional arguments are optional.

FactProvenance (IRI Subject, IRI Predicate, customQueryOptions … args)

Returns: The HTML formatted trace of the evaluation of the property of the subject

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     args…: optional name/value pairs, for example:
           • start, "…"^^xsd:dateTime
           • end, "…"^^xsd:dateTime
           • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

PREFIX olgap: <http://inova8.com/olgap/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT  ?provenance 
WHERE {
   BIND( 
      olgap:factProvenance(
         <http://inova8.com/calc2graph/id/BatteryLimit1> , <http://inova8.com/calc2graph/def/density>
         ,'aggregate','Instant'
         ,'start','2010-08-01T00:00:00.000000000+00:00'^^xsd:dateTime
      ) 
      as ?provenance )
}

Note that the additional arguments are optional.

FactDebug (IRI Subject, IRI Predicate, Value script, customQueryOptions … args)

Returns: The evaluation of the property of the subject

Evaluates the supplied script as if it were the object value of :subject :predicate, overriding any script stored for that triple. This makes it useful for experimenting with scripts.

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     scriptLiteral: a string literal containing the script to be executed
  4.     args…: optional name/value pairs, for example:
           • start, "…"^^xsd:dateTime
           • end, "…"^^xsd:dateTime
           • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

PREFIX olgap: <http://inova8.com/olgap/>
PREFIX script: <http://inova8.com/calc2graph/def/script/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT  ?result 
WHERE {
   BIND(
      """var result= $this.getFact(\"http://inova8.com/calc2graph/def/volumeFlow\",$customQueryOptions).floatValue()* $this.getFact(\"http://inova8.com/calc2graph/def/density\",$customQueryOptions).floatValue();  
      result;"""^^script:groovy 
   as ?scriptLiteral)
   BIND( 
      olgap:factDebug(
         <http://inova8.com/calc2graph/id/BatteryLimit1> , <http://inova8.com/calc2graph/def/density>,
         ?scriptLiteral
         ,'aggregate','Instant'
         ,'start','2010-08-01T00:00:00.000000000+00:00'^^xsd:dateTime
      ) 
      as ?result )
}

Note that the additional arguments are optional.

ClearCache()

Returns: true, having cleared the cache of calculated values.

Example SPARQL

select ?clearCache
{
   BIND(:clearCache() as ?clearCache)
}

Script Languages

Any scripting language supported by the Java 9 scripting API (JSR 223) can be used, simply by making the corresponding language JAR available.

By default, the JavaScript, Groovy, and Python JARs are installed. The complete list of compliant languages is as follows:

AWK, BeanShell, ejs, FreeMarker, Groovy, Jaskell, Java, JavaScript, JavaScript (Web Browser), Jelly, JEP, Jexl, jst, JudoScript, JUEL, OGNL, Pnuts, Python, Ruby, Scheme, Sleep, Tcl, Velocity, XPath, XSLT, JavaFX Script, ABCL, AppleScript, Bex script, OCaml Scripting Project, PHP, Smalltalk, CajuScript, MathEclipse

The following synonyms for JavaScript, Groovy, and Python are recognized:

python, jython, groovy, Groovy, nashorn, Nashorn, js, JS, JavaScript, javascript, ECMAScript,  ecmascript.
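Because the datatype's local name selects the language, resolving a script literal to an engine can be sketched as: take the local name of the datatype IRI and look it up among the recognized synonyms. The mapping below is built from the synonym list above; the canonical engine names and the local-name rule (text after the last '/' or '#') are assumptions for illustration, not the IntelligentGraph code.

```python
"""Hypothetical resolution of a script datatype IRI to a scripting engine."""
from typing import Optional

# Synonyms from the list above, grouped under an assumed canonical name.
SYNONYMS = {
    "python": ["python", "jython"],
    "groovy": ["groovy", "Groovy"],
    "javascript": ["nashorn", "Nashorn", "js", "JS", "JavaScript",
                   "javascript", "ECMAScript", "ecmascript"],
}
ENGINE_FOR = {alias: engine for engine, aliases in SYNONYMS.items() for alias in aliases}


def local_name(iri: str) -> str:
    """Text after the last '#' or '/', e.g. .../def/script/groovy -> groovy."""
    return iri.rstrip(">").replace("#", "/").rsplit("/", 1)[-1]


def engine_for_datatype(datatype_iri: str) -> Optional[str]:
    """Return the engine name, or None if the literal is not a script."""
    return ENGINE_FOR.get(local_name(datatype_iri))


print(engine_for_datatype("http://inova8.com/calc2graph/def/script/groovy"))
print(engine_for_datatype("http://www.w3.org/2001/XMLSchema#float"))
```

A None result corresponds to an ordinary literal, which the extension functions return unchanged.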

Example Scripts

The following IntelligentGraph scripts perform the plant analysis example:

Return Scalar Value

40;

Get Related Property Value

$this.getFact(":testProperty2").doubleValue()

Calculate Mass Flow

var result= $this.getFact(":volumeFlow").floatValue()* $this.getFact(":density").floatValue();  
result;

Calculate Mass Throughput

var massThroughput=0.0; 
for(stream in $this.getFacts(":hasProductStream")){
   massThroughput += stream.getFact(":massFlow").doubleValue()
}; 
massThroughput;

Calculate Mass Yield

var result= $this.getFact(":massFlow").floatValue()/ $this.getFact("^:hasStream/:massThroughput").floatValue();  
result;

Calculate Mass Balance

var massFlowBalance=0.0; 
for(stream in $this.getFacts(":hasFeedStream")){
    massFlowBalance += stream.getFact(":massFlow").doubleValue()
}; 
for(stream in $this.getFacts(":hasProductStream")){
    massFlowBalance -= stream.getFact(":massFlow").doubleValue()
}; 
massFlowBalance;

Script Context Variables

In addition, each script has access to the following predefined variables that allow the script to access the context within which it is being run.

$this, a Thing corresponding to the subject of the triples for which the script is the object. Helper functions, described under Path Navigation Functions below, navigate edges to or from this Thing.

$property, a Thing corresponding to the predicate or property of the triples for which the script is the object.

$customQueryOptions, a HashMap<String, Value> of name/value pairs corresponding to the pairs of additional arguments to the SPARQL extension function. These are useful for passing application-specific parameters.

$builder, an RDF4J graph builder object that allows a graph to be constructed (and manipulated) within the script. A graph cannot be returned from a SPARQL function; however, the IRI of the graph can be returned, and any graph created by a script will be persisted.

$tripleSource, the RDF4J TripleSource to which the (subject, predicate, object) triple belongs.
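The trailing args of the SPARQL extension functions arrive as a flat list of alternating names and values, such as 'aggregate','Instant','start',… in the examples above. A minimal Python sketch of how such a list could be folded into the name/value map that scripts see as $customQueryOptions; the function name and the error handling are hypothetical, and plain strings stand in for RDF Values.

```python
"""Hypothetical folding of trailing SPARQL args into a name/value map."""
from typing import Dict, Sequence


def to_custom_query_options(args: Sequence[str]) -> Dict[str, str]:
    """Pair up ('aggregate', 'Instant', 'start', ...) into a dict."""
    if len(args) % 2 != 0:
        raise ValueError("optional arguments must be name/value pairs")
    # args[0::2] are the names, args[1::2] the matching values.
    return dict(zip(args[0::2], args[1::2]))


options = to_custom_query_options(
    ["aggregate", "Instant", "start", "2010-08-01T00:00:00.000000000+00:00"]
)
print(options["aggregate"], options["start"])
```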

Path Navigation Functions

A spreadsheet’s secret sauce is the ability of a cell formula to access the values of other cells, either individually or as a set. IntelligentGraph provides this functionality with several methods of Thing, which can be called on the $this Thing that is initialized for each script with the subject.

Thing.getFact(String pathPattern) returns Value

Returns the value of the node referenced by the pathPattern; for example, ":volumeFlow" returns the object value of the :volumeFlow edge relative to the $this node. The pathPattern allows for more complex path navigation.

Thing.getFacts(String pathPattern) returns Values

Returns the values of the nodes referenced by the pathPattern; for example, ":hasProductStream" returns an iterator over all object values of the :hasProductStream edge relative to the $this node. The pathPattern allows for more complex path navigation.

Thing.getThing(String subjectIRI) returns Thing

Returns the node identified by the IRI.

Thing.prefix(String prefix, String IRI) returns Thing

Sets a prefix used in any pathPattern. Returning $this allows chaining.

Path Patterns

Spreadsheets are not limited to accessing just adjacent cells; neither is IntelligentGraph. PathPatterns provide a powerful way of navigating from one Thing node to others. PathPatterns are inspired by SPARQL property paths, but IntelligentGraph required a richer, more expressive path syntax.

Examples

Examples of pathPatterns are as follows:

$this.getFact(":hasParent")

will return the first parent of $this.

$this.getFact("^:hasParent")

will return the first child of $this.

$this.getFacts(":hasParent/:hasParent")

will return the grandparents of $this.

$this.getFacts(":hasParent/^:hasParent")

will return the siblings of $this.

$this.getFacts(":hasParent[:gender :female]/:hasParent")

will return the maternal grandparents of $this

$this.getFacts(":hasParent[:gender :female]/:hasParent[:gender :male]")

will return the maternal grandfather of $this.

$this.getFacts(":hasParent[:gender [ rdfs:label 'female']]")

will return the mother of $this but using the label instead of the IRI.

$this.getFacts(":hasParent[eq :Peter]/:hasParent[:gender :male]")

will return the grandfather of $this, who is the parent of :Peter.

$this.getFacts(":hasParent[ne :Peter]/:hasParent[:gender :male]")

will return grandfathers of $this, who are not the parent of :Peter.

The following diagram visualizes a path through a genealogical graph, from $this to the parents of a maternal grandfather born in Maidstone:

$this.getFacts("/:parent[:gender :female]/:parent[:gender :male, :birthplace [rdfs:label 'Maidstone']]/:parent")

 

Figure 4: PathPatternQL Example
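To make the semantics of the examples above concrete, here is a deliberately small Python evaluator for a subset of the pathPattern syntax: '/'-separated steps, a leading '^' for an inverse edge, and at most one [predicate object], [eq x] or [ne x] filter per step. It is a hypothetical sketch over a set of triples, not the IntelligentGraph parser; nested filters such as [:gender [rdfs:label 'female']] and comma-separated filter lists are not handled. The family data is made up.

```python
"""Minimal evaluator for a subset of pathPattern syntax (hypothetical)."""
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
# One step: optional '^', a predicate, and an optional [x y] filter.
STEP = re.compile(r"^(\^?)([^\[\]]+?)(?:\[(\S+)\s+(\S+)\])?$")


def step(graph: Set[Triple], nodes: List[str], pattern: str) -> List[str]:
    m = STEP.match(pattern.strip())
    if m is None:
        raise ValueError(f"unsupported step: {pattern}")
    inverse, pred, f_pred, f_obj = m.groups()
    result = []
    for node in nodes:
        for s, p, o in sorted(graph):  # sorted for a deterministic order
            if p == pred and not inverse and s == node:
                result.append(o)  # forward edge: node -> o
            elif p == pred and inverse and o == node:
                result.append(s)  # inverse edge: s -> node
    if f_pred == "eq":
        result = [n for n in result if n == f_obj]
    elif f_pred == "ne":
        result = [n for n in result if n != f_obj]
    elif f_pred is not None:  # [predicate object] filter
        result = [n for n in result if (n, f_pred, f_obj) in graph]
    return list(dict.fromkeys(result))  # de-duplicate, keep order


def get_facts(graph: Set[Triple], this: str, path: str) -> List[str]:
    nodes = [this]
    for part in path.strip("/").split("/"):
        nodes = step(graph, nodes, part)
    return nodes


family = {
    (":Me", ":hasParent", ":Mum"), (":Me", ":hasParent", ":Peter"),
    (":Sis", ":hasParent", ":Mum"),
    (":Mum", ":gender", ":female"), (":Peter", ":gender", ":male"),
    (":Mum", ":hasParent", ":GrandpaM"), (":Mum", ":hasParent", ":GrandmaM"),
    (":Peter", ":hasParent", ":GrandpaP"),
    (":GrandpaM", ":gender", ":male"), (":GrandmaM", ":gender", ":female"),
    (":GrandpaP", ":gender", ":male"),
}
print(get_facts(family, ":Me", ":hasParent[:gender :female]/:hasParent[:gender :male]"))
print(get_facts(family, ":Me", ":hasParent[ne :Peter]/:hasParent[:gender :male]"))
# Note: this toy includes :Me itself among the "siblings".
print(get_facts(family, ":Me", ":hasParent/^:hasParent"))
```

Each step maps a set of nodes to a new set, so a path is just a left-to-right fold of steps; filters prune the set after each hop.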

What About Performance?

IntelligentGraph takes the following actions to improve performance:

  1. All intermediate calculation results are cached, keyed by the subjectNode, predicate, and customQueryOptions.
  2. Cache can be cleared using the SPARQL function ClearCache.
  3. The SPARQL function ObjectValue takes as its argument the subject, predicate and objectValue. If the objectValue supplied is not of script datatype, the function will immediately return the objectValue.
  4. Circular functions, in which A calls B calls A, are detected and rejected.
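The caching and circularity rules above can be sketched as a memoizing evaluator. This is a hypothetical Python illustration, not IntelligentGraph code: results are cached under (subject, predicate, frozen customQueryOptions), clear_cache plays the role of ClearCache, and a set of in-progress keys detects A-calls-B-calls-A loops. The formulas are made up.

```python
"""Hypothetical memoizing evaluator with circular-reference detection."""
from typing import Any, Callable, Dict, FrozenSet, Optional, Set, Tuple

Key = Tuple[str, str, FrozenSet[Tuple[str, str]]]
Formula = Callable[["Evaluator", Dict[str, str]], Any]


class CircularReferenceError(RuntimeError):
    """Raised when a calculation depends, directly or indirectly, on itself."""


class Evaluator:
    def __init__(self, formulas: Dict[Tuple[str, str], Formula]) -> None:
        self.formulas = formulas  # (subject, predicate) -> formula
        self.cache: Dict[Key, Any] = {}
        self.in_progress: Set[Key] = set()
        self.evaluations = 0  # number of formulas actually run

    def value(self, s: str, p: str, options: Optional[Dict[str, str]] = None) -> Any:
        options = options or {}
        # Cache key: subject, predicate and the (hashable) query options.
        key = (s, p, frozenset(options.items()))
        if key in self.cache:
            return self.cache[key]
        if key in self.in_progress:
            raise CircularReferenceError(f"circular reference at {s} {p}")
        self.in_progress.add(key)
        try:
            self.evaluations += 1
            result = self.formulas[(s, p)](self, options)
        finally:
            self.in_progress.discard(key)
        self.cache[key] = result
        return result

    def clear_cache(self) -> bool:
        """Counterpart of the ClearCache() SPARQL function."""
        self.cache.clear()
        return True


formulas: Dict[Tuple[str, str], Formula] = {
    (":Stream_1", ":massFlow"): lambda ev, opts: 0.36 * 40.0,
    (":Unit_1", ":totalProduction"): lambda ev, opts: ev.value(":Stream_1", ":massFlow", opts),
    (":A", ":x"): lambda ev, opts: ev.value(":B", ":y", opts),  # A calls B ...
    (":B", ":y"): lambda ev, opts: ev.value(":A", ":x", opts),  # ... calls A
}
ev = Evaluator(formulas)
ev.value(":Unit_1", ":totalProduction")
ev.value(":Unit_1", ":totalProduction")  # second call is served from the cache
print("formulas run:", ev.evaluations)
try:
    ev.value(":A", ":x")
except CircularReferenceError as err:
    print("rejected:", err)
```

Including the options in the key means, for example, an 'Instant' and an 'Average' aggregate of the same property are cached separately.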

Can I Debug?

Since IntelligentGraph combines calculations with the knowledge graph, it is inevitable that any evaluation will involve calls to values of other nodes which are in turn calculations. For this reason, several of the SPARQL functions (ObjectProvenance, FactProvenance, and FactDebug) support tracing and debugging:

 

Figure 5: Tracing Calculation

How Do I Add Intelligence to My RDF Graph?

Download

The project is hosted on GitHub; olgap-0.0.1-jar-with-dependencies.jar can be downloaded from here:

The plain olgap.jar does not include the scripting-language and other dependencies, so to use it you must be certain that all dependencies are already available.

Install

OLGAP will work only with RDF4J version 3.3.0 and above.

Copy olgap-0.0.1-jar-with-dependencies.jar to /usr/local/tomcat/webapps/rdf4j-server/WEB-INF/lib/olgap.jar

The RDF4J server then needs to be restarted so that it recognizes the new JAR and initializes the scripting engine.

[1] In this case the script uses Groovy, but any Java 9 compliant scripting language can be used, such as JavaScript, Python, Ruby, and many more.