Confucius said, “Tell me and I forget, show me and I remember, let me try and I learn”. Following this sage’s path to intelligence, try IntelligentGraph and PathQL by installing the IntelligentGraph docker image available here.

The capability of IntelligentGraph and PathQL

The IntelligentGraph SAIL offers an extended capability for embedded calculation support within any RDF graph. When enabled as an RDF4J SAIL, it offers calculation functionality as part of the RDF4J engine, on top of any RDF4J repository, using a variety of script engines including JavaScript, Jython, and Groovy. It preserves the SPARQL capability of RDF4J, but with additional capabilities for calculation debugging and tracing. See Intelligent Graph = Knowledge Graph + Embedded Analysis

IntelligentGraph includes the PathQL query language. Just as a spreadsheet cell calculation needs to access other cells, an IntelligentGraph calculation needs to access other nodes within the graph. Although full access to the underlying graph is available to any of the scripts, PathQL provides a succinct, and efficient method to access directly or indirectly related nodes. PathQL can either return just the contents of the referenced nodes, or the contents and the path to the referenced nodes. See PathQL: Intelligently finding knowledge as a path through a maze of facts

PathQL can also be used standalone to query the IntelligentGraph-enabled RDF database. This supplements, rather than replaces, SPARQL and GraphQL, as it provides graph-path querying rather than graph-pattern querying capabilities to any IntelligentGraph-enabled RDF database. See Getting intelligent answers from knowledge graphs: use-cases

IntelligentGraph Benefits

IntelligentGraph moves analytics into the knowledge graph rather than moving data to the analytics engine, avoiding the groundhog-analysis-way:

  • Improves analyst performance and efficiency. Eliminates the need for analysts to create ELT to move data to the analysis engine. 
  • Simplifies complex calculations and aggregations. PathQL language greatly simplifies navigating and aggregating throughout the graph.
  • Ensures calculation and KPI concurrency. Calculations are performed in-situ with the data, so no need to re-export data to the analysis engine to view updated results.
  • Uses familiar scripting languages. Scripts can be expressed in any of several scripting languages, including Python, JavaScript, Groovy, and Java.
  • Improves analysis performance and efficiency. Time-to-answer is reduced or eliminated, as analysis is equivalent to reporting.
  • Ensures analysis effort is shared with all. Analysis results become part of the graph which can be used by others ensuring consistency.
  • Self-documenting analysis path to raw data. The IntelligentGraph contains calculation scripts that define which calculations will be performed on what data (or other calculation results).
  • Improves analysis accuracy by providing provenance of all calculations. Trace of any analysis through to raw data is automatically available.
  • Simplifies reporting. Standard reporting tools can be used that focus on report appearance rather than calculation capability since the latter is performed in the IntelligentGraph.
  • Highly scalable. IntelligentGraph is built upon the de-facto graph standard RDF4J, allowing for the use of any RDF4J compliant datastore.
  • Standard support. Access to IntelligentGraph for querying, reporting, and more is unchanged from any RDF-based KnowledgeGraph.
  • Evolutionary, not revolutionary, modeling and analysis. Graph-based models can evolve as data analysis needs grow, such as by adding new dimensions to the data, unlike a ‘traditional’ data mart or warehouse, which usually requires a rebuild.
  • Creates the Intelligent Internet of Things. Scripts can access external data, such as IoT, on-demand allowing IoT-based calculations and analysis to be performed in-situ.
  • Eliminates spreadsheet-hell. All spreadsheet calculations and aggregations can be moved into the graph, leaving the spreadsheet as a presentation tool. This eliminates the problem of undocumented calculations and inconsistent calculations in different spreadsheets.

Availability of IntelligentGraph and PathQL

  • What is the best route, with the least changes, through the London Underground?
  • Have I unintentionally revealed PII (personally identifiable information) or copyright information in a custom query or report?
  • Who is the closest relative whose alma mater is Harvard?
  • What is the root-cause problem within an IoT/DigitalTwin graph of a process plant?

KnowledgeGraphs are the best way to capture numerous facts about things. Intelligence is the ability to connect individual facts into knowledge. However, uncovering that intelligence so that it can be acted upon to produce results can be challenging. 

IntelligentGraph’s PathQL provides an easy way to uncover that knowledge by describing paths and connections through these facts, as shown by these answers to the above questions. These paths can be visualized, and PathQL can also reveal how it deduces each path through its trace capabilities.

What is IntelligentGraph?

IntelligentGraph is an extension to any RDF knowledge graph, so it is applicable to any existing knowledge graph. As well as providing the PathQL query capability, IntelligentGraph allows analysis formulae to be embedded in the graph as nodes within the KnowledgeGraph. These formulae nodes are only executed when the node is accessed via a query. This is equivalent to moving analysis formulae into the graph, alongside the data, rather than having to move the graph data into the analysis engine.

What is the best route, with the least changes, through the London Underground?

Use Case: An individual wants to find a route through the London Underground from one station to another with the minimum of changes.

Available data: A Knowledge Graph model of the London Underground with its stations, lines, and zones.

Solution: Access the KnowledgeGraph using the IntelligentGraph, and express the path query using PathQL as follows to obtain the shortest path.

This PathQL query will start at the Mornington_Crescent station, transfer to any line, then optionally change lines at up to four stations, until a line going to Oakleigh_Park is found.

Results: One of the shortest routes is to take the Northern line at Mornington Crescent station, then change at Old Street to the Great Northern line, which goes through Oakleigh Park station.

Note however that this is just one of the shortest paths. Another is via Moorgate. PathQL will find successively longer routes if required.
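The idea of returning the shortest route first and then successively longer ones can be sketched with a breadth-first search. This is only an illustrative Python model, not the PathQL engine: the network below is a hypothetical toy graph with placeholder node names, and `paths_shortest_first` is an assumed helper name. If nodes alternate between stations and lines, fewer hops corresponds to fewer changes.

```python
from collections import deque

# Hypothetical toy network: node -> directly connected nodes.
# Names are placeholders, not real London Underground data.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["B", "D"],
    "D": ["E"],
    "E": [],
}

def paths_shortest_first(links, start, goal, max_hops):
    """Yield every simple path from start to goal, shortest first."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            yield path
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop limit reached, do not extend this path
        for nxt in links.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                queue.append(path + [nxt])

routes = list(paths_shortest_first(LINKS, "A", "E", max_hops=4))
```

Because the queue is processed in order of path length, taking only the first result gives one of the shortest routes, and continuing the iteration yields longer alternatives.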

Have I unintentionally revealed PII (personally identifiable information) or copyright information in a custom query or report? 

Use Case: An enterprise manages lots of reports based on queries, and those queries might be based on other queries as well as the underlying data. However, some of the underlying data contain personal or copyrighted data that cannot be revealed.

Available data: An ISO 11179-based Knowledge Graph of the database schema (MMS), together with the structure of the database views and reports.

Solution: Access the KnowledgeGraph using the IntelligentGraph, and express the path query using PathQL as follows to discover any report dependencies on personal or copyright information.

This PathQL query will return all paths between any data element that is declared as ‘Sensitive’ and any data element of the report being validated, to ensure that the report is not derived from sensitive information.

Results: PII violation detected! PathQL discovers that there is a route for the PII information to leak into the Customer_List report.
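A rough way to picture this check is as a search for derivation paths from sensitive elements to a report. The following Python sketch assumes a hypothetical lineage dictionary; the element names, `DERIVED_INTO`, and `leak_paths` are all invented for illustration and are not part of the ISO 11179 model or of PathQL.

```python
# Hypothetical lineage: each element maps to the elements derived from it.
DERIVED_INTO = {
    "customer.email": ["view.contacts"],
    "customer.region": ["view.regions"],
    "view.contacts": ["report.customer_list"],
    "view.regions": ["report.sales_by_region"],
}
SENSITIVE = {"customer.email"}  # elements declared as sensitive (assumed)

def leak_paths(derived_into, sensitive, report):
    """Return every derivation path from a sensitive element to the report."""
    found = []

    def walk(path):
        node = path[-1]
        if node == report:
            found.append(path)
            return
        for nxt in derived_into.get(node, []):
            if nxt not in path:  # guard against cyclic definitions
                walk(path + [nxt])

    for source in sorted(sensitive):
        walk([source])
    return found
```

An empty result means no dependency on sensitive data was found; any returned path documents exactly how the sensitive element reaches the report.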

Who is the closest relative whose alma mater is Harvard?

Use Case: We want to explore relationships, not just finding particular individuals but to explore their connections to particular institutions.

Available data: A genealogical KnowledgeGraph containing persons, genders, colleges, etc.

Solution: Access the KnowledgeGraph using the IntelligentGraph, and express the path query using PathQL as follows to discover the closest relative that went to Harvard.

This PathQL query starts with any individual, in this case ‘Arnold’, then searches through all immediate and indirect relatives until it encounters the first one that went to Harvard. If we continue to explore the paths, PathQL will return successively more distant connections to Harvard.

Results:

What is the root-cause problem within an IoT/DigitalTwin graph of a process plant?

Use Case: An IoT-connected DigitalTwin Knowledge Graph shows an out-of-kilter measurement point. To assist root-cause diagnosis, the engineer wants to quickly assemble all relevant information. This means following the process flow path upstream and identifying the equipment and associated measurement points.

Available data: A DigitalTwin KnowledgeGraph contains a PFD/P&ID model with measurement sources, pipes, lines, vessels, valves, and associated connectivity, including the process flow of material. The model used is the Tennessee Eastman plant described by J.J. Downs and E.F. Vogel, Computers chem. Engng, Vol. 17, No. 3, pp. 245-255. The IoT data is pulled in on-demand by the IntelligentGraph.

Solution:  Access the KnowledgeGraph using the IntelligentGraph, and express the path query using PathQL as follows to discover all relevant measurement points upstream, in a process-flow sense, of the out-of-kilter measurement point. 

PathQL can ‘walk’ back through the process flow any number of steps, starting with the plant item with an attribute provided by the out-of-kilter measurement point. As it walks back through the flowsheet, it identifies the attributes of the plant items it encounters, and for each attribute the associated measurement point that serves as the provider of the attribute. 

Note that PathQL supports first-order predicate expressions, such as “:aObjectProperty”. It also supports rdf:Statement reified predicate expressions, such as “@:aReifiedPredicate”, where the reification is rdf:Statement. Additionally, it supports reifications that are an rdfs:subClassOf rdf:Statement, such as “:Transference@:ProcessFlow”, where :Transference is an rdfs:subClassOf rdf:Statement.

Results: Returns a set of related measurement points, and the process-flow path from the out-of-kilter measurement point, which, since they are all upstream of the process problem, should immediately help diagnose the root cause of that problem.
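The upstream walk described above can be approximated as follows. This is a minimal Python sketch under stated assumptions: the plant fragment, the tag names, and the function `upstream_measurements` are hypothetical and are not taken from the Tennessee Eastman model or from the PathQL implementation.

```python
# Hypothetical plant fragment: item -> downstream items, and item -> tags.
FLOWS_TO = {
    "Feed": ["Reactor"],
    "Reactor": ["Separator"],
    "Separator": ["Stripper"],
}
MEASURED_BY = {
    "Feed": ["FI-1"],
    "Reactor": ["TI-9", "PI-7"],
    "Separator": ["LI-12"],
}

def upstream_measurements(flows_to, measured_by, start, max_steps):
    """Walk the flow edges backwards and collect (item, tag) pairs."""
    # Invert the flow edges once so that we can step upstream.
    upstream_of = {}
    for source, targets in flows_to.items():
        for target in targets:
            upstream_of.setdefault(target, []).append(source)

    result, frontier, seen = [], [start], {start}
    for _ in range(max_steps):
        next_frontier = []
        for item in frontier:
            for up in upstream_of.get(item, []):
                if up not in seen:
                    seen.add(up)
                    next_frontier.append(up)
                    result.extend((up, tag) for tag in measured_by.get(up, []))
        frontier = next_frontier
    return result
```

The nearest upstream items appear first in the result, which matches the order in which an engineer would usually want to inspect them.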

The creation of a DigitalTwin knowledge graph data model confronts the need for access to measurement data so that the DigitalTwin can create timely performance metrics, promptly identify performance issues, and so on.

However, the quantity of raw data in an Industrial IoT is staggering. A typical process manufacturing plant might have more than 100,000 measurement points, each of which is streaming data by the second or even faster. So how can the raw data be integrated to allow performance analysis?

Just … important😓: 

  • Someone(?) determines a subset of the data that can and should be replicated. 
  • A problem is that, inevitably, someone will ask an as-yet unformulated question about data that has *not* been replicated. Can you afford to keep revising your ETL?

Just … in case😧: 

  • The bullet is bitten and all data is replicated. 
  • An issue is the DigitalTwin now has to resolve the problem of handling vast quantities of data that has already been solved by IIoT applications. Why not stick to solving unsolved problems?

Just … reporting😫: 

  • Someone writes a dedicated application that pulls data from the DigitalTwin and the IIoT just when reporting. 
  • A problem is that every new performance metric or calculation requires another dedicated application, even if that same metric has been already created in another application.

Just … in time😂: 

  • The KnowledgeGraph pulls the IIoT data whenever required by an IntelligentGraph calculation. 
  • There are no limits on what IIoT data can be requested, but no storage or replication issues either. Also, the IntelligentGraph calculations can access the results of other IntelligentGraph calculations, simplifying the deployment of metrics.
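The ‘just in time’ option can be pictured as a value that is fetched only at the moment a calculation reads it. The Python sketch below is an assumption-laden illustration: `OnDemandSignal` and the two lambda functions are hypothetical stand-ins for a query against an IIoT historian, not an IntelligentGraph interface.

```python
class OnDemandSignal:
    """Wraps a fetch function and calls it only when the value is read."""

    def __init__(self, fetch):
        self._fetch = fetch
        self.reads = 0  # counts how often the external source was queried

    def value(self):
        self.reads += 1
        return self._fetch()

# Hypothetical stand-ins for queries against an IIoT historian.
flow = OnDemandSignal(lambda: 40.0)
density = OnDemandSignal(lambda: 0.36)

def mass_flow():
    """A calculation that pulls its inputs only when it is evaluated."""
    return flow.value() * density.value()
```

Nothing is read from the external source until `mass_flow()` is called, so no data needs to be replicated ahead of time.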

See this short video demonstration of how easily an IoT-connected DigitalTwin of a process plant can be created with IntelligentGraph:

The IntelligentGraph-way or the Groundhog-way to efficient data analytics.pdf

Data is rather like poor red wine: it neither travels nor ages well. IntelligentGraph avoids data traveling by moving analysis into the knowledge graph rather than moving data to the analysis engine, making the groundhog-analysis-way obsolete.

Solving data analysis, the IntelligentGraph-way

Data is streamed to the IntelligentGraph datastore, and then analysis/calculation nodes are added to that graph which is accessible to all and by all applications.

The IntelligentGraph-way of data-analysis is to:

  • Extract-Load (losslessly) the source data into an Intelligent Knowledge Graph
  • Add analysis/calculation nodes to the KnowledgeGraph which calculate additional analysis values, aggregations, etc.
  • Report results, using any standard reporting tool
  • Add more analysis/calculation nodes as additional requests come through. These new calculations can refer to the existing results
  • … relax, you’ve become the star data analyst:-)

Solving data analysis, the Groundhog-way

Data is in the operational data-sources, data is staged in a data-warehouse/mart/lake, then the analysis is done by the analysis engine (aka Excel), right? And like poor red wine, constantly moving data damages it.

The Groundhog-way of data analysis is to:

  • Extract-Transform (aka probably damage the data as well)-Load the source data into a data-warehouse/mart/lake just to make it ‘easier’ to access.
  • Realizing the required analytical results are not in the data-warehouse/mart/lake, extract some data into Excel/PowerBI/BI-tool-of-choice where you can write your analysis calculations, aggregations, etc.
  • Report analysis results, but forget to or cannot put the analysis results back into the data-warehouse/mart/lake.
  • Repeat the same process every time there is a similar (or identical) analysis required.
  • … don’t relax, another analysis request shortly follows 🙁

Since IntelligentGraph combines Knowledge Graphs with embedded data analytics, Jupyter is an obvious choice as a data analysts’ IntelligentGraph workbench.

The following are screen-captures of a Jupyter-Notebook session showing how Jupyter can be used as an IDE for IntelligentGraph to perform all of the following:

  • Create a new IntelligentGraph repository
  • Add nodes to that repository
  • Add calculation nodes to the same repository
  • Navigate through the calculated results
  • Query the results using SPARQL

GettingStarted is available as a JupyterNotebook here:

GettingStartedIntelligentGraph.ipynb

This document is available for download here:

IntelligentGraph-Getting Started.pdf

SPARQLing

Using the Jupyter ISparql, we can easily perform SPARQL queries over the same IntelligentGraph created above. 

GettingStarted Using SPARQL

We do not have to use Java to script our interaction with the repository. We can always use SPARQL directly as described by the following Jupyter Notebook.

GettingStartedSPARQL.ipynb

PathQL simplifies finding paths through the maze of facts within a KnowledgeGraph. Used within IntelligentGraph scripts, it allows data analysis to be embedded within the graph, rather than requiring graph data to be exported to an analysis engine. Used with IntelligentGraph Jupyter Notebooks, it provides powerful data analytics.

I would suggest that Google does not have its own intelligence. If I search for, say, ‘Arnold Schwarzenegger and Harvard’, Google will only suggest documents that contain BOTH Arnold Schwarzenegger and Harvard. I might be lucky that someone has digested these facts and produced a single web page with the knowledge I want. I might, however, just as easily find a page of fake knowledge relating Arnold to Harvard.

It is undoubtedly true that Google can provide individual facts such as:

  1. Arnold married to Shriver
  2. Shriver daughter Joseph
  3. Joseph alma mater Harvard

However, intelligence is the ability to connect individual facts into a knowledge path.

  • KnowledgeGraph models can provide the facts to answer these questions.
  • PathQL provides an easy way to discover knowledge by describing paths and connections through these facts.
  • IntelligentGraph embeds that intelligence into any KnowledgeGraph as scripts. 

IntelligentGraph-PathQL and Scripting.pdf

Genealogical Example

Genealogy is a grandfather of graphs, so it is natural to organize family trees as a knowledge graph. A typical PathQL question to ask would then be: who are the parents of a male ancestor, born in Maidstone, of this individual, and what is that relationship?

Industrial Internet of Things (IIoT) Example

The Industrial Internet of Things (IIoT) is best modeled as an interconnected graph of ‘thing’ nodes. These things might be sensors producing measurements, the equipment to which the sensors are attached, or how the equipment is interconnected to form a functioning plant. However, the ‘intelligence’ about why the plant is interconnected is what an experienced (aka intelligent, knowledgeable) process engineer offers. To support such intelligence with a knowledge graph requires answering PathQL questions such as:

  1. If the V101 bottoms pump stops how does this affect the product flow from this production unit?
  2. If the FI101 instrument fails how does this affect the boiler feed temperature?
  3. What upstream could possibly be affecting this stream’s quality?
  4. … and so on.

Why PathQL?

SPARQL is a superb graph pattern query language, so why create another?

PathQL started out as the need to traverse the nodes and edges in a triplestore both without the benefit of SPARQL and within a scripting language of IntelligentGraph. IntelligentGraph works by embedding the calculations within the graph. Therefore, just like a spreadsheet calculation can access other ‘cells’ within its spreadsheet, IntelligentGraph needed a way of traversing the graph through interconnected nodes and edges to other nodes from where relevant values can be retrieved.

I didn’t want to create a new language, but it was essential that the IntelligentGraph provided a very easy way to navigate a path through a graph. It then became clear that, as powerful as SPARQL is for graph pattern matching, it can be verbose for matching path patterns. PathQL was born, but not without positive prodding from my colleague Phil Ashworth.

Adding Intelligence to Graphs with Scripts

Typically, a graph node will have associated attributes with values, such as a stream with volumeFlow and density values. These values might have been imported from some external system or other:

Stream Attributes:

:Stream_1
    :density ".36"^^xsd:float ;
    :volumeFlow "40"^^xsd:float .
:Stream_2 ....

The ‘model’ of the streams can be captured as edges associated with the Unit:

:Unit_1
    :hasProductStream :Stream_1 ;
    :hasProductStream :Stream_2 .

However, most ‘attributes’ that we want to see about a thing are not measured directly. Instead, they need to be calculated from other values. This is why we end up with spreadsheet-hell: importing the raw data from the data sources into a spreadsheet simply so we can add calculated columns, the values of which are rarely exported back to the source-databases. 

IntelligentGraph allows these calculations to be embedded in the graph as literals[1] with a datatype whose local name corresponds to one of the installed script languages:

:Stream_1
    :massFlow
        """_this.getFact(':density')*
        _this.getFact(':volumeFlow');"""^^:groovy .

:Unit_1
    :totalProduction
        """var totalProduction =0.0;
        for(Resource stream : _this.getFacts(':hasProductStream'))
        {
            totalProduction += stream.getFact(':massFlow');
        }
        return totalProduction; """^^:groovy .

Instead of returning the object literal value (aka the script), the IntelligentGraph will return the result value for the script.

We can write this script even more succinctly using the expressive power of PathQL:

:Unit_1  :totalProduction  "return _this.getFacts(':hasProductStream/:massFlow').total(); "^^:groovy .
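To make the evaluate-on-access behaviour concrete, here is a small Python model of the stream example. It is a sketch only, not IntelligentGraph itself: the dictionary graph, the `get_facts` helper, and the use of Python callables in place of script literals are all assumptions, and the :Stream_2 values are invented for illustration because the original listing elides them.

```python
# Toy graph: subject -> predicate -> list of values. A callable value plays
# the role of a script literal and is run only when the value is read.
graph = {
    ":Stream_1": {":density": [0.36], ":volumeFlow": [40.0]},
    ":Stream_2": {":density": [0.50], ":volumeFlow": [10.0]},  # invented values
    ":Unit_1": {":hasProductStream": [":Stream_1", ":Stream_2"]},
}

def get_facts(subject, path):
    """Follow a '/'-separated sequence of prefixed predicates from subject."""
    nodes = [subject]
    for predicate in path.split("/"):
        reached = []
        for node in nodes:
            for value in graph.get(node, {}).get(predicate, []):
                # Evaluate 'script' values on access, pass plain values through.
                reached.append(value(node) if callable(value) else value)
        nodes = reached
    return nodes

def mass_flow(this):
    return get_facts(this, ":density")[0] * get_facts(this, ":volumeFlow")[0]

for stream in (":Stream_1", ":Stream_2"):
    graph[stream][":massFlow"] = [mass_flow]

graph[":Unit_1"][":totalProduction"] = [
    lambda this: sum(get_facts(this, ":hasProductStream/:massFlow"))
]

total = get_facts(":Unit_1", ":totalProduction")[0]
```

Reading :totalProduction triggers the :massFlow calculation of each product stream in turn, in the same way that one spreadsheet cell recalculates from the cells it references.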

PathQL

Spreadsheets are not limited to accessing just adjacent cells; neither is the IntelligentGraph. PathQL provides a powerful way of navigating from one Thing node to others. PathQL was inspired by SPARQL and propertyPaths, but a richer, more expressive, path searching was required for the IntelligentGraph.

Examples

Genealogy Example Graph

Examples of PathQL that explore this genealogy are as follows:

  • _this.getFact(":parent")
    • will return the first parent of _this.
  • _this.getFact("^:parent")
    • will return the first child of _this.
  • _this.getFacts(":parent/:parent")
    • will return the grandparents of _this.
  • _this.getFacts(":parent/^:parent")
    • will return the siblings of _this.
  • _this.getFacts(":parent[:gender :female]/:parent")
    • will return the maternal grandparents of _this.
  • _this.getFacts(":parent[:gender :female]/:parent[:gender :male]")
    • will return the maternal grandfather of _this.
  • _this.getFacts(":parent[:gender [ rdfs:label 'female']]")
    • will return the mother of _this but using the label instead of the IRI.
  • _this.getFacts(":parent[eq :Peter]/:parent[:gender :male]")
    • will return the grandfather of _this, who is the parent of :Peter.
  • _this.getFacts(":parent[ne :Peter]/:parent[:gender :male]")
    • will return grandfathers of _this, who are not the parent of :Peter.
  • _this.getFacts(":parent{0,4}/:parent[:hasLocation :maidstone]")
    • will return all ancestors whose parent was born in a location :maidstone.
  • _this.getPath(":parent{0,4}/:parent[:hasLocation :maidstone]")
    • will return the path to the most recent ancestor whose parent was born in a location :maidstone.
  • _this.getFacts(":parent{0,4}/:parent[:hasLocation [rdfs:label 'Maidstone']]")
    • will return all ancestors whose parent was born in a location named “Maidstone”.
  • _this.getPaths(":connectedTo{1,10}[eq :BakerStreet]")
    • will find all routes, starting with the shortest, between _this and :BakerStreet with a maximum of 10 connections, thus all on the same line.
  • _this.getPaths(":connectedTo{1,5}/:changeTo{0,2}/:connectedTo{1,5}[eq :BakerStreet]")
    • will find routes, starting with the shortest, between _this and :BakerStreet with a maximum of two changes.
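The sequence and inverse operators used in these examples can be mimicked with a few lines of Python. This sketch is an assumption, not the PathQL parser: the family triples are hypothetical, `step` and `get_facts` are illustrative names, and filters and cardinality are deliberately left out.

```python
# Hypothetical family, stored as (subject, predicate, object) triples.
TRIPLES = [
    (":Ann", ":parent", ":Mary"), (":Ann", ":parent", ":Tom"),
    (":Bob", ":parent", ":Mary"), (":Bob", ":parent", ":Tom"),
    (":Mary", ":parent", ":Eve"),
]

def step(nodes, edge):
    """Follow one edge from every node; a leading '^' reverses direction."""
    inverse = edge.startswith("^")
    predicate = edge.lstrip("^")
    reached = []
    for node in nodes:
        for s, p, o in TRIPLES:
            if p != predicate:
                continue
            source, target = (o, s) if inverse else (s, o)
            if source == node and target not in reached:
                reached.append(target)
    return reached

def get_facts(start, path):
    """Evaluate a '/'-separated sequence of edges starting at start."""
    nodes = [start]
    for edge in path.split("/"):
        nodes = step(nodes, edge)
    return nodes
```

In this toy model ":parent/^:parent" returns every child of the start node's parents, so the start node appears among its own siblings; an extra filter would be needed to exclude it.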

PathQL Formal Syntax

The parts of a PathPattern are defined below. The formal syntax in BNF is here: PathPattern Formal Syntax

IRIRef:

The simplest pathPattern is an IRI of the predicate, property, or edge:

:parent

An unprefixed qname using the default namespace.

ft:parent

A prefixed qname using the namespace model.

<http://inova8.com/ft/hasParent>

A full IRI.

PathAlternative:

A pathPattern can consist of a set of alternative edges:

:parent|:hasChild

Alternative edges to ‘close relatives’ of the :subjectThing.

PathSequence:

A pathPattern can consist of a sequence of edges: 

:parent/:hasChild

A sequence of edges to the set of siblings of the start thing.

Inverse Modifier:

A modifier prefix to a predicate indicating that it should be navigated in the reverse direction (object→subject) instead of subject→object:

:parent/^:parent

A sequence of edges to the set of siblings of the start thing, since ^:parent is equivalent to :hasChild.

Reified Modifier:

A modifier prefix to a predicate indicating that it should be assumed that the subject-predicate-object is reified.

@:marriedTo

navigates from the :subjectThing to the :objectThing when the edge has been reified as: 

[]  rdf:subject :subjectThing ;
    rdf:predicate :marriedTo ;
    rdf:object :objectThing .

Inverse modifier can also be applied to navigate from the :objectThing to :subjectThing:

^@:marriedTo

navigates from the :objectThing to the :subjectThing

Extended Reification Modifier:

The reification type and the predicate of an extended reification:

:Marriage@:civilPartnership

navigates from the :subjectThing to the :objectThing when the edge has been reified as a class that is a :Marriage, which is rdfs:subClassOf rdf:Statement with a predicate of :civilPartnership. For example: 

 [] a :Marriage ;
    :partner1 :subjectThing ;
    :marriageType :civilPartnership ;
    :partner2 :objectThing .

:Marriage rdfs:subClassOf rdf:Statement .
:partner1 rdfs:subPropertyOf rdf:subject .
:marriageType rdfs:subPropertyOf rdf:predicate .
:partner2 rdfs:subPropertyOf rdf:object .

An inverse modifier can also be applied to navigate from the :objectThing to :subjectThing 

^:Marriage@:marriedTo

navigates from the :objectThing to the :subjectThing in the extended reification.

Dereification Modifier:

Instead of navigating to the objectThing of a reification, the dereification operator navigates to the reification thing: 

@:marriedTo#

navigates from the :subjectThing to the :marriage object.

@:marriedTo#/:at

navigates from the :subjectThing to the location :at which the marriage took place

@:marriedTo#/:when

navigates from the :subjectThing to the date :when the marriage took place
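The reified, inverse reified, and dereification navigations above can be illustrated with a small Python model. It is a sketch under assumptions: the blank node "_:m1", the :someLocation value, and the helper names are hypothetical, and only plain rdf:Statement reification is covered, not the extended form.

```python
RDF_S, RDF_P, RDF_O = "rdf:subject", "rdf:predicate", "rdf:object"

# One reified :marriedTo edge with an extra :at attribute (invented data).
TRIPLES = [
    ("_:m1", RDF_S, ":subjectThing"),
    ("_:m1", RDF_P, ":marriedTo"),
    ("_:m1", RDF_O, ":objectThing"),
    ("_:m1", ":at", ":someLocation"),
]

def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def statements(start, predicate, inverse=False):
    """Reification nodes attached to start: the idea behind '@:p#'."""
    role = RDF_O if inverse else RDF_S
    return [
        s for s, p, o in TRIPLES
        if p == role and o == start and predicate in objects(s, RDF_P)
    ]

def reified(start, predicate, inverse=False):
    """Far end of each reified edge: the idea behind '@:p' and '^@:p'."""
    role = RDF_S if inverse else RDF_O
    return [
        o
        for node in statements(start, predicate, inverse)
        for o in objects(node, role)
    ]

# Rough equivalent of '@:marriedTo#/:at' from :subjectThing.
locations = [
    o
    for node in statements(":subjectThing", ":marriedTo")
    for o in objects(node, ":at")
]
```

Stopping at the reification node rather than at the far end of the edge is what gives access to attributes of the relationship itself, such as where or when it took place.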

Path Filter:

A path filter can be applied at any point in a pathPattern to limit the subsequent paths. A path filter is like a SPARQL PropertyListNotEmpty graph pattern. However, it also includes comparison operators such as lt and gt.

:parent[:gender :male]

Navigates to the male parent.

:parent[:gender :male]/:parent[:gender :female]

Navigates to the paternal grandmother.

:volumeFlow[gt "50"]

Navigates only if the value is greater than “50”.

:appearsOn[eq :calc2graph1]

Navigates only if the objectNode value is :calc2graph1.

:appearsOn[ rdfs:label "Calc2Graph1"]

Navigates only if the objectNode has a rdfs:label with value “Calc2Graph1”.

:appearsOn[eq [ rdfs:label "Calc2Graph1"]]

Navigates only if the objectNode value is a node whose label is “Calc2Graph1”.

Cardinality:

Repeats the pathPattern between the minimum and maximum cardinality:

:parent{1,5}

Finds the 1st to 5th generation ancestors of the reference node.

:parent[:gender :male]{1,5}

Finds the 1st to 5th generation male ancestors, via the male line, of the reference node.
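Cardinality can be thought of as repeating one edge step a bounded number of times and keeping whatever is reached within the allowed range. The Python sketch below uses a hypothetical parent map and an assumed helper name `repeat`; it illustrates the idea rather than the way PathQL evaluates it.

```python
# Hypothetical family: person -> parents.
PARENTS = {
    ":Ann": [":Mary", ":Tom"],
    ":Mary": [":Eve"],
    ":Eve": [":Zoe"],
}

def repeat(edges, start, low, high):
    """Nodes reached by following the edge between low and high times."""
    reached = [start] if low == 0 else []
    frontier = [start]
    for depth in range(1, high + 1):
        # Advance every node in the frontier by exactly one edge.
        frontier = [nxt for node in frontier for nxt in edges.get(node, [])]
        if depth >= low:
            for node in frontier:
                if node not in reached:
                    reached.append(node)
    return reached
```

With a minimum of zero the start node itself is included, which is why patterns such as :parent{0,4}/:parent can match the immediate parent as well as more distant ancestors.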

Further Example Scripts

The following IntelligentGraph scripts perform the plant analysis example:

Return Scalar Value

return 40;

Get Related Property Value

return _this.getFact(":testProperty2").doubleValue()

Calculate Stream Mass Flow

var result= _this.getFact(":volumeFlow").floatValue()* _this.getFact(":density").floatValue();  
return result;

Calculate Unit Mass Throughput

return _this.getFacts(":hasProductStream/:massFlow").total();

Calculate Stream Mass Yield

var result= _this.getFact(":massFlow").floatValue()/ _this.getFact("^:hasStream/:massThroughput").floatValue();  
return result;

Calculate Unit Mass Balance

return _this.getFacts(":hasFeedStream/:massFlow").total() 
- _this.getFacts(":hasProductStream/:massFlow").total();

Path Navigation Functions

The spreadsheet’s secret sauce is the ability of a cell formula to access the values of other cells, either individually or as a set. The IntelligentGraph provides this functionality with several methods associated with Thing, which are applicable to the _this Thing that is initialized for each script with the subject Thing.

Thing.getFact(String pathQL) returns Resource

Returns the value of the node referenced by the pathQL, for example :volumeFlow returns the object value of the :volumeFlow edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getFacts(String pathQL) returns ResourceResults

Returns the values of nodes referenced by the pathQL, for example “:hasProductStream” returns an iterator for all object values of the :hasProductStream edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getPath(String pathQL) returns Path

Returns the first (shortest) path referenced by the pathQL, for example “:parent{1,5}” returns the path to the first ancestor of _this node. The pathQL allows for more complex path navigation.

Thing.getPaths(String pathQL) returns PathResults

Returns all paths referenced by the pathQL, for example “:parent{1,5}” returns an iterator, starting with the shortest path, for all paths to the ancestors of _this node. The pathQL allows for more complex path navigation.
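The difference between returning one path and returning all paths, shortest first, can be sketched in Python with a breadth-first traversal. The EDGES data and the get_path and get_paths functions below are hypothetical stand-ins written for this illustration; they are not the Thing methods themselves, and the real engine may order or represent paths differently.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (edge, target) pairs.
EDGES = {
    "ann": [(":parent", "bob"), (":parent", "cat")],
    "bob": [(":parent", "dan")],
    "cat": [],
    "dan": [],
}

def get_paths(start, edge, min_hops, max_hops):
    """Yield every path of min_hops..max_hops edges, shortest first.
    Each path is a list of (subject, edge, object) steps."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if len(path) >= min_hops:
            yield path
        if len(path) < max_hops:
            for label, target in EDGES.get(node, []):
                if label == edge:
                    queue.append((target, path + [(node, label, target)]))

def get_path(start, edge, min_hops, max_hops):
    """Return the first (shortest) path, or None if there is none."""
    return next(get_paths(start, edge, min_hops, max_hops), None)

paths = list(get_paths("ann", ":parent", 1, 5))
first = get_path("ann", ":parent", 1, 5)
```

Because the queue is processed first-in first-out, all one-step paths are produced before any two-step path, which gives the shortest-first ordering.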

Graph.getThing(String subjectIRI) returns Thing

Returns the node identified by the IRI.

Script Context Variables

Each script has access to the following predefined variables that allow the script to access the context within which it is being run.

_this, a Thing corresponding to the subject of the triples for which the script is the object. Helper functions are provided to navigate edges to or from this Thing.

_property, a Thing corresponding to the predicate or property of the triples for which the script is the object.

_customQueryOptions, a HashMap<String, Value> of name/value pairs corresponding to the pairs of additional arguments to the SPARQL extension function. These are useful for passing application-specific parameters.

_builder, an RDF4J graph builder object allowing a graph to be constructed (and manipulated) within the script. A graph cannot be returned from a SPARQL function. However, the IRI of the graph can be returned, and any graph created by a script will be persisted.

_tripleSource, the RDF4J TripleSource to which the subject-predicate-object triple belongs.

PathQL BNF

The formal syntax of the PathPattern is defined as follows using ANTLR4 BNF:

grammar PathPattern;

// PARSER RULES
queryString     : pathPattern queryOptions? EOF ;
queryOptions    : ( queryOption )+;
queryOption     : KEY '=' literal ('^^' type )?;
type            : qname;
pathPattern     : binding ('/'|'>') pathPatterns  #boundPattern
                | binding  #matchOnlyPattern
                | pathPatterns  #pathOnlyPattern;
binding         : factFilterPattern  ;
pathPatterns    : pathEltOrInverse cardinality?  #Path  
                | pathPatterns '|'  pathPatterns  #PathAlternative  
                | pathPatterns ('/'|'>')  pathPatterns  #PathSequence
                | negation? '(' pathPatterns ')'  cardinality? #PathParentheses;
cardinality     : '{'  INTEGER (',' ( INTEGER )? )?  '}'  ;
negation        : '!';
pathEltOrInverse: negation? INVERSE? predicate  ;
predicate       : ( reifiedPredicate 
                |  predicateRef 
                |  rdfType 
                |  anyPredicate ) factFilterPattern? ;
anyPredicate    : ANYPREDICATE ;
reifiedPredicate: iriRef? REIFIER predicateRef  factFilterPattern?  dereifier? ;
predicateRef    : IRI_REF  | rdfType  |  qname | pname_ns ;
iriRef          : IRI_REF |  qname | pname_ns ;  
dereifier       : DEREIFIER ;
factFilterPattern: '['  propertyListNotEmpty   ']';
propertyListNotEmpty: verbObjectList ( ';' ( verbObjectList )? )* ;  
verbObjectList  : verb objectList;
verb            : operator | pathEltOrInverse ;
objectList      : object ( ',' object )*;
object          : iriRef  | literal | factFilterPattern | BINDVARIABLE ;
qname           : PNAME_NS PN_LOCAL; 
pname_ns        : PNAME_NS ;   
literal         : (DQLITERAL | SQLITERAL) ('^^' (IRI_REF |  qname) )? ;  
operator        : OPERATOR ;
rdfType         : RDFTYPE ;

// LEXER RULES
KEY             : '&' [a-zA-Z]+ ;  
INTEGER         : DIGIT+ ; 
BINDVARIABLE    : '%' DIGIT+ ;
fragment
DIGIT           : [0-9] ;  
INVERSE         : '^';
REIFIER         : '@';
DEREIFIER       : '#';
RDFTYPE         : 'a';
ANYPREDICATE    : '*' ;
OPERATOR        : 'lt'|'gt'|'le'|'ge'|'eq'|'ne'|'like'|'query'|'property';
DQLITERAL       : '"' (~('"' | '\\' | '\r' | '\n') | '\\' ('"' | '\\'))* '"';
SQLITERAL       : '\'' (~('\'' | '\\' | '\r' | '\n') | '\\' ('\'' | '\\'))* '\'';
IRI_REF         : '<' ( ~('<' | '>' | '"' | '{' | '}' | '|' | '^' | '\\' | '`') | (PN_CHARS))* '>' ;      
PNAME_NS        : PN_PREFIX? (':'|'~')  ;   
VARNAME         : '?' [a-zA-Z]+ ;
fragment
PN_CHARS_U      : PN_CHARS_BASE | '_'  ;
fragment   
PN_CHARS        : PN_CHARS_U
                | '-'
                | DIGIT  ;
fragment
PN_PREFIX       : PN_CHARS_BASE ((PN_CHARS|'.')* PN_CHARS)? ;
PN_LOCAL        : ( PN_CHARS_U | DIGIT ) ((PN_CHARS|'.')* PN_CHARS)? ;
fragment
PN_CHARS_BASE   : 'A'..'Z'
                | 'a'..'z'
                | '\u00C0'..'\u00D6'
                | '\u00D8'..'\u00F6'
                | '\u00F8'..'\u02FF'
                | '\u0370'..'\u037D'
                | '\u037F'..'\u1FFF'
                | '\u200C'..'\u200D'
                | '\u2070'..'\u218F'
                | '\u2C00'..'\u2FEF'
                | '\u3001'..'\uD7FF'
                | '\uF900'..'\uFDCF'
                | '\uFDF0'..'\uFFFD' ;
WS  : [ \t\r\n]+ -> skip ; 
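As a small illustration of the cardinality rule in the grammar above, the following Python sketch recognizes the three forms the rule admits: a single integer, an integer followed by a comma, and a pair of integers. The parse_cardinality function is hypothetical, and reading {n} as "exactly n" and {n,} as "no upper bound" is an assumption; the grammar fixes only the syntax.

```python
import re

# Mirrors: cardinality : '{' INTEGER (',' ( INTEGER )? )? '}' ;
# Whitespace between tokens is tolerated because the grammar skips WS.
CARDINALITY = re.compile(r"\{\s*(\d+)\s*(,\s*(\d+)?\s*)?\}")

def parse_cardinality(text):
    """Return (minimum, maximum); maximum is None when left open.
    Assumption: '{n}' means exactly n repetitions."""
    match = CARDINALITY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a cardinality: {text!r}")
    minimum = int(match.group(1))
    if match.group(2) is None:            # form {n}
        return minimum, minimum
    if match.group(3) is None:            # form {n,}
        return minimum, None
    return minimum, int(match.group(3))   # form {n,m}
```

For example, parse_cardinality("{1,5}") yields the pair (1, 5), matching the range used in the :parent{1,5} pattern.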


[1]  SCRIPT Languages

In this case, the script uses Groovy, but any Java 9 compliant scripting language can be used, such as JavaScript, Python, Ruby, and many more.

By default, the JavaScript, Groovy, and Python JARs are installed. The complete list of compliant languages is as follows:

AWK, BeanShell, ejs, FreeMarker, Groovy, Jaskell, Java, JavaScript, JavaScript (Web Browser), Jelly, JEP, Jexl, jst, JudoScript, JUEL, OGNL, Pnuts, Python, Ruby, Scheme, Sleep, Tcl, Velocity, XPath, XSLT, JavaFX Script, ABCL, AppleScript, Bex script, OCaml Scripting Project, PHP, Smalltalk, CajuScript, MathEclipse

 

 

IntelligentGraph embeds calculation and analysis capability within RDF knowledge graphs, rather than forcing analysis to be undertaken by exporting query results to external analysis applications such as Excel.

IntelligentGraph achieves this by embedding scripts into the RDF knowledge graph which are evaluated when queried with SPARQL.

Scripts, written in a variety of languages such as JavaScript, Java, and Python, access the underlying graph using simple pathPatternQL navigation.

IntelligentGraph=KnowledgeGraph+Embedded Analysis.pdf

Why IntelligentGraph?

At present, calculations over stored data are delivered either by custom code or by exporting the stored data to spreadsheets. The data behind these tools is inevitably tabular. In fact, so dominant are spreadsheets for analysis that the spreadsheet itself becomes the ‘database’, with the inherent difficulties of syncing that data with the source system of record.

The real world is better represented as a network or graph of interconnected things. Therefore, a knowledge graph is a far better storage organization than tables or objects. However, there is still the need to perform ad hoc numerical analysis over this data.

RDF DataCube can help organize data for analysis, but still the analysis has to be performed externally. Confronted with this dilemma, knowledge graph data would typically be exported in tabular form to a datamart or directly into, yet again, a spreadsheet where the analysis could be performed.

IntelligentGraph turns this approach on its head by embedding the calculations as scripts within the knowledge graph. These scripts are evaluated on query and use the data in situ, so there are no concurrency issues. This allows each calculation to have knowledge of its neighbouring nodes and edges, just as Excel cells can access other cells in the spreadsheet. Access to other nodes within the graph uses pathPatternQL navigation.

Example Data and Analysis

An Industrial Internet of Things (IIoT) application is connecting all the measurements about a process plant, such as an oil refinery, into a knowledge graph that relates the measurements to the material flows through the process equipment.

Although there is an abundance of measurements and laboratory analyses available, the values required for operating and performance monitoring are not (and mostly cannot be) directly measured.

For example:

  • Stream Mass-Flow: direct mass flow measurements are rare. Instead, a volume flow measurement is used in conjunction with a measured material density to calculate the mass-flow.
  • Unit Mass Flow Throughput: this is calculated by summing either all feed stream mass flows or all product stream mass flows.
  • Unit Mass Balance: this is calculated by subtracting the product mass flows from the feed mass flows.
  • Product Stream Yield: this is the ratio of a stream’s mass-flow to the throughput of the unit to which the stream is connected.

Figure 1: Typical Process Flow Sheet

These are simple examples; however, they show the reliance on the knowledge graph structure to perform the analysis.
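The four calculations listed above can be worked through with plain arithmetic. The Python sketch below uses hypothetical measurements; only the Stream_1 density and volume flow match the values used later in this article, and the dictionaries and function names are invented for illustration rather than taken from IntelligentGraph.

```python
# Hypothetical measurements; only Stream_1's values appear in the article.
streams = {
    "Stream_1": {"density": 0.36, "volumeFlow": 40.0},
    "Stream_2": {"density": 0.50, "volumeFlow": 20.0},
    "Feed_1":   {"density": 0.40, "volumeFlow": 62.0},
}
unit = {
    "hasFeedStream": ["Feed_1"],
    "hasProductStream": ["Stream_1", "Stream_2"],
}

def mass_flow(name):
    """Stream mass flow = volume flow * density."""
    s = streams[name]
    return s["volumeFlow"] * s["density"]

def throughput(edge):
    """Unit throughput = sum of mass flows over the feed or product streams."""
    return sum(mass_flow(name) for name in unit[edge])

def mass_balance():
    """Unit mass balance = feed mass flow minus product mass flow."""
    return throughput("hasFeedStream") - throughput("hasProductStream")

def stream_yield(name):
    """Product yield = stream mass flow / unit product throughput."""
    return mass_flow(name) / throughput("hasProductStream")
```

Each function depends on the structure that links streams to the unit, which is exactly the information a knowledge graph holds and a flat table does not.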

Solving data analysis, the traditional way

Data is in the database, analysis is done by the analysis engine (aka Excel), right?

Figure 2: Data analysis the traditional way

In this scenario, the local power user sets up a query to export data from the database and converts it to a format that can be imported into Excel. Increasingly complex formulae are then written to wrangle the data into the required results.

Why is the spreadsheet approach risky?

  • The analysis is now separated from the data. Data changes will not be reflected in the analysis. Worse still, changes to the analysis might not be propagated to all the spawned copies of the spreadsheet.
  • The data is separated from the analysis. The analysis results are rarely re-imported into the data store, where data could be compared with analysis. Instead, even more data is extracted into the spreadsheet.
  • The difficulty of managing the separation of data from analysis becomes so great that in many cases the database is dispensed with entirely and the spreadsheet becomes the de-facto database.

Solving data analysis with an IntelligentGraph

The beauty of Excel is that a cell can contain either a value or a formula that can reference other cells’ values. Why not do the same with a graph? A node can have edges that terminate with either a literal value or a formula that can reference other nodes’ values.

This is illustrated in the diagram below:

  • The :massFlow property is not measured directly, so a formula is used for its value instead. This formula references $this, the node to which the calculation is attached, and uses the method getFact() to retrieve related values. The argument of getFact() is a pathPatternQL expression.
  • The :totalProduction property is not measured directly, so a formula is used instead which iterates over all of the ‘stream out’ nodes, retrieving the value of the :massFlow for each stream. The :massFlow value is, of course, in turn a calculation.

Figure 3: Intelligent Graph Data Analysis

Why is the IntelligentGraph approach so advantageous?

  • There is no separation between data and analysis, removing the risk of stale and inaccurate data and calculations.
  • The calculations embedded within the graph can take advantage of the knowledge that is contained within that graph. This makes the calculations far simpler than those that need to be embedded in spreadsheets.
  • The calculations automatically use the changing knowledge on the fly.

How does IntelligentGraph Work?

Analysis is embedded in an IntelligentGraph simply by adding script literals as the object values of triples, with the literal’s datatype identifying the scripting language (groovy, javascript, python, etc.).

The IntelligentGraph engine is provided as an RDF4J Stackable SAIL. This means that its capabilities can be combined with any other RDF4J capabilities. The choice of RDF storage remains the same as for any other RDF4J compliant framework.

Modeling with Scripts

Typically, a graph node will have associated attributes with values, such as a stream with volumeFlow and density values:

Stream Attributes:

:Stream_1
   :density ".36"^^xsd:float ;
   :volumeFlow "40"^^xsd:float .

Of course, in the ‘real-world’ these measured values are sourced from outside the KnowledgeGraph and change over time. IntelligentGraph can deal with both of these requirements.

The ‘model’ of the streams can be captured as edges associated with the Unit:

:Unit_1
   :hasProductStream :Stream_1 ;
   :hasProductStream :Stream_2 ;
.

Calculate Mass Flow

The calculations are declared as literals[1] with a datatype whose local name corresponds to one of the installed script languages:

:Stream_1 :massFlow     
    "_this.getFact(':density')*    
     _this.getFact(':volumeFlow');"^^:groovy .

Calculate Total Production

A typical performance metric is to understand the total production from a unit, which is not of course directly measured. However, it can be easily expressed using existing calculated values:

:Unit_1   :totalProduction
    "var totalProduction =0.0;
    for(Resource stream : _this.getFacts(':hasProductStream'))
    {
        totalProduction += stream.getFact(':massFlow');
    }
    return totalProduction; "^^:groovy .

Instead of returning the object literal value (aka the script), the IntelligentGraph will return the result value for the script.

We can write this script even more succinctly using the expressive power of PathQL:

:Unit_1  :totalProduction  
    "return _this.getFacts(':hasProductStream/:massFlow').total(); "^^:groovy .

IntelligentGraph also allows us to build upon existing calculations to express simply what would normally be difficult-to-calculate metrics, such as product yield or mass balance.

Calculate Mass Yield

Any production unit has products of differing value, so a key metric is the yield of individual streams. This can easily be calculated as follows, using values that are themselves calculations:

var result= _this.getFact(":massFlow").floatValue()/ 
_this.getFact("^:hasStream/:totalProduction").floatValue();  
result;

Calculate Mass Balance

Measurements are not perfect, nor is the operation of a unit. One of the first indicators of a problem is when the mass flow in does not match the mass flow out. This can be expressed as another calculated property of a Unit:

return _this.getFacts(":hasFeedStream/:massFlow").total() - _this.getFacts(":totalProduction").total();

Querying Results

Access to the calculated values is via standard SPARQL. However, instead of returning the script literal, IntelligentGraph will invoke the script engine and return the result of the script.

Thus, to access the :massFlow calculated value, the SPARQL is simply:

select ?massFlow
{
 :Stream_1 :massFlow ?massFlow 
}

If the script literal is required then the object variable can be postfixed with _SCRIPT:

select ?massFlow ?massFlow_SCRIPT 
{
 :Stream_1 :massFlow ?massFlow, ?massFlow_SCRIPT 
}

If a full trace of the calculation, including tracing calls to other scripts, is required then the object variable can be postfixed with _TRACE:

select ?massFlow ?massFlow_TRACE 
{
 :Stream_1 :massFlow ?massFlow, ?massFlow_TRACE 
}

How to Write IntelligentGraph Scripts?

Script Languages

Any Java 9 supported language can be used simply by making the corresponding language JAR available. 

By default, the JavaScript, Groovy, and Python JARs are installed. The complete list of compliant languages is as follows:

AWK, BeanShell, ejs, FreeMarker, Groovy, Jaskell, Java, JavaScript, JavaScript (Web Browser), Jelly, JEP, Jexl, jst, JudoScript, JUEL, OGNL, Pnuts, Python, Ruby, Scheme, Sleep, Tcl, Velocity, XPath, XSLT, JavaFX Script, ABCL, AppleScript, Bex script, OCaml Scripting Project, PHP, Smalltalk, CajuScript, MathEclipse

Script Context Variables

In addition, each script has access to the following predefined variables that allow the script to access the context within which it is being run.

  • _this, a Thing corresponding to the subject of the triples for which the script is the object. Helper functions are provided to navigate edges to or from this Thing.
  • _property, a Thing corresponding to the predicate or property of the triples for which the script is the object.
  • _customQueryOptions, a HashMap<String, Value> of name/value pairs corresponding to the pairs of additional arguments to the SPARQL extension function. These are useful for passing application-specific parameters.
  • _builder, an RDF4J graph builder object allowing a graph to be constructed (and manipulated) within the script. A graph cannot be returned from a SPARQL function. However, the IRI of the graph can be returned, and any graph created by a script will be persisted.
  • _tripleSource, the RDF4J TripleSource to which the subject-predicate-object triple belongs.

Fact and Path Functions

The spreadsheet’s secret sauce is the ability of a cell formula to access values of other cells, either individually or as a set. IntelligentGraph provides this functionality with several methods of Thing, which are available on the _this Thing that is initialized for each script with the subject Thing.

Thing.getFact(String pathPattern) returns Value

Returns the value of the node referenced by the pathPattern, for example “:volumeFlow” returns the object value of the :volumeFlow edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getFacts(String pathPattern) returns Values

Returns the values of nodes referenced by the pathPattern, for example “:hasProductStream” returns an iterator for all object values of the :hasProductStream edge relative to _this node. The pathPattern allows for more complex path navigation.

Thing.getPath(String pathQL) returns Path

Returns the first (shortest) path referenced by the pathQL, for example “:parent{1,5}” returns the path to the first ancestor of _this node. The pathQL allows for more complex path navigation.

Thing.getPaths(String pathQL) returns PathResults

Returns all paths referenced by the pathQL, for example “:parent{1,5}” returns an iterator, starting with the shortest path, for all paths to the ancestors of _this node. The pathQL allows for more complex path navigation.

Path Patterns

Spreadsheets are not limited to accessing just adjacent cells; neither is the IntelligentGraph. PathPatterns provide a powerful way of navigating from one Thing node to another. PathPatterns are inspired by SPARQL property paths, but a richer, more expressive language, PathQL, was required for the IntelligentGraph.

Examples

Examples of PathQL patterns are as follows:

_this.getFact(":hasParent")

will return the first parent of _this.

_this.getFact("^:hasParent")

will return the first child of _this.

_this.getFacts(":hasParent/:hasParent")

will return the grandparents of _this.

_this.getFacts(":hasParent/^:hasParent")

will return the siblings of _this.

_this.getFacts(":hasParent[:gender :female]/:hasParent")

will return the maternal grandparents of _this.

_this.getFacts(":hasParent[:gender :female]/:hasParent[:gender :male]")

will return the maternal grandfather of _this.

_this.getFacts(":hasParent[:gender [ rdfs:label 'female']]")

will return the mother of _this, but using the label instead of the IRI.

_this.getFacts(":hasParent[eq :Peter]/:hasParent[:gender :male]")

will return the grandfather of _this who is the parent of :Peter.

_this.getFacts(":hasParent[ne :Peter]/:hasParent[:gender :male]")

will return the grandfathers of _this who are not the parent of :Peter.
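The sequence and inverse operators used in these examples can be imitated over a toy triple list. The following Python sketch handles only ‘/’ and ‘^’; filters such as [:gender :female] are deliberately left out. The TRIPLES data and the step and get_facts functions are hypothetical illustrations, not the PathQL engine.

```python
# Hypothetical toy triples (subject, predicate, object).
TRIPLES = [
    ("ann", ":hasParent", "bob"),
    ("ann", ":hasParent", "cat"),
    ("ben", ":hasParent", "bob"),
    ("bob", ":hasParent", "dan"),
    ("cat", ":hasParent", "eve"),
]

def step(nodes, element):
    """Follow one path element from each node; a leading '^' inverts it."""
    inverse = element.startswith("^")
    predicate = element.lstrip("^")
    result = []
    for node in nodes:
        for s, p, o in TRIPLES:
            if p != predicate:
                continue
            if not inverse and s == node and o not in result:
                result.append(o)
            if inverse and o == node and s not in result:
                result.append(s)
    return result

def get_facts(start, path):
    """Evaluate a '/'-separated sequence of path elements from start."""
    nodes = [start]
    for element in path.split("/"):
        nodes = step(nodes, element)
    return nodes

parents = get_facts("ann", ":hasParent")
grandparents = get_facts("ann", ":hasParent/:hasParent")
same_parent = get_facts("ann", ":hasParent/^:hasParent")
```

Note that this naive evaluation of the parent-then-child path also returns the starting node itself; whether PathQL excludes the start node from the sibling result is not stated here.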

The following diagram visualizes a path through a genealogical graph, from _this to find the parents of a maternal grandfather born in Maidstone:

_this.getFacts("/:parent[:gender :female]/:parent[:gender :male, :birthplace [rdfs:label 'Maidstone']]/:parent")

 

Figure 4: PathPatternQL Example

How Is Performance?

IntelligentGraph takes the following actions to improve performance:

  1. All intermediate calculation results are cached, keyed by the subjectNode, predicate, and customQueryOptions.
  2. The cache can be cleared using the SPARQL function ClearCache.
  3. The SPARQL function ObjectValue takes as its argument the subject, predicate and objectValue. If the objectValue supplied is not of script datatype, the function will immediately return the objectValue.
  4. Circular functions, in which A calls B calls A, are detected and rejected.
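The caching and circular-reference behaviour listed above can be sketched in Python as follows. The Evaluator class, its method names, and the use of a tuple for the query options are assumptions made for illustration; the sketch shows the general technique (memoization keyed by subject, predicate, and options, plus a stack of in-progress evaluations) rather than the actual IntelligentGraph code.

```python
class CircularReference(Exception):
    """Raised when a script depends, directly or indirectly, on itself."""

class Evaluator:
    def __init__(self, scripts):
        # scripts maps (subject, predicate) -> function(evaluator) -> value.
        self.scripts = scripts
        self.cache = {}
        self.in_progress = []
        self.evaluations = 0

    def value(self, subject, predicate, options=()):
        key = (subject, predicate, options)
        if key in self.cache:            # reuse an intermediate result
            return self.cache[key]
        if key in self.in_progress:      # A calls B calls A
            raise CircularReference(key)
        self.in_progress.append(key)
        try:
            self.evaluations += 1
            result = self.scripts[(subject, predicate)](self)
        finally:
            self.in_progress.pop()
        self.cache[key] = result
        return result

    def clear_cache(self):
        self.cache.clear()

# Hypothetical scripts: one chain of calculations and one circular pair.
scripts = {
    ("Stream_1", ":massFlow"): lambda ev: 40.0 * 0.5,
    ("Unit_1", ":totalProduction"): lambda ev: ev.value("Stream_1", ":massFlow"),
    ("A", ":x"): lambda ev: ev.value("B", ":x"),
    ("B", ":x"): lambda ev: ev.value("A", ":x"),
}
ev = Evaluator(scripts)
total = ev.value("Unit_1", ":totalProduction")
again = ev.value("Unit_1", ":totalProduction")  # served from the cache
```

In this sketch the second request for the total triggers no further script evaluation, and requesting the value of the circular pair raises CircularReference instead of recursing indefinitely.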

Can I Debug?

Since IntelligentGraph combines calculations with the knowledge graph, it is inevitable that any evaluation will involve calls to values of other nodes which are in turn calculations. For this reason, IntelligentGraph supports tracing and debugging:

 

Figure 5: Tracing Calculation

How Do I Add Intelligence to my RDFGraph?

Download

The project is hosted on GitHub, from where intelligentgraph.jar can be downloaded:

The intelligentgraph.jar does not include all of the scripting-language and other dependencies, so to use it you must make sure that all dependencies are already available.

Install

IntelligentGraph will work only with RDF4J version 3.3.0 and above.

Copy intelligentgraph.jar

To  /usr/local/tomcat/webapps/rdf4j-server/WEB-INF/lib/intelligentgraph.jar

The RDF4J server will need to be restarted for it to recognize this new JAR and initiate the scripting engine.

[1] In this case the script uses Groovy, but any Java 9 compliant scripting language can be used, such as JavaScript, Python, Ruby, and many more.

Providing answers to users’ analysis, searching, visualizing or other questions of their own data

Creating an overall solution that presents data in a useful way can be challenging, but OData2SPARQL and Lens2OData solve this.

  • RDF-Graph: Data + Model = Information, allowing us to combine your raw data with an adaptable model to create meaningful information.
  • OData2SPARQL: Information + Rules = Knowledge, gives you the ability to access that information, combined with additional rules (SPIN and SHACL), to deliver useful knowledge that can be consumed by applications and BI tools.
  • Lens2OData: Knowledge + Action = Results, allows users to easily navigate, search, explore, and visualize this knowledge in such a way that it is easy to take action and produce results.

To see this all in action we have prepared a demonstrator that can be downloaded, and the following videos which illustrate the capabilities of this demonstrator.

  • Explore the provenance of data sources, which is retained by RDF-Graph and OData2SPARQL rather than being lost as in typical ETL processing.

  • Explore the Transport For London train lines, stations and zones, illustrating how easy it is to transform any dataset to RDF-Graph and immediately get the benefits of OData access and the Lens UI/UX.

To download and run this demonstrator, go to Docker Hub here