IntelligentGraph embeds calculation and analysis capability within RDF knowledge graphs, rather than forcing analysis to be undertaken by exporting query results to external analysis applications such as Excel.

IntelligentGraph achieves this by embedding scripts into the RDF knowledge graph which are evaluated when queried with SPARQL.

Scripts, written in a variety of languages such as JavaScript, Java, and Python, access the underlying graph using simple pathPatternQL navigation.

Why IntelligentGraph?

At present, calculations over stored data are delivered either by custom code or by exporting the stored data to spreadsheets. The data behind these tools is inevitably tabular. In fact, so dominant are spreadsheets for analysis that the spreadsheet itself becomes the ‘database’, with the inherent difficulties of syncing that data with the source system of record.

The real world is better represented as a network, or graph, of interconnected things. A knowledge graph is therefore a far better storage organization than tables or objects. However, there is still the need to perform ad hoc numerical analysis over this data.

RDF DataCube can help organize data for analysis, but the analysis still has to be performed externally. Confronted with this dilemma, knowledge graph data would typically be exported in tabular form to a datamart or, yet again, directly into a spreadsheet where the analysis could be performed.

IntelligentGraph turns this approach on its head by embedding the calculations as scripts within the knowledge graph. These scripts are evaluated on query and use the data in situ, so there are no concurrency issues. This allows the calculations to have knowledge of their neighbouring nodes and edges, just as Excel cells can access other cells in the spreadsheet. Access to other nodes within the graph uses pathPatternQL navigation.

Example Data and Analysis

An Industrial Internet of Things (IIoT) application is connecting all the measurements about a process plant, such as an oil refinery, into a knowledge graph that relates the measurements to the material flows through the process equipment.

Although there is an abundance of measurements and laboratory analyses available, the values required for operating and performance monitoring are not (and mostly cannot be) directly measured.

For example:

  • Stream Mass-Flow: direct mass-flow measurements are rare. Instead, a volume-flow measurement is used in conjunction with a measured material density to calculate the mass-flow (see the sketch after this list).
  • Unit Mass-Flow Throughput: calculated by summing either all feed-stream or all product-stream mass-flows.
  • Unit Mass Balance: calculated by differencing the feed and product mass-flows.
  • Product Stream Yield: the ratio of a stream’s mass-flow to the throughput of the unit to which the stream is connected.
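
As a minimal sketch of the first of these calculations, here it is written as a standalone Groovy snippet, using the example values given for :Stream_1 later in this article (the units are assumed for illustration):

// Illustrative Groovy: a stream's mass-flow derived from its volume flow and density
def volumeFlow = 40.0f   // measured volume flow (assumed unit, e.g. m3/h)
def density = 0.36f      // measured material density (assumed unit, e.g. t/m3)
def massFlow = volumeFlow * density   // = 14.4, the calculated stream mass-flow
massFlow;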

Figure 1: Typical Process Flow Sheet

These are simple examples; however, they show the reliance on the knowledge graph structure to perform the analysis.

Solving data analysis, the traditional way

Data is in the database, analysis is done by the analysis engine (aka Excel), right?

Figure 2: Data analysis the traditional way

In this scenario, a local power user sets up a query to export data from the database and converts it to a format that can be imported into Excel. Increasingly complex formulae are then written to wrangle the data into the required results.

Why is the spreadsheet approach risky?

  • The analysis is now separated from the data. Data changes will not be reflected in the analysis. Worse still, changes to the analysis might not be propagated to all the spawned copies of the spreadsheet.
  • The data is separated from the analysis. The analysis results are rarely re-imported into the data store, where data-versus-analysis comparison could be performed. Instead, even more data is extracted into the spreadsheet.
  • The difficulty of managing the separation of data from analysis becomes so great that, in many cases, the database is dispensed with entirely and the spreadsheet becomes the de facto database.

Solving data analysis with an Intelligent Graph

The beauty of Excel is that a cell can contain either a value or a formula that can reference other cells’ values. Why not do the same with a graph: a node can have edges that terminate either with a literal value or with a formula that can reference other nodes’ values.

This is illustrated in the diagram below:

  • The :massFlow property is not measured directly, so a formula is used for its value instead. This formula references $this, the node to which the calculation is attached, and uses the method getFact() to retrieve related values. The argument of getFact() is a pathPatternQL expression.
  • The :totalProduction property is not measured directly, so a formula is used instead which iterates over all of the ‘stream out’ nodes, retrieving the value of the :massFlow for each stream. The :massFlow value is, of course, in turn a calculation.

Figure 3: Intelligent Graph Data Analysis

Why is the IntelligentGraph approach so advantageous?

  • There is no separation between data and analysis, removing the risk of stale and inaccurate data and calculations.
  • The calculations embedded within the graph can take advantage of the knowledge that is contained within that graph. This makes the calculations far simpler than those that need to be embedded in spreadsheets.
  • The calculations automatically utilize the changing knowledge on the fly, since they are evaluated at query time.

How does IntelligentGraph Work?

IntelligentGraph can be deployed and used with these simple steps:

  1.     Add the SPARQL extension functions to the triplestore.
  2.     Add script literals as object values of subjects, with a datatype identifying the scripting language (groovy, javascript, python, etc).
  3.     Modify any SPARQL query to use the SPARQL extension functions to process the scripts.

Installation of SPARQL Extension functions

These are available for any RDF4J-compliant triplestore, and can be downloaded and installed as described in the Installation section.

Modelling with Scripts

Typically, a graph node will have associated attributes with values, such as a stream with volumeFlow and density values:

Stream Attributes:

:Stream_1
   :density ".36"^^xsd:float ;
   :volumeFlow "40"^^xsd:float .

The ‘model’ of the streams can be captured as edges associated with the Unit:

:Unit_1
   :hasProductStream :Stream_1 ;
   :hasProductStream :Stream_2 ;
.

The calculations are declared as literals[1] with a datatype whose local name corresponds to one of the installed script languages:

:Stream_1
   :massFlow "$this.getFact(':density')* $this.getFact(':volumeFlow');"^^:groovy ;
.
:Unit_1
   :totalProduction
      "var totalProduction =0.0;
      for(stream in $this.getFacts(':hasProductStream'))
      {
         totalProduction += stream.getFact(':massFlow');
      }
      return totalProduction;"^^:groovy ;
.

SPARQL Queries

Using SPARQL, we would want to query this as follows:

select ?massFlow
{
    BIND(:objectValue(:Stream_1,:massFlow) as ?massFlow )
}

 

Where :objectValue is a custom SPARQL function, as allowed by the SPARQL standard and supported in RDF frameworks such as RDF4J (see RDF4J Custom SPARQL functions).
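
For context, a custom SPARQL function in RDF4J is a class implementing the RDF4J Function interface and registered through the Java service-loader mechanism. The sketch below is a generic, illustrative example only, not the IntelligentGraph implementation; the function IRI is assumed from the olgap namespace used later in this article.

// Minimal Groovy sketch of an RDF4J custom SPARQL function (illustrative, not the olgap implementation).
// The class name would be listed in META-INF/services/org.eclipse.rdf4j.query.algebra.evaluation.function.Function
import org.eclipse.rdf4j.model.Value
import org.eclipse.rdf4j.model.ValueFactory
import org.eclipse.rdf4j.query.algebra.evaluation.ValueExprEvaluationException
import org.eclipse.rdf4j.query.algebra.evaluation.function.Function

class ObjectValueFunction implements Function {
    String getURI() { "http://inova8.com/olgap/objectValue" }   // assumed function IRI

    Value evaluate(ValueFactory valueFactory, Value... args) throws ValueExprEvaluationException {
        if (args.length < 3) {
            throw new ValueExprEvaluationException("objectValue requires subject, predicate and object arguments")
        }
        // A real implementation would evaluate args[2] as a script if it carries a script datatype;
        // this sketch simply returns the object value unchanged.
        return args[2]
    }
}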

The simplest of SPARQL queries is:

Select *
{ ?s ?p ?o }

This can be modified to become:

Select *
{ ?s ?p ?o . BIND(:objectValue(?s, ?p, ?o) as ?_o) }

Or

Select *
{ ?s ?p ?o . BIND(:factValue(?s, ?p) as ?_o) }

How to Write IntelligentGraph Scripts

SPARQL Extension Functions

The core of the IntelligentGraph is provisioned by the following SPARQL extension functions:

ObjectValue (IRI Subject, IRI Predicate, Value value, customQueryOptions … args)

Returns:  The evaluation of the property of the subject if the literal value is a script datatype, or the literal value otherwise

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     value: Value (IRI or literal) of the triple subject, predicate, object
  4.     args…: optional parameter value pairs, for example:
         • start, "…"^^xsd:dateTime
         • end, "…"^^xsd:dateTime
         • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

select ?massFlow
{
   VALUES (?s ?p) {(:Stream_1 :massFlow)}
   ?s ?p ?o .
   BIND(:objectValue(?s, ?p, ?o) as ?massFlow )
}

ObjectProvenance (IRI Subject, IRI Predicate, Value literalValue, customQueryOptions … args)

Returns: The HTML formatted trace of the evaluation of the property of the subject

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     value: Value (IRI or literal) of the triple subject, predicate, object
  4.     args…: optional parameter value pairs, for example:
         • start, "…"^^xsd:dateTime
         • end, "…"^^xsd:dateTime
         • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

select ?traceHTML
{
   VALUES (?s ?p) {(:Stream_1 :massFlow)}
   ?s ?p ?o . BIND(:objectProvenance(?s, ?p, ?o) as ?traceHTML )
}

FactValue (IRI Subject, IRI Predicate, customQueryOptions … args)

Returns: The evaluation of the property of the subject

Evaluates the script stored as the object value of a :subject :predicate.

The difference between factValue and objectValue is that the latter takes the object value from the triplestore as an argument and only processes it as a script if it is of a script datatype, whereas factValue must always retrieve the object value from the triplestore before determining whether it needs to be interpreted as a script or returned as a value. Thus, objectValue will be more efficient.

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     args…: optional parameter value pairs, for example:
         • start, "…"^^xsd:dateTime
         • end, "…"^^xsd:dateTime
         • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL:

PREFIX olgap: <http://inova8.com/olgap/>
SELECT  ?massFlow
WHERE {
   BIND( 
      olgap:factValue(
         <http://inova8.com/calc2graph/id/BatteryLimit1> , <http://inova8.com/calc2graph/def/density>
         ,'aggregate','Instant'
         ,'start','2010-08-01T00:00:00.000000000+00:00'^^xsd:dateTime
      ) 
      as ?massFlow)
}

Note that the additional arguments are optional.

FactProvenance (IRI Subject, IRI Predicate, customQueryOptions … args)

Returns: The HTML formatted trace of the evaluation of the property of the subject

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     args…: optional parameter value pairs, for example:
         • start, "…"^^xsd:dateTime
         • end, "…"^^xsd:dateTime
         • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

PREFIX olgap: <http://inova8.com/olgap/>
SELECT  ?provenance 
WHERE {
   BIND( 
      olgap:factProvenance(
         <http://inova8.com/calc2graph/id/BatteryLimit1> , <http://inova8.com/calc2graph/def/density>
         ,'aggregate','Instant'
         ,'start','2010-08-01T00:00:00.000000000+00:00'^^xsd:dateTime
      ) 
      as ?provenance )
}

Note that the additional arguments are optional.

FactDebug (IRI Subject, IRI Predicate,  Value script,  customQueryOptions … args)

Returns: The evaluation of the property of the subject

Evaluates the script supplied as the object value for a :subject :predicate. It overrides any stored script associated with the :subject :predicate :object triple, so it can be used to experiment with scripts.

Positional arguments:

  1.     subject: IRI of the subject
  2.     predicate: IRI of the property to be evaluated for this subject
  3.     scriptLiteral: a literal containing the script to be executed
  4.     args…: optional parameter value pairs, for example:
         • start, "…"^^xsd:dateTime
         • end, "…"^^xsd:dateTime
         • aggregate, "Instant"|"Average"|"Maximum"|"Minimum"|"Totalized"

Example SPARQL

PREFIX olgap: <http://inova8.com/olgap/>
PREFIX script: <http://inova8.com/calc2graph/def/script/>
SELECT  ?result 
WHERE {
   BIND(
      "var result= $this.getFact(\"http://inova8.com/calc2graph/def/volumeFlow\",$customQueryOptions).floatValue()* $this.getFact(\"http://inova8.com/calc2graph/def/density\",$customQueryOptions).floatValue();  
      result;"^^script:groovy 
   as ?scriptLiteral)
   BIND( 
      olgap:factDebug(
         <http://inova8.com/calc2graph/id/BatteryLimit1> , <http://inova8.com/calc2graph/def/density>,
         ?scriptLiteral
         ,'aggregate','Instant'
         ,'start','2010-08-01T00:00:00.000000000+00:00'^^xsd:dateTime
      ) 
      as ?result )
}

Note that the additional arguments are optional.

ClearCache()

Returns: true, having cleared the cache of calculated values.

Example SPARQL

select ?clearCache
{
   BIND(:clearCache() as ?clearCache)
}

SCRIPT Languages

Any Java 9 supported language can be used simply by making the corresponding language JAR available.

By default, the JavaScript, Groovy, and Python JARs are installed. The complete list of compliant languages is as follows:

AWK, BeanShell, ejs, FreeMarker, Groovy, Jaskell, Java, JavaScript, JavaScript (Web Browser), Jelly, JEP, Jexl, jst, JudoScript, JUEL, OGNL, Pnuts, Python, Ruby, Scheme, Sleep, Tcl, Velocity, XPath, XSLT, JavaFX Script, ABCL, AppleScript, Bex script, OCaml Scripting Project, PHP, Python, Smalltalk, CajuScript, MathEclipse

The following synonyms for JavaScript, Groovy, and Python are recognized:

python, jython, groovy, Groovy, nashorn, Nashorn, js, JS, JavaScript, javascript, ECMAScript,  ecmascript.

Example Scripts

The following examples illustrate the IntelligentGraph scripts used to perform the plant analysis example.

Return Scalar Value

40;

Get Related Property Value

$this.getFact(":testProperty2").doubleValue()

Calculate Mass Flow

var result= $this.getFact(":volumeFlow").floatValue()* $this.getFact(":density>").floatValue();  
result;

Calculate Mass Throughput

var massThroughput=0.0; 
for(stream in $this.getFacts(":hasProductStream")){
   massThroughput += stream.getFact(":massFlow").doubleValue()
}; 
massThroughput;

Calculate Mass Yield

var result= $this.getFact(":massFlow").floatValue()/ $this.getFact("^:hasStream/:massThroughput").floatValue();  
result;

Calculate Mass Balance

var massFlowBalance=0.0; 
for(stream in $this.getFacts(":hasFeedStream")){
    massFlowBalance += stream.getFact(":massFlow").doubleValue()
}; 
for(stream in $this.getFacts(":hasProductStream")){
    massFlowBalance -= stream.getFact(":massFlow").doubleValue()
}; 
massFlowBalance;

Script Context Variables

In addition, each script has access to the following predefined variables that allow the script to access the context within which it is being run.

$this, a Thing corresponding to the subject of the triples for which the script is the object. Since this is available, helper functions are provided (described below) to navigate edges to or from this ‘thing’.

$property, a Thing corresponding to the predicate or property of the triples for which the script is the object.

$customQueryOptions, a HashMap<String, Value> of name/value pairs corresponding to the pairs of additional arguments to the SPARQL extension function. These are useful for passing application-specific parameters.

$builder, an RDF4J graph builder object allowing a graph to be constructed (and manipulated) within the script. A graph cannot be returned from a SPARQL function; however, the IRI of the graph can be returned, and any graph created by a script will be persisted.

$tripleSource, the RDF4J TripleSource to which the subject, predicate, and object of the triple belong.
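
As a minimal sketch (with illustrative property names), a Groovy script might combine these context variables as follows:

// Groovy sketch: $this and $customQueryOptions are bound by IntelligentGraph before the script is evaluated
def volumeFlow = $this.getFact(":volumeFlow", $customQueryOptions).floatValue()
def density    = $this.getFact(":density", $customQueryOptions).floatValue()
// optional argument pairs passed to the SPARQL function, e.g. 'aggregate','Average', are available by name
def aggregate  = $customQueryOptions?.get("aggregate")
volumeFlow * density;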

Path Navigation Functions

A spreadsheet’s secret sauce is the ability of a cell formula to access the values of other cells, either individually or as a set. IntelligentGraph provides this functionality with several methods associated with Thing, which are applicable to the $this Thing initialized for each script with the subject of the triple.

Thing.getFact(String pathPattern) returns Value

Returns the value of the node referenced by the pathPattern. For example, ":volumeFlow" returns the object value of the :volumeFlow edge relative to the $this node. The pathPattern allows for more complex path navigation.

Thing.getFacts(String pathPattern) returns Values

Returns the values of the nodes referenced by the pathPattern. For example, ":hasProductStream" returns an iterator over all object values of the :hasProductStream edge relative to the $this node. The pathPattern allows for more complex path navigation.

Thing.getThing(String subjectIRI) returns Thing

Returns the Thing (node) identified by the given IRI.

Thing.prefix(String prefix, String IRI) returns Thing

Registers a prefix for use in any pathPattern. It returns $this, which allows chaining.
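
For example (the prefix name is illustrative; the namespace IRI is the one used in the examples above), a script can register a prefix and then navigate with it in one chained expression:

// Groovy sketch: register a namespace prefix, then use it in a pathPattern
$this.prefix("calc", "http://inova8.com/calc2graph/def/")
     .getFact("calc:volumeFlow")
     .floatValue();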

Path Patterns

Spreadsheets are not limited to accessing just adjacent cells; neither is the IntelligentGraph. PathPatterns provide a powerful way of navigating from one Thing node to others. PathPatterns are inspired by SPARQL and propertyPaths, but a richer, more expressive, pathPattern was required for the IntelligentGraph.

Examples

Examples of pathPatterns are as follows:

$this.getFact(":hasParent")

will return the first parent of $this.

$this.getFact("^:hasParent")

will return the first child of $this.

$this.getFacts(":hasParent/:hasParent")

will return the grandparents of $this.

$this.getFacts(":hasParent/^:hasParent")

will return the siblings of $this.

$this.getFacts(":hasParent[:gender :female]/:hasParent")

will return the maternal grandparents of $this.

$this.getFacts(":hasParent[:gender :female]/:hasParent[:gender :male]")

will return the maternal grandfather of $this.

$this.getFacts(":hasParent[:gender [ rdfs:label 'female']]")

will return the mother of $this, but using the label instead of the IRI.

$this.getFacts(":hasParent[eq :Peter]/:hasParent[:gender :male]")

will return the grandfather of $this, who is the parent of :Peter.

$this.getFacts(":hasParent[ne :Peter]/:hasParent[:gender :male]")

will return grandfathers of $this, who are not the parent of :Peter.

The following diagram visualizes a path through a genealogical graph, from $this to find the parents of a maternal grandfather born in Maidstone:

$this.getFacts("/:parent[:gender :female]/:parent[:gender :male, :birthplace [rdfs:label 'Maidstone']]/:parent")

 

Figure 4: PathPatternQL Example

How Is Performance?

IntelligentGraph takes the following actions to improve performance:

  1. All intermediate calculation results are cached, keyed by the subjectNode, predicate, and customQueryOptions.
  2. The cache can be cleared using the SPARQL function ClearCache.
  3. The SPARQL function ObjectValue takes as its argument the subject, predicate and objectValue. If the objectValue supplied is not of script datatype, the function will immediately return the objectValue.
  4. Circular functions, in which A calls B calls A, are detected and rejected.

Can I Debug?

Since IntelligentGraph combines calculations with the knowledge graph, it is inevitable that any evaluation will involve calls to values of other nodes which are in turn calculations. For this reason, several of the SPARQL functions support tracing and debugging:

 

Figure 5: Tracing a Calculation

How Do I Add Intelligence to my RDFGraph?

Download

The project is located on GitHub; the olgap-0.0.1-jar-with-dependencies.jar can be downloaded from here:

The plain olgap.jar does not include the scripting-language dependencies, so to use it you would have to be certain that all dependencies are already available.

Install

OLGAP will work only with RDF4J version 3.3.0 and above.

Copy olgap-0.0.1-jar-with-dependencies.jar to /usr/local/tomcat/webapps/rdf4j-server/WEB-INF/lib/olgap.jar

The RDF4J server will need to be restarted for it to recognize this new JAR and initiate the scripting engine.

[1] In this case the script uses Groovy, but any Java 9 compliant scripting language can be used, such as JavaScript, Python, Ruby, and many more.


A key process manufacturing problem yet to be solved is the management of knowledge: the know-how, know-who, and know-when. Just as we have been eliminating data silos by introducing common data repositories, and creating common information (semantic) services by adding structure to the data, we need a common rules repository rather than having rules distributed in Excel spreadsheets, workflows, documents, government regulations, and so on. That distribution causes silos of rules, lack of consistency, difficulty in ensuring consistent compliance, and many more problems.

Semantic models (RDF, RDFS, and OWL) are excellent at federating information from multiple sources, although there are competing approaches for information integration. However, semantic models are also the perfect way to express rules (for example SPIN), because the rules are described as data within the same database, unlike other technologies where the rules have to be expressed as code.

Given the importance of consistent application of rules in the process manufacturing business, is this then the sweet spot for semantic technologies?

Is data transparent?

We have been struggling with managing knowledge about our process plants for many years. We tried to solve this problem starting in the 80s with sophisticated data collection applications and real-time databases. However, a data-centric solution alone only provides a view of the process plant as a very long list of measurement tags, reinforcing the definition of data as “discrete, objective facts or observations, which are unorganized and unprocessed and therefore have no meaning or value because of lack of context and interpretation.”

Thus our view of the knowledge about the plant, its equipment, its operation, and its performance provides a new definition of ‘transparency’, or lack thereof. If we want to obtain knowledge, we must make use of application programs within which our knowledge-extraction rules are encoded. Most prevalent is the ubiquitous spreadsheet, in which there are:

  • Object names written into cells
  • RTDB Tag names in hidden cells
  • A fixed number of feed/product rows
  • New spreadsheet for each unit/report

The consequence of this data-without-information is a ‘gaggle’ of spreadsheets: information is encoded in the tag names; knowledge is encoded in the Excel layout and formulae; action relies on a user running the application and using their experience to detect problems and deduce remedial actions; there is uncertainty as to the consistent application of the rules throughout the organization; and many more problems.

Data + Model = Information

The problems of the data-centric approach have driven the process manufacturing industries to seek better information management where information is defined as “organized or structured data, which has been processed in such a way that the information now has relevance for a specific purpose or context, and is therefore meaningful, valuable, useful and relevant.” Recognizing the deficiency of a data-only approach, much effort in the last 20 years has been expended on adding context to this data. This usually takes the form of a database schema which contextualizes the data within a model of the process business and plant.

One of the ways that structure is added to data is to use a relational data model. A cornerstone of the relational model is the use of referential integrity rules. An example of a referential integrity rule within the context of a process plant data model might be that a material movement must have one, and only one, source and destination. However there are limits to what rules can be expressed with referential integrity constraints alone.
To obtain knowledge using this information approach we must ask the database via a database query. The advantage of the information approach over the data approach is that it allows one to ask the database complex questions. For example, to obtain a unit’s material balance:

  • For selected unit
    • Find all feed streams
    • For each feed stream fetch desired criteria

This allows for the use of one report for all units throughout the enterprise, compared with a new spreadsheet for every unit. These reports are also more robust since, for example, if the plant changes in some way then the report will reflect those changes. This certainly tackles some of the inconsistency problems faced by a data-centric approach; however, many problems remain. For example, it would be exceedingly difficult to express the rule that an operator’s training currency expires, say, 2 years after their last training. Thus we still need to resort to complex reports, application programs, Excel macros, etc., into which we can encode our ‘rules’ about the information.

Information + Rules = Knowledge

As good as an information-centric approach may be, it still fails to solve many of the business problems that we face that can only be solved by creating knowledge, defined as “know-how, know-who, and know-when; knowledge is action, not a description of action”. Some of the business challenges that can only be solved by a rule-centric solution are shown below.

Business Rule Challenges

Each challenge below states the business issue, the problem it presents, and the advantages of a rule-centric solution.

  • Business rules are distributed throughout the business
    • Problem: Very difficult to know all of the business rules in place, whether they are duplicated, and whether they are consistently applied.
    • Rule-centric solution advantages: Provide a common ‘rules repository’ for the entire organization; this parallels the concept of ‘data warehousing’ of data.
  • Many Excel spreadsheets containing knowledge of how to handle information
    • Problem: Data management is left to IT or the DBA. End users cannot modify the model or *their* data; instead they resort to Excel on the path to hell!
    • Rule-centric solution advantages: Use a rules server to consistently apply the semantic rules that are then common to all applications; use Excel for reporting against data, information, and the inferred results of rules.
  • Custom reports written against custom information models have encoded rules
    • Problem: Reporting languages such as SQL end up containing many business rules; rules are duplicated in similar but not identical reports.
    • Rule-centric solution advantages: Report against the information server and the inferred results of applying the rules to the information.
  • Manual work processes, some of which are not documented
    • Problem: Exercises such as ISO 9001 remain only as documented procedures with no means of automating those procedures.
    • Rule-centric solution advantages: Use the common rules repository to define the rules; documentation of the rules can then be created from the rules repository.
  • Regulatory compliance rules exist only as documents
    • Problem: Difficult to assure compliance with regulations when it is left to individuals who must be familiar with the entire regulation.
    • Rule-centric solution advantages: Translate regulations into rules deposited in the rules repository.
  • Loss of skills with an aging workforce
    • Problem: Loss of knowledge in the form of the rules (aka experience) as to how to handle situations.
    • Rule-centric solution advantages: Capture the experience as rules within the rules repository.
  • Difficult to audit the adherence to rules
    • Problem: Since the rules are not formalized, it is difficult to ensure that procedures are followed. Personnel might be trained on procedures, but if the system does not enforce them then management cannot be certain they are followed.
    • Rule-centric solution advantages: Since most actions are recorded, it is possible to verify that the actions taken comply with the rules, even if users were not forced to follow the rules in the form of a workflow.
  • Complexity of data and information
    • Problem: Difficult for users to determine which rules should apply.
    • Rule-centric solution advantages: The results of rules become inferred information that is available for reporting.
  • Impossible end-user reporting
    • Problem: The Holy Grail, but never achieved. Even with a good informational model that provides context to the data, the ‘knowledge’ that should be the result of a report (‘poor yields’, ‘excessive material loss’, ‘pending equipment failure’) is impossible for end users to encode into their reports.
    • Rule-centric solution advantages: Semantic information can be reported using ‘query by example’, far easier than any other reporting; the inferred results of rules are available for reporting using the same technique.
  • Knowledge, defined as “know-how, know-who, and know-when”, requires rules about information, which requires contextualized data
    • Problem: Measurement tags disguise the model, and users are forced to interact with abstract measurement tags (10FI107.OP). General information models become too complex. Customers desire to support standards, but competing standards are supported by different constituencies: PPDMA/ProdML/WitsML/ISA95/MIMOSA/IEC-CIM etc.
    • Rule-centric solution advantages: Although rules can be applied to any informational model, a semantic informational model is a better match to semantic rules.

To obtain knowledge from a data-centric approach we encoded many rules into applications such as Excel. Although the information-centric approach can encode certain types of rules into the database schema, such as the referential integrity examples above, there are many rules about the business that cannot be expressed in this way. Throughout any business we have many rules distributed across spreadsheets, reports, application programs, workflows, procedures, etc. Below are examples of rules that can be found throughout Process Manufacturing.

Example operational compliance rules throughout the Process Manufacturing Model (PMM)

The rules fall into four categories:

  • Validation: validating that the information is consistent with known rules, such as that a movement must have a source and a destination.
  • Calculation: calculating another (data) statement, such as the power of a pump being the product of the flow and pressure rise.
  • Deduction: deducing additional (object) statements, such as knowing that the measurement of something downstream is the same as the upstream measurement.
  • Invocation: invoking an external process to ensure the correct action is taken based upon the change of a rule.

Examples across the business areas:

  • Materiel Chain / Finance: (no examples listed)
  • Safety:
    • Validation: Validate that equipment in use has a valid HAZOP assessment.
    • Invocation: Initiate the safety review work process after an incident; initiate the HAZOP review whenever major equipment is changed.
  • Docs:
    • Invocation: Initiate review of encoded rules when a document containing rules is revised.
  • Assets / HME:
    • Validation: Validate that the equipment has undergone appropriate repair or upgrade as recommended.
    • Calculation: Calculate Overall Operational Efficiency (OEE) based on availability, planned, and actual.
    • Deduction: Deduce the onset of increased operational risk based on past observations and planned use of equipment.
    • Invocation: Initiate the maintenance or repair process.
  • Fixed Plant:
    • Calculation: Provide efficiency and energy consumption calculations based on data and model.
    • Deduction: Deduce links to MSDS, maintenance records, maintenance procedures, and other documents.
  • Technical Infrastructure:
    • Validation: Validate that critical equipment has a valid security policy; validate that users with access to critical equipment have valid training.
    • Deduction: Deduce connectivity between critical equipment through the LAN.
    • Invocation: Initiate remedial action to update software and utilities when a risk is identified.
  • Security:
    • Validation: Validate that a user has the correct access privileges to perform an action.
    • Calculation: Calculate the currency of any user's privileges to perform an action.
    • Deduction: Deduce that the building containing critical equipment has secure access controls.
    • Invocation: Initiate a security review when equipment is moved to a new location.
  • People:
    • Validation: Validate the correct training status of individuals.
    • Calculation: Calculate the time remaining for the currency of their training.
    • Deduction: Deduce what assets and facilities an individual has based on their training.
    • Invocation: Initiate a retraining program when retraining is deduced to be necessary.
  • Consumables:
    • Validation: Validate that the inventory of consumables matches the measured consumption.
    • Deduction: Deduce the route of consumable materials (additives etc) into the product stream based on the topology so that the costs can be correctly calculated.
  • Raw Materials: (no examples listed)
  • Utilities:
    • Calculation: Calculate the quantities of utilities in the absence of complete measurements.
    • Deduction: Deduce the route of utilities (water, electricity, fuel etc) into the production facilities based on the topology so that the costs can be correctly assigned.
  • Emissions:
    • Validation: Validate that no measured emissions are exceeding regulations.
    • Calculation: Calculate emissions that are not directly measured; calculate total emissions.
    • Deduction: Deduce the flow of regulated material from the plant topology.
  • Business Procedures:
    • Validation: Validate that the correct procedures are being followed.
    • Deduction: Deduce which business processes should be applied in particular situations; for example, if an area is designated as secure, then all processes applied to sub-areas must follow that same designated work process.
    • Invocation: Initiate a process to update work processes when deviations from the recommended work processes are detected.
  • Intellectual Property: (no examples listed)
  • Product / Resource:
    • Validation: Validate that there are valid exploration rights associated with options.
    • Calculation: Calculate the time remaining to take advantage of exploration rights.
    • Invocation: Initiate review of exploration rights prior to their expiration.
  • Field:
    • Validation: Validate that the field has active contracts.
    • Calculation: Calculate the royalty payments based on the individual contracts.
    • Deduction: Deduce the applicable contract rules.
    • Invocation: Initiate contract reviews and payment processes.
  • Well:
    • Validation: Validate that each well has an active contract; validate that each well is operating in accordance with its operating permits.
    • Calculation: Calculate the actual flow based on pressure and temperature (in the absence of a flow measurement); calculate variables required for regulatory reporting.
    • Deduction: Infer the line-up between well and receiving station based on the topology of lines.
    • Invocation: Initiate transmission of regulatory reporting requirements.
  • Pipeline:
    • Validation: Validate that the nomination and routing information is complete: source and destination, quantity, and quality.
  • Crude Storage:
    • Validation: Validate that a new crude from the pipeline is not being run into the incorrect storage.
    • Calculation: Calculate the overall assay of the inventory based on component assays.
    • Deduction: Deduce the assay available at the crude unit based on the line-up of the crude tanks to units.
    • Invocation: Initiate rerouting of incoming crude to more appropriate storage; initiate crude switchover on the crude unit based on the assay of the new crude tank.
  • Processing:
    • Validation: Validate that the configured mode of operation matches the planned or scheduled mode of operation.
    • Calculation: Calculate material balances, yields, qualities, and efficiencies.
    • Deduction: Deduce measurements of downstream elements based on the operational configuration and knowledge of the location of actual measurements; deduce the operational configuration and flow model of the plant given the material movements and battery-limit flows.
    • Invocation: Initiate a workflow to switch modes of operation.
  • Storage:
    • Validation: Check to ensure that material planned to flow into storage is compatible with the in-store material.
    • Calculation: Calculate the actual contents of the storage.
    • Deduction: Deduce the grade of the stored material based on existing stored material and inbound movements.
    • Invocation: Invoke rescheduling of blends when an actual blend is found to be out of specification.
  • Pipeline:
    • Validation: Validate that material is not planned for a line that would contaminate the contents of the line or the planned material.
    • Calculation: Calculate the material movement based on either source or destination quantity measurements.
    • Invocation: Invoke a custody transfer dispute when a transfer is outside of the acceptable measurement deviation.
  • Port Storage:
    • Validation: Validate that the quantity is available for planned shipments.
    • Calculation: Calculate the inventory remaining after current shipment commitments.
    • Deduction: Deduce the grade of material based on the mixed assay of the storage or stockpile.
    • Invocation: Initiate pull-through of more inventory when commitments exceed current inventory and planned receipts.
  • Shipping:
    • Validation: Check that the vessel is compatible with the scheduled berth.
    • Calculation: Calculate demurrage charges based on agreed rules.
    • Deduction: Deduce the stored material destination based on the vessel berth.
    • Invocation: Initiate a loading re-schedule in the event that a vessel is delayed.
  • Customer:
    • Validation: Validate that the customer order has a valid contract upon which the transfer can be based.

 

These rules share the characteristics that caused us to resolve the original data-silo problem. A simple example is that we would want to calculate the corrected custody transfer quantities for both operational and financial needs. We can also observe that rules span multiple business areas. For example, the currency of an operator’s privileges spans the training records, access control to the building housing the equipment, maintenance records of the equipment, and more. Finally, we do not want the rules to be passive. Instead we want any deviation from the rules to initiate, or at least recommend, the correct remedial action. Thus we want our rules and information to be combined to achieve active knowledge, as shown below.

Knowledge + Action = Results

Even if we have the perfect set of rules, they have no business value unless we act upon the know-how, know-when, and know-who. Thus it is important to close the business loop by taking action on the knowledge to produce the desired results; no action, then no results. This means that the knowledge must have a mechanism for invoking the remediation process.

Realizing a rule-centric solution

How is it possible to abstract the handling of rules away from the individual applications into which they are encoded? Our recommendation is that there should be a separate rules repository: a container that defines all of the rules. Although there are several candidates for describing rules, one favored choice would be to define the rules semantically using something like SPARQL Rules or SPIN. Since information is conveniently modeled semantically, it then makes sense to harmonize the technologies and use the same approach for the rules repository. The complete rules-centric architecture is shown below.

 

Data:                Raw measurements collected from the instruments and data entry, stored in real-time databases and historians.

Technology:      Real-time historians

Model:             Context and structure added to the data to create information. It takes the form of a database schema in the case of a relational model, or an ontology in the case of a semantic model, plus the configuration that represents the plant: equipment, topology, etc.

Technology:      Relational schema, object structure or semantic ontology

Information:     The combination of data and model manifested as a database, relational, object, or semantic.

Technology:      Relational, object or semantic data store

Rules:               A repository of the rules. Traditional information system architectures fail to separate this as a distinct element; instead the rules are distributed throughout the application systems. We propose that all rules should be held in a common repository, just like data and information. This repository should be able to handle all rules: validations, calculations, deductions, and invocations. The best choice for organizing such a repository is semantic, as this allows both information and rules to share the same technology.

Technology:      Semantic rules data store

Knowledge:     The combination of information and rules, manifested as an inference engine capable of executing the rules. However, it is unrealistic to expect rules to be executed only within the inference engine, so rules within spreadsheets, workflows, applications, and calculation engines should be synchronized with the rules repository.

Technology:      Rules inference engine together with synchronization interfaces

Action:             The actions invoked by the knowledge manifested as a workflow engine capable of invoking external actions via web-service interfaces.

Technology:      Workflow or temporal rules engine.

Visualize:         A portal through which the data, information, knowledge, and actions can be presented, as well as through which the model and rules can be configured.

Technology:      Portal, preferably one whose presentation is semantically deduced from the action, knowledge, information, and data

Control:            Either part of the visualization portal or a separate application through which the users can execute control based on the actions.

Technology:      Conventional control system interface through which users can control the plant.

Is it feasible?

One of the key architectural elements is the management of rules as data, along with the closely related model and action elements. Outside of the process manufacturing industry, and especially in finance and insurance, rules engines have been in use for some time, so there are several vendors, shown below. Of particular interest are TopQuadrant, the sponsor of SPARQL Rules or SPIN, which provides a standards-based way to define rules and constraints for Semantic Web data, and OntoRule, an EU project that brings together leading vendors of knowledge-based systems and a handful of top research institutions to develop the technology that will empower business policies in the enterprise of the future.

  • Corticon: Decision table or rulesheet-centric business rules management system
  • FICO Blaze Advisor: General-purpose business rules management system with .Net, Java, and COBOL deployment
  • IDIOM: Decision-centric business rules management system
  • IBM ILOG Rules: General-purpose business rules management system with .Net, Java, and COBOL deployment
  • InRule: .Net-based business rules management system
  • JBoss Drools/JBoss Enterprise BRMS: Open-source business rules management system that is working on updating its Decision Tables
  • Modellica: European business rules management system focused on the credit-risk business, available in the US through GDS Link
  • OntoRule: Leading vendors of knowledge-based systems and a handful of top research institutions join their efforts to develop the technology that will empower business policies in the enterprise of the future
  • OpenRules Decision Management System: Open-source Excel-based business rules management system
  • Pegasystems: A unified business rules and process management environment, now including the Chordiant decision management products
  • Progress BRMS: Drools-based business rules management system acquired with Savvion
  • Sparkling Logic: A “social-logic” platform for managing business rules
  • TopQuadrant: TopBraid Suite™ leverages emerging technology to help customers connect silos of data, systems, and infrastructure and to build flexible applications from linked data models. SPIN is a standards-based way to define rules and constraints for Semantic Web data
  • Visual Rules: Java-based business rules management system from Bosch Innovations
  • XpertRule: XpertRule develops advanced Business Rules Management and Expert System software that helps organizations:
    • Capture expertise and skills in risk assessment, advising, and performance improvement as well as in selling and supporting both products and services.
    • Comply with regulations, policies, laws and legislation.
    • Automate process orchestration both for intelligent front-end user interface navigation, back-end process flow and data interchange.
  • Zementis: A cloud-based execution platform for business rules and analytic models