Providing answers to users’ analysis, search, visualization, and other questions about their own data

Creating an overall solution that presents data in a useful way can be challenging, but OData2SPARQL and Lens2OData solve this.

  • RDF-Graph: Data + Model = Information, allowing us to combine your raw data with an adaptable model to create meaningful information.
  • OData2SPARQL: Information + Rules = Knowledge, providing the ability to access that information, combined with additional rules (SPIN and SHACL), to deliver useful knowledge that can be consumed by applications and BI tools.
  • Lens2OData: Knowledge + Action = Results, allowing users to easily navigate, search, explore, and visualize this knowledge in such a way that it is easy to take action and produce results.

To see this all in action we have prepared a demonstrator that can be downloaded, together with the following videos illustrating its capabilities.

  • Explore the provenance of data sources, which is retained by RDF-graph and OData2SPARQL rather than lost in typical ETL processing.

  • Explore the Transport for London train lines, stations, and zones, illustrating how easy it is to transform any dataset to RDF-Graph and immediately get the benefits of OData access and the Lens UI/UX.

To download and run this demonstrator, go to Docker Hub here.

Integration Problem to be solved

  • Data resides in different databases, and even in Linked Open Data sources.
  • Misaligned models: different datasets have different meanings for classes and predicates that need to be aligned.
  • Misaligned names for the same concepts.
  • Replication is problematic.
  • Query definition and the scope of querying are difficult to define in advance.
  • Provenance of data is necessary.
  • Cannot depend on inferences being available in advance.
  • A scalable architecture requires that all queries are stateless.

Data Cathedrals versus Information Shopping Bazaars

Linked Open Data has been growing since 2007 from a few (12) interconnected datasets to 295 as of 2011, and it continues to grow. To quote “Linked Data is about using the Web to connect related data that wasn’t previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” (Linked Data, n.d.) 

Figure 1: Growth of the Linked Data ‘Cloud’

As impressive as the growth of interconnected datasets is, what is more important is the value of that interconnected data. A corollary of Metcalfe’s law suggests that the benefit gained from integrated information grows geometrically[1] with the number of data communities that are integrated.

Many organizations have their own icebergs of information: operations, sales, marketing, personnel, legal, finance, research, maintenance, CRM, document vaults etc. (Lawrence, 2012) Over the years there have been various attempts to melt the boundaries between these icebergs including the creation of the mother-of-all databases that houses (or replicates) all information or the replacement of disparate applications with their own database with a mother-of-all application that eliminates the separate databases. Neither of these has really succeeded in unifying any or all data within an organization. (Lawrence, Data cathedrals versus information bazaars?, 2012). The result is a ‘Data Cathedral’ through which users have no way to navigate to find the information that will answer their questions.

Figure 2: Users have no way to navigate through the Enterprise’s Data Cathedral

Remediator at the heart of Linked Enterprise Data

Can we create an information shopping bazaar for users to answer their questions without committing heresy in the Data Cathedral? Can we create within the enterprise the same information shopping bazaar that Linked Data provides: Linked Enterprise Data (LED)? That is the objective of Remediator.

First of all we must recognize that the enterprise will have many structured, aggregated, and unstructured data stores already in place:

Figure 3: Enterprise Structured, Aggregated, and Unstructured Data Icebergs

One of the keys to the ability of Linked Data to interlink 300+ datasets is that they are all expressed as RDF. The enterprise does not have the luxury of replicating all existing data into RDF datasets. However that is not necessary (although still sometimes desirable) because there are adapters that can make any existing dataset look as if it contains RDF, and can be accessed via a SPARQLEndpoint. Examples are listed below, followed by a sketch of how such an endpoint can be queried.

  1. D2RQ (D2RQ: Accessing Relational Databases as Virtual RDF Graphs)
  2. Ultrawrap (Research in Bioinformatics and Semantic Web/Ultrawrap)
  3. Ontop (-ontop- is a platform to query databases as Virtual RDF Graphs using SPARQL)
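For illustration, once a source is wrapped by one of these adapters it can be queried like any other SPARQL endpoint; the adapter rewrites the graph pattern into queries against the underlying store. This is a minimal sketch only: the endpoint URL and the ex:Customer/ex:hasName vocabulary are hypothetical, since the actual terms depend on the mapping the adapter generates from the source schema.

```sparql
# A sketch: the endpoint URL and vocabulary (ex:Customer, ex:hasName) are hypothetical.
# An adapter such as D2RQ or Ontop rewrites this graph pattern into SQL against the wrapped database.
PREFIX ex: <http://example.org/vocab#>

SELECT ?customer ?name
WHERE {
  ?customer a ex:Customer ;
            ex:hasName ?name .
}
LIMIT 10
```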

Attaching these adapters to existing data-stores, or replicating existing data into a triple store, takes us one step further to the Linked Enterprise Data:

Figure 4: Enterprise Data Cloud, the first step to integration

Of course now that we have harmonized the data all as RDF accessible via a SPARQLEndpoint we can view this as an extension of the Linked Data cloud in which we provide enterprises users access to both enterprise and public data:

Figure 5: Enterprise Data Cloud and Linked Data cloud

We are now closer to the information shopping bazaar, since users would, given appropriate discovery and search user interfaces, be able to navigate their own way through this data cloud. However, despite the harmonization of the data into RDF, we still have not provided a means for users to ask new questions:

What Company (and their fiscal structure) are we working with that have a Business Practise of type Maintenance for the target industry of Oil and Gas with a supporting technology based on Vendor-Application and this Application is not similar to any of our Application?

Such questions require pulling information from many different sources within an organization. Even with the Enterprise Data Cloud we have not yet provided the capability to discover such answers. Would it not be better to allow a user to ask such a question, and let the Linked Enterprise Data determine from where it should pull partial answers which it can then aggregate into the complete answer to the question? It is like asking a team of people to answer a complex question, each contributing their own part, and then assembling the overall answer rather than relying on a single guru. Remediator plays the role of that team, taking parts of the question and asking each part of the appropriate data-sources.

Figure 6: Remediator as the Common Entry Point to Linked Enterprise Data (LED)

Thus our question can become:

  1. What Business Practise of type Maintenance for the target industry of Oil and Gas?
  2. What Company are we working with?
  3. What Company have a Business Practise of type Maintenance?
  4. What Business Practise with a supporting technology based on Vendor-Application?
  5. What Company (and their fiscal structure)?
  6. What Vendor-Application and this Application is not similar to any of our Application?

This decomposition of a question into sub-questions relevant to each dataset is automated by Remediator:

Figure 7: Sub-Questions distributed to datasets for answers
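To make the decomposition concrete, the sketch below shows how such a question could be written as a single federated SPARQL query, with each SERVICE block corresponding to one of the sub-questions above. The endpoint URLs and the crm:/bp: vocabulary are hypothetical; Remediator performs this decomposition automatically rather than requiring it to be hand-written.

```sparql
# A hand-written sketch of what Remediator effectively does automatically.
# Endpoints and vocabulary (crm:, bp:) are illustrative assumptions.
PREFIX crm: <http://example.org/crm#>
PREFIX bp:  <http://example.org/businesspractise#>

SELECT ?company ?practise ?application
WHERE {
  SERVICE <http://example.org/crm/sparql> {          # "What Company are we working with?"
    ?company a crm:Company ;
             crm:workingWithUs true .
  }
  SERVICE <http://example.org/practises/sparql> {    # "What Business Practise of type Maintenance
    ?company bp:hasBusinessPractise ?practise .      #  for the target industry of Oil and Gas?"
    ?practise bp:type bp:Maintenance ;
              bp:targetIndustry bp:OilAndGas ;
              bp:supportingTechnology ?application . # "...with a supporting technology based on Vendor-Application"
  }
}
```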

Requirements for a Linked Enterprise Data Architecture

  • Keep it simple
  • Do not re-invent that which already exists.
  • Eliminate replication where possible.
  • Avoid the need for prior inferencing.
  • Efficient query performance.
  • Provide provenance of results.
  • Provide optional caching for further slicing and dicing of result-set.
  • Use VoID, only VoID, and nothing but VoID, to drive the query (see the sketch below).
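As a sketch of that last requirement, a VoID description of each dataset (its void:sparqlEndpoint and its class and property partitions) is enough to decide which datasets can contribute to which sub-question. The query below is illustrative only: the VoID predicates are standard, but the bp: class and property being looked up are hypothetical, and Remediator’s actual source-selection logic is not spelled out here.

```sparql
# Given VoID descriptions of the candidate datasets, find the endpoints
# that declare the class and property a sub-question needs.
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX bp:   <http://example.org/businesspractise#>

SELECT ?dataset ?endpoint
WHERE {
  ?dataset a void:Dataset ;
           void:sparqlEndpoint ?endpoint ;
           void:classPartition    [ void:class    bp:BusinessPractise ] ;
           void:propertyPartition [ void:property bp:targetIndustry ] .
}
```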

[1] If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, where K is some constant. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc.), then the benefits increase to 10 * K * 2. If I integrate in threes (operations + accounting + maintenance, accounting + payroll + receiving, etc.), then the benefits increase four-fold (a corollary of Metcalfe’s law) to 10 * K * 4. For quad-wise integration my benefits would be 10 * K * 8, and so on. Now it might not be 8-fold, but the point is that there is geometric, not linear, growth in benefits as I integrate all of my information across my organization.
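Stated compactly, and only as a restatement of the footnote’s own arithmetic (N systems integrated in groups of size n, with per-system constant K):

```latex
B(n) = N \, K \, 2^{\,n-1}
\qquad\text{e.g. } N = 10:\; B(1)=10K,\; B(2)=20K,\; B(3)=40K,\; B(4)=80K
```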


Answering complex queries with easy-to-use graphical interface

The objectives of lens2odata are to provide a simple method of OData query construction driven by the metadata provided by OData services:

  • Provides metamodel-driven OData query construction
    • Eliminates any configuration required to expose any OData service to lens2odata
  • Allows searches to be saved and rerun
    • Allows ease of use by casual users
  • Allows queries to be pinned to ‘Lens’ dashboard panels
    • Provides simple-to-use dashboard
  • Searches can be parameterized
    • Allows for easy configuration of queries
  • Compatible with odata2sparql, a service that exposes any triple store as an OData service
    • Provides a Query-Answering-over-Linked-Data (QALD) interface to any linked data (see the sketch after this list).
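As a rough sketch of the odata2sparql pairing, an OData request composed in lens2odata is answered as a SPARQL graph pattern over the underlying triple store; broadly, classes surface as OData entity sets and properties as OData properties. The URL layout and the :Product/:name/:price vocabulary below are assumptions for illustration, not the service’s documented output.

```sparql
# Illustrative only: how an OData request such as
#   GET http://<server>/odata2sparql/<model>/Products?$select=name,price&$filter=price gt 100
# could be answered as a SPARQL graph pattern over the triple store.
PREFIX : <http://example.org/model#>

SELECT ?product ?name ?price
WHERE {
  ?product a :Product ;        # entity set "Products" ~ the class :Product
           :name  ?name ;      # $select properties map to triple patterns
           :price ?price .
  FILTER (?price > 100)        # $filter maps to a SPARQL FILTER
}
```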

Concept of Operation

Lens2odata consists of 3 primary pages with which users interact:

  1. Query: is the page in which users can
    1. add new OData services,
    2. compose queries,
    3. save those queries for reuse, and
    4. pin the queries as result fragments on a Lens
  2. Search: is the page in which users can
    1. select an existing query, and execute that query to explore the results
    2. from where they can navigate to Lens pages for specific entities or collections of entities
  3. Lens: are the pages, composed by users, which display fragments of details, optionally grouped into tabs, about a specific entity or collections of entities. Fragments can either be forms or tables. Other fragment layouts are being added.

Navigation between these pages is shown in the diagram below. Specifically these navigation paths are:

  1. Toggle between Search and Query to explore how the results would appear to a casual user
  2. Navigate to a concept’s Lens from Query preview hyperlinks
  3. Navigate to a concept’s Lens from Search results’ hyperlinks

Figure 1: lens2odata Navigation

Quick Start

Log in to lens2odata

  1. Navigate to the Url provided by your administrator for lens2odata, http://<server>/lens2odata
  2. Enter your user name and password.
  3. Since no service has been previously set up, you will be prompted to enter the service’s display name and Url. Check ‘use default proxy’ if the service is not local.
  4. After ‘Save’, as long as your service was validated, you will immediately enter the Query page with a new query initialized with the first collection found in the service.
  5. Execute ‘Preview’ to populate the Results Preview with a few values from the collection:
  6. You are now ready to:
    1. View the query via Search
    2. Explore the results further via Lens
    3. Expand the query with more values and filters

Let’s explore Search first of all

  1. Click the icon at the top-left of the Query page to navigate to the Search page:

Search is the page that general users will access. From this page they can select a predefined query, and execute that query to start their discovery journey.

  1. Execute the query to populate the results form:


This shows more details than the Query page because, in the absence of any specific definition about what details of the instances of collections should be displayed, search will display whatever it can find.

  1. Click on the Url “Categories(1)” to navigate to the Lens for that type of instance.

The Lens page is an information dashboard that can be constructed for any type of instance that is discovered. In this case, since no specific lens page has been setup for ‘Category’ types of instance, a default page has been used with a single tab.

  1. There is another Uri on this page “Products”. Navigating this link will take you to a similar lens page, but one for a collection of instances:


  1. This Lens for Products shows a list of instances of products. The first column is a Uri to the lens page for the individual product. Click one to navigate to its Lens:
  2. You are now discovering information by navigating through the data, having started by querying a collection. Next steps would be:
  • The original query was not particularly specific. Lens2odata allows that query to be further refined by specifying which attributes should be displayed, adding navigation properties to other entities, adding filters to the query to restrict results or even parameterizing the query so that a user can execute the saved query, modifying the results just by supplying parameter values.
  • The Lens dashboard page can be further refined by adding more fragments of information on a tab, or adding more tabs to the lens page to logically group the information.


Availability of Lens2OData

A version of Lens2OData is available on GitHub at com.inova8.lens2odata.


Enterprises create data cathedrals with an enforced dogma to control data purity, causing much information to live outside their walls, where informal information bazaars thrive. These information bazaars have suspect quality and uncertain provenance, yet are responsive to users’ needs. Metcalfe’s law suggests that the benefit gained from integrated information grows geometrically[1] with the number of data communities that are integrated. How can we balance the dogma of the data cathedrals and the spontaneity of the information bazaar?

Enterprises’ database cathedrals reflect corporate dogma. Nothing gets changed without approval from on high. Change is very slow. New databases get integrated only after a considerable time, and only if the new data is 100% squeaky clean. So there are a lot of databases that sit entirely outside the database cathedrals’ walls. Badly behaved sources of data might even be excommunicated.

Where does the other data go? It is not as though this other data does not exist, although many would like to pretend it to be so. Instead they are all in the information bazaar. Anyone with any information can set up their own information stall, and store their own data in Excel, Access, anywhere they want. They only specialize in their own data for their own use. This data is pretty good because that is all they need for their business. They share well with others but on a barter basis. In fact the information bazaar is chaotic, but lively, always changing to users’ demands, and a fun place to be. 

Why do we have the conflict between the database cathedral and the information bazaars?

The data cathedral offers security, quality, and good provenance. It provides the system of record for users who then should have complete confidence in their decision making. It does this using accurate relational models capturing enterprise information. But a relational model is designed by the cathedral hierarchy based on the closed model: only pure data can be entered into the database; impure data can lead to excommunication. 

The information bazaar has few rules of entry. As demonstrated by the web, it allows anyone to say anything about anything (AAA). Even with this deficiency we will regularly search the web to help us with our decision making, ignoring sources that are suspect, and filtering information that we feel lacks accuracy, until we end up with information to support our decision.

Can we resolve these conflicting objectives?

Can we expect the cathedral hierarchy to relax its admittance criteria to let in as much of the information bazaar as possible? Somewhat, but we cannot expect miracles.

Can we expect the information bazaar to become more sober and responsible so that it can securely provide information with guaranteed quality and provenance? Somewhat, but we cannot expect an evangelical conversion.

Really this is not optimal, because the benefit of having data integrated grows geometrically with the number of interconnected sources, yet the database cathedral cannot grow because the information bazaar does not meet its purity dogma.

So how can these conflicting objectives be redeemed?

One path to redemption is to unite the information bazaar through a common semantic model. This allows all information to be available within a universal graph (model). Of course some riff-raff will get in, but that is where the semantic model has an advantage: you can also declare rules that verify the accuracy of the data even though it is already stored.

At the same time the data cathedral can continue to expand, hopefully at a faster pace, by integrating those graphs that meet its criteria.

However we can allow users to access both the data cathedral, from where they can obtain the system of record, and the information bazaar. We could even report results federated from the two data-sources, annotating information from the information bazaar with its provenance and hence its less certain data quality. Doing this in a standards-compliant way turns existing enterprise information resources into connectable, responsive and interoperable semantic assets.

Harmony

Using this approach we don’t need to force the data cathedral to relax its dogma, nor do we ask the information bazaar to shut down. Yet we can offer users access to 99% of the enterprise information, providing users the ‘Metcalfe’[1] benefits of full integration. As semantic assets grow and connect, they enable a resilient semantic ecosystem of meaningful interactions between people, applications and data, irrespective of the differences in structures, data schemas, governance and technologies. The dividing boundaries between the cathedral and the bazaar no longer need to be obstacles to information users. A semantic ecosystem seamlessly embraces and provides integrated access to data cathedrals and information bazaars alike.


[1] If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, where K is some constant. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc.), then the benefits increase to 10 * K * 2. If I integrate in threes (operations + accounting + maintenance, accounting + payroll + receiving, etc.), then the benefits increase four-fold (a corollary of Metcalfe’s law) to 10 * K * 4. For quad-wise integration my benefits would be 10 * K * 8, and so on. Now it might not be 8-fold, but the point is that there is geometric, not linear, growth in benefits as I integrate all of my information across my organization.


Icebergs of information loiter throughout process manufacturing IT waiting to sink any information integration project. The impact of semantic technologies is being felt in medicine, life sciences, intelligence, and elsewhere but can it solve this problem in process manufacturing? The ability to federate information from multiple data-sources into a schema-less structure, and then deliver that federated information in any format and in accordance with any standard schema uniquely positions semantic technology. Is this a sweet spot for semantic technologies?

Process Manufacturing Application Focus over the Years

Over the years we have been solving problems within process manufacturing IT only to uncover more problems. Once the problem was that of measurement data in silos, which was solved by the introduction of real-time data historians. However that created the problem of data visibility, solved by the introduction of graphical user interfaces. This introduced data overload, which was partially solved by the introduction of analytical tools to digest the information and produce diagnostics. Unfortunately these tools were difficult to deploy across all assets within an organization, so we have been trying to solve that problem with information models. The current problem is how to convert the diagnostics into actionable knowledge with the use of work-flow engines, and how to ensure the sustainability of applications as solutions increase in complexity.

Process Manufacturing Application Problems and Solutions over the Years
| Period    | Problem                            | Industry response                                                           | Consequence                                              |
|-----------|------------------------------------|-----------------------------------------------------------------------------|----------------------------------------------------------|
| 1985-1995 | Measurement data in silos          | Real-time databases collecting measurements (proprietary)                    | Data but no user access                                  |
| 1990-2000 | Data access and visualization      | Graphical user interfaces, trending and reporting tools (proprietary)        | Data overload                                            |
| 1995-2005 | Analysis and business intelligence | Analytical tools to digest data into information and diagnostics             | Deployability of analysis to all assets                  |
| 2000-2010 | Contextualized information         | Plant data models (ProdML, ISA-95, ISO 15926, IEC 61970/61968, proprietary)  | Interpretation limited to experts                        |
| 2005-2015 | Consistent actioning               | ISO-9001                                                                     | Complexity, much more than RTDB, limiting sustainability |
| 2010-     | Sustainability                     | Outsourcing, standards                                                       | Improved ongoing application benefits                    |


However it is not only the increased technological complexity that is causing problems. Business decisions now cross many more business boundaries. When measurement data was trapped in silos we were content with unit-wide or plant-wide data historians. Now a well performance problem might involve a maintenance engineer located in Houston accessing a Mimosa[1]-based maintenance management system, an operations engineer located in Aberdeen accessing an OPC-UA[2]-based data historian, a production engineer located in London accessing a custom system driven by WITSML[3]-based feeds, and a facilities engineer using an ISO-15926[4] facilities management model. Not only are the participants in different locations and business units, but they also rely on different systems using different models to support their decision making. However they all should be talking about the same well, measured by the same instruments, producing the same flows, and processed by the same equipment.

The problem is that these operational support systems are not simply data silos whose homogeneous data we need to merge into one to answer our questions. In fact these operational support systems are icebergs of information. Above the surface they publish a public perspective focused on the core operational function of the application. However this data needs context, so below the surface is much of the same information that is contained in other systems. This information provides the context to the operational data so that the operational system can perform its required functions. For example the historian needs to know something about the instruments that are the source of its measurements; maintenance management systems need to know not only about the equipment to be maintained but the location of that equipment, physically and organizationally.

Figure 1: Icebergs of Information

Icebergs of information are not limited to the operational data stores deployed in organizations. An essential practice in these days of interoperability requirements is the adoption of model standards. However even these exhibit the same problems, as shown by the diagram below, which maps the available standards to their focus within the hydrocarbon supply chain.

Figure 2: Multiple Overlapping Model Standards

Increasing regulatory and competitive demands on the business are forcing decision making to be more timely, and to be more integrated across the traditional business boundaries. However these icebergs are getting in the way of effective decision making.

One way to make any or all of this information available to consumers is to create the bigger iceberg. ‘Simply’ create the relational database schema that covers every past, current, and future business need, and build adapters to populate this database from the operational data stores. Unfortunately this mega-store can only get more complex as it has to keep up with an expanding scope of information required to support the decision making processes.

Figure 3: Integration using the Bigger Iceberg

Alternatively we can keep building data-marts every time someone has a different business query.  However these do not provide the timeliness required to support operational decision making.

The Need for a Babel-Fish

We cannot meet the needs of the business, and solve their decision-making needs, by having one mega-store, because it will never keep up with the changing business requirements. Instead we need a babel-fish (with thanks to the Hitchhiker’s Guide to the Galaxy).

This babel-fish can consume all of the different operational data in different standards, and translate them into any standard that the end-consumer wants. Thus the babel-fish will need to know that OPC UA’s concept ‘hasInstrument’ has the same meaning as Mimosa’s concept of ‘Instrumented’. Similarly 10FIC107 from an OPCUA provider is the same as 10-FIC-1-7 from Mimosa.

  1. Information providers (operational data stores) within the business will want to provide information according to their capabilities, but preferably using the standards appropriate for their application. For example, measurements should use OPC UA, and maintenance should use Mimosa.
  2. Information consumers will want to consume information in the form of one or more standards appropriate for their application.

Figure 4: Integration babel-fish

The Semantic/RDF model comes to the rescue

First of all a definition: a semantic model means organizing all data and knowledge as RDF triples {subject, property, object}. Thus {:Peter, :hasAge, 21^^:years} and {:Pump101, :manufacturedBy, :Rotek} are examples of RDF triples. RDF triples can be persisted in a variety of ways: SQL tables, custom organizations, NoSQL, XML files and many more. If we were designing a relational database to hold these RDF triples we would only have one ‘table’, so it may appear that we have no schema in the relational database design sense, where we have key relationships to enforce integrity and unique indices to enforce uniqueness. However we can add other statements about the data such as {:Pump101, :type, :ReciprocatingPump} and {:ReciprocatingPump, :subClassOf, :Pump}[5]. Used in combination with a reasoner we can infer consequences from these asserted facts, such as :Pump101 is a type of :Pump, and Peter is not a :Pump, despite rumors to the contrary. These triples can be visualized as the links in a graph with the subject and object being the nodes of the graph, and the property the name of the edge linking these nodes:

Figure 5: RDF Triples as a graph
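The statements in this example could be asserted and queried as follows. This is a minimal sketch in SPARQL, using the full equipment namespace from the footnote; the rdfs:subClassOf* property path stands in for what a reasoner would otherwise infer.

```sparql
# Assert the example facts and model statements.
PREFIX :     <http://www.example.org/equipment#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT DATA {
  :Pump101 :manufacturedBy :Rotek ;
           a :ReciprocatingPump .
  :ReciprocatingPump rdfs:subClassOf :Pump .
}
```

Then a query can ask for everything that is, directly or via subclassing, a :Pump:

```sparql
# Without a reasoner, the rdfs:subClassOf* property path makes the inference explicit.
PREFIX :     <http://www.example.org/equipment#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?pump
WHERE {
  ?pump a/rdfs:subClassOf* :Pump .   # matches :Pump101 via :ReciprocatingPump
}
```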

Over the years, new modeling metaphors have been introduced to solve perceived or actual problems with their predecessors. For example the Relational Model had perceived difficulties associated with reporting, model complexity, flexibility, and data distribution. A semantic model helps solve these problems.

Figure 6: Evolution of Model Metaphors

  • In response to the perceived reporting issues, OLAP techniques were introduced along with the data warehouse. This greatly eased the problem of user-reporting, and data mining. However it did introduce the problem of data duplication.
    • A semantic model can query against a federated model in which information is distributed throughout the original data sources.
  • In response to the perceived complexity issues, various forms of object-orientated modeling were introduced. There is no doubt that it is easier to think of one’s problem in terms of an object model rather than a complex relational or ER model, especially when there are a large number of entities and relations.
    • The semantic model is built around the very simple concept of statements of facts such as {:Peter, :hasAge, 21^^:years}, and {:Pump101, :manufacturedBy, :Rotek} combined with statements that describe the model such as {:Pump101, :type, :ReciprocatingPump} and {:ReciprocatingPump, :subClassOf, :Pump}.
  • The model flexibility problem occurs when, after the model has been designed, the business needs the model to change. In response to this flexibility issue, the choice is to make the original model anticipate all potential uses but then risk complexity, or use an object-relational approach in which it is possible to add new attributes without changing the underlying storage schema.
    • In semantic models new attributes and relationships are simply additional triples, expressed using RDFS, SKOS, OWL, etc., so the model can evolve without changing the underlying storage; RDF is also used as the physical model (in RDF stores, at least).
  • There have been various responses to data distribution.
    • In the relational world there is not much choice other than to replicate the data from heterogeneous data stores using Extract-Transform-Load (ETL) techniques. In the case of homogeneous but distributed databases, distributed queries are possible, although they require intimate knowledge of all the schemas in all of the distributed databases.
    • In the object-orientated world we are in a worse situation: it is very difficult to manage a distributed object in which different objects are distributed or attributes are distributed.

The good news is that a semantic approach is the ideal (or even the only) approach that can solve the information integration problem as follows:

  1. Convert to RDF normal form: Convert all source data into RDF. The data can be left at source and fetched on demand (federated) or moved into temporary RDF storage
    • There are already standard ways of doing this for any spreadsheet, relational database, XML schema, and more. For example, TopBraid Suite (http://www.topquadrant.com/products/TB_Suite.html) provides converters and adaptors for all common data sources. It is relatively easy to create more mappings such as OPCUA. The dynamic adapters act as SPARQLEndpoints[6].
  2. Federated data model: Create ‘rules’ that map one vocabulary to another.
    • The language of these rules would be RDFS, SKOS and OWL. For example you can declare {OPCUA:hasInstrument, owl:sameAs, Mimosa:Instrumented}. Note that these are simply additional statements expressed in RDF which are then used by a reasoner to infer the consequences such as :FI101 is actually the same as :10FIC101.
    • More sophisticated rules can also be created using RDF and SPARQL directly (a sketch follows Figure 7 below). For some examples, see SPIN or SPARQL Rules at http://spinrdf.org/ and http://www.w3.org/Submission/2011/SUBM-spin-overview-20110222/
  3. Chameleon data services: Create consumer queries that extract the information from the combined model into the standard required using SPARQL queries.
    • For example even though all instrument data is in OPCUA, a consumer could use a Mimosa interface to fetch this data. The results can then be published as web-services for consumption by external applications using SPARQLMotion (http://www.topquadrant.com/products/SPARQLMotion.html)

Figure 7: Federation End-to-End
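The ‘more sophisticated rules’ mentioned in step 2 are, in SPIN, ordinary SPARQL queries used as rules. The CONSTRUCT below is a sketch of that idea with assumed OPCUA:/Mimosa: vocabulary: it derives the Mimosa-style statement wherever the OPC UA-style statement exists.

```sparql
# A SPIN-style rule is just a SPARQL CONSTRUCT: whenever the WHERE pattern
# matches, the constructed triples are inferred. Vocabulary is assumed for illustration.
PREFIX OPCUA:  <http://example.org/opcua#>
PREFIX Mimosa: <http://example.org/mimosa#>

CONSTRUCT {
  ?equipment Mimosa:Instrumented ?instrument .
}
WHERE {
  ?equipment OPCUA:hasInstrument ?instrument .
}
```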

Let’s look into these steps in detail:

Convert to RDF normal form

Despite the fact that data will be stored in different formats (relational, XML, object, Excel, etc.) according to different schemas, it can always be converted into RDF triples. Always is a strong word, but it really does work. There are already ways of doing this for any spreadsheet, relational database, XML schema, and more, and it is relatively easy to create more mappings, such as for OPC-UA. The data can be left at source and fetched on demand (federated) or moved into temporary RDF storage. For example, TopBraid Suite (http://www.topquadrant.com/products/TB_Suite.html) provides converters and adaptors for all common data sources.

Figure 8: Conversion to RDF Normal Form
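As a small sketch of what RDF normal form looks like for one source record, suppose a historian’s tag table contains a row (10FIC101, flow, m3/h). An adapter could expose, or an ETL step materialize, triples along these lines; the :plant vocabulary and URIs are assumptions for this sketch.

```sparql
# Triples corresponding to one historian tag record (10FIC101, flow, m3/h).
# Vocabulary is assumed; a wrapper such as D2RQ, Ultrawrap, or Ontop would
# generate equivalent virtual triples from the source schema instead.
PREFIX : <http://example.org/plant#>

INSERT DATA {
  :10FIC101 a :Instrument ;
            :measures :Flow ;
            :units    "m3/h" .
}
```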

Federated Data Model

A federated data model allows different graphs (aka databases) to be aggregated by linking the shared objects. This applies to real-time measurements (OPC-UA), maintenance (MIMOSA), production data (ProdML), or any external database. We can visualize this as combining the graphs of the individual operational data stores into a single graph.

Of course there will be vocabulary differences between the different data-sources. For example, in the OPC-UA data-source you might have a property OPCUA:hasInstrument, and in a MIMOSA data-source the equivalent is called Mimosa:Instrumented. So the federated data model incorporates ‘rules’ that map one vocabulary to another. The language of these rules would be RDFS, SKOS, and OWL. For example, in OWL, you can declare {OPCUA:hasInstrument owl:sameAs Mimosa:Instrumented}. Note that these are simply additional statements expressed as RDF triples which are then used by a reasoner to infer consequences such as :FI101 is actually the same as :10FIC101.

There will also be identity differences between the different data-sources. These can also be handled by additional statements, such as {:TANK#102, owl:sameAs, :TK102 }. This allows a reasoner to infer that the statement {:TK102, :has_price, 83^^:$} also applies to :TANK#102, implying {:TANK#102, :has_price, 83^^:$}.

Figure 9: Information Federated from Multiple Datasources
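A minimal sketch of the two kinds of mapping statements described above, followed by a query showing the effect a reasoner would give them (made explicit here with an owl:sameAs property path). The prefixes and instance names follow the examples in the text and are illustrative only.

```sparql
# Vocabulary and identity mappings are just additional triples.
PREFIX owl:    <http://www.w3.org/2002/07/owl#>
PREFIX OPCUA:  <http://example.org/opcua#>
PREFIX Mimosa: <http://example.org/mimosa#>
PREFIX :       <http://example.org/plant#>

INSERT DATA {
  OPCUA:hasInstrument owl:sameAs Mimosa:Instrumented .   # vocabulary alignment
  :TANK\#102 owl:sameAs :TK102 .                         # identity alignment ('#' escaped in the prefixed name)
}
```

```sparql
# Without a reasoner, the owl:sameAs link can be followed explicitly with a property path,
# so a price asserted against :TK102 is also returned for :TANK#102.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX :    <http://example.org/plant#>

SELECT ?price
WHERE {
  :TANK\#102 (owl:sameAs|^owl:sameAs)* ?same .
  ?same :has_price ?price .
}
```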

Chameleon Data Services

To extract information from the federated information, the best choice is SPARQL, the semantic equivalent of SQL, only simpler. Whilst SQL allows one to query the contents of multiple tables within a database, SPARQL matches patterns within the graph. With SQL we need to know in which table each field belongs. With SPARQL we define the graph pattern that we want to match, and the query engine will search throughout the federated graphs to find the matches. In the example illustrated below we do not need to know that the price attribute comes from one data source, whilst the volume comes from another. In fact SPARQL allows even further flexibility: the price attribute for Tank#101 could come from a different data source than the price attribute for Tank#102. This is part of the magic of the semantic technology.


Figure 10: Graph Pattern matching with SPARQL
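A sketch of the kind of graph pattern described above: the :Tank class and the :has_price/:has_volume properties follow the text’s tank examples but are assumptions, and which underlying data source supplies each triple is irrelevant to the query.

```sparql
# The query states only the pattern to match; the federated graph decides
# which source supplies the price and which supplies the volume.
PREFIX : <http://example.org/plant#>

SELECT ?tank ?price ?volume
WHERE {
  ?tank a :Tank ;
        :has_price  ?price ;     # may come from one data source...
        :has_volume ?volume .    # ...and this from another
}
```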

SPARQL can be used to directly query the federated graph for reporting purposes; however, most consumers of the information will expect to interface to a web-service, with SOAP or REST being the most popular. These services do not have to be programmed. Instead they can be declared using SPARQLMotion (http://www.sparqlmotion.org) to produce easily consumed and adaptable web-services. The designer for SPARQLMotion is shown below:

Figure 11: Example SPARQLMotion

Semantic/RDF advantages for the Process Manufacturing

Despite solving a complex data integration problem, Semantic/RDF is inherently simpler. Can there be anything simpler than storing all knowledge as RDF triples?  Despite this simplicity, we do not lose any expressivity.

There is no predefined schema to limit flexibility. However the schema rules, encoded as tables and keys in the relational model, can still be expressed using RDFS, OWL, and SKOS statements.

Deconstructing all information into statements (triples) allows data from distributed sources to be easily merged into a single graph.

Any information model can be reconstructed from the merged graph using SPARQL and presented as web-services (SOAP or REST).

References

[1] MIMOSA is a not-for-profit trade association dedicated to developing and encouraging the adoption of open information standards for Operations and Maintenance in manufacturing, fleet, and facility environments. MIMOSA’s open standards enable collaborative asset lifecycle management in both commercial and military applications.

[2] The Unified Architecture (UA) is THE next generation OPC standard that provides a cohesive, secure and reliable cross platform framework for access to real time and historical data and events. 

[3] WITSML™ (Wellsite Information Transfer Standard Markup Language) is an industry initiative to provide open, non-proprietary, standard interfaces for technology and software that monitor and manage wells, completions and workovers.

[4] ISO 15926 provides integration of life-cycle data for process plants including oil and gas production facilities

[5] I should really be using URIs instead of text labels for subject, property, and objects, but the intent of the semantic model is conveyed more simply if we avoid identifiers like ‘http://www.example.org/equipment#Pump101’ and use :Pump#101

[6] SPARQL is a query language for RDF. A SPARQL endpoint is a protocol service that makes it possible to query a data source using SPARQL. The source itself does not need to be in RDF. It can, for example, be a traditional relational database. Later in this article we will describe SPARQL in more detail and show some query examples


Henry Ford had one-size-fits-all, and he wanted to own the entire supply chain. The world has been turned on its head to where every user wants to personalize everything, including their workspace.  How does this new world order fit with process manufacturing applications?

Not so long ago, process manufacturing applications (real-time databases (RTDB), manufacturing execution systems (MES), manufacturing operational management systems (MOM), enterprise resource planning systems (ERP)) were being developed by an enterprise’s own IT department. The first systems to be truly productized, in the 1990s, were the real-time databases, with thanks to OSI, and enterprise resource planning, with thanks to SAP.

At that time, in-house computing consisted mainly of big-iron mainframes supporting financial, HR, and similar corporate systems. Networks were limited to supporting remote terminal access. Mini-computers (a ridiculous label given today’s sizes, but they were truly transformational in their day) such as DEC’s PDP series were the best choice for departmental computing initiatives since corporate IT were not that interested in real-time data, recipe management, blend planning, and all the other detailed work necessary to operate a process plant. The mini-computer technology was a great choice for many reasons: they could be purchased under-the-radar of IT; they were local devices so had no dependencies on the immature networks; and they were close to the individuals that understood the actual business problems they were used to solve. 

Whether due to the success of these departmental applications, the concern that corporate IT had regarding the increasing expenditure on these departmental initiatives, or increasing legislative or governance requirements, these applications were gradually pulled under the control of corporate IT. This was not entirely a bad thing, because it could be argued that the focus should be on operating the plant, not creating and operating applications and associated infrastructure. However local ownership had its advantages as well: users would feel a sense of ownership of, and hence confidence in the quality of, the information; if there were missing applications they could be quickly created with Excel spreadsheets or Access databases, even though this contributes to spreadsheet-hell; and locally created applications could be quickly adapted to changing user requirements. In summary, this was a period when users had a considerable amount of influence over their computing environment. However this influence has gradually declined over the last 20 years.

At the same time as we have seen a decline in the influence an enterprise application user has over the kind and configuration of applications, we have seen the reverse trend in a user’s personal computing environment. Users can configure their own portals using iGoogle; they can set up their own meeting groups with FaceBook; they can browse for information using Bing and Google; and they can assess, locate, and install new applications by browsing app stores.

The challenge facing enterprise IT is to provide that framework so that end-users can create their own environment via a configurable portal, and even their own applications via configurable information, displays, analysis, applications, and work-flows. However LinkedIn, FaceBook, and Google have far more developer resources than enterprise IT can dream of. So instead of trying to emulate these social applications, why not embrace them to provide the capabilities the enterprise end-users need? As well as relieving enterprise IT of the development cost burden, it immediately provides access to those utilities (gadgets for iGoogle, web-parts for SharePoint) with which a user can fashion their own experience.

| Component    | Old model (Enterprise IT): sourcing          | Old model: role                                                                                                      | New model (Socialized Enterprise): sourcing | New model: role                                                                                                        | Example                                                  |
|--------------|----------------------------------------------|----------------------------------------------------------------------------------------------------------------------|---------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|
| Platform     | Corporate IT architects                      | Define and implement the platform upon which corporate applications will run.                                         | Cloud-sourcing                              | Cloud-based PaaS platform reducing/eliminating the need for local platforms.                                               | Use Google App Engine                                    |
| Framework    | Corporate IT                                 | Define development standards for in-house developed or purchased applications.                                        | Developer-sourcing                          | Definition of cloud-based frameworks (data, workflow, BI, portals, etc.) and SaaS designed for end-user configuration.     | Cordys or RunMyProcess cloud-hosted                      |
| Applications | Corporate developers or application vendors  | Buy/build applications that meet user requirements as determined by requirements analysis.                            | User-sourcing                               | User-configured applications that adapt to changing business needs without involving corporate resources: agile.          | User composes iGoogle with gadgets from the marketplace. |
| Utilities    | Users                                        | Locate utilities (Excel, mini-apps, etc.) that fill in the functionality gaps and can be deployed ‘under-the-radar’.   | Crowd-sourcing                              | User-acquired utilities from the open market which conform to the cloud framework.                                        | iGoogle gadgets that solve specific user problems        |


Perhaps we are just turning the clock back to when the user within the process manufacturing plant had considerable control over their computing environment.