SQL2RDF incremental materialization consumes a stream of source data changes (DML) and transforms them into the corresponding changes (inserts and deletes) in the triple store, keeping the RDBMS and the triplestore concurrent. It is based on the same R2RML mapping used for virtualization (Ontop and Capsenta) and for materialization (In4mium and R2RML Parser), and thus does not incur an additional configuration maintenance burden.
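By way of illustration only — a minimal sketch, not SQL2RDF's actual code — the JavaScript below shows the general idea: an R2RML-style mapping for a single table determines how each row-level change becomes the equivalent SPARQL update, and several changes can be batched into one update request. Table, column, and IRI names are invented for the example.

// Sketch: an R2RML-style mapping for one source table.
const mapping = {
  table: "CATEGORY",
  subject: (row) => "<http://example.org/Category/" + row.CATEGORYID + ">",
  predicates: { CATEGORYNAME: "<http://example.org/categoryName>" }
};

// Apply the mapping to one row; a real implementation would escape literals.
function rowToTriples(row) {
  const s = mapping.subject(row);
  return Object.keys(mapping.predicates)
    .filter((col) => row[col] !== undefined)
    .map((col) => s + " " + mapping.predicates[col] + " \"" + row[col] + "\" .");
}

// An INSERT becomes INSERT DATA and a DELETE becomes DELETE DATA; an UPDATE
// is handled as a DELETE of the old values plus an INSERT of the new ones.
function toSparqlUpdate(change) {
  const op = change.type === "DELETE" ? "DELETE DATA" : "INSERT DATA";
  return op + " {\n  " + rowToTriples(change.row).join("\n  ") + "\n}";
}

// Batched source transactions map to a single SPARQL update request, so the
// whole batch commits or rolls back together.
function batchToSparqlUpdate(changes) {
  return changes.map(toSparqlUpdate).join(" ;\n");
}

console.log(batchToSparqlUpdate([
  { type: "INSERT", row: { CATEGORYID: 1, CATEGORYNAME: "Beverages" } }
]));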

Mapping an existing RDBMS to RDF has become an increasingly popular way of accessing data when the system-of-record is not, or cannot be, in an RDF database. Despite its attractiveness, virtualization is not always possible for various reasons, such as performance, the need for full SPARQL 1.1 support, or the need to reason over the virtualized as well as other materialized data. However, materialization of an existing RDBMS to RDF is not an ideal alternative either: reasons include the inevitable lack of concurrency between the system-of-record and the RDF copy.

Thus incremental materialization provides a middle ground between virtualization and materialization, offering:

  • Easy ETL of the source RDB, since no configuration is required beyond the same R2RML used for virtualization or bulk materialization.
  • Improved concurrency of data compared with materialization.
  • Significantly less computational overhead than a full materialization of the source data.
  • Improved query performance compared with virtualization, especially when reasoning is required.
  • Compatibility with R2RML, thus reducing configuration maintenance.
  • Support for insert, update, and delete changes on the source RDBMS.
  • Batching of source transactions, with the corresponding triplestore changes applied as a single transaction that can be rolled back in the event of a problem.
  • Change logging, so that committed changes can be rolled back.

Contact inova8 if you would like to try SQL2RDF.

Let’s face it, RDF Graph datastores have not become the go-to database for application development like MySQL, MongoDB, and others have. Why? It is not that they cannot scale to handle the volume and velocity of data, nor the variety of structured and unstructured types.

Perhaps it is the lack of application development frameworks integrated with RDF. After all, any application needs not only to store and query the data but also to provide users with the means to interact with that data, whether through data entry forms, charts, graphs, visualizations, and more.

However, application development frameworks target popular back-ends accessible via JDBC and, now that we are in the 21st century, OData. RDF and SPARQL are not on their radar … that is, unless we wrap RDF with OData so that the world of these application development environments is opened up to RDF Graph datastores.

OData2SPARQL provides that Janus point, hiding the nuances of the RDF Graph behind an OData service that can then be consumed by powerful development environments such as OpenUI5 and Web IDE.

This article shows how an RDF Graph CRUD application can be rapidly developed without losing the flexibility that HTML5/JavaScript offers, from which it can be concluded that there is nothing preventing the use of RDF Graphs as the backend for production-capable applications.

A video of this demo can be found here: https://youtu.be/QFhcsS8Bx-U

Figure 1: OData2SPARQL: the Janus-Point between RDF data stores and application development

Rapid Application Development Environments

There are a great number of superb application development frameworks that allow one to create cross-platform (desktop, web, iOS, and Android), rich (large selection of components such as grids, charts, forms, etc.) applications. Most of these are based on the MVC or MVVM pattern, both of which require systematic and complete (CRUD) access to the back-end data via a RESTful API. Now that OData has been adopted by OASIS, the number of companies offering explicit support for OData is increasing, including Microsoft, IBM, and SAP. Similarly, there are a number of frameworks that support OData, one of which is SAPUI5, which has an open source version, OpenUI5.

OpenUI5

OpenUI5 is an open source JavaScript UI library, maintained by SAP and available under the Apache 2.0 license. OpenUI5 lets you build enterprise-ready web applications, responsive to all devices, running on almost any browser of your choice. It is based on JavaScript, using jQuery as its foundation, and follows web standards. It eases your development with a client-side HTML5 rendering library including a rich set of controls, and supports data binding to different models (JSON, XML, and OData).

With its extensive support for OData, combining OpenUI5 with OData2SPARQL releases the potential of RDF Graph datasources for web application development.
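As a minimal sketch of what that data binding looks like in practice (the service path and the Category entity set are assumptions taken from the Northwind demo later in this article):

sap.ui.require(["sap/ui/model/odata/v2/ODataModel"], function (ODataModel) {
  // The OData2SPARQL endpoint behaves like any other OData V2 service.
  var oModel = new ODataModel("/odata2sparql/2.0/NW/");

  // Read the Category entity set - an RDF class published as an OData entityType.
  oModel.read("/Category", {
    success: function (oData) { console.log(oData.results); },
    error: function (oError) { console.error(oError); }
  });
});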

Web IDE

SAP Web IDE is a powerful, extensible, web-based integrated development tool that simplifies end-to-end application development. Since it is built around OData datasources as its providers, Web IDE can be used as a semantic application IDE when RDF Graph data is published via OData2SPARQL.

Web IDE runs either as a cloud-based service supplied free by SAP, or as a local download based on Eclipse Orion. Since development is probably against a local OData endpoint, the latter is more convenient.

RDF Graph application in 5 steps:

  1. Deploy OData2SPARQL endpoint for chosen RDF Graph store

The OData2SPARQL WAR is available here, together with instructions for configuring the endpoints: odata2sparql.v2

The endpoint in this example is against an RDF-ized version of the ubiquitous Northwind database. This RDF graph version can be downloaded here: Northwind
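Once deployed, a quick way to check the endpoint is to request its OData service metadata, which is the RDF schema published as an EDM model. This is just a sketch; the host comes from the destination file below and the service path from the Data Connection used later:

// Fetch the OData $metadata document from the OData2SPARQL endpoint.
fetch("http://localhost:8080/odata2sparql/2.0/NW/$metadata")
  .then(function (response) { return response.text(); })
  .then(function (xml) { console.log(xml.substring(0, 500)); })
  .catch(function (error) { console.error(error); });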

  2. Install Web IDE

Instructions for installing the Web IDE Personal edition can be found here: SAP Web IDE Personal Edition

  3. Add OData service definition

Once installed, an OData service definition file (for example NorthwindRDF) can be added to the SAPWebIDE\config_master\service.destinations\destinations folder, as follows:

Description=Northwind
Type=HTTP
TrustAll=true
Authentication=NoAuthentication
Name=NorthwindRDF
ProxyType=Internet
URL=http://localhost:8080
WebIDEUsage=odata_gen
WebIDESystem=NorthwindRDF
WebIDEEnabled=true

  4. Create new application from template

An application can be built using a template which is accessed via File/New/Project from Template. In this example the “CRUD Master-Detail Application” was selected.

The template wizard needs a Data Connection: choose Service URL, select the data service (NorthwindRDF) and enter the path of the particular endpoint (/odata2sparql/2.0/NW/).

Figure 2: WEB IDE Data Connection definition

At this stage the template wizard allows you to browse the endpoint’s entityTypes and properties, or classes and properties in RDF graph-speak.

Since Web IDE and OpenUI5 are truly model-driven, the IDE creates a master-detail application from the entities that you define in the next screen.

The template wizard will ask you for the 'object' entityType, which in this example is Category. Additionally you should enter the 'line item'; in this case there is only one navigation property (aka objectProperty in RDF graph-speak), which is the products that belong to the category.

Note that this template allows other fields to be defined, so title and productUnitprice were selected.

Figure 3: WEB IDE Template Customization

  5. Run the application

The application is now complete and can be launched from the IDE. Right-click the application and select Run/Run as/As web application:

Figure 4: Application

Even this templated application is not limited to browsing data: it allows full editing of the categories. Note that even the labels are derived from the OData endpoint, but of course they can be changed by editing the application.
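Under the covers these edits are plain OData update requests. A minimal sketch of the same edit made programmatically (the entity key syntax and the categoryName property are illustrative assumptions; the actual names depend on the published schema):

sap.ui.require(["sap/ui/model/odata/v2/ODataModel"], function (ODataModel) {
  var oModel = new ODataModel("/odata2sparql/2.0/NW/");

  // Update one property of a Category entity; OData2SPARQL translates the
  // change into the corresponding update of the underlying RDF graph.
  oModel.update("/Category('1')", { categoryName: "Hot Beverages" }, {
    success: function () { console.log("updated"); },
    error: function (oError) { console.error(oError); }
  });
});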

Figure 5: Edit Category

Alternatively a new category can be added:

Figure 6: Create New Category

Next Steps

That’s it: a fully functional RDF Graph web application built without a single line of code being written. However, we can choose to use this as just a starting point:

  1. Publish data views, the SPARQL equivalent of a SQL view on the data, to the OData2SPARQL endpoint.
    • These are particularly useful when publishing reports in OpenUI5, Excel PowerQuery, Spotfire, etc.
  2. Modify the model that is published via the OData2SPARQL endpoint
    • The model that is published is extracted from the schema defined in the endpoint configuration. Thus the schema can be changed to suit what one wants in the endpoint metamodel.
  3. Edit the templated application
    • The templated application is just a starting point for adaptation: the code that is created is nothing more than standard HTML5/JavaScript using the OpenUI5 libraries.
  4. Build the application from first-principles
    • A template is not always the best starting point, so an application can always be built from first principles. However, the OpenUI5 libraries make this much easier by, for example, making the complete metamodel of the OData2SPARQL endpoint available within the application simply by defining that endpoint as the datasource, as the sketch below illustrates.
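As a pointer for that last option, the following sketch shows what having the complete metamodel available within the application means in practice, using the same OpenUI5 OData model as earlier (the entityType names depend on the published schema):

sap.ui.require(["sap/ui/model/odata/v2/ODataModel"], function (ODataModel) {
  var oModel = new ODataModel("/odata2sparql/2.0/NW/");
  oModel.metadataLoaded().then(function () {
    // The service metadata lists every entityType (RDF class) and its
    // properties, so the application can be driven from the model itself.
    oModel.getServiceMetadata().dataServices.schema.forEach(function (oSchema) {
      (oSchema.entityType || []).forEach(function (oType) {
        console.log(oType.name); // e.g. Category, Product
      });
    });
  });
});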


Enterprises create data cathedrals with an enforced dogma to control data purity, causing much information to be outside their walls, where informal information bazaars thrive. These information bazaars have suspect quality and uncertain provenance, yet are responsive to users’ needs. Metcalfe’s law suggests that the benefit gained from integrated information grows geometrically1 with the number of data communities that are integrated. How can we balance the dogma of the data cathedrals and the spontaneity of the information bazaar?

Enterprises’ database cathedrals reflect corporate dogma. Nothing gets changed without approval from on high. Change is very slow. New databases get integrated only after a considerable time, and only on the assumption that the new data is 100% squeaky clean. So there are a lot of databases that sit entirely outside the database cathedrals’ walls. Badly behaved sources of data might even be excommunicated.

Where does the other data go? It is not as though this other data does not exist, although many would like to pretend it to be so. Instead it is all in the information bazaar. Anyone with any information can set up their own information stall and store their own data in Excel, Access, or anywhere else they want. They specialize in their own data for their own use. This data is pretty good, because that is all they need for their business. They share well with others, but on a barter basis. In fact the information bazaar is chaotic, but lively, always changing to users’ demands, and a fun place to be.

Why do we have the conflict between the database cathedral and the information bazaars?

The data cathedral offers security, quality, and good provenance. It provides the system of record for users, who then should have complete confidence in their decision making. It does this using accurate relational models capturing enterprise information. But a relational model is designed by the cathedral hierarchy based on a closed-world model: only pure data can be entered into the database; impure data can lead to excommunication.

The information bazaar has few rules of entry. As demonstrated by the web, it allows anyone to say anything about anything (AAA). Even with this deficiency, we will regularly search the web to help us with our decision making, ignoring sources that are suspect and filtering out information that we feel lacks accuracy, until we end up with information to support our decision.

Can we resolve these conflicting objectives?

Can we expect the cathedral hierarchy to relax its admittance criteria to let in as much of the information bazaar as possible? Somewhat, but we cannot expect miracles.

Can we expect the information bazaar to become more sober and responsible so that it can securely provide information with guaranteed quality and provenance? Somewhat, but we cannot expect an evangelical conversion.

Really this is not optimal, because the benefit of having data integrated grows geometrically with the number of interconnected sources, yet the database cathedral cannot grow because the information bazaar does not meet its purity dogma.

So how can these conflicting objectives be redeemed?

One path to redemption is to unite the information bazaar through a common semantic model. This allows all information to be available within a universal graph (model). Of course some riff-raff will get in, but that again plays to the semantic model’s advantage, since rules can be declared that verify the accuracy of the data even after it has been stored.

At the same time the data cathedral can continue to expand, hopefully at faster pace, by integrating those graphs that meet their criteria. 

However, we can allow users to access both the data cathedral, from which they can obtain the system of record, and the information bazaar. We could even report results federated from the two data sources, annotating information from the information bazaar with its provenance and hence its less certain data quality. Doing this in a standards-compliant way turns existing enterprise information resources into connectable, responsive and interoperable semantic assets.

Harmony

Using this approach we don’t need to force the data cathedral to relax its dogma, nor do we ask the information bazaar to shut down. Yet we can offer users access to 99% of the enterprise information, providing users the ‘Metcalfe’1 benefits of full integration. As semantic assets grow and connect, they enable a resilient semantic ecosystem of meaningful interactions between people, applications and data, irrespective of differences in structures, data schemas, governance and technologies. The dividing boundaries between the cathedral and the bazaar no longer need to be obstacles to information users. The semantic ecosystem seamlessly embraces and provides integrated access to data cathedrals and information bazaars alike.

 

1 If I have 10 database systems running my business that are entirely disconnected, then the benefits are 10 * K, for some constant K. If I integrate these databases in pairs (operations + accounting, accounting + payroll, etc.), then the benefits increase to 10 * K * 2. If I integrate in threes (operations + accounting + maintenance, accounting + payroll + receiving, etc.), then the benefits increase four-fold (a corollary of Metcalfe’s law) to 10 * K * 4. For quad-wise integration the benefits would be 10 * K * 8, and so on. Now it might not be exactly 8-fold, but the point is that there is geometric, not linear, growth in benefits as I integrate all of my information across my organization.
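In other words, if B(n) is the benefit of integrating the 10 systems n at a time, the arithmetic above amounts to B(n) = 10 * K * 2^(n-1): the benefit doubles each time the integration widens, which is geometric rather than linear growth.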