Excel is the application platform of choice for many process manufacturing applications. Conversely, data reconciliation is not as widely used as it should be despite demonstrable benefits. Can we use Excel Solver to provide production-ready data reconciliation within Excel? This article describes an Excel template that uses Solver for data reconciliation. And the conclusions are …?
Why is data reconciliation not ubiquitous?
Excel is ubiquitous in process manufacturing as the end-user development and reporting platform of choice. It is used for data collection, end-of-shift reporting, accounting, modeling, scheduling, planning and (too?) much more, often frustrating the efforts of corporate IT, who want more formal analysis tools and applications. Conversely, over the last 25 years data reconciliation has not achieved the same ubiquitous status, although it is an application with proven benefits: reduced accounting losses, better yield management, detection of unreported material movements, and improved meter management, amongst many other operational improvements. Why?
There are many aspects to the answer, but perhaps if data reconciliation were part of Excel, rather than Excel merely hosting the raw data and reconciled results, it could be used more easily in conjunction with the many data collection, reporting, accounting, modeling, scheduling and planning spreadsheet applications already in use.
The core computational engine of data reconciliation is the solver. Excel has an excellent built-in solver called, not surprisingly, Solver! Since Solver uses the “Generalized Reduced Gradient (GRG2) nonlinear optimization code, which was developed by Leon Lasdon, University of Texas at Austin, and Alan Waren, Cleveland State University…”, it should be well suited to data reconciliation. Is it?
The best way to answer this question is to set up Excel with Solver to reconcile a typical reconciliation problem. Below is a description of my use of Solver, which is contained in the attached Excel template, Reconciliation Example.xlsm.
Reconciliation model constraints
Although we may use Excel to prepare reconciliation data or to host the results, a dedicated reconciliation application will formulate its own model using its own syntax. I based this evaluation on this simple material balance:
By using Solver we can express the reconciliation model constraints as Excel formulae within cells, using the familiar Excel syntax. As usual, Excel will display the numeric results of these formulae, as shown below (extracted from the attached Reconciliation Example.xlsm). I chose to organize the formulae to show the total feed and total product in adjacent cells, but other formulations are possible.
However, Excel also allows us to ‘Show Formulae’, offering the best of both worlds. Since I chose to use named ranges, the constraint formulae are very readable. Not only that, we can use ‘Trace Precedents’ and ‘Trace Dependents’ to track the relationships between constraints, a level of user interaction that is difficult to duplicate in dedicated reconciliation applications.
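To make this concrete, the sketch below shows the kind of constraint formulae I mean, written from VBA purely for illustration; the named cells (Feed, Recycle, Product, TotalFeed, TotalProduct) are hypothetical placeholders and not the names used in the attached template.

Sub WriteConstraintFormulae()
    ' Hypothetical single-node balance: Feed, Recycle and Product are named cells
    ' holding the reconciled values; TotalFeed and TotalProduct sit in adjacent
    ' cells and are later constrained in Solver to be equal.
    Range("TotalFeed").Formula = "=Feed + Recycle"
    Range("TotalProduct").Formula = "=Product"
End Sub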
Also, of course, there is nothing to limit you to linear reconciliation problems in Excel.
Reconciled variables
I organized the reconciled variables as named cells so that the constraint equations are more readable. These are the variables adjusted by Solver. In principle these variables could be distributed throughout the spreadsheet; however, I chose to group them with the tolerances and estimates (measurements) in adjacent cells:
Furthermore, the named ranges in the Labeled Variable column were in fact ‘arrays’ pointing to the Solution, Tolerance, and Estimate cells; this is a convenient way of grouping the solutions with the corresponding tolerances and estimates.
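As a minimal sketch, assuming a layout with Solution, Tolerance and Estimate in columns B, C and D of a ‘Model’ sheet (my assumption, not necessarily the template’s), the names could be created like this:

Sub NameVariableRows()
    ' The 'array' name spans the Solution, Tolerance and Estimate cells of the row;
    ' the single-cell name used inside constraint formulae points at the Solution only.
    Worksheets("Model").Range("B5:D5").Name = "Recycle_Var"
    Worksheets("Model").Range("B5").Name = "Recycle"
End Sub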
Reconciliation objective functions
Solver minimizes (or maximizes, or drives to a value) a target cell which contains the value of the objective function. Most reconciliation systems use a straightforward sum-of-weighted-squares (Gaussian) formulation for this function. It can be argued that measurement errors are drawn from a Gaussian distribution; aside from that, the Gaussian formulation is popular mainly because, in the case of linear constraints such as mass balances, the objective can be minimized without the need for an iterative search. However, Solver uses a numeric search algorithm, so we are not restricted to the Gaussian formulation. For this reason I created four alternative objective functions as VBA functions that can be called from within Excel. This avoids having to duplicate complex cell formulae, and thus reduces the risk of those pesky cell formula mistakes! The objective functions I chose were as follows, all taking one or more ranges of variables as arguments; the Fair and ContaminatedGaussian functions have additional parameters.
These functions are described below:
Gaussian
The ‘normal’ reconciliation objective function
e = (measurement - estimate)/tolerance
Gaussian = minimize sum{ e^2 }
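As an indication of how such a function might look in VBA (the three-column Solution/Tolerance/Estimate layout and the ParamArray signature are my assumptions, not the template's actual code):

Public Function Gaussian(ParamArray vars() As Variant) As Double
    ' Sum of weighted squares; per the article, e = (measurement - reconciled value) / tolerance.
    ' Assumed row layout per variable: column 1 = Solution, 2 = Tolerance, 3 = Estimate.
    Dim v As Variant, rng As Range, r As Range
    Dim e As Double, total As Double
    For Each v In vars
        Set rng = v                                    ' each argument is expected to be a Range
        For Each r In rng.Rows
            If Not IsEmpty(r.Cells(1, 3).Value) Then   ' skip unmeasured variables
                e = (r.Cells(1, 3).Value - r.Cells(1, 1).Value) / r.Cells(1, 2).Value
                total = total + e ^ 2
            End If
        Next r
    Next v
    Gaussian = total
End Function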
Fair
A ‘robust’ estimator which is a generalization of the maximum-likelihood estimator:
Fair = minimize sum{ c^2 ( abs(e)/c – log( 1 + abs(e)/c)) }
Where c is a tuning parameter with approximate values in the range 10~20
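A possible VBA form, following the same assumed layout as the Gaussian sketch above and passing c as the leading argument (again an assumption about the template's interface):

Public Function Fair(c As Double, ParamArray vars() As Variant) As Double
    ' Fair robust estimator with tuning parameter c.
    Dim v As Variant, rng As Range, r As Range
    Dim e As Double, total As Double
    For Each v In vars
        Set rng = v
        For Each r In rng.Rows
            If Not IsEmpty(r.Cells(1, 3).Value) Then
                e = Abs((r.Cells(1, 3).Value - r.Cells(1, 1).Value) / r.Cells(1, 2).Value)
                total = total + c ^ 2 * (e / c - Log(1 + e / c))   ' Log is the natural log in VBA
            End If
        Next r
    Next v
    Fair = total
End Function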
Lorentzian
This objective function tries to avoid weighting gross errors unduly, so that they do not distort the overall result.
Lorentzian = maximize sum{ 1/(1 + e^2/2) }
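A corresponding VBA sketch (same assumed layout as above); note that Solver must be set to Maximize this objective rather than Minimize:

Public Function Lorentzian(ParamArray vars() As Variant) As Double
    ' Lorentzian estimator; larger errors contribute progressively less to the total.
    Dim v As Variant, rng As Range, r As Range
    Dim e As Double, total As Double
    For Each v In vars
        Set rng = v
        For Each r In rng.Rows
            If Not IsEmpty(r.Cells(1, 3).Value) Then
                e = (r.Cells(1, 3).Value - r.Cells(1, 1).Value) / r.Cells(1, 2).Value
                total = total + 1 / (1 + e ^ 2 / 2)
            End If
        Next r
    Next v
    Lorentzian = total
End Function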
ContaminatedGaussian or Tjoa-Biegler
This objective function attempts to balance the probability of a gross error against that of a Gaussian error; ‘nu’ is the probability of a gross error, and ‘b’ is the ratio of the standard deviation of gross errors to that of the normal errors. Thus a value of b=10 means that the variance of gross errors is 100 (10 × 10) times that of the normal errors. A value of nu=0.05 means there is a 5% probability of a gross error.
ContaminatedGaussian = minimize sum{ -log( (1-nu) * exp(-e^2/2) + nu/b * exp(-e^2/(2*b^2)) ) + log(sqrt(2*pi) * tolerance) }
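A possible VBA form is sketched below; the b^2 in the exponent follows from b being the ratio of standard deviations, and, as with the other sketches, the layout and signature are assumptions rather than the template's actual code:

Public Function ContaminatedGaussian(nu As Double, b As Double, ParamArray vars() As Variant) As Double
    ' Tjoa-Biegler contaminated Gaussian: nu = probability of a gross error,
    ' b = ratio of the gross-error standard deviation to the normal one.
    Dim v As Variant, rng As Range, r As Range
    Dim e As Double, tol As Double, total As Double
    Const Pi As Double = 3.14159265358979
    For Each v In vars
        Set rng = v
        For Each r In rng.Rows
            If Not IsEmpty(r.Cells(1, 3).Value) Then
                tol = r.Cells(1, 2).Value
                e = (r.Cells(1, 3).Value - r.Cells(1, 1).Value) / tol
                total = total _
                    - Log((1 - nu) * Exp(-e ^ 2 / 2) + nu / b * Exp(-e ^ 2 / (2 * b ^ 2))) _
                    + Log(Sqr(2 * Pi) * tol)
            End If
        Next r
    Next v
    ContaminatedGaussian = total
End Function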
ErrorCriticalLevel
A helper function that calculates the critical error level (global, constraint, or measurement) given the degrees of freedom (the number of constraints or measurements) and the required confidence level, typically ~0.05, or 5%.
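One common choice for such a critical level is the Bonferroni-adjusted standard-normal quantile, sketched below; the template may well use a different statistic (for example a chi-squared value for the global test), so treat this purely as an illustration:

Public Function ErrorCriticalLevel(degreesOfFreedom As Long, confidence As Double) As Double
    ' Two-sided standard-normal critical value with a Bonferroni correction:
    ' an adjustment larger than this (in tolerance units) is a suspected gross error.
    ErrorCriticalLevel = Application.WorksheetFunction.NormSInv(1 - confidence / (2 * degreesOfFreedom))
End Function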
Using Solver
To solve, select Solver to display the dialog in which you can define the target cell containing the objective function, the range of cells containing the variables to be adjusted, and the range of cells containing the constraints. This setup can be conveniently saved into a range of cells for re-use.
Options allow various Solver properties to be tuned to optimize solving. However, I found that the default settings of Solver would not converge to the actual solution and would drift off to an entirely different solution! I eventually found that the following parameters would usually converge to the expected solution irrespective of the starting conditions, which I set to the measurements where available.
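For repeatable runs the same setup can also be driven from VBA through the Solver add-in's macro interface (a reference to SOLVER must be set under Tools > References). The sketch below uses the hypothetical named ranges from the earlier examples, and the option values are placeholders rather than the specific settings referred to above:

Sub RunReconciliation()
    SolverReset
    ' Minimize the objective cell by adjusting the reconciled (Solution) values.
    SolverOk SetCell:=Range("Objective"), MaxMinVal:=2, ByChange:=Range("Solution")
    ' Material-balance constraint: total feed equals total product at the node.
    SolverAdd CellRef:=Range("TotalFeed"), Relation:=2, FormulaText:="TotalProduct"
    ' Placeholder options; Derivatives:=2 selects central differencing.
    SolverOptions Precision:=0.000001, Convergence:=0.0001, Derivatives:=2
    SolverSolve UserFinish:=True       ' solve without showing the results dialog
End Sub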
Comments on Objective Functions
The objective of this evaluation was to assess the feasibility of using Solver for production data reconciliation, but since I tested different objective functions I want to share my observations, admittedly drawn from a very small sample.
- Each objective function was given the same starting point (the measurements where available) and Solver options.
- I knew that the recycle was in gross error; it should have been 1505.
- Gaussian, Fair, and ContaminatedGaussian all produced approximately the same estimates, with the Recycle at 1485.
- Lorentzian was far better at ignoring the gross error on the Recycle measurement, but is this what we wanted, given that it simply produced erroneous estimates for the unmeasured variables?
- All took about 25~75 iterations to converge, which was not an issue given the size of this reconciliation but could be significant for larger problems.
- Given the similarity of results between the Gaussian, Fair, and ContaminatedGaussian objective functions, is it worth the complexity of moving away from the Gaussian?
Advantages of Solver-based Data Reconciliation
- Very easy to prepare data within Excel and to post process the results for presentation.
- Solver is not limited to linear-only constraints, thus component balance (bi-linear) reconciliation problems can be handled, although they were not tested here.
- Solver is not limited to equality constraints; thus, for example, all estimates can be forced to be positive, as sketched below.
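For example, a non-negativity restriction might be added like this through the Solver VBA interface, where Relation 3 means '>=' and 'Solution' is again a hypothetical named range covering the reconciled values:

Sub AddNonNegativityConstraint()
    ' Assumes a Solver model has already been defined (see the earlier sketch).
    SolverAdd CellRef:=Range("Solution"), Relation:=3, FormulaText:="0"
End Sub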
Limitations of Solver-based Data Reconciliation
- The built-in Solver allows a maximum of only 200 variables, although one can upgrade to the full version.
- Solver is a generic optimizer and is not tuned to the characteristics of a data reconciliation problem, such as linear or bi-linear constraints and quadratic cost functions.
- It is easy to define a problem that becomes unstable. In the example, adding the redundant ‘world’ (overall) balance constraint causes solution instability.
- Solver was very sensitive to the Estimates and Derivatives option settings when applied to the example problem and objective functions. Unfortunately it still reported a converged solution even when the results were way off. In the example we know the ‘correct’ answer, allowing us to tune the solver until it gets the ‘right’ answer; in a real problem we do not.
- Irrespective of the objective function, Solver took 25~75 iterations to converge. This could be an issue for realistically sized (hundreds to thousands of variables) reconciliation problems.
- There is no access to intermediate products, such as the covariance matrix, with which to perform more sophisticated gross error detection.
Recommendations
Despite the attraction of being able to define non-standard objective functions and to use Excel formulae for pre- and post-processing, the unreliable convergence to the known ‘correct’ solution means that I cannot recommend this approach for production use, especially when an ‘expert’ is not on hand to scrutinize the results.
However, if we could keep the Solver-style problem formulation within Excel but use a different optimization engine tuned for reconciliation, then we would have the perfect production data reconciliation application… watch this blog!