Chief Architect, Quantiv
After a problem has occurred, you may find yourself thinking, ‘I should have seen that coming’ or, worse, ‘I should have seen it happening’.
With the benefit of hindsight, the cause of a problem may seem obvious. Equally, the way to avoid it in future may seem to depend on ‘just’ monitoring a few simple indicators.
This is certainly true when it comes to IT applications. The IT/technical team are often the first responders to system emergencies, so it’s natural they’d feel the most in need of early warning systems. Simple measures, such as checking resource usage, network activity and database performance, can be good indicators of overall health without needing detailed knowledge of how particular applications operate.
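As a minimal sketch of what such a simple check might look like (the metric names and limits here are hypothetical examples, not part of any particular monitoring product), a technical health check can just compare readings against fixed maximum limits:

```python
def check_limits(readings, max_limits):
    """Return the names of metrics whose reading exceeds its maximum limit.

    No knowledge of the application's internals is needed -- only a set of
    readings and the agreed limits for each one.
    """
    return [name for name, value in readings.items()
            if name in max_limits and value > max_limits[name]]

# Hypothetical readings: CPU/memory usage (percent) and a DB query time (ms).
readings = {"cpu_percent": 95.0, "mem_percent": 60.0, "db_query_ms": 40.0}
max_limits = {"cpu_percent": 90.0, "mem_percent": 85.0, "db_query_ms": 200.0}
print(check_limits(readings, max_limits))  # -> ['cpu_percent']
```

Checks of this shape are easy to apply uniformly across many systems, which is exactly why they are so commonly the first monitoring an IT team puts in place.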
If the developers of those services can get involved in that monitoring, too, then the internal functions of the applications can also be used as sources of information.
The inside-out (or bottom-up) approach
As these techniques have become more sophisticated and more widely adopted, dedicated systems have emerged to support the gathering of application performance monitoring (APM) information.
These methods rely on monitoring the operation of low-level components that make up a wider application or a system (think of them as an inside-out or bottom-up approach).
But while this provides a good picture of technical performance, it can be incomplete from an operational point of view. Plus, operational problems can sometimes still exist, even if all the technical components are working correctly.
Even when resource usage and database performance are within limits, operational processing may not perform as you’d expect. For example, orders failing to arrive won’t necessarily trigger a resource, networking or database alert. And even if orders are being received, they aren’t necessarily being processed. Or worse, most orders are being received and processed, while a small but significant minority become ‘stuck’.
Providing minimum as well as maximum alert conditions could help identify some of these problems (though it probably wouldn’t help with the ‘stuck’ issue). But it requires knowledge of the operational processes being performed, and so strays from purely technical monitoring into operational monitoring.
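The shift is small but significant: instead of a single maximum limit, each metric gets an expected band. A brief illustrative sketch (the metric and band are made-up examples) shows how ‘too quiet’ then becomes just as visible as ‘too busy’:

```python
def band_alerts(readings, bands):
    """Check readings against expected bands.

    `bands` maps each metric name to a (minimum, maximum) pair. A reading
    below the minimum is flagged as well as one above the maximum, so an
    operational silence -- e.g. zero orders received -- raises an alert.
    A missing reading is treated as zero, so absent data also alerts.
    """
    alerts = []
    for name, (low, high) in bands.items():
        value = readings.get(name, 0)
        if value < low:
            alerts.append((name, "below minimum"))
        elif value > high:
            alerts.append((name, "above maximum"))
    return alerts

# Hypothetical operational metric: orders received in the last hour.
print(band_alerts({"orders_received": 0}, {"orders_received": (5, 500)}))
# -> [('orders_received', 'below minimum')]
```

Note that choosing a sensible minimum requires knowing what ‘normal’ looks like for the business process, which is precisely where monitoring stops being purely technical.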
The focus of monitoring therefore shifts from what is happening to what is required. In effect, it becomes an outside-in or top-down activity.
To support that style of monitoring, the nature of information collection changes. Less fine-grained detail is needed, and the metrics may not need to be collected as frequently.
But to allow comparisons between metrics to be made (for example, to check the progress of an order or the status of a customer), more contextual reference data needs to be collected.
It’s also helpful if that context builds a model of the reference data collected, so the operation can be checked against an expected model.
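One way to picture that model-based check (the lifecycle states and data shapes below are purely illustrative assumptions, not NumberCloud’s actual model): define the states an order is expected to pass through, then flag any order that hasn’t progressed within an acceptable time.

```python
from datetime import datetime, timedelta

# Hypothetical expected model: the states an order should pass through.
EXPECTED_FLOW = ["received", "validated", "fulfilled"]

def stuck_orders(orders, now, max_age=timedelta(hours=2)):
    """Return the IDs of orders that haven't reached the final expected
    state within `max_age` -- the 'stuck' minority that per-component
    technical checks can miss entirely.
    """
    final_state = EXPECTED_FLOW[-1]
    return [order["id"] for order in orders
            if order["state"] != final_state
            and now - order["updated"] > max_age]

now = datetime(2024, 1, 1, 12, 0)
orders = [
    {"id": "A1", "state": "fulfilled", "updated": now - timedelta(hours=5)},
    {"id": "A2", "state": "validated", "updated": now - timedelta(hours=3)},
    {"id": "A3", "state": "received",  "updated": now - timedelta(minutes=30)},
]
print(stuck_orders(orders, now))  # -> ['A2']
```

The check only works because each metric carries context (order ID, state, timestamp) and because there is an expected model to compare against; neither is available from low-level technical telemetry alone.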
And if both can be exposed through a simple application programming interface (API), the information can be used not only for operational reporting, but also for driving further processing. This, in turn, allows for simplified and automated integration between applications.
The above is exactly the pattern of metric collection supported by Quantiv’s NumberCloud product, which collects data you already know you need to know.
However, defining metrics based on your organisation’s operational processing presumes you already have a model for that processing. But sometimes even that isn’t clear, or at least not in the terms needed to make metric definition (and so monitoring) easy. And that’s where our NumberWorks method comes in because it helps you work out what you don’t yet know you need to know.
Helping you to monitor… and discover
Taken together, NumberCloud and NumberWorks help you to monitor what you already know you need to know and discover what you don’t yet know you need to know.