Service outages and downtime can cause major setbacks for large enterprises. Depending on the type of business, outages can result in financial loss, loss of reputation, or even severe penalties from customers or government authorities.
Having said that, IT systems will fail. After all, technology systems are designed with a certain scale and environment in mind. So how do we overcome this, given that system thresholds will be breached at some point?
Enter ServiceNow Operational Intelligence, an intelligent platform that empowers IT teams to proactively monitor IT infrastructure across the enterprise.
ServiceNow Operational Intelligence is an extension of ServiceNow's Event Management Framework. The main purpose of the platform is to reduce service outages, identify issues in advance, and automate issue resolution.
We at Serviceberry have comprehensive experience with ServiceNow Operational Intelligence.
Here is what we like about the platform and how it is used in practice.
The Operational Intelligence platform captures unusual behavior in infrastructure components that is not captured by the events framework. This is a step up for IT teams, as they no longer have to monitor infrastructure components reactively.
The platform also sets infrastructure threshold limits automatically. Using a machine-learning-based algorithm, it identifies patterns for each type of infrastructure component and so learns the threshold limits that apply under normal usage.
IT Teams don’t have to manually set threshold limits for thousands of infrastructure items they may be using.
The system does provide a manual override for these thresholds, to cover corner cases where the limits set by the ML-based algorithm are inaccurate or where human judgment needs to be applied.
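To make the idea of learned thresholds with a manual override concrete, here is a minimal sketch in Python. It is not ServiceNow's algorithm, which is not described here; it simply assumes that "normal usage" can be summarized as the historical mean plus or minus a few standard deviations. The function names, the multiplier k, and the sample CPU readings are all hypothetical.

```python
from statistics import mean, stdev


def learn_thresholds(samples, k=3.0):
    """Derive (lower, upper) limits from historical metric samples.

    Assumption for illustration only: normal behavior is the band
    mean +/- k sample standard deviations.
    """
    mu = mean(samples)
    sigma = stdev(samples)
    return mu - k * sigma, mu + k * sigma


def effective_thresholds(learned, override=None):
    """Use the operator's manual override when one is supplied."""
    return override if override is not None else learned


# Hypothetical CPU utilization history (percent) for one component.
cpu_history = [40, 42, 38, 41, 39, 43, 40, 41]

learned = learn_thresholds(cpu_history)
print("learned limits:", learned)

# An operator who knows this host runs a heavy nightly batch job
# widens the band by hand.
print("in effect:", effective_thresholds(learned, override=(0, 90)))
```

The point of the sketch is the division of labor: limits are derived from data for every component by default, and a person steps in only for the few components where the learned band does not match reality.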
Anomaly Heat Map
This is one of our favorites. The anomaly heat map isolates issues based on their issue score and on the probability of a service outage caused by the resources in play.
The idea is to intelligently mark every infrastructure resource with a heat map score. IT staff can drill down into each resource to further analyse the triggers and possible resolutions to avoid outages.
The Operational Intelligence platform also provides an alerts dashboard where anomalies are reported before they become issues. The objective is to identify trends which may become issues and proactively counter them.
This is done by assigning an anomaly score to each infrastructure component. A high anomaly score indicates that the component is at risk and could cause an outage in the near future.
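As a rough illustration of per-component anomaly scoring, the Python sketch below scores a reading by how far it falls outside the component's normal band and then ranks components by that score. The scoring formula, the 0 to 10 scale, and the names anomaly_score and rank_at_risk are our own assumptions for the example, not the platform's actual method.

```python
def anomaly_score(value, lower, upper, max_score=10.0):
    """Return 0 inside [lower, upper]; otherwise a score that grows with
    the distance outside the band, capped at max_score.

    Illustrative formula only: distance outside the band, relative to
    the band's width, scaled to max_score.
    """
    if lower <= value <= upper:
        return 0.0
    width = upper - lower
    if width <= 0:
        return max_score
    distance = lower - value if value < lower else value - upper
    return min(max_score, max_score * distance / width)


def rank_at_risk(readings, bounds):
    """Return (component, score) pairs, highest anomaly score first."""
    scores = {
        name: anomaly_score(value, *bounds[name])
        for name, value in readings.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical latest readings and normal bands for three components.
readings = {"db-01": 75, "web-01": 50, "cache-01": 200}
bounds = {"db-01": (30, 60), "web-01": (30, 60), "cache-01": (30, 60)}

for name, score in rank_at_risk(readings, bounds):
    print(f"{name}: {score:.1f}")
```

Sorting by score is what lets a team look at the riskiest components first instead of scanning every alert.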
The Metrics Dashboard gives IT teams a bird's-eye view of all infrastructure components. They can triage alerts, identify issue triggers, correlate similar cases from historical data, and find solutions to reported issues.
IT teams can also search and compare data across multiple infrastructure components, or even across multiple data centers.
ServiceNow Operational Intelligence bridges a significant gap in the Now Platform. Issues that were not tracked by the events framework can now be proactively identified and acted upon. It brings operational metrics to the Now Platform and complements the Event Management Framework.