Anomaly detection is the task of finding behaviors in data that do not conform to expected patterns. Depending on the application domain, these non-conforming behaviors are referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants. Of these, anomalies and outliers are the two most commonly used terms.
Anomaly detection is used extensively in telecommunication networks and IoT management systems, for purposes such as assessing QoE impact, detecting performance degradation, security and intrusion detection, and, when anomalies are caught at an early stage, predictive maintenance.
Anomaly detection is important because anomalies in data translate into significant (and often critical) actionable information in a wide variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending sensitive data to an unauthorized destination, an anomalous MRI image might indicate the presence of malignant tumors, and anomalous readings from sensors on electrical grid equipment might indicate a potential blackout unless preventive action is taken.
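As a minimal sketch of what "not conforming to expected patterns" can mean in practice, the Python example below flags readings that lie far from the median of a series, using the modified z-score (median and median absolute deviation). The function name, the 3.5 threshold and the traffic values are illustrative assumptions, not part of any particular product or dataset.

```python
import statistics

def find_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no measurable spread, so nothing can be scored
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - median) / mad) > threshold]

# Hypothetical link throughput samples in Mbit/s (made-up values).
traffic_mbps = [12.1, 11.8, 12.4, 12.0, 11.9, 48.7, 12.2, 12.3]
print(find_anomalies(traffic_mbps))  # index of the 48.7 reading
```

The median-based score is used here instead of mean and standard deviation because a single extreme reading would otherwise inflate the very spread it is measured against.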
Most IoT operators today are flooded by the traffic generated by their sensors and equipment. One of their critical challenges is to monetize this data as fully as possible, extracting operational insights from it and making informed, data-driven decisions in a data-saturated environment.
IoT operators lack a good way of correlating, analyzing and acting on insights from this data in real time to identify changes in customer behavior, continuously monitor and respond to network and infrastructure issues, and proactively detect and prevent revenue leakage and fraud. They need ways to deliver new revenue-generating, customer-satisfying services cost-effectively and without overloading the network.
Today, most IoT operators use offline data analysis for reporting and planning purposes. As such, their current analytics infrastructure does not provide real-time analytics capabilities that can help them continuously monitor and respond to customer issues as they occur and proactively detect and prevent network infrastructure threats and fraud.
Streaming analytics can effectively fill this gap in an IoT operator's analytics infrastructure. A streaming analytics solution is designed to continuously ingest, correlate and analyze multiple streams of data in diverse formats and to trigger actions immediately. This helps IoT operators take a preemptive, results-oriented approach to improving the overall experience of their most valuable and demanding customers.
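The ingest, analyze and act loop described above can be sketched in a few lines of Python. This is an illustrative toy, not a description of any specific streaming platform: the class name, the window size, the warm-up length of five readings, the threshold and the sample readings are all assumptions. Each reading is scored against a rolling window of recent values as it arrives, and a callback stands in for the triggered action.

```python
from collections import deque
import statistics

class StreamingDetector:
    """Scores each reading against a rolling window of recent normal readings."""

    def __init__(self, window=20, threshold=3.0, on_anomaly=print):
        self.window = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold
        self.on_anomaly = on_anomaly        # action triggered for each anomaly

    def ingest(self, reading):
        is_anomaly = False
        if len(self.window) >= 5:  # wait until a minimal baseline exists
            mean = statistics.fmean(self.window)
            std = statistics.pstdev(self.window)
            if std > 0 and abs(reading - mean) / std > self.threshold:
                is_anomaly = True
                self.on_anomaly(reading)
        if not is_anomaly:
            # Simplification: anomalous readings are kept out of the baseline.
            self.window.append(reading)
        return is_anomaly

# Hypothetical temperature readings arriving one at a time (made-up values).
alerts = []
detector = StreamingDetector(on_anomaly=alerts.append)
for reading in [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 35.0, 20.3]:
    detector.ingest(reading)
print(alerts)  # the 35.0 reading is the only one flagged
```

Keeping flagged readings out of the window stops one spike from distorting the baseline, but it also means a genuine, lasting level shift would keep being flagged; a production system would need a policy for that case.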
The problem of searching for and recognizing patterns in data is a fundamental one, with a long and successful history. A pattern can be regarded as the mathematical expression of a specific piece of knowledge, whether recently discovered or learned in the past, that can be recognized when it recurs in the present.
Pattern recognition is the automatic discovery of regularities in data using computer algorithms. These regularities can then be acted upon, for example by classifying the data into categories in order to predict the most probable evolution of a given behavior, such as when a wind turbine blade will break based on its vibration pattern.
This is a non-trivial problem because the vibration can vary widely with the material used to construct the blade, the position of the wind turbine, and the wind speed and direction. The problem could be tackled with static rules or heuristics that distinguish patterns based on previously defined shapes of the variables, but in practice such an approach leads to a proliferation of rules, exceptions to the rules, and so on, and invariably gives poor results. Far better results can be obtained by adopting a machine learning approach, in which a large set of samples, called a training set, is used to tune the parameters of an adaptive model.
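To make "tuning the parameters of an adaptive model" concrete, the following Python sketch fits a one-feature logistic regression by gradient descent. It is only an illustration under stated assumptions: the single feature (vibration amplitude), the tiny made-up training set, the learning rate and the function names are all hypothetical, and a real blade-failure model would use far more data and features.

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, lr=0.5, epochs=500):
    """Tune weight w and bias b so that sigmoid(w*u + b) fits the labels."""
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    scaled = [(x - mean) / std for x in samples]  # standardize the feature
    w, b = 0.0, 0.0
    n = len(scaled)
    for _ in range(epochs):
        # Prediction error of the current parameters on every training sample.
        errors = [sigmoid(w * u + b) - y for u, y in zip(scaled, labels)]
        # Gradient descent step on the average log-loss.
        w -= lr * sum(e * u for e, u in zip(errors, scaled)) / n
        b -= lr * sum(errors) / n
    return {"mean": mean, "std": std, "w": w, "b": b}

def failure_probability(model, x):
    u = (x - model["mean"]) / model["std"]
    return sigmoid(model["w"] * u + model["b"])

# Made-up training set: vibration amplitude and whether the blade later failed.
amplitudes = [1.0, 1.5, 2.0, 2.5, 5.0, 5.5, 6.0, 6.5]
failed = [0, 0, 0, 0, 1, 1, 1, 1]
model = train(amplitudes, failed)
print(failure_probability(model, 1.2) < 0.5)  # low amplitude: failure unlikely
print(failure_probability(model, 6.2) > 0.5)  # high amplitude: failure likely
```

No rule such as "amplitude above 4 means failure" is written by hand; the decision boundary emerges from the training samples, which is what lets the same code adapt to a different blade material or site when given different data.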
Root Cause Isolation
Root Cause Isolation (RCI) is the process of identifying the source of anomalies (and therefore of potential problems) in a system by observing its data. It represents a significant challenge in large-scale systems. Many such systems, as diverse as IoT and telecommunication networks, suffer from a common problem: when the system fails to function correctly, it is often difficult to determine which part of it is the source of the problem. The fundamental challenge is that the symptoms of a failure often manifest as end-to-end failures in the operation of the system without causing obvious failures in its components; noticing that something has gone wrong does not necessarily tell you where to look to fix it.
It is well known that diagnosing bugs in large-scale software systems, even with the best debugging technologies, can be a laborious task involving many person-hours. Root Cause Isolation determines the source of problems statistically, by observing the behavior of the system from the outside. The communities operating networks have historically taken different approaches to this problem, which is variously referred to as fault diagnosis, alarm correlation, root cause analysis, and bug isolation depending on the kind of system. RCI captures what these approaches have in common across system areas, components and the data flowing through them by defining an abstract system model and formalizing the root cause isolation problem for that model.
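One possible reading of such an abstract model, offered here purely as an assumption for illustration, treats the system as a set of named components and each external observation as an end-to-end transaction that traversed some of them and either succeeded or failed. The Python sketch below ranks components by how much more often transactions fail when they pass through a component than when they do not. The scoring rule, the function name and the component names are hypothetical and are not the formalization the text refers to.

```python
from collections import defaultdict

def rank_suspects(observations):
    """Rank components by excess failure rate of transactions that cross them.

    observations: list of (path, ok) pairs, where path lists the components an
    end-to-end transaction traversed and ok says whether it succeeded.
    """
    total = len(observations)
    total_failed = sum(1 for _, ok in observations if not ok)
    seen = defaultdict(int)    # transactions crossing each component
    failed = defaultdict(int)  # failed transactions crossing each component
    for path, ok in observations:
        for component in set(path):
            seen[component] += 1
            if not ok:
                failed[component] += 1
    scores = {}
    for component, n in seen.items():
        rate_through = failed[component] / n
        others = total - n
        # Failure rate of transactions that avoided this component (0 if none did).
        rate_elsewhere = (total_failed - failed[component]) / others if others else 0.0
        scores[component] = rate_through - rate_elsewhere
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up end-to-end observations: only success or failure is visible.
observations = [
    (["sensor-a", "gateway-1", "broker"], True),
    (["sensor-b", "gateway-1", "broker"], True),
    (["sensor-c", "gateway-2", "broker"], False),
    (["sensor-d", "gateway-2", "broker"], False),
    (["sensor-a", "gateway-1", "broker"], True),
    (["sensor-c", "gateway-1", "broker"], True),
]
print(rank_suspects(observations)[0][0])  # gateway-2
```

No component reports a fault of its own here; the suspect is isolated only from which end-to-end transactions failed, which mirrors the external-observation setting described above. A component shared by every path, such as the broker in this toy data, cannot be compared against transactions that avoid it, so its score reflects only the overall failure rate.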