Nowadays, Communications Service Providers (CSPs) operate more complex and extensive networks than ever before, making it increasingly difficult to assure service performance and driving operational costs steadily upwards. The fifth generation of mobile networks (5G) brings new opportunities for consumers and enterprises, but it also adds significant pressure to maintain optimal network performance and guarantee high service availability as dependency on connectivity grows. Network engineers are expected to handle, manage, optimize, monitor, forecast, and troubleshoot multi-layer, multi-technology, and multi-vendor networks, while customers and enterprises demand higher reliability and lower time to resolution.
The inability to achieve complete network performance visibility at any given time not only puts CSP revenues at risk but also significantly increases incident investigation times and operational expenses (OPEX). In today's world, customers demand high-quality service to support their increasingly demanding applications, and every minute the network fails to meet those demands represents a lost revenue opportunity for CSPs.
In the EU, telecom operators must notify significant incidents to their national authorities. At the start of every calendar year, the national authorities summarize those reports to provide anonymised and aggregated information about major telecom incidents. The 2021 annual summary contains reports of 168 incidents submitted by national authorities from 26 EU Member States (MS) and 2 EFTA countries. The total loss was 5,106 million user hours, which represents 0.12% of total billable hours. It is worth mentioning that this report covers only highly relevant incidents; according to some commercial research reports, the figure might rise to 1.5% of billable hours. According to the Telecommunications Engineers professional association from Spain, relevant incidents alone may have cost CSPs €102M in revenue for mobile users and €152M for broadband users in 2021.
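The relationship between these figures can be sketched with a quick back-of-the-envelope calculation. The totals below are derived purely from the percentages quoted above; they are an arithmetic illustration, not additional data.

```python
# Back-of-the-envelope check of the 2021 incident figures quoted above.
user_hours_lost = 5_106_000_000      # 5,106 million user hours lost
share_of_billable = 0.0012           # reported as 0.12% of billable hours

# Total billable user hours implied by the two reported figures
total_billable = user_hours_lost / share_of_billable

# Commercial research suggests losses of up to 1.5% of billable hours
worst_case_lost = total_billable * 0.015

print(f"Implied total billable hours: {total_billable:.2e}")
print(f"Hours lost at a 1.5% rate:    {worst_case_lost:.2e}")
```

At a 1.5% loss rate, the implied figure is more than an order of magnitude above the reported major-incident total, which is why the commercial estimates diverge so sharply from the regulatory summaries.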
System failures remain the most relevant factor in terms of impact, although the downward trend continues thanks to the automation initiatives implemented by CSPs. Even so, system failures still accounted for 363 million user hours lost, and 23% of total incidents had human error as a root cause.
To summarize, the operational models currently followed by CSPs present several challenges in keeping subscriber hours free from service disruptions. The existing model is reactive to system failures and is prone to human errors. Furthermore, modern network operations' complexity poses additional challenges, including the inability to quickly detect and predict system degradation and failures leading to revenue impacts, reputational costs, and increasing churn rates.
To address these challenges, CSPs must adopt a predictive approach supported by mature and reliable AI-powered capabilities. Without such capabilities, it will be difficult for CSPs to meet the demanding requirements of the market. Therefore, CSPs must continue to invest in new tools and technologies that allow for effective proactive monitoring and management of their network to minimize disruptions and improve overall service quality. By adopting a predictive approach and leveraging advanced technologies, CSPs can enhance their service offerings, improve customer satisfaction, and maintain their competitive edge in the market.
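As a minimal illustration of what proactive monitoring can mean in practice, the sketch below flags anomalous KPI samples using a rolling z-score. The window size, threshold, and packet-loss figures are invented for the example and are not taken from any real network or vendor tool.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(samples, window=12, threshold=3.0):
    """Flag samples deviating more than `threshold` standard
    deviations from the mean of the preceding `window` samples."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), stdev(history)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies

# Example: packet-loss percentage sampled every 5 minutes, with one spike
kpi = [0.1, 0.12, 0.11, 0.09, 0.1, 0.11, 0.1,
       0.12, 0.09, 0.1, 0.11, 0.1, 5.0, 0.1]
print(detect_anomalies(kpi))  # -> [12]: the spike is flagged
```

A production system would of course use richer models and multiple KPIs, but even this simple statistical baseline catches the degradation before a subscriber-facing outage report would.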
The State of AI in Automating Network Operations
After years of unrealistic expectations, artificial intelligence (AI) has finally become a practical reality and a competitive necessity for Telecom businesses. Despite the current frenzy of AI advancement and adoption, many leaders and decision-makers still have fundamental questions about what AI can do for their businesses beyond theoretical applications. This scepticism is understandable, since AI adoption has matured unevenly across industries and, in many cases, the return on investment fell short of original expectations for several underestimated reasons, both organizational and technical.
However, it's important to note that AI has come a long way, and there are now proven use cases and tangible benefits across various industries. It is to be expected that new disruptive technologies take time to reach the maturity needed to produce predictable, repeatable, and cost-effective business value. From predictive maintenance in manufacturing to personalized customer experiences in retail, AI has demonstrated its ability to optimize processes, enhance decision-making, and drive business value. As such, it's no longer a matter of if businesses should adopt AI, but rather how they can best leverage it to gain a competitive advantage.
To achieve success with AI, businesses must first identify areas where AI can make the most significant impact and then implement the appropriate tools and infrastructure to support its adoption. Furthermore, they must prioritize building a strong foundation for data management, governance, and ethical considerations. By taking a thoughtful and strategic approach to AI adoption, businesses can harness its full potential and realize its benefits while avoiding common pitfalls.
AI is currently consumed in an artisanal, almost alchemical way: handcrafted by highly skilled practitioners supported by experts in the application domain. Even though the Data Science creation process now offers a significant number of defined building blocks, engineering techniques, and templates, the reality is that most of those resources were created to enhance the artisanal process, not to replace it with an automated, industrialized one. In addition, they were not created with a particular business problem in mind but with the aim of being generic. As a result, the vast majority of ML/AI projects either fail to achieve their business objectives or remain stuck in Proof of Concept mode. The main reasons are illustrated below:
Data is rarely available or properly governed across the organization.
Data governance manages the availability, usability, integrity, and security of data used in an organization. It includes the processes, policies, and standards that ensure data is reliable, accurate, consistent, and complete. Data governance is essential for AI projects as it ensures that the data used to train AI models is of high quality and can provide accurate and reliable insights.
When data governance is not properly implemented, it can hinder the success of AI projects in several ways. Here are some examples:
Poor Data Quality: AI models rely heavily on high-quality data to make accurate predictions and decisions. If the data used to train AI models is not reliable, inaccurate, or incomplete, the AI model's predictions and decisions may be flawed, resulting in incorrect or unreliable results.
Data silos: Data silos occur when data is stored in different formats or locations, making it difficult to integrate and analyze. AI models require large amounts of data to be trained effectively, and if the data is siloed, it can limit the AI model's ability to identify patterns and insights.
Lack of data governance policies: Without proper data governance policies in place, it can be challenging to ensure that data is managed in a consistent and secure manner. This can lead to data privacy issues, security breaches, and compliance violations, which can also undermine the success of AI projects.
Lack of stakeholder buy-in: Successful data governance requires collaboration between business units, IT departments, and other stakeholders. If stakeholders are not engaged and invested in the data governance process, they may not provide the necessary resources or support for AI projects.
In conclusion, data governance is a critical component of AI projects, and its absence or inadequacy can hinder the success of AI projects. Organizations need to prioritize data governance when implementing AI projects to ensure the data used is accurate, reliable, and secure.
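To make the data-quality point concrete, a minimal validation sketch might look like the following. The field names and bounds are hypothetical, invented for this example, and do not correspond to any real CSP schema.

```python
def validate_records(records, required_fields, ranges):
    """Return (record index, field, issue) tuples for records that fail
    basic completeness and range checks. Fields/bounds are illustrative."""
    issues = []
    for i, rec in enumerate(records):
        for field in required_fields:
            if rec.get(field) is None:
                issues.append((i, field, "missing"))
        for field, (lo, hi) in ranges.items():
            value = rec.get(field)
            if value is not None and not (lo <= value <= hi):
                issues.append((i, field, "out of range"))
    return issues

# Hypothetical per-cell KPI records with a missing value and a bad reading
cells = [
    {"cell_id": "A1", "availability_pct": 99.7, "drop_rate_pct": 0.4},
    {"cell_id": "A2", "availability_pct": None, "drop_rate_pct": 150.0},
]
print(validate_records(
    cells,
    required_fields=["cell_id", "availability_pct"],
    ranges={"availability_pct": (0.0, 100.0), "drop_rate_pct": (0.0, 100.0)},
))
```

Checks like these are the smallest building block of a data governance practice: they turn "poor data quality" from an abstract risk into a measurable, enforceable gate before data ever reaches a training pipeline.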
Data Scientists are not Telco Experts
While statistical analysis is essential in gaining insights into business operations, statistical truths do not always equate to business insights. To extract real value from analytics, it is critical to pair statistical analysis with deep domain knowledge. However, this can be challenging, as it often requires a rare combination of skills that bridges the gap between the potential of Data Science and actual business needs.
The lack of skills connecting the potential of Data Science with genuine business pains fuels a perception many businesses share: there is often a significant gap between the insights generated through data analysis and their practical application to real-world business problems. The solution is to identify and hire professionals with both deep data-analysis expertise and a strong understanding of the business domain. These professionals can then effectively communicate insights to key stakeholders, translate business problems into analytical models, and ultimately drive positive outcomes for the organization.
ROI is uncertain (High Risk – High Time to Value)
In most cases, organizations need to analyse data to identify patterns and gain insights that can inform decision-making and ultimately drive ROI. However, it can be challenging to predict the ROI of data science projects in advance. Popular data analytics deployment models often include a significant component of experimentation, which can lead to long delivery times, usually between 6 and 9 months.
This is because data science projects often involve significant uncertainty, and the outcomes depend on several factors, including the quality and quantity of data, the chosen modelling techniques, and the accuracy of predictions. Additionally, unforeseen external factors such as market conditions or changes in subscriber behaviour can also impact ROI. However, the uncertainty around ROI in data science endeavours is often related to the way data science is currently consumed, which, as previously stated in this document, still has a strong experimentation component at its core. Because data science tools and frameworks typically take a generic approach, rather than focusing on solving specific problems, each new project often starts from scratch. This lack of reusability can lead to increased uncertainty around the predictability of ROI and actionable results from AI.
It is difficult to estimate the exact percentage of machine learning (ML) models that never reach the live network environment; nevertheless, studies have shown that a significant proportion of ML models fail to make it into production.
There are several reasons for this, including the additional complexities that are not covered during the experimentation phase, such as scalability, elasticity, and model obsolescence. ML models that work well in a controlled experimental environment may encounter unexpected challenges when deployed in a production environment with real-world data and usage patterns.
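One concrete flavour of model obsolescence is distribution drift between the data a model was trained on and the live traffic it scores. The sketch below uses a deliberately simple standardized difference of means as a drift signal; production systems would typically use established tests such as Kolmogorov-Smirnov or PSI, and all numbers here are invented for illustration.

```python
from statistics import mean, stdev

def drift_score(train_sample, live_sample):
    """Standardized difference of means between training data and
    live traffic: a crude but cheap population-shift indicator."""
    pooled_sd = (stdev(train_sample) + stdev(live_sample)) / 2
    return abs(mean(live_sample) - mean(train_sample)) / pooled_sd

# Hypothetical KPI values (e.g. per-cell throughput in Mbps)
train        = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
live_ok      = [10.1, 9.9, 10.4, 10.0, 10.3, 9.7]
live_shifted = [14.0, 15.2, 13.8, 14.5, 15.0, 14.2]

print(drift_score(train, live_ok) < 1.0)       # similar distributions
print(drift_score(train, live_shifted) > 1.0)  # clear shift: retrain
```

A model that silently keeps scoring the shifted distribution is exactly the kind of failure that never shows up in a controlled experiment but erodes accuracy in production.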
In addition, data governance particularities can also impact the success of ML models in production. ML code and models are often difficult to reproduce due to the unique characteristics of the data used to train them. This can make it challenging to replicate results and ensure that models are accurate and reliable in a production environment.
Time to Change the Data Science Paradigm: Industrialized AI Takes the Stage
All technologies, across all industries, face challenges when translating scientific concepts into industrial methods, and AI is no exception. However, AI has unique features that make this challenge particularly complex. It covers a wide range of mathematical concepts and techniques applicable to a vast set of network operational situations, making it impossible for any individual to be fully proficient in all areas of AI. As a result, successful AI implementation requires a collective effort to identify network problems, required data, scalability and elasticity requirements, and issues that need to be specifically industrialized for standardized and efficient AI integration.
As previously discussed in this document, AI adoption in day-to-day Network Operations Centres is still in the experimentation phase. The concept of "Industrialized AI" refers to a standardized, systematic approach to integrating AI into a Telecom network's operations, creating a more efficient, scalable, and sustainable way of doing so.
Industrialized AI involves the development of a repeatable and scalable process for designing, building, deploying, and maintaining AI models. This process includes the use of standardized tools, methodologies, and platforms that can be used across the organization to ensure consistency and efficiency in AI development and deployment.
Key components of industrialized AI include the use of automated machine learning (AutoML) tools to streamline the model development process, standardized data governance frameworks to ensure data quality and consistency, and cloud-based infrastructure to provide scalable and secure environments for AI deployment.
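The AutoML idea mentioned above can be illustrated, in a heavily simplified form, as a loop that fits several candidate models and keeps the one with the lowest validation error. The toy candidates and data below are invented for this sketch; a real AutoML tool would search over actual ML model families and hyperparameters.

```python
def mean_model(train_y):
    """Baseline: always predict the training mean."""
    avg = sum(train_y) / len(train_y)
    return lambda x: avg

def last_value_model(train_y):
    """Baseline: always predict the last observed value."""
    last = train_y[-1]
    return lambda x: last

def linear_model(train_x, train_y):
    """Ordinary least squares for a single feature."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / sxx
    return lambda x: my + slope * (x - mx)

def select_best(train_x, train_y, val_x, val_y):
    """Fit each candidate, score on held-out data, return the winner."""
    candidates = {
        "mean": mean_model(train_y),
        "last": last_value_model(train_y),
        "linear": linear_model(train_x, train_y),
    }
    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in zip(val_x, val_y)) / len(val_x)
    return min(candidates.items(), key=lambda kv: mse(kv[1]))[0]

# Traffic that grows linearly with time: the linear candidate should win
xs, ys = list(range(10)), [2 * x + 1 for x in range(10)]
print(select_best(xs[:8], ys[:8], xs[8:], ys[8:]))  # -> linear
```

The point is not the models themselves but the shape of the process: candidate generation, held-out evaluation, and automatic selection are the repeatable steps that industrialized AI standardizes so that each project no longer starts from scratch.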
The benefits of industrialized AI include faster time-to-market for AI-powered products and services, reduced development costs, and increased efficiency and accuracy in AI model development and deployment. Additionally, industrialized AI can help democratize AI across the organization by providing standardized tools and platforms for use by a range of stakeholders, including data scientists, developers, and business analysts.
In summary, industrialized AI represents a systematic, standardized approach to integrating AI across an organization's operations. By adopting industrialized AI practices, organizations can achieve faster time-to-market, reduce development costs, increase efficiency and accuracy, and democratize AI across the organization.
In today's telecom industry, communications service providers (CSPs) operate complex and extensive networks, which makes it increasingly difficult to assure service performance and maintain high service availability. The inability to achieve complete network performance visibility can put CSP revenues at risk and significantly increase operational expenses. CSPs must adopt a predictive approach supported by mature and reliable AI-powered capabilities to address these challenges. By leveraging advanced technologies, CSPs can enhance their service offerings, improve customer satisfaction, and maintain their competitive edge in the market.
However, the adoption of AI in day-to-day Network Operations Centres is still in the experimentation phase, and many data science projects fail to achieve their business objectives. The shortage of skills connecting the potential of data science with genuine business pains, the difficulty of predicting return on investment in advance, poor data governance, poor productization, and the additional complexities not covered during experimentation (such as scalability, elasticity, and model obsolescence) are some of the main reasons for this.
To address these challenges, CSPs must embrace a new paradigm of data science, industrialized AI. Industrialized AI represents a systematic, standardized approach to integrating AI across an organization's operations. By adopting industrialized AI practices, organizations can achieve faster time-to-market, reduce development costs, increase efficiency and accuracy, and democratize AI across the organization.
In conclusion, AI has become a competitive necessity for Telecom businesses, and the adoption of industrialized AI is key to unlocking its full potential. CSPs must prioritize data governance, build a strong foundation for data management, and implement the appropriate tools and infrastructure to support AI adoption. By taking a thoughtful and strategic approach, CSPs can harness AI's full potential and realize its benefits while avoiding common pitfalls.
References
https://www.coit.es/sites/default/files/etno_state_of_digi_2022.pdf
https://www.lightreading.com/mobile/mobile-security/mobile-ops-lose-$15b-yearly-to-network-outages/d/d-id/706609
https://blogs.gartner.com/andrew_white/2019/01/03/our-top-data-and-analytics-predicts-for-2019/
Jose Pineda (firstname.lastname@example.org)
Roman Ferrando (email@example.com)