AI-Powered Root Cause Analysis for Production Alerts

Duration: 25 mins
Deepa Elumalai
Site Reliability Engineer, PayPal

At PayPal, the SRE team troubleshoots production alerts from roughly 2,500 applications and services. Every alert carries an inherent urgency to resolve it, and at times we are swamped with alerts that all demand attention at once.

In this talk, we will share how we started applying machine learning from the ground up to give our platform the ability to predict the probable root cause of an alert.

We will also explain how we feed existing troubleshooting results (produced by traditional, rule-based programming) into machine learning to improve the accuracy of the predictions, and we will cover the design, the workings, and the methodology we followed in experimental trials to identify the best model. The model we built is integrated with our platform and reports the probable root cause in real time. It has been showing promising results and is a game changer for SREs.

This presentation will mainly walk you through the journey of how we built the machine learning models and deployed them in production.
