Scalable Data Pipelines for Mastering & Integration - an ML Approach

Duration: 50 mins
Sonal Goyal
Founder & CEO, Nube Technologies

Integrating multiple, diverse datasets for analytics is an essential part of a data scientist's life and of the analytics journey, because feature engineering on dirty data will only yield faulty features. However, current tools do not make the process simpler. There is a wide variety of data attributes and formats to take care of. Preparing for analytics by matching and deduplicating records remains a challenge. Unifying matching records into a single, definitive representation of an entity is both time-consuming and error-prone. Hence, preparing data for predictive analytics requires manual effort and occupies up to 60-70% of a data scientist's time.
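To make "matching, deduplicating and unifying records" concrete, here is a minimal sketch using a plain string-similarity rule rather than the ML approach the talk describes. The record fields (`name`, `city`, `phone`), the 0.85 threshold and the "first non-empty value wins" merge rule are all illustrative assumptions. Matching pairs are grouped with union-find, and each group is collapsed into one representative record.

```python
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting differences don't block a match
    return " ".join(s.lower().split())

def similar(a: dict, b: dict, threshold: float = 0.85) -> bool:
    # Hypothetical matching rule: character-level similarity of the "name" fields
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold

def deduplicate(records: list[dict]) -> list[list[int]]:
    # Union-find over record indices: every matching pair ends up in the same cluster
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Naive O(n^2) pairwise comparison; scalable systems narrow candidates (blocking) first
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similar(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

def golden_record(records: list[dict], cluster: list[int]) -> dict:
    # Unify a cluster: for each field, the first non-empty value seen wins
    merged: dict = {}
    for i in cluster:
        for key, value in records[i].items():
            if value and not merged.get(key):
                merged[key] = value
    return merged

records = [
    {"name": "Acme Corp", "city": "Pune", "phone": ""},
    {"name": "ACME  corp.", "city": "", "phone": "020-1234"},
    {"name": "Globex Inc", "city": "Delhi", "phone": ""},
]
clusters = deduplicate(records)
for c in clusters:
    print(golden_record(records, c))
```

The first two records fall into one cluster and merge into `{'name': 'Acme Corp', 'city': 'Pune', 'phone': '020-1234'}`. A hand-tuned threshold like this is exactly what tends to break across varied attributes and formats, which is where a learned matching model would come in.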

In this talk, we discuss how data engineers and scientists can augment their data preparation by leveraging machine learning. We talk about schema mapping: identifying attributes across disparate data sources that refer to the same values. We discuss data mastering and how it differs from a typical clustering or classification problem. We also elaborate on scaling these approaches and on how machine learning can help.
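Schema mapping can be illustrated with a simple value-overlap heuristic: two columns from different sources are likely the same attribute if their normalized values overlap strongly. The sketch below scores column pairs with Jaccard similarity and keeps the best match above a threshold. The source and column names (`crm`, `erp`, `cust_city`, `town`, and so on) and the 0.3 cutoff are hypothetical, and this is not the speaker's method; an ML approach would typically learn from signals like this overlap rather than rely on a fixed cutoff.

```python
def jaccard(a: set, b: set) -> float:
    # Jaccard similarity: size of the intersection over size of the union
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def map_columns(source: dict[str, list], target: dict[str, list],
                min_score: float = 0.3) -> dict[str, str]:
    # For each source column, pick the target column whose values overlap most
    mapping: dict[str, str] = {}
    for s_col, s_vals in source.items():
        s_set = {str(v).strip().lower() for v in s_vals}  # normalize case and spacing
        best_col, best_score = None, 0.0
        for t_col, t_vals in target.items():
            t_set = {str(v).strip().lower() for v in t_vals}
            score = jaccard(s_set, t_set)
            if score > best_score:
                best_col, best_score = t_col, score
        # Only accept a mapping when the overlap is strong enough
        if best_col is not None and best_score >= min_score:
            mapping[s_col] = best_col
    return mapping

crm = {"cust_city": ["Pune", "Delhi", "Mumbai"], "cust_name": ["Acme Corp", "Globex Inc"]}
erp = {"town": ["pune", "mumbai", "Chennai"], "company": ["ACME CORP", "Initech"]}
print(map_columns(crm, erp))
```

Here `cust_city` maps to `town` (2 shared values out of 4 distinct) and `cust_name` maps to `company` (1 of 3), even though no column names match.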

Come see how ML can be leveraged for data preparation for analytics.
