Scalable Data Pipelines for Mastering & Integration - an ML Approach

Duration: 50 mins
Sonal Goyal
Founder & CEO, Nube Technologies

Integrating multiple, diverse datasets is an essential part of a data scientist's work and of the analytics journey, since feature engineering on dirty data yields faulty results. However, current tools do not make the process simpler. There is a wide variety of data attributes and formats to handle. Preparing for analytics by matching and deduplicating records remains a challenge. Unifying matching records into a definitive representation of an entity is both time-consuming and error-prone. Hence, preparing data for predictive analytics requires manual effort and occupies up to 60-70% of a data scientist's time.

In this talk, we discuss how data engineers and scientists can augment their data preparation by leveraging machine learning. We talk about schema mapping: identifying attributes in disparate data sources that refer to the same values. We discuss data mastering and how it differs from a typical clustering or classification problem. We also elaborate on scaling these approaches, and how machine learning can help.

Come see how ML can be leveraged for data preparation for analytics.
