Data Versioning Explained the Open Source Way


Duration: 25 mins
Einat Orr
Co-founder & CEO, Treeverse

The demand for better data versioning is growing. There is a plethora of open source projects that provide tools for managing data using the best practices we learned from managing code.

In this talk we will compare these solutions by grouping them into four main use cases: collaboration over data, managing ML pipelines, mutability, and ACID guarantees over an object storage data lake.

By the end of the talk, you should have a good understanding of how these solutions compare and which one to choose for each type of use case.
