Word Embeddings - An Alternative and Efficient Approach to Search for Documents


Duration: 50 mins
Ananth Gundabattula
Senior Architect, Commonwealth Bank of Australia

Searching for documents in a collection is typically implemented via the TF/IDF principle in open source document search engines. However, recent developments in the field of NLP have shown positive results in representing text as more concise vectors, as opposed to a bag-of-words construct. These approaches also enrich the information model, for example by capturing analogies and the semantics of words. This talk walks through an end-to-end data workflow to enable such a construct.

The first part of the session describes the typical flow of how a search query is processed by default in any of the Lucene-powered search engines today. The concept of TF/IDF is also introduced in this part of the session.
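
As a rough illustration of the TF/IDF idea mentioned above, the sketch below scores a few documents against a query using term frequency multiplied by a smoothed inverse document frequency. The function names, the whitespace tokenizer, the smoothing formula, and the sample documents are all assumptions made for this example; production search engines use more elaborate analysis and scoring.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Return one TF/IDF relevance score per document for the query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                # Rare terms get a higher weight than common ones.
                idf = math.log(1 + n_docs / df[term])
                score += (tf[term] / len(tokens)) * idf
        scores.append(score)
    return scores


# Illustrative documents, invented for this sketch.
docs = [
    "the cat sat on the mat",
    "dogs chase the cat",
    "stock markets fell sharply today",
]
scores = tf_idf_scores("cat mat", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])
```

Note that a document only scores if it contains the exact query terms, which is the bag-of-words limitation that vector representations aim to relax.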

The session then proceeds to describe the concept of word embeddings using a library like Facebook's fastText.
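
To make the idea of a word vector concrete, here is a minimal, untrained sketch of the subword approach that fastText is known for: a word is wrapped in boundary markers, split into character n-grams, and represented by the average of its n-gram vectors. This does not call the fastText library. The n-gram vectors below are deterministic pseudo-random stand-ins rather than learned values, and the names `char_ngrams`, `ngram_vector`, `word_vector`, and `DIM` are hypothetical, so any similarity shown here comes only from shared spelling, not from learned semantics.

```python
import math
import random
import zlib

DIM = 256  # vector size chosen for this sketch (assumption)


def char_ngrams(word, min_n=3, max_n=5):
    # Wrap the word in boundary markers, then slide a window of each size.
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def ngram_vector(ngram):
    # Stand-in for a learned vector: seeded pseudo-random values,
    # so the same n-gram always maps to the same vector.
    rng = random.Random(zlib.crc32(ngram.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word):
    # A word's vector is the average of its n-gram vectors.
    grams = char_ngrams(word)
    vec = [0.0] * DIM
    for gram in grams:
        for i, value in enumerate(ngram_vector(gram)):
            vec[i] += value
    return [value / len(grams) for value in vec]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


similar = cosine(word_vector("searching"), word_vector("searched"))
unrelated = cosine(word_vector("searching"), word_vector("banana"))
print(f"searching vs searched: {similar:.2f}")
print(f"searching vs banana:   {unrelated:.2f}")
```

With a trained model the vectors would also place semantically related words close together, even when they share no characters.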

Subsequently, a representative data pipeline is discussed, showing how an incoming stream of data can be turned into vector representations and made searchable with a turnaround time of a few seconds.
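
The shape of such a pipeline can be sketched in a few lines: documents arrive from a stream, each one is converted to a vector, and the vector is added to an index that answers queries by cosine similarity. Everything here is an assumption for illustration, not the pipeline presented in the talk: `embed` is a stand-in that averages pseudo-random token vectors instead of calling a trained embedding model, `VectorIndex` is a hypothetical in-memory brute-force index, and the sample documents are invented.

```python
import math
import random
import zlib

DIM = 256  # vector size chosen for this sketch (assumption)


def embed(text):
    """Stand-in embedding: unit-length sum of pseudo-random token vectors."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        rng = random.Random(zlib.crc32(token.encode("utf-8")))
        for i in range(DIM):
            vec[i] += rng.uniform(-1.0, 1.0)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class VectorIndex:
    """Hypothetical in-memory index using brute-force cosine search."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, doc_id, text):
        self._ids.append(doc_id)
        self._vectors.append(embed(text))

    def search(self, query, k=3):
        q = embed(query)
        # Vectors are unit length, so the dot product is the cosine.
        scored = [
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in zip(self._ids, self._vectors)
        ]
        scored.sort(reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k]]


def ingest(stream, index):
    # Each document becomes searchable as soon as it has been indexed.
    for doc_id, text in stream:
        index.add(doc_id, text)


# Illustrative stream of (id, text) pairs, invented for this sketch.
incoming = iter([
    ("doc-1", "quarterly earnings report for retail banking"),
    ("doc-2", "word embeddings for document search"),
    ("doc-3", "team offsite travel itinerary"),
])
index = VectorIndex()
ingest(incoming, index)
results = index.search("document search with embeddings", k=2)
print(results[0][0])
```

A brute-force scan is enough to show the flow; at larger scale the index would typically be replaced by an approximate nearest-neighbour structure and the stream by a real messaging system.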

The session closes with a few references to more recent developments in this space.
