Vector Podcast

by Dmitry Kan

Vector Podcast is here to bring you the depth and breadth of search engine technology, product, marketing, and business. In the podcast we talk with engineers, entrepreneurs, thinkers and tinkerers who put their soul into search. Depending on your interest, you should find a matching topic, whether it is a deep algorithmic aspect of search engines and the information retrieval field, or examples of products offering deep tech ...

Podcast episodes

  • Season 3

  • Saurabh Rai - Growing Resume Matcher

    Topics:
    00:00 Intro - how do you like our new design?
    00:52 Greets
    01:55 Saurabh's background
    03:04 Resume Matcher: 4.5K stars, 800 community members, 1.5K forks
    04:11 How did you grow the project?
    05:42 Target audience and how to use Resume Matcher
    09:00 How did you attract so many contributors?
    12:47 Architecture aspects
    15:10 Cloud or not
    16:12 Challenges in maintaining OS projects
    17:56 Developer marketing with Swirl AI Connect
    21:13 What you (listener) can help with
    22:52 What drives you?

    Show notes:
    - Resume Matcher: https://github.com/srbhr/Resume-Matcher, website: https://resumematcher.fyi/
    - Ultimate CV by Martin John Yate: https://www.amazon.com/Ultimate-CV-Cr...
    - fastembed: https://github.com/qdrant/fastembed
    - Swirl: https://github.com/swirlai/swirl-search

  • Season 2

  • Sid Probstein - Creator of SWIRL - Search in siloed data with LLMs

    Topics:
    00:00 Intro
    00:22 Quick demo of SWIRL on the summary transcript of this episode
    01:29 Sid’s background
    08:50 Enterprise vs Federated search
    17:48 How vector search covers for missing folksonomy in enterprise data
    26:07 Relevancy from vector search standpoint
    31:58 How ChatGPT improves programmer’s productivity
    32:57 Demo!
    45:23 Google PSE
    53:10 Ideal user of SWIRL
    57:22 Where SWIRL sits architecturally
    1:01:46 How to evolve SWIRL with domain expertise
    1:04:59 Reasons to go open source
    1:10:54 How SWIRL and Sid interact with ChatGPT
    1:23:22 The magical question of WHY
    1:27:58 Sid’s announcements to the community

    YouTube version: https://www.youtube.com/watch?v=vhQ5LM5pK_Y
    Design by Saurabh Rai: https://twitter.com/_srbhr_
    Check out his Resume Matcher project: https://www.resumematcher.fyi/

  • Atita Arora - Search Relevance Consultant - Revolutionizing E-commerce with Vector Search

    Topics:
    00:00 Intro
    02:20 Atita’s path into search engineering
    09:00 When it’s time to contribute to open source
    12:08 Taking management role vs software development
    14:36 Knowing what you like (and coming up with a Solr course)
    19:16 Read the source code (and cook)
    23:32 Open Bistro Innovations Lab and moving to Germany
    26:04 Affinity to Search world and working as a Search Relevance Consultant
    28:39 Bringing vector search to Chorus and Querqy
    34:09 What Atita learnt from Eric Pugh’s approach to improving Quepid
    36:53 Making vector search with Solr & Elasticsearch accessible through tooling and documentation
    41:09 Demystifying data embedding for clients (and for Java based search engines)
    43:10 Shifting away from generic to domain-specific in search+vector saga
    46:06 Hybrid search: where it will be useful to combine keyword with semantic search
    50:53 Choosing between new vector DBs and “old” keyword engines
    58:35 Women of Search
    1:14:03 Important (and friendly) People of Open Source
    1:22:38 Reinforcement learning applied to our careers
    1:26:57 The magical question of WHY
    1:29:26 Announcements

    See show notes on YouTube: https://www.youtube.com/watch?v=BVM6TUSfn3E

  • Connor Shorten - Research Scientist, Weaviate - ChatGPT, LLMs, Form vs Meaning

    Topics:
    00:00 Intro
    01:54 Things Connor learnt in the past year that changed his perception of Vector Search
    02:42 Is search becoming conversational?
    05:46 Connor asks Dmitry: How Large Language Models will change Search?
    08:39 Vector Search Pyramid
    09:53 Large models, data, Form vs Meaning and octopus underneath the ocean
    13:25 Examples of getting help from ChatGPT and how it compares to web search today
    18:32 Classical search engines with URLs for verification vs ChatGPT-style answers
    20:15 Hybrid search: keywords + semantic retrieval
    23:12 Connor asks Dmitry about his experience with sparse retrieval
    28:08 SPLADE vectors
    34:10 OOD-DiskANN: handling the out-of-distribution queries, and nuances of sparse vs dense indexing and search
    39:54 Ways to debug a query case in dense retrieval (spoiler: it is a challenge!)
    44:47 Intricacies of teaching ML models to understand your data and re-vectorization
    49:23 Local IDF vs global IDF and how dense search can approach this issue
    54:00 Realtime index
    59:01 Natural language to SQL
    1:04:47 Turning text into a causal DAG
    1:10:41 Engineering and Research as two highly intelligent disciplines
    1:18:34 Podcast search
    1:25:24 Ref2Vec for recommender systems
    1:29:48 Announcements

    For Show Notes, please check out the YouTube episode below.
    This episode on YouTube: https://www.youtube.com/watch?v=2Q-7taLZ374
    Podcast design: Saurabh Rai: https://twitter.com/srvbhr

  • Evgeniya Sukhodolskaya - Data Advocate, Toloka - Data at the core of all the cool ML

    Toloka’s support for Academia: grants and educator partnerships
    - https://toloka.ai/collaboration-with-educators-form
    - https://toloka.ai/research-grants-form
    These are pages leading to them:
    - https://toloka.ai/academy/education-partnerships
    - https://toloka.ai/grants

    Topics:
    00:00 Intro
    01:25 Jenny’s path from graduating in ML to a Data Advocate role
    07:50 What goes into the labeling process with Toloka
    11:27 How to prepare data for labeling and design tasks
    16:01 Jenny’s take on why Relevancy needs more data in addition to clicks in Search
    18:23 Dmitry plays the Devil’s Advocate for a moment
    22:41 Implicit signals vs user behavior and offline A/B testing
    26:54 Dmitry goes back to advocating for good search practices
    27:42 Flower search as a concrete example of labeling for relevancy
    39:12 NDCG, ERR as ranking quality metrics
    44:27 Cross-annotator agreement, perfect list for NDCG and Aggregations
    47:17 On measuring and ensuring the quality of annotators with honeypots
    54:48 Deep-dive into aggregations
    59:55 Bias in data, SERP, labeling and A/B tests
    1:16:10 Is unbiased data attainable?
    1:23:20 Announcements

    This episode on YouTube: https://youtu.be/Xsw9vPFqGf4
    Podcast design: Saurabh Rai: https://twitter.com/srvbhr