Podcast Search & Discovery

From startup to scale: building podcast discovery systems at Audiosear.ch/Pop Up Archive, then bringing them to millions of users at Apple Podcasts.

Audiosear.ch / Pop Up Archive (2016–2017)

Founding ML Researcher · Oakland, CA

I led ML at a podcast discovery startup focused on making audio searchable and discoverable. The core challenge: how do you help listeners find podcasts when there are hundreds of thousands of shows and most discovery happens through word-of-mouth?

My approach was to use unsupervised topic modeling (Latent Dirichlet Allocation, LDA) on podcast transcripts to automatically cluster and organize content. The model extracted 80+ coherent topics from 40k+ podcasts, enabling topic-based browsing and content-based recommendations that could surface the long tail of great podcast content.
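To make the idea concrete, here is a minimal pure-Python sketch of LDA fitted with collapsed Gibbs sampling, plus the two downstream uses described here: picking an episode's dominant topic and recommending shows with a similar topic mix. This is not the production pipeline. The inference method (collapsed Gibbs), the hyperparameters `alpha` and `beta`, the Hellinger distance used for similarity, and every function name (`lda_gibbs`, `dominant_topic`, `hellinger`, `recommend`) are illustrative assumptions.

```python
import math
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of token lists (e.g. tokenized transcripts).
    Returns (doc_topic, topic_word, vocab): doc_topic[d][k] estimates the share
    of topic k in document d; topic_word[k][v] estimates p(vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]       # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    nk = [0] * K                        # total tokens assigned to topic k
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):      # random initial assignment
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


def dominant_topic(theta):
    """Index of the largest topic share (e.g. for coloring episodes by topic)."""
    return max(range(len(theta)), key=theta.__getitem__)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def recommend(doc_topic, query, n=3):
    """Indices of the n documents whose topic mix is closest to doc `query`."""
    scored = [
        (hellinger(doc_topic[query], theta), j)
        for j, theta in enumerate(doc_topic)
        if j != query
    ]
    return [j for _, j in sorted(scored)[:n]]
```

A hypothetical run would be `theta, phi, vocab = lda_gibbs(tokenized_transcripts, num_topics=80)` followed by `recommend(theta, query=0)`, where `tokenized_transcripts` stands in for preprocessed transcript tokens. Hellinger distance is one common choice for comparing probability vectors; cosine similarity would also work. A pure-Python sampler is far too slow for a 40k-podcast corpus, so at that scale an optimized LDA implementation would be the practical choice.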

Interactive Visualizations

Episodes by Topic: Each dot is a podcast episode, colored and clustered by its dominant topic.

Shows by Topic: Each dot is a podcast show, positioned by its topic distribution.

Comparing Topics: Explore relationships between different topic clusters.

Read the full blog post →

Apple Podcasts (2017–2021)

Senior Machine Learning Engineer · San Francisco, CA

Pop Up Archive was acquired by Apple in 2017. At Apple, I worked on search and recommendations for Apple Podcasts, scaling the ideas from the startup to millions of users:

  • Launched topic-based semantic search and content classification across millions of podcasts
  • Built topic-based recommendation carousels for personalized content discovery
  • Built toxic content classifiers using BERT text embeddings
  • Trained seq2seq models for punctuation prediction on automatic speech recognition (ASR) output
  • Built audio classification and embedding pipelines
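Punctuation prediction can be framed as a sequence-to-sequence task: the input looks like raw ASR output and the target is the same text with punctuation restored. The sketch below shows only one assumed way to build training pairs from already punctuated text; the set of punctuation marks, the lowercasing, and the name `make_training_pair` are illustrative assumptions, and the model itself is out of scope.

```python
import re

# Punctuation marks the model would learn to restore (an illustrative choice).
PUNCT = re.compile(r"[.,?!;:]")


def make_training_pair(text):
    """Build a (source, target) pair for a seq2seq punctuation model.

    The source mimics raw ASR output: no punctuation, lowercase, single spaces.
    The target is the original punctuated, cased text with whitespace normalized.
    Note: this naive regex also strips periods inside numbers like "3.5".
    """
    target = " ".join(text.split())
    source = " ".join(PUNCT.sub("", target).lower().split())
    return source, target
```

For example, `make_training_pair("Hello, world. How are you?")` yields the source `"hello world how are you"` paired with the original sentence as the target.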
