Analyzing ESPN FC Daily Transcripts with Databricks

6 min read · May 24, 2023

I spend countless hours every weekend watching football (soccer). As if that isn’t enough, I also spend an embarrassingly long time watching analysis on ESPN FC. As someone who watches the ESPN FC Daily show every day, I wondered how often the panel ends up discussing the GOAT (Greatest Of All Time) debate, Lionel Messi, and other topics.

(Here’s the YouTube channel)

Being a data engineer at heart, I naturally decided to build a data pipeline in Databricks to analyze these YouTube videos and get the answer.

Project Goals

The objectives of this little project were:

Retrieve all transcripts using the YouTube API
Load the transcripts from the ESPN FC channel’s Daily playlist into Databricks Delta tables
Use Hugging Face Transformers to summarize each transcript and extract entities
Determine how many times the GOAT debate was discussed
Test with the following Large Language Models (LLMs): Dolly and GPT-3.5
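
To make the GOAT-counting goal concrete, here is a minimal sketch in plain Python, assuming the transcripts are already available as text keyed by video ID. The keyword pattern and the function names `count_goat_mentions` and `episodes_discussing_goat` are hypothetical illustrations, not the pipeline's actual logic, and a simple keyword match is only a rough stand-in for the model-based analysis listed above.

```python
import re
from typing import Dict, List

# Hypothetical keyword pattern: an assumption for illustration only.
# Word boundaries keep words such as "goatee" from matching.
GOAT_PATTERN = re.compile(r"\b(?:goat|greatest of all time)\b", re.IGNORECASE)


def count_goat_mentions(transcript: str) -> int:
    """Return how many GOAT-related phrases appear in one transcript."""
    return len(GOAT_PATTERN.findall(transcript))


def episodes_discussing_goat(transcripts: Dict[str, str]) -> List[str]:
    """Return the video IDs whose transcript mentions the GOAT debate at least once."""
    return [
        video_id
        for video_id, text in transcripts.items()
        if count_goat_mentions(text) > 0
    ]


# Made-up sample data, keyed by placeholder video IDs.
sample_transcripts = {
    "video_a": "Is Messi the GOAT? Here we go again with the greatest of all time debate.",
    "video_b": "Transfer news and a look at the weekend fixtures.",
}

goat_episodes = episodes_discussing_goat(sample_transcripts)
```

A keyword count like this is cheap to run over every transcript first, which gives a baseline to compare against whatever the LLMs report.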

Another personal goal was to spend less time watching content and more time playing FIFA 23 on my Xbox!

Process Overview

Let’s go over the steps involved:

1. Prerequisites and Setup

Before starting, ensure you have:

Access to a Databricks Workspace

Written by Rohit Bhagwat