Building an AI YouTube Video Summarizer

1. Digest a 1-Hour Video in 3 Minutes

YouTube is full of great lectures, interviews, and tech talks, but there is never enough time to watch them all. What if you had a tool that extracts just the key points from any URL?

2. How It Works

1. Extract subtitles: pull the YouTube captions (CC) as text
2. AI analysis: send the subtitle text to an AI model to extract the key points
3. Generate summary: output key points, timestamps, and a one-line summary

3. Step 1: Extracting Subtitles

Here is how to get subtitles from a YouTube video:

# Install yt-dlp
pip install yt-dlp

# Download uploaded and auto-generated subtitles, skipping the video itself
yt-dlp --write-subs --write-auto-subs --sub-langs ko,en --skip-download \
  --sub-format vtt -o "subtitle" "https://youtube.com/watch?v=VIDEO_ID"
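
The .vtt file that yt-dlp writes still contains cue timings, and auto-generated tracks tend to repeat each line across overlapping cues. A minimal sketch of flattening such content to plain text is shown here. The function name `vtt_to_text` is an assumption for illustration, and the parser handles only the common cue layout, not the full WebVTT specification (numeric cue identifiers and multi-line NOTE blocks are not handled).

```python
import re

# Inline tags such as <c> or <00:00:01.500> that auto-generated tracks embed in cue text
TAG_RE = re.compile(r"<[^>]+>")

def vtt_to_text(vtt: str) -> str:
    """Flatten WebVTT content to plain text, dropping timings and repeated lines."""
    lines = []
    for raw in vtt.splitlines():
        line = raw.strip()
        # Skip blanks, the header, metadata lines, and cue timing lines
        if (not line
                or line.startswith(("WEBVTT", "Kind:", "Language:", "NOTE"))
                or "-->" in line):
            continue
        line = TAG_RE.sub("", line).strip()
        # Auto-generated captions often repeat the previous line in the next cue
        if line and (not lines or lines[-1] != line):
            lines.append(line)
    return " ".join(lines)
```

To use it, read the downloaded file as UTF-8 text and pass its contents to `vtt_to_text`.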

Or you can fetch them directly in Python:

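# pip install youtube-transcript-api
from youtube_transcript_api import YouTubeTranscriptApi

# Each entry has 'text', 'start' (seconds), and 'duration'.
# get_transcript() is the pre-1.0 interface; releases from 1.0 onward expose
# YouTubeTranscriptApi().fetch(video_id, languages=[...]) instead.
transcript = YouTubeTranscriptApi.get_transcript("VIDEO_ID", languages=['ko', 'en'])
text = " ".join(t['text'] for t in transcript)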
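
The call above expects a bare video ID rather than a full URL, and joining only the `text` fields discards the `start` offsets that the timestamp section of the summary depends on. A minimal sketch covering both gaps follows, assuming entries shaped like the dictionaries above (`text` plus `start` in seconds). The helper names `extract_video_id`, `to_timestamp`, and `format_transcript` are hypothetical, not part of any library.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Return the video ID from common YouTube URL shapes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    video_id = ""
    if host == "youtu.be":
        # Short links carry the ID as the path: youtu.be/<id>
        video_id = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        else:
            # /embed/<id>, /shorts/<id>, /live/<id>
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in ("embed", "shorts", "live"):
                video_id = parts[1]
    if not video_id:
        raise ValueError(f"Could not find a video ID in: {url}")
    return video_id

def to_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS past one hour."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"

def format_transcript(entries: list[dict]) -> str:
    """Render entries as '[MM:SS] text' lines so timestamps reach the model."""
    return "\n".join(f"[{to_timestamp(e['start'])}] {e['text']}" for e in entries)
```

Passing the output of `format_transcript` to the model instead of the plain joined text gives it something concrete to cite when asked for key timestamps.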

For videos without subtitles, you can convert speech to text using Whisper:

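# Install Whisper (it needs ffmpeg on the PATH)
pip install openai-whisper

# Download the audio track as audio.mp3, then transcribe it
yt-dlp -x --audio-format mp3 -o "audio.%(ext)s" "https://youtube.com/watch?v=VIDEO_ID"
whisper audio.mp3 --language ko --model medium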
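
Whisper can also be called from Python, which returns segments with start and end times directly. The sketch below assumes the `openai-whisper` package is installed and that an audio file such as `audio.mp3` exists. `segments_to_entries` and `transcribe_audio` are hypothetical helpers that reshape Whisper's segments into the same `text`/`start`/`duration` entries that caption extraction yields.

```python
def segments_to_entries(segments: list[dict]) -> list[dict]:
    """Reshape Whisper segments into caption-style entries."""
    return [
        {
            "text": seg["text"].strip(),
            "start": float(seg["start"]),
            "duration": float(seg["end"]) - float(seg["start"]),
        }
        for seg in segments
    ]

def transcribe_audio(path: str, language: str = "ko",
                     model_name: str = "medium") -> list[dict]:
    """Transcribe an audio file with Whisper and return caption-style entries."""
    import whisper  # imported lazily: loading the package and model is slow

    model = whisper.load_model(model_name)
    result = model.transcribe(path, language=language)
    return segments_to_entries(result["segments"])
```

Keeping both sources in one entry shape means the summarization step does not need to know whether the text came from captions or from speech recognition.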

4. Step 2: AI Summarization

Send the extracted subtitles to AI for summarization:

const prompt = `The following is a transcript from a YouTube video. Please summarize the key content.

Summary format:
1. One-line summary (1 sentence)
2. Key points (5-7 bullet points)
3. Key timestamps (time markers for important sections)
4. Conclusion/core message

Transcript:
${transcriptText}`;

Handling Long Videos

Subtitles from a 1-hour video can be tens of thousands of characters. To handle AI context limits:

  • Chunk splitting: Split into 10-minute segments, summarize each, then create an overall summary
  • Long-context models: Models such as Claude (200K-token context) or Gemini (up to 1M tokens, depending on the model version) can process most videos in a single pass
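
The chunk-splitting approach can be sketched as a small map-reduce, assuming transcript entries that carry a `start` offset in seconds. The `complete` argument is a hypothetical callable that sends a prompt to whichever AI API is in use and returns the reply text. It stands in for the real client call, which this article does not specify, and the prompt wording is illustrative only.

```python
from typing import Callable

def chunk_by_time(entries: list[dict], window_seconds: int = 600) -> list[str]:
    """Group entry texts into consecutive windows (10 minutes by default)."""
    buckets: dict[int, list[str]] = {}
    for entry in entries:
        index = int(entry["start"] // window_seconds)
        buckets.setdefault(index, []).append(entry["text"])
    # Sort by window index so chunks stay in playback order
    return [" ".join(buckets[i]) for i in sorted(buckets)]

def summarize_long(entries: list[dict], complete: Callable[[str], str],
                   window_seconds: int = 600) -> str:
    """Summarize each chunk, then summarize the combined chunk summaries."""
    chunks = chunk_by_time(entries, window_seconds)
    if not chunks:
        return ""
    partials = []
    for number, chunk in enumerate(chunks, start=1):
        prompt = (f"Summarize part {number} of {len(chunks)} of a video "
                  f"transcript in 3-5 bullet points.\n\nTranscript:\n{chunk}")
        partials.append(complete(prompt))
    if len(partials) == 1:
        return partials[0]  # short video: no second pass needed
    combined = "\n\n".join(partials)
    return complete("Combine these partial summaries into one overall "
                    "summary:\n\n" + combined)
```

Splitting on time rather than character count keeps each partial summary aligned with a known stretch of the video, which makes the final timestamps easier to trust.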

5. Step 3: Output Format

Markdown Summary

# Video Summary: "React 19 New Features Overview"

## One-Line Summary
React 19 makes Server Components the default, adding the use() hook and Actions

## Key Points
- Server Components adopted as the default architecture
- use() hook enables direct use of promises and context
- Actions simplify form handling
- Automatic memoization (React Compiler)
- Document Metadata managed directly from components

## Timestamps
- 00:00 Intro
- 03:25 Server Components explained
- 15:40 use() hook demo
- 28:10 Actions and form handling
- 42:00 React Compiler
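
One way to get this layout consistently is to have the script assemble the Markdown from structured fields rather than relying on the model's formatting. A minimal sketch follows, where the `VideoSummary` dataclass and its field names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class VideoSummary:
    title: str
    one_line: str
    key_points: list[str] = field(default_factory=list)
    # (offset in seconds, label) pairs
    timestamps: list[tuple[int, str]] = field(default_factory=list)

def render_markdown(summary: VideoSummary) -> str:
    """Render a summary in the Markdown layout shown above."""
    lines = [
        f'# Video Summary: "{summary.title}"', "",
        "## One-Line Summary", summary.one_line, "",
        "## Key Points",
    ]
    lines += [f"- {point}" for point in summary.key_points]
    lines += ["", "## Timestamps"]
    for seconds, label in summary.timestamps:
        minutes, secs = divmod(seconds, 60)
        lines.append(f"- {minutes:02d}:{secs:02d} {label}")
    return "\n".join(lines)
```

With this split, the model only has to return the fields (for example as JSON), and the output format can change without touching the prompt.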

6. Using It in Claude Code

You can register it as an MCP server in Claude Code or run it as a simple script:

# Usage example
node summarize.js "https://youtube.com/watch?v=VIDEO_ID"

# Or type the request as a prompt in a Claude Code session
"Summarize this YouTube video: https://youtube.com/watch?v=..."

7. Summary

| Step | Tool | Role |
|---|---|---|
| Subtitle extraction | yt-dlp / youtube_transcript_api | Video to text |
| Speech conversion | Whisper (when no subtitles) | Audio to text |
| AI summarization | Claude / Gemini API | Text to summary |
| Output | Markdown | Structured summary document |

With just a URL, you can grasp the key points of a 1-hour video in about 3 minutes. Amid the daily flood of content, that makes it easier to pick out the videos that are truly worth watching in full.