There's no need to pay subscription fees to an online transcription service: OpenAI's Whisper (https://openai.com/index/whisper/) runs locally on your computer and can generate subtitles for all kinds of videos. This guide walks you through installing Python, Whisper, FFmpeg, and yt-dlp on macOS to transcribe videos into subtitles. It covers both Apple Silicon (M1/M2/M3) and Intel Macs, and includes setting up a virtual environment, downloading YouTube videos, and optimizing performance.
Step 1: Install Python
Check if Python is Installed
Open Terminal:
- Press Cmd + Space, type Terminal, and press Enter.
Type:
python3 --version
If you see a version like Python 3.10.6, Python is installed. Ensure it's 3.8–3.11. If it's older (e.g., 3.7) or missing, proceed to install.
Note: macOS includes a system Python (e.g., /usr/bin/python3). Don't use it; we'll install a separate version to avoid conflicts.
Install Python (if needed)
Download Python from python.org:
- Visit python.org.
- Choose Python 3.10 or 3.11 (e.g., 3.10.9 is stable as of 2025). Click the macOS installer link for your chip (universal, Intel, or Apple Silicon).
Run the installer:
- Double-click the .pkg file (e.g., python-3.10.9-macos11.pkg).
- Follow the prompts. It installs Python to /Applications/Python 3.10/ and adds python3 to your PATH.
Verify installation:
Open a new Terminal and type:
python3 --version
You should see the installed version (e.g., Python 3.10.9).
Check pip (Python’s package manager):
pip3 --version
It should show a version tied to Python 3.10 or 3.11 (e.g., pip 23.2 from ...).
Update PATH (if needed)
If python3 isn't found, add Python to your PATH:
Open Terminal and edit your shell profile:
nano ~/.zshrc
(macOS uses Zsh by default since Catalina; if using Bash, edit ~/.bash_profile).
Add this line at the end:
export PATH="/Library/Frameworks/Python.framework/Versions/3.10/bin:$PATH"
Replace 3.10 with your version (e.g., 3.11).
Save: press Ctrl + O, Enter, then Ctrl + X.
Apply:
source ~/.zshrc
Verify: python3 --version should now work.
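If PATH problems persist, these commands show exactly which binaries your shell resolves (a troubleshooting sketch, not a required step):

```shell
# Show which python3/pip3 the shell finds; both should live under the
# python.org framework path if your PATH edit took effect.
command -v python3 || echo "python3 not found - check PATH"
command -v pip3 || echo "pip3 not found - check PATH"
# Running pip through the interpreter avoids picking up a stale pip3 shim:
python3 -m pip --version || echo "pip is not available for this python3"
```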
Step 2: Set Up a Project Folder and Virtual Environment
A virtual environment isolates Whisper’s dependencies, keeping your system clean.
Create a Project Folder
In Terminal, make a folder for your project:
mkdir ~/whisper_project
cd ~/whisper_project
This creates whisper_project in your home directory (/Users/yourusername/).
Set Up a Virtual Environment
Create the environment:
python3 -m venv venv
Activate it:
source venv/bin/activate
Your prompt should change (e.g., (venv) yourusername@MacBook-Pro whisper_project %), showing the environment is active.
Note: Always activate the environment when working on this project. If you close Terminal, reopen it, navigate to ~/whisper_project, and run source venv/bin/activate again.
Deactivate (Later):
When done, type:
deactivate
to exit the virtual environment.
Step 3: Install Whisper
Install Whisper:
With the virtual environment activated, install Whisper:
pip3 install -U openai-whisper
This downloads Whisper and dependencies (e.g., PyTorch, ~1–2GB). It may take 2–5 minutes, depending on your internet speed.
On Apple Silicon (M1/M2/M3), PyTorch can use Metal Performance Shaders (MPS) for acceleration, depending on your PyTorch version.
Verify Installation:
Test Whisper:
whisper --help
You should see a help message listing command options (e.g., --model, --language). If you get "command not found," ensure the virtual environment is active and reinstall Whisper.
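If you'd rather check your toolchain in one place, here's a small Python sketch (my own helper, not part of Whisper) that reports which required command-line tools are visible on PATH:

```python
import shutil

def check_tool(name):
    """Return the full path of a CLI tool, or None if it's not on PATH."""
    return shutil.which(name)

# ffmpeg is installed in the next step, so it may legitimately be missing here.
for tool in ("whisper", "ffmpeg", "python3"):
    path = check_tool(tool)
    print(f"{tool}: {path or 'NOT FOUND'}")
```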
Step 4: Install FFmpeg
Whisper needs FFmpeg to extract audio from video files (e.g., MP4 to WAV for transcription).
Install Homebrew (if not installed)
Homebrew is a package manager for macOS, making FFmpeg installation easy.
Check if Homebrew is installed:
brew --version
If it’s missing, install it:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Follow the prompts (it may ask for your password). Homebrew installs to /usr/local/bin (Intel) or /opt/homebrew/bin (Apple Silicon).
Add Homebrew to PATH (if needed)
If brew isn't found afterwards, add it to your PATH (the path below is the Apple Silicon location; on Intel, /usr/local/bin is usually on PATH already):
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
Install FFmpeg
With Homebrew installed, run:
brew install ffmpeg
This downloads and installs FFmpeg (~100–200MB, 1–3 minutes).
Verify:
ffmpeg -version
You should see FFmpeg's version (e.g., ffmpeg version 6.0). If not, ensure Homebrew's PATH is set correctly.
Step 5 (Optional): Install macOS Certificates for Python
Why: Python installed from python.org doesn't automatically use macOS's system certificate store (unlike the pre-installed /usr/bin/python3). This can cause SSL verification to fail if certificates aren't set up manually.
How:
- Install the certifi package, which provides a trusted certificate bundle:
source ~/whisper_project/venv/bin/activate
pip3 install certifi
- Run the Install Certificates.command script that came with your python.org installation:
- Open Finder and go to /Applications/Python 3.11/ (or your installed version's folder).
- Double-click Install Certificates.command. It runs in Terminal and updates Python's SSL certificates.
You’ll see output like:
-- pip install --upgrade certifi --
Successfully installed certifi-2023.x.x
-- removing any existing file or link --
-- creating symlink to certifi certificate bundle --
-- setting permissions --
-- update complete --
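To confirm Python can now load a CA bundle, here's a quick stdlib-only check (an illustrative snippet of my own, not part of the certificate installer):

```python
import ssl

# Build a default SSL context and count the CA certificates it loaded;
# a count of 0 suggests the certificate setup is still incomplete.
ctx = ssl.create_default_context()
stats = ctx.cert_store_stats()
print(f"Loaded {stats['x509_ca']} CA certificates")
```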
Step 6: Prepare Your Video File
You need a local video file for Whisper to transcribe. If your video is a YouTube URL, we’ll download it.
If You Have a Local Video
Copy your video (e.g., my_video.mp4) to the whisper_project folder:
cp /path/to/my_video.mp4 ~/whisper_project/
Example: If it’s in Downloads, use:
cp ~/Downloads/my_video.mp4 ~/whisper_project/
Supported formats: MP4, MOV, AVI, MKV, etc.
If Downloading from YouTube
Install yt-dlp:
pip3 install yt-dlp
Ensure the virtual environment is active.
Download the video:
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" "https://www.youtube.com/watch?v=VIDEO_ID" -o video.mp4
Replace VIDEO_ID with the video's ID (e.g., K3CR6RiWS2U for the Nintendo Switch 2 video).
Example:
yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" "https://www.youtube.com/watch?v=K3CR6RiWS2U" -o nintendo_switch_2.mp4
This saves nintendo_switch_2.mp4 in ~/whisper_project.
Note: Only download videos you have permission to use (e.g., for personal use or fair use under copyright law). YouTube’s terms prohibit unauthorized redistribution.
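If you script your downloads, the video ID can be pulled from a standard watch URL with the standard library. This is a hypothetical helper of my own, shown for standard `watch?v=` URLs only; short `youtu.be` links would need extra handling:

```python
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Extract the ID from a standard youtube.com/watch?v=... URL."""
    query = parse_qs(urlparse(url).query)
    return query.get("v", [None])[0]

print(youtube_video_id("https://www.youtube.com/watch?v=K3CR6RiWS2U"))  # K3CR6RiWS2U
```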
Step 7: Generate Subtitles with Whisper
Now, we’ll transcribe the video’s audio into subtitles.
Run Whisper:
Ensure you're in the whisper_project folder with the virtual environment activated:
cd ~/whisper_project
source venv/bin/activate
Run Whisper on your video:
whisper video.mp4 --model medium --language en --output_format srt
Replace video.mp4 with your video's filename (e.g., nintendo_switch_2.mp4).
--model medium: Uses the medium model for a good balance of accuracy and speed. Options:
- tiny: Fastest, least accurate.
- base: Fast, decent accuracy.
- small: Balanced.
- medium: Recommended for quality.
- large: Most accurate, slowest.
--language en: Assumes English audio. For other languages, use codes like es (Spanish) or fr (French), or omit the flag to auto-detect.
--output_format srt: Outputs subtitles in SRT format (timed text). Alternatives: txt (plain text), vtt (WebVTT).
Example for the Nintendo Switch 2 video:
whisper nintendo_switch_2.mp4 --model medium --language en --output_format srt
What Happens:
- Whisper uses FFmpeg to extract audio from the video.
- It transcribes the audio into text, segmenting it into subtitle entries with timestamps.
- It saves video.srt (e.g., nintendo_switch_2.srt) in ~/whisper_project.
Processing time depends on your Mac:
- M1/M2/M3 Mac: ~5–10 minutes for a 10-minute video (medium model) with MPS acceleration.
- Intel Mac: ~20–30 minutes for a 10-minute video (CPU only).
- Longer videos or larger models (e.g., large) take proportionally longer.
Example SRT Output:
Open video.srt with TextEdit:
open -a TextEdit video.srt
You’ll see something like:
1
00:00:00,000 --> 00:00:02,500
Yo, MKBHD here.
2
00:00:02,501 --> 00:00:05,000
Welcome to your first look and hands-on with the Nintendo Switch 2.
Each entry has a number, timestamp, and text, ideal for subtitles or article writing.
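The SRT format is simple enough to parse with the standard library. Here's a sketch of my own (using only Python's `re`, not a Whisper feature) that splits entries into (index, start, end, text) tuples:

```python
import re

# Matches one SRT entry: index line, timestamp line, then text up to a blank line.
SRT_BLOCK = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n"
    r"(.*?)(?=\n\n|\Z)",
    re.S,
)

def parse_srt(text):
    """Return a list of (index, start, end, text) tuples from SRT content."""
    return [(int(i), s, e, t.strip()) for i, s, e, t in SRT_BLOCK.findall(text)]

sample = """1
00:00:00,000 --> 00:00:02,500
Yo, MKBHD here.

2
00:00:02,501 --> 00:00:05,000
Welcome to your first look and hands-on with the Nintendo Switch 2.
"""

for index, start, end, line in parse_srt(sample):
    print(index, start, end, line)
```

This is handy for spell-checking or word counts before editing the file by hand.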
Step 8: Verify and Edit Subtitles
Check the SRT File:
Open video.srt in TextEdit or a code editor like Visual Studio Code (free, code.visualstudio.com).
Read through to ensure accuracy. Whisper is ~90–95% accurate for clear English audio but may misinterpret names (e.g., “MKBHD” as “MKV HD”) or technical terms (e.g., “Joy-Con”).
Edit if Needed:
- Fix errors in TextEdit (e.g., change “Nintendo Switch too” to “Nintendo Switch 2”).
- For easier editing, use Aegisub (a free subtitle editor):
- Download from aegisub.org (~30MB).
- Open Aegisub, load video.srt (File > Open Subtitles), and drag your video file into the video pane.
- Play the video to check timing and edit text/timestamps as needed.
- Save the corrected SRT (File > Save Subtitles).
- Install Aegisub via Homebrew (optional):
brew install aegisub
Convert to TXT (Optional):
If you want plain text for an article, run Whisper with TXT output:
whisper video.mp4 --model medium --language en --output_format txt
This creates video.txt without timestamps (e.g., "Yo, MKBHD here. Welcome to your first look…").
Alternatively, manually strip timestamps from video.srt:
- Open it in TextEdit, copy only the text lines (excluding numbers and timestamps), and paste them into a new file (e.g., subtitles.txt).
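Stripping by hand gets tedious for long files; here's a small sketch (an illustrative helper of my own, not a Whisper feature) that does the same filtering automatically:

```python
def srt_to_text(srt):
    """Drop index and timestamp lines from SRT content, keeping only dialogue."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)

sample = "1\n00:00:00,000 --> 00:00:02,500\nYo, MKBHD here.\n"
print(srt_to_text(sample))  # Yo, MKBHD here.
```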
Step 9: Optimizing Performance on macOS
Apple Silicon (M1/M2/M3)
Whisper can use Metal Performance Shaders (MPS) on Apple Silicon for faster processing, depending on your PyTorch version.
To confirm MPS is active:
python3 -c "import torch; print(torch.backends.mps.is_available())"
It should return True. If not, ensure PyTorch is updated:
pip3 install --upgrade torch torchvision torchaudio
MPS makes the medium model ~2–3x faster than an Intel CPU (e.g., 5–10 minutes for a 10-minute video vs. 20–30).
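A more defensive version of the check above (a sketch that assumes PyTorch 1.12+, where MPS support first appeared; it degrades gracefully if torch isn't importable):

```python
def best_device():
    """Report the fastest device PyTorch could use on this machine."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(best_device())
```

On an M1/M2/M3 Mac with a recent PyTorch this should print mps; on an Intel Mac, cpu.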
Choose Model Size
Start with medium for quality and speed. Use large for complex audio (e.g., accents, background noise) but expect ~2x longer processing. Use small or base for quick tests or clear audio.
Manage Memory
Whisper's medium model uses ~2–4GB RAM. If your Mac has low RAM (e.g., 8GB), close other apps or use the small model.
For long videos, split them with FFmpeg:
ffmpeg -i video.mp4 -c copy -map 0 -segment_time 600 -f segment chunk_%03d.mp4
This creates 10-minute chunks (e.g., chunk_000.mp4). Transcribe each separately.
Batch Processing
Transcribe multiple videos:
for video in *.mp4; do whisper "$video" --model medium --language en --output_format srt; done
Save Disk Space
Delete the video file after transcription if not needed:
rm video.mp4
Step 10: Expected Performance on Mac
M1/M2/M3 Mac (e.g., MacBook Air M1, 8GB RAM):
- ~5–10 minutes for a 10-minute video (medium model).
- ~10–20 minutes with the large model.
Intel Mac (e.g., 2019 MacBook Pro, 4-core i5):
- ~20–30 minutes for a 10-minute video (medium model).
- ~40–60 minutes with the large model.
Accuracy: ~90–95% for clear English audio (e.g., YouTube tech reviews). Lower for noisy audio, accents, or overlapping speakers. Manual editing fixes most errors.
Output Size: SRT files are small (~10–100KB for a 10-minute video); TXT is similar.