
How to Transcribe Videos with Whisper on macOS: A Complete Guide

There's no need to pay subscription fees for an online transcription service: use OpenAI's Whisper (https://openai.com/index/whisper/) and run it locally on your computer to generate subtitles for all kinds of videos. This guide walks you through installing Python, Whisper, FFmpeg, and yt-dlp on macOS and using them to turn videos into subtitles. It covers both Apple Silicon (M1/M2/M3) and Intel Macs, including setting up a virtual environment, downloading YouTube videos, and optimizing performance.

Step 1: Install Python

Check if Python is Installed

Open Terminal:

  • Press Cmd + Space, type Terminal, and press Enter.

Type:

python3 --version

If you see a version like Python 3.10.6, Python is installed. This guide targets Python 3.8–3.11, so check that yours falls in that range. If it's older (e.g., 3.7), newer (e.g., 3.12), or missing, proceed to install.

Note: macOS includes a system Python (e.g., /usr/bin/python3). Don’t use it; we’ll install a separate version to avoid conflicts.
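To check the version from inside Python, and to see exactly which interpreter a python3 command resolves to (handy for spotting the system /usr/bin/python3), here is a minimal sketch. The file name check_python.py is only an example, and the 3.8–3.11 bounds simply encode the range this guide targets.

```python
import sys

# Range this guide targets (inclusive), as (major, minor) tuples.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def python_supported(version=None):
    """Return True if the (major, minor) part of `version` is within range.

    `version` defaults to the running interpreter's sys.version_info.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    # /usr/bin/python3 here means you are running the system Python.
    print("Interpreter:", sys.executable)
    print("Version:", sys.version.split()[0])
    print("In the 3.8-3.11 range:", python_supported())
```

Run it with python3 check_python.py before and after installing to confirm which Python your shell picks up.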

Install Python (if needed)

Download Python from python.org:

  • Visit python.org.
  • Choose a Python 3.10 or 3.11 release that offers a macOS installer (e.g., 3.10.9). Download the macOS 64-bit universal2 installer, which runs on both Intel and Apple Silicon Macs.

Run the installer:

  • Double-click the .pkg file (e.g., python-3.10.9-macos11.pkg).
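  • Follow the prompts. It installs Python under /Library/Frameworks/Python.framework, places helper apps such as Install Certificates.command in /Applications/Python 3.10/, and adds python3 to your PATH.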

Verify installation:

Open a new Terminal and type:

python3 --version

You should see the installed version (e.g., Python 3.10.9).

Check pip (Python’s package manager):

pip3 --version

It should show a version tied to Python 3.10 or 3.11 (e.g., pip 23.2 from ...).

Update PATH (if needed)

If python3 isn’t found, add Python to your PATH:

Open Terminal and edit your shell profile:

nano ~/.zshrc

(macOS uses Zsh by default since Catalina; if using Bash, edit ~/.bash_profile).

Add this line at the end:

export PATH="/Library/Frameworks/Python.framework/Versions/3.10/bin:$PATH"

Replace 3.10 with your version (e.g., 3.11).

Save: Press Ctrl + O, Enter, then Ctrl + X.

Apply:

source ~/.zshrc

Verify: python3 --version should now work.

Step 2: Set Up a Project Folder and Virtual Environment

A virtual environment isolates Whisper’s dependencies, keeping your system clean.

Create a Project Folder

In Terminal, make a folder for your project:

mkdir ~/whisper_project
cd ~/whisper_project

This creates whisper_project in your home directory (/Users/yourusername/).

Set Up a Virtual Environment

Create the environment:

python3 -m venv venv

Activate it:

source venv/bin/activate

Your prompt should change (e.g., (venv) yourusername@MacBook-Pro whisper_project %), showing the environment is active.

Note: Always activate the environment when working on this project. If you close Terminal, reopen it, navigate to ~/whisper_project, and run source venv/bin/activate again.

Deactivate (Later):

When done, type:

deactivate

to exit the virtual environment.
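To confirm from Python that the environment is really active (for example, before installing packages), you can compare sys.prefix with sys.base_prefix, which differ inside a venv. A minimal sketch; the function name in_virtualenv is illustrative:

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when running inside a venv.

    Inside a venv, sys.prefix points at the venv folder while
    sys.base_prefix still points at the Python it was created from.
    """
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        base_prefix = sys.base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```

Saved as, say, check_venv.py, python3 check_venv.py should print True after source venv/bin/activate and False after deactivate.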

Step 3: Install Whisper

Install Whisper:

With the virtual environment activated, install Whisper:

pip3 install -U openai-whisper

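This downloads Whisper and its dependencies, including PyTorch. It may take 2–5 minutes, depending on your internet speed. The model weights aren't part of this install: each model is downloaded the first time you use it and cached in ~/.cache/whisper (the medium model is about 1.5GB).

On Apple Silicon (M1/M2/M3), PyTorch supports Metal Performance Shaders (MPS), but Whisper's command-line tool doesn't use it automatically: it runs on a CUDA GPU if one is available and otherwise on the CPU, which is what happens on a Mac.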

Verify Installation:

Test Whisper:

whisper --help

You should see a help message listing command options (e.g., --model, --language). If you get “command not found,” ensure the virtual environment is active and reinstall Whisper.
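The same check can be done from Python, which also lists the model names your installed version accepts for --model. whisper.available_models() is part of the openai-whisper package; the module_available helper around it is just a sketch:

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("whisper"):
        import whisper

        # Model names the installed Whisper version knows about.
        print("Whisper models:", whisper.available_models())
    else:
        print("Whisper is not importable; activate the venv and reinstall it.")
```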

Step 4: Install FFmpeg

Whisper calls FFmpeg to decode the audio track of your file (e.g., the audio inside an MP4) into 16 kHz mono audio for transcription.

Install Homebrew (if not installed)

Homebrew is a package manager for macOS, making FFmpeg installation easy.

Check if Homebrew is installed:

brew --version

If it’s missing, install it:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Follow the prompts (may require your password). It installs to /usr/local/bin (Intel) or /opt/homebrew/bin (Apple Silicon).

Add Homebrew to PATH (if needed)

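On Apple Silicon, if brew isn't found after installation, add Homebrew to your PATH (on Intel Macs, /usr/local/bin is usually on the PATH already):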

echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc

Install FFmpeg

With Homebrew installed, run:

brew install ffmpeg

This downloads and installs FFmpeg (~100–200MB, 1–3 minutes).

Verify:

ffmpeg -version

You should see FFmpeg’s version (e.g., ffmpeg version 6.0). If not, ensure Homebrew’s PATH is set correctly.
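Because Whisper finds FFmpeg through your PATH, it can help to check the same way a Python program would. A small sketch using shutil.which and subprocess from the standard library; ffmpeg_version is an illustrative name:

```python
import shutil
import subprocess


def ffmpeg_version(binary="ffmpeg"):
    """Return the first line of `<binary> -version`, or None if it isn't on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version() or "ffmpeg not found on PATH")
```

If this prints "ffmpeg not found on PATH" inside your virtual environment but ffmpeg -version works in another Terminal, the PATH in the two shells differs.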

Step 5 (Optional): Install macOS Certificates for Python

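Why: Python installed from python.org (3.10 or 3.11 alike) doesn't automatically use macOS's system certificate store (unlike the pre-installed /usr/bin/python3). Until certificates are set up, HTTPS downloads from Python can fail SSL verification with a CERTIFICATE_VERIFY_FAILED error, for example when Whisper downloads a model for the first time.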

How:

  • Install the certifi package, which provides a trusted certificate bundle:
source ~/whisper_project/venv/bin/activate
pip3 install certifi
  • Run the Install Certificates.command script that came with your python.org installation:
    • Open Finder and go to /Applications/Python 3.x/ (the folder matching your version, e.g., Python 3.10 or Python 3.11).
    • Double-click Install Certificates.command. It runs in Terminal and updates Python's SSL certificates.

You’ll see output like:

-- pip install --upgrade certifi --
Successfully installed certifi-2023.x.x
-- removing any existing file or link --
-- creating symlink to certifi certificate bundle --
-- setting permissions --
-- update complete --
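To see whether the script took effect, you can inspect where Python's ssl module expects its CA bundle. A minimal sketch; after the script runs, exists and is_symlink are expected to be True (the script replaces that file with a link to certifi's bundle), though treat that as an expectation based on how the script works rather than a guarantee:

```python
import os
import ssl


def cert_bundle_status():
    """Report the CA bundle OpenSSL uses by default and whether it exists."""
    paths = ssl.get_default_verify_paths()
    cafile = paths.openssl_cafile
    return {
        "openssl_cafile": cafile,
        "exists": os.path.exists(cafile),
        # Install Certificates.command is expected to leave a symlink here.
        "is_symlink": os.path.islink(cafile),
    }


if __name__ == "__main__":
    for key, value in cert_bundle_status().items():
        print(f"{key}: {value}")
```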

Step 6: Prepare Your Video File

Whisper transcribes local files, so you need the video on your Mac. If it's on YouTube, we'll download it first.

If You Have a Local Video

Copy your video (e.g., my_video.mp4) to the whisper_project folder:

cp /path/to/my_video.mp4 ~/whisper_project/

Example: If it’s in Downloads, use:

cp ~/Downloads/my_video.mp4 ~/whisper_project/

Supported formats: anything FFmpeg can decode, such as MP4, MOV, AVI, and MKV video or MP3 and WAV audio.

If Downloading from YouTube

With the virtual environment active, install yt-dlp:

pip3 install yt-dlp

Download the video:

yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" "https://www.youtube.com/watch?v=VIDEO_ID" -o video.mp4

Replace VIDEO_ID with the video’s ID (e.g., K3CR6RiWS2U for the Nintendo Switch 2 video).

Example:

yt-dlp -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" "https://www.youtube.com/watch?v=K3CR6RiWS2U" -o nintendo_switch_2.mp4

This saves nintendo_switch_2.mp4 in ~/whisper_project.

Note: Only download videos you have permission to use (e.g., for personal use or fair use under copyright law). YouTube’s terms prohibit unauthorized redistribution.
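yt-dlp can also be driven from Python, which is convenient if you later script the whole pipeline. A minimal sketch assuming yt-dlp's YoutubeDL class with the format and outtmpl options (the Python counterparts of -f and -o); youtube_url, ydl_options, and download are illustrative helper names:

```python
from urllib.parse import urlencode

# Same format selector as the yt-dlp command above.
FORMAT = "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best"


def youtube_url(video_id):
    """Build a watch URL from a video ID such as 'K3CR6RiWS2U'."""
    return "https://www.youtube.com/watch?" + urlencode({"v": video_id})


def ydl_options(output_path):
    """Options equivalent to: -f FORMAT -o output_path."""
    return {"format": FORMAT, "outtmpl": output_path}


def download(video_id, output_path):
    """Download one video with yt-dlp's Python API (requires `pip3 install yt-dlp`)."""
    import yt_dlp  # imported here so the helpers above work without yt-dlp

    with yt_dlp.YoutubeDL(ydl_options(output_path)) as ydl:
        ydl.download([youtube_url(video_id)])
```

For example, download("K3CR6RiWS2U", "nintendo_switch_2.mp4") mirrors the command-line example above.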

Step 7: Generate Subtitles with Whisper

Now, we’ll transcribe the video’s audio into subtitles.

Run Whisper:

Ensure you’re in the whisper_project folder with the virtual environment activated:

cd ~/whisper_project
source venv/bin/activate

Run Whisper on your video:

whisper video.mp4 --model medium --language en --output_format srt

Replace video.mp4 with your video’s filename (e.g., nintendo_switch_2.mp4).

  • --model medium: Uses the medium model for a good balance of accuracy and speed. Options:
    • tiny: Fastest, least accurate.
    • base: Fast, decent accuracy.
    • small: Balanced.
    • medium: Recommended for quality.
    • large: Most accurate, slowest.
  • --language en: Assumes English audio. For other languages, use codes like es (Spanish), fr (French), or omit to auto-detect.
  • --output_format srt: Outputs subtitles in SRT format (timed text). Alternatives: txt (plain text), vtt (WebVTT).

Example for the Nintendo Switch 2 video:

whisper nintendo_switch_2.mp4 --model medium --language en --output_format srt
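The same transcription is available from Python, which is useful if you want to post-process results in code. A minimal sketch using openai-whisper's load_model and transcribe; output_path is a hypothetical helper that mimics where the CLI puts its files (the input's base name plus the format extension, in the current folder):

```python
from pathlib import Path


def output_path(video_path, fmt="srt", output_dir="."):
    """Where the CLI writes its output: input base name + format extension."""
    return str(Path(output_dir) / (Path(video_path).stem + "." + fmt))


def transcribe(video_path, model_name="medium", language="en"):
    """Transcribe with Whisper's Python API and return the result dict.

    The dict includes "text" (the full transcript) and "segments", each with
    "start" and "end" times in seconds plus "text".
    """
    import whisper  # requires `pip3 install openai-whisper`

    model = whisper.load_model(model_name)
    # fp16=False avoids the "FP16 is not supported on CPU" warning on the CPU.
    return model.transcribe(video_path, language=language, fp16=False)
```

For example, result = transcribe("nintendo_switch_2.mp4") followed by print(result["text"]) prints the full transcript.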

What Happens:

  • Whisper uses FFmpeg to extract audio from the video.
  • It transcribes the audio into text, segmenting it into subtitle entries with timestamps.
  • It saves video.srt (e.g., nintendo_switch_2.srt) in ~/whisper_project.
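For the curious, the decoding step is roughly equivalent to the FFmpeg invocation below, which turns any input into raw 16-bit, 16 kHz mono audio on stdout. It approximates what Whisper's audio loader does in recent versions, but treat the exact flags as an approximation; decode_audio_cmd and audio_seconds are illustrative names.

```python
SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio
BYTES_PER_SAMPLE = 2  # 16-bit signed PCM, one channel


def decode_audio_cmd(path, sample_rate=SAMPLE_RATE):
    """FFmpeg arguments that decode `path` to raw mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-",
    ]


def audio_seconds(num_bytes, sample_rate=SAMPLE_RATE):
    """Duration represented by `num_bytes` of decoded audio."""
    return num_bytes / (BYTES_PER_SAMPLE * sample_rate)
```

At 32,000 bytes per second, a 10-minute video decodes to 600 × 32,000 = 19,200,000 bytes of audio.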

Processing time depends on your Mac:

  • M1/M2/M3 Mac: ~5–10 minutes for a 10-minute video (medium model).
  • Intel Mac: ~20–30 minutes for a 10-minute video (CPU only).
  • Longer videos or larger models (e.g., large) take proportionally longer.

Example SRT Output:

Open video.srt with TextEdit:

open -a TextEdit video.srt

You’ll see something like:

1
00:00:00,000 --> 00:00:02,500
Yo, MKBHD here.

2
00:00:02,501 --> 00:00:05,000
Welcome to your first look and hands-on with the Nintendo Switch 2.

Each entry has a sequence number, a start --> end timestamp (hours:minutes:seconds,milliseconds), and the text, which works for subtitles or as raw material for an article.
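To make the structure concrete, here is a sketch that builds entries like these from a list of segments with start and end times in seconds, the shape Whisper's Python API returns under result["segments"]. It's illustrative only; the whisper command has its own SRT writer.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from Whisper-style segments ({"start", "end", "text"})."""
    entries = []
    for number, seg in enumerate(segments, start=1):
        # Keep "-->" out of the text so the timestamp line stays unambiguous.
        text = seg["text"].strip().replace("-->", "->")
        entries.append(
            f"{number}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{text}\n"
        )
    # A blank line separates consecutive entries.
    return "\n".join(entries)
```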

Step 8: Verify and Edit Subtitles

Check the SRT File:

Open video.srt in TextEdit or a code editor like Visual Studio Code (free, code.visualstudio.com).

Read through to ensure accuracy. Whisper is ~90–95% accurate for clear English audio but may misinterpret names (e.g., “MKBHD” as “MKV HD”) or technical terms (e.g., “Joy-Con”).

Edit if Needed:

  • Fix errors in TextEdit (e.g., change “Nintendo Switch too” to “Nintendo Switch 2”).
  • For easier editing, use Aegisub (a free subtitle editor):
    • Download from aegisub.org (~30MB).
    • Open Aegisub, load video.srt (File > Open Subtitles), and drag your video file into the video pane.
    • Play the video to check timing and edit text/timestamps as needed.
    • Save the corrected SRT (File > Save Subtitles).
  • Alternatively, install Aegisub with Homebrew: brew install --cask aegisub
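When the same name is misheard throughout a long file, a scripted find-and-replace can save time. A minimal sketch (the corrections mapping is yours to fill in, and apply_corrections is an illustrative name) that only touches text lines, so numbering and timestamps are left alone:

```python
def apply_corrections(srt_text, corrections):
    """Replace misheard words in subtitle text lines only.

    Digit-only lines (entry numbers) and lines containing "-->" (timestamps)
    are left untouched; note that a subtitle consisting only of a number is
    skipped too.
    """
    fixed = []
    for line in srt_text.split("\n"):
        if line.strip().isdigit() or "-->" in line:
            fixed.append(line)
            continue
        for wrong, right in corrections.items():
            line = line.replace(wrong, right)
        fixed.append(line)
    return "\n".join(fixed)
```

Read the file with Path("video.srt").read_text(encoding="utf-8"), pass it through apply_corrections, and write it back. Still review the result: plain substring replacement can also change text you didn't intend.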

Convert to TXT (Optional):

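If you want plain text for an article, run Whisper with TXT output (note that this transcribes the video again; --output_format all writes TXT, SRT, and the other formats in a single run):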

whisper video.mp4 --model medium --language en --output_format txt

This creates video.txt without timestamps (e.g., “Yo, MKBHD here. Welcome to your first look…”).

Alternatively, manually strip timestamps from video.srt:

  • Open in TextEdit, copy only the text lines (exclude numbers and timestamps), and paste into a new file (e.g., subtitles.txt).
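If you do this often, the stripping is easy to script. A minimal sketch (srt_to_text is an illustrative name) that keeps only the text lines of each entry and joins them into a single paragraph:

```python
import re


def srt_to_text(srt_text):
    """Return only the spoken text from an SRT file, joined into one paragraph."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    text_lines = []
    # Entries are separated by blank lines: number, timestamp, then text line(s).
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) >= 3 and "-->" in lines[1]:
            text_lines.extend(line.strip() for line in lines[2:] if line.strip())
    return " ".join(text_lines)
```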

Step 9: Optimizing Performance on macOS

Apple Silicon (M1/M2/M3)

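PyTorch supports Apple's Metal Performance Shaders (MPS) on Apple Silicon, but Whisper's command-line tool does not use it automatically: unless you pass --device, it runs on a CUDA GPU if one is available and otherwise on the CPU. You can try --device mps, but depending on your Whisper and PyTorch versions it may fail with unsupported-operation errors, in which case stay on the CPU.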

To check whether PyTorch can see the MPS backend:

python3 -c "import torch; print(torch.backends.mps.is_available())"

It should print True on Apple Silicon with a recent PyTorch. If not, update PyTorch (Whisper needs only torch, not torchvision or torchaudio):

pip3 install --upgrade torch

By the estimates in this guide, an Apple Silicon Mac runs the medium model about 2–3x faster than an older Intel Mac (e.g., 5–10 minutes for a 10-minute video vs. 20–30).
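A slightly fuller check, sketched below, prints what PyTorch can see alongside the device the whisper command picks when you don't pass --device. The second function encodes the CLI's default (CUDA if available, otherwise CPU) as an assumption about current openai-whisper releases; whisper --help shows the default on your install.

```python
import importlib.util


def torch_backend_report():
    """Describe the PyTorch backends visible here, or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch_version": str(torch.__version__),
        "cuda_available": torch.cuda.is_available(),
        "mps_built": torch.backends.mps.is_built(),
        "mps_available": torch.backends.mps.is_available(),
    }


def whisper_cli_default_device(report):
    """Assumed CLI default: CUDA if available, otherwise the CPU (never MPS)."""
    if report and report["cuda_available"]:
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    report = torch_backend_report()
    print("PyTorch backends:", report)
    print("Whisper CLI default device:", whisper_cli_default_device(report))
```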

Choose Model Size

Start with medium for quality and speed. Use large for complex audio (e.g., accents, background noise) but expect ~2x longer processing. Use small or base for quick tests or clear audio.

Manage Memory

Whisper's medium model needs several GB of memory (Whisper's README lists about 5GB of GPU memory for it). If your Mac has limited RAM (e.g., 8GB), close other apps or use the small model.
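As a quick rule of thumb, the sketch below picks the largest model that fits a memory budget you choose, using the approximate per-model figures from Whisper's README (listed there as GPU memory, so treat them as rough guides for RAM on a Mac). How much RAM to budget is a judgment call, and largest_model_for is an illustrative name.

```python
# Approximate memory per model in GB, from the table in Whisper's README.
# Dict order runs from smallest to largest model.
APPROX_MEMORY_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_for(budget_gb):
    """Return the largest model whose approximate need fits `budget_gb`, else None."""
    fitting = [name for name, need in APPROX_MEMORY_GB.items() if need <= budget_gb]
    # The last fitting name is the largest, given the ordering above.
    return fitting[-1] if fitting else None
```

On an 8GB Mac, leaving about half the RAM for macOS and other apps gives largest_model_for(4), which returns "small", in line with the advice above.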

For long videos, split them with FFmpeg:

ffmpeg -i video.mp4 -c copy -map 0 -segment_time 600 -f segment chunk_%03d.mp4

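This creates roughly 10-minute chunks (e.g., chunk_000.mp4, chunk_001.mp4). Because -c copy can only cut at keyframes, chunk lengths won't be exactly 600 seconds. Transcribe each chunk separately; each chunk's SRT starts at 00:00:00, so its timestamps must be shifted if you combine the results into one file.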
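Merging per-chunk subtitles back together means shifting every timestamp by the chunk's start time in the original video. A minimal sketch; merge_chunks is an illustrative name and you supply the offsets. With -segment_time 600 they're roughly 0, 600, 1200, and so on, but since cuts land on keyframes, reading each chunk's actual duration with ffprobe gives more accurate offsets.

```python
import re

# Matches an SRT timestamp such as 00:10:02,500.
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def _shift(match, offset_ms):
    """Return the matched timestamp moved later by offset_ms milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_chunks(srt_texts, offsets_seconds):
    """Merge per-chunk SRT texts into one, shifting each by its start offset.

    offsets_seconds[i] is where chunk i starts in the original video.
    Entries are renumbered from 1.
    """
    entries = []
    for srt_text, offset in zip(srt_texts, offsets_seconds):
        offset_ms = round(offset * 1000)
        normalized = srt_text.replace("\r\n", "\n").strip()
        for block in re.split(r"\n\s*\n", normalized):
            lines = block.split("\n")
            if len(lines) >= 3 and "-->" in lines[1]:
                # Shift only the timing line; text lines stay as they are.
                timing = TIMESTAMP.sub(lambda m: _shift(m, offset_ms), lines[1])
                entries.append([timing] + lines[2:])
    return "\n".join(
        f"{n}\n" + "\n".join(entry) + "\n" for n, entry in enumerate(entries, start=1)
    )
```

For example, merge_chunks([chunk0_srt, chunk1_srt], [0, 600]) returns one SRT string with continuous numbering.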

Batch Processing

Transcribe multiple videos:

for video in *.mp4; do whisper "$video" --model medium --language en --output_format srt; done
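If a batch run is interrupted, you may not want to redo finished files. A sketch that lists only the videos without a matching .srt yet and builds a single whisper command for them; the CLI accepts several input files at once, so the model is loaded only once. The function names are illustrative.

```python
from pathlib import Path


def pending_videos(folder=".", pattern="*.mp4"):
    """Videos in `folder` that don't yet have a matching .srt next to them."""
    return sorted(
        video
        for video in Path(folder).glob(pattern)
        if not video.with_suffix(".srt").exists()
    )


def whisper_command(videos, model="medium", language="en"):
    """One whisper invocation covering several files."""
    return [
        "whisper", *map(str, videos),
        "--model", model, "--language", language, "--output_format", "srt",
    ]
```

When pending_videos() returns a non-empty list, pass whisper_command(...) to subprocess.run(..., check=True) from the project folder.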

Save Disk Space

Delete the video file after transcription if not needed:

rm video.mp4

Step 10: Expected Performance on Mac

M1/M2/M3 Mac (e.g., MacBook Air M1, 8GB RAM):

  • ~5–10 minutes for a 10-minute video (medium model).
  • ~10–20 minutes for large model.

Intel Mac (e.g., 2019 MacBook Pro, 4-core i5):

  • ~20–30 minutes for a 10-minute video (medium model).
  • ~40–60 minutes for large model.
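These figures are for a 10-minute video. A tiny sketch that stretches a quoted range to another length, assuming processing time grows linearly with video length (an approximation); estimate_minutes is an illustrative name.

```python
def estimate_minutes(video_minutes, minutes_per_10min):
    """Scale a (low, high) time quoted for a 10-minute video to another length."""
    low, high = minutes_per_10min
    factor = video_minutes / 10
    return (low * factor, high * factor)
```

For a 45-minute video on an Apple Silicon Mac with the medium model, estimate_minutes(45, (5, 10)) gives (22.5, 45.0), i.e. roughly 22 to 45 minutes.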

Accuracy: ~90–95% for clear English audio (e.g., YouTube tech reviews). Lower for noisy audio, accents, or overlapping speakers. Manual editing fixes most errors.

Output Size: SRT files are small (~10–100KB for a 10-minute video); TXT is similar.
