Frequently Asked Questions

How long does it take to transcribe a 1-hour podcast?

Processing time depends on the quality mode. In Fast mode, a 1-hour podcast takes roughly 10–15 minutes. In Best quality mode, expect 20–40 minutes. The actual time also depends on audio complexity — a single clear speaker is faster than a multi-guest discussion with background music. A typical 1-hour MP3 at 128 kbps is about 57 MB, well within the 100 MB file limit.

Can I transcribe a podcast with multiple speakers?

Yes. The AI transcribes all speech in the recording regardless of how many speakers are present. However, the current tool does not label or separate individual speakers (no speaker diarization). The transcript will contain all spoken words in chronological order. You can use SRT or VTT format to get timestamps, which makes it easier to identify who said what when editing the transcript.

What podcast file formats are supported?

All common podcast formats are supported: MP3, WAV, FLAC, OGG, M4A, AAC, and WMA. If your podcast is distributed as a video (MP4, MKV, MOV, WebM), those formats work too — the tool extracts the audio track automatically. Maximum file size is 100 MB.

Should I use TXT, SRT, or VTT format for my transcript?

Use TXT if you plan to edit the transcript into a blog post or show notes — it gives you clean text without timestamp clutter. Use SRT if you are uploading the podcast as a video to YouTube and want captions. Use VTT for web-based podcast players that support captions. If unsure, start with TXT for the cleanest editing experience.
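If you have already downloaded an SRT file and later need VTT, the two formats are close enough that a conversion is simple. The sketch below is illustrative only and is not part of the tool: the function name `srt_to_vtt` and the sample captions are made up for this example. It relies on two well-known differences between the formats: a WebVTT file starts with a `WEBVTT` header line, and its timestamps use a period rather than a comma before the milliseconds.

```python
import re

# An SRT timestamp such as 00:02:15,300 (comma before the milliseconds).
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SRT caption text to WebVTT text (hypothetical helper)."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are touched; commas in dialogue stay as they are.
            line = SRT_TIME.sub(r"\1.\2", line)
        lines.append(line)
    # WebVTT requires the header line followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


# Made-up sample captions for illustration.
sample_srt = """1
00:00:00,000 --> 00:00:04,200
Welcome back, everyone.

2
00:00:04,200 --> 00:00:09,000
Today we are talking about remote recording.
"""

print(srt_to_vtt(sample_srt))
```

The numeric cue labels from the SRT file are kept, since WebVTT allows optional cue identifiers.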

How accurate is AI podcast transcription?

Accuracy ranges from 85% to 95% depending on audio quality. Podcasts recorded with good microphones in quiet environments typically achieve 90–95% accuracy. Episodes with heavy background music, low-quality phone call guests, or strong accents may see lower accuracy. Using Best quality mode significantly improves results on challenging audio. You should always review and edit the transcript before publishing.
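This page does not say how the accuracy percentages are measured. One common convention in speech recognition is word accuracy, defined as 1 minus the word error rate (WER), where WER counts the substituted, missing, and extra words relative to a correct reference transcript. The sketch below assumes that convention. The function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                         # word missing from the transcript
                curr[j - 1] + 1,                     # extra word in the transcript
                prev[j - 1] + (ref_word != hyp_word),  # wrong word, or a match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word out of seven.
reference = "we switched to remote recording last year"
ai_output = "we switched to remote according last year"

wer = word_error_rate(reference, ai_output)
print(f"Word accuracy: {1 - wer:.1%}")
```

Under this definition, 90% accuracy means roughly one word in ten needs correcting, which is why a review pass before publishing matters.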

Is my podcast file stored after transcription?

No. Your uploaded podcast file and the generated transcript are automatically deleted from our servers within 2 hours. All uploads use encrypted HTTPS (256-bit SSL). We do not listen to, share, or use your audio for any purpose other than generating the transcript. No account or signup is required.

Transcribe Podcast to Text with AI

How to Transcribe a Podcast Episode

Transcribing a podcast with AI takes three steps. No software to install, no account to create — just upload and download.

Upload your episode

Go to the Speech to Text tool and drag your podcast file onto the upload area. MP3, M4A, WAV, OGG, FLAC, and video formats are all supported. Maximum file size is 100 MB.

Choose your settings

Select the output format: TXT for clean text (best for blog posts and show notes), SRT for timestamped subtitles (YouTube uploads), or VTT for web captions. Pick Best quality for important episodes with multiple speakers.

Download and edit

The AI processes your audio and delivers a downloadable transcript. Review the output, correct any errors, and repurpose it into show notes, articles, social posts, or newsletter content.

Why Transcribe Your Podcast?

Publishing audio alone leaves part of your potential audience, and much of your search visibility, on the table. Here is why every podcast episode deserves a text transcript.

SEO and discoverability. Google, Bing, and other search engines index text, not audio. Without a transcript, the insights, expert opinions, and keyword-rich dialogue in your podcast are invisible to them. A published transcript turns every episode into a searchable, indexable page that can rank for the long-tail keywords your listeners are searching for. Podcasters who publish transcripts often report 2–5x more organic search traffic to their episode pages.
Accessibility for deaf and hard-of-hearing listeners. Approximately 430 million people worldwide have disabling hearing loss. A text transcript makes your content accessible to deaf and hard-of-hearing audiences who cannot consume audio content. Beyond the moral case, accessibility also matters legally — organizations in many countries are required to provide text alternatives for audio content under laws like the ADA and the European Accessibility Act.
Content repurposing. A single podcast transcript is a content goldmine. Pull direct quotes for social media posts. Extract key sections for newsletter content. Expand interview answers into standalone blog articles. Create infographics from statistics mentioned in the episode. One 45-minute episode can yield a week's worth of social media content, two or three blog posts, and newsletter material — all without creating anything from scratch.
Searchability for your listeners. Regular listeners often want to revisit a specific tip, quote, or recommendation from a past episode. Without a transcript, they have to scrub through audio trying to find the right moment. A transcript lets them search with Ctrl+F and find exactly what they need in seconds. This improves listener satisfaction and keeps people coming back to your episode pages.

From Transcript to Blog Post

A raw transcript is not a blog post — it needs editing and restructuring to work as written content. Here is a practical workflow for turning your podcast transcript into a published article.

Clean up filler words. Remove verbal crutches: "um," "uh," "you know," "like," "so," "I mean," and repeated false starts. A 30-minute conversation typically contains 50–150 filler instances. Removing them transforms rambling speech into clear prose. Most text editors can find-and-replace "um" and "uh" in bulk. Words such as "like" and "so" are also ordinary vocabulary, so review each match instead of replacing them all at once.
Add headings and structure. Podcast conversations flow naturally from topic to topic, but readers need visual structure. Read through the transcript and identify 4–8 distinct topics or segments. Add H2 headings for major sections and H3 headings for subtopics. This makes the article scannable and improves SEO by signaling content structure to search engines.
Pull out key quotes. Identify the most insightful, surprising, or quotable statements from your guest or co-host. Format them as block quotes or callouts within the article. These quotes also make excellent social media posts — pair them with an audiogram or episode art for sharing on Twitter, LinkedIn, and Instagram.
Add links and context. Conversations reference books, tools, websites, people, and events that listeners understand from context but readers need links for. Go through the transcript and hyperlink every reference. Add brief context where a listener would have understood tone or emphasis that does not translate to text.
Optimize for SEO. Identify the primary keyword phrase the article should target (usually the episode topic). Include it naturally in the title, first paragraph, one or two H2 headings, and the meta description. Add a compelling introduction that was not part of the original conversation — podcast episodes often start with small talk that does not work as an article opener.
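The filler-word cleanup in the first step above can be partly automated. The following is a minimal sketch, not a feature of the tool. The pattern, the function name `strip_fillers`, and the sample sentence are assumptions made for illustration. It always removes "um" and "uh", removes "you know" and "I mean" only when they are set off by a comma, and deliberately leaves "like" and "so" alone because they are usually real words.

```python
import re

FILLERS = re.compile(
    r"\s*,?\s*\b(?:"
    r"(?:um+|uh+)\b,?"            # hesitation sounds, with or without a comma
    r"|(?:you know|I mean)\b,"    # phrases, only when followed by a comma
    r")",
    re.IGNORECASE,
)


def strip_fillers(text):
    """Remove common filler words and repair capitalisation afterwards."""
    cleaned = FILLERS.sub("", text).strip()
    # A removed filler can leave a sentence starting in lowercase.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        cleaned,
    )


# Made-up sample sentence for illustration.
raw = (
    "We switched to remote recording and, uh, it was, you know, "
    "much easier. I mean, the guests loved it."
)
print(strip_fillers(raw))
```

A question such as "Do you know the answer?" passes through unchanged, because the phrase is not followed by a comma. Treat the result as a first pass and read it through before publishing.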

Tip: Do not try to preserve every word from the conversation. A good blog post based on a transcript should be 40–60% of the original word count. Cut tangents, repeated ideas, and exchanges that only make sense in the flow of live conversation.

Podcast Show Notes from Transcripts

Show notes are the companion page published alongside each podcast episode. They help listeners navigate the episode, find mentioned resources, and decide whether to press play. A transcript makes creating thorough show notes fast and straightforward.

Timestamps and topic markers. Use the SRT or VTT output to find the exact moment each topic begins. List the major segments with clickable timestamps (e.g., 02:15 — Why we switched to remote recording). Listeners who only care about one topic can jump directly to it. Most podcast hosting platforms support timestamp links in show notes.
Topic summaries. For each major segment, write a 1–2 sentence summary based on the transcript. This lets potential listeners scan the episode content before committing 45 minutes. Good summaries also give search engines more text to index, improving the episode page's discoverability.
Guest quotes and highlights. Pull the best 2–3 statements your guest made and feature them in the show notes. This gives your guest shareable content they can post on their own channels (driving referral traffic back to your episode) and gives readers a taste of the conversation quality.
Links mentioned in the episode. Search the transcript for every tool, book, article, person, or website mentioned during the conversation. List them with proper links in the show notes. Listeners frequently visit show notes specifically to find these links — making them easy to find increases your episode page's utility and return visits.
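To find where a topic begins, you can search the SRT output for a phrase and read off the start time of the first matching caption. The sketch below assumes standard SRT timing lines (HH:MM:SS,mmm --> HH:MM:SS,mmm). The helper names and the sample captions are made up for this example and are not part of the tool.

```python
import re

# Start time of a cue, the rest of the timing line, then the cue text
# up to the next blank line (or the end of the file).
CUE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.]\d{3} --> [^\n]*\n(.*?)(?:\n\n|\Z)",
    re.DOTALL,
)


def parse_cues(caption_text):
    """Yield (start_seconds, text) for each caption cue."""
    caption_text = caption_text.replace("\r\n", "\n")  # normalise Windows line endings
    for h, m, s, text in CUE.findall(caption_text):
        yield int(h) * 3600 + int(m) * 60 + int(s), " ".join(text.split())


def label(seconds):
    """Format seconds as MM:SS, or H:MM:SS past the first hour."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"


def first_mention(caption_text, phrase):
    """Return the timestamp label of the first cue containing the phrase."""
    for start, text in parse_cues(caption_text):
        if phrase.lower() in text.lower():
            return label(start)
    return None


# Made-up sample captions for illustration.
sample_srt = """1
00:00:00,000 --> 00:00:05,000
Welcome back to the show.

2
00:02:15,300 --> 00:02:20,000
So why did we switch to remote recording?

3
01:01:05,000 --> 01:01:09,000
Thanks for listening.
"""

print(first_mention(sample_srt, "remote recording"), "Why we switched to remote recording")
```

The phrase search only finds literal matches, so a topic introduced in different words still has to be located by reading the transcript.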

Handling Long Episodes

Podcast episodes often run 60–120 minutes. Longer recordings require a few adjustments to get the best transcription results.

Check your file size. The tool accepts files up to 100 MB. A 1-hour podcast in MP3 at 128 kbps is about 57 MB, well within the limit. At 192 kbps an MP3 passes 100 MB after roughly 69 minutes, and uncompressed CD-quality WAV (about 10 MB per minute) passes it in under 10 minutes. If your file is too large, convert it to MP3 at 128 kbps first. Transcription accuracy should be essentially unchanged, since the AI model processes audio at 16 kHz internally regardless of source quality.
Split into segments if needed. For episodes over 90 minutes or files approaching the size limit, consider splitting the audio into two parts. Most audio editors (Audacity, GarageBand, even online tools) can split an MP3 at a natural break point — a topic change or ad break. Transcribe each part separately and combine the text afterward.
Use Best quality for important episodes. The Best quality mode uses a larger AI model that handles long audio more accurately. It is especially important for episodes with multiple speakers, overlapping dialogue, or background music — all common in podcast recordings. The processing time is longer, but the accuracy improvement is worth it for episodes you plan to publish as written content.
Choose TXT format for editing. When your goal is a blog post, show notes, or newsletter content, use TXT output. It gives you clean, continuous text without timestamp markup that would need to be stripped during editing. TXT is faster to process and easier to paste into any text editor or CMS.
Choose SRT for YouTube uploads. If you publish your podcast as a video on YouTube, use SRT format. YouTube accepts SRT files directly as captions. Upload the SRT alongside your video, review the auto-synced captions in YouTube Studio, and your episode gets searchable subtitles — which YouTube uses for search ranking and recommendation algorithms.
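The file-size check in the first item above is simple arithmetic: bitrate times duration, divided by 8 to convert bits to bytes. The sketch below assumes a constant bitrate and counts 1 MB as 1,000,000 bytes. Whether the upload limit uses that definition is not stated on this page, and the function names are made up for illustration.

```python
import math

LIMIT_MB = 100  # upload limit stated on this page


def audio_size_mb(minutes, kbps):
    """Approximate size of a constant-bitrate file, with 1 MB = 1,000,000 bytes."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * minutes * 60 / 1_000_000


def parts_needed(minutes, kbps, limit_mb=LIMIT_MB):
    """How many pieces the episode must be split into to stay under the limit."""
    return max(1, math.ceil(audio_size_mb(minutes, kbps) / limit_mb))


for minutes, kbps in [(60, 128), (90, 192), (120, 128)]:
    size = audio_size_mb(minutes, kbps)
    print(f"{minutes} min at {kbps} kbps: {size:.1f} MB, {parts_needed(minutes, kbps)} part(s)")
```

Real files carry a little extra for metadata and cover art, so leave some headroom when an estimate lands close to the limit.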

Note: AI transcription accuracy typically ranges from 85% to 95%. Always review the transcript before publishing, especially for proper nouns (guest names, brand names, technical terms) which the AI may misspell or misinterpret. A 5-minute review pass catches most errors.
