STT (Speech-to-Text)

Definition

AI technology that converts spoken audio into written text. Also called ASR (Automatic Speech Recognition).

Why It Matters

STT is the entry point for every workflow that starts with spoken input: meeting notes, podcast transcripts, voicemail-to-text, voice-controlled apps, and accessibility captions. Modern STT approaches human-level accuracy on clean audio, and its accuracy falls off gradually, rather than collapsing, as noise increases.

Key Points

  • Whisper large-v3: 1.5B parameters, 99-language support. WER ~3 % on English broadcast audio (released November 2023).
  • Word Error Rate (WER) = (substitutions + deletions + insertions) / total words in the reference transcript. As a rule of thumb, below 10 % is usable and below 5 % is good.
  • Speaker diarisation (who said what, when) is handled by a separate model; pyannote.audio is the most widely used open-source option.
  • Real-time STT requires streaming models. Whisper processes audio in fixed 30-second windows and is not natively streaming. Faster-Whisper (a faster reimplementation of Whisper) and Parakeet are common choices for low-latency, chunked near-real-time transcription.
  • Output formats: plain text, SRT subtitles, VTT subtitles, JSON with per-word timestamps and confidence scores.
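The WER formula above is computed by aligning the reference and the hypothesis with a word-level edit distance (Levenshtein): the minimum number of edits needed to turn one into the other is exactly substitutions + deletions + insertions. The following is a minimal sketch, assuming whitespace tokenisation and no text normalisation (real evaluations usually lowercase and strip punctuation first). `word_error_rate` is a hypothetical helper name, not a library API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via a word-level Levenshtein alignment.

    Assumption: hypothetical helper; splits on whitespace, no normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dp[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # match or substitution
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )

    # Total edits divided by the number of reference words.
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` finds one substitution and one deletion over six reference words, about 0.33 (33 %). Because insertions also count against the reference length, WER can exceed 1.0 (100 %).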

Example

Whisper large-v3, paired with a separate speaker-diarisation model, transcribes a 1-hour meeting in about 5 minutes on a single A100, roughly 12× faster than real time. Word error rate on conference-room audio runs 3–8 %; on phone-call audio, 8–15 %. Output can be plain text, SRT subtitles, or JSON with timestamps.
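SRT and VTT subtitles carry the same timestamped segments and differ mainly in framing: SRT numbers each cue and uses a comma before the milliseconds, while VTT starts with a `WEBVTT` header and uses a period. The following is a minimal sketch, assuming segments arrive as dicts with `start` and `end` in seconds plus `text` (a hypothetical shape; real transcribers' outputs vary). `format_timestamp`, `to_srt`, and `to_vtt` are illustrative helper names, not a library API.

```python
# Assumption: each segment is {"start": float_seconds, "end": float_seconds, "text": str}.

def format_timestamp(seconds: float, decimal_marker: str) -> str:
    """Render seconds as HH:MM:SS<marker>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Numbered cues, comma decimal marker, blank line between cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"], ",")
        end = format_timestamp(seg["end"], ",")
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def to_vtt(segments: list[dict]) -> str:
    """WEBVTT header, period decimal marker, unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line ends the cue
    return "\n".join(lines)
```

Keeping timestamp formatting in one function means the two writers can only disagree on the decimal marker and cue framing, which is where the formats actually differ.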

Common Misconception

Published benchmark accuracy does not automatically carry over to your own audio. Benchmark figures are usually measured on clean recordings from a specific domain. Accents, background noise, overlapping speakers, and technical vocabulary (medical, legal, financial) can each independently raise WER by 5–20 percentage points above the published figure.

Related Terms

  • TTS (Text-to-Speech): AI technology that converts written text into natural-sounding spoken audio.
  • Computer Vision: AI that can understand and analyze images and video content.
  • Multimodal AI: AI models that can process multiple types of input, such as text, images, audio, and video.

STT (Speech-to-Text) on Rewind.ai

Rewind.ai's transcription tool offers both Whisper large-v3 and Parakeet, so you can trade speed against quality. Upload audio; the output is plain text, SRT, VTT, or timestamped JSON.

Quick Facts

Term: STT (Speech-to-Text)
Related: TTS (Text-to-Speech), Computer Vision, Multimodal AI

FAQ

STT (Speech-to-Text) on Rewind.ai is a free AI tool: there's no charge and no sign-up needed to start.

Yes. You get 2,500 free tokens per day to use STT (Speech-to-Text) and every other tool on Rewind.ai. A free account raises that to 5,000 tokens/day. You can buy more starting at $1.

STT (Speech-to-Text) runs open-source AI models on our GPU servers. Send your request and short clips come back in seconds; long recordings take longer.

No. You can use STT (Speech-to-Text) right away without signing up. A free account doubles your daily usage to 5,000 tokens and saves your history.

Anonymous users get 2,500 tokens/day. Free accounts get 5,000 tokens/day. Tokens reset every 24 hours. Each generation costs about 100–5,000 tokens, depending on the operation.

Your data is processed on our servers and isn't stored permanently unless you choose to save it. We don't sell or share it.

Yes. Content from STT (Speech-to-Text) is yours to use for personal or commercial work. The AI models we run are commercially licensed.

STT (Speech-to-Text) matches the quality of paid services because it runs the latest open-source AI models. The difference is you don't pay per use.

STT (Speech-to-Text) runs open-source AI models including Qwen 2.5, FLUX and Whisper. We update to newer models as they ship.

Yes. STT (Speech-to-Text) works in any mobile browser, and the layout adapts to your screen size.

Sign up for a free account to get 5,000 tokens/day, double the anonymous limit. Or buy token packs starting at $5 for 200,000 tokens. See /pricing/ for all options.

Yes. After you generate content, you can download it, copy it, or share it via a unique link. Signed-in users can also view their generation history.

Love Rewind.ai? Tell your friends!

Rate this page