AI Voice-to-Text · Speak and Done · Works Offline

Speak Freely.
Write Perfectly.

Speak imperfectly. AI polishes it into professional text.

Get Started Free View on GitHub

macOS (Apple Silicon)

Windows (Beta)

Manual Typing

45 wpm

Sumi

220 wpm

Most people speak at 150 to 200 wpm and type at 40 to 60 wpm.

What can you do with voice-to-text?

Dictation, meeting notes, voice editing, audio import: Sumi handles real work scenarios instantly.

Sumi is an AI voice-to-text tool supporting hotkey dictation, real-time meeting transcription with AI summary, voice editing, and audio file import, with automatic tone and format polishing.

Hotkey

One Press, Start Talking

Hold fn/Globe and start speaking. Sumi records, transcribes, and polishes, all in one flow.

Sumi Editor

Hey Alice, let's meet at the train station tomorrow... uh wait, actually, let's make it the main library.

fn / 🌐

Email That Writes Itself

Speak naturally, get a polished email. No editing needed.

“hey Alice um so something came up at work, I can't do lunch tomorrow, I've got meetings all afternoon, is Thursday good for you?”

Gmail - New Message

To: [email protected]

Subject: Lunch Reschedule

Hi Alice, I won't be able to make lunch tomorrow. Something came up at work and I'll be in meetings all afternoon. Would Thursday work instead? Best

Send

Meeting

Meeting Mode

Start it before your call. Sumi transcribes in the background and saves everything to a note file. Nothing to babysit.

Meeting

Meeting 03-07 14:32

Today · ···

Weekly Sync

Mar 4 · 18m

Design Review

Mar 1 · 25m

Meeting 03-07 14:32

Alright, let's get started. Today's agenda: product roadmap and Q2 priorities.

Recording

File Import

Drop a Recording, Get a Transcript

Meeting recordings, interviews, raw podcast audio: drop it into Sumi and get a transcript with speaker identification.

Sumi

Drop audio file here

MP3 / M4A / WAV supported

AI Coding

Talk to Your AI Agent

Speak to Gemini, Claude Code, or Codex. No more switching keyboards between terminal windows.

Terminal

$ claude

> 🎤 "add error handling to the upload function"

I'll wrap the upload logic in a try-catch block and add proper error messages for network failures and file validation...

Gemini

Claude

OpenAI

Sumi vs. OpenAI Whisper

Accuracy and speed on Chinese speech recognition.

Chinese Accuracy (1 − CER)

higher is better

Sumi Cloud

96.7%

Sumi Local

94.7%

OpenAI Whisper

92.4%

* CER from public Chinese speech benchmarks. Sumi Local runs a model compressed to 30% of original size for on-device inference.
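
For readers unfamiliar with the metric: character error rate (CER) is conventionally the character-level edit distance between a reference transcript and the recognizer's output, divided by the length of the reference. The accuracy figures above are 1 − CER. The Python sketch below shows that standard definition only. The function names and the sample sentence are illustrative assumptions, not Sumi's benchmark code or data.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate relative to the reference transcript."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def accuracy(ref: str, hyp: str) -> float:
    """Accuracy as reported on this page: 1 - CER."""
    return 1.0 - cer(ref, hyp)


# Hypothetical example: one wrong character in a six-character sentence.
print(f"{accuracy('今天天气很好', '今天天汽很好'):.1%}")  # prints 83.3%
```

Because CER divides by the reference length, a hypothesis with many inserted characters can push CER above 1, so 1 − CER can in principle go negative.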

Local Processing Speed

2.2×

faster than OpenAI Whisper

Can privacy-sensitive professionals use voice-to-text?

Yes. Sumi's privacy mode runs speech recognition and AI polishing entirely on your device — no audio or text ever leaves your computer.

Lawyers, therapists, doctors, and accountants bound by confidentiality can use Sumi's privacy mode for voice-to-text — all processing happens on-device, nothing goes to the cloud.

Spending two hours writing up notes after every meeting

Lawyers drafting briefs, therapists writing session notes, doctors charting patient records, accountants preparing workpapers. Every day, someone stays late just to "finish writing up today's stuff."

Sumi transcribes and polishes in real time. When you're done talking, your notes are done too.

Want to use AI transcription, but afraid to upload to the cloud

Divorce settlements, trauma histories, psychiatric disclosures, tax strategies. All sent to a third-party server?

Sumi processes everything on your device. Audio never leaves the machine, never touches the network.

Not sure if your current tool is training AI on your data

Cloud services have long, vague privacy policies. Are your recordings being "used to improve service quality"? Nobody can tell you for sure.

Sumi is fully open source. You can audit every line of code. Your conversations never leave your device, so there's nothing to train on.

“Not just privacy law. Your professional ethics code.”

Which apps work with Sumi?

Every app where you can type. Sumi is a system-level tool: it auto-formats as email in Gmail, bullet points in Notion, and commands in your terminal.

Slack

VS Code

Gmail

Notion

Chrome

Safari

Discord

Figma

Arc

Teams

iTerm2

GitHub

Linear

Obsidian

LINE

Spotify

YouTube

Zoom

Trello

Evernote

...and every other app you use. If you can type in it, Sumi works there.

How does Sumi compare to other voice tools?

The only tool with dictation, meeting transcription, voice editing, and offline privacy — all in one.

Compared to Wispr Flow, VoiceInk, and SuperWhisper, Sumi is the only voice-to-text tool that combines AI polishing, meeting transcription, voice editing, privacy mode, and mixed Chinese-English recognition.

Feature	Sumi	Built-in Dictation	Wispr Flow	VoiceInk	SuperWhisper
Price	Local free	Free	$12~15/mo	$25~49	~$8/mo
Privacy Mode (Offline)			Cloud only
Open Source	GPLv3			GPLv3
Local STT		Apple Silicon
Local LLM Polish
AI Text Polish
Edit by Voice
Cloud Meeting Transcription
Meeting AI Summary
Speaker Diarization
File Import
Format Detection

Sourced from public product pages. Features may have changed.

Loved by Early Adopters

Hear from developers and creators who've made the switch.

“I dictate code comments and Slack messages while debugging. The local LLM polish is a game-changer: my ramblings become clean text without touching the cloud.”

Alex Chen

Software Engineer

“Sumi cut my drafting time in half. It auto-formats emails in Gmail for me, which saves me from fiddling with layout.”

Sarah Kim

Content Creator

“Being fully open source was the deciding factor. I can verify my audio never leaves my device. No other voice tool gives me that level of trust.”

Marcus Liu

Security Researcher

“I write my thesis entirely by voice now. Switching between English and Mandarin mid-sentence just works. Saved me weeks of typing.”

Emily Zhang

PhD Student

“As a PM, I dictate meeting summaries right after standup. By the time I reach my desk, polished notes are already pasted in Notion.”

David Park

Product Manager

“The multilingual support is incredible. I dictate translations in three languages and Sumi handles the code-switching perfectly.”

Léa Dubois

Freelance Translator

“I draft all my show notes by voice during my commute. What used to take an hour now takes 10 minutes.”

Ryan Torres

Podcast Host

“Responding to 50+ emails a day used to drain me. Now I just speak naturally and Sumi gives me polished, professional responses.”

Mika Tanaka

Startup Founder

“I document design decisions by voice while working in Figma. Paste it into Notion and it auto-formats into bullet points.”

Priya Sharma

UX Designer

“Running analyses and writing reports simultaneously was impossible before. Now I dictate findings while coding, and Sumi handles the rest.”

James Wilson

Data Scientist

Frequently Asked Questions

Common questions about setup, privacy, and plans.

Local features are free forever — dictation, local meeting recording, AI summary, and Edit by Voice, all unlimited. Cloud transcription (processed on Sumi servers) requires Starter ($9.99/mo) or above. Cloud features are currently in Beta and free to try.

Sumi can run entirely offline. Speech recognition runs locally on your device with Metal GPU acceleration, and AI text polishing uses a local model. Your voice recordings and text never leave your device. Privacy mode automatically deletes audio after transcription is complete.

macOS: macOS 14 (Sonoma) or later with Apple Silicon (M1/M2/M3/M4), 8 GB RAM or more recommended. Windows: Windows 10 or later (x64). The CPU version works on all PCs; the CUDA version requires an NVIDIA GPU with CUDA support for faster transcription.

After transcription, Sumi automatically removes filler words (umm, uhh) and corrects grammar. In Gmail it auto-formats as an email; in Notion it formats as bullet points. You can run polishing locally with an on-device LLM, or connect cloud providers like OpenAI or Groq using your own API keys.
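
To make the filler-removal step concrete, here is a deliberately simple Python sketch. Sumi does this with an AI model (local or cloud) that also corrects grammar. The regular expression and the `strip_fillers` name below are assumptions for illustration only, and they cover just the "um/umm" and "uh/uhh" fillers mentioned above.

```python
import re

# Assumption: only "um"/"umm" and "uh"/"uhh" style fillers are targeted.
# A trailing comma or period and the following whitespace are removed too.
_FILLER_RE = re.compile(r"\b(?:u+m+|u+h+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and tidy leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    # Collapse any double spaces the removal left behind.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_fillers("hey Alice um so something came up at work"))
# prints: hey Alice so something came up at work
```

A pattern like this cannot tell a filler "like" from a meaningful one, or fix grammar and tone, which is why this step is handed to a language model rather than a fixed rule.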

Press a hotkey before the meeting and Sumi records and transcribes in the background. When the meeting ends, generate an AI summary that captures key decisions and action items. Pro adds speaker diarization so you know who said what.

Select any text, hold the hotkey, and speak an instruction such as 'translate to English', 'make it more formal', or 'turn into bullet points'. Sumi rewrites the selected text in place. No copy-pasting needed.

The Free plan includes all local features: dictation, local meeting recording, AI summary, and Edit by Voice, all unlimited — plus 1 hour of cloud voice transcription per month. Starter ($9.99/mo) adds more cloud transcription quota (voice 20hr/mo, meeting 8hr/mo), speaker diarization, and file import.

Yes. Sumi is open source on GitHub under the GPLv3 license. You can inspect the code, report issues, submit Pull Requests, or help with translations.

Not at all. Sumi has built-in AI polishing that automatically removes filler words (um, uh, like), corrects grammar, and adjusts tone. Just speak naturally and Sumi turns it into clean, professional text. You don't need to speak perfectly. Just speak.

Yes. Sumi's privacy mode runs speech recognition and AI polishing entirely on your device — no audio or text is ever uploaded to a server. This meets confidentiality requirements for legal and mental health professionals. Privacy mode also automatically deletes audio files after transcription.

Yes. Sumi's recognition engine is specifically optimized for mixed-language scenarios and significantly outperforms other tools. Whether you're in a bilingual meeting, giving voice commands while coding, or switching between Chinese and English naturally, Sumi accurately recognizes and outputs the correct text.

No. Sumi automatically adjusts formatting based on context: email format in Gmail, bullet points in Notion, code formatting in editors. Tone, punctuation, and paragraph breaks are all handled automatically. When you stop talking, the text is ready to send.

Absolutely. Voice input is the best alternative to keyboard typing. Sumi lets you dictate, edit text, and take meeting notes entirely without a keyboard. With Edit by Voice, you can even rewrite, translate, and reformat text by speaking, no hands required.

Ready to speak your mind?

Download Sumi and give your wrists a break.

Get Started Free View on GitHub
