AI Voice-to-Text · Speak and Done · Works Offline
Speak Freely.
Write Perfectly.
Speak imperfectly; AI polishes it into professional text.
What can you do with voice-to-text?
Dictation, meeting notes, voice editing, audio import: Sumi handles real work scenarios instantly.
Sumi is an AI voice-to-text tool supporting hotkey dictation, real-time meeting transcription with AI summary, voice editing, and audio file import, with automatic tone and format polishing.
One Press, Start Talking
Hold fn/Globe and start speaking. Sumi records, transcribes, and polishes, all in one flow.

Email That Writes Itself
Speak naturally, get a polished email. No editing needed.

Meeting Mode
Start it before your call. Sumi transcribes in the background and saves everything to a note file. Nothing to babysit.
Alright, let's get started. Today's agenda: product roadmap and Q2 priorities.

Drop a Recording, Get a Transcript
Meeting recordings, interviews, raw podcast audio: drop any of them into Sumi and get a transcript with speaker identification.
MP3 / M4A / WAV supported

Talk to Your AI Agent
Speak to Gemini, Claude Code, or Codex, with no more switching back to the keyboard between terminal windows.
Sumi vs. OpenAI Whisper
Accuracy and speed on Chinese speech recognition.
Chinese Accuracy (1 − CER)
Higher is better.
* CER (character error rate) from public Chinese speech benchmarks. Sumi Local runs a model compressed to 30% of its original size for on-device inference.
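For readers unfamiliar with the metric, the accuracy figure (1 − CER) can be sketched as follows. This is a minimal illustration of the standard character error rate definition (character-level edit distance divided by reference length), not Sumi's benchmark code. The function names are hypothetical, and any text normalization the benchmarks apply before scoring is not stated here, so none is assumed.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def accuracy(ref: str, hyp: str) -> float:
    """Accuracy as plotted on the chart: 1 - CER."""
    return 1 - cer(ref, hyp)


if __name__ == "__main__":
    reference = "今天的议程是产品路线图"   # 11 characters
    hypothesis = "今天的议程是产品路线"    # last character dropped
    # One of 11 characters is missing, so CER = 1/11.
    print(f"{accuracy(reference, hypothesis):.3f}")  # 0.909
```

Because insertions count as errors, CER can exceed 1 on very poor transcripts, in which case 1 − CER is negative.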
Local Processing Speed
faster than OpenAI Whisper
Can privacy-sensitive professionals use voice-to-text?
Yes. Sumi's privacy mode runs speech recognition and AI polishing entirely on your device — no audio or text ever leaves your computer.
Lawyers, therapists, doctors, and accountants bound by confidentiality can use Sumi's privacy mode for voice-to-text — all processing happens on-device, nothing goes to the cloud.
Spending two hours writing up notes after every meeting
Lawyers drafting briefs, therapists writing session notes, doctors charting patient records, accountants preparing workpapers. Every day, someone stays late just to "finish writing up today's stuff."
Sumi transcribes and polishes in real time. When you're done talking, your notes are done too.
Want to use AI transcription, but afraid to upload to the cloud
Divorce settlements, trauma histories, psychiatric disclosures, tax strategies. All sent to a third-party server?
Sumi processes everything on your device. Audio never leaves the machine, never touches the network.
Not sure if your current tool is training AI on your data
Cloud services have long, vague privacy policies. Are your recordings being "used to improve service quality"? Nobody can tell you for sure.
Sumi is fully open source. You can audit every line of code. Your conversations never leave your device, so there's nothing to train on.
“Not just privacy law. Your professional ethics code.”
Which apps work with Sumi?
Every app where you can type. Sumi is a system-level tool: it auto-formats your speech as an email in Gmail, bullet points in Notion, and commands in your terminal.
...and every other app you use. If you can type in it, Sumi works there.
How does Sumi compare to other voice tools?
The only tool with dictation, meeting transcription, voice editing, and offline privacy — all in one.
Compared to Wispr Flow, VoiceInk, and SuperWhisper, Sumi is the only voice-to-text tool that combines AI polishing, meeting transcription, voice editing, privacy mode, and mixed Chinese-English recognition.
| Feature | Sumi | Built-in Dictation | Wispr Flow | VoiceInk | SuperWhisper |
|---|---|---|---|---|---|
| Price | Free (local features) | Free | $12-15/mo | $25-49 | ~$8/mo |
| Privacy Mode (Offline) | Cloud only | ||||
| Open Source | GPLv3 | GPLv3 | |||
| Local STT | Apple Silicon | ||||
| Local LLM Polish | |||||
| AI Text Polish | |||||
| Edit by Voice | |||||
| Cloud Meeting Transcription | |||||
| Meeting AI Summary | |||||
| Speaker Diarization | |||||
| File Import | |||||
| Format Detection |
Sourced from public product pages. Features may have changed.
Loved by Early Adopters
Hear from developers and creators who've made the switch.
“I dictate code comments and Slack messages while debugging. The local LLM polish is a game-changer: my ramblings become clean text without touching the cloud.”
“Sumi cut my drafting time in half. It auto-formats emails in Gmail for me, which saves me from fiddling with layout.”
“Being fully open source was the deciding factor. I can verify my audio never leaves my device. No other voice tool gives me that level of trust.”
“I write my thesis entirely by voice now. Switching between English and Mandarin mid-sentence just works. Saved me weeks of typing.”
“As a PM, I dictate meeting summaries right after standup. By the time I reach my desk, polished notes are already pasted in Notion.”
“The multilingual support is incredible. I dictate translations in three languages and Sumi handles the code-switching perfectly.”
“I draft all my show notes by voice during my commute. What used to take an hour now takes 10 minutes.”
“Responding to 50+ emails a day used to drain me. Now I just speak naturally and Sumi gives me polished, professional responses.”
“I document design decisions by voice while working in Figma. Paste it into Notion and it auto-formats into bullet points.”
“Running analyses and writing reports simultaneously was impossible before. Now I dictate findings while coding, and Sumi handles the rest.”
Frequently Asked Questions
Common questions about setup, privacy, and plans.
Is Sumi free?
Local features are free forever: dictation, local meeting recording, AI summary, and Edit by Voice, all unlimited. Cloud transcription (processed on Sumi servers) requires Starter ($9.99/mo) or above. Cloud features are currently in Beta and free to try.
Does Sumi work offline?
Sumi can run entirely offline. Speech recognition runs locally on your device with Metal GPU acceleration, and AI text polishing uses a local model. Your voice recordings and text never leave your device. Privacy mode automatically deletes audio after transcription is complete.
What are the system requirements?
macOS: macOS 14 (Sonoma) or later with Apple Silicon (M1/M2/M3/M4), 8 GB RAM or more recommended. Windows: Windows 10 or later (x64). The CPU version works on all PCs; the CUDA version requires an NVIDIA GPU with CUDA support for faster transcription.
How does AI polishing work?
After transcription, Sumi automatically removes filler words (um, uh) and corrects grammar. In Gmail it auto-formats the text as an email; in Notion it formats it as bullet points. You can run polishing locally with an on-device LLM, or connect cloud providers like OpenAI or Groq using your own API keys.
How does Meeting Mode work?
Press a hotkey before the meeting and Sumi records and transcribes in the background. When the meeting ends, generate an AI summary that captures key decisions and action items. Pro adds speaker diarization so you know who said what.
How does Edit by Voice work?
Select any text, hold the hotkey, and speak an instruction such as 'translate to English', 'make it more formal', or 'turn into bullet points'. Sumi rewrites the selected text in place. No copy-pasting needed.
What do the Free and Starter plans include?
The Free plan includes all local features: dictation, local meeting recording, AI summary, and Edit by Voice, all unlimited, plus 1 hour of cloud voice transcription per month. Starter ($9.99/mo) adds more cloud transcription quota (voice 20 hr/mo, meeting 8 hr/mo), speaker diarization, and file import.
Is Sumi open source?
Yes. Sumi is open source on GitHub under the GPLv3 license. You can inspect the code, report issues, submit pull requests, or help with translations.
Do I need to speak perfectly?
Not at all. Sumi has built-in AI polishing that automatically removes filler words (um, uh, like), corrects grammar, and adjusts tone. Just speak naturally and Sumi turns it into clean, professional text. You don't need to speak perfectly; just speak.
Can professionals bound by confidentiality use Sumi?
Yes. Sumi's privacy mode runs speech recognition and AI polishing entirely on your device, so no audio or text is ever uploaded to a server. This meets confidentiality requirements for legal and mental health professionals. Privacy mode also automatically deletes audio files after transcription.
Does Sumi handle mixed Chinese and English speech?
Yes. Sumi's recognition engine is specifically optimized for mixed-language scenarios and significantly outperforms other tools. Whether you're in a bilingual meeting, giving voice commands while coding, or switching between Chinese and English naturally, Sumi accurately recognizes and outputs the correct text.
Do I need to format the text myself?
No. Sumi automatically adjusts formatting based on context: email format in Gmail, bullet points in Notion, code formatting in editors. Tone, punctuation, and paragraph breaks are all handled automatically. When you stop talking, the text is ready to send.
Can I use Sumi without a keyboard?
Absolutely. Voice input is the best alternative to keyboard typing. Sumi lets you dictate, edit text, and take meeting notes entirely without a keyboard. With Edit by Voice, you can even rewrite, translate, and reformat text by speaking, no hands required.