AI Voice-to-Text · Speak and Done · Works Offline

Speak Freely.
Write Perfectly.

Speak imperfectly; AI polishes it into professional text.

macOS (Apple Silicon)
Windows (Beta)
Manual typing: 45 wpm
Sumi: 220 wpm
Most people speak at 150 to 200 wpm on average but type at only 40 to 60 wpm.

What can you do with voice-to-text?

Dictation, meeting notes, voice editing, audio import: Sumi handles real work scenarios instantly.

Sumi is an AI voice-to-text tool that supports hotkey dictation, real-time meeting transcription with AI summaries, voice editing, and audio file import, with automatic tone and format polishing.

Hotkey

One Press, Start Talking

Hold fn/Globe and start speaking. Sumi records, transcribes, and polishes, all in one flow.

Sumi Editor
Hey Alice, let's meet at the train station tomorrow... uh wait, actually, let's make it the main library.
Email

Email That Writes Itself

Speak naturally, get a polished email. No editing needed.

hey Alice um so something came up at work, I can't do lunch tomorrow, I've got meetings all afternoon, is Thursday good for you?
Gmail - New Message
Subject: Lunch Reschedule
Hi Alice, I won't be able to make lunch tomorrow; something came up at work and I'll be in meetings all afternoon. Would Thursday work instead?
Best
Meeting

Meeting Mode

Start it before your call. Sumi transcribes in the background and saves everything to a note file. Nothing to babysit.

Meeting 03-07 14:32

Alright, let's get started. Today's agenda: product roadmap and Q2 priorities.

File Import

Drop a Recording, Get a Transcript

Meeting recordings, interviews, raw podcast audio: drop them into Sumi and get a transcript with speaker identification.

Drop an audio file here. Supported formats: MP3, M4A, WAV.

AI Coding

Talk to Your AI Agent

Speak to Gemini, Claude Code, or Codex directly, with no more switching keyboards between terminal windows.

Terminal
$ claude
> 🎤 "add error handling to the upload function"
I'll wrap the upload logic in a try-catch block and add proper error messages for network failures and file validation...

Sumi vs. OpenAI Whisper

Accuracy and speed on Chinese speech recognition.

Chinese Accuracy (1 − CER), higher is better
Sumi Cloud: 96.7%
Sumi Local: 94.7%
OpenAI Whisper: 92.4%

* CER (character error rate) measured on public Chinese speech benchmarks. Sumi Local runs a model compressed to 30% of its original size for on-device inference.
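The accuracy scores above are expressed as 1 − CER. CER is conventionally defined as (S + D + I) / N: the number of character substitutions, deletions, and insertions needed to turn a reference transcript into the recognizer's output (the character-level edit distance), divided by the number of reference characters N. Counting characters rather than words suits Chinese, which has no spaces between words. The sketch below illustrates that standard metric; it is not Sumi's benchmarking code, the function names and sample sentences are hypothetical, and pooling errors across utterances in `corpus_cer` is an assumption about how benchmark-level figures are typically aggregated.

```python
def char_edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance in characters: minimum S + D + I turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # cost of building hyp[:j] from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: r is missing from the output
                curr[j - 1] + 1,         # insertion: h is an extra output character
                prev[j - 1] + (r != h),  # substitution (free when characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate for one utterance: (S + D + I) / N with N = len(ref)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return char_edit_distance(ref, hyp) / len(ref)


def accuracy(ref: str, hyp: str) -> float:
    """The '1 - CER' score. Insertions count as errors, so this can go negative."""
    return 1.0 - cer(ref, hyp)


def corpus_cer(pairs: list[tuple[str, str]]) -> float:
    """Hypothetical benchmark-level CER: total errors over total reference characters."""
    errors = sum(char_edit_distance(r, h) for r, h in pairs)
    total = sum(len(r) for r, _ in pairs)
    return errors / total


if __name__ == "__main__":
    reference = "今天天气很好"   # hypothetical reference, 6 characters
    hypothesis = "今天天气真好"  # one character substituted
    print(f"CER = {cer(reference, hypothesis):.4f}")            # CER = 0.1667
    print(f"1 - CER = {accuracy(reference, hypothesis):.4f}")   # 1 - CER = 0.8333
```

Pooling errors at the corpus level, rather than averaging per-utterance CER, keeps very short utterances from dominating the score.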

Local Processing Speed: 2.2× faster than OpenAI Whisper

Can privacy-sensitive professionals use voice-to-text?

Yes. Sumi's privacy mode runs speech recognition and AI polishing entirely on your device — no audio or text ever leaves your computer.

Lawyers, therapists, doctors, and accountants bound by confidentiality can use Sumi's privacy mode for voice-to-text — all processing happens on-device, nothing goes to the cloud.

01

Spending two hours writing up notes after every meeting

Lawyers drafting briefs, therapists writing session notes, doctors charting patient records, accountants preparing workpapers. Every day, someone stays late just to "finish writing up today's stuff."

Sumi transcribes and polishes in real time. When you're done talking, your notes are done too.

02

Want to use AI transcription but afraid to upload to the cloud

Divorce settlements, trauma histories, psychiatric disclosures, tax strategies. All sent to a third-party server?

Sumi processes everything on your device. Audio never leaves the machine, never touches the network.

03

Not sure if your current tool is training AI on your data

Cloud services have long, vague privacy policies. Are your recordings being "used to improve service quality"? Nobody can tell you for sure.

Sumi is fully open source. You can audit every line of code. Your conversations never leave your device, so there's nothing to train on.

Not just privacy law: your professional code of ethics, too.

Which apps work with Sumi?

Every app where you can type. Sumi is a system-level tool that auto-formats text as an email in Gmail, bullet points in Notion, and commands in your terminal.

Slack
VS Code
Gmail
Notion
Chrome
Safari
Discord
Telegram
Figma
Arc
Teams
iTerm2
GitHub
Linear
Obsidian
WhatsApp
LINE
Spotify
X
Reddit
YouTube
Zoom
Trello
Evernote

...and every other app you use. If you can type in it, Sumi works there.

How does Sumi compare to other voice tools?

The only tool with dictation, meeting transcription, voice editing, and offline privacy — all in one.

Compared to Wispr Flow, VoiceInk, and SuperWhisper, Sumi is the only voice-to-text tool that combines AI polishing, meeting transcription, voice editing, privacy mode, and mixed Chinese-English recognition.

Feature | Sumi | Built-in Dictation | Wispr Flow | VoiceInk | SuperWhisper
Price | Local free | Free | $12~15/mo | $25~49 | ~$8/mo
Privacy Mode (Offline)
Cloud only
Open Source
GPLv3
GPLv3
Local STT
Apple Silicon
Local LLM Polish
AI Text Polish
Edit by Voice
Cloud Meeting Transcription
Meeting AI Summary
Speaker Diarization
File Import
Format Detection

Sourced from public product pages. Features may have changed.

Frequently Asked Questions

Common questions about setup, privacy, and plans.

Local features are free forever — dictation, local meeting recording, AI summary, and Edit by Voice, all unlimited. Cloud transcription (processed on Sumi servers) requires Starter ($9.99/mo) or above. Cloud features are currently in Beta and free to try.

Sumi can run entirely offline. Speech recognition runs locally on your device with Metal GPU acceleration, and AI text polishing uses a local model. Your voice recordings and text never leave your device. Privacy mode automatically deletes audio after transcription is complete.

macOS: macOS 14 (Sonoma) or later with Apple Silicon (M1/M2/M3/M4), 8 GB RAM or more recommended. Windows: Windows 10 or later (x64). The CPU version works on all PCs; the CUDA version requires an NVIDIA GPU with CUDA support for faster transcription.

After transcription, Sumi automatically removes filler words (umm, uhh) and corrects grammar. In Gmail it auto-formats as an email; in Notion it formats as bullet points. You can run polishing locally with an on-device LLM, or connect cloud providers like OpenAI or Groq using your own API keys.

Press a hotkey before the meeting and Sumi records and transcribes in the background. When the meeting ends, generate an AI summary that captures key decisions and action items. Pro adds speaker diarization so you know who said what.

Select any text, hold the hotkey, and speak an instruction such as 'translate to English', 'make it more formal', or 'turn into bullet points'. Sumi rewrites the selected text in place. No copy-pasting needed.

The Free plan includes all local features: dictation, local meeting recording, AI summary, and Edit by Voice, all unlimited — plus 1 hour of cloud voice transcription per month. Starter ($9.99/mo) adds more cloud transcription quota (voice 20hr/mo, meeting 8hr/mo), speaker diarization, and file import.

Yes. Sumi is open source on GitHub under the GPLv3 license. You can inspect the code, report issues, submit Pull Requests, or help with translations.

Not at all. Sumi has built-in AI polishing that automatically removes filler words (um, uh, like), corrects grammar, and adjusts tone. Just speak naturally and Sumi turns it into clean, professional text. You don't need to speak perfectly; just speak.

Yes. Sumi's privacy mode runs speech recognition and AI polishing entirely on your device — no audio or text is ever uploaded to a server. This meets confidentiality requirements for legal and mental health professionals. Privacy mode also automatically deletes audio files after transcription.

Yes. Sumi's recognition engine is specifically optimized for mixed-language scenarios and significantly outperforms other tools. Whether you're in a bilingual meeting, giving voice commands while coding, or switching between Chinese and English naturally, Sumi accurately recognizes and outputs the correct text.

No. Sumi automatically adjusts formatting based on context: email format in Gmail, bullet points in Notion, code formatting in editors. Tone, punctuation, and paragraph breaks are all handled automatically. When you stop talking, the text is ready to send.

Absolutely. Voice input is the best alternative to keyboard typing. Sumi lets you dictate, edit text, and take meeting notes entirely without a keyboard. With Edit by Voice, you can even rewrite, translate, and reformat text by speaking, with no hands required.

Ready to speak your mind?

Download Sumi and give your wrists a break.