Offline text-to-speech that doesn't sound like a robot.

A Windows desktop app for AI text-to-speech with 13 built-in voices, voice cloning from a 6-second sample, AI audio enhancement, and queue-based batch generation. Runs 100% offline — no cloud, no account, no character caps. Fast mode is free forever. Pro unlocks everything — $12/mo or $89 once, your call.

VoxWild-Setup.exe · ~377 MB · Windows 10/11 · v1.1.2

ElevenLabs: $22/month. Forever.
VoxWild: $0, $12/mo, or $89 once.
You pick the plan.

See the full comparison ↓

Two engines

Two AI text-to-speech engines, because voice is a trade-off.

Kokoro is the "done in half a second" TTS engine — instant speech synthesis, no GPU needed. Chatterbox is the "wait a bit, get something you'd actually play for a client" engine — human-sounding narration with voice cloning. Pick per line, per project, whenever.

Fast mode

Kokoro TTS

Kokoro 82M · ONNX runtime · CPU

Near-instant generation. Sounds clean and consistent — perfect for narration, audiobooks, and anything where you need speed and clarity over emotional range.

  • Latency: ~0.4s per 30s of audio
  • Voices: 13 built-in (US + UK, M/F)
  • Model size: 82 MB (bundled)
  • Requires: CPU only · 4 GB RAM min
  • Cost: Free forever, unlimited

Natural mode

Chatterbox TTS

Chatterbox · PyTorch · CPU/GPU

Slower, but genuinely human-sounding. Clones any voice from a 6-second reference sample. The one you'd use for a real podcast, ad, or client deliverable.

  • Latency: ~4s per 10s of audio (CPU)
  • Voice cloning: Yes, 6s sample minimum
  • Model size: ~3 GB (first-use download)
  • Requires: 6 GB RAM min · GPU optional
  • Cost: 3 free, then Pro unlocks unlimited
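
One way to read the two latency figures is as a real-time factor: how many seconds of audio each second of compute produces. The sketch below is only an illustration. It assumes generation time scales linearly with audio length, which the page does not claim, and the names `ENGINES`, `speedup`, and `estimated_wait` are made up for this example.

```python
# Approximate figures quoted on the two engine cards (CPU).
ENGINES = {
    "kokoro": (0.4, 30.0),      # ~0.4 s of compute per 30 s of audio
    "chatterbox": (4.0, 10.0),  # ~4 s of compute per 10 s of audio
}


def speedup(engine: str) -> float:
    """Seconds of audio produced per second of compute."""
    compute_s, audio_s = ENGINES[engine]
    return audio_s / compute_s


def estimated_wait(engine: str, audio_seconds: float) -> float:
    """Rough generation time in seconds, assuming linear scaling."""
    return audio_seconds / speedup(engine)


if __name__ == "__main__":
    for name in ENGINES:
        wait_min = estimated_wait(name, 3600) / 60  # one hour of finished audio
        print(f"{name}: {speedup(name):.1f}x real time, ~{wait_min:.1f} min per hour of audio")
```

Under that assumption, an hour of finished audio takes roughly 48 seconds in Fast mode and roughly 24 minutes in Natural mode on CPU.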

Pricing

VoxWild vs. ElevenLabs, Murf, and PlayHT.

Three ways to pay, one app. Free if Fast mode text-to-speech is enough. Monthly if you want flexibility. Lifetime if you're in for the long haul.

|                    | VoxWild Free | VoxWild Pro            | ElevenLabs Starter | Murf Creator  | PlayHT Creator |
|--------------------|--------------|------------------------|--------------------|---------------|----------------|
| Monthly cost       | $0           | $0 after buy           | $22                | $29           | $31            |
| 3-year total       | $0           | $89 total              | $792               | $1,044        | $1,116         |
| Runs offline       | Yes          | Yes                    | No                 | No            | No             |
| Voice cloning      |              | Yes, 6s sample         | Yes (≥Creator $22) | No            | Yes            |
| Voice count        | 13           | 13 + unlimited clones  | ~120               | ~200+         | ~800           |
| Character limit    | unlimited    | unlimited              | 30k / month        | 200k / month  | 250k / month   |
| Your text goes to  | your laptop  | your laptop            | their servers      | their servers | their servers  |
| Commercial use     | Yes          | Yes                    | Yes                | Yes           | Yes            |
| Open-source models | Yes          | Yes                    | No                 | No            | No             |
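
The 3-year totals are just the monthly price times 36, and the same arithmetic shows when the $89 one-time price overtakes the $12/mo plan. The sketch below reproduces those numbers. The prices are the ones listed on this page, and the function names are illustrative only.

```python
import math

MONTHS = 36    # the comparison's 3-year horizon
LIFETIME = 89  # VoxWild Pro one-time price, USD

# Monthly prices as listed on this page, USD.
MONTHLY = {
    "ElevenLabs": 22,
    "Murf": 29,
    "PlayHT": 31,
    "VoxWild Pro Monthly": 12,
}


def three_year_total(price_per_month: int) -> int:
    """Total paid after 36 months of a flat monthly subscription."""
    return price_per_month * MONTHS


def break_even_months(one_time: int, price_per_month: int) -> int:
    """First whole month at which the subscription has cost at least the one-time price."""
    return math.ceil(one_time / price_per_month)


if __name__ == "__main__":
    for name, price in MONTHLY.items():
        print(f"{name}: ${three_year_total(price):,} over 3 years")
    print(f"Lifetime pays for itself after {break_even_months(LIFETIME, 12)} months of Pro Monthly")
```

On these figures, Pro Monthly would cost $432 over three years, and the lifetime license breaks even in month 8.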

Free
$0
Unlimited Fast mode. 3 free Natural mode generations plus 3 free Enhancement runs.
Download

Pro Monthly
$12/mo
Everything unlimited. Cancel anytime from Gumroad.
Subscribe

What's inside

Open-source AI speech models. No black boxes.

VoxWild is a Python desktop app that bundles a small number of open-source AI text-to-speech models. Here's exactly what they are and where they come from, because you shouldn't install a speech synthesis tool that won't tell you.

Platform: Windows 10/11 x64
Installer size: ~377 MB
Disk after install: ~800 MB (Fast mode only) · ~5 GB with Natural mode
Minimum RAM: 4 GB (Fast mode) · 6 GB (Natural mode)
GPU: Optional · speeds up Enhancement if CUDA is available
Network: Only for license activation, update checks, and the one-time Natural mode model download
Telemetry: None

Who made this

We're Cookie Studios — a small independent team building desktop software that respects your computer and your wallet.

VoxWild started in early 2026 because the decent AI text-to-speech options all wanted $22+ a month to rent voices that run fine on a laptop. Nobody had shipped a proper desktop wrapper around Kokoro and Chatterbox — two genuinely great open-source speech models. So we built one. Then we kept building, because every time we used it we noticed something else it needed.

We're small enough that support goes straight to a person who knows the code. Email us if something breaks or you want a feature — we reply within a day, usually within an hour on weekdays.

— Cookie Studios
cookiestudios.dev@gmail.com

Frequently asked questions

VoxWild FAQ — the ones people actually ask.

— Is this real?
01
The installer is unsigned. Is it safe?

Yes, but we get why you're asking. SmartScreen warns because the installer is unsigned: code-signing certificates cost ~$400/year and we're a small indie shop. The installer is hosted on GitHub Releases (public repo, every commit is visible) and the download link on this page points there directly. You can inspect every line of Python before trusting it. We plan to get the exe signed once revenue justifies the cost.

02
Does it actually run offline?

Yes. The only network requests the app makes are: (1) a single HTTPS call to Gumroad when you enter a license key, (2) a daily check to GitHub for new versions, and (3) the one-time ~3 GB model download the first time you use Natural mode. Your text, your audio, your voice clones: none of it leaves your computer. Ever. No account. No cloud.

03
Who's behind it?

Cookie Studios — a small independent team building desktop tools. Support email is above, and a real person replies within a day (usually within an hour on weekdays). We're not VC-funded; revenue comes from users buying the app.

— Will it do what I need?
04
What can I export?

MP3 (configurable bitrate, with ID3 metadata), WAV (16-bit), and SRT subtitle files timed to the audio. FLAC is on the roadmap.
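
For readers who haven't handled SRT before, the format is a numbered list of cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range and the subtitle text. The sketch below shows how cue timings could be turned into an SRT file. It is not VoxWild's exporter, and `srt_timestamp` and `build_srt` are hypothetical names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as the SRT 'HH:MM:SS,mmm' timestamp."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(cues: list[tuple[float, float, str]]) -> str:
    """Build SRT text from (start_seconds, end_seconds, text) cues."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(build_srt([(0.0, 2.4, "Hello."), (2.4, 5.0, "World.")]))
```

In practice the start and end times would come from the duration of each generated audio segment.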

05
Is there a max text length?

No. Paste a novel. The queue splits long text into chunks at sentence boundaries and concatenates the output. We've tested it with book-length scripts.
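
To make the chunking idea concrete, here is a minimal sketch of splitting text at sentence boundaries and packing sentences into size-limited chunks. It is an assumption about the general technique, not VoxWild's actual code: the regex is a naive sentence splitter, and `max_chars` and both function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive split: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the full chunk
            current = sentence      # start a new one with this sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("One. Two! Three? Four.", max_chars=10):
        print(repr(chunk))
```

Each chunk would then be synthesized separately and the resulting audio joined in order. The naive regex will mis-split abbreviations such as "Dr. Smith", so a real splitter needs more care.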

06
Can I use the audio commercially?

Yes, on every tier including Free, for both engines. Two caveats: (a) Chatterbox output contains an inaudible neural watermark (a PerTh watermark, required by the upstream license; it doesn't affect audio quality). (b) The EULA requires that you only clone voices you have rights to clone. Beyond that, what you make is yours.

07
How good is the 6-second voice cloning?

Honest answer: good enough to fool a listener in a podcast context, not good enough to fool the cloned person's spouse. Longer reference samples (30–60 seconds of clean audio) produce noticeably better clones. The quality of the source audio matters more than the length — a clean 8s sample beats a noisy 60s one.

— What if I change my mind?
08
Refund policy?

14 days, no questions asked, through Gumroad. Reply to your receipt email and ask for a refund.

09
What if I cancel Pro Monthly?

Fast mode keeps working forever. Natural mode and cloning stop generating new audio. Audio you already generated stays on your disk — it's yours, always.

10
Can I move my license to a new PC?

Yes. A single license activates on up to two machines. If the old one is dead and you need a seat freed, email us and we'll release it the same day.

11
What about Mac / Linux?

Windows only right now. Mac is possible eventually — the underlying models run fine on Apple Silicon — but packaging and testing on a second platform is a lot of work for a small team. If you want Mac support, drop us a line so we can gauge demand.

12
Roadmap?

Short-term: more export formats, pronunciation dictionary improvements, better clone management. Longer-term: FLAC, maybe a macOS build, maybe Linux if demand exists. No AI video, no avatars, no chatbot — this app does one thing.

Download VoxWild for Windows.

Download Free
VoxWild-Setup.exe · ~377 MB · Windows 10/11 64-bit · v1.1.2
SmartScreen will warn the first time — click "More info" then "Run anyway". The app is unsigned (see FAQ #01).