messy audio in.
clean JSON out.
One call eats the 4-hour file, the speaker mess and the format hell — and hands your agent a stable schema. No ffmpeg. No babysitting.
Every sample below is an actual file the live engine produced — play it, read it, then run it on your own.
Ngoài kia bao nhiêu câu ca tình nhân thơ tháng Các thế gian nghe như con sông trôi ngẹn ngào Và anh buồn sai bao nhiêu lệch trên bức trấn Để biết em rời đi không khi nào về lại Nói đi em bao dân gió trong đời oan cháy Mà sao em văn thân đi suốt kinh doanh
Anyone can call Whisper. What burns weeks is everything around it. Vocce owns those parts so your agent or automation makes one call and trusts the result.
Split a 4-hour recording, transcribe in parallel, and stitch it back with continuous, correct timestamps — no drift at the seams.
handles 10GB+ uploads
Stable speaker maps that stay consistent across the whole file, with low-confidence segments flagged instead of silently guessed.
speaker_map.json
Re-send the same job, get the same job_id. Transient backend failures retry automatically. Failed jobs aren't billed.
exactly-once semantics
MOV, M4A, WebM, weird bitrates, video with no normalized audio: Vocce cleans, normalizes loudness, compresses for ASR, and accepts a URL or upload. You never touch ffmpeg.
20+ formats · clean · normalize · compress
Push completed results to your hook, queue, or automation. The output schema is identical across MCP, CLI, API, and every node.
agent.v1 schema
One job returns every artifact downstream tools need: structured, versioned, and stable enough to build on.
{
  "speaker_turns": [
    { "speaker": "A", "start": 12.4, "text": "..." }
  ],
  "chapters": ["Problem", "Decision", "Next steps"],
  "artifacts": ["transcript.md", "subtitles.srt"]
}
Real discovery happens where people build: the MCP registry, automation marketplaces, and your CI. Stable tool names and one schema across every channel.
{
  "mcpServers": {
    "vocce": {
      "command": "npx",
      "args": ["@vocce/transcribe-mcp"],
      "env": { "VOCCE_API_KEY": "vc_..." }
    }
  }
}
One example of going deep, not wide. The same primitives power podcast publishing, research interviews, and compliance captioning, but here's the workflow teams pay for first.
Drop a call recording (or point Vocce at the URL). It cleans the audio, separates speakers, extracts objections, commitments, and next steps, then pushes a structured summary straight into HubSpot, Salesforce, or Notion via webhook. No re-recording, no manual notes, no glued-together pipeline.
We're honest about the lane: not the cheapest STT, not a meeting bot. The reliable layer in the middle.
Deepgram, AssemblyAI, and Whisper give you text. You still build chunking, diarization stitching, retries, and delivery. Vocce is that layer.
Otter and TurboScribe are built for humans in a dashboard. Vocce is built to be called by code, with a stable schema and webhooks.
Your own ffmpeg + queue + ASR script breaks on the 4-hour file at 2am. Vocce is the maintained version of that script.
Drop a file and get a free 3-minute preview: quality report, the first subtitle lines, and a sample of the agent JSON. When the export matters, pay per pack or wire up the API.
The same reliable pipeline behind focused landing pages — each opens with the exact job the visitor searched for.
A raw API returns text for one clean file. Vocce owns the pipeline around it: chunking and stitching multi-hour files without timestamp drift, speaker diarization with a stable map, format normalization, idempotent jobs, retries, and webhook delivery — returned as one versioned schema.
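To make "stitching without timestamp drift" concrete, here is a minimal Python sketch of the general idea, not Vocce's actual implementation. It assumes each chunk carries the absolute offset at which it was cut from the original file, so every segment is shifted by that known offset rather than by a running sum of transcribed durations. The `Segment` type and `stitch` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds, relative to the start of its chunk
    end: float
    text: str


def stitch(chunks: list[tuple[float, list[Segment]]]) -> list[Segment]:
    """Merge per-chunk segments into one timeline.

    Each entry pairs a chunk's absolute start offset (seconds into the
    original file) with the segments transcribed from that chunk.
    """
    stitched: list[Segment] = []
    # Chunks transcribed in parallel may finish out of order, so sort by offset.
    for offset, segments in sorted(chunks, key=lambda c: c[0]):
        for seg in segments:
            # Shift by the known cut point, never by accumulated durations,
            # so a rounding error in one chunk cannot leak into the next.
            stitched.append(
                Segment(round(seg.start + offset, 3), round(seg.end + offset, 3), seg.text)
            )
    return stitched
```

Anchoring every chunk to its own cut point is what keeps the seams clean: an error in one chunk stays in that chunk.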
MCP, CLI, REST API, and pre-built nodes for n8n, Zapier, Make, plus a GitHub Action. Tool names and the output schema are identical across every channel.
Upload failures and failed jobs don't burn credits. Processing is charged only when a job completes successfully, and a clear failure reason is exposed to your code.
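Because failed jobs aren't billed and re-sending the same job returns the same job_id, a client can retry freely. How Vocce recognizes "the same job" is not documented on this page. One common approach, sketched below in Python as an assumption, is for the client to derive a deterministic key from the job's inputs. The `idempotency_key` function and the option names in the usage note are hypothetical.

```python
import hashlib
import json


def idempotency_key(source_url: str, options: dict) -> str:
    """Derive a stable key from the job's inputs.

    The same URL and options always hash to the same key, so a client
    that re-sends after a timeout presents the same identity.
    """
    # Sort keys and fix separators so dict ordering and whitespace
    # cannot change the hash.
    canonical = json.dumps(
        {"source": source_url, "options": options},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For example, `idempotency_key("https://example.com/call.mp4", {"diarize": True})` yields the same 64-character hex string on every retry, while any change to the URL or options yields a different one.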
TXT, Markdown, DOCX, SRT, VTT, clean audio, compressed audio, summary Markdown, a quality report, and the agent JSON schema for downstream tools.
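On the consuming side, the agent JSON can be loaded into typed records before anything downstream touches it. The Python sketch below covers only the three fields that appear in the sample earlier on this page (`speaker_turns`, `chapters`, `artifacts`). The real schema may carry more, and the class and function names here are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerTurn:
    speaker: str
    start: float  # seconds from the start of the recording
    text: str


@dataclass(frozen=True)
class AgentResult:
    speaker_turns: list[SpeakerTurn]
    chapters: list[str]
    artifacts: list[str]


def parse_agent_result(raw: str) -> AgentResult:
    """Parse the agent JSON; raises KeyError if a sampled field is missing."""
    data = json.loads(raw)
    turns = [
        SpeakerTurn(t["speaker"], float(t["start"]), t["text"])
        for t in data["speaker_turns"]
    ]
    return AgentResult(turns, list(data["chapters"]), list(data["artifacts"]))
```

Failing loudly on a missing field is deliberate: if the payload ever stops matching the expected shape, the code finds out at the boundary and not three steps later.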
The frontend posts jobs to a configurable API endpoint and fails closed if the backend isn't connected. Wire it to your upload, queue, ffmpeg, ASR, and export pipeline; the contract is documented in backend-contract.md.
Get an API key, install the MCP, or just drop a file. One call, one schema, every channel.