Vapi Webhook Architecture: A Builder's Complete Guide
Master Vapi webhook architecture: configure server URLs, handle all event types, write production TypeScript handlers, and avoid the pitfalls.
When you wire Vapi into a production system, the REST API handles call creation — but your webhook handler is where the business logic actually runs: returning a dynamic assistant at call start, executing mid-call tool functions, writing transcripts to your database, and triggering CRM updates when a call ends. Getting this architecture right from day one saves significant debugging time later.
This guide covers the complete Vapi webhook architecture: how Server URL event dispatch works, how to structure a Cloudflare Workers handler in TypeScript, and the production failure modes worth knowing before you deploy. If you haven't yet wired up a basic Vapi assistant, start with our Vapi voice agent setup guide first.
How Vapi Server URLs work
Vapi uses the term Server URL rather than "webhook" because the protocol is bidirectional on certain events. For assistant-request, your server doesn't just acknowledge receipt — Vapi waits for your JSON response and uses it to configure the live call. For events like end-of-call-report, it's a traditional one-way POST that only needs a 200 OK.
Server URLs can be configured at three scope levels:
- Organization level — set in the Vapi dashboard under Organization settings. Acts as the default for all calls unless overridden below.
- Phone number level — attached to a specific number via the Phone Numbers API. Overrides the org-level URL for calls on that number.
- Assistant level — attached directly to an assistant definition. Most specific scope; overrides both org and phone number settings.
The cascade is not a fallback chain. If an assistant-level URL is configured but returns a 500 or is unreachable, Vapi drops the event — it does not fall through to the phone-number or org URL. Design your URL hierarchy intentionally, and make sure the most specific URL is your most reliable endpoint.
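The precedence rule described above can be sketched as a small resolver. This illustrates the behavior as the article describes it and is not Vapi's implementation; the `ServerUrlConfig` shape and the `resolveServerUrl` name are hypothetical.

```typescript
// Hypothetical shape: one optional Server URL per scope level.
type ServerUrlConfig = {
  assistant?: string;
  phoneNumber?: string;
  organization?: string;
};

// Picks the single URL that receives events. Selection happens once, by
// specificity; a failure at the chosen URL does not trigger a second pick.
function resolveServerUrl(config: ServerUrlConfig): string | undefined {
  return config.assistant ?? config.phoneNumber ?? config.organization;
}

const chosen = resolveServerUrl({
  assistant: 'https://assistant.example.com/webhook',
  organization: 'https://org.example.com/webhook',
});
console.log(chosen);
```

Because only one URL is ever selected, the org-level endpoint in this example never sees the event, even if the assistant-level endpoint is down.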
Your Server URL must be publicly accessible over HTTPS. HTTP URLs are silently rejected — Vapi will not send events to them and will not produce an error that makes this obvious.
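Since a rejected HTTP URL produces no visible error, it can help to validate the scheme yourself before saving a Server URL. A minimal sketch, assuming a hypothetical `isHttpsUrl` helper that runs in your own deployment or config script:

```typescript
// Returns true only for absolute URLs that use the https: scheme.
function isHttpsUrl(candidate: string): boolean {
  try {
    return new URL(candidate).protocol === 'https:';
  } catch {
    // The URL constructor throws on strings that are not absolute URLs
    return false;
  }
}

console.log(isHttpsUrl('https://hooks.example.com/webhook'));
console.log(isHttpsUrl('http://hooks.example.com/webhook'));
```

This only checks the scheme. It does not confirm that the host is publicly reachable or that its TLS certificate is valid.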
The webhook event lifecycle
Understanding which events fire and in what order is essential before writing handler code. Vapi's webhook events fall into three categories: call-start events that require a meaningful response, mid-call events that block the voice turn, and post-call events safe for async processing.
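One way to keep these categories straight in code is to map each event type to the kind of response it needs. The sketch below uses only the event names covered in this guide; the `ResponseKind` labels and the `responseKindFor` function are hypothetical names for illustration.

```typescript
// What the handler must send back for a given event type.
type ResponseKind = 'assistant-config' | 'tool-results' | 'ack';

function responseKindFor(eventType: string): ResponseKind {
  switch (eventType) {
    case 'assistant-request':
      return 'assistant-config'; // Vapi waits for a full assistant definition
    case 'tool-calls':
      return 'tool-results'; // blocks the voice turn until results arrive
    default:
      // status-update, hang, end-of-call-report, and anything unrecognized
      return 'ack';
  }
}

console.log(responseKindFor('tool-calls'));
```

Treating unknown types as acknowledge-only keeps the handler from returning errors if new event types start arriving.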
assistant-request
Fires exactly once when an inbound call connects, before any audio plays. Your server must respond with a full assistant configuration — model provider, voice settings, system prompt, tools, and first message — within 7.5 seconds end-to-end. Vapi's telephony provider enforces a 15-second absolute cap on call setup, and Vapi itself consumes roughly 7.5 of those seconds for internal routing, leaving your server the remaining time including network round-trip.
Miss that window and the call fails silently. The caller hears dead air or a generic error. This is the event where cold-start latency and slow database queries become real problems.
tool-calls
Fires mid-call whenever the assistant decides to invoke one of its configured tools. This event blocks the voice turn: the caller hears nothing while your handler runs. Target a response under 5 seconds; Vapi's recommended budget for tool-call round trips is approximately 6 seconds to account for network jitter. If a tool might legitimately take longer — a live inventory check, a payment authorization — return a graceful fallback string rather than letting the handler time out.
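The graceful-fallback idea can be sketched as a race between the real work and a timer. `withDeadline` is a hypothetical helper, and the 4-second figure in the usage note is an assumption chosen to stay inside the budget above.

```typescript
// Resolves with `fallback` if `work` has not settled within `ms` milliseconds.
// If `work` rejects before the deadline, the rejection propagates to the caller.
async function withDeadline<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Stop the timer so it does not outlive the request
    clearTimeout(timer);
  }
}
```

A tool branch might then call `await withDeadline(checkInventory(args), 4000, "I'm still checking on that. Can I call you back?")`, where `checkInventory` stands in for your own slow operation. The underlying work is not cancelled when the deadline fires; it is only ignored.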
status-update
Fires as call state transitions through ringing, in-progress, forwarding, and ended. Useful for dashboards and monitoring pipelines. Only a 200 OK response is required.
hang
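Fires when the assistant fails to respond for several seconds during a call, which signals a stalled turn rather than a caller hang-up. Useful for alerting. Acknowledge with 200 OK. No structured response is read.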
end-of-call-report
Fires after the call ends and Vapi finishes post-processing (transcription, summarization, cost calculation). The payload includes transcript, summary, analysis (with a successEvaluation field your assistant's analysis config drives), the full messages conversation array, and a cost breakdown under call.costs.
This is the right place for heavier work: persisting call records, posting to a CRM, sending notifications. However, calls ending with busy, failed, or no-answer status do not reliably fire this event. Build a reconciliation job that polls GET /call/{id} on the Vapi REST API for any call that opened without a corresponding report within five minutes.
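The selection step of that reconciliation job might look like the sketch below. The `TrackedCall` record and `findUnreportedCalls` function are hypothetical; they assume you record each call when it opens and flag it when its report arrives. Fetching the missing calls from the REST API is left out.

```typescript
// Hypothetical record written when a call opens and updated when its report arrives.
type TrackedCall = {
  callId: string;
  openedAt: number; // epoch milliseconds
  reportReceived: boolean;
};

const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Returns the IDs of calls that have waited at least `graceMs` without a report.
function findUnreportedCalls(
  calls: TrackedCall[],
  now: number,
  graceMs: number = FIVE_MINUTES_MS
): string[] {
  return calls
    .filter((call) => !call.reportReceived && now - call.openedAt >= graceMs)
    .map((call) => call.callId);
}
```

Each returned ID is a candidate for a follow-up API lookup. Calls younger than the grace period are skipped so the job does not race a report that is still in flight.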
Setting up the webhook handler
Cloudflare Workers is well-suited to Vapi webhook handlers: global anycast routing reduces RTT to Vapi's us-west-2 infrastructure, there is no cold-start penalty, and the 30-second CPU limit handles typical webhook payloads. The examples below use Hono, a minimal TypeScript router built for edge runtimes.
# wrangler.toml
name = "vapi-webhook-handler"
main = "src/index.ts"
compatibility_date = "2026-01-01"

# Keep the real secret out of this file. Store it as an encrypted secret instead:
#   wrangler secret put VAPI_WEBHOOK_SECRET

[[d1_databases]]
binding = "DB"
database_name = "your-db"
database_id = "your-d1-database-id"
// src/index.ts
import { Hono } from 'hono';
import { verifySignature } from './auth';
import { handleAssistantRequest } from './handlers/assistantRequest';
import { handleToolCalls } from './handlers/toolCalls';
import { handleEndOfCall } from './handlers/endOfCall';

type Env = {
  VAPI_WEBHOOK_SECRET: string;
  DB: D1Database;
};

const app = new Hono<{ Bindings: Env }>();

app.post('/webhook', async (c) => {
  const signature = c.req.header('x-vapi-signature');
  // Read the raw body once; the signature is computed over this exact string
  const rawBody = await c.req.text();

  if (!(await verifySignature(rawBody, signature, c.env.VAPI_WEBHOOK_SECRET))) {
    return c.json({ error: 'Unauthorized' }, 401);
  }

  // A malformed body should produce a 400, not an unhandled exception
  let payload: any;
  try {
    payload = JSON.parse(rawBody);
  } catch {
    return c.json({ error: 'Invalid JSON' }, 400);
  }
  const type = payload?.message?.type;

  switch (type) {
    case 'assistant-request':
      return handleAssistantRequest(c, payload.message);
    case 'tool-calls':
      return handleToolCalls(c, payload.message);
    case 'end-of-call-report':
      // Acknowledge immediately; process after response is sent
      c.executionCtx.waitUntil(handleEndOfCall(c.env, payload.message));
      return c.json({ received: true });
    case 'hang':
    case 'status-update':
    default:
      return c.json({ received: true });
  }
});

export default app;
Authenticating incoming webhooks
Vapi signs every outgoing POST with an HMAC-SHA256 digest of the raw request body, delivered in the x-vapi-signature header. Always verify it before processing — an unauthenticated endpoint accepts forged events with no effort.
Use the Web Crypto API for verification inside a Cloudflare Worker; Node's crypto.createHmac is only available there when the nodejs_compat compatibility flag is enabled. The comparison must be constant-time to prevent timing side-channel attacks.
// src/auth.ts
export async function verifySignature(
  body: string,
  signature: string | undefined,
  secret: string
): Promise<boolean> {
  if (!signature) return false;

  const encoder = new TextEncoder();
  const key = await crypto.subtle.importKey(
    'raw',
    encoder.encode(secret),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['sign']
  );

  // Compute the expected digest and hex-encode it
  const sigBytes = await crypto.subtle.sign('HMAC', key, encoder.encode(body));
  const expected = Array.from(new Uint8Array(sigBytes))
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');

  // Constant-time comparison prevents timing side-channel attacks
  if (expected.length !== signature.length) return false;
  let mismatch = 0;
  for (let i = 0; i < expected.length; i++) {
    mismatch |= expected.charCodeAt(i) ^ signature.charCodeAt(i);
  }
  return mismatch === 0;
}
Handling assistant-request
The assistant-request handler is the most latency-sensitive point in your stack. The example below looks up the caller's phone number in a D1 database and returns a personalized assistant config. The voice is provided by ElevenLabs and the language model uses Anthropic's Claude — see the Anthropic API docs for available model IDs.
If your database query takes more than a few hundred milliseconds, cache the result in Cloudflare KV keyed on the normalized phone number with a 5-minute TTL. A KV read adds roughly 1ms; a D1 query on a large unindexed table can take 100–200ms. One query will not exhaust the 7.5-second budget on its own, but lookups stack with network round-trips and every other call the handler makes, so trimming each one keeps the cold path comfortably inside the deadline.
// src/handlers/assistantRequest.ts
import type { Context } from 'hono';

type Customer = { first_name: string; last_visit: string };

export async function handleAssistantRequest(c: Context, message: any) {
  const callerNumber: string = message.call?.customer?.number ?? '';

  // Skip the lookup when the caller ID is missing. c.env is untyped here,
  // so the row is cast rather than passed as a type argument to first().
  const customer = callerNumber
    ? ((await c.env.DB.prepare(
        'SELECT first_name, last_visit FROM customers WHERE phone = ? LIMIT 1'
      ).bind(callerNumber).first()) as Customer | null)
    : null;

  // Template lines stay flush-left so no indentation leaks into the prompt
  const systemPrompt = customer
    ? `You are Aria, a scheduling assistant for Bend Mountain Dental in Bend, Oregon.
The caller is ${customer.first_name}, a returning patient. Their last visit was on ${customer.last_visit}.
Confirm their details and offer to schedule a follow-up.`
    : `You are Aria, a scheduling assistant for Bend Mountain Dental in Bend, Oregon.
Greet the caller, ask for their name, and help them book an appointment or answer questions about services.`;

  return c.json({
    assistant: {
      name: 'Aria',
      voice: {
        provider: 'elevenlabs',
        voiceId: 'pNInz6obpgDQGcFmaJgB',
      },
      model: {
        provider: 'anthropic',
        model: 'claude-haiku-4-5-20251001',
        messages: [{ role: 'system', content: systemPrompt }],
        temperature: 0.3,
      },
      firstMessage: customer
        ? `Hi ${customer.first_name}, great to hear from you again. How can I help?`
        : 'Hi, thanks for calling Bend Mountain Dental. How can I help you today?',
      serverMessages: ['tool-calls', 'end-of-call-report', 'hang', 'status-update'],
    },
  });
}
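The KV caching suggested above follows a cache-aside pattern, sketched here. `KvLike` is a minimal local stand-in for the subset of the Workers KV binding that the sketch uses, and `getCustomerCached` and the `customer:` key prefix are hypothetical names. The sketch assumes the phone number is already normalized.

```typescript
// Minimal subset of a KV binding: string get, and put with a TTL in seconds.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

type Customer = { first_name: string; last_visit: string };

const CACHE_TTL_SECONDS = 300; // the 5-minute TTL suggested above

async function getCustomerCached(
  kv: KvLike,
  phone: string,
  loadFromDb: (phone: string) => Promise<Customer | null>
): Promise<Customer | null> {
  const key = `customer:${phone}`;

  // Fast path: serve a cached record without touching the database
  const cached = await kv.get(key);
  if (cached !== null) return JSON.parse(cached) as Customer;

  // Slow path: query the database, then cache known callers only
  const customer = await loadFromDb(phone);
  if (customer) {
    await kv.put(key, JSON.stringify(customer), { expirationTtl: CACHE_TTL_SECONDS });
  }
  return customer;
}
```

Unknown callers are deliberately not cached here, so a patient added after their first call is recognized on the next one. Whether that trade-off suits you depends on how many unknown numbers you expect.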
Handling tool-calls mid-call
Every tool-call invocation blocks the current voice turn. The dispatcher below runs a per-tool switch and wraps every branch in a try/catch. If a tool throws, the caller receives a graceful fallback string rather than silence. Return an array of { toolCallId, result } objects — one per invocation in the payload, since Vapi can batch multiple tool calls into a single event.
// src/handlers/toolCalls.ts
import type { Context } from 'hono';
// fetchAvailableSlots, nextOpenDay, and createBooking are application-specific
// helpers that are not shown here; the module path is a placeholder.
import { fetchAvailableSlots, nextOpenDay, createBooking } from '../booking';

type ToolResult = { toolCallId: string; result: string };

export async function handleToolCalls(c: Context, message: any) {
  const toolCallList: any[] = message.toolCallList ?? [];
  const results: ToolResult[] = [];

  for (const toolCall of toolCallList) {
    const { id, name, arguments: rawArgs } = toolCall;
    let result: string;

    try {
      // Accept either a JSON string or an already-parsed arguments object
      const args =
        typeof rawArgs === 'string' ? JSON.parse(rawArgs || '{}') : (rawArgs ?? {});

      switch (name) {
        case 'check_availability': {
          const slots = await fetchAvailableSlots(args.date, args.service, c.env.DB);
          result = slots.length
            ? `Available times on ${args.date}: ${slots.join(', ')}.`
            : `No availability on ${args.date}. The next open day is ${await nextOpenDay(c.env.DB)}.`;
          break;
        }
        case 'book_appointment': {
          const confirmationId = await createBooking(args, c.env.DB);
          result = `Booked. Confirmation number ${confirmationId}. A text confirmation will be sent to ${args.phone}.`;
          break;
        }
        default:
          result = `I don't have a tool called "${name}". Let me connect you with the front desk.`;
      }
    } catch (err) {
      // Never propagate an exception here; it would leave the caller in silence
      console.error(`Tool "${name}" error:`, err);
      result = "I'm having trouble with that right now. Let me connect you with our team.";
    }

    results.push({ toolCallId: id, result });
  }

  return c.json({ results });
}
Processing end-of-call-report
The end-of-call-report handler runs after the caller has disconnected, so there is no voice-turn latency pressure. Use c.executionCtx.waitUntil() in the main router to let the Worker continue processing after the 200 response is sent — this keeps webhook response time fast regardless of downstream work volume.
Type mismatch warning: The Vapi SDK's ServerMessageEndOfCallReport TypeScript type omits transcript, summary, and messages — but the actual payload includes all three. Cast the message to any or define your own interface; do not rely on the generated types for these fields.
// src/handlers/endOfCall.ts
type Env = { DB: D1Database; SLACK_WEBHOOK_URL?: string };

export async function handleEndOfCall(env: Env, message: any): Promise<void> {
  const { call, transcript, summary, analysis } = message;

  // Derive the duration from the call timestamps when both are present
  const durationSeconds =
    call.startedAt && call.endedAt
      ? Math.round(
          (new Date(call.endedAt).getTime() - new Date(call.startedAt).getTime()) / 1000
        )
      : null;

  // ON CONFLICT handles rare duplicate delivery from Vapi retries
  await env.DB.prepare(`
    INSERT INTO call_records
      (call_id, caller_number, transcript, summary, duration_seconds, ended_reason, created_at)
    VALUES (?, ?, ?, ?, ?, ?, datetime('now'))
    ON CONFLICT (call_id) DO UPDATE SET
      transcript = excluded.transcript,
      summary = excluded.summary
  `).bind(
    call.id,
    call.customer?.number ?? null,
    transcript ?? null,
    summary ?? null,
    durationSeconds,
    call.endedReason ?? null
  ).run();

  // Notify Slack only for calls the assistant evaluated as successful.
  // Compare as a string so both 'true' and a boolean true match.
  if (String(analysis?.successEvaluation) === 'true' && env.SLACK_WEBHOOK_URL) {
    await fetch(env.SLACK_WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: `Call completed\nCaller: ${call.customer?.number ?? 'unknown'}\nSummary: ${summary ?? 'n/a'}`,
      }),
    });
  }
}
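As an alternative to casting the message to any, you can define your own interface for the fields this guide relies on and narrow with a type guard. The field shapes below are assumptions based on the payload description above, not the SDK's types; check them against your own event logs.

```typescript
// Hand-written view of the end-of-call-report message. Only `type` and
// `call.id` are treated as required; everything else is optional.
interface EndOfCallReport {
  type: 'end-of-call-report';
  call: { id: string; costs?: unknown[] };
  transcript?: string;
  summary?: string;
  messages?: unknown[];
  analysis?: { successEvaluation?: string | boolean };
}

// Narrows an unknown payload to EndOfCallReport by checking the required fields.
function isEndOfCallReport(message: unknown): message is EndOfCallReport {
  if (typeof message !== 'object' || message === null) return false;
  const m = message as { type?: unknown; call?: { id?: unknown } | null };
  return m.type === 'end-of-call-report' && typeof m.call?.id === 'string';
}
```

After the guard passes, `message.transcript` and `message.summary` are typed as optional strings, so the compiler forces the `?? null` handling that the handler above already does by hand.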
Production gotchas
SDK types don't match actual payloads
The Vapi SDK's generated ServerMessageEndOfCallReport type omits transcript, summary, and messages — all fields present in production payloads. This is a documented mismatch reported by multiple production users. Until Vapi updates the type definitions, define your own interface based on what you observe in actual event logs and cast the incoming message to any before accessing these fields.
end-of-call-report is not guaranteed for all outcomes
Calls ending with busy, failed, no-answer, or certain telephony-provider-specific statuses may not fire the end-of-call-report event. If your system must record every call, build a reconciliation Worker on a five-minute schedule: scan your database for calls in a pending state, cross-reference against GET /call from the Vapi REST API, and insert any missing records.
assistant-request cold paths burn the deadline
A D1 query scanning a large table without an index can take 100–200ms. The 7.5-second budget also has to absorb the network round-trip (50–100ms) and every other call the handler makes, so several sequential slow queries or one slow external API can exhaust it. Add a covering index on customers(phone) and cache the lookup result in Cloudflare KV with a short TTL for returning callers. An indexed miss (unknown caller, zero rows) is fast; the full table scan is what puts the deadline at risk.
The URL cascade does not fall back on failure
If your assistant-level Server URL returns a 5xx on assistant-request, the call fails. Vapi does not retry against the org-level URL. Catch errors inside your Worker and return a static fallback assistant JSON rather than propagating a 500. A generic assistant is better than a dropped call.
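The catch-and-fall-back advice can be sketched as a wrapper around whatever builds the personalized assistant. `buildAssistantSafely` and `FALLBACK_ASSISTANT` are hypothetical names. The fallback is trimmed for brevity; a real one would carry the same model and voice settings as your normal configuration.

```typescript
type AssistantResponse = { assistant: Record<string, unknown> };

// Static configuration that needs no database or network access.
const FALLBACK_ASSISTANT: AssistantResponse = {
  assistant: {
    name: 'Aria',
    firstMessage: 'Hi, thanks for calling. How can I help you today?',
  },
};

// Runs the personalized builder and substitutes the static assistant on any error.
async function buildAssistantSafely(
  build: () => Promise<AssistantResponse>
): Promise<AssistantResponse> {
  try {
    return await build();
  } catch (err) {
    console.error('assistant-request failed, using fallback:', err);
    return FALLBACK_ASSISTANT;
  }
}
```

In the router, the assistant-request branch would return the JSON from `buildAssistantSafely` with a 200 status, so a database outage degrades personalization instead of dropping the call.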
Duplicate event delivery
Vapi may deliver the same webhook event more than once under network retry conditions. Use call.id as an idempotency key — the ON CONFLICT DO UPDATE pattern in the end-of-call handler above handles this for database writes. For tool-calls, check whether a booking already exists before creating a duplicate.
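For tool-calls, the check-before-create step might look like the sketch below. The `BookingStore` interface and `bookOnce` function are hypothetical, and the sketch assumes the tool call ID stays the same when an event is redelivered, which you should confirm in your own logs.

```typescript
type BookingArgs = { date: string; service: string; phone: string };

// Hypothetical persistence layer keyed on the tool call ID.
interface BookingStore {
  findByToolCallId(toolCallId: string): Promise<string | null>;
  create(toolCallId: string, args: BookingArgs): Promise<string>;
}

// Returns the existing confirmation ID on redelivery instead of booking twice.
async function bookOnce(
  store: BookingStore,
  toolCallId: string,
  args: BookingArgs
): Promise<string> {
  const existing = await store.findByToolCallId(toolCallId);
  if (existing !== null) return existing;
  return store.create(toolCallId, args);
}
```

Two deliveries that arrive at the same instant can both pass the lookup, so a unique constraint on the tool call ID column is still the durable guard. The lookup only avoids the common case cheaply.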
HTTPS is silently enforced
HTTP Server URLs are accepted in the Vapi dashboard but silently rejected during event dispatch. If your webhook stops receiving events after a configuration change, confirm the URL uses HTTPS and that your TLS certificate is valid.
Cloudflare Workers time limits apply to fan-out
The end-of-call-report handler can fan out to several external services. Time spent awaiting network I/O does not count toward the Worker's CPU limit, but work kept alive with waitUntil() is cut off roughly 30 seconds after the response is sent, so a slow CRM or Slack call can be terminated before it finishes. For high-volume deployments, push a message to a Cloudflare Queue from the webhook handler and return 200 immediately, then process the queue in a separate consumer Worker with its own budget.
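The queue hand-off can be sketched in two halves: a producer called from the webhook handler and a consumer that does the slow work. `QueueLike` and `BatchLike` are minimal local stand-ins for the parts of the Cloudflare Queues API used here (`send`, `ack`, `retry`). `CallReportJob`, `enqueueReport`, and `consumeReports` are hypothetical names.

```typescript
type CallReportJob = { callId: string; summary: string | null };

// Producer side: the subset of a queue binding used by the webhook handler.
interface QueueLike<T> {
  send(body: T): Promise<void>;
}

// Consumer side: the subset of a message batch used by the consumer Worker.
interface BatchLike<T> {
  messages: { body: T; ack(): void; retry(): void }[];
}

// Called from the webhook handler; enqueues a small job and returns quickly.
async function enqueueReport(
  queue: QueueLike<CallReportJob>,
  message: { call: { id: string }; summary?: string }
): Promise<void> {
  await queue.send({ callId: message.call.id, summary: message.summary ?? null });
}

// Called from the consumer Worker; acks successes and retries failures one by one.
async function consumeReports(
  batch: BatchLike<CallReportJob>,
  processJob: (job: CallReportJob) => Promise<void>
): Promise<void> {
  for (const msg of batch.messages) {
    try {
      await processJob(msg.body);
      msg.ack();
    } catch {
      msg.retry();
    }
  }
}
```

Acknowledging per message means one failing CRM call does not force the whole batch to be redelivered. The job here carries only the call ID and summary, so the consumer is assumed to fetch anything larger, such as the transcript, from your database.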
When NOT to build this yourself
A custom webhook handler is the right choice when you need dynamic assistant selection per caller, deep CRM integration with custom routing logic, or runtime behavior that varies based on external data. It is not always the right choice.
Skip it if you're integrating with a single SaaS tool. If your goal is to log call transcripts to HubSpot or book appointments in Calendly, you don't need a custom handler. A Zapier or Make workflow handles both without a deployment pipeline to maintain.
Skip it if your team doesn't have backend TypeScript experience. Debugging webhook timing failures — 7.5-second deadline violations, silent 5xx drops, duplicate deliveries — requires reading Cloudflare Workers logs, interpreting Vapi's event traces, and reproducing latency issues locally with ngrok and the Vapi CLI. If your team doesn't have that background, the time cost of learning it while supporting a production phone system will outweigh the benefits of customization.
Think carefully before handling PHI in a custom webhook. In healthcare deployments, a webhook server that receives call transcripts becomes a covered component under HIPAA. Your hosting provider needs a Business Associate Agreement, your logging pipeline needs audit trails, and your KV cache cannot hold protected health information without additional safeguards. This is solvable, but it's a deliberate compliance scope expansion — not an afterthought.
For businesses in Bend and across Central Oregon that want voice AI without the infrastructure overhead, WildRun AI handles the full webhook layer — call logging, CRM integration, and after-hours routing — without requiring you to deploy or maintain a handler.
Local development setup
Testing webhooks locally requires two components running simultaneously. Run vapi listen to forward Vapi events to localhost on port 4242. Then expose that port using ngrok http 4242 or cloudflared tunnel --url http://localhost:4242 and paste the resulting HTTPS URL into your Vapi Server URL config.
The free ngrok plan generates a new URL on every restart, which means updating your Vapi config each session. A static ngrok domain or a named cloudflared tunnel avoids this. When developing locally, keep your handler response times well inside the production budgets. Tunnel RTT is typically higher than a deployed Worker's, so local timings are a pessimistic estimate of network latency, but they say nothing about how your database and downstream APIs behave under real call volume. Measure the deployed handler as well.
Architecture
 Inbound call
       |
       v
+-------------+  assistant-request (once, <7.5s)   +------------------------+
|             | ---------------------------------> |                        |
|    Vapi     | <----- assistant config JSON ----- |  Your Webhook Handler  |
|  Platform   |                                    |  (Cloudflare Workers)  |
| (us-west-2) |  tool-calls (per turn, <6s)        |                        |
|             | ---------------------------------> |                        |
|             | <------ tool result string ------- |                        |
|             |                                    |                        |
|             |  end-of-call-report (async)        |                        |
|             | ---------------------------------> |                        |
+-------------+                                    +-----------+------------+
                                                               |
         +--------------------------+--------------------------+
         |                          |                          |
         v                          v                          v
+-----------------+       +-------------------+      +-------------------+
| D1 / KV Store   |       | CRM / Calendar    |      | Cloudflare Queue  |
| (call logs,     |       | (HubSpot, etc.)   |      | (async fan-out)   |
|  caller cache)  |       +-------------------+      +-------------------+
+-----------------+
Frequently asked questions
What is the difference between a Vapi Server URL and a traditional webhook?
Traditional webhooks are one-way fire-and-forget. Vapi's Server URLs are bidirectional on some events — assistant-request requires your server to return a full assistant configuration JSON within 7.5 seconds, not just acknowledge receipt with a 200 status code.
Why isn't my end-of-call-report webhook firing?
First check that your assistant's serverMessages array includes 'end-of-call-report'. Also note that calls ending with busy, failed, or no-answer status do not reliably fire this event. Build a reconciliation job that polls GET /call/{id} on the Vapi REST API for any call that doesn't generate a report within five minutes.
How do I test Vapi webhooks locally?
Run vapi listen to forward Vapi events to a local port (default 4242), then expose that port publicly using ngrok or cloudflared tunnel. The vapi listen command does not create a public URL by itself — you need both tools running simultaneously, with the ngrok HTTPS URL set as your Vapi Server URL.
What should I do when my assistant-request handler is too slow?
Add a covering index on your customers table's phone column, cache frequent caller lookups in Cloudflare KV with a 5-minute TTL, and set a short database query timeout. If you're still approaching the 7.5-second limit, return a default static assistant immediately and personalize through tool-calls events during the conversation instead.
Can I change the assistant's behavior mid-call via webhook?
Not by returning a new configuration. The assistant-request event fires exactly once at call start, and the assistant configuration it returns is fixed for the duration of the call. To influence behavior mid-call, return new context through tool-calls results or trigger a call transfer to a different number.
Does Vapi retry failed webhook deliveries?
Vapi retries some events on network failures, but retries are not guaranteed and the window is limited. Acknowledge the webhook quickly with 200 OK and push work to a queue for durable processing — don't write directly to your database inside the request handler path.