Vapi Guide · 2026-06-21 · 11 min read · WildRun AI Engineering

Deploy Vapi Agents on Cloudflare Workers: Setup Guide

Set up Vapi Cloudflare Workers webhook handlers in TypeScript. Step-by-step guide covering assistant routing, function calling, secrets, and production gotchas.

Level: Intermediate · Tools: Vapi, Cloudflare Workers, Wrangler CLI, TypeScript

Vapi fires server-side webhooks for every significant moment in a call: assistant assignment, mid-call function execution, and post-call reporting. Cloudflare Workers is a natural host for those handlers: V8 isolates boot in roughly 5ms (effectively no cold start), the free tier includes 100,000 requests per day, and Wrangler makes secrets management straightforward. This guide walks through a complete, production-ready setup, from npm create cloudflare to live calls with real function calling and dynamic assistant routing.

Why Cloudflare Workers fits Vapi's latency model

Vapi enforces a 7.5-second end-to-end deadline on assistant-request webhook responses. Miss that window and the call drops silently — no error surfaced to the caller. Traditional serverless functions on AWS Lambda or Google Cloud Functions can add 500–1,500ms of cold-start overhead on an idle function. Cloudflare Workers avoid that entirely: they run as lightweight V8 isolates that start executing in roughly 5ms, keeping the full 7.5 seconds available for your logic.
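One way to protect that budget is to race any slow lookup against a timer and fall back to a default when the timer wins. The sketch below is a minimal illustration, not part of Vapi or Workers: withDeadline is a hypothetical helper, and the budget you pass in is your own choice (something below 7.5 seconds to leave room for network transit).

```typescript
// Hypothetical helper: resolve with `fallback` if `work` is slower than `ms`
// or rejects. It does not cancel `work`; it only stops waiting for it.
async function withDeadline<T>(
  work: Promise<T>,
  ms: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // A rejected lookup is treated the same as a slow one.
    return await Promise.race([work.catch(() => fallback), timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In an assistant-request handler this could wrap a caller lookup, with the ID of a saved default assistant as the fallback, so the call still connects when the lookup is slow.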

Workers limits CPU time rather than wall-clock time, and time spent waiting on fetch() or other I/O does not count against it. The Workers Free plan allows 10ms of CPU time per invocation. The Workers Paid plan defaults to 30 seconds and can be raised to 5 minutes by setting limits.cpu_ms = 300000 in wrangler.toml. For most Vapi webhook handlers (a CRM lookup and a JSON response), CPU time is never the bottleneck. The constraint is almost always the latency of the external API you call, not Workers itself.

What you'll build

A single Cloudflare Worker handling three Vapi webhook event types: assistant-request (assign an assistant dynamically when an inbound call arrives without one pre-configured), function-call (execute business logic mid-conversation and return a plain-text result the LLM reads back to the caller), and end-of-call-report (capture transcript, cost, and summary for logging or CRM sync). The router lives in one entry point; each event type has its own handler module.

If you're newer to how Vapi routes calls before they reach your server, the Vapi webhook architecture guide covers the platform-side flow in detail.

Architecture

Caller (PSTN / WebRTC)
        |
        v
   Vapi Platform
   |-- STT (speech-to-text)
   |-- LLM (conversation turns)
   +-- TTS (voice synthesis)
        |  webhook events --> POST /webhook
        v
 Cloudflare Worker  (your code, src/index.ts)
   |-- assistant-request  --> pick assistant by caller number or account
   |-- function-call      --> CRM lookup, booking API, etc.
   +-- end-of-call-report --> log transcript, cost, AI summary

   Secrets (wrangler secret put)
   VAPI_PRIVATE_KEY  |  OPENAI_API_KEY

Prerequisites

Node.js 18+ and pnpm (or npm)
A Vapi account with a private API key — find it under Dashboard → Account → Vapi Keys
Wrangler CLI installed and authenticated: npm i -g wrangler && wrangler login
A phone number or web-call source configured in your Vapi dashboard

Project setup

Scaffold a new Worker project using the TypeScript Hello World template. Skip framework templates — a bare Worker keeps the bundle under 1MB and the build fast.

npm create cloudflare@latest vapi-worker -- --type=hello-world --lang=ts
cd vapi-worker
pnpm install

Update wrangler.toml to declare your required secrets. Wrangler validates at deploy time that every declared secret has been set — it blocks a deploy that would ship a Worker with undefined credentials.

name = "vapi-webhook-worker"
main = "src/index.ts"
compatibility_date = "2025-01-01"

[vars]
ENVIRONMENT = "production"

# Declare required secrets — values set via `wrangler secret put`
[secrets]
VAPI_PRIVATE_KEY = ""
OPENAI_API_KEY = ""

Type definitions

Vapi does not yet publish an official TypeScript SDK for server-side webhook payload types. Define just enough to keep your handlers type-safe. These interfaces match Vapi's documented event shapes as of mid-2026 and cover the three events this guide implements.

export type VapiEventType =
  | 'assistant-request'
  | 'function-call'
  | 'end-of-call-report'
  | 'status-update'
  | 'hang';

export interface VapiMessage {
  type: VapiEventType;
  call?: {
    id: string;
    customer?: { number?: string; name?: string };
    assistantId?: string;
  };
  functionCall?: {
    name: string;
    parameters: Record<string, unknown>;
  };
  artifact?: {
    transcript: string;
    messages: Array<{ role: string; content: string }>;
    recordingUrl?: string;
  };
  analysis?: {
    summary?: string;
    successEvaluation?: string;
  };
  durationSeconds?: number;
  cost?: number;
}

export interface VapiWebhookBody {
  message: VapiMessage;
}
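These interfaces exist only at compile time, so a malformed payload would still reach your handlers at runtime. A small type guard can close that gap. The sketch below is an assumption-laden illustration: isWebhookEnvelope and WebhookEnvelope are hypothetical names, and the check covers only the message.type field the router depends on.

```typescript
// Hypothetical minimal shape: just enough structure for the router to dispatch.
interface WebhookEnvelope {
  message: { type: string; [key: string]: unknown };
}

// Returns true only when `value` is an object with a string `message.type`.
function isWebhookEnvelope(value: unknown): value is WebhookEnvelope {
  if (typeof value !== 'object' || value === null) return false;
  const message = (value as { message?: unknown }).message;
  if (typeof message !== 'object' || message === null) return false;
  return typeof (message as { type?: unknown }).type === 'string';
}
```

Calling this on the parsed request body before the switch statement lets the router return a 400 for junk input instead of throwing.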

Entry point and router

The main src/index.ts handles routing and OPTIONS preflight for browser-based testing, and dispatches webhook events to typed handler modules based on message.type. Every event type Vapi sends that you don't explicitly handle still needs a 200 response, because an error response can trigger a retry.

import { handleAssistantRequest } from './handlers/assistant-request';
import { handleFunctionCall } from './handlers/function-call';
import { handleEndOfCallReport } from './handlers/end-of-call-report';
import type { VapiWebhookBody } from './types';

export interface Env {
  VAPI_PRIVATE_KEY: string;
  OPENAI_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);

    // CORS preflight — needed for Vapi dashboard browser-based testing
    if (request.method === 'OPTIONS') {
      return new Response(null, {
        headers: {
          'Access-Control-Allow-Origin': '*',
          'Access-Control-Allow-Methods': 'POST, OPTIONS',
          'Access-Control-Allow-Headers': 'Content-Type, Authorization',
        },
      });
    }

    if (url.pathname === '/health') {
      return json({ status: 'ok' });
    }

    if (url.pathname === '/webhook' && request.method === 'POST') {
      let body: VapiWebhookBody;
      try {
        body = await request.json();
      } catch {
        return json({ error: 'Invalid JSON' }, 400);
      }

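      const { message } = body;

      // Valid JSON can still lack the envelope; reject it without throwing.
      if (!message || typeof message.type !== 'string') {
        return json({ error: 'Missing message.type' }, 400);
      }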

      switch (message.type) {
        case 'assistant-request':
          return handleAssistantRequest(message, env);
        case 'function-call':
          return handleFunctionCall(message, env);
        case 'end-of-call-report':
          await handleEndOfCallReport(message, env);
          return json({ received: true });
        default:
          return json({ received: true });
      }
    }

    return new Response('Not found', { status: 404 });
  },
};

export function json(data: unknown, status = 200): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { 'Content-Type': 'application/json' },
  });
}

The assistant-request handler

When an inbound call has no pre-assigned assistant, Vapi fires assistant-request with the caller's phone number. You must respond within 7.5 seconds with either an assistantId (reference a saved assistant from your dashboard) or a full transient assistant object. Transient assistants let you inject per-caller context — account tier, name, prior history — directly into the system prompt without pre-creating hundreds of assistant variants.

import type { VapiMessage } from '../types';
import type { Env } from '../index';
import { json } from '../index';

// Pre-loaded in module scope — no per-request overhead
const VIP_NUMBERS = new Set(['+15415550100', '+15415550101']);

export async function handleAssistantRequest(
  message: VapiMessage,
  _env: Env
): Promise<Response> {
  const callerNumber = message.call?.customer?.number;

  // VIP callers get a transient assistant with a personalized system prompt
  if (callerNumber && VIP_NUMBERS.has(callerNumber)) {
    return json({
      assistant: {
        name: 'Priority Support',
        model: {
          provider: 'openai',
          model: 'gpt-4o-mini',
          messages: [
            {
              role: 'system',
              content:
                'You are a priority support agent for returning customers. ' +
                'Be concise, warm, and aim to resolve issues on the first call.',
            },
          ],
        },
        voice: { provider: 'playht', voiceId: 'jennifer' },
        firstMessage: 'Thanks for calling — how can I help you today?',
      },
    });
  }

  // Default path: reference a pre-built assistant from the Vapi dashboard
  return json({ assistantId: 'your-default-assistant-id-here' });
}

Important: if this handler makes an external database call to look up caller info, cache the result in Cloudflare KV with a short TTL. A slow lookup that misses the 7.5-second window drops the call with zero indication to the caller — it just goes silent.
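The caching advice above can be sketched as a read-through cache. This is an illustration under stated assumptions: KvLike is a hand-written subset of the Workers KV get/put API so the example runs anywhere, and getCallerProfile, CallerProfile, and the caller: key prefix are hypothetical names. The 60-second TTL matches the minimum expirationTtl that KV accepts.

```typescript
// Minimal subset of the Workers KV API used here (assumption: a real
// KVNamespace binding satisfies this shape).
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Hypothetical profile shape for illustration only.
interface CallerProfile {
  tier: 'vip' | 'standard';
  name?: string;
}

// Try KV first; on a miss, run the slow lookup and cache it for 60 seconds.
async function getCallerProfile(
  kv: KvLike,
  callerNumber: string,
  slowLookup: (callerNumber: string) => Promise<CallerProfile>
): Promise<CallerProfile> {
  const key = `caller:${callerNumber}`;
  const cached = await kv.get(key);
  if (cached !== null) {
    return JSON.parse(cached) as CallerProfile;
  }
  const profile = await slowLookup(callerNumber);
  await kv.put(key, JSON.stringify(profile), { expirationTtl: 60 });
  return profile;
}
```

The first call for a number still pays the full lookup cost, so this helps repeat callers most. For the cold path, pair it with a timeout and a default assistant.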

The function-call handler

Mid-conversation, the LLM invokes a tool defined on your Vapi assistant. The webhook payload includes the function name and parameters the model parsed from the conversation. Your handler executes the logic and returns a result string. Always return a plain string, never a JSON object — Vapi passes the value directly to the LLM, which reads raw JSON syntax aloud character by character if you return an object.

import type { VapiMessage } from '../types';
import type { Env } from '../index';
import { json } from '../index';

export async function handleFunctionCall(
  message: VapiMessage,
  _env: Env
): Promise<Response> {
  const { name, parameters } = message.functionCall ?? {};

  if (!name) {
    return json({ result: 'No function name received.' });
  }

  try {
    const result = await dispatch(name, parameters ?? {});
    return json({ result });
  } catch (err) {
    console.error(`function-call "${name}" error:`, err);
    // Graceful fallback string — never let the handler throw here
    return json({ result: 'I ran into an issue with that. Let me connect you with someone.' });
  }
}

async function dispatch(
  name: string,
  params: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case 'checkAvailability': {
      const date = params.date as string;
      const data = await fetchWithTimeout<{ slots: string[] }>(
        `https://your-booking-api.example.com/slots?date=${encodeURIComponent(date)}`
      );
      return data.slots.length > 0
        ? `Available times on ${date}: ${data.slots.join(', ')}.`
        : `No openings on ${date}. Would another date work?`;
    }
    case 'bookAppointment': {
      const { date, time, customerName } = params as Record<string, string>;
      await fetchWithTimeout('https://your-booking-api.example.com/bookings', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ date, time, customerName }),
      });
      return `Done — ${customerName} is confirmed for ${time} on ${date}. You will get a confirmation text shortly.`;
    }
    default:
      return `I do not have a handler for that request yet.`;
  }
}

// Wrap every external fetch with a hard 5-second timeout.
// Waiting on fetch() does not consume Workers CPU time, but a slow upstream
// leaves the caller sitting in silence, so fail fast and fall back.
async function fetchWithTimeout<T>(url: string, init?: RequestInit): Promise<T> {
  const controller = new AbortController();
  const id = setTimeout(() => controller.abort(), 5_000);
  try {
    const res = await fetch(url, { ...init, signal: controller.signal });
    if (!res.ok) throw new Error(`HTTP ${res.status} from ${url}`);
    // Await here so the timeout also covers reading the response body.
    return (await res.json()) as T;
  } finally {
    clearTimeout(id);
  }
}
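The dispatch function above casts parameters with `as string`, which trusts the model to always supply them. A small validation helper is one way to tighten that. The sketch below is an illustration only: requireString is a hypothetical name, not part of any Vapi SDK.

```typescript
// Hypothetical helper: read a required string parameter or throw.
// The LLM may omit a parameter or send a non-string value.
function requireString(params: Record<string, unknown>, key: string): string {
  const value = params[key];
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing or invalid parameter "${key}"`);
  }
  return value.trim();
}
```

Because handleFunctionCall already wraps dispatch in try/catch, a thrown validation error becomes the spoken fallback string rather than a 5xx response.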

The end-of-call-report handler

This event fires when any call ends, regardless of who hung up. The payload includes the full message transcript, call duration in seconds, cost in USD, and an AI-generated summary. Return a 200 immediately and do your downstream work after — this is fire-and-forget from Vapi's perspective.

import type { VapiMessage } from '../types';
import type { Env } from '../index';

export async function handleEndOfCallReport(
  message: VapiMessage,
  _env: Env
): Promise<void> {
  const callId = message.call?.id ?? 'unknown';
  const summary = message.analysis?.summary ?? '';
  const durationSec = message.durationSeconds ?? 0;
  const cost = message.cost ?? 0;
  const transcript = message.artifact?.transcript ?? '';

  // Structured log line — Cloudflare Logpush can forward these to BigQuery,
  // S3, or any log sink you configure in the dashboard.
  console.log(
    JSON.stringify({
      event: 'call_ended',
      callId,
      durationSec,
      costUsd: cost.toFixed(4),
      summary,
      transcriptChars: transcript.length,
      timestamp: new Date().toISOString(),
    })
  );

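  // To push to a CRM or analytics endpoint, add a fetch here.
  // In Workers, a promise that is neither awaited nor passed to
  // ctx.waitUntil() can be cancelled once the response is sent, so either
  // await it or schedule it with ctx.waitUntil() from the router.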
}
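To respond quickly while slow downstream work continues, Workers provides ctx.waitUntil() on the execution context passed as the third argument to fetch. The sketch below shows the pattern under stated assumptions: WaitUntilContext is a hand-written subset of the real ExecutionContext so the example is self-contained, and respondThenProcess is a hypothetical wrapper.

```typescript
// Minimal subset of the Workers ExecutionContext used here.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical wrapper: schedule background work and return the
// acknowledgement body right away. Errors are caught and logged so a
// failed CRM push never surfaces as a 5xx (and so never triggers a retry).
function respondThenProcess(
  ctx: WaitUntilContext,
  work: () => Promise<void>
): { received: true } {
  ctx.waitUntil(
    work().catch((err: unknown) => {
      console.error('end-of-call-report processing failed:', err);
    })
  );
  return { received: true };
}
```

In the router, this would mean accepting the third argument in fetch(request, env, ctx) and returning json(respondThenProcess(ctx, () => handleEndOfCallReport(message, env))) for the end-of-call-report case.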

Secrets and local dev

Push production secrets to your live Worker with wrangler secret put. For local development, Wrangler reads from a .dev.vars file in the project root — not from .env. This is the single most common gotcha for developers coming from Node.js or Vercel backgrounds: if you create .env instead of .dev.vars, your secrets will be undefined at runtime and every authenticated call will silently fail.

wrangler secret put VAPI_PRIVATE_KEY
# Paste value when prompted — it's encrypted at rest

wrangler secret put OPENAI_API_KEY
# Paste value when prompted

# .dev.vars (local development only; keep it out of version control)
VAPI_PRIVATE_KEY=your_vapi_private_key_here
OPENAI_API_KEY=sk-your_openai_key_here

Deploying and wiring the server URL

Start local development with wrangler dev — it reads .dev.vars and hot-reloads on each file save. When ready for production:

wrangler deploy
# Output: https://vapi-webhook-worker.your-account.workers.dev

Copy the output URL. In your Vapi dashboard, navigate to Account → Settings → Server URL and set it to https://vapi-webhook-worker.your-account.workers.dev/webhook. You can override this URL per-assistant for multi-tenant deployments where different clients route to different handlers.

Testing webhooks locally

The Vapi CLI (install with npm i -g @vapi-ai/cli) can forward webhook events to your wrangler dev port for local testing. The exact subcommand, and whether it exposes a public URL on its own, depends on the CLI version, so confirm with vapi --help. If it only forwards locally, pair it with a tunnel such as cloudflared or ngrok. Paste the public URL into your Vapi dashboard Server URL field, and trigger a test call from the dashboard's call panel.

Production gotchas

The 7.5-second assistant-request deadline is absolute

If your assistant-request handler performs a cold database lookup, you risk missing the window. The fix is to pre-cache assistant configs in Cloudflare KV with a 60-second TTL. Cache hits return in under 200ms. A missed deadline drops the call with no error surfaced anywhere — not in Vapi's dashboard, not in your Worker logs. The caller hears silence. Diagnosing this in production is time-consuming.

Local secrets require .dev.vars, not .env

wrangler secret put writes only to the remote Worker. Wrangler's local dev server reads secrets from a file named exactly .dev.vars in the project root — not .env, not .env.local. Create .dev.vars with the same key names and add it to .gitignore immediately after creating it. Many developers spend an hour on this before finding it in the Wrangler docs.

Vapi retries on 5xx responses — design for it

Vapi retries webhook delivery on 5xx responses, which means a throwing handler can produce duplicate end-of-call-report events. If your end-of-call handler sends an SMS, writes to a ledger, or triggers a workflow, duplicate delivery will cause duplicate side effects. Wrap all handlers in try/catch and return 200 { "error": "..." } for non-critical events. Only propagate exceptions for assistant-request, where a retry is intentional.
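A lightweight guard against duplicate deliveries is to record each call ID the first time it is processed. The sketch below is a best-effort illustration: KvLike is a hand-written subset of the Workers KV API, and markIfFirstDelivery, the eocr: key prefix, and the 24-hour retention are hypothetical choices. KV is eventually consistent and has no atomic check-and-set, so two near-simultaneous retries can both pass. For strict guarantees, a Durable Object is the safer home for this check.

```typescript
// Minimal subset of the Workers KV API used here.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Returns true the first time a call ID is seen and false on later deliveries.
async function markIfFirstDelivery(kv: KvLike, callId: string): Promise<boolean> {
  const key = `eocr:${callId}`;
  if ((await kv.get(key)) !== null) {
    return false; // already processed: skip side effects
  }
  await kv.put(key, '1', { expirationTtl: 86_400 }); // keep the marker for 24 hours
  return true;
}
```

The end-of-call handler would call this first and return early when it yields false, before sending any SMS or writing to a ledger.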

function-call results must be human-readable strings

Returning a JSON object as the result value causes Vapi's LLM to literally read out the JSON syntax: "Open curly brace, slots colon open bracket..." Always convert to a natural-language sentence before returning. "Three openings on Tuesday: 9am, 11am, and 2pm", not { "slots": ["9am", "11am", "2pm"] }. This will catch you in testing if you miss it, but it's a jarring caller experience if it slips to production.
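Converting structured data into a speakable sentence can be as small as the sketch below. describeSlots and joinForSpeech are hypothetical helper names, and the wording of the sentences is only one possible phrasing.

```typescript
// Hypothetical formatter: turn a list of slot labels into one sentence
// that a text-to-speech voice can read naturally.
function describeSlots(day: string, slots: string[]): string {
  if (slots.length === 0) {
    return `No openings on ${day}. Would another day work?`;
  }
  const count = slots.length === 1 ? 'One opening' : `${slots.length} openings`;
  return `${count} on ${day}: ${joinForSpeech(slots)}.`;
}

// Joins items as "a", "a and b", or "a, b, and c".
function joinForSpeech(items: string[]): string {
  if (items.length <= 2) return items.join(' and ');
  return `${items.slice(0, -1).join(', ')}, and ${items[items.length - 1]}`;
}
```

For example, describeSlots('Tuesday', ['9am', '11am', '2pm']) returns "3 openings on Tuesday: 9am, 11am, and 2pm."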

Workers CPU limit excludes fetch() wait time, but callers still wait

Workers CPU time covers only the time your code spends executing. Time spent waiting on fetch() calls does not count against it, so a slow external API will not exhaust the CPU budget (10ms per invocation on the free plan, 30 seconds by default on the paid plan). It will, however, hold the caller in silence, and it can push an assistant-request response past Vapi's 7.5-second deadline. Set an AbortController with a 5-second timeout on every outbound fetch made during an active call. The fetchWithTimeout helper in the function-call handler above shows this pattern.

CORS preflight is required for dashboard testing

Vapi's platform-to-Worker webhook calls are server-to-server and don't require CORS headers. But when you test calls from the Vapi web dashboard, the dashboard makes browser-originated requests to your Worker. Without an OPTIONS preflight handler returning the correct CORS headers, those requests fail with a network error that looks like a Worker crash. The entry point above includes the preflight handler — keep it even if you think you won't need browser-based testing.

When NOT to build this yourself

A custom Cloudflare Worker gives you full control over routing, function calling, and data flow. That control comes with ongoing maintenance:

Single-location businesses: If you're deploying a phone receptionist for one business location, the DIY path adds weeks of setup for no unique advantage. A managed platform handles infrastructure, telephony compliance, and voice tuning so you can focus on the business problem. Book a demo to see what a pre-built agent looks like for your use case.
Teams without TypeScript ownership: Vapi's built-in Code Tools let you write function handlers that run directly on Vapi's infrastructure — no Worker, no Wrangler, no deploy pipeline. The right choice for teams that don't want to own serverless infrastructure.
Webhook idempotency at scale: If your function-call handler charges a payment method, creates a calendar event, or sends a notification, you need idempotency keys to handle Vapi's retry behavior safely. That is real infrastructure work — a Cloudflare KV-backed idempotency layer or a message queue (Cloudflare Queues) sitting between the Worker and external actions. Underestimating this is a common mistake on first deployments.
Long-running post-call processing: Transcript embedding, audio transcoding, or ML inference over call recordings doesn't belong in the webhook response path. Route those tasks to a Cloudflare Queue consumer or a Durable Object, where the work runs outside the webhook request and its time budget.

For agencies and platform builders deploying voice agents across multiple clients, this Worker-based architecture scales well and keeps costs low. For a business that just needs reliable call handling without owning infrastructure, the managed path is the right trade-off.

The official example from the Vapi team is available at VapiAI/server-example-serverless-cloudflare — it covers the custom LLM proxy integration path not addressed in this guide. For a deeper look at how Vapi's platform routes and queues events, see how to build an AI voice agent with Vapi.

Frequently asked questions

What is Vapi's assistant-request webhook deadline on Cloudflare Workers?

Vapi requires a response to the assistant-request event within 7.5 seconds end-to-end. Cloudflare Workers start in roughly 5ms due to their V8 isolate model, keeping the full budget available for your logic — but external database calls in the handler must be cached to avoid missing this window.

Do Cloudflare Workers have cold start issues for Vapi webhooks?

No. Cloudflare Workers run as V8 isolates rather than containers and start executing in roughly 5ms. This is well inside Vapi's 7.5-second assistant-request deadline, so you do not need warm-up pings or keep-alive workarounds.

How do I store API keys securely in a Cloudflare Worker for Vapi?

Use wrangler secret put to push secrets like VAPI_PRIVATE_KEY to the remote Worker. Never store them in wrangler.toml under [vars]. For local development, create a .dev.vars file — not .env — in the project root. Wrangler reads secrets from .dev.vars during wrangler dev.

What format should Vapi function-call results be returned in?

Always return a plain string in the result field, never a JSON object. Vapi passes the result value directly to the LLM, which reads it aloud. Returning a JSON object causes the model to read the raw JSON syntax to the caller.

Does Vapi retry webhook requests to my Cloudflare Worker?

Yes. Vapi retries on 5xx responses, which can cause duplicate end-of-call-report events. Always wrap non-critical handlers in try/catch and return 200 with an error field rather than throwing. Only let exceptions propagate from assistant-request, where a retry is intentional.

What is the CPU time limit for Cloudflare Workers handling Vapi webhooks?

CPU time on Workers excludes time spent waiting on outbound fetch() calls. The Workers Free plan allows 10ms of CPU time per invocation. The Paid plan defaults to 30 seconds and can be raised to 5 minutes via limits.cpu_ms = 300000 in wrangler.toml. Set AbortController timeouts on all external API calls anyway, so a slow upstream cannot stall a live call.

Written by

Thom Wilson

Founder & AI Engineer, Wild Run AI

SEO consultant turned AI engineer. Built WildRun after years getting small businesses found online — custom AI voice agents, sales and operations automation, and AI-era SEO, deployed on Cloudflare and managed end-to-end.

About the author → · Last reviewed: June 2026

Related articles

Vapi Webhook Architecture: A Builder's Complete Guide

How to Build an AI Voice Agent with Vapi in TypeScript
