If you operate an AI voice agent at any meaningful volume, call recording is not optional. It is how you prove what was said when a customer disputes a charge, how you train better models on real conversations, how you debug a hallucination the night it happens, and how you satisfy auditors who want to see the call that led to a sale. The question is not whether to record but how, where, and for how long.
This guide walks through the technical and legal sides of AI voice agent call recording — how the audio actually gets captured, where it lives, what it costs, and what the regulators will want to see when they come knocking.
> Now Live: TurboCall Local Call Recording for IVR. Our Outbound IVR and Inbound IVR products ship with local recording, with every call saved as a stereo WAV to your tenant's storage volume and playable inline from the call detail page. No S3, no per-minute fees, no third party in the audit chain.
How AI Voice Agents Actually Record Calls
The mechanics depend on whether your platform runs on Twilio (or another cloud telephony API) or on Asterisk / FreeSWITCH (open-source telephony servers).
On Twilio
Twilio exposes a record=true flag on the Calls.create() API and a recording status callback that returns a RecordingUrl once the recording is finalized. The audio lives in Twilio's S3 buckets by default, with a per-minute recording charge of about $0.0025/min and a separate storage fee on top. You can download the WAV via the recording URL and copy it to your own storage if you want. Most platforms do exactly that: pull the WAV off Twilio after the call ends, upload it to their own S3 bucket, and patch the recording URL on the call record in MongoDB or Postgres.
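As a rough illustration of the "pull the WAV after the call ends" step, here is a minimal sketch of parsing the form-encoded recording status callback. The field names (CallSid, RecordingSid, RecordingStatus, RecordingDuration, RecordingUrl) are the ones Twilio sends; the function name, the returned dictionary shape, and the sample SIDs are made up for this example.

```python
from urllib.parse import parse_qs


def parse_recording_callback(body: str):
    """Extract download details from a form-encoded recording status callback.

    Returns None unless the recording has finished, so the caller only
    schedules a download once the file is final.
    """
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    if fields.get("RecordingStatus") != "completed":
        return None
    return {
        "call_sid": fields["CallSid"],
        "recording_sid": fields["RecordingSid"],
        # Appending ".wav" to the RecordingUrl requests the WAV rendition.
        "wav_url": fields["RecordingUrl"] + ".wav",
        "duration_seconds": int(fields.get("RecordingDuration", "0")),
    }


# Hypothetical callback body with placeholder SIDs.
sample_body = (
    "CallSid=CA123&RecordingSid=RE456&RecordingStatus=completed"
    "&RecordingDuration=182"
    "&RecordingUrl=https%3A%2F%2Fapi.twilio.com%2F2010-04-01"
    "%2FAccounts%2FAC000%2FRecordings%2FRE456"
)
print(parse_recording_callback(sample_body))
```

A real handler would also validate the request signature before trusting the body, then hand wav_url to a background job that copies the file into your own storage.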
On Asterisk
The open-source path is more powerful but more work. Asterisk has two recording mechanisms:
- MixMonitor: a dialplan application that records the channel directly to a file, by default under /var/spool/asterisk/monitor/{name}.wav.
- Snoop channels + ARI: POST /channels/{id}/snoop with spy=in and spy=out to tap each side of the call separately, then POST /channels/{id}/record on each snoop channel. You get two mono WAV files and combine them into stereo at hangup.
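To make the snoop-and-record sequence concrete, here is a small sketch that only builds the ARI request paths and sends nothing. The snoop and record endpoints and their spy, app, snoopId, name, format, and ifExists parameters are part of ARI; the snoop ID and recording naming scheme, the app name, and the function itself are assumptions for illustration.

```python
from urllib.parse import urlencode


def snoop_recording_plan(channel_id: str, app: str, fmt: str = "wav"):
    """Return the (method, path) pairs needed to record each call direction."""
    plan = []
    for direction in ("in", "out"):
        # Hypothetical naming scheme so the two snoops are easy to pair up.
        snoop_id = f"snoop-{direction}-{channel_id}"
        snoop_query = urlencode({"spy": direction, "app": app, "snoopId": snoop_id})
        plan.append(("POST", f"/ari/channels/{channel_id}/snoop?{snoop_query}"))
        record_query = urlencode(
            {"name": f"{channel_id}-{direction}", "format": fmt, "ifExists": "overwrite"}
        )
        plan.append(("POST", f"/ari/channels/{snoop_id}/record?{record_query}"))
    return plan


for method, path in snoop_recording_plan("1700000000.42", "voicebot"):
    print(method, path)
```

A real client would likely wait until each snoop channel has entered the Stasis application before issuing the record request, and would stop both recordings when the original channel hangs up.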
The snoop approach is what serious platforms use because it produces a stereo file with the caller cleanly separated from the bot, which makes post-call analysis (sentiment, hold time, interruptions) much easier. TurboCall's IVR uses this pattern — two snoops, pydub to merge into stereo, then a local HTTP POST to the API to persist the WAV to disk.
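The merge step itself is small. The sketch below uses Python's standard-library wave module rather than pydub, purely to stay dependency-free, so it is not a copy of any particular platform's code. It assumes both inputs are 16-bit mono PCM at the same sample rate, puts the first file on the left channel, and pads the shorter file with silence; the function name and channel assignment are choices made for this example.

```python
import wave


def merge_mono_to_stereo(left_path: str, right_path: str, out_path: str) -> None:
    """Interleave two 16-bit mono WAV files into one stereo WAV file."""
    with wave.open(left_path, "rb") as left_wav, wave.open(right_path, "rb") as right_wav:
        if left_wav.getnchannels() != 1 or right_wav.getnchannels() != 1:
            raise ValueError("both inputs must be mono")
        if left_wav.getsampwidth() != 2 or right_wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if left_wav.getframerate() != right_wav.getframerate():
            raise ValueError("sample rates differ")
        rate = left_wav.getframerate()
        left = left_wav.readframes(left_wav.getnframes())
        right = right_wav.readframes(right_wav.getnframes())

    # Pad the shorter side with zero bytes, which is silence for signed 16-bit PCM.
    length = max(len(left), len(right))
    left = left.ljust(length, b"\x00")
    right = right.ljust(length, b"\x00")

    # Stereo frame layout: left low byte, left high byte, right low, right high.
    stereo = bytearray(2 * length)
    stereo[0::4] = left[0::2]
    stereo[1::4] = left[1::2]
    stereo[2::4] = right[0::2]
    stereo[3::4] = right[1::2]

    with wave.open(out_path, "wb") as out:
        out.setnchannels(2)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(bytes(stereo))
```

Called as merge_mono_to_stereo("caller.wav", "bot.wav", "call.wav"), this keeps the caller on the left channel and the bot on the right, which is the layout the per-channel analysis later in this guide relies on.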
Stereo Matters More Than You Think
A mono recording is fine for "did the agent say the right thing." A stereo recording is what you need for:
- Interruption detection: when did the caller talk over the bot? Easy to spot in the stereo waveform.
- Sentiment analysis per speaker: was the caller frustrated while the bot was upbeat? Mono blends them.
- Per-channel transcription: Deepgram, AssemblyAI, and Whisper all produce cleaner transcripts on isolated channels. Diarization on a mono mix is error-prone.
- Forensic analysis: when legal asks "did the customer say X before the agent said Y," you need a timeline you can prove.
If your platform records mono, you are losing analytical leverage. Push back on the vendor.
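To show why the separate channels matter, here is a rough sketch of talk-over detection on a stereo recording: it flags windows where both channels are loud at once. It assumes interleaved 16-bit little-endian PCM with the caller on one channel and the bot on the other. The 100 ms window and the RMS threshold of 500 are arbitrary starting points, not tuned values, and the function name is hypothetical.

```python
import array
import math
import sys


def talk_over_windows(stereo_pcm: bytes, rate: int,
                      window_ms: int = 100, threshold: float = 500.0):
    """Return start times in seconds of windows where both channels are active."""
    samples = array.array("h")
    samples.frombytes(stereo_pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    left, right = samples[0::2], samples[1::2]

    def rms(chunk):
        return math.sqrt(sum(s * s for s in chunk) / len(chunk))

    window = rate * window_ms // 1000
    hits = []
    for start in range(0, len(left) - window + 1, window):
        left_loud = rms(left[start:start + window]) > threshold
        right_loud = rms(right[start:start + window]) > threshold
        if left_loud and right_loud:
            hits.append(start / rate)
    return hits
```

Feed it the frames returned by wave.readframes() on a stereo file. On a mono mix the same question needs diarization first, which is where most of the errors creep in.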
Where the Audio Lives
Three architectures dominate. Each has different operational and compliance properties.
Architecture 1 — Cloud Object Storage (S3, GCS, Azure Blob)
The most common path. Recordings get uploaded to a bucket immediately after the call ends, with a signed URL stored in the database. Playback is browser-friendly because S3 has good range-request support.
Pros: scales with effectively no ceiling, durable, easy to put a CDN in front for fast playback, every cloud provider has compliance certifications (SOC 2, HIPAA BAA, GDPR DPA).
Cons: per-GB storage fees add up at scale ($0.023/GB/month on S3 Standard, plus egress charges). Data residency is hard to enforce across regions. It also adds a third-party data processor to your audit chain: if you are HIPAA-bound, you need a BAA with AWS covering the bucket that holds patient calls.
Architecture 2 — Local Disk on the Tenant's Server
The path TurboCall's IVR uses. Recordings get written directly to a volume mounted on the API server (e.g., uploads/recordings/ on the same Docker volume as the rest of the tenant's files).
Pros: zero per-minute storage fees beyond the volume cost, no third party in the audit chain, data residency is wherever your server is, playback is one filesystem read away.
Cons: you have to manage retention manually (or with a cron job), the volume needs to be backed up, and horizontal scaling is harder because the file lives on one box. Fine for small-to-medium volumes (under 100k calls/month). At larger scale you graduate to S3 anyway.
Architecture 3 — Hybrid (Local Hot + S3 Cold)
Recent recordings stay on local disk for 7–30 days for fast playback; older recordings get pushed to S3 Glacier for long-term retention. Best of both worlds, but harder to operate.
Most teams start with Architecture 1 or 2, then graduate to the hybrid once their compliance team or their bill forces them to.
Consent: The Boring But Critical Part
You cannot record a call legally just because you want to. The rules vary by jurisdiction.
Two-Party Consent States (US)
All-party ("two-party") consent states, including California, Florida, Illinois, Maryland, Massachusetts, Montana, New Hampshire, Pennsylvania, and Washington, require every party on the call to consent to recording. Your AI voice agent must announce the recording before any substantive conversation begins. A line like "This call is being recorded for quality and training purposes" at the start of the greeting is the industry-standard pattern.
GDPR (EU + UK)
You need a lawful basis to record. The two practical options:
- Consent. Recipient must affirmatively agree, recorded as part of the call itself. Easy to capture, hard to revoke retroactively.
- Legitimate interest. Recording is necessary for a legitimate business purpose (compliance, dispute resolution) and balanced against the recipient's privacy. Document the balancing test before you record.
Either way, retention must be proportionate. "Forever" is not proportionate. Twelve months for sales calls, seven years for financial advice, six to twelve months for general customer service: pick a number based on your business rationale and write it down.
TCPA (US)
The TCPA does not directly regulate recording, but it does regulate the placement of the call. If you cannot legally call the number, the recording is moot. Always scrub against the National DNC Registry before dialing.
HIPAA (Healthcare, US)
Calls that contain Protected Health Information (PHI) — appointment confirmations, refill reminders, lab result notifications — must be encrypted at rest and in transit. The cloud bucket storing the WAVs needs a Business Associate Agreement (BAA) with the cloud provider. Local-disk storage avoids the BAA-with-AWS problem but you still need to encrypt the volume.
Retention: How Long to Keep What
| Recording type | Typical retention | Why |
|---|---|---|
| Marketing / sales | 12 months | Lawsuit window, model retraining |
| Customer service | 6–12 months | Quality assurance, dispute resolution |
| Financial advice | 7 years | SEC / FINRA require it |
| Healthcare (PHI) | 6 years | HIPAA minimum |
| Recruitment | 12 months | Discrimination defense |
| EU subjects (GDPR) | Whatever you documented | Data minimization principle |
Set retention as a per-tenant config on your platform. Run a nightly cleanup job that deletes anything past the cutoff. If your platform does not surface retention controls, ask the vendor.
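A nightly cleanup job can be very small. The sketch below assumes recordings sit under one directory per call type (for example recordings/sales/*.wav) and uses each file's modification time as a stand-in for the call's end time. Both are assumptions for illustration; a production job would more likely read the call timestamp and the tenant's retention setting from the database. The retention values mirror the table above.

```python
import time
from pathlib import Path

# Days to keep each call type, taken from the table above (illustrative only).
RETENTION_DAYS = {
    "sales": 365,
    "customer_service": 365,
    "financial_advice": 7 * 365,
    "healthcare": 6 * 365,
}


def purge_expired(root, retention_days, now=None):
    """Delete WAV files older than their call type's cutoff; return what was removed."""
    now = time.time() if now is None else now
    deleted = []
    for call_type, days in retention_days.items():
        folder = Path(root, call_type)
        if not folder.is_dir():
            continue
        cutoff = now - days * 86400
        for wav in sorted(folder.glob("*.wav")):
            if wav.stat().st_mtime < cutoff:
                wav.unlink()
                deleted.append(wav)
    return deleted
```

Returning the deleted paths makes it easy to log each purge, which helps when an auditor asks for evidence that the retention policy is actually enforced.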
Playback Architectures
Two patterns dominate for serving recordings to your dashboard:
- Direct HTML audio tag pointing at a signed URL. Simple, works for cloud and local storage. The URL embeds a time-limited token to prevent hot-linking. Browser handles range requests for seekable playback.
- Streaming through an authenticated API endpoint. The browser hits GET /api/calls/{id}/recording with the user's JWT, and the API streams the file with chunked transfer. Slower than direct S3 but lets you enforce per-call access checks. Required in HIPAA environments because direct URLs bypass your authz layer.
TurboCall's IVR uses the first pattern in dev and the second pattern in production. The dev shortcut is fine for development velocity; the production path is the one that satisfies an auditor.
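For local storage, the time-limited token in the signed-URL pattern can be as simple as an HMAC over the call ID and an expiry timestamp. This is a minimal sketch of that idea, not how S3 presigned URLs or any specific platform work; the URL layout, parameter names, and secret handling are all assumptions.

```python
import hashlib
import hmac
import time

# Placeholder only; a real deployment would load this from a secrets store.
SECRET = b"replace-with-a-real-secret"


def _signature(call_id: str, expires: int) -> str:
    message = f"{call_id}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_recording_url(call_id: str, ttl_seconds: int = 300, now=None) -> str:
    """Build a playback URL that stops working after ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued) + ttl_seconds
    return f"/recordings/{call_id}.wav?expires={expires}&sig={_signature(call_id, expires)}"


def verify_recording_url(call_id: str, expires: int, sig: str, now=None) -> bool:
    """Accept the request only if the link is unexpired and the signature matches."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(_signature(call_id, expires), sig)


print(sign_recording_url("call-42", now=1_700_000_000))
```

The token proves the link was issued by your API and has not expired, but it says nothing about who is holding it. That gap is why the authenticated streaming endpoint is the stricter option.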
Cost Modeling
Per-call recording cost has three components:
- Capture cost: usually zero (your existing telephony stack does it for free).
- Storage cost: $0.023/GB/month on S3 Standard. A 3-minute stereo WAV at 8 kHz, assuming 16-bit PCM, is about 5.8 MB (8,000 samples/s × 2 bytes × 2 channels × 180 s), so roughly $0.00013 per recording per month. At 50,000 calls/month with 12-month retention, you hold 600,000 recordings (about 3.5 TB) at steady state, which is about $80/month in storage.
- Egress cost: only matters when you replay calls. AWS charges $0.09/GB to the public internet. Each playback of a 5.8 MB file costs about $0.0005. Cheap unless your QA team is replaying every call.
Local disk storage zeros out the last two line items. For most platforms under 100k calls/month, local disk is a no-brainer.
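The storage and egress arithmetic is easy to rerun with your own numbers. This sketch assumes uncompressed 16-bit PCM with a 44-byte WAV header, decimal gigabytes, and the S3 list prices quoted above. All of those are assumptions to check against your own codec and your provider's current pricing, and the function names are made up for the example.

```python
def wav_size_bytes(seconds: int, sample_rate: int = 8000,
                   channels: int = 2, bytes_per_sample: int = 2) -> int:
    """Size of an uncompressed PCM WAV file, including a 44-byte header."""
    return seconds * sample_rate * channels * bytes_per_sample + 44


def monthly_storage_cost(calls_per_month: int, retention_months: int,
                         seconds_per_call: int,
                         price_per_gb_month: float = 0.023) -> float:
    """Steady-state monthly storage bill once a full retention window is on disk."""
    stored_bytes = calls_per_month * retention_months * wav_size_bytes(seconds_per_call)
    return stored_bytes / 1e9 * price_per_gb_month


def playback_cost(seconds_per_call: int, price_per_gb: float = 0.09) -> float:
    """Egress cost of streaming one recording to the public internet once."""
    return wav_size_bytes(seconds_per_call) / 1e9 * price_per_gb


print(f"size per call: {wav_size_bytes(180) / 1e6:.2f} MB")
print(f"storage:       ${monthly_storage_cost(50_000, 12, 180):.2f} per month")
print(f"one playback:  ${playback_cost(180):.5f}")
```

Swapping in a compressed format or a shorter retention window changes the result roughly in proportion, which makes this a quick sanity check before committing to an architecture.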
What to Look for in a Vendor
When evaluating an AI voice agent platform for call recording:
- Stereo, not mono. Non-negotiable for analytical work.
- Storage location disclosed up front. "We use AWS us-east-1" is fine. "We don't say" is a red flag.
- BAA available if you need HIPAA. Get it in writing before signing.
- Configurable retention. Per-tenant, per-call-type, ideally per-step.
- Searchable transcripts alongside audio. Bonus points if the transcript is per-channel.
- Export API. You should be able to bulk-download all your recordings if you ever leave.
TurboCall's existing Inbound and Outbound AI Call products check all of these boxes. The IVR feature adds local recording to the same call detail UI, with the same playback, export, and retention controls.
Bottom Line
Call recording for AI voice agents is a solved problem technically — every platform does it. The differentiation is in storage architecture, compliance posture, and how easy it is to find the one call you need at 3am during an incident. Pick a vendor that makes those things obvious, not one that buries them in the docs.