
What Is a Voice Agent? The Complete 2026 Guide to Types, Costs, and Limits

A voice agent handles phone conversations end to end — and in 2026 the term almost always means AI. Taxonomy, real costs, limits, and when humans still win.

Sam Chen, Industry Playbooks Lead, MapleVoice · Jun 12, 2026 · 28 min read

A voice agent is a system that handles real, two-way spoken conversations over the phone: it listens to what a caller says, works out what they need, takes an action such as booking an appointment or answering a question, and replies in natural speech. For most of the call center era, the term meant a person — a voice agent was the human rep working the phone lines, as opposed to a chat or email agent. In 2026 the unqualified term has flipped: when people search for, buy, or sell a voice agent today, they almost always mean AI software.

That shift matters because the two meanings have very different price tags, strengths, and failure modes. This guide covers both: what human and AI voice agents each do well, a plain-English map of the overlapping terms vendors throw around, how the AI version works, what it honestly costs in 2026, the laws that apply, and the situations where a voice agent of either kind is the wrong tool.

One scope note: this is the broad guide to the term itself. If you already know you want the AI kind and need the component-by-component anatomy, our pillar guide at /blog/what-is-an-ai-voice-agent goes several levels deeper on architecture, evaluation, and deployment. Here we summarize that material and point to it.

Why "voice agent" means something different in 2026

For decades, a voice agent was a job title. Contact centers split their workforce by channel: voice agents took phone calls, chat agents handled live chat, email agents worked the ticket queue. Outsourcing firms still post jobs for voice process roles in exactly this sense, and if you say voice agent to a BPO operations manager, a person is probably still what they picture.

The software sense has now taken over. An AI voice agent is a program that holds open-ended spoken conversations over the phone and completes tasks by connecting to business systems: calendars, CRMs, order systems, ticketing. As of mid-2026, search this exact phrase and the first page of results is vendor guides explaining the AI version — none are about hiring people. Language follows money, and the money moved.

The scale of that move is measurable. According to Straits Research figures cited by AssemblyAI, the voice and speech recognition market was worth $14.8 billion in 2024 and is forecast to pass $61 billion by 2033, and the same AssemblyAI guide cites McKinsey data showing roughly 66% of businesses had automated at least one business process as of 2024. The practical consequence for a business owner: when you evaluate a voice agent in 2026, you are evaluating software, and the rest of this guide treats the term that way unless it says otherwise.

Human voice agents vs. AI voice agents: an honest comparison

Before the taxonomy, the comparison most buyers actually need. Humans and AI are not interchangeable on the phone; they are good at different calls, and pretending otherwise is how bad deployments happen.

| Dimension | Human voice agent | AI voice agent |
| --- | --- | --- |
| Availability | Shifts, breaks, sick days, holidays | 24/7, every day, including holidays |
| Concurrency | One call at a time | Many simultaneous calls; no busy signal |
| Answer speed | Depends on staffing; queues at peak times | Immediate; well-built agents pick up in seconds |
| Consistency | Varies by person, mood, and turnover | Identical discipline on call 1 and call 10,000 |
| Empathy and judgment | High; reads emotion and subtext | Limited; detects sentiment, cannot truly empathize |
| Complex negotiation | Strong | Weak; should hand off to a person |
| Data capture | Manual notes, often incomplete | Automatic recording, transcript, and summary |
| Cost structure | Salary, benefits, training, turnover | Subscription or usage fees; no turnover |
| Time to productive | Weeks of hiring and training | Days of configuration and testing |

The pattern in this comparison is consistent. AI wins on availability, speed, volume, consistency, and record-keeping. Humans win on empathy, judgment, negotiation, and anything emotionally loaded. There is also earned skepticism on the buyer side: AssemblyAI's internal research found that nearly 95% of users have been frustrated with a voice agent at some point. The right conclusion is not that AI replaces every call. It is that AI is now clearly better at high-volume, repeatable calls, and humans remain clearly better at ambiguous, high-stakes ones. The best phone operations in 2026 run both, with clean handoffs between them.

Seven terms, untangled: the voice agent family tree

Vendors use at least seven overlapping labels for phone automation, and the differences are real enough to affect what you actually get. One-sentence definitions, in plain English:

  • AI voice agent: software that holds open-ended phone conversations and completes tasks by acting in your business systems. This is what the unqualified term voice agent means in 2026.
  • AI receptionist: an AI voice agent packaged for front-desk work at a specific business (answering, greeting, routing, booking, and message-taking). Same technology, narrower job description.
  • Voicebot: an older or simpler class of speech automation that recognizes set phrases and triggers scripted responses. It can answer "what are your hours" but falls apart in real multi-turn conversation.
  • Voice assistant: a consumer product like Siri or Alexa. It serves the owner of the device, not the callers of a business, and it is not what this article is about.
  • IVR and conversational IVR: the press-1-for-sales menu tree. The conversational variant adds speech recognition on top, but the rigid tree underneath remains, which is why saying "representative" into the void is a national pastime.
  • Intelligent virtual agent (IVA): contact-center industry jargon for enterprise automation that usually spans both chat and voice inside a larger software suite.
  • AI phone agent and AI call assistant: marketing synonyms for AI voice agent, usually aimed at small businesses. If a vendor uses these, evaluate them exactly as you would any AI voice agent.

Whatever the label on the box, ignore the noun and interrogate the verbs: can it hold a multi-turn conversation, can it take real actions in your systems, and can it transfer to a human with context? Those three questions sort the field faster than any glossary.

How an AI voice agent works, and where the technology is in 2026

Every AI voice agent, whatever the brand, runs some version of a five-step loop, dozens of times per call:

  • Telephony: the call arrives on a phone number connected to the agent, and audio streams in real time.
  • Speech-to-text: automatic speech recognition converts the caller's words into text as they speak. Accuracy is strong but not perfect; a NIST report cited by AssemblyAI notes top systems reach a word error rate as low as 4.9%.
  • Reasoning: a large language model reads the conversation so far, holds context across turns, identifies what the caller wants, and decides what to do next.
  • Action: the agent executes against real systems, for example checking calendar availability, looking up the CRM record, placing the order, or creating the ticket.
  • Text-to-speech: the response is synthesized into a natural voice and spoken back, and the loop repeats.

That five-step pipeline is what the industry calls a cascading architecture, and as of 2026 it is no longer the only design. AssemblyAI's architecture guide describes three approaches: cascading systems built from separate specialized models, which are modular and easier to debug but add latency at every handoff; end-to-end speech-to-speech systems, where a single model takes audio in and produces audio out (the design behind the realtime models from the major AI labs, such as OpenAI's realtime speech models and Google's Gemini Live), which cut latency and capture tone and hesitation that text-based pipelines flatten; and hybrids that use cascading logic for predictable workflows and switch to speech-to-speech for fluid, open-ended stretches of conversation. You will rarely choose the architecture yourself as a buyer, but it explains why agents built in 2026 pause less and sound less robotic than ones demoed even two years ago.

For the full anatomy (model choices, grounding, guardrails, and evaluation criteria), see the deep dive at /blog/what-is-an-ai-voice-agent. The five-step summary here is all you need for the rest of this guide.
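As a rough illustration, one pass through the five-step loop can be sketched in Python. This is a minimal sketch under stated assumptions, not any vendor's implementation: `run_turn`, `CallState`, `Turn`, and the four callables are hypothetical names, and the stub functions stand in for real speech, language-model, and business-system services.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Turn:
    caller_text: str
    action: str
    reply_text: str


@dataclass
class CallState:
    # Conversation so far, so the reasoning step can hold context across turns.
    history: list[Turn] = field(default_factory=list)


def run_turn(
    audio: bytes,
    state: CallState,
    transcribe: Callable[[bytes], str],
    decide: Callable[[str, list[Turn]], tuple[str, str]],
    act: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """One pass through the loop: caller audio in, agent audio out."""
    text = transcribe(audio)                      # speech-to-text
    action, draft = decide(text, state.history)   # reasoning
    result = act(action)                          # action in a business system
    reply = f"{draft} {result}".strip()
    state.history.append(Turn(text, action, reply))
    return synthesize(reply)                      # text-to-speech


# Hypothetical stubs so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_decide(text: str, history: list[Turn]) -> tuple[str, str]:
    if "book" in text.lower():
        return "check_calendar", "Let me check the calendar."
    return "none", "How can I help?"


def fake_act(action: str) -> str:
    return "Tuesday at 10 am is open." if action == "check_calendar" else ""


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

In a real deployment the telephony layer would feed `audio` in as a stream rather than one complete chunk, and each stub would be a network call, which is where the handoff latency of a cascading design comes from.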

Voice agent vs. IVR vs. voicebot vs. chatbot

These four get conflated constantly, and the confusion costs buyers money, because IVR-era products are sometimes sold with voice agent language. Here is the capability comparison:

| Capability | Traditional IVR | Voicebot | AI voice agent |
| --- | --- | --- | --- |
| Input | Keypad presses, single words | A set of recognized phrases | Natural, open-ended speech |
| Conversation flow | Fixed menu tree | Short scripted exchanges | Multi-turn; handles interruptions and topic changes |
| Task completion | Routing and basic lookups | Canned answers to simple questions | Books, orders, qualifies, updates records |
| Off-script handling | Fails or loops the menu | Fails or escalates | Asks clarifying questions and recovers |
| Improvement over time | Manual reprogramming | Manual reprogramming | Tuned continuously from real call data |

Chatbots sit outside this comparison because they converse in text. The conversational brain can be similar, but voice adds speech recognition, voice synthesis, and an unforgiving real-time constraint: a three-second pause in chat is invisible, while on a call it sounds like the line died. The performance gap with legacy tools is stark. According to voice-AI vendor Vomyra, traditional IVR systems resolve only about 12% of issues through self-service, and traditional chatbots about 32% without human help. Modern voice agents exist because both of those numbers are embarrassing.

Who needs a voice agent, and what problems it solves

The phone is still where revenue and urgency live. Salesforce's own research found 81% of service professionals say the phone is a preferred channel for complex issues. Delight.ai (formerly Sendbird), citing global surveys, reports 68% of consumers prefer phone for customer support and that more than 76% make it their primary support channel. The same source notes 71% of customers find calling support more stressful than the problem itself — which captures the whole opportunity in one line: demand for the phone is high, and the experience of calling most businesses is bad.

The top-ranking guides for this term all speak to enterprise contact centers, but the buyer they ignore is the local business, where the math is simplest. A dental front desk cannot answer while checking in a patient. An HVAC company's most profitable calls come at 9 pm in January. A restaurant phone rings hardest exactly when staff cannot touch it. A law firm that misses a call often loses that consult to the next firm in the search results. For these businesses the problem is not deflecting volume; it is catching it. And the shift is already underway down-market: AssemblyAI, citing ServiceNow, reports that 35% of small and medium businesses credit automation with significantly improving their customer service and support.

Across both segments, the problems a voice agent reliably solves:

  • Missed calls: every call answered, including during rushes, lunch, and after hours.
  • Hold time: callers get an immediate answer instead of a queue.
  • After-hours coverage: bookings and intake continue at night and on weekends without on-call staffing.
  • Inconsistent intake: every caller gets asked the same qualifying questions, and every answer is recorded.
  • No-shows: outbound reminder and confirmation calls reduce empty slots.
  • Surge load: promotions, outages, and seasonal spikes no longer require temporary hiring.
  • Data loss: every call leaves a transcript and summary instead of a sticky note.

Five local-industry playbooks: the calls a voice agent actually takes

Generic industry lists are everywhere; specific call types are rarer and far more useful. Here is what businesses in five verticals typically hand to a voice agent first — these are the patterns MapleVoice builds against across the 20 industries it supports, and the full set, from dental to home services, is mapped at /industries.

  • Dental and medical clinics: appointment booking and rescheduling, recall and hygiene reminders, new-patient intake details, insurance-participation questions, and after-hours triage that routes true emergencies to the on-call line. One non-negotiable: a patient's name attached to an appointment reason is protected health information, so for qualifying healthcare deployments the vendor must sign a business associate agreement.
  • Home services (HVAC, plumbing, electrical): emergency triage with safety questions asked before scheduling, same-day dispatch booked against the live calendar, quote requests captured in full, and the 9 pm no-heat call answered instead of handed to the next company in the search results.
  • Restaurants: phone orders written straight into the POS, reservations booked and changed, hours-menu-and-parking questions absorbed during the dinner rush, and catering inquiries captured with event date, headcount, and budget so a human can follow up with a real quote.
  • Law firms: new-client intake with the same qualifying questions asked every time, consultation scheduling, urgent-versus-routine triage, and a transcript trail of every inquiry — useful when intake consistency is the difference between a growing practice and a leaky one.
  • Real estate and mortgage: listing and rate questions answered instantly, showings scheduled, leads qualified on budget, timeline, and pre-approval status, and speed-to-lead callbacks placed within minutes, where the first responder usually wins the client.

Inbound vs. outbound: two different animals

A distinction none of the top-ranking guides treat structurally: which direction the calls flow changes the technology priorities, the economics, and the law. Inbound agents answer calls your business receives — reception, booking, order taking, FAQs, intake, overflow when humans are busy. The legal posture is simple, because the caller initiated contact. The ROI logic is capture: you already paid, in marketing or reputation, to make that phone ring, and the agent's job is to stop the leak.

Outbound agents place calls: appointment reminders, missed-call recovery within minutes, lead follow-up, reactivation, surveys. The upside is real, but the legal burden is heavier. The TCPA governs automated outbound calling in the US, and the FCC ruled in February 2024 that AI-generated voices count as artificial or prerecorded voices under the law — which means outbound marketing calls with an AI voice require prior express written consent. Reminder and service calls to existing customers sit in safer territory, but the rules are technical and the penalties accrue per call. If you are starting out, start inbound; add outbound only on a platform with consent management and calling-hours controls built in.
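The phrase "consent management and calling-hours controls" can be made concrete with a small pre-dial check. The sketch below is an assumption-heavy illustration, not a statement of what the law requires: `Contact`, `may_place_call`, the two purpose labels, and the default 8 am to 9 pm window are all hypothetical placeholders that a real platform would configure with legal review.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Contact:
    phone: str
    has_written_consent: bool    # prior express written consent on file
    is_existing_customer: bool


def may_place_call(
    contact: Contact,
    purpose: str,                          # "marketing" or "reminder" (hypothetical labels)
    local_now: datetime,                   # current time in the called party's time zone
    window_start: time = time(8, 0),       # placeholder calling window, an assumption
    window_end: time = time(21, 0),
) -> bool:
    """Return True only if this sketch's simplified rules allow the call."""
    # Calling-hours control: refuse anything outside the configured window.
    if not (window_start <= local_now.time() < window_end):
        return False
    # Marketing calls with an AI voice need written consent on file.
    if purpose == "marketing":
        return contact.has_written_consent
    # Simplification: reminders are limited to existing customers.
    if purpose == "reminder":
        return contact.is_existing_customer
    # Unknown purposes are blocked by default.
    return False
```

The design choice worth copying is the default: any purpose the function does not recognize is refused, so a new campaign type cannot dial until someone has decided which rule applies to it.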

What a voice agent call actually sounds like

None of the pages ranking for this query show an actual conversation, which is strange, because the conversation is the product. Example (illustrative): the transcript below is written to demonstrate the mechanics, not taken from a real call. For real calls, listen to the samples at /call-recordings.

Watch for four behaviors as you read: a safety triage question, an interruption absorbed without losing the thread, live system lookups mid-call, and a handoff that does not make the caller repeat themselves. Those unglamorous moves are the product. When you evaluate any vendor, ask for a live demo call and listen for exactly these four.

Call transcript

Agent: Thanks for calling Northside Heating and Air. I'm the office's automated assistant. How can I help? (Identity disclosed in the first breath.)
Caller: My furnace is making a clicking noise and there's no heat. Can someone come out today?
Agent: Sorry to hear that. No heat in January is urgent. One safety check first: do you smell gas anywhere? (Triage before scheduling; a yes here triggers an emergency script instead.)
Caller: No, no gas smell.
Agent: Good. I can have a technician there today between 2 and 4, or tomorrow between 8 and 10. Which works? (Real availability, pulled from the scheduling system mid-sentence.)
Caller: Hang on, what's the call-out fee?
Agent: The diagnostic visit is 89 dollars, waived if you proceed with the repair. So, today 2 to 4, or tomorrow morning? (Absorbs the interruption, answers, returns to the close.)
Caller: Today works.
Agent: You're booked for today, 2 to 4 pm, at the address ending in Maple Street. A text confirmation is on its way. Anything else? (Writes the job to the calendar and CRM, sends the SMS.)
Caller: Actually, can I ask a person about your maintenance plans?
Agent: Of course. Transferring you to Dana now. She'll already have a summary of this call. (Warm transfer with context, not a cold dump into a queue.)

What voice agents cost in 2026

Only one of the three top-ranking guides publishes any numbers, so credit where due: according to AssemblyAI, full-stack voice agents typically run $0.01 to $0.05 per minute in raw component costs, speech-to-text alone starts around $0.15 per hour, and platform subscriptions commonly land between $50 and $500 per month plus overages. Those figures are honest but incomplete, because component prices exclude telephony charges, phone numbers, the engineering time to build, and the ongoing labor to maintain prompts and flows.

| Option | How it is billed | Published range (as of 2026) | What is usually extra |
| --- | --- | --- | --- |
| DIY component stack | Per minute, per component | $0.01-$0.05 per minute full stack, per AssemblyAI | Telephony, phone numbers, engineering build, ongoing maintenance |
| Voice AI platform subscription | Monthly tier plus overages | $50-$500 per month plus overages, per AssemblyAI | Setup labor, integrations, busy-month overage bills |
| Enterprise suite | Custom contract | Rarely published | Implementation partners, licenses, long deployments |
| Done-for-you managed service | Flat monthly fee | Varies by provider; MapleVoice is flat monthly with no per-minute meter | Typically little; setup and tuning included |
| Human receptionist | Salary plus benefits | A full-time salary, benefits, and training; business hours only | Nights, weekends, sick days, turnover and rehiring |
| Live answering service | Per minute or per call | Varies widely by provider | Overage fees; takes messages more than it completes tasks |

Two structural points matter more than any single number. First, per-minute meters mean your bill spikes in exactly the months the agent earns its keep, which makes budgeting a guess. Second, the cheapest option on paper, building it yourself, is only cheap if you price your own engineering time at zero. Run the comparison at your real call volume, count your hours honestly, and the ranking of these six options becomes obvious for your situation.
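To run that comparison at your own volume, a few lines of arithmetic are enough. The sketch below is illustrative only: the $0.05 rate is the top of the per-minute range AssemblyAI publishes, while `EXTRAS`, `FLAT_FEE`, and the monthly minute counts are hypothetical placeholders to replace with your own quotes and call logs.

```python
def metered_monthly_cost(
    minutes: float, rate_per_minute: float, monthly_extras: float = 0.0
) -> float:
    """Usage-billed cost: minutes times rate, plus the extras the meter leaves out."""
    return minutes * rate_per_minute + monthly_extras


def break_even_minutes(
    flat_fee: float, rate_per_minute: float, monthly_extras: float = 0.0
) -> float:
    """Monthly minutes above which a flat fee becomes the cheaper option."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return max(0.0, (flat_fee - monthly_extras) / rate_per_minute)


RATE = 0.05        # top of the published per-minute component range
EXTRAS = 150.0     # hypothetical: telephony, numbers, maintenance hours
FLAT_FEE = 400.0   # hypothetical flat monthly price

for label, minutes in [("quiet month", 1_000), ("busy month", 6_000)]:
    metered = metered_monthly_cost(minutes, RATE, EXTRAS)
    print(f"{label}: metered ${metered:.2f} vs flat ${FLAT_FEE:.2f}")

print(f"break-even: {break_even_minutes(FLAT_FEE, RATE, EXTRAS):.0f} minutes per month")
```

The useful output is the break-even figure rather than any single month: if your busy season sits well above it and your quiet season well below, the metered bill will swing in the way the first structural point describes.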

Answer speed and the human handoff: two quality tests vendors gloss over

All three top-ranking guides say "low latency" and leave it there, and every vendor promises "seamless escalation." These two phrases hide the most important quality differences in the category, so test both yourself.

On speed: two numbers matter. Time to answer is how long the phone rings before pickup; turn latency is the pause between the caller finishing a sentence and the agent starting its reply. Long dead air reads as a dropped call, and a caller who hangs up on a local business often dials the next result in the search listings rather than calling back. A voice agent's structural advantage is that it never makes a caller wait to be picked up: MapleVoice agents, for example, answer in under 2 seconds, within the first ring, at 2 pm and 2 am alike. For turn latency there is no honest universal benchmark to quote, so do the practical thing: place a live test call to any vendor you are considering and listen for the gaps. If the pauses make you check your screen, your callers will hang up.
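If you record a test call and note the timestamps, the two speed numbers, time to answer and turn latency, fall out of simple subtraction. The following is a minimal sketch with a hypothetical event format (`Event`, and the kinds "answered", "caller_end", and "agent_start"); real platforms expose timing data in their own shapes, if at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    t: float     # seconds since the phone started ringing
    kind: str    # "answered", "caller_end", or "agent_start" (hypothetical labels)


def time_to_answer(events: list[Event]) -> Optional[float]:
    """Seconds of ringing before pickup, or None if the call was never answered."""
    for event in events:
        if event.kind == "answered":
            return event.t
    return None


def turn_latencies(events: list[Event]) -> list[float]:
    """Pause between each caller utterance ending and the agent's next reply starting."""
    gaps: list[float] = []
    last_caller_end: Optional[float] = None
    for event in events:
        if event.kind == "caller_end":
            last_caller_end = event.t
        elif event.kind == "agent_start" and last_caller_end is not None:
            gaps.append(event.t - last_caller_end)
            last_caller_end = None   # count each caller turn once
    return gaps


# Hand-timed example call; the timestamps are invented for illustration.
call = [
    Event(1.5, "answered"),
    Event(1.6, "agent_start"),    # greeting, not a reply, so it is not counted
    Event(8.0, "caller_end"),
    Event(8.75, "agent_start"),
    Event(12.0, "caller_end"),
    Event(13.5, "agent_start"),
]
```

Looking at the worst gap (`max(turn_latencies(call))`) rather than the average is usually more telling, because one long silence is what makes a caller think the line dropped.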

On the handoff: a cold transfer dumps the caller into a queue or a ringing line with no context, so they repeat everything; it is barely better than an IVR. A warm transfer passes the conversation along with a context package — the caller's name, why they called, what the agent already tried, and how urgent it is — delivered to the human as a screen pop or a spoken summary before the connection. The caller never starts over.

Three handoff questions to put to any vendor. First, what happens after hours when there is no human to transfer to: the right answers are book directly, take a structured message, or page an on-call number for true emergencies, and the wrong answer is voicemail. Second, when does the agent give up: it should yield immediately when a caller asks for a person, after two consecutive misunderstandings, and at any sign of distress or an out-of-scope safety issue. Third, what containment should we expect: simple call types like appointment booking contain at much higher rates than complex troubleshooting, and while AssemblyAI cites case studies of AI handling up to 77% of level-1 and level-2 support, treat any vendor's containment number as an upper bound until it is measured on your own call mix.
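The context package and the give-up rules described above are simple enough to write down as data and a predicate. This is a sketch of one way to express them, assuming hypothetical names throughout (`HandoffContext`, `TurnSignal`, `should_hand_off`); it shows the shape of the logic, not how any particular vendor implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandoffContext:
    """What a warm transfer delivers to the human before the call connects."""
    caller_name: str
    reason: str
    steps_tried: tuple[str, ...]
    urgency: str   # for example "routine", "urgent", "emergency" (hypothetical scale)

    def spoken_summary(self) -> str:
        tried = "; ".join(self.steps_tried) or "nothing yet"
        return (
            f"{self.caller_name} is calling about {self.reason}. "
            f"Urgency: {self.urgency}. Already done: {tried}."
        )


@dataclass(frozen=True)
class TurnSignal:
    """Per-turn flags that an upstream classifier is assumed to provide."""
    asked_for_human: bool = False
    understood: bool = True
    distress_or_safety: bool = False


def should_hand_off(signals: list[TurnSignal], max_misses: int = 2) -> bool:
    """Yield on request, on distress or safety issues, or after consecutive misses."""
    if not signals:
        return False
    latest = signals[-1]
    if latest.asked_for_human or latest.distress_or_safety:
        return True
    recent = signals[-max_misses:]
    return len(recent) == max_misses and all(not s.understood for s in recent)
```

For the transcript earlier in this guide, the final caller turn would set `asked_for_human=True`, and the agent would pass something like `HandoffContext("the caller", "maintenance plans", ("booked a diagnostic visit for today, 2 to 4 pm",), "routine")` to Dana.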

Risks and limitations: when a voice agent is the wrong choice

Salesforce's guide is the only top-ranking page that even names challenges — accuracy, contextual understanding, and emotional intelligence — and none of the three tells you when to walk away. So here is that section. Skip the voice agent, or scope it narrowly, when any of the items below describe your calls.

  • Calls are emotionally heavy. Bereavement, bad medical news, angry escalations: these need a human, and routing them to a machine damages the relationship more than a missed call would.
  • The call is a negotiation. Complex sales closes, pricing exceptions, and retention saves depend on judgment and rapport that current agents do not have.
  • Your brand is the human touch. If white-glove personal service is what customers pay a premium for, put the AI behind the humans as overflow and after-hours coverage, not in front of them.
  • Your callers skew hard against automation. Some share of every caller base wants a person; the agent must yield gracefully, and if that share is most of your callers, automate less.
  • Audio conditions are hostile. Heavy accents, weak cell signal, and loud job sites degrade speech recognition; expect more clarifying questions and some outright failures.
  • Calls involve several people at once. Multi-party and conference-style conversations remain a weak spot.
  • Volume is tiny. If three calls a day arrive and none get missed, the subscription may not pay for itself.

Beyond that list, two risks deserve flat statements. First, an agent that is not grounded in your approved business data can confidently say wrong things; insist on grounding and review transcripts, especially in the first month. Second, voice impersonation is a real fraud vector (a risk Salesforce flags in its own guide), so never use a caller's voice as the only factor for authentication or account changes. And remember the frustration data earlier in this guide: a badly deployed agent is worse than none, because it spends your reputation with every failed call.

Build, buy, or done-for-you: picking your path

The guides ranking for this term each assume one answer: AssemblyAI assumes you will build, while Salesforce and Delight.ai assume you will buy their platform. The honest version is a three-way decision.

  • Build it, on developer platforms like Vapi, LiveKit, or the open-source Pipecat, the three AssemblyAI names. You get full control of latency, voice, and logic, and you own the telephony, the integrations, and the maintenance pager. Right for product teams embedding voice into software, and for companies with genuinely unusual call workflows and engineers to spare.
  • Buy a platform, such as Salesforce Agentforce, or contact-center vendors like PolyAI and Kore.ai from Delight.ai's roundup. Configuration replaces coding, and the agent lives inside an ecosystem you may already run. Right for enterprises with admins, existing CRM investments, and procurement patience.
  • Use a done-for-you service, where the provider builds the agent, wires the integrations, monitors calls, and tunes continuously while you review outcomes. Right for clinics, firms, restaurants, and home-services companies that have a phone problem, not an engineering roadmap. The trade-off is less granular control than building it yourself.

Decide on four axes: engineering capacity, call volume, how unusual your workflows are, and how fast you need to be live. The more your answers look like "none, moderate, not very, and this month," the further down the list your answer sits.

Whichever path you pick, implementation follows the same arc. AssemblyAI's guide breaks it into six steps (define the use case, choose the platform, design conversation flows, integrate and test, deploy gradually, then monitor and optimize), and the difference between paths is who does that work and how long it takes. A first DIY build is measured in weeks, an enterprise suite rollout in months, and a done-for-you provider compresses it to days (MapleVoice's typical go-live is about 48 hours) because the flows, integrations, and edge cases have been built many times before.

How to evaluate a voice agent, before and after it answers

Evaluation has two halves: the features you verify before you buy, and the numbers you track after launch. For the first half, Delight.ai's guide carries the most complete feature rubric of the three top-ranking pages — interruption handling, context awareness, integrations, fallback, observability, redaction — and the demo-call checklist at the end of this section turns that rubric into tests you can run in ten minutes on a live call.

For the second half, Salesforce's guide has the best measurement framework on this topic, and it is worth adopting: track containment rate, average handle time, lead qualification rate, appointment booking rate, and customer satisfaction trends. Add the before-and-after numbers that matter most to a smaller operation: missed-call rate and booked appointments per week, compared with the month before launch. The category's upside is documented — AssemblyAI cites a Salesforce survey in which customer service departments saw a 37% ROI from automation — but your own before-and-after numbers are the only proof that should move your budget.
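The before-and-after comparison is plain ratio arithmetic, sketched below. The definitions here are assumptions for illustration (containment is taken as the share of agent-handled calls that never reach a human), `PeriodStats` and its field names are hypothetical, and the sample numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeriodStats:
    inbound_calls: int
    missed_calls: int
    agent_handled: int
    transferred_to_human: int
    appointments_booked: int


def safe_rate(part: int, whole: int) -> float:
    """Ratio that returns 0.0 instead of failing on an empty period."""
    return part / whole if whole else 0.0


def missed_call_rate(stats: PeriodStats) -> float:
    return safe_rate(stats.missed_calls, stats.inbound_calls)


def containment_rate(stats: PeriodStats) -> float:
    # Assumed definition: agent-handled calls that did not need a human.
    contained = stats.agent_handled - stats.transferred_to_human
    return safe_rate(contained, stats.agent_handled)


def booking_rate(stats: PeriodStats) -> float:
    return safe_rate(stats.appointments_booked, stats.inbound_calls)


# Invented numbers: the month before launch and the month after.
before = PeriodStats(inbound_calls=400, missed_calls=100, agent_handled=0,
                     transferred_to_human=0, appointments_booked=60)
after = PeriodStats(inbound_calls=400, missed_calls=8, agent_handled=392,
                    transferred_to_human=98, appointments_booked=90)

print(f"missed-call rate: {missed_call_rate(before):.1%} -> {missed_call_rate(after):.1%}")
print(f"containment after launch: {containment_rate(after):.1%}")
print(f"booking rate: {booking_rate(before):.1%} -> {booking_rate(after):.1%}")
```

Whatever definitions you settle on, fix them before launch and apply the same ones to both periods; a containment number is only comparable to another containment number computed the same way.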

Underneath the metrics sits an auditability requirement. Every call should produce a recording, a transcript, a summary, a call reason, an outcome, and a next step. If a vendor cannot show you those six artifacts for any given call, you cannot audit the agent, and you are trusting a black box with your phone line. In the first month, read transcripts weekly; it is the fastest way to find the calls the agent fumbles and fix them. And before any contract is signed, run this checklist on a live demo call:

  • Interruption handling: talk over the agent mid-sentence; it should stop, absorb the new input, and recover without losing the thread.
  • Context memory: give a detail early — your name, the reason for calling — and check whether the agent still has it three turns later.
  • Real actions, not promises: confirm it writes to an actual calendar, CRM, or POS during the demo, not a mock screen.
  • Fallback to humans: ask for a person, time how fast the agent yields, and ask exactly what context arrives with the transfer.
  • Multilingual coverage: if your callers speak more than one language, test the second language as hard as the first.
  • Observability: ask to see the recording, transcript, and summary of the demo call itself within minutes of hanging up.
  • Compliance and redaction: ask how payment data is redacted, where recordings are stored, and — for healthcare — whether the vendor signs a BAA.
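The six-artifact auditability requirement from earlier in this section is also easy to check mechanically once a vendor gives you call records in any structured export. The sketch below assumes a hypothetical dictionary format with one key per artifact; the key names are illustrative, not a real vendor schema.

```python
REQUIRED_ARTIFACTS = (
    "recording",
    "transcript",
    "summary",
    "call_reason",
    "outcome",
    "next_step",
)


def missing_artifacts(call_record: dict) -> list[str]:
    """List the audit artifacts that are absent or empty for one call."""
    return [key for key in REQUIRED_ARTIFACTS if not call_record.get(key)]


def audit(call_records: list[dict]) -> dict[str, list[str]]:
    """Map each incomplete call's id to the artifacts it is missing."""
    report: dict[str, list[str]] = {}
    for record in call_records:
        gaps = missing_artifacts(record)
        if gaps:
            report[str(record.get("call_id", "unknown"))] = gaps
    return report


# Invented sample export: one complete call, one with gaps.
sample = [
    {"call_id": "c-001", "recording": "c-001.wav", "transcript": "...",
     "summary": "Booked furnace visit", "call_reason": "no heat",
     "outcome": "booked", "next_step": "technician visit today"},
    {"call_id": "c-002", "recording": "c-002.wav", "transcript": "...",
     "summary": "", "call_reason": "pricing question", "outcome": "answered"},
]
```

Running `audit(sample)` on a week of demo or pilot calls gives a quick read on whether the vendor's record-keeping matches the claim, before you start the weekly transcript reviews.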

Where MapleVoice fits, and your next step

Our cards on the table: MapleVoice is a done-for-you AI voice agent service — the third path above. We build, integrate, and tune the agent for you, and typical go-live is about 48 hours. Pricing is a flat monthly fee with no per-minute meter. Agents answer 24/7 in under 2 seconds, book appointments, qualify leads, and take orders; they warm-transfer to your team with context; and they come industry-tuned for 20 verticals, from dental to home services, with booking, CRM, and POS integrations. For qualifying healthcare customers we operate HIPAA-aware and sign a BAA, and outbound calling runs with TCPA controls. Every call produces the six artifacts above: recording, transcript, summary, call reason, outcome, and next step.

Equally on the table: when we are not the answer. If you have engineers and want full control, build on a developer platform. If your calls are long, emotional consultations, keep humans in front and use AI only for overflow and after-hours. If you rarely miss a call today, measure first and buy later. Next steps, in order: listen to real recorded calls at /call-recordings, see the setup process at /how-it-works, and if you want the technical deep dive on the AI itself, read /blog/what-is-an-ai-voice-agent.

Frequently asked questions

What is a voice AI agent and how does it work?

A voice AI agent is software that holds spoken phone conversations and completes tasks like booking or answering questions. It converts caller speech to text, uses a language model to understand intent and decide an action, executes that action in systems like a calendar or CRM, then replies in a natural synthesized voice.

How are AI voice agents different from chatbots?

Chatbots converse in text; voice agents converse in speech over a live phone call. That adds speech recognition, voice synthesis, and hard real-time latency requirements, because a pause that is fine in chat feels broken on a call. Voice agents also handle interruptions and accents — problems chatbots never face.

What is the difference between a voice agent and an IVR?

An IVR is a fixed menu tree that routes callers with keypad presses; a voice agent holds an open conversation and completes tasks. IVRs fail when callers go off script. According to voice-AI vendor Vomyra, traditional IVR resolves only about 12% of issues through self-service, which is why callers smash zero for an operator.

How much does an AI voice agent cost?

Published 2026 ranges run from $0.01-$0.05 per minute for DIY component stacks to $50-$500 monthly platform subscriptions plus overages, according to AssemblyAI. Done-for-you services typically charge a flat monthly fee; MapleVoice has no per-minute meter. DIY totals also include telephony, phone numbers, and engineering time.

Are AI voice calls legal?

Yes, with rules that depend on direction. Answering inbound calls is broadly low-risk. Outbound is regulated by the TCPA, and a February 2024 FCC ruling confirmed AI-generated voices count as artificial or prerecorded, so outbound marketing calls require prior express written consent. Recording-consent and HIPAA rules can also apply.

Can callers tell they are talking to an AI?

Often, yes — and that is fine. Modern synthesized voices sound natural, but pacing and phrasing can still give the agent away, and a minority of callers will ask directly. Best practice is to disclose up front: callers care far more about getting helped in one call than about who answered.

Can a voice agent transfer a call to a human?

Yes, and the transfer quality is a key buying criterion. A good agent performs a warm transfer: it passes the caller's name, reason, and conversation context so the human picks up informed. A cold transfer that dumps callers into a queue to repeat themselves defeats the purpose. Test this before buying.

Can AI voice agents personalize customer interactions?

Yes, within limits. An agent connected to your CRM can greet returning callers by name, see appointment history, and tailor responses to past interactions — Salesforce's guide highlights exactly this CRM-grounded personalization. The limit is data: an agent with no integrations can only personalize on what the caller says during the call.

Do I need a team of AI experts to build a voice agent?

No. There are three paths: developers can assemble agents on platforms like Vapi or LiveKit, enterprises can configure suites like Salesforce Agentforce, and businesses without technical staff can use a done-for-you service where the provider builds, integrates, and tunes everything. Only the first path requires engineers.

What technologies are used in AI voice agents?

Four layers, on top of the phone connection itself: automatic speech recognition to convert caller audio to text, a large language model for understanding and reasoning, an action layer that connects to calendars, CRMs, and other systems, and text-to-speech for the reply. Newer speech-to-speech models merge several layers to cut latency, with hybrids common in 2026.

