LLM (Large Language Model)
An AI model trained on large text datasets, used as the reasoning engine in AI voice agents.
A Large Language Model (LLM) is an AI model trained on massive text datasets to predict and generate natural language. LLMs like GPT-4o, Claude, and Gemini are the reasoning engine at the heart of modern AI voice agents — they interpret caller intent, generate responses, decide what actions to take, and maintain conversation context across multiple turns.
How LLMs work in voice agents
- The caller's speech is transcribed to text by the STT engine
- The text is sent to the LLM along with a system prompt defining the agent's persona, knowledge base, and allowed actions
- The LLM generates a text response and, if appropriate, a structured action (book appointment, look up order, transfer call)
- The response is sent to the TTS engine to produce audio
System prompts are the instructions that define how the LLM behaves during a call. They typically include the agent's name and personality, the business's policies and FAQs, instructions for specific scenarios (angry caller, HIPAA-sensitive topics), and the available integrations and actions.
Token context window: LLMs process text in chunks called tokens. A longer context window means the agent can remember more of the conversation history and respond consistently across a long call.
Grounding: For domain-specific knowledge (product catalog, appointment availability, order status), LLMs are grounded with real-time data retrieved from external systems via function calls or RAG (retrieval-augmented generation).