Function Calling in LLMs: How AI Agents Use Tools (Practical Guide)
By the ZTABS Team
Function calling — also called tool use — is the capability that transforms LLMs from text generators into AI agents that can act in the real world. Without function calling, an LLM can only produce text. With function calling, an LLM can search databases, call APIs, send emails, process payments, update CRM records, and execute virtually any operation you define.
If you are building AI agents, function calling is the most important capability to understand deeply. It is the bridge between the LLM's reasoning and your application's functionality.
How Function Calling Works
The flow is straightforward once you understand the mechanics.
Step 1: Define functions
You describe the functions (tools) the LLM can call — name, description, parameters, and parameter types. These descriptions are passed to the LLM as part of the prompt.
Step 2: LLM decides to call a function
Based on the user's message and the available function descriptions, the LLM decides whether to call a function and which one. The LLM does not execute the function — it returns a structured JSON object indicating which function to call and what arguments to pass.
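Concretely, the model's reply contains a structured request rather than prose. In OpenAI's format, for example, a tool call entry looks like this (the `id` value here is illustrative; note that `arguments` is itself a JSON-encoded string):

```json
{
  "id": "call_abc123",
  "type": "function",
  "function": {
    "name": "get_weather",
    "arguments": "{\"city\": \"Houston\"}"
  }
}
```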
Step 3: Your code executes the function
Your application receives the function call request, executes the actual function (API call, database query, etc.), and returns the result to the LLM.
Step 4: LLM incorporates the result
The LLM receives the function result and uses it to generate its final response to the user.
User: "What's the weather in Houston?"
↓
LLM: "I should call get_weather with city='Houston'" (function call)
↓
Your code: calls weather API → returns "72°F, partly cloudy"
↓
LLM: "The current weather in Houston is 72°F and partly cloudy."
The LLM never has direct access to your systems. It can only request that you execute functions on its behalf. This separation is critical for security.
Function Calling Across Providers
OpenAI (GPT-4o, GPT-4o-mini)
```python
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the current status of a customer order by order ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The order ID (e.g., ORD-12345)"
                    }
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search the company knowledge base for product information, policies, and FAQs",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query"
                    }
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful customer support agent."},
        {"role": "user", "content": "Where is my order ORD-12345?"}
    ],
    tools=tools,
    tool_choice="auto"
)

tool_call = response.choices[0].message.tool_calls[0]
# tool_call.function.name == "get_order_status"
# tool_call.function.arguments == '{"order_id": "ORD-12345"}'
```
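Once you have the tool call, your code must dispatch it. A minimal sketch of such a dispatcher, assuming local implementations of the two tools defined above (the registry, `execute_tool`, and the stub tool bodies are illustrative, not part of any SDK):

```python
import json

# Hypothetical local implementations of the tools declared to the model.
def get_order_status(order_id: str) -> str:
    return f"Order {order_id}: shipped"

def search_knowledge_base(query: str) -> str:
    return f"Results for: {query}"

TOOL_REGISTRY = {
    "get_order_status": get_order_status,
    "search_knowledge_base": search_knowledge_base,
}

def execute_tool(name: str, arguments_json: str) -> str:
    """Look up a tool by name and call it with the LLM-provided arguments."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return f"Unknown tool: {name}"
    # OpenAI delivers arguments as a JSON-encoded string, so parse first.
    args = json.loads(arguments_json)
    return func(**args)
```

Keeping a name-to-function registry means adding a tool is one dictionary entry plus one schema, and unknown names fail gracefully instead of crashing the agent.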
Anthropic (Claude)
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_order_status",
        "description": "Look up the current status of a customer order by order ID",
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID (e.g., ORD-12345)"
                }
            },
            "required": ["order_id"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a helpful customer support agent.",
    tools=tools,
    messages=[
        {"role": "user", "content": "Where is my order ORD-12345?"}
    ]
)

for block in response.content:
    if block.type == "tool_use":
        # block.name == "get_order_status"
        # block.input == {"order_id": "ORD-12345"}
        pass
```
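To complete the round trip with Claude, your code executes the tool and sends the result back as a `tool_result` content block inside a follow-up user message. A sketch of building that message (`build_tool_result_message` is a hypothetical helper, not an SDK function):

```python
def build_tool_result_message(tool_use_id: str, result: str) -> dict:
    """Package a tool's output in the message format Claude expects."""
    return {
        "role": "user",
        "content": [
            {
                "type": "tool_result",
                "tool_use_id": tool_use_id,  # must match the id on the tool_use block
                "content": result,
            }
        ],
    }

# Append this message to the conversation, then call client.messages.create
# again with the same tools to get Claude's final text answer.
```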
Google (Gemini)
```python
import google.generativeai as genai

get_order_status = genai.protos.FunctionDeclaration(
    name="get_order_status",
    description="Look up the current status of a customer order",
    parameters=genai.protos.Schema(
        type=genai.protos.Type.OBJECT,
        properties={
            "order_id": genai.protos.Schema(type=genai.protos.Type.STRING)
        },
        required=["order_id"]
    )
)

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    tools=[genai.protos.Tool(function_declarations=[get_order_status])]
)

response = model.generate_content("Where is my order ORD-12345?")
# The requested call is in response.candidates[0].content.parts[0].function_call
```
The Tool Loop Pattern
In practice, an AI agent often needs to call multiple tools in sequence — search for information, then look up a record, then take an action. This requires a tool loop.
```python
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_message}
]

while True:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    assistant_message = response.choices[0].message

    if not assistant_message.tool_calls:
        # No more tool calls — return the final response
        print(assistant_message.content)
        break

    messages.append(assistant_message)

    for tool_call in assistant_message.tool_calls:
        result = execute_tool(tool_call.function.name, tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": str(result)
        })
```
This loop continues until the LLM generates a text response instead of a tool call — signaling that it has gathered enough information to answer.
Limiting tool calls
Without limits, the agent could loop indefinitely. Always set a maximum.
```python
MAX_TOOL_CALLS = 5
tool_call_count = 0

while tool_call_count < MAX_TOOL_CALLS:
    response = client.chat.completions.create(...)
    assistant_message = response.choices[0].message
    if not assistant_message.tool_calls:
        break
    tool_call_count += len(assistant_message.tool_calls)
    # ... execute the tools and append results, as in the loop above
```
Writing Good Function Descriptions
The quality of your function descriptions directly determines how reliably the LLM selects the right function and passes correct arguments. This is prompt engineering for tools.
Rules for function descriptions
Be specific about what the function does:
Bad: "Gets data"
Good: "Retrieves the current shipping status, tracking number, and estimated delivery date for a customer order by its order ID"
Describe parameter formats and constraints:
Bad: "date": { "type": "string", "description": "Date" }
Good: "date": { "type": "string", "description": "Date in YYYY-MM-DD format (e.g., 2026-03-04)" }
Specify when to use the function vs when not to:
"description": "Search the product catalog for items matching a query.
Use this for product-related questions. Do NOT use this for order status
or account questions — use get_order_status or get_account_info instead."
Use enum types to restrict parameter values:
"status_filter": {
"type": "string",
"enum": ["pending", "shipped", "delivered", "returned"],
"description": "Filter orders by status"
}
Parallel Function Calling
GPT-4o and Claude can request multiple function calls in a single response. This is useful when the agent needs independent data from multiple sources.
User: "Compare my last order with my current subscription"
LLM responds with TWO tool calls:
1. get_recent_orders(customer_id="C-123", limit=1)
2. get_subscription(customer_id="C-123")
Both calls can execute in parallel, and results are returned together. For independent calls like these, this can roughly halve latency compared with running them sequentially.
```python
import asyncio

async def handle_parallel_calls(tool_calls):
    # Run every requested tool concurrently and collect results in request order.
    tasks = [
        execute_tool_async(tc.function.name, tc.function.arguments)
        for tc in tool_calls
    ]
    results = await asyncio.gather(*tasks)
    return results
```
Security Considerations
Function calling gives the LLM indirect access to your systems. Take security seriously.
Input validation
Never trust the LLM's arguments blindly. Validate every parameter before execution.
```python
import re

def get_order_status(order_id: str) -> str:
    # Reject anything that does not match the expected order ID format
    if not re.match(r'^ORD-\d{5,10}$', order_id):
        return "Invalid order ID format"
    order = db.orders.find_one({"id": order_id})
    if not order:
        return "Order not found"
    return f"Status: {order['status']}, Tracking: {order['tracking']}"
```
Authorization boundaries
Not every function should be available to every user. Implement per-user tool access.
```python
def get_available_tools(user_role: str) -> list:
    base_tools = [search_knowledge_base, get_order_status]
    if user_role == "admin":
        return base_tools + [process_refund, update_account]
    return base_tools
```
Rate limiting
Prevent runaway agents from overwhelming your systems.
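One way to enforce this is a per-user sliding-window limiter wrapped around tool execution. The class below is a sketch, not tied to any framework; the name and limits are illustrative:

```python
import time
from collections import deque

class ToolRateLimiter:
    """Allow at most max_calls tool executions per window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.calls = deque()  # timestamps of recent calls

    def allow(self) -> bool:
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] > self.window_seconds:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

Check `limiter.allow()` before executing each tool call; when it returns False, return an error result to the LLM instead of running the tool.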
Audit logging
Log every function call with the user context, arguments, result, and timestamp. This is essential for debugging, security, and compliance. See our AI governance guide.
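A sketch of a structured audit record builder, assuming one JSON log line per call (the field names and `audit_tool_call` helper are illustrative; adapt to your logging stack):

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("tool_audit")

def audit_tool_call(user_id: str, tool_name: str, arguments: dict, result: str) -> dict:
    """Build one structured audit record per tool call and emit it as JSON."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "tool": tool_name,
        "arguments": arguments,
        "result_preview": str(result)[:200],  # truncate large results
    }
    logger.info(json.dumps(record))
    return record
```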
Function Calling vs MCP
Function calling is the low-level mechanism. Model Context Protocol (MCP) is the high-level standard built on top of it.
| Aspect | Function Calling | MCP |
|--------|------------------|-----|
| Scope | Single model, single application | Universal standard across models |
| Discovery | You define tools in code | Client auto-discovers tools from servers |
| Portability | Model-specific API format | Works across any MCP-compatible model |
| Ecosystem | Custom per project | Growing ecosystem of pre-built servers |
| Best for | Simple applications with few tools | Complex applications with many tools or multi-model support |
Rule of thumb: Use native function calling for simple agents with 1–5 tools. Use MCP when you have many tools, need model portability, or are building a platform.
Getting Started
- Start with one function. Build an agent that can call a single function reliably before adding more.
- Write excellent function descriptions. This is the highest-leverage optimization for function calling reliability.
- Implement the tool loop pattern. This is the standard architecture for AI agents.
- Add validation, rate limiting, and logging from day one.
- Test with adversarial inputs. What happens when the user tries to trick the agent into calling functions it should not?
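For instance, an adversarial unit test against the order-ID validation pattern shown earlier; the regex is repeated here so the test stands alone, and `is_valid_order_id` is an illustrative helper:

```python
import re

def is_valid_order_id(order_id: str) -> bool:
    # Strict allow-list: anything else, including injection payloads, is rejected.
    return bool(re.match(r'^ORD-\d{5,10}$', order_id))
```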
For help building AI agents with production-grade function calling, explore our AI agent development services or contact us. Our team builds agents across customer support, e-commerce, and enterprise automation using both native function calling and MCP.