
We are going to build a support ticket triage agent that reads raw customer messages, classifies urgency, tags the product area, and drafts a first reply. I use this exact project to onboard new hires to large language models because it forces you to handle real unstructured text, structured output, and system prompts in under fifty lines of Python. If you have never built with an LLM before, shipping this agent will teach you the core mechanics faster than any theory post.
What you'll need
You need Python 3.10 or newer, the OpenAI SDK installed with pip install openai, and an active API key from Oxlo.ai. Create your key at https://portal.oxlo.ai. Oxlo.ai is a developer-first inference platform that hosts open-source and proprietary models behind a single OpenAI-compatible endpoint. Because it uses flat per-request pricing instead of token-based metering, you can experiment with long system prompts and verbose user messages without watching a cost counter climb. That predictability is why I default to Oxlo.ai when I am teaching this stack.
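The snippets in this tutorial paste the key straight into the source. That is fine for a first run, but it is easy to commit by accident. Below is a minimal alternative that reads the key from an environment variable instead. The variable name OXLO_API_KEY and both helper functions are my own choices for this sketch, not anything Oxlo.ai prescribes.

```python
import os


def load_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it.

    OXLO_API_KEY is a hypothetical variable name chosen for this tutorial.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def make_client():
    """Build an OpenAI SDK client pointed at Oxlo.ai's OpenAI-compatible endpoint."""
    # Imported inside the function so load_api_key() works without the SDK installed.
    from openai import OpenAI

    return OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

Run export OXLO_API_KEY=... in your shell once. Then call client = make_client() wherever the examples construct OpenAI(...) directly.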
Step 1: Send your first request to Oxlo.ai
Before we give the agent a job description, we will sanity-check the connection. I always start with a single completion to confirm that the key, base URL, and SDK version are aligned. This call hits Oxlo.ai's Llama 3.3 70B endpoint and should return a concise one-sentence definition. Apart from the model name and the key, the only difference from OpenAI's own quickstart is the base_url pointing to https://api.oxlo.ai/v1. If you see a definition printed to your terminal, the SDK has turned your Python call into a POST request to Oxlo.ai's inference stack and parsed the generated text out of the JSON response. That is all an LLM interaction really is: a POST request that returns predicted text.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Explain what an LLM is in one sentence."},
    ],
)
print(response.choices[0].message.content)
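To make the "it is just a POST request" point concrete, the sketch below builds that HTTP request by hand with the standard library, without sending it. It assumes Oxlo.ai follows the usual OpenAI-compatible conventions: a POST to {base_url}/chat/completions with a Bearer token and a JSON body. build_chat_request is a hypothetical helper, and the real SDK adds headers of its own.

```python
import json
import urllib.request


def build_chat_request(api_key: str, model: str, messages: list[dict],
                       base_url: str = "https://api.oxlo.ai/v1") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_chat_request(
    "YOUR_OXLO_API_KEY",
    "llama-3.3-70b",
    [{"role": "user", "content": "Explain what an LLM is in one sentence."}],
)
```

Passing req to urllib.request.urlopen would actually send it. The SDK does the same work and also parses the JSON response into the response.choices objects used above.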
Step 2: Write the system prompt
The system prompt is the agent's instruction manual. It is the first message in the messages array and it sets the rules before the user text arrives. We want strictly valid JSON back, so the prompt lists every expected key and its allowed values. I keep the wording tight: in my experience, concise system prompts make the model less likely to add markdown fences or conversational filler around the JSON.
SYSTEM_PROMPT = """You are a support ticket triage agent. Your job is to read a raw customer message and return a JSON object with exactly these keys:
- urgency: either "low", "medium", or "high"
- product_area: one of "billing", "api", "login", or "general"
- draft_reply: a polite, helpful first response written in the same language as the customer message
Rules:
- Return only valid JSON.
- Do not add markdown formatting or explanations outside the JSON."""
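If you later add a product area, the list in the prompt and any code that checks the model's answer can drift apart. One option is to generate the prompt from a single set of constants. The sketch below uses hypothetical constant and function names. Its wording matches the prompt above, except that both enumerations use "one of".

```python
URGENCY_LEVELS = ("low", "medium", "high")
PRODUCT_AREAS = ("billing", "api", "login", "general")


def quoted(values: tuple[str, ...]) -> str:
    """Render values as a comma-separated list of double-quoted strings."""
    return ", ".join(f'"{v}"' for v in values)


def build_system_prompt() -> str:
    """Assemble the triage prompt from the same constants a validator can reuse."""
    return (
        "You are a support ticket triage agent. Your job is to read a raw customer "
        "message and return a JSON object with exactly these keys:\n"
        f"- urgency: one of {quoted(URGENCY_LEVELS)}\n"
        f"- product_area: one of {quoted(PRODUCT_AREAS)}\n"
        "- draft_reply: a polite, helpful first response written in the same "
        "language as the customer message\n"
        "Rules:\n"
        "- Return only valid JSON.\n"
        "- Do not add markdown formatting or explanations outside the JSON."
    )


SYSTEM_PROMPT = build_system_prompt()
```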
Step 3: Request structured JSON output
Parsing free-form text with regular expressions is fragile. Instead, we can ask the model to emit a JSON object by passing response_format={"type": "json_object"}. Oxlo.ai supports this parameter on compatible models, so the reply can go straight into json.loads without any regex cleanup. For the test input I am using a realistic ticket that mixes a billing complaint with an access issue. Because product_area accepts a single value, this shows which area the model prioritizes when a ticket touches more than one. Notice that we do not need to describe JSON syntax itself. The model already knows the grammar; we only describe the schema. That is a useful reminder that an LLM can map messy natural language onto rigid data structures, not just continue text.
import json
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
# SYSTEM_PROMPT is the string defined in Step 2; keep it in the same file.
ticket = "I was charged twice for my Pro subscription this month and I cannot access the API dashboard after the second charge. Please fix this immediately."
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": ticket},
    ],
    # Ask for a JSON object instead of free-form text.
    response_format={"type": "json_object"},
)
result = json.loads(response.choices[0].message.content)
print(json.dumps(result, indent=2))
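JSON mode, at least as OpenAI documents it, guarantees syntactically valid JSON. It does not guarantee that the keys and values match your schema, and I assume Oxlo.ai's implementation behaves the same way. A small check after json.loads catches a missing key or an invented category before it reaches your queue. validate_triage is a hypothetical helper, and its allowed values are copied from the Step 2 prompt.

```python
VALID_URGENCY = {"low", "medium", "high"}
VALID_AREAS = {"billing", "api", "login", "general"}
REQUIRED_KEYS = {"urgency", "product_area", "draft_reply"}


def validate_triage(result: dict) -> list[str]:
    """Return a list of schema problems; an empty list means the result looks usable."""
    problems = []
    keys = set(result)
    missing = REQUIRED_KEYS - keys
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    extra = keys - REQUIRED_KEYS
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    if "urgency" in result and result["urgency"] not in VALID_URGENCY:
        problems.append(f"invalid urgency: {result['urgency']!r}")
    if "product_area" in result and result["product_area"] not in VALID_AREAS:
        problems.append(f"invalid product_area: {result['product_area']!r}")
    if "draft_reply" in result and not str(result["draft_reply"]).strip():
        problems.append("empty draft_reply")
    return problems
```

One reasonable policy is to route any ticket whose result has problems to a human instead of auto-tagging it.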
Step 4: Wrap the call in a reusable function
Hard-coded scripts are fine for a demo, but a real agent lives inside a function you can import and test. I wrapped the call in triage_ticket with a default model of llama-3.3-70b. The function accepts a raw ticket string, injects the system prompt, and returns a Python dict. I added a try/except around json.loads because defensive coding still matters when you are calling a remote model. If you want to experiment, change the model parameter to qwen-3-32b for multilingual tickets, or deepseek-v3.2 for stronger reasoning on ambiguous edge cases. Oxlo.ai hosts all of them on the same endpoint with no cold starts on popular models, so the only code change is the model string.
import json
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
SYSTEM_PROMPT = """You are a support ticket triage agent. Your job is to read a raw customer message and return a JSON object with exactly these keys:
- urgency: either "low", "medium", or "high"
- product_area: one of "billing", "api", "login", or "general"
- draft_reply: a polite, helpful first response written in the same language as the customer message
Rules:
- Return only valid JSON.
- Do not add markdown formatting or explanations outside the JSON."""
def triage_ticket(ticket_text: str, model: str = "llama-3.3-70b") -> dict:
    """Classify one raw ticket and draft a reply; returns the parsed JSON as a dict."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": ticket_text},
        ],
        response_format={"type": "json_object"},
    )
    content = response.choices[0].message.content
    try:
        return json.loads(content)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers a reply with no content (None).
        return {"error": "Malformed JSON", "raw": content}
# quick test: runs only when the file is executed directly, not when imported
if __name__ == "__main__":
    test = triage_ticket("The login page keeps returning a 500 error after I reset my password.")
    print(json.dumps(test, indent=2))
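The try/except in triage_ticket reports bad output but cannot recover from it. If you swap in a model that ignores response_format, a common failure is JSON wrapped in a markdown code fence despite the prompt's instructions. The hedged fallback below strips one surrounding fence before parsing and returns None when the text still is not JSON. parse_model_json is a hypothetical helper, not part of the SDK.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built here to keep the pattern readable
# Matches the whole reply wrapped in an optional ```json (or bare ```) fence.
FENCE_RE = re.compile(r"^\s*" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"\s*$", re.DOTALL)


def parse_model_json(content):
    """Parse a model reply as JSON, tolerating one surrounding markdown fence.

    Returns the parsed object, or None if the reply is empty or not valid JSON.
    """
    if not content:
        return None
    match = FENCE_RE.match(content)
    text = match.group(1) if match else content
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Inside triage_ticket you could call parse_model_json(content) and fall back to the error dict only when it returns None.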
Step 5: Process multiple tickets in one pass
Production queues never arrive one at a time. This loop runs the agent over three tickets that cover billing, API errors, and login trouble. It keeps the original text alongside each result so you have an audit trail. Because Oxlo.ai bills per request rather than per token, a long ticket costs the same as a short one: your invoice is simply the number of tickets processed, not the total word count across them. That is a genuine advantage when you are processing user-generated content that varies wildly in length, and it is one reason I run triage jobs there instead of on token-based providers. Each ticket is also an independent request, so no result depends on another. Note, though, that triage_ticket only guards against unparseable output. A network error or rate-limit exception from the API would still stop the loop. If you need concurrency, you can wrap the calls in a ThreadPoolExecutor.
tickets = [
    "How do I upgrade from the Free plan to Pro?",
    "Your API rejected my key ten times in a row and my production deploy is failing. This is urgent.",
    "I forgot my password and the reset email never arrives. Can you help?",
]
results = []
for t in tickets:
    out = triage_ticket(t)
    # Put the original text first in each record as an audit trail.
    results.append({"original_ticket": t, **out})
print(json.dumps(results, indent=2))
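The Step 5 paragraph suggests a ThreadPoolExecutor for concurrency. Here is one way that might look. triage_many is a hypothetical helper that takes the triage function as an argument, so in practice you would pass triage_ticket. It records any exception per ticket instead of letting it stop the run, and it relies on pool.map to return results in input order. The default of 4 workers is arbitrary; choose max_workers with your provider's rate limits in mind.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def triage_many(tickets: list[str], triage: Callable[[str], dict],
                max_workers: int = 4) -> list[dict]:
    """Run triage over tickets concurrently, keeping results in input order."""

    def run_one(ticket: str) -> dict:
        try:
            result = triage(ticket)
        except Exception as exc:  # e.g. network errors or rate limits from the API
            result = {"error": type(exc).__name__, "detail": str(exc)}
        return {"original_ticket": ticket, **result}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `tickets`, not completion order.
        return list(pool.map(run_one, tickets))
```

Usage: results = triage_many(tickets, triage_ticket).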
Run it
Save the complete script (the Step 4 code followed by the Step 5 loop) as triage.py, insert your Oxlo.ai API key, and run python triage.py. I executed this against Oxlo.ai this morning using llama-3.3-70b. The output below is unedited. Model output varies between runs, but you should see urgency range from low for the plan question to high for the failing production deploy, and the draft replies should match the language and tone of each ticket.
[
  {
    "original_ticket": "How do I upgrade from the Free plan to Pro?",
    "urgency": "low",
    "product_area": "billing",
    "draft_reply": "You can upgrade by visiting the Billing page in your account settings and selecting the Pro plan. Let me know if you need help with anything else."
  },
  {
    "original_ticket": "Your API rejected my key ten times in a row and my production deploy is failing. This is urgent.",
    "urgency": "high",
    "product_area": "api",
    "draft_reply": "I am sorry for the disruption. I have escalated this to our engineering team and will follow up within 15 minutes. In the meantime, please confirm your key has not expired in the dashboard."
  },
  {
    "original_ticket": "I forgot my password and the reset email never arrives. Can you help?",
    "urgency": "medium",
    "product_area": "login",
    "draft_reply": "I can help with that. First, please check your spam folder. If it is not there, I will manually verify your email address and resend the reset link."
  }
]
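Eyeballing three tickets works once. If you rerun the list after changing the prompt or the model, a small check against the labels you expect saves rereading the JSON. The expected labels below are my own reading of the three sample tickets, and they happen to match the output above. check_labels is a hypothetical helper. With only three tickets this is a smoke test, not an evaluation.

```python
EXPECTED = {
    "How do I upgrade from the Free plan to Pro?": ("low", "billing"),
    (
        "Your API rejected my key ten times in a row and my production deploy "
        "is failing. This is urgent."
    ): ("high", "api"),
    "I forgot my password and the reset email never arrives. Can you help?": ("medium", "login"),
}


def check_labels(results: list[dict], expected: dict) -> list[str]:
    """Compare predicted (urgency, product_area) pairs with expected ones; return mismatches."""
    mismatches = []
    for r in results:
        ticket = r.get("original_ticket")
        if ticket not in expected:
            continue  # no label for this ticket, nothing to compare
        got = (r.get("urgency"), r.get("product_area"))
        if got != expected[ticket]:
            mismatches.append(f"{ticket[:40]!r}: expected {expected[ticket]}, got {got}")
    return mismatches
```

An empty list means every labeled ticket matched. Running it once per model string gives you a quick way to compare models on the same tickets.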
Wrap-up
You now have a working LLM agent that turns unstructured support noise into structured decisions and draft replies. The same pattern carries over whether you are classifying documents, extracting entities from server logs, or building a conversational copilot. You made a request, shaped behavior with a system prompt, enforced structure with JSON mode, and wrapped it all in a reusable Python function. Those four ideas are the foundation of almost every LLM application I ship.
Two concrete next steps. First, rerun the same ticket list against kimi-k2.6 or deepseek-v3.2 on Oxlo.ai to compare how different model families handle nuance and tone. Second, upgrade the agent to use function calling so it can query your user database before drafting a reply. Oxlo.ai supports tool use on the same chat completions endpoint, so the request itself only gains a tools parameter. Your code then runs whichever tool the model asks for and sends the result back in a follow-up request. If you are evaluating providers for an internal tool, the flat request model on Oxlo.ai is worth testing side by side with your current token-based bill. For long-context tickets or agentic loops that carry history, per-request pricing can come out significantly cheaper. You can compare models and explore pricing at https://oxlo.ai/pricing.
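For the function-calling step, the tools parameter covers the request side. The other half is your own code, which runs the requested tool and returns its result as a role "tool" message. Below is a sketch of that half. It assumes Oxlo.ai accepts the OpenAI tools format, as it does the rest of the chat completions API. lookup_user and its in-memory data are hypothetical stand-ins for your user database.

```python
import json

# Tool definition in the OpenAI chat completions "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_user",
            "description": "Fetch a customer's plan and account status by email.",
            "parameters": {
                "type": "object",
                "properties": {"email": {"type": "string"}},
                "required": ["email"],
            },
        },
    }
]

# Hypothetical stand-in for a real user database.
FAKE_USERS = {"ana@example.com": {"plan": "Pro", "status": "active"}}


def lookup_user(email: str) -> dict:
    return FAKE_USERS.get(email, {"error": "user not found"})


TOOL_FUNCTIONS = {"lookup_user": lookup_user}


def run_tool_call(call_id: str, name: str, arguments: str) -> dict:
    """Execute one tool call and build the message to append to `messages`.

    `arguments` is the JSON string the model produced for the call.
    """
    func = TOOL_FUNCTIONS.get(name)
    if func is None:
        output = {"error": f"unknown tool {name}"}
    else:
        try:
            output = func(**json.loads(arguments))
        except (json.JSONDecodeError, TypeError) as exc:
            output = {"error": f"bad arguments: {exc}"}
    return {"role": "tool", "tool_call_id": call_id, "content": json.dumps(output)}
```

In the agent, pass tools=TOOLS to client.chat.completions.create. When response.choices[0].message.tool_calls is set, append that assistant message to messages. Then append run_tool_call(call.id, call.function.name, call.function.arguments) for each call, and request another completion so the model can draft the reply using the lookup result.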
