LLMs for Natural Language Understanding: A Comprehensive Guide

If you run a SaaS product with more than a few hundred users, manual support triage becomes a bottleneck. I am going to build a production-grade ticket NLU agent that reads raw customer messages and emits structured JSON for downstream routing and analytics. I run the entire pipeline on Oxlo.ai, which bills per request, so every call, from a one-line question to a 5,000-word complaint thread, costs the same flat amount. You can verify the request-based rates at https://oxlo.ai/pricing.

What you'll need

  • Python 3.10 or newer installed locally.
  • The OpenAI SDK. Install it with pip install openai.
  • An Oxlo.ai API key from https://portal.oxlo.ai. Export it as OXLO_API_KEY in your shell before running the script.

Oxlo.ai exposes a fully OpenAI-compatible API, so every code block below uses the standard client without vendor-specific adapters.

Step 1: Scaffold the project and authenticate with Oxlo.ai

I create a single file named ticket_nlu.py and import the OpenAI client. I read the API key from an environment variable because hardcoding secrets is a habit I refuse to ship. Pointing base_url to https://api.oxlo.ai/v1 and firing a quick health check to llama-3.3-70b proves the connection is live, and because Oxlo.ai keeps popular models warm there are no cold starts on the first request. I default to llama-3.3-70b because it is Oxlo.ai's general-purpose flagship and handles structured instructions without drift.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ.get("OXLO_API_KEY"),
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": "Say hello and confirm you are online."},
    ],
)

print(response.choices[0].message.content)
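
Because os.environ.get returns None when the variable is unset, a missing key may only surface later as a client or authentication error. Here is a minimal sketch of a fail-fast check. load_api_key is a hypothetical helper of my own naming, not part of the OpenAI SDK or of Oxlo.ai.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ, name: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if it is missing.

    Hypothetical helper: the name and the error message are illustrative only.
    """
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running ticket_nlu.py")
    return key
```

To use it, pass api_key=load_api_key() to the OpenAI constructor in place of os.environ.get("OXLO_API_KEY").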

Step 2: Lock down the schema with a strict system prompt

The reliability of this pipeline depends on the system prompt acting as a schema contract. I define exactly five output fields, with enumerated values where possible, and I force null when data is missing so the JSON shape never changes. I iterated on this prompt against twenty edge cases, including empty product names and angry run-on sentences, before locking it. Keeping the prompt in a module-level constant makes it easy to diff in version control as the schema evolves.

SYSTEM_PROMPT = """You are an NLU engine for customer support tickets.
Extract the following fields from the user's message and return valid JSON only:
- intent: one of [refund, technical_issue, billing_question, general_inquiry]
- entities: object with order_id (string or null) and product (string or null)
- sentiment: one of [angry, frustrated, neutral, happy]
- urgency: integer 1 to 5
- summary: max 20 words

Rules:
- If no order_id is present, use null.
- If the product is implied but not explicitly named, use null.
- Respond only with the JSON object. Do not wrap it in markdown fences."""
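
A prompt can describe the contract, but it cannot enforce it, so it may help to check each parsed payload against the same five fields before anything downstream consumes it. The sketch below is one possible validator. validate_ticket and the two constant sets are hypothetical names, and the rules simply mirror the prompt above.

```python
ALLOWED_INTENTS = {"refund", "technical_issue", "billing_question", "general_inquiry"}
ALLOWED_SENTIMENTS = {"angry", "frustrated", "neutral", "happy"}


def validate_ticket(parsed: dict) -> list[str]:
    """Return the names of fields that violate the schema; an empty list means valid."""
    problems = []

    intent = parsed.get("intent")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        problems.append("intent")

    entities = parsed.get("entities")
    if not isinstance(entities, dict):
        problems.append("entities")
    else:
        # Both keys must be present, each holding either a string or null (None).
        for key in ("order_id", "product"):
            if key not in entities or not (entities[key] is None or isinstance(entities[key], str)):
                problems.append(f"entities.{key}")

    sentiment = parsed.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        problems.append("sentiment")

    urgency = parsed.get("urgency")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(urgency, bool) or not isinstance(urgency, int) or not 1 <= urgency <= 5:
        problems.append("urgency")

    summary = parsed.get("summary")
    if not isinstance(summary, str) or len(summary.split()) > 20:
        problems.append("summary")

    return problems
```

Returning a list of field names rather than raising keeps the decision (retry, log, or send to manual review) with the caller.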

Step 3: Extract structured data using JSON mode

I wrap the chat completion in a parse_ticket function that uses Oxlo.ai's JSON mode via response_format. This removes most of the need for brittle regex parsers or output retries. I use llama-3.3-70b here because it is fast, handles English support tickets reliably, and on Oxlo.ai the cost is identical whether the user message is ten words or ten thousand. Even with JSON mode I still wrap json.loads in a try block, so a malformed response comes back as an error payload instead of crashing the pipeline.

import json

def parse_ticket(text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    raw = response.choices[0].message.content
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "invalid_json", "raw": raw}
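
If the invalid_json payload does turn up in practice, a bounded retry is one option. The sketch below takes the parser as an argument so it can be exercised without a network call. parse_with_retry and its attempts default are assumptions of mine, and each retry is a separate billed request.

```python
from typing import Callable


def parse_with_retry(parse: Callable[[str], dict], text: str, attempts: int = 2) -> dict:
    """Call parse(text) up to `attempts` times and return the first result without an "error" key.

    If every attempt fails, the last error payload is returned unchanged.
    """
    result: dict = {"error": "not_attempted"}
    for _ in range(max(1, attempts)):
        result = parse(text)
        if "error" not in result:
            return result
    return result
```

A call would look like parse_with_retry(parse_ticket, ticket_text).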

Step 4: Route multilingual and long-form tickets to a stronger model

Not every ticket is a short English paragraph. When a message exceeds 800 characters, I escalate it to qwen-3-32b, which was built for multilingual reasoning and agent workflows. Note that the router below checks length only. It does not detect language, so a short non-English ticket still goes to llama-3.3-70b unless you add a language check of your own. On token-based providers this escalation would be financially risky on long rants, but Oxlo.ai's per-request pricing means the length of the complaint does not inflate the bill. You can tune the 800-character threshold based on your own ticket distribution. I picked it after plotting a histogram of our ticket lengths.

def parse_ticket_smart(text: str) -> dict:
    # Route on length only: tickets over 800 characters go to the multilingual model.
    if len(text) > 800:
        model = "qwen-3-32b"
    else:
        model = "llama-3.3-70b"

    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
        response_format={"type": "json_object"},
    )
    raw = response.choices[0].message.content
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"error": "invalid_json", "raw": raw}
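
The router above branches on length alone. One rough way to add a language signal, without pulling in a detection library, is to measure the share of non-ASCII letters. This is only a sketch: choose_model and the 5% cut-off are hypothetical, and the heuristic misses non-English text written in plain ASCII (for example Spanish typed without accents).

```python
LONG_TICKET_CHARS = 800   # length threshold used in the article
NON_ASCII_RATIO = 0.05    # hypothetical cut-off; tune on your own tickets


def choose_model(text: str) -> str:
    """Pick a model name from ticket length and a crude non-ASCII letter ratio."""
    if len(text) > LONG_TICKET_CHARS:
        return "qwen-3-32b"

    letters = [ch for ch in text if ch.isalpha()]
    if letters:
        non_ascii = sum(1 for ch in letters if not ch.isascii())
        if non_ascii / len(letters) > NON_ASCII_RATIO:
            return "qwen-3-32b"

    return "llama-3.3-70b"
```

Inside parse_ticket_smart, the if/else that sets model could then become model = choose_model(text). A dedicated language-identification library would be more accurate if misrouting turns out to matter.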

Step 5: Batch process a realistic ticket queue

Production workloads rarely arrive one at a time. I define a list of three representative tickets covering a billing problem, a Spanish-language delivery query, and a technical bug. Because Oxlo.ai bills per request, I can estimate a month's NLU spend by multiplying expected ticket volume by the flat rate, with no surprises when a viral product launch generates verbose complaints. In a real deployment I would swap this list for a Redis queue or Kafka consumer, but the core loop is identical.

tickets = [
    "I was charged twice for order #99821 on my credit card and I need this fixed immediately.",
    "Hola, mi pedido numero 7742 no ha llegado y no tengo numero de rastreo. Necesito ayuda por favor.",
    "The app keeps crashing when I click export. This is making my team lose an entire day of work and I am extremely frustrated.",
]
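
The spend estimate mentioned above is a single multiplication. In the sketch below the per-request price is a placeholder argument, not Oxlo.ai's actual rate (check the pricing page for that), and retry_rate is an assumption added to account for any requests you choose to repeat.

```python
def monthly_nlu_cost(tickets_per_month: int, price_per_request: float, retry_rate: float = 0.0) -> float:
    """Estimate monthly spend under flat per-request pricing.

    price_per_request is a placeholder; substitute the rate from the provider's pricing page.
    retry_rate is the fraction of tickets that trigger one extra request (0.1 means 10%).
    """
    requests = tickets_per_month * (1 + retry_rate)
    return round(requests * price_per_request, 2)
```

With a made-up price of 0.002 per request, 50,000 tickets a month comes to 100.0 in whatever currency the price is quoted in, regardless of how long the tickets are.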

Run it

I execute python ticket_nlu.py from the terminal. The script classifies every ticket in the queue and prints clean JSON that any CRM, webhook, or SQL database can ingest without further cleaning. Notice that the Spanish ticket is handled without a separate translation layer, and the technical ticket correctly flags the app as the affected product.

if __name__ == "__main__":
    for ticket in tickets:
        parsed = parse_ticket_smart(ticket)
        print(json.dumps(parsed, indent=2))
        print("---")

Example output:

{
  "intent": "billing_question",
  "entities": {
    "order_id": "99821",
    "product": null
  },
  "sentiment": "frustrated",
  "urgency": 4,
  "summary": "Double charge on order 99821"
}
---
{
  "intent": "general_inquiry",
  "entities": {
    "order_id": "7742",
    "product": null
  },
  "sentiment": "neutral",
  "urgency": 3,
  "summary": "Missing order 7742 without tracking"
}
---
{
  "intent": "technical_issue",
  "entities": {
    "order_id": null,
    "product": "app"
  },
  "sentiment": "angry",
  "urgency": 5,
  "summary": "App crashes on export button"
}
---
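
Once the JSON has this shape, downstream routing can be a plain function over the parsed dict. The sketch below is one possible policy. The queue names and the urgency threshold are hypothetical and would come from your own support setup.

```python
# Hypothetical mapping from intent to a team queue.
INTENT_QUEUES = {
    "refund": "billing",
    "billing_question": "billing",
    "technical_issue": "engineering",
}


def pick_queue(parsed: dict) -> str:
    """Choose a destination queue for one parsed ticket."""
    # Parser failures go to a human instead of being guessed at.
    if "error" in parsed:
        return "manual_review"

    urgency = parsed.get("urgency")
    is_urgent = isinstance(urgency, int) and not isinstance(urgency, bool) and urgency >= 4
    if is_urgent or parsed.get("sentiment") == "angry":
        return "priority"

    intent = parsed.get("intent")
    if isinstance(intent, str):
        return INTENT_QUEUES.get(intent, "general")
    return "general"
```

Applied to the three example outputs, this policy would send the double charge and the export crash to the priority queue and the missing order to the general queue.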

Wrap-up and next steps

The agent is now a solid preprocessing layer. Three concrete upgrades come to mind. First, generate embeddings for each summary using Oxlo.ai's BGE-Large endpoint and store them in a vector database so you can surface similar historical tickets automatically. Second, wire the JSON output into Oxlo.ai's function calling feature to open refund or bug tickets directly through your existing API. Third, if you need deeper reasoning on ambiguous cases, swap the router target to kimi-k2.6 or deepseek-v3.2 without changing any other code. All three stay inside the same OpenAI-compatible client, so your integration layer does not change.
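
To make the similar-ticket idea concrete, here is a minimal in-memory sketch of the lookup step using cosine similarity. It assumes the embedding vectors have already been produced elsewhere and does not call any embeddings endpoint. cosine and most_similar are hypothetical helpers, and a real deployment would hand this work to the vector database.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], history: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k historical ticket ids with the highest cosine similarity to the query."""
    scored = [(ticket_id, cosine(query, vector)) for ticket_id, vector in history.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Here history maps a ticket id to its stored summary embedding, and query is the embedding of the new ticket's summary.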
