LLM for Natural Language Processing: A Deep Dive

Support teams drown in tickets that vary wildly in tone and urgency. I built a small NLP pipeline that reads an incoming message, classifies its intent, and drafts a first response in one shot. It runs entirely on Oxlo.ai, and because the platform bills per request instead of per token, I can pass the full system prompt and long ticket threads without watching a meter run.

What you'll need

Step 1: Design the system prompt

I need the model to act like a structured backend service, not a chatbot. The prompt below pins the output format to JSON and restricts the allowed values for the classification fields. I also ask for a polite, helpful draft_reply so human agents do not need to rewrite it from scratch. Giving the model a closed vocabulary for sentiment, urgency, and category makes downstream routing trivial: I can send billing tickets to the finance queue and critical items to the on-call channel without any fuzzy matching.

SYSTEM_PROMPT = """You are a support ticket triage assistant.
Analyze the user's message and return a JSON object with exactly these keys:
- sentiment: one of [frustrated, confused, neutral, happy]
- urgency: one of [low, medium, high, critical]
- category: one of [billing, technical, account, general]
- summary: a one-sentence description of the problem
- draft_reply: a polite, helpful first response

Output only valid JSON. Do not wrap it in markdown fences."""
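
A closed vocabulary only helps routing if the response is checked against it before anything is dispatched. The following is a minimal sketch of that check, assuming the allowed values from the prompt above. The helper names validate_triage and route, and the queue names, are hypothetical and used only for illustration.

```python
# Allowed values mirror the lists in SYSTEM_PROMPT.
ALLOWED = {
    "sentiment": {"frustrated", "confused", "neutral", "happy"},
    "urgency": {"low", "medium", "high", "critical"},
    "category": {"billing", "technical", "account", "general"},
}
FREE_TEXT_KEYS = ("summary", "draft_reply")

# Hypothetical queue names, for illustration only.
CATEGORY_QUEUES = {
    "billing": "finance-queue",
    "technical": "engineering-queue",
    "account": "accounts-queue",
    "general": "general-queue",
}


def validate_triage(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the result is usable."""
    problems = []
    for key, allowed in ALLOWED.items():
        value = result.get(key)
        # Reject non-strings first so unhashable values never reach the set lookup.
        if not isinstance(value, str) or value not in allowed:
            problems.append(f"{key}: unexpected value {value!r}")
    for key in FREE_TEXT_KEYS:
        value = result.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{key}: missing or empty")
    return problems


def route(result: dict) -> str:
    """Pick a destination by exact match; call only on validated results."""
    if result["urgency"] == "critical":
        return "on-call"
    return CATEGORY_QUEUES[result["category"]]
```

Running validate_triage before route keeps the routing step a plain dictionary lookup: anything outside the vocabulary is reported as a problem instead of surfacing later as a KeyError.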

Step 2: Initialize the Oxlo.ai client

Oxlo.ai exposes an OpenAI-compatible endpoint, so the only difference from a standard OpenAI setup is the base URL. I import the SDK and set the key. If you are already using OpenAI elsewhere, this is a one-line change. The client handles streaming, function calling, and JSON mode exactly like the standard SDK, which means existing tutorials and middleware work without patches.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
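
A hardcoded placeholder is fine for a demo, but outside of one the key usually comes from the environment. Here is a small sketch of that, assuming the key lives in a variable named OXLO_API_KEY. That variable name and the client_kwargs helper are my own choices, not something the platform prescribes.

```python
import os

OXLO_BASE_URL = "https://api.oxlo.ai/v1"  # same base URL as in the snippet above


def client_kwargs(env_var: str = "OXLO_API_KEY") -> dict:
    """Build the keyword arguments for OpenAI(...) from the environment."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of on the first request.
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"base_url": OXLO_BASE_URL, "api_key": api_key}
```

The client is then created with client = OpenAI(**client_kwargs()), which keeps the key out of version control.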

Step 3: Build the classification call

Now I wire the prompt to Llama 3.3 70B. I chose this model because it handles instruction following well for English support tickets, but you could switch to qwen-3-32b for multilingual queues or deepseek-v3.2 for heavier reasoning without touching the client code. Since Oxlo.ai does not charge per token, I do not need to strip out pleasantries from the system prompt to save money. I parse the JSON response defensively because LLMs occasionally add markdown fences even when told not to. A small guard inside the function strips those fences before json.loads runs.

import json

def triage_ticket(user_message: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
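    # content can be None, so fall back to an empty string before stripping
    raw = (response.choices[0].message.content or "").strip()
    if raw.startswith("```"):
        # Drop the opening fence line (it may carry a language tag) and the closing fence
        raw = raw.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    return json.loads(raw)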
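
The fence handling inside triage_ticket can also be pulled out into a standalone helper that is easy to unit test. This is a sketch under my own assumptions: the name extract_json is hypothetical, and the fallback to the outermost braces is an extra heuristic that the original function does not have.

```python
import json
import re

# Opening fence with an optional language tag, the body, then the closing fence.
_FENCE_RE = re.compile(r"^```[a-zA-Z0-9_-]*\s*\n(.*?)\n?```\s*$", re.DOTALL)


def extract_json(raw: str) -> dict:
    """Parse a model response into a dict, tolerating fences and stray prose."""
    text = raw.strip()
    match = _FENCE_RE.match(text)
    if match:
        text = match.group(1).strip()
    else:
        # Heuristic: keep only the span from the first "{" to the last "}".
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start : end + 1]
    parsed = json.loads(text)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object")
    return parsed
```

Because json.JSONDecodeError is a subclass of ValueError, callers can catch ValueError alone to cover both malformed JSON and a response that is valid JSON but not an object.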

Step 4: Add batch processing

In a real queue, tickets arrive in bursts. I wrap the single-ticket function in a loop that collects results and catches any exception, including JSON parse errors and failed requests, so one bad response does not crash the whole batch. This is where flat per-request pricing matters. If I were paying per token, I would be tempted to truncate long tickets to save money. On Oxlo.ai, I pass the full text every time, which means I keep all the context that might reveal whether an issue is truly critical.

def process_queue(tickets: list[str]) -> list[dict]:
    results = []
    for ticket in tickets:
        try:
            result = triage_ticket(ticket)
            results.append(result)
        except Exception as e:
            results.append({"error": str(e), "raw_ticket": ticket})
    return results
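
When a burst is large, the sequential loop spends most of its time waiting on the network. One possible variation, sketched here with the standard library thread pool, runs several requests at once. The function name process_queue_concurrently and the max_workers default are assumptions, and I have not checked what concurrency the platform's rate limits allow, so treat the worker count as a knob to tune.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def process_queue_concurrently(
    tickets: list[str],
    triage: Callable[[str], dict],
    max_workers: int = 4,
) -> list[dict]:
    """Triage tickets in parallel threads, keeping the same error shape as process_queue."""

    def safe_triage(ticket: str) -> dict:
        try:
            return triage(ticket)
        except Exception as e:
            return {"error": str(e), "raw_ticket": ticket}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so report numbering stays stable.
        return list(pool.map(safe_triage, tickets))
```

Passing the triage function in as an argument, for example process_queue_concurrently(tickets, triage_ticket), also makes it easy to swap in a stub when testing without network access.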

Step 5: Review and send

I keep a human in the loop. The pipeline prints a short report that a support lead can scan in seconds. The draft_reply is ready to be piped into a CRM or a Slack channel for approval before it reaches the customer. Separating triage from delivery also means I can log the structured fields to a database for analytics. Over time, those logs become training data for finer-grained classifiers.

def print_report(results: list[dict]):
    for idx, r in enumerate(results, 1):
        if "error" in r:
            print(f"Ticket {idx}: FAILED - {r['error']}")
            continue
        print(f"Ticket {idx}")
        print(f"  Category : {r['category']}")
        print(f"  Urgency  : {r['urgency']}")
        print(f"  Sentiment: {r['sentiment']}")
        print(f"  Summary  : {r['summary']}")
        print(f"  Draft    : {r['draft_reply'][:120]}...")
        print()
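
Logging the structured fields for analytics can start as small as a single SQLite table. The sketch below assumes a table named triage_log and a helper named log_results. Both names and the schema are hypothetical, and a production setup would likely use whatever database the team already runs.

```python
import sqlite3

FIELDS = ("category", "urgency", "sentiment", "summary", "draft_reply")


def log_results(conn: sqlite3.Connection, results: list[dict]) -> int:
    """Insert successful triage results and return how many rows were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triage_log ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "created_at TEXT DEFAULT CURRENT_TIMESTAMP, "
        "category TEXT, urgency TEXT, sentiment TEXT, "
        "summary TEXT, draft_reply TEXT)"
    )
    # Skip failed tickets; missing keys are stored as NULL.
    rows = [
        tuple(r.get(f) for f in FIELDS)
        for r in results
        if "error" not in r
    ]
    conn.executemany(
        "INSERT INTO triage_log "
        "(category, urgency, sentiment, summary, draft_reply) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

A call such as log_results(sqlite3.connect("triage.db"), report) after print_report would persist each run, and a later query grouped by category and urgency gives a first view of ticket volume.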

Run it

Here is the complete entry point. I feed the pipeline three tickets that cover billing frustration, technical confusion, and positive feedback. The example output that follows is what I get back on the first run. Notice how the model adapts its tone from apologetic to instructional to enthusiastic based on the sentiment it detected.

if __name__ == "__main__":
    tickets = [
        "I was charged twice last month and I need a refund immediately. This is unacceptable.",
        "How do I reset my webhook URL? The docs mention a settings page but I cannot find it.",
        "Love the new dashboard. Just wanted to say thanks!"
    ]

    report = process_queue(tickets)
    print_report(report)

Example output:

Ticket 1
  Category : billing
  Urgency  : high
  Sentiment: frustrated
  Summary  : Customer was double-charged and is requesting a refund.
  Draft    : I am sorry to see that you were charged twice. I have escalated this to our billing team and you will see the refund within 2-3 business days...

Ticket 2
  Category : technical
  Urgency  : medium
  Sentiment: confused
  Summary  : Customer cannot locate the webhook settings page to reset their URL.
  Draft    : No problem. You can reset your webhook URL under Project Settings > Integrations > Webhooks. Here is a direct link to that section...

Ticket 3
  Category : general
  Urgency  : low
  Sentiment: happy
  Summary  : Customer is expressing appreciation for the new dashboard.
  Draft    : Thank you so much for the kind words. We are thrilled you are enjoying the new dashboard...

What to build next

Turn this script into a FastAPI endpoint that listens for webhooks from your helpdesk software. You could also add a second Oxlo.ai call that rewrites the draft_reply in a different tone if the sentiment detector flags the ticket as frustrated. Because every call costs the same regardless of prompt length, you can chain multiple models or steps without the bill exploding. If you need higher throughput or dedicated capacity, the Enterprise tier offers dedicated GPUs and a guaranteed discount over your current provider. Check out the exact plan details at https://oxlo.ai/pricing.
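
The tone-rewrite idea can be sketched without committing to a particular model. In the sketch below the second call is injected as a function, so the gating logic is testable on its own. The name soften_if_frustrated and the wording of REWRITE_INSTRUCTION are hypothetical.

```python
from typing import Callable

# Hypothetical instruction for the second call; adjust the wording to your brand voice.
REWRITE_INSTRUCTION = (
    "Rewrite this support reply so it is more empathetic. "
    "Keep every factual statement unchanged."
)


def soften_if_frustrated(result: dict, rewrite: Callable[[str, str], str]) -> dict:
    """Return a copy with a rewritten draft_reply when sentiment is frustrated."""
    if result.get("sentiment") != "frustrated":
        return result
    updated = dict(result)  # leave the original triage result untouched
    updated["draft_reply"] = rewrite(REWRITE_INSTRUCTION, result["draft_reply"])
    return updated
```

In practice the rewrite argument would be a thin wrapper around client.chat.completions.create that sends the instruction as the system message and the draft as the user message, then returns the response text.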
