Building Utility Tools with LLMs: A Step-by-Step Guide

I recently shipped a small CLI utility that turns plain English sysadmin requests into explained, safety-checked shell commands. It uses an LLM to translate intent into syntax, then flags destructive operations before they run. I built it on Oxlo.ai because the flat per-request pricing keeps costs predictable even when I stuff the prompt with long log excerpts or few-shot examples. In this guide, I will walk through the exact code so you can adapt it to your own stack.

What you'll need

Before starting, make sure you have Python 3.10 or newer installed. You will also need the OpenAI SDK, which you can install with pip install openai. Finally, grab an API key from the Oxlo.ai portal. The platform is fully OpenAI SDK compatible, so no extra client libraries are required.

Step 1: Bootstrap the Oxlo.ai client

First, instantiate the client pointing at Oxlo.ai. Because Oxlo.ai is fully OpenAI SDK compatible, the only change from a standard OpenAI setup is the base_url. You keep using the same chat.completions.create interface you already know. I use qwen-3-32b here because it handles multilingual reasoning and agent workflows cleanly, though llama-3.3-70b is a solid alternative if you prefer a general-purpose flagship model. Either way, there are no cold starts on popular models, so the first request returns as fast as any other.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

response = client.chat.completions.create(
    model="qwen-3-32b",
    messages=[
        {"role": "user", "content": "Generate a shell command that lists all running Docker containers."},
    ],
)
print(response.choices[0].message.content)
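
Hardcoding the key is fine for a first test, but you will probably want to read it from the environment before committing the script anywhere. The following is a minimal sketch, not part of the original tool: the helper name load_client_settings and the variable name OXLO_API_KEY are my own assumptions, not something the platform requires.

```python
import os

OXLO_BASE_URL = "https://api.oxlo.ai/v1"  # same endpoint used in the client above


def load_client_settings(env=os.environ):
    """Return keyword arguments for OpenAI(...), reading the key from the environment.

    OXLO_API_KEY is an assumed variable name chosen for this sketch.
    """
    api_key = env.get("OXLO_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("Set OXLO_API_KEY before running the script.")
    return {"base_url": OXLO_BASE_URL, "api_key": api_key}


# Usage (requires the openai package):
#   from openai import OpenAI
#   client = OpenAI(**load_client_settings())
```

Failing early with a clear message is usually friendlier than letting the first API call return an authentication error.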

Step 2: Write the system prompt

The system prompt is the safety layer and the schema contract. It forces the model to return strict JSON and assigns a risk rating to every command. Keeping this in a top-level constant makes it easy to iterate without touching business logic. I also explicitly forbid markdown fences in the instructions, though the parser in the next step handles them gracefully if they appear.

SYSTEM_PROMPT = """You are a safe shell command generator.
The user describes a system task in plain English.
Respond with a single JSON object containing exactly these keys:
- command: the exact shell command
- explanation: one sentence describing what it does
- risk: "low", "medium", or "high". Use "high" for any command that deletes, overwrites, or modifies data.

Output only valid JSON. Do not wrap it in markdown fences."""
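
If the model drifts from the schema, few-shot examples are one common way to reinforce it. Here is a rough sketch of how they could be added to the messages list. The build_messages helper and the two example pairs are hypothetical, and the short SYSTEM_PROMPT below is only a stand-in so the snippet runs on its own.

```python
import json

# Stand-in so this sketch is self-contained; in the real script, reuse the
# SYSTEM_PROMPT constant defined above.
SYSTEM_PROMPT = "You are a safe shell command generator. Output only valid JSON."

# Hypothetical example pairs: (plain-English request, expected JSON object)
FEW_SHOT_EXAMPLES = [
    (
        "Show disk usage for the current directory",
        {"command": "du -sh .",
         "explanation": "Summarizes the disk usage of the current directory.",
         "risk": "low"},
    ),
    (
        "Delete the file old.log",
        {"command": "rm old.log",
         "explanation": "Deletes the file old.log.",
         "risk": "high"},
    ),
]


def build_messages(user_request, examples=FEW_SHOT_EXAMPLES):
    """Build a chat message list: system prompt, example turns, then the real request."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for request, answer in examples:
        messages.append({"role": "user", "content": request})
        # Each example answer is serialized exactly the way we want the model to reply
        messages.append({"role": "assistant", "content": json.dumps(answer)})
    messages.append({"role": "user", "content": user_request})
    return messages
```

The returned list can be passed as the messages argument of chat.completions.create in place of a two-message list.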

Step 3: Generate and parse structured output

Next, wire the prompt into a helper that sends the request to Oxlo.ai and parses the response. I strip markdown fences defensively because some instruction-tuned models occasionally wrap JSON in triple backticks despite the system prompt. Because Oxlo.ai's request-based pricing charges a flat cost per API call regardless of prompt length, you can include large system prompts, few-shot examples, or long user context without watching a token meter creep upward. That makes this utility cheap to run even if you later extend it to ingest multi-line log files or large directory listings. For now, the helper only checks that the output parses as JSON and returns the result; json.loads raises an exception if the model returns anything else.

import json

def generate_command(user_request):
    response = client.chat.completions.create(
        model="qwen-3-32b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_request},
        ],
    )
    raw = response.choices[0].message.content.strip()

    # Strip markdown fences if the model produces them
    if raw.startswith("```"):
        raw = raw.removeprefix("```json").removeprefix("```").removesuffix("```").strip()

    return json.loads(raw)

result = generate_command("Find all log files in /var/log modified in the last 24 hours")
print(result)
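
Parsing alone does not guarantee that the three expected keys are present or that risk holds one of the allowed values. The following is a minimal sketch of a stricter parser, assuming you want to reject anything that deviates from the schema in the system prompt. The name parse_model_output and the error messages are my own choices for illustration.

```python
import json

REQUIRED_KEYS = {"command", "explanation", "risk"}
ALLOWED_RISKS = {"low", "medium", "high"}
FENCE = "`" * 3  # a markdown code fence, built this way to keep the source readable


def parse_model_output(raw):
    """Parse the model's reply and enforce the schema described in the system prompt."""
    text = raw.strip()
    # Strip a leading/trailing markdown fence if the model added one
    if text.startswith(FENCE):
        text = text.removeprefix(FENCE + "json").removeprefix(FENCE).removesuffix(FENCE).strip()

    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc

    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object.")
    missing = REQUIRED_KEYS - parsed.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    if parsed["risk"] not in ALLOWED_RISKS:
        raise ValueError(f"Unexpected risk value: {parsed['risk']!r}")
    if not isinstance(parsed["command"], str) or not parsed["command"].strip():
        raise ValueError("Command is empty or not a string.")
    return parsed
```

Raising on a malformed reply means the rest of the script never sees a dictionary it cannot trust, which matters because the risk field drives the confirmation step.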

Step 4: Add the safety gate

Now add the execution wrapper. It prints the command, explanation, and risk level, then requires explicit confirmation for anything rated medium or high. I treat any command that deletes, overwrites, or modifies data as risky by default. This keeps the utility helpful but prevents accidental data loss. You could extend this layer to maintain a blocklist of forbidden binaries, require a second confirmation for recursive rm operations, or log every command to an audit file before execution. Keeping the gate in pure Python means you do not need extra LLM calls for the safety check, which keeps latency low.

import subprocess

def run_with_confirmation(parsed):
    print(f"Command:     {parsed['command']}")
    print(f"Explanation: {parsed['explanation']}")
    print(f"Risk:        {parsed['risk']}")

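    # Fail closed: anything not explicitly rated "low" (including an
    # unexpected value such as "High") requires confirmation
    if parsed['risk'] != "low":
        confirm = input("This command may be destructive. Run anyway? [y/N]: ")
        if confirm.strip().lower() != "y":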
            print("Aborted.")
            return None

    return subprocess.run(parsed['command'], shell=True)

parsed = generate_command("Delete all .tmp files in the current directory")
run_with_confirmation(parsed)
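
The blocklist and the extra check for recursive rm mentioned above could look roughly like this. It is a sketch under stated assumptions: the function name local_risk_check and the contents of BLOCKED_BINARIES are hypothetical, and the check only inspects the first word of the command, so it will not see through pipes, sudo, or commands chained with semicolons.

```python
import shlex
from pathlib import PurePath

# Hypothetical blocklist; adjust it to your own environment
BLOCKED_BINARIES = {"mkfs", "dd", "shutdown", "reboot"}


def local_risk_check(command):
    """Return (blocked, needs_second_confirmation) for a shell command string."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: refuse rather than guess at the meaning
        return True, False
    if not tokens:
        return True, False

    # Compare only the binary name, so /sbin/shutdown matches "shutdown"
    binary = PurePath(tokens[0]).name
    blocked = binary in BLOCKED_BINARIES

    recursive_rm = False
    if binary == "rm":
        for tok in tokens[1:]:
            if tok == "--recursive":
                recursive_rm = True
            elif tok.startswith("-") and not tok.startswith("--") and ("r" in tok or "R" in tok):
                # Short option cluster such as -r, -R, or -rf
                recursive_rm = True
    return blocked, recursive_rm
```

A caller could abort outright when the first value is true and ask a second question when the second one is. Because this runs locally, it does not depend on the model having rated the command correctly.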

Step 5: Wrap it in a CLI

Finally, add argument parsing so you can call the script from anywhere without editing the source. I use sys.argv to keep dependencies minimal, but you could swap in argparse or click later if you need flags or subcommands. At this point the tool is a single file that you can drop into /usr/local/bin or a project bin directory.

if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        print('Usage: python shell_helper.py "<plain English request>"')
        sys.exit(1)

    request = " ".join(sys.argv[1:])
    parsed = generate_command(request)
    run_with_confirmation(parsed)
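
If you do outgrow sys.argv, the argparse version might look like the sketch below. The --model flag is only an example of the kind of option you might add. It is not part of the script above, and using it would mean giving generate_command a model parameter, which is an assumption on my part.

```python
import argparse


def build_parser():
    """Build the command-line parser for the helper script."""
    parser = argparse.ArgumentParser(
        prog="shell_helper.py",
        description="Translate a plain-English request into a shell command.",
    )
    # One or more words; they are joined back into a single request string
    parser.add_argument("request", nargs="+", help="the task, in plain English")
    # Hypothetical flag for switching models without editing the source
    parser.add_argument("--model", default="qwen-3-32b", help="model name to send to the API")
    return parser


def parse_cli(argv=None):
    """Return (request, model) parsed from argv, or from sys.argv when argv is None."""
    args = build_parser().parse_args(argv)
    return " ".join(args.request), args.model
```

argparse also generates the usage message and --help output for you, so the manual length check on sys.argv is no longer needed.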

Run it

Save the full script as shell_helper.py, make sure your Oxlo.ai API key is set in place of the YOUR_OXLO_API_KEY placeholder, and try a low-risk query followed by a destructive one. Here is what the interaction looks like in my terminal. The low-risk command executes immediately, while the high-risk command pauses for human confirmation.

$ python shell_helper.py "find Python files larger than 1MB"
Command:     find . -name "*.py" -size +1M
Explanation: Recursively searches for Python files over 1MB.
Risk:        low
(execution proceeds)

$ python shell_helper.py "remove every file in /tmp"
Command:     rm -rf /tmp/*
Explanation: Deletes all files in the /tmp directory.
Risk:        high
This command may be destructive. Run anyway? [y/N]: n
Aborted.

Wrap up

The utility is already useful, but two improvements make it production-ready. First, add a --dry-run flag that prints the command without invoking subprocess, which is helpful for auditing outputs on sensitive systems before you apply changes. Second, integrate Oxlo.ai's function calling support to chain additional validation steps, such as checking disk space or confirming file backups, before a high-risk operation is approved. Both extensions are straightforward because the OpenAI-compatible client already supports tools and streaming on Oxlo.ai. If you want to experiment with stronger reasoning models for more complex multi-step sysadmin tasks, swap qwen-3-32b for kimi-k2.6 or deepseek-v3.2 without changing any other code.
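
The dry-run idea is the simpler of the two, so here is a minimal sketch of it. The execute function and its dry_run parameter are hypothetical names. In the real script this logic would replace the final subprocess.run call inside run_with_confirmation, with the flag wired up to the command line.

```python
import subprocess


def execute(parsed, dry_run=False):
    """Run the generated command, or only print it when dry_run is set."""
    if dry_run:
        # Nothing is executed; the command is shown for review only
        print(f"[dry-run] {parsed['command']}")
        return None
    return subprocess.run(parsed["command"], shell=True)
```

Returning None in dry-run mode mirrors what run_with_confirmation already does when the user aborts, so callers can treat both cases the same way.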
