We are building a CLI agent that ingests CI/CD failure logs and returns a structured diagnosis: root cause, likely file, and suggested fix. If you have ever stared at 5,000 lines of Gradle or Kubernetes output, this saves you the scrolling. The whole thing is under 80 lines of Python, costs a flat per-request fee on Oxlo.ai, and drops into your existing toolchain without any wrapper libraries.

What you'll need

Before we start, make sure you have the following ready.

Python 3.10 or newer installed locally
The OpenAI SDK installed with pip install openai
An Oxlo.ai API key from https://portal.oxlo.ai
A sample build log (any text file from a failed CI run will do)

Step 1: Read the log

I start with a minimal CLI that takes one argument, the path to the log file. I read it as a plain string and ignore encoding errors because CI output often contains odd byte sequences from progress bars, ANSI color codes, or truncated Unicode. We will trim and send it to the model in later steps. Keeping the file I/O separate from the LLM call makes the script easier to unit test, and I use argparse instead of sys.argv so the script is self-documenting when you pass --help.

import argparse

def main():
    parser = argparse.ArgumentParser(description="Diagnose a CI/CD failure log")
    parser.add_argument("logfile", help="Path to the failure log")
    args = parser.parse_args()

    with open(args.logfile, "r", encoding="utf-8", errors="ignore") as f:
        raw_log = f.read()

    print(f"Read {len(raw_log)} characters from {args.logfile}")

if __name__ == "__main__":
    main()
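
If the ANSI color codes and progress-bar redraws mentioned above add too much noise, one option is to strip them before the log goes anywhere else. The following is a minimal sketch, not part of the original script: strip_ansi is a hypothetical helper name, and the pattern only covers CSI escape sequences (the common color and cursor codes), not every terminal escape.

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (color) or "\x1b[2K" (erase line).
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text: str) -> str:
    """Remove CSI escape sequences and collapse carriage-return redraws.

    Hypothetical helper for illustration only.
    """
    cleaned = ANSI_CSI.sub("", text)
    lines = []
    for line in cleaned.split("\n"):
        # Drop a trailing "\r" left over from Windows line endings.
        line = line.rstrip("\r")
        # A progress bar rewrites the same line with "\r"; keep the last state.
        lines.append(line.rsplit("\r", 1)[-1])
    return "\n".join(lines)
```

Calling raw_log = strip_ansi(raw_log) right after the file is read would keep the rest of the script unchanged.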

Step 2: Write the system prompt

The system prompt is the contract. I tell the model exactly what hat to wear and what output shape I expect. Keeping the format rigid makes the result parseable if I later want to extract fields automatically with a regex or a small Pydantic model. I also forbid unnecessary markdown so the output stays clean in a terminal. If you prefer JSON, you could add "Respond in valid JSON" and switch on JSON mode in the SDK, but plain text is easier to read during local debugging. I keep the default temperature because the prompt is specific enough to constrain the output.

SYSTEM_PROMPT = """You are a senior site reliability engineer reviewing CI/CD logs.
Analyze the provided log excerpt and produce a concise diagnosis in this exact format:
- Root Cause: ...
- Likely File: ...
- Suggested Fix: ...
If the log is truncated or ambiguous, state what additional information you need.
Do not include markdown code blocks unless you are showing a specific file patch."""
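
Because the format is rigid, the regex extraction mentioned above can stay small. This is a sketch under two assumptions: the model follows the three-field format exactly, and each field fits on one line. parse_diagnosis and FIELD_PATTERN are hypothetical names, not part of the original script.

```python
import re

# One field per line, in the shape requested by the system prompt.
FIELD_PATTERN = re.compile(
    r"^-[ \t]*(Root Cause|Likely File|Suggested Fix):[ \t]*(.*)$",
    re.MULTILINE,
)


def parse_diagnosis(text: str) -> dict[str, str]:
    """Return the fields found in the model output.

    Fields the model omitted are simply absent from the result, and
    only the first line of a multi-line value is captured.
    """
    return {name: value.strip() for name, value in FIELD_PATTERN.findall(text)}
```

If the model asks for more information instead of filling the fields, the result is an empty dict, which is an easy condition to check before posting anything downstream.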

Step 3: Connect to Oxlo.ai

Oxlo.ai exposes an OpenAI-compatible endpoint, so I can use the official SDK without adapter code or custom HTTP clients. I instantiate the client with the Oxlo.ai base URL and my project key. I default to llama-3.3-70b because it is a strong general-purpose model, but you can swap in deepseek-v3.2 or kimi-k2.6 for deeper reasoning without changing any other code. That swap is a one-line edit, which is the point of using a standard interface. I wrap the call in a small function so the rest of the script does not need to know about the transport layer.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def diagnose(log_text: str) -> str:
    user_message = log_text
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content
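
The placeholder key above is fine for a first run, but you may not want a real key sitting in the source file. A minimal sketch of reading it from the environment follows. OXLO_API_KEY and OXLO_MODEL are variable names I made up for this example, not names defined by Oxlo.ai, and client_settings and model_name are hypothetical helpers.

```python
import os
from typing import Mapping

OXLO_BASE_URL = "https://api.oxlo.ai/v1"  # endpoint used throughout this tutorial
DEFAULT_MODEL = "llama-3.3-70b"


def client_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build keyword arguments for OpenAI(...) from environment variables.

    OXLO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = env.get("OXLO_API_KEY")
    if not api_key:
        raise RuntimeError("Set OXLO_API_KEY before running the script")
    return {"base_url": OXLO_BASE_URL, "api_key": api_key}


def model_name(env: Mapping[str, str] = os.environ) -> str:
    """Pick the model from OXLO_MODEL (hypothetical), falling back to the default."""
    return env.get("OXLO_MODEL", DEFAULT_MODEL)


# Usage, with the openai package installed:
#   client = OpenAI(**client_settings())
#   ... model=model_name() in the chat.completions.create call
```

With this in place, swapping models becomes an environment change rather than a code edit.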

Step 4: Handle large logs

CI logs can balloon to hundreds of thousands of characters. In practice, the error is almost always near the end of the log, so I keep a trailing window rather than summarizing upfront or chunking across multiple requests, both of which add latency and state. A single request keeps the architecture simple. Because Oxlo.ai uses request-based pricing, sending a large prompt does not inflate the cost the way token-based billing would. You pay a flat fee per request, which makes long-context diagnosis cheap and predictable. See the exact rates on the Oxlo.ai pricing page. I set the window to 12,000 characters, which is usually enough to capture the failing command plus the surrounding stack trace.

MAX_LOG_CHARS = 12000

def truncate_log(text: str) -> str:
    if len(text) <= MAX_LOG_CHARS:
        return text
    return text[-MAX_LOG_CHARS:]
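
A plain character slice can start the window in the middle of a line, which leaves the model with a fragment at the top. One possible variant, sketched here as a hypothetical truncate_on_line_boundary function rather than a replacement for the version above, drops that leading fragment so the excerpt begins at a line start.

```python
def truncate_on_line_boundary(text: str, max_chars: int = 12000) -> str:
    """Keep at most max_chars from the end, starting at the beginning of a line."""
    if len(text) <= max_chars:
        return text
    tail = text[-max_chars:]
    # The cut already landed on a line start, so there is nothing to trim.
    if text[-max_chars - 1] == "\n":
        return tail
    newline = tail.find("\n")
    # The window holds a single (partial) line: keep it rather than return nothing.
    if newline == -1 or newline == len(tail) - 1:
        return tail
    # Drop the partial first line.
    return tail[newline + 1:]
```

The trade-off is that the returned excerpt can be shorter than max_chars by up to one line.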

Step 5: Assemble the CLI

Now I wire everything together. I read the file, trim it, pass it to the model, and print the result. The script stays stateless, so you can alias it in your shell or drop it into a GitHub Actions workflow later without worrying about leftover context from previous runs. I also print the character count of the raw file. Keep in mind that diagnose trims its input, so the model sees at most MAX_LOG_CHARS of those characters. If you want to batch process an entire directory of logs, the stateless design means you just loop over the files and call diagnose each time. I keep the import block at the top so the script is easy to read top-to-bottom, and the truncate function lives outside the diagnose function so I can test it independently with a one-liner in the REPL.

import argparse
from openai import OpenAI

SYSTEM_PROMPT = """You are a senior site reliability engineer reviewing CI/CD logs.
Analyze the provided log excerpt and produce a concise diagnosis in this exact format:
- Root Cause: ...
- Likely File: ...
- Suggested Fix: ...
If the log is truncated or ambiguous, state what additional information you need.
Do not include markdown code blocks unless you are showing a specific file patch."""

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
MAX_LOG_CHARS = 12000

def truncate_log(text: str) -> str:
    if len(text) <= MAX_LOG_CHARS:
        return text
    return text[-MAX_LOG_CHARS:]

def diagnose(log_text: str) -> str:
    user_message = truncate_log(log_text)
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

def main():
    parser = argparse.ArgumentParser(description="Diagnose a CI/CD failure log")
    parser.add_argument("logfile", help="Path to the failure log")
    args = parser.parse_args()

    with open(args.logfile, "r", encoding="utf-8", errors="ignore") as f:
        raw_log = f.read()

    print(f"Read {len(raw_log)} characters. Diagnosing...\n")
    result = diagnose(raw_log)
    print(result)

if __name__ == "__main__":
    main()
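
The batch idea mentioned above could look roughly like this. It is a sketch with assumptions: diagnose_directory is a hypothetical helper, it only picks up files ending in .log, and it takes the diagnose function as an argument so it can be tried out with a stand-in before spending real requests.

```python
from pathlib import Path
from typing import Callable


def diagnose_directory(log_dir: str, diagnose_fn: Callable[[str], str]) -> dict[str, str]:
    """Run diagnose_fn on every *.log file in log_dir.

    Returns a mapping of file name to diagnosis text. Each file is one
    independent call, which is what the stateless design allows.
    """
    results: dict[str, str] = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        # Same lenient decoding as the single-file CLI.
        raw_log = path.read_text(encoding="utf-8", errors="ignore")
        results[path.name] = diagnose_fn(raw_log)
    return results
```

In the real script you would call diagnose_directory("logs", diagnose), where "logs" is whatever directory holds your failed runs.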

Run it

Save the script as diagnose.py, grab a failed build log, and run it. I use a Python packaging failure as an example, but the same prompt works for Java stack traces, Rust compiler errors, or Terraform plan failures. I tested this against logs from GitHub Actions, GitLab CI, and a local Drone instance. The prompt generalizes well because it does not assume a specific language or build tool. If you notice the model fixating on the wrong section of a massive log, reduce MAX_LOG_CHARS or add a header line like "ERROR SUMMARY" to the truncated chunk.

$ python diagnose.py failed_build.log
Read 48291 characters. Diagnosing...

- Root Cause: The Docker build step failed because the requirements.txt file references a package version that does not exist on PyPI.
- Likely File: ./requirements.txt
- Suggested Fix: Pin the package to an existing version or remove the strict pin. Run pip install -r requirements.txt locally to verify.

The diagnosis is concise and actionable. You could pipe this into a Slack webhook or a GitHub PR comment if you wanted. Because Oxlo.ai does not impose cold starts on popular models, the response comes back in seconds even if this is the first request of the day. That matters when you are debugging a production outage and waiting costs money.
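
For the "ERROR SUMMARY" header suggested earlier, here is one way it might be built. This is a sketch, not the method used to produce the output above: with_error_summary is a hypothetical helper, and the keyword list is my own guess at what counts as a suspicious line, so adjust it to your build tool.

```python
import re

# Assumed keyword list; tune it for your own logs.
ERROR_HINT = re.compile(r"\b(error|failed|failure|exception|fatal)\b", re.IGNORECASE)


def with_error_summary(chunk: str, max_hits: int = 10) -> str:
    """Prepend an ERROR SUMMARY block listing the last max_hits suspicious lines."""
    hits = [line.strip() for line in chunk.splitlines() if ERROR_HINT.search(line)]
    if not hits:
        # Nothing matched, so send the excerpt unchanged.
        return chunk
    summary = "\n".join(hits[-max_hits:])
    return f"ERROR SUMMARY\n{summary}\n\nFULL EXCERPT\n{chunk}"
```

Applying it to the truncated chunk, not the raw log, keeps the summary limited to what the model will actually see.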

Next steps

This agent turns an hour of log spelunking into a ten-second CLI call. Two directions to take it next. First, wire the script into your CI pipeline as a post-failure step so it posts the diagnosis directly into the pull request as a comment. Second, store every result in a cheap SQLite database and run weekly queries to find which jobs fail most often. Both are easy to layer on top because Oxlo.ai is fully OpenAI SDK compatible, so you spend zero time fighting the API.
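
The SQLite idea needs nothing beyond the standard library. The following is a minimal sketch under my own assumptions: the diagnoses table, its columns, and the function names are all hypothetical, and the job name is something you would pass in from your CI environment.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = "diagnoses.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS diagnoses (
               id INTEGER PRIMARY KEY,
               job TEXT NOT NULL,
               created_at TEXT NOT NULL,
               diagnosis TEXT NOT NULL
           )"""
    )
    return conn


def record(conn: sqlite3.Connection, job: str, diagnosis: str) -> None:
    """Store one diagnosis with a UTC ISO 8601 timestamp."""
    conn.execute(
        "INSERT INTO diagnoses (job, created_at, diagnosis) VALUES (?, ?, ?)",
        (job, datetime.now(timezone.utc).isoformat(), diagnosis),
    )
    conn.commit()


def failures_per_job(conn: sqlite3.Connection, since_iso: str) -> list[tuple[str, int]]:
    """Count stored diagnoses per job since the given ISO timestamp, most frequent first."""
    rows = conn.execute(
        "SELECT job, COUNT(*) AS n FROM diagnoses WHERE created_at >= ? "
        "GROUP BY job ORDER BY n DESC, job",
        (since_iso,),
    )
    return rows.fetchall()
```

A weekly report is then a single call to failures_per_job with a timestamp from seven days ago.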

If you are currently on a token-based provider and your costs spike every time someone sends a full Kubernetes event stream, moving this workload to Oxlo.ai will flatten your bill without touching the Python code. You can also retarget the same skeleton at server syslog files or Lambda execution logs by changing the system prompt and leaving the Oxlo.ai client code untouched. That is the benefit of building on a standard API. If you want to experiment with larger context windows, try models such as kimi-k2.6 and simply increase MAX_LOG_CHARS. The request-based pricing means you will not get penalized for the extra characters.
