A Practical Guide to Using LLMs for Engineering

I built a lightweight system design review agent that reads architecture writeups and returns categorized, severity-ranked feedback. My team uses it to catch missing cache layers, authentication gaps, and single points of failure before we commit to a design. It runs on Oxlo.ai through the standard OpenAI SDK, so the integration is a single base URL change and there is no custom client to maintain.

What you'll need

You will need Python 3.10 or newer and the OpenAI SDK installed with pip install openai. You also need an Oxlo.ai API key from https://portal.oxlo.ai. Because Oxlo.ai charges a flat rate per request rather than per token, you can paste in full multi-page design documents without watching metered costs climb. This matters for engineering workflows where context is long and prompts vary wildly in size. See the exact pricing at https://oxlo.ai/pricing.

Step 1: Define the system prompt

The entire behavior of the agent is determined by the system prompt. I treat the model as a senior staff engineer with a strict output schema. This removes ambiguity and keeps reviews consistent across runs. Without this constraint, the model drifts between bullet points and paragraphs, which breaks automation. The prompt below asks for four specific categories and enforces JSON output so downstream tools can consume the results reliably.

SYSTEM_PROMPT = """You are a senior staff engineer reviewing system design documents. Analyze the provided design for:

1. Scalability bottlenecks (database choice, caching strategy, load balancing)
2. Reliability risks (single points of failure, retry logic, circuit breakers)
3. Security gaps (authentication, authorization, input validation, secrets management)
4. Operational concerns (observability, deployment strategy, data migration)

Return your findings as a JSON object with this exact structure:
{
  "summary": "One sentence overall assessment",
  "findings": [
    {
      "category": "scalability|reliability|security|operations",
      "severity": "high|medium|low",
      "issue": "Description of the problem",
      "recommendation": "Concrete fix or alternative to consider"
    }
  ]
}

Be specific. Reference technologies mentioned in the design. If something is missing, say exactly what is missing."""

Step 2: Build the design input parser

Design docs usually arrive as Markdown or plain text, often exported from Notion or Confluence with YAML frontmatter. I wrote a small loader that reads the file and strips that frontmatter so we only send the narrative to the model. This keeps the context clean and avoids confusing the reviewer with metadata like author names and Jira ticket numbers. I also added a guard for empty files so the script exits early with a clear message.

import pathlib

def load_design_doc(filepath: str) -> str:
    text = pathlib.Path(filepath).read_text(encoding="utf-8")
    lines = text.splitlines()
    
    # Strip YAML frontmatter if present
    if lines and lines[0].strip() == "---":
        # Find the closing delimiter, tolerating trailing whitespace
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[i + 1:]
                break
    
    body = "\n".join(lines).strip()
    if not body:
        raise ValueError("Design document is empty after stripping frontmatter.")
    return body
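
To check the frontmatter handling without touching the filesystem, the stripping logic can be pulled into a string-based helper. This is a minimal sketch: the name strip_frontmatter and the sample document (including the author and ticket values) are hypothetical and only illustrate the behavior described above.

```python
def strip_frontmatter(text: str) -> str:
    """Return text without a leading YAML frontmatter block, if one exists."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Look for the closing delimiter; leave the text alone if it never closes.
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[i + 1:]
                break
    return "\n".join(lines).strip()


# Hypothetical sample resembling a Notion or Confluence export.
sample = (
    "---\n"
    "author: Jane\n"
    "ticket: ENG-123\n"
    "---\n"
    "# URL Shortener\n"
    "\n"
    "Flask app with one PostgreSQL node.\n"
)
print(strip_frontmatter(sample))
```

Only the heading and the narrative line survive. A document with an unclosed frontmatter block is passed through unchanged, so nothing is silently discarded.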

Step 3: Call the model and parse feedback

With the document loaded, the next step is sending it to the model. I use Llama 3.3 70B on Oxlo.ai because it handles long technical contexts and reasoning tasks well. Oxlo.ai's request-based pricing is useful here: a ten-page design doc costs the same as a one-paragraph query, which makes this agent economical to run in CI pipelines. The client instantiation is identical to OpenAI's; only the base URL and key change.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def review_design(doc_text: str) -> str:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": doc_text},
        ],
    )
    return response.choices[0].message.content
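
For CI use, you will probably want to keep the key out of source control. The sketch below reads it from an environment variable and fails with a clear message when it is unset. The variable name OXLO_API_KEY and the helper get_api_key are assumptions for illustration, not something Oxlo.ai prescribes.

```python
import os


def get_api_key(env=None, var_name: str = "OXLO_API_KEY") -> str:
    """Read the API key from an environment mapping, failing clearly if unset."""
    # Accept an explicit mapping so the function is easy to test.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return key
```

With this in place, the client line could become OpenAI(base_url="https://api.oxlo.ai/v1", api_key=get_api_key()).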

Step 4: Add structured output

Raw text is difficult to pipe into Jira, Slack, or GitHub comments. I force JSON mode so the model returns a machine-readable object every time. I then parse the payload with the standard library and print a severity-sorted report. If the model ever returns malformed JSON, the script fails fast and loud. I chose to keep the validation lightweight, using only json.loads and dictionary access, so there are no extra dependencies beyond the OpenAI SDK.

import json

def review_design_json(doc_text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": doc_text},
        ],
        response_format={"type": "json_object"},
    )
    raw = response.choices[0].message.content
    return json.loads(raw)

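# Lower rank prints first; unrecognized severities sort last
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

def print_review(review: dict) -> None:
    print(f"Summary: {review['summary']}\n")
    findings = sorted(
        review.get("findings", []),
        key=lambda item: SEVERITY_ORDER.get(item["severity"].upper(), 3),
    )
    for item in findings:
        sev = item["severity"].upper()
        marker = "HIGH" if sev == "HIGH" else "MED " if sev == "MEDIUM" else "LOW "
        print(f"[{marker}] {item['category'].upper()}")
        print(f"    Issue: {item['issue']}")
        print(f"    Fix:   {item['recommendation']}\n")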
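
JSON mode guarantees parseable JSON, but not that the object follows the schema in the system prompt. If you want a slightly stricter check while staying in the standard library, something like the following could run between json.loads and the report. The function name validate_review and the error wording are hypothetical; the allowed values are copied from the prompt in Step 1.

```python
ALLOWED_CATEGORIES = {"scalability", "reliability", "security", "operations"}
ALLOWED_SEVERITIES = {"high", "medium", "low"}
REQUIRED_KEYS = ("category", "severity", "issue", "recommendation")


def validate_review(review) -> list[str]:
    """Return a list of schema problems; an empty list means the review looks valid."""
    if not isinstance(review, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    if not isinstance(review.get("summary"), str):
        problems.append("summary is missing or not a string")
    findings = review.get("findings")
    if not isinstance(findings, list):
        problems.append("findings is missing or not a list")
        return problems
    for i, item in enumerate(findings):
        if not isinstance(item, dict):
            problems.append(f"findings[{i}] is not an object")
            continue
        # Every finding needs all four string fields.
        for key in REQUIRED_KEYS:
            if not isinstance(item.get(key), str):
                problems.append(f"findings[{i}].{key} is missing or not a string")
        # Enumerated fields must use one of the values named in the prompt.
        category = item.get("category")
        if isinstance(category, str) and category.lower() not in ALLOWED_CATEGORIES:
            problems.append(f"findings[{i}].category has unexpected value {category!r}")
        severity = item.get("severity")
        if isinstance(severity, str) and severity.lower() not in ALLOWED_SEVERITIES:
            problems.append(f"findings[{i}].severity has unexpected value {severity!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything the model got wrong in a single run.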

Step 5: Wrap it in a CLI

Finally, I wrapped the logic in a small CLI so engineers can run it from pre-commit hooks or GitHub Actions. It takes a file path, runs the review, and prints findings to stdout. You can redirect this to a file or grep for HIGH severity issues to block a merge. The argparse interface keeps it familiar to anyone who has used standard Unix tools.

import argparse

def main():
    parser = argparse.ArgumentParser(description="Review a system design doc")
    parser.add_argument("filepath", help="Path to the design document")
    args = parser.parse_args()

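    try:
        doc_text = load_design_doc(args.filepath)
    except (OSError, ValueError) as exc:
        # Exit with a clear message instead of a traceback
        parser.error(str(exc))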
    review = review_design_json(doc_text)
    print_review(review)

if __name__ == "__main__":
    main()
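
Grepping stdout for HIGH works, but a non-zero exit code is usually a cleaner way to block a merge. A minimal sketch, assuming the review dictionary follows the schema from Step 1; the helper name exit_code_for and the fail_on parameter are hypothetical.

```python
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def exit_code_for(review: dict, fail_on: str = "high") -> int:
    """Return 1 if any finding is at or above the fail_on severity, else 0."""
    threshold = SEVERITY_RANK[fail_on]
    for item in review.get("findings", []):
        # Unrecognized or missing severities rank below "low" and never fail the run.
        rank = SEVERITY_RANK.get(str(item.get("severity", "")).lower(), -1)
        if rank >= threshold:
            return 1
    return 0
```

Calling sys.exit(exit_code_for(review)) at the end of main() would make a pre-commit hook or CI job fail whenever a high-severity finding appears.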

Run it

I tested this against an old design doc for a URL shortener built with Flask and a single PostgreSQL node. The agent immediately flagged the database as a single point of failure, noted the absence of caching, and called out missing rate limiting on the redirect endpoint. I ran it with the command below. The output is exactly what I would expect from a staff engineer during a design review, except it took under two seconds.

$ python review.py design.md

Summary: The design is functional for a prototype but lacks caching, redundancy, and rate limiting required for production traffic.

[HIGH] SCALABILITY
    Issue: Single PostgreSQL node handling all read and write traffic with no read replicas or connection pooling.
    Fix:   Add PgBouncer for connection pooling and at least one read replica. Consider caching hot URLs in Redis.

[HIGH] RELIABILITY
    Issue: No retry logic or circuit breaker around the database. A brief outage will cascade into total service failure.
    Fix:   Implement retries with exponential backoff and a circuit breaker using a library such as pybreaker.

[MED ] SECURITY
    Issue: The API accepts arbitrary destination URLs without validation, enabling open redirect abuse.
    Fix:   Add an allow-list or reputation check on submitted URLs. Reject private IP ranges and known malicious domains.

Next steps

Two concrete directions. First, upgrade to Kimi K2.6 and pass architecture diagrams as base64 images alongside the text. Oxlo.ai hosts vision models on the same chat completions endpoint, so the code change is minimal: you only add an image object to the messages array. Second, wire this into a GitHub Action that triggers on pull requests when Markdown files in a docs/ directory change, posting the review as a PR comment. This turns the agent into an automated gatekeeper.
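
For the first direction, the message payload might look like the sketch below. It uses the OpenAI chat completions content-part format for images (a text part plus an image_url part carrying a base64 data URL). That Oxlo.ai's vision models accept this exact shape is an assumption here, and build_vision_messages is a hypothetical helper name.

```python
import base64


def build_vision_messages(
    system_prompt: str,
    doc_text: str,
    image_bytes: bytes,
    mime_type: str = "image/png",
) -> list[dict]:
    """Build a messages array that pairs the design text with one diagram image."""
    # Encode the raw image bytes as a base64 data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_url = f"data:{mime_type};base64,{encoded}"
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": doc_text},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        },
    ]
```

The returned list can be passed as the messages argument in review_design_json, with image_bytes read from the diagram file via pathlib.Path(...).read_bytes().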
