
I built a lightweight CLI that turns scattered engineering notes into structured technical documentation. It uses an LLM to outline, expand, and review content before writing a Markdown file. In this tutorial, I will walk through the exact code so you can adapt it for internal docs, runbooks, or blog posts.
What you'll need
- Python 3.10 or newer installed locally.
- The OpenAI SDK, installed with pip install openai.
- An Oxlo.ai API key from https://portal.oxlo.ai.
Oxlo.ai uses flat per-request pricing, so the outline request, each section expansion, and the review pass all cost the same per call, regardless of how many tokens you send or receive. That predictability matters when you are iterating on long documents.
Step 1: Configure the Oxlo.ai client
I initialize the client once and read the key from the environment. Oxlo.ai exposes a fully OpenAI-compatible endpoint at https://api.oxlo.ai/v1, so the official SDK drops in without wrappers or adapter classes. I use llama-3.3-70b as the default model (it is set in the helper in Step 2) because it follows long technical instructions accurately and has no cold start latency. That means the first request of a session returns without a warm-up delay, which is important when you are running the tool dozens of times during an editing session. If you prefer, you can store the key in a .env file and load it with python-dotenv, but for a single-script tool the environment variable keeps dependencies minimal.
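import os

from openai import OpenAI

# Point the official SDK at the OpenAI-compatible Oxlo.ai endpoint.
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    # The placeholder only keeps construction from failing; real calls need a real key.
    api_key=os.environ.get("OXLO_API_KEY", "YOUR_OXLO_API_KEY"),
)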
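With the placeholder fallback above, a missing key only shows up as an authentication error on the first request. As a minimal sketch, assuming you would rather fail at startup, the key can be validated before the client is built. The helper name `require_api_key` is hypothetical and is not part of the OpenAI SDK.

```python
import os


def require_api_key(env=None, name="OXLO_API_KEY"):
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for this tutorial, not part of the OpenAI SDK.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key


# A plain dict stands in for os.environ so the example runs anywhere.
print(require_api_key({"OXLO_API_KEY": "sk-example"}))
```

You could then pass `require_api_key()` as `api_key` when constructing the client.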
Step 2: Define the system prompt
The system prompt is the only contract that controls tone across every stage. I lock it to Markdown, active voice, and short paragraphs so the output is usable without heavy manual editing. I avoid few-shot examples in the system prompt because they bloat the context window. The model already understands Markdown from its pre-training, so a short behavioral description is enough. I also write a thin helper so I do not repeat the message assembly logic. Keeping the helper generic makes it easy to swap in deepseek-v3.2 or kimi-k2.6 later if I need stronger reasoning for a particular article.
SYSTEM_PROMPT = """You are a senior technical writer. You write in clear, structured Markdown.
You prefer short paragraphs, active voice, and precise terminology.
When given an outline, expand each point into a full section with code examples where relevant.
When reviewing, flag vague phrasing and suggest concrete fixes."""


def generate(client, user_prompt, model="llama-3.3-70b"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content
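Because the helper only touches `client.chat.completions.create` and `response.choices[0].message.content`, it can be exercised offline. The sketch below assumes a hypothetical `FakeClient` stand-in (not part of the SDK) that returns a canned reply and records each request, and it shortens the system prompt for brevity.

```python
from types import SimpleNamespace


class FakeClient:
    """Offline stand-in that mimics client.chat.completions.create()."""

    def __init__(self, reply):
        self.calls = []  # records every request for later inspection
        self._reply = reply
        self.chat = SimpleNamespace(
            completions=SimpleNamespace(create=self._create)
        )

    def _create(self, model, messages):
        self.calls.append({"model": model, "messages": messages})
        message = SimpleNamespace(content=self._reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


SYSTEM_PROMPT = "You are a senior technical writer."  # shortened for the sketch


def generate(client, user_prompt, model="llama-3.3-70b"):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content


fake = FakeClient("## Stub section")
print(generate(fake, "Expand: 1. Introduction"))  # ## Stub section
print(fake.calls[0]["model"])                     # llama-3.3-70b
```

A stub like this lets you test the parsing and assembly stages without spending any requests.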
Step 3: Generate an outline from raw notes
I feed rough bullet points into the model and ask for a numbered list of H2 sections. I keep the parsing intentionally simple: split lines and keep anything that starts with a digit. The model sometimes adds an introductory sentence before the list, so the digit-check filter prevents that preamble from breaking the pipeline. If the outline comes back as nested bullet points instead of a flat numbered list, the digit filter drops those lines and the outline ends up incomplete or empty. In practice, Llama 3.3 70B respects the numbered format reliably when the instruction is explicit. If your notes are already structured, you can skip this stage, but for messy meeting minutes it saves time.
def generate_outline(client, raw_notes):
    prompt = (
        "Given the following rough notes, produce a numbered outline for a technical article. "
        "Each item should become an H2 section. Include an introduction and conclusion.\n\n"
        f"Notes:\n{raw_notes}"
    )
    outline_text = generate(client, prompt)
    lines = [line.strip() for line in outline_text.splitlines() if line.strip()]
    sections = [line for line in lines if line and line[0].isdigit()]
    return sections
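To see what the digit filter keeps and drops, the sketch below runs it on a made-up response that includes a preamble and a nested bullet. It also shows a stricter variant, assuming you want titles without the number prefix. The `parse_outline` name, the regex, and the sample text are illustrative assumptions, not part of the original script.

```python
import re

# Matches "1. Title" or "2) Title" and captures the title text.
NUMBERED_ITEM = re.compile(r"^(\d+)[.)]\s+(.+)$")


def parse_outline(outline_text):
    """Return (digit_filtered, titles) for a raw outline response."""
    lines = [line.strip() for line in outline_text.splitlines() if line.strip()]
    # Same rule as generate_outline: keep lines that start with a digit.
    digit_filtered = [line for line in lines if line[0].isdigit()]
    # Stricter rule: require "N." or "N)" and strip the number prefix.
    titles = [m.group(2) for line in lines if (m := NUMBERED_ITEM.match(line))]
    return digit_filtered, titles


sample = (
    "Here is the outline you asked for:\n\n"
    "1. Introduction\n"
    "2. Why Stateless JWT\n"
    "   - token size trade-offs\n"
    "3. Conclusion\n"
)
kept, titles = parse_outline(sample)
print(kept)    # ['1. Introduction', '2. Why Stateless JWT', '3. Conclusion']
print(titles)  # ['Introduction', 'Why Stateless JWT', 'Conclusion']
```

The stricter pattern also rejects prose lines that merely begin with a number, such as a sentence starting with a year.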
Step 4: Expand each section into prose
I loop over the outline and call Oxlo.ai once per section. Because Oxlo.ai bills per request, not per token, I can carry the titles of the sections already written in every prompt to reduce repetition without worrying about ballooning costs. I accumulate a running context string so each section knows what came before it. On token-based providers, that growing context would inflate the bill on every iteration, but here the cost stays flat per call. I also pass the article title in every prompt because it keeps the model from drifting off topic, especially when the outline covers infrastructure, code, and process in the same document.
def expand_sections(client, outline, title):
    expanded = {}
    context = f"Article title: {title}\n"
    for item in outline:
        prompt = (
            f"{context}"
            "Expand the following outline section into a full Markdown section. "
            "Use H2 for the section title. Include a code example if applicable.\n\n"
            f"Section: {item}"
        )
        expanded[item] = generate(client, prompt)
        context += f"Previously covered: {item}\n"
    return expanded
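The growth of the running context is easier to see without the API in the loop. This sketch rebuilds the same prompts as `expand_sections` but returns them instead of sending them. The `build_prompts` name and the three-item outline are assumptions made for illustration.

```python
def build_prompts(outline, title):
    """Return the prompt each section would receive, without calling the API."""
    prompts = []
    context = f"Article title: {title}\n"
    for item in outline:
        prompts.append(
            f"{context}"
            "Expand the following outline section into a full Markdown section. "
            "Use H2 for the section title. Include a code example if applicable.\n\n"
            f"Section: {item}"
        )
        # Each later prompt carries one more "Previously covered" line.
        context += f"Previously covered: {item}\n"
    return prompts


outline = ["1. Introduction", "2. Why Stateless JWT", "3. Conclusion"]
prompts = build_prompts(outline, "Migrating to Stateless JWT Authentication")
for line in prompts[2].splitlines()[:3]:
    print(line)
# Article title: Migrating to Stateless JWT Authentication
# Previously covered: 1. Introduction
# Previously covered: 2. Why Stateless JWT
```

The prompt grows by one short line per completed section, so even a long outline adds little context per call.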
Step 5: Review the assembled draft
Once the sections are concatenated, I send the full draft back for a critique pass. I switch to qwen-3-32b here because it handles agentic reasoning workflows well, though llama-3.3-70b works too. I explicitly ask for concrete fixes rather than generic praise, because feedback like "this is unclear" is useless without a suggested rewrite. The response contains the list of issues followed by a revised draft with tighter transitions and consistent terminology. The script saves that response as-is, so delete the issue list before you publish. I do not stream the review response because I want the full response in one string. If you prefer to watch the review generate in real time, Oxlo.ai supports streaming on all chat models, but for this pipeline a single completion is simpler to capture and write to disk.
def review_draft(client, draft):
    prompt = (
        "Review the following technical draft. List up to five concrete issues "
        "with vague phrasing, inconsistent terminology, or missing transitions. "
        "Then output a revised full draft.\n\n"
        f"{draft}"
    )
    return generate(client, prompt, model="qwen-3-32b")
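Since the review prompt asks for a list of issues and then a revised draft, the returned string holds both. One possible way to separate them, sketched here under the assumption that you extend the prompt to request a delimiter line, is to split on that marker. The marker text and `split_review` are hypothetical, and the model is not guaranteed to emit the marker, so the function falls back to returning the whole response.

```python
# Hypothetical delimiter; the original prompt does not ask for it.
REVISED_MARKER = "=== REVISED DRAFT ==="


def split_review(response_text, marker=REVISED_MARKER):
    """Split a review response into (issues, revised_draft).

    If the marker is missing, treat the whole response as the draft
    so nothing is lost.
    """
    issues, found, revised = response_text.partition(marker)
    if not found:
        return "", response_text.strip()
    return issues.strip(), revised.strip()


sample = (
    "1. 'Fast' is vague; state the latency.\n"
    "=== REVISED DRAFT ===\n"
    "# Migrating to Stateless JWT Authentication\n"
)
issues, revised = split_review(sample)
print(issues)   # 1. 'Fast' is vague; state the latency.
print(revised)  # # Migrating to Stateless JWT Authentication
```

With a split like this, you could save only the revised draft and log the issues separately.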
Step 6: Save the final Markdown file
The last helper writes the output to a date-stamped file. I slugify the title and append the date so drafts from different days do not collide. I often generate multiple variants with different models to compare tone, so the date stamp keeps the directory organized. Note that the stamp has day resolution, so a second run on the same day with the same title overwrites the earlier file. You could extend this to upload the file to an S3 bucket or Confluence page, but for my workflow a local Markdown file is the handoff point to my static site generator.
from datetime import datetime


def save_article(title, content):
    slug = title.lower().replace(" ", "-").replace(",", "")
    filename = f"{slug}-{datetime.now().strftime('%Y%m%d')}.md"
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)
    return filename
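The slug logic above handles spaces and commas only, so a title containing a colon or a slash would leak into the filename. As a hedged alternative, a regex-based slug collapses every run of non-alphanumeric characters into one hyphen. The `slugify` and `build_filename` names are illustrative and not part of the original script.

```python
import re
from datetime import datetime


def slugify(title):
    """Lowercase the title and collapse each non-alphanumeric run into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_filename(title, now=None):
    """Return '<slug>-YYYYMMDD.md'; `now` is injectable to make testing easy."""
    now = now or datetime.now()
    return f"{slugify(title)}-{now.strftime('%Y%m%d')}.md"


print(slugify("Rate Limiting: Redis, Per-User!"))
# rate-limiting-redis-per-user
print(build_filename("Migrating to Stateless JWT Authentication", datetime(2025, 1, 28)))
# migrating-to-stateless-jwt-authentication-20250128.md
```

Switching the format string to something like `'%Y%m%d-%H%M%S'` would also keep same-day runs from overwriting each other.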
Run it
I tie everything together in a __main__ block. Replace the notes string with your own source material. When you run the script, it prints progress messages to stdout and writes the final file to disk. The resulting file is plain Markdown, so it renders in MkDocs, Docusaurus, or on GitHub. The entire pipeline makes eight API calls for a six-section article: one for the outline, six for expansion, and one for review. On Oxlo.ai, that is eight predictable charges regardless of whether each section is two hundred or two thousand tokens long.
if __name__ == "__main__":
    # Indexing os.environ raises KeyError early if the key is missing.
    client = OpenAI(
        base_url="https://api.oxlo.ai/v1",
        api_key=os.environ["OXLO_API_KEY"],
    )
    # Define the title once so the prompts, the H1, and the filename stay in sync.
    title = "Migrating to Stateless JWT Authentication"
    notes = """
    - moving from stateful sessions to JWT
    - need refresh token rotation
    - redis rate limiter per user
    - backward compatibility for old cookies
    """
    print("Generating outline...")
    outline = generate_outline(client, notes)
    print("Expanding sections...")
    sections = expand_sections(client, outline, title)
    print("Assembling draft...")
    draft = f"# {title}\n\n"
    for item in outline:
        draft += sections[item] + "\n\n"
    print("Reviewing...")
    final = review_draft(client, draft)
    path = save_article(title, final)
    print(f"Done: {path}")
Example output:
Generating outline...
Expanding sections...
Assembling draft...
Reviewing...
Done: migrating-to-stateless-jwt-authentication-20250128.md
The script prints only these progress lines. For the notes above, the outline came back as ['1. Introduction', '2. Why Stateless JWT', '3. Implementing Refresh Token Rotation', '4. Redis Rate Limiting', '5. Backward Compatibility for Legacy Cookies', '6. Conclusion'], and the first expanded section opened with "Stateless authentication shifts session storage from the server to the client...".
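The number of requests per run follows directly from the pipeline's shape: one outline call, one expansion call per outline item, and one review call. A tiny sketch makes the arithmetic explicit. The `count_api_calls` name is hypothetical.

```python
def count_api_calls(num_sections):
    """One outline call, one expansion call per section, one review call."""
    return 1 + num_sections + 1


print(count_api_calls(6))  # 8
```

Under flat per-request pricing, that count is the only input you need to estimate the cost of a run.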
Next steps
Consider wiring this script into a Git pre-commit hook so that every time you edit an outline file, the tool expands it into a full draft before the commit is recorded. You could also add a vision layer with Oxlo.ai's gemma-3-27b-it or kimi-vl-a3b to read whiteboard photos and fold that context into the outline generation step. That would let you point your phone at an architecture diagram and get a full explainer document minutes later. Another direction is to turn the review step into a multi-turn conversation. You could feed the critique back into a second pass and ask the model to defend its edits, which often surfaces edge cases you missed.
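For the multi-turn idea, the second pass mostly comes down to replaying the first exchange in the messages list. The sketch below only assembles that list, assuming the standard system, user, and assistant roles of the chat completions format. The `build_followup_messages` name and the sample strings are illustrative.

```python
def build_followup_messages(system_prompt, draft, review, question):
    """Assemble a second-pass conversation that replays the first review."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Review the following technical draft.\n\n{draft}"},
        # The model's earlier critique goes back in as an assistant turn.
        {"role": "assistant", "content": review},
        {"role": "user", "content": question},
    ]


messages = build_followup_messages(
    "You are a senior technical writer.",
    "# Draft\n\nTokens expire quickly.",
    "1. 'quickly' is vague; state the TTL.",
    "Defend each edit and name any edge case it might miss.",
)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant', 'user']
```

You could pass the resulting list as `messages` to the same chat completions call used in the earlier steps.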

