
Document Summarization with Oxlo.ai

We are going to build a command-line document summarizer that ingests any plain-text file and emits a structured JSON summary. It extracts a title, a short abstract, key points, and action items. If you regularly process meeting transcripts, research papers, or server logs, this tool will save you from reading pages of noise just to find the signal. We will use the OpenAI Python SDK as a drop-in client for Oxlo.ai, so the code will look familiar, but the economics are different because Oxlo.ai charges a flat rate per request instead of metering tokens.

What you'll need

You need Python 3.10 or newer installed on your machine. You also need the OpenAI Python SDK, which you can install with pip install openai. This SDK is fully compatible with Oxlo.ai, so there is no custom client to learn. Finally, grab an API key from the Oxlo.ai portal at https://portal.oxlo.ai. If you are just prototyping, the free tier includes 60 requests per day across more than sixteen models, which is plenty for testing. Oxlo.ai uses request-based pricing, so one API call costs the same whether you summarize a short email or a fifty-page transcript. That flat rate is useful here because document summarization is inherently a long-context task, and with token-based providers your cost scales with every paragraph you feed in. You can see the exact plan details at https://oxlo.ai/pricing.

Step 1: Initialize the Oxlo.ai client

Create a new file named summarize.py. Start by importing the OpenAI class from the openai package and instantiating a client that points at Oxlo.ai. Replace YOUR_OXLO_API_KEY with the key you copied from the portal. I recommend reading the key from an environment variable in production, but I will leave the placeholder here for clarity. The critical detail is the base_url: it must be https://api.oxlo.ai/v1 so the SDK routes traffic to Oxlo.ai instead of the default provider. The model is not chosen at this point; it is passed with each request in Step 4, where I use llama-3.3-70b because it is Oxlo.ai's general-purpose flagship and it handles summarization reliably across technical and business text. The platform also hosts kimi-k2.6, deepseek-v3.2, and qwen-3-32b if you want to experiment with alternatives later. Because Oxlo.ai keeps popular models warm, there are no cold starts, so the first request after idle time returns just as quickly as any other.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
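For anything beyond a quick experiment, the key can come from the environment instead of the source file. Here is a minimal sketch, assuming you exported the key under the name OXLO_API_KEY. That variable name and both helper functions are my own choices for illustration, not something the Oxlo.ai portal or the SDK prescribes.

```python
import os


def load_api_key(env=None, var_name="OXLO_API_KEY"):
    """Return the API key from the environment, failing loudly if it is missing.

    OXLO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def make_client():
    """Build the same client as above, but without a hard-coded key."""
    # Imported here so load_api_key can be used and tested on its own.
    from openai import OpenAI

    return OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

With this in place, client = make_client() would replace the hard-coded line, and a missing key fails at startup rather than on the first request.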

Step 2: Define the system prompt

The system prompt is the contract between your code and the model. It tells the model what role to play and what shape the output must take. I keep it strict: ask for exactly four keys in JSON, cap the summary length, and forbid markdown wrappers. This keeps parsing trivial and prevents the model from adding conversational fluff that would break a JSON parser. I also ask for exactly five key points because that constraint forces the model to rank importance rather than list everything. You can adjust these constraints later, but starting with a tight prompt gives you consistent, easy-to-parse results from the first run.

SYSTEM_PROMPT = """You are a precise document summarizer. Read the user's document and produce a JSON object with exactly these keys:
- title: a concise title
- summary: a one-paragraph abstract of at most 100 words
- key_points: an array of the five most important takeaways as strings
- action_items: an array of specific next steps mentioned in the text, or an empty array if none exist

Respond only with valid JSON. Do not wrap the output in markdown code fences."""

Step 3: Read the input file

Next we need a utility to load the document from disk. The helper below reads an entire file into memory as a single string using UTF-8 encoding, which handles most modern text sources correctly. For most business documents, technical reports, and call transcripts, this single-pass approach is the right choice. Because Oxlo.ai pricing is per request rather than per token, you do not pay more for longer inputs, whereas with token-based providers cost grows linearly with input length. If a true novel-length manuscript ever exceeds the model's context window, you could swap this reader for a paragraph-level chunking strategy, but a single pass is the correct default for operational summaries. Models like deepseek-v4-flash on Oxlo.ai support context windows of up to one million tokens for the rare cases where you need them.

def read_document(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
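If you do run into a context limit, the paragraph-level chunking mentioned above could look roughly like this. It is a sketch under stated assumptions: chunk_paragraphs is a hypothetical helper, paragraphs are taken to be separated by blank lines, and max_chars is a crude character-count stand-in for a token budget that you would tune for the model you use.

```python
def chunk_paragraphs(text: str, max_chars: int = 20000) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than split.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # The +2 accounts for the blank line that rejoins paragraphs.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [para], len(para)
        else:
            current.append(para)
            size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk would then be its own request, so a chunked document costs one request per chunk plus one more if you ask the model to merge the partial summaries.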

Step 4: Call the summarizer

Now the core function. It takes a file path, loads the text, and sends it to the model. Notice that the user message contains the full raw text of the document. There is no need to truncate or prepend metadata headers. The system prompt already frames the task, so the user content can be exactly the document itself. I activate JSON mode through the response_format parameter to increase the odds of receiving valid machine-readable output. The response is parsed with the standard library json module and returned as a Python dictionary. If you notice the model missing nuance in highly technical documents, try switching the model string to kimi-k2.6 or deepseek-v3.2 inside the same function. Both are available on Oxlo.ai with the same client and base URL, so the change is literally one string. For multilingual teams, qwen-3-32b is a strong alternative that handles non-English source documents well. Since every model is accessible through the identical chat/completions endpoint, you can A/B test without refactoring your networking code.

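import json

def summarize(path: str) -> dict:
    """Summarize the document at `path` and return the parsed JSON as a dict."""
    text = read_document(path)

    response = client.chat.completions.create(
        model="llama-3.3-70b",  # swap this string to try another hosted model
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            # The user message is the raw document, with no extra framing.
            {"role": "user", "content": text},
        ],
        # JSON mode: ask the API to return a single JSON object.
        response_format={"type": "json_object"},
    )

    # Raises json.JSONDecodeError if the model did not return valid JSON.
    return json.loads(response.choices[0].message.content)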
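JSON mode and the system prompt both push toward clean output, but neither is a hard guarantee, so a more forgiving parser can be worth having. The following is a sketch, not part of the original script: parse_summary is a hypothetical helper that tolerates a stray markdown code fence and turns an empty or invalid reply into a clear ValueError.

```python
import json
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built this way to keep the sketch readable


def parse_summary(raw: Optional[str]) -> dict:
    """Parse the model's reply, tolerating a stray markdown code fence."""
    if not raw:
        raise ValueError("The model returned an empty response.")
    cleaned = raw.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag)...
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # ...and everything from the closing fence onward.
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Response was not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level.")
    return data
```

To use it, the last line of summarize() would become return parse_summary(response.choices[0].message.content).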

Step 5: Wire up the CLI

Finally, add a small CLI wrapper so you can run the script from the terminal. It checks for a file path argument, calls summarize(), and prints the result with two-space indentation. This keeps the interface minimal and composable with Unix pipes. You could redirect the output to a file with > summary.json or pipe it into jq for further filtering. I keep the argument parsing manual with sys.argv to avoid extra dependencies, but you could swap in argparse or click if this grows into a larger tool.

if __name__ == "__main__":
    import sys

    # Print usage to stderr so stdout stays clean for pipes and redirects.
    if len(sys.argv) < 2:
        print("Usage: python summarize.py <path-to-document.txt>", file=sys.stderr)
        sys.exit(1)

    result = summarize(sys.argv[1])
    print(json.dumps(result, indent=2))
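If the tool grows, the argparse swap mentioned above might look like the sketch below. The --model and --indent options are my own additions for illustration, and using --model assumes you have extended summarize() to accept a model argument, which the version in this tutorial does not.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the command-line interface for the summarizer."""
    parser = argparse.ArgumentParser(
        prog="summarize.py",
        description="Summarize a plain-text document as structured JSON.",
    )
    parser.add_argument("path", help="path to the plain-text document")
    parser.add_argument(
        "--model",
        default="llama-3.3-70b",
        help="model name to send with the request (default: %(default)s)",
    )
    parser.add_argument(
        "--indent",
        type=int,
        default=2,
        help="JSON indentation for the printed summary (default: %(default)s)",
    )
    return parser
```

In the __main__ block you would call args = build_parser().parse_args() and read args.path, args.model, and args.indent. As a bonus, argparse generates the usage message and a --help flag for you.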

Run it

Create a sample document named quarterly_update.txt and paste in a few paragraphs of unstructured text, then run the script. You should see clean JSON printed to stdout. Because Oxlo.ai charges a flat rate per request, a long document costs the same as a one-sentence input, which is why request-based pricing is a natural fit for summarization pipelines. If the model returns malformed JSON, json.loads will raise an exception; in that case, check that your input text does not contain instructions that conflict with the system prompt. In practice, with JSON mode enabled, I have found the output to be valid nearly every time. Here is an example invocation and the kind of output you can expect.

$ python summarize.py quarterly_update.txt
{
  "title": "Q3 Engineering Review",
  "summary": "The engineering team completed the migration to the new event pipeline, reducing latency by 40%. Two critical bugs in the billing service were patched, and the mobile SDK entered private beta.",
  "key_points": [
    "Event pipeline migration is production-ready.",
    "Latency dropped from 120 ms to 72 ms p99.",
    "Billing service patches close race conditions on refund processing.",
    "Mobile SDK private beta has 12 external testers.",
    "Q4 roadmap prioritizes multi-region failover."
  ],
  "action_items": [
    "Schedule load-testing for multi-region failover before November 1.",
    "Publish mobile SDK documentation for beta testers."
  ]
}
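Before feeding a result like this into another system, it can help to confirm that it honors the contract from Step 2. The sketch below is a dependency-free check using only the standard library. validate_summary and EXPECTED_KEYS are hypothetical names, and the rules simply mirror the system prompt: four keys, a summary of at most 100 words, and exactly five key points.

```python
EXPECTED_KEYS = {"title", "summary", "key_points", "action_items"}


def validate_summary(data: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape is as requested."""
    problems = []
    missing = EXPECTED_KEYS - set(data)
    extra = set(data) - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    for key in ("title", "summary"):
        if key in data and not isinstance(data[key], str):
            problems.append(f"{key} should be a string")
    for key in ("key_points", "action_items"):
        value = data.get(key)
        if key in data and not (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        ):
            problems.append(f"{key} should be a list of strings")
    if isinstance(data.get("key_points"), list) and len(data["key_points"]) != 5:
        problems.append("key_points should contain exactly five items")
    if isinstance(data.get("summary"), str) and len(data["summary"].split()) > 100:
        problems.append("summary exceeds 100 words")
    return problems
```

Calling validate_summary(result) before printing lets you log or retry when the list is not empty. A schema library would give you richer error messages, but this covers the basic shape.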

Wrap-up and next steps

That is the entire tool. It is under fifty lines of Python, uses a single Oxlo.ai request per document, and gives you structured data you can feed into a database, a Slack bot, or a Notion page. Because the cost is flat per request, you can point this at long documents without the runaway bills you would see on token-based platforms. Two concrete ways to extend it: first, add PDF support with a library like pymupdf so you can point the script at .pdf files instead of plain text. Second, wrap the script in a GitHub Action that automatically summarizes attached meeting transcripts whenever an issue is opened. Both extensions reuse the same Oxlo.ai client and system prompt without changes. You could also add Pydantic models to validate the response shape against a schema, which is useful if you start consuming the summary in a downstream API. If you need higher volume, the Pro plan offers one thousand requests per day across all models. For details on request limits and plan tiers, check https://oxlo.ai/pricing.
